API Reference
This section provides detailed documentation for all ggpubpy functions and classes.
Plotting Functions
ggpubpy: matplotlib Based Publication-Ready Plots
A Python library that provides easy-to-use functions for creating and customizing matplotlib-based, publication-ready plots with built-in statistical tests and automatic p-value or significance star annotations.
This project is directly inspired by R’s ggpubr package.
- ggpubpy.get_iris_palette() -> Dict[str, str]
Get the default color palette for iris species.
- Returns:
Dictionary mapping species names to hex colors.
- Return type:
dict
Examples
>>> from ggpubpy.datasets import get_iris_palette
>>> palette = get_iris_palette()
>>> print(palette)
{'setosa': '#00AFBB', 'versicolor': '#E7B800', 'virginica': '#FC4E07'}
- ggpubpy.get_titanic_palette() -> Dict[str, Dict[str, str]]
Get the default color palettes for the Titanic dataset categories.
- Returns:
Dictionary mapping each column name (Survived, Pclass, Sex) to a dictionary of category values and hex colors.
- Return type:
Dict[str, Dict[str, str]]
Examples
>>> from ggpubpy.datasets import get_titanic_palette
>>> palette = get_titanic_palette()
>>> print(palette)
{'Survived': {'0': '#E74C3C', '1': '#2ECC71'}, 'Pclass': {'1': '#F39C12', '2': '#3498DB', '3': '#9B59B6'}, 'Sex': {'male': '#3498DB', 'female': '#E91E63'}}
- ggpubpy.list_datasets() -> Dict[str, Any]
List all available datasets with descriptions.
- Returns:
Dictionary with dataset names as keys and descriptions as values.
- Return type:
Dict[str, Any]
- ggpubpy.load_iris() -> DataFrame
Load the famous iris dataset.
The iris dataset contains measurements of sepal and petal dimensions for three species of iris flowers (setosa, versicolor, virginica).
- Returns:
DataFrame with columns: sepal_length, sepal_width, petal_length, petal_width, species.
- Return type:
pd.DataFrame
Examples
>>> from ggpubpy.datasets import load_iris
>>> iris = load_iris()
>>> iris.head()
- ggpubpy.load_titanic() -> DataFrame
Load the famous Titanic dataset.
The Titanic dataset contains information about passengers aboard the RMS Titanic, including survival status, passenger class, age, gender, and other details.
- Returns:
DataFrame with columns: PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked.
- Return type:
pd.DataFrame
Examples
>>> from ggpubpy.datasets import load_titanic
>>> titanic = load_titanic()
>>> titanic.head()
- ggpubpy.plot_alluvial(*args: Any, **kwargs: Any) -> Any
Create an alluvial (flow) diagram with explicit alluvium IDs.
- ggpubpy.plot_alluvial_with_stats(*args: Any, **kwargs: Any) -> Any
Create an alluvial plot with optional statistical annotations.
- ggpubpy.plot_blandaltman(x, y, agreement: float = 1.96, xaxis: str = 'mean', confidence: float = 0.95, annotate: bool = True, ax=None, **kwargs)
Create a Bland–Altman agreement plot between two measurements.
Returns the Matplotlib Axes for further customization.
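The quantities a Bland–Altman plot displays can be sketched without any plotting: the bias is the mean of the paired differences, and the limits of agreement lie `agreement` standard deviations on either side of it. The sketch below uses only the standard library. The helper name `bland_altman_limits` and the sample readings are hypothetical, and ggpubpy's internal computation may differ.

```python
from statistics import mean, stdev


def bland_altman_limits(x, y, agreement=1.96):
    """Return (bias, lower, upper) for paired measurements x and y.

    Hypothetical helper for illustration; not part of ggpubpy.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)      # mean difference between the two methods
    spread = stdev(diffs)   # sample standard deviation of the differences
    return bias, bias - agreement * spread, bias + agreement * spread


# Made-up readings from two measurement methods.
method_a = [10.0, 12.0, 14.0, 16.0]
method_b = [9.0, 12.0, 13.0, 16.0]
bias, lower, upper = bland_altman_limits(method_a, method_b)
```

With the default multiplier of 1.96, roughly 95% of differences are expected to fall between `lower` and `upper` if the differences are approximately normal.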
- ggpubpy.plot_boxplot(*args: Any, **kwargs: Any) -> Any
Create a box plot with statistical annotations.
- ggpubpy.plot_boxplot_with_stats(*args: Any, **kwargs: Any) -> Any
Create a box plot with statistical annotations.
- ggpubpy.plot_correlation_matrix(*args: Any, **kwargs: Any) -> Any
Create a correlation matrix plot with scatter plots and correlation values.
- ggpubpy.plot_shift(x: Any, y: Any, *args: Any, **kwargs: Any) -> Any
Create a shift plot comparing two distributions.
- ggpubpy.plot_violin(*args: Any, **kwargs: Any) -> Any
Create a violin plot with statistical annotations.
- ggpubpy.plot_violin_with_stats(*args: Any, **kwargs: Any) -> Any
Create a violin plot with statistical annotations.
- ggpubpy.qqplot(x, dist: str | object = 'norm', sparams=(), confidence: float | bool = 0.95, square: bool = True, ax=None, **kwargs)
Create a Q–Q plot with optional confidence envelope and regression line.
Parameters are consistent with scipy-based Q–Q plots. Returns the Matplotlib Axes.
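As a rough illustration of what a normal Q–Q plot draws, the sketch below pairs each ordered observation with a theoretical standard-normal quantile. It uses only the standard library. The helper name `normal_qq_points` is hypothetical, and the `(i - 0.5) / n` plotting positions are one common convention assumed here, not necessarily the one ggpubpy uses.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pair each ordered observation with a standard-normal quantile.

    Hypothetical helper; the plotting positions are an assumption.
    """
    ordered = sorted(sample)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [
        standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)
    ]
    return list(zip(theoretical, ordered))


# Made-up sample; each tuple is (theoretical quantile, observed value).
points = normal_qq_points([2.1, -0.3, 0.8, 1.5, -1.2])
```

If the sample is close to normal, these points fall near a straight line, which is what the regression line and confidence envelope in the plot help to judge.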
- ggpubpy.significance_stars(*args: Any, **kwargs: Any) -> Any
Convert p-values to significance stars.
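The idea behind star annotations can be sketched as a simple threshold lookup. The cutoffs below (0.001, 0.01, 0.05) and the "ns" label are a widely used convention assumed for illustration; the function name `stars_for` is hypothetical, and the thresholds that `significance_stars` applies may differ.

```python
def stars_for(p):
    """Map a p-value to a star string using common cutoffs (assumed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # not significant at the 0.05 level


labels = [stars_for(p) for p in (0.0004, 0.003, 0.03, 0.2)]
```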
Box Plot Functions
Boxplot functionality for ggpubpy.
This module contains the boxplot function with statistical annotations.
- ggpubpy.boxplot.plot_boxplot_with_stats(df: DataFrame, x: str, y: str, *, x_label: str | None = None, y_label: str | None = None, title: str | None = None, subtitle: str | None = None, order: List | None = None, palette: Dict | None = None, figsize: Tuple[int, int] = (6, 6), add_jitter: bool = True, jitter_std: float = 0.04, alpha: float | None = None, box_width: float = 0.6, global_test: bool = True, pairwise_test: bool = True, parametric: bool = False) -> Tuple[Figure, Axes]
Draw a colored boxplot with jittered points and statistical annotations.
- Parameters:
df (pd.DataFrame) – Your data.
x (str) – Column name for categories (must be categorical).
y (str) – Column name for numeric values.
x_label (str, optional) – Label for the x-axis. Defaults to the x column name.
y_label (str, optional) – Label for the y-axis. Defaults to the y column name.
title (str, optional) – Overall plot title.
subtitle (str, optional) – Optional subtitle shown with the title.
order (list, optional) – Order of x categories. Defaults to sorted unique values.
palette (dict, optional) – Mapping from category -> color.
figsize (tuple) – Figure size.
add_jitter (bool) – Whether to add jittered points.
jitter_std (float) – Standard deviation for horizontal jitter.
alpha (float, optional) – Transparency for jittered points (0-1). Defaults to 0.7.
box_width (float) – Width of each box in the plot.
global_test (bool) – Whether to perform and display global statistical test.
pairwise_test (bool) – Whether to perform and display pairwise comparisons.
parametric (bool) – If True, use parametric tests (ANOVA + t-test). If False, use non-parametric tests (Kruskal-Wallis + Mann-Whitney U).
- Returns:
(figure, axes) matplotlib objects.
- Return type:
Tuple[Figure, Axes]
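To make the `add_jitter` and `jitter_std` parameters concrete, the sketch below draws horizontal positions for the points of one category from a normal distribution centred on that category's x position. It is a standard-library illustration only: the helper `jittered_positions` is hypothetical, and the assumption that the jitter is Gaussian follows from the parameter being described as a standard deviation, not from the library's source.

```python
import random


def jittered_positions(center, n_points, jitter_std=0.04, seed=None):
    """Return n_points x-coordinates scattered around center.

    Hypothetical helper; assumes Gaussian horizontal jitter.
    """
    rng = random.Random(seed)  # local generator keeps the example reproducible
    return [rng.gauss(center, jitter_std) for _ in range(n_points)]


# Positions for 200 points belonging to the category drawn at x = 1.
xs = jittered_positions(center=1.0, n_points=200, seed=0)
```

A small standard deviation keeps the points visually inside their box while still separating overlapping observations.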
Violin Plot Functions
Violin plot functionality for ggpubpy.
This module contains the violin plot function with statistical annotations.
- ggpubpy.violinplot.plot_violin_with_stats(df: DataFrame, x: str, y: str, *, x_label: str | None = None, y_label: str | None = None, title: str | None = None, subtitle: str | None = None, order: List | None = None, palette: Dict | None = None, figsize: Tuple[int, int] = (6, 6), figsize_scale: float = 1.0, add_jitter: bool = True, jitter_std: float = 0.04, alpha: float | None = None, violin_width: float = 0.6, box_width: float = 0.15, global_test: bool = True, pairwise_test: bool = True, parametric: bool = False) -> Tuple[Figure, Axes]
Draw a violin plot with an inner boxplot, jittered points, and statistical annotations.
- Parameters:
df (pd.DataFrame) – Your data.
x (str) – Categorical column name.
y (str) – Numeric column name.
x_label (str, optional) – Custom label for the x-axis.
y_label (str, optional) – Custom label for the y-axis.
title (str, optional) – Overall plot title.
subtitle (str, optional) – Optional subtitle shown with the title.
order (list, optional) – Order of x categories. Defaults to sorted unique values.
palette (dict, optional) – Mapping from category -> color.
figsize (tuple) – Figure size.
figsize_scale (float) – Scale factor for figure size.
add_jitter (bool) – Whether to add jittered points.
jitter_std (float) – Standard deviation for horizontal jitter.
alpha (float, optional) – Transparency for jittered points (0-1). Defaults to 0.6.
violin_width (float) – Width of violin plots.
box_width (float) – Width of boxplots inside violins.
global_test (bool) – Whether to perform global statistical test.
pairwise_test (bool) – Whether to perform pairwise statistical tests.
parametric (bool) – If True, use parametric tests (ANOVA + t-test). If False, use non-parametric tests (Kruskal-Wallis + Mann-Whitney U).
- Returns:
(figure, axes) matplotlib objects.
- Return type:
Tuple[Figure, Axes]
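With `pairwise_test=True` and `parametric=False`, every pair of categories is compared with a Mann-Whitney U test. The sketch below shows only the two ingredients of that step: enumerating the pairs and computing the U statistic by direct counting. It does not compute p-values. The function name, group labels, and values are made up for illustration, and this is not the library's implementation.

```python
from itertools import combinations


def mann_whitney_u(a, b):
    """U statistic for sample a versus b, counting each tie as one half."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )


# Made-up measurements for three categories.
groups = {
    "A": [1.4, 1.3, 1.5],
    "B": [4.7, 4.5, 4.9],
    "C": [6.0, 4.8, 5.9],
}

# One comparison per unordered pair of categories.
pairwise_u = {
    (first, second): mann_whitney_u(groups[first], groups[second])
    for first, second in combinations(groups, 2)
}
```

For samples of sizes n and m, the two directional statistics always sum to n * m, which is a quick sanity check on any implementation.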
Shift Plot Functions
Shift plot functionality for ggpubpy.
This module contains the shift plot function for comparing distributions.
- ggpubpy.shiftplot.plot_shift(x: ndarray, y: ndarray, *, paired: bool = False, n_boot: int = 1000, percentiles: ndarray = array([10, 20, 30, 40, 50, 60, 70, 80, 90]), confidence: float = 0.95, seed: int | None = None, show_median: bool = True, violin: bool = True, show_quantiles: bool = False, show_quantile_diff: bool = False, parametric: bool = False, x_name: str = 'X', y_name: str = 'Y', x_label: str | None = None, y_label: str | None = None, title: str | None = None, subtitle: str | None = None, color: str | None = None, line_color: str | None = None, alpha: float | None = None, figsize: Tuple[float, float] | None = None) -> Figure
Create a shift plot comparing two distributions.
- Parameters:
x (array_like) – First set of observations.
y (array_like) – Second set of observations.
paired (bool) – If True, x and y are paired samples.
n_boot (int) – Number of bootstrap iterations.
percentiles (array_like) – Sequence of percentiles (0-100) to compute.
confidence (float) – Confidence level for intervals.
seed (int or None) – Random seed.
show_median (bool) – If True, show median lines.
violin (bool) – If True, plot half-violin densities.
show_quantiles (bool) – If True, show quantile connection lines between distributions.
show_quantile_diff (bool) – If True, show bottom subplot with quantile differences.
parametric (bool) – If True, use t-test; else Mann-Whitney U test.
- Returns:
fig – The matplotlib Figure instance.
- Return type:
Figure
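The `n_boot`, `percentiles`, `confidence`, and `seed` parameters suggest a bootstrap over quantile differences. The sketch below is one way such a computation could look for independent samples (`paired=False`), using only the standard library. The helper `decile_shift_ci` is hypothetical, the quantile estimator is the default of `statistics.quantiles`, and the percentile-interval method is an assumption; ggpubpy may use different estimators.

```python
import random
from statistics import quantiles


def decile_shift_ci(x, y, n_boot=1000, confidence=0.95, seed=None):
    """Decile differences (y minus x) with bootstrap percentile intervals.

    Hypothetical helper for independent samples; not part of ggpubpy.
    """
    rng = random.Random(seed)

    def shift(a, b):
        # quantiles(..., n=10) returns the nine deciles 10%, 20%, ..., 90%.
        return [qb - qa for qa, qb in zip(quantiles(a, n=10), quantiles(b, n=10))]

    observed = shift(x, y)
    boot = []
    for _ in range(n_boot):
        xb = rng.choices(x, k=len(x))  # resample x with replacement
        yb = rng.choices(y, k=len(y))  # resample y with replacement
        boot.append(shift(xb, yb))

    lo_idx = round((1.0 - confidence) / 2.0 * n_boot)
    hi_idx = n_boot - lo_idx - 1
    intervals = []
    for k in range(9):
        column = sorted(row[k] for row in boot)
        intervals.append((column[lo_idx], column[hi_idx]))
    return observed, intervals


# Made-up data: y is x shifted upward by 5 units.
x = [float(v) for v in range(50)]
y = [v + 5.0 for v in x]
observed, intervals = decile_shift_ci(x, y, n_boot=200, seed=1)
```

Fixing `seed` makes the resampling, and therefore the intervals, reproducible between runs.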
Correlation Matrix Functions
Correlation matrix functionality for ggpubpy.
This module contains the correlation matrix plot function.
- ggpubpy.correlation_matrix.plot_correlation_matrix(df: DataFrame, columns: List[str] | None = None, *, figsize: Tuple[int, int] = (10, 10), color: str = '#2E86AB', alpha: float = 0.6, point_size: float = 20, show_stats: bool = True, method: str = 'pearson', title: str | None = None, subtitle: str | None = None) -> Tuple[Figure, ndarray]
Create a correlation matrix plot with scatter plots in lower triangle and correlation values in upper triangle and diagonal.
- Parameters:
df (pd.DataFrame) – Input dataframe with numeric columns.
columns (list of str, optional) – Specific columns to include. If None, all numeric columns are used.
figsize (tuple) – Figure size as (width, height).
color (str) – Color for scatter points.
alpha (float) – Transparency of scatter points (0-1).
point_size (float) – Size of scatter points.
show_stats (bool) – Whether to show statistical significance stars.
method (str) – Correlation method: 'pearson', 'spearman', or 'kendall'.
title (str, optional) – Overall plot title.
subtitle (str, optional) – Optional subtitle shown with the title.
- Returns:
(figure, axes_array) matplotlib objects.
- Return type:
Tuple[Figure, ndarray]
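The difference between two of the `method` options can be seen with a small standard-library sketch: Pearson measures linear association, while Spearman is Pearson applied to ranks and so reaches 1 for any strictly increasing relationship. The functions below are illustrative re-implementations with hypothetical names, not the routines ggpubpy calls, and Kendall's tau is omitted for brevity.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the tie run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A strictly increasing but non-linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Here `r_spearman` equals 1 while `r_pearson` is below 1, which is the kind of case where choosing 'spearman' changes the values shown in the upper triangle.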
Alluvial Plot Functions
Alluvial plot functionality for ggpubpy.
This module contains the alluvial plot function for creating flow diagrams similar to ggalluvial in R.
- ggpubpy.alluvialplot.plot_alluvial(df: DataFrame, dims: List[str], value_col: str, color_by: str, id_col: str, *, orders: Dict[str, List[str]] | None = None, color_map: Dict[str, str] | None = None, title: str = '', subtitle: str = '', figsize: Tuple[int, int] = (9, 6), alpha: float = 0.8, x_label: str = 'Demographic', y_label: str = 'Frequency') -> Tuple[Figure, Axes]
Create an alluvial (flow) diagram with explicit alluvium IDs.
This function creates a flow diagram similar to ggalluvial in R, where each unique value of id_col represents one flow (alluvium) between categorical dimensions.
- Parameters:
df (pd.DataFrame) – Input data containing the dimensions, values, and identifiers.
dims (List[str]) – List of column names representing the dimensions (axes) of the flow.
value_col (str) – Column name containing the frequency/weight values for each flow.
color_by (str) – Column name to use for coloring the flows.
id_col (str) – Column name containing unique identifiers for each flow (alluvium).
orders (Dict[str, List[str]], optional) – Dictionary mapping dimension names to ordered lists of category values. If not provided, categories will be ordered by their appearance in data.
color_map (Dict[str, str], optional) – Dictionary mapping category values to colors. If not provided, a default palette will be used.
title (str, default "") – Main title for the plot.
subtitle (str, default "") – Subtitle for the plot.
figsize (Tuple[int, int], default (9, 6)) – Figure size in inches.
alpha (float, default 0.8) – Transparency level for the flow polygons.
x_label (str, default "Demographic") – Label for the x-axis.
y_label (str, default "Frequency") – Label for the y-axis.
- Returns:
Matplotlib figure and axes objects.
- Return type:
Tuple[plt.Figure, plt.Axes]
Examples
>>> from ggpubpy import load_titanic, plot_alluvial
>>> import matplotlib.pyplot as plt
>>> import pandas as pd
>>> import numpy as np
>>>
>>> # Load and prepare Titanic data
>>> titanic = load_titanic()
>>> titanic = titanic.dropna(subset=["Age"])
>>> titanic["Class"] = titanic["Pclass"].map({1: "1st", 2: "2nd", 3: "3rd"})
>>> titanic["AgeCat"] = np.where(titanic["Age"] < 18, "Child", "Adult")
>>> titanic["Survived"] = titanic["Survived"].astype(str).replace({"0": "No", "1": "Yes"})
>>>
>>> # Create frequency table with alluvium IDs
>>> titanic_tab = (titanic.groupby(["Class", "Sex", "AgeCat", "Survived"])
...                .size()
...                .reset_index(name="Freq")
...                .rename(columns={"AgeCat": "Age"}))
>>> titanic_tab["alluvium"] = titanic_tab.index
>>>
>>> # Create alluvial plot
>>> fig, ax = plot_alluvial(
...     titanic_tab,
...     dims=["Class", "Sex", "Age"],
...     value_col="Freq",
...     color_by="Survived",
...     id_col="alluvium",
...     orders={"Class": ["1st", "2nd", "3rd"],
...             "Sex": ["male", "female"],
...             "Age": ["Child", "Adult"]},
...     color_map={"No": "#F17C7E", "Yes": "#6CCECB"},
...     title="Titanic Survival Analysis",
...     subtitle="Class → Sex → Age",
...     alpha=0.7
... )
>>> plt.show()
- ggpubpy.alluvialplot.plot_alluvial_with_stats(df: DataFrame, dims: List[str], value_col: str, color_by: str, id_col: str, *, orders: Dict[str, List[str]] | None = None, color_map: Dict[str, str] | None = None, title: str = '', subtitle: str = '', figsize: Tuple[int, int] = (9, 6), alpha: float = 0.8, x_label: str = 'Demographic', y_label: str = 'Frequency', add_stats: bool = True) -> Tuple[Figure, Axes]
Create an alluvial plot with optional statistical annotations.
This is a wrapper around plot_alluvial that can add statistical information to the plot. Currently, this function is identical to plot_alluvial, but provides a consistent interface for future statistical enhancements.
- Parameters:
df (pd.DataFrame) – Input data containing the dimensions, values, and identifiers.
dims (List[str]) – List of column names representing the dimensions (axes) of the flow.
value_col (str) – Column name containing the frequency/weight values for each flow.
color_by (str) – Column name to use for coloring the flows.
id_col (str) – Column name containing unique identifiers for each flow (alluvium).
orders (Dict[str, List[str]], optional) – Dictionary mapping dimension names to ordered lists of category values.
color_map (Dict[str, str], optional) – Dictionary mapping category values to colors.
title (str, default "") – Main title for the plot.
subtitle (str, default "") – Subtitle for the plot.
figsize (Tuple[int, int], default (9, 6)) – Figure size in inches.
alpha (float, default 0.8) – Transparency level for the flow polygons.
x_label (str, default "Demographic") – Label for the x-axis.
y_label (str, default "Frequency") – Label for the y-axis.
add_stats (bool, default True) – Whether to add statistical annotations (currently not implemented).
- Returns:
Matplotlib figure and axes objects.
- Return type:
Tuple[plt.Figure, plt.Axes]
Dataset Functions
Built-in datasets for ggpubpy examples and testing.
This module provides easy access to commonly used datasets for demonstration and testing purposes.
- ggpubpy.datasets.get_iris_palette() -> Dict[str, str]
Get the default color palette for iris species.
- Returns:
Dictionary mapping species names to hex colors.
- Return type:
dict
Examples
>>> from ggpubpy.datasets import get_iris_palette
>>> palette = get_iris_palette()
>>> print(palette)
{'setosa': '#00AFBB', 'versicolor': '#E7B800', 'virginica': '#FC4E07'}
- ggpubpy.datasets.get_titanic_palette() -> Dict[str, Dict[str, str]]
Get the default color palettes for the Titanic dataset categories.
- Returns:
Dictionary mapping each column name (Survived, Pclass, Sex) to a dictionary of category values and hex colors.
- Return type:
Dict[str, Dict[str, str]]
Examples
>>> from ggpubpy.datasets import get_titanic_palette
>>> palette = get_titanic_palette()
>>> print(palette)
{'Survived': {'0': '#E74C3C', '1': '#2ECC71'}, 'Pclass': {'1': '#F39C12', '2': '#3498DB', '3': '#9B59B6'}, 'Sex': {'male': '#3498DB', 'female': '#E91E63'}}
- ggpubpy.datasets.list_datasets() -> Dict[str, Any]
List all available datasets with descriptions.
- Returns:
Dictionary with dataset names as keys and descriptions as values.
- Return type:
Dict[str, Any]
- ggpubpy.datasets.load_iris() -> DataFrame
Load the famous iris dataset.
The iris dataset contains measurements of sepal and petal dimensions for three species of iris flowers (setosa, versicolor, virginica).
- Returns:
DataFrame with columns: sepal_length, sepal_width, petal_length, petal_width, species.
- Return type:
pd.DataFrame
Examples
>>> from ggpubpy.datasets import load_iris
>>> iris = load_iris()
>>> iris.head()
- ggpubpy.datasets.load_titanic() -> DataFrame
Load the famous Titanic dataset.
The Titanic dataset contains information about passengers aboard the RMS Titanic, including survival status, passenger class, age, gender, and other details.
- Returns:
DataFrame with columns: PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked.
- Return type:
pd.DataFrame
Examples
>>> from ggpubpy.datasets import load_titanic
>>> titanic = load_titanic()
>>> titanic.head()
Helper Functions
Helper functions for ggpubpy plotting modules.
This module contains shared utility functions used across different plotting modules.
- ggpubpy.helper.format_p_value(p: float) -> str
Format p-value for display in statistical annotations.
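The exact output format of `format_p_value` is not stated here, so the sketch below only illustrates a common convention for annotation text: fixed decimals for ordinary values and a "less than" form for very small ones. The function name `format_p`, the 0.001 threshold, and the three-decimal precision are all assumptions made for illustration.

```python
def format_p(p, threshold=0.001, digits=3):
    """Return a display string such as 'p = 0.032' or 'p < 0.001'.

    Hypothetical helper; not the formatting rule used by ggpubpy.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < threshold:
        return f"p < {threshold:.{digits}f}"
    return f"p = {p:.{digits}f}"


texts = [format_p(0.0321), format_p(0.00004), format_p(0.5)]
```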