API Reference

This section provides detailed documentation for all ggpubpy functions and classes.

Plotting Functions

ggpubpy: Matplotlib-Based Publication-Ready Plots

A Python library that provides easy-to-use functions for creating and customizing matplotlib-based, publication-ready plots with built-in statistical tests and automatic p-value or significance star annotations.

This project is directly inspired by R’s ggpubr package.

ggpubpy.get_iris_palette() → Dict[str, str][source]

Get the default color palette for iris species.

Returns: Dictionary mapping species names to hex colors.
Return type: dict

Examples

>>> from ggpubpy.datasets import get_iris_palette
>>> palette = get_iris_palette()
>>> print(palette)
{'setosa': '#00AFBB', 'versicolor': '#E7B800', 'virginica': '#FC4E07'}

ggpubpy.get_titanic_palette() → Dict[str, Dict[str, str]][source]

Get the default color palette for Titanic dataset categories.

Returns: Nested dictionary mapping each category column (Survived, Pclass, Sex) to a dictionary of value-to-hex-color pairs.
Return type: dict

Examples

>>> from ggpubpy.datasets import get_titanic_palette
>>> palette = get_titanic_palette()
>>> print(palette)
{'Survived': {'0': '#E74C3C', '1': '#2ECC71'}, 'Pclass': {'1': '#F39C12', '2': '#3498DB', '3': '#9B59B6'}, 'Sex': {'male': '#3498DB', 'female': '#E91E63'}}
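
The palette above is nested and keyed by strings, so integer-coded column values need converting before lookup. The following is a minimal sketch of that lookup, not part of ggpubpy: the helper `colors_for` and the gray fallback color are hypothetical, and the palette literal is copied from the printed output above.

```python
# Palette literal copied from the documented get_titanic_palette() output.
titanic_palette = {
    "Survived": {"0": "#E74C3C", "1": "#2ECC71"},
    "Pclass": {"1": "#F39C12", "2": "#3498DB", "3": "#9B59B6"},
    "Sex": {"male": "#3498DB", "female": "#E91E63"},
}


def colors_for(values, category, palette, fallback="#808080"):
    """Map raw column values to hex colors (hypothetical helper).

    Palette keys are strings, so each value is converted with str() first.
    Unknown values get the fallback color, which is an assumption of this sketch.
    """
    mapping = palette[category]
    return [mapping.get(str(value), fallback) for value in values]


survived_colors = colors_for([0, 1, 1, 0], "Survived", titanic_palette)
sex_colors = colors_for(["male", "female"], "Sex", titanic_palette)
```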

ggpubpy.list_datasets() → Dict[str, Any][source]

List all available datasets with descriptions.

Returns: Dictionary with dataset names as keys and descriptions as values.
Return type: dict

ggpubpy.load_iris() → DataFrame[source]

Load the famous iris dataset.

The iris dataset contains measurements of sepal and petal dimensions for three species of iris flowers (setosa, versicolor, virginica).

Returns: DataFrame with columns: sepal_length, sepal_width, petal_length, petal_width, species.
Return type: pd.DataFrame

Examples

>>> from ggpubpy.datasets import load_iris
>>> iris = load_iris()
>>> iris.head()

ggpubpy.load_titanic() → DataFrame[source]

Load the famous Titanic dataset.

The Titanic dataset contains information about passengers aboard the RMS Titanic, including survival status, passenger class, age, gender, and other details.

Returns: DataFrame with columns: PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked.
Return type: pd.DataFrame

Examples

>>> from ggpubpy.datasets import load_titanic
>>> titanic = load_titanic()
>>> titanic.head()

ggpubpy.plot_alluvial(*args: Any, **kwargs: Any) → Any[source]: Create an alluvial (flow) diagram with explicit alluvium IDs.

ggpubpy.plot_alluvial_with_stats(*args: Any, **kwargs: Any) → Any[source]: Create an alluvial plot with optional statistical annotations.

ggpubpy.plot_blandaltman(x, y, agreement: float = 1.96, xaxis: str = 'mean', confidence: float = 0.95, annotate: bool = True, ax=None, **kwargs)[source]

Create a Bland–Altman agreement plot between two measurements.

Returns the Matplotlib Axes for further customization.
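
To make the agreement parameter concrete, here is a minimal sketch of the quantities a Bland-Altman plot conventionally displays: the bias (mean difference) and the limits of agreement at bias ± agreement × SD. It assumes differences are taken as x - y and that the sample standard deviation is used; the helper name `bland_altman_limits` is hypothetical and this is not ggpubpy's implementation.

```python
from statistics import mean, stdev


def bland_altman_limits(x, y, agreement=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [a - b for a, b in zip(x, y)]  # assumed sign convention: x - y
    bias = mean(diffs)
    spread = agreement * stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - spread, bias + spread


# Made-up paired readings from two measurement methods.
method_a = [10.0, 12.0, 14.0, 16.0]
method_b = [9.0, 12.0, 13.0, 16.0]
bias, lower, upper = bland_altman_limits(method_a, method_b)
```

With xaxis='mean', each point would sit at the mean of its pair on the x-axis and at its difference on the y-axis.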

ggpubpy.plot_boxplot(*args: Any, **kwargs: Any) → Any[source]: Create box plot with statistical annotations.

ggpubpy.plot_boxplot_with_stats(*args: Any, **kwargs: Any) → Any[source]: Create box plot with statistical annotations.

ggpubpy.plot_correlation_matrix(*args: Any, **kwargs: Any) → Any[source]: Create a correlation matrix plot with scatter plots and correlation values.

ggpubpy.plot_shift(x: Any, y: Any, *args: Any, **kwargs: Any) → Any[source]: Shift plot comparing two distributions.

ggpubpy.plot_violin(*args: Any, **kwargs: Any) → Any[source]: Create violin plot with statistical annotations.

ggpubpy.plot_violin_with_stats(*args: Any, **kwargs: Any) → Any[source]: Create violin plot with statistical annotations.

ggpubpy.qqplot(x, dist: str | object = 'norm', sparams=(), confidence: float | bool = 0.95, square: bool = True, ax=None, **kwargs)[source]

Create a Q–Q plot with optional confidence envelope and regression line.

Parameters are consistent with scipy-based Q–Q plots. Returns the Matplotlib Axes.
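
As an illustration of what a Q-Q plot draws, the sketch below pairs sorted sample values with standard normal quantiles. The plotting positions (i - 0.5) / n are an assumption of this sketch; ggpubpy may use a different convention. The helper name `normal_qq_points` is hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical quantile, ordered value) pairs for a normal Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


points = normal_qq_points([2.1, -0.3, 0.8, 1.5, -1.2])
```

If the sample is roughly normal, these points fall close to a straight line, which is what the regression line and confidence envelope help to judge.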

ggpubpy.significance_stars(*args: Any, **kwargs: Any) → Any[source]: Convert p-values to significance stars.

Box Plot Functions

Boxplot functionality for ggpubpy.

This module contains the boxplot function with statistical annotations.

ggpubpy.boxplot.plot_boxplot_with_stats(df: DataFrame, x: str, y: str, *, x_label: str | None = None, y_label: str | None = None, title: str | None = None, subtitle: str | None = None, order: List | None = None, palette: Dict | None = None, figsize: Tuple[int, int] = (6, 6), add_jitter: bool = True, jitter_std: float = 0.04, alpha: float | None = None, box_width: float = 0.6, global_test: bool = True, pairwise_test: bool = True, parametric: bool = False) → Tuple[Figure, Axes][source]

Draw a colored boxplot with jittered points and statistical annotations.

Parameters:

df (pd.DataFrame) – Your data.
x (str) – Column name for categories (must be categorical).
y (str) – Column name for numeric values.
x_label (str, optional) – Label for the x-axis. Defaults to the x column name.
y_label (str, optional) – Label for the y-axis. Defaults to the y column name.
title (str, optional) – Overall plot title.
subtitle (str, optional) – Optional plot subtitle.
order (list, optional) – Order of x categories. Defaults to sorted unique values.
palette (dict, optional) – Mapping from category -> color.
figsize (tuple) – Figure size.
add_jitter (bool) – Whether to add jittered points.
jitter_std (float) – Standard deviation for horizontal jitter.
alpha (float, optional) – Transparency for jittered points (0-1). Defaults to 0.7.
box_width (float) – Width of each box in the plot.
global_test (bool) – Whether to perform and display global statistical test.
pairwise_test (bool) – Whether to perform and display pairwise comparisons.
parametric (bool) – If True, use parametric tests (ANOVA + t-test). If False, use non-parametric tests (Kruskal-Wallis + Mann-Whitney U).

Returns:

(figure, axes) matplotlib objects.

Return type:

tuple
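
The pairwise_test option compares groups two at a time. As a rough sketch of the non-parametric case, the code below enumerates every pair of groups and computes the Mann-Whitney U statistic for each. It assumes all pairs are compared, omits the p-value calculation, and uses made-up data; the helper `mann_whitney_u` is hypothetical and not ggpubpy's code.

```python
from itertools import combinations


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a_i > b_j count 1, ties count 0.5."""
    return sum(
        1.0 if ai > bj else 0.5 if ai == bj else 0.0
        for ai in a
        for bj in b
    )


# Made-up measurements for three categories.
groups = {
    "low": [1.4, 1.3, 1.5],
    "mid": [4.7, 4.5, 4.9],
    "high": [6.0, 5.1, 5.9],
}

# One comparison per unordered pair of categories.
pairwise_u = {
    (g1, g2): mann_whitney_u(groups[g1], groups[g2])
    for g1, g2 in combinations(groups, 2)
}
```

A U of 0 (or of len(a) × len(b)) means the two samples do not overlap at all, which is the strongest evidence of a shift that the statistic can give.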

Violin Plot Functions

Violin plot functionality for ggpubpy.

This module contains the violin plot function with statistical annotations.

ggpubpy.violinplot.plot_violin_with_stats(df: DataFrame, x: str, y: str, *, x_label: str | None = None, y_label: str | None = None, title: str | None = None, subtitle: str | None = None, order: List | None = None, palette: Dict | None = None, figsize: Tuple[int, int] = (6, 6), figsize_scale: float = 1.0, add_jitter: bool = True, jitter_std: float = 0.04, alpha: float | None = None, violin_width: float = 0.6, box_width: float = 0.15, global_test: bool = True, pairwise_test: bool = True, parametric: bool = False) → Tuple[Figure, Axes][source]

Draw a violin + boxplot + jitter + stats.

Parameters:

df (pd.DataFrame) – Your data.
x (str) – Categorical column name.
y (str) – Numeric column name.
x_label (str, optional) – Custom label for the x-axis.
y_label (str, optional) – Custom label for the y-axis.
title (str, optional) – Overall plot title.
subtitle (str, optional) – Optional plot subtitle.
order (list, optional) – Order of x categories. Defaults to sorted unique values.
palette (dict, optional) – Mapping from category -> color.
figsize (tuple) – Figure size.
figsize_scale (float) – Scale factor for figure size.
add_jitter (bool) – Whether to add jittered points.
jitter_std (float) – Standard deviation for horizontal jitter.
alpha (float, optional) – Transparency for jittered points (0-1). Defaults to 0.6.
violin_width (float) – Width of violin plots.
box_width (float) – Width of boxplots inside violins.
global_test (bool) – Whether to perform global statistical test.
pairwise_test (bool) – Whether to perform pairwise statistical tests.
parametric (bool) – If True, use parametric tests (ANOVA + t-test). If False, use non-parametric tests (Kruskal-Wallis + Mann-Whitney U).

Returns:

(figure, axes) matplotlib objects.

Return type:

tuple

Shift Plot Functions

Shift plot functionality for ggpubpy.

This module contains the shift plot function for comparing distributions.

ggpubpy.shiftplot.plot_shift(x: ndarray, y: ndarray, *, paired: bool = False, n_boot: int = 1000, percentiles: ndarray = array([10, 20, 30, 40, 50, 60, 70, 80, 90]), confidence: float = 0.95, seed: int | None = None, show_median: bool = True, violin: bool = True, show_quantiles: bool = False, show_quantile_diff: bool = False, parametric: bool = False, x_name: str = 'X', y_name: str = 'Y', x_label: str | None = None, y_label: str | None = None, title: str | None = None, subtitle: str | None = None, color: str | None = None, line_color: str | None = None, alpha: float | None = None, figsize: Tuple[float, float] | None = None) → Figure[source]

Shift plot.

Parameters:

x (array_like) – First set of observations.
y (array_like) – Second set of observations.
paired (bool) – If True, x and y are paired samples.
n_boot (int) – Number of bootstrap iterations.
percentiles (array_like) – Sequence of percentiles (0-100) to compute.
confidence (float) – Confidence level for intervals.
seed (int or None) – Random seed.
show_median (bool) – If True, show median lines.
violin (bool) – If True, plot half-violin densities.
show_quantiles (bool) – If True, show quantile connection lines between distributions.
show_quantile_diff (bool) – If True, show bottom subplot with quantile differences.
parametric (bool) – If True, use t-test; else Mann-Whitney U test.

Returns:

The figure containing the shift plot.

Return type:

matplotlib Figure instance
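
The idea behind the percentiles, n_boot, and confidence parameters can be sketched as decile differences with percentile-bootstrap intervals. This sketch makes several assumptions: it treats the samples as unpaired, uses `statistics.quantiles` rather than the Harrell-Davis estimator, and takes differences as y - x. The helper `decile_shift` is hypothetical and is not the library's implementation.

```python
import random
from statistics import quantiles


def decile_diffs(x, y):
    """Differences (y - x) between the nine deciles of two samples."""
    return [qy - qx for qx, qy in zip(quantiles(x, n=10), quantiles(y, n=10))]


def decile_shift(x, y, n_boot=200, confidence=0.95, seed=0):
    """Observed decile differences plus percentile-bootstrap intervals."""
    rng = random.Random(seed)
    observed = decile_diffs(x, y)
    boot = []
    for _ in range(n_boot):
        # Resample each group independently (unpaired design).
        bx = rng.choices(x, k=len(x))
        by = rng.choices(y, k=len(y))
        boot.append(decile_diffs(bx, by))
    tail = (1.0 - confidence) / 2.0
    lo_idx = int(tail * (n_boot - 1))
    hi_idx = int((1.0 - tail) * (n_boot - 1))
    intervals = []
    for k in range(9):
        column = sorted(row[k] for row in boot)
        intervals.append((column[lo_idx], column[hi_idx]))
    return observed, intervals


# y is x shifted by a constant, so every decile difference should be about 2.
x = [float(v) for v in range(1, 21)]
y = [v + 2.0 for v in x]
observed, intervals = decile_shift(x, y)
```

A uniform shift gives a flat line of differences; differences that change across deciles indicate that the distributions differ in shape, not just location.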

Correlation Matrix Functions

Correlation matrix functionality for ggpubpy.

This module contains the correlation matrix plot function.

ggpubpy.correlation_matrix.plot_correlation_matrix(df: DataFrame, columns: List[str] | None = None, *, figsize: Tuple[int, int] = (10, 10), color: str = '#2E86AB', alpha: float = 0.6, point_size: float = 20, show_stats: bool = True, method: str = 'pearson', title: str | None = None, subtitle: str | None = None) → Tuple[Figure, ndarray][source]

Create a correlation matrix plot with scatter plots in lower triangle and correlation values in upper triangle and diagonal.

Parameters:

df (pd.DataFrame) – Input dataframe with numeric columns.
columns (list of str, optional) – Specific columns to include. If None, all numeric columns are used.
figsize (tuple) – Figure size as (width, height).
color (str) – Color for scatter points.
alpha (float) – Transparency of scatter points (0-1).
point_size (float) – Size of scatter points.
show_stats (bool) – Whether to show statistical significance stars.
method (str) – Correlation method: ‘pearson’, ‘spearman’, or ‘kendall’.
title (str, optional) – Overall plot title.
subtitle (str, optional) – Optional plot subtitle.

Returns:

(figure, axes_array) matplotlib objects.

Return type:

tuple
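
For orientation, the values shown in the upper triangle with method='pearson' are ordinary pairwise correlation coefficients. The following standard-library sketch computes such a matrix for made-up columns; the helper `pearson_r` is hypothetical and the significance stars are not computed here.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


# Made-up numeric columns standing in for DataFrame columns.
columns = {
    "length": [1.0, 2.0, 3.0, 4.0],
    "width": [2.0, 4.0, 6.0, 8.0],
    "height": [4.0, 3.0, 2.0, 1.0],
}
names = list(columns)

# Full square matrix keyed by (row name, column name).
matrix = {
    (row, col): pearson_r(columns[row], columns[col])
    for row in names
    for col in names
}
```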

Alluvial Plot Functions

Alluvial plot functionality for ggpubpy.

This module contains the alluvial plot function for creating flow diagrams similar to ggalluvial in R.

ggpubpy.alluvialplot.plot_alluvial(df: DataFrame, dims: List[str], value_col: str, color_by: str, id_col: str, *, orders: Dict[str, List[str]] | None = None, color_map: Dict[str, str] | None = None, title: str = '', subtitle: str = '', figsize: Tuple[int, int] = (9, 6), alpha: float = 0.8, x_label: str = 'Demographic', y_label: str = 'Frequency') → Tuple[Figure, Axes][source]

Create an alluvial (flow) diagram with explicit alluvium IDs.

This function creates a flow diagram similar to ggalluvial in R, where each unique value of id_col represents one flow (alluvium) between categorical dimensions.

Parameters:

df (pd.DataFrame) – Input data containing the dimensions, values, and identifiers.
dims (List[str]) – List of column names representing the dimensions (axes) of the flow.
value_col (str) – Column name containing the frequency/weight values for each flow.
color_by (str) – Column name to use for coloring the flows.
id_col (str) – Column name containing unique identifiers for each flow (alluvium).
orders (Dict[str, List[str]], optional) – Dictionary mapping dimension names to ordered lists of category values. If not provided, categories will be ordered by their appearance in data.
color_map (Dict[str, str], optional) – Dictionary mapping category values to colors. If not provided, a default palette will be used.
title (str, default "") – Main title for the plot.
subtitle (str, default "") – Subtitle for the plot.
figsize (Tuple[int, int], default (9, 6)) – Figure size in inches.
alpha (float, default 0.8) – Transparency level for the flow polygons.
x_label (str, default "Demographic") – Label for the x-axis.
y_label (str, default "Frequency") – Label for the y-axis.

Returns:

Matplotlib figure and axes objects.

Return type:

Tuple[plt.Figure, plt.Axes]

Examples

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> from ggpubpy import load_titanic, plot_alluvial
>>>
>>> # Load and prepare Titanic data
>>> titanic = load_titanic()
>>> titanic = titanic.dropna(subset=["Age"])
>>> titanic["Class"] = titanic["Pclass"].map({1: "1st", 2: "2nd", 3: "3rd"})
>>> titanic["AgeCat"] = np.where(titanic["Age"] < 18, "Child", "Adult")
>>> titanic["Survived"] = titanic["Survived"].astype(str).replace({"0": "No", "1": "Yes"})
>>>
>>> # Create frequency table with alluvium IDs
>>> titanic_tab = (titanic.groupby(["Class", "Sex", "AgeCat", "Survived"])
...                    .size()
...                    .reset_index(name="Freq")
...                    .rename(columns={"AgeCat": "Age"}))
>>> titanic_tab["alluvium"] = titanic_tab.index
>>>
>>> # Create alluvial plot
>>> fig, ax = plot_alluvial(
...     titanic_tab,
...     dims=["Class", "Sex", "Age"],
...     value_col="Freq",
...     color_by="Survived",
...     id_col="alluvium",
...     orders={"Class": ["1st", "2nd", "3rd"],
...             "Sex": ["male", "female"],
...             "Age": ["Child", "Adult"]},
...     color_map={"No": "#F17C7E", "Yes": "#6CCECB"},
...     title="Titanic Survival Analysis",
...     subtitle="Class → Sex → Age",
...     alpha=0.7
... )
>>> plt.show()

ggpubpy.alluvialplot.plot_alluvial_with_stats(df: DataFrame, dims: List[str], value_col: str, color_by: str, id_col: str, *, orders: Dict[str, List[str]] | None = None, color_map: Dict[str, str] | None = None, title: str = '', subtitle: str = '', figsize: Tuple[int, int] = (9, 6), alpha: float = 0.8, x_label: str = 'Demographic', y_label: str = 'Frequency', add_stats: bool = True) → Tuple[Figure, Axes][source]

Create an alluvial plot with optional statistical annotations.

This is a wrapper around plot_alluvial that can add statistical information to the plot. Currently, this function is identical to plot_alluvial, but provides a consistent interface for future statistical enhancements.

Parameters:

df (pd.DataFrame) – Input data containing the dimensions, values, and identifiers.
dims (List[str]) – List of column names representing the dimensions (axes) of the flow.
value_col (str) – Column name containing the frequency/weight values for each flow.
color_by (str) – Column name to use for coloring the flows.
id_col (str) – Column name containing unique identifiers for each flow (alluvium).
orders (Dict[str, List[str]], optional) – Dictionary mapping dimension names to ordered lists of category values.
color_map (Dict[str, str], optional) – Dictionary mapping category values to colors.
title (str, default "") – Main title for the plot.
subtitle (str, default "") – Subtitle for the plot.
figsize (Tuple[int, int], default (9, 6)) – Figure size in inches.
alpha (float, default 0.8) – Transparency level for the flow polygons.
x_label (str, default "Demographic") – Label for the x-axis.
y_label (str, default "Frequency") – Label for the y-axis.
add_stats (bool, default True) – Whether to add statistical annotations (currently not implemented).

Returns:

Matplotlib figure and axes objects.

Return type:

Tuple[plt.Figure, plt.Axes]

Dataset Functions

Built-in datasets for ggpubpy examples and testing.

This module provides easy access to commonly used datasets for demonstration and testing purposes.

ggpubpy.datasets.get_iris_palette() → Dict[str, str][source]

Get the default color palette for iris species.

Returns: Dictionary mapping species names to hex colors.
Return type: dict

Examples

>>> from ggpubpy.datasets import get_iris_palette
>>> palette = get_iris_palette()
>>> print(palette)
{'setosa': '#00AFBB', 'versicolor': '#E7B800', 'virginica': '#FC4E07'}

ggpubpy.datasets.get_titanic_palette() → Dict[str, Dict[str, str]][source]

Get the default color palette for Titanic dataset categories.

Returns: Nested dictionary mapping each category column (Survived, Pclass, Sex) to a dictionary of value-to-hex-color pairs.
Return type: dict

Examples

>>> from ggpubpy.datasets import get_titanic_palette
>>> palette = get_titanic_palette()
>>> print(palette)
{'Survived': {'0': '#E74C3C', '1': '#2ECC71'}, 'Pclass': {'1': '#F39C12', '2': '#3498DB', '3': '#9B59B6'}, 'Sex': {'male': '#3498DB', 'female': '#E91E63'}}

ggpubpy.datasets.list_datasets() → Dict[str, Any][source]

List all available datasets with descriptions.

Returns: Dictionary with dataset names as keys and descriptions as values.
Return type: dict

ggpubpy.datasets.load_iris() → DataFrame[source]

Load the famous iris dataset.

The iris dataset contains measurements of sepal and petal dimensions for three species of iris flowers (setosa, versicolor, virginica).

Returns: DataFrame with columns: sepal_length, sepal_width, petal_length, petal_width, species.
Return type: pd.DataFrame

Examples

>>> from ggpubpy.datasets import load_iris
>>> iris = load_iris()
>>> iris.head()

ggpubpy.datasets.load_titanic() → DataFrame[source]

Load the famous Titanic dataset.

The Titanic dataset contains information about passengers aboard the RMS Titanic, including survival status, passenger class, age, gender, and other details.

Returns: DataFrame with columns: PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked.
Return type: pd.DataFrame

Examples

>>> from ggpubpy.datasets import load_titanic
>>> titanic = load_titanic()
>>> titanic.head()

Helper Functions

Helper functions for ggpubpy plotting modules.

This module contains shared utility functions used across different plotting modules.

ggpubpy.helper.format_p_value(p: float) → str[source]

Format p-value for display in statistical annotations.

Parameters: p (float) – The p-value to format.
Returns: Formatted p-value string. Shows “<0.001” for very small values.
Return type: str
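
A minimal sketch of this kind of formatting follows. Only the “<0.001” floor comes from the description above; the three-decimal format for larger values and the helper name `format_p_sketch` are assumptions, so the library's actual output may differ.

```python
def format_p_sketch(p):
    """Format a p-value for annotation text (hypothetical stand-in)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    if p < 0.001:
        return "<0.001"  # floor taken from the documented behavior
    return f"{p:.3f}"  # assumed: three decimal places otherwise


tiny = format_p_sketch(0.00004)
moderate = format_p_sketch(0.0312)
```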

ggpubpy.helper.harrelldavis(x: ndarray, quantile: float | List[float] | ndarray = 0.5, axis: int = -1) → ndarray[source]: Harrell-Davis robust quantile estimator.
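
For background, the Harrell-Davis estimator is a weighted average of all order statistics, where the weight of the i-th sorted value is the Beta((n + 1)q, (n + 1)(1 - q)) probability mass on the interval [(i - 1)/n, i/n]. The sketch below approximates those masses with a midpoint rule so that it needs only the standard library. The helper `hd_quantile` is hypothetical, handles a single quantile of a one-dimensional sample, and is not the library's implementation.

```python
def hd_quantile(sample, q=0.5, steps=2000):
    """Harrell-Davis quantile estimate using midpoint-rule Beta weights."""
    ordered = sorted(sample)
    n = len(ordered)
    a = (n + 1) * q
    b = (n + 1) * (1.0 - q)

    def mass(lo, hi):
        # Unnormalized Beta(a, b) mass on [lo, hi] by the midpoint rule.
        width = (hi - lo) / steps
        total = 0.0
        for k in range(steps):
            t = lo + (k + 0.5) * width
            total += t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)
        return total * width

    masses = [mass(i / n, (i + 1) / n) for i in range(n)]
    normalizer = sum(masses)  # makes the weights sum to 1
    return sum(m / normalizer * value for m, value in zip(masses, ordered))


data = [1.0, 2.0, 3.0, 4.0, 5.0]
estimate = hd_quantile(data)  # median of a symmetric sample
```

Because every observation contributes to the estimate, it tends to vary more smoothly across bootstrap resamples than a quantile read off one or two order statistics.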

ggpubpy.helper.significance_stars(p: float) → str[source]

Convert a p-value into star notation.

Parameters: p (float) – The p-value to convert.
Returns: Star notation: “****” for p <= 1e-4, “***” for p <= 1e-3, “**” for p <= 0.01, “*” for p <= 0.05, “ns” for p > 0.05.
Return type: str
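
The threshold mapping can be sketched as a simple cascade. This assumes the conventional four-level scheme (four stars down to one, then “ns”) at the cutoffs 1e-4, 1e-3, 0.01, and 0.05; the function name `stars_sketch` is hypothetical and stands in for the library function.

```python
def stars_sketch(p):
    """Convert a p-value to star notation (hypothetical stand-in)."""
    if p <= 1e-4:
        return "****"
    if p <= 1e-3:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"  # not significant at the 0.05 level


labels = [stars_sketch(p) for p in (0.00005, 0.0005, 0.005, 0.03, 0.2)]
```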