# Alluvial Plots in ggpubpy

Alluvial plots (also known as flow diagrams) are a type of visualization that shows how data flows between different categorical dimensions. They are particularly useful for showing relationships and transitions between categories.

## Features

- **Flow visualization**: Shows how data moves between categorical dimensions
- **Customizable colors**: Color flows by any categorical variable
- **Flexible ordering**: Control the order of categories in each dimension
- **Publication-ready**: Clean, professional appearance suitable for publications
- **Statistical integration**: Ready for future statistical enhancements
- **Bézier curves**: Smooth, aesthetically pleasing flow connections

## Basic Usage

```python
from ggpubpy import plot_alluvial, load_titanic
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load and prepare data
titanic = load_titanic()
titanic = titanic.dropna(subset=["Age"])
titanic["Class"] = titanic["Pclass"].map({1: "1st", 2: "2nd", 3: "3rd"})
titanic["AgeCat"] = np.where(titanic["Age"] < 18, "Child", "Adult")
titanic["Survived"] = titanic["Survived"].astype(str).replace({"0": "No", "1": "Yes"})

# Create frequency table with alluvium IDs
titanic_tab = (
    titanic.groupby(["Class", "Sex", "AgeCat", "Survived"])
    .size()
    .reset_index(name="Freq")
    .rename(columns={"AgeCat": "Age"})
)
titanic_tab["alluvium"] = titanic_tab.index

# Create alluvial plot (matches examples/alluvial_examples.py)
fig, ax = plot_alluvial(
    titanic_tab,
    dims=["Class", "Sex", "Age"],
    value_col="Freq",
    color_by="Survived",
    id_col="alluvium",
    orders={
        "Class": ["1st", "2nd", "3rd"],
        "Sex": ["male", "female"],
        "Age": ["Child", "Adult"],
    },
    color_map={"No": "#F17C7E", "Yes": "#6CCECB"},
    title="Titanic Survival Analysis",
    subtitle="Class → Sex → Age",
    alpha=0.7,
)
plt.show()
```

![Alluvial Plot Titanic Example](../examples/alluvial_titanic_example.png)

## Functions

### `plot_alluvial()`

Creates a basic alluvial plot with flow diagrams between categorical dimensions.

**Parameters:**

- `df`: DataFrame containing the data
- `dims`: List of column names representing the dimensions (axes) of the flow
- `value_col`: Column name containing the frequency/weight values
- `color_by`: Column name to use for coloring the flows
- `id_col`: Column name containing unique identifiers for each flow
- `orders`: Optional dictionary mapping dimension names to ordered category lists
- `color_map`: Optional dictionary mapping category values to colors
- `title`: Main title for the plot
- `subtitle`: Subtitle for the plot
- `figsize`: Figure size in inches (default: (9, 6))
- `alpha`: Transparency level for flow polygons (default: 0.8)
- `x_label`: Label for x-axis (default: "Demographic")
- `y_label`: Label for y-axis (default: "Frequency")

### `plot_alluvial_with_stats()`

Creates an alluvial plot with optional statistical annotations. Currently identical to `plot_alluvial()` but provides a consistent interface for future statistical enhancements.
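
The `orders` and `color_map` parameters listed above are plain dictionaries, so they can be derived from the frequency table instead of being typed by hand. The sketch below is one possible way to do that with pandas alone. The helpers `build_orders` and `build_color_map` are hypothetical names introduced for illustration and are not part of ggpubpy, and the small table is made-up data.

```python
import pandas as pd


# Hypothetical helper for illustration; not part of ggpubpy.
def build_orders(df, dims):
    """Map each dimension to its categories in an explicit order."""
    orders = {}
    for dim in dims:
        col = df[dim]
        if isinstance(col.dtype, pd.CategoricalDtype):
            # Respect the category order already declared on the column
            orders[dim] = list(col.cat.categories)
        else:
            # Fall back to the sorted unique values
            orders[dim] = sorted(col.dropna().unique())
    return orders


# Hypothetical helper for illustration; not part of ggpubpy.
def build_color_map(df, color_by, palette):
    """Assign one palette color to each distinct value of `color_by`."""
    values = sorted(df[color_by].dropna().unique())
    if len(values) > len(palette):
        raise ValueError("palette has fewer colors than categories")
    return dict(zip(values, palette))


# Made-up frequency table, shaped like the Titanic example
table = pd.DataFrame({
    "Class": pd.Categorical(
        ["3rd", "1st", "2nd", "1st"],
        categories=["1st", "2nd", "3rd"],
        ordered=True,
    ),
    "Sex": ["male", "female", "female", "male"],
    "Survived": ["No", "Yes", "Yes", "No"],
    "Freq": [10, 5, 7, 3],
})

orders = build_orders(table, ["Class", "Sex"])
color_map = build_color_map(table, "Survived", ["#F17C7E", "#6CCECB"])
```

The resulting dictionaries have the same shape as the literal `orders` and `color_map` arguments used in the Basic Usage example. Reusing one `color_map` across several plots keeps the category colors consistent.
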
## Examples

### Iris Dataset Example

```python
from ggpubpy import plot_alluvial, load_iris
import pandas as pd
import matplotlib.pyplot as plt

# Load Iris data
iris = load_iris()

# Create categorical variables from continuous ones
iris["SepalLenCat"] = pd.cut(iris["sepal_length"], bins=3, labels=["Short", "Medium", "Long"])
iris["PetalLenCat"] = pd.cut(iris["petal_length"], bins=3, labels=["Short", "Medium", "Long"])

# Create frequency table with alluvium IDs
iris_tab = (
    iris.groupby(["SepalLenCat", "PetalLenCat", "species"], observed=True)
    .size()
    .reset_index(name="Freq")
)
iris_tab["alluvium"] = iris_tab.index

# Create alluvial plot
fig, ax = plot_alluvial(
    iris_tab,
    dims=["SepalLenCat", "PetalLenCat"],
    value_col="Freq",
    color_by="species",
    id_col="alluvium",
    orders={
        "SepalLenCat": ["Short", "Medium", "Long"],
        "PetalLenCat": ["Short", "Medium", "Long"],
    },
    title="Iris Dataset Analysis",
    subtitle="Sepal length → Petal length",
    alpha=0.7,
)
plt.show()
```

![Alluvial Plot Iris Example](../examples/alluvial_iris_example.png)

## Complete Examples

Note: The figures on this page are generated by running `examples/alluvial_examples.py` using identical parameters.

See `examples/alluvial_examples.py` for complete examples including:

- Titanic survival analysis
- Iris dataset analysis
- Custom employee performance data

## Data Requirements

Your data should be in a "long" format with:

1. **Dimensions**: Categorical columns representing the flow axes
2. **Values**: A numeric column representing the frequency/weight of each flow
3. **Colors**: A categorical column for coloring the flows
4. **IDs**: A unique identifier for each flow (alluvium)

## Tips

1. **Data preparation**: Create frequency tables using `groupby().size()` or `groupby().sum()`
2. **Alluvium IDs**: Add a unique identifier column (e.g., `df["alluvium"] = df.index`)
3. **Category ordering**: Use the `orders` parameter to control the sequence of categories
4. **Color schemes**: Provide custom `color_map` for consistent coloring across plots
5. **Transparency**: Adjust `alpha` to control flow visibility and overlapping

## Integration

The alluvial plot functions are fully integrated into the ggpubpy package and can be imported alongside other plotting functions:

```python
from ggpubpy import plot_alluvial, plot_boxplot, plot_violin
```
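
The data preparation steps from the Data Requirements and Tips sections (aggregate with `groupby().size()` or `groupby().sum()`, then add a unique alluvium ID) can be wrapped in a small function. This is a minimal sketch using only pandas. The helper `make_alluvial_table` is a hypothetical name that is not provided by ggpubpy, and the employee records are made-up data.

```python
import pandas as pd


# Hypothetical helper for illustration; not part of ggpubpy.
def make_alluvial_table(df, dims, color_by, value_col="Freq", id_col="alluvium"):
    """Aggregate rows into one row per flow and attach a unique ID."""
    keys = list(dims) + ([color_by] if color_by not in dims else [])
    missing = [c for c in keys if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")

    if value_col in df.columns:
        # Pre-counted input: add up the existing weights per flow
        table = df.groupby(keys, observed=True)[value_col].sum().reset_index()
    else:
        # Raw observations: count the rows per flow
        table = df.groupby(keys, observed=True).size().reset_index(name=value_col)

    # One unique identifier per flow (alluvium)
    table[id_col] = range(len(table))
    return table


# Made-up raw records, one row per employee
raw = pd.DataFrame({
    "Dept": ["Eng", "Eng", "Ops", "Ops", "Eng"],
    "Level": ["Junior", "Junior", "Senior", "Junior", "Senior"],
    "Rating": ["High", "High", "Low", "High", "Low"],
})

table = make_alluvial_table(raw, dims=["Dept", "Level"], color_by="Rating")
```

The returned table has the same layout as `titanic_tab` and `iris_tab` in the examples on this page, so it could be passed to `plot_alluvial()` with `value_col="Freq"` and `id_col="alluvium"`.
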