4 Reshaping and Tidy Data

There’s a whole paper by Hadley Wickham about tidy data but it’s in a PDF so you’re probably not going to read it. Here’s the paper’s definition of tidy data:

Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.

As Dr Wickham notes, this is Codd’s 3rd Normal Form but in statspeak rather than databasespeak.

The meaning of “variable” and “observation” depend on what we’re studying, so tidy data is a concept that you mostly learn through experience and ✨vibes✨.

Now we’ll explore what tidy data looks like for an NBA results dataset.

4.1 Get the data

from pathlib import Path
import polars as pl
import pandas as pd

pl.Config.set_tbl_rows(5)
pd.options.display.max_rows = 5

nba_dir = Path("../data/nba/")

column_names = {
    "Date": "date",
    "Visitor/Neutral": "away_team",
    "PTS": "away_points",
    "Home/Neutral": "home_team",
    "PTS.1": "home_points",
}

if not nba_dir.exists():
    nba_dir.mkdir()
    for month in (
        "october",
        "november",
        "december",
        "january",
        "february",
        "march",
        "april",
        "may",
        "june",
    ):
        # In practice we would do more data cleaning here, and save to parquet not CSV.
        # But we save messy data here so we can clean it later for pedagogical purposes.
        url = f"http://www.basketball-reference.com/leagues/NBA_2016_games-{month}.html"
        tables = pd.read_html(url)
        raw = (
            pl.from_pandas(tables[0].query("Date != 'Playoffs'"))
            .rename(column_names)
            .select(column_names.values())
        )
        raw.write_csv(nba_dir / f"{month}.csv")

nba_glob = nba_dir / "*.csv"
pl.scan_csv(nba_glob).head().collect()

shape: (5, 5)

date	away_team	away_points	home_team	home_points
str	str	i64	str	i64
"Fri, Apr 1, 2016"	"Philadelphia 76ers"	91	"Charlotte Hornets"	100
"Fri, Apr 1, 2016"	"Dallas Mavericks"	98	"Detroit Pistons"	89
"Fri, Apr 1, 2016"	"Brooklyn Nets"	91	"New York Knicks"	105
"Fri, Apr 1, 2016"	"Cleveland Cavaliers"	110	"Atlanta Hawks"	108
"Fri, Apr 1, 2016"	"Toronto Raptors"	99	"Memphis Grizzlies"	95

4.2 Cleaning 🧹

Nothing super interesting here:

Polars
Pandas

games_pl = (
    pl.scan_csv(nba_glob)
    .with_columns(
        pl.col("date").str.strptime(pl.Date, "%a, %b %d, %Y"),
    )
    .sort("date")
    .with_row_index("game_id")
)
games_pl.head().collect()

shape: (5, 6)

game_id	date	away_team	away_points	home_team	home_points
u32	date	str	i64	str	i64
0	2015-10-27	"Cleveland Cavaliers"	95	"Chicago Bulls"	97
1	2015-10-27	"Detroit Pistons"	106	"Atlanta Hawks"	94
2	2015-10-27	"New Orleans Pelicans"	95	"Golden State Warriors"	111
3	2015-10-28	"Washington Wizards"	88	"Orlando Magic"	87
4	2015-10-28	"Philadelphia 76ers"	95	"Boston Celtics"	112

games_pd = (
    pl.read_csv(nba_glob)
    .to_pandas()
    .dropna(how="all")
    .assign(date=lambda x: pd.to_datetime(x["date"], format="%a, %b %d, %Y"))
    .sort_values("date")
    .reset_index(drop=True)
    .set_index("date", append=True)
    .rename_axis(["game_id", "date"])
    .sort_index()
)
games_pd.head()

		away_team	away_points	home_team	home_points
game_id	date
0	2015-10-27	Cleveland Cavaliers	95	Chicago Bulls	97
1	2015-10-27	Detroit Pistons	106	Atlanta Hawks	94
2	2015-10-27	New Orleans Pelicans	95	Golden State Warriors	111
3	2015-10-28	Philadelphia 76ers	95	Boston Celtics	112
4	2015-10-28	Washington Wizards	88	Orlando Magic	87

Polars does have a drop_nulls method but the only parameter it takes is subset, which — like in Pandas — lets you consider null values just for a subset of the columns. Pandas additionally lets you specify how="all" to drop a row only if every value is null, but Polars drop_nulls has no such parameter and will drop the row if any values are null. If you only want to drop when all values are null, the docs recommend .filter(~pl.all(pl.all().is_null())).

Note

A previous version of the Polars example used pl.fold, which is for fast horizontal operations. It doesn’t come up anywhere else in this book, so consider this your warning that it exists.

4.3 Pivot and Melt

I recently came across someone who was doing advanced quantitative research in Python but had never heard of the Pandas .pivot method. I shudder to imagine the code he must have written in the absence of this knowledge, so here’s a simple explanation of pivoting and melting, lest anyone else suffer in ignorance. If you already know what pivot and melt are, feel free to scroll past this bit.

4.3.1 Pivot

Suppose you have a dataframe that looks like this:

Code

from datetime import date
prices = pl.DataFrame({
    "date": [*[date(2020, 1, 1)]*4, *[date(2020, 1, 2)]*4, *[date(2020, 1, 3)]*4],
    "ticker": [*["AAPL", "TSLA", "MSFT", "NFLX"]*3],
    "price": [100, 200, 300, 400, 110, 220, 330, 420, 105, 210, 315, 440],
})
prices

shape: (12, 3)

date	ticker	price
date	str	i64
2020-01-01	"AAPL"	100
2020-01-01	"TSLA"	200
2020-01-01	"MSFT"	300
…	…	…
2020-01-03	"MSFT"	315
2020-01-03	"NFLX"	440

In both Polars and Pandas you can call df.pivot to get a dataframe that looks like this:

Code

pivoted = prices.pivot(index="date", values="price", on="ticker")
pivoted

shape: (3, 5)

date	AAPL	TSLA	MSFT	NFLX
date	i64	i64	i64	i64
2020-01-01	100	200	300	400
2020-01-02	110	220	330	420
2020-01-03	105	210	315	440

As you can see, .pivot creates a dataframe where the columns are the unique labels from one column (“ticker”), alongside the index column (“date”). The values for the non-index columns are taken from the corresponding rows of the values column (“price”).

If our dataframe had multiple prices for the same ticker on the same date, we would use the aggregate_fn parameter of the .pivot method, e.g.: prices.pivot(..., aggregate_fn="mean"). Pivoting with an aggregate function gives us similar behaviour to what Excel calls “pivot tables”.

4.3.2 Melt / Unpivot

Melt is the inverse of pivot. While pivot takes us from long data to wide data, melt goes from wide to long. Note: Polars has recently replaced its melt method with an unpivot method.

If we call .unpivot(index="date", value_name="price") on our pivoted dataframe we get our original dataframe back:

Code

pivoted.unpivot(index="date", value_name="price")

shape: (12, 3)

date	variable	price
date	str	i64
2020-01-01	"AAPL"	100
2020-01-02	"AAPL"	110
2020-01-03	"AAPL"	105
…	…	…
2020-01-02	"NFLX"	420
2020-01-03	"NFLX"	440

4.4 Tidy NBA data

Suppose we want to calculate the days of rest each team had before each game. In the current structure this is difficult because we need to track both the home_team and away_team columns. We’ll use .unpivot so that there’s a single team column. This makes it easier to add a rest column with the per-team rest days between games.

Polars
Pandas

tidy_pl = (
    games_pl
    .unpivot(
        index=["game_id", "date"],
        on=["away_team", "home_team"],
        value_name="team",
    )
    .sort("game_id")
    .with_columns((
        pl.col("date")
        .alias("rest")
        .diff().over("team")
        .dt.total_days() - 1).cast(pl.Int8))
    .drop_nulls("rest")
    .collect()
)
tidy_pl

shape: (2_602, 5)

game_id	date	variable	team	rest
u32	date	str	str	i8
5	2015-10-28	"away_team"	"Chicago Bulls"	0
6	2015-10-28	"home_team"	"Detroit Pistons"	0
11	2015-10-28	"away_team"	"Cleveland Cavaliers"	0
…	…	…	…	…
1315	2016-06-19	"away_team"	"Cleveland Cavaliers"	2
1315	2016-06-19	"home_team"	"Golden State Warriors"	2

tidy_pd = (
    games_pd.reset_index()
    .melt(
        id_vars=["game_id", "date"],
        value_vars=["away_team", "home_team"],
        value_name="team",
    )
    .sort_values("game_id")
    .assign(
        rest=lambda df: (
            df
            .sort_values("date")
            .groupby("team")
            ["date"]
            .diff()
            .dt.days
            .sub(1)
        )
    )
    .dropna(subset=["rest"])
    .astype({"rest": pd.Int8Dtype()})
)
tidy_pd

	game_id	date	variable	team	rest
7	7	2015-10-28	away_team	New Orleans Pelicans	0
11	11	2015-10-28	away_team	Chicago Bulls	0
...	...	...	...	...	...
1315	1315	2016-06-19	away_team	Cleveland Cavaliers	2
2631	1315	2016-06-19	home_team	Golden State Warriors	2

2602 rows × 5 columns

Now we use .pivot so that this days-of-rest data can be added back to the original dataframe. We’ll also add columns for the spread between the home team’s rest and away team’s rest, and a flag for whether the home team won.

Polars
Pandas

by_game_pl = (
    tidy_pl
    .pivot(
        values="rest",
        index=["game_id", "date"],
        on="variable"
    )
    .rename({"away_team": "away_rest", "home_team": "home_rest"})
)
joined_pl = (
    by_game_pl
    .join(games_pl.collect(), on=["game_id", "date"])
    .with_columns([
        pl.col("home_points").alias("home_win") > pl.col("away_points"),
        pl.col("home_rest").alias("rest_spread") - pl.col("away_rest"),
    ])
)
joined_pl

shape: (1_303, 10)

game_id	date	away_rest	home_rest	away_team	away_points	home_team	home_points	home_win	rest_spread
u32	date	i8	i8	str	i64	str	i64	bool	i8
5	2015-10-28	0	null	"Chicago Bulls"	115	"Brooklyn Nets"	100	false	null
6	2015-10-28	null	0	"Utah Jazz"	87	"Detroit Pistons"	92	true	null
11	2015-10-28	0	null	"Cleveland Cavaliers"	106	"Memphis Grizzlies"	76	false	null
…	…	…	…	…	…	…	…	…	…
1314	2016-06-16	2	2	"Golden State Warriors"	101	"Cleveland Cavaliers"	115	true	0
1315	2016-06-19	2	2	"Cleveland Cavaliers"	93	"Golden State Warriors"	89	false	0

by_game_pd = (
    tidy_pd
    .pivot(
        values="rest",
        index=["game_id", "date"],
        columns="variable"
    )
    .rename(
        columns={"away_team": "away_rest", "home_team": "home_rest"}
    )
)
joined_pd = by_game_pd.join(games_pd).assign(
    home_win=lambda df: df["home_points"] > df["away_points"],
    rest_spread=lambda df: df["home_rest"] - df["away_rest"],
)
joined_pd

		away_rest	home_rest	away_team	away_points	home_team	home_points	home_win	rest_spread
game_id	date
7	2015-10-28	0	<NA>	New Orleans Pelicans	94	Portland Trail Blazers	112	True	<NA>
11	2015-10-28	0	<NA>	Chicago Bulls	115	Brooklyn Nets	100	False	<NA>
...	...	...	...	...	...	...	...	...	...
1314	2016-06-16	2	2	Golden State Warriors	101	Cleveland Cavaliers	115	True	0
1315	2016-06-19	2	2	Cleveland Cavaliers	93	Golden State Warriors	89	False	0

1303 rows × 8 columns

Here’s a lightly edited quote from Modern Pandas:

One somewhat subtle point: an “observation” depends on the question being asked. So really, we have two tidy datasets, tidy for answering team-level questions, and joined for answering game-level questions.

Let’s use the team-level dataframe to see each team’s average days of rest, both at home and away:

import seaborn as sns
sns.set_theme(font_scale=0.6)
sns.catplot(
    tidy_pl,
    x="variable",
    y="rest",
    col="team",
    col_wrap=5,
    kind="bar",
    height=1.5,
)

Plotting the distribution of rest_spread:

Polars
Pandas

import numpy as np
delta_pl = joined_pl["rest_spread"]
ax = (
    delta_pl
    .value_counts()
    .drop_nulls()
    .to_pandas()
    .set_index("rest_spread")
    ["count"]
    .reindex(np.arange(delta_pl.min(), delta_pl.max() + 1), fill_value=0)
    .sort_index()
    .plot(kind="bar", color="k", width=0.9, rot=0, figsize=(9, 6))
)
ax.set(xlabel="Difference in Rest (Home - Away)", ylabel="Games")

delta_pd = joined_pd["rest_spread"]
ax = (
    delta_pd
    .value_counts()
    .reindex(np.arange(delta_pd.min(), delta_pd.max() + 1), fill_value=0)
    .sort_index()
    .plot(kind="bar", color="k", width=0.9, rot=0, figsize=(9, 6))
)
ax.set(xlabel="Difference in Rest (Home - Away)", ylabel="Games")

Plotting the win percent by rest_spread:

Polars
Pandas

import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(9, 6))
sns.barplot(
    x="rest_spread",
    y="home_win",
    data=joined_pl.filter(pl.col("rest_spread").is_between(-3, 3, closed="both")),
    color="#4c72b0",
    ax=ax,
)

fig, ax = plt.subplots(figsize=(9, 6))
sns.barplot(
    x="rest_spread",
    y="home_win",
    data=joined_pd.query('-3 <= rest_spread <= 3'),
    color="#4c72b0",
    ax=ax,
)

4.5 Stack / Unstack vs Melt / Pivot

Pandas has special methods stack and unstack for reshaping data with a MultiIndex. Polars doesn’t have an index, so anywhere you see stack / unstack in Pandas, the equivalent Polars code will use melt / pivot.

Polars
Pandas

rest_pl = (
    tidy_pl
    .group_by(["date", "variable"], maintain_order=True)
    .agg(pl.col("rest").mean())
)
rest_pl

shape: (418, 3)

date	variable	rest
date	str	f64
2015-10-28	"away_team"	0.0
2015-10-28	"home_team"	0.0
2015-10-29	"away_team"	0.333333
…	…	…
2016-06-19	"away_team"	2.0
2016-06-19	"home_team"	2.0

rest_pd = (
    tidy_pd
    .groupby(["date", "variable"])
    ["rest"]
    .mean()
)
rest_pd

date        variable 
2015-10-28  away_team    0.0
            home_team    0.0
                        ... 
2016-06-19  away_team    2.0
            home_team    2.0
Name: rest, Length: 418, dtype: Float64

In Polars we use .pivot to do what in Pandas would require .unstack:

Polars
Pandas

rest_pl.pivot(index="date", on="variable", values="rest")

shape: (209, 3)

date	away_team	home_team
date	f64	f64
2015-10-28	0.0	0.0
2015-10-29	0.333333	0.0
2015-10-30	1.083333	0.916667
…	…	…
2016-06-16	2.0	2.0
2016-06-19	2.0	2.0

rest_pd.unstack()

variable	away_team	home_team
date
2015-10-28	0.0	0.0
2015-10-29	0.333333	0.0
...	...	...
2016-06-16	2.0	2.0
2016-06-19	2.0	2.0

209 rows × 2 columns

Plotting the moving average of rest days:

Polars
Pandas

ax = (
    rest_pl.pivot(index="date", values="rest", on="variable")
    .filter(pl.col("away_team") < 7)
    .sort("date")
    .select([pl.col("date"), pl.col(pl.Float64).rolling_mean(7)])
    .to_pandas()
    .set_index("date")
    .plot(figsize=(9, 6), linewidth=3)
)
ax.set(ylabel="Rest (7 day MA)")

ax = (
    rest_pd.unstack()
    .query('away_team < 7')
    .sort_index()
    .rolling(7)
    .mean()
    .plot(figsize=(9, 6), linewidth=3)
)
ax.set(ylabel="Rest (7 day MA)")

4.6 Mini Project: Home Court Advantage?

We may as well do some (not very rigorous) analysis: let’s see if home advantage is a real thing.

4.6.1 Find the win percent for each team

We want to control for the strength of the teams playing. The team’s victory percentage is probably not a good control but it’s what we’ll use:

Polars
Pandas

win_col = pl.col("win")
wins_pl = (
    joined_pl.unpivot(
        index=["game_id", "date", "home_win"],
        value_name="team",
        variable_name="is_home",
        on=["home_team", "away_team"],
    )
    .with_columns(pl.col("home_win").alias("win") == (pl.col("is_home") == "home_team"))
    .group_by(["team", "is_home"])
    .agg(
        [
            win_col.sum().alias("n_wins"),
            win_col.count().alias("n_games"),
            win_col.mean().alias("win_pct"),
        ]
    )
    .sort(["team", "is_home"])
)
wins_pl

shape: (60, 5)

team	is_home	n_wins	n_games	win_pct
str	str	u32	u32	f64
"Atlanta Hawks"	"away_team"	22	46	0.478261
"Atlanta Hawks"	"home_team"	30	45	0.666667
"Boston Celtics"	"away_team"	20	44	0.454545
…	…	…	…	…
"Washington Wizards"	"away_team"	18	40	0.45
"Washington Wizards"	"home_team"	22	41	0.536585

wins_pd = (
    joined_pd
    .reset_index()
    .melt(
        id_vars=["game_id", "date", "home_win"],
        value_name="team",
        var_name="is_home",
        value_vars=["home_team", "away_team"],
    )
    .assign(win=lambda df: df["home_win"] == (df["is_home"] == "home_team"))
    .groupby(["team", "is_home"])["win"]
    .agg(['sum', 'count', 'mean'])
    .rename(columns={
        "sum": 'n_wins',
        "count": 'n_games',
        "mean": 'win_pct'
    })
)
wins_pd

		n_wins	n_games	win_pct
team	is_home
Atlanta Hawks	away_team	22	46	0.478261
Atlanta Hawks	home_team	30	45	0.666667
...	...	...	...	...
Washington Wizards	away_team	18	40	0.450000
Washington Wizards	home_team	22	41	0.536585

60 rows × 3 columns

4.7 Some visualisations

g = sns.FacetGrid(wins_pl, hue="team", aspect=0.8, palette=["k"], height=5)
g.map(
    sns.pointplot,
    "is_home",
    "win_pct",
    order=["away_team", "home_team"]).set(ylim=(0, 1))

sns.catplot(
    wins_pl,
    x="is_home",
    y="win_pct",
    col="team",
    col_wrap=5,
    hue="team",
    kind="point",
    height=1.5,
)

Now we calculate the win percent by team, regardless of whether they’re home or away:

Polars
Pandas

win_percent_pl = (
    wins_pl.group_by("team", maintain_order=True).agg(
        pl.col("n_wins").sum().alias("win_pct") / pl.col("n_games").sum()
    )
)
win_percent_pl

shape: (30, 2)

team	win_pct
str	f64
"Atlanta Hawks"	0.571429
"Boston Celtics"	0.563218
"Brooklyn Nets"	0.256098
…	…
"Utah Jazz"	0.487805
"Washington Wizards"	0.493827

win_percent_pd = (
    wins_pd
    .groupby(level="team", as_index=True)
    .apply(lambda x: x["n_wins"].sum() / x["n_games"].sum())
)
win_percent_pd

team
Atlanta Hawks         0.571429
Boston Celtics        0.563218
                        ...   
Utah Jazz             0.487805
Washington Wizards    0.493827
Length: 30, dtype: float64

(
    win_percent_pl
    .sort("win_pct")
    .to_pandas()
    .set_index("team")
    .plot.barh(figsize=(6, 12), width=0.85, color="k")
)
plt.xlabel("Win Percent")

Text(0.5, 0, 'Win Percent')

Here’s a plot of team home court advantage against team overall win percentage:

Polars
Pandas

wins_to_plot_pl = (
    wins_pl.pivot(index="team", on="is_home", values="win_pct")
    .with_columns(
        [
            pl.col("home_team").alias("Home Win % - Away %") - pl.col("away_team"),
            (pl.col("home_team").alias("Overall %") + pl.col("away_team")) / 2,
        ]
    )
)
sns.regplot(data=wins_to_plot_pl, x='Overall %', y='Home Win % - Away %')

wins_to_plot_pd = (
    wins_pd
    ["win_pct"]
    .unstack()
    .assign(**{'Home Win % - Away %': lambda x: x["home_team"] - x["away_team"],
               'Overall %': lambda x: (x["home_team"] + x["away_team"]) / 2})
)
sns.regplot(data=wins_to_plot_pd, x='Overall %', y='Home Win % - Away %')

Let’s add the win percent back to the dataframe and run a regression:

Polars
Pandas

reg_df_pl = (
    joined_pl.join(win_percent_pl, left_on="home_team", right_on="team")
    .rename({"win_pct": "home_strength"})
    .join(win_percent_pl, left_on="away_team", right_on="team")
    .rename({"win_pct": "away_strength"})
    .with_columns(
        [
            pl.col("home_points").alias("point_diff") - pl.col("away_points"),
            pl.col("home_rest").alias("rest_diff") - pl.col("away_rest"),
            pl.col("home_win").cast(pl.UInt8),  # for statsmodels
        ]
    )
)
reg_df_pl.head()

shape: (5, 14)

game_id	date	away_rest	home_rest	away_team	away_points	home_team	home_points	home_win	rest_spread	home_strength	away_strength	point_diff	rest_diff
u32	date	i8	i8	str	i64	str	i64	u8	i8	f64	f64	i64	i8
5	2015-10-28	0	null	"Chicago Bulls"	115	"Brooklyn Nets"	100	0	null	0.256098	0.506173	-15	null
6	2015-10-28	null	0	"Utah Jazz"	87	"Detroit Pistons"	92	1	null	0.505882	0.487805	5	null
11	2015-10-28	0	null	"Cleveland Cavaliers"	106	"Memphis Grizzlies"	76	0	null	0.488372	0.715686	-30	null
14	2015-10-28	0	null	"New Orleans Pelicans"	94	"Portland Trail Blazers"	112	1	null	0.526882	0.37037	18	null
17	2015-10-29	0	0	"Memphis Grizzlies"	112	"Indiana Pacers"	103	0	0	0.545455	0.488372	-9	0

reg_df_pd = (
    joined_pd.assign(
        away_strength=joined_pd['away_team'].map(win_percent_pd),
        home_strength=joined_pd['home_team'].map(win_percent_pd),
        point_diff=joined_pd['home_points'] - joined_pd['away_points'],
        rest_diff=joined_pd['home_rest'] - joined_pd['away_rest'])
)
reg_df_pd.head()

		away_rest	home_rest	away_team	away_points	home_team	home_points	home_win	rest_spread	away_strength	home_strength	point_diff	rest_diff
game_id	date
7	2015-10-28	0	<NA>	New Orleans Pelicans	94	Portland Trail Blazers	112	True	<NA>	0.370370	0.526882	18	<NA>
11	2015-10-28	0	<NA>	Chicago Bulls	115	Brooklyn Nets	100	False	<NA>	0.506173	0.256098	-15	<NA>
15	2015-10-28	<NA>	0	Utah Jazz	87	Detroit Pistons	92	True	<NA>	0.487805	0.505882	5	<NA>
16	2015-10-28	0	<NA>	Cleveland Cavaliers	106	Memphis Grizzlies	76	False	<NA>	0.715686	0.488372	-30	<NA>
17	2015-10-29	1	0	Atlanta Hawks	112	New York Knicks	101	False	-1	0.571429	0.382716	-11	-1

import statsmodels.formula.api as sm

mod = sm.logit(
    "home_win ~ home_strength + away_strength + home_rest + away_rest",
    reg_df_pl.to_pandas(),
)
res = mod.fit()
res.summary()

Optimization terminated successfully.
         Current function value: 0.554797
         Iterations 6

Logit Regression Results
Dep. Variable:	home_win	No. Observations:	1299
Model:	Logit	Df Residuals:	1294
Method:	MLE	Df Model:	4
Date:	Mon, 08 Jul 2024	Pseudo R-squ.:	0.1777
Time:	19:22:08	Log-Likelihood:	-720.68
converged:	True	LL-Null:	-876.38
Covariance Type:	nonrobust	LLR p-value:	3.748e-66

	coef	std err	z	P>\|z\|	[0.025	0.975]
Intercept	-0.0019	0.304	-0.006	0.995	-0.597	0.593
home_strength	5.7161	0.466	12.272	0.000	4.803	6.629
away_strength	-4.9133	0.456	-10.786	0.000	-5.806	-4.020
home_rest	0.1045	0.076	1.381	0.167	-0.044	0.253
away_rest	-0.0347	0.066	-0.526	0.599	-0.164	0.095

You can play around with the regressions yourself but we’ll end them here.

4.8 Summary

This was mostly a demonstration of .pivot and .melt, with several different examples of reshaping data in Polars and Pandas.