As you’ve probably heard by now, Polars is very, very fast. Well-written Polars is quicker than well-written Pandas, and it’s easier to write Polars well. With that in mind…
3.1 Six fairly obvious performance rules
Here are some tips that are almost always a good idea:
Use the lazy API.
Use Exprs, and don’t use .apply unless you really have to.
Use the smallest necessary numeric types (so if you have an integer between 0 and 255, use pl.UInt8, not pl.Int64). This will save both time and space.
Use efficient storage (if you’re dumping stuff in files, Parquet is a good choice).
Use categoricals for recurring strings (but note that it may not be worth it if there’s not much repetition).
Only select the columns you need.
Tip
If your colleagues are happy with CSVs and can’t be convinced to use something else, tell them that the Modern Polars book says they should feel bad.
These are basically the same rules you’d follow when using Pandas, except for the one about the lazy API. Now for some comparisons between the performance of idiomatic Pandas and Polars.
3.2 Polars is faster at the boring stuff
Here we’ll clean up a messy dataset, kindly provided by Kaggle user Rachit Toshniwal as a deliberate example of a really crap CSV. Most of the cleanup involves extracting numeric data from awkward strings.
Also, the data is too small so I’ve concatenated it to itself 20 times. We’re not doing anything that will care about the duplication. Here’s how the raw table looks:
```python
import pandas as pd

pd.read_csv("../data/fifa21_raw_big.csv", dtype="string", nrows=2)
```
|   | ID | Name | LongName | photoUrl | playerUrl | Nationality | Age | ↓OVA | POT | Club | ... | A/W | D/W | IR | PAC | SHO | PAS | DRI | DEF | PHY | Hits |
|---|----|------|----------|----------|-----------|-------------|-----|------|-----|------|-----|-----|-----|----|-----|-----|-----|-----|-----|-----|------|
| 0 | 158023 | L. Messi | Lionel Messi | https://cdn.sofifa.com/players/158/023/21_60.png | http://sofifa.com/player/158023/lionel-messi/2... | Argentina | 33 | 93 | 93 | FC Barcelona | ... | Medium | Low | 5 ★ | 85 | 92 | 91 | 95 | 38 | 65 | 771 |
| 1 | 20801 | Cristiano Ronaldo | C. Ronaldo dos Santos Aveiro | https://cdn.sofifa.com/players/020/801/21_60.png | http://sofifa.com/player/20801/c-ronaldo-dos-s... | Portugal | 35 | 92 | 92 | Juventus | ... | High | Low | 5 ★ | 89 | 93 | 81 | 89 | 35 | 77 | 562 |

2 rows × 77 columns
For this exercise we’ll assume we want to make use of all the columns. First some boilerplate where we map out the different data types:
```python
import pandas as pd
import polars as pl
import numpy as np
import math

str_cols = [
    "Name",
    "LongName",
    "playerUrl",
    "photoUrl",
]
initial_category_cols_pl = [
    "Nationality",
    "Preferred Foot",
    "Best Position",
    "A/W",
    "D/W",
]
category_cols = [*initial_category_cols_pl, "Club"]
date_cols = ["Joined", "Loan Date End"]
# these all start with the euro symbol and end with 0, M or K
money_cols = ["Value", "Wage", "Release Clause"]
star_cols = ["W/F", "SM", "IR"]
# Contract col is a range of years
# Positions is a list of positions
# Height is in cm
# Weight is in kg
# Hits is numbers with K and M
messy_cols = ["Contract", "Positions", "Height", "Weight", "Hits"]
initially_str_cols = str_cols + date_cols + money_cols + star_cols + messy_cols
initially_str_cols_pl = [*initially_str_cols, "Club"]
u32_cols = ["ID", "Total Stats"]
u8_cols = [
    "Age", "↓OVA", "POT", "BOV", "Crossing", "Finishing", "Heading Accuracy",
    "Short Passing", "Volleys", "Dribbling", "Curve", "FK Accuracy",
    "Long Passing", "Ball Control", "Acceleration", "Sprint Speed", "Agility",
    "Reactions", "Balance", "Shot Power", "Jumping", "Stamina", "Strength",
    "Long Shots", "Aggression", "Interceptions", "Positioning", "Vision",
    "Penalties", "Composure", "Marking", "Standing Tackle", "Sliding Tackle",
    "GK Diving", "GK Handling", "GK Kicking", "GK Positioning", "GK Reflexes",
    "PAC", "SHO", "PAS", "DRI", "DEF", "PHY",
]
u16_cols = [
    "Attacking", "Skill", "Movement", "Power", "Mentality", "Defending",
    "Goalkeeping", "Total Stats", "Base Stats",
]
```
3.2.1 Dtypes
Here are the initial dtypes for the two dataframes:
```python
# can't use UInt8/16 in scan_csv
dtypes_pl = (
    {col: pl.Utf8 for col in initially_str_cols_pl}
    | {col: pl.Categorical for col in initial_category_cols_pl}
    | {col: pl.UInt32 for col in [*u32_cols, *u16_cols, *u8_cols]}
)
```
```python
dtypes_pd = (
    {col: pd.StringDtype() for col in initially_str_cols}
    | {col: pd.CategoricalDtype() for col in category_cols}
    | {col: "uint32" for col in u32_cols}
    | {col: "uint8" for col in u8_cols}
    | {col: "uint16" for col in u16_cols}
)
```
One thing I’ll note here is that Pandas numeric types are somewhat confusing: "uint32" means np.uint32 which is not the same thing as pd.UInt32Dtype(). Only the latter is nullable. On the other hand, Polars has just one unsigned 32-bit integer type, and it’s nullable.
Tip
Polars expressions have a shrink_dtype method that can be more convenient than specifying dtypes manually. It’s not magic, though: it has to spend time finding the min and max of each column.
3.2.2 Data cleaning
There’s not much that you haven’t seen here already, so we won’t explain the code line by line. The main new thing here is pl.when for ternary expressions.
```python
%%time
new_cols_pl = (
    [
        pl.col("Club").str.strip_chars().cast(pl.Categorical),
        parse_suffixed_num_pl(pl.col("Hits")).cast(pl.UInt32),
        pl.col("Positions").str.split(","),
        parse_height_pl(pl.col("Height")),
        parse_weight_pl(pl.col("Weight")),
    ]
    + [parse_date_pl(pl.col(col)) for col in date_cols]
    + [parse_money_pl(pl.col(col)) for col in money_cols]
    + [parse_star_pl(pl.col(col)) for col in star_cols]
    + parse_contract_pl(pl.col("Contract"))
    + [pl.col(col).cast(pl.UInt16) for col in u16_cols]
    + [pl.col(col).cast(pl.UInt8) for col in u8_cols]
)
fifa_pl = (
    pl.scan_csv("../data/fifa21_raw_big.csv", schema_overrides=dtypes_pl)
    .with_columns(new_cols_pl)
    .drop("Contract")
    .rename({"↓OVA": "OVA"})
    .collect()
)
```
CPU times: user 1.78 s, sys: 197 ms, total: 1.98 s
Wall time: 216 ms
<timed exec>:20: CategoricalRemappingWarning: Local categoricals have different encodings, expensive re-encoding is done to perform this merge operation. Consider using a StringCache or an Enum type if the categories are known in advance
```python
%%time
fifa_pd = (
    pd.read_csv("../data/fifa21_raw_big.csv", dtype=dtypes_pd)
    .assign(
        Club=lambda df: df["Club"].cat.rename_categories(lambda c: c.strip()),
        # bind col as a default argument so each lambda sees its own column,
        # rather than all of them closing over the comprehension's last value
        **{col: (lambda df, col=col: parse_date_pd(df[col])) for col in date_cols},
        **{col: (lambda df, col=col: parse_money_pd(df[col])) for col in money_cols},
        **{col: (lambda df, col=col: parse_star_pd(df[col])) for col in star_cols},
        Hits=lambda df: parse_suffixed_num_pd(df["Hits"]).astype(pd.UInt32Dtype()),
        Positions=lambda df: df["Positions"].str.split(","),
        Height=lambda df: parse_height_pd(df["Height"]),
        Weight=lambda df: parse_weight_pd(df["Weight"]),
    )
    .pipe(parse_contract_pd)
    .rename(columns={"↓OVA": "OVA"})
)
```
CPU times: user 3.55 s, sys: 347 ms, total: 3.9 s
Wall time: 3.9 s
You could play around with the timings here and even try the .profile method to see what Polars spends its time on. In this scenario the speed advantage of Polars likely comes down to three things:
It is much faster at reading CSVs.
It is much faster at processing strings.
It can select/assign columns in parallel.
3.3 NumPy might make Polars faster sometimes
Polars gets along well with NumPy ufuncs, even in lazy mode (which is interesting because NumPy has no lazy API). Let’s see how this looks by calculating the great-circle distance between a bunch of coordinates.
3.3.1 Get the data
We create a lazy dataframe containing pairs of airports and their coordinates:
One use case for NumPy ufuncs is doing computations that Polars expressions don’t support. In this example Polars can do everything we need, though the ufunc version ends up being slightly faster:
3.22 s ± 33.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
On my machine the NumPy version used to be 5-20% faster than the pure Polars version, but this is no longer the case. Still, you may want to test whether it helps in your case:
3.87 s ± 79.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This may not be a huge performance difference, but it at least means you don’t sacrifice speed when relying on NumPy. There are some gotchas, though, so watch out for those.
Also watch out for .to_numpy(): you don’t always need to call it, and it can slow things down: