Free Resource · Data Science

Pandas Cheat Sheet

The essential commands for every data science project — save it, share it, use it.

GrowInData · Pandas Cheat Sheet · Kostanca Kovaci
Free resource by GrowInData · kovacikostanca.github.io
01
Loading & Exploring Data
Import & Load
import pandas as pd
df = pd.read_csv("file.csv")  # load CSV
df = pd.read_excel("file.xlsx")  # load Excel
df = pd.read_json("file.json")  # load JSON
df = pd.read_csv("file.csv", usecols=["col1","col2"])  # load selected columns only
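You can try the loaders without a file on disk: `read_csv` accepts any file-like object, so `io.StringIO` can stand in for `"file.csv"` (the column names here are made up):

```python
import io
import pandas as pd

csv_text = "col1,col2,col3\n1,a,x\n2,b,y\n3,c,z\n"
df = pd.read_csv(io.StringIO(csv_text), usecols=["col1", "col2"])
print(df.shape)  # (3, 2) — usecols keeps only the requested columns
```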
First look — run these before anything else
df.head()  # first 5 rows
df.tail()  # last 5 rows
df.shape  # (rows, cols)
df.columns  # column names
len(df)  # row count
df.info()  # types + nulls
df.describe()  # stats summary
df.dtypes  # column types
df.nunique()  # unique per col
df.sample(5)  # random 5 rows
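A tiny hypothetical frame to try the inspection calls on (column names and values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris", "Nice"],
    "age": [34, 28, 41, 23],
})
print(df.shape)       # (4, 2)
print(df.nunique())   # city: 3 unique values, age: 4
print(df.describe())  # count/mean/std/min/max for the numeric column
```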
02
Cleaning — Nulls, Duplicates & Types
Null values
df.isnull().sum()  # count per col
df.dropna()  # drop rows with any null
df.dropna(subset=["col"])  # only if null in one col
df.fillna(0)  # fill with 0
df["col"].fillna(df["col"].mean())  # fill with mean
Types & duplicates
df.duplicated().sum()  # count duplicates
df.drop_duplicates()  # remove duplicates
df["col"].astype(int)  # to int
df["col"].astype(str)  # to str
pd.to_datetime(df["date"])  # to datetime
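The cleaning steps chained together on a small hypothetical frame (column names and values are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col": [1.0, np.nan, 3.0, 3.0],
    "date": ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-02"],
})
print(df["col"].isnull().sum())                 # 1 null
df["col"] = df["col"].fillna(df["col"].mean())  # fill with mean of [1, 3, 3]
df["date"] = pd.to_datetime(df["date"])         # strings -> datetime64
print(df.duplicated().sum())                    # 1: the last two rows match
df = df.drop_duplicates()
```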
03
Filtering & Selecting
Select columns
df["col"]  # one col (Series)
df[["col1","col2"]]  # multiple cols
df.iloc[0:5]  # rows by position
df.loc[0, "col"]  # row label + col
Filter rows
df[df["age"] > 30]
df[df["city"]=="Paris"]
df[df["col"].isin(["a", "b"])]  # value in a list
df.query("age > 30")
Multiple conditions
df[(df["age"] > 30) & (df["city"] == "Paris")]  # AND
df[(df["age"] < 20) | (df["age"] > 60)]  # OR
df[~df["col"].isnull()]  # NOT null
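The filters above, run on a small hypothetical frame (names and ages are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe", "Dan"],
    "age": [34, 18, 65, 45],
    "city": ["Paris", "Rome", "Paris", "Nice"],
})
over_30 = df[df["age"] > 30]                               # Ana, Chloe, Dan
in_paris = df[(df["age"] > 30) & (df["city"] == "Paris")]  # Ana, Chloe
extremes = df[(df["age"] < 20) | (df["age"] > 60)]         # Ben, Chloe
```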
04
Grouping & Aggregating
Basic aggregations
df.groupby("category")["sales"].mean()  # mean per group
df.groupby("category")["sales"].sum()  # sum per group
df.groupby("category")["sales"].count()  # count per group
df.groupby("category")["sales"].agg(["mean","sum","count"])
Multiple aggregations
df.groupby("cat").agg(
    mean_s=("sales","mean"),
    total=("sales","sum"),
    n=("sales","count")
)
Group by multiple cols
df.groupby(
    ["region","category"]
)["sales"].sum()
df.groupby("category")["sales"].sum().reset_index()  # back to a flat DataFrame
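Named aggregation end to end, on a small hypothetical sales frame (values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "category": ["a", "b", "a", "a"],
    "sales": [10, 20, 30, 40],
})
summary = df.groupby("category").agg(
    mean_s=("sales", "mean"),
    total=("sales", "sum"),
    n=("sales", "count"),
).reset_index()  # reset_index makes "category" a regular column again
print(summary)   # category a: total 80 over 3 rows; b: total 20 over 1 row
```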
05
Sorting & Ranking
Sort
df.sort_values("col")  # ascending
df.sort_values("col", ascending=False)  # descending
df.sort_values(["c1","c2"])  # by multiple cols
df.sort_index()  # by index
Top / bottom
df.nlargest(5,"sales")  # top 5
df.nsmallest(5,"sales")  # bottom 5
df["col"].rank()  # rank col
df["col"].value_counts()  # frequency table
df["col"].unique()  # unique values
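`nlargest` and `value_counts` on a small hypothetical product table (values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "sales": [50, 120, 80, 120],
})
top2 = df.nlargest(2, "sales")       # the two best-selling rows (B and D)
counts = df["sales"].value_counts()  # 120 appears twice, 50 and 80 once each
```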
06
Merging & Joining
pd.merge(df1, df2, on="id")  # inner join (default)
pd.merge(df1, df2, on="id", how="left")  # left join
pd.merge(df1, df2, on="id", how="right")  # right join
pd.merge(df1, df2, on="id", how="outer")  # outer join
pd.merge(df1, df2, left_on="id", right_on="uid")  # different col names
pd.concat([df1, df2], ignore_index=True)  # stack rows
pd.concat([df1, df2], axis=1)  # stack columns
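Inner vs. left join on two small hypothetical frames (the `customers`/`orders` names and values are made up):

```python
import pandas as pd

customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"id": [1, 1, 3], "amount": [10, 20, 30]})

inner = pd.merge(customers, orders, on="id")              # id 2 has no order: dropped
left = pd.merge(customers, orders, on="id", how="left")   # id 2 kept, amount is NaN
stacked = pd.concat([customers, customers], ignore_index=True)  # 6 rows, index 0-5
```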
Quick Tips
df.info()
Run df.info() first to catch type and null issues before any analysis
df.copy()
Use df.copy() before cleaning — never overwrite raw data
df.reset_index(drop=True)
Run after filtering to fix the row index (drop=True discards the old one)
df.rename(columns={"old":"new"})
To rename columns cleanly
df["col"].str.lower().str.strip()
To clean messy text columns
df.memory_usage(deep=True)
To check dataframe size on large datasets
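The text-cleaning tip is worth a tiny demo: lower-casing and stripping whitespace collapses near-duplicate strings (the example values are made up):

```python
import pandas as pd

df = pd.DataFrame({"col": ["  Paris ", "PARIS", "paris  "]})
print(df["col"].nunique())  # 3 distinct strings before cleaning
df["col"] = df["col"].str.lower().str.strip()
print(df["col"].nunique())  # 1 after: all three become "paris"
```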
