Skip to content

openpyxl vs pandas for Excel Automation: Which One Should You Use?

Purpose

Last week I was helping a freelancer on r/Python who knew pandas well but wondered if they should learn openpyxl for client work. This is a common trap — these two libraries look like alternatives because both read and write Excel files. They’re not. They operate at different layers, and understanding this distinction will save you from writing either over-engineered data pipelines or ugly spreadsheets.

The Real Relationship

pandas reads and writes Excel through ExcelWriter, which defaults to openpyxl for .xlsx files under the hood. So when you call df.to_excel(), you’re already using openpyxl — you just don’t have direct access to its features.

The confusion comes from thinking of them as competing tools. They’re complementary:

  • pandas is a data processing library that happens to have Excel I/O
  • openpyxl is an Excel manipulation library that doesn’t care about your data logic

When pandas Is Enough

If your workflow is “read data → transform → write data back”, pandas alone is fine. No formatting, no charts, no cell-level control — just numbers in cells.

pandas_only.py
import pandas as pd
df = pd.read_excel("sales.xlsx")
summary = df.groupby("region")["revenue"].sum()
summary.to_excel("summary.xlsx")

This works. The output is a plain table with no styling. For internal tools or personal scripts, that’s often acceptable. For anything going to a client or manager, it isn’t.

When openpyxl Is Necessary

openpyxl becomes essential the moment you need Excel-specific features most non-technical stakeholders expect:

  • Bold headers with custom background colors
  • Number formatting (currency, date, percentage)
  • Column widths that fit the content
  • Charts embedded in the sheet
  • Merged cells for report headers
  • Data validation dropdowns
  • Conditional formatting to highlight outliers
  • Freeze panes so header rows stay visible when scrolling

pandas can’t do any of this. openpyxl can do all of it.

The Combined Workflow

Here’s the pattern I use in nearly every freelancing project:

combined_workflow.py
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.chart import BarChart, Reference
from openpyxl.utils.dataframe import dataframe_to_rows
# --- pandas: data processing ---
df = pd.DataFrame({
"Product": ["Widget A", "Widget B", "Widget C", "Widget D"],
"Revenue": [12000, 34000, 8900, 27500],
"Cost": [8000, 22000, 6000, 18500],
"Profit": [4000, 12000, 2900, 9000]
})
df["Margin"] = (df["Profit"] / df["Revenue"] * 100).round(1)
# --- pandas: write raw data ---
with pd.ExcelWriter("report.xlsx", engine="openpyxl") as writer:
df.to_excel(writer, sheet_name="Summary", index=False)
# --- openpyxl: polish the output ---
wb = load_workbook("report.xlsx")
ws = wb["Summary"]
thin_border = Border(
left=Side(style="thin"),
right=Side(style="thin"),
top=Side(style="thin"),
bottom=Side(style="thin")
)
header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
header_font = Font(bold=True, color="FFFFFF", size=11)
for cell in ws[1]:
cell.fill = header_fill
cell.font = header_font
cell.alignment = Alignment(horizontal="center")
cell.border = thin_border
for row in ws.iter_rows(min_row=2, max_row=ws.max_row, max_col=ws.max_column):
for cell in row:
cell.border = thin_border
if cell.column == 5: # Margin column
cell.number_format = '0.0"%'
ws.column_dimensions["A"].width = 14
ws.column_dimensions["B"].width = 12
ws.column_dimensions["C"].width = 12
ws.column_dimensions["D"].width = 12
ws.column_dimensions["E"].width = 12
chart = BarChart()
chart.title = "Revenue vs Cost vs Profit"
chart.style = 10
data = Reference(ws, min_col=2, min_row=1, max_col=4, max_row=5)
cats = Reference(ws, min_col=1, min_row=2, max_row=5)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
chart.shape = 4
ws.add_chart(chart, "G2")
ws.freeze_panes = "A2"
wb.save("report.xlsx")

The comments are minimal on purpose — the flow is: pandas builds the data, openpyxl builds the presentation. Each library does what it’s good at.

Performance Reality

For most business reports (under 10MB, under 50K rows), both libraries perform fine. openpyxl can struggle with files over 100MB because it loads the entire workbook into memory. pandas with the openpyxl engine has the same limitation.

If you’re processing huge Excel files, look at openpyxl’s read-only mode or pandas chunked reading. For truly massive datasets, don’t use Excel as a database — that’s what databases are for.

When You Only Need One

Not every project needs both. Here’s my decision framework:

  • Raw data pipeline: pandas only. df.to_excel() is fine for intermediate files.
  • Fill a template: openpyxl only. Load the template workbook, insert values at specific cells, save. No pandas needed.
  • Generate a formatted report from data: Both. pandas for aggregation, openpyxl for layout.
  • Modify an existing workbook (add a sheet, update a cell): openpyxl only. pandas will overwrite the whole file.

Summary

In this post, I broke down the relationship between openpyxl and pandas for Excel work. The key point is they are complementary, not competitive — pandas handles data processing, openpyxl handles Excel presentation. For freelancing or any client-facing work, being able to chain them together is a skill that immediately elevates your output from “script that produces a spreadsheet” to “deliverable that looks like I spent hours on it.”

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!

Comments