openpyxl vs pandas for Excel Automation: Which One Should You Use?
Purpose
Last week I was helping a freelancer on r/Python who knew pandas well but wondered if they should learn openpyxl for client work. This is a common trap — these two libraries look like alternatives because both read and write Excel files. They’re not. They operate at different layers, and understanding this distinction will save you from writing either over-engineered data pipelines or ugly spreadsheets.
The Real Relationship
pandas reads and writes Excel through ExcelWriter, which defaults to openpyxl for .xlsx files under the hood. So when you call df.to_excel(), you’re already using openpyxl — you just don’t have direct access to its features.
The confusion comes from thinking of them as competing tools. They’re complementary:
- pandas is a data processing library that happens to have Excel I/O
- openpyxl is an Excel manipulation library that doesn’t care about your data logic
When pandas Is Enough
If your workflow is “read data → transform → write data back”, pandas alone is fine. No formatting, no charts, no cell-level control — just numbers in cells.
import pandas as pd
df = pd.read_excel("sales.xlsx")summary = df.groupby("region")["revenue"].sum()summary.to_excel("summary.xlsx")This works. The output is a plain table with no styling. For internal tools or personal scripts, that’s often acceptable. For anything going to a client or manager, it isn’t.
When openpyxl Is Necessary
openpyxl becomes essential the moment you need Excel-specific features most non-technical stakeholders expect:
- Bold headers with custom background colors
- Number formatting (currency, date, percentage)
- Column widths that fit the content
- Charts embedded in the sheet
- Merged cells for report headers
- Data validation dropdowns
- Conditional formatting to highlight outliers
- Freeze panes so header rows stay visible when scrolling
pandas can’t do any of this. openpyxl can do all of it.
The Combined Workflow
Here’s the pattern I use in nearly every freelancing project:
import pandas as pdfrom openpyxl import load_workbookfrom openpyxl.styles import Font, PatternFill, Alignment, Border, Sidefrom openpyxl.chart import BarChart, Referencefrom openpyxl.utils.dataframe import dataframe_to_rows
# --- pandas: data processing ---df = pd.DataFrame({ "Product": ["Widget A", "Widget B", "Widget C", "Widget D"], "Revenue": [12000, 34000, 8900, 27500], "Cost": [8000, 22000, 6000, 18500], "Profit": [4000, 12000, 2900, 9000]})
df["Margin"] = (df["Profit"] / df["Revenue"] * 100).round(1)
# --- pandas: write raw data ---with pd.ExcelWriter("report.xlsx", engine="openpyxl") as writer: df.to_excel(writer, sheet_name="Summary", index=False)
# --- openpyxl: polish the output ---wb = load_workbook("report.xlsx")ws = wb["Summary"]
thin_border = Border( left=Side(style="thin"), right=Side(style="thin"), top=Side(style="thin"), bottom=Side(style="thin"))
header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")header_font = Font(bold=True, color="FFFFFF", size=11)
for cell in ws[1]: cell.fill = header_fill cell.font = header_font cell.alignment = Alignment(horizontal="center") cell.border = thin_border
for row in ws.iter_rows(min_row=2, max_row=ws.max_row, max_col=ws.max_column): for cell in row: cell.border = thin_border if cell.column == 5: # Margin column cell.number_format = '0.0"%'
ws.column_dimensions["A"].width = 14ws.column_dimensions["B"].width = 12ws.column_dimensions["C"].width = 12ws.column_dimensions["D"].width = 12ws.column_dimensions["E"].width = 12
chart = BarChart()chart.title = "Revenue vs Cost vs Profit"chart.style = 10data = Reference(ws, min_col=2, min_row=1, max_col=4, max_row=5)cats = Reference(ws, min_col=1, min_row=2, max_row=5)chart.add_data(data, titles_from_data=True)chart.set_categories(cats)chart.shape = 4ws.add_chart(chart, "G2")
ws.freeze_panes = "A2"
wb.save("report.xlsx")The comments are minimal on purpose — the flow is: pandas builds the data, openpyxl builds the presentation. Each library does what it’s good at.
Performance Reality
For most business reports (under 10MB, under 50K rows), both libraries perform fine. openpyxl can struggle with files over 100MB because it loads the entire workbook into memory. pandas with the openpyxl engine has the same limitation.
If you’re processing huge Excel files, look at openpyxl’s read-only mode or pandas chunked reading. For truly massive datasets, don’t use Excel as a database — that’s what databases are for.
When You Only Need One
Not every project needs both. Here’s my decision framework:
- Raw data pipeline: pandas only.
df.to_excel()is fine for intermediate files. - Fill a template: openpyxl only. Load the template workbook, insert values at specific cells, save. No pandas needed.
- Generate a formatted report from data: Both. pandas for aggregation, openpyxl for layout.
- Modify an existing workbook (add a sheet, update a cell): openpyxl only. pandas will overwrite the whole file.
Summary
In this post, I broke down the relationship between openpyxl and pandas for Excel work. The key point is they are complementary, not competitive — pandas handles data processing, openpyxl handles Excel presentation. For freelancing or any client-facing work, being able to chain them together is a skill that immediately elevates your output from “script that produces a spreadsheet” to “deliverable that looks like I spent hours on it.”
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨💻 openpyxl Documentation
- 👨💻 pandas ExcelWriter Documentation
- 👨💻 openpyxl vs pandas: r/Python Discussion
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments