How to Start Freelancing with Python: openpyxl, pandas, and numpy for Data Cleaning
Purpose
I got a message from a college student on r/Python the other day. He knows pandas basics and wants to freelance with openpyxl, pandas, and numpy. His question: where do I even start?
Short answer: data cleaning. It’s the highest-demand, lowest-barrier Python freelancing niche I know. Every business on the planet has spreadsheet problems, and almost nobody outside software knows how to fix them programmatically.
The market is real
Think about who deals with messy data every day:
- Real estate agents tracking leads in Excel
- Accountants consolidating monthly reports
- Marketing agencies merging CSV exports from six platforms
- Small business owners who copy-paste data by hand
They all have the same problem. They live in Excel every day but don’t know how to clean, merge, or automate their data. They will pay you to make it go away.
Your stack is already enough
If you know pandas + openpyxl + numpy, you have the complete pipeline:
- pandas: read, filter, merge, deduplicate, transform
- openpyxl: write formatted Excel files with styling, number formats, and charts
- numpy: vectorized calculations, array operations
- Bonus: pathlib for batch file handling, re for text cleanup
These tools aren’t competing. They form a pipeline: pandas cleans the data, openpyxl produces the polished output, numpy handles the math in between.
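To make that division of labor concrete, here is a minimal sketch of the three libraries handing work to each other. The column names (revenue, cost, margin_pct) and the build_report helper are made up for illustration; a real client file will look different.

```python
import numpy as np
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font


def build_report(df: pd.DataFrame) -> Workbook:
    """pandas holds the table, numpy does the math, openpyxl builds the workbook."""
    df = df.copy()

    # numpy step: vectorized margin calculation with a guard for zero revenue.
    revenue = df["revenue"].to_numpy(dtype=float)
    cost = df["cost"].to_numpy(dtype=float)
    safe_revenue = np.where(revenue > 0, revenue, 1.0)
    margin = np.where(revenue > 0, (revenue - cost) / safe_revenue, np.nan)
    df["margin_pct"] = np.round(margin * 100, 1)

    # openpyxl step: write a header row plus data rows, then bold the header.
    wb = Workbook()
    ws = wb.active
    ws.title = "Report"
    ws.append(list(df.columns))
    for row in df.itertuples(index=False):
        # Write missing values as empty cells rather than NaN.
        ws.append([None if pd.isna(value) else value for value in row])
    for cell in ws[1]:
        cell.font = Font(bold=True)
    return wb
```

Calling build_report(df).save("report.xlsx") would write the file. Returning the workbook instead of saving inside the function keeps the helper easy to test and reuse.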
What people actually pay for
Here are the real gigs I’ve seen on Upwork and Fiverr:
- “Clean and standardize this CSV from our CRM” — $50-150
- “Merge these 12 monthly reports into one summary” — $80-200
- “Remove duplicates from our customer list” — $40-100
- “Convert these PDF tables to a clean Excel file” — $100-250
- “Build an automated weekly report generator” — $200-500 retainer
None of these require machine learning or web scraping. Just solid data handling and a script that works.
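As a rough idea of how small these jobs can be, the "merge 12 monthly reports" gig might look something like the sketch below. The folder layout, the function names, and the region and amount columns are assumptions for illustration, not a client's real format.

```python
from pathlib import Path

import pandas as pd


def merge_monthly_reports(folder: str, pattern: str = "*.csv") -> pd.DataFrame:
    """Stack every CSV in a folder into one DataFrame, tagging each row with its file."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        frame = pd.read_csv(path)
        # Normalize headers so "Region" and " region " end up in the same column.
        frame.columns = [c.strip().lower().replace(" ", "_") for c in frame.columns]
        frame["source_file"] = path.name
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder}")
    return pd.concat(frames, ignore_index=True)


def summarize(merged: pd.DataFrame, group_col: str = "region",
              value_col: str = "amount") -> pd.DataFrame:
    """Total the value column per group (hypothetical column names)."""
    return merged.groupby(group_col, as_index=False)[value_col].sum()
```

Keeping the source_file column means you can always trace a suspicious row back to the report it came from, which clients tend to ask about.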
A concrete example
Here’s the kind of script you’d write week one. A customer sends a messy CSV export — inconsistent phone formats, mixed state abbreviations, duplicate rows.

    import re

    import pandas as pd
    from openpyxl import load_workbook
    from openpyxl.styles import Alignment, Font


    def clean_phone(phone):
        """Format 10-digit numbers as (XXX) XXX-XXXX; return anything else unchanged."""
        digits = re.sub(r"\D", "", str(phone))
        if len(digits) == 10:
            return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
        return phone


    def process_customer_export(input_path, output_path):
        # Read everything as text so phone numbers are not parsed as floats.
        df = pd.read_csv(input_path, dtype=str)
        print(f"Loaded {len(df)} records")

        # Normalize headers: " Phone Number" -> "phone_number"
        df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

        if "phone" in df.columns:
            df["phone"] = df["phone"].apply(clean_phone)

        # Map full state names to abbreviations; unmapped values stay as they were.
        state_map = {"california": "CA", "new york": "NY", "texas": "TX"}
        if "state" in df.columns:
            df["state"] = df["state"].str.strip().str.lower().map(state_map).fillna(df["state"])

        # Deduplicate on email when available, otherwise on phone.
        before = len(df)
        dedup_col = "email" if "email" in df.columns else "phone"
        df = df.drop_duplicates(subset=[dedup_col])
        print(f"Removed {before - len(df)} duplicates")

        with pd.ExcelWriter(output_path, engine="openpyxl") as writer:
            df.to_excel(writer, sheet_name="Clean Data", index=False)

        # Reopen the workbook to style the header row.
        wb = load_workbook(output_path)
        ws = wb["Clean Data"]
        for cell in ws[1]:
            cell.font = Font(bold=True)
            cell.alignment = Alignment(horizontal="center")
        wb.save(output_path)

        print(f"Finished -> {output_path}")

That’s it. Under 50 lines, a real deliverable. Your client gets a formatted Excel file instead of a messy CSV. You get paid.
Getting started
Step 1: Build a portfolio (one weekend)
Create 3 GitHub repos with fake but realistic data. Clean a customer list. Merge sales reports. Automate an invoice summary. Make the README professional — describe what the script does, show a before/after screenshot.
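If you need fake but realistic data for those repos, one option is to generate it. The sketch below is only an illustration: the names, the column headers, and the make_messy_customers helper are invented, and the "mess" it produces (mixed phone formats, inconsistent state spellings, shouting emails, repeated rows) is just my guess at a typical export.

```python
import numpy as np
import pandas as pd


def _format_phone(number: int, style: int) -> str:
    """Render a 10-digit number in one of three inconsistent styles."""
    s = str(number)
    if style == 0:
        return s
    if style == 1:
        return f"{s[:3]}-{s[3:6]}-{s[6:]}"
    return f"({s[:3]}) {s[3:6]} {s[6:]}"


def make_messy_customers(n: int = 50, seed: int = 0) -> pd.DataFrame:
    """Generate fake customer rows, then append roughly 10% exact duplicates."""
    rng = np.random.default_rng(seed)  # seeded so the demo data is reproducible
    first = rng.choice(["Ana", "Ben", "Cara", "Dev", "Elle"], size=n)
    last = rng.choice(["Lee", "Ortiz", "Khan", "Smith"], size=n)
    numbers = rng.integers(2_000_000_000, 10_000_000_000, size=n)
    styles = rng.integers(0, 3, size=n)
    shout = rng.random(n) < 0.3  # some emails arrive in upper case

    emails = [f"{f}.{l}@example.com".lower() for f, l in zip(first, last)]
    df = pd.DataFrame({
        "Full Name": [f"{f} {l}" for f, l in zip(first, last)],
        " Phone": [_format_phone(num, st) for num, st in zip(numbers, styles)],
        "State": rng.choice(
            ["CA", "California", "california ", "TX", "Texas", "NY", "New York"], size=n
        ),
        "Email": [e.upper() if s else e for e, s in zip(emails, shout)],
    })
    duplicates = df.sample(n=max(1, n // 10), random_state=seed)
    return pd.concat([df, duplicates], ignore_index=True)
```

Saving the result with make_messy_customers().to_csv("customers_raw.csv", index=False) gives you a "before" file for the README, and your cleaning script produces the "after".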
Step 2: Set up shop
Create profiles on Upwork and Fiverr. Your title: “Python developer — Excel automation, data cleaning, reporting.” Write a short description that lists the exact problems you solve. Use plain language, not tech jargon.
Step 3: Find your first gigs
Search for “Excel automation Python” or “data cleaning” on Upwork. Bid low initially — $20-30/hour. Your goal is testimonials and a work history, not maximizing your first paycheck. After 3-5 gigs, raise your rates.
Pricing guide
| Service | Price range |
|---|---|
| One-time data cleanup script | $50-200 |
| Weekly/monthly report automation | $200-500/month retainer |
| Hourly consulting | $20-50/hr |
Start at the low end, deliver fast, ask for reviews, then increase.
What I’d do differently
If I were starting this today, I’d focus on one industry first. Real estate agents are a sweet spot — they have tons of CSV exports from MLS systems, limited tech skills, and budget for tools that save them time. Pick a niche, learn their data format, and become the specialist.
Also, save every script you write. After a few gigs you’ll have a library of reusable patterns. That same phone number cleaner, deduplication routine, or file merger — you’ll use them again and again.
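Here is one example of the kind of helper that could end up in that library: a deduplication routine that treats "A@x.com" and " a@x.com " as the same customer. It is a sketch under my own assumptions (the function names and the email key column are hypothetical), not a one-size-fits-all solution.

```python
import pandas as pd


def normalize_email(series: pd.Series) -> pd.Series:
    """Trim whitespace and lower-case emails so near-identical values compare equal."""
    return series.astype("string").str.strip().str.lower()


def drop_duplicate_customers(df: pd.DataFrame, key: str = "email") -> pd.DataFrame:
    """Drop rows whose normalized key repeats an earlier row.

    Rows with a missing or blank key are kept, since there is no evidence
    that they refer to the same customer.
    """
    normalized = normalize_email(df[key])
    has_key = normalized.notna() & (normalized != "")
    is_repeat = normalized.duplicated(keep="first") & has_key
    return df.loc[~is_repeat].reset_index(drop=True)
```

The original values in the key column are left untouched; only the comparison uses the normalized form. Whether to also overwrite the column with the cleaned value is something to agree on with the client.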
Summary
In this post, I laid out a practical path from “knows some pandas” to “paid freelance data cleaner.” The key point is that openpyxl, pandas, and numpy form a complete pipeline for exactly the kind of spreadsheet work businesses will pay for. Start small, ship scripts, collect testimonials, and raise rates.
Final Words + More Resources
My intention with this article was to help others by sharing my knowledge and experience. If you want to contact me, you can reach me by email.