How to Start Freelancing with Python: openpyxl, pandas, and numpy for Data Cleaning

Jun 5, 2026

Purpose

I got a message from a college student on r/Python the other day. He knows pandas basics and wants to freelance with openpyxl, pandas, and numpy. His question: where do I even start?

Short answer: data cleaning. It’s the highest-demand, lowest-barrier Python freelancing niche I know. Every business on the planet has spreadsheet problems, and almost nobody outside software knows how to fix them programmatically.

The market is real

Think about who deals with messy data every day:

Real estate agents tracking leads in Excel
Accountants consolidating monthly reports
Marketing agencies merging CSV exports from six platforms
Small business owners who copy-paste data by hand

They all have the same problem: they use Excel every day but don’t know how to clean, merge, or automate the data in it. They will pay you to make that problem go away.

Your stack is already enough

If you know pandas + openpyxl + numpy, you have the complete pipeline:

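pandas: read, filter, merge, deduplicate, transform
openpyxl: write formatted Excel files with styling, number formats, and charts (it can preserve pivot tables that already exist in a workbook, but it cannot create new ones)
numpy: vectorized calculations, array operations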
Bonus: pathlib for batch file handling, re for text cleanup

These tools aren’t competing. They form a pipeline: pandas cleans the data, openpyxl produces the polished output, numpy handles the math in between.
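As a rough illustration of numpy handling the math in between, here is a minimal sketch. The function name `add_margin_columns`, the `revenue` and `cost` columns, and the tier thresholds are all assumptions made up for this example, not taken from a real client file.

```python
import numpy as np
import pandas as pd


def add_margin_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Add 'margin' and 'tier' columns using vectorized numpy operations.

    Column names and thresholds are hypothetical.
    """
    out = df.copy()
    revenue = out["revenue"].to_numpy(dtype=float)
    cost = out["cost"].to_numpy(dtype=float)

    # Divide only where revenue is non-zero; other rows keep a NaN margin
    margin = np.full(revenue.shape, np.nan)
    np.divide(revenue - cost, revenue, out=margin, where=revenue != 0)
    out["margin"] = margin

    # np.select takes the first matching condition per row;
    # NaN margins match nothing and fall through to the default
    out["tier"] = np.select(
        [margin >= 0.5, margin >= 0.2],
        ["high", "medium"],
        default="low",
    )
    return out
```

The result is still a regular DataFrame, so it can go straight to `to_excel` for the formatting step.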

What people actually pay for

Here are the real gigs I’ve seen on Upwork and Fiverr:

“Clean and standardize this CSV from our CRM” — $50-150
“Merge these 12 monthly reports into one summary” — $80-200
“Remove duplicates from our customer list” — $40-100
“Convert these PDF tables to a clean Excel file” — $100-250
“Build an automated weekly report generator” — $200-500 retainer

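None of these require machine learning or web scraping, just solid data handling and a script that works. The PDF job is the one case that needs something beyond the three-library stack: a table-extraction library such as pdfplumber or camelot to get the tables into pandas first.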
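The monthly-report merge is a good example of how small these jobs can be. Below is a minimal sketch, assuming the reports are CSV files with matching columns sitting in one folder. The function name `merge_monthly_reports` and the `source_file` column are my own choices for illustration.

```python
from pathlib import Path

import pandas as pd


def merge_monthly_reports(folder: str, pattern: str = "*.csv") -> pd.DataFrame:
    """Stack every matching CSV in `folder` into one DataFrame.

    Assumes all files share the same columns (a hypothetical layout).
    """
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        frame = pd.read_csv(path)
        # Tag each row with its file so the client can trace it back
        frame["source_file"] = path.name
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"No files matching {pattern!r} in {folder}")
    return pd.concat(frames, ignore_index=True)
```

From there, a summary is usually one `groupby` away, for example grouping on `source_file` and summing the numeric columns.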

A concrete example

Here’s the kind of script you’d write in week one. A customer sends a messy CSV export with inconsistent phone formats, a mix of state names and abbreviations, and duplicate rows.

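import re

import pandas as pd
from openpyxl.styles import Alignment, Font


def clean_phone(phone):
    """Format 10-digit US numbers as (XXX) XXX-XXXX; return anything else unchanged."""
    if pd.isna(phone):
        return phone
    # pandas reads an all-digit column with blanks as floats (5551234567.0)
    if isinstance(phone, float) and phone.is_integer():
        phone = int(phone)
    digits = re.sub(r"\D", "", str(phone))
    # Drop a leading US country code ("1" or "+1")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return phone


def process_customer_export(input_path, output_path):
    df = pd.read_csv(input_path)
    print(f"Loaded {len(df)} records")

    # Normalize headers: "Customer Name " -> "customer_name"
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    if "phone" in df.columns:
        df["phone"] = df["phone"].apply(clean_phone)

    # Map full state names to abbreviations; unmapped values pass through
    state_map = {"california": "CA", "new york": "NY", "texas": "TX"}
    if "state" in df.columns:
        state = df["state"].str.strip()
        df["state"] = state.str.lower().map(state_map).fillna(state)

    # Deduplicate on email when available, otherwise phone. Rows with a
    # missing key are kept, because pandas treats blank keys as equal.
    before = len(df)
    dedup_col = next((c for c in ("email", "phone") if c in df.columns), None)
    if dedup_col is not None:
        is_repeat = df[dedup_col].notna() & df.duplicated(subset=[dedup_col])
        df = df[~is_repeat]
    print(f"Removed {before - len(df)} duplicates")

    # Write the sheet, then bold and center the header row before saving
    with pd.ExcelWriter(output_path, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name="Clean Data", index=False)
        ws = writer.sheets["Clean Data"]
        for cell in ws[1]:
            cell.font = Font(bold=True)
            cell.alignment = Alignment(horizontal="center")

    print(f"Finished -> {output_path}")


if __name__ == "__main__":
    # File names are placeholders; point them at the client's export
    process_customer_export("customers.csv", "customers_clean.xlsx")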

That’s it: a short script and a real deliverable. Your client gets a formatted Excel file instead of a messy CSV. You get paid.

Getting started

Step 1: Build a portfolio (one weekend)

Create 3 GitHub repos with fake but realistic data. Clean a customer list. Merge sales reports. Automate an invoice summary. Make the README professional — describe what the script does, show a before/after screenshot.
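If you need fake but realistic input for those repos, one option is to generate it. The sketch below is only an illustration: the function name `make_messy_customers`, the column names, and the specific kinds of mess are all invented, and the addresses use the reserved `example.com` domain.

```python
import numpy as np
import pandas as pd


def make_messy_customers(n: int = 50, seed: int = 0) -> pd.DataFrame:
    """Build a fake customer list with mixed phone formats, mixed state
    spellings, and a few exact duplicate rows (all values are invented)."""
    rng = np.random.default_rng(seed)
    phone_styles = ["{a}{b}{c}", "{a}-{b}-{c}", "({a}) {b}-{c}", "{a}.{b}.{c}"]
    states = ["CA", "california", "California ", "NY", "new york", "TX", "Texas"]

    rows = []
    for i in range(n):
        area = int(rng.integers(200, 1000))
        prefix = int(rng.integers(200, 1000))
        line = int(rng.integers(0, 10000))
        style = phone_styles[int(rng.integers(len(phone_styles)))]
        rows.append(
            {
                "Customer Name": f"Customer {i}",
                "Email": f"customer{i}@example.com",
                "Phone": style.format(a=area, b=prefix, c=f"{line:04d}"),
                "State": states[int(rng.integers(len(states)))],
            }
        )
    df = pd.DataFrame(rows)

    # Re-append a sample of rows so there are duplicates to remove
    dupes = df.sample(n=max(1, n // 10), random_state=seed)
    return pd.concat([df, dupes], ignore_index=True)
```

Saving the result with `to_csv` gives you the "before" file for a before/after screenshot, and the fixed seed keeps it reproducible.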

Step 2: Set up shop

Create profiles on Upwork and Fiverr. Your title: “Python developer — Excel automation, data cleaning, reporting.” Write a short description that lists the exact problems you solve. Use plain language, not tech jargon.

Step 3: Find your first gigs

Search for “Excel automation Python” or “data cleaning” on Upwork. Bid low initially — $20-30/hour. Your goal is testimonials and a work history, not maximizing your first paycheck. After 3-5 gigs, raise your rates.

Pricing guide

Service	Price range
One-time data cleanup script	$50-200
Weekly/monthly report automation	$200-500/month retainer
Hourly consulting	$20-50/hr

Start at the low end, deliver fast, ask for reviews, then increase.

What I’d do differently

If I were starting this today, I’d focus on one industry first. Real estate agents are a sweet spot — they have tons of CSV exports from MLS systems, limited tech skills, and budget for tools that save them time. Pick a niche, learn their data format, and become the specialist.
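Learning a niche's data format can be captured in code too. Here is a minimal sketch of a header check to run before any cleaning. The column names in `EXPECTED_COLUMNS` are hypothetical: a real MLS export will use its own names, so treat the set as something to fill in per client.

```python
import csv
from typing import Iterable, Set

# Hypothetical column set; replace with the headers a real export uses
EXPECTED_COLUMNS = frozenset({"listing_id", "address", "list_price", "status"})


def missing_columns(csv_path: str, expected: Iterable[str] = EXPECTED_COLUMNS) -> Set[str]:
    """Return the expected column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    # Compare case-insensitively and ignore stray spaces around names
    found = {name.strip().lower() for name in header}
    return set(expected) - found
```

An empty result means the file has the shape the script expects. Anything else is worth raising with the client before running the cleanup.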

Also, save every script you write. After a few gigs you’ll have a library of reusable patterns. That same phone number cleaner, deduplication routine, or file merger — you’ll use them again and again.
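A starting point for such a library might look like the sketch below. Both helper names are my own, and the rules they apply (snake_case headers, single spaces) are one reasonable convention rather than a standard.

```python
import re


def normalize_column_name(name: str) -> str:
    """Turn ' Customer Name ' or 'E-mail Address' into snake_case."""
    # Replace every run of non-alphanumeric characters with one underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip().lower())
    return name.strip("_")


def collapse_whitespace(value: str) -> str:
    """Replace runs of spaces, tabs, and newlines with a single space."""
    return re.sub(r"\s+", " ", value).strip()
```

Keeping helpers like these in one module, with a few assertions next to them, makes them easy to drop into the next gig.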

Summary

In this post, I laid out a practical path from “knows some pandas” to “paid freelance data cleaner.” The key point is that openpyxl, pandas, and numpy form a complete pipeline for exactly the kind of spreadsheet work businesses will pay for. Start small, ship scripts, collect testimonials, and raise rates.

Final Words + More Resources

My intention with this article was to share my knowledge and experience in a way that helps others. If you want to get in touch, you can reach me by email.
