How to Persist Python Data Without a Database
Problem
I needed to persist data between Python script runs. Not complex data - just a simple cache of user preferences and the last run timestamp for a CLI tool I was building.
My first instinct was to reach for SQLite. After all, it’s built into Python and handles persistence well. But setting up a database for a simple key-value cache felt like overkill. I’d need to design a schema, write SQL queries, manage connections… all for storing a handful of values.
Then I tried JSON files. Simple enough - just dump the data to a file and read it back. But I quickly ran into issues:
import json
# Problem 1: Must read/write entire file every timewith open('cache.json', 'r') as f: data = json.load(f)
data['user'] = 'Alice'data['score'] = 42
# Problem 2: No random access - rewrite everythingwith open('cache.json', 'w') as f: json.dump(data, f)
# Problem 3: Manual serialization for complex objectsThis approach worked, but it was tedious. Every time I needed to update a single value, I had to read the entire file, modify it, and write it all back. For a simple script, this felt unnecessarily complex.
I also considered pickle, but it had the same issues as JSON - read everything, modify, write everything back. Plus, pickle files aren’t human-readable, which made debugging harder.
Solution
Then I discovered the shelve module - a hidden gem in Python’s standard library that provides persistent, dictionary-like storage backed by a file.
import shelve
# Open, write, close - that's itwith shelve.open('my_cache') as db: db['user_data'] = {'name': 'Alice', 'score': 42} db['last_run'] = '2024-01-15'
# Next script run - data is still therewith shelve.open('my_cache') as db: print(db['user_data']) # {'name': 'Alice', 'score': 42}That’s the entire API. Open it like a dictionary, write to it, close it, and your data persists. No schema design, no SQL, no connection strings.
How It Works
Shelve uses Python’s dbm module to create a simple key-value store, with pickle handling serialization automatically. Each key is a string, and values can be any picklable Python object.
import shelve
# Create or open a shelfwith shelve.open('app_data') as db: # Store various data types db['config'] = {'debug': True, 'timeout': 30} db['users'] = ['alice', 'bob', 'charlie'] db['counter'] = 42
# Read back in another sessionwith shelve.open('app_data') as db: print(db['config']) # {'debug': True, 'timeout': 30} print(db['users']) # ['alice', 'bob', 'charlie'] print(db['counter']) # 42The with statement ensures the shelf is properly closed, which is critical for data integrity.
The Mutable Object Trap
Here’s where I made a mistake. I tried to update a nested value in a stored dictionary:
import shelve
with shelve.open('app_data') as db: db['config'] = {'debug': False, 'timeout': 30}
# Later, I tried to update a nested valuewith shelve.open('app_data') as db: config = db['config'] config['debug'] = True # This doesn't persist! # db['config'] is still {'debug': False, 'timeout': 30}The issue is that db['config'] returns a copy, not a reference. Modifying it doesn’t update the shelf.
There are two solutions:
Solution 1: Write back the modified object
import shelve
with shelve.open('app_data') as db: config = db['config'] config['debug'] = True db['config'] = config # Explicitly write backSolution 2: Use writeback mode
import shelve
# Enable writeback modewith shelve.open('app_data', writeback=True) as db: config = db['config'] config['debug'] = True # Automatically persisted on closeWriteback mode is convenient but has a memory cost - all accessed objects are cached in memory until the shelf is closed. For most scripts, this isn’t an issue, but be aware of it for large datasets.
When to Use Shelve vs Alternatives
I’ve learned that shelve fills a specific niche. Here’s when I reach for each tool:
| Use Case | Recommended Tool ||---------------------------------------|------------------|| Simple key-value cache for scripts | Shelve || Configuration storage for CLI tools | Shelve || Prototyping before database design | Shelve || Need SQL queries | SQLite || Multiple processes accessing same data| SQLite || Human-readable data files | JSON || Cross-language compatibility | JSON |Shelve is perfect for single-process scripts that need persistence without the overhead of a database. But it has limitations:
- Not thread-safe: Don’t use it in multi-threaded applications without locking
- Not process-safe: Multiple processes shouldn’t access the same shelf file
- Platform-dependent: The underlying dbm implementation varies by platform
- Not human-readable: Files are binary, so you can’t inspect them with a text editor
Real-World Example
Here’s how I use shelve in a CLI tool that tracks API rate limits:
import shelveimport timefrom datetime import datetime, timedelta
class RateLimiter: def __init__(self, shelf_path='rate_limits'): self.shelf_path = shelf_path
def can_make_request(self, api_name, max_per_hour=100): """Check if we can make a request without exceeding rate limit.""" with shelve.open(self.shelf_path) as db: key = f'{api_name}_requests' requests = db.get(key, [])
# Filter to requests in the last hour cutoff = datetime.now() - timedelta(hours=1) requests = [r for r in requests if r > cutoff]
if len(requests) >= max_per_hour: return False
# Record this request requests.append(datetime.now()) db[key] = requests return True
def get_remaining(self, api_name, max_per_hour=100): """Get remaining requests for this hour.""" with shelve.open(self.shelf_path) as db: key = f'{api_name}_requests' requests = db.get(key, [])
cutoff = datetime.now() - timedelta(hours=1) requests = [r for r in requests if r > cutoff]
return max(0, max_per_hour - len(requests))
# Usagelimiter = RateLimiter()if limiter.can_make_request('github_api'): # Make the API call passelse: print(f"Rate limit reached. {limiter.get_remaining('github_api')} remaining")This would have been more complex with JSON (reading/writing the entire file each time) or SQLite (schema setup, SQL queries). With shelve, it’s just a few lines.
Common Mistakes to Avoid
-
Forgetting to close the shelf: Always use
with shelve.open()to ensure proper cleanup. -
Not handling mutable object updates: Remember that
db['key']returns a copy unless you use writeback mode. -
Using shelve for concurrent access: Shelve is not thread-safe or process-safe. Use SQLite for concurrent access.
-
Storing unpicklable objects: Some objects can’t be pickled (e.g., file handles, database connections). Stick to basic data types.
-
Assuming cross-platform compatibility: The underlying dbm implementation varies, so shelf files may not be portable between operating systems.
Final Words + More Resources
My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me
Here are also the most important links from this article along with some further resources that will help you in this scope:
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
Comments