Generate random large files using Python
1. The environment
- Python 2.7.10
2. The targets
There are four targets in this post:
- Generate a big binary file filled with random hex codes
- Generate a big text file filled with random letters
- Generate a big empty/sparse file
- Generate a big text file filled with lines of random strings (sentences)
2.1 Generate a big binary file filled with random hex codes
```python
import os


def generate_big_random_bin_file(filename, size):
    """
    Generate a big binary file with the specified size in bytes
    :param filename: the filename
    :param size: the size in bytes
    :return: None
    """
    with open(filename, 'wb') as fout:
        fout.write(os.urandom(size))  # 1
    print('big random binary file with size %d generated ok' % size)
```
Line #1 uses the os.urandom function to generate the random bytes. Here is the documentation for this function:
os.urandom(n)
Return a string of n random bytes suitable for cryptographic use.
This function returns random bytes from an OS-specific randomness source. The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On a UNIX-like system, this will query /dev/urandom, and on Windows, it will use CryptGenRandom(). If a randomness source is not found, NotImplementedError will be raised.
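The "hex codes" in this section's title are simply the random bytes viewed in hexadecimal. A short sketch that makes this visible (binascii.hexlify is a standard-library function; the 8-byte size is an arbitrary choice for illustration):

```python
import binascii
import os

# Ask the OS randomness source for 8 bytes.
raw = os.urandom(8)

# hexlify turns every byte into two hex digits, which is how the
# bytes appear when the generated file is viewed in a hex editor.
hex_text = binascii.hexlify(raw)

print(len(raw))       # 8
print(len(hex_text))  # 16
print(hex_text)       # different on every run
```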
Run it:
```python
if __name__ == '__main__':
    generate_big_random_bin_file("temp_big_bin.dat", 1024 * 1024)
```
And we get this result:
-rw-r--r-- 1 zzz staff 1.0M Apr 25 22:04 temp_big_bin.dat
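os.urandom(size) builds the whole buffer in memory before it is written, which is fine for 1 MB but not for multi-gigabyte files. A minimal sketch that writes the same kind of data in chunks; generate_big_random_bin_file_chunked and its chunk_size parameter are hypothetical names, not part of the original code:

```python
import os


def generate_big_random_bin_file_chunked(filename, size, chunk_size=1024 * 1024):
    """Write `size` random bytes to `filename`, at most `chunk_size` at a time.

    Hypothetical helper: only one chunk is held in memory at once.
    """
    remaining = size
    with open(filename, 'wb') as fout:
        while remaining > 0:
            n = min(chunk_size, remaining)  # the last chunk may be shorter
            fout.write(os.urandom(n))
            remaining -= n
```

For example, generate_big_random_bin_file_chunked("temp_big_bin.dat", 1024 ** 3) should produce a 1 GiB file while keeping memory use around one chunk.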
2.2 Generate a big text file filled with random letters
```python
import random
import string


def generate_big_random_letters(filename, size):
    """
    Write `size` random ASCII letters to a file
    :param filename: the filename
    :param size: the size in bytes
    :return: None
    """
    # string.ascii_letters is a-z plus A-Z (string.letters is locale-dependent
    # in Python 2 and does not exist in Python 3)
    chars = ''.join([random.choice(string.ascii_letters) for _ in range(size)])
    with open(filename, 'w') as f:
        f.write(chars)
```
The key point is the random.choice function. Here is an introduction to this function:
random.choice(seq)
Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.
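random.choice draws from Python's shared pseudo-random generator, so every run produces different content. When a test needs the same "random" content on every run, one option (an assumption about your use case, not something the original code does) is a private random.Random instance with a fixed seed; random_letters below is a hypothetical helper:

```python
import random
import string


def random_letters(size, seed=None):
    """Return a string of `size` ASCII letters picked with Random.choice.

    The same `seed` yields the same string; seed=None seeds from an OS
    randomness source or the clock.
    """
    rng = random.Random(seed)  # private generator, leaves the module-level one alone
    return ''.join(rng.choice(string.ascii_letters) for _ in range(size))


print(random_letters(10, seed=42) == random_letters(10, seed=42))  # True
```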
Run it:
```python
if __name__ == '__main__':
    generate_big_random_letters("temp_big_letters.txt", 1024 * 1024)
```
And we get this result:
-rw-r--r-- 1 zzz staff 1.0M Apr 25 22:15 temp_big_letters.txt
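generate_big_random_letters builds a temporary list of size one-character strings and then the complete string before writing anything. For much larger files, a chunked variant keeps memory bounded. This is a sketch; the function name and chunk_size are illustrative:

```python
import random
import string


def generate_big_random_letters_chunked(filename, size, chunk_size=64 * 1024):
    """Write `size` random ASCII letters to `filename`, one chunk at a time.

    ASCII letters take one byte each, so the file ends up `size` bytes long.
    """
    letters = string.ascii_letters
    remaining = size
    with open(filename, 'w') as f:
        while remaining > 0:
            n = min(chunk_size, remaining)  # the last chunk may be shorter
            f.write(''.join(random.choice(letters) for _ in range(n)))
            remaining -= n
```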
2.3 Generate a big empty/sparse file
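```python
def generate_big_sparse_file(filename, size):
    with open(filename, "wb") as f:
        f.seek(size - 1)  # jump to the last byte position without writing anything before it
        f.write(b"\x01")  # writing one byte makes the file exactly `size` bytes long
```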
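The key point is the f.seek call. It moves the file position to offset size - 1 without writing anything in between, and f.write then puts one byte there, so the file ends up exactly size bytes long. The skipped range reads back as zero bytes; on filesystems that support sparse files it usually takes up no disk space.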
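An alternative that writes no data at all is to open an empty file and call truncate() with the target size. Python's documentation describes the result of extending a file with truncate() as platform-dependent (on most systems the new bytes are zero-filled), so treat this as a sketch that assumes a POSIX-like platform. Unlike generate_big_sparse_file, the last byte here is 0 rather than 1. The function name is hypothetical:

```python
def generate_big_sparse_file_truncate(filename, size):
    """Create `filename` with an apparent size of `size` bytes, writing no data.

    Hypothetical alternative to generate_big_sparse_file; relies on the
    platform zero-filling the area added by truncate().
    """
    with open(filename, 'wb') as f:
        f.truncate(size)  # extend the empty file to `size` bytes
```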
Run it:
```python
if __name__ == '__main__':
    generate_big_sparse_file("temp_big_sparse.dat", 100)
```
And we get this file content:
```
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0001
```
The dump groups the bytes in pairs, so the final group 0001 is a zero byte followed by the 0x01 byte written by f.write. All the other 98 bytes are zero.
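The dump shows the content but not whether the file is actually sparse. On Unix-like systems, os.stat reports both the apparent size (st_size) and the allocated space (st_blocks, counted in 512-byte blocks). A sketch of a hypothetical describe_file helper that reads both and also returns the last byte; whether any space is saved depends on the filesystem, and a 100-byte file is too small to show a difference:

```python
import os


def describe_file(filename):
    """Return (apparent size, allocated bytes, last byte) for a non-empty file.

    st_blocks exists only on Unix-like systems and counts 512-byte blocks.
    """
    st = os.stat(filename)
    allocated = st.st_blocks * 512
    with open(filename, 'rb') as f:
        f.seek(-1, os.SEEK_END)  # move to the final byte
        last = f.read(1)
    return st.st_size, allocated, last
```

For example, after generate_big_sparse_file("temp_big_sparse.dat", 1024 ** 3), describe_file should report an apparent size of 1073741824 bytes and, on a filesystem with sparse-file support such as ext4 or APFS, typically an allocated size of only a few kilobytes.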
2.4 Generate a big text file filled with lines of random strings (sentences)
```python
import random


def generate_big_random_sentences(filename, linecount):
    nouns = ("puppy", "car", "rabbit", "girl", "monkey")
    verbs = ("runs", "hits", "jumps", "drives", "barfs")
    adv = ("crazily.", "dutifully.", "foolishly.", "merrily.", "occasionally.")
    adj = ("adorable", "clueless", "dirty", "odd", "stupid")

    # word order of every sentence: noun, verb, adjective, adverb
    # (named word_lists to avoid shadowing the built-in all())
    word_lists = [nouns, verbs, adj, adv]

    with open(filename, 'w') as f:
        for _ in range(linecount):
            f.write(' '.join(random.choice(words) for words in word_lists) + '\n')
```
The key points are:
- Set up four tuples holding the parts of a sentence: nouns, verbs, adjectives, and adverbs.
- Put them in a list, in the order the words should appear in each sentence.
- Use random.choice to pick one word from each tuple and join the words into a sentence.
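generate_big_random_sentences controls the line count, not the file size. If a file of at least a given number of bytes is needed, a sketch like the following keeps writing sentences until the target is reached; the function name and min_size parameter are hypothetical, and the word lists are copied from the original:

```python
import random

NOUNS = ("puppy", "car", "rabbit", "girl", "monkey")
VERBS = ("runs", "hits", "jumps", "drives", "barfs")
ADJECTIVES = ("adorable", "clueless", "dirty", "odd", "stupid")
ADVERBS = ("crazily.", "dutifully.", "foolishly.", "merrily.", "occasionally.")


def generate_random_sentences_by_size(filename, min_size):
    """Write random sentences to `filename` until it holds at least `min_size` bytes.

    Returns the number of bytes written. Binary mode keeps newlines from being
    translated on Windows, so the count matches the size on disk.
    """
    word_lists = (NOUNS, VERBS, ADJECTIVES, ADVERBS)  # same order as the original
    written = 0
    with open(filename, 'wb') as f:
        while written < min_size:
            line = ' '.join(random.choice(words) for words in word_lists) + '\n'
            data = line.encode('ascii')  # every word is plain ASCII
            f.write(data)
            written += len(data)
    return written
```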
Run it (generate 1000 lines of sentences):
```python
if __name__ == '__main__':
    generate_big_random_sentences("temp_big_sentences.txt", 1000)
```
And we get this file content:
```
==> temp_big_sentences.txt <==
car hits stupid dutifully.
puppy runs dirty occasionally.
rabbit barfs adorable occasionally.
puppy barfs adorable occasionally.
girl jumps odd foolishly.
monkey runs adorable crazily.
girl drives stupid foolishly.
puppy drives clueless dutifully.
car hits clueless crazily.
girl barfs adorable crazily.
...
```
Summary
This post demonstrated how to generate large files in Python: binary files filled with random bytes, text files filled with random letters, sparse files, and text files filled with random sentences. The key functions used are os.urandom, random.choice, and f.seek. These techniques are useful for producing test data and in any other situation where a large file of a given size is needed.