Lesson 21
File compression
Objective
By the end of this lesson, students will understand how to work with compressed files in Python using the gzip and zipfile modules. They will learn how to compress and decompress files to save storage space and improve file transfer efficiency.
1. Introduction to file compression:
File compression reduces the size of files to save disk space and make data transfer faster.
Common compression formats include Gzip and ZIP, each with its own use cases and benefits.
2. The gzip Module:
The gzip module provides a simple interface for reading and writing GNU zip files. It is particularly useful for compressing single files or streams.
import gzip
3. Compressing files with gzip:
You can compress a file using the gzip.open() function in write mode ('wb').
import gzip
import shutil

# Copy the raw bytes of file.txt into a gzip-compressed file.
with open('file.txt', 'rb') as f_in:
    with gzip.open('file.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
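To see the effect of compression, the same steps can be run on a generated file and the sizes compared. This is a minimal sketch: the sample contents and the temporary directory are assumptions for illustration, not part of the lesson's files.

```python
import gzip
import os
import shutil
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "file.txt")
    gz_path = src_path + ".gz"

    # Hypothetical sample data: repetitive text compresses well.
    with open(src_path, "wb") as f:
        f.write(b"hello compression\n" * 1000)

    # Same pattern as in the lesson: stream the source into a gzip file.
    with open(src_path, "rb") as f_in, gzip.open(gz_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

    original_size = os.path.getsize(src_path)
    compressed_size = os.path.getsize(gz_path)

print(f"original: {original_size} bytes, compressed: {compressed_size} bytes")
```

How much smaller the compressed file is depends heavily on the data. Repetitive text shrinks a lot, while already-compressed formats such as JPEG or MP4 barely shrink at all.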
4. Decompressing files with gzip:
To read a compressed file, use gzip.open() in read mode ('rb').
import gzip
import shutil

# Stream the decompressed bytes into a regular file.
with gzip.open('file.txt.gz', 'rb') as f_in:
    with open('file_decompressed.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
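When the compressed file contains text, it can also be read directly without writing a decompressed copy to disk. The sketch below assumes UTF-8 text and uses the text modes 'wt' and 'rt' of gzip.open(); the file name notes.txt.gz is a hypothetical example.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    gz_path = os.path.join(workdir, "notes.txt.gz")  # hypothetical file name

    # 'wt' opens the gzip file for writing text instead of bytes.
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    # 'rt' decompresses on the fly and yields decoded lines.
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]

print(lines)
```

Reading line by line this way keeps memory use low, because only a small part of the file is decompressed at a time.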
5. The zipfile module:
The zipfile module allows you to read and write ZIP files, supporting multiple files and directories.
It provides more flexibility for handling archives containing multiple files.
import zipfile
6. Creating ZIP files:
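You can create a ZIP file by opening a new ZIP archive in write mode ('w') and adding files to it. Note that zipfile stores files uncompressed by default (ZIP_STORED); pass compression=zipfile.ZIP_DEFLATED to actually compress them.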
with zipfile.ZipFile('archive.zip', 'w') as zipf:
    zipf.write('file1.txt')
    zipf.write('file2.txt')
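When the files to be archived live in another directory, the name stored inside the archive can be controlled with the arcname argument of ZipFile.write(). The sketch below is one possible approach; the temporary directory and file contents are assumptions made so the example runs on its own.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Create two hypothetical input files.
    paths = []
    for name in ("file1.txt", "file2.txt"):
        path = os.path.join(workdir, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"contents of {name}\n")
        paths.append(path)

    archive_path = os.path.join(workdir, "archive.zip")
    with zipfile.ZipFile(archive_path, "w") as zipf:
        for path in paths:
            # arcname sets the name inside the archive, so the
            # temporary directory path is not stored with the file.
            zipf.write(path, arcname=os.path.basename(path))

    with zipfile.ZipFile(archive_path) as zipf:
        stored_names = zipf.namelist()

print(stored_names)
```

Without arcname, the archive would record the path that was passed to write(), which is rarely what the recipient of the archive wants.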
7. Extracting ZIP files:
To extract files from a ZIP archive, open it in read mode ('r').
with zipfile.ZipFile('archive.zip', 'r') as zipf:
    zipf.extractall('extracted_files')
8. Reading ZIP file contents:
You can list the contents of a ZIP file without extracting them.
with zipfile.ZipFile('archive.zip', 'r') as zipf:
    print(zipf.namelist())  # List all files in the archive
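Beyond names, each archive member carries metadata such as its original and compressed size, available through ZipFile.infolist(). The sketch below builds a small archive in memory so that it runs without any existing files; the member names and contents are hypothetical.

```python
import io
import zipfile

# Build a small archive in memory (an assumption for this sketch).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zipf:
    zipf.writestr("a.txt", "a" * 5000)
    zipf.writestr("b.txt", "short")

# Inspect each member without extracting anything.
with zipfile.ZipFile(buffer) as zipf:
    sizes = {
        info.filename: (info.file_size, info.compress_size)
        for info in zipf.infolist()
    }

for name, (original, compressed) in sizes.items():
    print(f"{name}: {original} bytes, {compressed} bytes compressed")
```

For very small members the compressed size can be equal to or slightly larger than the original, because the compression format has some fixed overhead.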
9. Compression levels and options:
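Both gzip and zipfile accept a compresslevel argument. For gzip.open(), the level ranges from 0 (no compression) to 9 (slowest, smallest output), and 9 is the default. For zipfile, pass compresslevel when creating the archive (Python 3.7 or later); it only takes effect together with a compressing method such as ZIP_DEFLATED, which accepts levels 0 to 9.
The default compression level is usually sufficient. Higher levels produce smaller files but take longer, while lower levels are faster but compress less.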
with zipfile.ZipFile('archive.zip', 'w', compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zipf:
    zipf.write('file.txt')
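The gzip module takes the same kind of setting. As a rough sketch, the snippet below compresses the same in-memory data with gzip.compress() at a few levels and records the resulting sizes. The sample data is made up for illustration, and the exact numbers will vary between Python and zlib versions.

```python
import gzip

# Hypothetical sample data: varied but structured text.
data = "".join(f"row {i}, value {i * i}\n" for i in range(5000)).encode("utf-8")

sizes = {}
for level in (1, 6, 9):
    compressed = gzip.compress(data, compresslevel=level)
    # Whatever the level, decompression must return the original bytes.
    assert gzip.decompress(compressed) == data
    sizes[level] = len(compressed)
    print(f"level {level}: {len(compressed)} bytes (original {len(data)})")
```

The level changes only how hard the compressor works. Any level produces a valid gzip stream that decompresses to the same data.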
10. Best practices:
- Use compression for large files to save space and improve transfer speed.
- Be mindful of the trade-off between compression speed and file size.
- Always check the integrity of compressed files after extraction.
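One way to act on the integrity advice above is sketched below. It uses ZipFile.testzip(), which checks the stored CRC of every member and returns the name of the first bad file (or None if all are fine), and then compares a SHA-256 checksum of the extracted bytes against the original. The in-memory archive and its contents are assumptions for illustration.

```python
import hashlib
import io
import zipfile

payload = b"important data\n" * 100  # hypothetical file contents

# Build an archive in memory so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zipf:
    zipf.writestr("data.txt", payload)

with zipfile.ZipFile(buffer) as zipf:
    # testzip() returns None when every member passes its CRC check.
    first_bad = zipf.testzip()
    extracted = zipf.read("data.txt")

# Compare checksums of the original and the extracted bytes.
checksums_match = (
    hashlib.sha256(extracted).hexdigest() == hashlib.sha256(payload).hexdigest()
)
print("first bad member:", first_bad)
print("checksums match:", checksums_match)
```

In practice the original checksum would usually be published alongside the archive, since the recipient does not have the original file to compare against.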
11. Exercises:
Exercise 1: Compress a text file
1. Write a program that compresses a text file using the gzip module.
Exercise 2: Decompress a Gzip file
1. Implement a script that decompresses a Gzip file and verifies its contents.
Exercise 3: Create a ZIP archive
1. Create a ZIP archive containing multiple files and directories.
Exercise 4: Extract specific files from a ZIP archive
1. Write a function that extracts only specific files from a ZIP archive.
Exercise 5: Explore compression levels
1. Experiment with different compression levels in the zipfile module and measure the resulting file sizes.
Conclusion
In this lesson, students learned how to work with compressed files in Python using the gzip and zipfile modules. They explored the processes of compressing and decompressing files, creating ZIP archives, and the importance of file compression for efficient storage and transfer. Understanding these concepts equips students with essential tools for managing data in their applications effectively.