Hyper-Zipping With pigz

Pigz is a blazingly fast compression utility: it scales in parallel across all the threads a machine has available. It is not, however, designed to be easy to use. Installing it is the easy part:
sudo apt install pigz -y
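A quick sanity check that it installed and is on the PATH:
pigz --version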
Using it can really trip you up. If you reach for it the way you would use zip with -r, you would expect it to behave the same way, and it seemingly does:
pigz --fast -r coding_new.zip coding_new
Naturally this looks like it would quickly create 'coding_new.zip', a new archive built recursively from the coding_new directory.
- The command runs without complaint, which tricks you into thinking it is building an archive the way 'zip -r' does. What it is actually doing is recursively crawling the coding_new directory AND compressing coding_new.zip itself, turning every file it finds into its own .gz file and leaving those .gz files scattered through the directories it crawled (see the short demonstration after this list).
- This can be a disaster, as it mangles the contents of the directory tree. The following script will clean it back up:
#!/bin/bash
# Recursively find every stray .gz file and decompress it back in place
# (null-delimited so filenames with spaces survive).
find . -type f -name "*.gz" -print0 | while IFS= read -r -d '' file; do
    printf 'Unzipping %s\n' "$file"
    gunzip "$file"
done
- The above will reverse an inadvertent pigz -r, even one that has been run multiple times. Save it as unpigz.sh and make it executable with:
chmod +x unpigz.sh
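To see the gotcha concretely, here is a minimal demonstration on a throwaway directory (demo, a.txt, and b.txt are made-up names purely for illustration):
mkdir -p demo/sub
echo "hello" > demo/a.txt
echo "world" > demo/sub/b.txt
pigz -r demo
find demo -type f
# demo/a.txt.gz
# demo/sub/b.txt.gz   <- every file compressed in place; no single archive anywhere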
Here is the correct way to get a single, clean .gz file built recursively: pipe tar into pigz. The script below built a single 3 GB .gz from 73,000 files in about 12 seconds on a 24-core Ryzen 9.
#!/bin/bash
set -o pipefail  # make the tar | pigz pipeline report failure from either side

# Check if pigz is installed
if ! command -v pigz &> /dev/null; then
    echo "Error: pigz is not installed. Please install it."
    exit 1
fi

# Check if tar is installed
if ! command -v tar &> /dev/null; then
    echo "Error: tar is not installed. Please install it."
    exit 1
fi

# Check if exactly two parameters are provided
if [ $# -ne 2 ]; then
    echo "Usage: $0 <directory_to_compress> <output_tar_gz>"
    exit 1
fi

INPUT_DIR="$1"
OUTPUT_FILE="$2"

# Check if the first parameter is a directory
if [ ! -d "$INPUT_DIR" ]; then
    echo "Error: '$INPUT_DIR' is not a directory."
    exit 1
fi

# Check if the second parameter ends with .gz
if [[ ! "$OUTPUT_FILE" =~ \.gz$ ]]; then
    echo "Error: Output file '$OUTPUT_FILE' must end with .gz"
    exit 1
fi

# Get number of CPU cores for pigz
THREADS=$(nproc)

# Compress the directory using tar and pigz
echo "Compressing '$INPUT_DIR' to '$OUTPUT_FILE' with $THREADS threads..."
tar -c -C "$INPUT_DIR" . | pigz -p "$THREADS" > "$OUTPUT_FILE"

# Check if compression was successful
if [ $? -eq 0 ]; then
    echo "Compression completed successfully. Output: $OUTPUT_FILE"
else
    echo "Error: Compression failed."
    exit 1
fi
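Worth noting: if your tar is GNU tar, the pipe can be folded into tar itself with --use-compress-program. A one-liner sketch of the same compression step, using the same coding_new and coding.gz names as below:
tar --use-compress-program="pigz -p $(nproc)" -cf coding.gz -C coding_new .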
Let's time it. This time the script behaves very much like a zip -r, except that it lights up all the cores:
time ./pigzip.sh coding_new coding.gz
Compressing 'coding_new' to 'coding.gz' with 24 threads...
Compression completed successfully. Output: coding.gz
real 0m15.006s
To get the exact size of the folder in bytes:
du -sb coding_new
Which gives us 6533415246 bytes: 6533415246 / 15 ≈ 435 MB/s, effectively the saturation read speed of the SSD.
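If you would rather compute the throughput than eyeball it, here is a small sketch (assuming the 15-second wall time above and that bc is installed):
BYTES=$(du -sb coding_new | cut -f1)
echo "scale=1; $BYTES / 15 / 1000000" | bc    # -> ~435.5 MB/s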
That's fast!
Now to unzip it, we can use one final script:
#!/bin/bash
set -o pipefail  # make the pigz | tar pipeline report failure from either side

# Check if pigz is installed
if ! command -v pigz &> /dev/null; then
    echo "Error: pigz is not installed. Please install it."
    exit 1
fi

# Check if tar is installed
if ! command -v tar &> /dev/null; then
    echo "Error: tar is not installed. Please install it."
    exit 1
fi

# Check if exactly one parameter is provided
if [ $# -ne 1 ]; then
    echo "Usage: $0 <input_tar_gz>"
    exit 1
fi

INPUT_FILE="$1"

# Check if the input file exists
if [ ! -f "$INPUT_FILE" ]; then
    echo "Error: '$INPUT_FILE' does not exist."
    exit 1
fi

# Check if the input file ends with .gz
if [[ ! "$INPUT_FILE" =~ \.gz$ ]]; then
    echo "Error: Input file '$INPUT_FILE' must end with .gz"
    exit 1
fi

# Get number of CPU cores for pigz
THREADS=$(nproc)

# Extract the tar.gz file using pigz and tar
echo "Extracting '$INPUT_FILE' with $THREADS threads..."
pigz -dc -p "$THREADS" "$INPUT_FILE" | tar -x

# Check if extraction was successful
if [ $? -eq 0 ]; then
    echo "Extraction completed successfully."
else
    echo "Error: Extraction failed."
    exit 1
fi
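As with compression, GNU tar can drive pigz directly when extracting (GNU tar passes -d to the compress program itself). A one-liner sketch:
tar --use-compress-program=pigz -xf coding.gz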
Timing it follows the same pattern:
time ./unzip.sh coding.gz
On a fresh 8-core, 16-thread laptop it was blisteringly fast to rebuild the full directory tree:
Extracting 'coding.gz' with 16 threads...
Extraction completed successfully.
real 0m21.059s
user 0m19.741s
sys 0m19.262s
Which again is saturating the SSD itself!
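One last tip: pigz can verify an archive's integrity without writing anything to disk, which is handy before a big extraction:
pigz -t coding.gz && echo "archive OK"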