Estimate amount of uncompressible data on Linux

Last Updated: Sep 02, 2014 09:15AM CEST

You can run the following command on your Linux system to find out how many JPEG, GIF, movie and compressed archive files (badly compressible data) are on it.
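# find / \( -fstype iso9660 -o -fstype proc -o -fstype sysfs -o -fstype nfs -o -fstype nfs4 \) -prune -o -type f -exec ls -l {} + | grep -E '\.(gif|mov|jpe?g|zip|gz|Z)$' > /var/tmp/find.out
Enter the command on a single line. The -type f test restricts the listing to regular files, and "-exec ls -l {} +" passes many files to each ls call instead of starting one ls process per file. Add further extensions (for example, other movie formats) to the grep -E pattern as needed. Now you can calculate the total size of the files listed in the find output; the size in bytes is the fifth field of each ls -l line: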
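If GNU find is available, the listing can also be produced without running ls at all, by matching the names in find itself and printing the size with -printf (a GNU extension). The following is a minimal sketch under that assumption; the function name find_uncompressible is hypothetical, and the extension list mirrors the grep -E pattern above.

```shell
# Hypothetical helper (assumes GNU find for -printf).
# Prints "<size-in-bytes> <path>" for every regular file below $1 whose name
# ends in one of the listed extensions; mounts of the listed types are skipped.
find_uncompressible() {
    find "$1" \( -fstype iso9660 -o -fstype proc -o -fstype sysfs \
                 -o -fstype nfs -o -fstype nfs4 \) -prune \
        -o -type f \( -name '*.gif' -o -name '*.mov' -o -name '*.jpeg' \
                      -o -name '*.jpg' -o -name '*.zip' -o -name '*.gz' \
                      -o -name '*.Z' \) -printf '%s %p\n'
}
```

Usage: find_uncompressible / > /var/tmp/find.out. Note that in this output the size is the first field, so sum $1 instead of $5 when totalling it with awk.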
# awk '{ s += $5 } END { printf("%d MiB\n", s/1024/1024) }' /var/tmp/find.out
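The count and the total size can also be obtained in a single awk pass over the same file. This is a sketch assuming the default ls -l layout, where the size in bytes is the fifth field; the function name summarize_ls is hypothetical.

```shell
# Hypothetical helper: summarize an "ls -l"-style listing (size in field 5).
# Prints the number of lines (files) and their total apparent size in MiB and GiB.
summarize_ls() {
    awk '{ n++; s += $5 }
         END { printf("%d files, %.1f MiB, %.1f GiB\n", n, s / 1048576, s / 1073741824) }' "$1"
}
```

Usage: summarize_ls /var/tmp/find.out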
Count the matching files:
# wc -l /var/tmp/find.out
548319 /var/tmp/find.out
So this system holds about 550,000 badly compressible (or uncompressible) files, and in this example they take up around 138 GiB of disk space (apparent size as reported by ls -l).
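To see which file type contributes most, the same listing can be broken down by extension. The sketch below assumes the ls -l layout used above and treats the text after the last dot of the last field as the extension, so names containing spaces are still handled; the function name by_extension is hypothetical.

```shell
# Hypothetical helper: per-extension count and total size (MiB) from an
# "ls -l"-style listing. The extension is whatever follows the last dot in
# the final field; the size is field 5. Output is sorted by extension.
by_extension() {
    awk '{ ext = $NF; sub(/.*\./, "", ext); n[ext]++; s[ext] += $5 }
         END { for (e in n) printf("%s %d %.1f\n", e, n[e], s[e] / 1048576) }' "$1" | sort
}
```

Usage: by_extension /var/tmp/find.out prints one line per extension with the file count and the size in MiB.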