- Oh no, my disk is full.
- Old idea: Reduce multiple copies of the same files.
rdfind -outputname /dev/stdout ~
- Maybe this also works for
- Yes, but work on .deb files instead.
- Just report issues and let maintainers fix them. ⇒ QA
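The "old idea" of finding multiple copies of the same file can be sketched in a few lines of Python. This is only an illustration of the general technique (group by size, then by content hash) and is not a description of how rdfind works internally. The function name `find_duplicates` is hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    # First pass: group regular files by size, which is cheap to obtain.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    # Second pass: hash only files that share their size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            with open(path, "rb") as f:
                digest = hashlib.sha512(f.read()).hexdigest()
            by_hash[digest].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicates(os.path.expanduser("~"))` would scan the same tree as the rdfind command above. The sketch reads each candidate file fully into memory, which is fine for illustration but not for very large files.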
Architecture: package import
- Currently processing sid main amd64.
- Save metadata such as version and dependencies.
- For each regular file, store filename and size.
- Compute hashes of files and store them.
gzip_sha512: Decompress. Then hash. Failure ⇒ no hash produced.
png_sha512: Convert PNG to 8bit RGBA, then hash. Ignore non-PNGs.
png_sha512. Consider first frame only.
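A minimal sketch of the gzip_sha512 idea, using only the Python standard library. It assumes each hash function receives the file content as bytes and returns a hex digest, or None when no hash is produced; that interface and the plain `sha512` helper are assumptions made for illustration, not the project's actual code. A png_sha512 counterpart would need an image library and is left out here.

```python
import gzip
import hashlib
import zlib


def sha512(data):
    """Plain SHA-512 hex digest of the given bytes (assumed helper)."""
    return hashlib.sha512(data).hexdigest()


def gzip_sha512(data):
    """Decompress, then hash. Return None if decompression fails."""
    try:
        plain = gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # Not gzip, truncated, or corrupt: no hash is produced.
        return None
    return sha512(plain)
```

With this normalisation, two .gz files that hold the same payload but were compressed with different levels or timestamps yield the same hash value.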
- For each combination of packages and hash functions, compute the "sharing".
- All but one copy of a file in a single package are considered redundant.
- All copies also present in other packages are considered redundant.
- Differently compressed PNG files are considered equal.
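The two redundancy rules above can be sketched as follows. The example assumes a single hash function and an in-memory mapping from package name to a list of (filename, size, digest) tuples; both the data layout and the function name `sharing` are hypothetical and chosen only for illustration.

```python
def sharing(packages, a, b):
    """Return (file_count, byte_count) of files in package a that are
    redundant with respect to package b.

    packages maps a package name to a list of (filename, size, digest).
    """
    count = 0
    total = 0
    if a == b:
        # Within one package, all but one copy of a file are redundant.
        seen = set()
        for _name, size, digest in packages[a]:
            if digest in seen:
                count += 1
                total += size
            else:
                seen.add(digest)
    else:
        # Across packages, every copy also present in b is redundant.
        digests_b = {digest for _name, _size, digest in packages[b]}
        for _name, size, digest in packages[a]:
            if digest in digests_b:
                count += 1
                total += size
    return count, total
```

For instance, if package `foo` ships two files with the same digest and package `bar` ships one file with that digest, then `sharing(packages, "foo", "foo")` counts one file and `sharing(packages, "foo", "bar")` counts two.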
GPL-3 can be shared as
- Issues with individual files. Example: broken .gz or PNGs not named
- 2 GB SQLite database file (800 MB indices), 400 MB
- 40k packages, 4m files, 5m hash values
- full import takes about 2 CPU days