Searching


We have struggled with nested include files in VxWorks and the Linux kernel, where the kernel hooks into almost the whole include directory and it is hard to tell which file was actually used. The searches we wrote compared one byte at a time; we wanted to use the 64-bit power of present x86-64 processors to compare up to 8 characters at a time. This is one of those tasks that never gets done until you can no longer tolerate searches that take a couple of minutes to work through large hard drives. When looking for journal articles or phrases, matters get even worse, as a few gigabytes cannot be sifted through quickly. We have looked at a couple of open-source packages and would like to integrate indexing into a MySQL database for some files. We looked at Donald Knuth’s “The Art of Computer Programming” for guidance and have a fair bit of code running and documented.
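To illustrate the idea of comparing 8 characters at a time, here is a minimal sketch in Python. It is not our production code: the function name first_mismatch is hypothetical, and Python's struct module stands in for the single 64-bit register load a compiled language would use. Each step treats 8 bytes as one unsigned 64-bit word, and only when two words differ does it work out which byte inside the word is responsible.

```python
import struct

def first_mismatch(a: bytes, b: bytes) -> int:
    """Return the index of the first differing byte, or -1 if a == b.

    Hypothetical helper for illustration only.
    """
    n = min(len(a), len(b))
    i = 0
    # Compare 8 bytes per step by reading each chunk as one 64-bit word.
    while i + 8 <= n:
        (wa,) = struct.unpack_from("<Q", a, i)
        (wb,) = struct.unpack_from("<Q", b, i)
        if wa != wb:
            # Little-endian layout: the lowest set bit of the XOR lies in
            # the first byte that differs.
            diff = wa ^ wb
            lowest_bit = diff & -diff
            return i + (lowest_bit.bit_length() - 1) // 8
        i += 8
    # Fewer than 8 bytes remain: finish one byte at a time.
    while i < n:
        if a[i] != b[i]:
            return i
        i += 1
    # Equal over the common length; a length difference is also a mismatch.
    return -1 if len(a) == len(b) else n
```

For example, first_mismatch(b"abcXefgh", b"abcdefgh") finds the difference at index 3 with a single word comparison instead of four byte comparisons.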

If you have a pressing need for disk deduplication without the latest and greatest software, we handle up to 100,000 files for trimming “bit-rot” and generate scripts that can be executed to remove files, even those horrible net downloads whose names contain spaces and forward slashes that look like directory separators. Funding will always improve the searching and trimming software. Contact us if you have files on a Unix-like file system. We cannot offer the software on a Microsoft platform with backslashes and drive partitions. We are not sure what some of the Windows files are used for and would hate to render your Windows box more crippled than necessary. You could always copy a directory tree onto a serial drive and we can take a scan of it before ripping out the duplicates.
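File names with spaces are the usual reason a generated removal script goes wrong, so a short sketch of one way to write such a script safely may help. This is an assumption about how it could be done, not a description of our tool: the function name removal_script is hypothetical, quoting is delegated to Python's standard shlex.quote, and the “--” after “rm -f” is an extra guard we add here against names that begin with a dash.

```python
import shlex

def removal_script(paths):
    """Build the text of a shell script that removes each path.

    Hypothetical helper for illustration only. Nothing is deleted here;
    the script is meant to be read before anyone runs it.
    """
    lines = ["#!/bin/sh"]
    for path in paths:
        # shlex.quote wraps names containing spaces or quotes so the
        # shell sees each one as a single argument.
        lines.append("rm -f -- " + shlex.quote(path))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(removal_script(["/tmp/net download (1).pdf", "/tmp/plain.txt"]), end="")
```

Because the function only returns text, the result can be written to a file and examined line by line before it is executed.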

Typical clutter run

The command-line programs are called via a script, but it would be nice to call them from a graphical interface. Each dot marks progress as a hundred or so files are scanned, and each row of dots starts with the number of files scanned so far; the rows are kept short enough to fit the old 80-column terminal screens. The output is a set of scripts that can be examined before execution. This matters because the files are removed with an “rm -f” command, and there is no getting them back once they have been removed (they do not visit the trash can or recycle bin).

ian-clarks-computer:~ ian$ ~/shell_scripts/clutter.sh
Enter the reference directory (full path): /home
Enter the directory with clutter in (full path): /Volumes/Passport/homeMini
.......................................................................
7099 .......................................................................
14199 .......................................................................
21299 .......................................................................
28399 .......................................................................
35499 .......................................................................
42599 .......................................................................
49699 .......................................................................
56799 .......................................................................
63899 .......................................................................
70999 .......................................................................
78099 .......................................................................
85199 .......................................................................
92299 .......................................................................
99399 .......................................................................
106499 .......................................................................
113599 ...............
clutter: 94313 identical files out of 115168 total
clutter: 33 files differ out of 115168 total
clutter: 20822 files missing of 115168 total
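
The three totals at the end of the run can be pictured with a small Python sketch. It rests on assumptions that may not match our programs: files are paired by their relative path under the two directories, “missing” means a clutter file has no counterpart in the reference directory, and the function name classify is hypothetical.

```python
import filecmp
import os

def classify(reference_dir, clutter_dir):
    """Count clutter files that are identical, differ, or are missing.

    Hypothetical helper for illustration only. Files are paired by
    relative path, which is an assumption of this sketch.
    """
    counts = {"identical": 0, "differ": 0, "missing": 0}
    duplicates = []  # clutter files that are safe candidates for removal
    for root, _dirs, files in os.walk(clutter_dir):
        for name in files:
            clutter_path = os.path.join(root, name)
            rel = os.path.relpath(clutter_path, clutter_dir)
            ref_path = os.path.join(reference_dir, rel)
            if not os.path.isfile(ref_path):
                counts["missing"] += 1
            elif filecmp.cmp(ref_path, clutter_path, shallow=False):
                # shallow=False forces a comparison of the file contents.
                counts["identical"] += 1
                duplicates.append(clutter_path)
            else:
                counts["differ"] += 1
    return counts, duplicates
```

Only the files counted as identical would be candidates for a removal script; files that differ or are missing from the reference directory are left for a person to look at.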