I noticed a pattern when scrounging for target data on pentests. When I find valuable data (test creds, log data, unencrypted logs, etc.), it is usually in files that differ in some way from those around them. Sometimes it's the filename, like when you have 400 files named "NightlyLogDATE" and you see a "NightlyLogDATE.bak". It also tends to happen with file sizes: almost every file in a directory will be around 400-600 KB, and a couple will be megabytes big or only a few KB.
These files are "interesting" to me precisely because they differ. They are the outliers. Sometimes they are temporary backup files left over from when a tech needed to test credit card processing with encryption turned off, or maybe some error dumped traceback/debug output into an otherwise normal file.
I decided to scrounge around online to stitch together a script that will report these outlier files.
The following script looks in the target directory, calculates the median absolute deviation of the file sizes, compares each file's deviation against a threshold, and returns the outlying filenames for you to prioritize when pillaging.
It's fairly basic so I'm happy to accept any code donations :D
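To make the median absolute deviation idea concrete, here is a minimal sketch of the size check. It is an illustration only, not the script from this post: the function name `find_size_outliers`, the 0.6745 scaling constant, and the default threshold of 3.5 (the commonly cited modified z-score cutoff) are all assumptions I picked for the example, so tune them to taste.

```python
import os
import statistics


def find_size_outliers(directory, threshold=3.5):
    """Return sorted (filename, size) pairs whose size is far from the median.

    Hypothetical helper for illustration; threshold=3.5 is an assumed default.
    """
    # Collect sizes of regular files only (no recursion, symlinks not followed).
    sizes = {}
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    if not sizes:
        return []

    median = statistics.median(sizes.values())
    # Median absolute deviation: the median distance from the median size.
    mad = statistics.median(abs(s - median) for s in sizes.values())

    if mad == 0:
        # At least half the files share the median size, so the score is
        # undefined; treat any file with a different size as an outlier.
        return sorted((n, s) for n, s in sizes.items() if s != median)

    outliers = []
    for name, size in sizes.items():
        # Modified z-score; 0.6745 makes the MAD comparable to a standard
        # deviation for roughly normal data.
        score = 0.6745 * abs(size - median) / mad
        if score > threshold:
            outliers.append((name, size))
    return sorted(outliers)
```

Calling `find_size_outliers()` on a directory of nightly logs returns the files whose sizes sit well outside the pack, both the unusually large and the unusually small ones. Using the median rather than the mean matters here, because one huge file would otherwise drag the average up and hide the smaller oddities.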