I was doing some stuff at work and found a couple of new things, so I figured I’d share. The following bash pipeline lists every page that returned a 404 error in a standard Apache combined-format log file, sorted by how many times each one was requested (most frequent first):
cut -d ' ' -f 7,9 apache-log | grep ' 404$' | sed -e 's/ 404$//' | sort | uniq -c | sort -nr
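Here is a minimal sketch of the same pipeline wrapped in a function and run against a tiny made-up log. It assumes the usual combined log format, where splitting on single spaces puts the request path in field 7 and the status code in field 9. The function name `top_404s` and the sample log lines are hypothetical and only here for illustration. The grep matches ` 404$` rather than a bare `404`, so a successful request for a page like `/404-help.html` is not counted as a 404.

```shell
#!/bin/sh
# Sketch only: top_404s and the sample log lines below are hypothetical.

top_404s() {
  # Combined log format split on spaces: field 7 = path, field 9 = status.
  cut -d ' ' -f 7,9 "$1" |
    grep ' 404$' |     # keep only lines whose status (now the last field) is 404
    sed 's/ 404$//' |  # strip the status, leaving just the path
    sort | uniq -c |   # count identical paths
    sort -nr           # most-requested missing pages first
}

log=$(mktemp)
cat > "$log" <<'EOF'
127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /missing.html HTTP/1.1" 404 209 "-" "curl/7.0"
127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/7.0"
127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /missing.html HTTP/1.1" 404 209 "-" "curl/7.0"
127.0.0.1 - - [10/Oct/2000:13:55:39 -0700] "GET /favicon.ico HTTP/1.1" 404 209 "-" "curl/7.0"
127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /404-help.html HTTP/1.1" 200 512 "-" "curl/7.0"
EOF

top_404s "$log"
rm -f "$log"
```

This should list `/missing.html` with a count of 2 ahead of `/favicon.ico` with a count of 1. `/404-help.html` does not appear because it returned a 200. The exact padding before each count depends on your `uniq` implementation.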