update Trey’s lang ID evaluation tools
My crufty homegrown analysis tools are fine for my existing language optimization workflow, but I need to check that they can handle more general inputs. In particular, they need to deal with "I don't know" results from TextCat, support evaluation by bucketing scores, and support finding optimal configs based on more than just model size.
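Here is a minimal sketch of one way to score "I don't know" results. It assumes each test item yields an (expected, predicted) pair and that TextCat's non-answer is mapped to a hypothetical `UNKNOWN` label; the function name `evaluate` is also illustrative. Unknown answers are kept out of precision but count against recall. That way a config that declines to guess is not scored the same as one that guesses wrong.

```python
# Hypothetical label standing in for TextCat's "I don't know" result.
UNKNOWN = "unknown"


def evaluate(pairs):
    """Score (expected_lang, predicted_lang) pairs.

    Returns (precision, recall, unknown_rate). Unknown answers are
    excluded from precision but still count against recall.
    """
    correct = wrong = unknown = 0
    for expected, predicted in pairs:
        if predicted == UNKNOWN:
            unknown += 1
        elif predicted == expected:
            correct += 1
        else:
            wrong += 1
    total = correct + wrong + unknown
    answered = correct + wrong
    precision = correct / answered if answered else 0.0
    recall = correct / total if total else 0.0
    unknown_rate = unknown / total if total else 0.0
    return precision, recall, unknown_rate


if __name__ == "__main__":
    sample = [("en", "en"), ("fr", UNKNOWN), ("de", "en"), ("es", "es")]
    print(evaluate(sample))
```

For the sample above there are 2 correct, 1 wrong and 1 unknown result, so precision is 2/3, recall is 0.5 and the unknown rate is 0.25.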

Create a new tool to set up and run a large number of possible configurations. (And see if it makes any sense to use Erik's RelForge work.)
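A rough sketch of how such a sweep could enumerate configurations and pick the best one under an arbitrary scoring function, rather than by model size alone. The parameter names and values in `GRID`, and the helpers `configs` and `best_config`, are hypothetical placeholders, not actual TextCat options. The score function would wrap whatever evaluation is run for each config.

```python
import itertools

# Hypothetical parameter grid; names and values are illustrative only.
GRID = {
    "model_size": [1000, 3000, 5000],
    "max_ratio": [1.02, 1.05, 1.10],
    "languages": [("en", "fr"), ("en", "fr", "de")],
}


def configs(grid):
    """Yield one dict per combination of parameter values (Cartesian product)."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def best_config(grid, score):
    """Return the config with the highest score; ties go to the first one seen."""
    return max(configs(grid), key=score)


if __name__ == "__main__":
    print(sum(1 for _ in configs(GRID)))  # 3 * 3 * 2 combinations
```

Because the score is passed in as a function, the same sweep can optimize for F-score, for a recall floor, or for any trade-off that involves more than model size.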