The Parsing team used to run regular Parsoid round-trip tests on ruthenium ( https://www.mediawiki.org/wiki/Parsoid/Round-trip_testing ). These tests catch regressions before new versions of Parsoid are deployed to production. We use testreduce for these test runs. The testreduce service uses a MySQL database to store information about the test pages and the test results. A web UI on top of this database lets the Parsing team examine test results, identify regressions, and fix them.
The parsoid-rt testing database will hold 160K pages, and each test run creates stats, results, and performance entries for every one of those pages. We normally run tests about 3 times a week, sometimes more often.
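To give a rough sense of how quickly this database grows, here is a back-of-the-envelope sketch using only the figures above. It assumes (hypothetically) exactly one stats, one results, and one performance row per page per run; the real per-run row counts and row sizes depend on the testreduce schema and are not estimated here.

```python
PAGES = 160_000      # pages in the parsoid-rt test corpus
RUNS_PER_WEEK = 3    # typical test-run frequency
ENTRY_KINDS = 3      # stats, results, performance (assumed: one row each per page per run)


def rows_added(weeks: int) -> int:
    """Total rows added across all entry kinds after `weeks` weeks of testing."""
    return PAGES * RUNS_PER_WEEK * ENTRY_KINDS * weeks


print(rows_added(1))   # 1440000
print(rows_added(52))  # 74880000
```

Under these assumptions, a year of testing adds roughly 75 million rows, which is why purging old results matters for keeping the database size bounded.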
We also use testreduce for visual testing, which compares Parsoid output to MediaWiki output ( https://www.mediawiki.org/wiki/Parsoid/Visual_Diffs_Testing ). This database is currently small, but it will eventually be used to run tests on a much larger corpus (a few tens of thousands of pages).
We need 2 databases right away to replace the ones we had there before the reimaging (T122328):
* testreduce_0715
* testreduce_vd
You can delete the testreduce database (the old version that testreduce_0715 replaced).
The schema is defined in https://github.com/wikimedia/mediawiki-services-parsoid-testreduce/blob/master/server/sql/create_everything.mysql
Note that these databases don't need to reside on ruthenium or even on a production machine. All I need is a place to host a couple of databases, each of which can grow to up to 100 GB (though if we are diligent about purging old results, we could probably manage with 50-75 GB). These databases are not performance-critical, but they are critical for actual Parsoid deployments.
A quick turnaround on this would be greatly appreciated, since Parsoid deploys are blocked until ruthenium is back in operation and we can run tests to identify regressions and fixes.