
Need databases provisioned for parsoid-rt testing, visual diff testing
Closed, Resolved · Public

Description

The Parsing team used to run regular Parsoid round-trip tests on ruthenium ( https://www.mediawiki.org/wiki/Parsoid/Round-trip_testing ). These tests are run to catch any regressions before new versions of Parsoid are deployed into production. We use testreduce to do these test runs. The testreduce service uses a mysql db to store information about the test pages and the results from testing. A web UI to this db lets the parsing team examine test results, identify regressions, and fix them.

The parsoid-rt testing db will have 160K pages, and each test run will create stats, results, and performance entries for each of the pages. Normally, we run tests ~3 times a week, sometimes more frequently.
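For a rough sense of the write volume, here is a back-of-the-envelope sketch. It assumes exactly one stats row, one results row, and one performance row per page per run; that one-row-per-table figure is an assumption for illustration, not something taken from the schema.

```python
# Rough weekly row estimate for the parsoid-rt database.
PAGES = 160_000        # pages in the test corpus (from the description)
RUNS_PER_WEEK = 3      # "~3 times a week" (from the description)
ENTRIES_PER_PAGE = 3   # stats, results, performance: ASSUMED one row each per page


def weekly_rows(pages=PAGES, runs=RUNS_PER_WEEK, entries=ENTRIES_PER_PAGE):
    """Return the number of rows written per week under the assumptions above."""
    return pages * runs * entries


print(weekly_rows())  # prints 1440000
```

More frequent runs scale this linearly, which is why purging old results matters for the size estimates.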

We also use testreduce to do visual testing comparing Parsoid output to MediaWiki output ( https://www.mediawiki.org/wiki/Parsoid/Visual_Diffs_Testing ). This is a small database right now, but it will eventually be updated to run tests on a much larger corpus (a few tens of thousands of pages).

We need 2 databases right away to replace the ones we previously had there before the reimaging (T122328).

  • testreduce_0715
  • testreduce_vd

You can delete the testreduce database (which is the old version that testreduce_0715 replaced). The data is at /mnt/data/mysql on ruthenium.

The schema is https://github.com/wikimedia/mediawiki-services-parsoid-testreduce/blob/master/server/sql/create_everything.mysql

Note that I don't necessarily need this database to reside on ruthenium or a production machine. All I need is a place to host a couple of databases, each of which can grow up to 100 GB (but, if we are diligent about purging old results, we could perhaps manage with 50-75 GB, I imagine). These are also not performance-critical databases. They are, however, critical for actual Parsoid deployments.

A quick turnaround on this would be greatly appreciated since parsoid deploys are blocked on us being able to get ruthenium back in operation so we can run tests to identify regressions and fixes.

Event Timeline

ssastry created this task. · Jan 25 2016, 8:40 PM
ssastry raised the priority of this task to High.
ssastry updated the task description.
ssastry added a project: Operations.
ssastry added subscribers: ssastry, DBA.
Restricted Application added a subscriber: Aklapper. · Jan 25 2016, 8:40 PM
ssastry set Security to None.
ssastry updated the task description. · Jan 25 2016, 8:52 PM

2 questions:

  • Need for backups?
  • Can I bring the service down for maintenance?

  • No backups required.
  • Yes. But please check in with us before doing so, just in case we are running tests or checking test results for deployment.

Second part is assumed always.

Let's settle it on m5-master; if it generates too much load, we will need to create m6.

Does mariadb 10 work for you?

I assume the mysql schema is compliant with it? If so, yes.

I need to start mysql on ruthenium and export it for the migration. The application then has to puppetize its config to point the database host to 'm5-master.eqiad.wmnet'.

I am cloning the data before exporting it, so that I can start mysql on a copy of the files and not modify the original. That will take some time.

The database has been exported, the 3 databases are being imported now into m5-master.
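For anyone repeating this kind of migration, the clone, export, and import steps might look roughly like the sketch below. It only builds and prints the commands; it does not run them. The copy directory and dump file names are hypothetical, the flags are standard cp, mysqldump, and mysql options rather than the exact ones used here, and starting mysql on the copied data directory is left out. Only the two databases named in the description are listed.

```python
import shlex

SOURCE_DATADIR = "/mnt/data/mysql"       # from the task description
COPY_DATADIR = "/srv/tmp/mysql-copy"     # HYPOTHETICAL location for the file copy
TARGET_HOST = "m5-master.eqiad.wmnet"    # from the thread
DATABASES = ["testreduce_0715", "testreduce_vd"]  # databases named in the description


def clone_command():
    """Copy the data directory so the original files are never modified."""
    return ["cp", "-a", SOURCE_DATADIR, COPY_DATADIR]


def dump_command(db, outfile):
    """mysqldump invocation for one database (run against the server on the copy)."""
    return ["mysqldump", "--single-transaction", "--databases", db,
            "--result-file", outfile]


def import_command(outfile):
    """Shell line that replays a dump file into the target host."""
    return f"mysql --host={shlex.quote(TARGET_HOST)} < {shlex.quote(outfile)}"


if __name__ == "__main__":
    print(shlex.join(clone_command()))
    for db in DATABASES:
        dump_file = f"{db}.sql"          # HYPOTHETICAL dump file name
        print(shlex.join(dump_command(db, dump_file)))
        print(import_command(dump_file))
```

Dumping with --databases includes the CREATE DATABASE and USE statements in the dump, so the import line does not need to name a database.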

jcrespo closed this task as Resolved. · Jan 26 2016, 6:14 PM
jcrespo claimed this task.

The 3 databases have been successfully imported into m5-master. Use T124704 to request access and to puppetize the configuration.