Page MenuHomePhabricator

Using a "_test" database that gets created and torn down for every test
Closed, ResolvedPublic

Description

I'm working on the new version of WP1.0 Bot that is being re-written in Python.

The general operation of the bot is to read data from enwiki_p and write data to it's own database based on what it finds.

As outlined in the wiki page for the rewrite, I initially wrote this using SQLAlchemy, but it was 10x too slow. I'm now trying to re-write using pymysql and raw database access.

This provides a problem for my unit tests (which I am also largely re-writing) because with SQLAlchemy I could access an in-memory sqlite database in a database agnostic way, but now MySQL specific database langauge, such as "ON DUPLICATE KEY UPDATE", create problems in tests.

What I would like is some kind of test database that I could populate and destroy, possibly with fixtures, on every test run. I'm thinking of something modeled on the Ruby on Rails system, where there is the production database of course, mytool_prod, but also a mytool_test database. Of course, I'm definitely worried about "crossing the streams" and making sure that my test code only destroys the test database. Perhaps I could have a different db user that I use in tests that only has access to the test database?

I'd also like to mock out the enwiki_p database, so I can populate it with fake data in tests and make sure my code is processing it correctly.

I have Travis CI set up for the project, based on the in-memory sqlite database, and I would love to also have it set up for CI with whatever solution I come up with for the test database.

Thanks for any insight on how this might be done or if I'm going down a completely insane path.

Event Timeline

That looks promising for the Travis CI environment yes. I guess my problem is that I'm not developing locally, I'm developing directly on the Toolforge instance and I guess I just need to provision a database for test there?

It seems like you could do it using a sensible name for the DB on ToolsDB. The tables there are replicated, which isn't ideal for this purpose, but it depends on how this is used. A db that pops up and is then torn down once in a while seems like it would be fine.

If this is just being run from the command line once in a while, it's how I'd try doing it, as long as we aren't trying to use the wikimedia jenkins CI for it. That would need a dockerized mariadb image, I imagine, that it has control of.

https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#User_databases

The thing here is just that you have to keep your db names really clear and separate, obviously :)

Have you tried using SQLAlchemy expession language as well? Was it just as slow as the ORM? This sort of management of setup is very much what SQLAlchemy does, so it seems awful to have to do it yourself.

Audiodude claimed this task.

Thanks for the comments. I realized that the test db never needs to be live in production and that helped me figure out the solution.

Basically, I've configured the test case to talk to localhost, like it will in CI, and then began developing my tests locally. This is working so far.

Thanks again, sorry for the convoluted post.

And yes, the ORM I was speaking of was SQLAlchemy. It was much too slow with the session management strategy I was using, and manually managing multiple sessions and object eviction in order to speed it up would make the code too complicated in my opinion and negate the benefits of using the ORM.