Description
Using a database instead of CSV files achieves several things:
- Removes race conditions when multiple programs work with the same file
- Creating the database in the Toolforge namespace, as explained here, might solve the problem of downloading the data out of Toolforge for further analysis (or maybe we can use some bot/service to download it?)
Tasks
- Look into downloading files from Toolforge
- Design and create a custom database in the Toolforge namespace
- Mirror all existing CSV usage with database usage
- Meta table parser
- Database fetcher
- (page_id, dbname) is now the composite primary key
- API fetcher
- Comparison code for API and database results
- Switch to the database as the main data source
- Remove the duplicated CSV code
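
A minimal sketch of how the composite key and the API/database comparison could look. It uses Python's built-in sqlite3 module only so that the example runs standalone; the real database would live in Toolforge and would likely need a different driver and connection setup. The table name `pages`, the `title` column, and all function names are hypothetical, chosen for illustration only.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table: one row per page per wiki.
    # (page_id, dbname) together form the primary key.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS pages (
            page_id INTEGER NOT NULL,
            dbname  TEXT    NOT NULL,
            title   TEXT,
            PRIMARY KEY (page_id, dbname)
        )
        """
    )


def upsert_page(conn, page_id, dbname, title):
    # "with conn" wraps the write in a transaction, so concurrent
    # writers do not leave a half-written row behind the way two
    # programs appending to one csv file can.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO pages (page_id, dbname, title) "
            "VALUES (?, ?, ?)",
            (page_id, dbname, title),
        )


def fetch_pages(conn):
    # Return the stored rows as {(page_id, dbname): title}.
    rows = conn.execute("SELECT page_id, dbname, title FROM pages")
    return {(page_id, dbname): title for page_id, dbname, title in rows}


def compare_sources(db_rows, api_rows):
    # Both arguments are dicts keyed by (page_id, dbname).
    only_db = sorted(db_rows.keys() - api_rows.keys())
    only_api = sorted(api_rows.keys() - db_rows.keys())
    differing = sorted(
        key
        for key in db_rows.keys() & api_rows.keys()
        if db_rows[key] != api_rows[key]
    )
    return only_db, only_api, differing


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    upsert_page(conn, 1, "enwiki", "Example")
    upsert_page(conn, 1, "dewiki", "Beispiel")
    # Made-up API result, used only to exercise the comparison.
    api_rows = {(1, "enwiki"): "Example", (2, "enwiki"): "Other"}
    print(compare_sources(fetch_pages(conn), api_rows))
```

Keying both result sets by (page_id, dbname) keeps the comparison step independent of where the rows came from, so the same function could be reused while the CSV and database code paths still run side by side.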