
Run a one-off task to retrieve edits between August and September 2018
Open, Normal, Public

Description

The old Hashtags tool stopped functioning in August 2018. By the time the new tool was up, we could only easily retrieve data from as far back as September. As such, there's around a month for which no data has been stored in the database.

We could do a one-off task to retrieve data from the revision table for this period of time.

Event Timeline

Samwalton9 triaged this task as Normal priority. Feb 20 2019, 12:07 PM
Samwalton9 created this task.

We could do a one-off task to retrieve data from the revision table for this period of time

@Samwalton9 Could you please tell me what is 'revision table'? If I am not wrong, does this mean we have to retrieve edit data for that period from Eventstream and store it in tool's database?

Could you please tell me what is 'revision table'?

@AdityaJ The revision table refers to the MediaWiki database table which stores every edit on a Wikimedia project (https://www.mediawiki.org/wiki/Manual:Revision_table). We should be able to access replicas of the live databases (https://phabricator.wikimedia.org/phame/live/5/post/70/new_wiki_replica_servers_ready_for_use/) from the tool on the production server.

If I am not wrong, does this mean we have to retrieve edit data for that period from Eventstream and store it in tool's database?

Almost, yes, but we'll query the data from a replica database rather than the Eventstream. We can only retrieve up to 30 days of data from the Eventstream, so we can't get this historical data from there. We can, however, craft a query to fetch hashtag edits from the revision table of the database.
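
As a rough illustration of what such a query might look like, here is a minimal sketch. It is not the tool's actual code. It assumes the current MediaWiki schema, where edit summaries live in the comment table and are joined to revision via rev_comment_id, and it assumes that any edit summary containing '#' is a candidate. The function names, the hashtag pattern, and the exact date range are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical hashtag pattern: '#' followed by word characters,
# not immediately preceded by another word character.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def to_mw_timestamp(dt):
    """Format a datetime as a MediaWiki timestamp string (YYYYMMDDHHMMSS)."""
    return dt.strftime("%Y%m%d%H%M%S")


def build_backfill_query(start, end):
    """Return (sql, params) selecting revisions in [start, end) whose
    edit summary contains a '#' character.

    Assumes edit summaries are stored in the comment table and linked
    from revision through rev_comment_id.
    """
    sql = (
        "SELECT rev_id, rev_page, rev_timestamp, comment_text "
        "FROM revision "
        "JOIN comment ON comment_id = rev_comment_id "
        "WHERE rev_timestamp >= %s AND rev_timestamp < %s "
        "AND comment_text LIKE %s"
    )
    params = (to_mw_timestamp(start), to_mw_timestamp(end), "%#%")
    return sql, params


def extract_hashtags(comment):
    """Return the lowercased hashtags found in an edit summary."""
    return [tag.lower() for tag in HASHTAG_RE.findall(comment)]


# Placeholder date range; the real gap boundaries would need checking.
sql, params = build_backfill_query(datetime(2018, 8, 1), datetime(2018, 9, 1))
```

The (sql, params) pair is meant to be passed to a database driver that uses the %s parameter style, such as pymysql, once connected to the replica for a given wiki. Each returned edit summary could then be run through extract_hashtags before being stored. The LIKE filter is deliberately loose, so it will also return summaries where '#' is not part of a hashtag, and a scan over a month of revisions may be slow on large wikis.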

Since this is something that needs to happen from within the production server, I'll assign it to myself, as one of the only people with access to it!