
Article records should include information about language and project
Closed, Resolved · Public

Description

Currently, the dashboard assumes that everything is happening on a single wiki. Add fields to article records which make the wiki explicit. We will continue to default to {language}wiki during this first step.

This task is complete when article records include a pointer to the wiki they belong to.

Assignments, articles, and revisions will have a 1-to-1 association with a target wiki.

User and course are more complicated and we need to do design work. These might benefit from a many-to-many relationship with wikis, through a polymorphic link table.
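One possible shape for such a link table, as a minimal sketch only. The table and column names (`wikis`, `wiki_links`, `linkable_type`, `linkable_id`) are hypothetical and not taken from the dashboard's schema; SQLite is used purely so the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one link table connects wikis to both users and courses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wikis (
        id INTEGER PRIMARY KEY,
        language TEXT NOT NULL,
        project TEXT NOT NULL,
        UNIQUE (language, project)
    );
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Polymorphic link: linkable_type says which table linkable_id points into.
    CREATE TABLE wiki_links (
        wiki_id INTEGER NOT NULL REFERENCES wikis(id),
        linkable_type TEXT NOT NULL,
        linkable_id INTEGER NOT NULL,
        PRIMARY KEY (wiki_id, linkable_type, linkable_id)
    );

    INSERT INTO wikis VALUES (1, 'en', 'wikipedia'),
                             (2, 'es', 'wikipedia'),
                             (3, 'en', 'wikisource');
    INSERT INTO users VALUES (1, 'ExampleUser');
    INSERT INTO courses VALUES (1, 'Example Course');
    INSERT INTO wiki_links VALUES (1, 'User', 1), (2, 'User', 1),
                                  (1, 'Course', 1), (3, 'Course', 1);
""")


def wikis_for(conn, linkable_type, linkable_id):
    """Return the (language, project) pairs linked to one user or course."""
    return conn.execute(
        """
        SELECT w.language, w.project
        FROM wiki_links AS l
        JOIN wikis AS w ON w.id = l.wiki_id
        WHERE l.linkable_type = ? AND l.linkable_id = ?
        ORDER BY w.language, w.project
        """,
        (linkable_type, linkable_id),
    ).fetchall()
```

Here `wikis_for(conn, "User", 1)` and `wikis_for(conn, "Course", 1)` return different sets of wikis even though both records have the local id 1, because the type column disambiguates them.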

This is the first of several phases of work, each of which can be deployed separately, which should give us some runway in the event we have to roll back.

Event Timeline

awight created this task. Mar 23 2015, 8:28 PM
awight raised the priority of this task to Needs Triage.
awight updated the task description.
awight added subscribers: dduvall, Ragesoss, AndyRussG and 3 others.

Although it's not well-tested and, I'm pretty sure, incomplete, we currently have basic support for tracking the language of each article record. However, we don't do that for the rest of the Wikipedia-based records (i.e., User and Revision).

This seems a little tricky, since we can't use the pageid as the primary key if we're storing users, revisions, and articles from different wikis in the same database. I guess we have to switch to using a primary key that combines the language with the local pageid? And for users, maybe we can rely on a global id, although there are some logistics to work out for finding out what those global ids are for existing users.

awight set Security to None.
awight added a comment. Oct 7 2015, 9:11 PM

Compound keys get nasty, you inevitably find yourself exploding them and stuff. What about introducing an autoincrement dashboard-local primary key and including wiki and wiki page_id as regular columns? We can include a multi key on the discrete fields to improve the occasional reverse query, still.
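A minimal sketch of the schema proposed above, using SQLite only so the example is self-contained. The table and column names (`articles`, `wiki`, `mw_page_id`) are hypothetical, not the dashboard's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- dashboard-local key
        wiki TEXT NOT NULL,                    -- hypothetical column, e.g. 'enwiki'
        mw_page_id INTEGER NOT NULL,           -- hypothetical column: page_id on that wiki
        title TEXT NOT NULL,
        UNIQUE (wiki, mw_page_id)              -- multi-column key for reverse lookups
    )
""")

# The same wiki page_id can now exist once per wiki without colliding.
conn.executemany(
    "INSERT INTO articles (wiki, mw_page_id, title) VALUES (?, ?, ?)",
    [("enwiki", 42, "Example"), ("eswiki", 42, "Ejemplo")],
)


def find_article(conn, wiki, mw_page_id):
    """Reverse query: look up the local row from the wiki and its page_id."""
    return conn.execute(
        "SELECT id, title FROM articles WHERE wiki = ? AND mw_page_id = ?",
        (wiki, mw_page_id),
    ).fetchone()
```

The unique constraint also rejects a second row for the same wiki and page_id, so the pair stays usable as a natural key even though it is not the primary key.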

That would work, probably. (I think?) But it'll be a lot of work, and a lot of careful testing, to clean up all the places that currently depend on ids matching Wikipedia's records. For revisions, we'll also need to be careful about implementing ordering wherever we do queries and expect id-order to match up with date-order.

I've worked pretty hard to separate, organize, and refactor the code that interacts with Wikipedia's API and the wmflabs replica database, but I've mostly taken for granted the assumption that ids match Wikipedia's, so I'm not sure where all that assumption sneaks in outside of the Importer libraries.

awight added a comment. Oct 7 2015, 9:53 PM

Just noting that the simple matching-IDs assumption is being overturned regardless of how we proceed... If you think it's easier, we could make the schema changes I suggested but refer to rows using the multikey on wiki + wiki page id rather than the integer primary key. Having the primary key be a calculated value seems like trouble to me, same with assuming that the primary keys increase monotonically in date order. What about solving the latter problem by ordering by a timestamp column on revisions?

We have the timestamp column on Revision. It's just a matter of noticing all the places where we didn't order by date explicitly because we didn't need to.
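To illustrate why explicit ordering matters once ids are dashboard-local, here is a small sketch. The `revisions` table and its column names (`mw_rev_id`, `timestamp`) are hypothetical, and SQLite is used only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE revisions (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- local id, reflects import order only
        wiki TEXT NOT NULL,
        mw_rev_id INTEGER NOT NULL,            -- hypothetical column: rev_id on that wiki
        timestamp TEXT NOT NULL,               -- ISO 8601 UTC, so text order is time order
        UNIQUE (wiki, mw_rev_id)
    )
""")

# Rows are imported out of chronological order, as can happen across wikis.
conn.executemany(
    "INSERT INTO revisions (wiki, mw_rev_id, timestamp) VALUES (?, ?, ?)",
    [
        ("enwiki", 900, "2016-02-01T00:00:00Z"),
        ("eswiki", 100, "2015-10-07T21:53:00Z"),
        ("enwiki", 850, "2015-12-25T12:00:00Z"),
    ],
)


def revisions_by_id(conn):
    """Old assumption: id order is date order. No longer true here."""
    return conn.execute(
        "SELECT wiki, mw_rev_id FROM revisions ORDER BY id"
    ).fetchall()


def revisions_by_date(conn):
    """Explicit ordering by timestamp, with id as a stable tie-breaker."""
    return conn.execute(
        "SELECT wiki, mw_rev_id FROM revisions ORDER BY timestamp, id"
    ).fetchall()
```

In this sketch the two functions return the rows in different orders, which is the kind of silent discrepancy that any query relying on implicit id order would hit.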

Restricted Application added a subscriber: Base. Oct 15 2015, 1:43 PM
Restricted Application added a subscriber: StudiesWorld. Feb 3 2016, 7:58 AM
awight updated the task description. Feb 11 2016, 8:07 PM
awight updated the task description. Feb 17 2016, 7:02 AM
awight updated the task description. Feb 17 2016, 7:17 AM
awight claimed this task. Feb 17 2016, 7:41 AM
Ijon added a subscriber: Ijon. Apr 1 2016, 9:00 AM
Ragesoss closed this task as Resolved. May 21 2016, 1:14 AM

Done since a few weeks ago.