
Evaluate refactoring of change_tags to only associate with rev id
Open, Lowest, Public

Description

Every time we build on this system, we run into all kinds of inconsistencies in the data. And contrary to what one might think, we don't seem to be filling in the missing pieces, so it's not as if they serve as a convenient shortcut.

Recent changes are a temporary snapshot of the revision and logging tables. So it makes sense that, as long as change tags support recentchanges, they must also support both revision and logging. But do we actually tag log events? Specifically log events that don't have a null revision associated with them.

And do we ever use change tags directly on rc_id exclusively? The recentchanges table already has rc_logid and rc_this_oldid readily available, so there seems to be little use in associating a change tag exclusively with rc_id. This is especially true because recentchanges continuously rolls over, so such tags would become inaccessible (but remain in the database?).
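To make the "missing pieces" point concrete, here is a minimal sketch using Python's built-in sqlite3 module and a heavily simplified, hypothetical version of the two tables. It keeps only the ID columns discussed here and uses NULL for "not set"; the real MediaWiki schema may use other sentinels. The sketch backfills ct_rev_id / ct_log_id for tags recorded only against rc_id, using the rc_this_oldid / rc_logid already on the recentchanges row. Tags whose recentchanges row has already rolled over cannot be recovered this way. The function name `backfill_from_rc` and the tag names are illustrative only.

```python
import sqlite3

# Simplified, hypothetical schema: only the ID columns discussed in this task.
# NULL stands for "not set"; the real MediaWiki tables may use other sentinels.
SCHEMA = """
CREATE TABLE recentchanges (
    rc_id         INTEGER PRIMARY KEY,
    rc_this_oldid INTEGER,  -- revision id (NULL for a pure log event)
    rc_logid      INTEGER   -- log id (NULL for an edit)
);
CREATE TABLE change_tag (
    ct_rc_id  INTEGER,
    ct_log_id INTEGER,
    ct_rev_id INTEGER,
    ct_tag    TEXT NOT NULL
);
"""

def backfill_from_rc(db: sqlite3.Connection) -> int:
    """Copy rev/log ids from recentchanges onto tags that lack both.

    Returns how many tag rows still have neither a rev id nor a log id,
    i.e. tags whose recentchanges row has already been purged.
    """
    db.execute("""
        UPDATE change_tag
        SET ct_rev_id = (SELECT rc_this_oldid FROM recentchanges WHERE rc_id = ct_rc_id),
            ct_log_id = (SELECT rc_logid FROM recentchanges WHERE rc_id = ct_rc_id)
        WHERE ct_rev_id IS NULL AND ct_log_id IS NULL
          AND EXISTS (SELECT 1 FROM recentchanges WHERE rc_id = ct_rc_id)
    """)
    (unrecoverable,) = db.execute(
        "SELECT COUNT(*) FROM change_tag WHERE ct_rev_id IS NULL AND ct_log_id IS NULL"
    ).fetchone()
    return unrecoverable

# Toy data (hypothetical tag names).
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO recentchanges VALUES (?, ?, ?)",
               [(1, 100, None),   # an edit (revision 100)
                (2, None, 500)])  # a log event (log id 500)
db.executemany("INSERT INTO change_tag VALUES (?, ?, ?, ?)",
               [(1, None, None, "tag-a"),   # tagged via rc_id only
                (2, None, None, "tag-b"),   # tagged via rc_id only
                (3, None, None, "tag-c")])  # rc row 3 has already rolled over
unrecoverable = backfill_from_rc(db)
rows = db.execute(
    "SELECT ct_tag, ct_rev_id, ct_log_id FROM change_tag ORDER BY ct_tag"
).fetchall()
```

Under a rev-id-only design, "tag-b" (a log event with no revision) would have nowhere to attach. That is the case raised by the question about tagging log events.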

Event Timeline

Krinkle created this task. Mar 2 2015, 8:44 PM
Krinkle updated the task description.
Krinkle raised the priority of this task to Needs Triage.
Krinkle added subscribers: Krinkle, Anomie, Legoktm.
Restricted Application added a subscriber: Aklapper. Mar 2 2015, 8:44 PM
Anomie added a comment (edited). Mar 2 2015, 10:43 PM

> But do we actually tag log events? Specifically log events that don't have a null revision associated with them.

OAuth will tag any log event that passes through the RecentChange_save hook, for one. Also AbuseFilter and TorBlock, it looks like.

Checking enwiki's change_tag table, I see 180123 rows with ct_rev_id null.

> And do we ever use change tags directly on rc_id exclusively?

As far as I've seen, we don't tag on rc_id alone anywhere. I see 92 rows on enwiki with only rc_id filled in, but none of them correspond to a row currently in the recentchanges table.
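The two checks above (tag rows with a null ct_rev_id, and rc_id-only tag rows with no matching recentchanges row) can be written as plain SQL. This is a minimal sketch run against a simplified, hypothetical schema in Python's sqlite3. The queries are reconstructions, since the actual enwiki queries are not shown, and the counts come from toy data, not production. The names `NO_REV_SQL`, `RC_ONLY_ORPHANS_SQL` and `audit` are illustrative.

```python
import sqlite3

# Simplified, hypothetical schema: only the ID columns discussed in this thread.
SCHEMA = """
CREATE TABLE recentchanges (rc_id INTEGER PRIMARY KEY, rc_this_oldid INTEGER, rc_logid INTEGER);
CREATE TABLE change_tag (ct_rc_id INTEGER, ct_log_id INTEGER, ct_rev_id INTEGER, ct_tag TEXT NOT NULL);
"""

# Tags not attached to any revision (e.g. tagged log events, or rc_id-only tags).
NO_REV_SQL = "SELECT COUNT(*) FROM change_tag WHERE ct_rev_id IS NULL"

# Tags attached only to an rc_id whose recentchanges row no longer exists.
RC_ONLY_ORPHANS_SQL = """
SELECT COUNT(*) FROM change_tag AS ct
WHERE ct.ct_rc_id IS NOT NULL AND ct.ct_log_id IS NULL AND ct.ct_rev_id IS NULL
  AND NOT EXISTS (SELECT 1 FROM recentchanges AS rc WHERE rc.rc_id = ct.ct_rc_id)
"""

def audit(db: sqlite3.Connection) -> dict:
    """Run both consistency checks and return their counts."""
    no_rev = db.execute(NO_REV_SQL).fetchone()[0]
    rc_only_orphans = db.execute(RC_ONLY_ORPHANS_SQL).fetchone()[0]
    return {"no_rev": no_rev, "rc_only_orphans": rc_only_orphans}

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO recentchanges VALUES (?, ?, ?)",
               [(1, 100, None), (2, None, 500)])  # an edit and a log event
db.executemany("INSERT INTO change_tag VALUES (?, ?, ?, ?)", [
    (1, None, 100, "tag-a"),   # edit: rc_id and rev id
    (2, 500, None, "tag-b"),   # log event: rc_id and log id, no revision
    (7, None, None, "tag-c"),  # rc_id only; rc row 7 has rolled over
])
result = audit(db)
```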

But when joining with the recentchanges table, we join on the ts_rc_id (tag_summary) or ct_rc_id (change_tag) field rather than on (rc_logid=ct_log_id OR rc_this_oldid=ct_rev_id), since I don't think MySQL is very good at joining on two indexes like that. I also note that recentchanges currently lacks indexes on rc_logid and rc_this_oldid, which would allow the planner to go in the opposite direction if that made sense for a query.
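For illustration, the sketch below compares the two join shapes discussed above on a simplified, hypothetical schema in Python's sqlite3. It also adds a UNION rewrite, which is a general workaround for OR-joins and not something proposed in this thread. The indexes on rc_this_oldid and rc_logid are hypothetical, since the comment notes they did not exist. SQLite's planner is not MySQL's, so this only checks that the three formulations return the same rows on consistent data. It says nothing about their relative performance.

```python
import sqlite3

# Simplified, hypothetical schema; only the ID columns discussed in this thread.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE recentchanges (rc_id INTEGER PRIMARY KEY, rc_this_oldid INTEGER, rc_logid INTEGER);
CREATE TABLE change_tag (ct_rc_id INTEGER, ct_log_id INTEGER, ct_rev_id INTEGER, ct_tag TEXT NOT NULL);
CREATE INDEX idx_ct_rc_id ON change_tag (ct_rc_id);
CREATE INDEX idx_ct_log_id ON change_tag (ct_log_id);
CREATE INDEX idx_ct_rev_id ON change_tag (ct_rev_id);
-- Hypothetical: recentchanges lacked these two indexes at the time of the thread.
CREATE INDEX idx_rc_this_oldid ON recentchanges (rc_this_oldid);
CREATE INDEX idx_rc_logid ON recentchanges (rc_logid);
""")
db.executemany("INSERT INTO recentchanges VALUES (?, ?, ?)",
               [(1, 100, None), (2, None, 500)])  # an edit and a log event
db.executemany("INSERT INTO change_tag VALUES (?, ?, ?, ?)",
               [(1, None, 100, "tag-a"), (1, None, 100, "tag-c"), (2, 500, None, "tag-b")])

# Current approach: a single equality join on the denormalized rc_id column.
JOIN_ON_RC_ID = """
SELECT rc.rc_id, ct.ct_tag FROM recentchanges AS rc
JOIN change_tag AS ct ON ct.ct_rc_id = rc.rc_id
ORDER BY 1, 2
"""

# The alternative from the comment: match on either the log id or the revision id.
JOIN_ON_OR = """
SELECT rc.rc_id, ct.ct_tag FROM recentchanges AS rc
JOIN change_tag AS ct
  ON rc.rc_logid = ct.ct_log_id OR rc.rc_this_oldid = ct.ct_rev_id
ORDER BY 1, 2
"""

# General OR-join workaround: two single-column joins combined.
# UNION (not UNION ALL) so a tag matching on both ids is not listed twice.
JOIN_ON_UNION = """
SELECT rc.rc_id, ct.ct_tag FROM recentchanges AS rc
JOIN change_tag AS ct ON rc.rc_logid = ct.ct_log_id
UNION
SELECT rc.rc_id, ct.ct_tag FROM recentchanges AS rc
JOIN change_tag AS ct ON rc.rc_this_oldid = ct.ct_rev_id
ORDER BY 1, 2
"""

results = {
    name: db.execute(sql).fetchall()
    for name, sql in [("rc_id", JOIN_ON_RC_ID), ("or", JOIN_ON_OR), ("union", JOIN_ON_UNION)]
}
```

Prefixing each query with `EXPLAIN QUERY PLAN` shows how SQLite handles it. The MySQL planner's behaviour, which is the concern here, would need checking with MySQL's own `EXPLAIN`.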

Aklapper triaged this task as Normal priority. Mar 3 2015, 10:06 AM
TTO added a subscriber: TTO. Mar 3 2015, 10:17 AM

> Evaluate refactoring of change_tags to only associate with rev id

This seems like a fine idea, but what is your rationale?

TTO lowered the priority of this task from Normal to Lowest. Mar 14 2015, 11:37 AM

In the absence of a good reason to do this, marking "lowest" priority.