
Update mediawiki-history to use new Multi-Content-Revision tables
Open, High, Public

Description

Mediawiki-history uses text_bytes to provide content size metrics (text_bytes itself and text_bytes_diff). The text-related fields (text_id, text_len) are planned to be deleted from the revision table, and access to the same data will need to go through the slots and content tables.
In addition to remaining backward compatible in providing text size, I think we should add slots-related information to mediawiki-history, possibly in the form of a map: {slot_role: slot_info}
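
To make the map idea concrete, here is a minimal sketch in Python. Everything in it is illustrative: the names SlotInfo, build_slots_map, text_bytes and text_bytes_diff are hypothetical, the fields kept per slot are an assumption about what slot_info might hold, and it assumes the back-compatible size is the sum of the slot sizes and that the diff is taken against the parent revision.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class SlotInfo:
    # Hypothetical per-slot payload; the actual slot_info fields are undecided.
    content_size: int
    content_model: str


def build_slots_map(rows: Iterable[Tuple[str, int, str]]) -> Dict[str, SlotInfo]:
    """Build {slot_role: slot_info} from (role_name, content_size, content_model) rows."""
    return {role: SlotInfo(size, model) for role, size, model in rows}


def text_bytes(slots: Dict[str, SlotInfo]) -> int:
    """Back-compatible size, assumed here to be the sum of all slot sizes."""
    return sum(info.content_size for info in slots.values())


def text_bytes_diff(
    slots: Dict[str, SlotInfo],
    parent_slots: Optional[Dict[str, SlotInfo]],
) -> int:
    """Size change against the parent revision; a revision without parent diffs against 0."""
    parent_size = text_bytes(parent_slots) if parent_slots is not None else 0
    return text_bytes(slots) - parent_size
```

With this shape, the existing text_bytes and text_bytes_diff values can be derived from the map, so the per-slot detail would be added without changing what the current metrics mean.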

Event Timeline

Restricted Application added a subscriber: Aklapper. · Dec 2 2019, 10:43 AM
JAllemandou updated the task description. · Dec 2 2019, 10:45 AM

@WDoranWMF Hi!
We are trying to prioritize this task,
do you know when the changes to the revision table (move fields to content table through slots) are going to take place?
Thanks!

mforns triaged this task as High priority. · Dec 5 2019, 6:01 PM
mforns moved this task from Incoming to Smart Tools for Better Data on the Analytics board.

@mforns Adding @cicalese, the PM for this project, as well, to make sure she has space to correct me. The time horizon for this is quite far off, though, and I believe it is most likely to take place close to the next DC switchover in April 2020.

My understanding was that the fields may be dropped from the replicas earlier, but they would not be dropped from the master until the switchover.

cicalese removed a subscriber: cicalese. · Dec 5 2019, 7:11 PM
Nuria added a subscriber: Nuria. · Dec 5 2019, 7:15 PM

Let's move this item to late Q2 given timelines.

Anomie added a subscriber: Anomie. · Dec 6 2019, 2:56 PM

The text-related fields (text_id, text_len) are planned to be deleted from the revision table, and access for the same data will need to use slots and content tables.

There are no fields named text_id or text_len in the revision table in the MySQL schema. Did you mean rev_text_id and rev_len, or are you talking about something other than the MySQL/MariaDB databases?

Assuming the former,

  • The field rev_text_id is already not being populated in Wikimedia production, since November 12–14 (see 711c9ce3e, 50e47354c, and 740f6fd44). Even though the field is still there, it contains zero for revisions created since those dates.
    • If you're actually using this field for some purpose, you'll probably want to update your code ASAP and clean up any broken data for the affected revisions. If you're merely selecting it without actually using the value, you have some time yet before it is dropped.
  • There are no plans at this time to remove rev_len. This field will continue to contain the sum of content.content_size for all slots of the revision.
  • The full list of fields in revision that will be removed (and that are already no longer being populated) is rev_text_id, rev_content_model, rev_content_format, rev_comment, rev_user, and rev_user_text.
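
The relationship between rev_len and the per-slot sizes described above can be sketched as follows. This is only an illustration run against an in-memory SQLite database, not the production MariaDB setup: the tables are trimmed to the columns needed for the join, the sample rows are invented, and the helper name sizes_by_revision is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed-down versions of the MediaWiki tables; only the join columns are kept.
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text_id INTEGER, rev_len INTEGER);
CREATE TABLE slot_roles (role_id INTEGER PRIMARY KEY, role_name TEXT);
CREATE TABLE content (content_id INTEGER PRIMARY KEY, content_size INTEGER);
CREATE TABLE slots (slot_revision_id INTEGER, slot_role_id INTEGER, slot_content_id INTEGER);

-- Invented sample data: one revision with two slots and rev_text_id left at zero.
INSERT INTO revision VALUES (1, 0, 150);
INSERT INTO slot_roles VALUES (1, 'main'), (2, 'mediainfo');
INSERT INTO content VALUES (10, 120), (11, 30);
INSERT INTO slots VALUES (1, 1, 10), (1, 2, 11);
""")

QUERY = """
SELECT r.rev_id, r.rev_len, sr.role_name, c.content_size
FROM revision r
JOIN slots s ON s.slot_revision_id = r.rev_id
JOIN slot_roles sr ON sr.role_id = s.slot_role_id
JOIN content c ON c.content_id = s.slot_content_id
ORDER BY r.rev_id, sr.role_name
"""


def sizes_by_revision(connection):
    """Return {rev_id: {"rev_len": int, "slots": {role_name: content_size}}}."""
    result = {}
    for rev_id, rev_len, role, size in connection.execute(QUERY):
        entry = result.setdefault(rev_id, {"rev_len": rev_len, "slots": {}})
        entry["slots"][role] = size
    return result


sizes = sizes_by_revision(conn)
```

In this sample, the slot sizes of revision 1 add up to its rev_len, which is the invariant stated above; per-slot sizes are reached through slots and content rather than through rev_text_id.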

Thanks for the clarification, @Anomie. I did indeed mean rev_text_id and rev_len.