Compacting the revision table
Closed, Resolved, Public

Description

This is the task for the schema change project documented on the wiki at User:Brion VIBBER/Compacting the revision table round 2. Part of the description is copied below.

Per ongoing discussion in ArchCom and at WikiDev17 about performance, future requirements, and future-proofing for table size, it's proposed to do a major overhaul of the revision table, combining the following improvements:

  • Normalization of frequently duplicated data into separate tables, reducing the duplicated strings to integer keys
  • Separation of content-specific metadata from general revision metadata to support:
    • Multi-content revisions, allowing multiple content blobs to be stored per revision (not related to compaction, but of great interest for the structured data additions planned for multimedia and articles)
  • General reduction in revision table width / on-disk size, which will make future schema changes easier
  • Trying to avoid inconsistencies in live index deployments
    • Ideally all indexes should fit on all servers, making it easier to switch database backends around in production
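To make the split between general revision metadata and content-specific metadata concrete, here is a minimal Python sketch of a revision that carries only general fields and holds any number of content "slots" keyed by role. The class names, the role names ("main", "mediainfo"), and the address strings are assumptions for illustration only; MediaWiki's actual multi-content revision schema keeps this information in database tables, not in objects like these.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Content:
    """Metadata for one stored content blob (hypothetical, simplified)."""
    model: str    # content model, e.g. "wikitext" or "json"
    address: str  # illustrative blob address, e.g. "tt:101"
    size: int     # blob size in bytes


@dataclass
class Revision:
    """General revision metadata only; content-specific data lives in slots."""
    rev_id: int
    page_id: int
    timestamp: str
    slots: dict[str, Content] = field(default_factory=dict)

    def add_slot(self, role: str, content: Content) -> None:
        """Attach one content blob under a named role (at most one per role)."""
        if role in self.slots:
            raise ValueError(f"revision {self.rev_id} already has a {role!r} slot")
        self.slots[role] = content

    def total_size(self) -> int:
        """Sum of the sizes of all content blobs in this revision."""
        return sum(content.size for content in self.slots.values())


if __name__ == "__main__":
    rev = Revision(rev_id=1, page_id=42, timestamp="20200605160800")
    rev.add_slot("main", Content("wikitext", "tt:101", 1200))
    rev.add_slot("mediainfo", Content("json", "tt:102", 300))
    print(sorted(rev.slots), rev.total_size())  # ['main', 'mediainfo'] 1500
```

The design point is that adding a second kind of content to a revision means adding a slot, not widening the revision row.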

The specific changes and associated Wikimedia production tasks involved here are:

  • Dropping rev_comment, adding rev_comment_id. (T166733, T215466)
    • Ready to go!
  • Dropping rev_user and rev_user_text, adding rev_actor. (T188327, T215466)
    • Ready to go!
  • Dropping rev_text_id, rev_content_model, and rev_content_format. (T238958, T238966)
    • Ready to go!
  • Fixing the type of rev_timestamp on old wikis to match tables.sql. (T298560, P8433)
    • Ready to go!
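A minimal sketch of the comment and actor parts of this change, using Python's built-in sqlite3 on a toy revision table: distinct user names and comment strings move into actor and comment tables, and each revision row gets integer keys instead. The table and column names follow this task; the toy schema, the rule that rev_user = 0 marks an anonymous edit, and the single-pass UPDATE are simplifying assumptions. The real production migration was far more involved (batched, with temporary tables and staged read/write modes). Dropping the old columns is left out because older SQLite versions lack ALTER TABLE ... DROP COLUMN.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Replace rev_user/rev_user_text and rev_comment with integer keys."""
    conn.executescript("""
        CREATE TABLE actor (
            actor_id   INTEGER PRIMARY KEY,
            actor_user INTEGER,               -- NULL for anonymous (IP) edits
            actor_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE comment (
            comment_id   INTEGER PRIMARY KEY,
            comment_text TEXT NOT NULL UNIQUE
        );
        ALTER TABLE revision ADD COLUMN rev_actor INTEGER;
        ALTER TABLE revision ADD COLUMN rev_comment_id INTEGER;
    """)
    # One actor row per distinct (user id, name) pair; assumes rev_user = 0
    # marks an anonymous edit, stored here as a NULL actor_user.
    conn.execute("""
        INSERT INTO actor (actor_user, actor_name)
        SELECT DISTINCT NULLIF(rev_user, 0), rev_user_text FROM revision
    """)
    # One comment row per distinct comment string.
    conn.execute("""
        INSERT INTO comment (comment_text)
        SELECT DISTINCT rev_comment FROM revision
    """)
    # Point every revision at its actor and comment rows.
    conn.execute("""
        UPDATE revision SET
            rev_actor = (SELECT actor_id FROM actor
                         WHERE actor_name = revision.rev_user_text),
            rev_comment_id = (SELECT comment_id FROM comment
                              WHERE comment_text = revision.rev_comment)
    """)
    conn.commit()


def demo() -> sqlite3.Connection:
    """Build a toy pre-migration revision table and migrate it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER,"
        " rev_user_text TEXT, rev_comment TEXT)"
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?, ?)",
        [(1, 7, "Alice", "typo"), (2, 7, "Alice", "typo"), (3, 0, "192.0.2.1", "rv")],
    )
    migrate(conn)
    return conn


if __name__ == "__main__":
    conn = demo()
    for row in conn.execute(
        "SELECT r.rev_id, a.actor_name, c.comment_text FROM revision r"
        " JOIN actor a ON a.actor_id = r.rev_actor"
        " JOIN comment c ON c.comment_id = r.rev_comment_id ORDER BY r.rev_id"
    ):
        print(row)
```

The two "Alice"/"typo" revisions end up sharing one actor row and one comment row, which is where the width and size savings come from.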

Related Objects

This task is connected to more than 200 other tasks.

Event Timeline

Whatever happened to this project? Is it on ice?

@kaldari I think most of the effort is currently going towards comment separation/normalization in T6715/T166733 (deployment in progress) and user normalization in T167246. This is such a generic, epic ticket that most of the work is done on those subtickets. It is not fast because it requires a lot of code changes, technical debt cleanup (old data in a bad format), and long-running schema changes, but as far as I can see it is going well.

There is also work on MCR (T174043), which should split the revision table in two.

I assume that once those three refactorings are done, the table will be reevaluated to see how compact revision has become; the smaller table should also make future schema changes faster.

More specifically,

(Added that last one as T184615.)

After those three components are complete, the idea is to declare this task "done"; if there is scope for further discussion at that point, it should go in its own task?

Change 350097 abandoned by Brion VIBBER:
WIP - provisional revision table restructure

Reason:
Abandoning this old experimental patch.

https://gerrit.wikimedia.org/r/350097

Changes that affect the revision table will usually affect the archive table as well. But there is also something we could do with just the archive table. Perhaps we could create a page_archive table with columns named pa_id, pa_namespace, pa_title, pa_page_id, and pa_rev_count. The ar_namespace, ar_title, and ar_page_id fields would then be migrated to the new table, and we could add an ar_pa_id column to the archive table that points to a row in the page_archive table. When a page is deleted, all of its deleted revisions would share a single pa_id, and the total number of revisions the page had prior to the deletion would become the pa_rev_count. Finally, undeletion would delete row(s) from the page_archive table or lower the pa_rev_count field(s) as necessary.
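The page_archive proposal can be sketched with Python's built-in sqlite3. The pa_* and ar_pa_id column names come from the proposal; everything else (the stripped-down archive table, the helper names delete_page and undelete, and dropping the page_archive row once pa_rev_count reaches zero) is a hypothetical illustration. Moving revisions back into the revision table on undeletion is omitted.

```python
import sqlite3

# Hypothetical schema: namespace/title/page id live once per deletion in
# page_archive instead of being repeated on every archive row.
SCHEMA = """
CREATE TABLE page_archive (
    pa_id        INTEGER PRIMARY KEY,
    pa_namespace INTEGER NOT NULL,
    pa_title     TEXT    NOT NULL,
    pa_page_id   INTEGER,
    pa_rev_count INTEGER NOT NULL
);
CREATE TABLE archive (
    ar_id     INTEGER PRIMARY KEY,
    ar_rev_id INTEGER NOT NULL,
    ar_pa_id  INTEGER NOT NULL REFERENCES page_archive (pa_id)
);
"""


def delete_page(conn, namespace, title, page_id, rev_ids):
    """Archive all revisions of a page under one shared page_archive row."""
    cur = conn.execute(
        "INSERT INTO page_archive (pa_namespace, pa_title, pa_page_id, pa_rev_count)"
        " VALUES (?, ?, ?, ?)",
        (namespace, title, page_id, len(rev_ids)),
    )
    pa_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO archive (ar_rev_id, ar_pa_id) VALUES (?, ?)",
        [(rev_id, pa_id) for rev_id in rev_ids],
    )
    conn.commit()
    return pa_id


def undelete(conn, pa_id, rev_ids):
    """Restore some or all revisions; drop the page_archive row once empty."""
    restored = 0
    for rev_id in rev_ids:
        restored += conn.execute(
            "DELETE FROM archive WHERE ar_pa_id = ? AND ar_rev_id = ?",
            (pa_id, rev_id),
        ).rowcount
    conn.execute(
        "UPDATE page_archive SET pa_rev_count = pa_rev_count - ? WHERE pa_id = ?",
        (restored, pa_id),
    )
    conn.execute(
        "DELETE FROM page_archive WHERE pa_id = ? AND pa_rev_count <= 0", (pa_id,)
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    pa = delete_page(conn, 0, "Example", 123, [10, 11, 12])
    undelete(conn, pa, [10])  # partial undeletion lowers pa_rev_count to 2
    print(conn.execute("SELECT pa_rev_count FROM page_archive").fetchall())
```

Partial undeletion only lowers the count; the page_archive row disappears when the last archived revision is restored.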

Change 552339 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Alter revision for actor, comment, and MCR

https://gerrit.wikimedia.org/r/552339

Status: This task is now unblocked! I've uploaded a patch for the database part of it; however, there should probably be one before it that drops $wgMultiContentRevisionSchemaMigrationStage, or at least makes MediaWiki throw early if it's set to anything other than SCHEMA_COMPAT_NEW.
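A minimal sketch of the "throw early" option, written in Python rather than MediaWiki's PHP. The flag values are believed to mirror MediaWiki's SCHEMA_COMPAT_* bit flags, but treat them as assumptions and check MediaWiki's own definitions; ConfigError and check_mcr_migration_stage are hypothetical names.

```python
# Bit flags modeled on MediaWiki's SCHEMA_COMPAT_* constants (values assumed).
SCHEMA_COMPAT_WRITE_OLD = 0x01
SCHEMA_COMPAT_READ_OLD = 0x02
SCHEMA_COMPAT_WRITE_NEW = 0x10
SCHEMA_COMPAT_READ_NEW = 0x20
SCHEMA_COMPAT_NEW = SCHEMA_COMPAT_WRITE_NEW | SCHEMA_COMPAT_READ_NEW


class ConfigError(Exception):
    """Raised at startup when a setting can no longer be honoured."""


def check_mcr_migration_stage(stage: int) -> None:
    """Fail fast unless the MCR migration stage is exactly SCHEMA_COMPAT_NEW."""
    if stage != SCHEMA_COMPAT_NEW:
        raise ConfigError(
            "$wgMultiContentRevisionSchemaMigrationStage must be "
            f"SCHEMA_COMPAT_NEW (0x{SCHEMA_COMPAT_NEW:x}), got 0x{stage:x}"
        )


if __name__ == "__main__":
    check_mcr_migration_stage(SCHEMA_COMPAT_NEW)  # passes silently
    try:
        # A write-both stage would still touch the old columns, so reject it.
        check_mcr_migration_stage(SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_NEW)
    except ConfigError as err:
        print(err)
```

Checking for exact equality rather than testing individual bits means any mixed stage that still reads or writes the old columns is rejected.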

Anomie updated the task description.
Naike changed the task status from Open to Stalled. (Jun 5 2020, 4:08 PM)
Ladsgroup subscribed.

There might be some other cleanups left here and there, but we dropped the last piece of the migration (the revision_comment_temp table) yesterday, and this is now officially done.