Compacting the revision table
Open, Normal, Public

Description

This is the task for the schema change project documented on the wiki at User:Brion VIBBER/Compacting the revision table round 2. Part of the description is copied below.

Per ongoing discussion in ArchCom and at WikiDev17 about performance, future requirements, and future-proofing for table size, it's proposed to do a major overhaul of the revision table, combining the following improvements:

  • Normalization of frequently duplicated data into separate tables, reducing the duplicated strings to integer keys (sketched below)
  • Separation of content-specific from general revision metadata to support:
    • Multi-content revisions, allowing multiple content blobs to be stored per revision -- not related to compaction, but of great interest for the structured data additions planned for multimedia and articles
  • General reduction in revision table width / on-disk size, which will make schema changes easier in future
  • Avoiding inconsistencies in live index deployments
    • Ideally all indexes should fit on all servers, making it easier to switch database backends around in production
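
As a rough illustration of what the normalization means (the table and column names below are placeholders for this sketch, not a committed schema): the wide, frequently repeated string columns on revision are replaced by narrow integer keys into small, deduplicated lookup tables.

  -- Sketch only: the kind of lookup table the duplicated strings move into.
  CREATE TABLE actor (
    actor_id   BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    actor_user INT UNSIGNED NULL,           -- user id, NULL for anonymous edits
    actor_name VARBINARY(255) NOT NULL,     -- user name or IP address, stored once
    UNIQUE KEY actor_name (actor_name)
  );

  -- revision would then replace the wide, repeated rev_user_text / rev_comment
  -- strings with narrow integer keys such as rev_actor and rev_comment_id.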

Related Objects


Change 350097 had a related patch set uploaded (by Brion VIBBER):
[mediawiki/core@master] WIP - provisional revision table restructure

https://gerrit.wikimedia.org/r/350097

Reedy moved this task from Unsorted to Cleanup on the Schema-change board. Apr 26 2017, 3:35 PM

@brion I asked some questions about including a revert detection flag and various other bits of patrolling/edit review data on the RfC talk page, but they were never responded to. Now that this project is picking up steam, could you take a look at that? I feel like if we're ever going to do tasks like T19237 that propose modifying the schema of the revision table (or, alternatively, adding an auxiliary table with more information about revisions), now is the time, since this massive schema change provides a "free" opportunity to do other related/adjacent schema changes. If we decide that we're not going to do these things, that's fine, but let's make a decision before it's too late rather than after.

daniel added a subscriber: daniel. May 5 2017, 3:15 PM

This RFC is (tentatively) scheduled for a public discussion on #wikimedia-office on May 10th, 2pm PDT, 21:00 UTC, 23:00 CEST. @brion, will you be around?

> This RFC is (tentatively) scheduled for a public discussion on #wikimedia-office on May 10th, 2pm PDT, 21:00 UTC, 23:00 CEST. @brion, will you be around?

Another good question is whether @jcrespo and/or @Marostegui would be able to attend.

It depends on whether I will have been working for 13 hours that same day before the meeting starts, or whether something is down at the time :-)

Krinkle moved this task from Backlog to Schema on the MediaWiki-Database board. May 8 2017, 1:01 AM

I am also not sure if I will be around that late. Can't it be moved a bit earlier so it is not that late for Europeans?

Anomie added a comment. May 8 2017, 1:48 PM

> can't it be moved a bit earlier so it is not that late for Europeans?

I'd think that's largely up to Tim, since for him the meeting is at 7am. Timezones make it difficult to have everyone in the same meeting.

Oh, I forgot he is based in Australia!

jcrespo added a comment (edited). May 8 2017, 2:20 PM

We can have them at 8 CEST.

daniel added a comment. May 8 2017, 2:42 PM

@jcrespo Not if you want to have people from California; that's 11pm for them: https://www.timeanddate.com/worldclock/converter.html?iso=20170510T060000&p1=5671&p2=240&p3=224

Well, I guess we can ask @brion if he would be ok with a late night call...

Also, for the record, I'm personally not happy about calls at 8am.

We all live around the globe! And if we have to decide, I think your work and input is way more important than my unintelligible rants :-) I can still respond to questions asynchronously (in advance or after the fact).

Pages 6-11 summarize my thoughts on this:

(and probably I am being too naive) and I think @brion is going in a good direction, so I do not really have much else to say except to answer further questions.

brion added a comment. May 8 2017, 4:21 PM

@daniel yes, I will be around. Apologies, my schedule's been a bit bumpy lately! I'm ok with funky meeting times to accommodate the others, just warn me ahead so I can make sure everything's in my calendar.

@jcrespo I've got some general concerns which I'd love to hear more from you on:

  • Is there a danger to data locality / speed / caching etc. when we split things up into multiple tables? (This is the main warning feedback I get from domas.) I feel like the counter-force here is "with good table design the indexes will be in memory and the fetches are probably cheap anyway"; does that sound right? (We're already fetching user_name in a lot of places to cover user_text, as I understand it, so we're already on this path I think.)
  • Migration will require adding a few fields to revision first, before we can migrate data and then drop the old fields. Do we feel safe enough in terms of available disk space etc.? (I feel like if we were that afraid it would break immediately I'd have heard this already, but I want to triple-check ;)

@Catrope I'd be inclined to use a separate table for stuff like revert tracking, on the basis that it's tracking revert _events_ which happen _to_ revisions. Will take a peek and make a recommendation.

In the meantime I'll get back to cleaning up the work patch; got sidetracked on Revision internal API evolution for a bit.

Anomie added a comment. May 8 2017, 5:09 PM

> I can still respond to questions asynchronously (in advance or after the fact).

Probably the biggest issue last time was that no one knew whether adding a new column to revision would be slow (later answered at T153333#3234049, "yes"), so time was spent discussing that.

And we hadn't thought to ask you beforehand so we didn't have your view on the "64-bit hash stored as bigint" idea, although that option was mostly not discussed in the meeting anyway.

One question that may come up in this meeting (and/or in a meeting I have tomorrow at 20:00 UTC) is the filtering of rows on Labs, in the context of the comment table, and of the actor and content tables as well. In T153333#3239160 you didn't seem to think that having Labs filter comment row visibility based on revision-deletion status would be a problem; does that hold true for actors (which would be largely pointless if not deduplicated) and content (which will probably be deduplicated for some easy cases but not necessarily for more difficult cases) too?

I think the questions you both ask are good, and I do not have a perfect answer :-/. After all, I can only give you my opinion based on the work I do and things that I experience but you may not be doing every day (like schema changes). To clarify, the answer to most questions asking "Can we?" is "technically yes": we can alter the revision table as it is now and add a new column. Now, if you ask me whether we can do it in less than one year while not taking most of our time bandwidth, that is a different story... On other things I can be too conservative based on previous experiences, but I could be wrong. That is why I do not like to weigh in on things I have not tested myself, like I did with https://phabricator.wikimedia.org/T162138#3159663. I also have some personal biases: I like simple designs and iterative cycles (but you as developers of code may not want to create so many small releases).

Re:

> is there a danger to data locality / speed / caching etc when we split things up into multiple tables?

Of course, but that is why the design has to be *done around that*. I commented on the good thing about recentchanges: we duplicate some of the data, but in exchange we only keep a smaller subset of it, the freshest part.

That is one of the main issues against hashes as primary keys: monotonically increasing ids have that (locality) advantage, while hashes are cryptographically random (and remember that rows on InnoDB are stored in PK order). Even if we do not yet think about normalization, the latest indexes will be accessed more frequently, getting that advantage, except divided into smaller chunks. Do most of the revision accesses require comments? Don't take my opinion for granted; do tests like the one I did above! :-) Regarding actor, that is a more difficult question. My arguments in favor of it, given the small size of user_text, are more about rename operations than about revision optimization. Also, purely egoistically, smaller tables == faster schema changes == I can keep up with your needs better in the future. This is why I am adding primary keys now: even if I am wrong, changing it in the future is much easier than without them.

Re:

> do we feel safe enough in terms of available disk space etc?

We are in the middle of renewing servers, and after that we should have one TB free available on every server. The revision table is the largest one on some servers, and while in some cases there is the possibility of doing updates in place, you should keep in mind that, in order to do an alter on a table, you need as much free space as the table size plus some buffer. But you shouldn't worry about that at all. Disk is cheap; when I talk about sizes, I mean size thinking about memory and the buffer pool. Getting the comments aside would be great for me (aka the servers), because the alter table would take half the time and the space used temporarily would be half of it (allowing most of it to fit into memory, which means much faster, etc. etc.).

However, I am afraid of telling you to do X or Y, because in the past I have said so and you took my words too seriously. I also do not have a good metric of "how much time does it take to code X" vs. "how much does it take to do X maintenance", or how much available time you have to do so vs. how much I have. I should be serving your needs, not you mine, so I should only be giving suggestions on what's easier; development should not be compromised by maintenance unless it is a recurring thing (e.g. renaming users being a burden, inefficient storage taking too much memory).

A common misconception is that joins are slow. They are not, and in particular they are "faster" if, instead of a monolithic text-based table (cough, *links tables), we have several (big)int-filled in-memory tables.
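
Purely for illustration, here is the kind of query shape that stays cheap after normalization, assuming the hypothetical comment table and rev_comment_id column from the sketch earlier in this task:

  -- Illustrative only: after comment normalization, fetching recent history
  -- with edit summaries costs one extra join, but it is resolved through
  -- integer keys that sit comfortably in the buffer pool.
  SELECT rev_id, rev_timestamp, comment_text
  FROM revision
  JOIN comment ON comment_id = rev_comment_id
  WHERE rev_page = 12345
  ORDER BY rev_timestamp DESC
  LIMIT 50;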

Re:

> you didn't seem to think that having Labs filter comments row visibility based on revision-deletion

I don't, because there are similar problems on (I think it is) pageprops or user props. How difficult or easy it actually is would depend on the actual structure. If there was something that required looking up a separate table, we could put triggers in place to add the information to the same table to simplify this. Alternatively, we could have jobs generating deferred summaries of tables, like we currently have for watchlist: we cannot share the watchlist table, but there is a process that summarizes it and adds only stats, and people in general understood why raw access was not possible, while being thankful for having a summary pre-done instead of a view doing a GROUP BY, which is very slow.

My biggest issue in the past is that there was very little participation from developers on which columns were public, which ones were grey ("sometimes public, sometimes not, sometimes depending on the wiki"), and which were completely private. We on infrastructure do not have much visibility into that, especially for random obscure extensions. Knowing that you are concerned about that already gives me a positive evaluation of the state of things. I think the biggest problem with Labs will be "breaking" some tools with schema changes, especially ahead of a public release, so we should loop in the tools community soon.
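
Purely as an illustration of one of the options mentioned above (filtering at the replica-view level rather than with triggers or summary jobs), here is a sketch of a replica-side view. It assumes the hypothetical rev_comment_id column from the earlier sketch and the existing rev_deleted bit for suppressed comments:

  -- Sketch only: a Labs/replica view that hides the pointer to a comment
  -- whenever the comment has been revision-deleted (rev_deleted bit 2),
  -- so the underlying comment row is not reachable from that revision.
  CREATE VIEW revision_public AS
  SELECT rev_id, rev_page, rev_timestamp,
         IF(rev_deleted & 2, NULL, rev_comment_id) AS rev_comment_id
  FROM revision;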

Please do not give me too much attention; I believe you are all on the right track and know much more about MediaWiki than I do (you have 15 more years of experience than I do, you know every single quirk, and you know how to develop, unlike me!). I just handle the hardware! :-) I think design-by-committee sometimes overthinks things, and we need more wrong tests that are discarded quickly, and smaller incremental improvements that motivate both the users and the developers who do them. But again, that is my opinion.

Sorry for the wall of text.

This RFC was discussed in a public IRC meeting on May 10.

Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-05-10-21.02.html
Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-05-10-21.02.log.html

It seems the effort is on track, but various details still need thought.

brion added a comment (edited). May 20 2017, 1:30 PM

Per discussion at wmhack 2017 we're planning to split this into separate pieces we can work on in parallel and potentially deploy separately:

  • comment table (T153333) plus a temporary rev<->comment association table (see the SQL sketch after these lists)
    • consider also starting on use of comment in other tables, since it'll be more tractable
      • revision, archive, logging, recentchanges, image (img_description), oldimage (oi_description), filearchive (fa_description, fa_deleted_reason), ipblocks (ipb_reason), protected_titles (pt_reason)
      • extension tables: cu_log.cul_reason, cu_changes.cuc_comment, flow_revision.rev_mod_reason, globalblocks.gb_reason, global_block_whitelist.gbw_reason
    • some tables may need an association table added as well, others may be small enough
  • actor table plus a temporary rev<->actor association table
    • consider also starting on use of actor in other tables, since it'll be more tractable
      • revision, archive, logging, recentchanges, image, oldimage, filearchive, ipblocks (ipb_by)
    • some tables may need an association table added as well
  • content plus slot association table

(Each will have its own transition mode.)

and then:

  • dropping old fields and merging the comment & actor associations into revision/etc.
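
As a rough illustration of the transitional shape for the comment piece (table and column names here are placeholders, not a committed design): the new comment table holds each distinct summary once, and a temporary association table carries the rev<->comment link so that revision itself is not altered until the final "dropping old fields" step.

  -- Sketch only: transitional tables for the comment piece.
  CREATE TABLE comment (
    comment_id   BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    comment_text BLOB NOT NULL                       -- each distinct summary stored once
  );

  -- Written alongside revision during the transition; later merged into
  -- revision as an integer column and then dropped.
  CREATE TABLE revision_comment_temp (
    revcomment_rev        INT UNSIGNED NOT NULL,     -- -> revision.rev_id
    revcomment_comment_id BIGINT UNSIGNED NOT NULL,  -- -> comment.comment_id
    PRIMARY KEY (revcomment_rev, revcomment_comment_id)
  );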

MUST CONFIRM WITH DBAs WHICH TABLES ARE REASONABLE TO ALTER IMMEDIATELY AND WHICH ARE TOO BIG AND NEED EXTRA TRANSITION LOVE!

We hope to have core code workable in transitional state by Wikimania. :) Aggressive but we think doable. 'Long tail' of fixes may take longer, as will full deployments.

Need to get a better handle on what the extension landscape is: some must be updated for use of core tables, others will be potential candidates for making use of comment & actor.

Need to think about what to do about other DB backends.

One thing that we discussed briefly again during wmhack was partitioning/sharding of database tables.
We may want to put the page ID into the slot table, and perhaps also into the content table, to allow them to be sharded based on the page ID.

The assumption is that if multiple revisions are needed in a single request, they typically belong to the same page.
Note that adding the page ID to the content table prevents content rows from being re-used across pages. Such sharing, while theoretically possible, was never planned, however, since it is expected to yield little benefit for the expected use cases: the content of different pages will rarely be exactly the same. This assumption may fail for very "narrow" content slots, e.g. for tracking quality assessments, since many pages would potentially share the same quality assessment, e.g. "stub".
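
For concreteness, here is a sketch of what the MCR slot and content tables could look like with the page ID carried along as a sharding key; the table and column names are placeholders, not the agreed design.

  -- Sketch only: hypothetical MCR tables keyed by page so both can be
  -- sharded/partitioned by page.
  CREATE TABLE slots (
    slot_page     INT UNSIGNED NOT NULL,      -- page the revision belongs to
    slot_revision INT UNSIGNED NOT NULL,      -- -> revision.rev_id
    slot_role     SMALLINT UNSIGNED NOT NULL, -- e.g. main slot vs. auxiliary slot
    slot_content  BIGINT UNSIGNED NOT NULL,   -- -> content.content_id
    PRIMARY KEY (slot_page, slot_revision, slot_role)
  );

  CREATE TABLE content (
    content_page    INT UNSIGNED NOT NULL,    -- ties each blob to one page,
                                              -- which precludes cross-page reuse
    content_id      BIGINT UNSIGNED NOT NULL,
    content_address VARBINARY(255) NOT NULL,  -- pointer into blob storage
    PRIMARY KEY (content_page, content_id)
  );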

> The assumption is that if multiple revisions are needed in a single request, they typically belong to the same page.

The only really common case where slots/content for multiple revisions are needed that I can think of would be diffs, where the new and old content needs to be fetched.

I don't know how common it is for anything else to either dive into a page's history with content fetching for multiple revisions at once (where a page_id sharding would be useful) or to fetch recent changes with content for patrolling or other analysis (where a page_id sharding would be in the way).

> Note that adding the page ID to the content table prevents content rows from being re-used across pages.

It would also mean that content table rows would need to be updated when someone does Special:MergeHistory.

> The assumption is that if multiple revisions are needed in a single request, they typically belong to the same page.

That is not entirely correct. You ask for many revisions when (example query shapes follow this list):

  • you ask for all revisions of a page (s/ALL/501)
  • you ask for all revisions of a user (s/... you get the idea)
  • you ask for all revisions happening after a particular timestamp
  • If the assumption were true, revision would not have 6 indexes, because all accesses would be by PK (or a single key: user, page, etc.)
  • More examples: filtering revisions, even if you get only 1, normally requires reading a large number of non-returned revisions, as used to happen when users clicked on "<-- previous revision". Range scans like that are very common, and they are the main reason why revision is a huge pain in query time
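
To make those access patterns concrete, here are illustrative query shapes (using columns from the 2017 revision schema); only the first one lines up with sharding by page:

  -- History of one page: benefits from locality / sharding by page.
  SELECT rev_id, rev_timestamp
  FROM revision
  WHERE rev_page = 12345
  ORDER BY rev_timestamp DESC
  LIMIT 501;

  -- Contributions of one user: touches many pages, so a page-based
  -- shard key does not help here.
  SELECT rev_id, rev_page, rev_timestamp
  FROM revision
  WHERE rev_user = 42
  ORDER BY rev_timestamp DESC
  LIMIT 501;

  -- Everything edited after a point in time, across all pages.
  SELECT rev_id
  FROM revision
  WHERE rev_timestamp > '20170501000000'
  ORDER BY rev_timestamp
  LIMIT 501;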

Note that I am not saying we should not shard, or that other methods should not happen; I am just commenting on the current reality of things, which makes it not as simple as just deploying partitioning. I have expressed many times the problem with large range scans on revision querying millions of rows. Text or external storage, by contrast, despite being very large tables, are very efficient because they are always accessed through a primary key.

One thing that many people may not know is that we already have partitioning (by user) deployed on the special slaves, to help with performance on contributions counting and similar. It works, but in many cases it makes other kinds of queries slower.

Again, I am not pushing this back, I am happy to help in any kind of experimentation in this regard- although it may be easier for newer features like MCR tables, where it is designed from 0. I was just thinking that the direction was to avoid sharding by trying other things first.

daniel added a comment.Jun 1 2017, 3:34 PM

@jcrespo We can try other things first, but if we want to shard by page later, we'll have to add a column to the tallest of the tables. I thought we should rather do that right away. Having the page ID there may prove convenient for other things too.

Also note that my current idea is to shard the slot and maybe content tables. These would not be used when listing user contributions, only when loading actual page content.

Sharding the revision table is a different can of worms. But the revision table already has the page ID, so it's not something we need to discuss in this context.

MaxSem added a subscriber: MaxSem. Jul 10 2017, 10:49 PM
tstarling removed brion as the assignee of this task. Oct 17 2017, 1:07 AM
tstarling added a subscriber: brion.
daniel moved this task from Inbox to Epic on the Multi-Content-Revisions board.

What ever happened to this project? Is it on ice?

jcrespo added a comment (edited). Jan 10 2018, 11:51 AM

@kaldari I think most of the effort is currently going towards comment separation/normalization in T6715/T166733 (deployment in progress) and user normalization in T167246. This is such a generic, epic ticket that most work is done on those subtickets. It is not fast, because it requires a lot of code changes, technical debt cleanup (old data in a bad format) and long-running schema changes, but as far as I can see it is going well.

There is also work on MCR T174043, which should split the revision table into 2.

I think once those 3 refactorings are done, reevaluation will happen and we will see how compact revision is, improving at the same time how quickly newer schema changes can be done.

More specifically,

> More specifically,

(Added that last one as T184615.)

After those three components are complete, the idea is to declare this task "done", and if there is scope for further discussion at that point it should be its own task?

Change 350097 abandoned by Brion VIBBER:
WIP - provisional revision table restructure

Reason:
Abandoning this old experimental patch.

https://gerrit.wikimedia.org/r/350097

Nirmos added a subscriber: Nirmos. Apr 19 2018, 3:54 PM
daniel moved this task from Epic to Watching on the Multi-Content-Revisions board. May 7 2018, 10:44 AM

Changes that affect the revision table will usually affect the archive table as well. But there is also something that we could do with just the archive table. Perhaps we could create a page_archive table with columns named pa_id, pa_namespace, pa_title, pa_page_id, and pa_rev_count. The ar_namespace, ar_title, and ar_page_id fields would then be migrated to the new table, and we could add an ar_pa_id column to the archive table that points to a row in the page_archive table. When a page is deleted, all of its deleted revisions would share a single pa_id, and the total number of revisions the page had prior to the deletion would become the pa_rev_count. Finally, undeletion would delete row(s) from the page_archive table, or lower the pa_rev_count field(s) if necessary.
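
A sketch of that proposal in DDL form, using the column names from the comment above (types are guesses, not a worked-out design):

  -- Sketch only: proposed page_archive table for deduplicating page-level
  -- data out of the archive table.
  CREATE TABLE page_archive (
    pa_id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    pa_namespace INT NOT NULL,            -- namespace of the deleted page
    pa_title     VARBINARY(255) NOT NULL, -- title of the deleted page
    pa_page_id   INT UNSIGNED NULL,       -- page_id the page had before deletion
    pa_rev_count INT UNSIGNED NOT NULL    -- revision count at deletion time
  );

  -- archive would then gain an ar_pa_id column referencing page_archive,
  -- replacing ar_namespace, ar_title and ar_page_id on every archived row.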