Page MenuHomePhabricator

Remove all references to the rev_text_id and ar_text_id fields
Open, NormalPublic1 Story Points

Description

Any code that accesses rev_text_id and ar_text_id fields is unable to make use of MCR. In order to enable for MCR support, all references to these fields have to be removed from the code. They should be replaced with RevisionStore::getQueryInfo(), RevisionStore::newRevisionFromRow(), and RevisionRecord::getContent().

As a corollary, all references to text.old_id should be removed as well. If rev_text_id and ar_text_id are no longer there, there is no need to reference text.old_id (except in SqlBlobStore).

Related Objects

StatusAssignedTask
Declineddchen
OpenNone
OpenNone
DuplicateNone
OpenNone
ResolvedAbit
OpenNone
OpenNone
OpenNone
OpenNone
DuplicateNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
Resolvedppelberg
ResolvedKrinkle
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
ResolvedBstorm
OpenNone
StalledNone
OpenNone
OpenBPirkle
OpenBPirkle
Resolveddaniel
ResolvedBPirkle
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
ResolvedAnomie
ResolvedAnomie
Resolvedtstarling
ResolvedAnomie
ResolvedAnomie
ResolvedAnomie
ResolvedAnomie
ResolvedAnomie
ResolvedAnomie
ResolvedAnomie
ResolvedAnomie
Resolveddaniel
ResolvedAnomie
ResolvedAnomie
Resolved Marostegui
ResolvedAnomie
Resolvedtstarling
ResolvedAnomie
Resolveddaniel
ResolvedAnomie
ResolvedAnomie
ResolvedAnomie
Resolveddaniel
Resolveddaniel
Resolveddaniel
ResolvedNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
daniel updated the task description. (Show Details)Jun 27 2018, 5:53 PM

Change 442358 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] MCR WIP Remove references to rev_text_id and ar_text_id

https://gerrit.wikimedia.org/r/442358

daniel moved this task from Inbox to Next on the Multi-Content-Revisions board.Sep 17 2018, 1:07 PM
Addshore moved this task from ready to go to monitoring on the Wikidata board.Sep 19 2018, 7:03 AM

Task description refers to "RevisionStore::newRevisionRecordFromRow()". Should that be "RevisionStore::newRevisionFromRow()"?

Task description refers to "RevisionStore::newRevisionRecordFromRow()". Should that be "RevisionStore::newRevisionFromRow()"?

Yes, sorry about that. I'll fix the description.

daniel updated the task description. (Show Details)Feb 18 2019, 7:15 PM
greg added a project: Multimedia.Mar 7 2019, 10:59 PM

Question on one occurrence of these fields: is the maintenance script fixT22757.php still relevant?

Question on one occurrence of these fields: is the maintenance script fixT22757.php still relevant?

Looks like T22757 was resolved in 2010, so I think we can get rid of that script :) Just make a patch to removed it. Best ping Tim about it, just to be sure.

Thank you. Will ping Tim on that one when it is daylight for him.

One more question: rev_text_id is referenced in tests/phpunit/includes/ActorMigrationTest.php function provideInsertRoundTrip, which is a data provider for function testInsertRoundTrip. That function does many things with the SCHEMA_COMPAT_ constants, but related to the Actor migration, not the MCR Schema migration. Therefore the rev_text_id reference needs to be removed from that file. Am I correct about that?

Anomie added a comment.EditedMar 13 2019, 2:23 PM

Yes. You should just be able to delete the two references in that file.

When the code was added in rMW27c61fb1e94d: Add `actor` table and code to start using it rev_text_id did not have a default and so needed to be specified for the INSERTs to succeed. That was later changed in rMW8d15ade67267: Add default for revision.rev_text_id where missing, so now the rev_text_id can be omitted.

Thank you. With that in mind, here's my current understanding. The list below applies only to rev_text_id, not ar_text_id or text.old_id. When touching code for rev_text_id, I'll also remove any references to the other two fields. I'll then loop back and review/remove any remaining references to them.

rev_text_id references that will be removed/refactored in this task:

  • includes/Revision/RevisionStore.php (I'll touch only getQueryInfo(). All other references are either gated directly, or are in private functions with all calls gated)
  • includes/search/SearchPostgres.php
  • maintenance/Maintenance.php
  • maintenance/populateContentTables.php
  • maintenance/rebuildtextindex.php
  • maintenance/storage/checkStorage.php
  • maintenance/storage/compressOld.php
  • maintenance/storage/fixT22757.php (this script will be removed entirely, pending confirmation from @tstarling )
  • maintenance/storage/trackBlobs.php
  • tests/phpunit/includes/ActorMigrationTest.php

rev_text_id references that will not be touched in this task:

  • HISTORY (documentation)
  • includes/export/WikiExporter.php (gated, will be addressed in T198706 and T174031)
  • includes/export/XmlDumpWriter.php (will be addressed in T198706 and T174031)
  • includes/installer/MssqlUpdater.php (schema, historical)
  • includes/installer/MysqlUpdater.php (schema, historical)
  • includes/installer/PostgresUpdater.php (schema, historical)
  • includes/installer/SqliteUpdate.php (schema, historical)
  • includes/page/WikiPage.php (gated)
  • includes/Revision.php (gated and deprecated)
  • includes/Revision/RevisionStore.php (I'll touch only getQueryInfo(). All other references are either gated directly, or are in private functions with all calls gated)
  • maintenance/tables.sql (schema)
  • maintenance/archives/patch-rev_text_id-default.sql (schema, historical)
  • maintenance/archives/patch-rev_text_id.sql (schema, historical)
  • maintenance/mssql/tables.sql (schema)
  • maintenance/mssql/archives/patch-rev_text_id-default.sql (schema, historical)
  • maintenance/oracle/tables.sql (schema)
  • maintenance/postgres/tables.sql (schema)
  • maintenance/postgres/patch-textsearch_bug66650.sql (schema)
  • maintenance/sqlite/archives/initial-indexes.sql (schema)
  • maintenance/sqlite/archives/patch-comment-table.sql (schema)
  • maintenance/sqlite/archives/patch-rev_text_id-default.sql (schema, historical)
  • tests/phpunit/data/db/sqlite/tables-1.13.sql (historical)
  • tests/phpunit/data/db/sqlite/tables-1.15.sql (historical)
  • tests/phpunit/data/db/sqlite/tables-1.16.sql (historical)
  • tests/phpunit/data/db/sqlite/tables-1.17.sql (historical)
  • tests/phpunit/data/db/sqlite/tables-1.18.sql (historical)
  • tests/phpunit/data/db/sqlite/tables-1.19.sql (historical)
  • tests/phpunit/data/db/sqlite/tables-1.20.sql (historical)
  • tests/phpunit/data/db/sqlite/tables-1.21.sql (historical)
  • tests/phpunit/data/db/sqlite/tables-1.22.sql (historical)
  • tests/phpunit/data/db/sqlite/tables-1.23.sql (historical)
  • tests/phpunit/includes/RevisionMcrReadNewDbTest.php (testing)
  • tests/phpunit/includes/RevisionMcrWriteBothDbTest.php (testing)
  • tests/phpunit/includes/RevisionNoContentModelDbTest.php (testing)
  • tests/phpunit/includes/RevisionPreMcrDbTest.php (testing)
  • tests/phpunit/includes/Revision/create-pre-mcr-fiels.sql (testing)
  • tests/phpunit/includes/Revision/drop-pre-mcr-fields.sql (testing)
  • tests/phpunit/includes/Revision/McrReadNewRevisionStoreDbTest.php (testing)
  • tests/phpunit/includes/Revision/McrWriteBothRevisionStoreDbTest.php (testing)
  • tests/phpunit/includes/Revision/NoContentModelRevisionStoreDbTest.php (testing)
  • tests/phpunit/includes/Revision/PreMcrRevisionStoreDbTest.php (testing)
  • tests/phpunit/includes/Revision/RevisionQueryInfoTest.php (testing)
  • tests/phpunit/includes/Revision/RevisionStoreDbTestBase.php (testing)
  • tests/phpunit/includes/Revision/RevisionStoreTest.php (testing)

If anyone notices anything glaringly incorrect in that list, please let me know. Otherwise, I'll proceed and ask any further questions as they arise.

I'll also remove any references to the other two fields [text.old_id and archive.ar_text_id]

Note that text.old_id references are probably sill valid in low-level code like HistoryBlob.php and SqlBlobStore.

  • maintenance/populateContentTables.php

This maintenance script is intended to be called during update when rev_text_id still exists in the database, specifically to migrate data out of rev_text_id before dropping the column.

At some point it'll probably gain a check near line 190 something like

if ( !$this->dbw->fieldExists( $table, $fields['text_id'], __METHOD__ ) ) {
    $this->output( "No need to populate, $table.{$fields['text_id']} field does not exist\n" );
    return;
}

Whether that's done during this task or T184615, I have no real opinion.

Thanks.

MessageCache.php was not listed in my previous comment, because I've already incorporated Daniel's patch from June into my local codebase, but that'll be part of the final change as well.

includes/Revision/RevisionStore.php (I'll touch only getQueryInfo(). All other references are either gated directly, or are in private functions with all calls gated)

getQueryInfo() should have proper gating as well. If you think it does not, let's talk before you change things.

rev_text_id references that will be removed/refactored in this task:

  • includes/Revision/RevisionStore.php (I'll touch only getQueryInfo(). All other references are either gated directly, or are in private functions with all calls gated)

see my previous comment

  • includes/search/SearchPostgres.php

odd stuff, maybe @Anomie can help. Two uses seem completely pointless, the third one should probably use getQueryInfo?

  • maintenance/Maintenance.php

Hm, purgeRedundantText() looks like it needs a switch based on the migration stage. If the SCHEMA_READ_OLD is set, it should read rev_text_id, if SCHEMA_READ_NEW is set, it should read from content.cont_address, and use SqlBlobStore::getTextIdFromAddress to get the numeric ID (if any).

  • maintenance/populateContentTables.php

See Anomie'S comment

  • maintenance/rebuildtextindex.php

Just drop the 'text' flag from getQueryInfo() and the rev_text_id = old_id condition. And try that it still works ;)

  • maintenance/storage/checkStorage.php

Not sure this works at all any more for our wikis, since it has no batching. Anyway, the queries against the text table can stay as they are. The code that builds $this->oldIdMap needs to be modified much like the code in purgeRedundantText()

  • maintenance/storage/compressOld.php

$fields and friends should come from getQueryInfo(). $textRow and Revision::getRevisionText should be replaced by SqlBlobStore.
The logic also needs to be changed so it can process the blobs for multiple slots per revision. Probably needs input from Tim.

  • maintenance/storage/fixT22757.php (this script will be removed entirely, pending confirmation from @tstarling )

that

  • maintenance/storage/trackBlobs.php

Same approach as compressOld.php, same issue with processing multiple blobs per revision.

  • tests/phpunit/includes/ActorMigrationTest.php

I think Anomie siad somehwre that the rev_text_id stuff should just be removed, right?

I recommend doing separate patches for all of these.

includes/Revision/RevisionStore.php (I'll touch only getQueryInfo(). All other references are either gated directly, or are in private functions with all calls gated)

getQueryInfo() should have proper gating as well. If you think it does not, let's talk before you change things.

I looked again, you are correct. I saw the comment mentioning this task (T198341) and initially thought there was something to do. We just need to be sure callers don't pass a "text" option, but that's already covered under T198342 (which is probably a good candidate for my next task after I finish this one). So nothing to do to getQueryInfo() at the moment.

Thanks very much for your comments on the other items. I'll proceed down the list, making separate patches for each.

  • includes/search/SearchPostgres.php

odd stuff, maybe @Anomie can help. Two uses seem completely pointless, the third one should probably use getQueryInfo?

Note in the PostgreSQL schema the "text" table is named "pagecontent". That was done in rSVN15791; I don't know why quoting wasn't just used instead. The plan for T164898/T191231 includes renaming it back to "text".

It looks like that file needs some Technical-Debt cleanup to use the IDatabase querying methods rather than building SQL strings directly, but that alone would probably be out of scope here.

What is in scope is that it's trying to search directly against the text table, having added an extra column for that purpose, and will need to be converted to use a searchindex table like MySQL does.

  • tests/phpunit/includes/ActorMigrationTest.php

I think Anomie siad somehwre that the rev_text_id stuff should just be removed, right?

Correct. That was just above in T198341#5020970.

I recommend doing separate patches for all of these.

Here's one place where Daniel and I differ: I'd put most of these in one patch instead of having 10+ single-file changes.

SearchPostgres.php seems complicated enough that it would make sense to do it separately though, since it's going to involve a schema change for PG plus probably significant edits to the file itself.

Here's one place where Daniel and I differ: I'd put most of these in one patch instead of having 10+ single-file changes.

Ok, compromise: one patch for SearchPostgres, one for checkStorage.php and compressOld.php, and one for the rest.

Here's one place where Daniel and I differ: I'd put most of these in one patch instead of having 10+ single-file changes.

Ok, compromise: one patch for SearchPostgres, one for checkStorage.php and compressOld.php, and one for the rest.

Should compressOld.php and trackBlobs.php be in the same patch, as they both suffer from the " processing multiple blobs per revision" problem?

Should compressOld.php and trackBlobs.php be in the same patch, as they both suffer from the " processing multiple blobs per revision" problem?

Oh, right. Yea, probably.

Change 496816 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Remove many references to db fields being retired as part of MCR Schema Migration

https://gerrit.wikimedia.org/r/496816

Change 496816 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Remove many references to db fields being retired as part of MCR Schema Migration

https://gerrit.wikimedia.org/r/496816

Change 496816 merged by jenkins-bot:
[mediawiki/core@master] Remove many references to db fields being retired as part of MCR Schema Migration

https://gerrit.wikimedia.org/r/496816

  • includes/search/SearchPostgres.php

odd stuff, maybe @Anomie can help. Two uses seem completely pointless, the third one should probably use getQueryInfo?

Note in the PostgreSQL schema the "text" table is named "pagecontent". That was done in rSVN15791; I don't know why quoting wasn't just used instead. The plan for T164898/T191231 includes renaming it back to "text".
It looks like that file needs some Technical-Debt cleanup to use the IDatabase querying methods rather than building SQL strings directly, but that alone would probably be out of scope here.
What is in scope is that it's trying to search directly against the text table, having added an extra column for that purpose, and will need to be converted to use a searchindex table like MySQL does.

To make sure I'm heading in the right direction:

  • Postgres currently uses columns in specific tables (pagecontent.textvector and page.titlevector, both of type tsvector) to store search index information
  • Postgres also currently uses triggers/procedures to maintain these tables
  • And directly related to this task, searchPostgres.php currently and undesirably references fields rev_text_id and old_id

I should:

  • add a searchindex table, similar to the MySQL one, but adapted to Postgres data types
  • add related indexes/triggers/procedures to maintain this table
  • add necessary documentation and install/update support for these db changes
  • modify searchPostgres.php to use the new db changes

Questions:

  • I'm assuming we are using triggers/procedures to maintain the existing tsvector search columns because that works well in Postgres, and we want to retain that technique with the searchindex table. Am I right, or do we want to do more of this at the PHP level per SearchMySQL.php?
  • should I remove/refactor the existing columns/triggers/procedures and have all Postgres searches work off (only) the new searchindex table? Or should I leave that alone for now and use $wgMultiContentRevisionSchemaMigrationStage gating in searchPostgres.php to continue to use the existing columns if we're reading the old schema? Leaving the old and maintaining two sets of indexes at the db level sounds inefficient. But maybe there's a reason I'm missing to retain it.
  • there are a. bunch of "ts2_" prefixes floating around in postgres/tables.sql (ex. "CREATE TRIGGER ts2_page_text..."). Is that a meaningful naming convention I should be aware of and retain, or is that just "text search 2" to indicate it was a "new thing" at some point?
  • is there developer documentation on update.php/DatabaseUpdater.php/PostgresUpdater.php? I've looked through that code (and postgres/tables.sql). I have a formative understanding, but I need to be more solid on how all this works.

To make sure I'm heading in the right direction:

  • Postgres currently uses columns in specific tables (pagecontent.textvector and page.titlevector, both of type tsvector) to store search index information
  • Postgres also currently uses triggers/procedures to maintain these tables
  • And directly related to this task, searchPostgres.php currently and undesirably references fields rev_text_id and old_id

That sounds right to me, but then, I don't actually know how each works on postgres.

I should:

  • add a searchindex table, similar to the MySQL one, but adapted to Postgres data types
  • add related indexes/triggers/procedures to maintain this table
  • add necessary documentation and install/update support for these db changes
  • modify searchPostgres.php to use the new db changes

That sounds like "completely change how postgres does search". Now, my initial impression is that the way postgres does search is broken by design, since a database trigger has no way to correctly interpret the contents of the text table, which may be a serialized object, or compressed, or JSON, or a reference to an external storage system, or all of these. This has been the case for over ten years now.

But maybe I'm wrong about what it's trying to do. In any case, I'd recommend to simply try to keep the current functionality as-is, without trying to fix its semantics.

So, my (possibly completely wrong) understanding of how this currently works is: pagecontent.textvector contains the searchable version of the page text. We can keep that as it is, the trigger just needs to use the slots and content tables to connect rows in pagecontent to rows in revision, instead of using rev_text_id. From my current understanding, that should be it. But perhaps Brad has different ideas.

To make sure I'm heading in the right direction:

  • Postgres currently uses columns in specific tables (pagecontent.textvector and page.titlevector, both of type tsvector) to store search index information
  • Postgres also currently uses triggers/procedures to maintain these tables
  • And directly related to this task, searchPostgres.php currently and undesirably references fields rev_text_id and old_id

All correct.

I should:

  • add a searchindex table, similar to the MySQL one, but adapted to Postgres data types

Yes. Basically page.titlevector becomes searchindex.si_title and pagecontent.textvector becomes searchindex.si_text. searchindex.si_page works exactly like it does in MySQL.

  • add related indexes/triggers/procedures to maintain this table

Per T164898, we don't want to keep using triggers for this. SearchPostgres::update() should do it, similar to what SearchMySQL::update() does. Same for ::updateTitle().

  • add necessary documentation and install/update support for these db changes
  • modify searchPostgres.php to use the new db changes

Yes to both.

Questions:

  • I'm assuming we are using triggers/procedures to maintain the existing tsvector search columns because that works well in Postgres, and we want to retain that technique with the searchindex table. Am I right, or do we want to do more of this at the PHP level per SearchMySQL.php?

My guess is that whoever originally wrote the SearchPostgres code just thought triggers were nicer/"cleaner" than doing it from PHP. Unfortunately there's no explanation on r15335 or r15336 as to why they did it.

  • should I remove/refactor the existing columns/triggers/procedures and have all Postgres searches work off (only) the new searchindex table? Or should I leave that alone for now and use $wgMultiContentRevisionSchemaMigrationStage gating in searchPostgres.php to continue to use the existing columns if we're reading the old schema? Leaving the old and maintaining two sets of indexes at the db level sounds inefficient. But maybe there's a reason I'm missing to retain it.

Just transition straight to the new searchindex table, no need for a migration flag.

  • there are a. bunch of "ts2_" prefixes floating around in postgres/tables.sql (ex. "CREATE TRIGGER ts2_page_text..."). Is that a meaningful naming convention I should be aware of and retain, or is that just "text search 2" to indicate it was a "new thing" at some point?

I think it's just namespacing. It's not anything that you'd have to preserve, just name the new things whatever makes sense.

  • is there developer documentation on update.php/DatabaseUpdater.php/PostgresUpdater.php? I've looked through that code (and postgres/tables.sql). I have a formative understanding, but I need to be more solid on how all this works.

Not that I'm aware of. Feel free to ask any questions you have.

That sounds like "completely change how postgres does search". Now, my initial impression is that the way postgres does search is broken by design, since a database trigger has no way to correctly interpret the contents of the text table, which may be a serialized object, or compressed, or JSON, or a reference to an external storage system, or all of these. This has been the case for over ten years now.

Your initial impression is correct.

So, my (possibly completely wrong) understanding of how this currently works is: pagecontent.textvector contains the searchable version of the page text. We can keep that as it is, the trigger just needs to use the slots and content tables to connect rows in pagecontent to rows in revision, instead of using rev_text_id. From my current understanding, that should be it. But perhaps Brad has different ideas.

Yes, we could do something like that. Probably we wouldn't even need to change the triggers, just the SeachPostgres code to join revision↔slots↔content↔text (and that last join would probably be really hacky).

But IMO this is a situation where it would be better to cut out the bad code now instead of trying to hack around it and leave tech debt for the future. That way when we get to the part about wanting SearchMySQL and so on to handle more than just the main slot the changes to SearchPostgres will be similar rather than entirely different (and probably punted on again).

Yes, we could do something like that. Probably we wouldn't even need to change the triggers, just the SeachPostgres code to join revision↔slots↔content↔text (and that last join would probably be really hacky).
But IMO this is a situation where it would be better to cut out the bad code now instead of trying to hack around it and leave tech debt for the future. That way when we get to the part about wanting SearchMySQL and so on to handle more than just the main slot the changes to SearchPostgres will be similar rather than entirely different (and probably punted on again).

I agree we should do that, but that is way beyond the scope of this ticket, and should not block other things blocked on this ticket. The factors that go into prioritizing "get rid of rev_text_id" are very different from the ones that go into "fix postgres search". So I strongly suggest to split the efforts and track them in different tickets.

Now, if getting rid of rev_text_id in the postgres script is a hard thing to do, and would produce a lot of code that we'd have to throw away again once we address the searchindex issue, I agree that we should do both together (though the searchindex thing should still have a separate ticket). But that doesn't seem to be the case, your suggestion above makes it sound rather trivial.

So let's do the trivial thing first to move this ticket, and unblock the things blocked on it. And let's look into fixing postgres later. And by "later" I mean "when there is nothing more urgent or more important to be done".

The annoying bit is that the join condition for pagecontent will look something like old_id = substring( content_address from '^tt:(8[0-9]+)$' )::int, which hard-codes the "tt:". Thankfully PG has that regexp-based substring and casting NULL to int works right or it'd be even worse.

And let's look into fixing postgres later. And by "later" I mean "when there is nothing more urgent or more important to be done".

That sounds like code for "never". :(

And let's look into fixing postgres later. And by "later" I mean "when there is nothing more urgent or more important to be done".

That sounds like code for "never". :(

Don't get me wrong - I'd like to see this done. But I'd like to see twenty other things done more. And we, the core platform team, can only do so much. So we have to be picky about the rabbit holes we choose to fall into.

In the end, it's the job of product managers to decide which of all the things that we'd like to do are the most important, and should be worked on first.

Also, perhaps the necessary work can be done by someone outside the WMF, if they have an interest in making this work in postgres. Apparently it hasn't bothered anyone in the past ten years...

Change 499013 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Remove references to field rev_text_id

https://gerrit.wikimedia.org/r/499013

Change 500755 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Remove references to field rev_text_id

https://gerrit.wikimedia.org/r/500755

Per discussion in this task, I created new task T220450 for the additional Postgres changes (searchindex table).

Change 499013 merged by jenkins-bot:
[mediawiki/core@master] Remove references to field rev_text_id

https://gerrit.wikimedia.org/r/499013

WDoranWMF set the point value for this task to 1.Apr 15 2019, 3:15 PM

I still see a couple of usages of rev_text_id or ar_text_id in extensions that will need to be fixed[1]:

  • ActionDeletePagePermanently.php in DeletePagesForGood
  • Duplicator.page.php in Duplicator
  • bug-53687/fixOrphans.php in WikimediaMaintenance can probably be ignored

[1] https://codesearch.wmflabs.org/extensions/?q=-%3Erev_text_id%7C-%3Ear_text_id&i=nope&files=&repos=

One patch waiting for review, and a few things to do in extensions. Nearly done.

Change 500755 merged by jenkins-bot:
[mediawiki/core@master] Remove references to field rev_text_id

https://gerrit.wikimedia.org/r/500755

I'm also seeing three few ungated ar_text_id references in core:

  • ApiQueryAllDeletedRevisions::run()
  • ApiQueryDeletedRevisions::run()
  • ApiQueryDeletedrevs::execute()

These all use ar_text_id to load content by joining the text table. I'll refactor that to use the newer approach.

The ar_text_id field shows up a number of other times in core code, but after looking at them individually, I think the rest are intentional, and the above three are the only problematic references remaining.

Unless I'm missing something, @Anomie already did the bulk of the work for the first two (ApiQueryAllDeletedRevisions and ApiQueryDeletedRevisions) in commit 07842be3 as part of T200568, particularly by adding ApiQueryRevisionsBase::extractSlotInfo().

ApiQueryDeletedrevs would require larger changes because, as Anomie noted in the commit message for 07842be3 , "ApiQueryDeletedrevs wasn't touched,
since it has been deprecated since 1.25 anyway." May I remove it? If not, may I no-op it?

I'm posting a change for just the first two bullet points. If I can remove or no-op ApiQueryDeletedrevs, I'll do that as a separate change.

Change 506325 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Remove referencs to db field ar_text_id

https://gerrit.wikimedia.org/r/506325

Anomie added a comment.EditedApr 25 2019, 3:34 PM

I don't think ApiQueryDeletedrevs will need larger changes, actually. Like the other accesses to old_id, the JOIN with text there was just to prefetch fields for a call to Revision::getRevisionText(). Removing the JOIN will just make the later getRevisionText() call have to do an extra DB query.

The call to Revision::getRevisionText() without having done the JOIN will result in a deprecation warning once we switch MCR to write-new (instead of write-both), though. But I don't think that needs to block this task. For that matter, I should run a query on the Analytics data to see whether we can just remove ApiQueryDeletedrevs entirely yet.

For that matter, I should run a query on the Analytics data to see whether we can just remove ApiQueryDeletedrevs entirely yet.

T221869: Remove deprecated ApiQueryDeletedRevs

Possibly still worth removing the ar_text_id uses to close out this task though.

Change 506573 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/extensions/DeletePagesForGood@master] Remove references to db fields rev_text_id and ar_text_id

https://gerrit.wikimedia.org/r/506573

Change 506325 merged by jenkins-bot:
[mediawiki/core@master] Remove references to db field ar_text_id

https://gerrit.wikimedia.org/r/506325

Change 522228 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/extensions/Duplicator@master] Only use the revision.rev_text_id if it exists.

https://gerrit.wikimedia.org/r/522228

I've uploaded patches for all issues identified in this task. In an effort to keep this task focused on ar_text_id and rev_text_id, I created several related tasks for things that deserve attention and which are related to MCR, but not directly to rev_text_id or ar_text_id:

T220884 (additional DeletePagesForGood extension updates)
T227836 (maintenance script related to DeletePagesForGood, but possibly also generically useful)
T227840 (additional Duplicator extension updates)

While it may be slightly disingenuous to push the additional work to other tasks and declare this task done, I think it does make sense to move the extension work that is not part of the MCR Schema migration critical path to more focused tasks. Focused tasks are good practice, and extension-specific tasks are more likely to be noticed by people interested in those specific extensions.