Run maintenance/FlowFixLinks.php
Closed, Resolved (Public)

Event Timeline

matthiasmullie raised the priority of this task from to High.
matthiasmullie updated the task description.

When we discussed T109814 (Links from Flow topics to special pages are incorrectly included in link tables) at triage, we thought it would be good to fix that first, since it should be straightforward, and then we only need to run this script once.

DannyH set Security to None.
DannyH lowered the priority of this task from High to Normal. Aug 27 2015, 7:11 PM
DannyH added a subscriber: DannyH.

 [enwiki]> select count(*) from pagelinks where pl_namespace= -1;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Prod still has 104 for enwiki_p:

select count(*) from pagelinks where pl_namespace= -1;

I'll run this Monday at 9 AM Pacific.

We were wrong. This has a hard dependency on T113245: Set wgFlowMigrateReferenceWiki to false in production.

The reason is that removeVirtualPages needs to only get matches from the current wiki, or it will fail to deserialize them with:

aawiki:  Catchable fatal error: Argument 3 passed to Flow\Model\WikiReference::__construct() must be an instance of Title, null given, called in /srv/mediawiki/php-1.26wmf23/extensions/Flow/includes/Model/WikiReference.php on line 67 and defined in /srv/mediawiki/php-1.26wmf23/extensions/Flow/includes/Model/WikiReference.php on line 24

Example in production:

mysql:research@x1-analytics-slave [flowdb]> SELECT ref_src_wiki, ref_src_namespace, ref_src_title, ref_target_namespace, ref_src_title FROM flow_wiki_ref WHERE ref_src_namespace = 102 AND ref_src_title = 'Astronomie/Porte_des_étoiles' AND ref_target_namespace = -1;
+--------------+-------------------+-------------------------------+----------------------+-------------------------------+
| ref_src_wiki | ref_src_namespace | ref_src_title                 | ref_target_namespace | ref_src_title                 |
+--------------+-------------------+-------------------------------+----------------------+-------------------------------+
| frwiki       |               102 | Astronomie/Porte_des_étoiles  |                   -1 | Astronomie/Porte_des_étoiles  |
+--------------+-------------------+-------------------------------+----------------------+-------------------------------+
1 row in set (0.00 sec)

This shows up in Flow.log on fluorine:

2015-09-21 16:20:22 terbium aawikibooks Flow INFO: Flow\Model\WikiReference::makeTitle: Invalid title.  Namespace: 102, Title text: Astronomie/Porte_des_étoiles
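
The failure mode above, where a reference row that belongs to another wiki is processed on the current one, could be avoided by filtering on ref_src_wiki before any objects are built. A minimal sketch under that assumption follows. The function `filterRefsForWiki` and the array-shaped rows are hypothetical stand-ins for illustration; they are not Flow's actual storage API.

```php
<?php
/**
 * Hypothetical helper: keep only the reference rows whose source wiki
 * matches the wiki the script is currently running on.
 *
 * @param array[] $rows Rows shaped like flow_wiki_ref results (assumed)
 * @param string $wikiId Current wiki ID, e.g. 'aawikibooks'
 * @return array[] Rows that are safe to deserialize on this wiki
 */
function filterRefsForWiki( array $rows, $wikiId ) {
	return array_values( array_filter( $rows, function ( $row ) use ( $wikiId ) {
		return $row['ref_src_wiki'] === $wikiId;
	} ) );
}
```

With the production row shown above, a run on aawikibooks would skip the frwiki entry instead of trying to build a Title for a namespace (102) that does not exist locally.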

So we have to wait for T113245: Set wgFlowMigrateReferenceWiki to false in production, and the script should explicitly check this global and bail.
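
A minimal sketch of the kind of guard described above, assuming the check is a plain boolean test on the configuration value. The function name `assertReferenceMigrationComplete` is hypothetical and is not taken from the actual patch.

```php
<?php
/**
 * Hypothetical guard: refuse to proceed while reference wiki migration
 * is still enabled. Illustration only; not the code from the real change.
 *
 * @param bool $migrateReferenceWiki Value of $wgFlowMigrateReferenceWiki
 * @throws RuntimeException If migration is still enabled
 */
function assertReferenceMigrationComplete( $migrateReferenceWiki ) {
	if ( $migrateReferenceWiki ) {
		throw new RuntimeException(
			'$wgFlowMigrateReferenceWiki is still enabled; rerun once it is set to false.'
		);
	}
}
```

Calling this at the top of the script with the global's value would make it bail before touching any rows.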

Change 239882 had a related patch set uploaded (by Mattflaschen):
Fix FlowFixLinks to use ref_src_wiki and require migration be complete

https://gerrit.wikimedia.org/r/239882

Change 239882 merged by jenkins-bot:
Fix FlowFixLinks to use ref_src_wiki and require migration be complete

https://gerrit.wikimedia.org/r/239882

@Etonkovidova If you want to test this in Beta, you can run it as a normal maintenance script, with the --force flag added.

Not in QA review column because it hasn't actually been run in production yet.

This issue just happened not to crop up in Beta when it was run there earlier.

Now that $wgFlowMigrateReferenceWiki is off everywhere, I tried to run the script, but it didn't get very far:

catrope@terbium:/srv/mediawiki/php-1.26wmf24$ mwscript extensions/Flow/maintenance/FlowFixLinks.php --wiki=testwiki
Removed 3 links to special pages.
Rebuilt links for 300 workflows...

Warning: Invalid argument supplied for foreach() in /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/Data/Compactor/FeatureCompactor.php on line 70

Fatal error: Unsupported operand types in /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/Data/Compactor/FeatureCompactor.php on line 81

I deployed Matthias's debugging patch (https://gerrit.wikimedia.org/r/#/c/240439/) and tried again:

catrope@terbium:~$ mwscript extensions/Flow/maintenance/FlowFixLinks.php --wiki=testwiki
Removed 0 links to special pages.
Rebuilt links for 300 workflows...
[86218a63] [no req]   Flow\Exception\DataModelException from line 71 of /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/Data/Compactor/FeatureCompactor.php: Cached data for "flowdb:flow_ref:wiki:by-source:v3:Parser's_"broken"_ _(page)_&_grill:testwiki:1:4.7"" should map to a valid query: 
Backtrace:
#0 /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/Data/Index/FeatureIndex.php(485): Flow\Data\Compactor\FeatureCompactor->expandCacheResult(array, array)
#1 /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/Data/ObjectLocator.php(80): Flow\Data\Index\FeatureIndex->findMulti(array, array)
#2 /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/Data/ObjectLocator.php(55): Flow\Data\ObjectLocator->findMulti(array, array)
#3 [internal function]: Flow\Data\ObjectLocator->find(array)
#4 /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/Data/ManagerGroup.php(129): call_user_func_array(array, array)
#5 /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/Data/ManagerGroup.php(141): Flow\Data\ManagerGroup->call(string, array)
#6 /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/LinksTableUpdater.php(126): Flow\Data\ManagerGroup->find(string, array)
#7 /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/LinksTableUpdater.php(49): Flow\LinksTableUpdater->getReferencesForTitle(Title)
#8 /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/Content/BoardContent.php(191): Flow\LinksTableUpdater->mutateParserOutput(Title, ParserOutput)
#9 /srv/mediawiki/php-1.26wmf24/includes/content/AbstractContent.php(230): Flow\Content\BoardContent->getParserOutput(Title, NULL, NULL, boolean)
#10 /srv/mediawiki/php-1.26wmf24/extensions/Flow/includes/LinksTableUpdater.php(36): AbstractContent->getSecondaryDataUpdates(Title)
#11 /srv/mediawiki/php-1.26wmf24/extensions/Flow/maintenance/FlowFixLinks.php(94): Flow\LinksTableUpdater->doUpdate(Flow\Model\Workflow)
#12 /srv/mediawiki/php-1.26wmf24/extensions/Flow/maintenance/FlowFixLinks.php(45): FlowFixLinks->rebuildCoreTables()
#13 /srv/mediawiki/php-1.26wmf24/maintenance/Maintenance.php(1318): FlowFixLinks->doDBUpdates()
#14 /srv/mediawiki/php-1.26wmf24/maintenance/doMaintenance.php(103): LoggedUpdateMaintenance->execute()
#15 /srv/mediawiki/php-1.26wmf24/extensions/Flow/maintenance/FlowFixLinks.php(107): require_once(string)
#16 /srv/mediawiki/multiversion/MWScript.php(97): require_once(string)
#17 {main}

Catrope reassigned this task from Catrope to matthiasmullie. Sep 24 2015, 6:59 PM

That key looks suspicious with the space in it.
It looks like that space corresponds to a + in the title.
var_dump($wgMemc->get('flowdb:flow_ref:wiki:by-source:v3:Parser\'s_"broken"_+_(page)_&_grill:testwiki:1:4.7'));
This yields a result.
So Flow or some BagOStuff is encoding the key inconsistently, it seems.
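
One way a + can turn into a space is decoding a value with urldecode() when it was never URL-encoded with the matching function. The sketch below only demonstrates that standard PHP behaviour on the title fragment from the key above; it is an assumption that this resembles the actual mismatch, not a confirmed diagnosis.

```php
<?php
// Title fragment as it appears in the cache key; "+" is a literal character here.
$title = 'Parser\'s_"broken"_+_(page)_&_grill';

// urldecode() treats "+" as an encoded space, so the literal "+" is lost.
$lossy = urldecode( $title );

// rawurldecode() only decodes %XX sequences and leaves "+" untouched.
$safe = rawurldecode( $title );

var_dump( $lossy === $title ); // bool(false)
var_dump( $safe === $title );  // bool(true)
```

The lossy result contains "_ _" where the original has "_+_", which matches the key reported in the exception message.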

Change 241597 had a related patch set uploaded (by Matthias Mullie):
Fix Memcached key decode

https://gerrit.wikimedia.org/r/241597

Change 241597 merged by jenkins-bot:
Fix Memcached key decode

https://gerrit.wikimedia.org/r/241597

Change 242770 had a related patch set uploaded (by Catrope):
Fix Memcached key decode

https://gerrit.wikimedia.org/r/242770

Change 242906 had a related patch set uploaded (by Mattflaschen):
Fix Memcached key decode

https://gerrit.wikimedia.org/r/242906

Change 242770 merged by jenkins-bot:
Fix Memcached key decode

https://gerrit.wikimedia.org/r/242770

Change 242906 merged by jenkins-bot:
Fix Memcached key decode

https://gerrit.wikimedia.org/r/242906

matthiasmullie added a comment. Edited Oct 1 2015, 7:33 PM

Successfully ran the script for testwiki:

$ mwscript extensions/Flow/maintenance/FlowFixLinks.php --wiki=testwiki --batch-size=50
Removed 0 links to special pages.
Rebuilt links for 50 workflows...
Rebuilt links for 100 workflows...
Rebuilt links for 150 workflows...
Rebuilt links for 200 workflows...
Rebuilt links for 250 workflows...
Rebuilt links for 300 workflows...
Rebuilt links for 350 workflows...
Rebuilt links for 400 workflows...
Rebuilt links for 450 workflows...
Rebuilt links for 500 workflows...
Rebuilt links for 526 workflows...
Completed

I ran the script for all wikis, and it was successful on all but one.

mediawikiwiki:  Removed 1176 links to special pages.
mediawikiwiki:  Rebuilt links for 300 workflows...
mediawikiwiki:  Rebuilt links for 600 workflows...
mediawikiwiki:  Rebuilt links for 900 workflows...
mediawikiwiki:  Rebuilt links for 1200 workflows...
mediawikiwiki:  Rebuilt links for 1500 workflows...
mediawikiwiki:  Rebuilt links for 1800 workflows...
mediawikiwiki:  Rebuilt links for 2100 workflows...
mediawikiwiki:  Rebuilt links for 2400 workflows...
mediawikiwiki:  Rebuilt links for 2700 workflows...
mediawikiwiki:  Rebuilt links for 3000 workflows...
mediawikiwiki:  Rebuilt links for 3300 workflows...
mediawikiwiki:  Rebuilt links for 3600 workflows...
mediawikiwiki:  Rebuilt links for 3900 workflows...
mediawikiwiki:  Rebuilt links for 4200 workflows...
mediawikiwiki:  Rebuilt links for 4500 workflows...
mediawikiwiki:  Rebuilt links for 4800 workflows...
mediawikiwiki:  Rebuilt links for 5100 workflows...
mediawikiwiki:  Rebuilt links for 5400 workflows...
mediawikiwiki:  
mediawikiwiki:  Catchable fatal error: Argument 7 passed to Flow\Model\WikiReference::__construct() must be an instance of Title, null given, called in /srv/mediawiki/php-1.27.0-wmf.1/extensions/Flow/includes/Model/WikiReference.php on line 67 and defined in /srv/mediawiki/php-1.27.0-wmf.1/extensions/Flow/includes/Model/WikiReference.php on line 24

That's a bit maddening, because it is the exact error we were trying to fix with this whole migration. I didn't manage to get a backtrace, but I'll see if I can find anything weird in the DB.

Aha! It's because https://www.mediawiki.org/wiki/Topic:Qsekw84h4a7ohpjd links to the VisualEditor namespace, which has been removed. I've removed the rows that had ref_target_namespace=2500, rerunning now.

Successfully ran on mediawikiwiki, so this is now done.

Following up on our discussion in standup, this is because the script does not remove the Flow links entries (except for links to negative namespaces). It just removes the core ones and rebuilds them from the Flow ones (this solves the issue with removals not always working before).

So the VE ones are gone now (they'll be regenerated if one of those posts/whatevers is edited). I don't think they would have really been usable anyway, since the namespace doesn't exist.

I don't know what happens to these entries in the core links tables when a namespace is deleted. (Just moving the page out of the namespace doesn't affect how a link is interpreted; only whether the namespace exists does.) namespaceDupes handles similar cases, but I'm not sure if it handles this one.
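
The behaviour described in this comment can be modelled roughly as follows. This is a simplified sketch that uses plain arrays in place of database tables; `rebuildCoreLinks` and the row keys are hypothetical names, not the script's real code.

```php
<?php
/**
 * Simplified model of the rebuild: Flow reference rows are the source of
 * truth, and core link entries are regenerated from them.
 *
 * @param array[] $flowRefs Rows with 'target_namespace' and 'target_title' (assumed shape)
 * @return array [ remaining Flow refs, regenerated core links ]
 */
function rebuildCoreLinks( array $flowRefs ) {
	// Step 1: the only Flow rows removed are links to negative (virtual) namespaces.
	$flowRefs = array_values( array_filter( $flowRefs, function ( $ref ) {
		return $ref['target_namespace'] >= 0;
	} ) );

	// Step 2: old core entries are discarded and rebuilt from what remains.
	$coreLinks = [];
	foreach ( $flowRefs as $ref ) {
		$coreLinks[] = [ $ref['target_namespace'], $ref['target_title'] ];
	}

	return [ $flowRefs, $coreLinks ];
}
```

In this model, Flow rows that are deleted by hand before the run leave nothing to rebuild from, so no core link entries are produced for them.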

In T110326#1698014, @Mattflaschen wrote:

Following up on our discussion in standup, this is because the script does not remove the Flow links entries (except for links to negative namespaces). It just removes the core ones and rebuilds them from the Flow ones (this solves the issue with removals not always working before).
So the VE ones are gone now (they'll be regenerated if one of those posts/whatevers is edited). I don't think they would have really been usable anyway, since the namespace doesn't exist.

Hmm, yeah, but since what I actually did was remove all flow_wiki_ref rows for those workflows (so the deletion would use the PK), we now have no links tables entries for them. I assumed that the script would rebuild those rows, but I guess it doesn't. If there's a script that recomputes the information from scratch, I could run that on the 4 affected topics/workflows/whatever, but it's probably not a big deal.

Checked in production:

research@s3-analytics-slave [enwiki]> select count(*) from pagelinks where pl_namespace= -1;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.01 sec)

DannyH removed a subscriber: DannyH. Oct 5 2015, 10:41 PM
Catrope closed this task as Resolved. Oct 9 2015, 7:41 PM