Phabricator

UpdateRepo doesn't always propagate changes
Closed, Resolved · Public

Description

When pages are moved in enwiki and other wikis, the sitelink is updated instantly, but when I move pages in nowiki I have to update the sitelinks in Wikidata manually. This is not a one-off thing, but happens every time I move pages in no.wikipedia.

Examples:

  1. Moved this page in enwiki, the move was reflected instantly in Wikidata: https://www.wikidata.org/w/index.php?title=Q19594854&action=history
  2. Moved this page in nowiki, the move was not reflected in Wikidata (I changed it manually several minutes later): https://www.wikidata.org/w/index.php?title=Q12005943&action=history

See also page moves/deletions applied to Wikidata by Hoo Bot: https://www.wikidata.org/wiki/Special:Contributions/Hoo_Bot

Event Timeline

jhsoby created this task. Mar 15 2015, 11:45 PM
jhsoby raised the priority of this task to Needs Triage.
jhsoby updated the task description.
jhsoby added a project: Wikidata.
jhsoby added a subscriber: jhsoby.
Restricted Application added a subscriber: Aklapper. Mar 15 2015, 11:45 PM

Just to note: this happens not only to me but to all page moves on nowiki, as exemplified by this move, where the sitelink was later updated by a bot on Wikidata: https://www.wikidata.org/w/index.php?title=Q15703138&action=history

Based on that bot's activity, it seems that this is probably not a problem for nowiki only, though my bug title may make it seem that way.

hoo claimed this task. Mar 16 2015, 12:14 AM
hoo triaged this task as High priority.
hoo set Security to None.

This also seems to happen on nlwiki: not all page moves are propagated to Wikidata.

hoo added a comment. Mar 17 2015, 10:45 AM

I enabled more debug logging yesterday and will examine it today or this week. Also, T92931 is the cause of at least some lost deletions/moves (but maybe not all?).

Change 197350 had a related patch set uploaded (by Hoo man):
Run UpdateRepo jobs on time() + 2, not time() + 1

https://gerrit.wikimedia.org/r/197350

hoo added a comment. Mar 17 2015, 4:42 PM

See also https://gerrit.wikimedia.org/r/197350, but that is not going to fix this. Will investigate more.

hoo renamed this task from "When I move a page in nowiki, the sitelink isn't updated automatically" to "UpdateRepo doesn't always propagate changes". Mar 17 2015, 4:44 PM
hoo updated the task description.

Change 197350 merged by jenkins-bot:
Run UpdateRepo jobs on time() + 2, not time() + 1

https://gerrit.wikimedia.org/r/197350
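For readers unfamiliar with delayed jobs: the change title reads as releasing each UpdateRepo job a couple of seconds after it is created (time() plus a small offset) instead of making it runnable immediately. Below is a minimal toy model of that idea, assuming a queue that simply refuses to hand out a job before its release time. DelayedQueue and its methods are hypothetical names, not the MediaWiki JobQueue API.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class DelayedJob:
    # Hypothetical model: a job may not be handed out before release_ts.
    release_ts: float
    name: str = field(compare=False)


class DelayedQueue:
    """Toy queue that only hands out jobs whose release time has passed."""

    def __init__(self) -> None:
        self._jobs: list = []

    def push(self, name: str, now: float, delay_s: float) -> None:
        # Mirrors the idea of scheduling a job at time() + delay.
        self._jobs.append(DelayedJob(now + delay_s, name))
        self._jobs.sort()  # earliest release time first

    def pop_ready(self, now: float) -> list:
        ready = [j.name for j in self._jobs if j.release_ts <= now]
        self._jobs = [j for j in self._jobs if j.release_ts > now]
        return ready


q = DelayedQueue()
q.push("UpdateRepoOnMove", now=1000.0, delay_s=2)
print(q.pop_ready(now=1001.0))  # [] (still delayed)
print(q.pop_ready(now=1002.0))  # ['UpdateRepoOnMove']
```

A delayed job shows up in showJobs output under "delayed" until its release time passes; in this toy it simply stays in the internal list.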

hoo added a comment. Mar 23 2015, 10:53 PM

hoo@fluorine:/a/mw-log$ grep UpdateRepo runJobs.log | grep nowiki
hoo@fluorine:/a/mw-log$

Seems like we don't even get to schedule any UpdateRepo jobs for nowiki :( Needs more investigation.

Change 199596 had a related patch set uploaded (by Hoo man):
Add debug logging to client UpdateRepo code

https://gerrit.wikimedia.org/r/199596

Change 199613 had a related patch set uploaded (by Hoo man):
Add debug logging to client UpdateRepo code

https://gerrit.wikimedia.org/r/199613

Change 199596 merged by jenkins-bot:
Add debug logging to client UpdateRepo code

https://gerrit.wikimedia.org/r/199596

Change 199613 merged by jenkins-bot:
Add debug logging to client UpdateRepo code

https://gerrit.wikimedia.org/r/199613

hoo added a subscriber: aaron. Mar 26 2015, 3:40 PM

I poked at this some more and I'm super clueless now: It seems that the jobs on nowiki get created and queued in the JobQueue just fine, but they never seem to get started (according to runJobs.log).

According to showJobs, though, they are active (what exactly does that mean?) after being scheduled (I scheduled a lot of jobs just to be very sure they end up in there).

mwscript showJobs.php --wiki wikidatawiki --group | grep UpdateRepo
UpdateRepoOnMove: 0 queued; 772 claimed (563 active, 209 abandoned); 0 delayed
UpdateRepoOnDelete: 0 queued; 51 claimed (2 active, 49 abandoned); 0 delayed
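These summary lines can be read mechanically: the claimed count is split into active and abandoned jobs (563 + 209 = 772 and 2 + 49 = 51). Here is a small parser for lines in exactly this shape, assuming showJobs.php --group prints one such line per job type. The regex is derived only from the two lines quoted above, and parse_summary is a hypothetical helper.

```python
import re

# Pattern for per-type summary lines, based on the two lines quoted above.
SUMMARY = re.compile(
    r"(?P<type>\w+): (?P<queued>\d+) queued; (?P<claimed>\d+) claimed "
    r"\((?P<active>\d+) active, (?P<abandoned>\d+) abandoned\); "
    r"(?P<delayed>\d+) delayed"
)


def parse_summary(line: str) -> dict:
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    counts = {k: int(v) for k, v in m.groupdict().items() if k != "type"}
    # Claimed jobs are either still active or already abandoned.
    if counts["claimed"] != counts["active"] + counts["abandoned"]:
        raise ValueError(f"inconsistent claimed count: {line!r}")
    return {"type": m.group("type"), **counts}


for line in [
    "UpdateRepoOnMove: 0 queued; 772 claimed (563 active, 209 abandoned); 0 delayed",
    "UpdateRepoOnDelete: 0 queued; 51 claimed (2 active, 49 abandoned); 0 delayed",
]:
    s = parse_summary(line)
    print(s["type"], "abandoned:", s["abandoned"], "of", s["claimed"])
```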

Adding @aaron hoping that he has an idea what might be going on here or what steps I could take next.

aaron added a comment. Mar 26 2015, 4:08 PM

If a job is marked as active then it was popped and started but hasn't been ACKed. After an hour of being in such a state it's recycled for popping again. The 209 abandoned jobs must have failed 3 times. Unless the logging is broken, there should be runJobs entries (at least STARTED ones).
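The lifecycle described here can be sketched as a toy queue. In this model, popping marks a job active, an ACK removes it, an un-ACKed claim is recycled after an hour, and a job is abandoned after three failed attempts. This is a hypothetical model for illustration, not MediaWiki's JobQueue implementation. In particular, it treats "not ACKed within an hour" as the only way an attempt can fail.

```python
from dataclasses import dataclass
from typing import Optional

CLAIM_TTL_S = 3600  # "After an hour ... it's recycled for popping again"
MAX_ATTEMPTS = 3    # "The 209 abandoned jobs must have failed 3 times"


@dataclass
class ClaimedJob:
    claimed_at: float
    attempts: int


class ToyJobQueue:
    """Hypothetical model of the claim / ACK / recycle / abandon cycle."""

    def __init__(self) -> None:
        self.queued: list = []   # (job_id, attempts so far)
        self.active: dict = {}   # job_id -> ClaimedJob
        self.abandoned: set = set()

    def push(self, job_id: str) -> None:
        self.queued.append((job_id, 0))

    def pop(self, now: float) -> Optional[str]:
        if not self.queued:
            return None
        job_id, attempts = self.queued.pop(0)
        # Popping claims the job; it stays "active" until ACKed.
        self.active[job_id] = ClaimedJob(now, attempts + 1)
        return job_id

    def ack(self, job_id: str) -> None:
        # A successfully run job is ACKed and leaves the queue for good.
        self.active.pop(job_id, None)

    def recycle(self, now: float) -> None:
        for job_id, job in list(self.active.items()):
            if now - job.claimed_at < CLAIM_TTL_S:
                continue  # claim is still fresh
            del self.active[job_id]
            if job.attempts >= MAX_ATTEMPTS:
                self.abandoned.add(job_id)
            else:
                self.queued.append((job_id, job.attempts))


q = ToyJobQueue()
q.push("UpdateRepoOnMove#1")
t = 0.0
for _ in range(MAX_ATTEMPTS):
    q.pop(t)          # claimed, but never ACKed
    t += CLAIM_TTL_S
    q.recycle(t)
print(q.abandoned)    # {'UpdateRepoOnMove#1'}
```

In this model, a job that keeps getting claimed without ever being ACKed ends up in the abandoned set after three recycle cycles, which is the pattern the abandoned counts above hint at.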

hoo added a comment. Mar 26 2015, 4:20 PM

> If a job is marked as active then it was popped and started but hasn't been ACKed. After an hour of being in such a state it's recycled for popping again. The 209 abandoned jobs must have failed 3 times. Unless the logging is broken, there should be runJobs entries (at least STARTED ones).

They don't seem to end up there, which is why I'm out of ideas. I also can't find any information about the jobs that failed (the abandoned ones).

hoo@fluorine:/a/mw-log$ grep -c 'UpdateRepo.*nowiki' runJobs.log
0

aaron added a comment (edited). Mar 26 2015, 5:11 PM

So the jobs have nowiki as a parameter? If not, shouldn't you be grepping for wikidatawiki? I guess it's siteId.

hoo added a comment. Mar 26 2015, 5:16 PM

> So the jobs have nowiki as a parameter? If not, shouldn't you be grepping for wikidatawiki? I guess it's siteId.

It does have nowiki as a parameter (siteId). I checked (via a hacked-up class on mw1017) that the job parameters are set correctly (we're using JobSpecification). You can see what I'm expecting by, e.g., running the above grep for dewiki or enwiki or ...

hoo added a comment. Mar 27 2015, 9:14 AM

Once 9720c6c41079ff49804e1171eaa09121ddeaecc5 is deployed, we can probably gain a bit more insight here. That might help us find the root cause, although it only helps with listing jobs that got abandoned for some reason.

hoo added a comment. Apr 9 2015, 10:20 AM

These are definitely getting abandoned, but I still have no clue why. @aaron, any ideas how to continue here?

hoo@terbium:~$ mwscript showJobs.php --wiki wikidatawiki --type UpdateRepoOnMove --list | grep -c nowiki
70

hoo added a comment. Apr 9 2015, 3:13 PM

This seems to affect some wikis significantly more than others... maybe it could help to gather statistics for a longer period of time?

hoo@terbium:~$ mwscript showJobs.php --wiki wikidatawiki --type UpdateRepoOnMove --list | grep -oP 'siteId=[^ ]+' | sort | uniq -c
    317 siteId=elwiki
     71 siteId=nowiki
    128 siteId=svwiki
      1 siteId=zhwiki
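The grep | sort | uniq -c pipeline above counts how often each siteId=... token occurs across the listed jobs. The same tally in Python may be handier for collecting these statistics over a longer period. The example input lines are made up. The only assumption about real showJobs --list output is that it contains the siteId= token the grep pattern matches. count_site_ids and format_like_uniq_c are hypothetical helpers.

```python
import re
from collections import Counter

# Same pattern as the grep -oP above: "siteId=" followed by non-space chars.
SITE_ID = re.compile(r"siteId=[^ ]+")


def count_site_ids(lines) -> Counter:
    """Tally every siteId=... match, like grep -o | sort | uniq -c."""
    counts = Counter()
    for line in lines:
        # grep -o emits every match on a line, not just the first one.
        counts.update(SITE_ID.findall(line))
    return counts


def format_like_uniq_c(counts: Counter) -> list:
    # uniq -c prints a right-aligned count (width 7) and then the value;
    # the preceding sort orders the output by value.
    return [f"{n:>7} {key}" for key, n in sorted(counts.items())]


# Made-up input; only the siteId= tokens matter here.
sample = ["job-a siteId=elwiki", "job-b siteId=nowiki", "job-c siteId=elwiki"]
for row in format_like_uniq_c(count_site_ids(sample)):
    print(row)
```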

Change 203967 had a related patch set uploaded (by Aaron Schulz):
Avoid using local titles in JobSpecification

https://gerrit.wikimedia.org/r/203967

hoo added a comment. Apr 13 2015, 10:11 PM

Just for the record: I investigated a bit further yesterday and today but wasn't able to come up with anything. Aaron managed to find the root cause: by default (in core), we set the job's title to the main page of the remote wiki. On the repo (Wikidata), we then try to construct that title using Title::makeTitleSafe, which sometimes fails. In those cases the jobs got abandoned. (E.g. > var_dump( Title::makeTitleSafe( 100, 'Forside' ) ); gives NULL for svwiki.)
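Here is a toy reproduction of this failure mode. It assumes the repo simply does not know namespace 100, as the var_dump example returning NULL suggests. make_title_safe and run_on_repo are hypothetical stand-ins, not MediaWiki functions, and the set of repo namespaces is illustrative only. The placeholder title only illustrates the idea behind the fix (do not derive the job title from the client's main page). It is not necessarily the title the merged patches use.

```python
from typing import Optional

NS_SPECIAL = -1  # MediaWiki's Special: namespace id
# Illustrative subset of namespaces the repo knows; 100 is deliberately absent.
REPO_NAMESPACES = {NS_SPECIAL, 0, 1, 2, 3, 4, 5}


def make_title_safe(ns: int, text: str) -> Optional[str]:
    """Toy stand-in for Title::makeTitleSafe: None for an unknown namespace."""
    if ns not in REPO_NAMESPACES or not text:
        return None
    return f"{ns}:{text}"


def run_on_repo(job_params: dict) -> str:
    title = make_title_safe(job_params["ns"], job_params["title"])
    if title is None:
        # The job cannot be reconstructed, so it never runs; per the thread,
        # such jobs ended up abandoned.
        return "abandoned"
    return "ran"


# Before: the job title defaulted to the client wiki's main page.
client_main_page = {"ns": 100, "title": "Forside"}
# After (sketch of the fix's idea): a hypothetical title valid on every wiki.
placeholder = {"ns": NS_SPECIAL, "title": "Placeholder"}

print(run_on_repo(client_main_page))  # abandoned
print(run_on_repo(placeholder))       # ran
```

The design point is that a job's title has to be resolvable on the wiki that runs the job. A title that exists only in the client wiki's namespace setup cannot be relied on there.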

Change 203972 had a related patch set uploaded (by Hoo man):
Avoid using local main page title in JobSpecification

https://gerrit.wikimedia.org/r/203972

Change 203973 had a related patch set uploaded (by Hoo man):
Avoid using local main page title in JobSpecification

https://gerrit.wikimedia.org/r/203973

hoo closed this task as Resolved. Apr 13 2015, 10:19 PM
hoo removed a project: Patch-For-Review.

Will backport the fix with tonight's SWAT. Hopefully we won't lose any further edits then.

hoo moved this task from Doing to Done on the Wikidata-Sprint-2015-04-07 board. Apr 13 2015, 10:20 PM

Change 203967 merged by Aaron Schulz:
Avoid using local main page title in JobSpecification

https://gerrit.wikimedia.org/r/203967

Change 203973 merged by Hoo man:
Avoid using local main page title in JobSpecification

https://gerrit.wikimedia.org/r/203973

Change 203972 merged by Hoo man:
Avoid using local main page title in JobSpecification

https://gerrit.wikimedia.org/r/203972