
CTT tasks week of 2024-06-14
Closed, Resolved, Public

Description

Documentation at https://www.mediawiki.org/wiki/Content_Transform_Team/Chores

  • Vendor patch for commit 7ef9a41e for train 1.43.0-wmf.10 (T361404 (train ticket))
    • RT-testing started (Friday by Europe EOD)
    • regression script run
    • RT-testing logs checked
    • Vendor+core patch created
    • Deployment changelog
    • Vendor patch reviewed
    • Patches (vendor + core) merged (Monday by US EOD)
  • Group 0
    • logstash checked
    • Grafana checked
  • Group 1
    • logstash checked
    • Grafana checked
  • Group 2
    • logstash checked
    • Grafana checked
  • Update status on deployment changelog to done
  • Monitor Parsoid Community-reported issues (Thursday before triage meeting)
  • PCS deployment
  • Wikifeeds deployment
  • Next week's phab created and linked on Slack bookmarks (T368118)

Event Timeline

We have a few issues with regression testing:

  • Out of disk space. I ran the script in https://wikitech.wikimedia.org/wiki/Parsoid/Common_Tasks#Freeing_disk_space, but it failed partway through (just after "Dumping results table") with a permissions error while trying to write results.sql.gz. I was in my homedir and I'm fairly sure the script was running as my user, so I don't know what caused that. In any case, it had freed enough space by then to start RT testing successfully, so I didn't investigate further.
  • Between 7:30am and 8am EST on Jun 13 we had a server issue, which produced a large number of log entries like the following:
[Zmqsmfn7_ueX-TT4hLc5ZwAAAII] /w/rest.php/fr.wikipedia.org/v3/page/wikitext/Badis_badis   Wikimedia\Rdbms\DBUnexpectedError: Database servers in cluster26 are overloaded. In order to protect application servers, the circuit breaking to databases of this section have been activated. Please try again a few seconds.

This resulted in a large number of 500 failures. On @Arlolra's advice, I used the instructions in https://www.mediawiki.org/wiki/Parsing/Visual_Diff_Testing#Retesting_a_subset_of_titles to reset the failing titles so they would be retested:

$ mysql -u testreduce -p testreduce
mysql> UPDATE pages SET claim_hash = '', claim_num_tries = 0, claim_timestamp = NULL, latest_stat = NULL, latest_result = NULL, latest_score = 0, num_fetch_errors = 0 WHERE latest_score = 1000000;

which (a) affected 166 rows, and (b) resulted in a huge spike in load average on the machine, as apparently all of those pages got requeued at once:

top - 17:42:05 up 134 days,  7:20,  1 user,  load average: 102.97, 76.62, 43.64
Tasks: 148 total,  38 running, 110 sleeping,   0 stopped,   0 zombie

Eventually things settled down, although some of those pages timed out again and I had to rerun the SQL command to requeue a handful of them a third time.
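The reset-and-requeue step above can be sketched in Python against an in-memory SQLite table. This is a hedged illustration only: the real testreduce database is MySQL, the `pages` table below contains just the columns named in the UPDATE plus hypothetical `id` and `title` columns, and the helper names are made up for the example. The optional `limit` (resetting failed pages in smaller batches so that fewer of them are reclaimed at once) is an assumption about how the load spike might be softened, not something done in this task.

```python
import sqlite3
from typing import Optional

# Score the UPDATE above filters on (value taken from the task).
FAILED_SCORE = 1000000


def make_demo_db() -> sqlite3.Connection:
    """Create a hypothetical, minimal stand-in for the testreduce `pages` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE pages (
               id INTEGER PRIMARY KEY,
               title TEXT,
               claim_hash TEXT,
               claim_num_tries INTEGER,
               claim_timestamp TEXT,
               latest_stat TEXT,
               latest_result TEXT,
               latest_score INTEGER,
               num_fetch_errors INTEGER)"""
    )
    return conn


def seed_demo_rows(conn: sqlite3.Connection, n: int) -> None:
    """Insert n fake pages; odd-numbered ones carry FAILED_SCORE."""
    rows = [
        (f"Title_{i}", "hash", 3, "ts", "stat", "result",
         FAILED_SCORE if i % 2 else 5, 1)
        for i in range(n)
    ]
    conn.executemany(
        """INSERT INTO pages (title, claim_hash, claim_num_tries,
               claim_timestamp, latest_stat, latest_result,
               latest_score, num_fetch_errors)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
        rows,
    )
    conn.commit()


def requeue_failed(conn: sqlite3.Connection, limit: Optional[int] = None) -> int:
    """Clear claim/result columns for pages whose latest_score is FAILED_SCORE.

    limit=None resets every matching row at once (as in the task);
    a numeric limit resets at most that many rows per call (hypothetical).
    Returns the number of rows updated.
    """
    where = "latest_score = ?"
    params: list = [FAILED_SCORE]
    if limit is not None:
        # Plain SQLite builds lack UPDATE ... LIMIT, so pick the batch by id.
        where = ("id IN (SELECT id FROM pages WHERE latest_score = ? "
                 "ORDER BY id LIMIT ?)")
        params.append(limit)
    # `where` is built only from the constant strings above, never user input.
    cur = conn.execute(
        f"""UPDATE pages SET claim_hash = '', claim_num_tries = 0,
                claim_timestamp = NULL, latest_stat = NULL,
                latest_result = NULL, latest_score = 0,
                num_fetch_errors = 0
            WHERE {where}""",
        params,
    )
    conn.commit()
    return cur.rowcount
```

With ten seeded pages (five of them failed), `requeue_failed(conn, limit=2)` resets two rows and a following `requeue_failed(conn)` resets the remaining three, mirroring the "run it again for the stragglers" pattern described above.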

Finally:

----- Comparing results -----
...
dewiki:Akademische_Gesellschaft_Sonderbund
No changes!
---------------------
*** No pages need investigation ***

Change #1043324 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a8

https://gerrit.wikimedia.org/r/1043324

Change #1043325 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Bump wikimedia/parsoid to 0.20.0-a8

https://gerrit.wikimedia.org/r/1043325

Change #1043324 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a8

https://gerrit.wikimedia.org/r/1043324

Change #1043325 merged by jenkins-bot:

[mediawiki/core@master] Bump wikimedia/parsoid to 0.20.0-a8

https://gerrit.wikimedia.org/r/1043325

ABreault-WMF updated the task description.