
ArielGlenn (ariel)
User

User Details

User Since
Oct 8 2014, 7:09 PM (236 w, 4 d)
Availability
Available
IRC Nick
apergos
LDAP User
ArielGlenn
MediaWiki User
ArielGlenn

Recent Activity

Yesterday

ArielGlenn updated the task description for T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues.
Sun, Apr 21, 3:27 PM · Dumps-Generation
ArielGlenn added a comment to T216160: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday.

@JAllemandou bouncing back to you in light of what others have said here.

Sun, Apr 21, 10:11 AM · Patch-For-Review, WikiCite, Analytics, Dumps-Generation, Wikidata
ArielGlenn updated the task description for T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues.
Sun, Apr 21, 9:18 AM · Dumps-Generation
ArielGlenn triaged T221515: adds/changes dumps are much slower now as High priority.
Sun, Apr 21, 8:10 AM · Dumps-Generation
ArielGlenn added a comment to T220940: Abstracts dumps for Commons running very slowly.

Since the abstracts for Commons take (at least) 3 days without that patch, meaning that the current (Apr 20th) run could not complete in time (20th through 30th), I'm running them manually out of a screen session on snapshot1008, live-patched, and I'll clean up the patches at the end of the run.

Sun, Apr 21, 8:08 AM · Patch-For-Review, Dumps-Generation
ArielGlenn updated the task description for T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues.
Sun, Apr 21, 7:14 AM · Dumps-Generation

Sat, Apr 20

ArielGlenn added a project to T221504: investigate why content history dump of certain wikidata page ranges is so slow: Wikidata.
Sat, Apr 20, 9:12 PM · Wikidata, Dumps-Generation
ArielGlenn added a comment to T221504: investigate why content history dump of certain wikidata page ranges is so slow.

There are an awful lot of entries that look like this one:

Sat, Apr 20, 9:05 PM · Wikidata, Dumps-Generation
ArielGlenn moved T221504: investigate why content history dump of certain wikidata page ranges is so slow from Backlog to Active on the Dumps-Generation board.
Sat, Apr 20, 8:44 PM · Wikidata, Dumps-Generation
ArielGlenn triaged T221504: investigate why content history dump of certain wikidata page ranges is so slow as High priority.
Sat, Apr 20, 7:52 PM · Wikidata, Dumps-Generation
ArielGlenn added a comment to T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues.

The noop job is running for wikidata, which will generate checksums and update links. There's also one running for enwiki because of my typo; no harm done except that the rss feed might be a bit odd until later in the day.

Sat, Apr 20, 12:06 PM · Dumps-Generation
ArielGlenn added a comment to T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues.

I'm delaying the start of the wikidata dumps for a few hours, to give the manual generation of 7z files time to complete. A four-hour delay ought to be enough.

Sat, Apr 20, 7:36 AM · Dumps-Generation
ArielGlenn moved T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues from Backlog to Active on the Dumps-Generation board.
Sat, Apr 20, 7:34 AM · Dumps-Generation

Thu, Apr 18

ArielGlenn added a comment to T221399: imported pages for which there is no local user are no longer dumped under 1.34.0-wmf.1.

An example may be hard to find; I'd want something imported a while ago, no local user for the page and no modifications made to it since the import. Nonetheless I'll hunt around some.

Thu, Apr 18, 7:35 PM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn triaged T221399: imported pages for which there is no local user are no longer dumped under 1.34.0-wmf.1 as High priority.
Thu, Apr 18, 7:17 PM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn added a comment to T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues.

Commons is done and should be available soon on the webserver. Wikidata is still finishing up page content bz2 files; that and the 7z recompressed files are the last for this run.

Thu, Apr 18, 6:56 PM · Dumps-Generation
ArielGlenn added a comment to T220940: Abstracts dumps for Commons running very slowly.

@Marostegui sorry to ping you again but we'd like your expertise: I can use this workaround of using the query as is and throwing away the revisions we don't want (LIMIT 50000 always), or we can change it to filter on page_namespace right in the query (but the LIMIT will still be 50k). The upside of the second is that much less data will be sent, but the downside is that it will take a lot longer to hit that LIMIT. What do you think about this tradeoff? Which is harder on the db servers, and is the difference appreciable either way?
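
The two options in the comment above can be sketched roughly as follows. This is a toy model only, not the WikiExporter code: the function names and the in-memory `pages` list are hypothetical, and it counts only rows sent to the client, not the cost to the db servers of scanning rows to reach the LIMIT.

```python
def client_side_filter(pages, limit, wanted_ns=0):
    """Option 1: run the query as is, discard unwanted namespaces in the client."""
    kept, transferred = [], 0
    for start in range(0, len(pages), limit):
        batch = pages[start:start + limit]  # stands in for one LIMITed query
        transferred += len(batch)
        kept.extend(p for p in batch if p[1] == wanted_ns)
    return kept, transferred


def query_side_filter(pages, limit, wanted_ns=0):
    """Option 2: filter on namespace in the query, so only matching rows are sent."""
    # The server still has to walk past the non-matching rows; that cost
    # is deliberately not modelled here.
    matching = [p for p in pages if p[1] == wanted_ns]
    kept, transferred = [], 0
    for start in range(0, len(matching), limit):
        batch = matching[start:start + limit]
        transferred += len(batch)
        kept.extend(batch)
    return kept, transferred


# Toy data (assumption): 1 page in 10 is in namespace 0, the rest in 6 (File:).
pages = [(page_id, 0 if page_id % 10 == 0 else 6) for page_id in range(1, 101)]

kept_client, sent_client = client_side_filter(pages, limit=20)
kept_query, sent_query = query_side_filter(pages, limit=20)
```

Both variants keep the same pages; in this toy data the first transfers every row and the second only the namespace 0 rows, which is the "much less data will be sent" side of the tradeoff.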

Thu, Apr 18, 3:29 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T220940: Abstracts dumps for Commons running very slowly.

The above patches have been tested locally and I have just used them on Commons to regenerate part 5 of the abstracts in record time. This does not address the underlying cause, which I suspect to be somewhere in the revision handling for MCR. Still investigating that.

Thu, Apr 18, 10:19 AM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues.

The commons abstract part 5 is done already. Running commons flow history manually, after that will be the multistream dumps, and then I'll run the abstracts recombine and make all the abstracts available.

Thu, Apr 18, 9:59 AM · Dumps-Generation
ArielGlenn added a comment to T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues.

Running commons abstract piece 5 into a separate directory now manually on snapshot1005, with https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/504792/ and https://gerrit.wikimedia.org/r/#/c/operations/dumps/+/504842/ applied locally to 1.34_wmf1 and 1.33_wmf25.

Thu, Apr 18, 9:45 AM · Dumps-Generation
ArielGlenn added a comment to T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues.

The two wikis left to complete are commons and wikidata. I'll be trying to get the commons abstracts to complete today.

Thu, Apr 18, 7:33 AM · Dumps-Generation

Wed, Apr 17

ArielGlenn added a comment to T221285: deployment-snapshot01 puppet error due to nginx-apache2 conflict.

Ah wonderful, thanks a lot!

Wed, Apr 17, 9:00 PM · Beta-Cluster-Infrastructure
ArielGlenn added a comment to T221285: deployment-snapshot01 puppet error due to nginx-apache2 conflict.

I thought the proxy services thing doesn't get applied in beta; there's likely a missing hiera setting someplace.

Wed, Apr 17, 8:29 PM · Beta-Cluster-Infrastructure
ArielGlenn added a watcher for ActiveAbstract: ArielGlenn.
Wed, Apr 17, 3:05 PM

Tue, Apr 16

ArielGlenn added a comment to T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues.

I might as well note here things being done to patch up the Apr 1 2019 run as well.

Tue, Apr 16, 9:35 PM · Dumps-Generation
ArielGlenn added a comment to T220940: Abstracts dumps for Commons running very slowly.

One thing that occurs to me is that on wikis where the majority of the pages are not in the main namespace, the LIMIT might not be as effective as we want, and these could turn into pretty slow queries, especially on Commons where the vast majority of titles are in the File: namespace. I'm going to pursue a quick-n-dirty workaround in WikiExporter for testing first, and investigate modifying the query a little bit later.

Tue, Apr 16, 3:09 PM · Patch-For-Review, Dumps-Generation
ArielGlenn updated subscribers of T220424: XmlDUmpWriter::writeRevision sometimes broken by duplicate keys in Link Cache.

Hey @daniel the patchset for this goes together with the patchset for T220793 which you just +2'ed, care to have a look?

Tue, Apr 16, 1:44 PM · MW-1.34-notes (1.34.0-wmf.1; 2019-04-16), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn triaged T221086: Apr 1 2019 and/or >=1.33-wmf.23 dump run issues as Normal priority.
Tue, Apr 16, 1:41 PM · Dumps-Generation
ArielGlenn added a comment to T220006: CirrusSearch dumps are broken since Mar 18 2019.

The previous data is no longer available; we get a dump of what the indexes hold at the time of the dump. It's not like article history where there are separate revisions showing the state at any given time.

Tue, Apr 16, 7:00 AM · Patch-For-Review, Dumps-Generation, Discovery-Search, CirrusSearch

Mon, Apr 15

ArielGlenn added a comment to T220940: Abstracts dumps for Commons running very slowly.

I should point out that the number of pages we ask for differs in practice from the above; typically it's going to be a range of up to 10k pages. I'll run with that and see if I can get the explain for it at the time (I have a script for that!)

Mon, Apr 15, 2:07 PM · Patch-For-Review, Dumps-Generation
ArielGlenn updated subscribers of T220940: Abstracts dumps for Commons running very slowly.

What we want: namespace 0 only in the SELECT. Here's the explain from that:

root@PRODUCTION s4 slave[commonswiki]> explain extended SELECT rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,page_restrictions,1 AS `_load_content`  FROM `page` JOIN `revision` ON ((page_id=rev_page AND page_latest=rev_id)) JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id))   WHERE (page_id >= 200 AND page_id < 2200) AND (page_namespace = 0) AND (rev_page>0 OR (rev_page=0 AND rev_id>0))  ORDER BY rev_page ASC,rev_id ASC LIMIT 50000;
+------+-------------+---------------------+--------+--------------------------------------------------------+---------+---------+----------------------------------------------------+------+----------+----------------------------------------------+
| id   | select_type | table               | type   | possible_keys                                          | key     | key_len | ref                                                | rows | filtered | Extra                                        |
+------+-------------+---------------------+--------+--------------------------------------------------------+---------+---------+----------------------------------------------------+------+----------+----------------------------------------------+
|    1 | SIMPLE      | page                | range  | PRIMARY,name_title                                     | PRIMARY | 4       | NULL                                               | 3206 |    75.02 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | revision            | eq_ref | PRIMARY,page_timestamp,page_user_timestamp,rev_page_id | PRIMARY | 4       | commonswiki.page.page_latest                       |    1 |   100.00 | Using where                                  |
|    1 | SIMPLE      | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                 | PRIMARY | 4       | commonswiki.page.page_latest                       |    1 |   100.00 | Using index                                  |
|    1 | SIMPLE      | comment_rev_comment | eq_ref | PRIMARY                                                | PRIMARY | 8       | commonswiki.temp_rev_comment.revcomment_comment_id |    1 |   100.00 |                                              |
+------+-------------+---------------------+--------+--------------------------------------------------------+---------+---------+----------------------------------------------------+------+----------+----------------------------------------------+
4 rows in set, 1 warning (0.00 sec)

Here's the explain from the query for getting the current revision for pages rows as we have it now:

root@PRODUCTION s4 slave[commonswiki]> explain extended SELECT rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,page_restrictions,1 AS `_load_content`  FROM `page` JOIN `revision` ON ((page_id=rev_page AND page_latest=rev_id)) JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id))   WHERE (page_id >= 200 AND page_id < 2200) AND (rev_page>0 OR (rev_page=0 AND rev_id>0))  ORDER BY rev_page ASC,rev_id ASC LIMIT 50000;
+------+-------------+---------------------+--------+--------------------------------------------------------+---------+---------+----------------------------------------------------+------+----------+----------------------------------------------+
| id   | select_type | table               | type   | possible_keys                                          | key     | key_len | ref                                                | rows | filtered | Extra                                        |
+------+-------------+---------------------+--------+--------------------------------------------------------+---------+---------+----------------------------------------------------+------+----------+----------------------------------------------+
|    1 | SIMPLE      | page                | range  | PRIMARY                                                | PRIMARY | 4       | NULL                                               | 3206 |   100.00 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | revision            | eq_ref | PRIMARY,page_timestamp,page_user_timestamp,rev_page_id | PRIMARY | 4       | commonswiki.page.page_latest                       |    1 |   100.00 | Using where                                  |
|    1 | SIMPLE      | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                 | PRIMARY | 4       | commonswiki.page.page_latest                       |    1 |   100.00 | Using index                                  |
|    1 | SIMPLE      | comment_rev_comment | eq_ref | PRIMARY                                                | PRIMARY | 8       | commonswiki.temp_rev_comment.revcomment_comment_id |    1 |   100.00 |                                              |
+------+-------------+---------------------+--------+--------------------------------------------------------+---------+---------+----------------------------------------------------+------+----------+----------------------------------------------+
4 rows in set, 1 warning (0.00 sec)
Mon, Apr 15, 1:50 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T217549: bytemark dump mirror index.html file is out of date.

Erm @Reedy? If you can't get to it just now, understandable, maybe you can give an ETA?

Mon, Apr 15, 1:08 PM · Dumps-Generation
ArielGlenn added a comment to T217543: incrementals of punjabi.wikimedia.org fail.

Thanks! I'll keep this open until the next run goes by without whines.

Mon, Apr 15, 1:07 PM · Dumps-Generation
ArielGlenn added a comment to T220424: XmlDUmpWriter::writeRevision sometimes broken by duplicate keys in Link Cache.

Here's another one.

2019-04-13 21:00:06: commonswiki (ID 6588) 7999 pages (38.4|204.2/sec all|curr), 8000 revs (38.4|25.5/sec all|curr), ETA 2019-05-07 08:58:25 [max 78009778]
[a4981f3630af2b579a4a53dc] [no req]   InvalidArgumentException from line 100 of /srv/mediawiki/php-1.33.0-wmf.25/includes/Revision/RevisionStoreRecord.php: The given Title does not belong to page ID 56765073 but actually belongs to 78009849
Backtrace:
#0 /srv/mediawiki/php-1.33.0-wmf.25/includes/Revision/RevisionStore.php(1823): MediaWiki\Revision\RevisionStoreRecord->__construct(Title, User, CommentStoreComment, stdClass, MediaWiki\Revision\RevisionSlots, boolean)
#1 /srv/mediawiki/php-1.33.0-wmf.25/includes/export/XmlDumpWriter.php(311): MediaWiki\Revision\RevisionStore->newRevisionFromRow(stdClass, integer, Title)
#2 /srv/mediawiki/php-1.33.0-wmf.25/includes/export/WikiExporter.php(485): XmlDumpWriter->writeRevision(stdClass)
#3 /srv/mediawiki/php-1.33.0-wmf.25/includes/export/WikiExporter.php(445): WikiExporter->outputPageStreamBatch(Wikimedia\Rdbms\ResultWrapper, stdClass)
#4 /srv/mediawiki/php-1.33.0-wmf.25/includes/export/WikiExporter.php(269): WikiExporter->dumpPages(string, boolean)
#5 /srv/mediawiki/php-1.33.0-wmf.25/includes/export/WikiExporter.php(154): WikiExporter->dumpFrom(string, boolean)
#6 /srv/mediawiki/php-1.33.0-wmf.25/maintenance/includes/BackupDumper.php(288): WikiExporter->pagesByRange(integer, integer, boolean)
#7 /srv/mediawiki/php-1.33.0-wmf.25/maintenance/dumpBackup.php(83): BackupDumper->dump(integer, integer)
#8 /srv/mediawiki/php-1.33.0-wmf.25/maintenance/doMaintenance.php(96): DumpBackup->execute()
#9 /srv/mediawiki/php-1.33.0-wmf.25/maintenance/dumpBackup.php(138): require_once(string)
#10 /srv/mediawiki/multiversion/MWScript.php(100): require_once(string)
#11 {main}
Mon, Apr 15, 10:25 AM · MW-1.34-notes (1.34.0-wmf.1; 2019-04-16), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn moved T154914: Add .nt to DCAT-AP for Wikidata dumps from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Mon, Apr 15, 10:15 AM · User-Smalyshev, Dumps-Generation, User-LokalProfil, Wikidata
ArielGlenn moved T218923: Make dumps scripts use mw php maint scripts to get db username and password from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Mon, Apr 15, 9:56 AM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T214293: See why wikidata xml/sql dumps pages-meta-history is so much slower than enwiki from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Mon, Apr 15, 9:56 AM · MW-1.33-notes (1.33.0-wmf.23; 2019-03-26), Patch-For-Review, Performance, Wikidata, Dumps-Generation
ArielGlenn moved T220942: Dumps of cirrussearch have been empty files since March 25th from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Mon, Apr 15, 9:56 AM · Dumps-Generation
ArielGlenn moved T220809: adds/changes dumps leave lock files around which get rsynced to public servers from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Mon, Apr 15, 9:56 AM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T220809: adds/changes dumps leave lock files around which get rsynced to public servers from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Mon, Apr 15, 9:56 AM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T220942: Dumps of cirrussearch have been empty files since March 25th from Backlog to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Mon, Apr 15, 9:55 AM · Dumps-Generation
ArielGlenn moved T220940: Abstracts dumps for Commons running very slowly from Backlog to Active on the Dumps-Generation board.
Mon, Apr 15, 9:55 AM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T220006: CirrusSearch dumps are broken since Mar 18 2019 from Backlog to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Mon, Apr 15, 9:55 AM · Patch-For-Review, Dumps-Generation, Discovery-Search, CirrusSearch
ArielGlenn moved T220257: dumpBackups.php failing with InvalidArgumentException thrown from RevisionStoreRecord for certain wikis from Backlog to Active on the Dumps-Generation board.
Mon, Apr 15, 9:54 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), MediaWiki-Revision-backend, Wikimedia-production-error, Patch-For-Review, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn moved T220316: XmlDumpWriter::openPage handles main namespace articles with prefixes that are namespace names AND are redirects incorrectly from Backlog to Active on the Dumps-Generation board.
Mon, Apr 15, 9:54 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn moved T220424: XmlDUmpWriter::writeRevision sometimes broken by duplicate keys in Link Cache from Backlog to Active on the Dumps-Generation board.
Mon, Apr 15, 9:54 AM · MW-1.34-notes (1.34.0-wmf.1; 2019-04-16), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn moved T220493: Xml stubs dumps are running 5 to 15x slower than previously from Backlog to Active on the Dumps-Generation board.
Mon, Apr 15, 9:54 AM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn moved T220793: content still marked as flow-board on urwikibooks breaks abstract dumps from Backlog to Active on the Dumps-Generation board.
Mon, Apr 15, 9:54 AM · MW-1.34-notes (1.34.0-wmf.1; 2019-04-16), Patch-For-Review, MediaWiki-Maintenance-scripts, MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn moved T220594: abstracts dumps for dewikiversity fail with MWUnknownContentModelException from ContentHandler.php from Backlog to Active on the Dumps-Generation board.
Mon, Apr 15, 9:54 AM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn closed T220809: adds/changes dumps leave lock files around which get rsynced to public servers as Resolved.

I've cleaned up the old lock files on the other hosts, and verified that rsync no longer copies over lock files held during the current run.

Mon, Apr 15, 9:53 AM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T220809: adds/changes dumps leave lock files around which get rsynced to public servers.

The lock files are cleaned up on the host where these files are written, but they are rsynced over to the other hosts, which is incorrect. We should add *.lock to the rsync exclusion list for otherdumps, and manually clean up the files that are already on the dumpsdata fallback and labstore hosts.
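
A minimal sketch of the two steps proposed above, assuming a plain `rsync -a` invocation; the real otherdumps rsync arguments, hosts and paths are not shown in this feed, so everything passed in below is a placeholder. `--exclude` is a standard rsync option.

```python
import fnmatch


def build_rsync_cmd(src, dest, excludes=("*.lock",)):
    """Build an rsync argument list that skips files matching the exclude patterns."""
    cmd = ["rsync", "-a"]
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    cmd += [src, dest]
    return cmd


def stale_lock_files(filenames, pattern="*.lock"):
    """Pick out lock files already copied to a destination, for manual cleanup."""
    return [name for name in filenames if fnmatch.fnmatch(name, pattern)]


# Placeholder paths and filenames, for illustration only.
cmd = build_rsync_cmd("/src/otherdumps/", "desthost::otherdumps/")
leftovers = stale_lock_files(["stubs.xml.gz", "somewiki.lock", "status.txt"])
```

The command list can be handed to `subprocess.run` once the source and destination are filled in with real values.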

Mon, Apr 15, 9:18 AM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T220809: adds/changes dumps leave lock files around which get rsynced to public servers from Backlog to Active on the Dumps-Generation board.
Mon, Apr 15, 9:14 AM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T154914: Add .nt to DCAT-AP for Wikidata dumps.

@hoo what's left to be done here?

Mon, Apr 15, 9:11 AM · User-Smalyshev, Dumps-Generation, User-LokalProfil, Wikidata
ArielGlenn updated subscribers of T198676: Add versioning to DCAT-AP config.

@Smalyshev do you have any insight on this?

Mon, Apr 15, 9:11 AM · Patch-For-Review, Dumps-Generation, User-LokalProfil
ArielGlenn closed T218923: Make dumps scripts use mw php maint scripts to get db username and password as Resolved.

Done.

Mon, Apr 15, 9:09 AM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T217543: incrementals of punjabi.wikimedia.org fail.

@Reedy, who do I ask about this? Or can $somebody just put a placeholder Main Page in there so that we have content?

Mon, Apr 15, 9:09 AM · Dumps-Generation
ArielGlenn closed T214293: See why wikidata xml/sql dumps pages-meta-history is so much slower than enwiki as Resolved.

This looks good, and memory/cpu usage looks comparable to the bz2 runs. Closing.

Mon, Apr 15, 9:08 AM · MW-1.33-notes (1.33.0-wmf.23; 2019-03-26), Patch-For-Review, Performance, Wikidata, Dumps-Generation
ArielGlenn added a comment to T220006: CirrusSearch dumps are broken since Mar 18 2019.

I've done a test for elwiki and the content looks reasonable.

Mon, Apr 15, 9:01 AM · Patch-For-Review, Dumps-Generation, Discovery-Search, CirrusSearch
ArielGlenn added a comment to T220006: CirrusSearch dumps are broken since Mar 18 2019.
dumpsgen@snapshot1008:/mnt/dumpsdata/temp/dumpsgen$ /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php extensions/CirrusSearch/maintenance/dumpIndex.php --wiki=elwiki --indexType=content | gzip > elwiki-cirrus-content.gz
[1a17456c9633fdaeb63a3c83] [no req]   Elastica\Exception\Connection\HttpException from line 189 of /srv/mediawiki/php-1.33.0-wmf.25/vendor/ruflin/elastica/lib/Elastica/Transport/Http.php: Couldn't connect to host, Elasticsearch down?
Backtrace:
#0 /srv/mediawiki/php-1.33.0-wmf.25/vendor/ruflin/elastica/lib/Elastica/Request.php(193): Elastica\Transport\Http->exec(Elastica\Request, array)
#1 /srv/mediawiki/php-1.33.0-wmf.25/vendor/ruflin/elastica/lib/Elastica/Client.php(688): Elastica\Request->send()
#2 /srv/mediawiki/php-1.33.0-wmf.25/vendor/ruflin/elastica/lib/Elastica/Client.php(699): Elastica\Client->request(string, string, array, array)
#3 /srv/mediawiki/php-1.33.0-wmf.25/vendor/ruflin/elastica/lib/Elastica/Client.php(699): Elastica\Client->request(string, string, array, array)
#4 /srv/mediawiki/php-1.33.0-wmf.25/vendor/ruflin/elastica/lib/Elastica/Search.php(462): Elastica\Client->request(string, string, array, array)
#5 /srv/mediawiki/php-1.33.0-wmf.25/vendor/ruflin/elastica/lib/Elastica/Scroll.php(130): Elastica\Search->search()
#6 /srv/mediawiki/php-1.33.0-wmf.25/extensions/CirrusSearch/maintenance/dumpIndex.php(147): Elastica\Scroll->rewind()
#7 /srv/mediawiki/php-1.33.0-wmf.25/maintenance/doMaintenance.php(96): CirrusSearch\Maintenance\DumpIndex->execute()
#8 /srv/mediawiki/php-1.33.0-wmf.25/extensions/CirrusSearch/maintenance/dumpIndex.php(268): require_once(string)
#9 /srv/mediawiki/multiversion/MWScript.php(100): require_once(string)
#10 {main}
Mon, Apr 15, 7:34 AM · Patch-For-Review, Dumps-Generation, Discovery-Search, CirrusSearch
ArielGlenn triaged T220942: Dumps of cirrussearch have been empty files since March 25th as High priority.
Mon, Apr 15, 7:22 AM · Dumps-Generation
ArielGlenn triaged T220940: Abstracts dumps for Commons running very slowly as High priority.
Mon, Apr 15, 6:56 AM · Patch-For-Review, Dumps-Generation

Sat, Apr 13

ArielGlenn renamed T220887: Allow Bryan Davis to downtime alerts in Icinga from Allow WMCS to downtime alerts in Icinga to Allow Bryan Davis to downtime alerts in Icinga.
Sat, Apr 13, 6:50 PM · Patch-For-Review, Operations, SRE-Access-Requests, monitoring

Fri, Apr 12

ArielGlenn renamed T220493: Xml stubs dumps are running 5 to 15x slower than previously from Xml stubs dumps are running 20x slower than previously to Xml stubs dumps are running 5 to 15x slower than previously.
Fri, Apr 12, 3:28 PM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn added a comment to T220493: Xml stubs dumps are running 5 to 15x slower than previously.

I've checked the so-called adds-changes dumps to see if the numbers pan out. These take a lot less time and we can rerun them with less fuss, so they will be good for testing later. And here's what I have: total time to generate stubs for these dumps, on the following dates:

start date: 20190310 end date: 20190410
date:  20190310 runtime:  5 minutes
date:  20190311 runtime:  7 minutes
date:  20190312 runtime:  7 minutes
date:  20190313 runtime:  6 minutes
date:  20190314 runtime:  6 minutes
date:  20190315 runtime:  6 minutes
date:  20190316 runtime:  6 minutes
date:  20190317 runtime:  6 minutes
date:  20190318 runtime:  7 minutes
date:  20190319 runtime:  6 minutes
date:  20190320 runtime:  7 minutes
date:  20190321 runtime:  6 minutes
date:  20190322 runtime:  6 minutes
date:  20190323 runtime:  5 minutes
date:  20190324 runtime:  5 minutes
date:  20190325 runtime:  5 minutes <--
date:  20190326 runtime:  6 minutes <---
date:  20190327 runtime:  30 minutes <---
date:  20190328 runtime:  49 minutes <---
date:  20190329 runtime:  45 minutes
date:  20190330 runtime:  45 minutes
date:  20190331 runtime:  52 minutes
date:  20190401 runtime:  44 minutes
date:  20190402 runtime:  49 minutes
date:  20190403 runtime:  38 minutes
date:  20190404 runtime:  40 minutes
date:  20190405 runtime:  46 minutes
date:  20190406 runtime:  50 minutes
date:  20190407 runtime:  54 minutes
date:  20190408 runtime:  48 minutes
date:  20190409 runtime:  43 minutes
date:  20190410 runtime:  39 minutes
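
For reference, a small sketch of how a listing in this format could be parsed to find the first day the runtime jumps. The threshold factor and function names are assumptions made for illustration; this is not the script that produced the numbers above. The sample embeds four lines copied from the listing.

```python
import re
from statistics import mean

SAMPLE = """\
date:  20190325 runtime:  5 minutes <--
date:  20190326 runtime:  6 minutes <---
date:  20190327 runtime:  30 minutes <---
date:  20190328 runtime:  49 minutes <---
"""

LINE_RE = re.compile(r"date:\s+(\d{8})\s+runtime:\s+(\d+)\s+minutes")


def parse_runtimes(text):
    """Return (date, minutes) pairs; trailing arrow markers are ignored."""
    return [(date, int(minutes)) for date, minutes in LINE_RE.findall(text)]


def first_jump(runs, factor=3):
    """Return the first date whose runtime is at least `factor` times
    the mean of all earlier runs, or None if there is no such date."""
    for i in range(1, len(runs)):
        baseline = mean(minutes for _, minutes in runs[:i])
        if runs[i][1] >= factor * baseline:
            return runs[i][0]
    return None


runs = parse_runtimes(SAMPLE)
jump_date = first_jump(runs)
```

On the four sample lines, `jump_date` is 20190327, the first day with a 30 minute runtime.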
Fri, Apr 12, 3:25 PM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn triaged T220809: adds/changes dumps leave lock files around which get rsynced to public servers as Normal priority.
Fri, Apr 12, 11:33 AM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T220793: content still marked as flow-board on urwikibooks breaks abstract dumps.

I ran with the above patch to get page and revision information out of the run, see https://logstash.wikimedia.org/goto/2c69d90f6828d71045f38374d55fd6c4 and indeed all the pages whined about there are exactly the ones in the urwikibooks Flow disabling task.

Fri, Apr 12, 8:46 AM · MW-1.34-notes (1.34.0-wmf.1; 2019-04-16), Patch-For-Review, MediaWiki-Maintenance-scripts, MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn triaged T220793: content still marked as flow-board on urwikibooks breaks abstract dumps as Normal priority.
Fri, Apr 12, 8:42 AM · MW-1.34-notes (1.34.0-wmf.1; 2019-04-16), Patch-For-Review, MediaWiki-Maintenance-scripts, MediaWiki-General-or-Unknown, Dumps-Generation

Thu, Apr 11

ArielGlenn reopened T207627: Disable unused Flow extension on ur.wikibooks as "Open".

Re-opening until these stray entries are cleaned up.

Thu, Apr 11, 8:36 PM · Patch-For-Review, User-Zoranzoki21, Regression, Wikimedia-Site-requests
ArielGlenn added a comment to T207627: Disable unused Flow extension on ur.wikibooks.

text and blob table info:

wikiadmin@10.64.32.136(urwikibooks)> select * from text where old_id in (5502,5532, 5537, 5560, 5566, 5689 );
+--------+---------------+-----------+--------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| old_id | old_namespace | old_title | old_text           | old_comment | old_user | old_user_text | old_timestamp | old_minor_edit | old_flags           | inverse_timestamp |
+--------+---------------+-----------+--------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
|   5502 |               |           | DB://cluster25/99  |             |          |               |               |                | utf-8,gzip,external |                   |
|   5532 |               |           | DB://cluster24/106 |             |          |               |               |                | utf-8,gzip,external |                   |
|   5537 |               |           | DB://cluster24/108 |             |          |               |               |                | utf-8,gzip,external |                   |
|   5560 |               |           | DB://cluster25/135 |             |          |               |               |                | utf-8,gzip,external |                   |
|   5566 |               |           | DB://cluster24/125 |             |          |               |               |                | utf-8,gzip,external |                   |
|   5689 |               |           | DB://cluster25/205 |             |          |               |               |                | utf-8,gzip,external |                   |
+--------+---------------+-----------+--------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
Thu, Apr 11, 8:29 PM · Patch-For-Review, User-Zoranzoki21, Regression, Wikimedia-Site-requests
ArielGlenn added a comment to T207627: Disable unused Flow extension on ur.wikibooks.

These are all still listed as content model 4 (flow board) in the content table.

wikiadmin@10.64.32.136(urwikibooks)> select * from content where content_model = 4;
+------------+--------------+---------------------------------+---------------+-----------------+
| content_id | content_size | content_sha1                    | content_model | content_address |
+------------+--------------+---------------------------------+---------------+-----------------+
|       1927 |          141 | dre6r1l99h9ujd9h2j9ip2hs60o4fj4 |             4 | tt:5502         |
|       1958 |           36 | gsnkn05mqpgwhdalwg5itm7yic5r0nr |             4 | tt:5532         |
|       1963 |          135 | ob5hf0aqbqtyifvdxd63c0r0zzgutkj |             4 | tt:5537         |
|       1985 |           66 | 6z6zkta73nu2a6het7ae9i9zfvd85z6 |             4 | tt:5560         |
|       1991 |           57 | dyfq9jthrker9nhw6y7gcddft95lqz1 |             4 | tt:5566         |
|       2100 |           51 | brzrz58hn1gyw3myhet5qob4el54xk2 |             4 | tt:5689         |
+------------+--------------+---------------------------------+---------------+-----------------+
6 rows in set (0.01 sec)
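
The content_address values above look like they encode text table row ids (tt:5502 next to old_id 5502, and so on). A minimal sketch, assuming that "tt:<n>" convention holds for every row, of pulling the ids out so they can be checked against the text table; the function name is hypothetical and this is not MediaWiki code:

```python
import re

# Assumption: an address of the form "tt:<n>" refers to text.old_id = n,
# as the rows in this comment suggest.
_TT_ADDRESS = re.compile(r"^tt:(\d+)$")


def text_id_from_address(address: str) -> int:
    """Return the text table row id encoded in a 'tt:<id>' content address."""
    match = _TT_ADDRESS.match(address.strip())
    if match is None:
        raise ValueError(f"not a text-table address: {address!r}")
    return int(match.group(1))


# The six content_address values from the query output.
addresses = ["tt:5502", "tt:5532", "tt:5537", "tt:5560", "tt:5566", "tt:5689"]
text_ids = [text_id_from_address(a) for a in addresses]
```

Anything that does not match the assumed pattern raises rather than being silently skipped, so an unexpected address scheme is noticed.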

Here's the slot, rev and page info for all of those:

wikiadmin@10.64.32.136(urwikibooks)> select * from slots where slot_content_id in (1927, 1958, 1963, 1985, 1991, 2100);
+------------------+--------------+-----------------+-------------+
| slot_revision_id | slot_role_id | slot_content_id | slot_origin |
+------------------+--------------+-----------------+-------------+
|             5519 |            1 |            1927 |        5519 |
|             5553 |            1 |            1958 |        5553 |
|             5559 |            1 |            1963 |        5559 |
|             5590 |            1 |            1985 |        5590 |
|             5598 |            1 |            1991 |        5598 |
|             5743 |            1 |            2100 |        5743 |
+------------------+--------------+-----------------+-------------+
6 rows in set (0.01 sec)
Thu, Apr 11, 6:22 PM · Patch-For-Review, User-Zoranzoki21, Regression, Wikimedia-Site-requests
ArielGlenn added a comment to T220424: XmlDUmpWriter::writeRevision sometimes broken by duplicate keys in Link Cache.

Abstract dumps are broken by this as well, fix incoming.

Thu, Apr 11, 8:21 AM · MW-1.34-notes (1.34.0-wmf.1; 2019-04-16), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation

Wed, Apr 10

ArielGlenn added a comment to T220594: abstracts dumps for dewikiversity fail with MWUnknownContentModelException from ContentHandler.php.

Welp, that change https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/499402/ doesn't have a large enough try block: the line where we get the content model (which is the one that fails) comes just before it. It seems to me that a missing content model is a serious enough issue that I want to hear about it by having things break.
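
To make the point about the try block's extent concrete, here is a small Python sketch. It is an illustration only, not the MediaWiki code: every name in it is hypothetical, and the exception class is a stand-in for MWUnknownContentModelException.

```python
class UnknownContentModel(Exception):
    """Hypothetical stand-in for MWUnknownContentModelException."""


REGISTERED_MODELS = {"wikitext"}


def get_content_model(model: str) -> str:
    # Fails for any model that is not registered on this (imaginary) wiki.
    if model not in REGISTERED_MODELS:
        raise UnknownContentModel(model)
    return model


def render(model: str) -> str:
    return f"<text model={model}>"


def write_revision_narrow(model: str):
    # The lookup that can fail sits just before the try block,
    # so the exception escapes and the run aborts.
    handler = get_content_model(model)
    try:
        return render(handler)
    except UnknownContentModel:
        return None


def write_revision_wide(model: str):
    # With the lookup inside the try block, the failure is swallowed
    # and the revision is skipped instead.
    try:
        handler = get_content_model(model)
        return render(handler)
    except UnknownContentModel:
        return None
```

The narrow version is the noisy one: write_revision_narrow("flow-board") raises, while write_revision_wide("flow-board") quietly returns None. Which behaviour is preferable depends on whether the breakage should be visible.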

Wed, Apr 10, 3:36 PM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn added a comment to T203075: Warning: MediaWiki\Storage\SqlBlobStore::fetchBlob: Bad data in text row 5191150.

https://logstash.wikimedia.org/goto/57a4c9ec510eb287f8f7d48a3db08f66 Another sample; these turned up in the abstract dumps for jvwiki. There's a workaround patch for that in wmf.25 which will mask the problem, but the bad data will still be there.

Wed, Apr 10, 2:04 PM · Core Platform Team Backlog (Watching / External), Core Platform Team (Security, stability, performance and scalability (TEC1)), Wikimedia-production-error
ArielGlenn added a comment to T220594: abstracts dumps for dewikiversity fail with MWUnknownContentModelException from ContentHandler.php.
wikiadmin@10.64.0.205(dewikiversity)> select * from text where old_id = 269201;
+--------+---------------+-----------+-----------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| old_id | old_namespace | old_title | old_text              | old_comment | old_user | old_user_text | old_timestamp | old_minor_edit | old_flags           | inverse_timestamp |
+--------+---------------+-----------+-----------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| 269201 |               |           | DB://cluster22/121006 |             |          |               |               |                | utf-8,gzip,external |                   |
+--------+---------------+-----------+-----------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
1 row in set (0.00 sec)
Wed, Apr 10, 1:56 PM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn added a comment to T220594: abstracts dumps for dewikiversity fail with MWUnknownContentModelException from ContentHandler.php.

https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/499402/ will work around this issue once wmf.25 lands; if there had not been issues with the train it would have already landed for this wiki.

Wed, Apr 10, 12:58 PM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn reopened T207626: Disable unused Flow extension on de.wikiversity as "Open".

The inaccessible page is still marked as content type flow-board in the content table; see T220594 for the gory details. This needs to be fixed up, though I don't know the best way to do that. I'm going to add @daniel as it's MCR-related and he might have some insight.

Wed, Apr 10, 12:23 PM · Patch-For-Review, User-Zoranzoki21, Regression, Wikimedia-Site-requests
ArielGlenn added a comment to T220594: abstracts dumps for dewikiversity fail with MWUnknownContentModelException from ContentHandler.php.

Info for the page and the current revision (which is what is used for abstracts):

wikiadmin@10.64.16.191(dewikiversity)> select * from page where page_id = 47279;
+---------+----------------+--------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| page_id | page_namespace | page_title               | page_restrictions | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model | page_lang |
+---------+----------------+--------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
|   47279 |           2600 | Was_wir_hören_und_sehen  |                   |                0 |           1 | 0.649918805112 | 20110807104642 | NULL               |      274772 |      352 | wikitext           | NULL      |
+---------+----------------+--------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
1 row in set (0.01 sec)
Wed, Apr 10, 12:17 PM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn added a comment to T220594: abstracts dumps for dewikiversity fail with MWUnknownContentModelException from ContentHandler.php.

Reproduce by:

dumpsgen@snapshot1007:~$ /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=dewikiversity /srv/mediawiki/php-1.33.0-wmf.24 --plugin=AbstractFilter:/srv/mediawiki/php-1.33.0-wmf.24/extensions/ActiveAbstract/AbstractFilter.php --current --report=1 --output=file:/mnt/dumpsdata/temp/dumpsgen/bad-abstracts-dewv-sigh.xml --filter=namespace:NS_MAIN --filter=noredirect --filter=abstract --skip-header --start=47279 --skip-footer --end 47280
[3b5b23e8293282f8aecb46dc] [no req]   MWUnknownContentModelException from line 265 of /srv/mediawiki/php-1.33.0-wmf.24/includes/content/ContentHandler.php: The content model 'flow-board' is not registered on this wiki.
See https://www.mediawiki.org/wiki/Content_handlers to find out which extensions handle this content model.
Backtrace:
#0 /srv/mediawiki/php-1.33.0-wmf.24/includes/Revision/RevisionStore.php(1470): ContentHandler::getForModelID(string)
#1 /srv/mediawiki/php-1.33.0-wmf.24/includes/Revision/RevisionStore.php(1634): MediaWiki\Revision\RevisionStore->loadSlotContent(MediaWiki\Revision\SlotRecord, NULL, NULL, NULL, integer)
#2 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}(MediaWiki\Revision\SlotRecord)
#3 /srv/mediawiki/php-1.33.0-wmf.24/includes/Revision/SlotRecord.php(307): call_user_func(Closure, MediaWiki\Revision\SlotRecord)
#4 /srv/mediawiki/php-1.33.0-wmf.24/includes/export/XmlDumpWriter.php(308): MediaWiki\Revision\SlotRecord->getContent()
#5 /srv/mediawiki/php-1.33.0-wmf.24/includes/export/WikiExporter.php(485): XmlDumpWriter->writeRevision(stdClass)
#6 /srv/mediawiki/php-1.33.0-wmf.24/includes/export/WikiExporter.php(445): WikiExporter->outputPageStreamBatch(Wikimedia\Rdbms\ResultWrapper, NULL)
#7 /srv/mediawiki/php-1.33.0-wmf.24/includes/export/WikiExporter.php(269): WikiExporter->dumpPages(string, boolean)
#8 /srv/mediawiki/php-1.33.0-wmf.24/includes/export/WikiExporter.php(154): WikiExporter->dumpFrom(string, boolean)
#9 /srv/mediawiki/php-1.33.0-wmf.24/maintenance/includes/BackupDumper.php(288): WikiExporter->pagesByRange(integer, integer, boolean)
#10 /srv/mediawiki/php-1.33.0-wmf.24/maintenance/dumpBackup.php(83): BackupDumper->dump(integer, integer)
#11 /srv/mediawiki/php-1.33.0-wmf.24/maintenance/doMaintenance.php(96): DumpBackup->execute()
#12 /srv/mediawiki/php-1.33.0-wmf.24/maintenance/dumpBackup.php(138): require_once(string)
#13 /srv/mediawiki/multiversion/MWScript.php(100): require_once(string)
#14 {main}
Wed, Apr 10, 11:54 AM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn added a comment to T220594: abstracts dumps for dewikiversity fail with MWUnknownContentModelException from ContentHandler.php.

logstash entry: https://logstash.wikimedia.org/goto/ebf2e31a728bbcb597647ce2773ab791

Wed, Apr 10, 11:36 AM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn added a comment to T220160: getRedirectTarget should not automatically load revision content in all cases.

Page info for the above example:

wikiadmin@10.64.0.205(hrwiki)> select * from page where page_id = 192386;
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| page_id | page_namespace | page_title                                                | page_restrictions | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model | page_lang |
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
|  192386 |              0 | Franjevački_samostan_Sv._Antuna_Padovanskog_u_Koprivnici  |                   |                1 |           1 | 0.222437823375 | 20160904152621 | NULL               |     1705637 |       81 | wikitext           | NULL      |
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+

Entry in redirect table:

wikiadmin@10.64.0.205(hrwiki)> select * from redirect where rd_from = 192386;
+---------+--------------+-------------------------------------------------------------------+--------------+-------------+
| rd_from | rd_namespace | rd_title                                                          | rd_interwiki | rd_fragment |
+---------+--------------+-------------------------------------------------------------------+--------------+-------------+
|  192386 |            0 | Franjevački_samostan_i_crkva_sv._Antuna_Padovanskog_u_Koprivnici  | NULL         | NULL        |
+---------+--------------+-------------------------------------------------------------------+--------------+-------------+
Wed, Apr 10, 11:14 AM · Core Platform Team Kanban, Core Platform Team (Security, stability, performance and scalability (TEC1)), Wikimedia-production-error, Regression, MediaWiki-Revision-backend, Dumps-Generation
ArielGlenn triaged T220594: abstracts dumps for dewikiversity fail with MWUnknownContentModelException from ContentHandler.php as High priority.
Wed, Apr 10, 11:10 AM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn added a comment to T203075: Warning: MediaWiki\Storage\SqlBlobStore::fetchBlob: Bad data in text row 5191150.

https://hr.wikipedia.org/wiki/Franjeva%C4%8Dki_samostan_Sv._Antuna_Padovanskog_u_Koprivnici This url triggers the exception. Logstash entry: https://logstash.wikimedia.org/goto/2fc2e8da6b9cab496c8eb409b7347b21 Here's the relevant info from the page table, revision table for the current revision, and text table for that text id.

wikiadmin@10.64.0.205(hrwiki)> select * from page where page_id = 192386;
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| page_id | page_namespace | page_title                                                | page_restrictions | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model | page_lang |
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
|  192386 |              0 | Franjevački_samostan_Sv._Antuna_Padovanskog_u_Koprivnici  |                   |                1 |           1 | 0.222437823375 | 20160904152621 | NULL               |     1705637 |       81 | wikitext           | NULL      |
+---------+----------------+-----------------------------------------------------------+-------------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+--------------------+-----------+
1 row in set (0.00 sec)
Wed, Apr 10, 9:56 AM · Core Platform Team Backlog (Watching / External), Core Platform Team (Security, stability, performance and scalability (TEC1)), Wikimedia-production-error

Tue, Apr 9

ArielGlenn added a comment to T220547: Document CirrusSearch schema.

You could add a README file in other/cirruusdumps for users of those, and link it off of other.html; see https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/dumps/files/web/html/other_index.html for that and https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/dumps/files/web/html for where a README might live.

Tue, Apr 9, 10:13 PM · Discovery-Search (Current work), Documentation, Cloud-Services, Elasticsearch, Discovery
ArielGlenn added a comment to T220493: Xml stubs dumps are running 5 to 15x slower than previously.

Here's an explain of a sample query we run to get revision info for stubs:

root@PRODUCTION s4 slave[commonswiki]> explain extended SELECT  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,page_restrictions  FROM `revision` FORCE INDEX (rev_page_id) JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) JOIN `page` ON ((rev_page=page_id))   WHERE (page_id >= 36814248 AND page_id < 36834248) AND (rev_page>0 OR (rev_page=0 AND rev_id>0))  ORDER BY rev_page ASC,rev_id ASC LIMIT 50000;
+------+-------------+---------------------+--------+------------------------+-------------+---------+----------------------------------------------------+--------+----------+-------------
| id   | select_type | table               | type   | possible_keys          | key         | key_len | ref                                                | rows   | filtered | Extra       
+------+-------------+---------------------+--------+------------------------+-------------+---------+----------------------------------------------------+--------+----------+-------------
|    1 | SIMPLE      | revision            | range  | rev_page_id            | rev_page_id | 4       | NULL                                               | 154792 |   100.00 | Using index 
|    1 | SIMPLE      | temp_rev_comment    | ref    | PRIMARY,revcomment_rev | PRIMARY     | 4       | commonswiki.revision.rev_id                        |      1 |   100.00 | Using index 
|    1 | SIMPLE      | comment_rev_comment | eq_ref | PRIMARY                | PRIMARY     | 8       | commonswiki.temp_rev_comment.revcomment_comment_id |      1 |   100.00 |             
|    1 | SIMPLE      | page                | eq_ref | PRIMARY                | PRIMARY     | 4       | commonswiki.revision.rev_page                      |      1 |   100.00 |             
+------+-------------+---------------------+--------+------------------------+-------------+---------+----------------------------------------------------+--------+----------+-------------
4 rows in set, 1 warning (0.00 sec)
Tue, Apr 9, 11:29 AM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn triaged T220493: Xml stubs dumps are running 5 to 15x slower than previously as High priority.
Tue, Apr 9, 11:25 AM · MediaWiki-General-or-Unknown, Dumps-Generation
ArielGlenn added a comment to T220316: XmlDumpWriter::openPage handles main namespace articles with prefixes that are namespace names AND are redirects incorrectly.

This also broke abstracts on en wiki so I'm live patching on snapshot1009 for that. Maybe we can get a backport before today's deploy.

Tue, Apr 9, 8:30 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation

Mon, Apr 8

ArielGlenn added a comment to T220316: XmlDumpWriter::openPage handles main namespace articles with prefixes that are namespace names AND are redirects incorrectly.

I have live-patched this on snapshot1007 for .wmf24, so that stubs of commonswiki (the one outstanding job left) can run to completion.

Mon, Apr 8, 8:26 PM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn added a comment to T220316: XmlDumpWriter::openPage handles main namespace articles with prefixes that are namespace names AND are redirects incorrectly.

The other bug (T220424) is more or less that bug.

Mon, Apr 8, 6:44 PM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn triaged T220424: XmlDUmpWriter::writeRevision sometimes broken by duplicate keys in Link Cache as High priority.
Mon, Apr 8, 4:56 PM · MW-1.34-notes (1.34.0-wmf.1; 2019-04-16), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn updated subscribers of T217899: Duplicate commas in JSON Content Translation Dumps.

@Etonkovidova Thanks for updating the workboard column. I like to move tasks there once they have been closed completely (for next time).

Mon, Apr 8, 4:54 PM · Language-Team (Language-2019-April-June), MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Unplanned-Sprint-Work, ContentTranslation, Dumps-Generation
ArielGlenn added a comment to T220316: XmlDumpWriter::openPage handles main namespace articles with prefixes that are namespace names AND are redirects incorrectly.

A comparison of stubs for all revisions from 20190320 and the ones run just now shows only new revisions, aside from newly imported pages (which add up to the correct number of revisions for those) and a redirect change reflected in an edit during the last couple weeks.
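
A comparison of this kind can be sketched roughly as below. This is a hedged illustration, not the procedure actually used: it assumes uncompressed stubs XML in which each <revision> element has a direct <id> child, it ignores XML namespaces, and the function names and sample data are made up.

```python
import xml.etree.ElementTree as ET
from io import BytesIO


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag, if there is one."""
    return tag.rsplit("}", 1)[-1]


def revision_ids(stream) -> set:
    """Collect the ids of all <revision> elements in a stubs-style XML stream."""
    ids = set()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "revision":
            continue
        # Only look at direct children, so <contributor><id> is not picked up.
        for child in elem:
            if local_name(child.tag) == "id":
                ids.add(int(child.text))
                break
        elem.clear()  # release the children of this revision
    return ids


# Tiny made-up samples standing in for an older and a newer stubs file.
old_stubs = (b"<mediawiki><page><id>1</id>"
             b"<revision><id>10</id><contributor><id>7</id></contributor></revision>"
             b"</page></mediawiki>")
new_stubs = (b"<mediawiki><page><id>1</id>"
             b"<revision><id>10</id><contributor><id>7</id></contributor></revision>"
             b"<revision><id>11</id><contributor><id>7</id></contributor></revision>"
             b"</page></mediawiki>")

old_ids = revision_ids(BytesIO(old_stubs))
new_ids = revision_ids(BytesIO(new_stubs))
added = new_ids - old_ids      # revisions only in the newer run
missing = old_ids - new_ids    # revisions that disappeared
```

An empty `missing` set together with an `added` set that contains only recent edits and imports is the outcome described above.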

Mon, Apr 8, 7:24 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn added a comment to T220316: XmlDumpWriter::openPage handles main namespace articles with prefixes that are namespace names AND are redirects incorrectly.

I've tested this just now on the sa wikisource dumps and the stubs run past the problematic page. If they run to completion properly and the output looks good compared to last month's full run, I'll rerun all of the problem wikis with this live-patched so we can get these jobs done.

Mon, Apr 8, 6:49 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn added a comment to T220316: XmlDumpWriter::openPage handles main namespace articles with prefixes that are namespace names AND are redirects incorrectly.

Since the underlying issue is that two pages have the same key in the link cache, there are a few things that might be done:

Mon, Apr 8, 6:28 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn added a comment to T220316: XmlDumpWriter::openPage handles main namespace articles with prefixes that are namespace names AND are redirects incorrectly.

Note that in the 20190301 sa wikisource stubs, the revisions for pages 8821 and 8829 appear normally, with the revision for 8829 marked as a redirect, as it should be, and the title of the redirect target provided.

Mon, Apr 8, 12:35 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn triaged T220316: XmlDumpWriter::openPage handles main namespace articles with prefixes that are namespace names AND are redirects incorrectly as High priority.
Mon, Apr 8, 12:17 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), Patch-For-Review, MediaWiki-General-or-Unknown, MediaWiki-Export-or-Import, Dumps-Generation

Sat, Apr 6

ArielGlenn added a comment to T220257: dumpBackups.php failing with InvalidArgumentException thrown from RevisionStoreRecord for certain wikis.

The live-patched script ran to completion. I will put this fix out on snapshot1005,6,7 for .wmf23 so that we can get stubs completed. The fix will be overwritten by the first deploy though.

Sat, Apr 6, 9:51 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), MediaWiki-Revision-backend, Wikimedia-production-error, Patch-For-Review, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn added a comment to T220257: dumpBackups.php failing with InvalidArgumentException thrown from RevisionStoreRecord for certain wikis.

I have live-patched WikiExporter.php on snapshot1009 and am running the dumpBackup command there as it would be run for stubs production (though to a different output file):

/usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=enwiki --full --stub --report=1000 --output=file:/mnt/dumpsdata/temp/dumpsgen/badstubs-history.xml --output=file:/mnt/dumpsdata/temp/dumpsgen/badstubs-current.xml --filter=latest --output=file:/mnt/dumpsdata/temp/dumpsgen/badstubs-articles.xml --filter=latest --filter=notalk '--filter=namespace:!NS_USER' --skip-header --start=4166862 --skip-footer --end 4186862
Sat, Apr 6, 9:26 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), MediaWiki-Revision-backend, Wikimedia-production-error, Patch-For-Review, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn added a project to T220257: dumpBackups.php failing with InvalidArgumentException thrown from RevisionStoreRecord for certain wikis: MediaWiki-General-or-Unknown.
Sat, Apr 6, 9:17 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), MediaWiki-Revision-backend, Wikimedia-production-error, Patch-For-Review, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn updated the task description for T220257: dumpBackups.php failing with InvalidArgumentException thrown from RevisionStoreRecord for certain wikis.
Sat, Apr 6, 9:16 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), MediaWiki-Revision-backend, Wikimedia-production-error, Patch-For-Review, MediaWiki-Export-or-Import, Dumps-Generation
ArielGlenn added a comment to T220257: dumpBackups.php failing with InvalidArgumentException thrown from RevisionStoreRecord for certain wikis.

The command in the task description should dump two pages, in order: 2.4 and 2.40 on enwiki.

Sat, Apr 6, 9:15 AM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), MediaWiki-Revision-backend, Wikimedia-production-error, Patch-For-Review, MediaWiki-Export-or-Import, Dumps-Generation