ArielGlenn (ariel)
User

User Details

User Since
Oct 8 2014, 7:09 PM (254 w, 3 d)
Availability
Available
IRC Nick
apergos
LDAP User
ArielGlenn
MediaWiki User
ArielGlenn [ Global Accounts ]

Recent Activity

Fri, Aug 23

ArielGlenn added a comment to T226698: Allow all Analytics tools to work with Kerberos auth.

@elukey On our previous server we let people pull from us and it was very difficult to manage upgrades or any sort of maintenance. Somewhere there's a ticket with the awfulness.

Fri, Aug 23, 3:11 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics
ArielGlenn added a comment to T230856: RDF dump performance for SDC.

https://github.com/apergos/misc-wmf-crap/tree/master/glyph-image-generator Starting to get clever about this: it can generate 50k small images with metadata that can be extracted for use in depicts and/or caption statements.

Fri, Aug 23, 3:05 PM · Dumps-Generation, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata

Thu, Aug 22

ArielGlenn added a project to T224563: Migrate dumpsdata hosts to Stretch/Buster: Dumps-Generation.
Thu, Aug 22, 3:21 PM · Dumps-Generation, Operations

Wed, Aug 21

ArielGlenn added a comment to T230856: RDF dump performance for SDC.

I'm looking at deployment-db05 now, and there are 63332 rows in the revision table, with 53250 rows in the content table. I guess we need to double the number of revisions and then add the structured data for those entries. We can probably be clever about this via a script.

Wed, Aug 21, 6:19 AM · Dumps-Generation, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata
ArielGlenn added a comment to T230856: RDF dump performance for SDC.

@Smalyshev Do you know how many entries have structured data on deployment-prep? Is that a useful testing ground right now or should we be populating the data over there first?

Wed, Aug 21, 5:51 AM · Dumps-Generation, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata

Mon, Aug 12

ArielGlenn moved T230099: set up a cron job that watches dumps logs for exceptions and reports periodically from Active to Done on the Dumps-Generation board.
Mon, Aug 12, 9:23 AM · Dumps-Generation
ArielGlenn moved T226167: audit public tables and make sure we dump them all from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Mon, Aug 12, 9:22 AM · Patch-For-Review, Dumps-Generation
ArielGlenn closed T230099: set up a cron job that watches dumps logs for exceptions and reports periodically as Resolved.

This appears to be working as it should. Closing.

Mon, Aug 12, 9:21 AM · Dumps-Generation
ArielGlenn added a comment to T221917: Create RDF dump of structured data on Commons.

I'm not thinking about the amount of time it takes, but rather the load on the database servers. Reasonable sized batched queries will be better, as I've seen already with stub dumps and slot retrieval.
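
A rough sketch of what "reasonable sized batched queries" could look like: split an id range into bounded chunks so that each query touches a limited number of rows. The function name and the batch size are hypothetical and chosen only for illustration; this is not the actual dump code.

```python
from typing import Iterator, Tuple


def id_batches(start: int, end: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) id ranges that together cover start..end."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    low = start
    while low <= end:
        high = min(low + batch_size - 1, end)
        yield (low, high)
        low = high + 1


# Each range would drive one bounded query, for example something like
#   ... WHERE page_id BETWEEN low AND high
for low, high in id_batches(1, 10, 4):
    print(low, high)
```

A smaller batch size means more round trips but less work per query, which is the trade-off that matters for server load.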

Mon, Aug 12, 7:58 AM · Dumps-Generation, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Patch-For-Review, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata
ArielGlenn added a comment to T221917: Create RDF dump of structured data on Commons.

I think T222497 should be resolved before this goes live. I can test it in deployment-prep before then, but I don't want to do production tests until there is some sort of batching.

Mon, Aug 12, 7:34 AM · Dumps-Generation, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Patch-For-Review, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata

Fri, Aug 9

ArielGlenn added a comment to T226167: audit public tables and make sure we dump them all.

Thanks a lot! I've updated the patch above to remove those entries. Now just waiting on the wb_terms migration to get further along.

Fri, Aug 9, 3:04 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T203075: Warning: MediaWiki\Storage\SqlBlobStore::fetchBlob: Bad data in text row 5191150.

The entry in the text row points to a non-existent blob in a cluster.

wikiadmin@10.64.48.34(zhwiki)> select * from text where old_id = 9375723;
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| old_id  | old_namespace | old_title | old_text         | old_comment | old_user | old_user_text | old_timestamp | old_minor_edit | old_flags           | inverse_timestamp |
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| 9375723 |             0 |           | DB://cluster20/0 |             |        0 |               |               |              0 | utf-8,gzip,external |                   |
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
1 row in set (0.00 sec)

The id of 0 after the cluster20 address is the issue, just like other entries on this ticket.
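
A minimal sketch of how such entries could be flagged, assuming the address always has the two-part shape seen here (DB://<cluster>/<blob id>). The helper names are hypothetical, and addresses in any other shape are simply reported as not parseable rather than judged.

```python
import re
from typing import Optional, Tuple

# Assumed address shape, based only on the example in this comment:
#   DB://<cluster>/<blob id>
ADDRESS_RE = re.compile(r"^DB://(?P<cluster>[^/]+)/(?P<blob_id>\d+)$")


def parse_address(address: str) -> Optional[Tuple[str, int]]:
    """Return (cluster, blob_id), or None if the address has another shape."""
    match = ADDRESS_RE.match(address)
    if match is None:
        return None
    return match.group("cluster"), int(match.group("blob_id"))


def has_zero_blob_id(address: str) -> bool:
    """True only for parseable addresses whose blob id is 0."""
    parsed = parse_address(address)
    return parsed is not None and parsed[1] == 0


print(has_zero_blob_id("DB://cluster20/0"))
```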

Fri, Aug 9, 2:26 PM · Core Platform Team (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), Wikimedia-production-error
ArielGlenn edited P8774 Db table audit for dumps 2019 part 2.
Fri, Aug 9, 6:56 AM

Thu, Aug 8

ArielGlenn added a comment to T217329: bug in 1.33.0-wmf.18 breaks abstract dumps on testwikidatawiki | MWContentSerializationException $entityId and $targetId can not be the same.

The python scripts at the dump end are (mostly) protected against exceptions from MediaWiki generally and from this failure case in particular. Since we have problematic data in production I've re-opened the ticket so that the WikiBase issue can somehow be resolved.

Thu, Aug 8, 4:33 PM · Multi-Content-Revisions (Tech Debt), CPT Initiatives (MCR), wikidata-tech-focus, User-Addshore, Wikimedia-production-error, Wikidata-Campsite, Wikidata, Dumps-Generation
ArielGlenn moved T228772: convert dump maintenance scripts to use Maintenance::getDB instead of getConnection /wfGetDB from Up Next to Active on the Dumps-Generation board.
Thu, Aug 8, 1:14 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Dumps-Generation
ArielGlenn added a comment to T228772: convert dump maintenance scripts to use Maintenance::getDB instead of getConnection /wfGetDB.

Most of these were handled in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/524666/ but not quite all.

Thu, Aug 8, 1:14 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Dumps-Generation
ArielGlenn moved T226093: Capacity planning for Commons Structured Data from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Thu, Aug 8, 1:00 PM · Dumps-Generation, Operations, Wikidata, SDC General
ArielGlenn moved T219768: Get a third dumpsdata server from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Thu, Aug 8, 12:59 PM · hardware-requests, Operations, Dumps-Generation
ArielGlenn moved T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Thu, Aug 8, 12:59 PM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T230099: set up a cron job that watches dumps logs for exceptions and reports periodically from Backlog to Active on the Dumps-Generation board.
Thu, Aug 8, 12:59 PM · Dumps-Generation
ArielGlenn added a comment to T207627: Disable unused Flow extension on ur.wikibooks.

I should clarify; I think the only thing that is needed is to set the content model column for those rows in the content table to 1 (or whichever model is 'wikitext'). The dewikiversity tickets are similar.

Thu, Aug 8, 9:39 AM · User-Zoranzoki21, Regression, Wikimedia-Site-requests
ArielGlenn added a comment to T207627: Disable unused Flow extension on ur.wikibooks.

I expected that; this requires direct intervention at the db level. I was sort of hoping you were volunteering to do it :-D

Thu, Aug 8, 9:24 AM · User-Zoranzoki21, Regression, Wikimedia-Site-requests
ArielGlenn triaged T230099: set up a cron job that watches dumps logs for exceptions and reports periodically as Normal priority.
Thu, Aug 8, 7:00 AM · Dumps-Generation

Wed, Aug 7

ArielGlenn added a comment to T207627: Disable unused Flow extension on ur.wikibooks.

@MarcoAurelio Wonderful! It's not the pages though; in the page entry for each of the bad revisions, the content model is listed there as wikitext. It's only in the content table where the wrong content model (4) is shown.
The entries in the content table are listed in T207627#5105046 (double-checked just now to be sure the list is still the same). The slot, revision and page info corresponding to each of those are listed in the same comment, double-checked just now to be sure none of that changed either.

Wed, Aug 7, 10:45 AM · User-Zoranzoki21, Regression, Wikimedia-Site-requests

Mon, Aug 5

ArielGlenn updated subscribers of T226167: audit public tables and make sure we dump them all.

@aaron This revision rEFLR848ef073fa89036c40c440016a8092690ddcf56b for FlaggedRevs seems to indicate that flaggedrevs_stats and flaggedrevs_stats2 are no longer used. Do you know or can you point me to someone who could verify that this is the case? If they aren't used, I will add them to my 'don't ever dump these' list. Thanks!

Mon, Aug 5, 12:30 PM · Patch-For-Review, Dumps-Generation
ArielGlenn closed T51134: Create partial SQL dump of logging table, a subtask of T140977: Dump public data from all sql tables that have mixed public/private data, as Declined.
Mon, Aug 5, 12:18 PM · Dumps-Generation
ArielGlenn closed T51134: Create partial SQL dump of logging table as Declined.

I'm going to decline this because we would have to walk through and decide which entries can be published and which ones cannot.
This should be done by letting MediaWiki do the work, rather than re-implementing the logic in the python scripts and needing to keep it in sync.
The pages-logging xml dumps already do that for us.

Mon, Aug 5, 12:18 PM · Dumps-Generation
ArielGlenn added a comment to T221917: Create RDF dump of structured data on Commons.

I sincerely apologize: this weekend the heat baked my brain and I did nothing related to computers at all. And Friday evening I was out. I'll set a notification to remind me this coming Friday earlier in the day, so that this gets done.

Mon, Aug 5, 7:23 AM · Dumps-Generation, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Patch-For-Review, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata

Fri, Aug 2

ArielGlenn edited P8774 Db table audit for dumps 2019 part 2.
Fri, Aug 2, 10:35 AM
ArielGlenn edited P8774 Db table audit for dumps 2019 part 2.
Fri, Aug 2, 9:55 AM

Thu, Aug 1

ArielGlenn moved T227084: Find and document current xml/sql dumps behavior for every dang edge case from Backlog to Up Next on the Dumps-Generation board.
Thu, Aug 1, 5:22 PM · Dumps-Generation

Tue, Jul 30

ArielGlenn added a comment to T198343: Replace all calls to Revision::getRevisionText().

The category links are the fallback that was designed, so this is a net positive. Going to go update the CR on the patch.

Tue, Jul 30, 12:42 PM · Structured-Data-Backlog, CPT Initiatives (MCR Schema Migration), Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Multi-Content-Revisions (Tech Debt), Structured Data Engineering, Wikidata
ArielGlenn added a comment to T198343: Replace all calls to Revision::getRevisionText().

Ran the following with old and new code for abstracts:

/usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=anwiki /srv/v/mediawiki/php-1.34.0-wmf.15/extensions/ActiveAbstract/includes/AbstractFilter.php  --full --report=1 --output=file:/mnt/dumpsdata/temp/dumpsgen/abstracts-anwiki-cr-testing.txt.test  --filter=noredirect --filter=abstract --skip-header --start=36142 --skip-footer --end 36150
Tue, Jul 30, 11:28 AM · Structured-Data-Backlog, CPT Initiatives (MCR Schema Migration), Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Multi-Content-Revisions (Tech Debt), Structured Data Engineering, Wikidata
ArielGlenn added a comment to T202485: ActiveAbstract tests are failing.

Is this still an issue? It wasn't even on my radar but I saw it now by a change search in phab.

Tue, Jul 30, 11:05 AM · MW-1.32-notes (WMF-deploy-2018-09-04 (1.32.0-wmf.20)), ActiveAbstract
ArielGlenn moved T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs! from Active to Done on the Dumps-Generation board.
Tue, Jul 30, 10:38 AM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Wikimedia-production-error, Wikidata, Dumps-Generation
ArielGlenn closed T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs! as Resolved.

The wikis ran to completion, but I forgot to close this. Doing so now!

Tue, Jul 30, 10:38 AM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Wikimedia-production-error, Wikidata, Dumps-Generation
ArielGlenn added a project to T229290: Incremental RDF dumps: Dumps-Generation.
Tue, Jul 30, 7:06 AM · Dumps-Generation, Wikidata

Mon, Jul 29

ArielGlenn moved T217329: bug in 1.33.0-wmf.18 breaks abstract dumps on testwikidatawiki | MWContentSerializationException $entityId and $targetId can not be the same from Done to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Mon, Jul 29, 3:28 PM · Multi-Content-Revisions (Tech Debt), CPT Initiatives (MCR), wikidata-tech-focus, User-Addshore, Wikimedia-production-error, Wikidata-Campsite, Wikidata, Dumps-Generation
ArielGlenn added a comment to T221917: Create RDF dump of structured data on Commons.

The refactor patchset now checks out with all the wikidata dumps including json. I'd like to deploy it this weekend, giving plenty of time to make sure it's ok, test the structured data patchset, and then be able to deploy that separately.

Mon, Jul 29, 1:22 PM · Dumps-Generation, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Patch-For-Review, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata
ArielGlenn added a comment to T220608: Introduce UnknownContentHandler and UnknownContent.

If this gets done, stub dumps will go back to working for urwikibooks and dewikiversity. The other option at the moment is to run the jobs manually with a hacked copy of XmlDumpWriter, which is not ideal. I'd like to not be doing that for more than a few more runs (say, a month).

Mon, Jul 29, 9:28 AM · Patch-For-Review, Core Platform Team Workboards (Clinic Duty Team), MediaWiki-ContentHandler
ArielGlenn added a comment to T207626: Disable unused Flow extension on de.wikiversity.

It looks like there is already such a ticket: T220608, which, if I read it correctly, would return the raw text as it appears in the database, for purposes of jobs like the dumps. Should I move this to the Inbox column then?

Mon, Jul 29, 9:24 AM · User-Zoranzoki21, Regression, Wikimedia-Site-requests

Sat, Jul 27

ArielGlenn moved T228674: write regression test for XmlDumpWriter that will make sure content is not retrieved during stub dumps from Backlog to Up Next on the Dumps-Generation board.
Sat, Jul 27, 5:29 AM · Dumps-Generation
ArielGlenn moved T228772: convert dump maintenance scripts to use Maintenance::getDB instead of getConnection /wfGetDB from Backlog to Up Next on the Dumps-Generation board.
Sat, Jul 27, 5:29 AM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Dumps-Generation
ArielGlenn moved T228763: stubs are produced with xml:space="preserve" in the text tag; this is new behavior for the July 20th run of the xml/sql dumps from Backlog to Up Next on the Dumps-Generation board.
Sat, Jul 27, 5:29 AM · CPT Initiatives (MCR), Core Platform Team Workboards (Clinic Duty Team), Dumps-Generation
ArielGlenn moved T229114: flow dumps broken from Backlog to Done on the Dumps-Generation board.
Sat, Jul 27, 5:28 AM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), StructuredDiscussions, Growth-Team, Dumps-Generation
ArielGlenn closed T229114: flow dumps broken as Resolved.

All but two wikis (commons, wikidata) have completed flow dumps successfully, which is good enough for me. Closing.

Sat, Jul 27, 5:28 AM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), StructuredDiscussions, Growth-Team, Dumps-Generation

Fri, Jul 26

ArielGlenn added a comment to T229114: flow dumps broken.

The fix looks good but I'll keep this open until one of the big wikis like frwiki completes this step.

Fri, Jul 26, 4:09 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), StructuredDiscussions, Growth-Team, Dumps-Generation
ArielGlenn added a comment to T229114: flow dumps broken.

@Daimona Thank you!

Fri, Jul 26, 3:01 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), StructuredDiscussions, Growth-Team, Dumps-Generation
ArielGlenn added a comment to T229114: flow dumps broken.

I have tested the above on snapshot1008 for kabwiki and the job completes with normal output.

Fri, Jul 26, 2:48 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), StructuredDiscussions, Growth-Team, Dumps-Generation
ArielGlenn updated subscribers of T229114: flow dumps broken.

Adding @Daimona as the person who did the phan patch, and can correct the fix if something else is more appropriate.

Fri, Jul 26, 2:39 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), StructuredDiscussions, Growth-Team, Dumps-Generation
ArielGlenn added a comment to T229114: flow dumps broken.

These dumps need to complete by the end of the month, so we should get the fix in and deployed by the 30th.

Fri, Jul 26, 2:37 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), StructuredDiscussions, Growth-Team, Dumps-Generation
ArielGlenn added a project to T229114: flow dumps broken: StructuredDiscussions.
Fri, Jul 26, 2:30 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), StructuredDiscussions, Growth-Team, Dumps-Generation
ArielGlenn added a comment to T229114: flow dumps broken.

Broken in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Flow/+/495966/ at line 25 (patched) https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Flow/+/495966/20/includes/Search/Iterators/TopicIterator.php, I guess this needs to be "public $orderByUUID = false;"

Fri, Jul 26, 2:29 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), StructuredDiscussions, Growth-Team, Dumps-Generation
ArielGlenn triaged T229114: flow dumps broken as High priority.
Fri, Jul 26, 2:16 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), StructuredDiscussions, Growth-Team, Dumps-Generation
ArielGlenn added a comment to T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps.

I have run dewikiversity and urwikibooks stubs manually with the patched XmlDumpWriter to unblock the current dump run.

Fri, Jul 26, 6:08 AM · Patch-For-Review, Dumps-Generation

Jul 25 2019

ArielGlenn added a comment to T207626: Disable unused Flow extension on de.wikiversity.
Jul 25 2019, 6:23 PM · User-Zoranzoki21, Regression, Wikimedia-Site-requests
ArielGlenn added a comment to T207626: Disable unused Flow extension on de.wikiversity.

The revision, the only one for the page, was created in 2011 with the unfortunate prefix 'Thema:' while Flow was enabled in 2015 on the wiki.

Jul 25 2019, 2:14 PM · User-Zoranzoki21, Regression, Wikimedia-Site-requests
ArielGlenn added a comment to T207626: Disable unused Flow extension on de.wikiversity.

No. The page already has a wikitext content model. It only needs to be returned to namespace 0 by running the usual namespaceDupes.php.

Jul 25 2019, 12:56 PM · User-Zoranzoki21, Regression, Wikimedia-Site-requests
ArielGlenn added a comment to T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps.

Copying here the comments from the patchset since it's getting a bit long.

Daniel Kinzler
12:02 AM
Jul 25 2019, 10:23 AM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval from Active to Done on the Dumps-Generation board.
Jul 25 2019, 6:55 AM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation

Jul 24 2019

ArielGlenn closed T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval as Resolved.

Stubs for frwiki finished a little bit ago. Closing!

Jul 24 2019, 10:04 PM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn added a comment to T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps.

Note that the actual content of these revisions is in fact wikitext; when Flow was disabled the content model was not altered properly. See T220594#5101026 for extracted text for one of them.

Jul 24 2019, 9:24 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps.

Previous stubs had this entry for the example revision in dewikiversity:

<page>
  <title>Was wir hören und sehen</title>
  <ns>2600</ns>
  <id>47279</id>
  <revision>
    <id>274772</id>
    <timestamp>2011-08-07T10:46:42Z</timestamp>
    <contributor>
      <username>MartinKurz</username>
      <id>11256</id>
    </contributor>
    <comment>Automatische Zusammenfassung: Die Seite wurde neu angelegt.</comment>
    <model>wikitext</model>
    <format>text/x-wiki</format>
    <text id="269201" bytes="352" />
    <sha1>o9mhxk86c2bxcvymcwsiirc2vyczgnp</sha1>
  </revision>
</page>
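
For checking entries like this one across runs, a small sketch that pulls the model, text id and byte count out of a stub page element. It assumes a bare fragment without an XML namespace, as pasted here; a full dump file would likely need namespace handling. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the fragment above, used only as sample input.
STUB = """<page>
  <title>Was wir hören und sehen</title>
  <ns>2600</ns>
  <id>47279</id>
  <revision>
    <id>274772</id>
    <model>wikitext</model>
    <format>text/x-wiki</format>
    <text id="269201" bytes="352" />
  </revision>
</page>"""


def revision_summary(page_xml: str) -> list:
    """Return one dict per revision with its id, model, text id and size."""
    page = ET.fromstring(page_xml)
    rows = []
    for rev in page.findall("revision"):
        text = rev.find("text")
        rows.append({
            "rev_id": int(rev.findtext("id")),
            "model": rev.findtext("model"),
            "text_id": int(text.get("id")),
            "bytes": int(text.get("bytes")),
        })
    return rows


print(revision_summary(STUB))
```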
Jul 24 2019, 8:07 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps.

Woops, I codged the above together before seeing your comment, @daniel Feel free to reject, replace, whatever is appropriate.

Jul 24 2019, 7:55 PM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps from Backlog to Active on the Dumps-Generation board.
Jul 24 2019, 7:36 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps.

Looking at where this occurs, I'd really prefer to have the Flow revisions set to the right content model for these two wikis, if it's something that can be done in the next 2-3 days. See T220594#5100772 for one of these revisions where the model name is wrong.

Jul 24 2019, 7:14 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval.

One of the wikis named on the ticket, frwiki, is still running stubs; I'll close this when it completes.

Jul 24 2019, 6:59 PM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn added a comment to T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval.

Yes for enwiki and dewiki, but now there is T228921 just to keep the fun coming.

Jul 24 2019, 6:48 PM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn added a project to T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps: Core Platform Team.
Jul 24 2019, 6:45 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps.

It's still a side effect of https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/464768/. We could add one more exception to invokeLenient() in XmlDumpWriter.php I guess, so that the stubs for these last wikis can run (and the rest of the dump steps for those wikis too). But the underlying issue should be fixed as well. This Flow issue was reported earlier at T220793.
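
To illustrate the general pattern only (this is a Python analogue with hypothetical names, not the PHP code in XmlDumpWriter.php): a lenient wrapper tolerates an explicit list of exception types, logs a warning and carries on, while anything not on the list still propagates and stops the run.

```python
import logging
from typing import Any, Callable, Tuple, Type

log = logging.getLogger("dumps")


def invoke_lenient(
    func: Callable[[], Any],
    tolerated: Tuple[Type[BaseException], ...],
    warning: str,
) -> Any:
    """Call func(); on a tolerated exception, log a warning and return None.

    Exceptions that are not listed in `tolerated` are not caught.
    """
    try:
        return func()
    except tolerated as exc:
        log.warning("%s: %s", warning, exc)
        return None
```

Adding "one more exception" then amounts to extending the tolerated tuple. The cost is that each addition hides one more class of bad data, which is why fixing the underlying rows is still preferable.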

Jul 24 2019, 6:45 PM · Patch-For-Review, Dumps-Generation
ArielGlenn triaged T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps as High priority.
Jul 24 2019, 6:40 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T228899: ApiGlobalBlock.php: Trying to get property 'gb_expiry' of non-object.

Same request id also has

PHP Notice: Trying to get property 'gb_anon_only' of non-object

at

#0 /srv/mediawiki/php-1.34.0-wmf.15/extensions/GlobalBlocking/includes/api/ApiGlobalBlock.php(46): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.34.0-wmf.15/includes/api/ApiMain.php(1583): ApiGlobalBlock->execute()
#2 /srv/mediawiki/php-1.34.0-wmf.15/includes/api/ApiMain.php(531): ApiMain->executeAction()
#3 /srv/mediawiki/php-1.34.0-wmf.15/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#4 /srv/mediawiki/php-1.34.0-wmf.15/api.php(86): ApiMain->execute()
#5 /srv/mediawiki/w/api.php(3): require(string)
#6 {main}
Jul 24 2019, 4:20 PM · Anti-Harassment, GlobalBlocking, Wikimedia-production-error
ArielGlenn added a comment to T228674: write regression test for XmlDumpWriter that will make sure content is not retrieved during stub dumps.

Note that this will currently fail if used with revisions with garbage in them, because getSha1() and getSize() can both force a content reload.

Jul 24 2019, 6:28 AM · Dumps-Generation
ArielGlenn added a comment to T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval.

The above has been tested with the problematic frwiki and slwiki pages, and processes the bad revisions appropriately.

Jul 24 2019, 6:04 AM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn added a comment to T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval.

So very close. There are some instances of InvalidArgumentException thrown when going down the getSha1() -> getContent() rabbithole, for text addresses like DB://cluster16/54423 where cluster16 leads nowhere. Seen today for testwiki, slwiki, and frwiki. Sample stack trace:

InvalidArgumentException from line 226 of /srv/mediawiki/php-1.34.0-wmf.14/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: Wikimedia\Rdbms\LBFactoryMulti::newExternalLB: Unknown cluster "cluster16"
#0 /srv/mediawiki/php-1.34.0-wmf.14/includes/libs/rdbms/lbfactory/LBFactoryMulti.php(246): Wikimedia\Rdbms\LBFactoryMulti->newExternalLB('cluster16')
#1 /srv/mediawiki/php-1.34.0-wmf.14/includes/externalstore/ExternalStoreDB.php(146): Wikimedia\Rdbms\LBFactoryMulti->getExternalLB('cluster16')
#2 /srv/mediawiki/php-1.34.0-wmf.14/includes/externalstore/ExternalStoreDB.php(156): ExternalStoreDB->getLoadBalancer('cluster16')
#3 /srv/mediawiki/php-1.34.0-wmf.14/includes/externalstore/ExternalStoreDB.php(259): ExternalStoreDB->getSlave('cluster16')
#4 /srv/mediawiki/php-1.34.0-wmf.14/includes/externalstore/ExternalStoreDB.php(65): ExternalStoreDB->fetchBlob('cluster16', '54423', false)
#5 /srv/mediawiki/php-1.34.0-wmf.14/includes/externalstore/ExternalStoreAccess.php(52): ExternalStoreDB->fetchFromURL('DB://cluster16/...')
#6 /srv/mediawiki/php-1.34.0-wmf.14/includes/Storage/SqlBlobStore.php(427): ExternalStoreAccess->fetchFromURL('DB://cluster16/...', Array)
#7 /srv/mediawiki/php-1.34.0-wmf.14/includes/libs/objectcache/WANObjectCache.php(1412): MediaWiki\Storage\SqlBlobStore->MediaWiki\Storage\{closure}(false, 604800, Array, NULL)
#8 /srv/mediawiki/php-1.34.0-wmf.14/includes/libs/objectcache/WANObjectCache.php(1258): WANObjectCache->fetchOrRegenerate('global:BlobStor...', 604800, Object(Closure), Array)
#9 /srv/mediawiki/php-1.34.0-wmf.14/includes/Storage/SqlBlobStore.php(431): WANObjectCache->getWithSetCallback('global:BlobStor...', 604800, Object(Closure), Array)
#10 /srv/mediawiki/php-1.34.0-wmf.14/includes/Storage/SqlBlobStore.php(358): MediaWiki\Storage\SqlBlobStore->expandBlob('DB://cluster16/...', Array, 'tt:1236282')
#11 /srv/mediawiki/php-1.34.0-wmf.14/includes/Storage/SqlBlobStore.php(286): MediaWiki\Storage\SqlBlobStore->fetchBlob('tt:1236282', 0)
#12 /srv/mediawiki/php-1.34.0-wmf.14/includes/libs/objectcache/WANObjectCache.php(1412): MediaWiki\Storage\SqlBlobStore->MediaWiki\Storage\{closure}(false, 604800, Array, NULL)
#13 /srv/mediawiki/php-1.34.0-wmf.14/includes/libs/objectcache/WANObjectCache.php(1258): WANObjectCache->fetchOrRegenerate('global:BlobStor...', 604800, Object(Closure), Array)
#14 /srv/mediawiki/php-1.34.0-wmf.14/includes/Storage/SqlBlobStore.php(288): WANObjectCache->getWithSetCallback('global:BlobStor...', 604800, Object(Closure), Array)
#15 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/RevisionStore.php(1464): MediaWiki\Storage\SqlBlobStore->getBlob('tt:1236282', 0)
#16 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/RevisionStore.php(1673): MediaWiki\Revision\RevisionStore->loadSlotContent(Object(MediaWiki\Revision\SlotRecord), NULL, NULL, NULL, 0)
#17 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}(Object(MediaWiki\Revision\SlotRecord))
#18 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/SlotRecord.php(307): call_user_func(Object(Closure), Object(MediaWiki\Revision\SlotRecord))
#19 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/SlotRecord.php(551): MediaWiki\Revision\SlotRecord->getContent()
#20 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/RevisionSlots.php(200): MediaWiki\Revision\SlotRecord->getSha1()
#21 [internal function]: MediaWiki\Revision\RevisionSlots->MediaWiki\Revision\{closure}(NULL, Object(MediaWiki\Revision\SlotRecord))
#22 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/RevisionSlots.php(202): array_reduce(Array, Object(Closure), NULL)
#23 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/RevisionStoreRecord.php(174): MediaWiki\Revision\RevisionSlots->computeSha1()
#24 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php(309): MediaWiki\Revision\RevisionStoreRecord->getSha1()
#25 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php(389): XmlDumpWriter->invokeLenient(Object(MediaWiki\Revision\RevisionStoreRecord), 'getSha1', Array, 'failed to deter...')
#26 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(530): XmlDumpWriter->writeRevision(Object(stdClass), Array)
#27 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(473): WikiExporter->outputPageStreamBatch(Object(Wikimedia\Rdbms\ResultWrapper), Object(stdClass))
#28 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(287): WikiExporter->dumpPages('page_id >= 1200...', false)
#29 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(172): WikiExporter->dumpFrom('page_id >= 1200...', false)
#30 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/includes/BackupDumper.php(289): WikiExporter->pagesByRange(12002, 12003, false)
#31 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/dumpBackup.php(82): BackupDumper->dump(1, 1)
#32 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/doMaintenance.php(99): DumpBackup->execute()
#33 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/dumpBackup.php(144): require_once('/srv/mediawiki/...')
#34 /srv/mediawiki/multiversion/MWScript.php(101): require_once('/srv/mediawiki/...')
#35 {main}
Jul 24 2019, 5:54 AM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation

Jul 23 2019

ArielGlenn added a comment to T222472: Investigate gerrit session expiration.

I had to log in again a few days ago but the week before that was fine. Maybe my login just expired, as they do.

Jul 23 2019, 9:04 PM · Release-Engineering-Team-TODO (201907), Release-Engineering-Team (Development services), Gerrit
ArielGlenn triaged T228772: convert dump maintenance scripts to use Maintenance::getDB instead of getConnection /wfGetDB as Normal priority.
Jul 23 2019, 4:34 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Dumps-Generation
ArielGlenn added a comment to T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval.

With the above patch, the problematic command in the task description ran to completion, and the output was (almost) identical to an earlier run without the bug. Besides the issue raised in T228763, which is completely separate, the empty sha1 tag is now written '<sha1 />', with a space before the closing slash, unlike the old code's output; we can ignore that difference.

Jul 23 2019, 4:15 PM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn triaged T228763: stubs are produced with xml:space="preserve" in the text tag; this is new behavior for the July 20th run of the xml/sql dumps as Normal priority.
Jul 23 2019, 2:46 PM · CPT Initiatives (MCR), Core Platform Team Workboards (Clinic Duty Team), Dumps-Generation
ArielGlenn added a comment to T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval.

We might want both of the above. Seen from the dumps lens:

Jul 23 2019, 12:05 PM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn added a comment to T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval.

...

What we could also do is tell the SlotRecord that it shouldn't try to auto-calculate, at the time it is being constructed. Calculating the sha1 on the fly is only needed when reading from an old pre-MCR database, and when constructing a revision programmatically. When reading from the MCR schema, this really doesn't make much sense.

Jul 23 2019, 9:24 AM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn added a comment to T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval.

I updated my post while you responded. Sorry about that!

When the revision content is nonexistent/unreachable, what will the sha1 be? We have plenty of these in the db already from old bugs. Reading revision metadata should not load the content if we don't want it to. Note that if we caught all exceptions in XmlDumpWriter we would have missed yesterday's issue.

Only if we catch and ignore. We should catch and report.

In an ideal world we would watch logstash output closely and see things like this right away. Unfortunately, we're living in this crappy one...

Jul 23 2019, 8:57 AM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn added a comment to T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval.

When the revision content is nonexistent/unreachable, what will the sha1 be? We have plenty of these in the db already from old bugs. Reading revision metadata should not load the content if we don't want it to. Note that if we caught all exceptions in XmlDumpWriter we would have missed yesterday's issue. I agree that we should catch them, but first I want to make sure we're not loading content where we aren't asking for it.

Jul 23 2019, 8:46 AM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn moved T227742: convert all hiera calls to lookup() in dumps profile manifests from Active to Done on the Dumps-Generation board.
Jul 23 2019, 7:04 AM · Dumps-Generation
ArielGlenn closed T227742: convert all hiera calls to lookup() in dumps profile manifests as Resolved.

Done and deployed.

Jul 23 2019, 6:04 AM · Dumps-Generation
ArielGlenn added a comment to T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval.

@daniel Relying on https://meta.wikimedia.org/wiki/Cunningham%27s_Law here is a patch. It fiddles with getSize() as well because that can also load content when it should not. I think getSize() and getSha1() are the only such methods but if you know differently, please add them to the mix.

Jul 23 2019, 6:02 AM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn moved T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval from Backlog to Active on the Dumps-Generation board.
Jul 23 2019, 5:24 AM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn updated subscribers of T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval.

This looks like it's caused by https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/464768/ at line 355 of the new XmlDumpWriter.php, which calls $rev->getSha1(). This method will try to load the content of the revision and compute the sha1 directly if the sha1 is NULL. We need a way to override that behavior, probably within the method itself. Perhaps an argument, load_content, that is normally set to True. All those class attributes are private, so we can't just grab the field value directly like the XmlDumpWriter code used to do.
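
To illustrate the load_content idea, here is a minimal sketch in Python rather than MediaWiki's PHP. The class SlotRecordSketch, its constructor arguments and the get_sha1 name are hypothetical stand-ins, not the real RevisionStoreRecord or SlotRecord API, and the plain hex digest is chosen only for simplicity; it may not match what MediaWiki stores.

```python
import hashlib
from typing import Callable, Optional


class SlotRecordSketch:
    """Hypothetical stand-in for a revision slot; not MediaWiki's API."""

    def __init__(self, stored_sha1: Optional[str],
                 content_loader: Callable[[], bytes]):
        # stored_sha1 mirrors a possibly-NULL sha1 column in the database.
        self._stored_sha1 = stored_sha1
        # content_loader fetches the revision text; it may raise for bad revs.
        self._content_loader = content_loader

    def get_sha1(self, load_content: bool = True) -> Optional[str]:
        # A stored value is always returned as-is, without touching content.
        if self._stored_sha1 is not None:
            return self._stored_sha1
        # Callers such as a stub dump can opt out of loading content.
        if not load_content:
            return None
        # Default behavior: load the content and compute the hash on the fly.
        return hashlib.sha1(self._content_loader()).hexdigest()


def failing_loader() -> bytes:
    # Simulates a revision whose content is missing or unreachable.
    raise RuntimeError("content is missing or unreachable")


bad_rev = SlotRecordSketch(stored_sha1=None, content_loader=failing_loader)
print(bad_rev.get_sha1(load_content=False))  # prints None
```

With the flag off, a caller gets None for a revision with no stored sha1 and can write an empty tag instead of failing; with the default, the loader's exception would still propagate.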

Jul 23 2019, 4:57 AM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation
ArielGlenn triaged T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval as High priority.
Jul 23 2019, 4:45 AM · CPT Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation

Jul 22 2019

ArielGlenn added a comment to T224491: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262).

@Krinkle I'm suspicious that this can fail on different days with different versions of MediaWiki and on different hosts but in the exact same spot in the code. I would not expect opcache issues to lead to this sort of predictable breakage in the general case. Can we get the values of what MediaWiki thinks are in there for $batchKey, $this->titleInfo and $pageNames at the point where that exception is thrown? Would it be awful to put a try catch around that little stanza, log an entry with those values and then raise?
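
The log-then-raise pattern asked about above could look roughly like this. It is a sketch in Python, not the MediaWiki PHP code in question; run_with_context and the logger name are hypothetical, and the keyword arguments merely stand in for values such as $batchKey, $this->titleInfo and $pageNames.

```python
import logging

logger = logging.getLogger("stanza_sketch")  # hypothetical logger name


def run_with_context(stanza, **context):
    """Run stanza(); on failure, log the context values and re-raise."""
    try:
        return stanza()
    except Exception:
        # logger.exception records the message plus the active traceback.
        logger.exception("stanza failed; context: %r", context)
        # A bare raise re-raises the original exception unchanged.
        raise


# Example call on a stanza that succeeds; the values are made up.
title_info = {"Main_Page": 1}
result = run_with_context(
    lambda: title_info["Main_Page"],
    batch_key="Main_Page",
    title_info=title_info,
)
```

Because the exception is re-raised unchanged, callers see the same failure as before; the only addition is a log entry carrying the values of interest.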

Jul 22 2019, 8:33 PM · User-jijiki, serviceops, PHP 7.2 support, Performance-Team (Radar), Operations, Wikimedia-production-error
ArielGlenn added a comment to T224491: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262).

From an IRC conversation, Krinkle says he ran

php7adm /opcache-free

and the problem immediately went away in the specific instance.

Jul 22 2019, 4:44 PM · User-jijiki, serviceops, PHP 7.2 support, Performance-Team (Radar), Operations, Wikimedia-production-error
ArielGlenn added a comment to T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs!.

Thanks to Daniel and James for review and merge. This has been deployed. I verified that the new code runs to completion on the page with the problematic revision, and that Special:Export for pages with revision history produces what we expect. I won't close the ticket just yet; I'd like to watch the run through tomorrow and make sure there are no unexpected consequences.

Jul 22 2019, 3:36 PM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Wikimedia-production-error, Wikidata, Dumps-Generation
ArielGlenn triaged T228674: write regression test for XmlDumpWriter that will make sure content is not retrieved during stub dumps as Normal priority.
Jul 22 2019, 3:10 PM · Dumps-Generation
ArielGlenn added a comment to T147148: Wikipedia requires a patch to load its data from the dumps with mwdumper.

This should have been fixed in https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/mwdumper/+/191555/ if someone can verify.

Jul 22 2019, 1:45 PM · Dumps-Generation, Utilities-mwdumper
ArielGlenn moved T227742: convert all hiera calls to lookup() in dumps profile manifests from Backlog to Active on the Dumps-Generation board.
Jul 22 2019, 11:13 AM · Dumps-Generation
ArielGlenn moved T228558: Move viwiki and ukwiki to the big wikis list for xml/sql dumps from Backlog to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Jul 22 2019, 11:13 AM · Dumps-Generation
ArielGlenn moved T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs! from Backlog to Active on the Dumps-Generation board.
Jul 22 2019, 11:13 AM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Wikimedia-production-error, Wikidata, Dumps-Generation
ArielGlenn renamed T228558: Move viwiki and ukwiki to the big wikis list for xml/sql dumps from Move viwiki and svwiki to the big wikis list for xml/sql dumps to Move viwiki and ukwiki to the big wikis list for xml/sql dumps.
Jul 22 2019, 11:03 AM · Dumps-Generation
ArielGlenn edited P8774 Db table audit for dumps 2019 part 2.
Jul 22 2019, 10:27 AM
ArielGlenn edited P8774 Db table audit for dumps 2019 part 2.
Jul 22 2019, 10:19 AM