ArielGlenn (ariel)
User

User Details

User Since
Oct 8 2014, 7:09 PM (210 w, 1 d)
Availability
Available
IRC Nick
apergos
LDAP User
ArielGlenn
MediaWiki User
ArielGlenn

Recent Activity

Yesterday

ArielGlenn added a comment to T207278: Move dumpsdata1001.

That time would be ok for me (my evening but it's not too late). @hoo?

Thu, Oct 18, 4:15 PM · Dumps-Generation, ops-eqiad, Operations
ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

OK, it looks like everyone has weighed in, so I'll suggest:

<revision>
  <id>308722154</id>
  ...
  <format>text/x-wiki</format>
  <text location="tt:305112983" sha1="xxxx" bytes="143" />   <-- contains sha1 of content in main slot
  <sha1>a9kdtqq3buy5tribez2u0ad4b6fdxq2</sha1>               <-- revision sha1
Thu, Oct 18, 2:42 PM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata
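To make the two hashes in the proposed layout concrete, here is a small Python sketch that reads one revision in that shape. It is an illustration under assumptions, not dump-reader code: the sample is simplified from the snippet above (the elided "..." children are left out, sha1="xxxx" is the placeholder from the comment), the proposal was still under discussion, and the function name `read_revision` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified sample modelled on the proposal above; elided children are omitted
# and sha1="xxxx" is the placeholder from the original comment.
SAMPLE = """
<revision>
  <id>308722154</id>
  <format>text/x-wiki</format>
  <text location="tt:305112983" sha1="xxxx" bytes="143" />
  <sha1>a9kdtqq3buy5tribez2u0ad4b6fdxq2</sha1>
</revision>
"""


def read_revision(xml_text):
    """Pull revision-level and main-slot fields out of one <revision> element (hypothetical helper)."""
    rev = ET.fromstring(xml_text.strip())
    text = rev.find("text")
    # The location attribute has the form "<prefix>:<address>", e.g. "tt:305112983".
    prefix, _, address = text.get("location").partition(":")
    return {
        "rev_id": int(rev.findtext("id")),
        "rev_sha1": rev.findtext("sha1"),        # child element: whole-revision hash
        "main_slot_sha1": text.get("sha1"),      # attribute: main-slot content hash
        "main_slot_bytes": int(text.get("bytes")),
        "location_prefix": prefix,
        "location_address": address,
    }
```

Keeping the revision hash as an element and the slot hash as an attribute on the text element means an existing reader that only looks at the revision's sha1 element sees the same thing it always has.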
ArielGlenn added a comment to T207278: Move dumpsdata1001.

Let's shoot for Oct 31 then. @ayounsi What time would the window be? @hoo, would you prefer stopping and restarting scripts or just skipping the run for the week?

Thu, Oct 18, 2:31 PM · Dumps-Generation, ops-eqiad, Operations
ArielGlenn added a comment to T206535: wikidata weekly dumps take too long to complete.

...

Have you tested importing that via PHP (and/or anything else that uses the libbzip2 compat stuff)?

Thu, Oct 18, 1:57 PM · Wikidata, Dumps-Generation
ArielGlenn added a comment to T206743: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared").

On db1124, instance s8, we have a replication error:

Last_Error: Could not execute Delete_rows_v1 event on table wikidatawiki.pagelinks; Can't find record in 'pagelinks', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log db1087-bin.003073, end_log_pos 582738698
Thu, Oct 18, 6:35 AM · Wikimedia-Incident, User-notice, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikidata-Campsite, DBA, User-Addshore, Datacenter-Switchover-2018, Lexicographical data, Wikidata

Wed, Oct 17

ArielGlenn added a project to T207278: Move dumpsdata1001: Dumps-Generation.
Wed, Oct 17, 2:19 PM · Dumps-Generation, ops-eqiad, Operations
ArielGlenn updated subscribers of T207278: Move dumpsdata1001.

Adding @hoo to see if we can work out timing; we could find a time on Oct 30th or 31st, but the wikidata weeklies would be interrupted and would need to be restarted manually.

Wed, Oct 17, 2:19 PM · Dumps-Generation, ops-eqiad, Operations
ArielGlenn added a comment to T206535: wikidata weekly dumps take too long to complete.
ariel@snapshot1008:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20181015$ date; zcat wikidata-20181015-all-BETA.ttl.gz | lbzip2 -n 4 > /mnt/dumpsdata/temp/ariel/wikidata-20181015-all-BETA.ttl.bz2; date 
Wed Oct 17 12:11:32 UTC 2018
Wed Oct 17 13:25:23 UTC 2018
ariel@snapshot1008:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20181015$ ls -lh wikidata-20181015-all-BETA.ttl.gz /mnt/dumpsdata/temp/ariel/wikidata-20181015-all-BETA.ttl.bz2
-rw-rw-r-- 1 ariel    wikidev  37G Oct 17 13:25 /mnt/dumpsdata/temp/ariel/wikidata-20181015-all-BETA.ttl.bz2
-rw-r--r-- 1 dumpsgen dumpsgen 44G Oct 16 15:05 wikidata-20181015-all-BETA.ttl.gz
Wed, Oct 17, 1:28 PM · Wikidata, Dumps-Generation
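The session above works out to roughly 74 minutes to recompress the 44G gzip file, with the bz2 output about 16% smaller. A minimal Python sketch of that arithmetic, using only the `date` and `ls -lh` output shown (which rounds to whole gigabytes, so the ratio is approximate):

```python
from datetime import datetime

# Format of the `date` output captured in the session above.
DATE_FMT = "%a %b %d %H:%M:%S UTC %Y"

start = datetime.strptime("Wed Oct 17 12:11:32 UTC 2018", DATE_FMT)
end = datetime.strptime("Wed Oct 17 13:25:23 UTC 2018", DATE_FMT)
elapsed_s = int((end - start).total_seconds())
minutes, seconds = divmod(elapsed_s, 60)

# Sizes as reported by `ls -lh`; rounded, so the ratio is only approximate.
gz_gib, bz2_gib = 44, 37
ratio = bz2_gib / gz_gib

print(f"elapsed: {elapsed_s} s ({minutes} min {seconds} s)")  # elapsed: 4431 s (73 min 51 s)
print(f"bz2/gz size ratio: {ratio:.2f}")                      # bz2/gz size ratio: 0.84
```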
ArielGlenn added a comment to T206535: wikidata weekly dumps take too long to complete.

I've enabled the use of lbzip2 for the xml/sql dumps starting with the Oct 20th run; we could consider using it for recompressing the wikidata weeklies into bz2 files, with, say, four threads (half the number of shards). As far as I can tell its output is format-compatible with what bzip2 produces, though not byte-identical. What do folks think?

Wed, Oct 17, 11:18 AM · Wikidata, Dumps-Generation
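The proposal amounts to running the `zcat ... | lbzip2 -n N` pipeline from the Oct 17 session with N set to half the shard count. A sketch of that in Python, assuming eight shards (implied by "four threads, half the number of shards"); the function names are hypothetical and this is not the wikidata weeklies' actual script:

```python
import subprocess


def recompress_commands(shards):
    """Build decompress/compress argv lists; lbzip2 gets half as many threads as shards."""
    threads = max(1, shards // 2)
    return ["zcat"], ["lbzip2", "-n", str(threads)]


def recompress(gz_path, bz2_path, shards=8):
    """Equivalent of `zcat in.gz | lbzip2 -n N > out.bz2`, failing if either side fails."""
    decompress, compress = recompress_commands(shards)
    with open(bz2_path, "wb") as out:
        zcat = subprocess.Popen(decompress + [gz_path], stdout=subprocess.PIPE)
        lbzip2 = subprocess.Popen(compress, stdin=zcat.stdout, stdout=out)
        # Close our copy of the pipe so zcat gets SIGPIPE if lbzip2 exits early.
        zcat.stdout.close()
        lbzip2_rc = lbzip2.wait()
        zcat_rc = zcat.wait()
    if zcat_rc != 0:
        raise subprocess.CalledProcessError(zcat_rc, decompress + [gz_path])
    if lbzip2_rc != 0:
        raise subprocess.CalledProcessError(lbzip2_rc, compress)
```

Checking both exit codes matters here: with a plain shell pipeline, a truncated or corrupt gzip input can still leave lbzip2 exiting cleanly with a short output file.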
ArielGlenn moved T206535: wikidata weekly dumps take too long to complete from Backlog to Active on the Dumps-Generation board.
Wed, Oct 17, 11:15 AM · Wikidata, Dumps-Generation
ArielGlenn moved T179059: Consider skipping or modifying recombine step for page content dumps for wikidata from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Wed, Oct 17, 11:15 AM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T179059: Consider skipping or modifying recombine step for page content dumps for wikidata.

These changes are now live on the snapshot hosts and will be in effect for the Oct 20th run. The hosts should be monitored more closely than usual to check load and disk I/O, especially on the NFS servers.

Wed, Oct 17, 11:15 AM · Patch-For-Review, Dumps-Generation

Tue, Oct 16

ArielGlenn updated subscribers of T202705: Degraded RAID on sodium.

Whoever has clinic duty should probably take this and hand it to the right person. I think @Dzahn and/or @MoritzMuehlenhoff may oversee the ubuntu mirror boxes (if not, please excuse the ping).

Tue, Oct 16, 7:56 PM · ops-eqiad, Operations

Mon, Oct 15

ArielGlenn added a comment to T207090: Requesting deployment access to servers for Performance Team task for perf-roots.

@lmarlier, can you sign off please?

Mon, Oct 15, 9:37 PM · Operations, SRE-Access-Requests
ArielGlenn renamed T207090: Requesting deployment access to servers for Performance Team task for perf-roots from Requesting deplyoment access to servers for Performance Team task for perf-roots to Requesting deployment access to servers for Performance Team task for perf-roots.
Mon, Oct 15, 9:36 PM · Operations, SRE-Access-Requests
ArielGlenn added a comment to T207030: wikidata rdf dumps cron job complaining for lexemes phase.

This is now deployed on snapshot1008 (where cron jobs run). We'll know next Monday if this took care of the problem; let's leave the task open until then.

Mon, Oct 15, 5:20 PM · Patch-For-Review, Dumps-Generation, Wikidata
ArielGlenn added a project to T207030: wikidata rdf dumps cron job complaining for lexemes phase: Dumps-Generation.
Mon, Oct 15, 1:34 PM · Patch-For-Review, Dumps-Generation, Wikidata
ArielGlenn triaged T207030: wikidata rdf dumps cron job complaining for lexemes phase as Normal priority.
Mon, Oct 15, 1:33 PM · Patch-For-Review, Dumps-Generation, Wikidata

Thu, Oct 11

ArielGlenn added a comment to T179059: Consider skipping or modifying recombine step for page content dumps for wikidata.

The above has been tested and is ready for merge and deployment as soon as the current dump run completes, probably in 2-3 days. I'll prep a changeset for the config setting update in puppet too.

Thu, Oct 11, 12:17 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T179059: Consider skipping or modifying recombine step for page content dumps for wikidata.

I've done some followup testing with lbzip2, using multiple processors with read from stdin via pipe, examined the low level format of the output files as compared to bzip2 output, and checked memory use. A sample memory use comparison:

Thu, Oct 11, 10:15 AM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T206306: adds-changes dumps don't handle missing runs properly from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Thu, Oct 11, 7:49 AM · Patch-For-Review, Dumps-Generation
ArielGlenn closed T206306: adds-changes dumps don't handle missing runs properly as Resolved.
Thu, Oct 11, 7:49 AM · Patch-For-Review, Dumps-Generation
ArielGlenn renamed T206306: adds-changes dumps don't handle missing runs properly from adds-changes dumps don't fill in missing runs properly to adds-changes dumps don't handle missing runs properly.
Thu, Oct 11, 7:48 AM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T206306: adds-changes dumps don't handle missing runs properly.

The run is working fine. We still don't backfill missing runs automatically, though; I'm going to say that this should be a manual process, which can be done with one command from a screen session should we have such a problem again. That way it can be run on an idle host without doubling the length of the regular daily runs, which already take 15 hours to complete.

Thu, Oct 11, 7:48 AM · Patch-For-Review, Dumps-Generation
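For a manual backfill, the first step is knowing which days lack a run. A minimal sketch, assuming (as with the 20181015 directory in the Oct 17 session) that each daily run lives in a YYYYMMDD-named directory; the helper is hypothetical and not part of the dumps code:

```python
from datetime import date, timedelta


def missing_run_dates(existing_dirs, start, end):
    """List YYYYMMDD dates in [start, end] that have no run directory."""
    have = set(existing_dirs)
    missing = []
    day = start
    while day <= end:
        name = day.strftime("%Y%m%d")
        if name not in have:
            missing.append(name)
        day += timedelta(days=1)
    return missing
```

Each date returned would then be passed, one at a time, to the regular job from a screen session on an idle host.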

Wed, Oct 10

ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

Re the id attribute being optional or not: it turns out it's optional already. The "use" attribute of the <attribute> element in an XML schema is indeed optional, and its default value is "optional" :) The fact that this is declared for all other attributes but omitted for id is confusing, though, and should be fixed.

Wed, Oct 10, 9:05 AM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata
ArielGlenn moved T206306: adds-changes dumps don't handle missing runs properly from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Wed, Oct 10, 8:09 AM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T206306: adds-changes dumps don't handle missing runs properly.

Going to wait for today's run to complete to make sure everything's ok before closing the ticket.

Wed, Oct 10, 8:09 AM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T206306: adds-changes dumps don't handle missing runs properly.

Today's test run looks good. Merging and deploying.

Wed, Oct 10, 8:01 AM · Patch-For-Review, Dumps-Generation

Tue, Oct 9

ArielGlenn added a comment to T206306: adds-changes dumps don't handle missing runs properly.

Tests look good so far; I want to run one more tomorrow which will reproduce the circumstances that led to duplicate revisions in the Oct 3rd run; if that pans out then this can be deployed.

Tue, Oct 9, 5:32 PM · Patch-For-Review, Dumps-Generation
ArielGlenn updated subscribers of T206535: wikidata weekly dumps take too long to complete.

Adding @hoo and @Smalyshev in hopes that they will have some good ideas.

Tue, Oct 9, 1:16 PM · Wikidata, Dumps-Generation
ArielGlenn triaged T206535: wikidata weekly dumps take too long to complete as Normal priority.
Tue, Oct 9, 1:15 PM · Wikidata, Dumps-Generation
ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

...

Anything using the existing revision-level sha1 for revert detection will misdetect a revert (or a null edit) for *all* revisions that did not affect the main slot. While analysis on the slot level may be useful, existing analysis is on the revision level (by definition, since slots are new). So it seems reasonable to keep revision-level semantics intact.

Whatever we do, we should definitely include both hashes (main slot and revision), to make the distinction obvious, and the path forward clear, and give consumers the option to change their code to consistently do the thing they want. Only if both hashes are available can we support both kinds of consumers - those that focus on the revision as a whole, and those that focus on individual slots.

Tue, Oct 9, 10:58 AM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata
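The misdetection argument can be shown with a toy history in which one edit touches only a secondary slot. This Python sketch is illustrative only: `revision_sha1` is a hypothetical combination (SHA-1 over the slot hashes in role order), MediaWiki does not necessarily compute its revision hash this way, and the real hashes are stored in base 36.

```python
import hashlib


def slot_sha1(content: bytes) -> str:
    return hashlib.sha1(content).hexdigest()


def revision_sha1(slots: dict) -> str:
    """Hypothetical whole-revision hash: SHA-1 over the slot hashes in role order."""
    joined = "".join(slot_sha1(slots[role]) for role in sorted(slots))
    return hashlib.sha1(joined.encode("ascii")).hexdigest()


def identity_reverts(hashes):
    """Indices of revisions whose hash matches an earlier one (revert or null edit)."""
    seen, hits = set(), []
    for i, h in enumerate(hashes):
        if h in seen:
            hits.append(i)
        seen.add(h)
    return hits


# Revision 1 changes only a secondary slot; revision 2 then changes the main slot.
history = [
    {"main": b"A", "mediainfo": b"x"},
    {"main": b"A", "mediainfo": b"y"},
    {"main": b"B", "mediainfo": b"y"},
]
by_main_slot = identity_reverts([slot_sha1(r["main"]) for r in history])
by_revision = identity_reverts([revision_sha1(r) for r in history])
```

Keyed on the main-slot hash, revision 1 looks like a revert or null edit; keyed on the revision hash, nothing is flagged. Having both hashes in the dump lets each kind of consumer pick the one it needs.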
ArielGlenn added a comment to T206306: adds-changes dumps don't handle missing runs properly.

There are a few things I've found on looking at the code.

  • It doesn't backfill automatically, but we should be able to run the job for the missing dates and have it 'just work'. This needs testing; potentially the main index.html file for the adds-changes dumps might be rewritten with info from these older runs.
  • Logging at verbose level is broken (patch coming).
  • It correctly finds the max rev id for the previous day's run from the db, even if the previous day's run did not complete. But it writes this information to the current run's max rev id file, which means that the next day's run will have duplicate revisions in it. (Patch coming)
Tue, Oct 9, 10:38 AM · Patch-For-Review, Dumps-Generation
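The bookkeeping the third point calls for can be sketched as follows: when the previous day's cutoff has to be recovered from the db, it is recorded under the previous day, and the current run records its own cutoff. This is a hypothetical Python model, not the actual patch; days are plain integers, the dict stands in for the per-run max rev id files, and `db_cutoff` stands in for the db query.

```python
def run_day(day, prev_day, maxrev, db_cutoff):
    """
    Return the rev id range one adds-changes run should dump.

    maxrev: dict day -> max rev id recorded for that day's run (stand-in for the files).
    db_cutoff: callable day -> max rev id as of that day's cutoff (stand-in for the db).
    """
    if prev_day not in maxrev:
        # Previous run never completed: recover its cutoff from the db and record it
        # under the *previous* day, not under today.
        maxrev[prev_day] = db_cutoff(prev_day)
    start = maxrev[prev_day] + 1
    end = db_cutoff(day)
    maxrev[day] = end  # today's file holds today's cutoff
    return range(start, end + 1)


cutoffs = {0: 50, 1: 100, 2: 150, 3: 220}
maxrev = {0: 50}
day1 = run_day(1, 0, maxrev, cutoffs.get)
# Day 2's run fails and records nothing.
day3 = run_day(3, 2, maxrev, cutoffs.get)
backfill2 = run_day(2, 1, maxrev, cutoffs.get)  # manual backfill afterwards
```

Had day 3 written the recovered value (150) into its own slot instead, day 4 would start again at 151 and repeat revisions 151 through 220, which is the duplication described above.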

Mon, Oct 8

ArielGlenn moved T206306: adds-changes dumps don't handle missing runs properly from Backlog to Active on the Dumps-Generation board.
Mon, Oct 8, 9:52 AM · Patch-For-Review, Dumps-Generation

Fri, Oct 5

ArielGlenn triaged T206306: adds-changes dumps don't handle missing runs properly as Normal priority.
Fri, Oct 5, 9:44 AM · Patch-For-Review, Dumps-Generation

Thu, Oct 4

ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

(Sorry for the near-stream-of-consciousness updates here, just trying to Get It Done.) To get the sha1 discussion started:

Thu, Oct 4, 2:13 PM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata
ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

The schema in the RFC has now been updated: https://www.mediawiki.org/wiki/Requests_for_comment/Schema_update_for_multiple_content_objects_per_revision_(MCR)_in_XML_dumps#Schema

Thu, Oct 4, 12:50 PM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata
ArielGlenn added a comment to T174031: MCR: Include all slots in XML dumps.

...

Do we currently have Flow content in dumps at all? I mean, the actual content, not the references?

Thu, Oct 4, 11:16 AM · Core Platform Team (MCR), Multi-Content-Revisions (New Features), Core Platform Team Backlog (Next), SDC Engineering, Wikidata
ArielGlenn added a comment to T174031: MCR: Include all slots in XML dumps.

Clarifying question: what slot is Flow content considered to be in, a 'main', or a 'secondary'? All that content has such a different schema in the db tables.

Thu, Oct 4, 10:22 AM · Core Platform Team (MCR), Multi-Content-Revisions (New Features), Core Platform Team Backlog (Next), SDC Engineering, Wikidata
ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

I'll make the changes agreed upon in last night's meeting to the RFC a bit later today and will note here when they are done.

Thu, Oct 4, 9:16 AM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata
ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

Following up on the deduplication issue raised above:

Thu, Oct 4, 9:08 AM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata

Wed, Oct 3

ArielGlenn added a comment to T203424: Replace the WikiExporter backup dump streaming mode with batched queries.

That fixed the problem, thanks!

Wed, Oct 3, 4:55 PM · Core Platform Team Kanban (Done with CPT), MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), Core Platform Team (Security, stability, performance and scalability (TEC1)), Patch-For-Review, MediaWiki-Export-or-Import
ArielGlenn added a comment to T203424: Replace the WikiExporter backup dump streaming mode with batched queries.

https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/459885/ seems to have broken Flow dumps, which use streams:

2018-10-03 11:23:53 failed history content of flow pages in xml format
[48f79972c1cea21e104aa6a5] [no req] Error from line 82 of /srv/mediawiki/php-1.32.0-wmf.24/extensions/Flow/maintenance/dumpBackup.php: Undefined class constant 'STREAM' 
Backtrace: 
#0 /srv/mediawiki/php-1.32.0-wmf.24/extensions/Flow/maintenance/dumpBackup.php(62): FlowDumpBackup->dump(integer) 
#1 /srv/mediawiki/php-1.32.0-wmf.24/maintenance/doMaintenance.php(94): FlowDumpBackup->execute() 
#2 /srv/mediawiki/php-1.32.0-wmf.24/extensions/Flow/maintenance/dumpBackup.php(144): require_once(string) 
#3 /srv/mediawiki/multiversion/MWScript.php(100): require_once(string) 
#4 {main}
Wed, Oct 3, 1:25 PM · Core Platform Team Kanban (Done with CPT), MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), Core Platform Team (Security, stability, performance and scalability (TEC1)), Patch-For-Review, MediaWiki-Export-or-Import
ArielGlenn added a comment to T203075: Warning: MediaWiki\Storage\SqlBlobStore::fetchBlob: Bad data in text row 5191150.

...

@ArielGlenn Do you know if there are any side-effect or impact from this error that we need to be aware of? Or does the snapshot script account for this issue and recovers without issue?

Wed, Oct 3, 7:20 AM · Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Next), Wikimedia-production-error

Mon, Oct 1

ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

While not strictly a blocker, some related work is at T205825.

Mon, Oct 1, 2:27 PM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

This will be discussed at the TechCom meeting on Wednesday, October 3rd at 2pm PDT (21:00 UTC, 23:00 CEST). The announcement was sent to Wikitech-l: https://lists.wikimedia.org/pipermail/wikitech-l/2018-September/090881.html

Mon, Oct 1, 9:03 AM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata
ArielGlenn added a comment to T204962: Fix various dump scripts to not use hardcoded paths to MWScript.php.

Two of the symlink fixes have been deployed, one to go.

Mon, Oct 1, 8:42 AM · Patch-For-Review, Dumps-Generation
ArielGlenn updated the task description for T204962: Fix various dump scripts to not use hardcoded paths to MWScript.php.
Mon, Oct 1, 8:41 AM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T205825: Restructure 'misc dump' cron scripts and infra so they can be easily tested in mw-vagrant from Backlog to Active on the Dumps-Generation board.
Mon, Oct 1, 8:41 AM · Patch-For-Review, Dumps-Generation
ArielGlenn triaged T205825: Restructure 'misc dump' cron scripts and infra so they can be easily tested in mw-vagrant as Normal priority.
Mon, Oct 1, 8:30 AM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T204962: Fix various dump scripts to not use hardcoded paths to MWScript.php from Backlog to Active on the Dumps-Generation board.
Mon, Oct 1, 8:22 AM · Patch-For-Review, Dumps-Generation

Thu, Sep 27

ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

...

Sounds mostly good, except that we have to bump the XML schema when we want to do T183490: MCR schema migration stage 4: Migrate External Store URLs (wmf production). If we made id optional & deprecated right away, we wouldn't have to do that. Perhaps this is one of the questions to answer during the RFC discussion.

Thu, Sep 27, 9:06 AM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata

Wed, Sep 26

ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

Yes, sorry I wasn't clear. I meant to include some base data and a script, rather than force an import, since some users may not want that data. Info on how to do the import would go in the README. As a side note, importing can be done as a one-time thing in puppet; the dumps role does this already.

Wed, Sep 26, 8:40 PM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

I wasn't happy with the different formats for the same attribute name either; using a different name for the content text id is grand! Using anyURI makes for a simpler schema too.

Wed, Sep 26, 3:40 PM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata
ArielGlenn added a comment to T205361: Make an HTML dump of the output of the CodeReview extension on MediaWiki.org.

Soooo... @Legoktm were you able to give it a try, and if so, what happened?

Wed, Sep 26, 12:32 PM · Core Platform Team Kanban (Doing), Core Platform Team ( Code Health (TEC13)), MediaWiki-extensions-CodeReview
ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

I wish I'd had a pointer to the changes in Storage/BlobStore.php and SqlBlobStore.php earlier; I didn't realize this new form of text addressing was baked in already. Anyway, I've revised the draft accordingly; sorry for the long delay. The main slot still only gets an id number for the text id (this is required for backwards compatibility); the text id in the content element gets the schema:location format. Is that pattern regex in the schema ok?

Wed, Sep 26, 12:01 PM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata
ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

I've moved the dumps output to /var/www/dumps and the webroot to /var/www/dumps/public; what do folks think about that? (Note I have not tested the module with these new changes.)

Wed, Sep 26, 9:54 AM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation

Tue, Sep 25

ArielGlenn added a comment to T204801: Exec error "Possibly missing executable file: svn diff" from Special:Code.

@Krinkle Of course. But let's not reinstall svn everywhere just to get diffs working on one box, that's all I'm saying.

Tue, Sep 25, 7:47 PM · MediaWiki-extensions-CodeReview, Operations, Wikimedia-production-error
ArielGlenn updated subscribers of T204801: Exec error "Possibly missing executable file: svn diff" from Special:Code.

See T116948: this extension is long gone. It had a wonderful life, and now it rests in peace; let's not devote resources to putting it back. There was a proposal to get an HTML dump of the contents and put them somewhere (T116948). @Legoktm ?

Tue, Sep 25, 7:37 PM · MediaWiki-extensions-CodeReview, Operations, Wikimedia-production-error

Mon, Sep 24

ArielGlenn committed R1891:5527d50fdaee: option to skip siteinfo header, mw footer for recompresing files (authored by ArielGlenn).
option to skip siteinfo header, mw footer for recompresing files
Mon, Sep 24, 6:20 PM
ArielGlenn committed R1891:52435c8039cc: options for writeuptopageid to skip writing header or footer (authored by ArielGlenn).
options for writeuptopageid to skip writing header or footer
Mon, Sep 24, 6:20 PM
ArielGlenn committed R1891:b10ff0e6964a: use iohandlers for recompressxml input and output (authored by ArielGlenn).
use iohandlers for recompressxml input and output
Mon, Sep 24, 6:20 PM

Fri, Sep 21

ArielGlenn added a comment to T144103: Create .nt (NTriples) dumps for wikidata data.

The above change is now live on snapshot1008 (where this job runs) and will take effect for the next run on Monday morning.

Fri, Sep 21, 5:50 AM · Patch-For-Review, User-Smalyshev, Discovery, Wikidata-Query-Service, Wikidata
ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

@awight What was your thinking behind the choice of /vagrant/srv/dumps/output as the location for dumps output files, as opposed to someplace not on the nfs mount?

Fri, Sep 21, 5:42 AM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation

Thu, Sep 20

ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

What do folks think about putting all the 'misc dump scripts' (puppet/modules/snapshot/files/cron) in their own repo, rather than having them in puppet? They'd have to be deployed by scap I suppose, but they could then be easily cloned into any testing platform instead of each tester having to copy in the scripts to e.g. vagrant or wherever else.

Thu, Sep 20, 3:13 PM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a subtask for T201478: Enhancements to vagrant dumps role: T204962: Fix various dump scripts to not use hardcoded paths to MWScript.php.
Thu, Sep 20, 2:55 PM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a parent task for T204962: Fix various dump scripts to not use hardcoded paths to MWScript.php: T201478: Enhancements to vagrant dumps role.
Thu, Sep 20, 2:55 PM · Patch-For-Review, Dumps-Generation
ArielGlenn triaged T204962: Fix various dump scripts to not use hardcoded paths to MWScript.php as Normal priority.
Thu, Sep 20, 2:53 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

https://apergos.wordpress.com/2018/09/19/xml-sql-dumps-and-mediawiki-vagrant-two-great-tastes-that-taste-great-together/ This is much longer than anyone here cares about; just skip to the last few sections, especially 'next steps'.

Thu, Sep 20, 12:22 PM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a watcher for Commons-Twitter-Bot: ArielGlenn.
Thu, Sep 20, 11:31 AM
ArielGlenn added a comment to T204301: Create phabricator project for Emoji-TwitterBot.

Hey @Aklapper, the bot is on toolforge though not currently running. Is that what you're asking?

Thu, Sep 20, 8:50 AM · Project-Admins

Wed, Sep 19

ArielGlenn added a comment to T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps.

Sorry, I've been trying to get a couple of other things off my plate. Will be coming back to this shortly.

Wed, Sep 19, 7:08 PM · Core Platform Team Kanban (Doing), Core Platform Team (MCR), Multi-Content-Revisions (New Features), SDC Engineering, Dumps-Generation, User-ArielGlenn, User-Daniel, TechCom-RFC, Wikidata
ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

The latest version of the patchset should allow one to apply the wikidata role, then apply the dumps role, then dump any of the wikis configured, including wikidata, and (once the wikidata json, rdf, and dumps_functions and dcat scripts/config are copied in) dump those too.

Wed, Sep 19, 4:39 PM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation

Sep 18 2018

ArielGlenn added a comment to T204531: Wikidata dumps creating large amounts of log spam.

Well, the mystery of 'why eqiad' is solved; the choice of whether to parse db-eqiad.php or db-codfw.php is determined by the global $wmfDatacenter. This is set in multiversion/MWRealm.php from the value of $wmfCluster, which is taken from the contents of /etc/wikimedia-cluster on the server itself. Since the dumps are run from servers in eqiad, there we have it.

Sep 18 2018, 8:47 AM · Datacenter-Switchover-2018, MediaWiki-Logging, Wikidata, Dumps-Generation, Wikimedia-production-error
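The lookup chain described above can be summarised in a few lines. This is a simplified Python model of logic that actually lives in PHP (multiversion/MWRealm.php and the config that reads db-eqiad.php or db-codfw.php); the helper names are hypothetical. It shows why the choice follows the host's own location: a snapshot host in eqiad reads db-eqiad.php regardless of the switchover.

```python
from pathlib import Path


def datacenter_from_cluster(cluster_value: str) -> str:
    """The contents of /etc/wikimedia-cluster name the host's datacenter, e.g. 'eqiad'."""
    return cluster_value.strip()


def db_config_file(datacenter: str) -> str:
    """Pick db-eqiad.php or db-codfw.php based on the datacenter."""
    return f"db-{datacenter}.php"


def db_config_for_host(cluster_file="/etc/wikimedia-cluster"):
    """Read the local cluster file and return the db config file this host would use."""
    return db_config_file(datacenter_from_cluster(Path(cluster_file).read_text()))
```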
ArielGlenn updated subscribers of T204531: Wikidata dumps creating large amounts of log spam.

Adding @hoo to see what insights he may have.

Sep 18 2018, 8:35 AM · Datacenter-Switchover-2018, MediaWiki-Logging, Wikidata, Dumps-Generation, Wikimedia-production-error
ArielGlenn added a comment to T204531: Wikidata dumps creating large amounts of log spam.
root@snapshot1008:~# netstat -a -p | grep php | grep eqiad | grep db
tcp        0      0 snapshot1008.eqia:41898 db1087.eqiad.wmne:mysql ESTABLISHED 105786/php7.0       
tcp        0      0 snapshot1008.eqia:42528 db1087.eqiad.wmne:mysql ESTABLISHED 107804/php7.0       
tcp        0      0 snapshot1008.eqia:39826 db1087.eqiad.wmne:mysql ESTABLISHED 120869/php7.0       
tcp        0      0 snapshot1008.eqia:36730 db1087.eqiad.wmne:mysql ESTABLISHED 118688/php7.0       
tcp        0      0 snapshot1008.eqia:39936 db1087.eqiad.wmne:mysql ESTABLISHED 104820/php7.0       
tcp        0      0 snapshot1008.eqia:35566 db1087.eqiad.wmne:mysql ESTABLISHED 118107/php7.0       
tcp        0      0 snapshot1008.eqia:36468 db1087.eqiad.wmne:mysql ESTABLISHED 118553/php7.0       
tcp        0      0 snapshot1008.eqia:35032 db1087.eqiad.wmne:mysql ESTABLISHED 117876/php7.0
Sep 18 2018, 8:33 AM · Datacenter-Switchover-2018, MediaWiki-Logging, Wikidata, Dumps-Generation, Wikimedia-production-error
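The grep pipeline above lists every php connection to an eqiad db host; grouping them per host makes it easier to see that all of them land on db1087. A small Python sketch that parses `netstat -a -p` lines in the layout shown (protocol, queues, local address, foreign address, state, PID/program); the parser is hypothetical and only handles that layout:

```python
from collections import defaultdict

# Excerpt of the netstat output above (two of the eight lines).
NETSTAT = """\
tcp        0      0 snapshot1008.eqia:41898 db1087.eqiad.wmne:mysql ESTABLISHED 105786/php7.0
tcp        0      0 snapshot1008.eqia:42528 db1087.eqiad.wmne:mysql ESTABLISHED 107804/php7.0
"""


def php_pids_by_db_host(netstat_text):
    """Map remote host (as netstat truncates it) -> set of php PIDs connected to it."""
    pids = defaultdict(set)
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[5] != "ESTABLISHED":
            continue
        host = fields[4].rsplit(":", 1)[0]
        pid, _, program = fields[6].partition("/")
        if program.startswith("php"):
            pids[host].add(int(pid))
    return dict(pids)
```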

Sep 17 2018

ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

Well those were some fun rabbit holes. I'll document what I did, but the long and short of it is, after sorting out github issues and then composer issues, it turns out that there are some annoyances with CommonSettings for the wikidata role. I need to think about the minimal workaround needed for the wikidata role and the dumps role to play well together, as well as the dumps role all by itself.

Sep 17 2018, 8:53 PM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a comment to T204531: Wikidata dumps creating large amounts of log spam.

The wikidata weeklies are running now; the 'regular' (xml/sql) dumps completed on Sept 15th for the first run of the month.

Sep 17 2018, 2:16 PM · Datacenter-Switchover-2018, MediaWiki-Logging, Wikidata, Dumps-Generation, Wikimedia-production-error

Sep 11 2018

ArielGlenn added a comment to T204005: Twitter user stream API is no more available. The WMEmojiBot needs a new way of listening to tweet events.

There is not a project for it (yet). There could be. @rosalieper what do you think?

Sep 11 2018, 12:06 PM · Commons-Twitter-Bot
ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

Well, in applying the wikidata role I got:

==> default: Notice: /Stage[main]/Role::Wikidata/Mediawiki::Extension[Wikibase]/Git::Clone[mediawiki/extensions/Wikibase]/Exec[git_clone_mediawiki/extensions/Wikibase]/returns: executed successfully
==> default: Error: Command exceeded timeout
==> default: Error: /Stage[main]/Role::Wikidata/Mediawiki::Extension[Wikibase]/Php::Composer::Install[/vagrant/mediawiki/extensions/Wikibase]/Exec[composer-install--vagrant-mediawiki-extensions-Wikibase]/returns: change from notrun to 0 failed: Command exceeded timeout
==> default: Notice: /Stage[main]/Role::Wikidata/Mediawiki::Extension[Wikibase]/Mediawiki::Settings[Wikibase]/File[/vagrant/settings.d/puppet-managed/10-Wikibase.php]: Dependency Exec[composer-install--vagrant-mediawiki-extensions-Wikibase] has failures: true
==> default: Warning: /Stage[main]/Role::Wikidata/Mediawiki::Extension[Wikibase]/Mediawiki::Settings[Wikibase]/File[/vagrant/settings.d/puppet-managed/10-Wikibase.php]: Skipping because of failed dependencies

so that's a bit of an issue. Any ideas, @Smalyshev ? This is after applying the dumps role but I can't imagine that makes a difference.

Sep 11 2018, 9:35 AM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn moved T201478: Enhancements to vagrant dumps role from Up Next to Active on the Dumps-Generation board.
Sep 11 2018, 9:34 AM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

It looks like I might be able to get the job done with importDump.php and a small selection of content from an existing wikidata dump. I'll experiment some.

Sep 11 2018, 7:32 AM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

Check out also https://www.mediawiki.org/wiki/User:Smalyshev_(WMF)/Dump_Test which describes how I tested dumps and which tweaks may be useful.

Sep 11 2018, 6:04 AM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation

Sep 10 2018

ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

Finally. The current patchset works with xml/sql dumps, and I've tested it for category rdf dumps by copying in dump_functions.sh, dumpcategoriesrdf-shared.sh, and dumpcategoriesrdf.sh to /usr/local/etc and/or /usr/local/bin, and also making the log dir /var/log/categoriesrdf owned by vagrant. The log dirs for the various misc scripts can be added in a later patchset; this is just verification that the current code works.

Sep 10 2018, 10:10 PM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

Huh, it needed some changes for that. One more round of testing needed for the latest patch.

Sep 10 2018, 6:27 PM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a comment to T201478: Enhancements to vagrant dumps role.

The above patch is almost ready for review. I have run the regular xml/sql dumps with it applied, and they run clean. I'd like to try one of the 'misc' dumps that uses the misc cron functions and make sure it runs correctly.

Sep 10 2018, 4:01 PM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a comment to T199290: PHP Warning "Error while sending QUERY packet. PID" (DatabaseMysqli).

I thought we were capped at 2048k for revision text, so how would we exceed that 32M packet size? But you're right, it would be nice to log the query somehow.

Sep 10 2018, 12:27 PM · Datasets-General-or-Unknown, MediaWiki-Database, Wikimedia-production-error
ArielGlenn moved T203494: Requesting access to Root for Giovanni Tirloni from Untriaged to Manager/NDA Approval/Confimation on the SRE-Access-Requests board.
Sep 10 2018, 12:16 PM · Patch-For-Review, Operations, SRE-Access-Requests
ArielGlenn moved T203847: Requesting access to researchers for kharlan from Untriaged to Manager/NDA Approval/Confimation on the SRE-Access-Requests board.
Sep 10 2018, 12:14 PM · Patch-For-Review, Operations, SRE-Access-Requests
ArielGlenn added a comment to T203847: Requesting access to researchers for kharlan.

Can we get manager sign-off on this please? Thanks!

Sep 10 2018, 12:14 PM · Patch-For-Review, Operations, SRE-Access-Requests
ArielGlenn moved T202708: Onboarding Mathew Onipe from In Discussion to SRE Meeting Review Required on the SRE-Access-Requests board.
Sep 10 2018, 12:13 PM · Patch-For-Review, SRE-Access-Requests, Discovery-Search (Current work), Operations
ArielGlenn updated subscribers of T201478: Enhancements to vagrant dumps role.

We should support testing the 'misc' cron job dumps (from the command line); @Smalyshev did a little work on this for wikibase dumps, see https://gerrit.wikimedia.org/r/#/c/mediawiki/vagrant/+/456673/ and I think there might be more patchsets coming (hopefully :-) )

Sep 10 2018, 10:21 AM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn added a comment to T202072: Requesting Access to view EventLogging data for gabriel-wmde / gbirke.

Pinging @gabriel-wmde, this is just waiting on your input.

Sep 10 2018, 10:18 AM · Patch-For-Review, User-Addshore, Operations, SRE-Access-Requests
ArielGlenn closed T202614: Dataset rsyncs sometimes take longer than two days to complete. as Resolved.

It's a week later and the longest runs take about half an hour, so I think we can safely close this. I have some pending work to restructure the rsyncs so that we can push at full speed to the labstore hosts. That reduces problems if those servers ever have a long downtime, so we should never fall too far behind unless one of the dumpsdata hosts dies, and in that case some delay in recovery is expected.

Sep 10 2018, 10:16 AM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T201478: Enhancements to vagrant dumps role from Backlog to Up Next on the Dumps-Generation board.
Sep 10 2018, 10:09 AM · Patch-For-Review, MediaWiki-Vagrant, Dumps-Generation
ArielGlenn closed T203647: Clean up flow dump job problems for Sept 1 2018 dumps run as Resolved.

frwiki is the only one left, and only because it's still running other steps. Closing this task.

Sep 10 2018, 10:08 AM · MW-1.32-notes (WMF-deploy-2018-09-04 (1.32.0-wmf.20)), Patch-For-Review, Dumps-Generation
ArielGlenn closed T203647: Clean up flow dump job problems for Sept 1 2018 dumps run, a subtask of T191066: 1.32.0-wmf.20 deployment blockers, as Resolved.
Sep 10 2018, 10:08 AM · Release-Engineering-Team (Kanban), Release, Train Deployments
ArielGlenn moved T185116: Write vagrant role for Wikimedia dumps from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Sep 10 2018, 10:07 AM · Patch-For-Review, Dumps-Generation, MediaWiki-Vagrant
ArielGlenn closed T185116: Write vagrant role for Wikimedia dumps as Resolved.

Welp, I'm closing it, since we have a ticket for enhancements now!

Sep 10 2018, 10:07 AM · Patch-For-Review, Dumps-Generation, MediaWiki-Vagrant
ArielGlenn moved T202268: Move hu, ar wiki to 'bigwikis' list from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Sep 10 2018, 10:07 AM · Patch-For-Review, Dumps-Generation