
ArielGlenn (ariel)
User

User Details

User Since
Oct 8 2014, 7:09 PM (261 w, 6 d)
Availability
Available
IRC Nick
apergos
LDAP User
ArielGlenn
MediaWiki User
ArielGlenn

Recent Activity

Today

ArielGlenn added a project to T235188: Some revisions' contents are incorrect in the cache - wrong contents shown in history & diffs: User-ArielGlenn.
Wed, Oct 16, 5:04 AM · User-ArielGlenn, MW-1.35-notes (1.35.0-wmf.1; 2019-10-08), Language-Team (Language-2019-October-December), Patch-For-Review, Core Platform Team Workboards (Clinic Duty Team), MediaWiki-General, affects-translatewiki.net

Yesterday

ArielGlenn created T235495: draft needs for incremental import tool in python.
Tue, Oct 15, 12:25 PM · Dumps-Generation
ArielGlenn added a comment to T233178: Use RevisionStore::newRevisionFromBatch in WikiExporter.

...

So, this is a problem only when dumping abstract? Do regular dumps perform ok?

Tue, Oct 15, 10:18 AM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, CPT Initiatives (MCR Schema Migration), Multi-Content-Revisions (Tech Debt)
ArielGlenn added a comment to T233178: Use RevisionStore::newRevisionFromBatch in WikiExporter.

I have found the reason for the triple lookups of content: the third lookup, performed from within the Abstract Filter extension, determines whether the revision is a redirect. In the past, filtering from within the extension was not a problem, because content was not loaded until after all filters had been applied. Now that content is retrieved early, there's a substantial hit to performance. Pages whose selected revision is a redirect should not have content loaded even once; instead, those revisions should be discarded from the result set before processing, just like those belonging to the wrong namespaces.
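
The filter-before-load ordering described above can be illustrated with a minimal sketch. This is not the WikiExporter or Abstract Filter code; `RevisionRow`, `export_abstracts`, and the `load_content` callback are hypothetical names, and the sketch assumes the redirect flag is available as cheap row metadata.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class RevisionRow:
    """Cheap metadata only; a hypothetical stand-in for a row in the result set."""
    rev_id: int
    namespace: int
    is_redirect: bool


def export_abstracts(
    rows: Iterable[RevisionRow],
    load_content: Callable[[int], str],
    namespaces: frozenset = frozenset({0}),
) -> Iterator[Tuple[int, str]]:
    """Yield (rev_id, content), loading content only for rows that pass every filter."""
    for row in rows:
        # Metadata-only filters run first, so rejected rows cost no content lookup.
        if row.namespace not in namespaces:
            continue
        if row.is_redirect:
            continue
        # Assumed to be the expensive step: one content lookup per surviving row.
        yield row.rev_id, load_content(row.rev_id)
```

With this ordering, a redirect or a page outside namespace 0 never reaches `load_content`, so the number of content lookups equals the number of rows actually exported.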

Tue, Oct 15, 9:50 AM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, CPT Initiatives (MCR Schema Migration), Multi-Content-Revisions (Tech Debt)

Mon, Oct 14

ArielGlenn added a comment to T233178: Use RevisionStore::newRevisionFromBatch in WikiExporter.

All right, one of them is clear(ish) to me: abstracts should only run on the main namespace (0), and we skip over anything not in namespace 0 both in the extension and in WikiExporter.php, via command line args that so specify.

Mon, Oct 14, 4:38 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, CPT Initiatives (MCR Schema Migration), Multi-Content-Revisions (Tech Debt)
ArielGlenn added a comment to T233178: Use RevisionStore::newRevisionFromBatch in WikiExporter.

Adding here for posterity that the command I run (after making live mods to xhprofile and/or wancache code, with and without the patch) is

Mon, Oct 14, 4:11 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, CPT Initiatives (MCR Schema Migration), Multi-Content-Revisions (Tech Debt)
ArielGlenn added a comment to T233178: Use RevisionStore::newRevisionFromBatch in WikiExporter.

Just a short update, I have been doing profiling and logging on one of the currently idle snapshot hosts, and I think I have a lead.

Mon, Oct 14, 4:03 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, CPT Initiatives (MCR Schema Migration), Multi-Content-Revisions (Tech Debt)
ArielGlenn added a project to T233707: A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia: User-ArielGlenn.
Mon, Oct 14, 10:11 AM · User-ArielGlenn, Research, Outreachy (Round 19)

Wed, Oct 9

ArielGlenn added a comment to T233178: Use RevisionStore::newRevisionFromBatch in WikiExporter.

I've done profiling for abstract dumps and have not yet been able to tease out the part of the code where more time is spent, after several hours of scrying xhprof results. Back at it again later today. The difference in times is exacerbated if I add --namespaces=0 to the command line args. I'm hoping that will give me a lead.

Wed, Oct 9, 10:02 AM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, CPT Initiatives (MCR Schema Migration), Multi-Content-Revisions (Tech Debt)
ArielGlenn added a comment to T230531: Run Matrix trial using the Vector.im-hosted instance.

As an interested party, I'm curious about what's happening now as far as going ahead with the trial.

Wed, Oct 9, 7:20 AM · Matrix

Thu, Oct 3

ArielGlenn added a comment to T221399: imported pages for which there is no local user are no longer dumped under 1.34.0-wmf.1.

Still true for .wmf25.

Thu, Oct 3, 6:54 AM · MediaWiki-General, Dumps-Generation
ArielGlenn moved T232268: All dumps are broken by MW change which breaks getReplicaServer.php from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Thu, Oct 3, 6:53 AM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team (Radar), Core Platform Team Workboards (Clinic Duty Team), Dumps-Generation
ArielGlenn moved T232120: Check snapshot1005 dewiki snapshot from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Thu, Oct 3, 6:53 AM · Dumps-Generation
ArielGlenn moved T224563: Migrate dumpsdata hosts to Stretch/Buster from Backlog to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Thu, Oct 3, 6:53 AM · Dumps-Generation, Operations
ArielGlenn added a comment to T224563: Migrate dumpsdata hosts to Stretch/Buster.

I can start on this once the new dumpsdata host is racked and has a base install.

Thu, Oct 3, 6:53 AM · Dumps-Generation, Operations
ArielGlenn moved T232120: Check snapshot1005 dewiki snapshot from Backlog to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Thu, Oct 3, 6:52 AM · Dumps-Generation
ArielGlenn moved T232268: All dumps are broken by MW change which breaks getReplicaServer.php from Backlog to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Thu, Oct 3, 6:52 AM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team (Radar), Core Platform Team Workboards (Clinic Duty Team), Dumps-Generation
ArielGlenn moved T233460: Different field order of pagelinks only for Southern Azerbaijani Wikipedia from Active to Done on the Dumps-Generation board.
Thu, Oct 3, 6:52 AM · Datasets-General-or-Unknown, Dumps-Generation
ArielGlenn moved T233276: Manual catchup for wikidata Sept dump run from Backlog to Done on the Dumps-Generation board.
Thu, Oct 3, 6:52 AM · Dumps-Generation

Wed, Oct 2

ArielGlenn closed T233460: Different field order of pagelinks only for Southern Azerbaijani Wikipedia as Declined.
Wed, Oct 2, 8:48 AM · Datasets-General-or-Unknown, Dumps-Generation
ArielGlenn added a comment to T233460: Different field order of pagelinks only for Southern Azerbaijani Wikipedia.

@Ebonetti90 I'd like to close this task as declined, meaning that we won't update the schema and you'll take steps on your end to adjust your script. OK by you?

Wed, Oct 2, 7:24 AM · Datasets-General-or-Unknown, Dumps-Generation

Tue, Oct 1

ArielGlenn updated subscribers of T234229: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing/hadooping the dump hosts .

Adding @Bstorm because the labstore servers are WMCS boxes.

Tue, Oct 1, 9:04 AM · User-Elukey, Analytics-Kanban, Analytics
ArielGlenn added a comment to T234076: rack/setup/install dumpsdata1003.eqiad.wmnet.

I'd like to request that both eth interfaces be cabled, as I'd like to try to set up bonding for this host.

Tue, Oct 1, 5:21 AM · ops-eqiad, Operations

Fri, Sep 27

ArielGlenn added a comment to T225056: Run Item Terms Rebuild script.

https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/539498/ was merged in response and kicked in about 10 minutes ago, with good results on the graph.

Fri, Sep 27, 10:38 AM · User-Ladsgroup, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikidata, Wikidata wb_terms Trailblazing
ArielGlenn added a comment to T225056: Run Item Terms Rebuild script.

At around 6:50 UTC this morning we began seeing this:

Fri, Sep 27, 10:34 AM · User-Ladsgroup, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikidata, Wikidata wb_terms Trailblazing

Tue, Sep 24

ArielGlenn added a comment to T233460: Different field order of pagelinks only for Southern Azerbaijani Wikipedia.

Good morning Enrico!

Tue, Sep 24, 10:27 AM · Datasets-General-or-Unknown, Dumps-Generation
ArielGlenn moved T233460: Different field order of pagelinks only for Southern Azerbaijani Wikipedia from Backlog to Active on the Dumps-Generation board.
Tue, Sep 24, 6:48 AM · Datasets-General-or-Unknown, Dumps-Generation
ArielGlenn triaged T233460: Different field order of pagelinks only for Southern Azerbaijani Wikipedia as Normal priority.
Tue, Sep 24, 6:41 AM · Datasets-General-or-Unknown, Dumps-Generation
ArielGlenn added a comment to T233460: Different field order of pagelinks only for Southern Azerbaijani Wikipedia.

The sql in maintenance/tables.sql for creating the pagelinks table is the following:

--
-- Track page-to-page hyperlinks within the wiki.
--
CREATE TABLE /*_*/pagelinks (
  -- Key to the page_id of the page containing the link.
  pl_from int unsigned NOT NULL default 0,
  -- Namespace for this page
  pl_from_namespace int NOT NULL default 0,
Tue, Sep 24, 6:41 AM · Datasets-General-or-Unknown, Dumps-Generation

Sun, Sep 22

ArielGlenn updated subscribers of T208612: Release edit data lake data as a public json dump /mysql dump, other?.

How big are these dumps for one set, and how many sets do we intend to keep? Adding @Bstorm since the host behind dumps.wikimedia.org is a WMCS server.

Sun, Sep 22, 5:10 AM · Patch-For-Review, Analytics-Kanban, Research-Backlog, Analytics

Sat, Sep 21

ArielGlenn closed T233276: Manual catchup for wikidata Sept dump run as Resolved.

Everything back to how it was, 20th run going everywhere. Closing.

Sat, Sep 21, 10:18 AM · Dumps-Generation
ArielGlenn added a comment to T233276: Manual catchup for wikidata Sept dump run.

Status files copied over manually, cron job set to go off in about ten minutes. Once that's running I'll re-enable puppet there and be done with this task.

Sat, Sep 21, 9:55 AM · Dumps-Generation
ArielGlenn added a comment to T233276: Manual catchup for wikidata Sept dump run.

The multistream dumps are complete, which means the wikidata run is complete. I'll wait a little to see if the rsync picks up the status files; if not, I'll manually send them around to the labstore and other dumpsdata hosts.

Sat, Sep 21, 9:14 AM · Dumps-Generation
ArielGlenn added a comment to T233276: Manual catchup for wikidata Sept dump run.

Multistream dumps for wikidata are still running but we're closing in on the end.
I've disabled puppet on snapshot1006 and turned off the 20th dumps run for wikidata in the crontab which would start today, to be re-enabled once the current run completes.

Sat, Sep 21, 5:46 AM · Dumps-Generation

Fri, Sep 20

ArielGlenn added a comment to T233276: Manual catchup for wikidata Sept dump run.

All bz2, 7z page meta history files are done, and sha1/md5 sum files produced for them.
The pages logging job is complete, along with the recombine job.
The only jobs remaining are the multistream job and the multistream recombine.
I have marked all other jobs as complete in the dumpruninfo.txt file.

Fri, Sep 20, 6:25 PM · Dumps-Generation
ArielGlenn added a comment to T233276: Manual catchup for wikidata Sept dump run.

Currently running:

  • pages-meta-history 56x - 5803x
  • pages-meta-history 5803x - 60x
  • 7z's/hashes for all currently completed pages-meta-history bz2, 7z files
Fri, Sep 20, 8:58 AM · Dumps-Generation

Thu, Sep 19

ArielGlenn added a comment to T233276: Manual catchup for wikidata Sept dump run.

Meh, the above comment is getting too hard to read. Here's what's running this evening:

  • bz2 pages-meta-history from 40x... to the end (it should be interrupted when it reaches the 50x files)
  • bz2 pages-meta-history from 50x to 599x
  • more part 27 7z files
Thu, Sep 19, 7:42 PM · Dumps-Generation
ArielGlenn added a comment to T233276: Manual catchup for wikidata Sept dump run.

Currently running:

  • bz2 pages-meta-history for parts 22,23 - once this completes we can start part 27 50x - 600x
  • bz2 pages-meta-history for part 27 39x - end (will be interrupted when it begins to duplicate completed output from other processes)
  • bz2 pages-meta-history 27 60x - end
  • 7z for parts 26, 27 (partial) - once this completes we can start 21 and rerun 26 to completion; after that we can generate md5/sha1 sums for bz2/7z files that don't have them.
Thu, Sep 19, 6:19 AM · Dumps-Generation
ArielGlenn triaged T233276: Manual catchup for wikidata Sept dump run as High priority.
Thu, Sep 19, 6:08 AM · Dumps-Generation
ArielGlenn closed T232120: Check snapshot1005 dewiki snapshot as Resolved.

This issue is indeed resolved. Closing.

Thu, Sep 19, 6:03 AM · Dumps-Generation

Tue, Sep 17

ArielGlenn added a comment to T219768: Get a third dumpsdata server.

I guess by the closure of the subtask that the server has arrived? What's the outlook for getting it racked?

Tue, Sep 17, 10:23 AM · hardware-requests, Operations, Dumps-Generation
ArielGlenn updated the task description for T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC).
Tue, Sep 17, 8:26 AM · DC-Ops, Operations, ops-eqiad

Sep 12 2019

ArielGlenn added a comment to T232739: Requesting access to wmcs beta cluster for igorkim78.

Pretty sure you don't need all that checklist. Can whoever does the grant clean up the description to just leave whatever's necessary? Thanks in advance!

Sep 12 2019, 1:10 PM · Beta-Cluster-Infrastructure, Release-Engineering-Team

Sep 11 2019

ArielGlenn closed T232268: All dumps are broken by MW change which breaks getReplicaServer.php as Resolved.

I see dumpTextPass running for one of the wikis so things are at last back on track. Closing this ticket.

Sep 11 2019, 11:56 AM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team (Radar), Core Platform Team Workboards (Clinic Duty Team), Dumps-Generation
ArielGlenn added a comment to T232268: All dumps are broken by MW change which breaks getReplicaServer.php.

Now that the above is deployed, I will watch for the dump scheduler to start succeeding at some of these jobs...

Sep 11 2019, 11:43 AM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team (Radar), Core Platform Team Workboards (Clinic Duty Team), Dumps-Generation

Sep 8 2019

ArielGlenn lowered the priority of T232120: Check snapshot1005 dewiki snapshot from High to Normal.

The errors I see are related to a MediaWiki commit and not to this, but since we won't have verification that connecting works until that issue is resolved (see T232268) I'm leaving this open for now, but downgrading its priority.

Sep 8 2019, 5:29 AM · Dumps-Generation
ArielGlenn added a comment to T232268: All dumps are broken by MW change which breaks getReplicaServer.php.

Note that I'm on vacation so I might not be near a keyboard when the fix is pushed out for testing, please don't wait for me.

Sep 8 2019, 5:18 AM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team (Radar), Core Platform Team Workboards (Clinic Duty Team), Dumps-Generation
ArielGlenn triaged T232268: All dumps are broken by MW change which breaks getReplicaServer.php as Unbreak Now! priority.
Sep 8 2019, 5:16 AM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team (Radar), Core Platform Team Workboards (Clinic Duty Team), Dumps-Generation

Sep 7 2019

ArielGlenn triaged T232120: Check snapshot1005 dewiki snapshot as High priority.
Sep 7 2019, 6:32 AM · Dumps-Generation
ArielGlenn added a comment to T232120: Check snapshot1005 dewiki snapshot.

I'll be looking into this a bit later today/tomorrow (on vacation!). In theory nothing needs to be done; the dump scripts all ask MediaWiki for the password. But I see errors, so something changed.

Sep 7 2019, 6:32 AM · Dumps-Generation

Sep 2 2019

ArielGlenn moved T228558: Move viwiki and ukwiki to the big wikis list for xml/sql dumps from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Sep 2 2019, 8:32 AM · Dumps-Generation
ArielGlenn moved T228772: convert dump maintenance scripts to use Maintenance::getDB instead of getConnection /wfGetDB from Active to Done on the Dumps-Generation board.
Sep 2 2019, 8:32 AM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Dumps-Generation
ArielGlenn closed T228558: Move viwiki and ukwiki to the big wikis list for xml/sql dumps as Resolved.

These wikis ran the Aug 20th dump run successfully with the new config, so closing.

Sep 2 2019, 8:31 AM · Dumps-Generation
ArielGlenn closed T228772: convert dump maintenance scripts to use Maintenance::getDB instead of getConnection /wfGetDB as Resolved.

This is merged and live on at least some wikis without incident, so closing.

Sep 2 2019, 8:29 AM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Dumps-Generation

Sep 1 2019

ArielGlenn added a comment to T228988: Create batch access interface for page content.

Marking as blocked externally on CPT Clinic Duty board, since @ArielGlenn should have a look before we merge.

Sep 1 2019, 5:12 AM · MW-1.34-notes (1.34.0-wmf.25; 2019-10-01), Core Platform Team Workboards (Purple), Patch-For-Review, CPT Initiatives (MCR Schema Migration), Multi-Content-Revisions (Tech Debt)

Aug 29 2019

ArielGlenn added a project to T231522: Two user pages on meta can't be rendered: "request has exceeded memory limit": MediaWiki-extensions-Babel.
Aug 29 2019, 8:38 AM · MediaWiki-extensions-Babel, Operations
ArielGlenn added a project to T231522: Two user pages on meta can't be rendered: "request has exceeded memory limit": Operations.

These two pages are aliases for the same contributor, and the problematic revisions were added in 2016 on each page, so this is some sort of regression (php? babel? combo?)

Aug 29 2019, 8:38 AM · MediaWiki-extensions-Babel, Operations
ArielGlenn added a comment to T231522: Two user pages on meta can't be rendered: "request has exceeded memory limit".

Wikitext for Gangleri: https://meta.wikimedia.org/w/index.php?title=User:Gangleri&action=edit Just look at all those babel entries. Same for the other user: https://meta.wikimedia.org/w/index.php?title=User:%D7%91%D7%B2%D6%B7_%D7%9E%D7%99%D7%A8_%D7%91%D7%99%D7%A1%D7%98%D7%95_%D7%A9%D7%99%D7%99%D7%9F&action=edit

Aug 29 2019, 8:34 AM · MediaWiki-extensions-Babel, Operations
ArielGlenn created T231522: Two user pages on meta can't be rendered: "request has exceeded memory limit".
Aug 29 2019, 8:30 AM · MediaWiki-extensions-Babel, Operations

Aug 28 2019

ArielGlenn added a comment to T231224: MWDumper loads titles but revisions aren't accessible.

Which xml dump file did you import, can you provide a link? And can you let us know the version of MediaWiki you have installed? Also, can you provide the full stack trace from the error output? Thank you.

Aug 28 2019, 6:22 AM · Utilities-mwdumper
ArielGlenn added a comment to T68025: [Story] Monitor size of some Wikidata database tables.

When I look at that image it looks pretty empty, am I missing something?

Aug 28 2019, 6:18 AM · WMDE-Analytics-Engineering, DBA, Story, Wikidata, Wikidata.org

Aug 27 2019

ArielGlenn added a comment to T68025: [Story] Monitor size of some Wikidata database tables.

I think Reedy was away and didn't see my pings. Anyways, thanks for moving forward on this, and we'll see how it looks in a week!

Aug 27 2019, 6:45 PM · WMDE-Analytics-Engineering, DBA, Story, Wikidata, Wikidata.org
ArielGlenn added a comment to T231276: RevisionBasedEntityLookup.php: Revision 363395998 belongs to M77688146 instead of expected M81625979.

...

It’s part of the serialization. Not sure why that would be a new issue, though – this seems like a fairly fundamental issue (tying the page ID to the page content even though it’s not stable across delete+restore). Is it possible that File:Bolsonaro_etc is just the first file with structured data that was deleted and then restored?

Aug 27 2019, 1:39 PM · Structured-Data-Backlog (Current Work), Structured Data Engineering, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikidata, WikibaseMediaInfo, Wikimedia-production-error
ArielGlenn added a comment to T231276: RevisionBasedEntityLookup.php: Revision 363395998 belongs to M77688146 instead of expected M81625979.

https://commons.wikimedia.org/wiki/Special:Log?type=&user=&page=File%3ABolsonaro_with_Israeli_PM_Benjamin_Netanyahu%2C_Tel_Aviv%2C_31_March_2019.jpg&wpdate=&tagfilter= It was deleted and restored at 02:45 on 26 August 2019, so I guess something isn't handled quite right in MediaInfo entities for these cases.

Aug 27 2019, 9:59 AM · Structured-Data-Backlog (Current Work), Structured Data Engineering, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikidata, WikibaseMediaInfo, Wikimedia-production-error

Aug 23 2019

ArielGlenn added a comment to T226698: Allow all Analytics tools to work with Kerberos auth.

@elukey On our previous server we let people pull from us and it was very difficult to manage upgrades or any sort of maintenance. Somewhere there's a ticket with the awfulness.

Aug 23 2019, 3:11 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics
ArielGlenn added a comment to T230856: RDF dump performance for SDC.

https://github.com/apergos/misc-wmf-crap/tree/master/glyph-image-generator Starting to get clever about this: ability to generate 50k small images with metadata that can be extracted for use in depicts and/or caption statements.

Aug 23 2019, 3:05 PM · Structured-Data-Backlog (Current Work), Dumps-Generation, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata

Aug 22 2019

ArielGlenn added a project to T224563: Migrate dumpsdata hosts to Stretch/Buster: Dumps-Generation.
Aug 22 2019, 3:21 PM · Dumps-Generation, Operations

Aug 21 2019

ArielGlenn added a comment to T230856: RDF dump performance for SDC.

I'm looking at deployment-db05 now, and there are 63332 rows in the revision table, with 53250 rows in the content table. I guess we need to double the number of revisions and then add the structured data for those entries. We can probably be clever about this via a script.

Aug 21 2019, 6:19 AM · Structured-Data-Backlog (Current Work), Dumps-Generation, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata
ArielGlenn added a comment to T230856: RDF dump performance for SDC.

@Smalyshev Do you know how many entries have structured data on deployment-prep? Is that a useful testing ground right now or should we be populating the data over there first?

Aug 21 2019, 5:51 AM · Structured-Data-Backlog (Current Work), Dumps-Generation, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata

Aug 12 2019

ArielGlenn moved T230099: set up a cron job that watches dumps logs for exceptions and reports periodically from Active to Done on the Dumps-Generation board.
Aug 12 2019, 9:23 AM · Dumps-Generation
ArielGlenn moved T226167: audit public tables and make sure we dump them all from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Aug 12 2019, 9:22 AM · Patch-For-Review, Dumps-Generation
ArielGlenn closed T230099: set up a cron job that watches dumps logs for exceptions and reports periodically as Resolved.

This appears to be working as it should. Closing.

Aug 12 2019, 9:21 AM · Dumps-Generation
ArielGlenn added a comment to T221917: Create RDF dump of structured data on Commons.

I'm not thinking about the amount of time it takes, but rather the load on the database servers. Reasonably sized batched queries will be better, as I've seen already with stub dumps and slot retrieval.
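
As a rough illustration of the batching idea, the sketch below groups ids into fixed-size batches so that each batch can be served by one query instead of one query per id. The helper name `batched_ids` and the batch size are assumptions for illustration only; the source does not show how MediaWiki or the dump scripts actually batch.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched_ids(ids: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most batch_size ids (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(ids)
    while True:
        # Take the next batch_size ids; an empty list means the input is exhausted.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

A caller would loop over `batched_ids(page_ids, 500)` and issue a single query per batch; the right batch size depends on the query and the servers, and would need to be tuned rather than taken from this sketch.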

Aug 12 2019, 7:58 AM · Dumps-Generation, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Patch-For-Review, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata
ArielGlenn added a comment to T221917: Create RDF dump of structured data on Commons.

I think T222497 should be resolved before this goes live. I can test it in deployment-prep before then, but I don't want to do production tests until there is some sort of batching.

Aug 12 2019, 7:34 AM · Dumps-Generation, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Patch-For-Review, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata

Aug 9 2019

ArielGlenn added a comment to T226167: audit public tables and make sure we dump them all.

Thanks a lot! I've updated the patch above to remove those entries. Now just waiting on the wb_terms migration to get further along.

Aug 9 2019, 3:04 PM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T203075: Warning: MediaWiki\Storage\SqlBlobStore::fetchBlob: Bad data in text row 5191150.

The entry in the text row points to a non-existent blob in a cluster.

wikiadmin@10.64.48.34(zhwiki)> select * from text where old_id = 9375723;
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| old_id  | old_namespace | old_title | old_text         | old_comment | old_user | old_user_text | old_timestamp | old_minor_edit | old_flags           | inverse_timestamp |
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
| 9375723 |             0 |           | DB://cluster20/0 |             |        0 |               |               |              0 | utf-8,gzip,external |                   |
+---------+---------------+-----------+------------------+-------------+----------+---------------+---------------+----------------+---------------------+-------------------+
1 row in set (0.00 sec)

The id of 0 after the cluster20 address is the issue, just like other entries on this ticket.
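
A check for this kind of entry could be sketched as follows. The parser assumes only the two-part `DB://<cluster>/<id>` form shown in the row above and rejects anything else; `BlobAddress`, `parse_blob_address`, and `has_bad_blob_id` are hypothetical names, not MediaWiki APIs.

```python
from typing import NamedTuple


class BlobAddress(NamedTuple):
    cluster: str
    blob_id: int


def parse_blob_address(address: str) -> BlobAddress:
    """Split an address such as 'DB://cluster20/0' into its cluster and numeric id."""
    prefix = "DB://"
    if not address.startswith(prefix):
        raise ValueError(f"not a DB:// address: {address!r}")
    cluster, sep, blob_id = address[len(prefix):].partition("/")
    # Only the simple cluster/id form is handled in this sketch.
    if not cluster or not sep or not blob_id.isdecimal():
        raise ValueError(f"malformed address: {address!r}")
    return BlobAddress(cluster, int(blob_id))


def has_bad_blob_id(address: str) -> bool:
    """Return True when the id part is 0, the failure pattern described above."""
    return parse_blob_address(address).blob_id == 0
```

Run against the `old_text` values of suspect rows, `has_bad_blob_id("DB://cluster20/0")` would flag the entry shown in the query output.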

Aug 9 2019, 2:26 PM · Core Platform Team Workboards (Clinic Duty Team), Wikimedia-production-error
ArielGlenn edited P8774 Db table audit for dumps 2019 part 2.
Aug 9 2019, 6:56 AM

Aug 8 2019

ArielGlenn added a comment to T217329: bug in 1.33.0-wmf.18 breaks abstract dumps on testwikidatawiki | MWContentSerializationException $entityId and $targetId can not be the same.

The python scripts at the dump end are (mostly) protected against exceptions from MediaWiki generally and from this failure case in particular. Since we have problematic data in production I've re-opened the ticket so that the WikiBase issue can somehow be resolved.

Aug 8 2019, 4:33 PM · Multi-Content-Revisions (Tech Debt), CPT Initiatives (MCR), wikidata-tech-focus, User-Addshore, Wikimedia-production-error, Wikidata-Campsite, Wikidata, Dumps-Generation
ArielGlenn moved T228772: convert dump maintenance scripts to use Maintenance::getDB instead of getConnection /wfGetDB from Up Next to Active on the Dumps-Generation board.
Aug 8 2019, 1:14 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Dumps-Generation
ArielGlenn added a comment to T228772: convert dump maintenance scripts to use Maintenance::getDB instead of getConnection /wfGetDB.

Most of these were handled in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/524666/ but not quite all.

Aug 8 2019, 1:14 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Dumps-Generation
ArielGlenn moved T226093: Capacity planning for Commons Structured Data from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Aug 8 2019, 1:00 PM · Dumps-Generation, Operations, SDC General, Wikidata
ArielGlenn moved T219768: Get a third dumpsdata server from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Aug 8 2019, 12:59 PM · hardware-requests, Operations, Dumps-Generation
ArielGlenn moved T228921: incomplete conversion of flow revisions after disabling flow, breaks stubs dumps from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Aug 8 2019, 12:59 PM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T230099: set up a cron job that watches dumps logs for exceptions and reports periodically from Backlog to Active on the Dumps-Generation board.
Aug 8 2019, 12:59 PM · Dumps-Generation
ArielGlenn added a comment to T207627: Disable unused Flow extension on ur.wikibooks.

I should clarify; I think the only thing that is needed is to set the content model column for those rows in the content table to 1 (or whichever model is 'wikitext'). The dewikiversity tickets are similar.

Aug 8 2019, 9:39 AM · User-Zoranzoki21, Regression, Wikimedia-Site-requests
ArielGlenn added a comment to T207627: Disable unused Flow extension on ur.wikibooks.

I expected that; this requires direct intervention at the db level. I was sort of hoping you were volunteering to do it :-D

Aug 8 2019, 9:24 AM · User-Zoranzoki21, Regression, Wikimedia-Site-requests
ArielGlenn triaged T230099: set up a cron job that watches dumps logs for exceptions and reports periodically as Normal priority.
Aug 8 2019, 7:00 AM · Dumps-Generation

Aug 7 2019

ArielGlenn added a comment to T207627: Disable unused Flow extension on ur.wikibooks.

@MarcoAurelio Wonderful! It's not the pages though; in the page entry for each of the bad revisions, the content model is listed there as wikitext. It's only in the content table where the wrong content model (4) is shown.
The entries in the content table are listed in T207627#5105046 (double-checked just now to be sure the list is still the same). The slot, revision and page info corresponding to each of those is listed in the same comment; I double-checked just now to be sure none of that changed either.

Aug 7 2019, 10:45 AM · User-Zoranzoki21, Regression, Wikimedia-Site-requests

Aug 5 2019

ArielGlenn updated subscribers of T226167: audit public tables and make sure we dump them all.

@aaron This revision rEFLR848ef073fa89036c40c440016a8092690ddcf56b for FlaggedRevs seems to indicate that flaggedrevs_stats and flaggedrevs_stats2 are no longer used. Do you know or can you point me to someone who could verify that this is the case? If they aren't used, I will add them to my 'don't ever dump these' list. Thanks!

Aug 5 2019, 12:30 PM · Patch-For-Review, Dumps-Generation
ArielGlenn closed T51134: Create partial SQL dump of logging table, a subtask of T140977: Dump public data from all sql tables that have mixed public/private data, as Declined.
Aug 5 2019, 12:18 PM · Dumps-Generation
ArielGlenn closed T51134: Create partial SQL dump of logging table as Declined.

I'm going to decline this because we would have to walk through the entries and decide which ones can be published and which ones cannot.
This should be done by letting MediaWiki do the work, rather than re-implementing the logic in the python scripts and needing to keep it in sync.
The pages-logging xml dumps already do that for us.

Aug 5 2019, 12:18 PM · Dumps-Generation
ArielGlenn added a comment to T221917: Create RDF dump of structured data on Commons.

I sincerely apologize: this weekend the heat baked my brain and I did nothing related to computers at all. And Friday evening I was out. I'll set a notification to remind me this coming Friday earlier in the day, so that this gets done.

Aug 5 2019, 7:23 AM · Dumps-Generation, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Patch-For-Review, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata

Aug 2 2019

ArielGlenn edited P8774 Db table audit for dumps 2019 part 2.
Aug 2 2019, 10:35 AM
ArielGlenn edited P8774 Db table audit for dumps 2019 part 2.
Aug 2 2019, 9:55 AM

Aug 1 2019

ArielGlenn moved T227084: Find and document current xml/sql dumps behavior for every dang edge case from Backlog to Up Next on the Dumps-Generation board.
Aug 1 2019, 5:22 PM · Dumps-Generation

Jul 30 2019

ArielGlenn added a comment to T198343: Replace all calls to Revision::getRevisionText().

The category links are the fallback that was designed, so this is a net positive. Going to go update the CR on the patch.

Jul 30 2019, 12:42 PM · MW-1.35-notes (1.35.0-wmf.1; 2019-10-08), MW-1.34-release, Core Platform Team Workboards (Clinic Duty Team), MW-1.34-notes (1.34.0-wmf.25; 2019-10-01), Structured-Data-Backlog, CPT Initiatives (MCR Schema Migration), Patch-For-Review, Multi-Content-Revisions (Tech Debt), Structured Data Engineering, Wikidata
ArielGlenn added a comment to T198343: Replace all calls to Revision::getRevisionText().

Ran the following with old and new code for abstracts:

/usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=anwiki /srv/v/mediawiki/php-1.34.0-wmf.15/extensions/ActiveAbstract/includes/AbstractFilter.php  --full --report=1 --output=file:/mnt/dumpsdata/temp/dumpsgen/abstracts-anwiki-cr-testing.txt.test  --filter=noredirect --filter=abstract --skip-header --start=36142 --skip-footer --end 36150
Jul 30 2019, 11:28 AM · MW-1.35-notes (1.35.0-wmf.1; 2019-10-08), MW-1.34-release, Core Platform Team Workboards (Clinic Duty Team), MW-1.34-notes (1.34.0-wmf.25; 2019-10-01), Structured-Data-Backlog, CPT Initiatives (MCR Schema Migration), Patch-For-Review, Multi-Content-Revisions (Tech Debt), Structured Data Engineering, Wikidata
ArielGlenn added a comment to T202485: ActiveAbstract tests are failing.

Is this still an issue? It wasn't even on my radar but I saw it now by a change search in phab.

Jul 30 2019, 11:05 AM · MW-1.32-notes (WMF-deploy-2018-09-04 (1.32.0-wmf.20)), ActiveAbstract
ArielGlenn moved T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs! from Active to Done on the Dumps-Generation board.
Jul 30 2019, 10:38 AM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Wikimedia-production-error, Wikidata, Dumps-Generation
ArielGlenn closed T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs! as Resolved.

The wikis ran to completion, but I forgot to close this. Doing so now!

Jul 30 2019, 10:38 AM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Wikimedia-production-error, Wikidata, Dumps-Generation