
ArielGlenn (ariel)
User

User Details

User Since
Oct 8 2014, 7:09 PM (294 w, 3 d)
Availability
Available
IRC Nick
apergos
LDAP User
ArielGlenn
MediaWiki User
ArielGlenn

Recent Activity

Wed, May 27

ArielGlenn added a comment to T251768: Make partman/custom/no-srv-format.cfg work.

I have the recipe dumpsdata100X-no-data-format.cfg which does less than it should (but at least doesn't format the array). I'd love a fully functional solution.

Wed, May 27, 5:40 PM · DBA
ArielGlenn added a comment to T221917: Create RDF dump of structured data on Commons.

@dcausse what's your time frame?

Wed, May 27, 5:04 PM · Patch-For-Review, Dumps-Generation, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Wikidata-Query-Service, Commons, Wikidata
ArielGlenn added a comment to T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run.

Unless folks want to keep it open to work on speeding it up in the future?

Wed, May 27, 4:57 PM · Wikimedia-Incident, Wikimedia-database-error, MediaWiki-Special-pages, Wikidata
Mahir256 awarded T221917: Create RDF dump of structured data on Commons a Party Time token.
Wed, May 27, 2:53 PM · Patch-For-Review, Dumps-Generation, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Wikidata-Query-Service, Commons, Wikidata

Mon, May 25

ArielGlenn moved T253468: stubs dumps (plus all later stages) broken on select wikis from Backlog to Active on the Dumps-Generation board.
Mon, May 25, 5:13 AM · MW-1.35-notes (1.35.0-wmf.31; 2020-05-05), Dumps-Generation, Core Platform Team Workboards (Clinic Duty Team)

Sun, May 24

ArielGlenn added a comment to T253468: stubs dumps (plus all later stages) broken on select wikis.

I have tested the above patch on snapshot1010 and the exception is suppressed.

Sun, May 24, 8:56 AM · MW-1.35-notes (1.35.0-wmf.31; 2020-05-05), Dumps-Generation, Core Platform Team Workboards (Clinic Duty Team)
ArielGlenn added a comment to T253468: stubs dumps (plus all later stages) broken on select wikis.

I guess that we need to wrap $page->getRedirectTarget() with invokeLenient() and make sure that no other parts of the code can later trip up on this exception.

Sun, May 24, 8:24 AM · MW-1.35-notes (1.35.0-wmf.31; 2020-05-05), Dumps-Generation, Core Platform Team Workboards (Clinic Duty Team)
ArielGlenn triaged T253468: stubs dumps (plus all later stages) broken on select wikis as Unbreak Now! priority.
Sun, May 24, 8:19 AM · MW-1.35-notes (1.35.0-wmf.31; 2020-05-05), Dumps-Generation, Core Platform Team Workboards (Clinic Duty Team)

Thu, May 21

ArielGlenn added a comment to T252396: Split page-meta-history wikidata dump job across multiple hosts.

More things needed:

Thu, May 21, 10:04 AM · Patch-For-Review, Dumps-Generation

Wed, May 20

ArielGlenn moved T252632: Restart wikidata entity dumps from Backlog to Done on the Dumps-Generation board.
Wed, May 20, 8:10 AM · Wikidata, Dumps-Generation
ArielGlenn moved T248857: Wikdata entities dump not generated from Backlog to Done on the Dumps-Generation board.
Wed, May 20, 8:10 AM · Discovery-Search (Current work), Dumps-Generation, Wikidata
ArielGlenn moved T242221: Don't remove generated 7z content files when rerunning 7z step from Backlog to Done on the Dumps-Generation board.
Wed, May 20, 8:10 AM · Dumps-Generation
ArielGlenn moved T241169: Create database dump of new Wikibase term store from Backlog to Done on the Dumps-Generation board.
Wed, May 20, 8:10 AM · Dumps-Generation, Wikidata
ArielGlenn moved T240213: Write integration tests for XML dumps with multiple MCR slots per revision from Backlog to Done on the Dumps-Generation board.
Wed, May 20, 8:09 AM · Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration), Dumps-Generation
ArielGlenn moved T218168: Content Translation Parallel Corpus API and Dumps have different data from Other teams to Done on the Dumps-Generation board.
Wed, May 20, 8:09 AM · Dumps-Generation, ContentTranslation
ArielGlenn moved T191639: Wikidata JSON dumps do not have the 'ns' (namespace) from Other teams to Done on the Dumps-Generation board.
Wed, May 20, 8:09 AM · User-Addshore, Datasets-General-or-Unknown, Dumps-Generation, Wikidata
ArielGlenn moved T222497: dumpRDF for MediaInfo entities loads each page individually from Other teams to Done on the Dumps-Generation board.
Wed, May 20, 8:09 AM · MW-1.35-notes (1.35.0-wmf.10; 2019-12-10), Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Structured-Data-Backlog (Current Work), Core Platform Team, MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Dumps-Generation, User-Smalyshev, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata
ArielGlenn moved T246074: Improve performance when writing multi-content revisions to XML dumps from Other teams to Done on the Dumps-Generation board.
Wed, May 20, 8:09 AM · MW-1.35-notes (1.35.0-wmf.26; 2020-03-31), MW-1.35-release, Core Platform Team Workboards (Clinic Duty Team), StructuredDataOnCommons, Dumps-Generation
ArielGlenn moved T239905: dumpRdf for mediainfo entities loads data from db more often than it needs to from Other teams to Done on the Dumps-Generation board.
Wed, May 20, 8:09 AM · MW-1.35-notes (1.35.0-wmf.11; 2019-12-17), Structured-Data-Backlog (Current Work), Dumps-Generation, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata
ArielGlenn moved T220160: getRedirectTarget should not automatically load revision content in all cases from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Wed, May 20, 8:09 AM · Core Platform Team Workboards (Clinic Duty Team), Wikimedia-production-error, Regression, MediaWiki-Revision-backend, Dumps-Generation
ArielGlenn moved T230856: RDF dump performance for SDC from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Wed, May 20, 8:09 AM · Structured-Data-Backlog (Current Work), Dumps-Generation, WikibaseMediaInfo, Wikidata-Query-Service, SDC General, Commons, Wikidata
ArielGlenn moved T226167: audit public tables and make sure we dump them all from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Wed, May 20, 8:09 AM · Patch-For-Review, Dumps-Generation
ArielGlenn moved T241149: rdfDump.php generates error messages when dumping for pages without mediainfo items from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Wed, May 20, 8:09 AM · Structured-Data-Backlog (Current Work), Structured Data Engineering, WikibaseMediaInfo, Dumps-Generation
ArielGlenn moved T243055: Publish SQL dumps of CodeReview tables from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Wed, May 20, 8:08 AM · Security-Team, DBA, Dumps-Generation, MediaWiki-extensions-CodeReview
ArielGlenn moved T245721: Promote kowiki to membership in the 'bigwikis' list from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Wed, May 20, 8:08 AM · Dumps-Generation
ArielGlenn moved T238921: MCR: Include all slots in XML dumps per default from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Wed, May 20, 8:08 AM · MW-1.35-notes (1.35.0-wmf.28; 2020-04-14), MW-1.35-release, CPT Initiatives (MCR Schema Migration), Core Platform Team Workboards (Clinic Duty Team), Dumps-Generation
ArielGlenn moved T236431: Data dumps for the MachineVision extension from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Wed, May 20, 8:08 AM · Security-Team, Structured-Data-Backlog, SDC-Statements (Machine-vision-depicts), Dumps-Generation, MachineVision
ArielGlenn moved T242209: Clean up after kiling snapshot1006 dump processes due to wikidata dump run abort from Active to Done on the Dumps-Generation board.
Wed, May 20, 8:08 AM · Dumps-Generation
ArielGlenn moved T249477: Fix multistream download link in recent dumps index.html pages from Blocked/Stalled/Waiting for event to Done on the Dumps-Generation board.
Wed, May 20, 8:08 AM · Dumps-Generation
ArielGlenn moved T238959: Make TextPassDumperTest work with 0.11 dump schema from Active to Done on the Dumps-Generation board.
Wed, May 20, 8:07 AM · MW-1.35-notes (1.35.0-wmf.18; 2020-02-04), Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration), Dumps-Generation, Structured-Data-Backlog, Multi-Content-Revisions (New Features), Structured Data Engineering, Wikidata
ArielGlenn moved T241794: (Need By Jan 25) rack/setup/install snapshot1010.eqiad.wmnet from Active to Done on the Dumps-Generation board.
Wed, May 20, 8:07 AM · Dumps-Generation, Operations
ArielGlenn moved T246465: clean up page content generation code and file listing methods as prep work for splitting page content generation across multiple servers from Active to Done on the Dumps-Generation board.
Wed, May 20, 8:07 AM · Dumps-Generation
ArielGlenn moved T249131: write unittest to check prefetch ranges for checkpoint files from Active to Done on the Dumps-Generation board.
Wed, May 20, 8:07 AM · Dumps-Generation
ArielGlenn moved T249508: Simplify private/public wiki dump handling from Active to Done on the Dumps-Generation board.
Wed, May 20, 8:07 AM · Dumps-Generation
ArielGlenn moved T250018: 7z version of en-wiki dump (20200401) pages-meta-history is not accessible via download link from Active to Done on the Dumps-Generation board.
Wed, May 20, 8:07 AM · Dumps-Generation
ArielGlenn moved T243434: Dumps should write pagerange info for page content jobs to a file from Active to Done on the Dumps-Generation board.
Wed, May 20, 8:07 AM · Dumps-Generation
ArielGlenn moved T245193: Wikidata aborted page-meta-history jobs after db1087 depooled from Active to Done on the Dumps-Generation board.
Wed, May 20, 8:07 AM · Dumps-Generation

Tue, May 19

ArielGlenn moved T252396: Split page-meta-history wikidata dump job across multiple hosts from Backlog to Active on the Dumps-Generation board.
Tue, May 19, 10:16 AM · Patch-For-Review, Dumps-Generation

Sat, May 16

ArielGlenn added a comment to T221917: Create RDF dump of structured data on Commons.

I see that we're no longer blocked. Does this mean that we're good to go for weekly runs?

Sat, May 16, 9:51 AM · Patch-For-Review, Dumps-Generation, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Wikidata-Query-Service, Commons, Wikidata

Thu, May 14

ArielGlenn added a comment to T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run.

...

Anyway, Lydia said it's fine to do it tomorrow when it gets announced by our communication manager. Does that work for you?

Thu, May 14, 5:44 AM · Wikimedia-Incident, Wikimedia-database-error, MediaWiki-Special-pages, Wikidata

Wed, May 13

ArielGlenn added a comment to T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run.

Can we do this temporarily while the query is being fixed up? It looks like it had to be killed in Nov, Feb, Apr, May, so I'd rather temp disable than require folks to shoot it (and anything else hung as a side effect).

Wed, May 13, 10:02 AM · Wikimedia-Incident, Wikimedia-database-error, MediaWiki-Special-pages, Wikidata
ArielGlenn added a comment to T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run.

Can we just skip the updateSpecialPages.php wikidatawiki --override --only=Fewestrevisions script altogether, instead of shooting it every month?

Wed, May 13, 9:50 AM · Wikimedia-Incident, Wikimedia-database-error, MediaWiki-Special-pages, Wikidata
ArielGlenn added a comment to T252632: Restart wikidata entity dumps.

As I understand it the long running query comes from a monthly cron job.

Wed, May 13, 9:48 AM · Wikidata, Dumps-Generation
ArielGlenn created T252632: Restart wikidata entity dumps.
Wed, May 13, 9:42 AM · Wikidata, Dumps-Generation

Mon, May 11

ArielGlenn added a comment to T252396: Split page-meta-history wikidata dump job across multiple hosts.

The basic idea is that the scripts will run as they have done up to now, but after a host running 'small' or 'big' (but not huge) wikis completes, it will check for 'batches' for wikidatawiki and run any of those available, at the same time that the main server for wikidatawiki is going about its business writing out page meta history files, also in batches.
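As an illustration only (not taken from the dumps code), the "check for batches and run any available" step could be sketched as below. The function name, the `.todo`/`.claimed` suffixes, and the use of a shared directory of batch files are all assumptions made for this example.

```python
import os


def claim_next_batch(batch_dir):
    """Claim one unclaimed batch file by renaming it.

    Returns the path of the claimed file, or None if nothing is left.
    The '.todo' and '.claimed' suffixes are hypothetical.
    """
    for name in sorted(os.listdir(batch_dir)):
        if not name.endswith(".todo"):
            continue
        src = os.path.join(batch_dir, name)
        dst = src[: -len(".todo")] + ".claimed"
        try:
            # On a local POSIX filesystem rename is atomic, so only one
            # worker can win the race for a given batch file.
            os.rename(src, dst)
        except FileNotFoundError:
            # Another worker claimed this batch first; try the next one.
            continue
        return dst
    return None
```

A host that has finished its own wikis would call `claim_next_batch()` in a loop and stop once it returns `None`. Whether rename is safe on the actual shared storage is a separate question that this sketch does not address.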

Mon, May 11, 11:48 AM · Patch-For-Review, Dumps-Generation
ArielGlenn triaged T252396: Split page-meta-history wikidata dump job across multiple hosts as High priority.
Mon, May 11, 10:52 AM · Patch-For-Review, Dumps-Generation

Fri, May 8

ArielGlenn added a comment to T238966: Apply updates for MCR, actor migration, and content migration, to production wikis..

One more question: are these changes going to be applied in deployment-prep first? Because I'd like to test there just to be extra sure.

Fri, May 8, 9:59 AM · Cloud-Services, DBA, Schema-change, Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration)
ArielGlenn added a comment to T238966: Apply updates for MCR, actor migration, and content migration, to production wikis..

When these new fields rev_actor and rev_comment_id are added, what populates them?

Fri, May 8, 8:14 AM · Cloud-Services, DBA, Schema-change, Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration)
ArielGlenn added a comment to T238966: Apply updates for MCR, actor migration, and content migration, to production wikis..

Should I get a ticket open and get communicating with the community ASAP on dropping rev_text_id, rev_content_model, rev_content_format, ar_text_id, ar_content_model, and ar_content_format?

+1
I will also start the change on a s6 (frwiki, jawiki, ruwiki) codfw slave first, leave it for a few days, and then on an s6 eqiad slave and leave it for a few days as well to make sure we are good to go.

Also ping @ArielGlenn as this will be a massive schema change, pinging in case there's stuff that needs to change within the dumps world!

Fri, May 8, 7:32 AM · Cloud-Services, DBA, Schema-change, Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (MCR Schema Migration)

Thu, May 7

ArielGlenn added a comment to T250715: Drop (and archive?) aft_feedback.

No internal links @Reedy, sorry ;-) The above can go as soon as someone gives the final thumbs up.

Thu, May 7, 2:09 PM · Patch-For-Review, Privacy Engineering, Security-Team, DBA

Wed, May 6

ArielGlenn added a comment to T250715: Drop (and archive?) aft_feedback.

Let's add that to the 'other' index.html page too, or no one will know it's there. Can someone supply a phrase describing the contents, for downloaders?

Wed, May 6, 3:44 PM · Patch-For-Review, Privacy Engineering, Security-Team, DBA
ArielGlenn added a comment to T251980: Unable to use force index on replicas (Key 'PRIMARY' doesn't exist in table 'page').

On which host(s) are you running the above queries?

Wed, May 6, 7:28 AM · Quarry, Data-Services

Tue, May 5

ArielGlenn moved T251411: page_restrictions field incomplete in current and historical dumps from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Tue, May 5, 6:33 AM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Dumps-Generation, Analytics
ArielGlenn added a parent task for T35334: Remove database column page.page_restrictions from MediaWiki core: T251411: page_restrictions field incomplete in current and historical dumps.
Tue, May 5, 6:33 AM · MediaWiki-General, Patch-For-Review, Schema-change
ArielGlenn added a parent task for T218446: Remove use of legacy page.page_restrictions field: T251411: page_restrictions field incomplete in current and historical dumps.
Tue, May 5, 6:33 AM · MW-1.35-notes (1.35.0-wmf.25; 2020-03-24), Patch-For-Review, Core Platform Team Workboards (Clinic Duty Team), Performance-Team (Radar), Schema-change, MediaWiki-General, Technical-Debt (Deprecation process)
ArielGlenn added subtasks for T251411: page_restrictions field incomplete in current and historical dumps: T35334: Remove database column page.page_restrictions from MediaWiki core, T218446: Remove use of legacy page.page_restrictions field.
Tue, May 5, 6:33 AM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Dumps-Generation, Analytics
ArielGlenn added a comment to T251411: page_restrictions field incomplete in current and historical dumps.

Making the command decision to block this task on T218446 / T35334.

Tue, May 5, 6:32 AM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Dumps-Generation, Analytics

Sat, May 2

ArielGlenn added a comment to T251411: page_restrictions field incomplete in current and historical dumps.

See also T35334 for another bug on removing that field from core.

Sat, May 2, 5:16 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Dumps-Generation, Analytics

Fri, May 1

ArielGlenn moved T251411: page_restrictions field incomplete in current and historical dumps from Backlog to Active on the Dumps-Generation board.
Fri, May 1, 10:41 AM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Dumps-Generation, Analytics
ArielGlenn added a comment to T205361: Make an HTML dump of the output of the CodeReview extension on MediaWiki.org.

Sounds good to me, though we probably want that discussion on the other task.

Fri, May 1, 10:38 AM · Core Platform Team Workboards (Clinic Duty Team), MW-1.33-notes (1.33.0-wmf.25; 2019-04-09), MediaWiki-extensions-CodeReview
ArielGlenn added a comment to T205361: Make an HTML dump of the output of the CodeReview extension on MediaWiki.org.

The HTML dump can be in a tarball for download, sure. But that is separate from what was requested in T243056 i.e. actually serving a static copy for browsing. I don't think the labstore boxes should be doing that.

Fri, May 1, 10:22 AM · Core Platform Team Workboards (Clinic Duty Team), MW-1.33-notes (1.33.0-wmf.25; 2019-04-09), MediaWiki-extensions-CodeReview
ArielGlenn updated subscribers of T243055: Publish SQL dumps of CodeReview tables.

I don't know the extension either. Hey @brion :-D

Fri, May 1, 10:13 AM · Security-Team, DBA, Dumps-Generation, MediaWiki-extensions-CodeReview
ArielGlenn added a project to T251411: page_restrictions field incomplete in current and historical dumps: Core Platform Team Workboards (Clinic Duty Team).
Fri, May 1, 7:35 AM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Dumps-Generation, Analytics
ArielGlenn added a comment to T251411: page_restrictions field incomplete in current and historical dumps.

See also: T218446

Fri, May 1, 7:34 AM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Dumps-Generation, Analytics
ArielGlenn added a comment to T251411: page_restrictions field incomplete in current and historical dumps.

This has been broken a long time. I need to check a little bit more of the history, but I can verify that exports use the contents of the obsolete page_restrictions field from the page table instead of the corresponding entries in the page_restrictions table. Proof of this: enwiki page 'Arthur Schopenhauer' with page id 700 has the following row entry:

wikiadmin@10.64.32.76(enwiki)> select * from page where page_id = 700;
+---------+----------------+---------------------+-------------------+------------------+-------------+--------------------+----------------+--------------------+-------------+----------+--------------------+-----------+
| page_id | page_namespace | page_title          | page_restrictions | page_is_redirect | page_is_new | page_random        | page_touched   | page_links_updated | page_latest | page_len | page_content_model | page_lang |
+---------+----------------+---------------------+-------------------+------------------+-------------+--------------------+----------------+--------------------+-------------+----------+--------------------+-----------+
|     700 |              0 | Arthur_Schopenhauer | move=:edit=       |                0 |           0 | 0.8153719695428131 | 20200430073836 | 20200426170144     |   952937594 |   155341 | wikitext           | NULL      |
+---------+----------------+---------------------+-------------------+------------------+-------------+--------------------+----------------+--------------------+-------------+----------+--------------------+-----------+

and no entries in the page_restrictions table. But the contents of the <restrictions> xml element in the pages-articles bz2 file are "move=:edit="
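For illustration, here is a minimal sketch of deriving a restrictions string from page_restrictions table rows rather than from the legacy page.page_restrictions field. It uses an in-memory SQLite table that mirrors only three of the real columns; the function name and the `type=level:type=level` output format are assumptions for this example, not the exporter's actual behaviour. The sample rows are the ones shown for page 13856248 elsewhere in this task.

```python
import sqlite3


def restrictions_from_table(conn, page_id):
    """Build a 'type=level:type=level' string from page_restrictions rows.

    Returns an empty string when the page has no rows in the table.
    """
    rows = conn.execute(
        "SELECT pr_type, pr_level FROM page_restrictions "
        "WHERE pr_page = ? ORDER BY pr_type",
        (page_id,),
    ).fetchall()
    return ":".join(f"{pr_type}={pr_level}" for pr_type, pr_level in rows)


# Hypothetical stand-in for the production table (subset of columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_restrictions "
    "(pr_page INTEGER, pr_type TEXT, pr_level TEXT)"
)
conn.executemany(
    "INSERT INTO page_restrictions VALUES (?, ?, ?)",
    [
        (13856248, "edit", "templateeditor"),
        (13856248, "move", "templateeditor"),
    ],
)

print(restrictions_from_table(conn, 13856248))
print(repr(restrictions_from_table(conn, 700)))
```

With this approach page 700, which has no rows in the table, yields an empty string instead of the stale "move=:edit=" value from the page table.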

Fri, May 1, 7:29 AM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Dumps-Generation, Analytics

Apr 30 2020

ArielGlenn added a comment to T251411: page_restrictions field incomplete in current and historical dumps.

I've been looking at the en wiki dump files for part 18 (containing this page) and there are no entries for page restrictions in either the 2019-02 dumps or the 2018-02 dumps. So this bug seems to have been around for a while. I'm checking older dumps but it will take a while to download and decompress them.

Apr 30 2020, 4:24 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Dumps-Generation, Analytics
ArielGlenn added a comment to T251411: page_restrictions field incomplete in current and historical dumps.

wikiadmin@10.64.32.76(enwiki)> select * from page_restrictions where pr_page = 13856248;
+----------+---------+----------------+------------+---------+-----------+--------+
| pr_page  | pr_type | pr_level       | pr_cascade | pr_user | pr_expiry | pr_id  |
+----------+---------+----------------+------------+---------+-----------+--------+
| 13856248 | edit    | templateeditor |          0 |    NULL | infinity  | 503137 |
| 13856248 | move    | templateeditor |          0 |    NULL | infinity  | 503138 |
+----------+---------+----------------+------------+---------+-----------+--------+
2 rows in set (0.01 sec)

Apr 30 2020, 2:15 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Dumps-Generation, Analytics
ArielGlenn added a comment to T205361: Make an HTML dump of the output of the CodeReview extension on MediaWiki.org.

Maybe talk to @ArielGlenn about getting it on the official dumps servers (dumps.wikimedia.org) under "misc". That would be more stable than the people VM.

Apr 30 2020, 1:41 PM · Core Platform Team Workboards (Clinic Duty Team), MW-1.33-notes (1.33.0-wmf.25; 2019-04-09), MediaWiki-extensions-CodeReview
ArielGlenn added a comment to T243055: Publish SQL dumps of CodeReview tables.

labstore1006.wikimedia.org and labstore1007.wikimedia.org in /srv/dumps/xmldatadumps/public/other let's make the subdirectory codereview with a subdirectory for the date (20200428 I guess) and drop the gz files in there. I guess you can make the directories and files owner and group dumpsgen:dumpsgen.

Apr 30 2020, 12:33 PM · Security-Team, DBA, Dumps-Generation, MediaWiki-extensions-CodeReview
ArielGlenn added a watcher for affects-Kiwix-and-openZIM: ArielGlenn.
Apr 30 2020, 12:21 PM

Apr 22 2020

ArielGlenn added a comment to T250772: Special:Import should skip revisions with identical content to the current revision.

So to be very clear, this report is only about importing the current revision of a page when it is the same as the existing top revision on the local wiki. I am unsure whether skipping such revisions ought to be the default, but it could at least be a checkbox.
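A rough sketch of the check being discussed, assuming content is compared by hash and that the "checkbox" maps to a boolean option. The function and parameter names are hypothetical and this is not how Special:Import is implemented.

```python
import hashlib


def should_skip_import(incoming_text, local_top_text, skip_identical=True):
    """Return True when the incoming revision duplicates the local top revision.

    skip_identical stands in for the proposed checkbox; local_top_text is
    None when the page does not yet exist on the local wiki.
    """
    if not skip_identical or local_top_text is None:
        return False
    incoming = hashlib.sha1(incoming_text.encode("utf-8")).hexdigest()
    local = hashlib.sha1(local_top_text.encode("utf-8")).hexdigest()
    return incoming == local
```

With the option off, every revision is imported as before; with it on, only an exact content match against the current top revision is skipped.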

Apr 22 2020, 8:47 AM · MediaWiki-Export-or-Import

Apr 21 2020

ArielGlenn closed T249131: write unittest to check prefetch ranges for checkpoint files as Resolved.

Done.

Apr 21 2020, 8:27 AM · Dumps-Generation
ArielGlenn closed T249508: Simplify private/public wiki dump handling as Resolved.

This is complete and tables have been dumped using the new code.

Apr 21 2020, 8:26 AM · Dumps-Generation
ArielGlenn moved T250260: skip existing page content dump output files in bz2 and 7z generation jobs for each batch of commands from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Apr 21 2020, 8:26 AM · Dumps-Generation
ArielGlenn added a comment to T250260: skip existing page content dump output files in bz2 and 7z generation jobs for each batch of commands.

Keeping this open until the code path gets used, probably during the May run.

Apr 21 2020, 8:25 AM · Dumps-Generation
ArielGlenn closed T250018: 7z version of en-wiki dump (20200401) pages-meta-history is not accessible via download link as Resolved.

Closing this since all files are now available and patch has been deployed and used.

Apr 21 2020, 8:25 AM · Dumps-Generation
ArielGlenn added a comment to T243055: Publish SQL dumps of CodeReview tables.

Just a check-in to see where this is on people's radar.

Apr 21 2020, 8:24 AM · Security-Team, DBA, Dumps-Generation, MediaWiki-extensions-CodeReview
ArielGlenn closed T236431: Data dumps for the MachineVision extension, a subtask of T238574: Create wiki replica views for MachineVision extension tables, as Resolved.
Apr 21 2020, 8:23 AM · Privacy Engineering, Privacy, Security-Team, Structured-Data-Backlog, Data-Services, cloud-services-team (Kanban), SDC-Statements (Machine-vision-depicts), MachineVision
ArielGlenn closed T236431: Data dumps for the MachineVision extension as Resolved.

Sent: https://lists.wikimedia.org/pipermail/xmldatadumps-l/2020-April/001531.html so closing.

Apr 21 2020, 8:23 AM · Security-Team, Structured-Data-Backlog, SDC-Statements (Machine-vision-depicts), Dumps-Generation, MachineVision
ArielGlenn moved T246074: Improve performance when writing multi-content revisions to XML dumps from Blocked/Stalled/Waiting for event to Other teams on the Dumps-Generation board.
Apr 21 2020, 8:17 AM · MW-1.35-notes (1.35.0-wmf.26; 2020-03-31), MW-1.35-release, Core Platform Team Workboards (Clinic Duty Team), StructuredDataOnCommons, Dumps-Generation
ArielGlenn added a comment to T221917: Create RDF dump of structured data on Commons.

Hi, just checking in: any progress on investigating the 'extra' dumps content?

Apr 21 2020, 8:16 AM · Patch-For-Review, Dumps-Generation, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Wikidata-Query-Service, Commons, Wikidata
ArielGlenn closed T249477: Fix multistream download link in recent dumps index.html pages as Resolved.

Closing this, as all links have been fixed up.

Apr 21 2020, 8:15 AM · Dumps-Generation
ArielGlenn added a comment to T250772: Special:Import should skip revisions with identical content to the current revision.

Can you clarify the specific circumstances in which you would want a revision for import to be skipped? What would the person be trying to do: import the current revision of a page from elsewhere, or potentially multiple revisions?

Apr 21 2020, 8:11 AM · MediaWiki-Export-or-Import

Apr 18 2020

ArielGlenn added a comment to T250018: 7z version of en-wiki dump (20200401) pages-meta-history is not accessible via download link.

The noop for wikidata is running now.

Apr 18 2020, 5:42 PM · Dumps-Generation

Apr 17 2020

ArielGlenn added a comment to T250018: 7z version of en-wiki dump (20200401) pages-meta-history is not accessible via download link.

Running

dumpsgen@snapshot1009:/srv/deployment/dumps/dumps/xmldumps-backup$ bash fixup_scripts/do_dumptextpass_jobs.sh --wiki wikidatawiki --config /etc/dumps/confs/wikidump.conf.dumps:wd --date 20200401 --jobinfo 27:70360538:88436737 --skiplock --numjobs 27

in a screen session. I'll need to intervene at some point to stop the current run, catch up the 7zs, and perhaps do a noop.

Apr 17 2020, 7:12 PM · Dumps-Generation
ArielGlenn added a comment to T236431: Data dumps for the MachineVision extension.

This is done now. Feel free to announce it wherever you like. After I have sent mail to the xmldatadumps-l list, I will close this task unless there is something else you need.

Apr 17 2020, 10:12 AM · Security-Team, Structured-Data-Backlog, SDC-Statements (Machine-vision-depicts), Dumps-Generation, MachineVision

Apr 16 2020

ArielGlenn moved T249903: Decommission francium from Backlog to Up Next on the Dumps-Generation board.
Apr 16 2020, 10:22 AM · Dumps-Generation
ArielGlenn moved T250260: skip existing page content dump output files in bz2 and 7z generation jobs for each batch of commands from Backlog to Active on the Dumps-Generation board.
Apr 16 2020, 10:22 AM · Dumps-Generation

Apr 15 2020

ArielGlenn triaged T250260: skip existing page content dump output files in bz2 and 7z generation jobs for each batch of commands as Medium priority.
Apr 15 2020, 10:17 AM · Dumps-Generation
ArielGlenn moved T249477: Fix multistream download link in recent dumps index.html pages from Active to Blocked/Stalled/Waiting for event on the Dumps-Generation board.
Apr 15 2020, 10:13 AM · Dumps-Generation
ArielGlenn closed T226167: audit public tables and make sure we dump them all as Resolved.

This is deployed, dumps have run with it, so it can be closed at last.

Apr 15 2020, 10:11 AM · Patch-For-Review, Dumps-Generation
ArielGlenn added a comment to T236431: Data dumps for the MachineVision extension.

@Mholloway care to give me a one line description that I can use for the index.html page mentioned above? See the existing page for examples. I could announce it on the xmldatadumps-l mailing list afterwards, unless you care to do the honours.

Apr 15 2020, 7:07 AM · Security-Team, Structured-Data-Backlog, SDC-Statements (Machine-vision-depicts), Dumps-Generation, MachineVision
ArielGlenn moved T250018: 7z version of en-wiki dump (20200401) pages-meta-history is not accessible via download link from Backlog to Active on the Dumps-Generation board.
Apr 15 2020, 7:05 AM · Dumps-Generation
ArielGlenn added a comment to T250018: 7z version of en-wiki dump (20200401) pages-meta-history is not accessible via download link.

Enwiki is complete, run with the merged patch. Wikidata wikis 7zs are being generated by manual runs a couple of times a day as the bz2 page content files are generated. I'll leave this task open until that's done.

Apr 15 2020, 6:49 AM · Dumps-Generation

Apr 14 2020

ArielGlenn added a comment to T250018: 7z version of en-wiki dump (20200401) pages-meta-history is not accessible via download link.

I am manually running 7zs on wikidatawiki for the bz2 files that have been produced, to make sure we'll have them at the end of the run. Screen session on snapshot1009. 7zs for enwiki are proceeding in screen session on snapshot1010. The regular wikidatawiki run is continuing as usual on snapshot1006.

Apr 14 2020, 4:15 PM · Dumps-Generation
ArielGlenn added a comment to T250018: 7z version of en-wiki dump (20200401) pages-meta-history is not accessible via download link.

Done: arwiki commonswiki kowiki eswiki hewiki huwiki ukwiki metawiki dewiki jawiki zhwiki nlwiki plwiki viwiki ptwiki itwiki svwiki
In progress: ptwiki itwiki svwiki enwiki
Remaining: enwiki wikidatawiki

Apr 14 2020, 6:22 AM · Dumps-Generation

Apr 13 2020

ArielGlenn added a comment to T236431: Data dumps for the MachineVision extension.

The dumps ran on Saturday as expected and are now available, but we should add an entry to https://dumps.wikimedia.org/other/ describing these. See https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/dumps/files/web/html/other_index.html

Apr 13 2020, 9:42 PM · Security-Team, Structured-Data-Backlog, SDC-Statements (Machine-vision-depicts), Dumps-Generation, MachineVision
ArielGlenn added a comment to T250018: 7z version of en-wiki dump (20200401) pages-meta-history is not accessible via download link.

I now understand the issue fully and have revised the patchset accordingly. Catchup will likely take a day or so on the wikis to fix.

Apr 13 2020, 2:05 PM · Dumps-Generation
ArielGlenn added a comment to T250018: 7z version of en-wiki dump (20200401) pages-meta-history is not accessible via download link.

I am running the 7z job on the above wikis with the patchset on the testbed host snapshot1010 in a screen session, three wikis at a time. If I get very impatient I may do another 3 at a time on another host.

Apr 13 2020, 1:04 PM · Dumps-Generation
ArielGlenn added a comment to T250018: 7z version of en-wiki dump (20200401) pages-meta-history is not accessible via download link.

This was a consequence of the file lister refactoring done recently, and although I have a fix I do not fully understand what caused the code to break. Still investigating.

Apr 13 2020, 11:53 AM · Dumps-Generation