Milimetric (Dan Andreescu)
Staff Engineer (Data Engineering)

User Details

User Since
Oct 8 2014, 5:48 PM (467 w, 3 d)
Availability
Available
IRC Nick
Milimetric
LDAP User
Milimetric
MediaWiki User
Milimetric (WMF)

Recent Activity

Fri, Sep 22

BTullis awarded T345948: Dan's review of Data Platform Engineering Maintainership Scope a Love token.
Fri, Sep 22, 9:39 PM · Data Products (Sprint 01)
Milimetric claimed T345948: Dan's review of Data Platform Engineering Maintainership Scope.

I burned through my budgeted hour (it was really two hours) on this. I added a bunch of links, details, and cleanup, but only got through about 15% of the sheet. I think my estimate of 12 hours is closer to what it would take to get through it, but I'm marking this done as per the spike.

Fri, Sep 22, 3:48 PM · Data Products (Sprint 01)
Milimetric added a comment to T346684: Enable canary events for mediawiki.revision-visibility-change.

Also, while we're talking canaries, if it's just as easy, enabling them for all EventBus-sourced streams is a good idea. Otherwise we have the problem Xabriel explains above. Here's an example job using the page move events.

Fri, Sep 22, 3:37 PM · Data Engineering and Event Platform Team

Thu, Sep 21

Milimetric moved T345948: Dan's review of Data Platform Engineering Maintainership Scope from Sprint Backlog to In Process on the Data Products (Sprint 01) board.
Thu, Sep 21, 3:16 PM · Data Products (Sprint 01)
Milimetric added a comment to T346890: Windows 11 missing in analytics ?.

Ok, so the action here would be to label the data better, and add an annotation for Phase 5 and any other big changes.

Thu, Sep 21, 3:15 PM · Data Products, Data-Engineering-Dashiki, Data-Engineering
Milimetric moved T342213: Route to new AQS Knowledge Gaps endpoint from In Testing to Sign Off on the Data Products (Sprint 01) board.

AQS 1.0 is sending the required headers now, etag is enabled on all endpoints (not just knowledge gaps). Hugh, please verify and let us know if anything else needs to happen before we can route to the knowledge gaps endpoint. Thank you!

Thu, Sep 21, 3:06 PM · Data Products (Sprint 01), serviceops, Patch-For-Review, Code-Health-Objective, RESTBase Sunsetting
Milimetric added a comment to T309738: Move Mediawiki QueryPages computation to Hadoop.
  • Do we want/need a public-facing API? @Ladsgroup's use-case doesn't require one, is there demand for this elsewhere?
Thu, Sep 21, 2:00 PM · Patch-For-Review, DBA
Milimetric added a comment to T315902: New error "DB is set and has not been closed by the Load Balancer" for certain bad revisions during page content dumps.

I marked a couple of these as bad just to see what that process was like; see T346969.

Thu, Sep 21, 10:03 AM · Platform Engineering, Dumps-Generation

Wed, Sep 20

Milimetric moved T346969: Clean up Bad Blobs from Active to Done on the Dumps-Generation board.
Wed, Sep 20, 6:09 PM · Dumps-Generation
Milimetric added a comment to T346969: Clean up Bad Blobs.

mwscript maintenance/findBadBlobs.php --wiki hrwiki --revisions 1705637

Wed, Sep 20, 6:09 PM · Dumps-Generation
Milimetric claimed T346969: Clean up Bad Blobs.

mwscript maintenance/findBadBlobs.php --wiki azwiki --revisions 413206,413238,413328

Wed, Sep 20, 6:07 PM · Dumps-Generation
Milimetric created T346969: Clean up Bad Blobs.
Wed, Sep 20, 6:06 PM · Dumps-Generation
Milimetric added a comment to T346890: Windows 11 missing in analytics ?.

Also related: T342267: Investigate surprising "10% Other" portion of Analytics Browsers report, which really needs some love as well.

Wed, Sep 20, 2:38 PM · Data Products, Data-Engineering-Dashiki, Data-Engineering
Milimetric added a project to T346890: Windows 11 missing in analytics ?: Data Products.

I vaguely remember this thing in 2018... Windows did get grouped up, but I agree with the DJ's points and that this data makes no sense without at least some kind of annotation.

Wed, Sep 20, 2:38 PM · Data Products, Data-Engineering-Dashiki, Data-Engineering

Tue, Sep 19

Milimetric added a comment to T345441: Decide on data required for launch.

To keep the archives happy: I talked to Fabian on Monday and answered this question. Yes, all-projects means all wikis. For aggregates at the project family level, for example "all wikipedias", we use all-wikipedia-projects (see wikistats example).

Tue, Sep 19, 9:04 PM · Research
Milimetric claimed T342213: Route to new AQS Knowledge Gaps endpoint.

Ok, I hear Ben's concerns but kind of decided to risk updating everything at once (because it's easier to roll back now than when we move to AQS 2.0). When reviewed, deployed, and validated, I will move this to blocked until Hugh can review.

Tue, Sep 19, 2:50 PM · Data Products (Sprint 01), serviceops, Patch-For-Review, Code-Health-Objective, RESTBase Sunsetting
Milimetric moved T346165: clouddumps100[12] puppet alert: "Puppet performing a change on every puppet run" from Backlog to Done on the Dumps-Generation board.
Tue, Sep 19, 10:44 AM · cloud-services-team, Dumps-Generation, Data-Platform-SRE
Milimetric added a comment to T346165: clouddumps100[12] puppet alert: "Puppet performing a change on every puppet run".

Ok, to resolve I'm going to erase this dvd.html file from all the dumpsdata hosts as per docs:

Tue, Sep 19, 10:44 AM · cloud-services-team, Dumps-Generation, Data-Platform-SRE
Milimetric moved T346279: [Spike] Figure out what are good indicators for dumps data quality from Sprint Backlog to In Process on the Data Products (Sprint 01) board.
Tue, Sep 19, 10:06 AM · Dumps 2.0, Data Products (Sprint 01)
Milimetric assigned T346279: [Spike] Figure out what are good indicators for dumps data quality to JEbe-WMF.
Tue, Sep 19, 10:06 AM · Dumps 2.0, Data Products (Sprint 01)
Milimetric edited projects for T346279: [Spike] Figure out what are good indicators for dumps data quality, added: Data Products (Sprint 01); removed Data Products.
Tue, Sep 19, 10:05 AM · Dumps 2.0, Data Products (Sprint 01)
Milimetric added a comment to T346646: Move wmf_dumps.wikitext_rc1 to the correct HDFS directory.

Moving is fine; let's not make a new RC until we have a new schema.

Tue, Sep 19, 8:47 AM · Data Products (Sprint 01), Dumps 2.0
Milimetric added a comment to T346281: Figure out if an intermediary table while backfilling is beneficial.

Hmmm... no my prompt for that would be something more like "in the theme of Tron crossed with Lawnmower man but replacing any sadness or darkness with joy and dance"

Tue, Sep 19, 8:45 AM · Data Products (Sprint 01)

Mon, Sep 18

Milimetric moved T342213: Route to new AQS Knowledge Gaps endpoint from In Process to Ready for Code Review on the Data Products (Sprint 01) board.

@hnowlan: TL;DR: do you see the cache-control that AQS is already setting, and do we need an ETag header or is that just a nice-to-have?

Mon, Sep 18, 8:46 PM · Data Products (Sprint 01), serviceops, Patch-For-Review, Code-Health-Objective, RESTBase Sunsetting
Milimetric created T346679: Surface Temporary user information to Cloud Wiki Replicas.
Mon, Sep 18, 6:40 PM · Data-Engineering, cloud-services-team, Data-Services
Milimetric moved T342213: Route to new AQS Knowledge Gaps endpoint from Sprint Backlog to In Process on the Data Products (Sprint 01) board.
Mon, Sep 18, 4:01 PM · Data Products (Sprint 01), serviceops, Patch-For-Review, Code-Health-Objective, RESTBase Sunsetting
Milimetric claimed T342213: Route to new AQS Knowledge Gaps endpoint.
Mon, Sep 18, 4:01 PM · Data Products (Sprint 01), serviceops, Patch-For-Review, Code-Health-Objective, RESTBase Sunsetting
Milimetric moved T345208: [Spike] Identify and mitigate risks associated with MediaWiki History pipeline from BLOCKED to In Process on the Data Products (Sprint 01) board.
Mon, Sep 18, 3:42 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team
Milimetric moved T344572: Add punjabi language in Wikistats from Incoming to Data Products & Metrics on the Data-Engineering board.

This is now done

Mon, Sep 18, 3:00 PM · Data-Engineering, Data-Engineering-Wikistats
Milimetric awarded T266641: Test Alluxio as cache layer for Presto a Party Time token.
Mon, Sep 18, 2:45 PM · Data-Platform-SRE, Data-Engineering
Milimetric added a comment to T342588: Requesting access to analytics-privatedata-users for Nat Hillard.

(sorry this slipped through)

Mon, Sep 18, 2:42 PM · SRE, SRE-Access-Requests
Milimetric awarded T250065: Store tabular data in a format that's machine-readable and can be shared between wikis a Love token.
Mon, Sep 18, 2:30 PM · WMF-General-or-Unknown, Crosswiki, covid-19

Fri, Sep 15

Milimetric awarded T343320: Request Access to Superset querying presto_analytics_hive datasets a Like token.
Fri, Sep 15, 8:04 PM · Product-Analytics, SRE-Access-Requests, SRE, CommRel-Specialists-Support (Jul-Sep-2023)
Milimetric added a comment to T346281: Figure out if an intermediary table while backfilling is beneficial.

I know they're just computers and they don't have feelings and stuff, but something about this makes me so happy, just picturing free RAM and CPU resources frolicking in the YARN clouds...

Fri, Sep 15, 8:00 PM · Data Products (Sprint 01)
Milimetric added a comment to T345877: Requesting shell access, deployment and analytics-privatedata-users rights for acooper.

approved!

Fri, Sep 15, 7:37 PM · SRE-Access-Requests, SRE
Milimetric added a comment to T336715: Investigate relation of UA deprecation to increase in automated traffic and reduction in unique devices.
  1. we need to find a way to tag the prefetch proxy traffic. Ideally in webrequest and other derived pageview tables for easy analysis.
    • this is possible using the 'Sec-Purpose: prefetch; anonymous-client-ip' request header.
    • note that we would not want to change any of our existing dimensions (like agent_type) to indicate prefetch pageviews, since this would break our reporting and has consequences for Superset dashboards. Instead, find a way to store this in an existing field or create a new field.
Fri, Sep 15, 7:33 PM · Movement-Insights, Research-Freezer, Data-Engineering, Product-Analytics
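
The tagging idea in the list above could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names and the is_prefetch field are hypothetical, the request headers are assumed to be available as a plain mapping, and nothing here reflects how webrequest or the pageview tables are actually built. The only detail taken from the comment is the 'Sec-Purpose: prefetch; anonymous-client-ip' header, and the existing agent_type dimension is deliberately left untouched.

```python
from typing import Mapping


def is_prefetch_proxy_request(headers: Mapping[str, str]) -> bool:
    """Return True when Sec-Purpose marks a prefetch made through the anonymizing proxy."""
    value = ""
    for name, header_value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "sec-purpose":
            value = header_value
            break
    # Split "prefetch; anonymous-client-ip" into its token and parameters.
    parts = [part.strip().lower() for part in value.split(";")]
    return parts[0] == "prefetch" and "anonymous-client-ip" in parts[1:]


def tag_request(record: dict, headers: Mapping[str, str]) -> dict:
    """Copy the record and add a hypothetical is_prefetch field; agent_type is left as-is."""
    tagged = dict(record)
    tagged["is_prefetch"] = is_prefetch_proxy_request(headers)
    return tagged
```

Storing the flag in its own field means existing reports that group by agent_type keep working, while new analyses can filter on the extra field.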
Milimetric added a project to T336544: Codex, Graph, and Wikistats walk into a bar graph: Data Products.

When this gets prioritized, we can make it into a proper epic and break it down, but if anybody else wants to take any piece of it, please don't let this stop you.

Fri, Sep 15, 3:19 PM · Data Products, Data-Engineering

Wed, Sep 13

Milimetric moved T345208: [Spike] Identify and mitigate risks associated with MediaWiki History pipeline from Sprint Backlog to In Process on the Data Products (Sprint 01) board.
Wed, Sep 13, 2:23 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team
Milimetric moved T344690: [Spike] Quantify pages and revisions as relevant to dumps from Sprint Backlog to Ready for Code Review on the Data Products (Sprint 01) board.
Wed, Sep 13, 2:23 PM · Data Products (Sprint 01), Data Pipelines (Sprint 14)
Milimetric moved T344919: [SPIKE] Model Impact of IP masking on datasets from Sprint Backlog to Ready for Code Review on the Data Products (Sprint 01) board.
Wed, Sep 13, 2:23 PM · Data Products (Sprint 01)
Milimetric edited projects for T344690: [Spike] Quantify pages and revisions as relevant to dumps, added: Data Products (Sprint 01); removed Data Products (Sprint 00).
Wed, Sep 13, 2:23 PM · Data Products (Sprint 01), Data Pipelines (Sprint 14)
Milimetric edited projects for T344919: [SPIKE] Model Impact of IP masking on datasets, added: Data Products (Sprint 01); removed Data Products (Sprint 00).
Wed, Sep 13, 2:23 PM · Data Products (Sprint 01)
Milimetric edited projects for T345208: [Spike] Identify and mitigate risks associated with MediaWiki History pipeline, added: Data Products (Sprint 01); removed Data Products (Sprint 00).
Wed, Sep 13, 2:22 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team
Milimetric added a comment to T336084: [SPIKE] Model impact of User-Agent deprecation on top line metrics.

It makes sense, @mforns, it's just a little strange. I would've expected the minor versions to be in the first 200 characters of the strings, and indeed we saw a drop in the UA string entropy, so it's strange that this doesn't affect everything else. But I believe you and I'll just file that away with other life curiosities :)

Wed, Sep 13, 10:00 AM · Data Products (Sprint 01), Data Pipelines (Sprint 14), Google-Chrome-User-Agent-Deprecation, Product-Analytics (Kanban), Data-Engineering

Tue, Sep 12

Milimetric added a comment to T336573: PHP Warning: XMLReader::read(): Memory allocation failed : growing input buffer.

quick recap of cleanup:

Tue, Sep 12, 6:05 PM · Unstewarded-production-error, Dumps-Generation, Wikimedia-production-error

Mon, Sep 11

Milimetric added a comment to T336573: PHP Warning: XMLReader::read(): Memory allocation failed : growing input buffer.

running a manual version on a screen, as the dumpsgen user: 14381.pts-0.snapshot1009

Mon, Sep 11, 3:29 PM · Unstewarded-production-error, Dumps-Generation, Wikimedia-production-error
Milimetric added a comment to T345874: XMLDumps broken on deployment-mwmaint02 due to Jade Extension related content.

@Ladsgroup: would you prefer that to the settings change? I'm happy to delete, but as I understood it, the maintenance delete scripts won't work without a content handler. So I guess I could update all the content models to json and then delete?

Mon, Sep 11, 2:21 PM · Dumps-Generation, MediaWiki-ContentHandler, Beta-Cluster-Infrastructure

Fri, Sep 8

Milimetric added a comment to T344691: [Spike] Understand how "large" pages (with lots of revisions) are problematic when writing XML to Hadoop.

I spoke to Antoine, and it turns out this was not really the biggest issue; some Spark tuning shrugged off the problem. There are lots of other super interesting details in the XML publishing machinery that's built as part of T335862: Implement job to generate Dump XML files. See code here: https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/938941/

Fri, Sep 8, 8:38 PM · Data Products (Sprint 00), Data Pipelines (Sprint 14)
Milimetric moved T344691: [Spike] Understand how "large" pages (with lots of revisions) are problematic when writing XML to Hadoop from Sprint Backlog to Done on the Data Products (Sprint 00) board.
Fri, Sep 8, 8:35 PM · Data Products (Sprint 00), Data Pipelines (Sprint 14)
Milimetric moved T344919: [SPIKE] Model Impact of IP masking on datasets from In Process to Ready for Code Review on the Data Products (Sprint 00) board.
Fri, Sep 8, 8:35 PM · Data Products (Sprint 01)
Milimetric moved T344690: [Spike] Quantify pages and revisions as relevant to dumps from In Process to Ready for Code Review on the Data Products (Sprint 00) board.
Fri, Sep 8, 8:34 PM · Data Products (Sprint 01), Data Pipelines (Sprint 14)
Milimetric added a comment to T344690: [Spike] Quantify pages and revisions as relevant to dumps.

The above gives us 524,283,851 pages across all projects and namespaces to play with.

Fri, Sep 8, 8:34 PM · Data Products (Sprint 01), Data Pipelines (Sprint 14)
Milimetric added a project to T345950: Update Webrequest docs or job: Data Products.
Fri, Sep 8, 7:38 PM · Data Products
Milimetric created T345950: Update Webrequest docs or job.
Fri, Sep 8, 7:38 PM · Data Products

Thu, Sep 7

Milimetric moved T335862: Implement job to generate Dump XML files from In Process to In code review / Tech Input on the Data Products (Sprint 00) board.
Thu, Sep 7, 1:20 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team (Sprint 2), Patch-For-Review, Data Pipelines (Sprint 14)
Milimetric moved T335862: Implement job to generate Dump XML files from In Progress to In Review on the Data Pipelines (Sprint 14) board.
Thu, Sep 7, 1:20 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team (Sprint 2), Patch-For-Review, Data Pipelines (Sprint 14)
Milimetric added a comment to T345633: Requesting access to analytics-privatedata-users and ops for brouberol.

Approved! I think maybe you also need analytics-admins, as per the data access docs.

Thu, Sep 7, 1:12 PM · SRE, SRE-Access-Requests

Wed, Sep 6

Milimetric added a subtask for T299947: Normalize pagelinks table: T345771: Adapt Sqoop to pagelinks schema change.
Wed, Sep 6, 7:17 PM · Platform Engineering, MediaWiki-Page-derived-data
Milimetric added a parent task for T345771: Adapt Sqoop to pagelinks schema change: T299947: Normalize pagelinks table.
Wed, Sep 6, 7:17 PM · Data Products
Milimetric created T345771: Adapt Sqoop to pagelinks schema change.
Wed, Sep 6, 7:17 PM · Data Products
Sj awarded T336544: Codex, Graph, and Wikistats walk into a bar graph a Love token.
Wed, Sep 6, 12:29 PM · Data Products, Data-Engineering

Tue, Sep 5

Milimetric moved T344690: [Spike] Quantify pages and revisions as relevant to dumps from Sprint Backlog to In Process on the Data Products (Sprint 00) board.
Tue, Sep 5, 12:19 PM · Data Products (Sprint 01), Data Pipelines (Sprint 14)

Fri, Sep 1

Milimetric added a comment to T344690: [Spike] Quantify pages and revisions as relevant to dumps.

First, some setup.

Fri, Sep 1, 3:07 PM · Data Products (Sprint 01), Data Pipelines (Sprint 14)
Milimetric moved T344690: [Spike] Quantify pages and revisions as relevant to dumps from Next Up to In Progress on the Data Pipelines (Sprint 14) board.
Fri, Sep 1, 10:11 AM · Data Products (Sprint 01), Data Pipelines (Sprint 14)

Thu, Aug 31

Milimetric added projects to T345385: Epic: Quality of new Dumps 2.0 output: Data Products, Data Engineering and Event Platform Team.
Thu, Aug 31, 5:35 PM · Dumps 2.0, Data Engineering and Event Platform Team, Data Products
Milimetric created T345385: Epic: Quality of new Dumps 2.0 output.
Thu, Aug 31, 5:34 PM · Dumps 2.0, Data Engineering and Event Platform Team, Data Products
Milimetric moved T336400: AQS 2.0: Geo Analytics Service Deploy to Staging and production from Ready for Testing to Sign Off on the Data Products (Sprint 00) board.
Thu, Aug 31, 12:09 PM · Data Products (Sprint 00), AQS2.0, Patch-For-Review
Milimetric moved T336400: AQS 2.0: Geo Analytics Service Deploy to Staging and production from In Process to Ready for Testing on the Data Products (Sprint 00) board.
Thu, Aug 31, 12:08 PM · Data Products (Sprint 00), AQS2.0, Patch-For-Review
Milimetric moved T336380: AQS 2.0: Media Analytics Service - Deploy to staging and production from In Process to Ready for Testing on the Data Products (Sprint 00) board.
Thu, Aug 31, 12:07 PM · Patch-For-Review, Data Products (Sprint 01)
Milimetric moved T330355: Incorporate librarized Metrics Platform Java client into the Android app from Sign Off to Done on the Data Products (Sprint 00) board.
Thu, Aug 31, 12:04 PM · Data Products (Sprint 00), Patch-For-Review, Metrics Platform Backlog (Metrics Platform Kanban)

Wed, Aug 30

Milimetric added a comment to T345183: Implement a mechanism for figuring out suppressed data on backfills.

Parking some links that will be useful to this work:

Wed, Aug 30, 7:49 PM · Data Products (Sprint 01), Dumps 2.0
Milimetric claimed T345208: [Spike] Identify and mitigate risks associated with MediaWiki History pipeline.
Wed, Aug 30, 7:40 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team

Tue, Aug 29

Milimetric created T345208: [Spike] Identify and mitigate risks associated with MediaWiki History pipeline.
Tue, Aug 29, 8:06 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team
Milimetric updated the task description for T344919: [SPIKE] Model Impact of IP masking on datasets.
Tue, Aug 29, 7:54 PM · Data Products (Sprint 01)

Mon, Aug 28

Milimetric set the point value for T344919: [SPIKE] Model Impact of IP masking on datasets to 3.
Mon, Aug 28, 2:11 PM · Data Products (Sprint 01)
Milimetric moved T335862: Implement job to generate Dump XML files from Sprint Backlog to In Process on the Data Products (Sprint 00) board.
Mon, Aug 28, 1:17 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team (Sprint 2), Patch-For-Review, Data Pipelines (Sprint 14)
Milimetric reassigned T336084: [SPIKE] Model impact of User-Agent deprecation on top line metrics from Milimetric to mforns.
Mon, Aug 28, 1:15 PM · Data Products (Sprint 01), Data Pipelines (Sprint 14), Google-Chrome-User-Agent-Deprecation, Product-Analytics (Kanban), Data-Engineering
Milimetric assigned T336411: AQS 2.0: Geo Analytics service - configure routing in staging and production to hnowlan.
Mon, Aug 28, 1:11 PM · Data Products (Sprint 01)
Milimetric moved T330355: Incorporate librarized Metrics Platform Java client into the Android app from Paused to BLOCKED on the Data Products (Sprint 00) board.
Mon, Aug 28, 1:10 PM · Data Products (Sprint 00), Patch-For-Review, Metrics Platform Backlog (Metrics Platform Kanban)
Milimetric moved T340702: [SPIKE] Experiment with flattening custom data object from In code review / Tech Input to Sign Off on the Data Products (Sprint 00) board.
Mon, Aug 28, 1:07 PM · Data Products (Sprint 00), Product-Analytics (Kanban), Metrics Platform Backlog (Metrics Platform Kanban)
Milimetric moved T336377: AQS 2.0: Media Analytics Service - Deployment pipeline integration from Sign Off to Done on the Data Products (Sprint 00) board.
Mon, Aug 28, 1:05 PM · Data Products (Sprint 00), AQS2.0
Milimetric moved T336393: AQS 2.0: Geo Analytics Service deployment pipeline integration from Sign Off to Done on the Data Products (Sprint 00) board.
Mon, Aug 28, 1:04 PM · Data Products (Sprint 00), AQS2.0

Aug 22 2023

Milimetric moved T335862: Implement job to generate Dump XML files from Sprint 0 to Sprint 00 on the Data Products board.
Aug 22 2023, 2:18 PM · Data Products (Sprint 01), Data Engineering and Event Platform Team (Sprint 2), Patch-For-Review, Data Pipelines (Sprint 14)
Milimetric moved T340861: Implement a backfill job for the dumps hourly table from Ready for Code Review/ Ready for Tech input to Blocked/Paused on the Data Products (Sprint 0) board.
Aug 22 2023, 1:42 PM · Data Products (Sprint 01)
Milimetric moved T340861: Implement a backfill job for the dumps hourly table from Blocked/Paused to Ready for Code Review/ Ready for Tech input on the Data Products (Sprint 0) board.
Aug 22 2023, 1:42 PM · Data Products (Sprint 01)
Milimetric moved T336400: AQS 2.0: Geo Analytics Service Deploy to Staging and production from Blocked/Paused to In Progress on the AQS2.0 (Sprint 10) board.
Aug 22 2023, 1:38 PM · Data Products (Sprint 00), AQS2.0, Patch-For-Review
Milimetric moved T336400: AQS 2.0: Geo Analytics Service Deploy to Staging and production from In Progress to Blocked/Paused on the AQS2.0 (Sprint 10) board.
Aug 22 2023, 1:38 PM · Data Products (Sprint 00), AQS2.0, Patch-For-Review
Milimetric moved T343325: Develop Dumps Triage Runbook from In code Review/ Tech Input to Sign off on the Data Products (Sprint 0) board.
Aug 22 2023, 1:29 PM · Data Products (Sprint 00), Dumps 2.0, Data-Engineering
Milimetric created T344693: Understand Hadoop OutputFormat and how to solve the problem.
Aug 22 2023, 1:17 PM · Dumps 2.0, Data Products, Data Pipelines (Sprint 14)
Milimetric created T344691: [Spike] Understand how "large" pages (with lots of revisions) are problematic when writing XML to Hadoop.
Aug 22 2023, 1:15 PM · Data Products (Sprint 00), Data Pipelines (Sprint 14)
Milimetric created T344690: [Spike] Quantify pages and revisions as relevant to dumps.
Aug 22 2023, 1:13 PM · Data Products (Sprint 01), Data Pipelines (Sprint 14)

Aug 17 2023

Milimetric assigned T336714: Design and implement tables to store parsed content from mediawiki.page_content_change to xcollazo.
Aug 17 2023, 5:02 PM · Data Products (Sprint 00)

Aug 8 2023

Milimetric created T343793: Mediarequests top articles: should use a disallow filter just like top articles.
Aug 8 2023, 11:33 AM · Data-Engineering

Aug 7 2023

Milimetric added a comment to T342593: Five deleted Wikidata items pertaining to Wikimedia category pages still present in the Query Service.

Just a random drive-by note, since I'm not the one playing with this, but it might be interesting to instrument EventBus a little bit. For example, from the deferred job that publishes to Kafka, we could log a basic key for each event that we publish. It should be possible to aggregate these logs and compare them against what we see in Kafka to figure out what we missed, perhaps even facilitate retries.

Aug 7 2023, 5:50 PM · Event-Platform, Data-Engineering, Data Engineering and Event Platform Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
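
The reconciliation step suggested in the comment above (compare the keys logged at publish time against what shows up in Kafka) can be sketched as a multiset difference. This is a minimal illustration, not EventBus code: the function name and the key format are hypothetical, and both inputs are assumed to have already been collected into plain lists of strings.

```python
from collections import Counter
from typing import Iterable


def find_missing_events(logged_keys: Iterable[str], kafka_keys: Iterable[str]) -> list[str]:
    """Return keys that were logged at publish time but not seen in Kafka.

    Counts are compared rather than sets, so a key published twice but
    seen only once is still reported once as missing.
    """
    logged = Counter(logged_keys)
    seen = Counter(kafka_keys)
    # Counter subtraction keeps only keys with a positive remaining count.
    missing = logged - seen
    return sorted(missing.elements())


# Hypothetical keys, for illustration only.
logged = ["wikidatawiki:Q1:delete", "wikidatawiki:Q2:delete", "wikidatawiki:Q2:delete"]
in_kafka = ["wikidatawiki:Q1:delete", "wikidatawiki:Q2:delete"]
candidates_for_retry = find_missing_events(logged, in_kafka)
```

The resulting list is what a retry mechanism would consume; how the keys are aggregated from logs and read back from Kafka is outside this sketch.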

Aug 5 2023

Milimetric added a comment to T343556: Improve "Wikistats: Pageview complete dumps" readme page.

Thank you for filing this, @VeniVidiVicipedia! We're going through a reorg, so things are in a bit of a messy state right now. Bear with us as we triage.

Aug 5 2023, 4:43 PM · Data Products, Datasets-General-or-Unknown
Milimetric added a project to T343556: Improve "Wikistats: Pageview complete dumps" readme page: Data Products.
Aug 5 2023, 4:41 PM · Data Products, Datasets-General-or-Unknown

Aug 1 2023

Milimetric awarded T340702: [SPIKE] Experiment with flattening custom data object a Barnstar token.
Aug 1 2023, 7:25 PM · Data Products (Sprint 00), Product-Analytics (Kanban), Metrics Platform Backlog (Metrics Platform Kanban)
Milimetric added a comment to T341134: Investigate drift between `dt` and `meta.dt`.

My ramblings that got me to the null edits, might be useful for someone else testing that there is no other source of unexpected drift:

Aug 1 2023, 1:39 PM · MW-1.41-notes (1.41.0-wmf.20; 2023-08-01), Data Products (Sprint 0), Patch-For-Review

Jul 27 2023

Milimetric created T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow.
Jul 27 2023, 5:10 PM · Data-Engineering

Jul 25 2023

Milimetric moved T341134: Investigate drift between `dt` and `meta.dt` from Incoming to Sprint 0 on the Data Products board.
Jul 25 2023, 1:43 PM · MW-1.41-notes (1.41.0-wmf.20; 2023-08-01), Data Products (Sprint 0), Patch-For-Review