Milimetric (Dan Andreescu)
Staff Engineer (Data Engineering)

User Details

User Since
Oct 8 2014, 5:48 PM (492 w, 5 d)
Availability
Available
IRC Nick
Milimetric
LDAP User
Milimetric
MediaWiki User
Milimetric (WMF) [ Global Accounts ]

Recent Activity

Sun, Mar 17

rokejulianlockhart awarded T249419: RFC: Render data visualizations on the server a Like token.
Sun, Mar 17, 7:02 PM · Wikimedia-Performance-recommendation, JavaScript, MediaWiki-extensions-Graph, covid-19, TechCom-RFC

Feb 5 2024

Mayakp.wiki awarded T333223: Adding user_is_temp to the user table a Barnstar token.
Feb 5 2024, 7:50 PM · MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), MW-1.41-notes (1.41.0-wmf.10; 2023-05-23), Anti-Harassment, Data-Persistence, Data-Engineering, Temporary accounts

Jan 9 2024

Milimetric set the point value for T353296: Netherlands appears twice as "The Netherlands" or "Netherlands" in country coded data to 3.
Jan 9 2024, 6:14 PM · Movement-Insights, Data Products (Data Products Sprint 05), Data-Platform
Milimetric set the point value for T352793: MediaWiki History Plan: Maintenance Plan to 2.
Jan 9 2024, 6:13 PM · Data Products (Data Products Sprint 05)
Milimetric set the point value for T352790: MediaWiki History Plan: use cases and potential work to 2.
Jan 9 2024, 6:13 PM · Data Products (Data Products Sprint 05)
Milimetric assigned T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow to mforns.
Jan 9 2024, 1:20 PM · Data Products (Data Products Sprint 11), Movement-Metrics, Movement-Insights
Milimetric moved T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow from Sprint Backlog to In Process on the Data Products (Data Products Sprint 05) board.
Jan 9 2024, 1:20 PM · Data Products (Data Products Sprint 11), Movement-Metrics, Movement-Insights
Milimetric added a project to T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow: Data Products (Data Products Sprint 05).
Jan 9 2024, 1:19 PM · Data Products (Data Products Sprint 11), Movement-Metrics, Movement-Insights

Jan 8 2024

Milimetric updated subscribers of T342911: Data Quality Issue: Wikitext History Job fail / rerun in Airflow.

@VirginiaPoundstone this issue came up again (thanks very much to @xcollazo who remembered this task). I support option b) in Xabriel's plan above, and I think this should be triaged with high importance as a production issue. This table is used by lots of people and it seems to me it'll keep failing. If the folks looking into it don't remember this, it's a lot of time wasted.

Jan 8 2024, 9:02 PM · Data Products (Data Products Sprint 11), Movement-Metrics, Movement-Insights
Milimetric added a comment to T353956: Traffic anomaly detection triggers alerts because of a MaxMind Country rename.

Quick mention of this other task where some of the work took place: T353296. Relevant to this, the gerrit change https://gerrit.wikimedia.org/r/c/analytics/refinery/+/982899 included updates to the following pipelines/datasets:

Jan 8 2024, 5:39 PM · Data Products (Data Products Sprint 05)

Jan 4 2024

Milimetric added a comment to T354074: Wikistats - incorrect number of content articles for Latvian Wikipedia.

TL;DR: the data pipeline up to AQS seems fine. My guess is that we're not filtering properly to exclude redirects in AQS 2; the timeline corresponds with the reported problem. Sorry for the inconvenience, working on a fix.

Jan 4 2024, 9:12 PM · Data Products (Data Products Sprint 07), Data-Engineering, Analytics, Data-Engineering-Wikistats
Milimetric added a comment to T346463: Identify and label prefetch proxy data in our traffic.

@Mayakp.wiki the patch to watch is: https://gerrit.wikimedia.org/r/c/operations/puppet/+/981352/. This has not yet been merged and deployed. When it is, you'll start seeing the changes in x_analytics.

Jan 4 2024, 2:57 PM · Traffic, Movement-Insights, Data-Engineering
Milimetric added a comment to T307040: Propagate field descriptions from event schemas to Hive event tables.

Datahub allows you to add descriptions at sub-field level. We should at some point get to consensus about where we want all this description stuff to live. We talked about:

Jan 4 2024, 2:46 PM · Patch-For-Review, Product-Analytics, Data-Engineering

Dec 22 2023

xcollazo awarded T352793: MediaWiki History Plan: Maintenance Plan a Pterodactyl token.
Dec 22 2023, 7:28 PM · Data Products (Data Products Sprint 05)

Dec 12 2023

Milimetric updated subscribers of T312566: Emit lineage information about Airflow jobs to DataHub.

Quick recap for anyone looking to implement lineage. First, a note regarding lineage as part of centralized configuration. I think this would be very useful, and I'm in no way suggesting that we slow down on the work that @JAllemandou and @lbowmaker are leading on that front. The reality is that a centralized config may take a few more months to get implemented. In the meantime, we could instrument lineage in the airflow DAGs in a few minutes per DAG. Done in a standard way, this would be very easy to migrate to centralized config. In addition, as we implement this we may find exceptions and edge cases that would inform the centralized config. If anyone disagrees with anything here, you are very welcome, please don't take this as a "decision". Just a thought. If we agree with this and there's some slow-down to migrate back to the centralized config, I hereby promise that I'll do it myself on all DAGs.

Dec 12 2023, 8:06 PM · Data-Engineering, Data-Catalog
Milimetric added a comment to T351117: Move analytics log from Varnish to HAProxy.

Hi @Milimetric, sorry for the late reply. I'll try to answer your question, but keep in mind that we're still investigating all the pros and cons of this "migration", and we'll certainly share our thoughts and our action plan before moving on with this...

Dec 12 2023, 4:17 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Milimetric added a comment to T352793: MediaWiki History Plan: Maintenance Plan.

The following is a quick rundown of what I would think about if something goes wrong, and how I would check.

Dec 12 2023, 3:56 PM · Data Products (Data Products Sprint 05)

Dec 11 2023

Milimetric moved T352790: MediaWiki History Plan: use cases and potential work from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 05) board.

A full list of current use cases could only be compiled by reaching out to researchers who download this dataset. Limited to what we know, current use cases are roughly:

Dec 11 2023, 9:21 PM · Data Products (Data Products Sprint 05)
Milimetric added a comment to T352790: MediaWiki History Plan: use cases and potential work.

MediaWiki History is described in detail in the following places:

Dec 11 2023, 9:00 PM · Data Products (Data Products Sprint 05)
Milimetric moved T352793: MediaWiki History Plan: Maintenance Plan from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 05) board.
Dec 11 2023, 8:59 PM · Data Products (Data Products Sprint 05)
Milimetric added a comment to T352793: MediaWiki History Plan: Maintenance Plan.

The algorithm is explained at length starting here.

Dec 11 2023, 8:59 PM · Data Products (Data Products Sprint 05)
Milimetric added a comment to T352793: MediaWiki History Plan: Maintenance Plan.

A shortened and updated list of Changes and Known Problems.

Dec 11 2023, 8:56 PM · Data Products (Data Products Sprint 05)
Milimetric added a comment to T352793: MediaWiki History Plan: Maintenance Plan.

MediaWiki History is described in detail in the following places:

Dec 11 2023, 8:32 PM · Data Products (Data Products Sprint 05)
Milimetric claimed T352790: MediaWiki History Plan: use cases and potential work.
Dec 11 2023, 8:01 PM · Data Products (Data Products Sprint 05)
Milimetric claimed T352793: MediaWiki History Plan: Maintenance Plan.
Dec 11 2023, 8:01 PM · Data Products (Data Products Sprint 05)
Milimetric moved T352790: MediaWiki History Plan: use cases and potential work from Sprint Backlog to In Process on the Data Products (Data Products Sprint 05) board.
Dec 11 2023, 8:01 PM · Data Products (Data Products Sprint 05)
Milimetric moved T352793: MediaWiki History Plan: Maintenance Plan from Sprint Backlog to In Process on the Data Products (Data Products Sprint 05) board.
Dec 11 2023, 8:01 PM · Data Products (Data Products Sprint 05)
Milimetric added a comment to T353134: Search dag image_suggestions_weekly failed waiting for analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2023-11-27.

wmf_raw.mediawiki_pagelinks and wmf_raw.mediawiki_page_props are available with snapshot 2023-11

Dec 11 2023, 3:00 PM · Discovery-Search (Current work), Data-Engineering, Structured-Data-Backlog, Image-Suggestions, CirrusSearch

Dec 8 2023

Milimetric updated subscribers of T333716: "Active editors by country" doesn't display numbers for Belarus, Kazakhstan, Russia.

I agree, @stjn, hopefully that's not as hyper-urgent and maybe @VirginiaPoundstone + @lbowmaker can triage.

Dec 8 2023, 7:10 PM · Russian-Sites, Data-Engineering, Data-Engineering-Wikistats

Dec 7 2023

Milimetric added a comment to T339318: Indicate that some country data are unavailable on Wikistats.

I'm really sorry this didn't get through the pipeline sooner, someone only told me about the issue last week. Had I known sooner I would have made the fix sooner. We are going to bring this up in our retro.

Dec 7 2023, 2:49 PM · Trust-and-Safety, Russian-Sites, Data-Engineering-Wikistats, Data-Engineering
Milimetric added a comment to T333716: "Active editors by country" doesn't display numbers for Belarus, Kazakhstan, Russia.

@Milimetric: this is great, but I think it should also be indicated under the map that some countries do not have any results, so people can see this more easily. For example, page view stats have this at the bottom: Those countries with less than 100 views are not reported and are blank in the map. Seems like the absence of data for privacy reasons is good to report there as well. Can you also add that?

Dec 7 2023, 2:47 PM · Russian-Sites, Data-Engineering, Data-Engineering-Wikistats

Dec 6 2023

Milimetric added a comment to T333716: "Active editors by country" doesn't display numbers for Belarus, Kazakhstan, Russia.

The above patches do what I suggested in a comment on the talk page: https://meta.wikimedia.org/wiki/Talk:Requests_for_comment/Hiding_the_number_of_Russian/Belorussian/Kazakh_contributors_on_the_statistics_map which is to gray out the countries currently on the protection list and explain that the data is hidden. If and when the country list changes, we should update this or make it more reactive to the data itself.

Dec 6 2023, 10:05 PM · Russian-Sites, Data-Engineering, Data-Engineering-Wikistats
Ladsgroup awarded T249419: RFC: Render data visualizations on the server a Love token.
Dec 6 2023, 7:45 PM · Wikimedia-Performance-recommendation, JavaScript, MediaWiki-extensions-Graph, covid-19, TechCom-RFC
Milimetric updated subscribers of T352879: Update the sqoop configuration for mediawiki to obtain linktarget from the production replicas, instead of wikireplicas.

Sqooping from the production replicas would mean applying the same sanitization rules on our side. I see the filter here is:

Dec 6 2023, 4:31 PM · Data-Platform-SRE, Data-Engineering
Milimetric added a comment to T346463: Identify and label prefetch proxy data in our traffic.

This is the Varnish code (VCL) that does analytics-y things to create and update the X-Analytics header. Adding stuff here would prevent us from having to change varnishkafka. Or maybe I misunderstood the whole thing, which is always possible in Varnish land :)

Dec 6 2023, 12:10 PM · Traffic, Movement-Insights, Data-Engineering

Dec 5 2023

Milimetric updated subscribers of T352650: Migrate current-generation dumps to run from our containerized images.

This sounds like it would work... but I do want to point out a potential maintenance issue:

Dec 5 2023, 5:07 PM · MW-on-K8s, Dumps-Generation, Release-Engineering-Team, serviceops
Milimetric created T352793: MediaWiki History Plan: Maintenance Plan.
Dec 5 2023, 4:54 PM · Data Products (Data Products Sprint 05)
Milimetric created T352790: MediaWiki History Plan: use cases and potential work.
Dec 5 2023, 4:49 PM · Data Products (Data Products Sprint 05)
Milimetric renamed T352787: [Sprint 05 GOAL] MediaWiki History Knowledge Hub from [User Story] <title> to [User Story] MediaWiki History Plan.
Dec 5 2023, 4:43 PM · Data Products (Data Products Sprint 05)
Milimetric created T352787: [Sprint 05 GOAL] MediaWiki History Knowledge Hub.
Dec 5 2023, 4:43 PM · Data Products (Data Products Sprint 05)

Nov 30 2023

Milimetric added a comment to T351909: Duplicate keys in x_analytics header corrupt some wmf_raw.webrequest rows and break refinement of wmf.webrequest.

Is it possible to have the monitoring log some information about the rows such that we can figure out where they're coming from?

Nov 30 2023, 4:36 PM · Data Products (Data Products Sprint 04), Data-Engineering
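
For the duplicate-key problem in T351909 above, a monitoring check could look roughly like the following. This is only a sketch, not the refinery's actual implementation: it assumes the header value is a semicolon-separated list of key=value pairs, and the function names and the row fields hostname and uri_host are hypothetical, chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def find_duplicate_keys(x_analytics: str) -> List[str]:
    """Return the keys that occur more than once in a 'k1=v1;k2=v2' string.

    Assumption: pairs are separated by ';' and keys from values by the first '='.
    """
    keys = [
        pair.split("=", 1)[0].strip()
        for pair in x_analytics.split(";")
        if pair.strip()  # ignore empty segments such as a trailing ';'
    ]
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


def describe_bad_row(row: Dict[str, str]) -> Optional[str]:
    """Build a log message for a row with duplicate keys, or None if it is clean.

    The 'hostname' and 'uri_host' fields are hypothetical examples of context
    that might help trace where the offending requests come from.
    """
    dupes = find_duplicate_keys(row.get("x_analytics", ""))
    if not dupes:
        return None
    return (
        f"duplicate x_analytics keys {dupes} "
        f"hostname={row.get('hostname')} uri_host={row.get('uri_host')}"
    )
```

Logging the message returned by describe_bad_row for each offending row, rather than only counting failures, is one way to keep enough context to trace the source later.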

Nov 29 2023

Milimetric moved T351229: [Spike] Onboard Dan to AQS 2.0 via Knowledge Gaps Endpoint from In Process to Sign Off on the Data Products (Data Products Sprint 04) board.
Nov 29 2023, 6:37 PM · Data Products (Data Products Sprint 04)
Milimetric updated the task description for T351229: [Spike] Onboard Dan to AQS 2.0 via Knowledge Gaps Endpoint.
Nov 29 2023, 6:37 PM · Data Products (Data Products Sprint 04)

Nov 28 2023

Milimetric added a comment to T169027: Provide iframe sandboxing for rich-media extensions (defense in depth).

I would like to emphatically support Timo in T169027#9362252 here. And just to re-state what I think is the most critical part of the argument:

Nov 28 2023, 7:23 PM · MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Patch-For-Review, Security, Technical-Debt, MediaWiki-File-management, Commons, Multimedia
Milimetric moved T351909: Duplicate keys in x_analytics header corrupt some wmf_raw.webrequest rows and break refinement of wmf.webrequest from In Process to Done on the Data Products (Data Products Sprint 04) board.

merged and deployed right now, used to fix another instance of the webrequest duplicate map key failures. Note for future selves: it would be good to figure out where these are coming from still.

Nov 28 2023, 6:30 PM · Data Products (Data Products Sprint 04), Data-Engineering

Nov 27 2023

Milimetric reassigned T347953: Spike : AQS 2.0 Versioning Options from Milimetric to SGupta-WMF.
Nov 27 2023, 5:12 PM · Data Products (Data Products Sprint 04), AQS2.0
Milimetric moved T347953: Spike : AQS 2.0 Versioning Options from Code Review / Tech Input to Sign Off on the Data Products (Data Products Sprint 04) board.
Nov 27 2023, 5:12 PM · Data Products (Data Products Sprint 04), AQS2.0
Milimetric added a comment to T351117: Move analytics log from Varnish to HAProxy.

Besides the great discussion above, I just want to point out some related things.

Nov 27 2023, 5:07 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Nov 20 2023

Milimetric added a comment to T347953: Spike : AQS 2.0 Versioning Options.

@SGupta-WMF may I please have permissions to the doc too? Will asked me to review.

Nov 20 2023, 5:33 PM · Data Products (Data Products Sprint 04), AQS2.0
Milimetric moved T347998: Commons Impact Metrics - Implement prototype from Done to Sign Off on the Data Products (Data Products Sprint 04) board.
Nov 20 2023, 5:12 PM · Data Products (Data Products Sprint 04)
Milimetric moved T347998: Commons Impact Metrics - Implement prototype from In Process to Done on the Data Products (Data Products Sprint 04) board.
Nov 20 2023, 5:12 PM · Data Products (Data Products Sprint 04)
Milimetric set the point value for T351195: WikimediaEvents: Remove partial migration of *UIActions instrument to 3.
Nov 20 2023, 5:10 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Technical-Debt, Data Products (Data Products Sprint 04), MediaWiki-extensions-WikimediaEvents, good first task
Milimetric moved T348571: Create tasks for remaining Dumps work with Product Manager from Paused to Sprint Backlog on the Data Products (Data Products Sprint 04) board.
Nov 20 2023, 5:07 PM · Data Products

Nov 16 2023

Milimetric added a comment to T351388: Add a spark global config for better file commit strategy.

+1 for leaving writing to Hive tables alone (and erring towards correctness and jobs failing and hopefully comments that we can find)
+1 to instead focusing on the Iceberg migration

Nov 16 2023, 9:58 PM · Data-Engineering (Sprint 5), Data-Platform-SRE
Milimetric added a comment to T342487: [Event Platform] Actor performing suppression revealed publicly.

My apologies for the late review, +1 to Scott's point of resolving this and making it public.

Nov 16 2023, 5:31 PM · Data-Engineering (Sprint 6), MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), SecTeam-Processed, Privacy Engineering, Event-Platform, Vuln-Infoleak, Security
Milimetric moved T345874: XMLDumps broken on deployment-mwmaint02 due to Jade Extension related content from Backlog to Done on the Dumps-Generation board.
Nov 16 2023, 3:31 PM · Dumps-Generation, MediaWiki-ContentHandler, Beta-Cluster-Infrastructure

Nov 14 2023

Milimetric claimed T351229: [Spike] Onboard Dan to AQS 2.0 via Knowledge Gaps Endpoint.
Nov 14 2023, 3:18 PM · Data Products (Data Products Sprint 04)
Milimetric created T351229: [Spike] Onboard Dan to AQS 2.0 via Knowledge Gaps Endpoint.
Nov 14 2023, 3:18 PM · Data Products (Data Products Sprint 04)

Nov 13 2023

Milimetric moved T349416: Synthesize results for Product Analytics' review of Metrics Platform event types from Sign Off to Done on the Data Products (Data Products (Sprint 03)) board.
Nov 13 2023, 5:06 PM · Data Products (Data Products (Sprint 03))
Milimetric moved T348731: Follow up on remaining requests to pageviews endpoints from Sign Off to Done on the Data Products (Data Products (Sprint 03)) board.
Nov 13 2023, 5:06 PM · RESTBase Sunsetting, Wikifeeds, Content-Transform-Team-WIP, Data Products (Data Products (Sprint 03))

Nov 9 2023

Milimetric edited P53125 category tree parsing.
Nov 9 2023, 10:22 PM
Milimetric moved T350898: Failure on enwiki and ukwikinews from Active to Done on the Dumps-Generation board.

Since the dumps for enwiki and ukwikinews are both complete now, I looked at the snapshot hosts 101[0123]. I see that the code that seems to be failing in the stack trace has been updated to -wmf.4 (the stack traces are from -wmf.2 and -wmf.3 respectively). So this seems like it was fixed by someone else, deployed, and the snapshot hosts resumed their work.

Nov 9 2023, 7:00 PM · Dumps-Generation
Milimetric added a comment to T350898: Failure on enwiki and ukwikinews.

Full output from email:

Nov 9 2023, 5:16 PM · Dumps-Generation
Milimetric moved T350898: Failure on enwiki and ukwikinews from Backlog to Active on the Dumps-Generation board.
Nov 9 2023, 5:14 PM · Dumps-Generation
Milimetric created T350898: Failure on enwiki and ukwikinews.
Nov 9 2023, 5:14 PM · Dumps-Generation
Milimetric moved T350309: latest-all.json.bz2 does not contain a record for Charlies Bunion (Q5085764) from Backlog to Other teams on the Dumps-Generation board.

Indeed, there are quite a few differences between the pipelines. When the Wikidata folks look at this, do ping us as we have been working on a new dumps process and migrating other dumps to our Airflow scheduler. cc @VirginiaPoundstone

Nov 9 2023, 5:13 PM · wmde-wikidata-tech, Dumps-Generation, Wikidata
Milimetric moved T343556: Improve "Wikistats: Pageview complete dumps" readme page from Backlog to Up Next on the Dumps-Generation board.
Nov 9 2023, 5:08 PM · Dumps-Generation, Data Products, Datasets-General-or-Unknown
Milimetric moved T343629: Provide a regular JSON dump of all objects in Wikifunctions from Backlog to Other teams on the Dumps-Generation board.

When you all would like to start this work, let's talk. We would love to move this kind of dump to an Airflow pipeline for ease of maintenance.

Nov 9 2023, 5:07 PM · Dumps-Generation, Abstract Wikipedia team, Wikifunctions
Milimetric added a comment to T314541: Refactor active editors ETL.

Try and combine that into one.

Nov 9 2023, 4:27 PM · Movement-Metrics, Movement-Insights

Nov 7 2023

Milimetric edited P53125 category tree parsing.
Nov 7 2023, 10:20 PM
xcollazo awarded P53125 category tree parsing a Party Time token.
Nov 7 2023, 2:33 PM
Milimetric edited P53125 category tree parsing.
Nov 7 2023, 1:24 PM
JAllemandou awarded T350617: Use hive metastore when registering views a 100 token.
Nov 7 2023, 10:50 AM · Data-Engineering

Nov 6 2023

Milimetric created T350617: Use hive metastore when registering views.
Nov 6 2023, 6:58 PM · Data-Engineering

Nov 4 2023

Milimetric moved T350097: Find a way to extract data from the category tree from Ready for Code Review to In Testing on the Data Products (Data Products (Sprint 03)) board.

This has been used over the last few days to generate trees and it seems to be working well so far. We have some sample data and can use the logic to output a new set once Fiona and Virginia decide on it. Code is at https://phabricator.wikimedia.org/P53125

Nov 4 2023, 2:19 AM · Data Products (Data Products (Sprint 03))
Milimetric edited P53125 category tree parsing.
Nov 4 2023, 2:17 AM

Nov 3 2023

Milimetric edited P53125 category tree parsing.
Nov 3 2023, 8:01 PM
Milimetric edited P53125 category tree parsing.
Nov 3 2023, 4:07 PM
Milimetric edited P53125 category tree parsing.
Nov 3 2023, 4:07 PM

Nov 2 2023

Milimetric edited P53125 category tree parsing.
Nov 2 2023, 4:44 PM
Milimetric edited P53125 category tree parsing.
Nov 2 2023, 11:51 AM
Milimetric created P53125 category tree parsing.
Nov 2 2023, 12:12 AM

Oct 31 2023

Milimetric added a comment to T350097: Find a way to extract data from the category tree.

New logic includes vertexType and writes to milimetric.sample_category_graph (it's writing right now). See updated spark for coordinating the rest of the work:

Oct 31 2023, 7:31 PM · Data Products (Data Products (Sprint 03))
Milimetric updated subscribers of T350097: Find a way to extract data from the category tree.
Oct 31 2023, 4:30 AM · Data Products (Data Products (Sprint 03))
Milimetric moved T350097: Find a way to extract data from the category tree from Sprint Backlog to Ready for Code Review on the Data Products (Data Products (Sprint 03)) board.
Oct 31 2023, 4:30 AM · Data Products (Data Products (Sprint 03))
Milimetric claimed T350097: Find a way to extract data from the category tree.
Oct 31 2023, 4:30 AM · Data Products (Data Products (Sprint 03))
Milimetric created T350097: Find a way to extract data from the category tree.
Oct 31 2023, 4:29 AM · Data Products (Data Products (Sprint 03))

Oct 23 2023

Milimetric moved T348761: Add siteinfo element to XML output from Sign Off to To Deploy on the Data Products (Sprint 02) board.
Oct 23 2023, 3:52 PM · Data Products (Data Products (Sprint 03))
Milimetric moved T348767: Fix Import of Dumps 1.0 XML into HDFS from Sign Off to To Deploy on the Data Products (Sprint 02) board.
Oct 23 2023, 3:52 PM · Data Products (Data Products (Sprint 03))
Milimetric moved T344254: Make Metrics Platform Client record agent context attributes by default from Ready for Code Review to Sign Off on the Data Products (Sprint 02) board.
Oct 23 2023, 3:38 PM · Patch-For-Review, Data Products (Sprint 02), Metrics Platform Backlog
Milimetric moved T346288: [Java] Create Metrics Platform API for Submitting Core Interaction Events from In code review / Tech Input to Paused on the Data Products (Sprint 02) board.
Oct 23 2023, 3:38 PM · Data Products (Data Products (Sprint 03))

Oct 20 2023

Milimetric moved T348767: Fix Import of Dumps 1.0 XML into HDFS from Ready for Code Review to Sign Off on the Data Products (Sprint 02) board.

This is ready for deploy

Oct 20 2023, 8:57 PM · Data Products (Data Products (Sprint 03))
Milimetric added a comment to T340863: Mechanism for error logging when doing MERGE INTO.

  1. Introduce a row_visibility_last_update column.

Oct 20 2023, 3:04 PM · Data Products (Data Products Sprint 05), Patch-For-Review, Dumps 2.0

Oct 18 2023

Milimetric moved T348767: Fix Import of Dumps 1.0 XML into HDFS from In Process to Ready for Code Review on the Data Products (Sprint 02) board.
Oct 18 2023, 7:12 PM · Data Products (Data Products (Sprint 03))

Oct 17 2023

Milimetric moved T348578: Add more detail to project namespace map from In code review / Tech Input to Sign Off on the Data Products (Sprint 02) board.

Thomas deployed (did a great job!). I checked the table and it looks good, this is ready for sign off.

Oct 17 2023, 4:00 PM · Data Products (Sprint 02)

Oct 16 2023

Milimetric moved T346279: [Spike] Figure out what are good indicators for dumps data quality from Sign Off to Done on the Data Products (Sprint 02) board.
Oct 16 2023, 3:45 PM · Data Products (Sprint 02), Dumps 2.0
Milimetric added a comment to T346279: [Spike] Figure out what are good indicators for dumps data quality.

@JEbe-WMF - I'm sorry I had this comment but forgot to Submit! Your plan looks good to me, thank you for putting it together.

Oct 16 2023, 3:45 PM · Data Products (Sprint 02), Dumps 2.0

Oct 12 2023

Milimetric created T348767: Fix Import of Dumps 1.0 XML into HDFS.
Oct 12 2023, 3:23 PM · Data Products (Data Products (Sprint 03))
Milimetric added a comment to T346378: Update XML dump generation code to use wmf_dumps.wikitext_raw_rc1 schema..

Just to wrap this task up, the code that's merged now uses the rc1 schema. This was mostly done by Antoine. Any remaining work on XML publishing has been broken up in separate tasks, all of which are part of epic T347994. This task can be considered done.

Oct 12 2023, 3:18 PM · Data Products (Sprint 02), Dumps 2.0
Milimetric created T348761: Add siteinfo element to XML output.
Oct 12 2023, 3:01 PM · Data Products (Data Products (Sprint 03))