awight (Adam Wight)
User

User Details

User Since: Oct 12 2014, 9:02 PM (595 w, 3 d)
Availability: Available
IRC Nick: awight
LDAP User: Awight
MediaWiki User: Adamw

WMDE Technical Wishes developer

Recent Activity

Yesterday

awight added a comment to T419655: Migrate Tech Wishes scraper to gitlab.wikimedia.org.

Just for the record, I think it's a good policy to not pull binaries from servers outside of Wikimedia infrastructure—so I'm not asking for a change here, only providing information about what went wrong :-)

Wed, Mar 11, 2:57 PM · WMDE-TechWish-Sprint-2026-03-03-Spinach
awight added a comment to T419655: Migrate Tech Wishes scraper to gitlab.wikimedia.org.

The artifact was defined like so, in wmde/config/artifacts.yaml:

page-summary-scraper-0.6.1.tgz:
  id: https://gitlab.com/wmde/technical-wishes/scrape-wiki-html-dump/-/package_files/278501910/download
  source: url
Wed, Mar 11, 2:55 PM · WMDE-TechWish-Sprint-2026-03-03-Spinach
awight added a comment to T419655: Migrate Tech Wishes scraper to gitlab.wikimedia.org.

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/jobs/761299 shows the issue

Wed, Mar 11, 2:50 PM · WMDE-TechWish-Sprint-2026-03-03-Spinach
awight added a comment to T418442: Finish and deploy scraper Airflow job.

For anyone who wants to monitor memory usage: Thanos

Wed, Mar 11, 10:33 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight added a comment to T418442: Finish and deploy scraper Airflow job.

The job seems to be hitting an out-of-memory error now. I increased the memory from 2GB to 4GB but now it crashes in the third chunk of dewiki.

Wed, Mar 11, 10:31 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight closed T419654: Append environment for BashSensor, a subtask of T418442: Finish and deploy scraper Airflow job, as Invalid.
Wed, Mar 11, 9:13 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight closed T419654: Append environment for BashSensor as Invalid.

Wrote this too quickly. After looking into it more, I see that the BashSensor actually uses the same krb5 credentials cache path, so my shortcut *does* work.

Wed, Mar 11, 9:13 AM · WMDE-TechWish-Sprint-2026-03-03-Spinach
awight created T419655: Migrate Tech Wishes scraper to gitlab.wikimedia.org.
Wed, Mar 11, 7:56 AM · WMDE-TechWish-Sprint-2026-03-03-Spinach
awight claimed T419654: Append environment for BashSensor.
Wed, Mar 11, 7:29 AM · WMDE-TechWish-Sprint-2026-03-03-Spinach
awight created T419654: Append environment for BashSensor.
Wed, Mar 11, 7:27 AM · WMDE-TechWish-Sprint-2026-03-03-Spinach

Tue, Mar 10

awight added a comment to T418442: Finish and deploy scraper Airflow job.

I found that the BashSensor is not getting the correct KRB5CCNAME, and I'll have to implement an append_env which adds my additional variables to the executor's environment.

Tue, Mar 10, 4:52 PM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
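
The append-style environment merge mentioned in the comment above can be sketched with the Python standard library alone. This is only an illustration of the idea, not the Airflow implementation: the helper names and the credentials cache path are hypothetical.

```python
import os
import subprocess
import sys


def run_with_appended_env(command, extra_env):
    """Run a command with extra_env layered on top of the inherited environment.

    Variables in extra_env override inherited ones with the same name;
    everything else (PATH, HOME, ...) is passed through unchanged.
    """
    merged = {**os.environ, **extra_env}
    result = subprocess.run(
        command, env=merged, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def read_child_variable(name, extra_env):
    """Ask a child Python process to print one environment variable."""
    code = f"import os; print(os.environ.get({name!r}, ''))"
    return run_with_appended_env([sys.executable, "-c", code], extra_env)


if __name__ == "__main__":
    # Hypothetical cache path, used only for the demonstration.
    print(read_child_variable("KRB5CCNAME", {"KRB5CCNAME": "/tmp/krb5cc_example"}))
```

Replacing the environment outright (passing only `extra_env`) would drop the inherited variables, which is the difference the append behaviour is meant to avoid.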
awight updated the task description for T418442: Finish and deploy scraper Airflow job.
Tue, Mar 10, 4:50 PM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach

Mon, Mar 9

awight added a project to T416486: [Refactor] Add logic to get all reuses from a `listIndex`: Patch-For-Review.
Mon, Mar 9, 10:17 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight placed T395083: VE: Cannot render main content from inline main+details when {{reflist}} is used up for grabs.
Mon, Mar 9, 10:17 AM · MW-1.46-notes (1.46.0-wmf.20; 2026-03-17), WMDE-TechWish-Sprint-2026-03-03-Spinach, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Cite (Sub-referencing), MW-1.45-notes (1.45.0-wmf.10; 2025-07-15), WMDE-TechWish-Sprint-2025-07-11-Strawberry-Cheesecake-Ice-Cream, Patch-For-Review, WMDE-TechWish-Sprint-2025-06-25-Basil-Lime-Sorbet
awight added a comment to T419187: Options to be selective about the revision when loading userscripts.

We already have a way to pin to a specific revision, for example using oldid in the URL:

https://en.wikipedia.org/w/index.php?title=User:Adamw/DraftTopic.js&oldid=855943446&action=raw&ctype=text/javascript
Mon, Mar 9, 7:50 AM · MediaWiki-Platform-Team (Radar), MediaWiki-ResourceLoader, 2026-user-javascript-incident, JavaScript
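
As a small illustration of the oldid pinning described in the comment above, the following sketch builds the same kind of raw-script URL from its parts. The function name `pinned_script_url` is made up for this example; the query parameters are the ones visible in the URL above.

```python
from urllib.parse import urlencode


def pinned_script_url(title, oldid, host="en.wikipedia.org"):
    """Build an index.php URL that serves one fixed revision of a user script."""
    params = {
        "title": title,
        "oldid": oldid,  # pins the request to a specific revision
        "action": "raw",
        "ctype": "text/javascript",
    }
    # Keep ":" and "/" readable, matching the URL style in the comment.
    return f"https://{host}/w/index.php?" + urlencode(params, safe=":/")


if __name__ == "__main__":
    print(pinned_script_url("User:Adamw/DraftTopic.js", 855943446))
```

Because the revision ID is part of the URL, later edits to the script page do not change what this URL returns.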

Fri, Mar 6

awight added a comment to T395083: VE: Cannot render main content from inline main+details when {{reflist}} is used.

So far, it seems that mainBody (Parsoid-added refListItemId of the main ref, as an attribute on a main+details footnote marker) already carries the information we need. I'm still experimenting with transclusion edge cases.

Fri, Mar 6, 2:42 PM · MW-1.46-notes (1.46.0-wmf.20; 2026-03-17), WMDE-TechWish-Sprint-2026-03-03-Spinach, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Cite (Sub-referencing), MW-1.45-notes (1.45.0-wmf.10; 2025-07-15), WMDE-TechWish-Sprint-2025-07-11-Strawberry-Cheesecake-Ice-Cream, Patch-For-Review, WMDE-TechWish-Sprint-2025-06-25-Basil-Lime-Sorbet
awight added a comment to T395083: VE: Cannot render main content from inline main+details when {{reflist}} is used.

We found that any new solution for finding a subref's main ref must *not* know about main refs (or the main part of a main+details) that were produced by a transclusion and do not appear in the top-level document, to avoid perpetuating the issue from T412007: VE unexpectedly copies reference content from one sub-ref to another if the main ref is defined within a template. If the main ref's InternalList item was produced by a transclusion, it should render as missing ("This reference is defined in a template") until T355858: References from template transclusions should be included (read-only) in the internalList is solved.

Fri, Mar 6, 10:36 AM · MW-1.46-notes (1.46.0-wmf.20; 2026-03-17), WMDE-TechWish-Sprint-2026-03-03-Spinach, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Cite (Sub-referencing), MW-1.45-notes (1.45.0-wmf.10; 2025-07-15), WMDE-TechWish-Sprint-2025-07-11-Strawberry-Cheesecake-Ice-Cream, Patch-For-Review, WMDE-TechWish-Sprint-2025-06-25-Basil-Lime-Sorbet
awight updated the task description for T355858: References from template transclusions should be included (read-only) in the internalList.
Fri, Mar 6, 10:32 AM · WMDE-TechWish-Sprint-2024-02-15, WMDE-TechWish-Sprint-2024-01-31, WMDE-TechWish, VisualEditor, Cite, WMDE-References-FocusArea
awight renamed T355858: References from template transclusions should be included (read-only) in the internalList from References from template transclusions should be part of the internalList to References from template transclusions should be included (read-only) in the internalList.
Fri, Mar 6, 10:24 AM · WMDE-TechWish-Sprint-2024-02-15, WMDE-TechWish-Sprint-2024-01-31, WMDE-TechWish, VisualEditor, Cite, WMDE-References-FocusArea
awight added a comment to T411134: "On Initial view -> Will not switch to edit mode if the column clicked is not selected" test is flaky.

Hi, Tech Wishes dev here! After a brief chat, we would recommend disabling the entire suite for now. We've noticed the flakiness as well and it counteracts any value that might be gotten from having the tests. I'm happy to do that if you agree that it's the right direction to go in.

Fri, Mar 6, 9:04 AM · Two-Column-Edit-Conflict-Merge, QS-Test-Automation

Thu, Mar 5

awight updated the task description for T395083: VE: Cannot render main content from inline main+details when {{reflist}} is used.
Thu, Mar 5, 8:41 AM · MW-1.46-notes (1.46.0-wmf.20; 2026-03-17), WMDE-TechWish-Sprint-2026-03-03-Spinach, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Cite (Sub-referencing), MW-1.45-notes (1.45.0-wmf.10; 2025-07-15), WMDE-TechWish-Sprint-2025-07-11-Strawberry-Cheesecake-Ice-Cream, Patch-For-Review, WMDE-TechWish-Sprint-2025-06-25-Basil-Lime-Sorbet

Wed, Mar 4

awight updated the task description for T395083: VE: Cannot render main content from inline main+details when {{reflist}} is used.
Wed, Mar 4, 2:49 PM · MW-1.46-notes (1.46.0-wmf.20; 2026-03-17), WMDE-TechWish-Sprint-2026-03-03-Spinach, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Cite (Sub-referencing), MW-1.45-notes (1.45.0-wmf.10; 2025-07-15), WMDE-TechWish-Sprint-2025-07-11-Strawberry-Cheesecake-Ice-Cream, Patch-For-Review, WMDE-TechWish-Sprint-2025-06-25-Basil-Lime-Sorbet
awight updated the task description for T418442: Finish and deploy scraper Airflow job.
Wed, Mar 4, 1:26 PM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight added a project to T418977: Parsoid parity for Cite: several wikis have backlink customization which doesn't work for Parsoid: Parsoid.
Wed, Mar 4, 8:23 AM · Content-Transform-Team, Cite
awight updated the task description for T384948: Implement explicit backlink markers for Parsoid.
Wed, Mar 4, 8:22 AM · Patch-For-Review, WMDE-TechWish-Sprint-2025-01-22, Cite, WMDE-References-FocusArea
awight created T418977: Parsoid parity for Cite: several wikis have backlink customization which doesn't work for Parsoid.
Wed, Mar 4, 8:19 AM · Content-Transform-Team, Cite

Tue, Mar 3

awight created T418864: Backfill per-wiki scraper summaries into Hive.
Tue, Mar 3, 1:20 PM · WMDE-TechWish-Sprint-2026-03-03-Spinach, WMDE-TechWish (product board), Cite (Sub-referencing)
awight moved T416001: Scraper should write directly to Hive from Epics / Watching / Stalled to Done on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.
Tue, Mar 3, 10:05 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Patch-For-Review, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors, Epic, WMDE-TechWish-Sprint-2026-01-06-New-Year-Donuts, Cite (Sub-referencing), WMDE-TechWish-Sprint-2025-12-09-Christmas-Cookie, WMDE-TechWish-Sprint-2025-11-25-Spekulatius
awight moved T418082: Scraper will output to a simple ND-JSON file from Tech Review to Done on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.
Tue, Mar 3, 9:00 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight updated the task description for T395083: VE: Cannot render main content from inline main+details when {{reflist}} is used.
Tue, Mar 3, 8:27 AM · MW-1.46-notes (1.46.0-wmf.20; 2026-03-17), WMDE-TechWish-Sprint-2026-03-03-Spinach, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Cite (Sub-referencing), MW-1.45-notes (1.45.0-wmf.10; 2025-07-15), WMDE-TechWish-Sprint-2025-07-11-Strawberry-Cheesecake-Ice-Cream, Patch-For-Review, WMDE-TechWish-Sprint-2025-06-25-Basil-Lime-Sorbet

Fri, Feb 27

awight updated the task description for T395083: VE: Cannot render main content from inline main+details when {{reflist}} is used.
Fri, Feb 27, 1:10 PM · MW-1.46-notes (1.46.0-wmf.20; 2026-03-17), WMDE-TechWish-Sprint-2026-03-03-Spinach, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Cite (Sub-referencing), MW-1.45-notes (1.45.0-wmf.10; 2025-07-15), WMDE-TechWish-Sprint-2025-07-11-Strawberry-Cheesecake-Ice-Cream, Patch-For-Review, WMDE-TechWish-Sprint-2025-06-25-Basil-Lime-Sorbet
awight updated the task description for T395083: VE: Cannot render main content from inline main+details when {{reflist}} is used.
Fri, Feb 27, 12:07 PM · MW-1.46-notes (1.46.0-wmf.20; 2026-03-17), WMDE-TechWish-Sprint-2026-03-03-Spinach, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Cite (Sub-referencing), MW-1.45-notes (1.45.0-wmf.10; 2025-07-15), WMDE-TechWish-Sprint-2025-07-11-Strawberry-Cheesecake-Ice-Cream, Patch-For-Review, WMDE-TechWish-Sprint-2025-06-25-Basil-Lime-Sorbet
awight claimed T395083: VE: Cannot render main content from inline main+details when {{reflist}} is used.

I'm going to pick this up with a focus on the risk that even our newly-refactored solution could still be incompatible with the {{reflist}} template. The current assumption is that if we can solve this task using our new wiring, then the approach will work out overall since the lack of main content is exactly the hole provisionally filled by the synthetic ref.

Fri, Feb 27, 11:55 AM · MW-1.46-notes (1.46.0-wmf.20; 2026-03-17), WMDE-TechWish-Sprint-2026-03-03-Spinach, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Cite (Sub-referencing), MW-1.45-notes (1.45.0-wmf.10; 2025-07-15), WMDE-TechWish-Sprint-2025-07-11-Strawberry-Cheesecake-Ice-Cream, Patch-For-Review, WMDE-TechWish-Sprint-2025-06-25-Basil-Lime-Sorbet
awight updated the task description for T418442: Finish and deploy scraper Airflow job.
Fri, Feb 27, 11:34 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight updated the task description for T418442: Finish and deploy scraper Airflow job.
Fri, Feb 27, 11:33 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight moved T418442: Finish and deploy scraper Airflow job from Doing to Tech Review on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.

(CC'ing data platform engineers who have generously helped us, and might be interested in watching the exciting conclusion.)

Fri, Feb 27, 9:53 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight placed T418442: Finish and deploy scraper Airflow job up for grabs.
Fri, Feb 27, 9:52 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight updated the task description for T418442: Finish and deploy scraper Airflow job.
Fri, Feb 27, 9:51 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach

Thu, Feb 26

awight updated the task description for T418442: Finish and deploy scraper Airflow job.
Thu, Feb 26, 4:21 PM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight updated the task description for T418442: Finish and deploy scraper Airflow job.
Thu, Feb 26, 3:23 PM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight updated the task description for T418442: Finish and deploy scraper Airflow job.
Thu, Feb 26, 12:57 PM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight updated the task description for T418442: Finish and deploy scraper Airflow job.
Thu, Feb 26, 12:57 PM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight updated the task description for T418442: Finish and deploy scraper Airflow job.
Thu, Feb 26, 8:28 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight updated the task description for T418442: Finish and deploy scraper Airflow job.
Thu, Feb 26, 8:25 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight moved T418442: Finish and deploy scraper Airflow job from Sprint Backlog to Doing on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.
Thu, Feb 26, 6:44 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight created T418442: Finish and deploy scraper Airflow job.
Thu, Feb 26, 6:43 AM · Patch-For-Review, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-03-03-Spinach
awight moved T418082: Scraper will output to a simple ND-JSON file from Doing to Tech Review on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.
Thu, Feb 26, 6:37 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight moved T418082: Scraper will output to a simple ND-JSON file from Tech Review to Doing on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.
Thu, Feb 26, 6:37 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)

Wed, Feb 25

awight closed T417633: Airflow devenv (WMDE) cannot see webproxy as Resolved.
Wed, Feb 25, 3:13 PM · Data-Platform-SRE (2026-02-13 - 2026-03-06), WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Data-Engineering
awight closed T417633: Airflow devenv (WMDE) cannot see webproxy, a subtask of T412019: [Epic] Schedule scraper and aggregations as an Airflow job, as Resolved.
Wed, Feb 25, 3:13 PM · WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors, Epic, WMDE-TechWish-Sprint-2026-01-06-New-Year-Donuts, Cite (Sub-referencing), WMDE-TechWish-Sprint-2025-12-09-Christmas-Cookie, WMDE-TechWish-Sprint-2025-11-25-Spekulatius
awight added a comment to T417633: Airflow devenv (WMDE) cannot see webproxy.

I think it works!

Wed, Feb 25, 3:13 PM · Data-Platform-SRE (2026-02-13 - 2026-03-06), WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Data-Engineering
awight added a comment to T417633: Airflow devenv (WMDE) cannot see webproxy.

@brouberol That's amazing, thank you. I'll wait for the chart deployment and will post the outcome here.

Wed, Feb 25, 2:44 PM · Data-Platform-SRE (2026-02-13 - 2026-03-06), WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Data-Engineering

Tue, Feb 24

awight moved T418209: Deploy subreferencing: pilot wikis phase 2 from Tech Review to Demo on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.
Tue, Feb 24, 3:09 PM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board)
awight placed T418209: Deploy subreferencing: pilot wikis phase 2 up for grabs.
Tue, Feb 24, 3:08 PM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board)
awight added a comment to T416303: Scraper: Run the scraper for all wikis with the January data.

Dumping the summary here before I sign out for the day:

dbname dewiki
snapshot_date 2026-02-02 
identical_refs_count 194973
identical_refs_on_pages_with_25_or_less_refs_average 194972.95
identical_refs_on_pages_with_over_25_refs_average 0.7808942
identical_refs_on_pages_with_over_25_refs_count 93687
list_defined_ref_per_page_having_ref 0.36901948
list_defined_ref_sum 731064
max_ref_reuse_average 2.8919237
nested_ref_sum 578
page_count 3093332
pages_with_automatically_named_refs_count 116612
pages_with_identical_refs_and_over_25_refs_count 25736
pages_with_identical_refs_count 91934
pages_with_multiple_reflists_count 29991
pages_with_named_refs_count 896769
pages_with_nested_refs_count 243
pages_with_over_25_refs_count 119974
pages_with_ref_reuse_count 702476
pages_with_refs_count 1981077
pages_with_similar_refs_count 248833
pages_with_subrefs_count 5986
proportion_of_named_refs_uniquely_named_average 0.5766826
proportion_of_pages_with_identical_refs 0.04640607
proportion_of_pages_with_nested_refs 1.2266055E-4
proportion_of_pages_with_similar_refs 0.12560491
proportion_of_pages_with_refs 0.6404346
proportion_of_refs_from_transclusion 0.079681166
proportion_of_refs_having_transclusion 0.3740749
proportion_of_refs_named_average 0.26614386
proportion_of_refs_reused_average 0.118505105
ref_by_transclusion_average 0.7054597
ref_by_transclusion_count 1397570
ref_count 17539527
ref_count_per_page 5.6701083
ref_count_per_page_having_ref 8.853531
reflist_count 2015523
reflists_per_page_having_ref 1.0173875
refs_with_solely_transclusion_count 5475486
refs_with_transclusions_count 6561097
similar_refs_count 1038935
subrefs_sum 62401
transclusion_average 10.077638
transclusion_sum 31173480
wikitext_length_average 6913.2437
Tue, Feb 24, 3:08 PM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish (product board), Cite (Sub-referencing)
awight placed T416303: Scraper: Run the scraper for all wikis with the January data up for grabs.

Data has landed in wmde.wiki_page_cite_references_raw (per-page) and wmde.wiki_page_cite_references_monthly (totals are in one row for dewiki).

Tue, Feb 24, 3:04 PM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish (product board), Cite (Sub-referencing)
awight added a comment to T418082: Scraper will output to a simple ND-JSON file.

January data is being imported manually and provides a verification of the queries here.

Tue, Feb 24, 3:03 PM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight added a comment to T417669: [5.3.3 Epic] Provide reference count information within Attribution endpoint..
Tue, Feb 24, 11:33 AM · Epic, OKR-Work, [MWI] FY2025-26 Q3
awight added a comment to T416303: Scraper: Run the scraper for all wikis with the January data.

This is the command line I used to process the saved chunks, in my home directory on stat1010:

mix scrape --dir ~/dewiki-chunks-2026-02-02/ --output=dewiki-2026-02-02-page-summary.ndjson
Tue, Feb 24, 11:18 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish (product board), Cite (Sub-referencing)
awight added a comment to T418082: Scraper will output to a simple ND-JSON file.

Note that the file is reopened for each chunk, in an unexpected combination of reading saved chunks from separate files, and outputting to a file. I don't think this will hurt anything; the aggregation step rejects duplicate pages (ignoring whether the revisions change).

Tue, Feb 24, 11:15 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
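
The duplicate rejection mentioned in the comment above could look roughly like the sketch below. It is not the actual aggregation code: the field names `page_id` and `rev_id` are assumptions made for illustration, and the real step may run as a query rather than in Python.

```python
import json


def dedupe_pages(ndjson_lines, key="page_id"):
    """Keep the first record seen for each page, ignoring later duplicates.

    Only the page key is compared, so a repeated page with a different
    revision is still treated as a duplicate.
    """
    seen = set()
    unique = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between appended chunks
        record = json.loads(line)
        if record[key] in seen:
            continue
        seen.add(record[key])
        unique.append(record)
    return unique


if __name__ == "__main__":
    lines = [
        '{"page_id": 1, "rev_id": 10}',
        '{"page_id": 2, "rev_id": 20}',
        '{"page_id": 1, "rev_id": 11}',
    ]
    print(dedupe_pages(lines))
```

With this behaviour, appending the same chunk twice to the ND-JSON file changes nothing in the aggregated result.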
awight updated the task description for T418209: Deploy subreferencing: pilot wikis phase 2.
Tue, Feb 24, 9:42 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board)
awight created T418209: Deploy subreferencing: pilot wikis phase 2.
Tue, Feb 24, 9:01 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board)
awight moved T418082: Scraper will output to a simple ND-JSON file from Dev ready for sprint to Tickets in sprint on the WMDE-TechWish (product board) board.
Tue, Feb 24, 8:59 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight moved T418082: Scraper will output to a simple ND-JSON file from Incoming to Dev ready for sprint on the WMDE-TechWish (product board) board.
Tue, Feb 24, 8:59 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)

Mon, Feb 23

awight added a comment to T338057: Upgrade Spark to a version with long term Iceberg support, and with fixes to support Dumps 2.0.

One more +1 for Spark 3.5.

Mon, Feb 23, 12:08 PM · Epic, Data-Platform-SRE, Data-Engineering, Patch-For-Review
awight placed T418082: Scraper will output to a simple ND-JSON file up for grabs.
Mon, Feb 23, 8:53 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight updated the task description for T418082: Scraper will output to a simple ND-JSON file.
Mon, Feb 23, 8:52 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight moved T418082: Scraper will output to a simple ND-JSON file from Sprint Backlog to Doing on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.
Mon, Feb 23, 8:13 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight created T418082: Scraper will output to a simple ND-JSON file.
Mon, Feb 23, 8:13 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight added a comment to T416303: Scraper: Run the scraper for all wikis with the January data.

I've downloaded and stored the 2026-02-02 snapshot chunks on stat1010 and created a new input mode that will allow the scraper to read from those files. This makes it possible to finish processing even after the snapshot is replaced with newer revisions.

Mon, Feb 23, 8:08 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish (product board), Cite (Sub-referencing)
awight moved T416001: Scraper should write directly to Hive from Tech Review to Epics / Watching / Stalled on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.

The throughput was 20 rows/sec even after batching heavily (100 rows/statement). This is far slower than we can accept, so I'm abandoning the approach.

Mon, Feb 23, 8:06 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Patch-For-Review, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors, Epic, WMDE-TechWish-Sprint-2026-01-06-New-Year-Donuts, Cite (Sub-referencing), WMDE-TechWish-Sprint-2025-12-09-Christmas-Cookie, WMDE-TechWish-Sprint-2025-11-25-Spekulatius
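
To put the figures in the comment above in perspective, here is a rough sketch of the batching and of the resulting run time. The helper names are hypothetical, and the one-million-row input is an illustrative number, not a measurement.

```python
from itertools import islice


def batched(rows, size):
    """Yield consecutive lists of at most `size` rows (one list per statement)."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def estimated_hours(row_count, rows_per_second):
    """Wall-clock hours needed to insert row_count rows at a steady rate."""
    return row_count / rows_per_second / 3600


if __name__ == "__main__":
    statements = sum(1 for _ in batched(range(1_000_000), 100))
    print(statements, "statements")
    print(round(estimated_hours(1_000_000, 20), 1), "hours")
```

At 20 rows/sec, one million rows would take about 13.9 hours regardless of how the rows are grouped into statements, which is why batching alone did not rescue the approach.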

Fri, Feb 20

awight closed T417997: Run Spark Connect server in Analytics cluster, a subtask of T412019: [Epic] Schedule scraper and aggregations as an Airflow job, as Declined.
Fri, Feb 20, 4:03 PM · WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors, Epic, WMDE-TechWish-Sprint-2026-01-06-New-Year-Donuts, Cite (Sub-referencing), WMDE-TechWish-Sprint-2025-12-09-Christmas-Cookie, WMDE-TechWish-Sprint-2025-11-25-Spekulatius
awight closed T417997: Run Spark Connect server in Analytics cluster as Declined.

After discussion with @xcollazo, I'll take a simpler path and write files to a temporary filesystem. This will be described in a new task...

Fri, Feb 20, 4:03 PM · Data-Engineering, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots
awight updated the task description for T417997: Run Spark Connect server in Analytics cluster.
Fri, Feb 20, 3:15 PM · Data-Engineering, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots
awight updated the task description for T417997: Run Spark Connect server in Analytics cluster.
Fri, Feb 20, 3:10 PM · Data-Engineering, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots
awight created T417997: Run Spark Connect server in Analytics cluster.
Fri, Feb 20, 2:53 PM · Data-Engineering, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots

Wed, Feb 18

awight added a comment to T416303: Scraper: Run the scraper for all wikis with the January data.

Paving over the above errors by sending complex types as strings for the moment.

Wed, Feb 18, 2:16 PM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish (product board), Cite (Sub-referencing)
awight added a comment to T416303: Scraper: Run the scraper for all wikis with the January data.

Running into two issues which strangely didn't appear on the test server.

Wed, Feb 18, 1:06 PM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish (product board), Cite (Sub-referencing)
awight claimed T416303: Scraper: Run the scraper for all wikis with the January data.
Wed, Feb 18, 1:02 PM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish (product board), Cite (Sub-referencing)
awight moved T416303: Scraper: Run the scraper for all wikis with the January data from Sprint Backlog to Doing on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.
Wed, Feb 18, 1:02 PM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish (product board), Cite (Sub-referencing)
awight moved T416304: Make the ReferenceTooltips Gadget sub-ref compatible from Doing to Sprint Backlog on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.
Wed, Feb 18, 1:02 PM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight placed T416304: Make the ReferenceTooltips Gadget sub-ref compatible up for grabs.
Wed, Feb 18, 1:02 PM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight changed the status of T415524: [Refactor] Render search results withouth relying on `mainRefKey` in MWReferenceSearchWidget, a subtask of T415884: [Refactor] Replace MWReferenceNode.mainRefKey with an index to the main ref InternalItemNode, from Stalled to Open.
Wed, Feb 18, 9:12 AM · MW-1.46-notes (1.46.0-wmf.15; 2026-02-10), WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors, Cite (Sub-referencing), VisualEditor
awight changed the status of T415524: [Refactor] Render search results withouth relying on `mainRefKey` in MWReferenceSearchWidget from Stalled to Open.

Seems ready to work on now?

Wed, Feb 18, 9:12 AM · WMDE-TechWish-Sprint-2026-03-03-Spinach, WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Patch-For-Review, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, Cite (Sub-referencing), WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors
awight updated the task description for T416304: Make the ReferenceTooltips Gadget sub-ref compatible.
Wed, Feb 18, 8:39 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight claimed T416304: Make the ReferenceTooltips Gadget sub-ref compatible.
Wed, Feb 18, 7:50 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight updated the task description for T416304: Make the ReferenceTooltips Gadget sub-ref compatible.
Wed, Feb 18, 7:50 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish (product board), Cite (Sub-referencing)
awight added a comment to T416001: Scraper should write directly to Hive.

Implementation has been smoke-tested on the Analytics testing cluster. It's verified as able to perform inserts and queries, and can authenticate and encrypt through Kerberos.

Wed, Feb 18, 7:43 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Patch-For-Review, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors, Epic, WMDE-TechWish-Sprint-2026-01-06-New-Year-Donuts, Cite (Sub-referencing), WMDE-TechWish-Sprint-2025-12-09-Christmas-Cookie, WMDE-TechWish-Sprint-2025-11-25-Spekulatius
awight updated the task description for T416001: Scraper should write directly to Hive.
Wed, Feb 18, 7:42 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Patch-For-Review, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors, Epic, WMDE-TechWish-Sprint-2026-01-06-New-Year-Donuts, Cite (Sub-referencing), WMDE-TechWish-Sprint-2025-12-09-Christmas-Cookie, WMDE-TechWish-Sprint-2025-11-25-Spekulatius
awight updated the task description for T416303: Scraper: Run the scraper for all wikis with the January data.
Wed, Feb 18, 7:42 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish (product board), Cite (Sub-referencing)
awight moved T417633: Airflow devenv (WMDE) cannot see webproxy from Sprint Backlog to Epics / Watching / Stalled on the WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots board.
Wed, Feb 18, 7:41 AM · Data-Platform-SRE (2026-02-13 - 2026-03-06), WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Data-Engineering
awight added a project to T417633: Airflow devenv (WMDE) cannot see webproxy: WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots.
Wed, Feb 18, 7:41 AM · Data-Platform-SRE (2026-02-13 - 2026-03-06), WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Data-Engineering
awight added a parent task for T417633: Airflow devenv (WMDE) cannot see webproxy: T412019: [Epic] Schedule scraper and aggregations as an Airflow job.
Wed, Feb 18, 7:41 AM · Data-Platform-SRE (2026-02-13 - 2026-03-06), WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Data-Engineering
awight added a subtask for T412019: [Epic] Schedule scraper and aggregations as an Airflow job: T417633: Airflow devenv (WMDE) cannot see webproxy.
Wed, Feb 18, 7:41 AM · WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors, Epic, WMDE-TechWish-Sprint-2026-01-06-New-Year-Donuts, Cite (Sub-referencing), WMDE-TechWish-Sprint-2025-12-09-Christmas-Cookie, WMDE-TechWish-Sprint-2025-11-25-Spekulatius
awight updated the task description for T416303: Scraper: Run the scraper for all wikis with the January data.
Wed, Feb 18, 7:40 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish (product board), Cite (Sub-referencing)
awight updated subscribers of T417633: Airflow devenv (WMDE) cannot see webproxy.
Wed, Feb 18, 7:20 AM · Data-Platform-SRE (2026-02-13 - 2026-03-06), WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Data-Engineering

Tue, Feb 17

awight created T417633: Airflow devenv (WMDE) cannot see webproxy.
Tue, Feb 17, 12:19 PM · Data-Platform-SRE (2026-02-13 - 2026-03-06), WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Data-Engineering
awight updated the task description for T416001: Scraper should write directly to Hive.
Tue, Feb 17, 10:13 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Patch-For-Review, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors, Epic, WMDE-TechWish-Sprint-2026-01-06-New-Year-Donuts, Cite (Sub-referencing), WMDE-TechWish-Sprint-2025-12-09-Christmas-Cookie, WMDE-TechWish-Sprint-2025-11-25-Spekulatius
awight added a project to T416001: Scraper should write directly to Hive: Patch-For-Review.
Tue, Feb 17, 9:25 AM · WMDE-TechWish-Sprint-2026-02-17-Beautiful-Beetroots, Patch-For-Review, WMDE-TechWish-Sprint-2026-02-03-Tomatoes-of-many-colors, WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors, Epic, WMDE-TechWish-Sprint-2026-01-06-New-Year-Donuts, Cite (Sub-referencing), WMDE-TechWish-Sprint-2025-12-09-Christmas-Cookie, WMDE-TechWish-Sprint-2025-11-25-Spekulatius
awight closed T413954: Set up Hive aggregation table and populate with sample data, a subtask of T412019: [Epic] Schedule scraper and aggregations as an Airflow job, as Resolved.
Tue, Feb 17, 9:23 AM · WMDE-TechWish-Sprint-2026-01-20-Carrots-of-many-colors, Epic, WMDE-TechWish-Sprint-2026-01-06-New-Year-Donuts, Cite (Sub-referencing), WMDE-TechWish-Sprint-2025-12-09-Christmas-Cookie, WMDE-TechWish-Sprint-2025-11-25-Spekulatius