Today
@Protsack.stephan I found one nearly monotonic column in the dumps: .event.date_created. But I'm still having trouble understanding the full pipeline: these event objects have type: update, yet the dates span roughly 14 months. Which stream does this part of the data come from? Maybe this is a bucket of titles, and the update events come from a job which refreshes the titles in order to capture renames and new articles?
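For reference, a rough monotonicity check (just a sketch: it assumes one JSON object per line, and that date_created is an ISO 8601 timestamp, so plain string comparison is order-preserving):

# extract the timestamps in dump order and count adjacent out-of-order pairs
tar xzf dewiki-NS0-20240201-ENTERPRISE-HTML.json.tar.gz -O \
  | jq -r '.event.date_created' \
  | awk 'NR > 1 && $0 < prev { ooo++ } { prev = $0 } END { print ooo + 0, "out-of-order rows out of", NR }'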
Yesterday
@Lina_Farid_WMDE Thanks for the error report! For the record: unfortunately, it seems you'll need to request the "analytics-privatedata-users (no Kerberos, no ssh)" access group.
Tue, Apr 30
(Let's keep this task and solve the root issue after the production error is worked around.)
Mon, Apr 29
Fri, Apr 26
FWIW, I've been trying to add a simple line chart of editor interface popularity over time, but it's been unironically timing out.
Thu, Apr 25
Devs: it would be nice to get a second opinion on these SQL queries, especially some sanity checking of the way users and sessions are grouped.
Hypothesis 4 can be followed up in T363453: Question HTML dump page order.
Wed, Apr 24
Tue, Apr 23
Sneak preview for those playing at home:
When the metrics land, they should appear on https://prometheus-eqiad.wikimedia.org/analytics/targets?search=wmde_tewu .
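In the meantime they can be polled from the API (a sketch: the /analytics/api/v1/targets path is my guess, extrapolated from the UI URL above, and the wmde_tewu job label is an assumption):

# list active scrape targets whose job label mentions wmde_tewu
curl -s 'https://prometheus-eqiad.wikimedia.org/analytics/api/v1/targets' \
  | jq '.data.activeTargets[] | select(.labels.job | test("wmde_tewu"))'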
Mon, Apr 22
This will simplify how we share monitoring duty during the long-running scrape job.
Stalled waiting for WMF legal review.
Well, it could be simple after all. Articles at the end are on average twice as long (by HTML length).
In this example, the segment on the left is processing the tail articles, starting at roughly the 2.6-millionth row, and the segment on the right is processing the first articles in the dump.
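If anyone wants to reproduce the measurement, this is roughly it (a sketch: .article_body.html is my reading of the Enterprise dump schema, and the tail pass has to stream the whole archive, so it's slow):

# average HTML length (in characters) of the first 10k articles
tar xzf dewiki-NS0-20240201-ENTERPRISE-HTML.json.tar.gz -O | head -n 10000 \
  | jq -r '.article_body.html | length' \
  | awk '{ s += $1 } END { print "avg length (head):", s / NR }'

# same for the last 10k articles
tar xzf dewiki-NS0-20240201-ENTERPRISE-HTML.json.tar.gz -O | tail -n 10000 \
  | jq -r '.article_body.html | length' \
  | awk '{ s += $1 } END { print "avg length (tail):", s / NR }'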
Much to my surprise, Hypothesis 4 seems to be the only hypothesis that is validated. I haven't yet identified what makes the last articles harder to process, but the performance characteristics are almost perfectly repeatable when switching back and forth between sets of articles at the beginning vs. the end of the dump. Articles at the beginning can be processed at ~1.5k articles/s; articles at the end at only ~250 articles/s.
Fri, Apr 19
WIP on the low-level-concurrency branch will let us experiment with per-page timeouts and debugging.
Thu, Apr 18
This may be related to T362894: Data quality: HTML dumps contain unexplainably outdated revisions of some pages. The duplicates seem to have various revision ids, here's a set showing that the article is included three times with the same title and page id, but at different versions:
tar xzf dewiki-NS0-20240201-ENTERPRISE-HTML.json.tar.gz -O | jq 'select(.name == "10.000 B.C.") | .identifier,.version.identifier'
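To gauge how widespread the duplication is, a rough count of repeated page ids (a sketch, assuming one JSON object per line; the sort makes this slow on the full dump):

# count page ids that occur more than once
tar xzf dewiki-NS0-20240201-ENTERPRISE-HTML.json.tar.gz -O \
  | jq -r '.identifier' \
  | sort -n | uniq -d | wc -l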
@BTullis Thanks for highlighting this possibility! I tried the Conda environment as you suggested and it works perfectly for our needs. Even at high concurrency, the performance seems to be the same as in the bare metal environment I had cobbled together previously.
Wed, Apr 17
Still seeing extreme swings in performance, following the same shape as before. Now with additional metrics:
Tue, Apr 16
Pulling this in because it would be nice to have for debugging the slowdown we see after the first 20 minutes or so.
Some of these packages already appear in debmonitor:
Hmm, spot-checking is only turning up articles which were created or moved after the snapshot date.
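For anyone repeating the spot checks, this is the kind of lookup I'm doing (a sketch; rvdir=newer with rvlimit=1 returns the page's first revision, i.e. its creation time; page moves would additionally need the move log via list=logevents):

# does the first revision of this page postdate the snapshot? (title is just an example)
curl -s 'https://de.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=timestamp&rvlimit=1&rvdir=newer&format=json&titles=10.000_B.C.' \
  | jq '.query.pages[].revisions[0].timestamp'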