Milimetric (Dan Andreescu)
Staff Engineer (Data Engineering)

User Details

User Since
Oct 8 2014, 5:48 PM (334 w, 2 d)
Availability
Available
IRC Nick
Milimetric
LDAP User
Milimetric
MediaWiki User
Milimetric (WMF)

Recent Activity

Thu, Mar 4

Milimetric closed T276120: Wikimedia history dump - undocumented "create-page" event as Resolved.

Thank you for flagging! Updated docs to explain: https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FData_Lake%2FEdits%2FMediawiki_history_dumps&type=revision&diff=1901897&oldid=1867966

Thu, Mar 4, 6:28 PM · Analytics, Documentation
Milimetric closed T276119: Wikimedia history dump - undocumented "merge" event as Resolved.

Thank you for flagging! Updated docs to explain: https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FData_Lake%2FEdits%2FMediawiki_history_dumps&type=revision&diff=1901897&oldid=1867966

Thu, Mar 4, 6:28 PM · Analytics, Documentation
Milimetric placed T274988: Backfill metrics for TemplateWizard and VisualEditor up for grabs.

Oh, my bad, I guess you're working on this. Let me know when you get to re-running, in case you haven't run into that feature. I think you need my permissions to execute it, but I'm happy to help (ping me on IRC, I'm not real-time on phab).

Thu, Mar 4, 5:42 PM · Analytics-Radar, WMDE-TechWish-Sprint-2021-02-17, WMDE-Templates-FocusArea

Wed, Mar 3

Milimetric added a comment to T274557: Broken graphs on wikis accounting for 45% of our client side errors.

To @Krinkle's point from above, there's not much inside of Jon's try block that could be our fault, because it's 99.99% third-party logic running in there, save for which version of vega we call and so on. If we did something bad, I think graphs would all break and we'd know about it. And the future of the graph extension is very uncertain right now anyway; these regressions will hopefully drive some product thinking.

Wed, Mar 3, 4:05 PM · MW-1.36-notes (1.36.0-wmf.32; 2021-02-23), covid-19, Reading-Web-Local-Wiki-Issues, MediaWiki-extensions-Graph
Milimetric added a comment to T247058: Deployment strategy and hardware requirement for new Flink based WDQS updater.

This is a bit of a drive-by, but have we considered https://min.io/? I went a bit deeper than just the marketing and was impressed by their error-correcting implementation.

Wed, Mar 3, 3:53 PM · wdwb-tech-focus, Analytics, SRE, Wikidata, Wikidata-Query-Service
Milimetric updated the task description for T274322: Clean up issues with jobs after Hadoop Upgrade.
Wed, Mar 3, 3:42 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric added a comment to T265971: Check data currently stored on thorium and drop what it is not needed anymore.

Ok to drop from everywhere in my opinion :P but definitely from everywhere that isn't hdfs, including stat1006.

Wed, Mar 3, 3:23 PM · Analytics-Kanban, Analytics
Milimetric added a comment to T265971: Check data currently stored on thorium and drop what it is not needed anymore.

Sorry to be so late looking at this. The newest of these files are 2.5 years old, and besides us absolutely zero people know they exist (the others have all left the foundation). Still, I double checked that they're all safely backed up on hdfs, so I think it's ok to delete.

Wed, Mar 3, 3:11 PM · Analytics-Kanban, Analytics
Milimetric added a comment to T272926: Upgrade UA Parser to 1.5.1+.

Verified that the new version of UA Parser is processing data as of the restart this morning; 2021-03-03T13:00 has the new data.

Wed, Mar 3, 3:02 PM · Analytics-Kanban, Analytics
Milimetric moved T272926: Upgrade UA Parser to 1.5.1+ from In Progress to Done on the Analytics-Kanban board.
Wed, Mar 3, 2:56 PM · Analytics-Kanban, Analytics

Mon, Mar 1

Milimetric added a comment to T272926: Upgrade UA Parser to 1.5.1+.

A bunch more work to do to package and deploy, but at least the basic patch is tested and done.

Mon, Mar 1, 10:12 PM · Analytics-Kanban, Analytics
Milimetric added a comment to T272926: Upgrade UA Parser to 1.5.1+.
  • 18% of unique classifications changed. We should do this more often; once a quarter still seems too slow
  • No changes are hugely problematic to any of the metrics I can think of
  • While I was typing this, UA Parser maintainers fixed a build problem and released 1.5.2, so that seems safe to upgrade to. Will submit a patch shortly
Mon, Mar 1, 8:38 PM · Analytics-Kanban, Analytics

Mon, Feb 22

Milimetric moved T274690: Update sqoop to work with multi-instance clouddb1021 mariadb host from Next Up to In Progress on the Analytics-Kanban board.
Mon, Feb 22, 7:50 PM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters
Milimetric claimed T274690: Update sqoop to work with multi-instance clouddb1021 mariadb host.
Mon, Feb 22, 7:50 PM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters
Milimetric added a comment to T271571: Update Image usage metric.

This was deployed and runs monthly with a week delay. So the next run, around March 6th, should reflect the new logic.

Mon, Feb 22, 7:21 PM · Analytics-Kanban, Product-Analytics, Analytics
Milimetric moved T271571: Update Image usage metric from Next Up to Done on the Analytics-Kanban board.
Mon, Feb 22, 7:19 PM · Analytics-Kanban, Product-Analytics, Analytics
Milimetric moved T272926: Upgrade UA Parser to 1.5.1+ from Next Up to In Progress on the Analytics-Kanban board.
Mon, Feb 22, 7:19 PM · Analytics-Kanban, Analytics
Milimetric moved T274322: Clean up issues with jobs after Hadoop Upgrade from Next Up to In Progress on the Analytics-Kanban board.
Mon, Feb 22, 7:18 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric moved T262725: "Active editors" panel keeps flashing on stats.wikimedia.org from Ready to Deploy to Done on the Analytics-Kanban board.
Mon, Feb 22, 7:18 PM · Patch-For-Review, Analytics-Kanban, Analytics, Analytics-Wikistats
Milimetric moved T273083: Filter out webrequest where debug=1 from pageview from Ready to Deploy to Done on the Analytics-Kanban board.

@jijiki & @ema: just following up on this, everything was deployed on our side and looks to be working. If you've seen the actual data, let us know if it doesn't look right (for example https://w.wiki/326j)

Mon, Feb 22, 7:18 PM · Analytics-Kanban, Analytics
Milimetric assigned T274384: Repackage spark without hadoop, use provided hadoop jars to Ottomata.
Mon, Feb 22, 7:15 PM · Analytics-Kanban, Patch-For-Review, Analytics
Milimetric assigned T275171: Growth: shorten welcome survey retention to 90 days to mforns.
Mon, Feb 22, 4:53 PM · Growth-Team (Current Sprint), Analytics-Radar, Growth-Scaling, Product-Analytics
Milimetric assigned T275172: Growth: update welcome survey aggregation schedule to mforns.
Mon, Feb 22, 4:52 PM · Growth-Team (Current Sprint), Product-Analytics (Kanban), Analytics-Radar, Growth-Scaling
Milimetric assigned T274986: Purge deprecated reportupdater outputs to mforns.
Mon, Feb 22, 4:50 PM · Analytics
Milimetric claimed T274647: Archive #Analytics-Visualization (which seems to be about Limn)?.
Mon, Feb 22, 4:45 PM · Project-Admins, Analytics-Visualization, Analytics
Milimetric added a comment to T274647: Archive #Analytics-Visualization (which seems to be about Limn)?.

Hi @Aklapper, yes, definitely please archive Analytics-Visualization, decline all the tasks in bulk if you could. Let me know if I can help.

Mon, Feb 22, 4:44 PM · Project-Admins, Analytics-Visualization, Analytics
Milimetric moved T274297: Growth: remove deletion timers for Growth's sanitized EL tables from Incoming to Security Maturity and Data Privacy on the Analytics board.
Mon, Feb 22, 4:42 PM · Analytics-Kanban, Growth-Scaling, Product-Analytics, Analytics, Growth-Team
Milimetric assigned T274297: Growth: remove deletion timers for Growth's sanitized EL tables to mforns.
Mon, Feb 22, 4:42 PM · Analytics-Kanban, Growth-Scaling, Product-Analytics, Analytics, Growth-Team

Fri, Feb 19

Milimetric renamed T274557: Broken graphs on wikis accounting for 45% of our client side errors from Broken graphs on wikis accounting for 20% of our client side errors to Broken graphs on wikis accounting for 45% of our client side errors.
Fri, Feb 19, 4:38 PM · MW-1.36-notes (1.36.0-wmf.32; 2021-02-23), covid-19, Reading-Web-Local-Wiki-Issues, MediaWiki-extensions-Graph

Thu, Feb 18

Milimetric committed rARPQbf42288aac98: Remove invalid job, echo tables no longer where expected (authored by Milimetric).
Remove invalid job, echo tables no longer where expected
Thu, Feb 18, 2:40 AM
Milimetric committed rARPQe6339fd13032: Fix more syntax errors (authored by Milimetric).
Fix more syntax errors
Thu, Feb 18, 2:28 AM
Milimetric committed rARPQb2ede546f8ec: Remove disabled jobs from reportupdater (authored by Milimetric).
Remove disabled jobs from reportupdater
Thu, Feb 18, 2:17 AM

Wed, Feb 17

Milimetric committed rARPQe1d180d35f04: Revert to escaping to not break header evolution (authored by Milimetric).
Revert to escaping to not break header evolution
Wed, Feb 17, 5:58 PM

Tue, Feb 16

Milimetric committed rARPQe19c2382f014: Avoid reserved keyword `date` (authored by awight).
Avoid reserved keyword `date`
Tue, Feb 16, 9:51 PM
Milimetric committed rARPQ681be0659ac7: Drop redundant operations on literal date (authored by awight).
Drop redundant operations on literal date
Tue, Feb 16, 9:51 PM
Milimetric committed rARPQba51a3002f29: Update commons_file_usage_in_wikimedia_projects logic per Isaac (authored by Milimetric).
Update commons_file_usage_in_wikimedia_projects logic per Isaac
Tue, Feb 16, 9:29 PM
Milimetric updated the task description for T274322: Clean up issues with jobs after Hadoop Upgrade.
Tue, Feb 16, 9:15 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric added a comment to T274880: Deployment access request for some analytics repos.

Big +2 from me for access to reportupdater-queries and any access needed to rerun jobs if needed. Deployment there is a matter of puppet-sync, and queries are all completely separate.

Tue, Feb 16, 2:26 PM · Analytics, WMDE-TechWish

Fri, Feb 12

Milimetric updated the task description for T274322: Clean up issues with jobs after Hadoop Upgrade.
Fri, Feb 12, 8:09 PM · Patch-For-Review, Analytics-Kanban, Analytics

Thu, Feb 11

Milimetric updated subscribers of T263489: AQS 2.0.

Ping @Pchelolo, @lexnasser was looking at this as the next thing he might focus on. I hesitated to ping before because I know your plate's full. My question is, what are your plans with this upgrade, and can we take over part of it with Lex as a resource? So, one option that might work would be you & team do the service-template-node updates (Cassandra support, etc. as discussed above), and we (mostly Lex) do the AQS rewrite (maybe even TypeScript - if we can trick Lex :P). Thoughts?

Thu, Feb 11, 8:13 PM · Code-Health-Objective, Platform Engineering Roadmap, Platform Team Initiatives (API Gateway), Analytics, Epic
Milimetric updated the task description for T274322: Clean up issues with jobs after Hadoop Upgrade.
Thu, Feb 11, 2:59 AM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric added a comment to T274322: Clean up issues with jobs after Hadoop Upgrade.

Update: all cassandra jobs restarted and seem ok, except mediarequests per_file daily. Patch for that is WIP above. See the note in the description: once it's figured out, it needs to start at 2021-02-09. Other problems fixed and jobs restarted.

Thu, Feb 11, 2:52 AM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric updated the task description for T274322: Clean up issues with jobs after Hadoop Upgrade.
Thu, Feb 11, 2:49 AM · Patch-For-Review, Analytics-Kanban, Analytics

Wed, Feb 10

Milimetric added a comment to T274322: Clean up issues with jobs after Hadoop Upgrade.

mediacounts and mediarequest have the same problem now that the syntax was worked out. Some hours, but not all hours, fail because of the way UDFs return structs. I think the best idea was Joseph's, to run them as spark sql (or pyspark if that's easier). But in the long term, we need to standardize these kinds of jobs and run them all the same way with the same boilerplate and only changing the query.

Wed, Feb 10, 2:25 AM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric updated the task description for T274322: Clean up issues with jobs after Hadoop Upgrade.
Wed, Feb 10, 2:23 AM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric created T274322: Clean up issues with jobs after Hadoop Upgrade.
Wed, Feb 10, 12:40 AM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, Feb 9

Milimetric reassigned T273374: Uncaught TypeError: navigator.sendBeacon is not a function from Milimetric to Ottomata.
Tue, Feb 9, 4:44 PM · Patch-For-Review, Analytics-Kanban, Analytics, Analytics-EventLogging

Mon, Feb 8

Milimetric added a comment to T273741: Investigate unusual media traffic pattern for AsterNovi-belgii-flower-1mb.jpg on Commons.

+1 to @Gilles's idea. Reverse image searches don't yield anything obvious.

Mon, Feb 8, 5:28 PM · Patch-For-Review, Commons, Traffic, SRE
Milimetric added a comment to T273685: Turnilo "Display Druid query" gives "general error".

Weird, it works on https://w.wiki/yPx for example. It seems that something's different about the netflow dataset, we'll try and brainstorm what that might be.

Mon, Feb 8, 4:45 PM · Analytics

Fri, Feb 5

Milimetric edited projects for T98396: WMF-Last-Access cookie breaks Java client, added: Analytics; removed Analytics-Clusters.
Fri, Feb 5, 8:15 PM · Analytics, Analytics-Kanban
Milimetric added a comment to T272530: Adding a graph to a page doubles JS payload on mobile and desktop.

To add to Gilles's points:

Fri, Feb 5, 8:11 PM · Performance-Team (Radar), Readers-Web-Backlog (Tracking), Mobile, MediaWiki-extensions-Graph
Milimetric added a comment to T261681: Add time interval limits to pageview API.

The oversight is my fault, my apologies, I was too focused on our team's usage of the API. The initial motivation was high volume or wide timespan queries from a single user agent. Maybe the right solution isn't an overall limit but per-UA limits. I think the "All Time" use case is perfectly valid and I don't want to force you to rewrite that if there's an alternative. @lexnasser what do you think about a per-UA limit?

Fri, Feb 5, 8:00 PM · Patch-For-Review, Analytics
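
The per-UA limit floated in the comment on T261681 above could look roughly like the sketch below. It is not AQS code: the class name, the sliding-window approach, and the threshold values are all assumptions made for illustration, and the caller passes in the current time so the behavior is easy to check.

```python
from collections import defaultdict, deque


class PerUserAgentLimiter:
    """Hypothetical sliding-window limiter keyed by user agent string."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # One queue of recent request timestamps per user agent.
        self._hits = defaultdict(deque)

    def allow(self, user_agent: str, now: float) -> bool:
        """Return True if this user agent may make a request at time `now`."""
        hits = self._hits[user_agent]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Illustrative values only: 2 requests per 60 seconds per user agent.
limiter = PerUserAgentLimiter(max_requests=2, window_seconds=60.0)
```

Because each user agent gets its own window, one heavy client hitting its limit does not affect an "All Time" query coming from a different client.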

Feb 3 2021

Milimetric created T273818: Build a process to check permissions when changing datasets from non-PII to PII.
Feb 3 2021, 8:35 PM · Analytics
Milimetric added a comment to T272883: Agree on and document processes for review of WVUI code.

I'll just jump in here and say that a few of us at the foundation have been arguing for a Design System. This is defined here and brought into our context by Santosh. The basic idea is that we've had too many false starts trying to centralize our design ideas, and that this is a precious opportunity to approach it in a more holistic way.

Feb 3 2021, 1:47 AM · Vue.js Migration

Feb 2 2021

Milimetric moved T273374: Uncaught TypeError: navigator.sendBeacon is not a function from Next Up to In Progress on the Analytics-Kanban board.

@Amorymeltzer: I believe you, but something's not making sense. navigator.sendBeacon has been available in Firefox since v31, in 2014 (https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon) and your version is just from May of last year. Something is disabling / removing navigator.sendBeacon somehow. Could be some rogue code executing from an extension in your browser, gadget you have enabled, etc. I found very few references to other folks experiencing the same problem, like this obscure chat (look for the error): https://mozilla.logbot.info/fxos/20151112/raw.

Feb 2 2021, 11:36 PM · Patch-For-Review, Analytics-Kanban, Analytics, Analytics-EventLogging
Milimetric added a project to T273374: Uncaught TypeError: navigator.sendBeacon is not a function: Analytics-Kanban.
Feb 2 2021, 11:22 PM · Patch-For-Review, Analytics-Kanban, Analytics, Analytics-EventLogging
Milimetric moved T273470: Bug: Active Editors showing July numbers from Next Up to Ready to Deploy on the Analytics-Kanban board.
Feb 2 2021, 5:17 PM · Analytics-Kanban, Analytics
Milimetric moved T269883: Wikistats map's choropleth shows the same color for 0 and minimum nonzero value from Next Up to Ready to Deploy on the Analytics-Kanban board.
Feb 2 2021, 5:17 PM · Analytics-Kanban, Analytics
Milimetric moved T262725: "Active editors" panel keeps flashing on stats.wikimedia.org from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Feb 2 2021, 5:15 PM · Patch-For-Review, Analytics-Kanban, Analytics, Analytics-Wikistats

Feb 1 2021

Milimetric added a comment to T261681: Add time interval limits to pageview API.

Agreed with @JAllemandou, if I was thinking anything fancier than hard-coded start dates for each dataset, it's lost in some dusty corner of my brain. Thanks for taking this, glad it's straightforward.

Feb 1 2021, 11:21 PM · Patch-For-Review, Analytics
Milimetric moved T262725: "Active editors" panel keeps flashing on stats.wikimedia.org from In Progress to In Code Review on the Analytics-Kanban board.
Feb 1 2021, 2:15 PM · Patch-For-Review, Analytics-Kanban, Analytics, Analytics-Wikistats
Milimetric created T273470: Bug: Active Editors showing July numbers.
Feb 1 2021, 1:57 PM · Analytics-Kanban, Analytics

Jan 29 2021

Milimetric added a comment to T272530: Adding a graph to a page doubles JS payload on mobile and desktop.

To clarify on my broken heart, this is what I explained would happen in my RFC to replace graphoid: T249419. I really hope we can prioritize pushing that forward.

Jan 29 2021, 9:54 PM · Performance-Team (Radar), Readers-Web-Backlog (Tracking), Mobile, MediaWiki-extensions-Graph
Milimetric awarded T272530: Adding a graph to a page doubles JS payload on mobile and desktop a Heartbreak token.
Jan 29 2021, 9:53 PM · Performance-Team (Radar), Readers-Web-Backlog (Tracking), Mobile, MediaWiki-extensions-Graph
Milimetric added a comment to T40010: RFC: Re-evaluate librsvg as SVG renderer on Wikimedia wikis.

The above sound like very workable plans, thank you both for stepping up. To be clear, I can't coordinate this work, but hopefully as this goes through the new process we can find someone who can.

Jan 29 2021, 9:52 PM · TechCom-RFC, MediaWiki-File-management, Commons, Multimedia, Wikimedia-SVG-rendering
Milimetric added a comment to T271953: Add client TCP source port to webrequest.

The pageview definition was changed to react to the debug header, and the change to load webrequest_128 is up for review (both linked to T273083).

Jan 29 2021, 9:50 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric created T273329: Groom Incoming for Security Tasks.
Jan 29 2021, 9:23 PM · Analytics
Milimetric moved T273083: Filter out webrequest where debug=1 from pageview from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Jan 29 2021, 8:19 PM · Analytics-Kanban, Analytics
Milimetric added a comment to T271568: Follow up on Druid alarms not firing when Druid indexations were failing due to permission issues.

For what it's worth, I agree with your suggestion above, we could just unconditionally sys.exit(if ...). If there's a reason not to do that, I'd be interested.

Jan 29 2021, 2:52 PM · Analytics-Kanban, Analytics
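
A minimal sketch of the "unconditionally sys.exit" idea from the comment on T271568 above. The status strings and function names are hypothetical; the real indexation script may report its final state differently.

```python
import sys


def exit_code_for(task_status: str) -> int:
    """Map a final indexation status to a process exit code (0 means success)."""
    # Assumption: anything other than an explicit "SUCCESS" counts as a failure.
    return 0 if task_status == "SUCCESS" else 1


def main(task_status: str) -> None:
    # Always exit through sys.exit with the computed code, so a failed
    # indexation can never fall through to an implicit exit status of 0.
    sys.exit(exit_code_for(task_status))
```

A wrapper or alerting check that watches the exit status would then see a non-zero code for any run that did not finish successfully.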

Jan 28 2021

Milimetric moved T273083: Filter out webrequest where debug=1 from pageview from Next Up to In Code Review on the Analytics-Kanban board.
Jan 28 2021, 4:32 PM · Analytics-Kanban, Analytics
Milimetric claimed T273083: Filter out webrequest where debug=1 from pageview.
Jan 28 2021, 4:32 PM · Analytics-Kanban, Analytics

Jan 26 2021

Milimetric added a comment to T272926: Upgrade UA Parser to 1.5.1+.

https://github.com/ua-parser/uap-java/wiki/ChangeLog

Jan 26 2021, 12:15 AM · Analytics-Kanban, Analytics
Milimetric created T272926: Upgrade UA Parser to 1.5.1+.
Jan 26 2021, 12:15 AM · Analytics-Kanban, Analytics

Jan 22 2021

Milimetric raised the priority of T261681: Add time interval limits to pageview API from Medium to High.

This should be re-prioritized to high, and perhaps the parent task should be reopened and revisited because we're seeing service disruption caused by single UAs doing either large volumes or big queries.

Jan 22 2021, 7:20 PM · Patch-For-Review, Analytics

Jan 21 2021

Milimetric added a subtask for T120242: Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth: T215001: Revisions missing from mediawiki_revision_create.
Jan 21 2021, 6:21 PM · WMF-Architecture-Team, Platform Team Legacy (Later), Analytics, Event-Platform, Services (later)
Milimetric added a parent task for T215001: Revisions missing from mediawiki_revision_create: T120242: Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth.
Jan 21 2021, 6:21 PM · Analytics-Kanban, Growth-Team, Product-Analytics, Analytics
Milimetric added a comment to T240995: AQS is not OpenAPI 3 compliant.

ping @Pchelolo: what's the latest plan on this?

Jan 21 2021, 6:20 PM · Analytics-Kanban, Patch-For-Review, Analytics

Jan 19 2021

Milimetric added a comment to T271953: Add client TCP source port to webrequest.

Hi! We'd like to add this to the X-Analytics header, if that's ok with everyone. This way we don't have to add a new field. Here's another task that's adding a debug flag this way: T263683, along with a patch for example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/629735

Jan 19 2021, 5:23 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric added a comment to T263683: Mechanism to flag webrequests as "debug".

ping @jijiki on the above question ^. In the meantime we have another request to add client source port to this data, so I wanted to bundle both changes together. That's tracked here T271953. Maybe we can meet quickly to sort this out?

Jan 19 2021, 5:22 PM · serviceops, Analytics-Kanban, Analytics, User-jijiki
Milimetric added a comment to T40010: RFC: Re-evaluate librsvg as SVG renderer on Wikimedia wikis.

I'm sorry, I found this buried somewhere in my notes, that I was supposed to post this on Commons at some point, as a call for product management on a potential switch. Putting it here, as the RFC process winds down, just so it's not lost. But I think it's almost a year old at this point:

Jan 19 2021, 4:08 PM · TechCom-RFC, MediaWiki-File-management, Commons, Multimedia, Wikimedia-SVG-rendering

Jan 14 2021

Milimetric lowered the priority of T262201: Gather all data-purge into a single job from High to Medium.
Jan 14 2021, 6:04 PM · Analytics
Milimetric added a subtask for T272060: Implement Data Governance Tool: T262201: Gather all data-purge into a single job.
Jan 14 2021, 6:04 PM · Privacy Engineering, Analytics
Milimetric added a parent task for T262201: Gather all data-purge into a single job: T272060: Implement Data Governance Tool.
Jan 14 2021, 6:04 PM · Analytics
Milimetric moved T272060: Implement Data Governance Tool from Incoming to Smart Tools for Better Data on the Analytics board.
Jan 14 2021, 6:04 PM · Privacy Engineering, Analytics
Milimetric triaged T272060: Implement Data Governance Tool as Medium priority.
Jan 14 2021, 6:01 PM · Privacy Engineering, Analytics
Milimetric added a comment to T271571: Update Image usage metric.

@cchen: patch is up, will be reviewed shortly and will apply on the next monthly run. I'm wondering if there's a need to backfill old data, I think we keep at least one old snapshot, but it's a lot of computation so let me know if it's important.

Jan 14 2021, 2:15 PM · Analytics-Kanban, Product-Analytics, Analytics

Jan 12 2021

Milimetric added a comment to T215858: Plan a replacement for wiki replicas that is better suited to typical OLAP use cases than the MediaWiki OLTP schema.

I'm happy to advise someone working on this, but I can't drive the work. We have had to re-focus and trim down a lot of scope, and we're struggling with all the changes on our team.

Jan 12 2021, 7:53 PM · cloud-services-team (Kanban), Data-Services, Analytics
Milimetric updated subscribers of T120242: Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth.

@Clarakosi: I think @Ottomata meant to ping you above, adding here.

Jan 12 2021, 7:49 PM · WMF-Architecture-Team, Platform Team Legacy (Later), Analytics, Event-Platform, Services (later)
Milimetric added a comment to T215001: Revisions missing from mediawiki_revision_create.

note to self: re-evaluate after the 503s above got better

Jan 12 2021, 7:43 PM · Analytics-Kanban, Growth-Team, Product-Analytics, Analytics
Milimetric added a comment to T133452: RFC: Create temporary accounts for anonymous editors.

One potential solution we can borrow from Google docs is to assign random names to users. This would be trickier at our scale than on a document shared with a few dozen people, but could be possible. Of course, we allow actual users to have any name they want, so styling still comes into play.

Jan 12 2021, 7:41 PM · Privacy Engineering, TechCom-RFC, User-Tgr, WMF-Legal, Privacy, MediaWiki-Authentication-and-authorization
Milimetric moved T268811: AQS should be more resilient to druid nodes not available from In Code Review to Done on the Analytics-Kanban board.

Deployed on January 5th according to the train etherpad. Change was https://gerrit.wikimedia.org/r/c/analytics/aqs/+/649884

Jan 12 2021, 7:37 PM · Analytics-Kanban, Analytics
Milimetric added a comment to T137291: Transition all use of EasyTimeline to the Graph extension and decommission it from Wikimedia's servers.

@Seb35: apologies for the drive-by comment but it should actually be quite easy to write either a static conversion script or a dynamic Lua script:

Jan 12 2021, 7:31 PM · Technical-Debt, Wikimedia-Extension-setup, Multimedia, Epic, MediaWiki-extensions-Graph, EasyTimeline
Milimetric moved T188859: Wikistats 2.0: Add statistics for the geographical origin of the contributors from In Code Review to Done on the Analytics-Kanban board.
Jan 12 2021, 7:10 PM · Analytics-Kanban, Analytics, Analytics-Wikistats
Milimetric added a comment to T249419: RFC: Render data visualizations on the server.

I shall not let this go stale for more than a few months. I had a meeting with Product to try and figure out how we schedule this, but so far we're just in the brainstorming phase. I'll update this task even as we shift to a different decision making process, so anyone interested can stay subscribed here.

Jan 12 2021, 7:04 PM · covid-19, TechCom-RFC
Milimetric added a comment to T270140: Release dataset on top search engine referrers by country, device, and language.

@bmansurov that's a good overview, but it can be too detailed and not all of it is relevant. My suggestion is to look at the pageview hourly job, because you'll be writing something very similar. You're basically depending on the pageview_actor dataset and transforming the data. That's what this job does: https://github.com/wikimedia/analytics-refinery/tree/master/oozie/pageview/hourly. The xml is just setting up that dependency, and this HQL query does the transformation: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/hourly/pageview_hourly.hql

Jan 12 2021, 12:24 AM · Patch-For-Review, Privacy Engineering, Research, Analytics

Jan 11 2021

Milimetric added a comment to T270140: Release dataset on top search engine referrers by country, device, and language.

Thanks @Isaac, hi @bmansurov!! Actually, this should be an oozie job. They're a bit more of a pain to write, but I can help with that. The major benefit is that we get alerts if the pipeline breaks or gets stuck, and it's easier to rerun and backfill. Do ping me everywhere if I'm not responsive enough.

Jan 11 2021, 3:51 PM · Patch-For-Review, Privacy Engineering, Research, Analytics

Jan 4 2021

Milimetric triaged T270140: Release dataset on top search engine referrers by country, device, and language as Medium priority.

We groomed this today, and here are our thoughts:

Jan 4 2021, 5:05 PM · Patch-For-Review, Privacy Engineering, Research, Analytics

Dec 22 2020

Milimetric moved T263055: Add log entry details to page and user events in EventBus from In Progress to In Code Review on the Analytics-Kanban board.
Dec 22 2020, 9:38 PM · Patch-For-Review, Platform Engineering, Analytics-Kanban, Analytics

Dec 17 2020

Milimetric closed T250049: Drop data from Prefupdate schema that is older than 90 days as Resolved.

sudo -u analytics hdfs dfs -rm -r -skipTrash /wmf/data/archive/backup/tmp/event_PrefUpdate_Backup

sudo -u analytics hdfs dfs -rm -r -skipTrash /wmf/data/archive/backup/tmp/event_sanitized_PrefUpdate_Backup
Dec 17 2020, 7:05 PM · Analytics-Kanban, audits-data-retention, Analytics, Product-Analytics, Privacy Engineering, Privacy, Security
Milimetric added a comment to T269308: Uncaught TypeError: Cannot read property 'items' of null / TypeError: null is not an object (evaluating 'view.model().scene().items').

(I'm only tangentially involved with client-side graphs as they relate to the graphoid replacement I'm trying to prioritize. So I'm happy to see @Jseddon slaying this bug. And I'm working with product to prioritize a bigger effort to maintain graphs going forward; progress is very slow there)

Dec 17 2020, 5:14 PM · Product-Infrastructure-Team-Backlog, Mobile, Wikimedia-production-error, covid-19, MediaWiki-extensions-Graph