Milimetric (Dan Andreescu)
User

Projects (13)

User Details

User Since
Oct 8 2014, 5:48 PM (217 w, 4 d)
Availability
Available
LDAP User
Milimetric
MediaWiki User
Milimetric (WMF) [ Global Accounts ]

Recent Activity

Fri, Dec 7

Milimetric moved T177950: Add a tooltip to all non-obvious concepts like split categories, abbreviations from Next Up to In Progress on the Analytics-Kanban board.
Fri, Dec 7, 8:40 PM · Analytics-Kanban, Analytics, Analytics-Wikistats
Milimetric moved T187207: Spin out a tiny EventLogging RL module for lightweight logging from In Progress to In Code Review on the Analytics-Kanban board.
Fri, Dec 7, 5:13 PM · MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Patch-For-Review, Analytics-Kanban, Performance-Team (Radar), Analytics, Analytics-EventLogging
Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

To duplicate something like check_private_data in Hadoop, I'd guess a day to write it and a couple of days to review and test it. So probably like a week to get everything deployed and integrated with the current job. We have to change some other jobs too to make them depend on this check.

Fri, Dec 7, 3:16 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

Whilst we still discuss if this JOIN "feature" can do what we think

Fri, Dec 7, 2:31 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics

Thu, Dec 6

thiemowmde awarded T106244: URL encoded values using fallback 8-bit encoding (invalid UTF-8) cause mediawiki.Uri to crash a Heartbreak token.
Thu, Dec 6, 1:42 PM · MediaWiki-General-or-Unknown, Performance-Team, JavaScript

Wed, Dec 5

Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

On the issue of storage, the average per wiki would definitely not come out to 5GB. If you plot all wikis by number of revisions, the curve has a very long tail, with big wikis like enwiki, commons, wikidata being more outliers than the norm. I just did a basic count(*) in hadoop and enwiki has 801325542 out of a total of 3704431563 revisions across all wikis across all time. If the comment and actor table scale roughly in line with this*, and enwiki has a 31GB materialized comment view as @Banyek mentioned, then my estimate for the overall size of the materialized comment view is around 150GB. Actor should be smaller because it benefits from sharing. I'm not sure if that's still too big, but just wanted to say I have good reason to think it's not as bad as the 4500GB estimate.
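
The arithmetic behind that estimate can be reproduced in a few lines. This is only a sketch: it uses the three figures quoted in the comment and assumes, as the comment does, that view size scales linearly with revision count. The function and constant names are made up for illustration.

```python
# Figures quoted in the comment above.
ENWIKI_REVISIONS = 801_325_542
ALL_WIKI_REVISIONS = 3_704_431_563
ENWIKI_COMMENT_VIEW_GB = 31  # reported size of the materialized enwiki comment view


def estimate_total_view_gb(sample_gb, sample_revisions, total_revisions):
    """Scale one wiki's view size by revision count (linear scaling is an assumption)."""
    return sample_gb * total_revisions / sample_revisions


enwiki_share = ENWIKI_REVISIONS / ALL_WIKI_REVISIONS
estimate_gb = estimate_total_view_gb(
    ENWIKI_COMMENT_VIEW_GB, ENWIKI_REVISIONS, ALL_WIKI_REVISIONS
)

print(f"enwiki share of all revisions: {enwiki_share:.1%}")
print(f"estimated total comment view size: {estimate_gb:.0f} GB")
```

Under these assumptions the result lands a little under the "around 150GB" figure, and far below the 4500GB estimate.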

Wed, Dec 5, 10:36 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

Thanks very much @Anomie, I understand my misunderstanding, and your third answer is what I was asking.

Wed, Dec 5, 5:44 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

According to the actor view definition: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/templates/labs/db/views/maintain-views.yaml#222, a particular actor_id is not present if it's sanitized by *any* log/rev/ar/etc_deleted flag.

You seem to have the logic backwards. The actor view will expose the row if any log/rev/ar/etc_deleted flag doesn't "sanitize" it, or if it's referenced from the user or image tables which don't even have an xx_deleted flag.
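
As a rough illustration of the rule described in this reply (not the actual view definition in maintain-views.yaml), the exposure check can be modeled as "any unsanitized reference exposes the actor". The `Reference` type and `actor_is_exposed` function are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Reference:
    """One row that points at an actor_id (hypothetical model, not a real schema)."""
    table: str                  # e.g. "revision", "archive", "logging", "user", "image"
    sanitized: Optional[bool]   # None for tables without an xx_deleted flag


def actor_is_exposed(references: Iterable[Reference]) -> bool:
    """The actor row is visible if at least one reference does not sanitize it.

    References from tables without an xx_deleted flag (sanitized is None)
    always count as unsanitized.
    """
    return any(not ref.sanitized for ref in references)


# rev_deleted hides the actor on the revision, but the archive row does not.
refs = [Reference("revision", True), Reference("archive", False)]
print(actor_is_exposed(refs))
```

In this model an actor is hidden only when every row that references it is sanitized, which is the case raised earlier in the thread.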

Wed, Dec 5, 5:00 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

Following up on a good problem that @Bstorm raised with my approach. I would love @Anomie to take a look at my comment as well and see if it makes sense. Ok, so to confirm @Marostegui's understanding, yes, we are importing archive, logging, and revision from the views in the cloud replicas, and actor, comment unsanitized from the production replicas. And the only way we use rows from actor and comment is to join them to rows from the three views, which are sanitized. And I believe what @Bstorm was pointing out was that if rev_deleted is set to sanitize rev_actor, that would be fine for the revision table. But if the archive table references the same actor_id through ar_actor, but the ar_deleted flag is not set, then that would not be sanitized. According to the actor view definition: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/templates/labs/db/views/maintain-views.yaml#222, a particular actor_id is not present if it's sanitized by *any* log/rev/ar/etc_deleted flag.

Wed, Dec 5, 4:47 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric updated subscribers of T210693: Create materialized views on Wiki Replica hosts for better query performance.
Wed, Dec 5, 4:30 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics

Tue, Dec 4

Milimetric added a comment to T210522: Refactor Sqoop, join actor and comment from analytics replicas.

I have done a limited test on 3 wikis: etwiki, simplewiki, and hawiktionary. Description of the test and results:

Tue, Dec 4, 9:49 PM · Analytics-Kanban, Analytics
Milimetric added a comment to T210749: Hardware for cloud db replicas for analytics usage.

The other reason for a single host instead of redundant hosts is this: our only critical use of the box is during the first few days of the month. The box would have to break during those first days to really impact us; if it breaks at any other time and it's rebuilt by the time we need it, we're fine. So that further mitigates our risk. Assuming we can query from the other replicas in the worst case scenario, I think we're fine with just one additional box.

Tue, Dec 4, 8:17 PM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

And to follow up on my first bullet from before:

Tue, Dec 4, 8:12 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

+1 to @Bstorm's framing of the problem. I was going to say the same thing, and withdraw my revision query from above from @Banyek's consideration. We can work around our performance problem with or without more hardware (more is better), but everyone wins if we get faster queries. What the DBAs should consider, with our help, is just improving the performance in general as @Bstorm outlined.

Tue, Dec 4, 7:58 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric added a comment to T210313: Statistics for views of individual Wikimedia images.

Still, if we got all the media requests from Media Viewer into EventLogging, we would not have all media requests for mediawiki in general. To do that, we'd have to go around instrumenting any place that fetches and renders media (in core, extensions, etc.). I think if we want thorough and sensical mediacounts until all that work happens, we need to handle the filtering on Hadoop.

Tue, Dec 4, 4:17 PM · Analytics, Tool-Pageviews
Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.
  1. exact query for a materialized view that would allow us to import the revision table into Hadoop. There are two possible ways to do this, depending on where the query runs:
Tue, Dec 4, 3:59 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

Thanks to @Banyek for the questions and a talk we just had over hangouts, we decided to go in two directions in parallel:

Tue, Dec 4, 3:48 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics

Mon, Dec 3

Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

Yes, all the templates for the queries are here, and they're easy to read, basic python templating: https://github.com/wikimedia/analytics-refinery/blob/7cf62e1a7b9cc65cdf229e8a13ced8f6769f0a14/python/refinery/sqoop.py#L180 (that's archive, but scroll down and you'll see all the tables).

Mon, Dec 3, 8:23 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric added a comment to T210749: Hardware for cloud db replicas for analytics usage.

Ok, done and agreed. But instead of trying to find hardware that will keep up with replication, I'm asking whether replication is necessary. Could we do it any other way, given the relatively simple requirements?

Mon, Dec 3, 7:58 PM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
Milimetric added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

More details on requirements for Analytics. The queries that we'll be running will be of the form:

Mon, Dec 3, 7:56 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric added a comment to T210749: Hardware for cloud db replicas for analytics usage.

Got it. Luca asked me to comment here describing exactly what we need to do on the boxes. But if basic replication can't even run, I of course defer to you all. It does raise questions about this approach, though: we'd be spending so many resources to replicate all operations in real time for hundreds of tables, when ultimately we need a snapshot of a small minority of tables.

Mon, Dec 3, 7:47 PM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
Milimetric added a comment to T210749: Hardware for cloud db replicas for analytics usage.

The queries that we'll be running on here will be of the form:

Mon, Dec 3, 7:30 PM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
Milimetric updated subscribers of T210693: Create materialized views on Wiki Replica hosts for better query performance.

@Marostegui, we would like to go over plans for implementation during our Wednesday meeting. Is there anything else you'd like us to define or discuss before then?

Mon, Dec 3, 5:41 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric moved T210541: Update sqoop to work with the new schema from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Mon, Dec 3, 4:39 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric moved T210543: Update refinery-source jobs to join labsdb with actor and comment from In Progress to In Code Review on the Analytics-Kanban board.
Mon, Dec 3, 4:39 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric moved T210522: Refactor Sqoop, join actor and comment from analytics replicas from In Progress to In Code Review on the Analytics-Kanban board.
Mon, Dec 3, 4:39 PM · Analytics-Kanban, Analytics
Milimetric moved T187207: Spin out a tiny EventLogging RL module for lightweight logging from Done to In Progress on the Analytics-Kanban board.
Mon, Dec 3, 4:39 PM · MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Patch-For-Review, Analytics-Kanban, Performance-Team (Radar), Analytics, Analytics-EventLogging
Milimetric added a comment to T187207: Spin out a tiny EventLogging RL module for lightweight logging.

I can take care of this, @Krinkle, unless you're doing it as an urgent matter. Let me know.

Mon, Dec 3, 4:38 PM · MW-1.33-notes (1.33.0-wmf.4; 2018-11-13), MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), Patch-For-Review, Analytics-Kanban, Performance-Team (Radar), Analytics, Analytics-EventLogging
Milimetric added a comment to T210313: Statistics for views of individual Wikimedia images.

FWIW, there is a way to detect that - virtual media views (T89088) were developed for that specific purpose (and MediaViewer sets different headers, at least on reasonably modern browsers). It just wasn't implemented in the mediacounts logic.
Also, I wouldn't call them corner cases - if I remember the stats correctly, for file types supported by Media Viewer preloads would comprise over half of the requests.

Mon, Dec 3, 4:33 PM · Analytics, Tool-Pageviews
Milimetric created T211030: Use virtual image views to filter mediacounts.
Mon, Dec 3, 4:32 PM · Analytics
Milimetric added a comment to T191964: Clickstream dataset for Persian Wikipedia only includes external values.

It gives me application not found :/
Where can I submit a spark job like you did? stat1007? And how can I download my patch to test it?
(If I can test it on my own, I will make sure it works and then I bug you)

Mon, Dec 3, 4:13 PM · Patch-For-Review, Analytics-Kanban, Analytics

Thu, Nov 29

Milimetric added a comment to T210313: Statistics for views of individual Wikimedia images.
  1. I assume that instead of images what we really mean here is "files." E.g., presumably this will also give us a count of pageviews (not plays) to video or audio files?
  2. My understanding is that it is actually pretty common for users to upload images, etc., directly to individual wikis. Can this track that as well?
Thu, Nov 29, 10:26 PM · Analytics, Tool-Pageviews
Milimetric moved T210749: Hardware for cloud db replicas for analytics usage from Incoming to Radar on the Analytics board.
Thu, Nov 29, 6:31 PM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
Milimetric moved T210031: Create alert on EventBus 400 error rate from Incoming to Radar on the Analytics board.
Thu, Nov 29, 6:31 PM · Services (done), EventBus, Analytics
Milimetric raised the priority of T210705: Move turnilo to nodejs 10 from Normal to High.
Thu, Nov 29, 6:30 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
Milimetric raised the priority of T210706: Move AQS to nodejs 10 from Normal to High.
Thu, Nov 29, 6:30 PM · Analytics
Milimetric assigned T210741: EventStreams process occasionally OOMs to Ottomata.
Thu, Nov 29, 6:30 PM · Patch-For-Review, Services, Wikimedia-Stream, Analytics
Milimetric moved T210741: EventStreams process occasionally OOMs from Incoming to Kafka Work on the Analytics board.
Thu, Nov 29, 6:30 PM · Patch-For-Review, Services, Wikimedia-Stream, Analytics
Milimetric moved T209822: Add new wikis to analytics from Incoming to Operational Excellence on the Analytics board.
Thu, Nov 29, 6:14 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric triaged T209822: Add new wikis to analytics as High priority.
Thu, Nov 29, 6:13 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric triaged T209888: Alert and halt mediawiki processing on schema changes as Normal priority.
Thu, Nov 29, 6:12 PM · Analytics
Milimetric renamed T210006: Event counts from Mysql and Hive don't match. Refine is persisting data from crawlers. from Event counts from Mysql and Hive don't match. Hive is persisting data from crawlers. to Event counts from Mysql and Hive don't match. Refine is persisting data from crawlers.
Thu, Nov 29, 6:09 PM · Product-Analytics, Analytics
Milimetric lowered the priority of T210006: Event counts from Mysql and Hive don't match. Refine is persisting data from crawlers. from High to Normal.
Thu, Nov 29, 6:08 PM · Product-Analytics, Analytics
Milimetric triaged T210006: Event counts from Mysql and Hive don't match. Refine is persisting data from crawlers. as High priority.
Thu, Nov 29, 6:08 PM · Product-Analytics, Analytics
Milimetric moved T210099: druid ingestion should calculate 1/sample rate to be able to normalize event counts from Incoming to Smart Tools for Better Data on the Analytics board.
Thu, Nov 29, 6:07 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric triaged T210099: druid ingestion should calculate 1/sample rate to be able to normalize event counts as High priority.
Thu, Nov 29, 6:07 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric triaged T210110: [EventLogging Sanitization] Fix passing of input_path_regex params to Refine as High priority.
Thu, Nov 29, 5:56 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric moved T210306: Feedback on Wikistats 2 new edits pages from Incoming to Radar on the Analytics board.
Thu, Nov 29, 5:56 PM · Analytics, Internet-Archive, Analytics-Wikistats
Milimetric triaged T210313: Statistics for views of individual Wikimedia images as High priority.
Thu, Nov 29, 5:55 PM · Analytics, Tool-Pageviews
Milimetric merged T88775: Add mediacounts to pageview API into T210313: Statistics for views of individual Wikimedia images.
Thu, Nov 29, 5:55 PM · Analytics, Tool-Pageviews
Milimetric merged task T88775: Add mediacounts to pageview API into T210313: Statistics for views of individual Wikimedia images.
Thu, Nov 29, 5:55 PM · Multimedia, Analytics
Milimetric added a comment to T210313: Statistics for views of individual Wikimedia images.

Will merge others into this, but keep in mind this nice analysis of the storage implications in Cassandra: T88775#4751882

Thu, Nov 29, 5:55 PM · Analytics, Tool-Pageviews
Milimetric claimed T210422: Link to User Contribution page in wikistats UI rather than user page.
Thu, Nov 29, 5:53 PM · Analytics-Kanban, Analytics
Milimetric moved T210422: Link to User Contribution page in wikistats UI rather than user page from Incoming to Wikistats Beta on the Analytics board.
Thu, Nov 29, 5:52 PM · Analytics-Kanban, Analytics
Milimetric triaged T210423: Wikistats2 metric: top article creators as High priority.
Thu, Nov 29, 5:52 PM · Analytics
Milimetric moved T210424: Wikistats2 UX bug: table option should not be available in table graph selected from Incoming to Wikistats Production on the Analytics board.
Thu, Nov 29, 5:51 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric renamed T210424: Wikistats2 UX bug: table option should not be available in table graph selected from Wikistats2 UX bug: table option should not be available in table option (example table view of top edits) to Wikistats2 UX bug: table option should not be available in table graph selected.
Thu, Nov 29, 5:51 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric triaged T210457: Alert on validation errors on new stream intake service as Normal priority.
Thu, Nov 29, 5:50 PM · Analytics
Milimetric triaged T210459: Explore adding validation alarms to eventbus using logstash as High priority.
Thu, Nov 29, 5:50 PM · EventBus, Services (watching), Analytics
Milimetric lowered the priority of T210462: Merge metadata from filtered-tables.txt and maintain-views.yaml from High to Normal.
Thu, Nov 29, 5:49 PM · Analytics
Milimetric triaged T210462: Merge metadata from filtered-tables.txt and maintain-views.yaml as High priority.
Thu, Nov 29, 5:49 PM · Analytics
Milimetric renamed T210543: Update refinery-source jobs to join labsdb with actor and comment from New refinery-source job to join labsdb with actor and comment to Update refinery-source jobs to join labsdb with actor and comment.
Thu, Nov 29, 4:57 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric added a comment to T195473: [GOAL] Invest in the MobileFrontend & MinervaNeue frontend architecture.

Thanks for outlining that, @Jhernandez. I agree with @Jdlrobson about making a separate task, so I did: T210694

Thu, Nov 29, 4:06 AM · Readers-Web-Backlog, MobileFrontend, MinervaNeue, Goal
Milimetric created T210694: Is there a future where we unify front ends.
Thu, Nov 29, 4:05 AM · Readers-Web-Backlog, MobileFrontend
Milimetric updated subscribers of T210693: Create materialized views on Wiki Replica hosts for better query performance.

cc @Nuria so she's in the loop that we're following up here about the materialized views work.

Thu, Nov 29, 3:55 AM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Milimetric triaged T210693: Create materialized views on Wiki Replica hosts for better query performance as High priority.
Thu, Nov 29, 3:54 AM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics

Wed, Nov 28

Milimetric added a comment to T195473: [GOAL] Invest in the MobileFrontend & MinervaNeue frontend architecture.

I think the questions are around unifying Mobile Front End and Core. Is this a good idea, something that's planned at some point, or considered and decided against? And, relevant to this epic, how does the Minerva refactor affect any such plans?

Wed, Nov 28, 9:24 PM · Readers-Web-Backlog, MobileFrontend, MinervaNeue, Goal
Milimetric updated subscribers of T195473: [GOAL] Invest in the MobileFrontend & MinervaNeue frontend architecture.

I also have some questions about this effort on behalf of TechCom. Mainly, is there a future where we unify front ends? If so, we could sketch plans together for how to get that done. If not, do you have thoughts about keeping the separation? @daniel is thinking a lot about not having two front ends, so cc-ing him on any conversation. I personally don't know enough yet, I think, but this is an area of our platform that I've been interested in. I'd like to take a more active role, maybe helping with some basic tasks to get up to speed.

Wed, Nov 28, 2:16 PM · Readers-Web-Backlog, MobileFrontend, MinervaNeue, Goal
Milimetric moved T210570: Dashiki should filter out empty newlines from In Progress to Done on the Analytics-Kanban board.
Wed, Nov 28, 2:51 AM · Patch-For-Review, Analytics, Analytics-Kanban, Analytics-Dashiki
Milimetric added a comment to T210135: Adjust MT graph to clarify the presented concepts.

The undefined line is due to an empty newline at the end of the file. I'll fix it as a bug in dashiki and deploy your dashboard again; no other action will be required. See T210570

Wed, Nov 28, 2:46 AM · CX-analytics, Language-Team (Language-2018-October-December)
Milimetric moved T210570: Dashiki should filter out empty newlines from Next Up to In Progress on the Analytics-Kanban board.
Wed, Nov 28, 2:17 AM · Patch-For-Review, Analytics, Analytics-Kanban, Analytics-Dashiki
Milimetric created T210570: Dashiki should filter out empty newlines.
Wed, Nov 28, 2:17 AM · Patch-For-Review, Analytics, Analytics-Kanban, Analytics-Dashiki
Milimetric updated the task description for T205744: Deprecation Information for EventLogging ResourceLoader modules.
Wed, Nov 28, 2:07 AM · MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), Patch-For-Review, Analytics-Kanban, Analytics-EventLogging, Analytics
Milimetric added a comment to T210320: flake8 errors on wikimetrics.

@rafidaslam, I'll leave this up to you, do whatever you like: ignore 504, 503, or even ignore both. It's a matter of style that I don't think affects the readability of the code too much.

Wed, Nov 28, 1:54 AM · Patch-For-Review, WorkType-Maintenance, Analytics, Analytics-Wikimetrics

Tue, Nov 27

Milimetric moved T210541: Update sqoop to work with the new schema from Next Up to In Code Review on the Analytics-Kanban board.
Tue, Nov 27, 8:17 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric moved T210543: Update refinery-source jobs to join labsdb with actor and comment from Next Up to In Progress on the Analytics-Kanban board.
Tue, Nov 27, 8:17 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric moved T210542: Update datasets definitions and oozie jobs for dual-sqoop of comments and actors from Next Up to In Progress on the Analytics-Kanban board.
Tue, Nov 27, 8:17 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric triaged T210543: Update refinery-source jobs to join labsdb with actor and comment as High priority.
Tue, Nov 27, 8:17 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric triaged T210542: Update datasets definitions and oozie jobs for dual-sqoop of comments and actors as High priority.
Tue, Nov 27, 8:16 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric triaged T210541: Update sqoop to work with the new schema as High priority.
Tue, Nov 27, 8:15 PM · Patch-For-Review, Analytics-Kanban, Analytics
Milimetric moved T210522: Refactor Sqoop, join actor and comment from analytics replicas from Next Up to In Progress on the Analytics-Kanban board.
Tue, Nov 27, 5:31 PM · Analytics-Kanban, Analytics
Milimetric triaged T210522: Refactor Sqoop, join actor and comment from analytics replicas as High priority.
Tue, Nov 27, 5:30 PM · Analytics-Kanban, Analytics
Milimetric added a comment to T165118: Support Vega 3.0 and Vega Lite 2.0.

@domoritz I've been trying to make time to work on it. I think I can start this Friday, thank you for offering to help! I'll ping you with any questions on this task.

Tue, Nov 27, 4:15 PM · Google-Summer-of-Code, Graphs
Milimetric added a comment to T210105: Cassandra JSON fields getting reversed.

@Pchelolo, weird! I just blamed the first thing I saw that made any sense, and now we have no explanation. But, it doesn't matter, like you said the bug was not sorting in the first place. Agreed with Invalid, thanks.

Tue, Nov 27, 4:09 PM · Services

Mon, Nov 26

Milimetric created T210462: Merge metadata from filtered-tables.txt and maintain-views.yaml.
Mon, Nov 26, 10:33 PM · Analytics
Milimetric added a comment to T146774: Add external link to tabs layout.

I tried to register at the https://codein.withgoogle.com/ site, but that site has some major bugs... For example, when signing in, it says I am not authorized to be there. But it also signs me out of Chrome!!! I didn't think that was possible... I'm going to stay away unless someone lets me know how to get around this.

Mon, Nov 26, 10:27 PM · Google-Code-in-2018, goodfirstbug, Analytics, Analytics-Dashiki
Milimetric added a comment to T210091: Pageviews top endpoint in descending order as of 2018-11-20.

Just to follow up to put this issue to rest, the caches are all cleared and the responses are all consistent, going from rank 1 to rank 1000 in order. This is a sort at the service level so it won't be affected by underlying libraries anymore.

Mon, Nov 26, 4:39 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API

Wed, Nov 21

Milimetric added a comment to T210091: Pageviews top endpoint in descending order as of 2018-11-20.

The fix for this has been deployed, but it'll take a while to clear the cache. Sorry for the inconvenience.

Wed, Nov 21, 8:16 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API
Milimetric moved T210091: Pageviews top endpoint in descending order as of 2018-11-20 from Incoming to Operational Excellence on the Analytics board.
Wed, Nov 21, 8:15 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API
Milimetric moved T210091: Pageviews top endpoint in descending order as of 2018-11-20 from In Progress to Done on the Analytics-Kanban board.
Wed, Nov 21, 7:55 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API
Milimetric placed T210022: Allow access to Data Lake/Hive for Niharika up for grabs.
Wed, Nov 21, 7:06 PM · Patch-For-Review, SRE-Access-Requests, Operations, Analytics
Milimetric added a comment to T210022: Allow access to Data Lake/Hive for Niharika.

Private. Niharika would benefit from being a part of analytics-privatedata-users, including access to data before it's sanitized.

Wed, Nov 21, 7:06 PM · Patch-For-Review, SRE-Access-Requests, Operations, Analytics
Milimetric created T210105: Cassandra JSON fields getting reversed.
Wed, Nov 21, 7:04 PM · Services
Milimetric added a comment to T208909: [Bug] Update old nonuniformly distributed page_random values.

Random thought related to this: part of the fun of "random page" was getting really weird things, and this bug might have helped with that. The older the page, the higher the probability it's a more common topic (Britney Spears was added before Free energy principle). Not sure how everyone feels about messing with "random page", but we could for example build a service that picks a random page, making it less likely to be picked if it has high pageviews.
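
A minimal sketch of that idea, assuming a weighting of 1 / (pageviews + 1), which is just one possible choice and not something proposed in the task. The function name and the pageview numbers are invented for illustration.

```python
import random


def pick_random_page(pageviews, rng=random):
    """Pick a title, making pages with many views less likely to be chosen.

    pageviews maps title -> view count. The inverse weighting used here is
    an assumption for this sketch; any decreasing function would work.
    """
    titles = list(pageviews)
    weights = [1.0 / (pageviews[title] + 1) for title in titles]
    return rng.choices(titles, weights=weights, k=1)[0]


# Made-up view counts for the two articles mentioned above.
sample = {"Britney Spears": 99_999, "Free energy principle": 9}
print(pick_random_page(sample, random.Random(0)))
```

With these sample numbers the less-viewed page is chosen almost every time, so a real service would likely want a gentler weighting.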

Wed, Nov 21, 5:37 PM · MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), Patch-For-Review, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q2), DBA, MediaWiki-General-or-Unknown
Milimetric moved T190434: Issues with page deleted dates on data lake from In Progress to Paused on the Analytics-Kanban board.
Wed, Nov 21, 5:12 PM · Analytics, Analytics-Kanban
Milimetric moved T210091: Pageviews top endpoint in descending order as of 2018-11-20 from Next Up to In Progress on the Analytics-Kanban board.
Wed, Nov 21, 5:12 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API
Milimetric claimed T210091: Pageviews top endpoint in descending order as of 2018-11-20.
Wed, Nov 21, 5:12 PM · Patch-For-Review, Analytics-Kanban, Analytics, Pageviews-API

Tue, Nov 20

Milimetric added a comment to T200970: Add logging to gauge TemplateWizard usage.

Sorry to have missed this ping for so long. If you're waiting on me for more than a day, and you need an answer quickly, do ping me on IRC.

Tue, Nov 20, 10:19 PM · MW-1.33-notes (1.33.0-wmf.2; 2018-10-30), Patch-For-Review, Community-Tech-Sprint, Product-Analytics, Community-Tech, MediaWiki-extensions-TemplateWizard
Milimetric added a comment to T209819: Expose new ipblocks_restrictions table to Wiki Replica users.

Great. Once this is available in the cloud replicas, give me a ping and I'll update the Hadoop import to include it.

Tue, Nov 20, 10:12 PM · cloud-services-team (Kanban), Data-Services, Security-Team, Anti-Harassment
Milimetric added a comment to T209031: Not able to scoop comment table in labs for mediawiki reconstruction process.

I think @Bawolff was referring to the automatic query that Sqoop generates against the table you point it at, usually something like select min(id_you_split_by), max(id_you_split_by) from table_to_sqoop. I would hope the mariadb optimizer knows to treat that the same as the order-by-comment approach you mentioned, but you never know :) In any case, we don't have much control over how Sqoop does these queries, but we could choose to import those tables in parallel, which would skip that inefficient query. But that would slow it down as well. Basically it's a tricky problem, and we tried a few different solutions but ultimately the new views are just a bit too slow.
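
To make the boundary query concrete, here is a simplified model of what a split-by import does: ask for the min and max of the split column, then carve that range into one slice per mapper. This is an illustration of the idea only, not Sqoop's actual code; the function names are hypothetical, and the table and column are used as examples.

```python
def boundary_query(table, split_by):
    """Build the min/max query described above for a given table and split column."""
    return f"SELECT MIN({split_by}), MAX({split_by}) FROM {table}"


def split_ranges(min_id, max_id, num_mappers):
    """Divide [min_id, max_id] into contiguous, inclusive ranges, one per mapper."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    low = min_id
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more mappers than ids: skip empty slices
        high = low + size - 1
        ranges.append((low, high))
        low = high + 1
    return ranges


print(boundary_query("comment", "comment_id"))
for low, high in split_ranges(1, 10, 4):
    print(f"WHERE comment_id >= {low} AND comment_id <= {high}")
```

The min/max lookup runs once up front, which is why its cost depends on how the underlying view resolves that single query.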

Tue, Nov 20, 10:09 PM · Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics