Thu, Apr 18
@Krinkle: so how is this going to move forward? Do you search around for all uses of schema.Something and make patches? Want me to do that?
Wed, Apr 17
Tue, Apr 16
hashar: these actually don't have anything to do with limn, they all hold configuration for reportupdater reports. We would rename them, but that's not possible in gerrit, right? We could start a conversation about centralizing all of the reportupdater reports into one repo; each set of reports lives in its own folder, so that would work fine. This is probably a decision for Product-Analytics, we'll talk it over with them.
Mon, Apr 15
Nuria: I'm not seeing a problem, but I did deploy the browser dashboard a few times since last time you saw it, maybe it was a caching thing.
ping @AndyRussG, can you confirm that it's ok to delete the old data?
Ok, thanks very much Gergo, looks like Sam set up a meeting for next week, we'll take it from there. Once I understand better how I can help, we should clean up these phab tasks.
Fri, Apr 12
quick drop-by to say that dashiki has support for these stacked bars (though no dashboards are currently using them).
Thanks @Tgr, that's a big pivot from what I was expecting, but hey, let's do it! How/when/who is making the decision on each bullet point? I could drive client and pipeline work, and beg ops for help with the Sentry server / Logstash part. In terms of owning the work going forward, I think Analytics is overloaded at the moment but it makes the most sense there. Extension:Sentry is broadly similar to Extension:EventLogging, and it sounds like EventGate/Kafka is the preferred choice for the pipeline, so that's squarely in our world.
Thu, Apr 11
to push forward on Sentry, we have to sort of do this and T500 at the same time. So let's start here. Two options:
+1 to this and tech talk, btw Sarah and Subbu are looking for more tech talks (see their recent email)
Wed, Apr 10
I would say so, @dr0ptp4kt, I'd maybe even go so far as to host a graph edit-a-thon to upgrade all v1.5 and v2 graphs to v4 and eliminate the problem entirely. I've just been talking with Wikimedia NYC and they wanted to experiment with a bit of tech flavor in their regular edit-a-thons, this would be a perfect start. Obviously we have to support v3 first. I would suggest the following plan:
Tue, Apr 9
Maybe as a quick fix, graphs could initially render as a generic "graph loading" image. Then vega and all needed dependencies could be loaded asynchronously and eventually render the graph. With graphs not yet widely used, this shouldn't add too much load on our static asset serving, but it might become a problem if graphs catch on. Hopefully a suitable solution / replacement for Graphoid would be available by then. For what it's worth, I think this is what Yuri wanted to do at the very beginning, but ops and others raised concerns that resulted in Graphoid.
Hm, I thought I was onto something with the is_bot flag, because the mysql consumer filters out events when useragent.is_bot is true. But I can't back this up from the existing data, mostly because is_bot seems to always be false now (in all data we have for this schema, only 6 records have is_bot true). I reran the queries with AND not useragent.is_bot anyway, and the results are almost always identical. When they're not, my guess is the mysql consumer is dropping events due to restarts, like during the April 1st outage (also see March 7th and 19th).
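A minimal sketch of the bot-filtering behavior described above, assuming events are dicts with a nested `useragent` map; the field name comes from the comment, but the consumer's actual structure and code are hypothetical:

```python
def should_insert(event):
    """Mimic the mysql consumer's filter: drop events flagged as bots.

    Missing useragent data is treated as not-a-bot, which matches the
    observation that is_bot is almost always false in the stored data.
    """
    return not event.get("useragent", {}).get("is_bot", False)

events = [
    {"id": 1, "useragent": {"is_bot": False}},
    {"id": 2, "useragent": {"is_bot": True}},   # dropped by the consumer
    {"id": 3, "useragent": {}},                  # no flag: kept
]
kept = [e["id"] for e in events if should_insert(e)]
```

If the consumer really filters this way, querying the stored data with `AND not useragent.is_bot` should be a near no-op, which is consistent with the identical results above.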
Mon, Apr 8
Thanks @Bawolff, I'll reopen this as it's enough of a pain for me and other workflows that it needs to be taken care of. Feel free to unsubscribe.
The easiest thing to do is to delete the old data and change the schema going forward. Let me know if this is ok to do, @AndyRussG. If not, I can do a more painful copy/rename/rename thing to keep the old data.
Thu, Apr 4
ah yeah, I did this as part of filtering out all the metrics, and forgot to look after we reverted the desktop filtering
Wed, Apr 3
Would 429s get through to webrequest? I know at some point restbase was sending the 429s, in which case yes; but I forget if the throttling got moved to varnish, in which case it might happen earlier in the request lifecycle?
Hey @Tgr, I'm going to work on this as a side project, to get more familiar with mediawiki. I'm going to read up on the status in the various subtasks, let me know if there's somewhere obvious I should start.
Tue, Apr 2
@DStrine can you let us know what you'd like to do here? It's not technically complicated, but it's a little time-sensitive in case you want to look at raw EventLogging data (which gets dropped after 90 days without a whitelist policy)
someone did this for us :)
Mon, Apr 1
oh! thanks! I was confused, now I'm not :)
@Bawolff I ran into the same issue, and sadly, as Manuel pointed out, we can't do that going forward. Our team's proposal is to replicate mysql to Hadoop via a solution like Debezium. We could then have semi-real-time data queryable on the cluster. We think the lag would be on the order of hours for tables that need data transforms, and minutes for tables that are append-only. So we could have an append-only version of revision, for example; but to edit rows in place, like revision deletion does, we'd need to apply some transformation to the existing data based on the replication coming in.
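A toy sketch of the two replication modes described above: append-only tables just accumulate inserts (cheap, low lag), while tables that allow in-place edits, like revision deletion, need updates applied to existing rows. The event shape and function names here are illustrative, not Debezium's actual change-event format:

```python
def apply_append_only(table, event):
    """Append-only mode: only inserts land; updates/deletes are ignored,
    so rows are never mutated after they are written."""
    if event["op"] == "insert":
        table.append(event["row"])

def apply_with_transforms(table, event):
    """Transform mode: updates are folded into existing rows, which is
    what something like revision deletion would require."""
    if event["op"] == "insert":
        table.append(event["row"])
    elif event["op"] == "update":
        for row in table:
            if row["id"] == event["row"]["id"]:
                row.update(event["row"])  # mutate the stored row in place

revisions = []
apply_with_transforms(revisions, {"op": "insert", "row": {"id": 7, "deleted": False}})
apply_with_transforms(revisions, {"op": "update", "row": {"id": 7, "deleted": True}})
```

The extra scan-and-merge work in the second function is the rough intuition for why transformed tables would lag by hours while append-only ones lag by minutes.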
warning: if this is not fun, reassign to dan or defer to after-beta
ping @fdans can you test if this gets better with your new time selector and if not, let's talk