User Details
- User Since
- Oct 8 2014, 5:48 PM (334 w, 2 d)
- Availability
- Available
- IRC Nick
- Milimetric
- LDAP User
- Milimetric
- MediaWiki User
- Milimetric (WMF)
Thu, Mar 4
Thank you for flagging! Updated docs to explain: https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FData_Lake%2FEdits%2FMediawiki_history_dumps&type=revision&diff=1901897&oldid=1867966
Oh, my bad, I guess you're working on this. Let me know when you get to re-running, in case you hadn't run into that feature. I think you need my permissions to execute it, but I'm happy to help (ping me on IRC; I'm not real-time on Phab).
Wed, Mar 3
To @Krinkle's point from above, there's not much inside Jon's try block that could be our fault, because it's 99.99% third-party logic running in there, save for which version of Vega we call and so on. If we did something bad, I think graphs would all break and we'd know about it. And the future of the Graph extension is very uncertain right now anyway; hopefully these regressions will drive some product thinking.
This is a bit of a drive-by, but have we considered https://min.io/? I went a bit deeper than just the marketing and was impressed by their error-correcting implementation.
Ok to drop it from everywhere, in my opinion :P but definitely from everywhere that's not HDFS, including stat1006.
Sorry to be so late looking at this. The newest of these files are 2.5 years old, and absolutely zero people besides us know they exist (everyone who did has left the Foundation). Still, I double-checked that they're all safely backed up on HDFS, so I think it's ok to delete them.
Verified that the new version of UA Parser is processing data as of the restart this morning; 2021-03-03T13:00 has the new data.
Mon, Mar 1
A bunch more work to do to package and deploy, but at least the basic patch is tested and done.
- About 18% of unique classifications changed (see the sketch after this list for one way to measure that); we should do this more often, since once a quarter still seems too slow
- None of the changes are hugely problematic for any of the metrics I can think of
- While I was typing this, the UA Parser maintainers fixed a build problem and released 1.5.2, so that seems safe to upgrade to; I'll submit a patch shortly
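To make the "18% changed" comparison reproducible, here's a minimal sketch using the Python uap library (the production pipeline uses the Java UDF, and the input/output format here is made up): run it once per installed ua-parser version and diff the two outputs.

```
# Rough sketch, not the production Hive UDF pipeline: classify user agents
# read from stdin with whichever ua-parser version is installed, printing one
# JSON line each, so the outputs of two runs (old vs. new version) can be
# diffed to estimate how many classifications changed.
import json
import sys

from ua_parser import user_agent_parser  # pip install ua-parser

def classify(ua_string):
    parsed = user_agent_parser.Parse(ua_string)
    return {
        'browser': parsed['user_agent']['family'],
        'os': parsed['os']['family'],
        'device': parsed['device']['family'],
    }

if __name__ == '__main__':
    for line in sys.stdin:  # one user agent string per line
        ua = line.strip()
        if ua:
            print(json.dumps({'ua': ua, 'classification': classify(ua)}, sort_keys=True))
```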
Mon, Feb 22
This was deployed and runs monthly with a week delay. So the next run, around March 6th, should reflect the new logic.
@jijiki & @ema: just following up on this: everything was deployed on our side and looks to be working. If you've seen the actual data, let us know if it doesn't look right (for example https://w.wiki/326j)
Hi @Aklapper, yes, definitely please archive Analytics-Visualization, decline all the tasks in bulk if you could. Let me know if I can help.
Fri, Feb 19
Thu, Feb 18
Wed, Feb 17
Tue, Feb 16
Big +2 from me for access to reportupdater-queries and whatever access is needed to rerun jobs. Deployment there is just a matter of puppet-sync, and the queries are all completely separate.
Fri, Feb 12
Thu, Feb 11
Ping @Pchelolo: @lexnasser was looking at this as the next thing he might focus on. I hesitated to ping before because I know your plate is full. My question is: what are your plans for this upgrade, and can we take over part of it with Lex as a resource? One option that might work: you and your team do the service-template-node updates (Cassandra support, etc., as discussed above), and we (mostly Lex) do the AQS rewrite (maybe even in TypeScript, if we can trick Lex :P). Thoughts?
Update: all Cassandra jobs restarted and seem ok, except mediarequests per_file daily. A patch for that is WIP above. See the note in the description: once figured out, that job needs to start at 2021-02-09. The other problems were fixed and those jobs restarted.
Wed, Feb 10
mediacounts and mediarequest have the same problem now that the syntax has been worked out: some hours, but not all, fail because of the way UDFs return structs. I think the best idea was Joseph's, to run them as Spark SQL (or PySpark if that's easier); a rough sketch of what that could look like is below. But in the long term we need to standardize these kinds of jobs and run them all the same way, with the same boilerplate and only the query changing.
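For reference, a minimal sketch of running one failing hour as PySpark/Spark SQL; the table, columns, and output path are placeholders, not the real refinery schema or the real mediarequest query.

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('mediarequest-hourly-sketch').getOrCreate()

# The hour that failed; in a standardized job this would be the only parameter.
year, month, day, hour = 2021, 2, 9, 0

result = spark.sql(f"""
    SELECT
        uri_path,
        COUNT(1) AS request_count
    FROM wmf.webrequest          -- placeholder source table, not the real query
    WHERE year = {year} AND month = {month} AND day = {day} AND hour = {hour}
    GROUP BY uri_path
""")

# Write that hour back out; path and format are also placeholders.
result.write.mode('overwrite').parquet(
    f'/tmp/mediarequest_sketch/year={year}/month={month}/day={day}/hour={hour}'
)
```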
Tue, Feb 9
Mon, Feb 8
+1 to @Gilles's idea. Reverse image searches don't yield anything obvious.
Weird, it works on https://w.wiki/yPx for example. It seems that something's different about the netflow dataset, we'll try and brainstorm what that might be.
Fri, Feb 5
To add to Gilles's points:
The oversight is my fault, my apologies; I was too focused on our team's usage of the API. The initial motivation was high-volume or wide-timespan queries from a single user agent. Maybe the right solution isn't an overall limit but per-UA limits (a toy sketch of what I mean is below). I think the "All Time" use case is perfectly valid, and I don't want to force you to rewrite that if there's an alternative. @lexnasser, what do you think about a per-UA limit?
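To be concrete about "per-UA limits", here's a toy sketch (not AQS's actual throttling layer; the window size and threshold are invented): a fixed-window counter keyed on the User-Agent string.

```
import time
from collections import defaultdict

WINDOW_SECONDS = 60          # invented values, just to illustrate the idea
MAX_REQUESTS_PER_UA = 100

_counters = defaultdict(lambda: [0, 0.0])  # user agent -> [count, window_start]

def allow_request(user_agent: str, now: float = None) -> bool:
    """Return True if this user agent is still under its limit."""
    now = time.time() if now is None else now
    count, window_start = _counters[user_agent]
    if now - window_start >= WINDOW_SECONDS:
        _counters[user_agent] = [1, now]   # start a new window
        return True
    if count < MAX_REQUESTS_PER_UA:
        _counters[user_agent][0] = count + 1
        return True
    return False
```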
Feb 3 2021
I'll just jump in here and say that a few of us at the foundation have been arguing for a Design System. This is defined here and brought into our context by Santosh. The basic idea is that we've had too many false starts trying to centralize our design ideas, and that this is a precious opportunity to approach it in a more holistic way.
Feb 2 2021
@Amorymeltzer: I believe you, but something's not making sense. navigator.sendBeacon has been available in Firefox since v31, back in 2014 (https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon), and your version is just from May of last year. Something is disabling or removing navigator.sendBeacon somehow; it could be some rogue code executing from a browser extension, a gadget you have enabled, etc. I found very few references to other folks experiencing the same problem, like this obscure chat (look for the error): https://mozilla.logbot.info/fxos/20151112/raw.
Feb 1 2021
Agreed with @JAllemandou, if I was thinking anything fancier than hard-coded start dates for each dataset, it's lost in some dusty corner of my brain. Thanks for taking this, glad it's straightforward.
Jan 29 2021
To clarify on my broken heart, this is what I explained would happen in my RFC to replace graphoid: T249419. I really hope we can prioritize pushing that forward.
The above sound like very workable plans, thank you both for stepping up. To be clear, I can't coordinate this work, but hopefully as this goes through the new process we can find someone who can.
The pageview definition was changed to react to the debug header, and the change to load webrequest_128 is up for review (both linked to T273083).
For what it's worth, I agree with your suggestion above, we could just unconditionally sys.exit(if ...). If there's a reason not to do that, I'd be interested.
Jan 28 2021
Jan 26 2021
Jan 22 2021
This should be re-prioritized to high, and perhaps the parent task should be reopened and revisited because we're seeing service disruption caused by single UAs doing either large volumes or big queries.
Jan 21 2021
ping @Pchelolo: what's the latest plan on this?
Jan 19 2021
Hi! We'd like to add this to the X-Analytics header, if that's ok with everyone; that way we don't have to add a new field (a quick sketch of how a consumer could read it is below). Here's another task that's adding a debug flag this way: T263683, along with a patch as an example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/629735
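For clarity, a quick sketch of how a consumer could pull a flag out of X-Analytics, assuming the usual semicolon-separated key=value format; the 'debug' key name here is just a placeholder until the actual field is settled.

```
def parse_x_analytics(header: str) -> dict:
    """Parse an X-Analytics header like 'ns=0;debug=1' into a dict."""
    pairs = {}
    for part in header.split(';'):
        part = part.strip()
        if '=' in part:
            key, value = part.split('=', 1)
            pairs[key] = value
    return pairs

def is_debug_request(header: str) -> bool:
    # placeholder key name; whatever we agree on would go here
    return parse_x_analytics(header).get('debug') == '1'
```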
I'm sorry; I found, buried somewhere in my notes, that I was supposed to post this on Commons at some point as a call for product management on a potential switch. I'm putting it here as the RFC process winds down, just so it's not lost, but I think it's almost a year old at this point:
Jan 14 2021
@cchen: the patch is up, will be reviewed shortly, and will apply on the next monthly run. I'm wondering if there's a need to backfill old data; I think we keep at least one old snapshot, but it's a lot of computation, so let me know if it's important.
Jan 12 2021
I'm happy to advise someone working on this, but I can't drive the work, we have had to re-focus and trim down a lot of scope. We're struggling with all the changes on our team.
@Clarakosi: I think @Ottomata meant to ping you above, so adding you here.
Note to self: re-evaluate after the 503s above get better.
One potential solution we could borrow from Google Docs is to assign random names to users (a toy illustration is below). This would be trickier at our scale than on a document shared with a few dozen people, but it could be possible. Of course, we allow actual users to have any name they want, so styling still comes into play.
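As a toy illustration of that idea (the word lists, naming scheme, and hashing are all invented): derive a stable pseudonym from a session id instead of showing the real username.

```
import hashlib

ADJECTIVES = ['Amber', 'Brisk', 'Calm', 'Daring', 'Eager', 'Fuzzy']
ANIMALS = ['Aardvark', 'Badger', 'Capybara', 'Dingo', 'Egret', 'Ferret']

def pseudonym(session_id: str) -> str:
    """Map a session id to a stable 'Adjective Animal' display name."""
    digest = hashlib.sha256(session_id.encode('utf-8')).digest()
    adjective = ADJECTIVES[digest[0] % len(ADJECTIVES)]
    animal = ANIMALS[digest[1] % len(ANIMALS)]
    return f'{adjective} {animal}'

# e.g. pseudonym('some-session-id') always returns the same name for that session
```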
Deployed on January 5th according to the train etherpad. Change was https://gerrit.wikimedia.org/r/c/analytics/aqs/+/649884
@Seb35: apologies for the drive-by comment but it should actually be quite easy to write either a static conversion script or a dynamic Lua script:
I shall not let this go stale for more than a few months. I had a meeting with Product to try and figure out how we schedule this, but so far we're just in the brainstorming phase. I'll update this task even as we shift to a different decision making process, so anyone interested can stay subscribed here.
@bmansurov: that's a good overview, but it can be too detailed and not all of it is relevant. My suggestion is to look at the pageview hourly job, because you'll be writing something very similar: you're basically depending on the pageview_actor dataset and transforming the data (a rough sketch of that shape of job is below). That's what this job does: https://github.com/wikimedia/analytics-refinery/tree/master/oozie/pageview/hourly The XML just sets up that dependency, and this HQL query does the transformation: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/hourly/pageview_hourly.hql
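Very roughly, the shape of that job is "read one hour of pageview_actor, aggregate, write it out". A sketch in PySpark terms (column names and output path are simplified placeholders; the real job is the oozie XML plus the HQL linked above):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('hourly-aggregation-sketch').getOrCreate()

year, month, day, hour = 2021, 1, 11, 0  # the hour the coordinator would pass in

hourly = spark.sql(f"""
    SELECT
        project,                 -- columns simplified for illustration
        access_method,
        COUNT(1) AS view_count
    FROM wmf.pageview_actor
    WHERE year = {year} AND month = {month} AND day = {day} AND hour = {hour}
    GROUP BY project, access_method
""")

hourly.write.mode('overwrite').parquet(
    f'/tmp/pageview_hourly_sketch/year={year}/month={month}/day={day}/hour={hour}'
)
```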
Jan 11 2021
Thanks @Isaac, hi @bmansurov!! Actually, this should be an oozie job. They're a bit more of a pain to write, but I can help with that. The major benefit is that we get alerts if the pipeline breaks or gets stuck, and it's easier to rerun and backfill. Do ping me everywhere if I'm not responsive enough.
Jan 4 2021
We groomed this today, and here are our thoughts:
Dec 22 2020
Dec 17 2020
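# Permanently delete the temporary PrefUpdate backup dirs (-skipTrash means they won't be recoverable):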
sudo -u analytics hdfs dfs -rm -r -skipTrash /wmf/data/archive/backup/tmp/event_PrefUpdate_Backup
sudo -u analytics hdfs dfs -rm -r -skipTrash /wmf/data/archive/backup/tmp/event_sanitized_PrefUpdate_Backup
(I'm only tangentially involved with client-side graphs as they relate to the graphoid replacement I'm trying to prioritize. So I'm happy to see @Jseddon slaying this bug. And I'm working with product to prioritize a bigger effort to maintain graphs going forward; progress is very slow there)