Fri, Jul 21
Thanks @greg, added info for EventBus. This is a good resource to keep updated.
Thu, Jul 20
In my opinion, we could remove vue-router as a dependency and use just a simple component that mirrors some property from the $store to the URL and vice versa. That's what I had in mind with cleaning up the routing code. This way, whenever we add something that needs to be mirrored to the URL, it's a small change in the $store and the routing continues to work. It would also make the bundle smaller, because vue-router does a bunch of stuff we don't need.
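As a rough illustration of that idea (a hypothetical sketch, not the actual wikistats code - the function and key names are made up), the store-to-URL mirroring can be two small pure functions: one serializing the routable slice of state into a hash string, and one parsing it back.

```javascript
// Hypothetical sketch: keep a chosen subset of store state in sync
// with the URL hash, in both directions.

// Serialize the routable slice of state into a hash string.
function stateToHash(state) {
  return '#/' + Object.entries(state)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join('/');
}

// Parse a hash string back into a plain object suitable for the store.
function hashToState(hash) {
  const state = {};
  hash.replace(/^#\//, '').split('/').forEach(pair => {
    const [key, value] = pair.split('=');
    if (key) state[key] = decodeURIComponent(value);
  });
  return state;
}
```

In the app, a small component (or plain subscription) would presumably watch the $store and assign `window.location.hash` on changes, and listen for the browser's `hashchange` event to commit the parsed state back, so either side can drive the other.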
Wed, Jul 19
Oh, gotcha. Agreed. Am I right in guessing that the Structured Data work would make JsonConfig obsolete and we can migrate to that once it's ready? Or is that just enabling multiple types of content, and not necessarily solving the problem of storing JSON specifically? If so, then maybe we have to learn JsonConfig. It doesn't seem too terribly complicated; maybe I can pair with someone and the two of us can be responsible for it.
No, I wrote this extension. I know this code inside and out, because it's like five lines, and there's no problem in the code itself as far as we can tell, just in how JsonConfig expects it to provide something in extension.json. I couldn't replicate the problem we see in production, so I don't know how to fix it except for the "test in production" advice I got above - which definitely freaks me out.
I couldn't replicate in vagrant or beta, though I tried. I'm all for sitting with someone who knows mediawiki better to try and debug. This extension is about as simple as an extension can get; it's basically a Hello World using JsonConfig classes.
It's good that it worked, Goran. Stat1005 is not yet fully vetted and ready to go, so we don't expect everything to work there yet.
The analytics ones (analytics, dashiki, reportcard) no longer need it, we should clean that up, but I'm a bit hesitant to start anything new now before parental leave.
Tue, Jul 18
Here is a template for how to do this in general. The example is using a test directory in hdfs under my user, and the cewiki database.
Yeah, it's kind of a bug in Phabricator that it copies all parent tags onto subtasks. I think that's causing a ton of unwanted pinging. It would be fine if there were an option like "copy parent tags", left off by default.
Mon, Jul 17
I mean, whether it's Python or R has nothing to do with the access level. My point is, there's an example of a sqoop script there that works when run either as hdfs or as an unprivileged user, so we can use that as a starting point. Now, if you could explain what you would ideally like to do, that would help. As I was saying, it's hard to figure out from the long thread above.
@GoranSMilovanovic I pinged you in IRC but in case you missed it, let's meet to go over the problem here. I wrote the sqoop job that loads all mediawiki tables into HDFS: https://github.com/wikimedia/analytics-refinery/blob/master/bin/sqoop-mediawiki-tables and I should be able to help. But I need to know exactly what you're trying to do, and I lost context with this long thread.
Thu, Jul 13
One version of the history schema simplified and loaded to test how Druid can work as a direct back-end for AQS:
Wed, Jul 12
Tue, Jul 11
Mon, Jul 10
@Halfak is it a deal-breaker if we can't migrate the history of Quarry to Redash? I'm wondering if you care as much about the history as about the features themselves. If it's the history, I'd say that would take too much work and will most likely never happen.
Thu, Jul 6
It looks like the split-by parallelization (the easy option) worked!
The thing to consider here is that the people who are likely in the loop about what's happening on the mediawiki dbs are not in the loop about research being done on those clones on dbstore1002 (analytics-store) or vice versa. So I think it's totally fine to delete them from the production boxes whenever folks wish to do that. But we should give some warning to researchers who may be running queries on those tables, or have plans to. I can write to the typical research lists to ask this question if you like, so this task doesn't stall. But if this situation comes up again we should think about a good process.
Hm, annoying, it seems to redirect fine now. It happened right when I created this task, and I tested it on a few different devices. It's also not the first time; the first time it happened I thought I was just crazy, because it later stopped happening. But something fishy's going on here.
Thanks @yuvipanda, looks promising.
Reedy helped me find it, of course :) Thanks!!!
That's a good point of view, @Whatamidoing-WMF, and generally it's correct. But if we're to be able to measure this thing at all, we need some sort of hook. Any hook that the communities deem acceptable is fine by me, and my role here is just to help crunch the numbers no matter how big the data is. In other words, to remove any "big data" blockers.
Yeah, let's work on it when we have time. Unfortunately we have too many things on our plate so this has to get pushed back.
For what it's worth, we are just now bundling our very minimal alpha build of wikistats (not the prototype, the real application with live data), and the total size for everything minified and gzipped is 200kB. That includes CSS, the state manager, router, APIs to grab data, logic and data manipulation, lodash modules, d3 components for visualization, semantic ui widgets, date and number formatting, etc. The only extra things are the fonts for icons. And this isn't aggressively slimmed down yet; it's just what we shaved off to have a decent first build. We can probably realistically get to around 150kB, especially if we get rid of unused CSS.
Thanks very much, Joseph, no worries, I got it.
Wed, Jul 5
more context: it looks like Bob West ran into the same issue a couple of years ago while reading from analytics-store: https://wm-bot.wmflabs.org/logs/%23wikimedia-databases/20150622.txt. In that case it sounded like the server was just exceptionally busy. Based on that, I suppose we can try sqooping again manually and see if we get a better result.
Joseph - if you do end up updating this, don't worry about including too much detail. I get how the error is happening and will see how to increase that net_write_timeout or find other advice.
This is where the log of the error shows up in hue btw: https://hue.wikimedia.org/jobbrowser/jobs/job_1498042433999_48036/job_attempt_logs/0
Diagnosis so far:
Mon, Jul 3
Fri, Jun 30
Thu, Jun 29
Try a different preset