@dbarratt Nuria's on vacation this week, so here's your piwik snippet. You're site 14! :)
Yes, @GoranSMilovanovic, that's it. But my main point was about the mediawiki tables: definitely look at those before you try something else. They're worth your time.
@GoranSMilovanovic you can query either wmf_raw.mediawiki_user, or wmf.mediawiki_user_history and wmf.mediawiki_history, in Hive/Spark to get what you need.
Thu, Feb 15
For the record, I liked Joseph's idea of 3 data sources: one being served right now (served_right_now), one backup, and one being loaded next (loaded_next). When loaded_next is done, it is checked against served_right_now for accuracy and cache warming. When that passes, the backup is deleted, served_right_now becomes backup, and loaded_next becomes served_right_now. How to do this is still up for debate.
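A minimal sketch of that rotation, purely for illustration since the actual mechanism is still undecided. The `Rotation` class, the `rotate` function, and the `check` and `delete` callbacks are all hypothetical names, and the data sources are represented as plain strings:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rotation:
    """Hypothetical holder for the three data source names."""
    served_right_now: str
    backup: Optional[str]
    loaded_next: Optional[str]


def rotate(
    state: Rotation,
    check: Callable[[str, str], bool],
    delete: Callable[[str], None],
) -> Rotation:
    """Promote loaded_next if it passes the check against served_right_now.

    check(candidate, current) stands in for the accuracy comparison and
    cache warming; delete(name) stands in for dropping the old backup.
    """
    if state.loaded_next is None:
        raise ValueError("nothing has been loaded yet")
    if not check(state.loaded_next, state.served_right_now):
        # The check failed: keep serving what we have, change nothing.
        return state
    if state.backup is not None:
        delete(state.backup)
    return Rotation(
        served_right_now=state.loaded_next,
        backup=state.served_right_now,
        loaded_next=None,
    )
```

If the check fails, nothing is deleted and the current source keeps being served, so a bad load never costs us the backup.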
We have built a little tool called "reportupdater". We have only used it in our production cluster for now, but there's no reason it can't be used outside. Basically, its mission is to make it easy to execute a templated SQL query on one or more wikis, with as many parameters as you need, and generate regular report output from the results. If that sounds useful, the docs are here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater
We really did try to ping people several times about blocking *pageview* (which catches our pageview-api). We submitted a patch, and I think Kaldari even knew some devs on there and tried to hook us up, but it never got anywhere. They were skeptical of us, and stayed skeptical despite us explaining our stance on privacy. Sorry to be pessimistic, but I think any attempt to fix this will just get lost in the noise of the world.
Marked the Swift task done. I asked Fillipo and we updated the task Miriam opened. In short, we can access all of the files at low resolution through the API. If that's too slow, we can dump them to HDFS. Swift can do anything we want, but ideally we try the API first.
Thanks for the report! I agree with your proposals; I was saying something similar yesterday. We're trying to internationalize our number formatting, so we're going back and forth a bit on it.
+1 that this doesn't look at all like a bot. @kaldari, any more thoughts? I'll just close this otherwise.
Wed, Feb 14
I'm sorry, I was going to fix this before this month's run and I forgot. I will fix it now and new data will be available by March 10th if nothing goes wrong.
Tue, Feb 13
Mon, Feb 12
If there's any disagreement with Lego's thoughts, please re-open, but I agree.
Yeah, /home/milimetric/GeoIP-toolbox/MaxMind-database/GeoIP is a git repository with all the backups of the GeoIP dbs that we kept over the years (usually once a week, except when the scripts don't work).
Yeah, I don't know much about pywikibot. Who maintains it? I thought it was a crucial package for a lot of wiki bots, so how can it be broken for so long?
We still have to check Debezium with the DBAs and hear their thoughts on it, but it's possible we could go forward with both ways of generating events and figure out which is easier in practice:
@jcrespo: We really liked ClickHouse's performance and interface, but looking at it more closely, we came up with the following negatives:
Sun, Feb 11
(points increased a bit because I have to split this code up, but on track for being done early this week)
Thu, Feb 8
WRT solution 2, consider also that the backing system is still object storage with a lot of files, so certain operations (e.g. listing all files) would not be practical.
Tue, Feb 6
Mon, Feb 5
Fri, Feb 2
- The fig bars were incredible. I need to know what brand/bakery they came from.
- If we want product input, I suggest they work on organizing the position papers into sessions and inject questions and concerns that they want each session to address.
- If we don't want product input, we should work together on organizing the sessions instead of putting that burden on a small committee that might not have the necessary context.
- Sessions like Evolving the MediaWiki Architecture were too short and interrupted, while sessions like the Contributor Experience were unnecessarily long.
- Having two separate sessions in each time slot seemed forced at times. For the big sessions like Architecture and Open Source, we can just have one at a time.
- The position papers and limited participation were brilliant. It was the first summit that gave me hope. With another iteration, and with the kinks above worked out, I think it can become one of our most productive gatherings.
Jan 19 2018
Jan 18 2018
Jan 17 2018
and if you're nervous about what that did, you can check the .reruns folder:
So, I'm not sure, but my bet is that there was some outage that caused the data to land there after the reports were run. When that happens, you can re-run the reports:
Jan 16 2018
Jan 12 2018
Some thoughts from post-standup:
Jan 11 2018
totally, put a meeting on our calendar or let's chat here.
Jan 10 2018
Jan 8 2018
From the Wiki selector, you have to first type "Wikipedia" and then "English" (or press Enter to auto-complete when it's highlighting what you want). The selector is unintuitive and we're working on that in T179530, but it's definitely possible to select English Wikipedia. The next version will be easier to use, but keep us in the loop with what you think.
We don't have time for this now, but I'm documenting this idea in the Unique Devices docs, so as to not forget about it: https://meta.wikimedia.org/wiki/Research:Unique_Devices#Can_we_count_unique_users_instead_of_devices?
Jan 6 2018
I started a new landing page for definitions, https://meta.wikimedia.org/wiki/Research:Wikistats_metrics, and it's meant to mirror the Standard_metrics page. I'm not sure if we want to go with the same exact format, but we can improve/change that as we go. For now, there are some basic definitions on each sub-page.
Jan 5 2018
Jan 4 2018
ah, it's getting cut off, cool
Metrics are currently configured as additive (Pageviews, Edits) or non-additive (Uniques, Edited Pages). I think in this case the metric is just misconfigured. But do you think it should show both the total and the average for additive metrics?
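To make the question concrete, here is a rough sketch of what showing both could look like. The `summarize` function and its `additive` flag are hypothetical, not the actual Wikistats configuration, and the sketch assumes a non-additive metric only gets an average, since summing something like Uniques across periods would count the same device more than once:

```python
from typing import Dict, List


def summarize(values: List[float], additive: bool) -> Dict[str, float]:
    """Summarize one metric's per-period values (hypothetical helper).

    Additive metrics (e.g. Pageviews, Edits) get a total and an average.
    Non-additive metrics (e.g. Uniques, Edited Pages) get an average only.
    """
    if not values:
        raise ValueError("need at least one value")
    average = sum(values) / len(values)
    if additive:
        return {"total": sum(values), "average": average}
    return {"average": average}
```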
Jan 3 2018
It looks like there are some reading metrics available, but there seems to be no editing activity whatsoever. That might be wrong; we'll look into it:
fixed, @mforns has priority :)