Oct 30 2023
The Product Analytics Styleguide suggests PEP 8, but maybe we want to consider Black, since it's now maintained by the Python Software Foundation and can integrate with Ruff.
Oct 25 2023
Just updated wiki comparison (0f377b3).
This would only work for the snapshots, but a simple solution would be to just pull the sqoop list from canonical_data.wikis. There's no automated process keeping that up to date, but:
- I've been pretty diligent about keeping it up to date (adding new wikis within a few weeks)
- We are motivated to switch it to automatic generation as soon as possible, since the manual updates are a source of toil for the Movement Insights team
- It centralizes us further on a single canonical list of wikis, easing the maintenance burden (see T339928 for another example)
Just updated movement-metrics (7a4998b). I verified that the new queries produce the exact same September unique device metrics.
Oct 24 2023
Oct 14 2023
Just repeating what I shared on Slack for accessibility:
I drafted the following section for the annual plan metrics update:
Potential block impacts: We estimated the impact of a block in India on global traffic (a 3% to 6% decline if blocked for a full month, and a 0.05% to 0.1% decline if blocked for a single day) using unique device data from April to June 2023. We assumed that many Indian users would circumvent the block using tools like VPNs. In a 2019 survey on global VPN usage by GWI, a major market research firm, 45% of Indian internet users reported using a VPN in the previous month. Because this data point might be skewed or outdated, and because a block might itself drive more VPN use, we used a range of 15 percentage points on either side of this number, giving a low estimate of 30% and a high estimate of 60% of users circumventing the block. A naive way to estimate the impact of a single-day block is to assume that it would cause the loss of 1/30 (3.3%) of monthly unique devices. However, this would be a significant overestimate, because many devices are active on multiple days in a month. We do not collect data on how many days in a month devices are active, so we cannot measure the true value. Given this uncertainty, we used half the naive estimate (1/30/2 = 1.7%).
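The arithmetic behind these ranges can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the notebook actually used: the function names are hypothetical, and India's share of global monthly unique devices is left as a parameter because the text doesn't state it. Note that the single-day scaling is just the month-long estimate divided by 60, which turns the 3% to 6% range into 0.05% to 0.1%.

```python
from typing import Tuple

# Figures quoted in the draft above.
GWI_VPN_USAGE_INDIA = 0.45  # Indian internet users reporting VPN use (GWI, 2019)
SPREAD = 0.15               # +/- 15 percentage points around the survey figure


def circumvention_range(point: float = GWI_VPN_USAGE_INDIA,
                        spread: float = SPREAD) -> Tuple[float, float]:
    """Return the (low, high) share of users assumed to bypass the block."""
    return point - spread, point + spread


def monthly_decline(india_share: float, circumvention: float) -> float:
    """Fractional drop in global monthly unique devices from a month-long block.

    india_share (India's share of global monthly unique devices) is a
    hypothetical input here; it would come from the April-June 2023 data.
    """
    return india_share * (1 - circumvention)


def single_day_decline(monthly: float) -> float:
    """Scale a month-long loss to one day: the naive 1/30, then halved
    because many devices are active on several days of the month."""
    return monthly / 30 / 2


low, high = circumvention_range()
print(f"circumvention: {low:.0%} to {high:.0%}")  # circumvention: 30% to 60%

# Applying the single-day scaling to the 3% and 6% month-long estimates:
print(f"{single_day_decline(0.03):.2%} to {single_day_decline(0.06):.2%}")  # 0.05% to 0.10%
```

Higher circumvention means a smaller loss, so the 60% bound produces the low end of the monthly range and the 30% bound the high end.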
I finally discussed the last update to the IRS success metrics with Irene last week, so this is done!
I've published my code at https://gitlab.wikimedia.org/nshahquinn-wmf/new-data-center-impact/. It was previously in a private GitLab repo because it contained some sensitive data, so I removed all the history before publishing. The readme has a detailed guide to the work and the results.
Oct 13 2023
@kzimmerman the numbers you pulled are accurate and there aren't any errors in the arithmetic.
Oct 12 2023
I'm making this the task for both monthly and quarterly content metrics, because 95% of the work is shared. I've also focused this on the notebooks/code rather than the report and its distribution, because those are pretty separate pieces of work.
Hamid has been the one working on this. He has written the code, which is available at https://github.com/wikimedia-research/core-annual-plan-metrics, and will be working on integrating it into the main movement-metrics repo in the coming weeks.
Yes, this is resolved! Thanks, everyone 😊
Oct 11 2023
Pretty sure this is done 😁
Sep 30 2023
https://os-reports.wikimedia.org/stretch.html now reports:
A total of 0 hosts are running stretch
Sep 25 2023
Sep 23 2023
I just finished backfilling active_editors (in addition to editor_month, which I did yesterday). I updated the notebook linked above.
Sep 22 2023
I've backfilled August's Wikifunctions data. I documented my workflow in this notebook: https://gitlab.wikimedia.org/nshahquinn-wmf/miscellaneous-notebooks/-/blob/main/2023/2023-09_editor_month_backfill.ipynb
Sep 21 2023
Sep 20 2023
Sep 19 2023
Sep 18 2023
Pretty sure this is done now 😁
Sep 14 2023
Sep 13 2023
Thanks for the reminder! The list is at P52488.
Sep 8 2023
I'm working on a full report, but I'm already confident in the key finding:
I estimate that the South American data center will add 6.6 M monthly unique devices, which would be a 0.41% increase globally.
The CausalImpact models tell us that, over the year following the switch to the Singapore data center, tranche 1 countries saw a 6.2% increase in monthly unique devices, while tranche 2 countries saw an 8.3% increase. This translates to a 6.3% increase overall (because tranche 1 was far bigger than tranche 2).
Looking at the last year of data, South American monthly unique devices averaged 104 M. A 6.3% increase would be 6.6 M. Out of 1.59 B average global monthly unique devices, that is a 0.41% increase.
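The headline estimate is simple enough to check in a few lines of Python. The constants are the figures quoted in this comment. `combined_uplift` is a hypothetical helper showing how per-tranche effects could be pooled as a size-weighted average; that pooling method is an assumption, and the tranche device counts aren't given here.

```python
from typing import Sequence

# Figures quoted above.
SOUTH_AMERICA_MONTHLY = 104e6  # average South American monthly unique devices, last year
GLOBAL_MONTHLY = 1.59e9        # average global monthly unique devices
OVERALL_UPLIFT = 0.063         # pooled effect across tranches 1 and 2


def combined_uplift(uplifts: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-tranche uplifts (e.g. weighted by device counts)."""
    return sum(u * w for u, w in zip(uplifts, weights)) / sum(weights)


added = SOUTH_AMERICA_MONTHLY * OVERALL_UPLIFT
print(f"added devices: {added / 1e6:.1f} M")            # added devices: 6.6 M
print(f"global increase: {added / GLOBAL_MONTHLY:.2%}")  # global increase: 0.41%
```

With tranche 1 much larger than tranche 2, a weighted average of 6.2% and 8.3% stays close to 6.2%, which is why the overall figure is 6.3% rather than the midpoint of about 7.3%.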
Sep 6 2023
I've fixed the mobile domain code of Test Wikitech (labtestwiki) with https://github.com/wikimedia-research/canonical-data/commit/d055204978fba5c448ebeccef0cbff8e556e9176.
Sep 5 2023
I've now fully handed over my responsibilities and stopped attending the sync meetings.
Sep 1 2023
At some point, when the coordinator role is switched to a different server, this will cause unexpected breakage that needs an immediate patch.
@BTullis do you have any idea how to make the CNAME work here?
Wmfdata now pins urllib3 below v2 (PR 45), so the short-term solution is done.
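In pip requirement syntax, a pin like this is written `urllib3<2`. As a hedged illustration (not the contents of PR 45), the check below mirrors that constraint at runtime using the standard library's `importlib.metadata`; the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_below_2(version_string: str) -> bool:
    """True for versions such as '1.26.18'; False for any 2.x release."""
    return int(version_string.split(".", 1)[0]) < 2


def installed_urllib3_satisfies_pin() -> Optional[bool]:
    """Check the installed urllib3 against the '<2' pin; None if it isn't installed."""
    try:
        return major_below_2(version("urllib3"))
    except PackageNotFoundError:
        return None


print(installed_urllib3_satisfies_pin())
```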
Aug 31 2023
Thank you, good to know!
FYI, urllib3 version 2, released in April 2023, removed the fallback from subjectAltName to commonName when verifying certificates, so it will not be able to connect to internal servers whose certificates identify the host only in commonName.
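To tell whether a particular server is affected, one can check whether its certificate lists the hostname in subjectAltName. A minimal sketch, assuming the certificate has already been decoded into the dictionary format returned by Python's `ssl.SSLSocket.getpeercert()` (which requires a validated connection); the helper names and the example hostname are hypothetical.

```python
from typing import Any, Dict, List

Cert = Dict[str, Any]


def san_dns_names(cert: Cert) -> List[str]:
    """DNS entries from the certificate's subjectAltName extension, if any."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def subject_common_names(cert: Cert) -> List[str]:
    """commonName values from the certificate subject (a tuple of RDNs)."""
    return [value
            for rdn in cert.get("subject", ())
            for key, value in rdn
            if key == "commonName"]


def relies_on_common_name(cert: Cert) -> bool:
    """True if the host is named only in commonName, with no DNS subjectAltName."""
    return not san_dns_names(cert) and bool(subject_common_names(cert))


# A certificate dict with a commonName but no subjectAltName.
legacy = {"subject": ((("commonName", "internal.example"),),)}
print(relies_on_common_name(legacy))  # True
```

A certificate that this flags would need to be reissued with a subjectAltName entry to work with urllib3 2.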
Aug 30 2023
@HMonroy thank you for deploying this! I thought I had signed up for the window tomorrow, so I wasn't on IRC today, but I guess it didn't matter 😅