Fri, Oct 22
Thu, Oct 21
Fri, Oct 15
Wed, Oct 13
Tue, Oct 12
Closing this ticket as the data is not missing. Documentation for this table notes:
Fri, Oct 8
Wed, Sep 29
@Tgr Do you know of recent bot detection deployments that should be considered here? Or do you have insights about changes in bot activity that I should review? Another possibility is that new registered accounts created by bots are part of the picture. If there have been changes in bot activity/behavior or in bot detection, I will include them in this investigation.
(1) Do any of the movement metrics ETL notebooks currently use Presto to query?
No. A few use Spark:
2b hive, spark
1 hive, spark
1b hive, spark
2a hive, spark
I believe that I didn't install any Python packages for the data wrangling notebooks when I created a new environment and cloned the repos in July. The packages that I installed were R packages to run the three data viz notebooks.
These are the packages imported:
from cycler import cycler
from dateutil.relativedelta import relativedelta
from functools import reduce
from google.oauth2.service_account import Credentials
from numbers import Number
from pathlib import Path
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
Tue, Sep 28
Wikistats numbers for New Registered Users are in line with the counts in the logging table (where log_action = 'create') and the ServerSideAccountCreation (SSAC) table (where event.isselfmade = true). See how Wikistats defines New Registered Users.
These numbers were checked by comparing them with counts of user accounts created on each wiki within the same time period, using the query below:
Sep 15 2021
Comparing logging table records where log_action = 'create' with SSAC table records where event.isselfmade = true, the two tables give very similar counts for the wikis queried. Of 77 wikis, 66 showed the same number of new registrations in July in both tables. The other 11 showed differences of 0.02% to 0.44% in registration counts, with the SSAC table undercounting by a few registrants in each of those 11 cases.
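A minimal sketch of the per-wiki comparison described above, assuming the two counts have already been pulled into one pandas DataFrame with hypothetical columns `wiki`, `logging_count`, and `ssac_count`. The percent difference is taken relative to the logging-table count; that baseline is an assumption, since the comparison above does not say which table was used as the denominator.

```python
import pandas as pd


def compare_registration_counts(counts: pd.DataFrame) -> pd.DataFrame:
    """Add per-wiki difference columns.

    Hypothetical input columns: 'wiki', 'logging_count' (logging table,
    log_action = 'create') and 'ssac_count' (SSAC, event.isselfmade = true).
    """
    out = counts.copy()
    # Positive diff means the SSAC table has fewer registrations.
    out["diff"] = out["logging_count"] - out["ssac_count"]
    # Assumption: percent difference relative to the logging-table count.
    out["pct_diff"] = 100 * out["diff"] / out["logging_count"]
    return out


def summarize(compared: pd.DataFrame) -> dict:
    """Summarize how many wikis match exactly and how large the gaps are."""
    mismatched = compared[compared["diff"] != 0]
    pct_range = (
        (mismatched["pct_diff"].min(), mismatched["pct_diff"].max())
        if len(mismatched)
        else None
    )
    return {
        "n_wikis": len(compared),
        "n_identical": int((compared["diff"] == 0).sum()),
        "n_mismatched": len(mismatched),
        "pct_diff_range": pct_range,
        # True when every mismatch is an SSAC undercount.
        "ssac_always_lower": bool((mismatched["diff"] > 0).all()),
    }
```

For example, with made-up toy counts (not the July data), `summarize(compare_registration_counts(df))` reports how many wikis match exactly, the range of percent differences among the rest, and whether SSAC is always the lower count.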
Sep 9 2021
Sep 8 2021
I am comparing counts from ServerSideAccountCreation with counts from the logging table, starting with swiki, bnwiki, idwiki, and enwiki.
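A minimal sketch of the two per-wiki monthly count queries, assuming the Data Lake tables `wmf_raw.mediawiki_logging` and `event.serversideaccountcreation` with the column and partition names listed in the docstring. Those names, and the function name `build_count_queries`, are assumptions to check against the actual schema. The function only builds SQL strings, so it runs without cluster access.

```python
def build_count_queries(wikis, year, month, snapshot):
    """Build (logging_sql, ssac_sql) counting self-made registrations per wiki.

    Assumed (hypothetical) schema:
      wmf_raw.mediawiki_logging: wiki_db, log_type, log_action,
        log_timestamp (yyyyMMddHHmmss string), snapshot partition
      event.serversideaccountcreation: wiki, event.isselfmade,
        year/month partitions
    """
    # Wiki names are inlined directly; fine for trusted, internal input only.
    wiki_list = ", ".join(f"'{w}'" for w in wikis)
    month_prefix = f"{year:04d}{month:02d}"  # e.g. '202107'
    logging_sql = f"""
    SELECT wiki_db AS wiki, COUNT(*) AS logging_count
    FROM wmf_raw.mediawiki_logging
    WHERE snapshot = '{snapshot}'
      AND wiki_db IN ({wiki_list})
      AND log_type = 'newusers'
      AND log_action = 'create'
      AND SUBSTR(log_timestamp, 1, 6) = '{month_prefix}'
    GROUP BY wiki_db
    """
    ssac_sql = f"""
    SELECT wiki, COUNT(*) AS ssac_count
    FROM event.serversideaccountcreation
    WHERE year = {year} AND month = {month}
      AND wiki IN ({wiki_list})
      AND event.isselfmade = true
    GROUP BY wiki
    """
    return logging_sql, ssac_sql
```

The resulting strings could then be run with wmfdata's spark.run, and the two result sets merged on `wiki` for a side-by-side comparison.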
I will check in with @mpopov tomorrow about this and will post an update here by Thursday.
Sep 3 2021
Aug 30 2021
Aug 18 2021
Question: what percentage of editors in Sub-Saharan Africa (SSA) participated in campaign activities in the 2019-2020 period?
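The question reduces to a ratio: editors who are both in the regional set and among campaign participants, divided by all editors in the regional set. A minimal sketch, assuming both groups are available as collections of user identifiers. How "editor" and "campaign activity" are defined is not specified here, and the function name `participation_rate` is hypothetical.

```python
def participation_rate(region_editors, campaign_participants):
    """Percent of regional editors who also appear among campaign participants.

    Both inputs are hypothetical iterables of hashable user identifiers,
    e.g. (wiki, user_id) pairs so that IDs from different wikis don't collide.
    """
    region = set(region_editors)
    if not region:
        raise ValueError("no editors in the regional set")
    # Only participants who are also regional editors count toward the numerator.
    overlap = region & set(campaign_participants)
    return 100 * len(overlap) / len(region)
```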
Aug 16 2021
Aug 13 2021
Aug 11 2021
Aug 10 2021
Aug 3 2021
Jul 26 2021
Jul 12 2021
Hello, I've joined the Product Analytics team with @mpopov as my manager. Hooorah!
Jul 9 2021
Jul 8 2021
May 3 2021
Feb 24 2021
Feb 22 2021
Aug 4 2020
Thank you, @elukey !
Jun 18 2020
Jun 2 2020
I deleted all files on nb3 and shut down the server.
I rsynced all files from nb4 and shut down the server.
May 21 2020
These were interesting and helpful metrics to review for GLOW India articles:
Namespace (or just main/not main?)
Number of editors
Number of edits
Number of watchers
Time since last edit
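A minimal sketch of how these per-article metrics could be computed with pandas, assuming a revision-level DataFrame with hypothetical columns `page_id`, `namespace`, `user_id`, and `timestamp`, plus a separate Series of watcher counts indexed by `page_id` (watchers can't be derived from revisions, and their source isn't shown here). The function name `article_metrics` is hypothetical; namespace 0 is MediaWiki's main (article) namespace.

```python
import pandas as pd


def article_metrics(revisions: pd.DataFrame, watchers: pd.Series,
                    now: pd.Timestamp) -> pd.DataFrame:
    """Per-article metrics from a revision-level table (hypothetical columns)."""
    grouped = revisions.groupby("page_id")
    metrics = pd.DataFrame({
        "namespace": grouped["namespace"].first(),
        "num_editors": grouped["user_id"].nunique(),  # distinct editors
        "num_edits": grouped.size(),                   # revisions per page
        "last_edit": grouped["timestamp"].max(),
    })
    # Main vs. not main: namespace 0 is the main (article) namespace.
    metrics["is_main"] = metrics["namespace"] == 0
    # Pages missing from `watchers` get NaN.
    metrics["num_watchers"] = watchers.reindex(metrics.index)
    metrics["time_since_last_edit"] = now - metrics["last_edit"]
    return metrics
```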
May 14 2020
Hi @elukey, I'll transfer files and shut down the notebooks over the next few days. I'll check in on Tuesday with an update or any questions.
Apr 21 2020
Thank you, @nettrom_WMF! Sorry for the delay.
Apr 1 2020
Loading neighboring contest articles:
Mar 27 2020
Data wrangling code to pull the items in this task can be found here: https://github.com/IreneFlorez/GLOW/tree/article_suggestions/scripts/data_wrangling
Mar 24 2020
Articles that were edited using a translation tool (by type):
expanded: 113 (expanded total: 1418)
new: 3602 (new total: 7445)
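Assuming the parenthesized totals are the total numbers of expanded and new articles, the translation-tool counts can be expressed as shares of each type. This quick calculation is only a sketch of that reading; `share` is a hypothetical helper.

```python
def share(part: int, total: int) -> float:
    """Percentage of `total` represented by `part`, rounded to two decimals."""
    return round(100 * part / total, 2)


# (edited with a translation tool, total of that type), from the counts above
translated = {"expanded": (113, 1418), "new": (3602, 7445)}

for kind, (part, total) in translated.items():
    print(f"{kind}: {part}/{total} = {share(part, total)}%")
# expanded: 113/1418 = 7.97%
# new: 3602/7445 = 48.38%
```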
Expanded articles edited using a translation tool:
Mar 20 2020
Mar 18 2020
@mpopov Maybe the faulty link was related to a bug? I'm receiving bug reports related to this ticket. Would it make sense to create a new ticket?
Sorry about that, I just updated the ticket with a functional task link.
Mar 16 2020
To run these queries from a Python 3 notebook without changing the notebook type, I've switched them to run as Spark queries using the wmfdata package's spark.run function. I'm now able to run the queries. For example, here's the code for the translation query:
Thank you. Yes, I can confirm that I had run kinit and entered my Kerberos credentials in a notebook terminal.
@JAllemandou I tried running these Spark queries over the weekend on a small batch of articles and they timed out.
Might you have tips or insights? I didn't receive any error messages; the queries simply took a very long time, and eventually I stopped the kernel.
I also tried running the queries as Hive queries and had similar issues.
Mar 12 2020
Total recs in full list: 34295
Total recs in translation: 14155
Total recs in editing: 20102
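A quick arithmetic check on these totals: translation and editing recs sum to 34,257, which is 38 fewer than the 34,295 in the full list. This sketch just makes the gap and each category's share explicit. The counts above don't say whether the 38 recs fall outside both categories or whether the totals were taken at different times.

```python
full_list = 34295
translation = 14155
editing = 20102

combined = translation + editing
gap = full_list - combined
print(combined, gap)  # 34257 38

# Share of the full list in each category, in percent
print(round(100 * translation / full_list, 2))  # 41.27
print(round(100 * editing / full_list, 2))      # 58.61
```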