
Use Reportupdater for WMCS edits queries
Open, Medium, Public

Description

First step for visualizing WMCS edits data - set up a cron job for wmcs-edits.py script. Store the datasets in https://analytics.wikimedia.org/datasets/periodic/reports/metrics/ in the TSV format via the Reportupdater: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater.

Details

Related Gerrit Patches:
analytics/reportupdater-queries@master: Add funnel parameter to wmcs queries that return multiple rows
analytics/reportupdater-queries@master: Escape dollar sign in hive script for wmcs
analytics/reportupdater-queries@master: Correct minor details in wmcs queries
analytics/reportupdater-queries@master: Modify WMCS queries
analytics/reportupdater-queries@master: Fix report name in wmcs config
analytics/reportupdater-queries@master: Fix report name in wmcs config
analytics/reportupdater-queries@master: Add hive query for wmcs edits

Event Timeline

Restricted Application added a subscriber: Aklapper. Sep 11 2019, 8:27 PM
srishakatux triaged this task as Medium priority. Sep 11 2019, 8:27 PM
srishakatux updated the task description. Sep 11 2019, 8:29 PM

@JAllemandou Some updates and questions on the reportupdater:

  • The docs say: “The first column must be equal to start_date parameter (consider naming it date). This is an unnecessary limitation and might be removed in the future...”. Do I need to follow this?
  • As the docs say that I should put all queries and scripts in a repo, I have it here along with a config.yaml file: https://github.com/srish/wmcs-edits. The script currently takes a start and end parameter and outputs a wmcs_edits.tsv file. I am not sure if this is sufficient for the reportupdater, or whether I should generate the output in a different format. As we are going to rely on the SQL query anyway, maybe I should not spend too much time fixing the script at this point. I am guessing that when we use the SQL query, the config.yaml should be updated to include database information.
  • I'm getting set up to write the SQL query that gets the data via geoeditors_daily. For reference, I'm looking at https://github.com/wikimedia/analytics-limn-ee-data/blob/master/ee-migration/daily_edits.sql and trying to run it via beeline -f daily_edits.sql > out.txt, but I'm getting the following error and am not sure what I am doing wrong here:
Error: Error while compiling statement: FAILED: ParseException line 3:18 cannot recognize input near 'AS' '{' 'wiki_db' in selection target (state=42000,code=40000)
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 3:18 cannot recognize input near 'AS' '{' 'wiki_db' in selection target
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:241)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:227)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:255)
	at org.apache.hive.beeline.Commands.executeInternal(Commands.java:989)

Hi @srishakatux - Please excuse me again for another delayed answer - I do hope my personal issues are over now :)

@JAllemandou Some updates and questions on the reportupdater:

  • The docs say: “The first column must be equal to start_date parameter (consider naming it date). This is an unnecessary limitation and might be removed in the future...”. Do I need to follow this?

I don't know if the limitation is still valid or not - I have looked at some example queries (see repo in my next line) and the first column is always date ... @mforns can you tell us more?

About your code I suggest using a new folder in this repo: https://gerrit.wikimedia.org/r/#/admin/projects/analytics/reportupdater-queries as it already contains all the queries (or so I think).

The script currently takes a start and end parameter and outputs a wmcs_edits.tsv file. I am not sure if this is sufficient for the reportupdater, or whether I should generate the output in a different format. As we are going to rely on the SQL query anyway, maybe I should not spend too much time fixing the script at this point. I am guessing that when we use the SQL query, the config.yaml should be updated to include database information.

The script you have written doesn't quite follow the reportupdater convention (it's not far off though :): the input parameters are positional parameters rather than named parameters, and the expected TSV output is to be sent to stdout. As we plan on relying on hive for the raw data, it's better IMO not to spend too much time on the python script. There are examples of reports generated using hive queries in the repo I pasted above (for instance the browser folder). It is interesting to note that using hive in reportupdater is actually done through scripts, and that .sql query files are used for MySQL queries only.
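For illustration, a minimal script following that convention could look like this (a sketch only; the column names and query are placeholders, not the final report):

#!/bin/bash
# Reportupdater calls the script with positional arguments:
#   $1 = start date (YYYY-MM-DD), $2 = end date (YYYY-MM-DD, exclusive)
# and reads the TSV report from stdout, with the date as the first column.
start="$1"
end="$2"
printf 'date\twiki_db\tedit_count\n'
# ... run the real query here and print one TSV row per result,
# e.g. "$start<TAB>enwiki<TAB>1234" ...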

Error: Error while compiling statement: FAILED: ParseException line 3:18 cannot recognize input near 'AS' '{' 'wiki_db' in selection target (state=42000,code=40000)
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 3:18 cannot recognize input near 'AS' '{' 'wiki_db' in selection target
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:241)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:227)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:255)
	at org.apache.hive.beeline.Commands.executeInternal(Commands.java:989)

There are multiple things here :) The above query is meant to run on MySQL, not Hive, so the syntax and DBs are not aligned. Also, this example uses the explode_by feature of reportupdater, which allows running one query per parameter value read from a file (see https://github.com/wikimedia/analytics-limn-ee-data/blob/master/ee-migration/config.yaml#L19 and https://github.com/wikimedia/analytics-limn-ee-data/blob/master/ee-migration/wiki_dbs.txt).
I suggest using https://github.com/wikimedia/analytics-reportupdater-queries/tree/master/browser as an example starting point: the config file is simple enough and so are the hive query scripts - you will just need to reduce them to a single config block and a single script file :)

Also, the code to extract network_origin into the geoeditors_daily hive table has been merged. This means we should have production data next month. I have a test table you can use to test your script manually: joal.test_geoeditors_daily_network_origin. It contains a single month partition, for 2019-08. Here is an example query:

SELECT
  wiki_db,
  network_origin,
  SUM(edit_count) as edit_count
FROM joal.test_geoeditors_daily_network_origin
WHERE month = '2019-08'
  AND wiki_db IN ('wikidatawiki', 'enwiki')
GROUP BY
  wiki_db,
  network_origin
ORDER BY
  wiki_db,
  network_origin
LIMIT 100;

wiki_db	network_origin	edit_count
enwiki	internet	3979331
enwiki	wikimedia_labs	5650
wikidatawiki	internet	2562977
wikidatawiki	wikimedia	63849
wikidatawiki	wikimedia_labs	8618997

I hope my answers are clear enough :)
I'll be working late this evening, please don't hesitate to ping me :)

In T232671#5548529, @srishakatux wrote:
@JAllemandou Some updates and questions on the reportupdater:

  • The docs say: “The first column must be equal to start_date parameter (consider naming it date). This is an unnecessary limitation and might be removed in the future...”. Do I need to follow this?

I don't know if the limitation is still valid or not - I have looked at some example queries (see repo in my next line) and the first column is always date ... @mforns can you tell us more?

Yes, this limitation is still there. The first column of the results should be the date in YYYY-MM-DD format.
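For illustration, a monthly report would then start like this (reusing the wikimedia_labs numbers from Joal's test query above, just to show the shape):

date	wiki_db	network_origin	edit_count
2019-08-01	enwiki	wikimedia_labs	5650
2019-08-01	wikidatawiki	wikimedia_labs	8618997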

About your code I suggest using a new folder in this repo: https://gerrit.wikimedia.org/r/#/admin/projects/analytics/reportupdater-queries as it already contains all the queries (or so I think).

Yes, please, consider using reportupdater-queries. Recently we made an effort to unify all reportupdater jobs there. :]

@srishakatux if you're adapting a MySQL query to Hive, there are some things to watch out for, like:

  • You should use a bash script like this:
#!/bin/bash
hive -e "
<YOUR QUERY HERE>
" 2> /dev/null | grep -v parquet.hadoop

The redirection of stderr and the grep are necessary to prevent some Hive output from landing in the resulting report. Adding a Hive client to reportupdater is also in our backlog, but still TODO :/

  • Use {1} {2} ... as placeholders for the parameters passed to the script. The first and second params passed to the script are always timestamp_from and timestamp_to, already in YYYY-MM-DD format. The following parameters passed to the script are explode_by parameters (if you don't use this feature, just ignore this); they come ordered alphabetically by placeholder name.

Please let us know if you have any questions!
Cheers!

Thank you @JAllemandou and @mforns for your helpful reply :)

Getting closer to the desired outcome, I think. I've made slight changes to Joal's script https://github.com/srish/wmcs-edits/blob/master/wmcs_edits, which gives me the output included here: https://github.com/srish/wmcs-edits/blob/master/output.txt.
The config file I have is here: https://github.com/srish/wmcs-edits/blob/master/config.yaml.

I have two questions:

Once you review the config.yaml and the wmcs_edits (hql), I will send a patch for adding it to reportupdater-queries in Gerrit.

JAllemandou added a comment. Edited Oct 14 2019, 7:07 PM

Hi @srishakatux :)

The config file I have is here: https://github.com/srish/wmcs-edits/blob/master/config.yaml.

One comment on the value you picked for lag in the config. Reportupdater will by default run a query with granularity month after the month is done. For instance, it will run the query for the month of 2019-11 on 2019-12-01, with start_date = 2019-11-01 and end_date = 2019-12-01 (inclusive start date, exclusive end date). The lag parameter is the time you add to that default run date, to wait for data availability (or for any other reason). In our case, data is copied from the MySQL databases onto the Hadoop cluster on the 1st of the month, and in past months this has been very regular, taking just a few hours. I therefore suggest using a lag of 1 day, also noting that rerunning reportupdater queries is just a matter of removing the file with incorrect data.

I don't think there is any other place where the time parameters are needed. Scheduling being managed by reportupdater, the query will be run monthly with updated start-date ($1 in script) and end-date ($2).
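To make that concrete, the report block in config.yaml could look roughly like this (a sketch from my reading of the Reportupdater docs, so double-check the field names against the docs; the report name and starts date are just examples, and I am assuming lag is expressed in seconds):

reports:
    wmcs_edits:
        granularity: months
        starts: 2019-08-01
        lag: 86400        # 1 day, assuming lag is given in seconds
        type: script      # the report is a script that calls hive and prints TSV to stdout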

There is however a change needed in the example code, as it is currently not taking advantage of Hive partition pruning (we are correcting this on our side right now, you can track it here: T234283):

CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) >= '$1' AND
CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) < '$2'

should be converted to

month = substr('$1', 1, 7)

as the query is monthly.
Another detail: let's not forget to grow the LIMIT 100 on the query to a bigger number that allows getting all rows back (LIMIT 10000 is more than enough :)
Last - let's rename the query to wmcs_edits_monthly_by_wiki, to be very explicit :)
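Putting the pieces above together, the script could end up looking roughly like this (a sketch based on my test table; the production table and the exact column list may differ):

#!/bin/bash
# Sketch only: $1 = start date, $2 = end date (YYYY-MM-DD), TSV goes to stdout.
hive -e "
SELECT
  '$1' AS date,
  wiki_db,
  network_origin,
  SUM(edit_count) AS edit_count
FROM joal.test_geoeditors_daily_network_origin
WHERE month = substr('$1', 1, 7)   -- monthly partition pruning
GROUP BY
  wiki_db,
  network_origin
LIMIT 10000;
" 2> /dev/null | grep -v parquet.hadoop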

  • Also, because of the very nature of the output, I'm wondering whether including start_date as the first column makes sense, as it is going to show the same value in every row?

I think this is the limitation of reportupdater we were talking about earlier: you're right that it's redundant, but it's not a big deal, the data is small. On the other hand, I kind of like having complete data in the files, as it prevents losing it if a file gets wrongly renamed, for instance. Anyway, the date is needed for the system to work, even if the system could probably overcome this limitation and we could functionally do without it :)

Please send a code-review on the gerrit repo when you want.
edited for code correction - Thanks @mforns

Change 543008 had a related patch set uploaded (by Srishakatux; owner: srish):
[analytics/reportupdater-queries@master] Add hive query for wmcs edits

https://gerrit.wikimedia.org/r/543008

Change 543008 merged by Mforns:
[analytics/reportupdater-queries@master] Add hive query for wmcs edits

https://gerrit.wikimedia.org/r/543008

Sorry @srishakatux I didn't notice in the code review that the report name in the config does not match the file name of the script.
They should match for RU to work. Could you please fix that in a new patch?
Thanks a lot.

Change 543709 had a related patch set uploaded (by Srishakatux; owner: Mforns):
[analytics/reportupdater-queries@master] Fix report name in wmcs config

https://gerrit.wikimedia.org/r/543709

Change 543709 abandoned by Srishakatux:
Fix report name in wmcs config

https://gerrit.wikimedia.org/r/543709

Change 543719 had a related patch set uploaded (by Srishakatux; owner: srish):
[analytics/reportupdater-queries@master] Fix report name in wmcs config

https://gerrit.wikimedia.org/r/543719

Change 543719 merged by Mforns:
[analytics/reportupdater-queries@master] Fix report name in wmcs config

https://gerrit.wikimedia.org/r/543719

Change 551690 had a related patch set uploaded (by Srishakatux; owner: srish):
[analytics/reportupdater-queries@master] Modify WMCS queries

https://gerrit.wikimedia.org/r/551690

@mforns @JAllemandou Would it be possible for one of you to review the patch and give feedback? I've made the necessary changes to the scripts to adhere to the data format that we are using for the first version of the dashboard: https://wmcs-edits.wmflabs.org

The scripts are working fine for me but maybe there is room for code refactoring. I also have two questions related to the changes:

  • Sometimes I'm getting exponential values like "8.0E-4". I am not sure how to deal with these values. Should I leave them or try to convert them (if so how)?
  • Also, the wmcs edits data output by these scripts seems to be accurate, but the total edits data does not: it does not match what Bryan's script was generating. Would you be able to tell why? For example, for the tabular view, this is how I am computing total edits: SUM(edit_count) AS total_edits.

@srishakatux hi!

I reviewed your patch, and LGTM! I left a couple comments, but if you say the scripts run fine, then probably my comments are unnecessary :]

Sometimes I'm getting exponential values like "8.0E-4". I am not sure how to deal with these values. Should I leave them or try to convert them (if so how)?

Hmm, what we did with the Browser reports, which also use the pivot script, was to threshold the data, to only show values greater than a given value.
You can not see that thresholding in the queries, because it was done beforehand when generating the table wmf.browser_general.
But maybe you can threshold and filter out the wikis that have percentages smaller than a given value, e.g. 0.001, in the queries themselves?
This would also ensure that you don't end up with a really long tail of pivoted columns (one column per wiki) with very small values that are difficult to read in a chart.
If thresholding is not an option for you, because you need all wikis all the time, then maybe rounding the value to fewer decimal places can help? Not sure...
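For example, a rough sketch of thresholding and rounding directly in the query (table and column names borrowed from the earlier examples; the exact threshold is up to you):

SELECT
  wiki_db,
  ROUND(SUM(IF(network_origin = 'wikimedia_labs', edit_count, 0)) / SUM(edit_count), 3) AS wmcs_percent
FROM wmf.geoeditors_daily
WHERE month = substr('$1', 1, 7)
GROUP BY wiki_db
HAVING SUM(IF(network_origin = 'wikimedia_labs', edit_count, 0)) / SUM(edit_count) >= 0.001;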

Also, the wmcs edits data output by these scripts seems to be accurate, but the total edits data does not: it does not match what Bryan's script was generating. Would you be able to tell why? For example, for the tabular view, this is how I am computing total edits: SUM(edit_count) AS total_edits.

Not sure about that, the code looked good to me, but maybe I missed something! Can you point me to Bryan's script?

srishakatux added a comment. Edited Wed, Nov 20, 8:46 PM

@mforns Thanks! I've made minor changes to the queries in response to the code review. Also, I'm now rounding the percent to 3 decimals in all the queries, which also resolves the issue with the exponential values.

Bryan's script is here: https://phabricator.wikimedia.org/T226663#5287195. I just realized that not only the wmcs_edits but also the total_edits data generated by the queries does not match the data output by the script. That makes me wonder whether it has something to do with when we started reporting this data in the wmf.geoeditors_daily table?

For 2019-10 via queries:

wiki_db        wmcs_edits   total_edits   wmcs_percent
commonswiki    41100        2171996       1.9%
enwiki         4582         4252868       0.01%
wikidatawiki   7053115      9714767       72.6%

via Bryan's script:

wiki_db        wmcs_edits   total_edits   wmcs_percent
commonswiki    573263       4157574       13.79%
enwiki         414401       5851360       7.08%
wikidatawiki   12216608     19628856      62.24%

Hi @srishakatux, thanks a lot for having double-checked the data. The discrepancy is due to bot edits being removed from the geoeditors data. I'm very sorry not to have pinpointed that earlier :(
I'm assuming bot edits are important for your metric (a lot of bots are run from WMCS, IIUC).
I think the best idea would be to add bot edits to the geoeditors dataset, as it would be valuable for other analyses as well. We currently flag bots in 2 ways: by group (when the user is in the bot group), and by name (when the username matches this regexp).
We (Analytics) need to make a decision on how we want to add that info to the table, and then your queries should just work as-is, except for one other thing: I think Bryan's script is counting more than edits, as it doesn't filter on cuc_type (see https://www.mediawiki.org/wiki/Manual:Recentchanges_table#rc_type).
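For reference, an edits-only count from that table would look something like this (a sketch; per the Recentchanges docs, cuc_type 0 is an edit and 1 is a new page creation):

SELECT
  wiki_db,
  COUNT(1) AS edit_count
FROM wmf_raw.mediawiki_private_cu_changes
WHERE month = '2019-10'
  AND cuc_type IN (0, 1)
GROUP BY wiki_db;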
Let's confirm the above plan works for you, @srishakatux.

Thanks, @JAllemandou, for spotting the missing piece in the data! The plan sounds good to me. Looking at the possible rc_type values, my interpretation is that we would need the data unfiltered, as all the types in there seem relevant. Bryan might be able to say more on this, but he is currently on a very long vacation :(

JAllemandou added a subscriber: Addshore. Edited Fri, Nov 22, 2:31 PM

Analysis of non-edit cuc_type for 3 big wikis:

spark.sql("""
  select
    cuc_type,
    wiki_db,
    count(1) as c
  from wmf_raw.mediawiki_private_cu_changes
  where month = '2019-10'
    and wiki_db in ('enwiki', 'commonswiki', 'wikidatawiki')
  group by cuc_type, wiki_db
  order by cuc_type, wiki_db
""").show(100, false)

+--------+------------+--------+                                                
|cuc_type|wiki_db     |c       |
+--------+------------+--------+
|0       |commonswiki |2995168 |
|0       |enwiki      |4717817 |
|0       |wikidatawiki|15237398|
|1       |commonswiki |110275  |
|1       |enwiki      |253398  |
|1       |wikidatawiki|3684901 |
|3       |commonswiki |1052137 |
|3       |enwiki      |880152  |
|3       |wikidatawiki|704542  |
|142     |wikidatawiki|2019    |
+--------+------------+--------+

Almost all non-edit rows come from logging (cuc_type = 3). I wonder if this would really be useful...
As for the wikidata cuc_type 142, @Addshore found it is related to Flow.

@JAllemandou WOW! I'm guessing that the edits in the different cuc_type categories are all mutually exclusive, since if I add up the different cuc_type counts for a wiki, I get a number very close to Bryan's. Not sure what the 142 category is; it is not listed in https://www.mediawiki.org/wiki/Manual:Recentchanges_table#rc_type. I don't have a strong opinion on whether or not we should report the remaining cuc_types to the table - what are your thoughts? Do you want me to check with someone from the Cloud Services team on this?
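(For instance, for commonswiki: 2995168 + 110275 + 1052137 = 4157580, very close to the 4157574 from Bryan's script; likewise enwiki: 4717817 + 253398 + 880152 = 5851367 vs 5851360, and wikidatawiki: 15237398 + 3684901 + 704542 + 2019 = 19628860 vs 19628856.)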

We have historically counted all cuc_types in generating the "edit count" data for Cloud VPS projects. The fine distinction between creating a new page, editing the wikitext of an existing page, or taking an admin action like a page move or applying protection to the page does exist in the data. For the purposes of showing relative contribution from Cloud VPS hosted software vs the rest of the internet the action taken itself does not matter.

The "bot filtering" that is done by the Analytics team for page views is not helpful at all in our case. Traffic from Cloud VPS to the main cluster wikis is likely all "bot" based, meaning either autonomous or semi-autonomous software driven actions. The fact that any data shows at all with known bots filtered out actually just shows how fragile the idea of user-agent and account rights based detection is.

I believe we figured out that 142 was Flow.

Thanks for the comment @bd808. We're adding the change_type dimension to the underlying dataset, allowing for more use cases (WMCS included).
About bot filtering: being in the case of edits and not pageviews, we use a different approach. We use MediaWiki user groups: a user in the bot group is flagged in the user_is_bot_by dimension with the value group. We also use a regexp (https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/user/UserEventBuilder.scala#L24): a user whose username matches it is flagged with the value name in that column (the column is an array, so a user can have both values). Indeed, detection is fragile :)

Adding the action type to the base table makes the results extremely similar. The very small difference is due to the imperfect user data we join with to gather bot information. I hope this is ok for you @srishakatux and @bd808.

  • Data from Bryan's script:

wiki_db        wmcs_edits   total_edits   wmcs_percent
commonswiki    573263       4157574       13.79%
enwiki         414401       5851360       7.08%
wikidatawiki   12216608     19628856      62.24%

  • Data from the report-updater request on the data with the new dimensions:

wiki_db        wmcs_edits   total_edits   wmcs_percent
commonswiki    573263       4157580       0.13788381702817504
enwiki         414402       5851367       0.0708213995122849
wikidatawiki   12216612     19628860      0.6223801076578059

@JAllemandou this is perfect! :) Let me know when I can query the new data from the table (or if there is a test table that I can play with), I will then modify this patch accordingly.

Hi @srishakatux :)
Code is ready (see https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/552510/), but won't get merged before next week.
I'll backfill the December month next week, then let you know so that you can test and update the queries.
Something to note: I have renamed the table you're using, as it's not only geoeditors-related but more broadly stores information on editors. It's now editors_daily (instead of geoeditors_daily).
I'll ping you when the change has been deployed and the data is available :)

Hi @srishakatux - New data is available in table wmf.editors_daily for month '2019-11'. I think except from table name, your queries should work as-is. Let me know!

Thanks @JAllemandou! I've changed the table name in the queries and made the changes to the patch. I've also tested the queries with the new table. It looks good to me :) For the record, here is the comparison now between data via Bryan's script and hive queries for the month of November:

For 2019-11 via hive queries:

wiki_db        wmcs_edits   total_edits   wmcs_percent
commonswiki    705703       6155816       11.5%
enwiki         475714       5525632       8.6%
wikidatawiki   16431176     25006814      65.7%

via Bryan's script:

wiki_db        wmcs_edits   total_edits   wmcs_percent
commonswiki    693122       6025812       11.5%
enwiki         461736       5361063       8.61%
wikidatawiki   15809813     24076095      65.67%

@mforns You can now review the patch :)

Change 551690 merged by Mforns:
[analytics/reportupdater-queries@master] Modify WMCS queries

https://gerrit.wikimedia.org/r/551690

@mforns @JAllemandou when will I see the files in here https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/wmcs/? I see there is one already there, not sure if I added it.

If there is nothing more left to do in this task, I can close this.

@srishakatux I checked reportupdater logs and it seems the queries have failed, they seem to return no results for some reason.
I will troubleshoot this tomorrow and let you know!

Change 554527 had a related patch set uploaded (by Mforns; owner: Mforns):
[analytics/reportupdater-queries@master] Correct minor details in wmcs queries

https://gerrit.wikimedia.org/r/554527

Change 554527 merged by Mforns:
[analytics/reportupdater-queries@master] Correct minor details in wmcs queries

https://gerrit.wikimedia.org/r/554527

mforns added a comment. Wed, Dec 4, 3:20 PM

@srishakatux
There were a couple minor bugs in 2 of the queries.
That's why the reports weren't there.
Sorry for not having caught those in the code review.
I created another patch that fixes the problems and also removes some unnecessary code.
Merged the patch to unbreak production but left some comments there in case you want to look!
Hopefully, in a couple hours you should see the reports updated in https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/wmcs/.
Cheers!

Change 554564 had a related patch set uploaded (by Mforns; owner: Mforns):
[analytics/reportupdater-queries@master] Escape dollar sign in hive script for wmcs

https://gerrit.wikimedia.org/r/554564

Change 554564 merged by Mforns:
[analytics/reportupdater-queries@master] Escape dollar sign in hive script for wmcs

https://gerrit.wikimedia.org/r/554564

mforns added a comment. Wed, Dec 4, 4:47 PM

@srishakatux
One of the queries failed again!
I had tested it before from the hive command line and it worked fine!
But as reportupdater executes it as a script, the ${wikis} hive variable was being interpreted as a bash variable and substituted away, thus making the query fail in Hive.
I escaped the $ sign in the query and this will hopefully fix the problem, see gerrit changes.
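In other words, something along these lines (the variable and query are placeholders, just to illustrate the pattern):

# bash expands ${wikis} before hive ever sees it, so the query arrives broken:
hive -e "SELECT * FROM some_table WHERE wiki_db IN (${wikis})"
# escaping the dollar sign passes the literal ${wikis} through to hive,
# which can then do its own variable substitution:
hive -e "SELECT * FROM some_table WHERE wiki_db IN (\${wikis})"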

Change 554585 had a related patch set uploaded (by Mforns; owner: Mforns):
[analytics/reportupdater-queries@master] Add funnel parameter to wmcs queries that return multiple rows

https://gerrit.wikimedia.org/r/554585

Change 554585 merged by Mforns:
[analytics/reportupdater-queries@master] Add funnel parameter to wmcs queries that return multiple rows

https://gerrit.wikimedia.org/r/554585

mforns added a comment. Wed, Dec 4, 7:25 PM

@srishakatux Finally, I think the jobs ran and their results are as expected!
Please check the results!
They are not yet in https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/wmcs/ but will be soon, I hope.

@mforns Thanks a lot for making all the changes, and sorry for causing trouble :) I just refreshed the dashboard with the new data and it all looks good to me, except for one thing: we don't see the November data in the query results. Is there a reason for it?

mforns added a comment. Wed, Dec 4, 9:52 PM

@srishakatux I can see November data in the dashboard. It must be a caching issue, and should be over soon.

@mforns So, I see data corresponding to two dates, 2019-11-01 and 2019-10-01, which I read as data for the months of September and October only, so we are missing November right now.

I looked at the data from https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/wmcs/ and I have 2 comments:

  • I find it misleading that wikis_by_wmcs_edits.tsv contains percents without mentioning it in the filename, while the other two contain absolute edit values and percents in line with their names.
  • The first of the month is used to label that month's data: 2019-11-01 means data from 2019-11-01 to 2019-11-30.

Ahha, I see! So maybe then we should display the date in the queries as YYYY-MM, and for that change SELECT $1 .... to SELECT substr('$1', 1, 7) everywhere. Perhaps the filenames could be:

wikis_by_internet_and_wmcs_edits (leave this as it is)
wikis_by_wmcs_edits_percent (change this to wmcs_edits_by_wmcs_total_percent)
wikis_by_wmcs_edits (change this to wmcs_edits_by_total_edit_percent)

srishakatux added a comment. Edited Fri, Dec 6, 6:32 PM

@mforns @JAllemandou I can send a patch if the above suggestion makes sense :)

Update > I'm confused looking at the starts field in config.yaml. I'm wondering whether to interpret it like this: a 2019-10-01 start date means that the script will run on this date and try to fetch the data for October. If this is the case, then I'm wondering how this is going to work :-/