
Provide some Pywikibot usage statistics for Python2.7 and Python3.x
Closed, Resolved Public

Description

Some people are extremely eager to get rid of python 2.7, but we have no clue what the impact will be on our user base. Fortunately Pywikibot provides user agent data. Please gather some statistics to get an idea of Python2.x usage vs Python3.x usage so we can get an idea about the impact.

Possible things to gather:

  • Number of unique users per version
  • Number of edits per version
  • Top bots still using Python 2.x
  • ....

We had similar tasks in the switch from pywikibot-compat -> core, maybe some re-use possibilities.

Event Timeline

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Jan 7 2020, 9:09 PM
Xqt added a subscriber: Xqt. Jan 8 2020, 2:21 AM

Overall Python usage statistics for 2019 are given here:
https://www.jetbrains.com/lp/devecosystem-2019/python/
where one in 10 users still uses Python 2.
Anyway, it could matter if a large number of bots are still running on Python 2 rather than Python 3, which has been recommended for a long time.

Urbanecm claimed this task. Jan 8 2020, 9:15 AM
Urbanecm added a subscriber: Urbanecm.

Maybe I'm doing something wrong, but I logged in to Turnilo and submitted the following query against webrequest_sampled_128 (1/128 of all webrequests):

It should return the top 100 user agents containing Pywikibot over the last 30 days. I downloaded the result (full user agents) as a CSV and analyzed it in a spreadsheet application:

I'm posting the data here as percentages. I've spot-checked the analysis, and it seems to be correct (I did check the 2.7.13.final.0 row, it indeed is 48%). These are actually percentages of webrequests rather than edits, since I don't see POST data in Turnilo, but given how specific PWB user agents are, it shouldn't be a big problem.
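The percentage calculation described above can be sketched in a few lines of Python. This is only an illustration: the rows, bot names and request counts below are made up, and the real analysis was done on a CSV export that is not shown in this task.

```python
import re
from collections import Counter

# Hypothetical (user_agent, request_count) rows standing in for the CSV export.
rows = [
    ("a (wikipedia:xx; User:ExampleBotA) Pywikibot/3.0-dev "
     "requests/2.21.0 Python/2.7.13.final.0", 480),
    ("b (wikipedia:xx; User:ExampleBotB) Pywikibot/3.0-dev "
     "requests/2.22.0 Python/3.7.3.final.0", 320),
    ("c (wikipedia:xx; User:ExampleBotC) Pywikibot/3.0-dev "
     "requests/2.18.4 Python/3.5.3.final.0", 200),
]

PYTHON_VERSION = re.compile(r"Python/(\S+)")


def share_by_python_version(rows):
    """Return {python_version: percentage of all requests}."""
    totals = Counter()
    for user_agent, count in rows:
        match = PYTHON_VERSION.search(user_agent)
        version = match.group(1) if match else "unknown"
        totals[version] += count
    grand_total = sum(totals.values())
    return {v: 100.0 * n / grand_total for v, n in totals.items()}


shares = share_by_python_version(rows)
# Print the versions from largest to smallest share.
for version, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{version}: {pct:.1f}%")
```

Because the counts are request counts, a single busy bot can dominate a version's share.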

Maybe we can add the deprecation to the tech news, or even massmessage technical village pumps with a special message, warning all bot maintainers that Python 2 is to be deprecated soon? I can do either thing for you. I can run a second analysis after a month (?) and email bot maintainers who still show up on the list.

Restricted Application added a project: User-Urbanecm. Jan 8 2020, 9:15 AM
Dvorapa added a subscriber: Dvorapa. Edited Jan 8 2020, 10:00 AM

This seems similar to the core/compat times. One of the reasons is that help pages and docs suggest python pwb.py, and in Debian python is still Python 2. Anyway, if we don't want to do the transition using a new branch, we will need to inform bot maintainers in every possible way and also warn the last ones by email.

This should be pretty similar to moving tools to Debian Stretch at the beginning of this year. Is there any schedule of Stretch transition we could use?

There are also some tasks for some bots already under T242120

Xqt added a comment. Jan 8 2020, 1:28 PM

The result surprises me a bit:

  • good news: Python 3.4 isn't used and we can drop support for that release soon. (T239542)
  • strange: since April 2018 Pywikibot master branch and stable release 3.0.20180403 or newer cannot run with Python 2.7.3 (T191192)
  • surprising: Didn't expect such high usage of Python 2 after 8 months of FutureWarning

The result surprises me a bit:

  • good news: Python 3.4 isn't used and we can drop support for that release soon. (T239542)

Actually, it is used. It wasn't in my original list because Turnilo only lets me see the top 100 user agents matching the listed conditions, and no 3.4 version was in that dataset. To give you an idea of the user impact: the total number of requests using a UA matching the Pywikibot.*Python/3.4 regex is 31.9k, while Pywikibot.*Python/3 gives 3.3 million and Pywikibot.*Python/2 gives 3.7 million.
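For anyone who wants to reproduce this kind of count outside Turnilo, here is a minimal sketch of summing requests whose user agent matches a regex. The user agents and counts are made up for illustration. The dot in 3.4 is escaped here so that it only matches a literal dot.

```python
import re

# Hypothetical (user_agent, request_count) pairs; the counts are invented.
sample = [
    ("a (wikipedia:xx; User:A) Pywikibot/3.0-dev "
     "requests/2.9.1 Python/3.4.2.final.0", 5),
    ("b (wikipedia:xx; User:B) Pywikibot/3.0-dev "
     "requests/2.22.0 Python/3.7.3.final.0", 40),
    ("c (wikipedia:xx; User:C) Pywikibot/3.0-dev "
     "requests/2.21.0 Python/2.7.13.final.0", 55),
]


def count_matching(pattern, rows):
    """Sum the request counts of all user agents matching the pattern."""
    regex = re.compile(pattern)
    return sum(count for user_agent, count in rows if regex.search(user_agent))


py34 = count_matching(r"Pywikibot.*Python/3\.4", sample)
py3 = count_matching(r"Pywikibot.*Python/3", sample)
py2 = count_matching(r"Pywikibot.*Python/2", sample)
print(py34, py3, py2)
```

Note that the Python/3 count includes the Python/3.4 requests, just as in the figures above.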

  • strange: since April 2018 Pywikibot master branch and stable release 3.0.20180403 or newer cannot run with Python 2.7.3 (T191192)

Full user agent for this one is REDACTED (wikipedia:REDACTED; User:REDACTED) Pywikibot/3.0-dev (-1 (unknown)) requests/2.7.0 Python/2.7.3.final.0 (username, script name and project language redacted by me for privacy). Not sure what Pywikibot/3.0-dev (-1 (unknown)) means exactly, though.
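A user agent in this shape can be split into its parts with a regular expression. The sketch below is an assumption based only on the one example quoted above, not on Pywikibot's actual user-agent code. The group names (script, site, user, extra and so on) are made up for illustration, and the pattern assumes the script name contains no spaces.

```python
import re

# Pattern inferred from the single redacted example above (an assumption).
UA_PATTERN = re.compile(
    r"^(?P<script>\S+) "
    r"\((?P<site>[^;]+); User:(?P<user>[^)]+)\) "
    r"Pywikibot/(?P<pywikibot>\S+) "
    r"(?:\((?P<extra>.*)\) )?"  # optional parenthesised part, e.g. "-1 (unknown)"
    r"requests/(?P<requests>\S+) "
    r"Python/(?P<python>\S+)$"
)


def parse_user_agent(user_agent):
    """Return a dict of user-agent fields, or None if the shape differs."""
    match = UA_PATTERN.match(user_agent)
    return match.groupdict() if match else None


ua = ("REDACTED (wikipedia:REDACTED; User:REDACTED) "
      "Pywikibot/3.0-dev (-1 (unknown)) requests/2.7.0 Python/2.7.3.final.0")
info = parse_user_agent(ua)
print(info["pywikibot"], info["requests"], info["python"])
```

Returning None for anything that does not fit keeps unexpected user agents visible instead of silently misparsing them.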

  • surprising: Didn't expect such high usage of Python 2 after 8 months of FutureWarning
Xqt added a comment. Jan 8 2020, 4:13 PM
  • strange: since April 2018 Pywikibot master branch and stable release 3.0.20180403 or newer cannot run with Python 2.7.3 (T191192)

Full user agent for this one is REDACTED (wikipedia:REDACTED; User:REDACTED) Pywikibot/3.0-dev (-1 (unknown)) requests/2.7.0 Python/2.7.3.final.0 (username, script name and project language redacted by me for privacy). Not sure what Pywikibot/3.0-dev (-1 (unknown)) means exactly, though.

This is a Pywikibot release from before 17/02/2018 and an insecure requests release.

Do you think we should inform the user about it?

Xqt added a comment. Jan 8 2020, 7:03 PM

Do you think we should inform the user about it?

Sure.

@Urbanecm thanks for looking into this. Appreciated. Not sure if bot usernames are a privacy issue; I think not, because these are role accounts. I recall that in the past we had a list of top bots using an old version so we could contact the operators. Maybe double-check with the legal privacy gurus?

Bot usernames are (mostly) connected with exactly one operator. As I said in a previous comment, I'm happy to contact the bot owners on your behalf, and I'm also happy to give them your contact info if you want, so they can contact you directly with questions. That would be totally okay. I'm not going to release the data unless approved by Legal. I can contact them, but approval can take weeks to months :-).

Maybe we can add the deprecation to the tech news, or even massmessage technical village pumps with a special message, warning all bot maintainers that Python 2 is to be deprecated soon? I can do either thing for you. I can run a second analysis after a month (?) and email bot maintainers who still show up on the list.

I think this could be useful, especially if combined with a link to some general info about how to migrate (both as a script user and as a script writer) and a link to a partially filled-out new task template for requesting support (for migrating specific bots). Maybe set up a temporary subproject of Pywikibot for the migration tasks?

Same e-mail should also go out on the pywikibot list.

  • surprising: Didn't expect such high usage of Python 2 after 8 months of FutureWarning

Also surprised but I would expect a bunch of them to be regularly run cron jobs where nobody looks at the logs unless it crashes. As the co-maintainer of one such bot I know we don't (and when I do look at the logs I filter out FutureWarning 🙄).

@Urbanecm If it's not too much work, it would be interesting to see what the percentage distribution looks like by number of users (or, alternatively, named scripts) rather than requests, just to get a feel for whether 2.7 is dominated by a few large users or whether it's simple scripts run by many small-volume users.

Xqt added a comment. Jan 9 2020, 7:36 PM

BTW I created T242120 for migration support (and did a few)

@Urbanecm If it's not too much work, it would be interesting to see what the percentage distribution looks like by number of users (or, alternatively, named scripts) rather than requests, just to get a feel for whether 2.7 is dominated by a few large users or whether it's simple scripts run by many small-volume users.

No problem:

To those who are wondering why fewer Python versions show up there: I had to exclude anonymous requests from these statistics because I wasn't able to get a username to filter those on, and decided that IP-based filtering is too much work.
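Counting distinct users instead of requests can be sketched as follows. This is an illustration only: the user agents are made up, the fourth one is a hypothetical stand-in for a request without a username, and the regex assumes the User:... part precedes Python/... as in the example quoted earlier in this task.

```python
import re
from collections import defaultdict

# Captures the username and the Python major version from a user agent.
UA_FIELDS = re.compile(r"User:(?P<user>[^)]+)\).*Python/(?P<major>\d+)\.")


def users_per_major_version(user_agents):
    """Return {python_major_version: number of distinct usernames}."""
    users = defaultdict(set)
    for user_agent in user_agents:
        match = UA_FIELDS.search(user_agent)
        if match is None:
            continue  # no username in the user agent, so it is skipped
        users[match.group("major")].add(match.group("user"))
    return {major: len(names) for major, names in users.items()}


# Hypothetical user agents; BotA appears twice with different scripts.
agents = [
    "a (wikipedia:xx; User:BotA) Pywikibot/3.0-dev requests/2.21.0 Python/2.7.13.final.0",
    "b (wikipedia:xx; User:BotA) Pywikibot/3.0-dev requests/2.21.0 Python/2.7.13.final.0",
    "c (wikipedia:xx; User:BotB) Pywikibot/3.0-dev requests/2.22.0 Python/3.7.3.final.0",
    "d (wikipedia:xx) Pywikibot/3.0-dev requests/2.22.0 Python/3.5.3.final.0",
]
counts = users_per_major_version(agents)
print(counts)
```

Using a set per version means a bot that runs many scripts is counted once, which is the point of looking at users rather than requests.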

Urbanecm moved this task from Backlog to Working on on the User-Urbanecm board. Jan 12 2020, 2:20 PM

Is there anything else I can help you with, or can we close this task?

Xqt added a comment. Jan 13 2020, 1:28 PM

It would probably be interesting to have new statistics in a few months, but this is enough for me now. Thanks a lot.

Urbanecm closed this task as Resolved. Jan 13 2020, 1:29 PM
Urbanecm moved this task from Working on to Backlog on the User-Urbanecm board.

Okay, closing to hide it from my dashboard. Feel free to re-open if you need a follow-up analysis!

Bot usernames are (mostly) connected with exactly one operator. As I said in a previous comment, I'm happy to contact the bot owners on your behalf, and I'm also happy to give them your contact info if you want, so they can contact you directly with questions. That would be totally okay. I'm not going to release the data unless approved by Legal. I can contact them, but approval can take weeks to months :-).

Coming back to this. See T240369. That seems to be a good way to follow up on this. Maybe create tasks like that for the top bots still running Python 2? That way we can help these operators make the move.