Provide some Pywikibot usage statistics for Python2.7 and Python3.x
Closed, Resolved (Public)

Description

Some people are extremely eager to get rid of Python 2.7, but we have no clue what the impact will be on our user base. Fortunately, Pywikibot provides user agent data. Please gather some statistics on Python 2.x versus Python 3.x usage so we can estimate the impact.

Possible things to gather:

  • Number of unique users per version
  • Number of edits per version
  • Top bots still using Python 2.x
  • ....

We had similar tasks during the switch from pywikibot-compat to core; maybe some of that work can be reused.

Event Timeline

A general usage statistic for 2019 is given here:
https://www.jetbrains.com/lp/devecosystem-2019/python/
where one in 10 users still uses Python 2.
Anyway, it would be important to know whether a large number of bots are still running on Python 2 rather than Python 3, which has been recommended for a long time.

Urbanecm subscribed.

Maybe I'm doing something wrong, but I logged in to Turnilo and submitted the following query against webrequest_sampled_128 (1/128 of all webrequests):

image.png (139×1 px, 13 KB)

It should be the top 100 user agents containing Pywikibot in the last 30 days. I downloaded the result (full user agents) as a CSV and analyzed it in a spreadsheet:

image.png (316×190 px, 10 KB)

I'm posting the data here as percentages. I've spot-checked the analysis, and it seems to be correct (I checked the 2.7.13.final.0 row, and it is indeed 48%). These are actually percentages of web requests rather than edits: I don't see POST data in Turnilo, but given how specific PWB user agents are, it shouldn't be a big problem.
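For anyone who wants to redo the spreadsheet step in code, here is a minimal sketch of how the per-version percentages could be computed. It assumes the export boils down to (user agent, request count) pairs; the sample rows, the function name share_by_python_version and the regex are illustrative assumptions, not the actual Turnilo export or the analysis used above.

```python
import re
from collections import Counter

# Hypothetical sample rows: (full user agent, sampled request count).
# These values are made up for illustration only.
SAMPLE_ROWS = [
    ("script (wikipedia:xx; User:ExampleBotA) Pywikibot/3.0-dev "
     "requests/2.21.0 Python/2.7.13.final.0", 50),
    ("script (wikipedia:xx; User:ExampleBotB) Pywikibot/3.0-dev "
     "requests/2.22.0 Python/3.6.8.final.0", 30),
    ("script (wikipedia:xx; User:ExampleBotC) Pywikibot/3.0-dev "
     "requests/2.22.0 Python/3.7.3.final.0", 20),
]

# Capture the major.minor.micro part of the "Python/x.y.z..." token.
PYTHON_RE = re.compile(r"Python/(\d+\.\d+\.\d+)")


def share_by_python_version(rows):
    """Return {python_version: percentage of requests}."""
    totals = Counter()
    for user_agent, count in rows:
        match = PYTHON_RE.search(user_agent)
        version = match.group(1) if match else "unknown"
        totals[version] += count
    grand_total = sum(totals.values())
    return {v: 100.0 * n / grand_total for v, n in totals.items()}


if __name__ == "__main__":
    for version, pct in sorted(share_by_python_version(SAMPLE_ROWS).items()):
        print(f"{version}: {pct:.1f}%")
```

As in the figures above, this weights by requests, so a single busy bot can dominate its version's share.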

Maybe we can add the deprecation to the tech news, or even massmessage technical village pumps with a special message, warning all bot maintainers that Python 2 is to be deprecated soon? I can do either thing for you. I can do a second analysis after a month (?) and email bot maintainers who still show up in the list.

This seems similar to the core/compat times. One of the reasons is that help pages and docs suggest python pwb.py, and python is still Python 2 in Debian. Anyway, if we don't want to do the transition using a new branch, we will need to inform bot maintainers in every possible way and also warn the last ones by email.

This should be pretty similar to moving tools to Debian Stretch at the beginning of this year. Is there a schedule from the Stretch transition that we could reuse?

There are also some tasks for some bots already under T242120

The result surprises me a bit:

  • good news: Python 3.4 isn't used and we can drop support for that release soon. (T239542)
  • strange: since April 2018 Pywikibot master branch and stable release 3.0.20180403 or newer cannot run with Python 2.7.3 (T191192)
  • surprising: Didn't expect such high usage of Python 2 after 8 months of FutureWarning

The result surprises me a bit:

  • good news: Python 3.4 isn't used and we can drop support for that release soon. (T239542)

Actually, it is used. It wasn't in my original list because Turnilo only lets me see the top 100 user agents matching the listed conditions, and no 3.4 version was in that dataset. To give you an idea of the user impact: the total number of requests with a UA matching the regex Pywikibot.*Python/3.4 is 31.9k, while for Pywikibot.*Python/3 it's 3.3 million (and for Pywikibot.*Python/2 it's 3.7 million).
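To illustrate how those three regexes relate to each other, here is a small sketch that applies them to (user agent, count) pairs. The sample rows and the helper name count_matches are hypothetical. One deliberate difference from the patterns quoted above: the dot in 3.4 is escaped here so that it only matches a literal dot. Note also that the Python/3 pattern is a superset of the Python/3.4 one, so the 3.4 requests are included in the Python 3 total.

```python
import re

# Patterns modelled on the regexes quoted above (dot escaped on purpose).
PATTERNS = {
    "python3.4": re.compile(r"Pywikibot.*Python/3\.4"),
    "python3": re.compile(r"Pywikibot.*Python/3"),
    "python2": re.compile(r"Pywikibot.*Python/2"),
}

# Hypothetical rows: (user agent, request count); not real data.
SAMPLE_ROWS = [
    ("s (wikipedia:xx; User:A) Pywikibot/3.0-dev requests/2.7.0 "
     "Python/3.4.3.final.0", 2),
    ("s (wikipedia:xx; User:B) Pywikibot/3.0-dev requests/2.22.0 "
     "Python/3.7.3.final.0", 5),
    ("s (wikipedia:xx; User:C) Pywikibot/3.0-dev requests/2.21.0 "
     "Python/2.7.13.final.0", 7),
    ("SomeOtherClient/1.0 Python/3.7.3", 100),  # not Pywikibot, ignored
]


def count_matches(rows):
    """Sum request counts per pattern; a row may match several patterns."""
    counts = {name: 0 for name in PATTERNS}
    for user_agent, n in rows:
        for name, pattern in PATTERNS.items():
            if pattern.search(user_agent):
                counts[name] += n
    return counts


if __name__ == "__main__":
    print(count_matches(SAMPLE_ROWS))
```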

  • strange: since April 2018 Pywikibot master branch and stable release 3.0.20180403 or newer cannot run with Python 2.7.3 (T191192)

Full user agent for this one is REDACTED (wikipedia:REDACTED; User:REDACTED) Pywikibot/3.0-dev (-1 (unknown)) requests/2.7.0 Python/2.7.3.final.0 (username, script name and project language redacted by myself for privacy). Not sure what Pywikibot/3.0-dev (-1 (unknown)) means exactly though.

  • surprising: Didn't expect such high usage of Python 2 after 8 months of FutureWarning
  • strange: since April 2018 Pywikibot master branch and stable release 3.0.20180403 or newer cannot run with Python 2.7.3 (T191192)

Full user agent for this one is REDACTED (wikipedia:REDACTED; User:REDACTED) Pywikibot/3.0-dev (-1 (unknown)) requests/2.7.0 Python/2.7.3.final.0 (username, script name and project language redacted by myself for privacy). Not sure what Pywikibot/3.0-dev (-1 (unknown)) means exactly though.

This is a Pywikibot release from before 17/02/2018, combined with an insecure requests release.

Do you think we should inform the user about it?

Sure.

@Urbanecm thanks for looking into this. Appreciated. Not sure if bot usernames are a privacy issue; I think not, because these are role accounts. I recall that in the past we had a list of top bots using an old version so we could contact the operators. Maybe double-check with the legal privacy gurus?

Bot usernames are (mostly) connected with exactly one operator. As I said in my previous comment, I'm happy to contact the bot owners on your behalf. I'm also happy to give them your contact info if you want me to, so they can contact you directly with questions. That would be totally okay. I'm not going to release the data unless approved by Legal. I can contact them, but approval can take weeks to months :-).

Maybe we can add the deprecation to the tech news, or even massmessage technical village pumps with a special message, warning all bot maintainers that Python 2 is to be deprecated soon? I can do either thing for you. I can do a second analysis after a month (?) and email bot maintainers who still show up in the list.

I think this could be useful, especially if combined with a link to some general info about how to migrate (both as a script user and as a script writer) and a link to a partially filled out new task template for requesting support (for migrating specific bots). Maybe set up a temporary subproject to Pywikibot for the migration tasks?

Same e-mail should also go out on the pywikibot list.

  • surprising: Didn't expect that highly usage of Python 2 after 8 months of FutureWarning

Also surprised, but I would expect a bunch of them to be regularly run cron jobs where nobody looks at the logs unless something crashes. As the co-maintainer of one such bot I know we don't (and when I do look at the logs I filter out FutureWarning 🙄).

@Urbanecm If it's not too much work, it would be interesting to see what the percentage distribution looks like by number of users (alternatively, named scripts) rather than requests, just to get a feel for whether 2.7 is dominated by a few large users or whether it's simple scripts run by many small-volume users.

BTW I created T242120 for migration support (and did a few)

@Urbanecm If it's not too much work, it would be interesting to see what the percentage distribution looks like by number of users (alternatively, named scripts) rather than requests, just to get a feel for whether 2.7 is dominated by a few large users or whether it's simple scripts run by many small-volume users.

No problem:

image.png (251×277 px, 9 KB)

To those who are wondering why fewer Python versions appear there: I had to exclude anonymous requests from these statistics because I wasn't able to get a username to filter on, and I decided that IP-based filtering was too much work.
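A rough sketch of how a per-user (rather than per-request) breakdown could be computed follows. It assumes the username is taken from the User:... part of the user agent, as in the redacted example earlier in this thread; the regexes, the sample user agents and the function name users_per_version are illustrative assumptions, not the query that produced the image above. User agents without a username are skipped, mirroring the exclusion of anonymous requests.

```python
import re
from collections import defaultdict

# Username as it appears inside "(site:lang; User:Name)".
USER_RE = re.compile(r"User:([^;)]+)")
# Major.minor Python version from the "Python/x.y..." token.
PYTHON_RE = re.compile(r"Python/(\d+\.\d+)")

# Hypothetical user agents; names and versions are made up.
SAMPLE_AGENTS = [
    "one (wikipedia:xx; User:ExampleBotA) Pywikibot/3.0-dev Python/2.7.13.final.0",
    "two (wikipedia:xx; User:ExampleBotA) Pywikibot/3.0-dev Python/2.7.13.final.0",
    "one (wikipedia:xx; User:ExampleBotB) Pywikibot/3.0-dev Python/3.7.3.final.0",
    "one (wikipedia:xx) Pywikibot/3.0-dev Python/3.7.3.final.0",  # anonymous
]


def users_per_version(user_agents):
    """Return {python_version: number of distinct usernames}."""
    users = defaultdict(set)
    for user_agent in user_agents:
        user = USER_RE.search(user_agent)
        version = PYTHON_RE.search(user_agent)
        if not user or not version:
            continue  # skip anonymous or unparseable user agents
        users[version.group(1)].add(user.group(1).strip())
    return {v: len(names) for v, names in users.items()}


if __name__ == "__main__":
    print(users_per_version(SAMPLE_AGENTS))
```

Counting distinct usernames means a bot running several scripts is counted once per Python version, which is the "few large users versus many small ones" question asked above.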

Is there anything else I can help you with, or can we close this task?

It would probably be interesting to have a new statistic in a few months, but this is enough for me now. Thanks a lot.

Urbanecm moved this task from Working on to Backlog on the User-Urbanecm board.

Okay, closing to hide it from my dashboard. Feel free to re-open once you need a follow-up analysis!

Bot usernames are (mostly) connected with exactly one operator. As I said in my previous comment, I'm happy to contact the bot owners on your behalf. I'm also happy to give them your contact info if you want me to, so they can contact you directly with questions. That would be totally okay. I'm not going to release the data unless approved by Legal. I can contact them, but approval can take weeks to months :-).

Coming back to this. See T240369. That seems to be a good way to follow up on this. Maybe create tasks like that for the top bots still running Python 2? That way we can help these operators make the move.

Sorry @Multichill, missed your message. Seems the warning messages that got sent out helped a lot:

image.png (308×296 px, 13 KB)