Page MenuHomePhabricator

modernize DjangoBannerStats to python3
Closed, ResolvedPublic

Description

DjangoBannerStats is still running on python2 which is obsolete.

Note the 'pgehres' database this uses has not been fully converted to utf8mb4, and should probably be done that the same time as we deploy the python3 port. That is task T277484. Also there's the cleanup of bad country and language values T198639.

Event Timeline

greg added a subscriber: greg.

07:49:59 <Jeff_Green> fr-tech is there anyone who can look at https://phabricator.wikimedia.org/T301905? The urgency has just gone up since we're upgrading the analytics servers to bullseye

For what it's worth, I did a little poking at this on a virtualbox bullseye box and got LoadLPImpressions to stop faceplanting on startup by adjusting several minorish things:

  • invoke with python3
  • add sys.path.append("/etc/fundraising") to manage.py so it finds django_settings.py
  • cPickle -> pickle (cPickle appears to be unnecessary in python3, as pickle has improved performance)
  • urllib.unquote -> urllib.parse.unquote (untested, might need more fixing)
  • urlparse -> urllib.parse (also untested)
  • models.ForeignKey() stuff is missing an on_delete parameter, I'm not sure what is appropriate here

Clearly it's a project, but I'm hopeful it's not a nightmare project.

There is a functional test environment on frdev1002. I've done a bunch of work on LoadBannerImpressions2Aggregate and LoadLPImpressions which are the only subcommands we seem to use. Work in progress is /srv/DjangoBannerStats, and /etc/fundraising/django_settings.py is set up to use frdb1006 with dev_pgehres. This needs review and testing! I'm not bothering to commit any of this until someone with more familiarity with django and the dataset has looked at it.

greg triaged this task as High priority.Jan 31 2023, 4:59 PM

Change 885435 had a related patch set uploaded (by Jgreen; author: Jgreen):

[wikimedia/fundraising/tools/DjangoBannerStats@master] WIP: first pass at porting to python3

https://gerrit.wikimedia.org/r/885435

Thanks for all this!!

For debugging through this locally: here's a draft fr-dev patch with setup and instructions for attaching a debugger to a different Python script in the tools repo. Should in theory work for DjangoBannerStats, too.

https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/dev/+/841441

Change 886936 had a related patch set uploaded (by AndyRussG; author: AndyRussG):

[wikimedia/fundraising/dev@master] Add DjangoBannerStats source to fd-dev

https://gerrit.wikimedia.org/r/886936

Change 841441 had a related patch set uploaded (by AndyRussG; author: AndyRussG):

[wikimedia/fundraising/dev@master] [WIP] jobs service

https://gerrit.wikimedia.org/r/841441

Instructions for running under fundraising-dev with bullseye:

This allows running the WIP python3 version of DjangoBannerStats under fundraising-dev on bullseye.

  • Follow the instructions in the commit message for https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/dev/+/841441
  • Open a shell in the jobs container: docker-compose exec jobs bash
  • Set PYTHONPATH: export PYTHONPATH="/etc/fundraising:/srv/django-banner-stats/"
  • Install specific versions of packages.
    • pip3 install "Django==1.11.29" -t /srv/django-banner-stats/packages
    • pip3 install "PyMySQL==0.9.3" -t /srv/django-banner-stats/packages
    • pip3 install "mysqlclient==1.3.14" -t /srv/django-banner-stats/packages
  • Run the tool:
    • python3 /srv/django-banner-stats/manage.py LoadLPImpressions --verbose --recent
    • python3 /srv/django-banner-stats/manage.py LoadBannerImpressions2Aggregate --verbose --top --recent

Change 885435 merged by jenkins-bot:

[wikimedia/fundraising/tools/DjangoBannerStats@master] Porting to python3

https://gerrit.wikimedia.org/r/885435

Update

Summary

  • The updated code for this is merged and deployed, but I'm unable to turn on the process-control job myself due to a permissions issue. (Can't push to fran1001:/var/lib/git/localsettings.git/)
  • Comparing the output of the old and new versions of the script, via spot-checking, we see the following:
    • Data in banner impressions seems identical.
    • Data in landingpageimpression_raw seems almost identical.
    • Data in donatewiki_unique seems to get additional entries with the new script. These entries are not present in data generated previously by the old script in the production pgehres database. However, this was probably due to an issue with the old script, or something related to the setup at that time, causing the old script to incorrectly drop data.

Details of data comparison

Below we compare legacy data (in pgehres, on frdb1003), data generated recently with the old version of the script (in dev_pgehres_p2 on frdb1006) and data generated with the new version (in dev_pgehres_p3 on frdb1006), for specific days only. Thanks so much @Jgreen for setting up these test dbs and running the code, and for all the work on this!

bannerimpressions

After taking into account details of how timestamps appear in logs lines and log filenames, I was able to run queries that get identical results for both versions. Here is a query of dev_pgehres_p3 on frdb1006 (output from the python3 version, run on only logs for 2022-12-11):

select count(*) from dev_pgehres_p3.bannerimpressions where timestamp >= '2022-12-11 01:00:00' and timestamp < '2022-12-11 23:00:00';
+----------+
| count(*) |
+----------+
|   167310 |
+----------+

The same query, run on the production pgehres on frdb1003 (containing results processed by the python2 verison of the script before the outage) yields an identical number of rows:

select count(*) from pgehres.bannerimpressions where timestamp >= '2022-12-11 01:00:00' and timestamp < '2022-12-11 23:00:00';
+----------+
| count(*) |
+----------+
|   167310 |
+----------+

landingpageimpression_raw

Almost identical results all around. I haven't dug into the 1-row difference shown here.

frdb1006:

MariaDB [(none)]> select count(*) from dev_pgehres_p3.landingpageimpression_raw where timestamp >= '2022-12-11 01:00:00' and timestamp < '2022-12-11 23:00:00';
+----------+
| count(*) |
+----------+
|   162634 |
+----------+
MariaDB [(none)]> select count(*) from dev_pgehres_p2.landingpageimpression_raw where timestamp >= '2022-12-11 01:00:00' and timestamp < '2022-12-11 23:00:00';
+----------+
| count(*) |
+----------+
|   162633 |
+----------+

frdb1003:

MariaDB [(none)]> select count(*) from pgehres.landingpageimpression_raw where timestamp >= '2022-12-11 01:00:00' and timestamp < '2022-12-11 23:00:00';
+----------+
| count(*) |
+----------+
|   162633 |
+----------+

donatewiki_unique

frdb1006:

MariaDB [(none)]> select count(*) from dev_pgehres_p2.donatewiki_unique where timestamp >= '2022-12-11 01:00:00' and timestamp < '2022-12-11 23:00:00';
+----------+
| count(*) |
+----------+
|     1089 |
+----------+
MariaDB [(none)]> select count(*) from dev_pgehres_p3.donatewiki_unique where timestamp >= '2022-12-11 01:00:00' and timestamp < '2022-12-11 23:00:00';
+----------+
| count(*) |
+----------+
|     1089 |
+----------+

frdb1003:

MariaDB [(none)]> select count(*) from pgehres.donatewiki_unique where timestamp >= '2022-12-11 01:00:00' and timestamp < '2022-12-11 23:00:00';                                            
+----------+
| count(*) |
+----------+
|      892 |
+----------+

So, here we have almost 200 rows less on production pgehres than we got by running the either the new or old versions of the script recently on the same log files. However, at least for the logs that I checked, the new rows created only by the recent runs of the script seem fine. They contain legitimate information that is indeed in the log files. So, it seems likely the difference is due to an issue with the old version of the script, or something about how it was set up (some encoding issue? issue with a library? db issue?) that was causing it to incorrectly drop entries. It seems unlikely to be a db timeout issue, since the corresponding landingpageimpression_raw entries were written correctly to the prod database.

Jgreen claimed this task.
Jgreen added a project: fundraising-tech-ops.
Jgreen moved this task from Triage to Done on the fundraising-tech-ops board.

Change 841441 abandoned by Jgleeson:

[wikimedia/fundraising/dev@master] [WIP] jobs service

Reason:

Breaking this up into smaller patches. First one here I985b050c2b6a563d22a0f3c5aec3a72107c154e3

https://gerrit.wikimedia.org/r/841441