Goals
- Confirm whether this issue is still present (see original report below)
- If present, understand the causes
- If present, improve performance to reach comparable speed (e.g. s/item) to other message groups. Filed as the following follow-ups:
- T245974: Alter ttmserver-export.php to allow re-running the script to repair data without having user impact due to deleting all data up front
- T245973: Improve ttmserver-export.php with intelligent batching
Approach
To test this, one needs to run ttmserver-export.php on a database with many banners and other groups (e.g. metawiki). However, that script alters the index, so running it on the production index should be avoided. There are three main (non-exclusive) approaches:
Approach 1: set up a local testing environment
– Requires work to set up
– Requires work to fill with data (and learning how the system works)
– May not sufficiently reflect the production environment
+ Most flexible debugging (no need to wait for deployments)
+ Very good thing to have for future issues
Approach 2: make the script non-destructive [CHOSEN]
– Possibly requires deployment (see below)
– Skips the Elasticsearch part (which probably doesn't matter)
+ Can test with production data
Approach 3: use testing index
– Requires config deployment
– Some reservations from Discovery (we don't want to accidentally cause trouble with the production cluster)
+ Most realistic test case
+ Can test with production data
My suggestion is to start with approach 2, while adding some additional instrumentation to measure which part of the code may be slow. It may be possible to avoid deployment by creating a modified copy of the script on the maintenance server and requiring it from eval.php. If not, it has to go through Gerrit and normal deployments.
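As a rough illustration of the kind of instrumentation meant here, the following is a minimal sketch in Python, not the actual PHP of ttmserver-export.php. All names (`StageTimer`, `export_group`, the `build` and `write` stages, the `dry_run` flag) are hypothetical and only show the idea: accumulate wall-clock time per stage, report seconds per item, and skip the index write in non-destructive mode.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds and item counts per named stage."""

    def __init__(self):
        self.seconds = defaultdict(float)
        self.items = defaultdict(int)

    @contextmanager
    def stage(self, name, items=0):
        # Time the enclosed block and attribute it to the given stage.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.seconds[name] += time.perf_counter() - start
            self.items[name] += items

    def seconds_per_item(self, name):
        # Returns None when no items were recorded for the stage.
        count = self.items[name]
        return self.seconds[name] / count if count else None


def export_group(messages, timer, dry_run=True, write=None):
    """Hypothetical export loop: build documents, then optionally write them.

    With dry_run=True the write step is skipped entirely, so the index
    is never touched (the non-destructive mode of approach 2).
    """
    with timer.stage("build", items=len(messages)):
        docs = [{"content": m} for m in messages]
    if not dry_run and write is not None:
        with timer.stage("write", items=len(docs)):
            write(docs)
    return len(docs)
```

Comparing `seconds_per_item("build")` between a CN group and an ordinary group would show whether the slowdown sits in the part of the code that runs before anything is sent to the index.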
Original report
See P1968 for the log. CN message groups alone took about two hours to index, which is more than all other message groups across all sites combined.
There seems to be something blocking there, because even though I ran with 4 threads, only one of them seemed to be doing something at a time during CN groups. You can see this in the log: a bunch of groups complete at the same time (not shown in the log), and new threads are spawned for new groups at the same time (shown in the log).