During the update process, one of the slowest steps is fetching pageviews, depending on how many pages are being processed. This is currently done synchronously, one request at a time. The API allows us to make up to 100 requests at the same time, and we should take advantage of this. The performance improvement could be orders of magnitude.
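A minimal sketch of the idea, assuming a per-page fetch function and the limit of 100 simultaneous requests mentioned above. Event Metrics itself is not written in Python and this is not the code from the PR. `fetch_one` is a hypothetical stand-in for whatever issues a single Pageviews API request. The sketch launches every request at once and uses a semaphore so that no more than `max_concurrency` are in flight at any moment.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple

# Hypothetical: fetch_one(title) performs one pageviews request and returns the count.
FetchOne = Callable[[str], Awaitable[int]]


async def fetch_all_pageviews(
    titles: Iterable[str],
    fetch_one: FetchOne,
    max_concurrency: int = 100,  # assumed limit, taken from the task description
) -> Dict[str, int]:
    """Fetch pageviews for every title concurrently, keeping at most
    `max_concurrency` requests in flight at any moment."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(title: str) -> Tuple[str, int]:
        # Wait for a free slot before issuing the request; the slot is
        # released as soon as this request finishes.
        async with semaphore:
            return title, await fetch_one(title)

    # gather() runs all coroutines concurrently and preserves input order.
    results = await asyncio.gather(*(bounded(t) for t in titles))
    return dict(results)


if __name__ == "__main__":
    async def fake_fetch(title: str) -> int:
        await asyncio.sleep(0.01)  # simulate network latency
        return len(title)

    print(asyncio.run(fetch_all_pageviews(["Foo", "Barbaz"], fake_fetch)))
```

Total run time is then bounded by roughly (number of pages / 100) times the latency of one request, instead of the sum of all latencies.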
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | BUG REPORT | None | T219835 [Spike: 3hrs] Event Metrics 'crunched numbers' for hours then stopped, but showed neither results nor an error msg.
Resolved | | MusikAnimal | T217911 Performance improvement: Fetch pageviews asynchronously
Event Timeline
This is certainly a "nice to have" and nothing urgent. I'm only logging it here so we don't forget about it.
PR: https://github.com/wikimedia/eventmetrics/pull/288
To give an example, fetching pageviews for https://eventmetrics-dev.wmflabs.org/programs/76/events/370 takes around 2.5 minutes. With the async implementation, the same fetch takes around 4.8 seconds, roughly a 30x speedup.