
Unique visitors data only sporadically available
Open, Needs Triage | Public | Bug Report

Description

The unique visitors endpoints offer daily visitor counts based on IP address, which is very helpful since IP information isn't available to tool authors. Unfortunately, the data only seems to be recorded every so often. Most dates, such as 2023-04-29, return zeros, and in some cases the API returns a 404, such as for 2023-02-29.
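
To make the failure modes above easier to tell apart, here is a minimal sketch of how a client might classify one day's response. The function names, the result labels, and the shape of `counts` (a mapping of tool name to unique visitor count) are assumptions for illustration, not the actual API contract. It also rules out strings that are not real calendar dates first: 2023 is not a leap year, so 2023-02-29 does not parse as a date, and a 404 for such a string may be unrelated to missing data.

```python
from datetime import date


def parse_day(day: str):
    """Return a date for a valid YYYY-MM-DD string, or None if it is not a real date."""
    try:
        return date.fromisoformat(day)
    except ValueError:
        return None


def classify_day(day: str, status: int, counts: dict) -> str:
    """Classify one day's response.

    `status` is the HTTP status code; `counts` is a hypothetical mapping of
    tool name -> unique visitor count parsed from the response body.
    """
    if parse_day(day) is None:
        return "invalid-date"  # e.g. 2023-02-29, which is not a calendar date
    if status == 404:
        return "not-found"
    if not counts or all(value == 0 for value in counts.values()):
        return "empty"  # the day exists but every tool reports zero
    return "ok"
```

With labels like these, a frontend could treat "empty" and "not-found" days as missing data while reporting "invalid-date" as a user input problem.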

Below is a chart using the upcoming Toolviews visualization tool showing the unique visitors data across all tools from May 2022 through May 2023:

Screenshot from 2023-06-02 13-06-20.png (816×1 px, 119 KB)

As you can see, a lot of data is missing. If it helps, I believe that on the dates where unique visitor counts are missing, they are missing for all tools rather than just some of them.

Event Timeline

MusikAnimal renamed this task from Unique visitors data only sporatically available to Unique visitors data only sporadically available. Jun 2 2023, 5:11 PM
MusikAnimal updated the task description.

Actually, the normal "hits" data is sometimes unavailable, too. Here's the same date range (May 2022 - May 2023) for hit counts across all tools:

Screenshot from 2023-06-02 13-21-26.png (816×1 px, 157 KB)

I'm assuming getting historical data for the missing hits and unique visitor counts won't be possible, so I can state in the new Toolviews frontend that some data before [some date] may be unavailable. Some sort of user-facing warning should be shown, I think.
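
One way the user-facing warning could be driven is sketched below, assuming the frontend already knows the set of dates that have any data. The helper names and the message wording are hypothetical. The first function collapses missing days into inclusive date ranges; the second turns those ranges into a short notice.

```python
from datetime import date, timedelta


def missing_ranges(start, end, days_with_data):
    """Return inclusive (first, last) ranges of dates in [start, end] that have no data."""
    ranges = []
    run_start = None
    day = start
    while day <= end:
        if day not in days_with_data:
            if run_start is None:
                run_start = day  # a new gap begins here
        elif run_start is not None:
            ranges.append((run_start, day - timedelta(days=1)))
            run_start = None
        day += timedelta(days=1)
    if run_start is not None:
        ranges.append((run_start, end))  # the gap runs to the end of the window
    return ranges


def warning_text(ranges):
    """Build a short notice for the given gaps, or an empty string if there are none."""
    if not ranges:
        return ""
    parts = [
        first.isoformat() if first == last
        else f"{first.isoformat()} to {last.isoformat()}"
        for first, last in ranges
    ]
    return "Data is unavailable for: " + ", ".join(parts)
```

For a long window with many gaps the list could get unwieldy, in which case a single "some data in this range is unavailable" notice may be the better choice.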

You are correct that we do not have any way to backfill missing data. The database is populated by logrotate triggers on the proxy servers. There are a number of reasons that data might not make it to the database, but the most likely one would be issues with ToolsDB itself at the time that a front proxy's nginx access log was rotated. We do not currently monitor or alert on failures in this pipeline.
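
Since nothing currently monitors this pipeline, a freshness check could be a cheap first step. The following is only a sketch under assumptions: the function name, the two-day threshold, and the idea of feeding it the newest date present in the database are all hypothetical, and the exit codes follow the common 0 = OK, 2 = CRITICAL convention used by Nagios-style checks.

```python
from datetime import date


def check_freshness(latest_loaded, today, max_lag_days=2):
    """Return (exit_code, message) describing how stale the newest loaded day is.

    `latest_loaded` is the newest date with rows in the database, or None
    if the table is empty. `max_lag_days` is an assumed tolerance.
    """
    if latest_loaded is None:
        return 2, "CRITICAL: no rows found"
    lag = (today - latest_loaded).days
    if lag > max_lag_days:
        return 2, f"CRITICAL: newest data is {lag} days old ({latest_loaded.isoformat()})"
    return 0, f"OK: newest data is {lag} days old"
```

A check like this only catches the pipeline stopping entirely; isolated missing days in the past would need a gap scan over the date range instead.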

Unique visitor numbers do seem to be very broken. That's a feature that @Andrew added to the system; my only real involvement with it was merging a patch on the API side of things. I'm not seeing any data at all in the daily_ip_views table, and if I understand the general logic of https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/toolforge/files/toolviews.py correctly, that table should nearly always have some data in it.
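
For anyone wanting to repeat the "is there any data at all" check, a per-day row count is probably the quickest query. The sketch below uses an in-memory SQLite database as a stand-in for ToolsDB so it is self-contained. The column names (`tool`, `request_day`, `ip_hash`) are assumptions and may not match the real daily_ip_views schema.

```python
import sqlite3

# Hypothetical schema: the real daily_ip_views columns may differ.
SCHEMA = "CREATE TABLE daily_ip_views (tool TEXT, request_day TEXT, ip_hash TEXT)"


def rows_per_day(conn):
    """Return [(request_day, row_count), ...] ordered by day."""
    cur = conn.execute(
        "SELECT request_day, COUNT(*) FROM daily_ip_views "
        "GROUP BY request_day ORDER BY request_day"
    )
    return cur.fetchall()


# In-memory SQLite stands in for ToolsDB so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
empty_result = rows_per_day(conn)  # an empty table yields an empty list

conn.executemany(
    "INSERT INTO daily_ip_views VALUES (?, ?, ?)",
    [
        ("tool-a", "2023-05-01", "h1"),
        ("tool-b", "2023-05-01", "h2"),
        ("tool-a", "2023-05-02", "h3"),
    ],
)
filled_result = rows_per_day(conn)  # one (day, count) pair per day with rows
```

An empty result from the real table would match what is described above, while days absent from a non-empty result would point at individual failed loads.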