
http://reportcard.wmflabs.org/ is not updating automatically
Closed, Duplicate (Public)

Description

Report card

Loading http://reportcard.wmflabs.org/ shows that although we're in October, we only have up-to-date data through August. Screenshot attached. What's blocking us from automating this?


Version: unspecified
Severity: normal

Attached:

Report_Card.jpg (761×1 px, 103 KB)

Details

Reference
bz56030

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 2:25 AM)
bzimport set Reference to bz56030.

Tomasz, the reportcard works as follows:

The meeting titled "Month X Metrics Meeting" will happen at the beginning of month X. Therefore, the most recent data it can have is month X-1 pageview data and month X-2 wikistats data. The wikistats data is resource intensive and can't be computed for month X-1 in time for this meeting. Also for some months, pageview data doesn't get delivered because there are problems with wikistats data that require manual intervention; in those cases we use month X-2 pageview data.
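To make the lag rule above concrete, here is a minimal sketch in Python. It only illustrates the month arithmetic described in this comment; it is not how wikistats or Limn compute anything, and the names `shift_month`, `latest_available`, and `pageviews_delivered` are made up for this example.

```python
def shift_month(year, month, delta):
    """Return (year, month) shifted by `delta` months; delta may be negative."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def latest_available(meeting_year, meeting_month, pageviews_delivered=True):
    """Most recent data month each source can offer for the month X meeting.

    Assumption (taken from the comment above): pageview data lags one month,
    or two months when delivery needed manual intervention; wikistats data
    always lags two months.
    """
    pageview_lag = 1 if pageviews_delivered else 2
    return {
        "pageviews": shift_month(meeting_year, meeting_month, -pageview_lag),
        "wikistats": shift_month(meeting_year, meeting_month, -2),
    }


if __name__ == "__main__":
    # October 2013 meeting: pageviews through September, wikistats through August.
    print(latest_available(2013, 10))
```

Under these assumptions, an October meeting showing wikistats data only through August is expected behaviour rather than a stalled job.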

We have suggested a solution for this, which is to simply move the computation to hadoop. That migration is tracked by these two epics:

https://mingle.corp.wikimedia.org/projects/analytics/cards/1126
https://mingle.corp.wikimedia.org/projects/analytics/cards/1125

However, as of now, those epics have not been prioritized above our other work. I think to answer "What's blocking us automating this?", I would just say "Too many other - higher priority - requests for analytics engineering resources, coupled with not enough analytics resources".

If you'd like more info about how hard those epics would be to implement, or what all these other requests are, just ping me on IRC or over email. I'm happy to talk.

"The wikistats data is resource intensive and can't be computed for month X-1 in time for this meeting."
This is actually due to the dumps, which take up to three weeks into the next month to arrive. Only a live stream of all relevant tables to hadoop could fix that.

"to simply move the computation to hadoop": I'm not sure how simple that would be, though. I've seen some wild optimism in earlier estimates for hadoop, but for sure this is where we want to go.

"Also for some months, pageview data doesn't get delivered because there are problems with wikistats data that require manual intervention; in those cases we use month X-2 pageview data." This is a mistake. Page view counts are updated every day. There is a manual step in Limn (and for some files in Wikistats, but page view data could be sent automatically right now). So only when the Metrics Meeting is on the 1st or 2nd day of the month does the latest pageview data not make it into Limn. Updating Limn after the Metrics Meeting could also help.

It's now November and all of our core metrics are still only updated to August.

When will the current run finish?

Which core metrics?

Dump based stats are up to date for September. (F5 may be needed)
http://stats.wikimedia.org/EN/Sitemap.htm

(At the recent Quarterly Analytics Review Meeting I proposed once again to make the dump process smarter; we can and should have stub dumps within days after the close of the month.)

Page view stats are updated daily, as they have been for years.
http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm

Squid log based reports are up to date to September
http://stats.wikimedia.org/wikimedia/squids/SquidReportRequests.htm

Geo based reports likewise
http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryOverview2013Q3.htm

For October (which was ongoing) we need to do some serious investigating; it seems there were external issues with our ip->geo lookup. We just found out about that.

(In reply to comment #5)

Which core metrics?

The default "Core" tab referred to in the url of this bug

Created attachment 13664
November Core Graph showing issue

Attached:

Screen_Shot_2013-11-01_at_4.01.29_PM.png (988×1 px, 210 KB)

Comment on attachment 13664
November Core Graph showing issue

Ah, comScore stats are published around 20th of the month. Two people need to take some manual steps to get this into Limn. I prep all data for Limn in one go for efficiency sake. If someone with less backlog (BTW I work 1/3 FTE) wants to take over I'd be happy to explain how to do this.

(In reply to comment #8)

Comment on attachment 13664 [details]
November Core Graph showing issue

Ah, comScore stats are published around 20th of the month. Two people need to take some manual steps to get this into Limn. I prep all data for Limn in one go for efficiency sake. If someone with less backlog (BTW I work 1/3 FTE) wants to take over I'd be happy to explain how to do this.

Thanks for the info. All graphs on the page run into this problem including our own data sets. comScore just happens to be at the top.

This was brought up at the Analytics Quarterly review and we are working with Ken/ops to address the root cause.

This was brought up at the Analytics Quarterly review and we are working with Ken/ops to address the root cause.

@Tnegrin: Has any progress been made in the last 22 months?

@Aklapper, Toby shouldn't be assigned this any more; he's not the director of Analytics. But no, no progress has been made in the last 22 months. Wikistats 2.0 is a project we're starting next quarter, and we'll be prioritizing automating the reportcard as part of that project. This issue is somewhat redundant, so I'll close it as a high-level duplicate of our Wikistats 2.0 project task.