Page MenuHomePhabricator

Sharp dip in MediaWiki.site.users metric since wmf11 roll-out
Closed, DeclinedPublic

Description

The roll-out of wmf11 coincides with a sharp decline in new user registrations as recorded by the MediaWiki.site.users graphite metric. This is true of the second roll-out as well, so whatever is depressing the account creation rate is still not fixed.


http://graphite.wikimedia.org/S/s

Related SAL entries for Jan 27th around that time (time is UTC):

21:26 logmsgbot: ori@mira Synchronized docroot and w: (no message) (duration: 02m 26s)
19:48 YuviPanda: started nfs-exports daemon on labstore1001, had been dead for a few days
19:32 mutante: stat1002 - redis.exceptions.ConnectionError: Error connecting to mira.codfw.wmnet:6379. timed out.
19:31 mutante: stat1002 - running puppet, was reported as last run about 4 hours ago but not deactivated
19:14 logmsgbot: dduvall@mira rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.11
19:07 ejegg: set donation queue consumer time limit back to 90 sec
18:49 logmsgbot: jynus@mira Synchronized wmf-config/db-eqiad.php: Repool pc1006 after cloning (duration: 02m 25s)
18:48 bd808: HHVM on mw1019 still dying on a regular basis with "Lost parent, LightProcess exiting"
18:00 csteipp: deploy patch for T103239
17:50 csteipp: deploy patch for T97157

Related Objects

Event Timeline

ori created this task.Jan 28 2016, 9:05 AM
ori raised the priority of this task from to Unbreak Now!.
ori updated the task description. (Show Details)
ori added subscribers: ori, Anomie, bd808, Tgr.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 28 2016, 9:05 AM
Tgr added a comment.Jan 28 2016, 9:34 AM

I don't see an obvious difference in successful registrations (which of course might simply mean that the logging code is wrong).

web

api

Where does MediaWiki.site.users come from, exactly?

ori added a comment.Jan 28 2016, 10:22 AM

I don't see an obvious difference in successful registrations (which of course might simply mean that the logging code is wrong).

Yes, that seems likely, given the graph in your comment.

Where does MediaWiki.site.users come from, exactly?

It is incremented in SiteStatsUpdate::doUpdateContextStats in MediaWiki.

hashar added a subscriber: hashar.Jan 28 2016, 10:27 AM

Another graph of MediaWiki.site.rate.users centered on Jan 27th shows the drop occurred between 19:10 and 19:20 :

@Tgr wrote:

Where does MediaWiki.site.users come from, exactly?

I asked the same to ori, quoting him from IRC:

the rate at which new accounts are being created, I believe
it is incremented in SiteStatsUpdate::doUpdateContextStats() whenever an account is created
(and on some other conditions)
by which i mean that method gets called for other reasons too. but the user count is only incremented when an account is created.

hashar updated the task description. (Show Details)Jan 28 2016, 10:28 AM
hashar set Security to None.
Tgr added a comment.Jan 28 2016, 10:43 AM

Where does MediaWiki.site.users come from, exactly?

Ah, found it, SiteStatsUpdate.

Seems like the stats update is broken, not registrations:

tgr@mw1152:~$ sql centralauth -e 'select substr(gu_registration, 1, 11) hour, count(*) users from globaluser where gu_registration > '20160120000000' group by hour order by hour desc' > reg_hourly.tsv

https://docs.google.com/spreadsheets/d/1Kja3tnTZucXurCpYrT-z_WO2aP_J0DHqxnFc4PIU0HI/edit?usp=sharing

From the https://grafana.wikimedia.org/dashboard/db/authentication-metrics authentication dashboard. The error rate corresponds to login need_token. The first spike being 19:23 - 19:30 , the second 19:47 - 19:51.

Login:

Error rate (relative)

The authentication metric dashboard is based on MediaWiki.authmanager.login.<entry point>.failure. The need_token spikes are related to the api entry point.


Account creation doesn't show a drop. Seems to indicate the new session manager does not invoke SiteStatsUpdate::doUpdateContextStats() the same way (i.e. it would skip calling it for some reason. That would explain the MediaWiki.site.users.rate drop.

Graph based on MediaWiki.authmanager.accountcreation.<entry point>.{success,failure}:

hashar lowered the priority of this task from Unbreak Now! to High.Jan 28 2016, 10:57 AM

No more unbreak now. @Tgr has shown the # of accounts created per hour is stable.

bd808 renamed this task from Sharp dip in new accounts since wmf11 roll-out to Sharp dip in MediaWiki.site.users metric since wmf11 roll-out.Jan 28 2016, 4:01 PM
bd808 updated the task description. (Show Details)

I updated the description and title to be a bit less inflammatory based on database analysis in T125054#1973610.

Tgr added a comment.Jan 29 2016, 2:21 AM

This is probably caused by T125133 (and possibly a similar effect on other sites) - the previous graph counts central accounts while Ori's counts local accounts over all wikis.

Tgr added a comment.Feb 12 2016, 5:18 PM

I suspect this is caused by T70012. The two main sources of autocreation are CentralAuthCreateLocalAccountJob running on loginwiki, mediawiki and meta, and edge login that preemptive logs you in (and thus autocreates) on all second level domains when you use the login page. Loginwiki autocreation was fixed in T125133 which brought the pre-and post-SessionManager graphs closer but not quite overlapping; the remainder is probably due to the edge login requests getting into an autocreation race on loginwiki and thus getting aborted and not autocreating on their eventual target wiki. Given that loginwiki has an accont creation job that should prevent such problems, I wonder if something is wrong with that. (Or this happens to people who register and then immediately log in to another wiki, but I wouldn't expect that to happen often enough to visibly affect stats.)

I don't see any current artifacts as badly misaligned as the 2016-01-22 through 2016-01-23 delta highlighted in the original report. Do we actually have any historical confidence in the periodicity of this statistic?

hashar removed a subscriber: hashar.Mar 7 2016, 3:19 PM

Six weeks later: Anybody knows how to proceed here? Is this still an issue to investigate?
I'd like to avoid having this task just rotting... Thanks!

Tgr added a comment.Apr 27 2016, 2:10 PM

Six weeks later: Anybody knows how to proceed here? Is this still an issue to investigate?

Try to reproduce T18864; probably that behavior has been affected somehow and there are less account autocreations on never-visited wikis.

Not sure if it's worth the effort to look into, though. That behavior was always unwanted, and any real bug (e.g. account not autocreated when the user visits the wiki) would probably have been reported by now.

Tgr added a comment.Jun 10 2016, 5:26 PM

T137261 might be related (not terribly likely though).

Anyone still investigating here, half a year later? Or is this task realistically "declined"?

Aklapper closed this task as Declined.Nov 18 2016, 12:17 PM

No replies for ages so I'm going to decline this to reflect reality.
Feel free to reopen if you still plan to investigate.