Page MenuHomePhabricator

New users not created on loginwiki in Labs
Closed, ResolvedPublic

Description

Noticed this while digging into another bug. In beta, users are not created on loginwiki. So cross-project login isn't working.

The odd thing that I think is happening is that the browser isn't getting all of the cookies that the wiki is setting. Maybe a Varnish issue?

In the beta logs, I see the user is redirected to loginwiki:

0.1661 4.2M Start request GET /wiki/Special:CentralLogin/start?token=b4bac89e539642795df51e425a974ffc
HTTP HEADERS:
HOST: login.wikimedia.beta.wmflabs.org

And then (as it should), loginwiki sets the user's cookies:

loginwiki-9c1ccc4f: 0.1995 8.8M [cookie] setcookie: "centralauth_Session", "aa2442f2d2e716dc89eb16d1f89f3cfe", "0", "/", "", "", "1"
loginwiki-9c1ccc4f: 0.2004 8.8M [cookie] setcookie: "centralauth_User", "TestBug16864'7", "1396574220", "/", "", "", "1"
loginwiki-9c1ccc4f: 0.2006 8.8M [cookie] setcookie: "centralauth_Token", "", "1393895820", "/", "", "", "1"

However, in my browser, the only Set-Cookie header is:

Set-Cookie centralauth_Session=aa2442f2d2e716dc89eb16d1f89f3cfe; path=/; httponly; GeoIP=US:Santa_Clara:37.3541:-121.9552:v4; path=/

When the AutoLogin code runs, loginwiki then has:

0.1973 4.2M Start request GET /wiki/Special:CentralAutoLogin/refreshCookies?type=1x1&wikiid=enwiki&proto=http
HTTP HEADERS:
HOST: login.wikimedia.beta.wmflabs.org
USER-AGENT: Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0
ACCEPT: image/png,image/*;q=0.8,*/*;q=0.5
ACCEPT-LANGUAGE: en-US,en;q=0.5
DNT: 1
REFERER: http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Main_Page&gettingStartedReturn=true
COOKIE: GeoIP=US:Santa_Clara:37.3541:-121.9552:v4; centralauth_Session=aa2442f2d2e716dc89eb16d1f89f3cfe
X-VARNISH: 497335809, 1267638667
X-FORWARDED-FOR: 50.136.233.16, 127.0.0.1
ACCEPT-ENCODING: gzip
loginwiki-8577f040: 0.1980 4.2M CACHES: MemcachedPhpBagOStuff[main] MemcachedPhpBagOStuff[message] MemcachedPhpBagOStuff[parser]
loginwiki-8577f040: 0.2054 5.0M LocalisationCache: using store LCStoreCDB
loginwiki-8577f040: 0.2084 5.5M Fully initialised
loginwiki-8577f040: 0.2148 5.8M CentralAuthHooks::onUserLoadFromSession: no User cookie, so unable to check for session mismatch

So no session because the centralauth_User cookie is missing.


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=62001

Details

Reference
bz62244

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 2:59 AM
bzimport set Reference to bz62244.
bzimport added a subscriber: Unknown Object (MLST).
csteipp created this task.Mar 5 2014, 1:42 AM

Moving this to labs bugs, since I think this is a config issue with beta, not the extension.

This issue is not showing up in production (although hopefully not by some config that is going into production soon!)

This is currently also causing Selenium tests to fail which need to log in. e.g. https://saucelabs.com/jobs/5ca77278cb5e412c8e7a231561f6ba4d

upping the importance here. this seems to be a recent phenomena, if you want to pinpoint a time let me know

also, we seem to have the same issue on test2wiki (and presumably mw.o)

The problem started showing up yesterday afternoon SF time.

FWIW, just before the automated test displays the "No active login attempt... " message, Firefox shows a message "Transferring data from bits..." and then "Read bits..."

And, it seems to be intermittent. Of seven automated tests that login, two (in the most recent run) Mar 5 incurred the "No active login attempt..." message.

(In reply to Chris McMahon from comment #3)

upping the importance here. this seems to be a recent phenomena, if you
want to pinpoint a time let me know

Please!

(In reply to Chris McMahon from comment #4)

also, we seem to have the same issue on test2wiki (and presumably mw.o)
The problem started showing up yesterday afternoon SF time.
FWIW, just before the automated test displays the "No active login
attempt... " message, Firefox shows a message "Transferring data from
bits..." and then "Read bits..."
And, it seems to be intermittent. Of seven automated tests that login, two
(in the most recent run) Mar 5 incurred the "No active login attempt..."
message.

That's a different issue. That is memcache not working right. In the issue on this bug, loginwiki has the token and session, it's a problem with the cookie setting.

Created attachment 14743
central login problem

Attached:

on mw.o just now:

load a page, get a message saying Central login/You are centrally logged in. Reload the page to apply your user settings.

Reload the page, notice re-occurs, I am still not logged in.

(In reply to Chris McMahon from comment #7)

on mw.o just now:
load a page, get a message saying Central login/You are centrally logged in.
Reload the page to apply your user settings.
Reload the page, notice re-occurs, I am still not logged in.

Again, this is a separate issue. Please file separate bugs for them.

Mentioned on IRC: to have multiple login issues show up simultaneously on beta labs and test2wiki/mw.o within roughly a 12-hour period suggests a single root cause.

I could be wrong.

I have a manual repro for the login issue, still not sure it's connected to the original report, and apologies for hijacking the BZ ticket

on http://en.wikipedia.beta.wmflabs.org

clear cache and cookies

  • Enter username and password
  • Click "Log in " button and immediately click "Random article" link

Get error message "Central user log in/No active login attempt is in progress for your session."

Repeat these steps in the same browser session and the second time get the error message:

Central user log in/The token does not match with your session username.

Someone might be trying to log you in under a malicious account./If you are just trying to log in as another user, please log out first.

Unable to repro manually on test2wiki but see screen shot for evidence that it can happen.

Created attachment 14744
no active login error msg on test2wiki

Attached:

Created attachment 14753
login script for internal/external comparison

We definitely have an issue with the Caching. This script when run internally against the apache directly returns the correct Set-Cookie headers. When run externally, only the Session and GeoIP cookies are set.

Further, if you add a

-H 'Cookie: centralauth_Session=asdf; GeoIP=asdf'

to the final curl call, the correct cookies are returned. It doesn't matter what values they have, but when those 2 cookies are set, then all of the cookies come back through.

Since GeoIP is needed for the correct cookies to come back, I'll add ori in case this is related to the GeoIP cookie vcl.

Attached:

hashar added a comment.Mar 6 2014, 9:15 AM

Running script script against Apache and Varnish have the following difference:

Apache:

Cache-control: private, must-revalidate, max-age=0
Set-Cookie: centralauth_Session=2f8f82190cd8c3a066aa5c8abca47699; path=/; httponly
Set-Cookie: centralauth_User=TestBug16864; expires=Sat, 05-Apr-2014 09:09:28 GMT; path=/; httponly
Set-Cookie: centralauth_Token=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; httponly

Varnish:

Age: 0
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Set-Cookie: centralauth_Session=4dc3572e4bde7a617de112f22d3fa18a; path=/; httponly; GeoIP=FR::48.8600:2.3500:v4; path=/
X-Cache: deployment-cache-text1 miss (0), deployment-cache-text1 frontend miss (0)

hashar added a comment.Mar 6 2014, 9:29 AM

Ori change "Send GeoIP lookup result as 'GeoIP' cookie" is https://gerrit.wikimedia.org/r/#/c/113935/

hashar added a comment.Mar 6 2014, 9:37 AM

On deployment-cache-text1.pmtpa.wmflabs I edited /etc/varnish/text-frontend.inc.vcl and commented out the GeoIP snippet added by change 113935:

https://gerrit.wikimedia.org/r/#/c/113935/6/templates/varnish/text-frontend.inc.vcl.erb,unified )

Restarted varnish with:

/etc/init.d/varnish restart ; /etc/init.d/varnish-frontend restart

I then ran Chris script and I received three cookies:

< Set-Cookie: centralauth_Session=3428f185da995eaaf9550ae54b4e634f; path=/; httponly
< Set-Cookie: centralauth_User=TestBug16864; expires=Sat, 05-Apr-2014 09:34:59 GMT; path=/; httponly
< Set-Cookie: centralauth_Token=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; httponly
< Location: http://en.wikipedia.beta.wmflabs.org/wiki/Special:CentralLogin/complete?token=ef3afb1dbc82549c1655b6841d8614ae
< Age: 0
< X-Cache: deployment-cache-text1 miss (0), deployment-cache-text1 frontend miss (0)
< Cache-Control: private, s-maxage=0, max-age=0, must-revalidate

I have no idea why it would work in production and not on labs though :(

hashar added a comment.Mar 6 2014, 9:52 AM

Apparently happens on production as well: bug 62288 'Random API issues'

Fixed by Faidon by disabling the cookie based geolocation on text varnish: https://gerrit.wikimedia.org/r/#/c/117004/

The shell script pass and shows three Set-Cookie: headers now.

For completeness, copying from my postmortem:

What happens is, manipulation of the Set-Cookie header from VCL was
broken. Ori's code is:

set_cookie_header = VRT_GetHdr(sp, HDR_RESP, "\013set-cookie:");
if (set_cookie_header == NULL) {
        // New header
        VRT_SetHdr(sp, HDR_RESP, "\013Set-Cookie:", cookie_out, vrt_magic_string_end);
} else {
        // Append to existing header
        VRT_SetHdr(sp, HDR_RESP, "\013Set-Cookie:", set_cookie_header, "; ", cookie_out, vrt_magic_string_end);
}

While this looks correct on a first glance, it makes you wonder what
would happen with multiple occurences of Set-Cookie. What happens is
exactly what you observe: VRT_GetHdr() returns only one of them and
VRT_SetHdr() sets all of them, essentially discarding all Set-Cookie
headers but one.

As for going forward: the std vmod has an std.collect() function to
collect multiple headers into one, but I don't believe this would work
with Set-Cookie. There's https://github.com/varnish/libvmod-header
that's designed exactly to address this use case -- its description is
"Varnish Module (vmod) for manipulation of duplicated headers (for
instance multiple set-cookie headers)". We should probably use this
instead.

Zeljkof confirmed the QA job that had the issue is now passing the login step

https://wmf.ci.cloudbees.com/job/MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox/91/
https://saucelabs.com/tests/1a52304399624b5a8a76243f32372e63

(Job still fails but for a different reason).

Thanks Chris Steipp for the shell script!

hashar added a comment.Mar 6 2014, 1:43 PM
  • Bug 62288 has been marked as a duplicate of this bug. ***

Confirmed that beta is once again giving out all the cookies for loginwiki, and so accounts are automatically attached on loginwiki on creation and SUL actually all works now. Thanks for all the help!

Interesting, because we had an occurrence of bug 62288 with the commons api monitor, but it passed on re-run.

The same symptoms have reappeared for the UploadWizard API test:

https://integration.wikimedia.org/ci/job/UploadWizard-api-commons.wikimedia.beta.wmflabs.org/389/console

Could be a different bug, but the last occurrence of this failure on that particular test led my to this bug.

Gilles: Isn't that bug 63981?

Probably, since it just came back to normal.