
Search autocompletion broken for recent articles (after April 30, 2020?) for some users / browsers
Closed, Resolved · Public

Description

For some users, in some account and browser combinations, search autocompletion seems to be stuck in its April 30 state (i.e. articles created after that date are not suggested). An example is the search string Révai Le not resulting in a suggestion for hu:Révai Leó. More examples and personal experiences can be found here (permalink). I don't know for sure if this is a huwiki-specific problem, but this enwiki report (permalink) seems to be about the same issue.

There seems to be very little logic in when the issue happens - for example, I can reproduce it with my WMF account but not with my personal account. Same browser, same IP, same search-related user preferences, disabling browser cache makes no difference... also if I directly request the API URL used by the search autocompletion JS code (this for the example above) it gives the correct result for all browsers. Maybe there is some kind of userid- or session-based load balancing involved, and some of the cache servers are corrupted?

Example of a request that works (via Chrome's "Copy as cURL" option):

curl -v 'https://hu.wikipedia.org/w/api.php?action=opensearch&format=json&formatversion=2&search=R%C3%A9vai%20Le&namespace=&limit=10' \
  -H 'authority: hu.wikipedia.org' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'accept: application/json, text/javascript, */*; q=0.01' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36' \
  -H 'x-requested-with: XMLHttpRequest' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://hu.wikipedia.org/wiki/Speci%C3%A1lis:Be%C3%A1ll%C3%ADt%C3%A1saim' \
  -H 'accept-language: hu-HU,hu;q=0.9,en-US;q=0.8,en;q=0.7,de;q=0.6' \
  -H 'cookie: thanks-thanked=21572598%252C21642211%252C21647732%252C21663766%252C21888484%252C22007193%252C22244880%252C22248534; GeoIP=HU:GS:Sopron:47.67:16.59:v4; huwikiUserName=Tgr; huwikiUserID=445; forceHTTPS=true; centralauth_User=Tgr; centralauth_Token=<redacted>; huwikimwuser-sessionId=<redacted>; stopMobileRedirect=true; loginnotify_prevlogins=<redacted>; VEE=wikitext; WMF-Last-Access-Global=19-May-2020; WMF-Last-Access=19-May-2020; huwikiSession=<redacted>; centralauth_Session=<redacted>' \
  --compressed

* Trying 2620:0:862:ed1a::1...
* TCP_NODELAY set
* Connected to hu.wikipedia.org (2620:0:862:ed1a::1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Unknown (8):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: C=US; ST=California; L=San Francisco; O=Wikimedia Foundation, Inc.; CN=*.wikipedia.org
* start date: Nov 12 00:00:00 2019 GMT
* expire date: Oct 6 12:00:00 2020 GMT
* subjectAltName: host "hu.wikipedia.org" matched cert's "*.wikipedia.org"
* issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* Using Stream ID: 1 (easy handle 0x55a3cfcd2580)
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
> GET /w/api.php?action=opensearch&format=json&formatversion=2&search=R%C3%A9vai%20Le&namespace=&limit=10 HTTP/2
> Host: hu.wikipedia.org
> Accept-Encoding: deflate, gzip
> authority: hu.wikipedia.org
> pragma: no-cache
> cache-control: no-cache
> accept: application/json, text/javascript, */*; q=0.01
> user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36
> x-requested-with: XMLHttpRequest
> sec-fetch-site: same-origin
> sec-fetch-mode: cors
> sec-fetch-dest: empty
> referer: https://hu.wikipedia.org/wiki/Speci%C3%A1lis:Be%C3%A1ll%C3%ADt%C3%A1saim
> accept-language: hu-HU,hu;q=0.9,en-US;q=0.8,en;q=0.7,de;q=0.6
> cookie: <redacted>
>
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
< HTTP/2 200
< date: Tue, 19 May 2020 10:20:58 GMT
< server: mw1356.eqiad.wmnet
< x-content-type-options: nosniff
< x-search-id: 4i1wne9lcv53t3nodpgtswbcy
< x-opensearch-type: prefix
< x-frame-options: DENY
< content-disposition: inline; filename=api-result.json
< vary: Accept-Encoding,Treat-as-Untrusted,X-Forwarded-Proto,Cookie,Authorization
< cache-control: private, must-revalidate, max-age=10800
< content-type: application/json; charset=utf-8
< content-encoding: gzip
< age: 0
< x-cache: cp3054 miss, cp3064 pass
< x-cache-status: pass
< server-timing: cache;desc="pass"
< strict-transport-security: max-age=106384710; includeSubDomains; preload
< x-client-ip: 2a02:ab88:56c3:c880:c564:f7e5:be47:f091
< accept-ranges: bytes
< content-length: 90
<
* Connection #0 to host hu.wikipedia.org left intact

["Révai Le",["Révai Leó"],[""],["https://hu.wikipedia.org/wiki/R%C3%A9vai_Le%C3%B3"]]

Example of a request that does not work:

curl -v 'https://hu.wikipedia.org/w/api.php?action=opensearch&format=json&formatversion=2&search=R%C3%A9vai%20Le&namespace=0%7C1%7C2%7C3%7C4%7C5%7C6%7C7%7C8%7C9%7C10%7C11%7C12%7C13%7C14%7C15%7C90%7C91%7C92%7C93%7C100%7C101%7C828%7C829%7C2300%7C2301%7C2302%7C2303&limit=10' \
  -H 'authority: hu.wikipedia.org' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'accept: application/json, text/javascript, */*; q=0.01' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36' \
  -H 'x-requested-with: XMLHttpRequest' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://hu.wikipedia.org/wiki/Kezd%C5%91lap' \
  -H 'accept-language: en-US,en;q=0.9,hu;q=0.8' \
  -H 'cookie: GeoIP=US:CA:San_Francisco:37.80:-122.41:v4; huwikiUserID=245425; huwikiUserName=Tgr+%28WMF%29; centralauth_User=Tgr+%28WMF%29; centralauth_Token=<redacted>; huwikimwuser-sessionId=<redacted>; forceHTTPS=true; VEE=wikitext; stopMobileRedirect=true; huwikiSession=<redacted>; centralauth_Session=<redacted>; WMF-Last-Access=19-May-2020; WMF-Last-Access-Global=19-May-2020' \
  --compressed

* Trying 2620:0:862:ed1a::1...
* TCP_NODELAY set
* Connected to hu.wikipedia.org (2620:0:862:ed1a::1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Unknown (8):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: C=US; ST=California; L=San Francisco; O=Wikimedia Foundation, Inc.; CN=*.wikipedia.org
* start date: Nov 12 00:00:00 2019 GMT
* expire date: Oct 6 12:00:00 2020 GMT
* subjectAltName: host "hu.wikipedia.org" matched cert's "*.wikipedia.org"
* issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* Using Stream ID: 1 (easy handle 0x55cbe815c580)
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
> GET /w/api.php?action=opensearch&format=json&formatversion=2&search=R%C3%A9vai%20Le&namespace=0%7C1%7C2%7C3%7C4%7C5%7C6%7C7%7C8%7C9%7C10%7C11%7C12%7C13%7C14%7C15%7C90%7C91%7C92%7C93%7C100%7C101%7C828%7C829%7C2300%7C2301%7C2302%7C2303&limit=10 HTTP/2
> Host: hu.wikipedia.org
> Accept-Encoding: deflate, gzip
> authority: hu.wikipedia.org
> pragma: no-cache
> cache-control: no-cache
> accept: application/json, text/javascript, */*; q=0.01
> user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36
> x-requested-with: XMLHttpRequest
> sec-fetch-site: same-origin
> sec-fetch-mode: cors
> sec-fetch-dest: empty
> referer: https://hu.wikipedia.org/wiki/Kezd%C5%91lap
> accept-language: en-US,en;q=0.9,hu;q=0.8
> cookie: <redacted>
>
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
< HTTP/2 200
< date: Tue, 19 May 2020 10:24:26 GMT
< server: mw1408.eqiad.wmnet
< x-content-type-options: nosniff
< x-search-id: 5hw68zt3j64j9tacp7zam4tr8
< x-opensearch-type: comp_suggest
< x-frame-options: DENY
< content-disposition: inline; filename=api-result.json
< vary: Accept-Encoding,Treat-as-Untrusted,X-Forwarded-Proto,Cookie,Authorization
< cache-control: private, must-revalidate, max-age=10800
< content-type: application/json; charset=utf-8
< content-encoding: gzip
< age: 0
< x-cache: cp3052 miss, cp3064 pass
< x-cache-status: pass
< server-timing: cache;desc="pass"
< strict-transport-security: max-age=106384710; includeSubDomains; preload
< x-client-ip: 2a02:ab88:56c3:c880:c564:f7e5:be47:f091
< accept-ranges: bytes
< content-length: 222
<
* Connection #0 to host hu.wikipedia.org left intact

["Révai Le",["Révai nagy lexikona","Révai új lexikona","Révai Testvérek","Révai Dezső","Révai Kereskedelmi, Pénzügyi és Ipari Lexikona"],["","","","",""],["https://hu.wikipedia.org/wiki/R%C3%A9vai_nagy_lexikona","https://hu.wikipedia.org/wiki/R%C3%A9vai_%C3%BAj_lexikona","https://hu.wikipedia.org/wiki/R%C3%A9vai_Testv%C3%A9rek","https://hu.wikipedia.org/wiki/R%C3%A9vai_Dezs%C5%91","https://hu.wikipedia.org/wiki/R%C3%A9vai_Kereskedelmi,_P%C3%A9nz%C3%BCgyi_%C3%A9s_Ipari_Lexikona"]]
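For anyone comparing the two responses programmatically, a minimal sketch follows. It assumes the opensearch body is the four-element array seen in both responses (query, titles, descriptions, URLs). The helper names are hypothetical, and the URL list of the second response is left empty for brevity.

```python
import json
from typing import List


def suggested_titles(opensearch_body: str) -> List[str]:
    """Return the suggested titles from an opensearch JSON body.

    Assumes the four-element layout seen in the two responses:
    [query, titles, descriptions, urls].
    """
    _query, titles, _descriptions, _urls = json.loads(opensearch_body)
    return titles


def has_suggestion(opensearch_body: str, expected_title: str) -> bool:
    """True if expected_title is among the suggestions."""
    return expected_title in suggested_titles(opensearch_body)


# Body of the working response, copied from the first request.
working_body = (
    '["Révai Le",["Révai Leó"],[""],'
    '["https://hu.wikipedia.org/wiki/R%C3%A9vai_Le%C3%B3"]]'
)

# Titles of the non-working response, copied from the second request.
# The URL list is omitted here (illustration only).
not_working_body = json.dumps(
    [
        "Révai Le",
        [
            "Révai nagy lexikona",
            "Révai új lexikona",
            "Révai Testvérek",
            "Révai Dezső",
            "Révai Kereskedelmi, Pénzügyi és Ipari Lexikona",
        ],
        ["", "", "", "", ""],
        [],
    ],
    ensure_ascii=False,
)

print(has_suggestion(working_body, "Révai Leó"))
print(has_suggestion(not_working_body, "Révai Leó"))
```

The first check reports the article as suggested and the second does not, which matches the symptom described above.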

Event Timeline

Restricted Application added a subscriber: Aklapper.

also if I directly request the API URL used by the search autocompletion JS code (this for the example above) it gives the correct result for all browsers

That's not actually true; I missed that the two browsers issue different requests (the working and the non-working request shown in the description). The same URL behaves consistently in all browsers, so this is not a network issue.

Specifically, the working request has namespace= and the other one has namespace=0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|90|91|92|93|100|101|828|829|2300|2301|2302|2303. And the response for the first says x-opensearch-type: prefix and the second one x-opensearch-type: comp_suggest. So maybe this is the combination of a CompletionSuggester bug and some kind of obscure user setting resulting in the use of a different search method for one account? Not sure what that would be though, both accounts have the Search completion user preference set to default.
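The difference between the two request URLs can also be found mechanically. The sketch below uses only the Python standard library and the two URLs from the description. The function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url):
    """Return the query parameters of url as a flat dict.

    keep_blank_values=True matters here: without it the empty
    "namespace=" of the working request would be dropped.
    """
    parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


def differing_params(url_a, url_b):
    """Map each parameter that differs to its (value_a, value_b) pair."""
    a, b = query_params(url_a), query_params(url_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


BASE = (
    "https://hu.wikipedia.org/w/api.php?action=opensearch&format=json"
    "&formatversion=2&search=R%C3%A9vai%20Le"
)
WORKING = BASE + "&namespace=&limit=10"
NOT_WORKING = BASE + (
    "&namespace=0%7C1%7C2%7C3%7C4%7C5%7C6%7C7%7C8%7C9%7C10%7C11%7C12%7C13"
    "%7C14%7C15%7C90%7C91%7C92%7C93%7C100%7C101%7C828%7C829%7C2300%7C2301"
    "%7C2302%7C2303&limit=10"
)

diff = differing_params(WORKING, NOT_WORKING)
for name, (working_value, not_working_value) in diff.items():
    print(name, repr(working_value), repr(not_working_value))
```

Only the namespace parameter is reported as different. The x-opensearch-type difference lives in the response headers and is not covered by this sketch.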

Manually selecting the "Classic prefix search" user preference makes all browsers give the correct results.

So maybe this is the combination of a CompletionSuggester bug and some kind of obscure user setting resulting in the use of a different search method for one account?

The difference is caused by having all namespaces unselected. For the user who got the CompletionSuggester result, the search namespace settings (searchNsX user preferences, configurable only via Special:Search these days) were the default: searchNs0 true, everything else unset. For the user who got prefix search, all namespaces were unselected (searchNs0 the empty string, everything else unset).

Not sure if "all namespaces unset" is a legitimate setting, but it seems impossible to set via the UI today: if you unselect all namespaces and search, the search will be in the main namespace, and that configuration is what the "Remember selection for future searches" checkbox will save. I had to unset it by hand with:

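// Run in the browser console on the wiki, while logged in.
Object.keys( mw.user.options.values )
	.filter( function ( k ) { return /^searchNs\d+$/.test( k ); } )
	.forEach( function ( pref ) {
		// Empty string for searchNs0, false for every other searchNsX preference.
		new mw.Api().saveOption( pref, pref === 'searchNs0' ? '' : false );
	} );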

As an aside, the search autocompleter honoring the namespace defaults set for advanced search seems highly counterintuitive to me. I seem to recall having reported that in the past but I'm unable to find any relevant task now.

Several users have been experiencing a similar problem on cswiki.

Test samples: "Arroyofresno (stanice metra)", "Ottendorfský potok (přítok Křinice)".

For partial matches (e.g. "Arroyofr") nothing is suggested.
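To try such samples outside the browser, the autocompletion request from the task description can be rebuilt by hand. The sketch below only constructs the URL. The parameter set is copied from the huwiki requests in the description, the function name is hypothetical, and the cswiki host name is an assumption made by analogy with hu.wikipedia.org.

```python
from urllib.parse import quote, urlencode


def opensearch_url(host, search, namespace="", limit=10):
    """Build an opensearch autocompletion URL like the ones in the description.

    namespace is a pipe-separated list of namespace ids, or "" as in the
    working request.
    """
    params = {
        "action": "opensearch",
        "format": "json",
        "formatversion": 2,
        "search": search,
        "namespace": namespace,
        "limit": limit,
    }
    # quote_via=quote encodes spaces as %20 (not "+") and "|" as %7C,
    # matching the URLs captured from the browser.
    return "https://" + host + "/w/api.php?" + urlencode(params, quote_via=quote)


# Reproduces the working huwiki URL from the description.
print(opensearch_url("hu.wikipedia.org", "Révai Le"))

# Partial-match sample from cswiki; the host name is an assumption.
print(opensearch_url("cs.wikipedia.org", "Arroyofr"))
```

Passing the resulting URL to curl, once with an empty namespace and once with an explicit namespace list, should show whether the same prefix versus comp_suggest split appears on that wiki.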

Should this be in Tech News (User-notice)?

If it really affects all wikis then definitely.

Same observations are described on frwiki.

Aklapper renamed this task from Search autocompletion broken for recent articles (after April 30?) for some users / browsers to Search autocompletion broken for recent articles (after April 30, 2020?) for some users / browsers. · Jun 2 2020, 8:09 PM

Change 601882 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] maintenance::cirrussearch: extract to file

https://gerrit.wikimedia.org/r/601882

Thanks all for reporting this behavior. The Search team is working on a fix.

Change 601882 merged by Ryan Kemper:
[operations/puppet@production] maintenance::cirrussearch: extract index rebuild

https://gerrit.wikimedia.org/r/601882

Completion still seems to be broken. AIUI the patch above should have taken effect by now.

dcausse moved this task from needs triage to Ops / SRE on the Discovery-Search board.
dcausse subscribed.

There was another issue that https://gerrit.wikimedia.org/r/601882 uncovered (T254331). I think the indices resumed being updated after https://gerrit.wikimedia.org/r/c/operations/puppet/+/602015. I tested a few examples mentioned in this task and they're now returning results.
Moving to done so that we can add it to SoS and possibly mention this problem in a user notice.

dcausse triaged this task as High priority. · Jun 22 2020, 7:37 AM
dcausse moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board.