Page MenuHomePhabricator

Special:Search performs DB writes on GET request
Closed, ResolvedPublic

Description

Special:Search has a 'Remember selection for future searches' option in the advanced search profile. The search form itself is submitted via GET, so this means we are writing to the master database on GET requests. Not sure best option:

  • Make all search requests POST's. In a multi-datacenter world this would force all search queries to the primary datacenter, even though we have search clusters in more than one DC that might be queried
  • Default to POST, but magic together some javascript to swap the form submission to GET unless the 'remember' input is clicked. Slightly better than above, allows javascript enabled browsers to query their local datacenter.
  • Do nothing, allowing cross-datacenter writes. Seems a poor solution, but usage of this feature is very low. The feature is used a good bit, but remembering a new configuration is relatively rare.
  • Something else?

Event Timeline

Restricted Application added projects: Discovery, Discovery-Search. · View Herald TranscriptNov 29 2016, 5:42 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Smalyshev added a comment.EditedNov 29 2016, 5:44 PM

Make "remember" part be a separate POST via some JS magic, not part of the actual search request?
So the search is the same GET in both cases, but if you check the remember checkbox, JS also issues POST that saves the params. It may in rare cases cause some weirdness (such as POST failing but GET succeeding and thus settings not saved) but in general should work I think?

Smalyshev added a comment.EditedNov 29 2016, 5:59 PM

Another option - since that seems to control only namespace list, make it client-side option instead of server-side. That means it won't work for people with multiple browsers, but this use case seems to be pretty small.

Deskana triaged this task as Low priority.Dec 8 2016, 11:12 PM
Deskana moved this task from needs triage to later on... on the Discovery-Search board.

I'm not sure this needs to be done, since it is already deferred until post-send so should not have any impact on user-visible latency.

IIRC the point was to eliminate all writes in GET due to their impact on multi-DC setup, not on the performance per se?

The issue is that some GET requests will be routed to the secondary data centre. Eliminating all writes in such requests is not feasible and was abandoned as a goal very early in the planning. Instead, when MW calls wfGetDB(DB_MASTER) from the secondary datacentre, it will connect to the remote master database. This takes about 6 RTTs per SSL connection plus 1 RTT per query. So avoiding such queries is useful for performance.

Smalyshev lowered the priority of this task from Low to Lowest.Feb 21 2018, 11:04 PM

OK, got it, thanks for explaining. So this may be performance issue, but not a very important one since this happens relatively infrequently, and unless we detect some trouble from it, we probably can live with it now.

Joe added a subscriber: Joe.Feb 12 2019, 7:27 AM

Wouldn't it be feasible to have the search request generate a simple job that saves that preference asynchronously?

The jobqueue is already capable of listening to the events in both DCs, and execute them in the primary one only, so most things that can happen post-send can probably be deferred to the jobqueue.

aaron added a comment.Feb 14 2019, 7:05 PM

Wouldn't it be feasible to have the search request generate a simple job that saves that preference asynchronously?
The jobqueue is already capable of listening to the events in both DCs, and execute them in the primary one only, so most things that can happen post-send can probably be deferred to the jobqueue.

I generic preference setting job would be useful for this and similar cases.

EBernhardson added a comment.EditedFeb 14 2019, 8:05 PM

Wouldn't it be feasible to have the search request generate a simple job that saves that preference asynchronously?
The jobqueue is already capable of listening to the events in both DCs, and execute them in the primary one only, so most things that can happen post-send can probably be deferred to the jobqueue.

I generic preference setting job would be useful for this and similar cases.

A delay in making the new preference available might be unintuitive for users. In particular if they perform another search a few seconds after the one where they saved they will expect the new setting to load. I'm not entirely clear on things, but i think ChronologyProtector ensures this with the current setup, but if the preference pushed into a job there would be no guarantee? It could also be that simple UI affordances to keep everything selected will negate any concern about reading the preference.

aaron added a comment.Feb 15 2019, 1:23 AM

Wouldn't it be feasible to have the search request generate a simple job that saves that preference asynchronously?
The jobqueue is already capable of listening to the events in both DCs, and execute them in the primary one only, so most things that can happen post-send can probably be deferred to the jobqueue.

I generic preference setting job would be useful for this and similar cases.

A delay in making the new preference available might be unintuitive for users. In particular if they perform another search a few seconds after the one where they saved they will expect the new setting to load. I'm not entirely clear on things, but i think ChronologyProtector ensures this with the current setup, but if the preference pushed into a job there would be no guarantee? It could also be that simple UI affordances to keep everything selected will negate any concern about reading the preference.

Right, no guarantee there, not without cookie/localstorage tricks to tide things over til the job runs :(

Joe added a comment.EditedFeb 15 2019, 6:30 AM

Given how efficient the jobqueue is, we can expect that more than 99% of the preferences will be saved before a new search happens. Having some sort of protection against that 1% (like, localstorage "tricks") would help, but I'd argue that would be better than 50% of the calls needing a cross-datacenter write in terms of reliability and user experience.
But this is clearly a more general question: do we prefer to impose strong causality[1] at the risk of increasing the overall error rate, or do we prefer to accept asynchronicity and deal with the added complexity, while guaranteeing a better eventual consistency?

I see a third option, though:

  • Save the preferences to the user session, which will be writable from both datacenters
  • Persist it to the database via a job

Upon the next query, we'd check the user session, and use preferences stored there if they're fresher than the ones saved in the database. This would really guarantee strong causality, and offer the best user experience possible

[1] Please note that saving to the master cross-dc does NOT guarantee strong causality in the user experience. First, we'd save post-send, which already doesn't guarantee the data will be saved early enough. Second, you need to account for cross-dc replication lag, which is usually small but can easily be several seconds. There is really no way around managing such situations than by using sessions, which are read-write from all datacenters.

aaron added a comment.Feb 15 2019, 7:04 AM

I'd prefer we didn't take preferences and bits of data like this onto sessions (we used to many years ago and that was lame), especially given the cross-DC latency. I worry people will go overboard with such features that cause slow writes.

Given good job queue speed, I'd agree that just using that is much better than anything cross-DC synchronous. If it really is 99% fast enough, that seems OK to by me (though I don't maintain these features).

Change 499985 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Add UserOptionsUpdateJob class and use it for namespaces at SpecialSearch

https://gerrit.wikimedia.org/r/499985

Change 508933 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Make powersearch form use POST if JS is disabled

https://gerrit.wikimedia.org/r/508933

Change 499985 abandoned by Aaron Schulz:
Add UserOptionsUpdateJob class and use it for namespaces at SpecialSearch

Reason:
Taking a different route

https://gerrit.wikimedia.org/r/499985

Change 508933 merged by jenkins-bot:
[mediawiki/core@master] Make powersearch form use POST if JS is disabled

https://gerrit.wikimedia.org/r/508933

aaron closed this task as Resolved.May 8 2019, 10:08 PM
aaron claimed this task.

Change 511025 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/AdvancedSearch@master] Enforce a GET request when the "remember" user setting doesn't change

https://gerrit.wikimedia.org/r/511025

Change 511025 merged by Thiemo Kreuz (WMDE):
[mediawiki/extensions/AdvancedSearch@master] Enforce a GET request when the "remember" user setting doesn't change

https://gerrit.wikimedia.org/r/511025

Change 518007 had a related patch set uploaded (by WMDE-Fisch; owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/AdvancedSearch@wmf/1.34.0-wmf.8] Enforce a GET request when the "remember" user setting doesn't change

https://gerrit.wikimedia.org/r/518007

Change 518007 merged by jenkins-bot:
[mediawiki/extensions/AdvancedSearch@wmf/1.34.0-wmf.8] Enforce a GET request when the "remember" user setting doesn't change

https://gerrit.wikimedia.org/r/518007

Change 518756 had a related patch set uploaded (by WMDE-Fisch; owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/AdvancedSearch@wmf/1.34.0-wmf.8] Enforce a GET request when the "remember" user setting doesn't change

https://gerrit.wikimedia.org/r/518756

Change 518756 merged by jenkins-bot:
[mediawiki/extensions/AdvancedSearch@wmf/1.34.0-wmf.8] Enforce a GET request when the "remember" user setting doesn't change

https://gerrit.wikimedia.org/r/518756