Set max execution time for several expensive mediawiki actions
Closed, ResolvedPublic

Description

Since T195792 (Add support for setting individual query timeout in wikimedia/rdbms) is done, we can now set a max execution time. For the initial deployment:

  • Five special pages and their API counterparts will get a max query time of 30s (why 30s?): RecentChanges, RecentChangesLinked, Watchlist, Log, Contributions
  • All direct queries done by DPL will get a 10s timeout.

Rollout plan for the special pages:

  • Before 2022: we roll it out on one third of requests, chosen randomly (with a rand() function in CommonSettings.php)
  • First week of January: we roll it out everywhere

DPL will get it without gradual deployment.
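
The one-third sampling in the rollout plan can be illustrated with a small simulation. This is only a sketch of the idea, not the actual CommonSettings.php change (which is PHP and is linked later in the timeline). The function name, the constants, the choice of milliseconds as the unit, and the use of 0 to mean "no limit" are all assumptions made for illustration.

```python
import random

# Assumed values for illustration; the task only states "30s" and "1/3rd of requests".
TIMEOUT_MS = 30_000        # 30 seconds, assuming the limit is expressed in milliseconds
ROLLOUT_FRACTION = 1 / 3   # share of requests that get the limit during the gradual phase


def max_execution_time_for_request(rng: random.Random) -> int:
    """Pick the query time limit for a single request.

    Returns TIMEOUT_MS for roughly one request in three and 0 otherwise.
    In this sketch, 0 stands for "no limit" (hypothetical convention).
    """
    return TIMEOUT_MS if rng.random() < ROLLOUT_FRACTION else 0


if __name__ == "__main__":
    rng = random.Random(2021)  # seeded only so the simulation is repeatable
    sample = [max_execution_time_for_request(rng) for _ in range(30_000)]
    limited = sum(1 for t in sample if t > 0)
    print(f"{limited / len(sample):.1%} of simulated requests got the limit")
```

Because the decision is made independently per request, the same user can hit the limit on one request and not on the next, which is worth keeping in mind when reading error reports from the gradual phase.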

Detailed announcement: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/IPJNO75HYAQWIGTHI5LJHTDVLVOC4LJP/

Event Timeline

Ladsgroup updated the task description.
Ladsgroup moved this task from Triage to In progress on the DBA board.

Change 747142 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] [WIP] Allow setting max execution time to several special pages

https://gerrit.wikimedia.org/r/747142

Change 747564 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/Wikibase@master] Add a config to pass the test

https://gerrit.wikimedia.org/r/747564

Change 747564 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Add a config to pass the test

https://gerrit.wikimedia.org/r/747564

Change 747142 merged by jenkins-bot:

[mediawiki/core@master] Allow setting max execution time to several special pages

https://gerrit.wikimedia.org/r/747142

Change 747840 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] beta: Set wgMaxExecutionTimeForExpensiveQueries

https://gerrit.wikimedia.org/r/747840

Change 747840 merged by jenkins-bot:

[operations/mediawiki-config@master] beta: Set wgMaxExecutionTimeForExpensiveQueries

https://gerrit.wikimedia.org/r/747840

Change 747845 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/intersection@master] Set a maximum allowed time for db queries

https://gerrit.wikimedia.org/r/747845

Change 747693 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/Wikibase@wmf/1.38.0-wmf.12] Add a config to pass the test

https://gerrit.wikimedia.org/r/747693

Change 747694 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/Wikibase@wmf/1.38.0-wmf.13] Add a config to pass the test

https://gerrit.wikimedia.org/r/747694

Change 747695 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.13] Allow setting max execution time to several special pages

https://gerrit.wikimedia.org/r/747695

Change 747696 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.12] Allow setting max execution time to several special pages

https://gerrit.wikimedia.org/r/747696

Change 747697 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/intersection@wmf/1.38.0-wmf.13] Set a maximum allowed time for db queries

https://gerrit.wikimedia.org/r/747697

Change 747698 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/intersection@wmf/1.38.0-wmf.12] Set a maximum allowed time for db queries

https://gerrit.wikimedia.org/r/747698

Change 747693 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@wmf/1.38.0-wmf.12] Add a config to pass the test

https://gerrit.wikimedia.org/r/747693

Change 747694 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@wmf/1.38.0-wmf.13] Add a config to pass the test

https://gerrit.wikimedia.org/r/747694

Change 747845 merged by jenkins-bot:

[mediawiki/extensions/intersection@master] Set a maximum allowed time for db queries

https://gerrit.wikimedia.org/r/747845

Change 747857 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Gradual roll out of $wgMaxExecutionTimeForExpensiveQueries

https://gerrit.wikimedia.org/r/747857

Change 747696 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.12] Allow setting max execution time to several special pages

https://gerrit.wikimedia.org/r/747696

Change 747695 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.13] Allow setting max execution time to several special pages

https://gerrit.wikimedia.org/r/747695

Mentioned in SAL (#wikimedia-operations) [2021-12-16T15:30:49Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.13/includes/DefaultSettings.php: Backport: [[gerrit:747695|Allow setting max execution time to several special pages (T297708)], Part I (duration: 01m 06s)

Mentioned in SAL (#wikimedia-operations) [2021-12-16T15:32:09Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.13/includes/: Backport: [[gerrit:747695|Allow setting max execution time to several special pages (T297708)], Part II (duration: 01m 12s)

Change 747857 merged by jenkins-bot:

[operations/mediawiki-config@master] Gradual roll out of $wgMaxExecutionTimeForExpensiveQueries

https://gerrit.wikimedia.org/r/747857

Mentioned in SAL (#wikimedia-operations) [2021-12-16T15:34:19Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.12/includes/DefaultSettings.php: Backport: [[gerrit:747696|Allow setting max execution time to several special pages (T297708)], Part I (duration: 01m 05s)

Mentioned in SAL (#wikimedia-operations) [2021-12-16T15:35:40Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.12/includes/: Backport: [[gerrit:747696|Allow setting max execution time to several special pages (T297708)], Part II (duration: 01m 11s)

Mentioned in SAL (#wikimedia-operations) [2021-12-16T15:36:57Z] <ladsgroup@deploy1002> Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:747857|Gradual roll out of $wgMaxExecutionTimeForExpensiveQueries (T297708)]] (duration: 01m 06s)

This is rolled out for one third of requests and we will see the results soon. After the new year, we will roll this out to everyone.

Change 747698 merged by jenkins-bot:

[mediawiki/extensions/intersection@wmf/1.38.0-wmf.12] Set a maximum allowed time for db queries

https://gerrit.wikimedia.org/r/747698

Change 747697 merged by jenkins-bot:

[mediawiki/extensions/intersection@wmf/1.38.0-wmf.13] Set a maximum allowed time for db queries

https://gerrit.wikimedia.org/r/747697

Mentioned in SAL (#wikimedia-operations) [2021-12-16T16:30:08Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.13/extensions/intersection: Backport: [[gerrit:747697|Set a maximum allowed time for db queries (T297708)]] (duration: 01m 05s)

Mentioned in SAL (#wikimedia-operations) [2021-12-16T16:32:41Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.12/extensions/intersection: Backport: [[gerrit:747698|Set a maximum allowed time for db queries (T297708)]] (duration: 01m 06s)

This goes into the freezer until Jan 3rd.

Change 747876 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[operations/puppet@production] logspam: Consolidate max_statement_time errors

https://gerrit.wikimedia.org/r/747876

Change 747876 merged by Ladsgroup:

[operations/puppet@production] logspam: Consolidate max_statement_time errors

https://gerrit.wikimedia.org/r/747876

Change 750831 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Full roll out of wgMaxExecutionTimeForExpensiveQueries

https://gerrit.wikimedia.org/r/750831

I will send an announcement today but I think it should be in the tech news as well.

Change 750831 merged by jenkins-bot:

[operations/mediawiki-config@master] Full roll out of wgMaxExecutionTimeForExpensiveQueries

https://gerrit.wikimedia.org/r/750831

Mentioned in SAL (#wikimedia-operations) [2022-01-03T07:00:28Z] <ladsgroup@deploy1002> Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:750831|Full roll out of wgMaxExecutionTimeForExpensiveQueries (T297708)]], Part I (duration: 00m 58s)

Mentioned in SAL (#wikimedia-operations) [2022-01-03T07:02:08Z] <ladsgroup@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:750831|Full roll out of wgMaxExecutionTimeForExpensiveQueries (T297708)]], Part I (duration: 01m 20s)

Ladsgroup moved this task from In progress to Done on the DBA board.

Seeing an increase in "Query execution was interrupted" errors in the logs since this rolled out (as expected, probably).

For the purposes of the train, is it safe to ignore all of these messages?

query-execution-interrupted-2022-01-06.png (241×663 px, 15 KB)

Per T298010, the API is currently returning errors that look like this: internal_api_error_DBQueryError: [12099b97-17b0-42c5-883d-464328a6e662] Caught exception of type Wikimedia\Rdbms\DBQueryError. I suggest using a more informative error message for timeouts.

> I will send an announcement today but I think it should be in the tech news as well.

For Tech News, I'd suggest something like this as the summary (please edit or approve):

Five special pages (and their API counterparts) now have a maximum database query execution time of 30 seconds. These special pages are: RecentChanges, RecentChangesLinked, Watchlist, Contributions, and Log. This change will help with site performance and stability. You can read more details about this change, including some possible solutions if it affects your workflows.

> Seeing an increase in "Query execution was interrupted" errors in the logs since this rolled out (as expected, probably).
>
> For the purposes of the train, is it safe to ignore all of these messages?
>
> query-execution-interrupted-2022-01-06.png (241×663 px, 15 KB)

Yes, thank you. That's intentional. It's basically like PHP timeout errors: a sudden, sharp increase would be an issue, but a steady baseline is fine.

> Per T298010, the API is currently returning errors that look like this: internal_api_error_DBQueryError: [12099b97-17b0-42c5-883d-464328a6e662] Caught exception of type Wikimedia\Rdbms\DBQueryError. I suggest using a more informative error message for timeouts.

Agreed. Please file a task and subscribe me.

> I will send an announcement today but I think it should be in the tech news as well.
>
> For Tech News, I'd suggest something like this as the summary (please edit or approve):
>
> Five special pages (and their API counterparts) now have a maximum database query execution time of 30 seconds. These special pages are: RecentChanges, RecentChangesLinked, Watchlist, Contributions, and Log. This change will help with site performance and stability. You can read more details about this change, including some possible solutions if it affects your workflows.

Looks good to me, but that's four now. I had to drop RCLinked (I will work on it).

This is also sometimes causing the API to emit status code 500, apart from internal_api_errors (which have status 200).
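
For API clients affected by this, the two failure shapes reported in this thread (an internal_api_error_DBQueryError returned with HTTP 200, and a plain HTTP 500) could be detected roughly as follows. This is a hedged sketch: the function name is hypothetical, the `{"error": {"code": ...}}` body layout is assumed to follow the usual Action API error format, and an HTTP 500 can of course have causes other than a query timeout.

```python
from typing import Any, Optional


def looks_like_query_timeout(http_status: int, body: Optional[dict]) -> bool:
    """Heuristically decide whether a response matches a failure shape from this task.

    http_status: HTTP status code of the API response.
    body: decoded JSON body, or None if the body could not be parsed.
    """
    # Shape 1: the API emits HTTP 500 (may also indicate unrelated server errors).
    if http_status == 500:
        return True

    # Shape 2: HTTP 200 with an internal_api_error_DBQueryError error code.
    if http_status == 200 and body is not None:
        error: Any = body.get("error")
        if isinstance(error, dict):
            return error.get("code") == "internal_api_error_DBQueryError"

    return False
```

A client that sees such a response might narrow the request (for example a smaller time range or fewer filters) before retrying, rather than repeating the same expensive query.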