Page MenuHomePhabricator

CX tables miss a lot of important indexes causing partial outages
Closed, ResolvedPublic8 Estimated Story Points

Description

See T390510: Fatal DBUnexpectedError: "Database servers in extension1 are overloaded"

First {T391153}

Then T390510#10772985 (CREATE INDEX translation_started_by_last_updated_timestamp ON cx_translations (translation_started_by, translation_last_updated_timestamp))

Then: T390510#10772917 (cx_suggestions (cxs_source_language, cxs_title))

Then: T390510#10772940 (cxs_source_language_title ON cx_suggestions (cxs_source_language, cxs_title)) This has been scanning 3M rows every time a user made a query. T390510#10772985

Now it's much better but:

  • Please add the indexes I added to the code. And we should add them to testwiki and other places
  • Please audit your code for missing indexes. I have found three so far but according to graphs, there might be more.
  • Please make sure new issues like this doesn't get introduced again (by doing queries on conditions that don't have indexes)

Details

Related Changes in Gerrit:
SubjectRepoBranchLines +/-
mediawiki/extensions/ContentTranslationmaster+410 -0
mediawiki/extensions/ContentTranslationmaster+396 -0
mediawiki/extensions/ContentTranslationmaster+204 -0
mediawiki/extensions/ContentTranslationmaster+148 -1
mediawiki/extensions/ContentTranslationmaster+382 -0
mediawiki/extensions/ContentTranslationmaster+4 -3
mediawiki/extensions/ContentTranslationwmf/1.44.0-wmf.28+2 -0
mediawiki/extensions/ContentTranslationwmf/1.44.0-wmf.27+2 -0
mediawiki/extensions/ContentTranslationmaster+2 -0
mediawiki/extensions/ContentTranslationmaster+0 -19
mediawiki/extensions/ContentTranslationmaster+3 K -3 K
mediawiki/extensions/ContentTranslationwmf/1.44.0-wmf.27+0 -19
mediawiki/extensions/ContentTranslationmaster+0 -19
mediawiki/extensions/ContentTranslationmaster+0 -7
mediawiki/extensions/ContentTranslationwmf/1.44.0-wmf.27+0 -7
mediawiki/extensions/ContentTranslationwmf/1.44.0-wmf.27+0 -6
mediawiki/extensions/ContentTranslationmaster+0 -6
Show related patches Customize query in gerrit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Nikerabbit changed the task status from Open to In Progress.May 5 2025, 7:28 AM
Nikerabbit set the point value for this task to 8.

Change #1141849 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@master] Disbale Special:ContentTranslationStats page

https://gerrit.wikimedia.org/r/1141849

Change #1141850 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@master] Disable ApiQueryContentTranslationStats

https://gerrit.wikimedia.org/r/1141850

Change #1141853 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@master] Remove links to Special:ContentTranslationStats from dashboards

https://gerrit.wikimedia.org/r/1141853

We are disabling our stats page and some related API modules given the queries that were mentioned above. We have looked at those queries before and there is no simple fix for them and a new dashboard was already in the plans, but not ready yet.

We have started working on the query audit in an internal document.

More queries causing full table scans from T390510#10790296:

SELECT  translation_target_language AS `language`,COUNT(DISTINCT translation_started_by) AS `translators`  FROM `cx_translations`    WHERE ((translation_status = 'published' OR translation_target_url IS NOT NULL))  GROUP BY translation_target_language ;

SELECT  translation_source_language AS `language`,COUNT(DISTINCT translation_started_by) AS `translators`  FROM `cx_translations`    WHERE ((translation_status = 'published' OR translation_target_url IS NOT NULL))  GROUP BY translation_source_language ;

SELECT  translation_source_language AS `sourceLanguage`,translation_target_language AS `targetLanguage`,(CASE WHEN (translation_status = 'published' OR translation_target_url IS NOT NULL) THEN 'published' ELSE 'draft' END) AS `status`,COUNT(*) AS `count`,COUNT(DISTINCT translation_started_by) AS `translators`  FROM `cx_translations`    WHERE translation_status IN ('draft','published')   GROUP BY translation_source_language,translation_target_language,status ;

SELECT  MAX(translation_last_updated_timestamp) AS `date`,COUNT(translation_id) AS `count`  FROM `cx_translations`    WHERE ((translation_status = 'published' OR translation_target_url IS NOT NULL))  GROUP BY YEARWEEK(translation_last_updated_timestamp, 3) ;

These queries are used in SpecialContentTranslationStats page, and we've now disabled that page. It was slow, and timed out most of the times.

For the last query:

SELECT  MAX(translation_last_updated_timestamp) AS `date`,COUNT(translation_id) AS `count`  FROM `cx_translations`    WHERE ((translation_status = 'published' OR translation_target_url IS NOT NULL))  GROUP BY YEARWEEK(translation_last_updated_timestamp, 3) ;

There is another caller from ApiQueryTranslatorStats but that ends up using an index (cx_translation_languages) due to there WHERE condition:

SELECT 
    MAX(translation_last_updated_timestamp) AS date,
    COUNT(translation_id) AS count
FROM 
    cx_translations
WHERE 
    (translation_status = 'draft' AND translation_target_url IS NULL)
    AND translation_source_language = :source
    AND translation_target_language = :target
    AND translation_last_update_by = :translatorId
GROUP BY 
    YEARWEEK(translation_last_updated_timestamp, 3)
idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
1SIMPLEcx_translationsrefcx_translation_languagescx_translation_languages78const,const,const1Using index condition; Using where; Using temporary; Using filesort

Change #1141866 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@wmf/1.44.0-wmf.27] Disable Special:ContentTranslationStats page

https://gerrit.wikimedia.org/r/1141866

Change #1141867 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@wmf/1.44.0-wmf.27] Disable APIs used in Special:ContentTranslationStats

https://gerrit.wikimedia.org/r/1141867

Change #1141868 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@wmf/1.44.0-wmf.27] Remove links to Special:ContentTranslationStats from dashboards

https://gerrit.wikimedia.org/r/1141868

Change #1141849 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] Disable Special:ContentTranslationStats page

https://gerrit.wikimedia.org/r/1141849

Change #1141850 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] Disable APIs used in Special:ContentTranslationStats

https://gerrit.wikimedia.org/r/1141850

Change #1141853 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] Remove links to Special:ContentTranslationStats from dashboards

https://gerrit.wikimedia.org/r/1141853

Change #1141866 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.44.0-wmf.27] Disable Special:ContentTranslationStats page

https://gerrit.wikimedia.org/r/1141866

Mentioned in SAL (#wikimedia-operations) [2025-05-05T13:06:12Z] <kartik@deploy1003> Started scap sync-world: Backport for [[gerrit:1141866|Disable Special:ContentTranslationStats page (T392839 T325790)]]

Mentioned in SAL (#wikimedia-operations) [2025-05-05T13:10:55Z] <kartik@deploy1003> kartik, abi: Backport for [[gerrit:1141866|Disable Special:ContentTranslationStats page (T392839 T325790)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-05-05T13:21:42Z] <kartik@deploy1003> Finished scap sync-world: Backport for [[gerrit:1141866|Disable Special:ContentTranslationStats page (T392839 T325790)]] (duration: 15m 29s)

Change #1141867 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.44.0-wmf.27] Disable APIs used in Special:ContentTranslationStats

https://gerrit.wikimedia.org/r/1141867

Mentioned in SAL (#wikimedia-operations) [2025-05-05T13:34:32Z] <kartik@deploy1003> Started scap sync-world: Backport for [[gerrit:1141867|Disable APIs used in Special:ContentTranslationStats (T392839)]]

Mentioned in SAL (#wikimedia-operations) [2025-05-05T13:39:02Z] <kartik@deploy1003> kartik, abi: Backport for [[gerrit:1141867|Disable APIs used in Special:ContentTranslationStats (T392839)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-05-05T13:47:56Z] <kartik@deploy1003> Finished scap sync-world: Backport for [[gerrit:1141867|Disable APIs used in Special:ContentTranslationStats (T392839)]] (duration: 13m 23s)

Change #1141868 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.44.0-wmf.27] Remove links to Special:ContentTranslationStats from dashboards

https://gerrit.wikimedia.org/r/1141868

Mentioned in SAL (#wikimedia-operations) [2025-05-05T13:52:38Z] <kartik@deploy1003> Started scap sync-world: Backport for [[gerrit:1141868|Remove links to Special:ContentTranslationStats from dashboards (T392839)]]

Mentioned in SAL (#wikimedia-operations) [2025-05-05T14:10:53Z] <kartik@deploy1003> kartik, abi: Backport for [[gerrit:1141868|Remove links to Special:ContentTranslationStats from dashboards (T392839)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Change #1141940 had a related patch set uploaded (by Eamedina; author: Eamedina):

[mediawiki/extensions/ContentTranslation@master] CX3 Build 1.0.0+20250505

https://gerrit.wikimedia.org/r/1141940

Change #1141940 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] CX3 Build 1.0.0+20250505

https://gerrit.wikimedia.org/r/1141940

Change #1141948 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@master] Reapply "Remove links to Special:ContentTranslationStats from dashboards"

https://gerrit.wikimedia.org/r/1141948

Change #1141948 abandoned by Abijeet Patro:

[mediawiki/extensions/ContentTranslation@master] Reapply "Remove links to Special:ContentTranslationStats from dashboards"

Reason:

not needed.

https://gerrit.wikimedia.org/r/1141948

Change #1142612 had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):

[mediawiki/extensions/ContentTranslation@master] ApiQueryPublishedTranslations: Make `from` and `to` mandatory

https://gerrit.wikimedia.org/r/1142612

Following indexes have been added on production and need to be added to our code:

CREATE INDEX cx_translators_translation_id ON cx_translators (translator_translation_id);
  • Done in T391153

CREATE INDEX translation_started_by_last_updated_timestamp ON cx_translations (translation_started_by, translation_last_updated_timestamp);

CREATE INDEX cxs_source_language_title ON cx_suggestions (cxs_source_language, cxs_title);

CREATE INDEX cxl_owner ON cx_lists (cxl_owner);

CREATE INDEX translation_last_update_by_last_updated_timestamp ON cx_translations (translation_last_update_by, translation_last_updated_timestamp);

CREATE INDEX cx_translations_last_updated_timestamp ON cx_translations ( translation_last_updated_timestamp );

Change #1143024 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@master] cx_translator: Add index fetch by translator and update time faster

https://gerrit.wikimedia.org/r/1143024

Change #1143050 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@master] cx_suggestions: Add index to make deletion by title and language faster

https://gerrit.wikimedia.org/r/1143050

Change #1143051 had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):

[mediawiki/extensions/ContentTranslation@master] dump-corpora: apply vslow query group

https://gerrit.wikimedia.org/r/1143051

Change #1143062 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@master] cx_lists: Add index to make user suggestion fetching faster

https://gerrit.wikimedia.org/r/1143062

Change #1143070 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@master] cx_translations: Add index to make fetching language trends faster

https://gerrit.wikimedia.org/r/1143070

Query audit is complete. Working to mitigate remaining issues and adding remaining indexes to the code.

Mentioned in SAL (#wikimedia-operations) [2025-05-07T12:58:04Z] <Amir1> [wikishared]> CREATE INDEX translation_last_updated_timestamp ON cx_translations (translation_last_updated_timestamp); (T392839)

Change #1142612 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] ApiQueryPublishedTranslations: Make `from` and `to` mandatory

https://gerrit.wikimedia.org/r/1142612

Change #1143138 had a related patch set uploaded (by Sbisson; author: Nikerabbit):

[mediawiki/extensions/ContentTranslation@wmf/1.44.0-wmf.27] ApiQueryPublishedTranslations: Make `from` and `to` mandatory

https://gerrit.wikimedia.org/r/1143138

(I think you need wmf.28 too, just in case it's forgotten, if not then ignore my message)

Change #1143142 had a related patch set uploaded (by Sbisson; author: Nikerabbit):

[mediawiki/extensions/ContentTranslation@wmf/1.44.0-wmf.28] ApiQueryPublishedTranslations: Make `from` and `to` mandatory

https://gerrit.wikimedia.org/r/1143142

Change #1143138 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.44.0-wmf.27] ApiQueryPublishedTranslations: Make `from` and `to` mandatory

https://gerrit.wikimedia.org/r/1143138

Change #1143142 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.44.0-wmf.28] ApiQueryPublishedTranslations: Make `from` and `to` mandatory

https://gerrit.wikimedia.org/r/1143142

Mentioned in SAL (#wikimedia-operations) [2025-05-07T21:19:20Z] <ladsgroup@deploy1003> Started scap sync-world: Backport for [[gerrit:1143138|ApiQueryPublishedTranslations: Make from and to mandatory (T392839)]], [[gerrit:1143142|ApiQueryPublishedTranslations: Make from and to mandatory (T392839)]]

Mentioned in SAL (#wikimedia-operations) [2025-05-07T21:26:54Z] <ladsgroup@deploy1003> ladsgroup, sbisson: Backport for [[gerrit:1143138|ApiQueryPublishedTranslations: Make from and to mandatory (T392839)]], [[gerrit:1143142|ApiQueryPublishedTranslations: Make from and to mandatory (T392839)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-05-07T21:33:33Z] <ladsgroup@deploy1003> Finished scap sync-world: Backport for [[gerrit:1143138|ApiQueryPublishedTranslations: Make from and to mandatory (T392839)]], [[gerrit:1143142|ApiQueryPublishedTranslations: Make from and to mandatory (T392839)]] (duration: 14m 12s)

Nikerabbit lowered the priority of this task from Unbreak Now! to High.May 8 2025, 8:41 AM
Nikerabbit moved this task from Needs Triage to Bugs on the ContentTranslation board.

Change #1143051 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] dump-corpora: apply vslow query group

https://gerrit.wikimedia.org/r/1143051

Change #1143024 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] cx_translations: Add index to fetch by translator & update time quicker

https://gerrit.wikimedia.org/r/1143024

Change #1144564 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/ContentTranslation@master] cx_translations: Add index to make purging of unpublished draft faster

https://gerrit.wikimedia.org/r/1144564

Great to see that the slowness is being addressed! +1 to everything done already, and although I think the special page was valuable, it was very undependable.

Unfortunately for me, I've just published research based on these public statistics and now I need to find another way to make the data available in the interest of transparency and making my work reproduceable. For my use case I would be happy to do a one-off query and post the results somewhere like figshare, unless there are other ideas? I doubt and legal review of the data release can be done within my timeframe (1 week) so perhaps I'll just use my locally stored output from the public API before it was disabled.

@awight I am currently working on developing a public dashboard currently, it will live on the public Superset instance. You can read more details of the task here: T365813

All the stats that were being presented on the Special:CXStats page will be updated daily (T394431). The dashboard itself might not be the fully ready until mid-June. However, the datasets will be publicly published to https://analytics.wikimedia.org/published/datasets/ towards the end of next week (or early the week after if there is any delay with code reviews etc.) The data will be aggregated to daily granularity. I am not sure if the timing will work for you, but wanted to share anyway as an update.

@awight I am currently working on developing a public dashboard currently, it will live on the public Superset instance. You can read more details of the task here: T365813

All the stats that were being presented on the Special:CXStats page will be updated daily (T394431). The dashboard itself might not be the fully ready until mid-June. However, the datasets will be publicly published to https://analytics.wikimedia.org/published/datasets/ towards the end of next week (or early the week after if there is any delay with code reviews etc.) The data will be aggregated to daily granularity. I am not sure if the timing will work for you, but wanted to share anyway as an update.

Fantastic news, thank you for sharing! IMHO the proposed stats are good for ordinary per-language monitoring, and it's excellent to see that there will still be aggregation by both source and target language. Happily for me, it also covers my main use case (for background: extended abstract of research into translation flows). It would also be great to see flows grouped by (source, target) language pairs but this might be a specialized need... unless you can imagine it being useful to know eg. what the major flows to or from a given language are?

It would also be great to see flows grouped by (source, target) language pairs but this might be a specialized need... unless you can imagine it being useful to know eg. what the major flows to or from a given language are?

Yes, the task actually mentions only the metrics, but not many details on the filters. But the base datasets that will be published, will have the following dimensions/data:

cx_published_translations_daily

  • snapshot_date
  • translation_date
  • translation_source_language
  • translation_target_language
  • published_translation_count
  • translator_count

cx_draft_translations_daily

  • snapshot_date
  • translation_last_updated_date
  • translation_source_language
  • translation_target_language
  • translation_new_draft_count
  • translation_expansion_draft_count

Also, maybe it is better to continue the conversation on these metrics and the dashboard on the task for that: T365813 - as the scope of this task is different :)

Change #1143050 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] cx_suggestions: Add index to make deletion by title and language faster

https://gerrit.wikimedia.org/r/1143050

Change #1143062 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] cx_lists: Add index to make user suggestion fetching faster

https://gerrit.wikimedia.org/r/1143062

Change #1143070 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] cx_translations: Add index to make fetching language trends faster

https://gerrit.wikimedia.org/r/1143070

Change #1144564 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] cx_translations: Add index to make purging of unpublished draft faster

https://gerrit.wikimedia.org/r/1144564

All patches are merged. We need to ensure that the indexes are created on testwiki.

Is it only testwiki? The rest are on wikishared? like no test2wiki or anything like that?

I did this in test wiki. I think that's enough?

cumin2024@db1223.eqiad.wmnet[testwiki]> CREATE INDEX translation_started_by_last_updated_timestamp ON cx_translations (translation_started_by, translation_last_updated_timestamp);
Query OK, 0 rows affected (0.015 sec)
Records: 0  Duplicates: 0  Warnings: 0

cumin2024@db1223.eqiad.wmnet[testwiki]> CREATE INDEX cxs_source_language_title ON cx_suggestions (cxs_source_language, cxs_title);
Query OK, 0 rows affected (0.006 sec)
Records: 0  Duplicates: 0  Warnings: 0

cumin2024@db1223.eqiad.wmnet[testwiki]> CREATE INDEX cx_lists_owner ON /*_*/cx_lists (cxl_owner);
Query OK, 0 rows affected (0.007 sec)
Records: 0  Duplicates: 0  Warnings: 0

cumin2024@db1223.eqiad.wmnet[testwiki]> CREATE INDEX cx_translations_last_update_by_last_updated_timestamp ON /*_*/cx_translations (
    ->   translation_last_update_by, translation_last_updated_timestamp
    -> );
Query OK, 0 rows affected (0.010 sec)
Records: 0  Duplicates: 0  Warnings: 0

cumin2024@db1223.eqiad.wmnet[testwiki]> CREATE INDEX cx_translations_last_updated_timestamp ON /*_*/cx_translations (
    ->   translation_last_updated_timestamp
    -> );
Query OK, 0 rows affected (0.007 sec)
Records: 0  Duplicates: 0  Warnings: 0

I can confirm that the indexes are created on testwiki.

Review offset handling in ApiQueryContentTranslation.php

I think ApiQueryContentTranslation.php is fine. The one I was thinking of is ApiQueryPublishedTranslations.php and that code might be removed by T395319: CX: Remove "cxpublishedtranslations" Action API endpoint.