Page MenuHomePhabricator

Audit of unused indexes, 2023
Open, LowPublic

Description

This is the report from a random replica:

wikiadmin2023@10.64.131.10(performance_schema)> select OBJECT_NAME, INDEX_NAME from table_io_waits_summary_by_index_usage where OBJECT_SCHEMA = 'fawiki' and  SUM_TIMER_READ = 0 and NOT INDEX_NAME IS NULL limit 500;

To be checked:

  • abuse_filter:af_actor: needed
  • abuse_filter:af_user: will be dropped with actor clean up.
  • abuse_filter_log:afl_ip_timestamp: Might be redundant
  • abuse_filter_log:afl_rev_id: needed
  • abuse_filter_log:afl_wiki_timestamp: needed
  • archive:PRIMARY
  • babel:babel_lang_level
  • change_tag:PRIMARY
  • comment:comment_hash
  • content_models:model_name
  • cu_changes:cuc_xff_hex_time
  • cu_log:PRIMARY
  • filearchive:fa_deleted_timestamp
  • flaggedpage_pending:fpp_quality_pending
  • flaggedpages:fp_quality_page
  • flaggedpages:fp_reviewed_page
  • flaggedrevs:fr_page_qal_rev
  • flaggedrevs:fr_page_qal_time
  • flaggedrevs:fr_user
  • global_block_whitelist:gbw_by
  • ipblocks_restrictions:ir_type_value
  • log_search:ls_log_id
  • oldimage:oi_sha1
  • ores_classification:PRIMARY
  • ores_model:PRIMARY
  • ores_model:ores_model_status
  • ores_model:oresm_version
  • page:page_len
  • page_restrictions:pr_level
  • querycachetwo:qcc_titletwo
  • recentchanges:rc_ip
  • securepoll_votes:spvote_ip
  • site_identifiers:PRIMARY
  • site_identifiers:si_key
  • slot_roles:role_name
  • transcode:PRIMARY
  • updatelog:PRIMARY
  • uploadstash:PRIMARY
  • uploadstash:us_user
  • user:user_email_token: Needed
  • wikilove_log:PRIMARY
  • wikilove_log:wll_receiver_time
  • wikilove_log:wll_sender_time
  • wikilove_log:wll_timestamp
  • wikilove_log:wll_type_time

Now we need to check if the index is needed or not. there are many cases that the index is needed in some shape or form but might end up in list of unused indexes. So any action must be done with caution.

To be dropped:

  • cur:name_title
  • cur:cur_title
  • cur:cur_timestamp
  • cur:cur_random
  • cur:name_title_timestamp
  • cur:user_timestamp
  • cur:usertext_timestamp
  • cur:namespace_redirect_timestamp |
  • cur:jamesspecialpages
  • cur:qry_checktouched
  • cur:id_title_ns_red
  • sites:site_domain
  • sites:site_forward
  • sites:site_global_key
  • sites:site_group
  • sites:site_language
  • sites:site_protocol
  • sites:site_source
  • sites:site_type
  • categorylinks:cl_collation_ext
  • iwlinks:iwl_prefix_from_title

Event Timeline

I cross checked with a random replica in enwiki and took the intersection.

The biggest impact can be made by dropping cl_collation_ext. We need to make sure updateCollation.php maint script doesn't need it (or modify it in a way that wouldn't need to use it)

We should check that in production and with different types of hosts: main traffic, API and vslow, just in case.
While the replicas are good as an indication, we've had issues with production hosts, especially because of the optimizer gathered stats.

The biggest impact can be made by dropping cl_collation_ext.

I suspect this may be related to ICU updates for collation, which we do very rarely, but when they happen, they're quite important. E.g. annually around PHP upgrades.

Yeah I know about ICU updates (and some plans on improving it (T329491: ICU transition towards ICU 67) but the maint script can simply iterate through cl_from in batches of 1000 values (cl_from between x and x+1000) or something like that without needing to rely on an index.

We should check that in production and with different types of hosts: main traffic, API and vslow, just in case.
While the replicas are good as an indication, we've had issues with production hosts, especially because of the optimizer gathered stats.

Definitely, also this is a starting point. I'm checking in code to make sure we don't end up with another tmp1 incident, for example. the index mentioned on user (user_email_token) is actually used when you search for the columns being indexed but it's used quite rarely that's not showing up in used indexes but I'm not gonna drop it.

Ladsgroup moved this task from Triage to In progress on the DBA board.

For the MediaModeration 2.0 work we will probably use the oi_sha1 index to search for old images by their sha1.

Change 994154 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Make it possible to find in use DB indexes via a code search

https://gerrit.wikimedia.org/r/994154

Change 994154 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Make it possible to find in use DB indexes via a code search

https://gerrit.wikimedia.org/r/994154

Ladsgroup updated the task description. (Show Details)