Some tables lack unique or primary keys, may allow confusing duplicate data
Open, HighPublic

Description

All database tables should have PRIMARY KEY's, or at minimum, a UNIQUE index

Details

Reference
bz15441

Related Objects

StatusAssignedTask
InvalidNone
DuplicateNone
ResolvedNone
Resolved RobLa-WMF
OpenNone
OpenNone
ResolvedNone
DeclinedNone
Resolveddaniel
Resolvedmatthiasmullie
ResolvedTTO
ResolvedMarostegui
ResolvedMarostegui
ResolvedAddshore
StalledNone
StalledNone
OpenNone
OpenNone
ResolvedMarostegui
ResolvedReedy
ResolvedReedy
ResolvedMarostegui
OpenNone
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedReedy
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
There are a very large number of changes, so older changes are hidden. Show Older Changes

Mentioned in SAL (#wikimedia-operations) [2017-03-30T05:56:06Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1094 - T17441 (duration: 00m 45s)

db1094 (s7) is back in the pool with all the UNIQUE converted to PK for all the wikis except centralauth. Monitoring it for errors now...
Tables in use without PK:

oldimage
tag_summary
change_tag
user_newtalk
renameuser_status

Change 345507 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1090

https://gerrit.wikimedia.org/r/345507

Change 345507 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1090

https://gerrit.wikimedia.org/r/345507

Mentioned in SAL (#wikimedia-operations) [2017-03-30T08:03:50Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1090 - T17441 (duration: 00m 44s)

Mentioned in SAL (#wikimedia-operations) [2017-03-30T08:13:21Z] <marostegui> Convert UNIQUE keys to PK on db1090 (s2) - T17441

db1090 (s2) has had the following tables with UNIQUE -> PK:

categorylinks
imagelinks
iwlinks
langlinks
log_search
math
module_deps
objectcache
pagelinks
querycache_info
site_stats
templatelinks
text
transcache
user_former_groups
user_properties
watchlist
l10n_cache

Tables in use without PK:

oldimage
tag_summary
change_tag
user_newtalk

Mentioned in SAL (#wikimedia-operations) [2017-03-30T14:30:11Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1090 - T17441 (duration: 00m 45s)

jcrespo added a comment.EditedApr 11 2017, 6:51 PM

tl_from may need the same fix than pl_from, I am seeing some errors on db1093 on ruwiki. https://logstash.wikimedia.org/goto/605f2ef10f517db6c2724d7a6adf0aac

tl_from may need the same fix than pl_from, I am seeing some errors on db1093 on ruwiki. https://logstash.wikimedia.org/goto/605f2ef10f517db6c2724d7a6adf0aac

I will get that fixed now

tl_from may need the same fix than pl_from, I am seeing some errors on db1093 on ruwiki. https://logstash.wikimedia.org/goto/605f2ef10f517db6c2724d7a6adf0aac

I will get that fixed now

Actually not as there is the mediawiki train deployment window for the next two hours, so it will be hard to depool the server

I do not think this is an unbreak now- we can probably wait until tomorrow. This went fully unnoticed for weeks.

Change 347805 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1093

https://gerrit.wikimedia.org/r/347805

Change 347805 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1093

https://gerrit.wikimedia.org/r/347805

Mentioned in SAL (#wikimedia-operations) [2017-04-12T06:04:21Z] <marostegui> Deploy schema change on s6 - db1093 - T17441

db1093 is now fixed:

root@neodymium:/home/marostegui# for i in frwiki jawiki ruwiki; do echo $i; mysql --skip-ssl -hdb1093 $i -e "show create table templatelinks\G" | grep "KEY \`tl_from\`" ;done
frwiki
  KEY `tl_from` (`tl_from`,`tl_namespace`,`tl_title`)
jawiki
  KEY `tl_from` (`tl_from`,`tl_namespace`,`tl_title`)
ruwiki
  KEY `tl_from` (`tl_from`,`tl_namespace`,`tl_title`)

Going over logtash for the past 30 days I have not seen any other errors apart from the ones this server was giving for pagelinks and templatelinks.

Mentioned in SAL (#wikimedia-operations) [2017-04-12T06:42:01Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1093 - T17441 (duration: 00m 46s)

jcrespo claimed this task.Apr 25 2017, 4:33 PM

So, for enwiki the updated list (having into account the tl and pl errors + watchlist and tag tables) is:

alter table templatelinks add primary key (tl_from,tl_namespace,tl_title); -- duplicate index, to be fixed on mediawiki later
alter table pagelinks add primary key (pl_from,pl_namespace,pl_title); -- duplicate index, to be fixed on mediawiki later
alter table categorylinks drop key cl_from, add primary key (cl_from,cl_to);
alter table ep_users_per_course drop key ep_users_per_course, add primary key (upc_user_id,upc_course_id,upc_role);
alter table flaggedrevs_tracking drop key from_namespace_title, add primary key (ftr_from,ftr_namespace,ftr_title`);
alter table imagelinks drop key il_from, add primary key (il_from,il_to);
alter table iwlinks drop key iwl_from, add primary key (iwl_from,iwl_prefix,iwl_title);
alter table langlinks drop key ll_from, add primary key (ll_from,ll_lang);
alter table log_search drop key ls_field_val, add primary key (ls_field,ls_value,ls_log_id);
alter table math drop key math_inputhash, add primary key (math_inputhash);
alter table module_deps drop key md_module_skin, add primary key (md_module,md_skin);
alter table objectcache drop key keyname, add primary key (keyname);
alter table profiling drop key pf_name_server, add primary key (pf_name,pf_server);
alter table querycache_info drop key  qci_type, add primary key (qci_type);
alter table site_stats drop key ss_row_id, add primary key (ss_row_id);
alter table text drop key old_id, add primary key (old_id);
alter table transcache drop key tc_url_idx, add primary key (tc_url);
alter table user_former_groups drop key ufg_user_group, add primary key (ufg_user,ufg_group);
alter table user_properties drop key user_properties_user_property, add primary key (up_user,up_property);
alter table vote_log drop key log_id, add primary key (log_id);

I will double check it with db1089 (which, BTW, will break several times as this change as already been applied).

Mentioned in SAL (#wikimedia-operations) [2017-04-25T17:36:39Z] <jynus> running test schema change on etwiki on eqiad (depooled) T17441

Results (including errors):

root@db1075.eqiad.wmnet[(none)]> use etwiki
Database changed
root@db1075.eqiad.wmnet[etwiki]> alter table templatelinks add primary key (tl_from,tl_namespace,tl_title);
Query OK, 0 rows affected (10.64 sec)               
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table pagelinks add primary key (pl_from,pl_namespace,pl_title);
Query OK, 0 rows affected (1 min 22.80 sec)         
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table categorylinks drop key cl_from, add primary key (cl_from,cl_to);
Query OK, 0 rows affected (5.76 sec)                
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table ep_users_per_course drop key ep_users_per_course, add primary key (upc_user_id,upc_course_id,upc_role);
ERROR 1146 (42S02): Table 'etwiki.ep_users_per_course' doesn't exist
root@db1075.eqiad.wmnet[etwiki]> alter table flaggedrevs_tracking drop key from_namespace_title, add primary key (ftr_from,ftr_namespace,ftr_title`);
    `> `\c
root@db1075.eqiad.wmnet[etwiki]> alter table flaggedrevs_tracking drop key from_namespace_title, add primary key (ftr_from,ftr_namespace,ftr_title);
ERROR 1146 (42S02): Table 'etwiki.flaggedrevs_tracking' doesn't exist
root@db1075.eqiad.wmnet[etwiki]> alter table imagelinks drop key il_from, add primary key (il_from,il_to);
Query OK, 0 rows affected (2.61 sec)                
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table iwlinks drop key iwl_from, add primary key (iwl_from,iwl_prefix,iwl_title);
Query OK, 0 rows affected (0.60 sec)                
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table langlinks drop key ll_from, add primary key (ll_from,ll_lang);
Query OK, 0 rows affected (25.97 sec)               
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table log_search drop key ls_field_val, add primary key (ls_field,ls_value,ls_log_id);
Query OK, 0 rows affected (0.18 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table math drop key math_inputhash, add primary key (math_inputhash);
Query OK, 0 rows affected (0.28 sec)                
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table module_deps drop key md_module_skin, add primary key (md_module,md_skin);
Query OK, 0 rows affected (0.14 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table objectcache drop key keyname, add primary key (keyname);
Query OK, 0 rows affected (0.16 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table profiling drop key pf_name_server, add primary key (pf_name,pf_server);
ERROR 1146 (42S02): Table 'etwiki.profiling' doesn't exist
root@db1075.eqiad.wmnet[etwiki]> alter table querycache_info drop key  qci_type, add primary key (qci_type);
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table site_stats drop key ss_row_id, add primary key (ss_row_id);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table text drop key old_id, add primary key (old_id);
Query OK, 0 rows affected (26.58 sec)               
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table transcache drop key tc_url_idx, add primary key (tc_url);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table user_former_groups drop key ufg_user_group, add primary key (ufg_user,ufg_group);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table user_properties drop key user_properties_user_property, add primary key (up_user,up_property);
Query OK, 149551 rows affected (1.16 sec)              
Records: 149551  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table vote_log drop key log_id, add primary key (log_id);
ERROR 1146 (42S02): Table 'etwiki.vote_log' doesn't exist

No errors on labs or sanitariums.

Main concerns:

  • Duplication of indexes on tl and pl can be an issue, not only disk space but also in cache overhead
  • *links tables and text will take quite some time to run on large wikis

Above list minus the errors:

alter table templatelinks add primary key (tl_from,tl_namespace,tl_title); -- duplicate index, to be fixed on mediawiki later
alter table pagelinks add primary key (pl_from,pl_namespace,pl_title); -- duplicate index, to be fixed on mediawiki later
alter table categorylinks drop key cl_from, add primary key (cl_from,cl_to);
alter table imagelinks drop key il_from, add primary key (il_from,il_to);
alter table iwlinks drop key iwl_from, add primary key (iwl_from,iwl_prefix,iwl_title);
alter table langlinks drop key ll_from, add primary key (ll_from,ll_lang);
alter table log_search drop key ls_field_val, add primary key (ls_field,ls_value,ls_log_id);
alter table math drop key math_inputhash, add primary key (math_inputhash);
alter table module_deps drop key md_module_skin, add primary key (md_module,md_skin);
alter table objectcache drop key keyname, add primary key (keyname);
alter table querycache_info drop key  qci_type, add primary key (qci_type);
alter table site_stats drop key ss_row_id, add primary key (ss_row_id);
alter table text drop key old_id, add primary key (old_id);
alter table transcache drop key tc_url_idx, add primary key (tc_url);
alter table user_former_groups drop key ufg_user_group, add primary key (ufg_user,ufg_group);
alter table user_properties drop key user_properties_user_property, add primary key (up_user,up_property);

With this change + watchlist + tags, the list of tables without a primary key that are NOT clearly scheduled for dropping are:

cur
hidden
interwiki
l10n_cache
logging_pre_1_10
oldimage
querycache
querycachetwo
searchindex
securepoll_lists
securepoll_msgs
securepoll_properties
site_identifiers
user_newtalk

Most are not too large, but oldimage will cause problem on commons in the future. We'll see.

Catrope added a subscriber: Catrope.May 3 2017, 2:54 PM

ls_field_val is used in a FORCE INDEX, which is causing errors like:

[Exception Wikimedia\Rdbms\DBQueryError] (/srv/mediawiki/php-1.29.0-wmf.21/includes/libs/rdbms/database/Database.php:1075) A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: SELECT  log_id,log_type,log_action,log_timestamp,log_user,log_user_text,log_namespace,log_title,log_comment,log_params,log_deleted,user_id,user_name,user_editcount,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`    WHERE ct_log_id=log_id  ) AS `ts_tags`  FROM `logging` FORCE INDEX (PRIMARY) LEFT JOIN `user` ON ((log_user=user_id)) INNER JOIN `log_search` FORCE INDEX (ls_field_val) ON ((ls_log_id=log_id))   WHERE log_type = 'delete' AND log_action = 'revision' AND ls_field = 'rev_id' AND ls_value = 'X' AND (log_type != 'suppress') AND log_namespace = 'X' AND log_title = 'X' AND ((log_deleted & 9) != 9)  ORDER BY log_timestamp DESC LIMIT 26  
Function: IndexPager::buildQueryInfo (LogPager)
Error: 1176 Key 'ls_field_val' doesn't exist in table 'log_search' (10.64.0.206)

  #0 /srv/mediawiki/php-1.29.0-wmf.21/includes/libs/rdbms/database/Database.php(933): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
  #1 /srv/mediawiki/php-1.29.0-wmf.21/includes/libs/rdbms/database/Database.php(1269): Wikimedia\Rdbms\Database->query(string, string)
  #2 /srv/mediawiki/php-1.29.0-wmf.21/includes/pager/IndexPager.php(368): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
  #3 /srv/mediawiki/php-1.29.0-wmf.21/includes/pager/IndexPager.php(225): IndexPager->reallyDoQuery(string, integer, boolean)
  #4 /srv/mediawiki/php-1.29.0-wmf.21/includes/logging/LogPager.php(428): IndexPager->doQuery()
  #5 /srv/mediawiki/php-1.29.0-wmf.21/includes/pager/IndexPager.php(422): LogPager->doQuery()
  #6 /srv/mediawiki/php-1.29.0-wmf.21/includes/logging/LogEventsList.php(623): IndexPager->getBody()
  #7 /srv/mediawiki/php-1.29.0-wmf.21/includes/specials/SpecialRevisiondelete.php(224): LogEventsList::showLogExtract(OutputPage, string, Title, string, array)
  #8 /srv/mediawiki/php-1.29.0-wmf.21/includes/specialpage/SpecialPage.php(522): SpecialRevisionDelete->execute(NULL)
  #9 /srv/mediawiki/php-1.29.0-wmf.21/includes/specialpage/SpecialPageFactory.php(578): SpecialPage->run(NULL)
  #10 /srv/mediawiki/php-1.29.0-wmf.21/includes/MediaWiki.php(287): SpecialPageFactory::executePath(Title, RequestContext)
  #11 /srv/mediawiki/php-1.29.0-wmf.21/includes/MediaWiki.php(862): MediaWiki->performRequest()
  #12 /srv/mediawiki/php-1.29.0-wmf.21/includes/MediaWiki.php(523): MediaWiki->main()
  #13 /srv/mediawiki/php-1.29.0-wmf.21/index.php(43): MediaWiki->run()
  #14 /srv/mediawiki/w/index.php(3): include(string)
  #15 {main}

Change 351653 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/core@master] Use IGNORE INDEX(ls_log_id) instead of FORCE INDEX(ls_field_val)

https://gerrit.wikimedia.org/r/351653

Mentioned in SAL (#wikimedia-operations) [2017-05-03T15:13:06Z] <catrope@naos> Synchronized php-1.29.0-wmf.21/includes/logging/LogPager.php: Replace FORCE INDEX(ls_field_val) with IGNORE INDEX(ls_log_id) (https://gerrit.wikimedia.org/r/#/c/351653/ for T17441) (duration: 01m 14s)

Change 351653 merged by jenkins-bot:
[mediawiki/core@master] Use IGNORE INDEX(ls_log_id) instead of FORCE INDEX(ls_field_val)

https://gerrit.wikimedia.org/r/351653

Change 351705 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/core@wmf/1.29.0-wmf.21] Use IGNORE INDEX(ls_log_id) instead of FORCE INDEX(ls_field_val)

https://gerrit.wikimedia.org/r/351705

Change 351705 merged by jenkins-bot:
[mediawiki/core@wmf/1.29.0-wmf.21] Use IGNORE INDEX(ls_log_id) instead of FORCE INDEX(ls_field_val)

https://gerrit.wikimedia.org/r/351705

Lahwaacz removed a subscriber: Lahwaacz.May 3 2017, 8:10 PM

Change 370190 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[mediawiki/core@master] tables.sql: Convert UNIQUE keys into PK

https://gerrit.wikimedia.org/r/370190

Marostegui added a comment.EditedAug 4 2017, 12:50 PM

I would like someone to help getting this: https://gerrit.wikimedia.org/r/#/c/370190/ merged and deployed.

To sum up, we have finished converting UNIQUE into PRIMARY keys and we would like this to be reflected on tables.sql so future wikis are created with the same PKs that we have deployed in core.

Thanks!

Reedy added a comment.EditedAug 29 2017, 5:50 PM

So... After 1.30.0-wmf.17 is everywhere, our DBA can drop the other two primary indexes that were blocked on a code change

ALTER TABLE /*_*/pagelinks DROP INDEX /*i*/pl_from;
ALTER TABLE /*_*/templatelinks DROP INDEX /*i*/tl_from;
Reedy updated the task description. (Show Details)
Reedy updated the task description. (Show Details)
Reedy updated the task description. (Show Details)Aug 29 2017, 9:31 PM
Reedy added a comment.Aug 29 2017, 9:36 PM

hitcounter removed from core. externalinks now has el_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,

I suspect archive and oldimage should be left alone due to the other major db overhauls to be done...

So querycache querycachetwo user_newtalk probably still want sorting?

Thank you very much @Reedy , I will create a separate task for deployment tracking- so we do not spam wmf-deployment related tasks on this mediawiki task: T174509

@Reedy I understand that T162774 should be resolved, too?

Reedy added a comment.Aug 29 2017, 9:43 PM

@Reedy I understand that T162774 should be resolved, too?

I think so, yup. Will do so

Reedy changed the status of subtask T146568: Add a primary key to oldimage from Open to Stalled.Aug 29 2017, 9:46 PM

I suspect archive and oldimage should be left alone due to the other major db overhauls to be done...

Archive should have ar_id already- if it doesn't have on wmf, it would be a deployment error, but I belive that is not the case or at least it is not widespread.

For image I would ask for an estimation to reformat the image tables -if anyone plans to work on it soon-, if it is long term it may be reasonable to deploy a PK shorter term (it may even facilitate further schema changes). Edit: it already has img_name ?

I have yet to see querycache querycachetwo user_newtalk, but I suspect they may be much easier to fix and deploy, and not much of a problem.

Reedy updated the task description. (Show Details)Aug 29 2017, 9:51 PM
Reedy added a comment.Aug 29 2017, 9:58 PM

Image does have img_name

oldimage also has oi_name, but in this case, it's not unique, as an image can have multiple revisions... oi_archive_name may work (but is quite a long string)

querycache has a patch... querycachetwo doesn't yet, neither does user_newtalk (need to decide what to use as a PK, likely a composite)

If we use qc_type, qc_value for querycache... We can probably use qcc_value, qcc_type for querycachetwo

Oh, sorry, I confused image and oldimage, of course, silly me.

Reedy changed the status of subtask T146571: Add a primary key to querycache from Open to Stalled.Sep 5 2017, 9:11 PM
jcrespo removed jcrespo as the assignee of this task.Nov 22 2017, 11:49 AM

Still a work in progress.

Reedy updated the task description. (Show Details)Jul 4 2018, 6:53 PM
Reedy updated the task description. (Show Details)Jul 11 2018, 8:26 PM