
Create a whitelist of tables to checksum on all wikis
Closed, Resolved, Public

Description

The following errors were found when running pt-table-checksum on testwiki:

06-30T09:19:03 Error checksumming table testwiki.change_tag: 
Use of uninitialized value in string ne at /usr/bin/pt-table-checksum line 6682.

06-30T09:19:04 Cannot checksum table testwiki.click_tracking: 
There is no good index and the table is oversized. at /usr/bin/pt-table-checksum line 6408.

06-30T09:19:23 Skipping chunk 3 of testwiki.querycache because it is oversized.  
The current chunk size limit is 2000 rows (chunk size=1000 * chunk size limit=2.0), 
but MySQL estimates that there are 2462 rows in the chunk.
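The oversized-chunk rule in this message is plain arithmetic. Below is a minimal sketch, assuming the check is a strict comparison of MySQL's row estimate against `--chunk-size` * `--chunk-size-limit`. The helper name `chunk_is_oversized` is hypothetical and is not part of pt-table-checksum.

```python
def chunk_is_oversized(estimated_rows: int, chunk_size: int = 1000,
                       chunk_size_limit: float = 2.0) -> bool:
    """Return True when MySQL's row estimate for a chunk exceeds the
    allowed maximum of chunk_size * chunk_size_limit rows."""
    max_rows = chunk_size * chunk_size_limit
    return estimated_rows > max_rows


# Values from the querycache message: limit = 1000 * 2.0 = 2000 rows and
# the estimate is 2462 rows, so chunk 3 is skipped.
print(chunk_is_oversized(2462))
```

Raising either option makes the check more permissive, at the cost of larger and slower checksum queries.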

06-30T09:19:23 Error checksumming table testwiki.querycache: 
Possible infinite loop detected!  
The lower boundary for chunk 4 is <Deadendpages, Deadendpages, 0> 
and the lower boundary for chunk 5 is also <Deadendpages, Deadendpages, 0>.  
This usually happens when using a non-unique single column index.  
The current chunk index for table testwiki.querycache is qc_type which is not unique and covers 2 columns.
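The "possible infinite loop" error comes from how chunk boundaries are chosen. Below is a simplified model of keyset chunking, not pt-table-checksum's actual code. Each chunk's lower boundary is the key found `chunk_size` rows past the previous lower boundary. When more than `chunk_size` rows share a single key value, as with the non-unique `qc_type` index, the boundary stops advancing. The model uses one key column and made-up data, and `find_stuck_boundary` is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple


def next_lower_boundary(sorted_keys: Sequence[str], lower: str,
                        chunk_size: int) -> Optional[str]:
    """Return the key chunk_size rows after the first row >= lower,
    or None if the remaining rows fit in one final chunk."""
    remaining: List[str] = [k for k in sorted_keys if k >= lower]
    if len(remaining) <= chunk_size:
        return None
    return remaining[chunk_size]


def find_stuck_boundary(keys: Sequence[str],
                        chunk_size: int) -> Optional[Tuple[int, str]]:
    """Walk the chunks and return (chunk_number, key) for the first chunk
    whose lower boundary equals the previous one, or None if chunking ends."""
    sorted_keys = sorted(keys)
    if not sorted_keys:
        return None
    lower = sorted_keys[0]
    chunk = 1
    while True:
        nxt = next_lower_boundary(sorted_keys, lower, chunk_size)
        if nxt is None:
            return None  # reached the last chunk without stalling
        chunk += 1
        if nxt == lower:
            return chunk, nxt  # boundary did not advance: infinite loop
        lower = nxt


# 2,500 rows share one value of a non-unique key; the chunk size is 1,000.
rows = ["Ancientpages"] * 10 + ["Deadendpages"] * 2500 + ["Lonelypages"] * 10
print(find_stuck_boundary(rows, chunk_size=1000))
# A unique key always advances, so chunking terminates.
print(find_stuck_boundary([f"{i:05d}" for i in range(5000)], chunk_size=1000))
```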

06-30T09:19:28 Skipping chunk 1 of testwiki.tag_summary because MySQL
chose no index instead of the tag_summary_log_id index.

06-30T09:19:28 Error checksumming table testwiki.tag_summary: 
Use of uninitialized value in string ne at /usr/bin/pt-table-checksum line 6682.

The computed checksums were identical on all production hosts except the labsdb hosts, where the following tables, which do not exist there, broke replication from db1069:

Error 'Table 'testwiki.accountaudit_login' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.blob_orphans' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.blob_tracking' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.bv2009_edits' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.cu_changes' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.cu_log' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.edit_page_tracking' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.email_capture' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.filejournal' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.hidden' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.job' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.log_search' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.logging_pre_1_10' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.moodbar_feedback' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.moodbar_feedback_response' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.objectcache' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.pr_index' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.prefstats' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.prefswitch_survey' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.profiling' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.querycache' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.querycache_info' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.querycachetwo' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.securepoll_cookie_match' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.securepoll_elections' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.securepoll_entity' doesn't exist' on query. Default database: 'testwiki'. 
Error 'Table 'testwiki.securepoll_lists' doesn't exist' on query. Default database: 'testwiki'. 
Error 'Table 'testwiki.securepoll_msgs' doesn't exist' on query. Default database: 'testwiki'. 
Error 'Table 'testwiki.securepoll_options' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.securepoll_properties' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.securepoll_questions' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.securepoll_strike' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.securepoll_voters' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.securepoll_votes' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.spoofuser' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.text' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.titlekey' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.transcache' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.uploadstash' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.user_newtalk' doesn't exist' on query. Default database: 'testwiki'.
Error 'Table 'testwiki.watchlist' doesn't exist' on query. Default database: 'testwiki'.

Only a whitelist of core tables should be checked: those that are InnoDB, have primary keys, and exist on every host.
Create such a list.
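Below is a minimal sketch of building such a list. It assumes that per-host table metadata (storage engine and whether a primary key exists) has already been collected. `build_whitelist`, the host names and `METADATA_QUERY` are illustrative. The query shows one way to read that metadata from `information_schema` and is not executed here.

```python
from typing import Dict, Iterable, List, Tuple

# One way to collect the metadata on each host (MySQL/MariaDB). The %s
# placeholder is for the schema name, bound by a DB-API driver. Not run here.
METADATA_QUERY = """
SELECT t.table_name, t.engine, COUNT(c.constraint_name) > 0 AS has_pk
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = %s AND t.table_type = 'BASE TABLE'
GROUP BY t.table_name, t.engine
"""

# table name -> (storage engine, has primary key)
TableInfo = Dict[str, Tuple[str, bool]]


def build_whitelist(hosts: Dict[str, TableInfo],
                    candidates: Iterable[str]) -> List[str]:
    """Keep only candidate tables that exist on every host, use InnoDB on
    every host, and have a primary key on every host."""
    if not hosts:
        return []

    def safe_on(tables: TableInfo, name: str) -> bool:
        if name not in tables:
            return False
        engine, has_pk = tables[name]
        return (engine or "").lower() == "innodb" and bool(has_pk)

    return sorted(name for name in candidates
                  if all(safe_on(t, name) for t in hosts.values()))


hosts = {
    "master": {"page": ("InnoDB", True), "text": ("InnoDB", True),
               "querycache": ("InnoDB", False)},
    "labs-replica": {"page": ("InnoDB", True),
                     "querycache": ("InnoDB", False)},
}
print(build_whitelist(hosts, ["page", "text", "querycache"]))  # ['page']
```

In this example, `text` is dropped because it is missing on one host, and `querycache` is dropped because it has no primary key.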

Related Objects

Status | Assigned
Resolved | jcrespo
Resolved | Ladsgroup
Resolved | None
Resolved | Marostegui
Resolved | Marostegui
Resolved | Marostegui
Resolved | Marostegui
Resolved | Marostegui
Resolved | Marostegui
Resolved | Reedy
Resolved | Marostegui
Resolved | Marostegui
Resolved | Marostegui
Declined | None
Resolved | Marostegui
Resolved | Marostegui
Resolved | Ladsgroup
Resolved | Marostegui
Resolved | Marostegui
Resolved | Marostegui
Resolved | Marostegui
Resolved | Marostegui
Resolved | Marostegui
Resolved | Marostegui
Resolved | Ladsgroup
Resolved | Marostegui
Resolved | Marostegui
Resolved | Ladsgroup
Resolved | Kormat
Open | Marostegui
Resolved | Kormat
Resolved | Marostegui
Resolved | Kormat

Event Timeline

jcrespo claimed this task.
jcrespo set the priority of this task to Needs Triage.
jcrespo updated the task description.
jcrespo added a project: DBA.
jcrespo added subscribers: jcrespo, Springle.
jcrespo triaged this task as Medium priority. Jul 2 2015, 7:47 AM
jcrespo set Security to None.
jcrespo moved this task from Triage to In progress on the DBA board.

This is a list of tables that are safe to check on "regular" wikis (they are part of the core tables and cause no problems for pt-online-schema-change):

archive,category,categorylinks,externallinks,filearchive,image,imagelinks,interwiki,ipblocks,iwlinks,l10n_cache,langlinks,logging,module_deps,msg_resource,msg_resource_links,oldimage,page,page_props,page_restrictions,pagelinks,protected_titles,recentchanges,redirect,site_stats,sites,templatelinks,updatelog,user,user_former_groups,user_groups,user_properties,valid_tag
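One way to apply this list is to pass it to pt-table-checksum's `--tables` option, which accepts database-qualified names. The helper `checksum_command` and the host `db-master.example` below are hypothetical. `--databases`, `--tables` and the `h=` DSN are the tool's own options.

```python
import shlex
from typing import List, Sequence

# Whitelist from the comment above (core tables on "regular" wikis).
SAFE_TABLES: List[str] = (
    "archive,category,categorylinks,externallinks,filearchive,image,"
    "imagelinks,interwiki,ipblocks,iwlinks,l10n_cache,langlinks,logging,"
    "module_deps,msg_resource,msg_resource_links,oldimage,page,page_props,"
    "page_restrictions,pagelinks,protected_titles,recentchanges,redirect,"
    "site_stats,sites,templatelinks,updatelog,user,user_former_groups,"
    "user_groups,user_properties,valid_tag"
).split(",")


def checksum_command(host: str, db: str, tables: Sequence[str]) -> List[str]:
    """Build a pt-table-checksum invocation limited to the given tables."""
    qualified = ",".join(f"{db}.{t}" for t in tables)
    return [
        "pt-table-checksum",
        f"--databases={db}",
        f"--tables={qualified}",  # only these tables are checksummed
        f"h={host}",              # DSN of the master to run against
    ]


print(shlex.join(checksum_command("db-master.example", "testwiki", SAFE_TABLES)))
```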

The following tables either do not exist on all slaves, lack a primary key suitable for iterating over them (or have no primary key at all), or are so large that checksumming them creates lag on the slaves:

change_tag
job
user_newtalk
log_search
objectcache
querycache
querycache_info
querycachetwo
revision
tag_summary
text
transcache
uploadstash
watchlist

I am particularly concerned about revision, as it is one of the most important tables of core.

Non-standard wikis may have problems with the first list (for example, commons with the image table), or may have additional important tables.
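Below is a sketch of per-wiki overrides on top of the shared list, assuming exclusions and additions are kept in simple maps. The map names, the function `tables_for_wiki`, and the use of `commonswiki` as the Commons database name are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical per-wiki overrides. Excluding "image" on Commons follows the
# comment above; "commonswiki" as the database name is an assumption.
EXCLUDE: Dict[str, Set[str]] = {"commonswiki": {"image"}}
EXTRA: Dict[str, List[str]] = {}  # wiki -> additional important tables


def tables_for_wiki(wiki: str, base: Sequence[str]) -> List[str]:
    """Apply per-wiki exclusions and additions to the shared whitelist."""
    excluded = EXCLUDE.get(wiki, set())
    tables = [t for t in base if t not in excluded]
    tables += [t for t in EXTRA.get(wiki, []) if t not in tables]
    return tables


base = ["image", "page", "user"]
print(tables_for_wiki("commonswiki", base))  # ['page', 'user']
print(tables_for_wiki("testwiki", base))     # ['image', 'page', 'user']
```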

The rest of the tasks should be handled by T17441.

The tables Jaime listed are in the same state as when the ticket was created, so they should probably stay excluded from the list, because they either still lack a PK or are too large:

change_tag
user_newtalk
log_search
objectcache
querycache
querycache_info
querycachetwo
revision
tag_summary
text
transcache
uploadstash
watchlist

Tables that now have a PK:

job

Just to be safe, we should double-check each shard before running the consistency check.
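Below is a sketch of such a per-shard check. It assumes that each host's table metadata (engine and whether a primary key exists) has been collected beforehand, for example from `information_schema`. `preflight` and the host names are hypothetical. An empty result means no whitelisted table is missing, non-InnoDB, or without a primary key on that shard.

```python
from typing import Dict, List, Sequence, Tuple

TableInfo = Dict[str, Tuple[str, bool]]  # table -> (engine, has primary key)


def preflight(shard_hosts: Dict[str, TableInfo],
              whitelist: Sequence[str]) -> Dict[str, List[str]]:
    """Report, per host, every whitelisted table that would be unsafe to
    checksum there. Hosts without problems are left out of the result."""
    problems: Dict[str, List[str]] = {}
    for host, tables in shard_hosts.items():
        issues: List[str] = []
        for name in whitelist:
            if name not in tables:
                issues.append(f"{name}: missing")
                continue
            engine, has_pk = tables[name]
            if (engine or "").lower() != "innodb":
                issues.append(f"{name}: engine {engine}")
            if not has_pk:
                issues.append(f"{name}: no primary key")
        if issues:
            problems[host] = issues
    return problems


shard = {
    "db-a": {"page": ("InnoDB", True), "user": ("InnoDB", True)},
    "db-b": {"page": ("InnoDB", True), "user": ("InnoDB", False)},
    "db-c": {"page": ("MyISAM", True)},
}
for host, issues in preflight(shard, ["page", "user"]).items():
    print(host, issues)
```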

The thing is, revision and text are key tables, and they should be checked. Apart from perhaps user, page and a few others, most tables are not that important, but those two are. They have PKs, so maybe there is a way to do it, just slowly.


Maybe we can set the chunk-size manually instead of letting the tool adjust it automatically. As we need to decommission servers from pretty much all the shards, we can try the smaller revision tables first and see how it behaves.
We can also see how it works with phabricator_file.file_storageblob, a 28G table with a PK, or phabricator_metamta.metamta_mail, which is 15G and also has a PK.
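Below is a sketch of such a run. It relies on pt-table-checksum's documented behaviour that passing `--chunk-size` explicitly disables the dynamic resizing driven by `--chunk-time`. `--max-lag` pauses checksumming while replicas lag more than the given time. The helper name, the host and the values (200 or 500 rows, 1 second) are hypothetical starting points, not tested settings.

```python
import shlex
from typing import List


def fixed_chunk_command(host: str, db: str, table: str,
                        chunk_size: int = 500,
                        max_lag_seconds: int = 1) -> List[str]:
    """Checksum one table with a fixed chunk size instead of letting the
    tool resize chunks dynamically."""
    return [
        "pt-table-checksum",
        f"--databases={db}",
        f"--tables={db}.{table}",
        f"--chunk-size={chunk_size}",    # explicit value: no auto-adjusting
        f"--max-lag={max_lag_seconds}",  # pause while replicas lag behind
        f"h={host}",
    ]


# For example, one of the large Phabricator tables mentioned above.
print(shlex.join(fixed_chunk_command("misc-master.example", "phabricator_file",
                                     "file_storageblob", chunk_size=200)))
```

Smaller chunks mean more, cheaper queries: the run takes longer, but each checksum statement replicates quickly.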

> Maybe adjusting the chunk-size manually instead of leaving the tool to automatically adjust it itself. As we need to decomm servers from pretty much all the shards we can try with the smaller versions of revision table first and see how it behaves.

+1

> Also we can see how it works with phabricator_file.file_storageblob which is a 28G table with PK...or phabricator_metamta.metamta_mail which is 15G with PK too.

You do not need to be gentle with most of the misc hosts: lag is not a problem there (the slaves are passive), and neither is IOPS.