P6. beta-update-databases-eqiad failing on enwiki
Closed, Resolved · Public · 2 Estimated Story Points

Description

All other wikis seem to be passing, but enwiki has been failing consistently since Jan 15, 2015 12:20:00 UTC.

All I get from the console output is:

beta-update-databases-eqiad » deployment-bastion-eqiad,enwiki completed with result FAILURE

Event Timeline

greg created this task. Jan 15 2015, 5:05 PM
greg raised the priority of this task from to High.
greg updated the task description.
greg added a subscriber: greg.
Restricted Application added a subscriber: Aklapper. Jan 15 2015, 5:05 PM
hashar added a subscriber: hashar.

The beta-update-databases-eqiad Jenkins job is a 'multiple configuration job'. When run, it spawns an instance of itself with a specific setting: the database name.

From the build history, it started failing between 15 Jan. 2015 11:20:00 UTC and 15 Jan. 2015 12:20:00 UTC. The first failing build is https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/6918/ which shows the instance with parameter wikidb=enwiki failing.

The console output shows a failure while running update.php:

..Update 'FlowPopulateLinksTables' already logged as completed.
.......................................A database query error has occurred.
12:21:41 Query: INSERT  INTO `flow_revision` (rev_id,rev_user_id,rev_user_ip,rev_user_wiki,rev_parent_id,rev_change_type,rev_type,rev_type_id,rev_content,rev_flags,rev_mod_state,rev_mod_user_id,rev_mod_user_ip,rev_mod_user_wiki,rev_mod_timestamp,rev_mod_reason,rev_last_edit_id,rev_edit_user_id,rev_edit_user_ip,rev_edit_user_wiki,rev_content_length,rev_previous_content_length)
         VALUES ('¿\\��','0','162.222.73.152','enwiki','«�\\��','edit-title','post','«�\\��','�,�IUHM�,IM1г45�0350�0�0627�\0','utf-8,gzip,wikitext','',NULL,NULL,NULL,NULL,NULL,'¿\\��','0','162.222.73.152','enwiki','30','54')
12:21:41 Function: Flow\Data\Storage\RevisionStorage::insert
12:21:41 Error: 1062 Duplicate entry '\x05\x0C\xC2\xBF\x07\x05\\xB2\x04\xFA\x02' for key 'PRIMARY' (10.68.16.193)
12:21:41 
12:21:41 PHP Notice:  Uncommitted DB writes (transaction from DatabaseBase::begin). in /mnt/srv/mediawiki-staging/php-master/includes/db/Database.php on line 4322

So something is not taken into account by the Flow MediaWiki extension. It might be a bug in a Flow SQL query, or some unexpected data in the beta cluster database for enwiki.

Although we do not use update.php in production, we surely reuse the same SQL there, so it might well cause an issue on the production database too.
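
To make the failure mode concrete, here is a minimal sketch of what a 1062-style duplicate primary key looks like from the caller's side. It uses Python's built-in sqlite3 rather than MySQL, and the table is a heavily simplified, hypothetical stand-in for flow_revision (only three columns are modelled, and the id value is made up), so it illustrates the class of error rather than the actual Flow code path.

```python
import sqlite3

# Hypothetical, simplified stand-in for flow_revision: only the primary key
# and two columns from the failing INSERT are modelled. SQLite, not MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flow_revision ("
    " rev_id BLOB PRIMARY KEY,"
    " rev_user_id INTEGER NOT NULL,"
    " rev_change_type TEXT NOT NULL)"
)


def insert_revision(rev_id: bytes, user_id: int, change_type: str) -> bool:
    """Insert one row; return False if the primary key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO flow_revision (rev_id, rev_user_id, rev_change_type)"
                " VALUES (?, ?, ?)",
                (rev_id, user_id, change_type),
            )
        return True
    except sqlite3.IntegrityError:
        # SQLite's counterpart of MySQL error 1062 (duplicate entry)
        return False


made_up_id = bytes.fromhex("0102030405060708090a0b")  # illustrative binary id
first = insert_revision(made_up_id, 0, "edit-title")
second = insert_revision(made_up_id, 0, "edit-title")  # same key again
row_count = conn.execute("SELECT COUNT(*) FROM flow_revision").fetchone()[0]
```

The second insert is rejected and rolled back, which is the situation update.php ran into when the script tried to write a revision whose rev_id already existed.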

CCing Flow since that needs investigation on their part.

To investigate, here is how to connect to the enwiki beta cluster database:

$ ssh deployment-bastion.eqiad.wmflabs
$ sql enwiki
(wikiadmin@deployment-db1) [enwiki]>   # insert magical SQL command here
EBernhardson set Security to None.
EBernhardson moved this task from Untriaged to In Development on the Collaboration-Team-Triage board.

This looks to have been caused by Gerrit change 165683: the maintenance script that goes with it (which will eventually be run in prod) is not working correctly. Will fix.

Somehow we have 29 unattributed revisions on beta, and those are causing this to error out. Those revisions also fail normal rendering, for example at:

http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Topic:Rmhzy9700v0knbb6&topic_showPostId=rmhzy97353vvboqq#flow-post-rmhzy97353vvboqq

The timestamps on these revisions are Jan 2 through Jan 12 of 2014, and I don't see any matching errors in prod using the following SQL:

select hex(rev_id), flow_revision.* from flow_revision where rev_user_id = 0 and rev_user_ip = ''\G

As such, I don't think there is a current bug to track down that creates these revisions. I will use some manual SQL on the beta MySQL database to reassign these posts to the Flow maintenance user.
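
The check and the cleanup described above can be sketched as follows. The exact SQL that was run on beta is not recorded in this task, so the UPDATE statement, the reduced table layout and the maintenance user id (9999) are all assumptions made for illustration. The sketch again uses Python's sqlite3 in place of MySQL; only the WHERE condition is taken from the query quoted above.

```python
import sqlite3

MAINTENANCE_USER_ID = 9999  # hypothetical id; the real one is not given here

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of flow_revision with only the columns
# that the unattributed-revision check looks at.
conn.execute(
    "CREATE TABLE flow_revision ("
    " rev_id BLOB PRIMARY KEY,"
    " rev_user_id INTEGER NOT NULL,"
    " rev_user_ip TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO flow_revision VALUES (?, ?, ?)",
    [
        (b"\x01", 0, ""),            # unattributed: no user id and no IP
        (b"\x02", 0, "192.0.2.10"),  # anonymous edit with an IP, left alone
        (b"\x03", 42, ""),           # edit by a registered user, left alone
    ],
)

# Same condition as the SELECT quoted in the comment above.
UNATTRIBUTED = "rev_user_id = 0 AND rev_user_ip = ''"


def count_unattributed() -> int:
    return conn.execute(
        f"SELECT COUNT(*) FROM flow_revision WHERE {UNATTRIBUTED}"
    ).fetchone()[0]


before = count_unattributed()
with conn:  # one transaction: commit on success, roll back on error
    cur = conn.execute(
        f"UPDATE flow_revision SET rev_user_id = ? WHERE {UNATTRIBUTED}",
        (MAINTENANCE_USER_ID,),
    )
    reassigned = cur.rowcount
after = count_unattributed()
```

Counting before and after with the same condition gives a quick sanity check that only the unattributed rows were touched.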

Still looking into this. The above fixed one problem, but there is still something else happening that is triggered by the current database state.

It appears that deployment-bastion-eqiad isn't executing anything (and hasn't since approximately 18:00 on 16 Jan.):

https://integration.wikimedia.org/ci/label/deployment-bastion-eqiad/load-statistics?type=hour

I cancelled the stacked-up beta-update-databases-eqiad tasks without effect, and they came back…

That is unrelated to enwiki failing. Jenkins deadlocks from time to time and can no longer execute jobs on deployment-bastion :( I have restarted Jenkins.

Thanks.

Mattflaschen-WMF renamed this task from beta-update-databases-eqiad failing on enwiki to [2] beta-update-databases-eqiad failing on enwiki. Jan 28 2015, 7:49 PM
gerritbot added a subscriber: gerritbot.

Change 187272 had a related patch set uploaded (by EBernhardson):
Call CachingObjectMapper::clear() from ObjectManager::clear()

https://gerrit.wikimedia.org/r/187272

Patch-For-Review

Change 187272 merged by jenkins-bot:
Call CachingObjectMapper::clear() from ObjectManager::clear()

https://gerrit.wikimedia.org/r/187272

DannyH renamed this task from [2] beta-update-databases-eqiad failing on enwiki to [2] P6. beta-update-databases-eqiad failing on enwiki. Jan 30 2015, 6:38 PM
hashar closed this task as Resolved. Jan 30 2015, 9:20 PM

https://gerrit.wikimedia.org/r/187272 fixed the issue. The job then kept failing due to a separate problem with Composer, which has been fixed as well ( https://gerrit.wikimedia.org/r/#/c/187656/ ).
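
For readers unfamiliar with the pattern named in the patch title, here is a rough sketch of what "call the mapper's clear() from the manager's clear()" means. This task does not describe the patch beyond its title, so the sketch is a reading of the title only: the Python classes SketchCachingMapper and SketchObjectManager are hypothetical and are not Flow's actual PHP classes or their APIs.

```python
from typing import Dict, List, Optional


class SketchCachingMapper:
    """Hypothetical identity-map style cache keyed by primary key."""

    def __init__(self) -> None:
        self._loaded: Dict[str, dict] = {}

    def remember(self, key: str, row: dict) -> None:
        self._loaded[key] = row

    def get(self, key: str) -> Optional[dict]:
        return self._loaded.get(key)

    def clear(self) -> None:
        self._loaded.clear()


class SketchObjectManager:
    """Hypothetical manager that owns a mapper and tracks pending keys."""

    def __init__(self, mapper: SketchCachingMapper) -> None:
        self.mapper = mapper
        self._pending: List[str] = []

    def put(self, key: str, row: dict) -> None:
        self._pending.append(key)
        self.mapper.remember(key, row)

    def clear(self) -> None:
        self._pending.clear()
        # The delegation suggested by the patch title: without this call,
        # the mapper would keep serving rows from before the clear().
        self.mapper.clear()


manager = SketchObjectManager(SketchCachingMapper())
manager.put("rev-1", {"rev_change_type": "edit-title"})
cached_before = manager.mapper.get("rev-1")
manager.clear()
cached_after = manager.mapper.get("rev-1")
```

In a long-running maintenance script that clears state between batches, a cache that survives clear() can leave the script working from stale objects; whether that is exactly how the duplicate insert arose here is not stated in this task.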

DannyH renamed this task from [2] P6. beta-update-databases-eqiad failing on enwiki to P6. beta-update-databases-eqiad failing on enwiki. Feb 3 2015, 11:54 PM
greg moved this task from INBOX to Done on the Release-Engineering-Team board. Feb 5 2015, 7:09 PM