Strange "MySQL server has gone away" error - possible outage during the night of 2017-10-22
Closed, Declined · Public

Description

Hello,

I just noticed that my script encountered a "MySQL server has gone away" error at 2017-10-22 22:40:31. All relevant information is below.

Code of my script

/home/urbanecm/Documents/cswiki/tayari/script.py
#!/usr/bin/env python
#-*- coding: utf-8 -*-

from wmflabs import db
import pywikibot
conn = db.connect('cswiki')
site = pywikibot.Site()

zacatky = [
	'Alba roku',
	'Divadelní hry roku',
	'EP roku',
	'Filmy roku',
	'Koncertní video alba roku',
	'Knihy roku',
	'Opery roku',
	'Písně roku',
	'Kompilační alba roku',
	'Koncertní alba roku',
	'Singly roku',
	'Soundtracky roku'
]
for zacatek in zacatky:
	conn = db.connect('cswiki')
	cur = conn.cursor()
	with cur:
		sql = 'select page_title from page where page_namespace=14 and page_is_redirect=0 and page_title like "' + zacatek.replace(' ', '_') + '%";'
		print sql
		cur.execute(sql)
		data = cur.fetchall()
	if len(data)==0:
		continue
	for row in data:
		old_page_title = row[0]
		new_page_title = old_page_title.replace('roku', 'z_roku')
		page = pywikibot.Page(site, old_page_title.decode('utf-8'), ns=14)
		page.move(newtitle=u'Kategorie:' + new_page_title.decode('utf-8'), reason="Robot: Přesunutí dle ŽOPP od Tayari z 14/10/2017")
		cur = conn.cursor()
		with cur:
			sql = 'select page_title from categorylinks join page on page_id=cl_from where cl_to="' + old_page_title + '"'
			cur.execute(sql)
			data2 = cur.fetchall()
		if len(data2) != 0:
			for row2 in data2:
				page_title = row2[0]
				page2 = pywikibot.Page(site, page_title.decode('utf-8'))
				page2.text = page2.text.replace(old_page_title.replace('_', ' ').decode('utf-8'), new_page_title.replace('_', ' ').decode('utf-8'))
				page2.save("Robot: Přesunutí dle ŽOPP od Tayari z 14/10/2017")

Error traceback

/home/urbanecm/tayari.err
Traceback (most recent call last):
  File "script.py", line 41, in <module>
    data2 = cur.fetchall()
  File "cursor.pyx", line 260, in oursql.Cursor.__exit__ (oursqlx/oursql.c:18328)
  File "connection.pyx", line 215, in oursql.Connection.rollback (oursqlx/oursql.c:6130)
  File "connection.pyx", line 183, in oursql.Connection._raise_error (oursqlx/oursql.c:5885)
oursql.OperationalError: (2006, 'MySQL server has gone away', None)
CRITICAL: Closing network session.

Other important things

My script had been running on the grid engine for a few hours with no problems. I have just restarted it and it is running again from the point where it stopped. My guess is that the MySQL server really was unavailable for a while during the night.

Discussion from IRC

#wikimedia-cloud
<Urbanecm> Hello, do we have any evidence of outages of the labsdb replicas available from Toolforge?
<bd808> Urbanecm: not that I've heard. Do you have specifics?
<chasemp> I just connected to enwiki using a Tool I own Urbanecm
<Urbanecm> OK, I'll be specific. My script ended with "MySQL server has gone away" last night. It was making queries like select page_title from page where page_namespace=14 and page_is_redirect=0 and page_title like "Alba_roku%"; (the part before "%" differs in each query) and renaming those categories with Pywikibot.
<Urbanecm> Complete stderr is at ~urbanecm/tayari.err 
<Urbanecm> My code is at ~urbanecm/Documents/cswiki/tayari/script.py
<Urbanecm> Any other info needed from me?
<bd808> Urbanecm: could you file a phabricator task? Chase and I are in a meeting right now and may forget otherwise
<Urbanecm> Sure. Should I assign somebody (you/Chase) to the task?

Thank you for any help,

Martin Urbanec

Event Timeline

Just a note: an outage is almost out of the question, as the script failed with the same traceback when I ran it a second time. For now I'll restart it again so it can do some more renaming, and I'll wait for help here, as I don't know where my mistake could be.

db.connect('cswiki') would connect to labsdb1001. 'MySQL server has gone away' can mean many different things, but ultimately means that the client unexpectedly lost communication with the server.
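
Since the error only says that the client lost its connection, one defensive pattern for read-only queries is to reconnect and retry. The following is a minimal sketch, not part of the original script: `run_with_reconnect`, its parameters, and the `connect` factory are hypothetical names, and the caller is assumed to pass the driver's own error class (for example `oursql.OperationalError`, as seen in the traceback) as `retriable`.

```python
import time


def run_with_reconnect(connect, sql, params=(), retriable=(Exception,),
                       attempts=3, delay=1.0):
    """Run a read-only query, opening a new connection on every attempt.

    connect   -- hypothetical zero-argument factory returning a DB-API connection
    retriable -- exception classes treated as "connection lost" (an assumption;
                 pass the driver's own error class here)
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        conn = None
        try:
            conn = connect()
            cur = conn.cursor()
            cur.execute(sql, params)
            return cur.fetchall()
        except retriable as exc:
            # Remember the failure and try again with a fresh connection.
            last_error = exc
            time.sleep(delay)
        finally:
            if conn is not None:
                try:
                    conn.close()
                except Exception:
                    pass  # the connection may already be dead
    raise last_error
```

For the script in the description, `connect` could be `lambda: db.connect('cswiki')`. Retrying blindly is only reasonable here because the queries are plain SELECTs; repeating a write could apply it twice.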

@Urbanecm can you try to run the same script using the new database servers instead of labsdb1001?

I came here to make this same suggestion fwiw.
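
For anyone wanting to try this suggestion, here is a rough sketch of how connection settings for the new servers might be built. Everything in it is an assumption not stated in this task: the hostname pattern `<wiki>.analytics.db.svc.eqiad.wmflabs` (or `.web.`), the `<wiki>_p` database name, and the `~/replica.my.cnf` credentials file are taken from the Toolforge documentation of the time as best remembered, and the helper names are hypothetical. Check the current documentation before relying on them.

```python
import os


def replica_host(wiki, cluster="analytics"):
    """Return the assumed hostname of a new-style replica for `wiki`."""
    if cluster not in ("analytics", "web"):
        raise ValueError("cluster must be 'analytics' or 'web'")
    # Assumed naming scheme; not confirmed anywhere in this task.
    return "{0}.{1}.db.svc.eqiad.wmflabs".format(wiki, cluster)


def replica_settings(wiki, cluster="analytics"):
    """Collect keyword arguments for a MySQL client's connect() call."""
    return {
        "host": replica_host(wiki, cluster),
        "database": wiki + "_p",  # assumed public-view database name
        "read_default_file": os.path.expanduser("~/replica.my.cnf"),
    }
```

The keys are chosen to match keyword arguments that a client such as PyMySQL accepts, so `pymysql.connect(**replica_settings('cswiki'))` would be one way to use the result, assuming that library is installed.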

Thanks. I didn't find a way to do that, but I will definitely look into the new DB servers later. This was just a single-purpose script for a one-time maintenance task, so it does not make sense to rewrite it, since resetting the connection before each query helped.
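
For future readers, the workaround of resetting the connection before each query can be sketched as a small helper that opens a short-lived connection per query, so no connection sits idle while Pywikibot is busy moving or saving pages. This is an illustration under assumptions, not the code that was actually run: `fetch_all`, `like_prefix`, and the `connect` factory are hypothetical names, and the placeholder syntax (`?` here) depends on the database driver.

```python
from contextlib import closing


def like_prefix(title_start):
    """Turn a human-readable category prefix into a LIKE pattern."""
    return title_start.replace(' ', '_') + '%'


def fetch_all(connect, sql, params=()):
    """Run one query on a fresh connection and close it straight away.

    connect -- hypothetical zero-argument factory, for example
               lambda: db.connect('cswiki') in the original script
    """
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            # Passing values as parameters also avoids building SQL by
            # string concatenation.
            cur.execute(sql, params)
            return cur.fetchall()
```

A call might look like `fetch_all(connect, 'select page_title from page where page_namespace=? and page_title like ?', (14, like_prefix('Alba roku')))`. Opening a connection per query costs a little time, which is negligible next to the page edits in a script like this.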