Page MenuHomePhabricator

Memory Exhausted Near / Tool labs error while querying with Python
Closed, DeclinedPublic

Description

I am doing a query to several databases at the same time (can be between 1 to 150) asking for edit_count on the user table and I order them desc.

File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
  self.errorhandler(self, exc, value)
File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
  raise errorclass, errorvalue

_mysql_exceptions.ProgrammingError: (1064, 'memory exhausted near \'UNION ALL SELECT "cawiki",user_editcount FROM cawiki_p.user WHERE user_name LIKE\' at line 1')

This is the message I am getting. I wonder if I can do anything with the code or with the limitation (increasing RAM?).

Every query is independent, so I don't need to keep any information. Maybe there is something to empty temporal space in python?

Thank you so much.

Marc Miquel
marcmiquel@gmail.com

Event Timeline

marcmiquel raised the priority of this task from to Needs Triage.
marcmiquel updated the task description. (Show Details)
marcmiquel added a project: Toolforge.
marcmiquel added subscribers: marcmiquel, Springle.

Are you using prepared statements at all?

Yes. I do. Now I got a memory error after days working well.

marcmiquel@tools-bastion-01:~$ cat scriptabroad.err
Traceback (most recent call last):

File "cira_abroad.py", line 574, in <module>
  else: extract_data_interwiki(originarylang, attributeslist, mysql_cur, input_file_name)
File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 249, in __exit__
  self.rollback()

_mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")
Exception _mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now") in <bound method Cursor.del of <MySQLdb.cursors.Cursor object at 0x1c47090>> ignored
Traceback (most recent call last):

File "cira_abroad.py", line 570, in <module>
  if "abroad" in attributeslist: extract_data(originarylang, mysql_cur, input_file_name, page_titles)
File "cira_abroad.py", line 336, in extract_data
  extract_cira_abroadrank(mysql_cur, list_of_lang_to_examine, llengua, articlelangs_titles[langu], page_title, output_file7)
File "cira_abroad.py", line 156, in extract_cira_abroadrank
  mysql_cur.execute(query,tuple(user_asstring))
File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
  self.errorhandler(self, exc, value)
File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
  raise errorclass, errorvalue

MemoryError

I was wondering whether this memory error was from Tool labs not having enough memory or from the database query.
Is there a way to free memory during the execution?

Thanks.

Marc

For the error 1064: https://bugs.mysql.com/bug.php?id=69383 . Check the size of your largest prepare statements.

The second error, "Commands out of sync; you can't run this command now", might have returned as a MemoryError in python, but actually indicates some internal problem such as executing client API methods out of order[1] or somehow the server has seen malformed SQL (maybe that was flow-on effect due to memory issue on the client -- can't tell).

Can you repeat the problem? Can you reduce it to an example test case?

[1] http://dev.mysql.com/doc/refman/5.5/en/commands-out-of-sync.html

Dear Springle,

Thanks for your comments. I will check these things you say.

I want to run a query to enwiki and I am afraid it will break (It has run for 7 hours by now without showing results yet).

SELECT rev_user_text, COUNT(*) FROM revision INNER JOIN page ON rev_page=page_id WHERE page_namespace=0 AND page_is_redirect=0 GROUP BY rev_user_text ORDER BY 2 DESC

Do you think it is technically possible to query this? Should I treat it differently with mysql and python (like freeing space, some sort of caché, etc.)?

Thank you very much for your advice.

Cheers,

Marc

Indeed, it broke:

Traceback (most recent call last):

File "cira_abroad.py", line 602, in <module>

  langlist = []

File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 249,

in exit

self.rollback()

_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')

In such a big query, how could I avoid this?

Thank you very much.

Marc

2015-03-25 13:00 GMT+01:00 marcmiquel <no-reply@phabricator.wikimedia.org>:

marcmiquel added a comment.

Dear Springle,

Thanks for your comments. I will check these things you say.

I want to run a query to enwiki and I am afraid it will break (It has run
for 7 hours by now without showing results yet).

SELECT rev_user_text, COUNT(*) FROM revision INNER JOIN page ON
rev_page=page_id WHERE page_namespace=0 AND page_is_redirect=0 GROUP BY
rev_user_text ORDER BY 2 DESC

Do you think it is technically possible to query this? Should I treat it
differently with mysql and python (like freeing space, some sort of caché,
etc.)?

Thank you very much for your advice.

Cheers,

Marc

TASK DETAIL

https://phabricator.wikimedia.org/T93074

REPLY HANDLER ACTIONS

Reply to comment or attach files, or !close, !claim, !unsubscribe or

!assign <username>.

EMAIL PREFERENCES

https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: marcmiquel
Cc: Springle, marcmiquel, Aklapper, yuvipanda, jayvdb, scfc, coren

scfc triaged this task as Low priority.Apr 6 2015, 11:04 AM
scfc moved this task from Triage to Backlog on the Toolforge board.
scfc added a subscriber: scfc.
bd808 added a subscriber: bd808.

closing this very old issue