Page MenuHomePhabricator

Quarry cannot store results with identical column names
Closed, ResolvedPublic

Description

Something very strange is happening with this query:
https://quarry.wmflabs.org/query/18832

As @Zache reported, the query runs fine from a labs console,

8011 rows in set (7.12 sec)

However, it hangs forever when run from quarry. The older query above has been in the "running" status for over a month now.


Update: I've changed the query and it runs fine like this, https://quarry.wmflabs.org/query/20200

Leaving this card for anyone who is interested.

Event Timeline

awight lowered the priority of this task from Medium to Low.Jul 12 2017, 8:37 PM
awight updated the task description. (Show Details)

Probably not an OOM as in T172086.

MariaDB [quarry]> select * from query where id = 18832;
+-------+---------+-------------------+---------------+---------------------+-----------+-----------+------------------------------------------------------------------------------+
| id    | user_id | title             | latest_rev_id | last_touched        | parent_id | published | description                                                                  |
+-------+---------+-------------------+---------------+---------------------+-----------+-----------+------------------------------------------------------------------------------+
| 18832 |      22 | Fiwiki good diffs |        195408 | 2017-07-18 11:51:32 |      NULL |         0 |   log_action 'approve', 'approve2' is Flagged revs manually approved edits

 |
+-------+---------+-------------------+---------------+---------------------+-----------+-----------+------------------------------------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [quarry]> select * from query_run where query_rev_id = 195408;
+--------+--------------+--------+---------------------+--------------------------------------+------------+
| id     | query_rev_id | status | timestamp           | task_id                              | extra_info |
+--------+--------------+--------+---------------------+--------------------------------------+------------+
| 191481 |       195408 |      2 | 2017-07-18 11:51:32 | 71791105-d9f8-402f-a6f0-17bd0c81caa0 | NULL       |
+--------+--------------+--------+---------------------+--------------------------------------+------------+
1 row in set (0.23 sec)

No OOM kills that day.

Ok, I forked the query to https://quarry.wmflabs.org/query/20623 and reproduced the issue:

zhuyifei1999@quarry-runner-01:~$ grep 33b3c878-4f00-46a1-ad8c-a39ef615872f /var/log/syslog -B 1 -A 11
Jul 31 10:37:01 quarry-runner-01 CRON[16532]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
Jul 31 10:37:54 quarry-runner-01 celery-quarry-worker[591]: [2017-07-31 10:37:54,996: ERROR/MainProcess] Task worker.run_query[33b3c878-4f00-46a1-ad8c-a39ef615872f] raised unexpected: OperationalError('duplicate column name: rev_id',)
Jul 31 10:37:54 quarry-runner-01 celery-quarry-worker[591]: Traceback (most recent call last):
Jul 31 10:37:54 quarry-runner-01 celery-quarry-worker[591]: File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
Jul 31 10:37:54 quarry-runner-01 celery-quarry-worker[591]: R = retval = fun(*args, **kwargs)
Jul 31 10:37:54 quarry-runner-01 celery-quarry-worker[591]: File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 437, in __protected_call__
Jul 31 10:37:54 quarry-runner-01 celery-quarry-worker[591]: return self.run(*args, **kwargs)
Jul 31 10:37:54 quarry-runner-01 celery-quarry-worker[591]: File "/srv/quarry/quarry/web/worker.py", line 70, in run_query
Jul 31 10:37:54 quarry-runner-01 celery-quarry-worker[591]: output.start_resultset([c[0] for c in cur.description], cur.rowcount)
Jul 31 10:37:55 quarry-runner-01 celery-quarry-worker[591]: File "/srv/quarry/quarry/web/results.py", line 33, in start_resultset
Jul 31 10:37:55 quarry-runner-01 celery-quarry-worker[591]: self.db.execute(sql)
Jul 31 10:37:55 quarry-runner-01 celery-quarry-worker[591]: OperationalError: duplicate column name: rev_id
Jul 31 10:38:01 quarry-runner-01 CRON[16544]: (prometheus) CMD (/usr/local/bin/prometheus-puppet-agent-stats --outfile /var/lib/prometheus/node.d/puppet_agent.prom)
zhuyifei1999 renamed this task from Quarry query in unknown state to Quarry cannot store results with `table_a.column_name` and `table_b.column_name` in the same result.Jul 31 2017, 10:43 AM

Reproduced on vagrant with:

SELECT query.id, query_revision.id
FROM query, query_revision
WHERE query.id = 1
AND query_revision.id = 1;
MariaDB [quarry]> SELECT query.id, query_revision.id
    -> FROM query, query_revision
    -> WHERE query.id = 1
    -> AND query_revision.id = 1;
+----+----+
| id | id |
+----+----+
|  1 |  1 |
+----+----+
1 row in set (0.00 sec)
MariaDB [fiwiki_p]> SELECT
    ->   log_title, 
    ->   r1.rev_page, 
    ->   r1.rev_id, 
    ->   r2.rev_id, 
    ->   concat("https://fi.wikipedia.org/w/index.php?diff=",  r1.rev_id , "&oldid=", r2.rev_id ) AS diff
    -> FROM
    ->   logging_userindex, 
    ->   revision AS r1, 
    ->   revision AS r2 
    -> WHERE 
    ->   log_action IN ('approve', 'approve2') 
    ->   AND log_params LIKE concat("%i:0;i:", r1.rev_id, ";%") 
    ->   AND log_params LIKE concat("%i:1;i:", r2.rev_id, ";%")
    ->   AND r1.rev_page=r2.rev_page
    ->   AND log_page=r1.rev_page
    ->   AND r1.rev_id > r2.rev_id
    ->   AND log_timestamp > @datelimit
    ->   AND log_namespace=0
    -> LIMIT 1;
+-----------------------+----------+----------+----------+-------------------------------------------------------------------+
| log_title             | rev_page | rev_id   | rev_id   | diff                                                              |
+-----------------------+----------+----------+----------+-------------------------------------------------------------------+
| 2_Minutes_to_Midnight |    75811 | 16584329 | 16520058 | https://fi.wikipedia.org/w/index.php?diff=16584329&oldid=16520058 |
+-----------------------+----------+----------+----------+-------------------------------------------------------------------+
1 row in set (0.50 sec)

The selected column names do not include the table names, so they are literally the same.

zhuyifei1999 renamed this task from Quarry cannot store results with `table_a.column_name` and `table_b.column_name` in the same result to Quarry cannot store results with identical column names.Jul 31 2017, 11:26 AM

Some background for anyone who can cleanly resolve this:

Results are stored in SQLite tables.

zhuyifei1999 raised the priority of this task from Low to Needs Triage.Aug 4 2017, 1:38 PM

Thanks @zhuyifei1999 -- I hadn't noticed that I was making a duplicate.

It's okay. If we could just somehow fix this...

It's fairly confusing to have the query stall with no indication of stalling in the user interface. However, clicking on the button Explain will show: "Error: Hmm... Is the SQL actually running?", so the system does seem to be aware that it's not actually running any query. Maybe the status task could check the status in the same manner that the Explain task does?

Not really related.

Mentioned in SAL (#wikimedia-cloud) [2023-02-28T12:16:59Z] <Rook> Don't crash if multiple columns share a name a2c6d3fbf4c52b0967e4016ccdf3910db330cf0d T170464