Page MenuHomePhabricator

#get_db_data ignores 'join on' clause
Closed, ResolvedPublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

Have defined:

{{#get_db_data:
db=test
|from=t1
|where=t1.c1='test'
|join on=t1.c1=t2.id
|data=t1c1=t1.c1
}}

What happens?:
Generated query is
SELECT t1.c1 FROM t1 WHERE t1.c1='test'

What should have happened instead?:
Query should include JOIN clause
SELECT t1.c1 FROM t1 JOIN t2 ON t1.c1=t2.id WHERE t1.c1='test'

Software version (skip for WMF-hosted wikis like Wikipedia):
3.1 (REL1_39), 3.2, git master.

Other information (browser name/version, screenshots, etc.):
Used with Mediawiki 1.39, MariaDB 11, PHP 8.1.

Event Timeline

Sugarbravo updated the task description. (Show Details)
Sugarbravo updated the task description. (Show Details)

What is the database engine, to which {{#get_db_data:}} connects?

In $wgExternalDataSources[] data source is configured with 'type' => 'mysql'.

Database server is MariaDB 11.

Change 971207 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Add implicit tables to database query

https://gerrit.wikimedia.org/r/971207

I have never heard that a table used in a query could be omitted from the from clause, and that mentioning it in the join on would be enough. However, I have submitted a patch allowing this.

Oh... if t2 was not in "from", that would definitely explain the problem.

@alex-mashin (or @Sugarbravo) - is this change a good idea? Maybe it would be better to display an error message if a "join on" table is not in "from"?

Whether t2 is listed in from= or not doesn't make a difference for producing a JOIN clause by ED.

@alex-mashin As far as I know, it is perfectly normal to list only the first table in FROM and then joined ones in their corresponding JOIN conditions.

@Sugarbravo - are you saying that this call fails (or returns the wrong result) as well?

{{#get_db_data:
db=test
|from=t1,t2
|where=t1.c1='test'
|join on=t1.c1=t2.id
|data=t1c1=t1.c1
}}

With that, ED produces the following SQL:

SELECT t1.c1 FROM `t1`,`t2` WHERE t1.c1='test'

i.e. lacking t1.c1=t2.id condition. Thus neither explicit nor implicit join is done.

To add some context, we are upgrading an MW wiki, which uses ED a lot, from MW 1.35 to 1.39. Many articles, which use ED, appear broken, when MW and ED are upgraded to 1.39. Since everything “just works” in 1.35, I have never looked closely into SQL produced by #get_db_data before. I have just checked MW 1.35 (corresponding ED version is 2.1) and

{{#get_db_data:
db=test
|from=t1,t2
|where=t1.c1='test'
|join on=t1.c1=t2.id
|data=t1c1=t1.c1
}}

produces

SELECT t1.c1 FROM t1 JOIN t2 ON ((t1.c1=t2.id)) WHERE t1.c1=test ORDER BY t1.c1

which is an expected JOIN (though maybe with some superfluous parentheses).

That seems to be not working (JOIN missing) since at least ED 3.1. Glancing at the current ED code it looks like a major rewrite since MW 1.35 (ED 2.1).

EDConnectorRdbms.php invokes database->select(), which, according to docs, expects join conditions passed as an array of arrays. However in EDConnectorComposed.php $this->joins is prepared simply as a list of conditions. So the resulting query has no JOIN part.

This JOIN condition was correctly built in EDUtils.php (static function searchDB()) of ED 2.1. That part of code did not make it into rewritten ED 3.2.

I have updated my patch so that it reproduces the old behaviour of join on. Please note that, in mySQL or MariaDB, that join will always be an inner join, since there is no way to set the type of join in this syntax.

I suggest that you move to prepared statements: that will improve security and performance and will give a complete control over queries.

Change 971207 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Restore JOIN clauses, add implicit tables to database query

https://gerrit.wikimedia.org/r/971207

Thank you fixing this quickly. Will it be ported back to 3.2 or is this for the next version?

Thank you fixing this quickly. Will it be ported back to 3.2 or is this for the next version?

You should always use the master branch.

alex-mashin changed the task status from Open to In Progress.Nov 11 2023, 6:15 AM
alex-mashin claimed this task.

@Sugarbravo, have you updated the extension and made sure that JOINs work, so that this task can be closed?

Just tried git master version and it didn't work. This #get_db_data:

{#get_db_data:
|db=enc
|from=enccategorylinks=ecl,encpage=p
|join on=ecl.cl_from=p.page_id 
|where=ecl.cl_to='Gimę_{{CURRENTMONTHNAMEGEN}}_{{LOCALDAY}}_d.' and ecl.cl_type='page' and p.page_namespace = 0
|limit=200
|order by=ecl.cl_sortkey asc
|data=asm=p.page_title,asmt=replace(p.page_title, '_', ' ')
}}

results in this SQL:

SELECT p.page_title,replace(p.page_title, '_', ' ') FROM `enccategorylinks` `ecl`,`encpage` `p` WHERE ecl.cl_to='Gimę_lapkričio_13_d.' and ecl.cl_type='page' and p.page_namespace = 0 ORDER BY ecl.cl_sortkey asc LIMIT 200

I'm using my own quick fix (borrowing from REL1_35), which works:

$ diff EDConnectorComposed.php.3.2 EDConnectorComposed.php.myfix 
42c42,58
<               $this->joins = isset( $args['join on'] ) ? self::paramToArray( $args['join on'] ) : null;
---
>               if ( isset( $args['join on'] ) ) {
>                   $joinStrings = explode( ',', $args['join on'] );
>                   foreach ( $joinStrings as $i => $joinString ) {
>                       if ( $joinString == '' ) {
>                               continue;
>                       }
>                       if ( strpos( $joinString, '=' ) === false ) {
>                               return "Error: every \"join on\" string must contain an \"=\" sign.";
>                       }
>                       if ( count( $this->tables ) <= $i + 1 ) {
>                               return "Error: too many \"join on\" conditions.";
>                       }
>                       $aliases = array_keys($this->tables);
>                       $alias = $aliases[$i + 1];
>                       $this->joins[$alias] = [ 'LEFT JOIN', $joinString ];
>                   }
>               }

(Sorry, I'm travelling and won't be able to do any more testing until Nov. 24.)

Change 973891 had a related patch set uploaded (by Alex Mashin; author: mashin):

[mediawiki/extensions/ExternalData@master] Remove unnecessary checks

https://gerrit.wikimedia.org/r/973891

Change 973891 merged by jenkins-bot:

[mediawiki/extensions/ExternalData@master] Remove unnecessary checks

https://gerrit.wikimedia.org/r/973891

Waiting till the 24th of November.

Sorry, I spoke too soon, cache was in action. Something is not right, while the join seems to be done, queries, which worked in 1.35, don't give the same result now.

(Yes, JOINs work now (git master, Nov. 28).
Thank you!)

Could you elaborate on what is going wrong, or at least what changed?

Yes. The fix for JOINs actually seems to work (I had error reporting off and misinterpreted the problem).

If add my fix for dealing with table prefixes (https://phabricator.wikimedia.org/T351000) to git master version, then everything works fine.