Page MenuHomePhabricator

Odd behaviour in cargo's lua method 'query'
Closed, ResolvedPublic

Description

Hey

First of all, great work introducing Cargo's lua support. Something I have been looking forward to for a long time. Thanks for all the amazing work!

However (sry), I noticed some problems with the implementation of the query method:

  1. Trying to use alternative names (headlines in a regular cargo query) produces odd fieldnames
  2. Trying to use mysql functions that use commas excempts that field from the result.

Let me give you an example (which doesn't make much sense in a lua context, I know):

local result = cargo.query(
  tables='customer',
  fields = '_pageName=Customer, CONCAT("[[", REPLACE(organizational_responsible__full, ",", "]], [["), "]]")=Boss, CONCAT("[[", REPLACE(technical_responsible_person__full, ",", "]], [["), "]]")=Admin, CONCAT("[[", technical_responsible_group, "]]")=Admingroup, ao=AO',
  args = {
    where='department="Department:25"',
    orderBy='name ASC'
  }
}

produces

result = {
  {
    ao=AO = 123456789,
    _pageName=Customer = 'Customer:00000002',
  },
  {
    ao=AO = '',
    _pageName=Customer = 'Customer:00000001',
  },
}

when I was expecting something along the line of:

result = {
  {
    Customer = 'Customer:00000002',
    Admin = '',
    AO = 123456789,
    Boss = '[[Person:user1]], [[Person:user2]]',
    Admingroup = '[[Group:group1]]',
  },
  {
    Customer = 'Customer:00000001',
    Admin = '',
    AO = '',
    Boss = '[[Person:user3]], [[Person:user3]]',
    Admingroup = '[[Group:group2]]',
  },
}

I then took a look at the source code in /CargoLua.library.php and found, that if you apply

diff --git a/CargoLua.library.php b/CargoLua.library.php
index 966438e..c2a8c66 100644
--- a/CargoLua.library.php
+++ b/CargoLua.library.php
@@ -64,20 +64,7 @@ class CargoLuaLibrary extends Scribunto_LuaLibraryBase {
                        $groupBy, $having, $orderBy, $limit);
                $rows = $query->run();

-           $result = array();
-
-           $fieldArray = array_map( 'trim', explode( ',', $fields ) );
-
-           $rowIndex = 1; // because Lua arrays start at 1
-           foreach ( $rows as $row ) {
-                   $values = array();
-                   foreach ( $fieldArray as $fieldString ) {
-                           $alias = $query->getAliasForFieldString( $fieldString );
-                           $values[$fieldString] = $row[$alias];
-                   }
-                   $result[$rowIndex++] = $values;
-           }
-
-           return array( $result );
+         array_unshift( $rows, null );
+         return array( $rows );
        }
 }

With this, the aforementioned query works as expected. Also, standard queries (without "headline" or mysql functions) seems to work properly.

I am reluctant to submit this patch though, because I am not quite sure why the data transformation in the two foreach loops was introduced in the first place. Maybe @cicalese can shed some light on this?

Event Timeline

Hello, @Oetterer.

I'm glad that the Lua functionality is helpful to you!

That's quite a query! I did not account for anything that complex in the code, and it will take some thought to decide the best way to handle it in conjunction with other concerns. In particular, the reason for the double loop is to fix some issues that I reported to Yaron with the original implementation:

The problem is that the array indices are not the original field names. They have underscores replaced with spaces (except for sometimes _pageName) and if the field name contains a period, the array index is on the part after the last period. So, if the query is for “CONCAT(Vocabulary_Enumerations._pageName)”, the corresponding array index in the result array to get that value is “_pageName)”. If the query is for “CONCAT(Associated_Page)”, the corresponding array index in the result array is “CONCAT(Associated Page)”. And, “CONCAT(_pageName)” becomes “CONCAT( pageName)”. This makes it annoying, error prone, and potentially fragile (in case this is fixed in the future) to access the query results in Lua.

My goal was to make the indices in the query result predictable. The disconnect here seems to be that I usually do not use aliases in my queries, whereas your query uses an alias for each field. Perhaps that is the key to a solution. When no alias is provided, the double loop is required to set the field index in the result. When an alias is provided, though, especially for such a complex query as yours above, it seems that the alias should be used for the field index in the result.

Does that make sense?

Cindy

Hey @cicalese

Thanks, now I see the problem. Have to think about a proper handling, myself. All I can say for now is that my suggested change would allow you to alleviate that problem by manually adding an alias CONCAT(Vocabulary_Enumerations._pageName)=Vocabulary_Enumerations._pageName would be indexed by Vocabulary_Enumerations._pageName. But that leaves you to manually adding an alias everytime you suspect running into such a problem. :(

One thing besides the alias introduction, though: your double loop will probably fail, if a field containes a comma (because of your $fieldArray-calculation). I used the complex CONCAT example above to show this. Notice how these fields are absent in my first result example (introduced with "produces").

PPS: Maybe you remember me from Berlin SMWCon Fall 2013? I have very fond memories about our in-depth conversations about Game of Thrones and Klingon grammar! :)

Regards,
Tobi

Hi, Tobi.

Of course I remember you from Berlin! :-) And, I always enjoy your polite and helpful bug reports.

Yes, I agree that you have shown that commas are problematic. And, a simple solution would be to use aliases for complex field specifications. It would be nice to come up with a solution that didn't require that, though. I will give it more thought and welcome any additional suggestions.

Cindy

Assuming I understand the issue correctly: using CargoUtils::smartSplit() instead of explode() might help. (There are a lot of places in the Cargo code where parsing has to be precise.)

Thank you, @Yaron_Koren. That does indeed look like exactly what would be necessary. I'll try to play around with a patch using CargoUtils:smartSplit() as soon as I have time.

Hey Cindy,

Today I stumbled upon another oddity in my cargo modules. It seems, the cargo lua library returns some html_special_characters encoded. I tested with a simple

fields = "field1, concat(field2, ' - äöü<>\"&')=field2"

which produced

{
  field1 = 'value for field 1',
  field2 = 'value for field 2 - äöü&lt;&gt;&quot;&amp;'
}

Yeah. The german special characters remain strangely untouched. Nevertheless I am unsure whether this encoding is intentional or not. Tests with a normal #cargo_query did yield the decoded characters, and so did a lua call to callParserFunction('#cargo_query').

Because I needed a result fast, I implemented a not-so-nice-looking hack. Maybe, if indeed this behaviour is unintended, you can use it to build a better solution on:

$rows = $query->run();

foreach ( $rows as $id => $row ) {
  foreach ( $row as $key => $value ) {
    $row[$key] = htmlspecialchars_decode($value, ENT_QUOTES);
  }
  $rows[$id] = $row;
}

Looking forward to hearing from you, best regards,
Tobi

Hi, Tobi.

Thanks for reporting this issue. I will take a look at it soon.

Thanks!

Cindy

I just checked in the change from explode() to CargoUtils::smartSplit() - it didn't seem like it would cause any problems, and will hopefully fix all but the last issue here.

Change 335649 had a related patch set uploaded (by Cicalese):
Use field alias, decode HTML special characters

https://gerrit.wikimedia.org/r/335649

I'm so sorry for the delay @Oetterer. I finally had a chance to look at this again. I just submitted a patch that does two things. If an alias is provided for a field, that alias is used as the index in the results table that is returned. And, I added the HTML special character decoding. I was able to test both capabilities, and they work for me, but I'd appreciate if you could test in your environment. Thank you for the suggestions, which make this functionality even more useful.

I was not able to test with the REPLACE function as in your original example. That would be pretty powerful, but I'm getting the following message:

Exception Caught: Error: the SQL function "REPLACE()" is not allowed.

@Yaron_Koren, was the REPLACE function disabled after this bug was filed last May?

There's a "whitelist" for SQL functions, though it's been in place since well before May. REPLACE() isn't in the list, though it probably should be. But you can easily get around this by adding the following to LocalSettings.php:

$wgCargoAllowedSQLFunctions[] = 'REPLACE';

I tried it out, and REPLACE now works great for me! That's really powerful!

I did find a string that makes CargoUtils::smartSplit() choke though:

'REPLACE(Extension_Name,"\"","\'")=ext'

It gives me:

Error: unclosed string literal.

If I remove the \", it works fine. The \' doesn't seem to bother it.

But, that's not relevant to this task. If the patch I submitted fixes the problem for @Oetterer, I think this task can be closed.

Change 335649 merged by Yaron Koren:
Use field alias, decode HTML special characters

https://gerrit.wikimedia.org/r/335649

Yaron_Koren claimed this task.

I guess this problem has been fixed now? Feel free to re-open if not.