Page MenuHomePhabricator

Truncate string values before row insertion
Open, Needs TriagePublic

Description

Currently, Cargo stores entire strings into the underlying SQL tables and lets the table columns handle string truncation automatically. However, this may cause CargoStore::doesRowAlreadyExist to fail, because a truncated string in the table will never match the full string. This has at least two consequences:

  • Storing a string longer than the size of a string field will introduce duplicate rows when the table is repopulated.
  • If the string field also happens to have a uniqueness constraint, CargoUtils::escapedInsert may throw because a duplicate row is inserted but Cargo has no way to tell it. This breaks subsequent stores from the page, including the _pageData store. (IDatabase::insert may suppress the exception by passing the IGNORE option, then affectedRows can be used to check whether the row was successfully inserted.)

I managed to fix it on a local installation with the following:

@@ -237,6 +237,7 @@ class CargoStore {
        }
	public static function storeAllData( $title, $tableName, $tableFieldValues, $tableSchema ) {
+               $cdb = CargoUtils::getDB();
                $pageID = $title->getArticleID();
                $pageName = $title->getPrefixedText();
                $pageTitle = $title->getText();
@@ -337,6 +338,15 @@ class CargoStore {
                                } else {
                                        $tableFieldValues[$fieldName] = '1';
                                }
+                       } else if ( $fieldType != 'Coordinates') {
+                               // Page, String, etc.
+                               $size = $fieldDescription->mSize;
+                               if ( $size == null ) {
+                                       $size = $wgCargoDefaultStringBytes;
+                               }
+                               $tableFieldValues[$fieldName] = mb_substr( $curValue, 0, $size, 'UTF-8' );
                        }
                }

@@ -349,8 +359,6 @@ class CargoStore {
                // Allow other hooks to modify the values.
                Hooks::run( 'CargoBeforeStoreData', array( $title, $tableName, &$tableSchema, &$tableFieldValues ) );

-               $cdb = CargoUtils::getDB();
-
                // Somewhat of a @HACK - recreating a Cargo table from the web
                // interface can lead to duplicate rows, due to the use of jobs.
                // So before we store this data, check if a row with this

But I don't think it is appropriate to assume the text is always encoded in UTF-8, since the table might use a different character set; calling SQL's SUBSTR directly may work in this case. This is also not ideal, as it could be done before even CargoStore::blankOrRejectBadData to detect duplicate fields (which also calls doesRowAlreadyExist) at an earlier stage.

Event Timeline

Hi,

I have also had problems due to having some long strings not being truncated, but in my case the SQL insertion queries failed and no data was added to the database for the pages with those long strings. I am using MySQL 5.7. To fix this I just changed the type of the affected columns to text in my database, but I agree with HertzDevil that the strings should be truncated before being added to the database. Also, currently there is no easy way of knowing if something went wrong, if a string is truncated or if an insertion/update query fails there is no warning or error message anywhere.

Sorry for the long delay. I understand the problem, but I don't think this specific patch will work, because fields of type "Text" and "Wikitext" don't get truncated at all in the database - or if they do, it's after many more characters than the value of $wgCargoDefaultStringBytes. The same goes for fields of type "String", "Page", etc., if they have a "size" value that's set to more than 1000 - they get stored in the database as just a "Text" field. So I think it makes sense to do the truncation in doesRowAlreadyExist() instead - compare a substring of the new value with a substring of the database value, I guess by using both mb_substr() and SUBSTR().

@HertzDevil, @Tombolano - I just checked in an attempted fix for this problem, at 60ddb9cfc3dd . It would be great if either or both of you could test out my fix and see if it seems to work for you.