
Choose a field length for entity id columns and stick to it
Open, Medium, Public

Description

Database table fields that are supposed to contain (serialized) entity ids have a wide variety of definitions:

  • change_object_id varbinary(14) NOT NULL
  • term_full_entity_id VARBINARY(32) DEFAULT NULL (this one will go away with the terms table eventually)
  • eu_entity_id VARBINARY(255) NOT NULL
  • epp_redirect_target VARBINARY(255) DEFAULT NULL (this one will go away with the entity per page table eventually)

We should find a common length for these and stick to that.
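To illustrate why the differing widths matter, here is a minimal sketch that checks which of the columns listed above would be too narrow for a given serialized id. The widths are copied from the description; the helper name `columns_too_narrow` and the sample ids are hypothetical, and the check assumes the id is stored as its UTF-8 bytes.

```python
# Column widths in bytes, taken from the definitions listed in the description.
COLUMN_WIDTHS = {
    "change_object_id": 14,
    "term_full_entity_id": 32,
    "eu_entity_id": 255,
    "epp_redirect_target": 255,
}


def columns_too_narrow(entity_id):
    """Return the names of the columns that cannot hold the serialized id.

    Hypothetical helper: assumes the id is stored as its UTF-8 byte sequence.
    """
    size = len(entity_id.encode("utf-8"))
    return [name for name, width in COLUMN_WIDTHS.items() if size > width]


short_id = "Q123"            # 4 bytes, fits every column
longer_id = "Q" + "1" * 14   # made-up 15-byte id, one byte over the narrowest column
```

With a single agreed width, the same id would either fit everywhere or nowhere, instead of fitting some tables and overflowing others.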

Event Timeline

Once you've decided which size best fits current and future needs, just let us (dba) know. Obviously the smaller the better, but that is probably not possible :-)

I had to change this on one installation. In LocalSettings.php I modified the parameter $wgWBRepoSettings['string-limits']['multilang']['length'] (default is 250) and increased the two columns term_text and term_search_key of the table wb_terms (default is 255) accordingly. Is there anything else to change?

It is also worth noting that the unit in MySQL is the byte for strings encoded in UTF-8, unlike the number in LocalSettings.php, which counts characters. Hence it would be better to set the maximum length in MySQL to 4 times the number in LocalSettings.php; otherwise the strings are truncated in MySQL, and the uniqueness constraint provided by these MySQL columns could be weakened for non-ASCII characters.
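The character/byte mismatch can be sketched in a few lines of Python. This is only an illustration: it assumes the column keeps the first N bytes of the UTF-8 encoding and silently drops the rest, as the comment above describes. The helper names and sample labels are made up; 250 and 255 are the defaults quoted above.

```python
CHAR_LIMIT = 250   # default 'length' setting quoted above, counted in characters
BYTE_LIMIT = 255   # default column width quoted above, counted in bytes


def utf8_byte_length(text):
    """Number of bytes the string occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_to_bytes(text, max_bytes):
    """Keep only the first max_bytes bytes (assumed silent truncation)."""
    return text.encode("utf-8")[:max_bytes]


ascii_label = "a" * CHAR_LIMIT          # 1 byte per character
emoji_label = "\U0001F600" * CHAR_LIMIT  # 4 bytes per character, the UTF-8 worst case

# Two different labels that agree on their first 100 characters (400 bytes).
label_a = "\U0001F600" * 100 + "A"
label_b = "\U0001F600" * 100 + "B"
stored_a = truncate_to_bytes(label_a, BYTE_LIMIT)
stored_b = truncate_to_bytes(label_b, BYTE_LIMIT)
```

Both labels pass the 250-character check, but `stored_a` and `stored_b` are identical after truncation, which is how a uniqueness constraint on the column gets weakened. A column of `4 * CHAR_LIMIT` bytes would hold either label in full.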

@Seb35 What you describe here sounds like T142691: [Bug] wb_terms table truncates labels exceeding 255 bytes, possibly leaving invalid UTF-8. This task is about the columns that store entity ids as strings (like Q123 or P456).

Reedy moved this task from Change to Cleanup on the Schema-change board.

(While change_object_id belongs to a WikibaseRepo table, it’s still part of the change dispatching system, so we expect that this whole task belongs with the Wikidata Integrations Team.)