Page MenuHomePhabricator

Consider avoiding hardcoding integer sizes for blobs in abstract schema json
Open, Needs TriagePublic

Description

Example from change 666780:

{
	"name": "value",
	"type": "blob",
	"options": { "notnull": false, "length": 16777215 }
}

This seems a bit sub-optimal in terms of usability (I can't write this number by heart when creating a patch), as well as in terms of code review and debuggability (not obvious what it will do or why it is chosen).

I believe the above is the max value of the mysql int column types, based on 32-bit.

Questions:

  • Is this part of the json designed by us or directly fed to an upstream library?
  • Is this number treated special in our schema patch creation? E.g. will the effective schema be notably different if the value were 16217715 instead?
  • Do we actually want to support and encourage arbitrary numbers here? Or would it make sense to abstract this such that the common case is easy to get right (e.g. special type value like mediumblob that takes no length, or maybe a special length field or other property to indicate something like length: "32bit" or size: "medium". This would mean arbitrary numbers can be made hard (but possible), or even impossible until we want/need otherwise.

Event Timeline

  • Is this part of the json designed by us or directly fed to an upstream library?

Yes but we can control and change it before feeding it to the upstream library (see generateSchemaSql and DoctrineSchemaBuilder. I assume we can change it in DoctrineSchemaBuilder

  • Is this number treated special in our schema patch creation? E.g. will the effective schema be notably different if the value were 16217715 instead?

They are coming from https://github.com/doctrine/dbal/blob/2.12.x/lib/Doctrine/DBAL/Platforms/MySqlPlatform.php#L39 which means they will be treated differently if they pass the thresholds but usually it's pretty arbitrary inside the thresholds.

  • Do we actually want to support and encourage arbitrary numbers here? Or would it make sense to abstract this such that the common case is easy to get right (e.g. special type value like mediumblob that takes no length, or maybe a special length field or other property to indicate something like length: "32bit" or size: "medium". This would mean arbitrary numbers can be made hard (but possible), or even impossible until we want/need otherwise.

I think we shouldn't support and encourage arbitrary numbers but these numbers are actually meaningful in context of MySQL. Basically it's a trade-off. If we keep the numbers, it'll have better meaning across different DBMSes but if we change it to values like "medium", we will make it less arbitrary and more standardized. I'm fine with the latter but these values should be hard-coded in core somewhere (and not look it up from MysQL platform in doctrine as we want to be platform agnostic). Does that make sense?

I think we can do away with them by creating custom types for Blob, MediumBlob and maybe String type and then handling the numbers there. So in tables.json one will then only need to specify the type. We already do much more than that with custom types such as EnumType.

If we keep the numbers, it'll have better meaning across different DBMSes but if we change it to values like "medium", we will make it less arbitrary and more standardized.

The SchemaBuilder will still gets the numbers. If we make custom types, I believe the only effect of say specifying "mediumblob" is that it's what will be seen in tables.json. Much like a constant identifier. Under the hood in the type class, it will be transformed to the appropriate integer (or threshold) and before it's passed to the SchemaBuilder.

@Krinkle wrote:
  • Do we actually want to support and encourage arbitrary numbers here? Or would it make sense to abstract this such that […]

I think we shouldn't support and encourage arbitrary numbers but these numbers are actually meaningful in context of MySQL. Basically it's a trade-off. If we keep the numbers, it'll have better meaning across different DBMSes but if we change it to values like "medium", we will make it less arbitrary and more standardized. I'm fine with the latter but these values should be hard-coded in core somewhere (and not look it up from MysQL platform in doctrine as we want to be platform agnostic). Does that make sense?

I think we can do away with them by creating custom types for Blob, MediumBlob and […]

If we keep the numbers, it'll have better meaning across different DBMSes but if we change it to values like "medium", we will make it less arbitrary and more standardized.

The SchemaBuilder will still gets the numbers. If we make custom types, I believe the only effect of say specifying "mediumblob" is that it's what will be seen in tables.json. Much like a constant identifier. Under the hood in the type class, it will be transformed to the appropriate integer (or threshold) and before it's passed to the SchemaBuilder.

Yep, that makes sense to me. 👍