
[BUG] jsonschema-tools materializes fields in yaml in a different order than in json files
Closed, Resolved · Public · Bug Report

Description

jsonschema-tools materializes current.yaml JSONSchema files with $refs, etc. into dereferenced and versioned .yaml and .json files.

I just learned that the ordering of the fields in JSONSchema objects differs between the two outputs. For dynamic languages like Python or JavaScript, the order of the fields doesn't matter. However, for strongly typed systems like Java or SQL systems, field ordering can matter.

Deserializing the materialized yaml or json schema files should result in the exact same document, but at the moment, it won't.
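To illustrate the problem, here is a minimal Python sketch (not part of jsonschema-tools) that reports where two parsed documents have the same keys in a different order. The two schema fragments and the helper name `key_order_mismatches` are made up for illustration; both documents are parsed from JSON strings here only to keep the example dependency-free.

```python
import json


def key_order_mismatches(a, b, path="$"):
    """Return paths of objects that have the same keys in a different order."""
    mismatches = []
    if isinstance(a, dict) and isinstance(b, dict):
        # Python dicts keep insertion order, so list(d) is the order in the file.
        if set(a) == set(b) and list(a) != list(b):
            mismatches.append(path)
        for key in a:
            if key in b:
                mismatches.extend(
                    key_order_mismatches(a[key], b[key], f"{path}.{key}")
                )
    elif isinstance(a, list) and isinstance(b, list):
        for i, (x, y) in enumerate(zip(a, b)):
            mismatches.extend(key_order_mismatches(x, y, f"{path}[{i}]"))
    return mismatches


# Hypothetical stand-ins for the parsed .yaml and .json outputs of one schema.
from_yaml = json.loads(
    '{"type": "object", "properties": {"dt": {"type": "string"}, "meta": {"type": "object"}}}'
)
from_json = json.loads(
    '{"properties": {"meta": {"type": "object"}, "dt": {"type": "string"}}, "type": "object"}'
)

# Plain equality ignores key order, which is why dynamic languages do not notice.
assert from_yaml == from_json
print(key_order_mismatches(from_yaml, from_json))
```

The documents compare equal as dicts, yet the helper still flags the top-level object and `properties`, which is the kind of difference an order-sensitive consumer would see.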

To fix:

  • jsonschema-tools should ensure consistent field ordering in materialized files.
  • all materialized schemas in schema/event/{primary,secondary} repositories need to be rematerialized to ensure the existing schemas have consistent ordering.
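One possible way to bring an existing file in line with another is sketched below in Python. This is an assumption for illustration, not necessarily how jsonschema-tools implements the fix: the hypothetical helper `reorder_like` rewrites one parsed document so that its object keys follow the key order of a reference document, leaving values untouched.

```python
import json


def reorder_like(reference, target):
    """Return a copy of target whose object keys follow reference's key order.

    Keys that exist only in target are kept and appended after the shared keys.
    """
    if isinstance(reference, dict) and isinstance(target, dict):
        ordered = {}
        for key in reference:
            if key in target:
                ordered[key] = reorder_like(reference[key], target[key])
        for key in target:
            if key not in ordered:
                ordered[key] = target[key]
        return ordered
    if isinstance(reference, list) and isinstance(target, list):
        # Align list items pairwise; keep any extra target items as they are.
        head = [reorder_like(r, t) for r, t in zip(reference, target)]
        return head + target[len(head):]
    return target


# Hypothetical documents: `reference` has the desired order, `stale` does not.
reference = json.loads('{"title": "example", "properties": {"dt": {}, "meta": {}}}')
stale = json.loads('{"properties": {"meta": {}, "dt": {}}, "title": "example"}')

fixed = reorder_like(reference, stale)
print(json.dumps(fixed))
```

After reordering, serializing `fixed` and `reference` yields identical text, so both files would deserialize to the same ordered document.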

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Bug Report". · May 16 2022, 3:14 PM
Restricted Application added a subscriber: Aklapper.

Alternatively, we could drop support for materializing yaml versioned files.

Ottomata raised the priority of this task from Low to Medium.

Change 809019 had a related patch set uploaded (by Ottomata; author: Ottomata):

[schemas/event/secondary@master] Bump to jsonschema-tools 0.11.0 to get consistent json and yaml serialization ordering

https://gerrit.wikimedia.org/r/809019

Change 809021 had a related patch set uploaded (by Ottomata; author: Ottomata):

[schemas/event/primary@master] Bump to jsonschema-tools 0.11.0 to get consistent json and yaml serialization ordering

https://gerrit.wikimedia.org/r/809021

Change 809019 merged by Ottomata:

[schemas/event/secondary@master] Bump to jsonschema-tools 0.11.0 to get consistent json and yaml serialization ordering

https://gerrit.wikimedia.org/r/809019

Change 809021 merged by Ottomata:

[schemas/event/primary@master] Bump to jsonschema-tools 0.11.0 to get consistent json and yaml serialization ordering

https://gerrit.wikimedia.org/r/809021

Ottomata updated the task description.

Need to do some testing to see if rematerializing all the latest schemas (which will reorder the fields in yaml) has an effect on Spark Refine ingestion. Hopefully not, but I'm worried that the reordering of struct fields might be a problem.

Need to do some testing to see if rematerializing all the latest schemas (which will reorder the fields in yaml) has an effect on Spark Refine ingestion.

Ah, it should not affect anything. Refine uses event-utilities EventSchemaLoader, which uses the extensionless schema symlinks, which point at the yaml. The code fix we did affects the order of the json files (I erroneously wrote 'yaml' in the above comment), which are not used by Refine.

We should be able to just rematerialize in order and merge.

Change 823700 had a related patch set uploaded (by Ottomata; author: Ottomata):

[schemas/event/primary@master] Rematerialize all .json files to ensure consistent ordering of fields in yaml and json files

https://gerrit.wikimedia.org/r/823700

Hm, however when re-materializing the schemas in schemas/event/secondary, I do get yaml file changes, especially for files that have been around for a while.

I'm tempted not to mess with the schemas in the secondary repo. I was more worried about the primary ones.

@JAllemandou @Milimetric @phuedx ...what do you think about removing the .json files from the schema repositories altogether? I don't think we really use them, and maintaining both .json and .yaml files might be a little confusing. @gmodena has told me he's for removing the .json files.

@JAllemandou @Milimetric @phuedx ...what do you think about removing the .json files from the schema repositories altogether? I don't think we really use them, and maintaining both .json and .yaml files might be a little confusing. @gmodena has told me he's for removing the .json files.

+1.
My understanding is that yaml is the only format actually in use. If so, I think there is no point in supporting both; I'd lean towards the path of least resistance (even if suboptimal) and deprecate json support.

Resolving because we fixed the bug in jsonschema-tools. We have decided not to fix the ordering in our schema repo json files. Instead, we are just going to remove those files in T315674.

Change 823700 abandoned by Ottomata:

[schemas/event/primary@master] Rematerialize all .json files to ensure consistent ordering of fields in yaml and json files

Reason:

https://phabricator.wikimedia.org/T315674

https://gerrit.wikimedia.org/r/823700