
Investigate schema visualization tools for schema.wikimedia.org
Closed, Resolved, Public

Description

To make schemas on https://schema.wikimedia.org/#!/ easier to read, look into tools that can render a schema in a more human-readable form than the raw file, without manually duplicating the information in the schema.

Option 1: YAML -> JSON -> wiki page

One option would be to create a bot that would, on a regular cadence:

  1. Fetch the latest schema from schema.wikimedia.org
  2. Convert from YAML to JSON (https://pyyaml.org)
  3. Publish to a wiki page (pywikibot)
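The three steps above could look roughly like the following Python sketch. It is not an existing bot. The schema URL and page title are hypothetical parameters that the caller supplies, and the sketch assumes that PyYAML and Pywikibot are installed and that Pywikibot is already configured (user-config.py) for the target wiki. Both libraries are imported lazily so that the JSON step can be used on its own.

```python
"""Hypothetical sketch of the Option 1 bot: fetch, convert YAML to JSON, publish."""
import json
import urllib.request
from typing import Any


def fetch_text(url: str) -> str:
    """Step 1: download the raw schema file (URL supplied by the caller)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def yaml_to_data(yaml_text: str) -> Any:
    """Step 2a: parse YAML into Python objects with PyYAML."""
    import yaml  # PyYAML, assumed to be installed

    return yaml.safe_load(yaml_text)


def to_pretty_json(data: Any) -> str:
    """Step 2b: serialize to indented JSON; default=str covers YAML dates."""
    return json.dumps(data, indent=2, ensure_ascii=False, default=str)


def publish(title: str, text: str, summary: str) -> None:
    """Step 3: save the JSON text to a wiki page with Pywikibot."""
    import pywikibot  # assumed to be installed and configured

    site = pywikibot.Site()  # the default wiki from user-config.py
    page = pywikibot.Page(site, title)
    if page.exists() and page.text == text:
        return  # schema unchanged since the last run, so skip the edit
    page.text = text
    page.save(summary=summary)


def run_once(schema_url: str, page_title: str) -> None:
    """One scheduled run, e.g. triggered by cron or a systemd timer."""
    data = yaml_to_data(fetch_text(schema_url))
    publish(page_title, to_pretty_json(data), summary=f"Sync schema from {schema_url}")
```

Skipping unchanged pages keeps a frequent cadence from flooding the page history with empty edits.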

Here's an example of the MP web base schema rendered as a JSON page on Wikitech. This option is easy to implement with existing tools, but it requires someone to maintain the bot over the long term.

See also:

  • catalog-to-wiki: A now-deprecated bot that published content from an external source to a wiki page

Option 2: Integrate a tool with schema.wikimedia.org

I looked into several open-source tools that render JSON schemas as Markdown or HTML. However, since the code for schema.wikimedia.org does not appear to live in a repository, integrating one of these tools would be complicated and would likely require engineering resources. It could also make schema.wikimedia.org harder to maintain.

Option 3: Custom script and static site

The Blubber docs use a custom script that generates a Markdown file from a JSON schema and publishes the result to a static site. This approach is the most complex of the three options.

Recommendation

Of these options, I think option 1 would be the easiest to implement, but it's up to Data Products to decide whether the wikitable's improved readability over the raw YAML is worth the cost of maintaining the bot.

Event Timeline

apaskulin updated the task description.
apaskulin added a subscriber: Milimetric.

Hi @Milimetric, would you be able to provide feedback on whether it's worth moving forward with one of these options?

I think my input here is mainly that option 2 is very much feasible. The repositories these schemas live in have build steps that materialize new versions of schemas as X.Y.Z.yaml and the current version as current.yaml. Those build steps run custom JavaScript on Node.js, so it seems very simple to me to add a human-friendly rendering of each YAML file to the build step. The rendering function could also be factored out into a script that bulk-renders the existing YAML files.
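To illustrate the bulk-render idea, here is a hypothetical Python sketch. The real build steps are JavaScript on Node.js, so this only shows the shape of the logic. It assumes a directory tree containing X.Y.Z.yaml and current.yaml files and writes a rendered file next to each one. The `render` and `load` callables are supplied by the caller (for example, a JSON-schema-to-Markdown function and `yaml.safe_load`), and the `.md` output suffix is an assumption.

```python
"""Hypothetical bulk-render sketch for existing schema files (illustration only)."""
from __future__ import annotations

import re
from pathlib import Path
from typing import Any, Callable

# Materialized versions such as 1.2.0.yaml, plus current.yaml.
SCHEMA_FILE = re.compile(r"^(\d+\.\d+\.\d+|current)\.yaml$")


def find_schema_files(root: Path) -> list[Path]:
    """Recursively collect versioned and current schema files under root."""
    return sorted(p for p in root.rglob("*.yaml") if SCHEMA_FILE.match(p.name))


def bulk_render(
    root: Path,
    render: Callable[[Any], str],
    load: Callable[[str], Any] | None = None,
    suffix: str = ".md",
) -> list[Path]:
    """Render every schema file and write the result beside it; return the outputs."""
    if load is None:
        import yaml  # PyYAML, only needed when no loader is passed in

        load = yaml.safe_load
    written = []
    for src in find_schema_files(root):
        out = src.with_suffix(suffix)  # 1.2.0.yaml -> 1.2.0.md
        out.write_text(render(load(src.read_text(encoding="utf-8"))), encoding="utf-8")
        written.append(out)
    return written
```

Because the same `render` function would be called per file, the build step and the one-off backfill could share it, which matches the suggestion to factor the rendering out.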

I've tried to weigh the effort involved here against running a bot, and I honestly think it's technically easier. It might require more discussion and agreement among the people using these schemas, but maybe we only need rough consensus.

As for which rendering to use, I personally like json-schema-for-humans, but I don't think I should get to pick here; that feels like an easy vote/survey.