Validate JSON-schema before allowing saves in the Schema namespace
Closed, Declined · Public

Description

Currently, the Schema namespace is validated by Extension:EventLogging, which uses a custom-written JSON validator and a custom schema. The EventLogging server, however, uses a stock JSON-schema draft 3 validator. As a result, some documents pass initial validation and can be stored as an EventLogging schema, yet crash the server in a way that sends alerts and drops hours of data at a time.

We can improve this situation by replacing the MediaWiki extension validator with plain JSON-schema draft 3.

Update: Validation at both ends should use an "allOf" to enforce both JSON-schema draft 3 as the base schema and the additional restrictions matching the new guidelines for Wikimedia EventLogging. Write a legacy schema that allows e.g. camelCase fields, and a strict schema for conforming newer schemas. The stricter schema can be introduced gradually, e.g. with a "oneOf" and a mandatory field, something like MediaWiki extension registration's "manifest_version", for schemas using the newer contract.
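A minimal sketch of the "allOf" plus "oneOf" logic proposed in the update, written as plain Python rather than as a real meta-schema. Everything here is an assumption for illustration: the field name "contract_version" stands in for a "manifest_version"-style marker, the snake_case rule is assumed to be what the strict contract requires, and passes_base() is only a placeholder for real draft 3 validation.

```python
import re

# Assumption: the strict contract requires snake_case property names.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def property_names(schema):
    """Yield every property name in a schema, including nested objects."""
    props = schema.get("properties") if isinstance(schema, dict) else None
    if not isinstance(props, dict):
        return
    for name, sub in props.items():
        yield name
        yield from property_names(sub)


def passes_base(schema):
    """Placeholder for the draft 3 base check (hypothetical, heavily simplified)."""
    return isinstance(schema, dict) and isinstance(schema.get("properties", {}), dict)


def passes_legacy(schema):
    """Legacy contract: no version marker, camelCase names tolerated."""
    return "contract_version" not in schema


def passes_strict(schema):
    """Strict contract: version marker present and all names in snake_case."""
    return schema.get("contract_version") == 2 and all(
        SNAKE_CASE.match(name) for name in property_names(schema)
    )


def is_acceptable(schema):
    # "allOf": the base check must always hold.
    if not passes_base(schema):
        return False
    # "oneOf": exactly one of the two contracts must match.
    return sum([passes_legacy(schema), passes_strict(schema)]) == 1


legacy_schema = {"properties": {"pageId": {"type": "integer"}}}
strict_schema = {"contract_version": 2, "properties": {"page_id": {"type": "integer"}}}
mixed_schema = {"contract_version": 2, "properties": {"pageId": {"type": "integer"}}}
```

With these definitions, legacy_schema and strict_schema are accepted, while mixed_schema is rejected because it opts in to the newer contract but still uses a camelCase name.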

Event Timeline

Restricted Application added a subscriber: Aklapper.

I was wrong about the premise of this task. There *is* JSON-schema validation happening, but it seems to have some gaps. Will post a minimal test case in a minute.

We're using a custom validation library, and a custom json-schema schema. However, the EventLogging server uses a vanilla draft 3 validator, so IMO this task is valid and we want the extension's validator to match the server's.

Change 585745 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/EventLogging@master] [WIP] Switch to plain json-schema draft 3 validator

https://gerrit.wikimedia.org/r/585745

The next step is to translate https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines into a formal schema, then combine the base JSON-schema draft 3 with this custom validation using "allOf" or "extends".
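For illustration, here is a rough sketch of what the two composition styles could look like, built as Python dicts. Note that "extends" is the draft 3 keyword, while "allOf" was only introduced in draft 4, so a draft 3 validator would not enforce "allOf". The GUIDELINES fragment is a hypothetical stand-in for the formalized guidelines: it only restricts top-level property names to lowercase letters, digits and underscores, and does not reflect the actual guideline text.

```python
import json

# Canonical identifier of the JSON-schema draft 3 meta-schema.
DRAFT3_META = "http://json-schema.org/draft-03/schema#"

# Hypothetical restriction: in a validated schema document, every key under
# "properties" must be lowercase with underscores. Nested objects are not covered.
GUIDELINES = {
    "properties": {
        "properties": {
            "patternProperties": {"^[a-z][a-z0-9_]*$": {}},
            "additionalProperties": False,
        }
    }
}


def compose_with_extends():
    """Draft 3 style: inherit from the base meta-schema via "extends"."""
    return {"$schema": DRAFT3_META, "extends": {"$ref": DRAFT3_META}, **GUIDELINES}


def compose_with_all_of():
    """Draft 4+ style: require the base meta-schema and the restrictions together."""
    return {"allOf": [{"$ref": DRAFT3_META}, GUIDELINES]}


print(json.dumps(compose_with_extends(), indent=2))
```

Either dict would then be handed to a validator that supports the matching draft; which library to use is left open here.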

@awight we will be deprecating on-wiki schemas over the next couple of quarters. The EventLogging extension won't be used (by WMF) for managing schemas anymore.

We still have more work to do on documentation and instructions for developers on how to use the new system. Stay tuned! See T238230: Decommission EventLogging backend components by migrating to MEP.

There is already extensive CI to ensure that users using new schema repos will abide by these guidelines. BTW, if you are willing to use the new system now, it is ready to start producing events. I would love to work with you on this if you are!

Milimetric moved this task from Incoming to Radar on the Analytics board.
Milimetric added a subscriber: Milimetric.

we don't plan on working on this, so feel free to reopen if you want to continue work on it (but maybe talk to us first :))

Thanks for talking me down from the ledge :-)

I'd be curious to see how the new system produces events from MediaWiki extensions, and would be happy to help crash-test the integration and migration. I'll keep an eye out for documentation.

Change 585745 abandoned by Awight:
[WIP] Switch to plain json-schema draft 3 validator

Reason:
Won't be needed!

https://gerrit.wikimedia.org/r/585745