Normalize ZObjects
Closed, Resolved, Public

Assigned To

None

Authored By

	DVrandecic
	Sep 13 2020, 7:04 PM

Description

The canonical view developed in T259932 is good for storing and for a reasonably compact view of ZObjects for human consumption. For machines to process it, it is more convenient to have it in a more uniform presentation.

The normalized representation of a ZObject is a tree where all leaves are ZObjects of the type Z6/String or Z9/Reference. No other types appear on the leaves, but the inner nodes could be of any type.

This also means that no escaping of strings is needed, as all strings are explicitly typed.
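As a rough illustration of the leaf rule, the following sketch checks whether a parsed JSON value is a tree whose leaves are all explicit Z6/String or Z9/Reference objects. The function names `is_typed_leaf` and `all_leaves_typed` are hypothetical, and the sketch assumes that a leaf has the shape `{"Z1K1": "Z6", "Z6K1": "..."}` or `{"Z1K1": "Z9", "Z9K1": "..."}`, with the raw JSON strings inside a leaf treated as terminal; the task itself does not spell out that shape.

```python
def is_typed_leaf(node):
    """True if node is an explicit Z6/String or Z9/Reference ZObject.

    Assumed leaf shape: exactly the keys Z1K1 and Z6K1 (or Z9K1),
    with a raw string as the value.
    """
    if not isinstance(node, dict):
        return False
    for zid in ("Z6", "Z9"):
        value_key = zid + "K1"
        if (node.get("Z1K1") == zid
                and set(node) == {"Z1K1", value_key}
                and isinstance(node[value_key], str)):
            return True
    return False


def all_leaves_typed(node):
    """True if every leaf of the tree is a typed Z6 or Z9 leaf."""
    if is_typed_leaf(node):
        return True
    # Inner nodes may be of any type, but must be non-empty objects.
    if isinstance(node, dict) and node:
        return all(all_leaves_typed(child) for child in node.values())
    # Bare strings, JSON arrays and empty objects are not allowed.
    return False
```

A bare string such as `"Z2"` or a JSON array anywhere in the tree makes the check fail, which is the property that lets consumers skip escaping logic.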

All Z10/Lists are represented by ZObjects of type Z10/List and not by arrays in the JSON representation. Every object of type Z10/List has either both the Z10K1 and Z10K2 keys, or neither (in which case it is the empty list).
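To make the both-or-neither rule concrete, here is a small sketch that walks a Z10/List and collects its elements, rejecting any node that carries only one of the two keys. The helper name `z10_to_items` is hypothetical, and the sketch assumes that Z10K1 holds the first element and Z10K2 holds the rest of the list, which the task implies but does not state.

```python
def z10_to_items(zlist):
    """Collect the elements of a Z10/List into a Python list.

    Assumes Z10K1 is the head and Z10K2 is the tail (another Z10/List).
    Raises ValueError if a node has exactly one of the two keys.
    """
    items = []
    node = zlist
    while True:
        has_head = "Z10K1" in node
        has_tail = "Z10K2" in node
        if has_head != has_tail:
            raise ValueError("Z10/List needs both Z10K1 and Z10K2, or neither")
        if not has_head:
            # Neither key present: this is the empty list, so we are done.
            return items
        items.append(node["Z10K1"])
        node = node["Z10K2"]
```

For example, a one-element list is an object with the element under Z10K1 and an empty Z10/List under Z10K2.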

So the following steps need to be done by the normalizer:

trim keys
sort keys
ensure that all strings are represented as explicit ZObjects of type Z6/String
ensure that all references are represented as explicit ZObjects of type Z9/Reference
ensure that lists are represented by ZObjects of type Z10/List
ensure that all ZObjects of type Z10/List either have both keys or none
ensure that all ZObjects are trees with all leaves being either of type Z6/String or Z9/Reference (that should be a given if all previous steps are fulfilled)
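The steps above could be sketched roughly as follows. This is not the code that was eventually written for this task; it is a minimal illustration under several assumptions, each of which is also marked in the comments: a string counts as a reference when it looks like a ZID (`Z` followed by digits), keys are sorted in plain string order, the type of an inner node is itself turned into a Z9/Reference, and a list with only one of its two keys is rejected rather than repaired. The name `normalize` is hypothetical.

```python
import re

# Assumption: a bare string is a reference if it looks like a ZID, e.g. "Z6".
REFERENCE_PATTERN = re.compile(r"Z[1-9]\d*")


def _z10_type():
    # Assumption: the type of an inner node is an explicit Z9/Reference.
    return {"Z1K1": "Z9", "Z9K1": "Z10"}


def normalize(value):
    """Return a normalized copy of a canonical-style ZObject."""
    if isinstance(value, str):
        # Bare strings become explicit Z6/String or Z9/Reference leaves.
        kind = "Z9" if REFERENCE_PATTERN.fullmatch(value) else "Z6"
        return {"Z1K1": kind, kind + "K1": value}
    if isinstance(value, list):
        # JSON arrays become nested Z10/List objects, built back to front.
        result = {"Z1K1": _z10_type()}
        for item in reversed(value):
            result = {
                "Z10K1": normalize(item),
                "Z10K2": result,
                "Z1K1": _z10_type(),
            }
        return result
    if isinstance(value, dict):
        trimmed = {key.strip(): item for key, item in value.items()}
        # Already explicit leaves are kept as they are.
        for leaf in ("Z6", "Z9"):
            inner = trimmed.get(leaf + "K1")
            if trimmed.get("Z1K1") == leaf and isinstance(inner, str):
                return {"Z1K1": leaf, leaf + "K1": inner}
        # Assumption: a half-specified list is an error, not something to fix.
        if ("Z10K1" in trimmed) != ("Z10K2" in trimmed):
            raise ValueError("Z10/List needs both Z10K1 and Z10K2, or neither")
        # Assumption: keys are sorted in plain string order.
        return {key: normalize(trimmed[key]) for key in sorted(trimmed)}
    raise TypeError("unsupported JSON value: %r" % (value,))
```

Note how an escaped string such as `{"Z1K1": "Z6", "Z6K1": "Z1"}` passes through unchanged and stays a string, while the bare `"Z1"` becomes a reference; after normalization the two can no longer be confused, which is why no escaping is needed.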

Normalization must be performed on any input to any evaluation. It will considerably simplify writing function implementations.

Related Objects

Status	Assigned	Task
Open	None	T278321 Provide a special page listing invalid ZObjects
Resolved	None	T290119 Phase η root task
Resolved	None	T290118 User-defined types work with validation
Resolved	None	T291028 Instances of a user-defined type which should not validate indeed do not validate
Open	None	T278318 Let users validate an unstored ZObject / edit on a ZObject
Stalled	None	T269177 ZObject DAO should have a isWellFormed method
Resolved	cmassaro	T260861 Use the orchestrator/evaluator to validate instances of each Z4/Type using on-wiki defined Z8/Functions instead of hard-coded
Open	None	T278316 Provide a validation API endpoint for arbitrary ZObjects from both MediaWiki and the orchestrator
Open	None	T292240 Save Un- (or Under-) Validated ZObjects, Validate Asynchronously via Orchestrator, Then Annotate ZObjects As Valid or Not
Open	None	T278325 Store that a page contains an invalid ZObject in a tracking table
Open	None	T278320 Allow to store an invalid ZObject explicitly, so we can have failure examples for Validator functions' Testers
Open	None	T294045 Create a "Validate" Endpoint in Orchestrator
Duplicate	None	T273125 Migrate hard-coded validators to validation defined on-wiki
Open	None	T278319 Provide functionality for users to trigger a re-validation of a stored object
Resolved	None	T273124 Use evaluation for validation
Open	None	T278201 Provide a "magic" UX mode
Open	None	T278202 Alter the function model to have optional "fromString" and "toString" functions for Types, to allow easier input/output&debugging
Resolved	SimoneThisDot	T261471 Provide Special:Evaluate (or similar title) special page where users can trigger a function evaluation
Resolved	cmassaro	T260321 Allow users to trigger function calls for built-in functions (orchestrator-only)
Resolved	Lindsaykwardell	T280558 Normalize Inputs to Orchestrator
Resolved	arthurlorenzi	T275095 Move normalization and canonicalization code to function-schemata
Resolved	None	T262770 Normalize ZObjects

Event Timeline

DVrandecic created this task. Sep 13 2020, 7:04 PM

DVrandecic moved this task from Phase δ to Phase γ on the Abstract Wikipedia team board.

DVrandecic edited projects, added Abstract Wikipedia team (Phase γ); removed Abstract Wikipedia team (Phase δ).

DVrandecic moved this task from Phase γ to Phase δ on the Abstract Wikipedia team board. Oct 7 2020, 12:43 AM

DVrandecic edited projects, added Abstract Wikipedia team (Phase δ); removed Abstract Wikipedia team (Phase γ).

Jdforrester-WMF added a subtask: T266241: Normalizer and canonicalizer should deal with local and global keys. Oct 22 2020, 6:33 PM

Does normalisation include the suppression of all unnecessary whitespace? It does not matter if the underlying storage uses a compact form that is not easily readable by a human, given that the wiki editor could just as well expand it with proper indentation for easier editing, and recompact it afterwards.

Also, will we be able to insert comments in JSON-like objects, or will there be a way to keep the canonicalized and compacted form of the object separate from its source, so that some comments are preserved? And with which syntax: like HTML/XML with <!-- ... -->, C/C++/Java/JS with /* ... */ or // ... \n, SQL with -- ... \n, Lua with --[[ ... ]], sh/ksh/bash with # ... \n, Pascal with (* ... *), or others?
Maybe the JSON form could just be stored in a cache, while the actual pages could use one of several other languages, the conversion being performed by a "Parser function", a new kind of Function, just like there are Renderers; and as there will be a REPL, it will certainly not use that JSON syntax, but could as well use a Ture-like or Python-like syntax.

In my opinion, canonicalisation of the input can be done with a function as well. And to improve the speed of the function evaluator, you'll need to support a cache of results, so this cache should be pritable as well to store the canonicalized version of objects: there's no requirement for objects to use exclusively the JSON-like format with cryptic keys, and there are certainly better ways to represent it in a much simpler structure (e.g. we don't need many implicit or required keys such as Z1K1).

Also, the JSON data is fully representable as a list of RDF triples (whose processing at large scale is very efficient and can be easily distributed and parallelized; for RDF, you would just need to create "reference" objects).

DVrandecic removed DVrandecic as the assignee of this task. Apr 6 2021, 2:28 AM

DVrandecic added a parent task: T275095: Move normalization and canonicalization code to function-schemata.

DVrandecic triaged this task as Low priority. Apr 7 2021, 4:45 AM

DVrandecic lowered the priority of this task from Low to Lowest. Apr 7 2021, 4:47 AM

DVrandecic closed this task as Resolved. May 5 2021, 4:46 PM

Jdforrester-WMF removed a subtask: T266241: Normalizer and canonicalizer should deal with local and global keys. May 6 2021, 4:19 PM
