Allow multiple data pages to reuse the same header structure declaration
Open, Needs Triage, Public

Description

There should be a way for multiple tabular data pages to re-use the header structure declared in another page. That way, multiple similar pages won't have to re-declare all of their headers and their localizations (per T134823).

"headers-ref": "Some Tabular Data Page.tab"

Restrictions:

  • If the JSON contains headers-ref, do not allow headers, titles, or types values (see the sketch after this list).
  • The referenced page must not itself contain a headers-ref (no double redirects).
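
For illustration, a master/child pair might look roughly like the sketch below. The headers, types, titles, and data field names come from the restrictions above; the exact shape of titles and all sample values are only assumptions, not a settled design.

The master page ("Some Tabular Data Page.tab") would declare the structure and localization:

"headers": ["date", "city", "temperature"],
"types": ["string", "string", "number"],
"titles": [
  { "en": "Date", "de": "Datum" },
  { "en": "City", "de": "Stadt" },
  { "en": "Temperature", "de": "Temperatur" }
]

The child page would supply only the reference and its own data:

"headers-ref": "Some Tabular Data Page.tab",
"data": [
  ["2016-05-01", "Berlin", 14.2],
  ["2016-05-01", "Madrid", 21.7]
]

On parse, the child would pull column names, types, and localized titles from the master, which is why the first restriction forbids it from declaring any of them itself.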

Open Questions:

  • If the master page changes, how should we deal with broken child pages? For example, when a child page is requested and its master no longer matches it, we can simply return an error. In other words, changing the master will automatically break all children. This implies that all masters should be edit-protected by the community, which defeats the purpose of having them easily translatable by all users. One option would be to have a special edit right and a special flag: if a page is marked with the "tabular master" flag, it can still be edited by anyone, but the save will only go through if the number and types of the columns have not changed. Only those with the special "tabular master editor" right would be able to change the structure of the table.

Event Timeline

What if the header was always separate, like a schema? The code would only be a little simpler but it might be easier for everyone to think about it if there's only one way to define structure.

Hmm, we could in theory have this:
Data:Schema.tabschema and Data:Schema.tabschema/MyData.tab

Editing two pages might be a pain... Plus renaming might be tricky, and code might get overly complex...

It's true that pages are kind of expensive. But if you start re-using schemas, the talk page of the initial page that contains the schema might get overrun with people who are re-using it from other .tab pages, and make everything kind of confusing.

Event Logging users seem to be pretty good at dealing with Schema: pages, maybe a separate schema page isn't too bad. Why do you say the code would be more complicated though? On a side-track, would it be bad to re-use the Schema: namespace? Seems to fit and serve an almost identical purpose.

Talk pages for both the schema and data might confuse people regardless, and a warning could be added that this is the "schema" page vs "data" page (on the talk).

Event schema has a very small and highly dev-oriented design; it is not really meant for general non-tech consumption. Data, on the other hand, will be managed exclusively by the general community, so I am not sure the Schema approach is valid. Also, the Schema namespace has a very specific meaning (event logging). Also, I hope the Data namespace on Commons will eventually host other data types such as GeoJSON blobs (shapes), which would make the "schema" naming a bit confusing. The data type is defined by the page "extension".

Ok, sounds good to me. I mean, we can always reassess once it gets popular enough. Wouldn't be too hard to change and write a bot to split up schema from data if we really wanted to.

After some more pondering:

  • each dataset MUST have column names and types, but MAY have localized titles
  • a dataset MAY reference an external translation dictionary, e.g. Data:Weather.dic
  • Dictionaries also allow localization of the data, not just the column titles. If there is a dictionary attached to a dataset, values in a "localized"-type column may be strings instead of objects. The string is treated as a key in the dictionary, and all the data interfaces will return the properly localized string on request. If the dictionary does not have the key, it will be returned as <key> (same as MediaWiki when a message key is not defined). Note that this allows multiple rows/columns to reuse the same translation.
  • the dictionary will have a {id -> {lang -> string}} structure, and could easily be moved to a different storage system and integrated with translatewiki.
"data": {
  "id": {
     "language-code": "some string",
     ...
  },
}
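
To make this concrete, here is a rough sketch; the dictionary-ref field name and all sample keys, languages, and rows are hypothetical illustrations, and only the {id -> {lang -> string}} shape and the "localized" column type come from the points above.

A dictionary page such as Data:Weather.dic could hold:

"data": {
  "sunny": { "en": "Sunny", "de": "Sonnig" },
  "rainy": { "en": "Rainy", "de": "Regnerisch" }
}

and a dataset that attaches it could store plain keys in its "localized" column:

"dictionary-ref": "Weather.dic",
"headers": ["date", "conditions"],
"types": ["string", "localized"],
"data": [
  ["2016-05-01", "sunny"],
  ["2016-05-02", "rainy"]
]

A key missing from the dictionary, say "foggy", would then render as <foggy>, matching the MediaWiki fallback described above.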

This idea addresses the duplication of localized storage, but does NOT actually deal with enforcing structural consistency between multiple datasets. I am a bit reluctant to introduce that - cross-page consistency is hard to maintain and easy to break, it introduces a lot of cross-dependencies and code hacks, and I feel it should be solved by social contract and bots rather than by MediaWiki code.