
JSON-based page list
Open, Needs Triage, Public, Feature

Description

Feature summary (what you would like to be able to do and where):

Allow users to store JSON-based page lists in the data namespace, both locally and on Commons (see: T252711, T305571). A page list is a JSON list in which each entry has the following information (either Title or QID is required):

  • Title - The displayed name of the entry (without disambiguation); by default the label of the Wikidata item
  • QID - The Wikidata QID of the topic
  • page name - Name of the local page of the entry; by default the sitelink of the Wikidata item (may be non-existent, see T123021), or if QID is not defined, the title
  • Any other fields users may define, including free-form Wikitext (T134658: Allow wiki markup data type)
    • This may include some fields that are populated by Wikidata, but can be overridden locally
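
The defaulting rules above can be sketched as follows. This is only an illustration, not part of the proposal: the class name `PageListEntry`, the function `resolve`, and the `labels` / `sitelinks` lookup tables (standing in for Wikidata) are all hypothetical, and returning `None` when an item has no sitelink is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PageListEntry:
    """One entry of a page list; either title or qid must be given."""
    title: Optional[str] = None
    qid: Optional[str] = None
    page_name: Optional[str] = None
    extra: dict = field(default_factory=dict)  # any user-defined fields

    def __post_init__(self):
        if self.title is None and self.qid is None:
            raise ValueError("either title or qid is required")


def resolve(entry, labels, sitelinks):
    """Apply the defaults; labels and sitelinks are hypothetical QID -> str maps."""
    # Title defaults to the label of the Wikidata item.
    title = entry.title if entry.title is not None else labels.get(entry.qid)
    page_name = entry.page_name
    if page_name is None:
        if entry.qid is not None:
            # Defaults to the sitelink; None if the item has no sitelink (assumption).
            page_name = sitelinks.get(entry.qid)
        else:
            # No QID defined: fall back to the title.
            page_name = title
    return title, page_name
```

An entry with only a title resolves to a page of the same name, while an entry with only a QID takes both values from the lookup tables.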

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

To populate lists like https://en.wikipedia.org/wiki/List_of_2016_United_States_presidential_electors.

A more complex case is https://zh.wikipedia.org/wiki/%E7%AC%AC%E5%8D%81%E5%9B%9B%E5%B1%8A%E5%85%A8%E5%9B%BD%E4%BA%BA%E6%B0%91%E4%BB%A3%E8%A1%A8%E5%A4%A7%E4%BC%9A%E4%BB%A3%E8%A1%A8%E5%90%8D%E5%8D%95, which is a table-based list, while https://zh.wikipedia.org/wiki/%E5%85%A8%E5%9B%BD%E4%BA%BA%E6%B0%91%E4%BB%A3%E8%A1%A8%E5%A4%A7%E4%BC%9A%E5%B1%B1%E4%B8%9C%E7%9C%81%E4%BB%A3%E8%A1%A8%E5%9B%A2 contains an extract of one of its sections in plain list form.

This will allow users to create lists like https://en.wikipedia.org/wiki/Comparison_of_text_editors. Currently such lists can only be edited as raw Wikitext (and careless edits sometimes break the wikitable syntax). Wikitables are not machine-readable either. With a JSON-based page list, spreadsheet-like cell editing can be used (T134618: Epic: Implement spreadsheet-like cell editing for tabular data).

Benefits (why should this be implemented?):

  • Allow the same list to be included in different pages
  • Allow building navboxes from a given list, or a part thereof (e.g. most of the navboxes in https://en.wikipedia.org/wiki/Chuck_Schumer can be extracted from Wikipedia articles)
  • A more user-friendly editing interface
  • Wikidata-based error checking (both ways - a report of inconsistency may indicate erroneous or outdated data in the local wiki or in Wikidata)
  • List will be machine-readable, and importable/exportable (T134617: Implement CSV/TSV import/export for tabular data set)
  • Reduce the workload of disambiguation (e.g. if a name exists in 10 lists, moving the page will require changes to 10 pages; also see the list of navboxes of Chuck Schumer above, if the page is moved all navboxes must be updated manually)
  • Some lists may use a large number of Wikidata items, which may hit the entityAccessLimit or the expensive function call limit. Introducing an intermediate page makes such Wikidata usage indirect (if the data is stable, it may be duplicated in the JSON page; otherwise, a cache layer may be introduced), so that performance may be improved.

Note this is different from Wikidata-based automatic lists (T67626: [Epic] Support for queries on-wiki (automated list generation)) in the following ways:

  • Wikidata does not allow users to store free-form Wikitext, but many lists such as https://en.wikipedia.org/wiki/Comparison_of_instant_messaging_protocols contain free-form Wikitext.
  • A Wikidata-based automatic list requires users to match all entries to Wikidata items (and create one if none exists), which is difficult for existing lists with hundreds of entries (e.g. lists of members of organizations, such as List of 2016 United States presidential electors above). (In Chinese Wikipedia, there are hundreds of such navboxes.)
  • Some lists are either stable (such as List of 2016 United States presidential electors above; the actual entries never change, though the pages may be moved due to disambiguation), or updated very infrequently (such as the list of US presidents). Using a stable list will reduce the potential for vandalism.
  • A list with defined fixed membership can reduce calls to potentially expensive WDQS queries (cf T67626#9553372). Note WDQS queries are still useful in consistency checks, but calling WDQS on every visit with results displayed on-the-fly and calling it only for consistency checks are totally different things.

Event Timeline

At least on the English Wikipedia, completely rearchitecting lists like that, especially if based on Wikidata, is almost certain to fail to achieve consensus.

In my proposal the local community will still be able to fully control the content of the list (i.e. not using any data from Wikidata), although the actual data will be located in a separate page like https://en.wikipedia.org/wiki/Data:List_of_2016_United_States_presidential_electors.json

As a much simplified example (note what I propose is much more complex, and the following only contains the parts needed to demonstrate how things will work):

https://en.wikipedia.org/wiki/Data:List_of_current_United_States_senators.json

{ "data":
  [
    { "@name": "Tommy Tuberville", "Class": "Class 2", "Party": "Republican", "Born": "September 18, 1954", "Occupations": ["College football coach", "Investment management firm partner"], "Alma mater": "Southern Arkansas University", "Elected": "January 3, 2021"},
   ...
  ],
  "fields": {
    "Class": ...,
    "Party": {
        "suggested-values": ["Republican", "Democratic"],
        "display-style": {"abbreviation": ...},
        ...
    },
    ...
  },
  "default-order": "state, class"
}

https://en.wikipedia.org/wiki/List_of_current_United_States_senators:

{{#include: Data:List_of_current_United_States_senators.json | columns=State, @link/Senator, Party, Born, Occupations, Alma mater, Elected, Class | display=table }}

https://en.wikipedia.org/wiki/Template:Current_U.S._senators:

{{#include: Data:List_of_current_United_States_senators.json | columns=State, @link/Senator, Party%abbreviation | display=navbox }}

In the example above Wikidata is not involved at all. What I also propose is spreadsheet-like editing of the data and a wizard to edit the meta fields.
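
As a rough illustration of what a `display=table` view could do with the `columns` parameter, here is a minimal Python sketch. The function name `render_wikitable` is hypothetical, and the sketch deliberately ignores the `@link/Senator` and `Party%abbreviation` column forms, since their exact semantics are not specified above; it only projects plain field names from the `data` array into wikitable markup.

```python
import json


def render_wikitable(page_json, columns):
    """Render the selected columns of a page list as wikitable markup."""
    rows = json.loads(page_json)["data"]
    lines = ['{| class="wikitable"', "! " + " !! ".join(columns)]
    for row in rows:
        cells = []
        for col in columns:
            value = row.get(col, "")  # missing fields become empty cells
            if isinstance(value, list):
                value = ", ".join(value)  # flatten multi-valued fields
            cells.append(str(value))
        lines.append("|-")
        lines.append("| " + " || ".join(cells))
    lines.append("|}")
    return "\n".join(lines)
```

The same function called with a shorter column list would give the reduced view a navbox needs, which is the point of keeping the data separate from its presentation.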

I am going to make a demo of more of the proposed features here: https://www.mediawiki.org/wiki/Module:JSONPageList. Watch this space, but there is currently no expected time for when this will be completed.

As demos:

Allow the same list to be included in different pages

You can already do this with templates (or even using another article as template via labeled section transclusion)

A more user-friendly editing interface

Are you sure editing JSON is more user friendly than editing a table? (Either wikimarkup based, which can even be edited via GUI, or template based which is pretty much the same as JSON complexity wise but uses familiar syntax)


To be honest I am not sure how this would be helpful as opposed to having such lists generated automatically. I see value in having a list of senators that automatically updates based on WD; this way a smaller Wikipedia would not need to update everything manually after a general election. But I see little value in having such lists still manually maintained while adding JSON to the list of skills required of every single Wikipedian. In my opinion this is different from the current Data pages on Commons, which deal with more raw data such as numbers and do imply a high degree of technical expertise.

Allow the same list to be included in different pages

You can already do this with templates (or even using another article as template via labeled section transclusion)

Yeah, if you view the source code of https://zh.wikipedia.org/wiki/%E5%85%A8%E5%9B%BD%E4%BA%BA%E6%B0%91%E4%BB%A3%E8%A1%A8%E5%A4%A7%E4%BC%9A%E5%B1%B1%E4%B8%9C%E7%9C%81%E4%BB%A3%E8%A1%A8%E5%9B%A2, it uses modules to extract information from 14 other articles, and the module can parse two completely different formats (plain list and wikitable). However the drawbacks are:

  • new users may easily mess up the wikitable
  • this does not provide Wikidata integration unless you wrap each row in a template that uses Wikidata
    • Note: for a stable list, what I propose is not transcluding data on-the-fly from Wikidata (since 1. this would make a page access many Wikidata entities, potentially hitting entityAccessLimit; 2. this increases the potential for vandalism). Instead, stable data are stored locally and can be compared with Wikidata via a new special page such as Special:Inconsistency.
  • data is not machine-readable, and there is no type safety for data
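
The comparison that a page like Special:Inconsistency would perform can be sketched roughly as below. This is a hedged illustration only: the function name, the `@qid` key, and the `wikidata_values` dictionary (a stand-in for values that would already have been fetched from Wikidata) are assumptions, and no real Wikidata API is called.

```python
def find_inconsistencies(local_rows, wikidata_values, fields):
    """Return (qid, field, local, remote) tuples where the two sources differ."""
    report = []
    for row in local_rows:
        qid = row.get("@qid")  # hypothetical key holding the entry's QID
        if qid is None or qid not in wikidata_values:
            continue  # entries without a Wikidata item cannot be compared
        remote = wikidata_values[qid]
        for name in fields:
            # Only compare fields that are present on both sides.
            if name in row and name in remote and row[name] != remote[name]:
                report.append((qid, name, row[name], remote[name]))
    return report
```

A reported mismatch does not say which side is wrong; as noted in the task description, it may point to outdated data on either the local wiki or Wikidata.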

Another issue is that many Wikipedia communities reject including overly detailed information in any article (and it is also not suitable for a template, since such a template has no potential usage in Wikipedia), such as a list of all guests of a TV program. But it can be included in a new "central" site.

A more user-friendly editing interface

Are you sure editing JSON is more user friendly than editing a table? (Either wikimarkup based, which can even be edited via GUI, or template based which is pretty much the same as JSON complexity wise but uses familiar syntax)

JSON only describes the internal storage method - the proper edit interface should be similar to Microsoft Excel.


To be honest I am not sure how this would be helpful as opposed to having such lists generated automatically. I see value in having a list of senators that automatically updates based on WD; this way a smaller Wikipedia would not need to update everything manually after a general election. But I see little value in having such lists still manually maintained while adding JSON to the list of skills required of every single Wikipedian. In my opinion this is different from the current Data pages on Commons, which deal with more raw data such as numbers and do imply a high degree of technical expertise.

  • For now, data pages are a Commons-only feature and are not available on other wikis (though it is trivial to enable them elsewhere).
  • Commons data pages do not integrate with Wikidata. In my proposal fields can be populated locally, from Wikidata, or even derived from other fields.
  • There is no native way to display a Commons dataset as a list, as a wikitable, or as a navbox, let alone to query it (which is why https://en.wikipedia.org/wiki/Template:Tabular_query exists). In the long-term future I imagine we could even edit the content generated from a JSON page list in VisualEditor (which would funnel edits to Wikidata if necessary).

Also, a central wiki (such as Commons) can host lists with excessive detail that is not suitable for Wikipedia (i.e. a list is content by itself, a template is not). For example, Wikipedia currently does not (and likely never will) have articles on the 2023 local elections of each US county/city, let alone their candidates or results (e.g. Wikipedia does not have an article listing all former and current members of the city council of Palo Alto, California). But those can be hosted in the new project. (Wikidata is not suitable to host them for now, since we would need to match each entry to a Wikidata item and create new ones if they do not exist, which cannot be done in a short time for more than 10000 cities in the US. But once that is done, we would have an easy way to sync data between the list and Wikidata in the proposed Special:Inconsistency page.)

Wikipedia may benefit from this project even if there is currently no article for the exact list. For example, the Wikipedia article on Glendale, California currently contains a list of the names of its mayors. If we have a list of mayors in a machine-readable format, Wikipedia can simply transclude a "view" of it. For the list of mayors of Glendale, California, here is a comparison of approaches to storing it:

Automatically generate it via query

  • Requires automated list generation
  • Moderation may be challenging (depending on how we materialize results, track their history, and propagate them to clients)
  • List is not directly editable (though a tool may be developed to indirectly edit Wikidata)
  • Does not handle entries without Wikidata items
  • Local community has no control of data

Store it in item of the city or the position

  • Currently available
  • Not scalable for thousands of members
  • Performance issue (accessing items may be expensive)
  • List is editable, but not directly in client; reordering is tricky
  • Supports entries without Wikidata items in a hacky way (somevalue)
  • Local community has no control of data

JSON-based page list (for stable list)

  • List is directly editable (since data is stored/duplicated in the list, and can be sync'd with Wikidata)
  • Supports entries without Wikidata items
  • Local community can have full control of data
  • Little performance concern (we do not get data from Wikidata directly; even for a dynamic list, data can be cached indefinitely)