
Improve Wikidata handling of duplicate references in model and UI
Open, Needs Triage | Public | Feature

Description

Feature summary (what you would like to be able to do and where):
See https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Duplicate_References_Data_Model_and_UI

  1. Condense internal JSON storage for duplicate references
  2. Modify the Wikidata UI for editing duplicated references
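To make point 1 concrete, here is a minimal sketch of what condensed storage could look like. It assumes the existing Wikibase JSON layout, where each statement carries a `references` list and each reference has a `hash`. The keys `reference-table` and `reference-hashes` are hypothetical names invented for this illustration; they are not part of Wikibase or of the RFC.

```python
from typing import Any


def condense_references(entity: dict[str, Any]) -> dict[str, Any]:
    """Store each distinct reference once; statements keep only its hash.

    "reference-table" and "reference-hashes" are hypothetical keys used
    only for this sketch.
    """
    table: dict[str, Any] = {}
    claims: dict[str, list] = {}
    for pid, statements in entity.get("claims", {}).items():
        condensed = []
        for st in statements:
            new_st = {k: v for k, v in st.items() if k != "references"}
            if "references" in st:
                hashes = []
                for ref in st["references"]:
                    # First occurrence wins; later duplicates are dropped.
                    table.setdefault(ref["hash"], ref)
                    hashes.append(ref["hash"])
                new_st["reference-hashes"] = hashes
            condensed.append(new_st)
        claims[pid] = condensed
    out = {k: v for k, v in entity.items() if k != "claims"}
    out["claims"] = claims
    out["reference-table"] = table
    return out


def expand_references(condensed: dict[str, Any]) -> dict[str, Any]:
    """Inverse of condense_references: restore the duplicated layout."""
    table = condensed["reference-table"]
    claims: dict[str, list] = {}
    for pid, statements in condensed["claims"].items():
        restored = []
        for st in statements:
            new_st = {k: v for k, v in st.items() if k != "reference-hashes"}
            if "reference-hashes" in st:
                new_st["references"] = [table[h] for h in st["reference-hashes"]]
            restored.append(new_st)
        claims[pid] = restored
    out = {k: v for k, v in condensed.items()
           if k not in ("claims", "reference-table")}
    out["claims"] = claims
    return out
```

Because `expand_references` restores the original layout, a change along these lines could in principle stay internal to storage while exported JSON keeps its current shape. Whether that is the right trade-off is for the RFC discussion to decide.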

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
As an example, see Q21481859 in Wikidata, which has almost 3000 authors who (should) all have the same reference; the duplicated reference data accounts for over 1 MB of the 4.4 MB size of the item. Wikidata items have a maximum JSON file size of about 4.4 MB, so the reference duplication has made this and similar items almost uneditable.
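A rough way to check figures like these for any item is to count the bytes taken up by repeated references in its JSON. The sketch below assumes the entity dictionary is already loaded (for example from the item's `Special:EntityData/<id>.json` export, under `entities`) and that references sharing a `hash` are identical. The function name is made up for this example, and compact JSON serialization is only an approximation of how the item is measured internally.

```python
import json
from typing import Any


def duplicate_reference_bytes(entity: dict[str, Any]) -> tuple[int, int]:
    """Return (bytes spent on repeated references, total bytes of the entity).

    Sizes are measured on compact UTF-8 JSON, which only approximates the
    stored size of a real item.
    """
    def size(obj: Any) -> int:
        return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))

    seen: set[str] = set()
    duplicated = 0
    for statements in entity.get("claims", {}).values():
        for st in statements:
            for ref in st.get("references", []):
                if ref["hash"] in seen:
                    # Every occurrence after the first is pure duplication.
                    duplicated += size(ref)
                else:
                    seen.add(ref["hash"])
    return duplicated, size(entity)
```

Dividing the first value by the second gives the share of the item that deduplication could reclaim, before any overhead of pointing to the shared copy.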

See also comments on the Wikidata RFC - the DuplicateReferences gadget and the "UseAsRef" script are widely used.

Benefits (why should this be implemented?):
First, this would help significantly reduce the size of many large Wikidata items, making them more usable and editable.
Second, this would allow a number of UI changes to improve the experience of adding and maintaining references in Wikidata.

I will also link some related tasks that may be resolved through this work.

Event Timeline

Michael subscribed.

(Removing MediaWiki-extensions-Wikibase-Client and adding MediaWiki-extensions-WikibaseView, as this is not about Wikibase as deployed on client wikis like the Wikipedias, but about the UI of the Wikibase repository.)

Maybe we could retitle this task to something like "Reusing a reference should be easier and more efficient", so that it describes the problem before proposing a solution.

And, thinking about possible solutions:

It may make sense to rely on "P248 (stated in)". That is, instead of a reference like <stated in enwiki>, we could have <stated in ITEM_DESCRIBING_THE_SOURCE>.

So we can have one item describing that exact reference, and the same reference can easily be reused millions of times just by adding more <stated in> references pointing to that same item.

In the long term, to avoid polluting the main namespace, it may also make sense to have a dedicated namespace such as "Reference:", similar to "Lexeme:".

Incidental benefits of using "stated in":

  1. almost zero software development on Wikibase (the standard UX already supports this)
  2. it's something that Wikimedians already know ("stated in" is already known)
  3. some wikis probably already have basic Lua support to render a "stated in" reference (citation needed...)
  4. more efficient storage (the full reference is no longer duplicated on every statement)
  5. easy to implement during data entry (and a gadget could be created to merge existing references in this new way)
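To illustrate how small such a reference is, here is a sketch that builds a single-snak "stated in" reference in the Wikibase JSON shape. The function name is made up for this example, and the item ID passed in is a placeholder for the hypothetical item describing the source.

```python
from typing import Any


def stated_in_reference(source_item_id: str) -> dict[str, Any]:
    """Build a reference consisting of one P248 (stated in) snak.

    source_item_id is the ID of the (hypothetical) item that describes
    the source in full, e.g. "Q123" as a placeholder.
    """
    if not (source_item_id.startswith("Q") and source_item_id[1:].isdigit()):
        raise ValueError(f"not an item ID: {source_item_id!r}")
    snak = {
        "snaktype": "value",
        "property": "P248",
        "datavalue": {
            "value": {
                "entity-type": "item",
                "numeric-id": int(source_item_id[1:]),
                "id": source_item_id,
            },
            "type": "wikibase-entityid",
        },
        "datatype": "wikibase-item",
    }
    return {"snaks": {"P248": [snak]}, "snaks-order": ["P248"]}
```

Every statement that cites the same source would carry this same short reference, while the full bibliographic details live once on the source item. Note that the reference itself is still repeated per statement; it is just much smaller.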

What do you think?

I'm also very curious what WikiCite users think about this task and about the last comment.

https://meta.wikimedia.org/wiki/WikiCite

Edited: I've brought this task to the attention of the WikiCite Telegram chat.