[Epic] Article placeholder based on data from Wikidata
Closed, Resolved · Public

Description

User story: As a user searching for information in my language I want to get basic information about a topic even if my Wikipedia doesn't have an article about it yet. I want to see a placeholder based on data from Wikidata. I want the option to create an article based on this data easily.

Community input: https://www.wikidata.org/wiki/Wikidata:Article_placeholder_input

Important things to keep in mind:

  • Wikidata does not aim to replace article writing. The development team has no ambition to try to have Wikidata write articles about complex topics that need more than just data to be fully understood.
  • We do want to give people access to information in their language to the best of our abilities.
  • Wikipedias and co will need to have a way to opt out.
  • This feature is especially important/valuable for small projects with a small contributor base.

Event Timeline

daniel created this task. May 21 2015, 12:41 PM
daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel added a subscriber: daniel.
Restricted Application added a subscriber: Aklapper. May 21 2015, 12:41 PM
daniel updated the task description. May 21 2015, 1:02 PM
daniel set Security to None.

I think we should add two more choices for when the local page doesn't exist:

  • Forward to an edit form with preloaded text, which can be set by Lua.
  • Load ContentTranslation to translate the page from a given wiki.
Lydia_Pintscher renamed this task from "Create Special:ShowEntity on the client" to "article placeholder based on Wikidata data". Jun 5 2015, 10:18 AM
Lydia_Pintscher triaged this task as Normal priority.
Lydia_Pintscher updated the task description.
Lydia_Pintscher added a subscriber: Lydia_Pintscher.

Reworked description of this ticket to be more open. We need to do conceptual work here first.

Lucie claimed this task. Jul 1 2015, 2:29 PM
Tpt added a subscriber: Tpt. Jul 8 2015, 3:59 AM
Ricordisamoa added a subscriber: Ricordisamoa.
Tgr added a subscriber: Tgr. Jul 8 2015, 5:21 AM
Harej added a subscriber: Harej. Jul 12 2015, 6:10 PM

Will this discourage the creation of articles if there's a "good enough" auto-stub?

Lydia_Pintscher added a comment. Edited · Jul 12 2015, 6:29 PM

Will this discourage the creation of articles if there's a "good enough" auto-stub?

We'll be building it in a way that encourages proper article creation.
I think it's fine, though, if we discourage the really bad kind of stub that quickly gets outdated and offers no additional information beyond what Wikidata can offer through this.

In the end though we'll have to measure what impact this has and where/how we need to tweak. This will be part of Lucie's bachelor thesis.

jeblad added a subscriber: jeblad. Jul 12 2015, 9:43 PM

There is a page Wikidata:Article placeholder input about this, but the page is at Wikidata and the concept discussed is about a page put on Wikipedia. The page as such addresses the wrong audience; it isn't the Wikidata community you want feedback from, it is the community at Wikipedia.

Given some recent discussions in the community at nowiki about the use of values from statements at Wikidata, I wonder if this special page is doable at all without a lot of discussion with the local communities. The kind of stupidly chained sentences from Resonator will not (ehm) resonate very well at Wikipedia. Just to give an example: the bots run at Swedish Wikipedia are not accepted at Norwegian (Bokmål) Wikipedia, and although they got a preliminary "yes" on Norwegian (Nynorsk) Wikipedia, I think the project died.

Discussion at Wikipedia:Samfunnshuset/Arkiv/2014#Svenska sjøar (Signpost at Nynorsk) and Wikipedia:Torget/Arkiv/2014/august#Look to Sweden (Signpost at Bokmål).

It was brought up on various non-Wikidata venues. You can see several non-Wikidata people giving input.
And of course this isn't going to be a piece of cake. And just like everything around Wikidata, I don't expect every Wikipedia to use it. As usual, they can opt in. And it's our job to make it work in a way that they'll want to do that.

This task describes two different problems; one is how a reader can be given the available information from Wikidata, and another is how an editor can use this information to create a new article. Those two questions should lead to very different user interfaces. The first problem also has two subproblems; one is about how you phrase a text that describes the current dataset, possibly in natural language, and the other is about how you infer some related information. The proper balance between the current information and the inferred information is not obvious, and how this balance is (or should be) skewed due to generation of natural language is an open field of research.

It should also be noted that it is pretty easy to generate text in some languages, while it is extremely difficult in other languages. Expect problems in expressing text due to shift in gender due to age of the subject, plural form due to the distance between subjects, or other rules like existence of certain letters used in special relations.

We will not be generating any text.

I'll copy some of the initial clarifications from the wiki page to the task description.

jeblad added a comment. Edited · Jul 12 2015, 10:34 PM

Btw, I have done some work on how to do NLG based on semantic data. The wiktionary problem is easy compared to this. It isn't difficult to generate some text, but it is very difficult to make it feel "natural" (whatever that is) and construct it in such a way that the text snippets are reusable.

For a simple introduction to what can be done on Wikidata, I would say read Building natural language generation systems (ISBN 978-0-521-62036-8). The proposed "Messages" are built from statements, DocumentPlan and Constituents are what you want to express, Phrases are how you want to express it, and then the surface text is more or less obvious. The problem is how to describe those specialized "entities" such that they can create a usable structure depending on the available statements. It seems like a DocumentPlan is given by the type of Item, that Messages are given by the Property, and that Phrases and Constituents tie them together. The microplanner meets the planner and must figure out which edges make sense to keep and which ones to throw away. From there on it is a pretty straightforward surface realization.
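
As a rough illustration of the pipeline described above (statements become Messages, the Item type selects a DocumentPlan, the Property selects a Phrase, and the result is realized as surface text), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Message` class, the `DOCUMENT_PLANS` and `PHRASES` tables, the function names, and the sample statements are hypothetical and are not taken from the book, from Wikidata, or from the ArticlePlaceholder extension. It also sidesteps the hard parts mentioned in this thread, such as grammatical gender and plural forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One fact derived from a single statement (hypothetical structure)."""
    subject: str
    prop: str
    value: str


# Hypothetical document plans: item type -> properties to express, in order.
DOCUMENT_PLANS = {
    "human": ["occupation", "date of birth"],
}

# Hypothetical phrase templates: property -> English sentence pattern.
PHRASES = {
    "occupation": "{subject} is a {value}.",
    "date of birth": "{subject} was born in {value}.",
}


def build_messages(subject, statements):
    """Turn (property, value) pairs into Message objects."""
    return [Message(subject, prop, value) for prop, value in statements]


def plan_document(item_type, messages):
    """Keep only the messages the plan asks for, in plan order."""
    order = DOCUMENT_PLANS.get(item_type, [])
    by_prop = {m.prop: m for m in messages}
    return [by_prop[p] for p in order if p in by_prop]


def realize(plan):
    """Surface realization: fill the phrase template for each planned message."""
    return " ".join(
        PHRASES[m.prop].format(subject=m.subject, value=m.value)
        for m in plan
        if m.prop in PHRASES
    )


# Illustrative statements only; "eye colour" is dropped because no plan uses it.
statements = [
    ("date of birth", "1952"),
    ("occupation", "writer"),
    ("eye colour", "brown"),
]
text = realize(plan_document("human", build_messages("Douglas Adams", statements)))
print(text)
```

The planning step is where statements are thrown away: anything the plan for the item type does not list never reaches the realizer. A per-language `PHRASES` table would be the simplest extension, though fixed templates like these are exactly what tends not to feel "natural".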

But then you can build intervened models of neural nets and perhaps end up with something readable in a much simpler way.

As I said: We will not be generating text.

Given the resistance against using property values in templates... I don't know if a pure wikidata-item-special-page will be accepted. I really don't know.

leila added a subscriber: leila. Jul 12 2015, 11:02 PM
Lydia_Pintscher renamed this task from "article placeholder based on Wikidata data" to "[Epic] Article placeholder based on data from Wikidata". Aug 14 2015, 11:36 AM

What's the subtask on how to select which subjects are relevant enough to be served as article placeholders? I see T109437: [RFC] Get to article placeholder from red link/404 article but it doesn't mention selection; I'll comment there.

@Nemo_bis: We'll start with displaying links to it in search so T109438 would be a good place to brainstorm about when to show links to placeholders in search results similar to how Italian Wikipedia does it already.

-jem- added a subscriber: -jem-. Nov 5 2015, 11:51 AM

@Lucie I think you can close this once you've uploaded the pdf to Commons and linked it here

\o/

hoo changed the status of subtask T109458: [Story] CDN cache article placeholders from Open to Stalled. Aug 14 2016, 5:29 PM
Lucie added a comment. Aug 30 2016, 1:36 PM

My Bachelor's thesis, "Generating Article Placeholders from Wikidata for Wikipedia - Increasing Access to Free and Open Knowledge", can be found here:
https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_Access_to_Free_and_Open_Knowledge.pdf

Lucie closed this task as Resolved. Aug 30 2016, 1:36 PM
Lucie moved this task from Broader tickets to Done on the ArticlePlaceholder board.
hoo changed the status of subtask T109458: [Story] CDN cache article placeholders from Stalled to Open. Jan 10 2017, 5:40 PM
Addshore removed a subscriber: Addshore. Apr 3 2017, 4:27 PM