
Show an auto-generated article description when a user-contributed one is unavailable.
Closed, Declined (Public)

Description

For articles that do not have a user-contributed Wikidata description, it would be beneficial to present an auto-generated description, using an existing tool such as Reasonator (http://tools.wmflabs.org/reasonator) or AutoDesc (http://tools.wmflabs.org/autodesc), or a derivative thereof.

At worst, the result would prompt potential contributors that articles *can* have descriptions, and that the current article needs a better one. And at best, the auto-generated description might be good enough to obviate the need for a human-generated description altogether.

Event Timeline

Dbrant raised the priority of this task to Needs Triage.
Dbrant updated the task description.
Dbrant subscribed.

I think this could be added to the node.js service if needed, so the app itself would not need to change for this feature alone.
But before we add something like this, the app would need to let users edit the description in the app and save it to Wikidata.

Agreed. In principle, I'm okay with showing automatically generated descriptions that are somewhat suboptimal for whatever reason, but only if the user can fix them. That's the true spirit of Wikipedia.

@Tgr, we just talked about this in the team. T99895: [Epic] Article placeholder based on data from Wikidata is about the situation where there is a Wikidata item but no article in the user's language. The idea is to show the user, instead of a "nothing found" error message, a nice rendering of what we know about the topic, and possibly help them start a new article in their language.

This ticket, as far as I understand, is about the situation where there is an article in the user's language but no Wikidata description in that language. It seems to suggest either extracting an abstract from the existing article or auto-generating a description from the Wikidata statements. The Wikidata team's conclusion is that both are most probably bad ideas. For example, they give the wrong impression that user contributions are no longer needed. They would also create a conflict: users may delete descriptions that happen to be identical to the auto-generated one, leaving third-party users of the data with nothing. @Lydia_Pintscher can explain this better, I think.

What Thiemo says. I think auto-generating descriptions is a very bad idea. They are just not good enough at this point to really capture why a certain concept is important.

Why can't we have both?
There seems to be a desire to explore this, so I think we should try it, and if the results turn out terrible, then we learn something.

With my coding hat on: a description could also have an associated flag (manual/automatic) that is returned by the API, and in queries a caller could specify whether automatically generated descriptions are acceptable.
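The manual/automatic flag could look roughly like the following minimal Python sketch. Everything in it is hypothetical: `Source`, `Description`, `get_description`, and the `allow_auto` parameter are illustrative names, not part of the Wikidata or Wikibase API, and descriptions are modelled as plain per-language dicts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Source(Enum):
    """Hypothetical flag telling callers where a description came from."""
    MANUAL = "manual"
    AUTOMATIC = "automatic"


@dataclass(frozen=True)
class Description:
    text: str
    language: str
    source: Source


def get_description(
    manual: Dict[str, str],
    auto: Dict[str, str],
    language: str,
    allow_auto: bool = False,
) -> Optional[Description]:
    """Return the human-written description if one exists.

    Only when the caller opts in (allow_auto=True) fall back to the
    automatic description, flagged as AUTOMATIC so it can be shown
    differently or filtered out.
    """
    if language in manual:
        return Description(manual[language], language, Source.MANUAL)
    if allow_auto and language in auto:
        return Description(auto[language], language, Source.AUTOMATIC)
    return None
```

Keeping `allow_auto` off by default means third-party consumers that only want human-written text see no change, while clients that opt in can still tell the two kinds apart.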

Personally, I would love to use this. I have a bunch of ideas in my head, and I'm sure @Magnus does too, so why not enable your users to use it? This is how innovation starts :)

Anyway, isn't this how Wikipedia got traction? Auto-generated town/city stubs that people dived into and edited. Why wouldn't we want to try something like this while Wikidata is in its infancy?

Then let's clearly set expectations and criteria beforehand as to what constitutes success and why we're actually doing it. Just because we want to/can isn't enough imho.

FWIW: The auto-generated town articles you mention are imho a perfect example of something gone wrong. A significant percentage of them are still around - largely unchanged and outdated - and broken on wikis other than enwp where they were done.

@Lydia is right that the bot-generated articles were/are generally considered a nuisance. But that is mainly, as you said, because they were never updated. It is, thus, a point against flooding the manual description field with automatic descriptions. It is not an argument against dynamically generated ones, even if they are cached in a field, which would, IMHO, be the ideal solution: in addition to label/alias/description, wb_terms could have an "autodesc" type that is updated as required. However, that requires tight integration with Wikibase, which I don't see as the way forward right now. Code review by WMF has proven to be far too slow and inflexible for developing new code, especially code as complex as automatic descriptions would require.
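A dynamically generated description that is cached in a field and "updated as required" could work roughly like this. The sketch assumes, hypothetically, that each cached entry records the item revision it was generated from and is rebuilt whenever the item's current revision differs. `AutodescCache` and the `generate` callback are illustrative names, not Wikibase code, and an in-memory dict stands in for the wb_terms table.

```python
from typing import Callable, Dict, Tuple


class AutodescCache:
    """Hypothetical cache of one 'autodesc' term per (item, language),
    tagged with the item revision it was generated from."""

    def __init__(self, generate: Callable[[str, str], str]) -> None:
        # generate(item_id, language) -> freshly built description text
        self._generate = generate
        self._store: Dict[Tuple[str, str], Tuple[int, str]] = {}

    def get(self, item_id: str, language: str, current_revision: int) -> str:
        key = (item_id, language)
        cached = self._store.get(key)
        if cached is not None and cached[0] == current_revision:
            # Item unchanged since generation: reuse the cached text.
            return cached[1]
        # Missing or stale: regenerate and remember which revision it reflects.
        text = self._generate(item_id, language)
        self._store[key] = (current_revision, text)
        return text
```

Because a stale entry is regenerated instead of being left in place, the cached text cannot drift out of date the way never-updated bot stubs did.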

I think the way forward is to develop the algorithm independently, as a Labs tool or on a dedicated VM, and keep it flexible enough to plug into Wikibase once it works properly for many languages and most items.

Bot-generated articles are a stray argument. Precisely *because* bot-generated articles were an issue, we need to cover the topics nobody has bothered to cover without expensive static free text, in a sustainable way, as Wikidata statements do.

As for "wrong impression that user contributions are not needed any more", I doubt it. The moment someone bothers to actually create an article, description or whatever, then it's automatically out of the long tail of topics nobody bothers about.

> This ticket here is, as far as I understand, about the situation where there is an article in the user's language but no Wikidata description in that language.

It would be useful to clarify the task title/description. In T99895, as here, 95% of the work is in i18n/language engineering to make the text acceptable in many languages.

If the user doesn't like the auto-description, they can fix the two or three Wikidata statements used to generate it. This fixes the description in all languages, so this is what we should be encouraging. Fixing the English description only fixes the English description.
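To illustrate why fixing a few statements fixes the description in every language, here is a minimal Python sketch of template-based generation. The item, the labels, and the per-language templates are hypothetical and heavily simplified (a real generator would look labels up from the target items); P31 ("instance of"), P131 ("located in the administrative territorial entity"), and P17 ("country") are real Wikidata properties, used here only as dict keys.

```python
from typing import Dict, Optional

# Hypothetical label table: target item -> language -> label.
LABELS: Dict[str, Dict[str, str]] = {
    "street": {"en": "street", "nl": "straat"},
    "dongen": {"en": "Dongen", "nl": "Dongen"},
    "netherlands": {"en": "the Netherlands", "nl": "Nederland"},
}

# Hypothetical per-language templates keyed by property ID.
TEMPLATES: Dict[str, str] = {
    "en": "{P31} in {P131}, {P17}",
    "nl": "{P31} in {P131}, {P17}",
}


def auto_description(statements: Dict[str, str], language: str) -> Optional[str]:
    """Fill the language's template from the item's statements.

    Returns None when there is no template for the language, a label is
    missing, or a property the template needs is absent.
    """
    template = TEMPLATES.get(language)
    if template is None:
        return None
    try:
        values = {prop: LABELS[target][language] for prop, target in statements.items()}
        return template.format(**values)
    except KeyError:
        return None


# Hypothetical item: a street in Dongen, the Netherlands.
item = {"P31": "street", "P131": "dongen", "P17": "netherlands"}
print(auto_description(item, "en"))  # street in Dongen, the Netherlands
print(auto_description(item, "nl"))  # straat in Dongen, Nederland
```

Correcting the P131 value in `item` changes the output in both languages at once. Real languages need much more than string templates (grammatical gender, case, word order), which is where most of the i18n effort discussed in this thread would go.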

I am shocked that the description of this bug does not mention improving localisation, which, for me, is the most important aspect of this bug.

Currently quite a lot of the existing descriptions on Wikidata are bot created.

From pressing Random Item a few times:
Q19410337 -> "street in Dongen, the Netherlands"
The bot User:RobotMichiel1972 that created this item added descriptions in Dutch and English. An automatic fallback description in the same format for other languages would be useful.

Q899453: half of the claims were created by https://www.wikidata.org/wiki/User:JhsBot and https://www.wikidata.org/wiki/User:Dexbot .

There are currently many music albums that don't have descriptions. I find it unlikely that humans will want to write descriptions for them. When searching, the fact that it's not immediately clear that these are music albums can cause problems when their names are also used in other contexts.

It's not easy for a user to see that those claims are bot-created. If automatic description fallbacks were shown the same way language fallbacks are, it would be clear that the existing descriptions are only bot-created.

The goal doesn't have to be to show why an item is important. It's simply about describing the item.
Let's say I'm searching for an item called "winner". There are two items representing music singles, Q8025588 and Q2104521. If an auto-generated description told me that they are music singles, I would know they aren't what I'm looking for, and a bit further down the list I would find Q18560095, which corresponds to the winner property. If it showed me "subclass of person" as the description for Q18560095, I could easily find it even though it's the 18th hit in the list and those two music singles came before it.