Heritage Monuments database/API seems to be lacking Wiki page links
Open, Stalled, Normal, Public


I was checking out Iran monuments on the Heritage API and it seems like they're loaded, but the links to the wiki pages unfortunately aren't there, even the bluelinks.

Note that those results are coming from here, where there are definitely pages that exist.

Event Timeline

mahmoud created this task. Aug 13 2016, 6:10 AM
Restricted Application added a subscriber: Aklapper. Aug 13 2016, 6:10 AM

The mapping assumes that the wiki article is the page which the name parameter wikilinks to.

Looking at the source for the page, however, it seems as though the links are being generated either by another parameter or by the template assuming that the name parameter can always be used as a wikilink?

mahmoud added a comment. Edited Aug 15 2016, 7:00 AM

So the "name" parameter here is "توضیح" which technically means "description" if you ask me.

Could you point me to the logic that determines the (localized) "name" parameter and also a valid/working table of this nature? Thanks!

So (if I understand things correctly) the template today simply wraps name/توضیح/description in [[ ]] to try and create links to articles. That is why most of them are red links.

So توضیح=test is rendered as [[test]].

What the bot currently expects is that the link is made explicit in the parameter, i.e. for the above example it would expect توضیح=[[test]].

The reason for this is that the bot (as it currently works) has no way to know whether a link is red or blue, so with the current way the parameters are handled it would have to assume that every target exists, which is obviously not true. The benefit of the توضیح=[[test]] method is that you only link the text where there is an article, and of course you can use normal disambiguation such as توضیح=[[test (the other one)|test]].
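
To make the difference concrete, here is a minimal Python sketch of how an explicit link could be pulled out of a parameter value. It is not the bot's actual mapping code; the function name `extract_article` and the regular expression are made up for illustration.

```python
import re

# Matches the first [[target]] or [[target|label]] in a parameter value.
# Hypothetical pattern for illustration, not the harvester's real one.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_article(value):
    """Return the target of the first explicit wikilink, or None."""
    match = WIKILINK_RE.search(value)
    if match is None:
        # Plain text such as توضیح=test: nothing says an article exists.
        return None
    return match.group(1).strip()
```

With this approach a plain value gives no article, while `[[test (the other one)|test]]` gives the disambiguated title rather than the displayed label.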

Some countries also have the article link as a separate parameter in the template; the end result is similar to using توضیح=[[test]].

The mapping logic for the various parameters can be found here.

Lokal_Profil changed the task status from Open to Stalled. Sep 1 2017, 7:05 PM

Setting this to stalled as it requires a change on the wiki side first to work (or a mechanism to validate links, but that sounds expensive).

LilyOfTheWest triaged this task as Normal priority. Sep 4 2017, 8:45 PM
LilyOfTheWest added a subscriber: LilyOfTheWest.

@Lokal_Profil can you let us know what change should be made on the fawiki side?

Lokal_Profil added a comment. Edited Sep 5 2017, 11:20 AM

So for the monuments database side I think (@JeanFred correct me if you disagree) it might be too expensive to evaluate, during each harvest, whether the auto-generated link goes to an existing page or not. Allowing the article to be harvested would then require either a new parameter which gives the target for the wikilink (when one exists) or that the name/توضیح field is changed to include [[ ]] syntax when the value should be wikilinked. Both these solutions also allow you to link to articles that don't have the exact same name as the monument (e.g. due to disambiguation). Out of these two I would recommend the first.

Note that Iran is not the only country with this issue; it is also true for many (most?) of the Latin American lists (PE, VE, CL, BO, CO, UY at least). These autowikilink the name unless a link target parameter is provided, in which case that is used instead, or a second link target parameter is used, in which case you can link to multiple articles from the name field. So I would suggest we figure out a general solution before we start recommending that any country make changes.

From a migration point of view (i.e. T174966: Move WLM-IR data to Wikidata): if the links get harvested by the monuments database then migrating them is not an issue. If they don't, it is possible to check whether an article exists with the same title as the name/توضیح field and is not a redirect, and if so use that as the target item on Wikidata. This requires that you are confident that these links are always (or in the large majority of cases) appropriate, because we will be inserting monument-specific statements into the corresponding item.
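
As a rough idea of what that existence check could look like, here is a hedged Python sketch built around the MediaWiki `action=query` API with `prop=info`. The function names are hypothetical, the response shape is the `formatversion=2` layout as I understand it, and the HTTP request itself is left out.

```python
def build_query_params(titles):
    """Parameters for one batched lookup (helper name is hypothetical)."""
    return {
        "action": "query",
        "prop": "info",             # info carries the redirect flag
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",       # pages come back as a list
    }


def usable_titles(response):
    """Titles that exist and are not redirects, from a parsed response."""
    usable = set()
    for page in response.get("query", {}).get("pages", []):
        # Skip red links, invalid titles and redirects.
        if page.get("missing") or page.get("invalid") or page.get("redirect"):
            continue
        # The API returns normalized titles, which may differ from the input.
        usable.add(page["title"])
    return usable
```

Batching many titles per request keeps the number of calls down, but whether that is cheap enough to run for every dataset on every harvest is exactly the open question here.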

Found this one again. @JeanFred do you agree that validating the links (likely for all datasets) would be expensive or do you think that is doable?

In general do we want to recommend any particular solution?