Create KMB preview page
Closed, Resolved · Public · 4 Estimated Story Points

Description

Set up a preview page of the generated information templates to spot any obvious errors.

Event Timeline

New preview being generated based on PR #21. When ready invite RAÄ to comment.

Note that images are expected to be over-categorised until the new list for T160054 gets added.

Note on preview -- Category:Area of national interest... should be ...Areas... as commonscats use plural forms.

And another: 16001000007155 -- Should be Churches in Gotland is categorized as both Churches in Gotland and Buildings in Gotland. The latter is not necessary.

Similarly, 16001000284664 -- BBR_unknown is both in Farms in Sweden and Agriculture in Sweden.

A minor one: Category:Media contributed by RAÄ: with potential duplicates doesn't need the colon in the middle.

Note on preview -- Category:Area of national interest... should be ...Areas... as commonscats use plural forms.

Updated in mappings. Also added the new Träindustri mapping.

And another: 16001000007155 -- Should be Churches in Gotland is categorized as both Churches in Gotland and Buildings in Gotland. The latter is not necessary.

Similarly, 16001000284664 -- BBR_unknown is both in Farms in Sweden and Agriculture in Sweden.

This is T160054 and should be resolved with that.

A minor one: Category:Media contributed by RAÄ: with potential duplicates doesn't need the colon in the middle.

I have used this (i.e. Category:Media contributed by RAÄ: ...) as the standard pattern for all of the maintenance categories.

In Hangout, @Ainali wrote:

In the first example, the link to bbr (the one in the description) doesn't work. It does work without the "m", though: http://kulturarvsdata.se/raa/bbr/html/21200000003273

This seems to be due to an error in RAÄ's import. A new harvest is on the way. To fix these on wiki the "m" can be removed from the BBR template.

Also, not sure about the mapping of stadsmiljö to Cityscapes. It works for the images in the preview, but I'm sure there will be photos where it doesn't.

Ping @Ainali on this one

I created the two red content categories.

I'll create the meta categories after the final processing (to ensure only needed ones are created)

It was the best I could find. If you are unsure it is good enough, feel free to remap or remove.

After looking through more examples -- it does fit in many cases, but I'd say there are equally many cases where it doesn't (examples 1, 2, 3). But then it looks like it's not possible to tell apart the _actual_ cityscape photos.

@Ainali @Alicia_Fagerving_WMSE : So which is the recommendation? Remove, remap or leave?

I looked for some alternatives but the one which sounded best (c:Category:Urban areas in Sweden) seems to fill a different purpose.

Another one: 16001000056906 -- PD & Archaeological on muni level is only categorized as Archaeological monuments in... by municipality but not by province, even though the source page does have landskap.

@Ainali @Alicia_Fagerving_WMSE : So which is the recommendation? Remove, remap or leave?

I looked for some alternatives but the one which sounded best (c:Category:Urban areas in Sweden) seems to fill a different purpose.

I'd vote for remove as there's too many false positives to make it worthwhile. Edit: photos with both stadsmiljö and a description starting with vy från or vy över generally look more cityskapey, but it's still not perfect.

No support for that sort of combination is available right now.
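For reference, the combined check suggested above could look roughly like the sketch below. This is only an illustration of the idea, not something the upload tool supports; the function name and the input shape (a list of keywords plus a description string) are assumptions.

```python
# Hypothetical sketch: treat an image as a likely cityscape only when it has
# the "stadsmiljö" keyword AND its description starts with "vy från"/"vy över".
VIEW_PREFIXES = ("vy från", "vy över")


def looks_like_cityscape(keywords, description):
    """Return True if both conditions from the discussion hold."""
    has_keyword = "stadsmiljö" in {k.lower() for k in keywords}
    # str.startswith accepts a tuple of prefixes
    starts_with_view = description.strip().lower().startswith(VIEW_PREFIXES)
    return has_keyword and starts_with_view
```

As noted, even this combination is not perfect, so it would still produce some false positives.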

Another one: 16001000056906 -- PD & Archaeological on muni level is only categorized as Archaeological monuments in... by municipality but not by province, even though the source page does have landskap.

Not sure I understand. If it gets a match on municipal level then it (by design) skips both the county and province categories.

Shouldn't there be a double categorization for archaeological monuments? Both a real one and a historical one.

Question: In "Originalbeskrivning", is it possible to list the categories in the same order as they appear in KMB? In KMB you can see the hierarchy and understand the "tree" but in Originalbeskrivning they are all mixed up (possibly in alphabetical order instead).

Another thought: In the preview object 16001000010979 -- BBR_b there are links in KMB to Wikipedia articles. Could they be used somehow?

Question: In "Originalbeskrivning", is it possible to list the categories in the same order as they appear in KMB? In KMB you can see the hierarchy and understand the "tree" but in Originalbeskrivning they are all mixed up (possibly in alphabetical order instead).

Nope. Because in Kulturarvsdata they are returned in alphabetical order (hence the need for the file in T160054: Only lowest itemClassName level should be used as a category) =/

Another thought: In the preview object 16001000010979 -- BBR_b there are links in KMB to Wikipedia articles. Could they be used somehow?

Nope. Again this is not available in Kulturarvsdata (AFAIK). It is likely supplied by the UGC-hub instead.

Another one: 16001000056906 -- PD & Archaeological on muni level is only categorized as Archaeological monuments in... by municipality but not by province, even though the source page does have landskap.

Not sure I understand. If it gets a match on municipal level then it (by design) skips both the county and province categories.

Shouldn't there be a double categorization for archaeological monuments? Both a real one and a historical one.

Not sure that is how it's been done. But I'll take a look. If so then yes drop County but always add Province

@Alicia_Fagerving_WMSE: You are right. Re-adding it in T167350: Always add province category for fmis entries (Patch is ready)
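A minimal sketch of the geographic category logic as described in this thread: the most specific match wins and coarser levels are skipped, except that fmis entries always get the province category too. The function name, the dict-based input and the behaviour when only a county matches are assumptions, not the actual patch.

```python
def geo_categories(matches, is_fmis=False):
    """Pick geographic categories, most specific level first.

    `matches` maps a level ("municipality", "county", "province") to a
    category name; a missing key means nothing matched on that level.
    All names here are hypothetical.
    """
    cats = []
    for level in ("municipality", "county", "province"):
        if level in matches:
            cats.append(matches[level])
            break  # a more specific match skips the coarser levels
    # fmis entries always get the province category as well
    province = matches.get("province")
    if is_fmis and province and province not in cats:
        cats.append(province)
    return cats
```

With matches on all three levels, a non-fmis image gets only the municipality category, while an fmis image gets municipality plus province.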

@Ainali @Alicia_Fagerving_WMSE
I have now updated the preview page with all of the latest changes (incl. T167352: Allow multiple primary tags and T167350: Always add province category for fmis entries).

In general I'm happy with the results. There are a few weird things such as "exteriörer" resulting in "Church exteriors in Sweden" being preferred over "Churches in Göteborg Municipality". But I think the few exceptions are worth it. Note that there are other categories placing the image in Göteborg Municipality (listed buildings) and if there weren't then a civil parish or municipality category would have been added explicitly.

Question: In "Originalbeskrivning", is it possible to list the categories in the same order as they appear in KMB? In KMB you can see the hierarchy and understand the "tree" but in Originalbeskrivning they are all mixed up (possibly in alphabetical order instead).

Nope. Because in Kulturarvsdata they are returned in alphabetical order (hence the need for the file in T160054: Only lowest itemClassName level should be used as a category) =/

Note that with the primary_classes filtering this almost always gets reduced to just one or two words making the above less of an issue.

Yes, that's good.

I noticed something else though, on 16001000540747 -- meta_cat the [[:Category:Excavations in Sweden]] was removed, most likely because it was not on level three. However, it was lowest in its "branch" and would have made sense to keep. I guess I asked the wrong question when I asked for the lowest categories, because it seems like I got all on level 3, but what we really want is all with no child. Does it make sense to make a new request (i.e. would it help?)?

Question: for 16001000531388 -- other_version, will we add "| other versions =" on the existing images too (with our upload in it) or is that too hard to do? If it is, could we produce a list of all uploads using other version so it could be added manually?

Another idea, that might be a bit more work. 16001000531388 -- other_version is getting the template {{Fornminne|10097601050001}} added and I notice that Category:Visby ringmur already has the template too, with the same id. Would it make sense to look at all categories that have the template and see if the id matches our uploads and if so, add the category to the image?

Note that with the primary_classes filtering this almost always gets reduced to just one or two words making the above less of an issue.

Yes, that's good.

I noticed something else though, on 16001000540747 -- meta_cat the [[:Category:Excavations in Sweden]] was removed, most likely because it was not on level three. However, it was lowest in its "branch" and would have made sense to keep. I guess I asked the wrong question when I asked for the lowest categories, because it seems like I got all on level 3, but what we really want is all with no child. Does it make sense to make a new request (i.e. would it help?)?

Ah. I assumed that there were always three levels. But yes definitely it is the lowest level that we want (independently of which level it is) so it is probably worth making a new request.
T167387: New harvest of lowest classes
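The difference between "everything on level 3" and "everything with no child" can be sketched as below. The parent mapping and the function name are illustrative assumptions; the real class list comes from the harvest.

```python
def leaf_classes(parent_of):
    """Return the classes that are not the parent of any other class.

    `parent_of` maps each class name to its parent (None for a root).
    Depth does not matter: a class on level 2 with no children is kept.
    """
    parents = {p for p in parent_of.values() if p is not None}
    return sorted(c for c in parent_of if c not in parents)
```

For a made-up tree where A is the root, B and D sit under A, and C sits under B, this returns C and D, so D is kept even though it is not on level three.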

Question: for 16001000531388 -- other_version, will we add "| other versions =" on the existing images too (with our upload in it) or is that too hard to do? If it is, could we produce a list of all uploads using other version so it could be added manually?

No. We would only add it to the images uploaded in this batch. That said, many of these images will be exactly the same (so the upload will be skipped) and for the rest we have a maintenance category. Since they are always the same image (just different crops/resolutions) I think most of those instances will be solved by one image being deleted.

Another idea, that might be a bit more work. 16001000531388 -- other_version is getting the template {{Fornminne|10097601050001}} added and I notice that Category:Visby ringmur already has the template too, with the same id. Would it make sense to look at all categories that have the template and see if the id matches our uploads and if so, add the category to the image?

I'm not sure it is something we want to do as part of the batch upload tools. It is then better to systematically add these as commonscat in the wikidata entry/wlm list. Might be doable through one of Magnus' tools?

I noticed though that I don't seem to be harvesting the fmis/bbr entries from wikidata. I'll see if I can add that.
T167386: Harvest commonscat from Wikidata entries for bbr/fmis
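For illustration only, the id-matching idea raised above (which is not planned as part of the batch upload tools) could be sketched like this. It assumes the wikitext of the file and of the candidate category pages is already available as strings; the function names and the simple one-parameter form of the template are assumptions.

```python
import re

# Matches the simple form {{Fornminne|<digits>}}; other parameter layouts
# are not handled in this sketch.
FORNMINNE_RE = re.compile(r"\{\{\s*Fornminne\s*\|\s*(\d+)\s*\}\}", re.IGNORECASE)


def fornminne_ids(wikitext):
    """Return the set of ids used in Fornminne templates in the text."""
    return set(FORNMINNE_RE.findall(wikitext))


def categories_for_file(file_wikitext, category_pages):
    """Return names of categories sharing a Fornminne id with the file.

    `category_pages` maps a category name to that page's wikitext.
    """
    ids = fornminne_ids(file_wikitext)
    return sorted(
        name for name, text in category_pages.items()
        if ids & fornminne_ids(text)
    )
```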

One more update of the preview has now been done.

Unless someone finds any glaring mistakes I'll try the test upload tomorrow.

Lokal_Profil claimed this task.