
Map infobox/photograph template for SMVK-Cypern-2017-01
Closed, Resolved. Public. 1 Estimated Story Points

Description

Create a task similar to T151040 for Mexiko. This requires that you manually look at the metadata file and analyze it to see what can be done with it to comply with Commons best practices.

Metadata doc to analyze

Keyword mappings:
Places
keywords

Filename:
<Beskrivning>_

if not in <Beskrivning> also:

<Nyckelord>_ e.g. keyword1_keyword2_
<Region, foto>_
<Ort, foto>_

followed by:

-_SMVK-MM-Cypern_-_<Fotonummer>.<ext>
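The filename rule above could be sketched roughly as follows. This is only an illustration, not the code in metadata_to_json_and_fnamesmap.py: the function name `build_filename`, the dict-per-row input, the case-insensitive "already in the description" check and the space-to-underscore replacement are all assumptions.

```python
def build_filename(row, ext="tif"):
    """Assemble a filename from one metadata row (dict of column name -> string)."""
    description = row.get("Beskrivning", "").strip()
    parts = [description] if description else []
    # Add keyword, region and place only if the description does not already mention them.
    for column in ("Nyckelord", "Region, foto", "Ort, foto"):
        value = row.get(column, "").strip()
        if value and value.lower() not in description.lower():
            parts.append(value)
    # Each part is followed by an underscore, as in <Beskrivning>_<Nyckelord>_...
    prefix = "".join(part.replace(" ", "_") + "_" for part in parts)
    return "{}-_SMVK-MM-Cypern_-_{}.{}".format(prefix, row["Fotonummer"], ext)
```

For a row that only has `Nyckelord` set to "Svenska Cypernexpeditionen" and `Fotonummer` C00326, this gives `Svenska_Cypernexpeditionen_-_SMVK-MM-Cypern_-_C00326.tif`. Stripping characters that are not allowed in Commons filenames is deliberately left out of the sketch.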

Template:
See Template:photograph documentation

{{Photograph
 |photographer       =  {{Creator:John Lindros}} OR <empty> // this leaves out 6 images that have very little metadata anyhow. OK?
 |title              = 
 |description        = {{sv| <Beskrivning>. Svenska Cypernexpeditionen 1927-1931. <br /> ''Nyckelord:'' <nyckelord>}} // 6 cases with only the latter
                                {{en|The Swedish Cyprus expedition 1927-1931}}
 |depicted people    = <Personnamn, avbildad>
 |depicted place     = {{city|<Places mapping>}} 
 |date               = <Fotodatum>
 |medium             =
 |dimensions         =
 |institution        = {{Institution:Statens museer för världskultur}}
 |department         = [[:d:Q1331646|Medelhavsmuseet]]
 |references         =
 |object history     =
 |exhibition history =
 |credit line        = 
 |inscriptions       =
 |notes              =
 |accession number   = {{SMVK-MM-link|<Länk[last digits]>|<Fotonummer>}}
 |source             = The original image file was recieved from SMVK with the following filename:  <br /> 
'''<Fotonummer>.tif''' // Double check this!
{{SMVK cooperation project|COH}}
 |permission         = {{cc-zero}}
 |other_versions     =
}}

Categories:
[[Category:<keywords mappings>]] AND [[Category:<places mappings>]] IF NOT EQUAL // possible since places is often in description

[[Category:Swedish Cyprus Expedition|Swedish Cyprus Expedition]]
[[Category:Media_from_the_National_Museums_of_World_Culture]]
[[Category:Media_contributed_by_SMVK_2017-02]]
[[Category:Media_contributed_by_SMVK_2017-02_with_faulty_depicted_person_values]]
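The "IF NOT EQUAL" rule for keyword and place categories amounts to merging two lists without duplicates. A minimal sketch, assuming both mappings have already been resolved to plain category names; the function names are hypothetical and not taken from create_infotexts.py.

```python
def merge_categories(keyword_cats, place_cats):
    """Combine keyword and place categories, keeping the first occurrence of each."""
    merged = []
    for cat in list(keyword_cats) + list(place_cats):
        if cat and cat not in merged:
            merged.append(cat)
    return merged


def format_categories(categories):
    """Render category names as one wikitext category link per line."""
    return "\n".join("[[Category:{}]]".format(cat) for cat in categories)
```

Calling `merge_categories(["Soli"], ["Soli", "Cyprus"])` yields `["Soli", "Cyprus"]`, so a place that is already covered by a keyword category is not added twice. The category names in that call are only examples.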

Event Timeline

Mexiko template for reference:

Template:
{{Photograph
 |photographer       =  <Personnamn / fotograf>
 |title              = 
 |description        = {{sv|<Beskrivning>. <Händelse / var närvarande vid>. <Motivord>}}
                                {{en|Translated <Händelse / var närvarande vid>}}
 |depicted people    = <Personnamn, avbildad>
 |depicted place     = {{city|<Places mapping>}} 
 |date               = <Fotodatum>
 |medium             =
 |dimensions         =
 |institution        = {{Institution:Statens museer för världskultur}}
 |department         = {{q:Q1371375|Etnografiska muséet}}
 |references         =
 |object history     =
 |exhibition history =
 |credit line        = 
 |inscriptions       =
 |notes              =
 |accession number   =
 |source             = The original image file was recieved from SMVK with the following filename:  <br /> 
'''<Fotonummer>.tif'''
{{SMVK cooperation project|COH}}
 |permission         = {{cc-zero}}
 |other_versions     =
}}
Categories:
[[Category:<Places mapping>]] AND/OR [[Category:<useful cat2>]]

OR
[[Category:Images_from_SMVK-EM_without_full_description]]
[[Category:Images_from_SMVK-EM_2016-11]]

Only five photos in the metadata lack a description:

| Fotonummer | Postnr. | Nyckelord | Beskrivning | Land, foto | Region, foto | Ort, foto | Geograf namn, alternativ | Fotodatum | Personnamn / fotograf | Personnamn / avbildad | Sökord | Händelse / var närvarande vid | Länk |
| C00326 | 4004279 | Svenska Cypernexpeditionen | | | | | | 1927-1931 | Lindros, John | | | | http://kulturarvsdata.se/SMVK-MM/Photograph/html/4004279 |
| Csn145 | 4076840 | Svenska Cypernexpeditionen | | | | | | | Lindros, John | | | | http://kulturarvsdata.se/SMVK-MM/Photograph/html/4076840 |
| Csn146 | 4076846 | Svenska Cypernexpeditionen | | | | | | | Lindros, John | | | | http://kulturarvsdata.se/SMVK-MM/Photograph/html/4076846 |
| Csn147 | 4076851 | Svenska Cypernexpeditionen | | | | | | | Lindros, John | | | | http://kulturarvsdata.se/SMVK-MM/Photograph/html/4076851 |
| Csn148 | 4076856 | Svenska Cypernexpeditionen | | | | | | | Lindros, John | | | | http://kulturarvsdata.se/SMVK-MM/Photograph/html/4076856 |
| Csn149 | 4076861 | Svenska Cypernexpeditionen | | | | | | | Lindros, John | | | | http://kulturarvsdata.se/SMVK-MM/Photograph/html/4076861 |

21 images in the metadata lack a photographer, see P4850

@Lokal_Profil Are these OK?

|photographer       =  {{Creator:John Lindros}} OR <empty> // this leaves out 6 images that have very little metadata anyhow. OK?

|description        = {{sv|<Beskrivning>}} OR {{sv|<Nyckelord>}} // 6 cases with only the latter
                                {{en|The Swedish Cyprus expedition 1927-1931}}

@Mattias_Ostmar-WMSE

  • Q: Is below correct? A: yes
|source             = The original image file was recieved from SMVK with the following filename:  <br /> 
'''<Fotonummer>.tif''' // Double check this!

@Lokal_Profil @Jopparn The images are currently licensed CC-BY-NC-ND (e.g. http://collections.smvk.se/carlotta-mhm/web/object/4258740) - should I ask Magnus to change this, like with the Mexiko images?

@Lokal_Profil Are these OK?

|photographer       =  {{Creator:John Lindros}} OR <empty> // this leaves out 6 images that have very little metadata anyhow. OK?

These should be fine. Only issue is how this affects the copyright if photographer is unknown. We'll check with MVK later today.

|description        = {{sv|<Beskrivning>}} OR {{sv|<Nyckelord>}} // 6 cases with only the latter
                                {{en|The Swedish Cyprus expedition 1927-1931}}

English looks OK. For the Swedish one I would suggest doing something similar to the Mexico collection, i.e.

|description        = {{sv| <Beskrivning>. <händelse>. <br /> ''Nyckelord:'' <nyckelord>}}

Where:

  • <händelse> is the Swedish version of "The Swedish Cyprus expedition 1927-1931" (likely always the same string),
  • "Nyckelord:" is only present if <nyckelord> is not empty.

As for whether to allow images without <Beskrivning> (but with <nyckelord>): if it's only 6 cases, I would say upload them but tag them with a maintenance category.
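The description pattern suggested above, including the rule that "Nyckelord:" only appears when there are keywords, could look something like this. It is a sketch under assumptions: `build_description` is a hypothetical name, and the Swedish expedition string is taken from the template in the task description rather than from the metadata.

```python
EXPEDITION_SV = "Svenska Cypernexpeditionen 1927-1931"
EXPEDITION_EN = "The Swedish Cyprus expedition 1927-1931"


def build_description(beskrivning, nyckelord):
    """Build the |description value; either argument may be an empty string."""
    sv_parts = []
    if beskrivning:
        # Make sure the description ends with exactly one full stop.
        sv_parts.append(beskrivning.rstrip(".") + ".")
    sv_parts.append(EXPEDITION_SV + ".")
    sv = " ".join(sv_parts)
    if nyckelord:
        # The keyword label is only added when there is something to list.
        sv += " <br /> ''Nyckelord:'' " + nyckelord
    return "{{sv|" + sv + "}}\n{{en|" + EXPEDITION_EN + "}}"
```

For the images that lack <Beskrivning>, the Swedish text then starts directly with the expedition string; whether such an image also gets a maintenance category would be decided by the caller.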

@Mattias_Ostmar-WMSE

|source             = The original image file was recieved from SMVK with the following filename:  <br /> 
'''<Fotonummer>.tif''' // Double check this!

With the addition of {{SMVK_cooperation_project|COH|museum=MM}} underneath and the typo corrected (recieved -> received), I think that should work.

@Lokal_Profil @Jopparn The images are currently licensed CC-BY-NC-ND (e.g. http://collections.smvk.se/carlotta-mhm/web/object/4258740) - should I ask Magnus to change this, like with the Mexiko images?

Yes this needs to be updated before we upload.

In addition to the above:

  • accession number should be {{SMVK-MM-link|1=foto|2=<Postnr.>|3=<Fotonummer>}}
  • [[ Category:Media_from_the_National_Museums_of_World_Culture]] is likely not needed since it is added automatically by {{SMVK_cooperation_project}}
  • department should be [[:d:|Medelhavsmuseet]]
  • accession number should be {{SMVK-MM-link|1=foto|2=<Postnr.>|3=<Fotonummer>}} - No, SMVK-MM-link differs from SMVK-EM-link
  • [[ Category:Media_from_the_National_Museums_of_World_Culture]] is likely not needed since it is added automatically by {{SMVK_cooperation_project}}
  • department should be [[:d:|Medelhavsmuseet]]
  • department should be [[:d:|Medelhavsmuseet]]

Something got lost in the copy-paste in my original code. This should have been
[[:d:Q1331646|Medelhavsmuseet]], of course.

  • accession number should be {{SMVK-MM-link|1=foto|2=<Postnr.>|3=<Fotonummer>}} - No, SMVK-MM-link differs from SMVK-EM-link

Per an off-Phabricator discussion the correct format is {{SMVK-MM-link|1=<Postnr.>|2=<Fotonummer>}} (my bad)

I am discussing with SMVK about the licensing. They are looking into it, and are working on a broader policy change toward openness :-).

The infobox creation is done in the script create_infotexts.py, but the filename creation is done in metadata_to_json_and_fnamesmap.py.

Reopen since we need some further definitions.

|depicted place = {{city|<Places mapping>}}

I suggest we prioritize wikidata links here like so. Is this reasonable?

| depicted place ={{city|1=<wikidata item from places mapping>|link=wikidata}} if exists OR ELSE {{city|<commons category from places mapping>|link=commons}}

However, the documentation had no examples of the use of the link parameter, so I hope this works as intended.

UPDATE: All places have wikidata items, so the result is that every image with <Ort, foto> matching the places will be mapped to wikidata and none to commons categories. Is this OK @Lokal_Profil?

Sometimes there are multiple people depicted in an image, e.g. "Lindros, John, Otterman, Gudrun, Westholm, Alfred".

The function flip_names() in batchupload.helpers almost takes care of it, but the string would first need to be split on every second ", ", e.g. like so.

For reference, this is the pattern I used for the |depicted people field, since there were so few people to map:

def depicted_people_mapping(name_string_or_list):
    """Map one name, or a list of names, to wikitext; several names are joined with "/"."""
    people_mapping = {
        "John Lindros":"[[Category:John Lindros|John Lindros]]", # [[:d:Q5957823|John Lindros]]
        "Lazaros Kristos":"Lazaros Kristos",
        "Alfred Westholm":"[[Category:Alfred Westholm|Alfred Westholm]]", # [[:d:Q6238028|Alfred Westholm]]
        "Erik Sjökvist":"[[Category:Erik Sjöqvist|Erik Sjöqvist]]", # [[:d:Q5388837|Erik Sjöqvist]] NB: "Q", not "K"
        "Erik Sjöqvist":"[[Category:Erik Sjöqvist|Erik Sjöqvist]]", # [[:d:Q5388837|Erik Sjöqvist]]
        "Einar Gjerstad":"[[Category:Einar Gjerstad|Einar Gjerstad]]", # [[:d:Q481299|Einar Gjerstad]]
        "Lazaros Giorkos":"Lazaros Giorkos",
        "Stefan Gjerstad":"Stefan Gjerstad",
        "Vivi Gjerstad":"Vivi Gjerstad",
        "Gudrun Otterman":"Gudrun Otterman",
        "Martin Gjerstad":"[[Category:Martin Gjerstad|Martin Gjerstad]]", # [[:d:Q16632979|Martin Gjerstad]]
        "Knut Thyberg":"[[Category:Knut Thyberg|Knut Thyberg]]", # [[:d:Q16633505|Knut Thyberg]]
        "Rosa Lindros":"Rosa Lindros",
        "Ernst Kjellberg":"[[Category:Ernst Kjellberg|Ernst Kjellberg]]", # [[:d:Q5911946|Ernst Kjellberg]]
        "Bror Millberg":"Bror Millberg"
    }
    if isinstance(name_string_or_list, list):
        # Raises KeyError for a name that is missing from the mapping.
        return "/".join(people_mapping[name] for name in name_string_or_list)
    # Otherwise a single name string is assumed.
    return people_mapping[name_string_or_list]

if not item["Personnamn / avbildad"] == "":
    if len(item["Personnamn / avbildad"].split(", ")) <= 2:
        flipped_name = helpers.flip_name(item["Personnamn / avbildad"])
        mapped_name = depicted_people_mapping(flipped_name)
        infobox += "| depicted people    = " + mapped_name
    else:
        #print("Bökig | depicted person: {}".format(item["Personnamn / avbildad"]))
        words = item["Personnamn / avbildad"].split(", ")
        if len(words) % 2 == 0:
            span = 2
            list_of_names = [", ".join(words[i:i + span]) for i in range(0, len(words), span)]
            flipped_names_list = helpers.flip_names(list_of_names)
            #print(flipped_names_list)
            mapped_people = depicted_people_mapping(flipped_names_list)
            infobox += "| depicted people    = " + mapped_people
        else:
            print("Error: not even number of names in depicted people: {}".format(item["Personnamn / avbildad"]))
else:
    infobox += "| depicted people    = "
infobox += "\n"

Full code in create_infotexts.py

I decided to create Commons categories for the people with Wikipedia articles even though I don't know how many images there are for each one.

Reopen since we need some further definitions.

|depicted place = {{city|<Places mapping>}}

I suggest we prioritize wikidata links here like so. Is this reasonable?

| depicted place ={{city|1=<wikidata item from places mapping>|link=wikidata}} if exists OR ELSE {{city|<commons category from places mapping>|link=commons}}

However, the documentation had no examples of the use of the link parameter, so I hope this works as intended.

I don't think you need the |link= parameter. Without it, any [[ https://commons.wikimedia.org/wiki/Template:City | {{City|<Wikidata id>}} ]] should be resolved in the user's language (linking to their Wikipedia). I don't think {{city|<commons category from places mapping>|link=commons}} will work, since the allowed values are quite limited.

UPDATE: All places have wikidata items, so the result is that every image with <Ort, foto> matching the places will be mapped to wikidata and none to commons categories. Is this OK @Lokal_Profil?

From a {{City}} perspective this is OK, but do we still need the categories for categorisation of the images?

The infobox creation is done in the script create_infotexts.py, but the filename creation is done in metadata_to_json_and_fnamesmap.py.

I'll try to take a look at these in the coming days. Could you generate a few examples (stick them on a subpage on Commons?)

Sometimes there are multiple people depicted in an image, e.g. "Lindros, John, Otterman, Gudrun, Westholm, Alfred".

The function flip_names() in batchupload.helpers almost takes care of it, but the string would first need to be split on every second ", ", e.g. like so.

Did you not write code to deal with this in the Mexico batch? Think I remember seeing it.

For reference, this is the pattern I used for the |depicted people field, since there were so few people to map:

    def depicted_people_mapping(name_string_or_list):
        people_mapping = {
            "John Lindros":"[[Category:John Lindros|John Lindros]]", # [[:d:Q5957823|John Lindros]]
            "Lazaros Kristos":"Lazaros Kristos",
            "Alfred Westholm":"[[Category:Alfred Westholm|Alfred Westholm]]", # [[:d:Q6238028|Alfred Westholm]]
            "Erik Sjökvist":"[[Category:Erik Sjöqvist|Erik Sjöqvist]]", # [[:d:Q5388837|Erik Sjöqvist]] OBS! "Q" inte "K"
            "Erik Sjöqvist":"[[Category:Erik Sjöqvist|Erik Sjöqvist]]", # [[:d:Q5388837|Erik Sjöqvist]]
...

I would stick these in a table on wiki rather than hard-code them. That way others can find the mapping in the future. If you want to avoid fetching from the wiki every time I would recommend loading a json file or similar.
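Loading the mapping from a JSON file, as suggested, might look roughly like this. The file layout (a single JSON object mapping each flipped name to its wikitext) and both function names are assumptions made for the illustration.

```python
import json


def load_people_mapping(path):
    """Load the name -> wikitext mapping from a UTF-8 encoded JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def map_person(name, mapping):
    """Return the mapped wikitext, or the plain name if it has no mapping."""
    return mapping.get(name, name)
```

Falling back to the plain name means an unmapped person does not raise a KeyError, which differs from the hard-coded version quoted above.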

else:
    print("Error: not even number of names in depicted people: {}".format(item["Personnamn / avbildad"]))

If this happens, I believe it would be better to output the original name string and add the image to a maintenance category so it can be cleaned up manually afterwards, rather than throwing it away.
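One way to keep the original string instead of discarding it is to return a flag alongside the names. This sketch does the name flipping itself rather than calling helpers.flip_names(), and the function name is hypothetical; the maintenance category is the one listed in the task description.

```python
MAINTENANCE_CAT = "Media_contributed_by_SMVK_2017-02_with_faulty_depicted_person_values"


def split_depicted_people(raw):
    """Split "Last, First, Last, First" into ["First Last", ...].

    Returns (names, needs_review). If the parts cannot be paired up, the
    original string is returned unchanged and needs_review is True.
    """
    parts = [p.strip() for p in raw.split(", ") if p.strip()]
    if len(parts) % 2 != 0:
        return [raw], True
    names = ["{} {}".format(parts[i + 1], parts[i]) for i in range(0, len(parts), 2)]
    return names, False
```

The caller would then add `MAINTENANCE_CAT` to the image's categories whenever `needs_review` is True, so the value can be fixed by hand after upload.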

@Lokal_Profil Thanks! Out of habit I added this while it's a work in progress (wikitable coming up, printouts are for debugging now) and for documentation. This way it's easier to pick up the learnings after the batch is finished and some time has passed.

Row 456 in excel-export.xls contains the following:

C04393+C04394 3924555 Svenska Cypernexpeditionen Teatern. Från öster. Sammansättningsbild. Soli. Svenska Cypernexpeditionen Cypern Soli 1927-1931 Lindros, John http://kulturarvsdata.se/SMVK-MM/Photograph/html/3924555

This Fotonummer, however, is _not_ present in the Google Docs version (the row is missing).

  • check if a file named C04393+C04394.tif is present in the batch of images - and handle possible errors

Row 456 in excel-export.xls contains the following:

C04393+C04394 3924555 Svenska Cypernexpeditionen Teatern. Från öster. Sammansättningsbild. Soli. Svenska Cypernexpeditionen Cypern Soli 1927-1931 Lindros, John http://kulturarvsdata.se/SMVK-MM/Photograph/html/3924555

This Fotonummer, however, is _not_ present in the Google Docs version (the row is missing).

  • check if a file named C04393+C04394.tif is present in the batch of images - and handle possible errors

I think C04393+C04394 is actually the photo number (at least according to http://collections.smvk.se/carlotta-mhm/web/object/3924555)

That said it's not impossible that the + has been replaced by another character in the filename.
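A check that tolerates the + having been swapped for some other character could be sketched like this. The function name is hypothetical, and the assumption is that exactly one character stands in for each +; a filename where the + was dropped entirely would not match.

```python
import re


def find_image_file(fotonummer, filenames, ext="tif"):
    """Return the filenames matching fotonummer, letting "+" match any one character."""
    pattern = "".join("." if ch == "+" else re.escape(ch) for ch in fotonummer)
    regex = re.compile(r"^{}\.{}$".format(pattern, re.escape(ext)), re.IGNORECASE)
    return [name for name in filenames if regex.match(name)]
```

Passing the result of `os.listdir()` for the image folder as `filenames` would show whether C04393+C04394.tif, or a variant such as C04393_C04394.tif, is actually in the batch; an empty list means the file is missing.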

Jopparn changed the point value for this task from 2 to 1.

Probably will be reopened after test-upload.