Test upload of person data
Closed, Resolved | Public | 2 Estimated Story Points

Description

When test-uploading whole authority items, we should make sure we test some tricky cases. We should find and upload examples of:

  • Person with a year-only (YYYY) birth or death date where the pre-existing WD item already has the full date (YYYYMMDD); expected result: statement not added
    • khwz1fk30skg2t2 OK
    • zw9cf3vh1dnt177 OK
  • Person created from scratch (based on an empty item with only SELIBR): Q57237818 / 1zcfh90k1dhvpbh
  • Person from scratch, including nationality: Q57238122 / xv8bczpg216whfk
  • Person with detailed birthdate and profession: Q57238782 / nl03b3l603nkbr7
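The first case in the list (a year-only date must not be added when the item already holds a full date for the same year) can be sketched as below. This is only an illustration of the expected behaviour, not the importer's actual code: `PartialDate`, `parse_date`, `covers` and `should_add_date` are hypothetical names, and only the two date shapes named in the task (YYYY and YYYYMMDD) are handled. What should happen in the opposite direction (a full date arriving when the item only has a year) is not stated in the task; the sketch assumes the more precise date is added.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PartialDate:
    """A date that may be known only to the year (hypothetical helper type)."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


def parse_date(value: str) -> PartialDate:
    """Parse 'YYYY' or 'YYYYMMDD', the two shapes mentioned in the task."""
    if value.isdigit() and len(value) == 4:
        return PartialDate(int(value))
    if value.isdigit() and len(value) == 8:
        return PartialDate(int(value[:4]), int(value[4:6]), int(value[6:]))
    raise ValueError(f"unsupported date format: {value!r}")


def covers(old: PartialDate, new: PartialDate) -> bool:
    """True if `old` already states everything that `new` states."""
    return (
        old.year == new.year
        and (new.month is None or old.month == new.month)
        and (new.day is None or old.day == new.day)
    )


def should_add_date(new: PartialDate, existing: Iterable[PartialDate]) -> bool:
    """Add the new date only if no existing date is equal to or more precise than it."""
    return not any(covers(old, new) for old in existing)
```

With these definitions, a year-only "1849" is skipped when the item already has "18490122", while a date in a different year is still added.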

Adding more edited items so that we don't lose track....

Event Timeline

The examples all look good.

My only thought is whether it would be possible to use "hasVariant" to set aliases, e.g. for 1zcfh90k1dhvpbh, adding "Bertil K:son Söderling|Bertil Knutson Söderling" to the aliases.

For an import from scratch, which checks are run to see whether the person already exists on Wikidata?

@Lokal_Profil

About aliases: I thought about it a lot, as ignoring the name variants obviously discards some (very!) useful information.

The problem is that the name variants do not have language labels. As they often contain versions of the name in different transcriptions, this makes it impossible to know which language they belong to.

For a very confusing example of Swedish and foreign labels lumped together, see https://libris.kb.se/katalogisering/tr574vdc33gk2cc#it where the variants are represented as

{
  "@type": "Person",
  "familyName": "Strindberg",
  "givenName": "Johan August",
  "lifeSpan": "1849-1912"
},
{
  "@type": "Person",
  "familyName": "Strintmperg",
  "givenName": "August",
  "lifeSpan": "1849-1912"
},
{
  "@type": "Person",
  "familyName": "Sutorindoberi",
  "givenName": "Yōhan A.",
  "lifeSpan": "1849-1912"
},
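To make the problem concrete, here is a minimal sketch that turns the three variants above into candidate alias strings and lists which fields the records carry. The function name `variant_name` and the "given name, then family name" ordering are assumptions made for illustration; they are not taken from the importer.

```python
# The three name variants from the excerpt above, as Python dicts.
variants = [
    {"@type": "Person", "familyName": "Strindberg",
     "givenName": "Johan August", "lifeSpan": "1849-1912"},
    {"@type": "Person", "familyName": "Strintmperg",
     "givenName": "August", "lifeSpan": "1849-1912"},
    {"@type": "Person", "familyName": "Sutorindoberi",
     "givenName": "Yōhan A.", "lifeSpan": "1849-1912"},
]


def variant_name(variant: dict) -> str:
    """Join given and family name, skipping whichever part is missing."""
    parts = [variant.get("givenName"), variant.get("familyName")]
    return " ".join(part for part in parts if part)


# Candidate alias strings, in the order the variants appear.
names = [variant_name(v) for v in variants]

# Every field name that occurs in any variant; none of them identifies a language.
fields = sorted({key for v in variants for key in v})
```

Building the strings is the easy part. Since `fields` contains only `@type`, `familyName`, `givenName` and `lifeSpan`, there is nothing in the records to say which language each resulting alias should be filed under.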

@Lokal_Profil

As for existing persons: we're currently matching only via the old SELIBR / new URI, meaning this import would only cover the 60k+ people who got their URIs in the previous run, plus I guess a handful who got a SELIBR manually since the first upload. So no new items will be created if no match is found.
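A rough sketch of that matching rule, under stated assumptions: the record keys `uri_id` and `selibr`, the two lookup dicts, and the function names are all hypothetical, and checking the URI before the SELIBR is an arbitrary choice made here. The only behaviour taken from the description above is that a record is matched by identifier alone and that unmatched records never lead to a new item.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]


def match_item(record: Record,
               uri_to_qid: Dict[str, str],
               selibr_to_qid: Dict[str, str]) -> Optional[str]:
    """Return the QID matched via the new URI id or the old SELIBR, else None."""
    uri_id = record.get("uri_id")
    if uri_id in uri_to_qid:
        return uri_to_qid[uri_id]
    selibr = record.get("selibr")
    if selibr in selibr_to_qid:
        return selibr_to_qid[selibr]
    return None


def plan_import(records: Iterable[Record],
                uri_to_qid: Dict[str, str],
                selibr_to_qid: Dict[str, str]
                ) -> Tuple[List[Tuple[str, Record]], List[Record]]:
    """Split records into (qid, record) updates and skipped records.

    Skipped records are only reported; no item is created for them.
    """
    updates: List[Tuple[str, Record]] = []
    skipped: List[Record] = []
    for record in records:
        qid = match_item(record, uri_to_qid, selibr_to_qid)
        if qid is None:
            skipped.append(record)
        else:
            updates.append((qid, record))
    return updates, skipped
```

For example, with `uri_to_qid = {"1zcfh90k1dhvpbh": "Q57237818"}`, a record carrying that URI id lands in `updates`, and a record with an unknown id lands in `skipped`.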

Thanks for the clarification. I assumed this but wanted to double-check.

On the aliases: I completely accept your decision given the missing language labels. It would be good to note this somewhere (in the repo README or elsewhere) to make clear that it is an active decision and to prevent a well-meant but incorrect patch implementing it later.

I have included this information (along with info on the nationality-label logic, which touches on a similar problem) in the mapping overview, which is also linked from the Background page on the wmse wiki.

Ah. Sorry for missing that. Then I'd say maybe add a link to the mapping from the README section about the auth importer.

Based on this I think you should be good to go. Maybe do a test run of 20-50 items (as part of the bot request) and give them a quick look to check that no edge cases are popping up.

Just spotted that T203380 is still left. Or was there simply no test with this implemented?

Looks like nice short descriptions are rare :)

Did a quick run with https://www.wikidata.org/wiki/Q57510548

True. Thanks for adding a link