
Pywikibot issue with adding qualifiers to existing claims
Open, Stalled, Needs Triage, Public

Description

There is a use case where we want to add information to an existing SDC statement. This typically happens when the corresponding Wikidata item is added after the SDC entry has been added to Commons.

If a statement already exists, there are three approaches:

  1. Don't add qualifiers that already exist, to avoid duplicates -> error from pywikibot:
     "AttributeError: 'NoneType' object has no attribute 'entity_type'"
     So it is not possible to send only the missing information.

  2. Set the qualifiers in pywikibot but don't "add" them -> partial information in SDC:
     a duplicate statement is created with only the new information under it, and the older statement is unmodified.

  3. Set all information on the statement as normal:
     a complete duplicate statement is created alongside the older partial statement, which remains unmodified.

None of these is ideal (the first least of all, since it doesn't even complete). Removing the existing claim first might be a possible approach, but it increases the risk of something going wrong (and I haven't tested it).

If the server were "smart", it could merge partial information into the existing claim, but the client probably needs to say exactly which claim the new qualifier belongs to. For example, the same file may be found on different servers, with different quality or with modifications in one copy, and marking them as sources on Commons takes some care to get right. To add to the right claim, there needs to be some kind of identifier that separates similar claims with different values.
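As a rough sketch of what "telling similar claims apart" could look like on the client side, the helpers below work on plain statement dicts shaped like Wikibase JSON (`id`, `mainsnak`, `qualifiers`). The function names, the abbreviated snak shape, the statement IDs, and the sample values are all illustrative assumptions, not pywikibot API; real item values are nested dicts rather than plain strings.

```python
def snak(prop, value):
    """Build an abbreviated value snak (real snaks carry more fields)."""
    return {"snaktype": "value", "property": prop, "datavalue": {"value": value}}


def qualifier_values(statement, prop):
    """Return the raw values of all `prop` qualifiers on a statement."""
    snaks = statement.get("qualifiers", {}).get(prop, [])
    return [s["datavalue"]["value"] for s in snaks if "datavalue" in s]


def find_statement(statements, value, identifying):
    """Return the one statement with main value `value` whose qualifiers
    contain every (property, value) pair in `identifying`.

    Returns None when no statement, or more than one, matches, so the
    caller never edits an ambiguous target.
    """
    matches = [
        s for s in statements
        if s["mainsnak"].get("datavalue", {}).get("value") == value
        and all(v in qualifier_values(s, p) for p, v in identifying.items())
    ]
    return matches[0] if len(matches) == 1 else None


def missing_qualifiers(statement, wanted):
    """Return only the wanted qualifiers the statement does not have yet."""
    return {p: v for p, v in wanted.items()
            if v not in qualifier_values(statement, p)}


# Two similar source statements that differ only in their URL qualifier.
# IDs, URLs and the P137 value are made-up placeholders.
statements = [
    {"id": "M1$aaaa", "mainsnak": snak("P7482", "Q74228490"),
     "qualifiers": {"P973": [snak("P973", "https://example.org/a.jpg")]}},
    {"id": "M1$bbbb", "mainsnak": snak("P7482", "Q74228490"),
     "qualifiers": {"P973": [snak("P973", "https://example.com/b.jpg")]}},
]

target = find_statement(statements, "Q74228490",
                        {"P973": "https://example.org/a.jpg"})
todo = missing_qualifiers(target, {"P973": "https://example.org/a.jpg",
                                   "P137": "Q-example-operator"})
```

Returning `None` on an ambiguous match is a deliberate choice here: when two statements look the same, it seems safer to skip the edit than to guess.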

Which approach should be taken when adding missing information to an existing entry?

Event Timeline

You can edit existing claims by loading them from SDC/Wikidata, modifying them, and then saving the claim.

Structured data on commons example
Diff link to example edit by the script

import pywikibot

# Connect to Wikimedia Commons
site = pywikibot.Site("commons", "commons")
repo = site.data_repository()

# Load the commons item
page = pywikibot.FilePage(site, 'File:Akateemisen Karjala-Seuran 15-vuotisjuhlat Vanhalla ylioppilastalolla 21.2.1937.jpg')  # Specify the file
item = page.data_item()  # Get the data item associated with the page


# Identify the claim (assuming you know the property ID)
property_id = "P195"  # Replace with the actual property ID of the claim
claims = item.claims[property_id]

# Assuming we're working with the first claim
claim = claims[0]

# Create the qualifier
qualifier_property_id = "P217"  # Replace with the actual property ID of the qualifier
qualifier_value = "HK19670603:29739"  # Replace with the actual value for the qualifier
qualifier = pywikibot.Claim(repo, qualifier_property_id)
qualifier.setTarget(qualifier_value)

# Add the qualifier to the claim
claim.addQualifier(qualifier)

# Save the changes
item.editEntity({'claims': [claim.toJSON()]})

And the same in Wikidata
Diff link to example edit with the script

import pywikibot

# Connect to Wikidata
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

# Load the item
item_id = "Q15397819"  # Sandbox item three
item = pywikibot.ItemPage(repo, item_id)
item.get()

# Identify the claim (assuming you know the property ID)
property_id = "P17"  # Replace with the actual property ID of the claim
claims = item.claims[property_id]

# Assuming we're working with the first claim
claim = claims[0]

# Create the qualifier
qualifier_property_id = "P9478"  # Replace with the actual property ID of the qualifier
qualifier_value = "456"  # Replace with the actual value for the qualifier
qualifier = pywikibot.Claim(repo, qualifier_property_id)
qualifier.setTarget(qualifier_value)

# Add the qualifier to the claim
claim.addQualifier(qualifier)

# Save the changes
item.editEntity({'claims': [claim.toJSON()]})

I have used the following approach in Wikidata:

1.) If there is only a claim without any qualifiers, I add my qualifiers and references to that claim.

2.) If there is a claim with existing qualifiers (with refs), I create a new claim with my qualifiers and refs. This creates duplicate claims, but merging would require human intervention anyway, and that is easier when the items already exist in Wikidata/SDC.

Though #2 depends on the case. For example, there are undisputed/known values where one can add qualifiers safely, such as cases where the information is known to be the same for all claims (stable identifiers, the language of a monolingual text, ...).
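The rule of thumb above could be sketched roughly as follows. This is only an illustration on plain statement dicts in the Wikibase JSON shape; `choose_action`, `SAFE_QUALIFIERS`, and the choice of P217 as a "safe" property are assumptions made for the example, not something pywikibot provides or the thread prescribes.

```python
# Qualifier properties assumed safe to add to a claim that already has
# qualifiers. P217 is only an illustrative pick; adjust per project.
SAFE_QUALIFIERS = frozenset({"P217"})


def choose_action(statement, new_qualifier_props, safe=SAFE_QUALIFIERS):
    """Decide how to add qualifiers to an existing statement.

    Returns "extend" when the qualifiers can go onto the existing
    statement, or "duplicate" when a separate statement should be
    created and left for a human to merge.
    """
    existing = statement.get("qualifiers", {})
    if not existing:
        # Case 1: bare claim, so add qualifiers and references to it.
        return "extend"
    if set(new_qualifier_props) <= set(safe):
        # Exception to case 2: values known to be the same for all claims.
        return "extend"
    # Case 2: claim already has qualifiers, so create a new claim.
    return "duplicate"


bare = {"mainsnak": {"property": "P195"}}
qualified = {"mainsnak": {"property": "P195"},
             "qualifiers": {"P973": [{"property": "P973"}]}}

print(choose_action(bare, ["P973"]))        # bare claim
print(choose_action(qualified, ["P217"]))   # only "safe" qualifiers
print(choose_action(qualified, ["P137"]))   # anything else
```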

This does not work in Commons:

wikidata_site = pywikibot.Site("wikidata", "wikidata")
wdrepo = wikidata_site.data_repository()

item_internet = pywikibot.ItemPage(wdrepo, 'Q74228490')  # file available on the internet
item_internet.get()

# item_internet.claims <- only has properties P31 and P279

So item_internet.claims does not contain the properties the file should already have (P7482, P973), and the code cannot continue.

example: File:Seppo Lindblom 1984.jpg

page already has claims:
claim P275
claim P6216
claim P571
claim P7482
claim P9478

Searching for the existing entry by property ID is not accepted ("not a valid item page title"), so it has to be a Q-code, but that does not find the existing item in Commons.

So is this another bug in pywikibot regarding SDC data?

If you try to skip that item, you then get an error:
ValueError: Q23040125 is not type <class 'pywikibot.page._wikibase.ItemPage'>.

Ipr1 changed the task status from Open to Stalled. Dec 6 2023, 6:37 AM

FYI: pywikibot needs fixing; it doesn't work with Commons SDC data.

When you do item_internet = pywikibot.ItemPage(wdrepo, 'Q74228490'), it refers to the Wikidata item Q74228490 and not to the image file on Wikimedia Commons.

If you want to refer to the FilePage, you need to fetch its item using file_page.data_item().

Example

import pywikibot

# Connect to Wikimedia Commons
site = pywikibot.Site("commons", "commons")

# Load the commons item
file_page = pywikibot.FilePage(site, 'File:Seppo Lindblom 1984.jpg')  # Specify the file
mediainfo_item = file_page.data_item()  # Get the data item associated with the page

# Show claims
for claim in mediainfo_item.claims:
    print(claim)

Those I've already got:

import pywikibot

wikidata_site = pywikibot.Site("wikidata", "wikidata")
commonssite = pywikibot.Site("commons", "commons")
for page in pages:  # pages: an iterable of Commons pages, defined elsewhere
    filepage = pywikibot.FilePage(page)
    wditem = filepage.data_item()  # MediaInfo entity for the file

    sdcdata = wditem.get()  # all the properties in JSON format
    claims = sdcdata['statements']  # the statements sit one level below the top-level data

So this is nothing I haven't tried already; I just cut it out of my earlier snippet since it is a lot of code.

And as I've said before, listing claims is not the issue. The issue is getting the server to accept the edit as a modification instead of a new entry.
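One detail that may be relevant here: in the Wikibase `wbeditentity` API, a claim in the payload that carries the `id` (GUID) of an existing statement replaces that statement, while a claim without an `id` is created as a new one. The sketch below works on plain statement dicts and guards against sending a payload that would create a duplicate. The helper names and sample data are hypothetical, and whether the dict produced by `claim.toJSON()` for a MediaInfo claim actually carries the `id` is an assumption worth checking before relying on this.

```python
import copy


def with_qualifier(statement, prop, qualifier_snak):
    """Return a copy of a statement dict with one more qualifier snak."""
    updated = copy.deepcopy(statement)
    updated.setdefault("qualifiers", {}).setdefault(prop, []).append(qualifier_snak)
    # Keep the optional ordering list consistent when it is present.
    order = updated.get("qualifiers-order")
    if order is not None and prop not in order:
        order.append(prop)
    return updated


def modification_payload(statement):
    """Build edit data that targets an existing statement.

    Raises ValueError when the statement has no id, because the server
    would then add a new statement instead of modifying the old one.
    """
    if not statement.get("id"):
        raise ValueError("statement has no 'id'; it would be added as new")
    return {"claims": [statement]}


# Made-up statement as it might be loaded from the server.
existing = {
    "id": "M1$aaaa",
    "type": "statement",
    "mainsnak": {"snaktype": "value", "property": "P195"},
}
new_snak = {"snaktype": "value", "property": "P217",
            "datavalue": {"value": "HK19670603:29739", "type": "string"}}

payload = modification_payload(with_qualifier(existing, "P217", new_snak))
```

With pywikibot, a dict like `payload` is what would be passed to `editEntity`, as in the earlier snippets in this thread. Printing it before saving makes it easy to confirm that the `id` is really there.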