
integraality does not handle "Unknown value" groupings well
Closed, Resolved (Public)

Description

See the example at https://www.wikidata.org/wiki/Wikidata:WikiProject_PCC_Wikidata_Pilot/Northwestern_University_Libraries/Alumni_property_dashboard : a nonsensical value, T2202693931, is used as the grouping for "unknown value".

This also results in an outright crash when grouping_link is set; see e.g. https://www.wikidata.org/wiki/Wikidata:WikiProject_Video_games/Statistics/Country_of_origin

The groupings query is https://w.wiki/wVm and returns items such as https://www.wikidata.org/wiki/Q61976205, which have P495=Unknown value.

Triggering an update results in:

<class 'pywikibot.exceptions.InvalidTitle'> 'T2157987814' is not a valid item page title
web_1    |     self.process_page(page)
web_1    |   File "/code/integraality/pages_processor.py", line 116, in process_page
web_1    |     output = stats.retrieve_and_process_data()
web_1    |   File "/code/integraality/property_statistics.py", line 614, in retrieve_and_process_data
web_1    |     text += self.make_stats_for_one_grouping(grouping, item_count, higher_grouping)
web_1    |   File "/code/integraality/property_statistics.py", line 543, in make_stats_for_one_grouping
web_1    |     group_item = pywikibot.ItemPage(self.repo, grouping)
web_1    |   File "/usr/local/lib/python3.7/site-packages/pywikibot/page/__init__.py", line 4694, in __init__
web_1    |     super(ItemPage, self).__init__(site, title, ns=ns)
web_1    |   File "/usr/local/lib/python3.7/site-packages/pywikibot/page/__init__.py", line 4392, in __init__
web_1    |     self._link.title)
web_1    |   File "/usr/local/lib/python3.7/site-packages/pywikibot/page/__init__.py", line 4095, in __init__
web_1    |     % (self.id, self.entity_type))
web_1    | pywikibot.exceptions.InvalidTitle: 'T2157920837' is not a valid item page title

Event Timeline

JeanFred renamed this task from Error when updating Wikidata:WikiProject Video games/Statistics/Country of origin to integraality does not handle "Unknown value" groupings. (Jan 29 2021, 9:44 AM)
JeanFred triaged this task as Medium priority.
JeanFred updated the task description.
JeanFred renamed this task from integraality does not handle "Unknown value" groupings to integraality does not handle "Unknown value" groupings well. (Mar 4 2021, 10:44 PM)
JeanFred updated the task description.

With c255b8b0be00, the outright crash is avoided and https://www.wikidata.org/wiki/Wikidata:WikiProject_Video_games/Statistics/Country_of_origin now generates.

However, the underlying issue remains. The above page has rows for T2157859999, T2202800763, T2252325880, T2207302749, T2194780413, T2179246763, T2184111391.

The question is: how can these unknown values be reliably identified? Should we simply treat every value of the form t12345 (a "t" followed by digits) as unknown?
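One possible approach, sketched below, is to invert the check: rather than pattern-matching the placeholders, treat anything that does not look like an item ID as "not an item" and handle it separately before it reaches pywikibot.ItemPage. This is only an illustration, not what integraality does; is_item_id and split_groupings are hypothetical helper names, and it assumes groupings are expected to be item IDs (other grouping types, such as years, would need their own check).

```python
import re

# Assumption: a real item grouping is "Q" followed by a positive integer.
ITEM_ID_PATTERN = re.compile(r"Q[1-9]\d*")


def is_item_id(grouping):
    """Return True if the grouping string looks like a Wikidata item ID."""
    return ITEM_ID_PATTERN.fullmatch(grouping) is not None


def split_groupings(groupings):
    """Separate item IDs from everything else (e.g. unknown-value placeholders).

    Hypothetical helper: returns a tuple (items, others), preserving order.
    """
    items, others = [], []
    for grouping in groupings:
        if is_item_id(grouping):
            items.append(grouping)
        else:
            others.append(grouping)
    return items, others
```

With such a split, only the first list would be passed to pywikibot.ItemPage, and the second could be merged into a single "unknown value" row. Whether merging is the desired behaviour is an open design question.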

JeanFred claimed this task.