
Indexing on wikibase.cloud
Closed, Resolved | Public | BUG REPORT

Description

I think my wikibase https://ottgaz.org, hosted on wikibase.cloud, needs to be reindexed. Certain SPARQL queries aren't working. More generally, I would like to understand how indexing works--is there a way to know when/if data is not indexed?

Steps to replicate the issue (include links if applicable):

My queries are not returning all the results I'd expect. For example, this query (meant to return all subregions of the superregion Q39) does not return https://ottgaz.org/wiki/Item:Q128, even though it has a status property object with the qualifier Q39.

Query:
# regions contained by Edirne (Q39)
PREFIX og: <https://ottgaz.org/entity/>
PREFIX ogs: <https://ottgaz.org/entity/statement/>
PREFIX ogv: <https://ottgaz.org/value/>
PREFIX ogt: <https://ottgaz.org/prop/direct/>
PREFIX ogp: <https://ottgaz.org/prop/>
PREFIX ogps: <https://ottgaz.org/prop/statement/>
PREFIX ogpq: <https://ottgaz.org/prop/qualifier/>

SELECT  ?subregion ?subregionLabel ?statusLabel 
WHERE 
{
  ?subregion ogp:P15 ?statement.
  ?statement ogps:P15 ?status.
  ?statement ogpq:P34 og:Q39.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}

On Telegram, GreenReaper helped me with the following:

The ones that appear there were modified two days ago while the one that does not was only modified a week ago. Perhaps the indexing was not working properly at that time, in which case this is a Wikibase.cloud issue. I expect if modified it would start showing up.

I modified one item and it did indeed show up.

Later GreenReaper wrote

Not sure ?action=purge will do it. I think you might need to get the cloud team to reindex it. I think they have a method of doing it on their end.

Thanks for any specific and general advice. I'm hoping I don't have to request reindexing all that often--invisible data makes me nervous!

Details

Other Assignee
GreenReaper

Event Timeline

whanley updated the task description.
whanley updated Other Assignee, added: GreenReaper.

Here's another example of the problem. These two queries differ only on line 16: the first uses P9 as a qualifier, and the second uses P34. Almost all the records in question have both P9 and P34 qualifiers, so the number of results should be the same. But the first returns 283 results, while the second returns only 81.

Query 1: https://tinyurl.com/27okahnl
Query 2: https://tinyurl.com/27mm73uv

An additional clue: almost all of the results in query 2 start with A. This makes me think that the indexing stopped partway through a bulk upload I was doing from OpenRefine, which was proceeding alphabetically.
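One way to narrow down a discrepancy like this is to export both result sets and compare them locally. The following is a minimal sketch in Python, assuming each query's results were downloaded in the standard SPARQL JSON results format and that both queries bind `?subregion` and `?subregionLabel`. Those variable names and the helper functions are illustrative assumptions, not taken from the linked queries.

```python
from collections import Counter


def values(results, var):
    """Collect the distinct values bound to `var` in a SPARQL JSON results document."""
    return {
        row[var]["value"]
        for row in results["results"]["bindings"]
        if var in row
    }


def missing_from(expected, actual, var="subregion"):
    """Return values of `var` present in `expected` but absent from `actual`."""
    return sorted(values(expected, var) - values(actual, var))


def first_letter_counts(results, var="subregionLabel"):
    """Tally distinct values of `var` by first letter, e.g. to spot an alphabetical cutoff."""
    return Counter(v[0].upper() for v in values(results, var) if v)
```

Calling `missing_from(query1_results, query2_results)` would list the items that only the first query finds, and `first_letter_counts(query2_results)` would show whether the second result set really is concentrated under one letter.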

We have similar issues on https://beyond-notability.wikibase.cloud/wiki/Main_Page. For example (as described by one of our team):


A strange issue here. I was testing out some date count queries when I noticed that the death date for Gisela Marie Augusta Richer (Q1030) appears on her page but isn't returned in SPARQL query results. (She has several other dates which I think are all fine, just not this one.)

This query to fetch every statement associated with her has no reference to the property P15 anywhere: https://tinyurl.com/ymwcfjda

What's more, if you look for the date statements relating to the page itself (schema:dateModified and wikibase:timestamp), they're saying that the page was last modified on 28 June, but in the page history, the death date was added on 31 July (and it's the most recent page edit).

https://beyond-notability.wikibase.cloud/w/index.php?title=Item:Q1030&action=history
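The comparison described above can be turned into a repeatable check for whether an item is stale in the query service. The following is a minimal sketch in Python, assuming you have already obtained two timestamps for an item: the `schema:dateModified` value reported by the query service, and the timestamp of the latest revision from the page history. The function names, the entity URI you pass in, and the five-minute tolerance are assumptions for illustration; this is not an official wikibase.cloud tool.

```python
from datetime import datetime, timedelta


def date_modified_query(entity_uri):
    """Build a SPARQL query for the modification date the query service holds for an entity."""
    return (
        "PREFIX schema: <http://schema.org/>\n"
        f"SELECT ?modified WHERE {{ <{entity_uri}> schema:dateModified ?modified }}"
    )


def parse_ts(ts):
    """Parse an ISO 8601 timestamp such as '2023-07-31T10:15:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def is_stale(indexed_modified, last_revision, tolerance=timedelta(minutes=5)):
    """Return True if the query service's copy is older than the latest wiki revision."""
    return parse_ts(last_revision) - parse_ts(indexed_modified) > tolerance
```

For Q1030, the query service reported 28 June while the page history shows 31 July, so `is_stale` would return `True` for that pair of timestamps.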

I've subsequently found at least one more case: Kate Norgate (Q1000). Again, her death date is the most recent addition to the page (her birth date was added earlier on the same day and does appear in results: https://tinyurl.com/ywsnqkl5).

Tarrow claimed this task.
Tarrow subscribed.

Hi @whanley,

Clearly this is a long time after you reported the issues, but we have since put extensive work into improving the reliability of the query service and rebuilding the index. Could you (and maybe @Drjwbaker) comment on whether these issues now appear to be fixed? I'm going to mark the ticket as resolved, but please don't hesitate to reopen it if you believe they are still present.

I've tried the queries you two pasted, and they all seem to be resolved (although the data is of course not familiar to me), apart from the discrepancy between the two queries you present side by side. There, the returned counts now differ by 10 results, which I could imagine is expected if you have since added more data.

Thanks!

Thanks @Tarrow for the follow up. The issue is resolved, and I appreciate your work.

Will