Page MenuHomePhabricator

Missing terms in wb_terms table
Closed, ResolvedPublic

Description

On toolforge, I try to get labels, aliases, and descriptions for Q823 (via "sql wikidata" shell command):

MariaDB [wikidatawiki_p]> SELECT * FROM wb_terms where term_full_entity_id='Q823' ;
Empty set (0.05 sec)

None of these show up. It's not replication lag, because the first edits (including labels) on that item date back to 2012:
https://www.wikidata.org/w/index.php?title=Q823&limit=500&action=history

Expected: Dozens of rows with labels, aliases, and descriptions

Observed: None of those

Event Timeline

Addendum:

MariaDB [wikidatawiki_p]> select @@hostname;
+------------+
| @@hostname |
+------------+
| labsdb1003 |
+------------+
1 row in set (0.00 sec)

@Magnus -unless you can confirm the same happens on the new hosts (labsdb1009, 10 or 11), I will close this as "won't fix"- new labsdb hosts should be used, and the old ones are only left available for easier transitioning. As labsdb1003 is going to be put down in a matter of days, it is not worth fixing know consistency issues.

"sql" is the recommended shell command to connect to the replicas on Toolforge, correct?

"sql wikidata" connected me to labsdb1003.

I do expect the default settings to give me something useful. Is that too much to ask?

OK:

MariaDB [wikidatawiki_p]> select @@hostname;
+------------+
| @@hostname |
+------------+
| labsdb1011 |
+------------+
1 row in set (0.01 sec)

MariaDB [wikidatawiki_p]> SELECT * FROM wb_terms where term_full_entity_id='Q823' ;
Empty set (0.01 sec)

NOW can we take a look please?

I do expect the default settings to give me something useful. Is that too much to ask?

That is certainly a bug, but it was not the one originally reported "Toolforge Wikidata database replica corruption?". We can create a separate one for "make sql command connect to the new databases". As a DBA, I have nothing to do with how cloud (the clients) are setup, I can only report such errors to the right team.

NOW can we take a look please?

I will check it now.

Production masters provide the same result, empty sets-

root@db1070[(none)]> use wikidatawiki
Database changed
root@db1070[wikidatawiki]> SELECT * FROM wb_terms where term_full_entity_id='Q823' ;
Empty set (0.01 sec)

this is not a replica corruption, because production and toolforge replicas provide the same results. If you think the database on production has a problem- you should try forcing a reparse of the page ("purge") on wikidata or wikibase client wikis, and if that doesn't work, escalate to wikidata team so developers can fix any problems. The replicas are healthy, but they will replicate incorrect results if that is what there is in production.

So, to summarize, there is nothing wrong with the database itself on labs or replication (as a DBA, that is the only think I can check) - that doesn't mean the bug is invalid- I have created T181492 for the sql command part. If you think there is a production (mediawiki) problem, we should escalate to MediaWiki-General , probably Wikidata extension.

jcrespo renamed this task from Toolforge Wikidata database replica corruption? to Wikidata database corruption?.Nov 28 2017, 11:38 AM

Okay, Since the prod gave the same result, it's not a corruption in replicating, it is very likely to be corruption of the job to update/add rows in wb_terms table (which can happen due to various reasons). In these cases, the best to handle is to do an edit in labels of the page to trigger another update of the table which I did and it fixed the problem, it would be nice to write a script to scan the table and find such inconsistencies though.

I did ?action=purge on the item before I filed the bug.

It appears that the labels have now arrived. There was probably an hour between purge/bug filing and the data appearing, so it took at least that long to populate the entry, which in itself is a bug. This should happen instantaneously, especially since it seems to remove the labels upon edit.

@Magnus- you probably didn't see Ladsgroup comment, apparently it requires editing, and not purging for the table to repopulate.

@jcrespo OK, but that only means the rows were missing since the last edit, presumably, on 5 August 2017‎.

And it doesn't look like it's a one-off; none of these examples have any labels/aliases/descriptions, when they clearly should have:

MariaDB [wikidatawiki_p]> select page_title from page_props,page WHERE page_id=pp_page AND page_namespace=0 AND pp_propname='wb-sitelinks' AND pp_value>=5 AND NOT EXISTS (SELECT * FROM wb_terms WHERE term_full_entity_id=page_title) limit 50;
+------------+
| page_title |
+------------+
| Q832       |
| Q1153      |
| Q2250      |
| Q2638      |
| Q2806      |
| Q2869      |
| Q3074      |
| Q3088      |
| Q3293      |
| Q3315      |
| Q3316      |
| Q3467      |
| Q3488      |
| Q3706      |
| Q3975      |
| Q4046      |
| Q4163      |
| Q4170      |
| Q4173      |
| Q4268      |
| Q4286      |
| Q4309      |
| Q4312      |
| Q4324      |
| Q4326      |
| Q4385      |
| Q4693      |
| Q4772      |
| Q4811      |
| Q4825      |
| Q4831      |
| Q4874      |
| Q4879      |
| Q4885      |
| Q4938      |
| Q4939      |
| Q5234      |
| Q5241      |
| Q5247      |
| Q6127      |
| Q6450      |
| Q7448      |
| Q7796      |
| Q7845      |
| Q8145      |
| Q8154      |
| Q8169      |
| Q8292      |
| Q8744      |
| Q8829      |
+------------+
50 rows in set (29.47 sec)

In fr.wikisource.org with 10.0.32-MariaDB which differ from 10.0.31-MariaDB in www.mediawiki.org,
I get and error, 10 minuts after a good run:

Erreur Lua dans Module:Central-s-fr à la ligne 1709 : attempt to index global 'wikibase' (a nil value).

Also in Phabricator Search Advanced Search, the status of tasks is replaced by a default icon.

I posted this twice for a faster debug.

In fr.wikisource.org with 10.0.32-MariaDB which differ from 10.0.31-MariaDB in www.mediawiki.org,
I get and error, 10 minuts after a good run:

Erreur Lua dans Module:Central-s-fr à la ligne 1709 : attempt to index global 'wikibase' (a nil value).

Also in Phabricator Search Advanced Search, the status of tasks is replaced by a default icon.

I posted this twice for a faster debug.

Can you provide a bit more info? I don't really get what you are doing.

What are you exactly running?
Where are you running that?

I work on a the Module:Central able to convert any scribunto Lua module as multilingual with automatic import of arguments and wikidatas, see T135845. I try it 100 times each day.

The bug happens when I query the first data from wikibase, not when I configure the link using wikibase.getEntityIdForTitle().
No matter Firefox or Chromium.

@Rical

Erreur Lua dans Module:Central-s-fr à la ligne 1709 : attempt to index global 'wikibase' (a nil value).

I do not see the relation between this error and the topic of this task. The function getEntityIdForTitle is in the mw.wikibase table and not the wikibase table and so should be called with mw.wikibase.getEntityIdForTitle(). I have updated the module: https://fr.wikisource.org/w/index.php?title=Module%3ACentral-s-fr&type=revision&diff=7132707&oldid=7131483

Your help answer the question. Many thanks.
Sorry to have disturb this task.

@Magnus I agree with you it is a bug, and I have updated the tags and the title to reflect that, it is now on the backlog for the Wikidata team.

Ladsgroup renamed this task from Wikidata database corruption? to Missing terms in wb_terms table.Jan 16 2018, 10:36 AM
Ladsgroup moved this task from incoming to ready to go on the Wikidata board.

From about 2017-12-28, MariaDB is in version 10.0.23 in wikipedia.org and change between 10.0.31 and 10.0.32 in wikisource.org and mediawiki.org.
probably because not enough stable.

The page Auteur:Rical/Victor_Hugo uses the Module:Tests specialy coded to test the use of wikibase.

Some tests of banal properties from

statements[1].mainsnak.datavalue.value

result all in empty tables { }, not in sought strings, like for P569 = birthday, P570, P21...

The module Module:Auteur2, compiled on 2016-12-14, runs fine.

all items created after March 14 are missing in the wb_terms replica table.

all items created after March 14 are missing in the wb_terms replica table.

Can you provide some more details about this?
Which replica? Which query are you using to get that data?

If you are talking about labsdb replicas, I do see new inserts arriving to that table. And the new items arriving to the primary master and arriving to the labsdb hosts.

@Pasleim I checked a random new item and it still has complete entries in wb_terms: https://quarry.wmflabs.org/query/26347

(Note: the empty term_search_key and term_weight are intentional, see T188993: Replace term_search_key and term_weight with empty values when wb_terms is not used for search.)

Nevermind, I did not know that term_entity_id is no longer written.

@Magnus In some of your tools, e.g. pagepile and tree, labels of new items are not shown for the same reason.

Somebody needs to do a grep in Toolforge folder probably for all users :) [I know, offtopic]

@Magnus have you identified any other issues with other items?
I looked through a bunch in the list that you provided and they all appear to have terms in wb_terms now.

Addshore moved this task from Inbox to Investigate & Discuss on the [DEPRECATED] wdwb-tech board.
Addshore removed a project: Wikidata-Campsite.

Removing the campsite as this is not really currently actionable.

Addshore changed the task status from Open to Stalled.Aug 28 2018, 6:59 AM
Addshore claimed this task.

These hasn't been any movement on this ticket in quite some time.
If there are any more examples of terms missing from the table please reopen the ticket.