
nlwiki Labs replica with problems
Closed, Resolved · Public

Description

Some entries in the nlwiki DB replica on Tool Labs are not correct.

Deleted article still in the page table: https://nl.wikipedia.org/wiki/Aline_fobe is deleted, but:

MariaDB [nlwiki_p]> select * from page where page_title='Aline_fobe' and page_namespace=0;
| page_id | page_namespace | page_title | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_no_title_convert | page_content_model |
| 4543292 |              0 | Aline_fobe |                   |            0 |                0 |           0 | 0.800933363152 | 20160614082945 | 20160528071415     |    46686023 |     2952 |                     0 | wikitext           |

Also, this page is a redirect: https://nl.wikipedia.org/w/index.php?title=Ernst_August_zur_Lippe&redirect=no, but not according to the DB replica (query output not shown); see the example query below.
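For reference, the redirect flag can be checked with a query of the following form. This is an illustrative query built from the standard MediaWiki page-table columns used above; the original report did not include its output.

MariaDB [nlwiki_p]> select page_id, page_title, page_is_redirect from page where page_title='Ernst_August_zur_Lippe' and page_namespace=0;

Since the page is a redirect on nl.wikipedia.org, a consistent replica would return page_is_redirect = 1 for this row.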

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.

This is a known issue; the long-term causes are difficult to fix without changing MediaWiki itself. The short-term ones are being handled by the full (but slow) reimport of all shards that is currently underway.

I can explain in more detail why this happens; if you just want it fixed, just wait and nlwiki will eventually get fully reimported. The reason it takes so long is that doing it any faster would require bringing all Labs DBs down.

Magnus claimed this task.

OK, I'll wait, not really mission-critical.

This page and two privacy violations should not keep appearing on pages such as https://tools.wmflabs.org/wikidata-todo/duplicity.php?wiki=nlwiki&mode=list. When will it "eventually get fully reimported"?

Apologies for the delay in fixing nlwiki. We are importing in order of popularity, which means we have already reimported and fixed enwiki, wikidatawiki, dewiki, commonswiki and all s3 group hosts. That means only 33 wikis out of ~900 are left to process (reimport + filter); you had the bad luck of being interested in nlwiki, which is on the s2 group and one of the wikis still pending. This will happen soon, however, in the order of days, not months. Sadly, we are constantly delayed by tasks needed to prevent outages; we are only a team of 2 people plus a part-timer taking care not only of the Labs databases, but of all of the production databases, beta, analytics, etc. The process itself is not fast either (around 9TB of data to analyze row by row). I would ask for your patience.

We are not tracking the process on each individual bug report, but you can follow it at https://phabricator.wikimedia.org/T140788 and its subtasks. Most of the work has been completed. You can see a summary of the improvements at https://commons.wikimedia.org/wiki/File:Labsdbs_for_WMF_tools_and_contributors_-_get_more_data,_faster.webm

Again, apologies. This will get much better (faster and more reliable servers) very soon. Wait for the announcement.

Thanks for your response.

root@labsdb1003[nlwiki]> select * from page where page_title='Aline_fobe' and page_namespace=0;
Empty set (0.01 sec)