Special:Categories on some wikis errors out with "Exception encountered, of type "Wikimedia\Assert\ParameterAssertionException"
Closed, Resolved, Public

Description

Special:Categories on wikisource.org returns the error "Exception encountered, of type "Wikimedia\Assert\ParameterAssertionException"" when listing categories "near" Category:Norsk:

https://wikisource.org/w/index.php?title=Special:Categories&dir=prev&offset=Not_proofread&limit=1
https://wikisource.org/w/index.php?title=Special:Categories&offset=Norsk&limit=100
https://wikisource.org/w/index.php?title=Special:Categories&dir=prev&offset=User_ru&limit=500

Regards, Z.

We also need to know how it made it into the table in the first place.

This category is populated by the extension.

Krenair added a comment. Edited May 26 2015, 1:30 PM

I null edited one of the pages linking to category #56001... Seems to have fixed it. Certainly no need to manually fix the database, but let's see if we can work out why this happened before null editing the problem away.

Glaisher added a subscriber: Tpt. May 26 2015, 3:56 PM
Tpt added a comment. Jun 6 2015, 12:48 AM

I have no idea what the root cause of the issue is. The extension adds the category to the ParserOutput object and doesn't make any direct change to this database table.

@hashar: As you moved this to "Being worked on" on the Wikimedia-log-errors workboard, any known assignee for this task?
Asking as this problem still happens.

TTO added a subscriber: TTO. Oct 5 2015, 9:39 AM

Noting that this is still happening despite Not proofread (with space) being listed as empty in the DB, though still present there:

MariaDB [enwikisource_p]> select * from category where cat_title like "% %";
+---------+--------------------------+-----------+-------------+-----------+
| cat_id  | cat_title                | cat_pages | cat_subcats | cat_files |
+---------+--------------------------+-----------+-------------+-----------+
|    7276 | Epic poetry              |       -11 |           0 |         0 |
|    7277 | Song lyrics              |        -8 |           0 |         0 |
|    7659 | Speedy deletion requests |        -2 |           0 |         0 |
|    7856 | Law, Islamic             |        -1 |           0 |         0 |
| 3614412 | Without text             |         0 |           0 |         0 |
| 3614486 | Not proofread            |         0 |           0 |         0 |
+---------+--------------------------+-----------+-------------+-----------+
6 rows in set (0.03 sec)

Notice the different cat_id value from above (Not proofread was 56001).

(I think the negative rows are a red herring, enwiki has a bunch of those as well which don't seem to cause problems there.)

Not marking this as a dupe for now, in case ProofreadPage is somehow to blame here.
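
The query above works because DB keys are stored with underscores in place of spaces, so any cat_title containing a literal space is malformed. A minimal sketch of the same check, using an in-memory SQLite table as a stand-in for the real category table (two rows are copied from the output above; the well-formed "Not_proofread" row and its count are hypothetical):

```python
import sqlite3


def find_space_titles(conn):
    """Return (cat_id, cat_title, cat_pages) rows whose title contains a literal space."""
    return conn.execute(
        "SELECT cat_id, cat_title, cat_pages FROM category "
        "WHERE cat_title LIKE '% %' ORDER BY cat_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_title TEXT, cat_pages INTEGER)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [
        (7276, "Epic poetry", -11),     # copied from the output above
        (3614486, "Not proofread", 0),  # copied from the output above
        (1, "Not_proofread", 42),       # hypothetical well-formed row
    ],
)

broken = find_space_titles(conn)
for row in broken:
    print(row)
```

Only the two rows with a space in the title are returned; the underscore form is left alone.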

We also need to know how it made it into the table in the first place.

For the ones with ProofreadPage categories (e.g. "Without text" or "Not proofread"), this was a bug that was fixed in 840064047c3f8e33e3a0ad8a5858e98696fdb609 (2013-12-04).

For the others, no idea, but the low id numbers suggest this is ancient history… definitely before 2009 (deletion of Category:Law, Islamic), perhaps as early as 2005 (creation of Category:Epic poetry). It will be difficult to determine the root cause, which has clearly been fixed at some point.

I can easily reproduce this on enwiki. https://en.wikipedia.org/wiki/Special:Categories?from=Righteous

This is not a ProofreadPage bug.

enwiki has 34 broken entries:

mysql:research@analytics-store.eqiad.wmnet [enwiki]> select * from category where cat_title like "% %";
+--------+-------------------------------------------------------------+-----------+-------------+-----------+
| cat_id | cat_title                                                   | cat_pages | cat_subcats | cat_files |
+--------+-------------------------------------------------------------+-----------+-------------+-----------+
| 636434 | Righteous Among the Nations                                 |       -23 |           0 |         0 |
| 641458 | Chicago, Illinois                                           |        -4 |           0 |         0 |
| 644770 | House of Welf                                               |        -5 |           0 |         0 |
| 645269 | Companies based in Peterborough                             |        -1 |          -1 |         0 |
| 649684 | WikiProject Linux articles                                  |        -3 |          -3 |         0 |
| 651939 | Welsh Americans                                             |      -388 |           0 |         0 |
| 653890 | Cities, towns and villages in the Punjab Region of Pakistan |        -1 |           0 |         0 |
| 653896 | Leonese-language writers                                    |        -2 |           0 |         0 |
| 653999 | Arab Female film directors                                  |        -2 |           0 |         0 |
| 661676 | Articles with unsourced statements since March 2009         |        -2 |           0 |         0 |
| 669151 | Behavioural sciences                                        |        -6 |           0 |         0 |
| 683386 | Germans of Polish descent                                   |        -4 |           0 |         0 |
| 685531 | Subud Members by nationality                                |        -2 |          -2 |         0 |
| 686663 | Athletic Bilbao footballers                                 |        -1 |           0 |         0 |
| 687103 | People from Yaoundé                                         |       -13 |           0 |         0 |
| 688746 | Military of Georgia (country)                               |       -20 |           0 |         0 |
| 690083 | Systems biology                                             |        -5 |           0 |         0 |
| 690354 | Torah people                                                |        -2 |           0 |         0 |
| 690474 | Lemony Snicket                                              |        -1 |           0 |         0 |
| 691155 | Irish television programmes                                 |       -27 |           0 |         0 |
| 692935 | Valencian Community geography stubs                         |       -44 |           0 |         0 |
| 693222 | Video game franchises                                       |        -7 |           0 |         0 |
| 694331 | Lemony Snicket stubs                                        |       -21 |           0 |         0 |
| 696725 | Swaminarayan sect of Hinduism                               |       -10 |           0 |       -10 |
| 698443 | Buildings and structures in Rhondda Cynon Taff              |        -3 |          -3 |         0 |
| 698897 | Schools in Rhondda Cynon Taff                               |        -1 |           0 |         0 |
| 702670 | Swiss-Romanian people                                       |        -1 |           0 |         0 |
| 703741 | Pigeons and doves                                           |       -72 |          -1 |        -5 |
| 703946 | Liechtensteinian footballers                                |        -1 |          -1 |         0 |
| 706752 | Daniel Dumile songs                                         |        -1 |           0 |         0 |
| 707288 | National sports teams of FYR of Macedonia                   |        -5 |           0 |         0 |
| 713511 | Companies listed on Rwanda Over The Counter Exchange        |        -1 |           0 |         0 |
| 719433 | Heat transfer                                               |        -1 |           0 |         0 |
| 721931 | Córdoba Province (Argentina)                                |        -1 |           0 |         0 |
+--------+-------------------------------------------------------------+-----------+-------------+-----------+
34 rows in set (0.64 sec)

@TTO Let's say that this is the most common special-case of T155091, and like you said there:

We could also do with a script that handles categories with negative cat_pages values. Literally all it'd have to do is $dbr->select( 'category', 'cat_id', [ 'cat_pages < 0' ], __METHOD__ ) then Category::newFromID( $row->cat_id )->refreshCounts() on each row.

…it could be handled with a simple special-case script.
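
The quoted suggestion boils down to "select the rows with a negative count, then recount each one". A rough Python sketch of that loop against an in-memory SQLite database follows. It is not the MediaWiki implementation: the two-table schema is reduced to the columns needed here, only cat_pages is recounted, and the sample rows are hypothetical.

```python
import sqlite3


def refresh_negative_counts(conn):
    """Recount cat_pages for every category row whose stored count is negative.

    Returns the list of cat_ids that were refreshed.
    """
    ids = [
        r[0]
        for r in conn.execute(
            "SELECT cat_id FROM category WHERE cat_pages < 0 ORDER BY cat_id"
        )
    ]
    for cat_id in ids:
        # Correlated subquery: count the links that point at this row's title.
        conn.execute(
            "UPDATE category SET cat_pages = ("
            "  SELECT COUNT(*) FROM categorylinks WHERE cl_to = cat_title"
            ") WHERE cat_id = ?",
            (cat_id,),
        )
    conn.commit()
    return ids


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_title TEXT, cat_pages INTEGER);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    """
)
# Hypothetical sample data: one negative count, one healthy row.
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [(1, "Heat_transfer", -1), (2, "Systems_biology", 3)],
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(10, "Heat_transfer"), (11, "Heat_transfer")],
)

refreshed = refresh_negative_counts(conn)
counts = dict(conn.execute("SELECT cat_id, cat_pages FROM category"))
print(refreshed, counts)
```

Rows with a non-negative count are never touched, so running the function a second time is a no-op.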

We already have maintenance/cleanupEmptyCategories.php, which was run fairly recently (T140811). I propose that we change it to delete category entries with negative count too (right now, it only does zeroes) and re-run it. This should be easier for Operations and DBA to stomach than running an entirely new script which deletes things.

@Anomie Since you handled T140811… what do you think of this idea, and would you be able to help with getting the script rerun?

TTO added a subscriber: jcrespo. Feb 16 2017, 11:00 PM

I think there would be value in getting https://gerrit.wikimedia.org/r/333486/ reviewed and run, in part because these DB key errors exist in tables other than category as well. @jcrespo seemed quite open to the idea of it. At least for the category table, it's not doing any DELETEs that Category::refreshCounts wouldn't already do.

I also did https://gerrit.wikimedia.org/r/333917/ (for T18765: Write a maintenance script to refresh category member counts), which is also waiting for review. That would fix the problem on enwiki, since all the problematic categories have incorrect counts. I seem to remember that this is not the case on some smaller wikis, though.

You're of course welcome to write a third script to solve this problem if you think it would be useful, but I think there would be more value in getting at least one of those scripts reviewed and run.

@matmarex wrote:

This is not a ProofreadPage bug.

That's not what I said:

@TTO wrote:

...in case ProofreadPage is somehow to blame here.

It's possible that ProofreadPage is not always normalising category names. Clearly the enwiki problems are not related to that (so we should really be having this discussion in T155091: "Invalid DB key" errors on various special pages or another task) but there may still be something to solve in ProofreadPage.

Ankry added a subscriber: Ankry. Feb 17 2017, 1:07 PM
hashar removed a subscriber: hashar. Feb 17 2017, 1:56 PM

We already have maintenance/cleanupEmptyCategories.php, which was run fairly recently (T140811). I propose that we change it to delete category entries with negative count too (right now, it only does zeroes) and re-run it. This should be easier for Operations and DBA to stomach than running an entirely new script which deletes things.

@Anomie Since you handled T140811… what do you think of this idea,

I don't see a problem with it; I suppose these rows still exist because whatever code path would normally have cleaned them up when they went negative instead recounted the correctly-formatted version. But it sounds like TTO wants to make a more comprehensive solution, which is probably not a bad idea.

@jcrespo seemed quite open to the idea of it

I do not have a problem with this, you can do it no matter how you want it, and do not need to block on me (I would appreciate reviews of course).

The only things I recommend are: destructive operations, like batch updates and deleting rows, should be scheduled on the deployments weekly section to avoid collisions with schema changes; they should have a dry-run option (or a --really-realy-break-the-db :-)) and should log modifications (e.g. ids) to a file for easier recovery if something goes wrong; they should update in batches and wait for slaves to avoid lag. Very long maintenance tasks should be puppetized on a cron or similar and done in small runs. The script should be idempotent as much as possible. These are not rigid rules (just use common sense): there are more and less important tables/columns (page titles vs. category counts), and they keep me and the backups happy and undisturbed :-D.
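
As an illustration of those recommendations (dry run by default, a log of touched ids, small batches, idempotence), here is a hedged Python sketch against an in-memory SQLite table. The function name, the batch size and the reduced schema are all hypothetical, and the replication wait is only marked by a comment because it has no SQLite equivalent; the three titles with spaces are borrowed from the enwiki list above.

```python
import sqlite3


def delete_invalid_categories(conn, fix=False, batch_size=2, log=None):
    """Delete category rows whose title contains a space, in small batches.

    With fix=False (the default) nothing is modified. Every id that was, or
    would be, deleted is appended to `log`, which is returned.
    """
    log = [] if log is None else log
    last_id = 0
    while True:
        batch = [
            r[0]
            for r in conn.execute(
                "SELECT cat_id FROM category WHERE cat_title LIKE '% %' "
                "AND cat_id > ? ORDER BY cat_id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not batch:
            break
        log.extend(batch)  # record ids before touching anything
        if fix:
            conn.executemany(
                "DELETE FROM category WHERE cat_id = ?", [(i,) for i in batch]
            )
            conn.commit()
            # A production script would wait for replicas to catch up here.
        last_id = batch[-1]
    return log


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_title TEXT)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [
        (690083, "Systems biology"),
        (690474, "Lemony Snicket"),
        (719433, "Heat transfer"),
        (1, "Heat_transfer"),  # hypothetical well-formed row
    ],
)

would_delete = delete_invalid_categories(conn)  # dry run
rows_after_dry_run = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
deleted = delete_invalid_categories(conn, fix=True)
second_run = delete_invalid_categories(conn, fix=True)  # nothing left to do
remaining = [r[0] for r in conn.execute("SELECT cat_title FROM category")]
print(would_delete, deleted, second_run, remaining)
```

Paging by cat_id rather than by OFFSET keeps the dry run and the real run visiting the same rows, since deleting a batch does not shift the position of the next one.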

Zdzislaw added a comment. Edited Apr 19 2017, 10:03 PM

any chance to solve this problem?
on pl ws:
https://pl.wikisource.org/w/index.php?title=Specjalna:Kategorie&offset=&limit=500

[WPfe6wrAEFcAABlogTAAAABV] 2017-04-19 22:04:27: Krytyczny wyjątek typu "Wikimedia\Assert\ParameterAssertionException" (in English: Fatal exception of type "Wikimedia\Assert\ParameterAssertionException")

please,
cat browsing is a (very) important way to search content on pl ws.

Z.

TTO added a comment. Apr 20 2017, 2:04 AM

It would be great if someone could review and potentially merge https://gerrit.wikimedia.org/r/333486! That way, we can fix these problems on all wikis.

Ankry awarded a token. Apr 20 2017, 8:51 AM

Change 349490 had a related patch set uploaded (by Bartosz Dziewoński):
[mediawiki/core@master] Ignore broken entries on Special:Categories

https://gerrit.wikimedia.org/r/349490

Reedy added a subscriber: Reedy. Edited Apr 26 2017, 1:58 PM
In T99736#3196384, @TTO wrote:

It would be great if someone could review and potentially merge https://gerrit.wikimedia.org/r/333486! That way, we can fix these problems on all wikis.

Currently partially broken on at least some WMF wikis... testwiki in this case (mediawikiwiki is also broken)

[edd4076da66f4d48c6113c66] [no req]   Wikimedia\Rdbms\DBQueryError from line 1075 of /srv/mediawiki-staging/php-1.29.0-wmf.21/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: SELECT  wl_id AS `id`,wl_namespace AS `ns`,wl_title AS `title`  FROM `watchlist`    WHERE ((wl_title LIKE '% %' ESCAPE '`' ) OR (wl_title LIKE '%\\r%' ESCAPE '`' ) OR (wl_title LIKE '%\\n%' ESCAPE '`' ) OR (wl_title LIKE '%\\t%' ESCAPE '`' ) OR (wl_title LIKE '`_%' ESCAPE '`' ) OR (wl_title LIKE '%`_' ESCAPE '`' ))  LIMIT 500  
Function: CleanupInvalidDbKeys::cleanupTable
Error: 1054 Unknown column 'wl_id' in 'field list' (10.192.32.103)

Backtrace:
#0 /srv/mediawiki-staging/php-1.29.0-wmf.21/includes/libs/rdbms/database/Database.php(933): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#1 /srv/mediawiki-staging/php-1.29.0-wmf.21/includes/libs/rdbms/database/Database.php(1269): Wikimedia\Rdbms\Database->query(string, string)
#2 /srv/mediawiki-staging/php-1.29.0-wmf.21/maintenance/cleanupInvalidDbKeys.php(164): Wikimedia\Rdbms\Database->select(string, array, array, string, array)
#3 /srv/mediawiki-staging/php-1.29.0-wmf.21/maintenance/cleanupInvalidDbKeys.php(86): CleanupInvalidDbKeys->cleanupTable(array)
#4 /srv/mediawiki-staging/php-1.29.0-wmf.21/maintenance/doMaintenance.php(111): CleanupInvalidDbKeys->execute()
#5 /srv/mediawiki-staging/php-1.29.0-wmf.21/maintenance/cleanupInvalidDbKeys.php(310): require_once(string)
#6 /srv/mediawiki-staging/multiversion/MWScript.php(99): require_once(string)
#7 {main}

wl_id is T130067

Seems DB changes are done in codfw, not in eqiad

So this can't really be run until eqiad is primary again and all the tables reimported
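
For reference, the WHERE clause in the failing query above shows which titles the script treats as invalid: anything containing a space, carriage return, newline or tab, and anything starting or ending with an underscore. A small Python predicate expressing the same conditions (the function name is made up for this sketch, and the sample titles are taken from elsewhere in this task):

```python
def is_invalid_dbkey(title: str) -> bool:
    """Mirror the six LIKE conditions from the query: whitespace inside the
    title, or an underscore at either end."""
    return (
        any(ch in title for ch in " \r\n\t")
        or title.startswith("_")
        or title.endswith("_")
    )


samples = ["Article nke ntakiri", "YurikBot_", "YurikBot", "Athanasius_Soter"]
flagged = [t for t in samples if is_invalid_dbkey(t)]
print(flagged)
```

Underscores in the middle of a title are fine; only the leading and trailing positions are flagged.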

wl_id is not deployed into production, I was very clear at: T130067#3210420. I cannot do anything if people deploy features that do not exist in production. HEAD <> production.

It seems this was a one-time run, not a long list of errors. Please wait for T130067 to be fully resolved before touching wl_id.

I guess this depends on T130067 now, then…

Yes, it is only deployed on eqiad so far and will take a few days (after the switchover) to get it done on codfw. So please do not use wl_id column until T130067 is marked as resolved.

Thanks!

OR you can run it on non-watchlist tables :-)

TTO added a comment. Apr 26 2017, 8:50 PM

I forgot about wl_id not being available on the cluster. Guess it's just good timing that it's going to be here very soon...

TTO added a comment. May 17 2017, 1:07 AM

@Reedy, want to try running this script again, initially as a dry run?

When running this on large.dblist wikis, be sure to exclude the pagelinks and templatelinks tables, as the query for those tables takes a very long time. An example command line with these tables excluded would be -t page -t redirect -t archive -t logging -t protected_titles -t category -t recentchanges -t watchlist -t categorylinks. It should be fine to run it across all tables on medium and small wikis.

Reedy added a comment. May 17 2017, 2:27 PM

I'll save you the grep, here are the non-zero entries:

bgwikinews:  *** Looking for invalid wl_title entries in watchlist...
bgwikinews:  *** Number of invalid rows: 2
bgwikinews:       wl_id |  ns | dbkey
bgwikinews:        4476 |  14 | Шоуто_по_TV7_сложи_юзди_на_най-желания_столичен_бохем_
bgwikinews:        5140 |  15 | Шоуто_по_TV7_сложи_юзди_на_най-желания_столичен_бохем_

hewikiquote:  *** Looking for invalid cat_title entries in category...
hewikiquote:  *** Number of invalid rows: 1
hewikiquote:      cat_id |  ns | dbkey
hewikiquote:         613 |  14 | סדרות טלוויזיה מצוירות

igwiki:  *** Looking for invalid cat_title entries in category...
igwiki:  *** Number of invalid rows: 1
igwiki:      cat_id |  ns | dbkey
igwiki:         249 |  14 | Article nke ntakiri

ltwikiquote:  *** Looking for invalid cat_title entries in category...
ltwikiquote:  *** Number of invalid rows: 1
ltwikiquote:      cat_id |  ns | dbkey
ltwikiquote:         797 |  14 | Lietuvių politikai

BTW, I have no proof of that, but this was discovered recently and it would fit (maybe?) the issues mentioned here: T163337 (some kind of refresh being executed twice and subtracting more than once). I am speaking without having seen the code; I do not know if an addition/subtraction is done or just a full count is done each time.

TTO added a comment. May 18 2017, 1:48 AM

@jcrespo I think you want T18036 for that. This task is about invalid page titles, not incorrect counts.

Reedy added a comment. May 18 2017, 7:05 AM

Reedy added a comment. May 18 2017, 7:09 AM

I note, I don't know how long medium actually took; I started it and went away and didn't come back to it till morning.

large is running with all the -t's

TTO added a subscriber: Matanya. Edited May 18 2017, 10:19 AM

Virtually all of the invalid rows in medium.log are category and categorylinks rows with spaces in the title. There are a few invalid titles in the logging tables as well:

enwikiquote:  *** Looking for invalid log_title entries in logging...
enwikiquote:  *** Number of invalid rows: 1
enwikiquote:      log_id |  ns | dbkey
enwikiquote:       59061 |  -1 | Contributions/66.90.103.130_
enwikiquote:  The following updates would be run with the --fix flag:
enwikiquote:  log_id=59061: update 'Contributions/66.90.103.130_' to 'Contributions/66.90.103.130'
enwikiquote:  *** Run with --fix to clean up these rows

eswiktionary:  *** Looking for invalid log_title entries in logging...
eswiktionary:  *** Number of invalid rows: 2
eswiktionary:      log_id |  ns | dbkey
eswiktionary:        2243 |   2 | Alhen_
eswiktionary:        1015 |   2 | Ppfk_
eswiktionary:  The following updates would be run with the --fix flag:
eswiktionary:  log_id=2243: update 'Alhen_' to 'Alhen'
eswiktionary:  log_id=1015: update 'Ppfk_' to 'Ppfk'
eswiktionary:  *** Run with --fix to clean up these rows

etwiki:  *** Looking for invalid log_title entries in logging...
etwiki:  *** Number of invalid rows: 1
etwiki:      log_id |  ns | dbkey
etwiki:        5612 |   2 | Athanasius_Soter_
etwiki:  The following updates would be run with the --fix flag:
etwiki:  log_id=5612: update 'Athanasius_Soter_' to 'Athanasius_Soter'
etwiki:  *** Run with --fix to clean up these rows

euwiki:  *** Looking for invalid log_title entries in logging...
euwiki:  *** Number of invalid rows: 1
euwiki:      log_id |  ns | dbkey
euwiki:         695 |   2 | YurikBot_
euwiki:  The following updates would be run with the --fix flag:
euwiki:  log_id=695: update 'YurikBot_' to 'YurikBot'
euwiki:  *** Run with --fix to clean up these rows

mediawikiwiki:  *** Looking for invalid log_title entries in logging...
mediawikiwiki:  *** Number of invalid rows: 1
mediawikiwiki:      log_id |  ns | dbkey
mediawikiwiki:       86339 |   2 | VoA_2_
mediawikiwiki:  The following updates would be run with the --fix flag:
mediawikiwiki:  log_id=86339: update 'VoA_2_' to 'VoA_2'
mediawikiwiki:  *** Run with --fix to clean up these rows

mswiki:  *** Looking for invalid log_title entries in logging...
mswiki:  *** Number of invalid rows: 2
mswiki:      log_id |  ns | dbkey
mswiki:       12236 |   2 | Alistair_
mswiki:        2625 |   2 | Malekhanif_
mswiki:  The following updates would be run with the --fix flag:
mswiki:  log_id=12236: update 'Alistair_' to 'Alistair'
mswiki:  log_id=2625: update 'Malekhanif_' to 'Malekhanif'
mswiki:  *** Run with --fix to clean up these rows

nlwiktionary:  *** Looking for invalid log_title entries in logging...
nlwiktionary:  *** Number of invalid rows: 1
nlwiktionary:      log_id |  ns | dbkey
nlwiktionary:       15527 |   2 | Aaaaghwlaguwlah!_*Throws_cat*_Aaaaaaah!_
nlwiktionary:  The following updates would be run with the --fix flag:
nlwiktionary:  log_id=15527: update 'Aaaaghwlaguwlah!_*Throws_cat*_Aaaaaaah!_' to 'Aaaaghwlaguwlah!_*Throws_cat*_Aaaaaaah!'
nlwiktionary:  *** Run with --fix to clean up these rows

ukwiktionary:  *** Looking for invalid log_title entries in logging...
ukwiktionary:  *** Number of invalid rows: 3
ukwiktionary:      log_id |  ns | dbkey
ukwiktionary:        4911 |   2 | Анатолій_Гончаров_(ґомосек)_
ukwiktionary:        4910 |   2 | Анатолій_Гончаров_(ґомосексуаліст)_
ukwiktionary:        4915 |   2 | Анатолій_Гончаров_(ґомосексуаліст)_
ukwiktionary:  The following updates would be run with the --fix flag:
ukwiktionary:  log_id=4911: update 'Анатолій_Гончаров_(ґомосек)_' to 'Анатолій_Гончаров_(ґомосек)'
ukwiktionary:  log_id=4910: update 'Анатолій_Гончаров_(ґомосексуаліст)_' to 'Анатолій_Гончаров_(ґомосексуаліст)'
ukwiktionary:  log_id=4915: update 'Анатолій_Гончаров_(ґомосексуаліст)_' to 'Анатолій_Гончаров_(ґомосексуаліст)'
ukwiktionary:  *** Run with --fix to clean up these rows

urwiki:  *** Looking for invalid log_title entries in logging...
urwiki:  *** Number of invalid rows: 1
urwiki:      log_id |  ns | dbkey
urwiki:         311 |   2 | AFRAZ_ULQURAISH_
urwiki:  The following updates would be run with the --fix flag:
urwiki:  log_id=311: update 'AFRAZ_ULQURAISH_' to 'AFRAZ_ULQURAISH'
urwiki:  *** Run with --fix to clean up these rows

The last one (urwiki) was noted by @Matanya at T155091#2933416.

So far, invalid data only exists in category, categorylinks and logging (and one invalid watchlist entry on bgwikinews). I'll be interested to see what dirty secrets the large wikis are hiding...
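
The proposed logging fixes in the dry-run output above all amount to stripping the trailing underscore. A small Python sketch of such a normalisation, and of the "update X to Y" line format, follows. This is an assumption about the rule rather than the script's actual code: the function names are invented, and the handling of inner whitespace (replaced by a single underscore) is a guess that none of the logging rows above exercise.

```python
import re


def normalise_dbkey(title):
    """Replace whitespace runs with '_' and strip underscores from both ends."""
    return re.sub(r"\s+", "_", title).strip("_")


def describe_fix(id_column, row_id, title):
    """Format one proposed update in the style of the dry-run output."""
    return f"{id_column}={row_id}: update '{title}' to '{normalise_dbkey(title)}'"


line = describe_fix("log_id", 695, "YurikBot_")
print(line)
print(describe_fix("log_id", 86339, "VoA_2_"))
```

For the euwiki row this reproduces the line shown in the dry-run output.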

TTO added a comment. May 18 2017, 11:53 PM

More of the same in large.log, except for trwiki which actually has two invalid titles in the page table. The script can't fix those, so they'll need to be repaired manually.

What do we do now? Run the script with --fix on each wiki/table that needs fixing? Would that need a window set aside?

TTO renamed this task from Special:Categories on wikisource.org -> "Exception encountered, of type "Wikimedia\Assert\ParameterAssertionException" error to Special:Categories on some wikis errors out with "Exception encountered, of type "Wikimedia\Assert\ParameterAssertionException".
TTO added subscribers: He7d3r, Chicocvenancio.
Reedy added a comment. May 19 2017, 8:06 AM
In T99736#3274438, @TTO wrote:

More of the same in large.log, except for trwiki which actually has two invalid titles in the page table. The script can't fix those, so they'll need to be repaired manually.

What do we do now? Run the script with --fix on each wiki/table that needs fixing? Would that need a window set aside?

I'm fairly happy to JFDI

Reedy added a comment. Edited May 19 2017, 8:16 AM

bgwikinews...

*** Looking for invalid wl_title entries in watchlist...
Looking for invalid wl_title entries in watchlist...
*** Number of invalid rows: 2
Number of invalid rows: 2
     wl_id |  ns | dbkey
      4476 |  14 | Шоуто_по_TV7_сложи_юзди_на_най-желания_столичен_бохем_
      5140 |  15 | Шоуто_по_TV7_сложи_юзди_на_най-желания_столичен_бохем_
*** Deleting invalid watchlist rows...
Deleting invalid watchlist rows...
*** Deleted 1 rows from watchlist.
Deleted 1 rows from watchlist.

Deleted one? Is it because of the same dbkey, but differing wl_id? Running it again...

*** Looking for invalid wl_title entries in watchlist...
Looking for invalid wl_title entries in watchlist...
*** Number of invalid rows: 0
Number of invalid rows: 0

Have we got an output error?

Reedy added a comment. Edited May 19 2017, 8:19 AM

hewikiquote

*** Looking for invalid cat_title entries in category...
Looking for invalid cat_title entries in category...
*** Number of invalid rows: 1
Number of invalid rows: 1
    cat_id |  ns | dbkey
       613 |  14 | סדרות טלוויזיה מצוירות
*** Deleting invalid category rows...
Deleting invalid category rows...
*** Deleted 1 rows from category.
Deleted 1 rows from category.

igwiki

*** Looking for invalid cat_title entries in category...
Looking for invalid cat_title entries in category...
*** Number of invalid rows: 1
Number of invalid rows: 1
    cat_id |  ns | dbkey
       249 |  14 | Article nke ntakiri
*** Deleting invalid category rows...
Deleting invalid category rows...
*** Deleted 1 rows from category.
Deleted 1 rows from category.

ltwikiquote

*** Looking for invalid cat_title entries in category...
Looking for invalid cat_title entries in category...
*** Number of invalid rows: 1
Number of invalid rows: 1
    cat_id |  ns | dbkey
       797 |  14 | Lietuvių politikai
*** Deleting invalid category rows...
Deleting invalid category rows...
*** Deleted 1 rows from category.
Deleted 1 rows from category.

The output is a bit icky...

reedy@tin:~$ mwscript cleanupInvalidDbKeys.php ltwikiquote --fix
*** Looking for invalid page_title entries in page...
Looking for invalid page_title entries in page...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid rd_title entries in redirect...
Looking for invalid rd_title entries in redirect...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid ar_title entries in archive...
Looking for invalid ar_title entries in archive...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid log_title entries in logging...
Looking for invalid log_title entries in logging...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid pt_title entries in protected_titles...
Looking for invalid pt_title entries in protected_titles...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cat_title entries in category...
Looking for invalid cat_title entries in category...
*** Number of invalid rows: 1
Number of invalid rows: 1
    cat_id |  ns | dbkey
       797 |  14 | Lietuvių politikai
*** Deleting invalid category rows...
Deleting invalid category rows...
*** Deleted 1 rows from category.
Deleted 1 rows from category.

*** Looking for invalid rc_title entries in recentchanges...
Looking for invalid rc_title entries in recentchanges...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid wl_title entries in watchlist...
Looking for invalid wl_title entries in watchlist...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid pl_title entries in pagelinks...
Looking for invalid pl_title entries in pagelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid tl_title entries in templatelinks...
Looking for invalid tl_title entries in templatelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cl_to entries in categorylinks...
Looking for invalid cl_to entries in categorylinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Done!
Done!*** Cleaned up invalid DB keys on ltwikiquote!
 Cleaned up invalid DB keys on ltwikiquote!

Lots of double output? :)

Reedy added a comment. May 19 2017, 8:35 AM

Hmm...

*** Looking for invalid cat_title entries in category...
Looking for invalid cat_title entries in category...
*** Number of invalid rows: 2
Number of invalid rows: 2
    cat_id |  ns | dbkey
     25861 |  14 | Bandar di Lombardy
     24052 |  14 | Bandar raya, bandar dan kampung di Kelantan
*** Deleting invalid category rows...
Deleting invalid category rows...
*** Deleted 1 rows from category.
Deleted 1 rows from category.

Those titles differ. We have a bug :)

Reedy added a comment. May 19 2017, 8:40 AM

medium done

TTO added a comment. Edited May 19 2017, 8:53 AM

The script's help reminds you to redirect stdout to a text file :)

Also, no idea why the affected count is wrong. It's just printing $dbw->affectedRows()
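
One untested guess, not confirmed anywhere in this task: if the rows are deleted one statement at a time, an affected-rows counter read after the loop reflects only the last statement. That would fit the bgwikinews rerun above, which found no invalid rows left even though only one deletion had been reported. The Python/SQLite sketch below shows the effect in general terms; it says nothing about what cleanupInvalidDbKeys.php actually does, and the reduced schema is hypothetical (the two titles are from the output above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_title TEXT)")
conn.executemany(
    "INSERT INTO category VALUES (?, ?)",
    [
        (25861, "Bandar di Lombardy"),
        (24052, "Bandar raya, bandar dan kampung di Kelantan"),
    ],
)

cur = conn.cursor()
total = 0
for cat_id in (25861, 24052):
    cur.execute("DELETE FROM category WHERE cat_id = ?", (cat_id,))
    total += cur.rowcount  # accumulate per-statement counts

last_statement_count = cur.rowcount  # only reflects the final DELETE
remaining = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]
print(last_statement_count, total, remaining)
```

Here both rows are gone, yet the counter read after the loop reports a single row; summing inside the loop gives the real total.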

Reedy added a comment. May 19 2017, 8:54 AM
arwiki:  *** Number of invalid rows: 0
arwiki:  *** Looking for invalid cat_title entries in category...
Number of invalid rows: 3
arwiki:  *** Number of invalid rows: 3
arwiki:      cat_id |  ns | dbkey
arwiki:       37001 |  14 | أشخاص على قيد الحياة
arwiki:       31886 |  14 | جغرافيا الإمارات
arwiki:       37996 |  14 | خريجو ساندهيرست
Deleting invalid category rows...
arwiki:  *** Deleting invalid category rows...
Deleted 1 rows from category.
Reedy added a comment. May 20 2017, 1:26 PM

It's finished on large wikis too

medium done

It's finished on large wikis too

what about plwikisource?
https://pl.wikisource.org/w/index.php?title=Specjalna:Kategorie&offset=&limit=500
there are still errors, e.g.:

+---------+--------------------------+-----------+-------------+-----------+
| cat_id  | cat_title                | cat_pages | cat_subcats | cat_files |
+---------+--------------------------+-----------+-------------+-----------+
|    1433 | Andrzej Frycz-Modrzewski |        -2 |          -1 |         0 |
|  362879 | Bez treści               |         0 |           0 |         0 |
+---------+--------------------------+-----------+-------------+-----------+

Z.

Reedy added a comment. May 20 2017, 5:23 PM

It was run on all wikis.

It kind of seems we're still getting things appearing. @TTO noticed earlier that oldwikisource was broken again

Running the script again fixed them...

Which kinda seems like something is still creating these bad rows in some very small amounts...

Reedy added a comment. May 20 2017, 5:25 PM
reedy@tin:~$ mwscript cleanupInvalidDbKeys.php plwikisource --fix
*** Looking for invalid page_title entries in page...
Looking for invalid page_title entries in page...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid rd_title entries in redirect...
Looking for invalid rd_title entries in redirect...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid ar_title entries in archive...
Looking for invalid ar_title entries in archive...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid log_title entries in logging...
Looking for invalid log_title entries in logging...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid pt_title entries in protected_titles...
Looking for invalid pt_title entries in protected_titles...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cat_title entries in category...
Looking for invalid cat_title entries in category...
*** Number of invalid rows: 2
Number of invalid rows: 2
    cat_id |  ns | dbkey
      1433 |  14 | Andrzej Frycz-Modrzewski
    362879 |  14 | Bez treści
*** Deleting invalid category rows...
Deleting invalid category rows...
*** Deleted 1 rows from category.
Deleted 1 rows from category.

*** Looking for invalid rc_title entries in recentchanges...
Looking for invalid rc_title entries in recentchanges...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid wl_title entries in watchlist...
Looking for invalid wl_title entries in watchlist...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid pl_title entries in pagelinks...
Looking for invalid pl_title entries in pagelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid tl_title entries in templatelinks...
Looking for invalid tl_title entries in templatelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cl_to entries in categorylinks...
Looking for invalid cl_to entries in categorylinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Done!
Done!*** Cleaned up invalid DB keys on plwikisource!
 Cleaned up invalid DB keys on plwikisource!
reedy@tin:~$

It was run on all wikis.

It kind of seems we're still getting things appearing. @TTO noticed earlier that oldwikisource was broken again.

Running the script again fixed them...

Which kinda seems like something is still creating these bad rows in some very small amounts...

These errors were identified earlier in F8114681:

...
Looking for invalid cat_title entries in category...
plwikisource:  *** Number of invalid rows: 2
plwikisource:      cat_id |  ns | dbkey
plwikisource:        1433 |  14 | Andrzej Frycz-Modrzewski
plwikisource:      362879 |  14 | Bez treści
plwikisource:  *** Run with --fix to clean up these rows
...

but it looks like they were not fixed at that time.
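
For comparing check-only logs against later --fix runs, a small parser can list which tables still report invalid rows. This is only a sketch: it assumes the log lines look like the excerpt above (an optional "wikiname:" prefix, an optional "***" marker, then "Number of invalid rows: N"), and the function name `outstanding_rows` is hypothetical, not part of any MediaWiki tooling.

```python
import re
from collections import defaultdict

# Assumed line shapes, copied from the log excerpts in this thread:
#   "Looking for invalid cat_title entries in category..."
#   "plwikisource:  *** Number of invalid rows: 2"
LOOKING = re.compile(r"Looking for invalid (\w+) entries in (\w+)\.\.\.")
COUNT = re.compile(r"^(?:(\w+):\s+)?(?:\*\*\* )?Number of invalid rows: (\d+)")


def outstanding_rows(log_text):
    """Return {wiki: {table: count}} for every table with a non-zero count."""
    result = defaultdict(dict)
    table = None
    for line in log_text.splitlines():
        found = LOOKING.search(line)
        if found:
            # Remember which table the following count belongs to.
            table = found.group(2)
            continue
        counted = COUNT.match(line.strip())
        if counted and int(counted.group(2)) > 0:
            # Logs without a "wikiname:" prefix are grouped under a placeholder.
            wiki = counted.group(1) or "(unknown)"
            result[wiki][table] = int(counted.group(2))
    return dict(result)
```

Running it over a check-only log and again over the log of a --fix run would show whether the same tables are still listed.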

 reedy@tin:~$ mwscript cleanupInvalidDbKeys.php plwikisource --fix
...
 Number of invalid rows: 2
     cat_id |  ns | dbkey
       1433 |  14 | Andrzej Frycz-Modrzewski
     362879 |  14 | Bez treści
 *** Deleting invalid category rows...
 Deleting invalid category rows...
 *** Deleted 1 rows from category.
 Deleted 1 rows from category.
...
 *** Done!

Thank you!

Z.

Are we done fixing the large wikis too, or just checking them?

Reedy added a comment.May 20 2017, 8:20 PM

I had fixed them, yes.

Currently re-running the script over all wikis, will post the log when it's done

medium done

I ran some tests (on data from the medium log, F8114681):

bnwikisource_p:
+---------+--------------------------+-----------+-------------+-----------+
| cat_id  | cat_title                | cat_pages | cat_subcats | cat_files |
+---------+--------------------------+-----------+-------------+-----------+
|   15973 | মুদ্রণ সংশোধন করা হয়নি    |         0 |           0 |         0 |
+---------+--------------------------+-----------+-------------+-----------+

afwiki_p:
+---------+--------------------------+-----------+-------------+-----------+
| cat_id  | cat_title                | cat_pages | cat_subcats | cat_files |
+---------+--------------------------+-----------+-------------+-----------+
|    3790 | Geskiedenis van Suid-A...|       -16 |           0 |         0 |
|    3958 | Verenigde State van Am...|        -1 |           0 |         0 |
|    3961 | Romaanse tale            |        -3 |          -1 |         0 |
|    4088 | Wiskunde saadjies        |        -1 |           0 |         0 |
+---------+--------------------------+-----------+-------------+-----------+

cawikisource_p:
+---------+--------------------------+-----------+-------------+-----------+
| cat_id  | cat_title                | cat_pages | cat_subcats | cat_files |
+---------+--------------------------+-----------+-------------+-----------+
|   33033 | Sense revisar            |         0 |           0 |         0 |
+---------+--------------------------+-----------+-------------+-----------+

It seems that no fixes have been made on the medium wikis; they have only been checked.

TTO added a comment.May 23 2017, 5:22 AM

@Reedy did that script ever finish?

Reedy added a comment.May 23 2017, 6:32 AM

TTO added a comment.EditedMay 23 2017, 10:05 AM

The wikis that still need cleanup are:

  • afwiki, arzwiki, azwiki, bewiki, bnwiki, bnwikisource, brwiki, cawikisource (same rows as before, with same primary key values) - so nothing in medium.log (F8114681) was fixed until it got down to cswiki
  • commonswiki - 1 invalid pagelinks row (we didn't run this over the pagelinks table on large wikis before, so this one wouldn't have been found last time around)
  • trwiki - 2 invalid titles in the page table, which need to be repaired by hand
  • zhwiki, zhwiktionary, zhwikibooks - many invalid categorylinks entries persist. The script was clearly run with --fix on these sites, because the invalid category rows are gone. Might need to investigate what's happening here

So could you run the script again with --fix on afwiki, arzwiki, azwiki, bewiki, bnwiki, bnwikisource, brwiki, cawikisource, and commonswiki (-t pagelinks only, to save time), and maybe also on the zh sites, to see whether the bogus categorylinks go away?
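
The request above could be spelled out as one command line per wiki. The sketch below only builds and prints the commands; it does not execute anything. The helper name `build_fix_commands` is hypothetical, and the only options used (`--fix` and `-t <table>`) are the ones that appear in this thread.

```python
def build_fix_commands(wikis, table_filter=None):
    """Return one argv list per wiki; table_filter maps wiki -> single table."""
    table_filter = table_filter or {}
    commands = []
    for wiki in wikis:
        argv = ["mwscript", "cleanupInvalidDbKeys.php", wiki]
        if wiki in table_filter:
            # Restrict the run to one table, as requested for commonswiki.
            argv += ["-t", table_filter[wiki]]
        argv.append("--fix")
        commands.append(argv)
    return commands


wikis = [
    "afwiki", "arzwiki", "azwiki", "bewiki", "bnwiki",
    "bnwikisource", "brwiki", "cawikisource", "commonswiki",
]
for argv in build_fix_commands(wikis, {"commonswiki": "pagelinks"}):
    print(" ".join(argv))
```

Printing first, rather than running, leaves the decision about where and when to execute to whoever has shell access.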

Reedy added a comment.May 23 2017, 3:54 PM
$ mwscript cleanupInvalidDbKeys.php commonswiki -t pagelinks --fix
*** Looking for invalid pl_title entries in pagelinks...
Looking for invalid pl_title entries in pagelinks...
*** Number of invalid rows: 1
Number of invalid rows: 1
   pl_from |  ns | dbkey
  59121748 | 100 | Polish_Patent_Office's_Publication_Server_
*** Queueing link update jobs for the pages in pl_from...
Queueing link update jobs for the pages in pl_from...
*** Link update jobs have been added to the job queue.
Link update jobs have been added to the job queue.

*** Done!
Done!*** Cleaned up invalid DB keys on commonswiki!
 Cleaned up invalid DB keys on commonswiki!
Reedy added a comment.May 23 2017, 7:26 PM
reedy@terbium:~$ mwscript cleanupInvalidDbKeys.php zhwiki --fix
*** Looking for invalid page_title entries in page...
Looking for invalid page_title entries in page...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid rd_title entries in redirect...
Looking for invalid rd_title entries in redirect...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid ar_title entries in archive...
Looking for invalid ar_title entries in archive...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid log_title entries in logging...
Looking for invalid log_title entries in logging...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid pt_title entries in protected_titles...
Looking for invalid pt_title entries in protected_titles...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cat_title entries in category...
Looking for invalid cat_title entries in category...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid rc_title entries in recentchanges...
Looking for invalid rc_title entries in recentchanges...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid wl_title entries in watchlist...
Looking for invalid wl_title entries in watchlist...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid pl_title entries in pagelinks...
Looking for invalid pl_title entries in pagelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid tl_title entries in templatelinks...
Looking for invalid tl_title entries in templatelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cl_to entries in categorylinks...
Looking for invalid cl_to entries in categorylinks...
*** Number of invalid rows: 31
Number of invalid rows: 31
   cl_from |  ns | dbkey
   4985406 |  14 | Windows Phone软件
   4014590 |  14 | X Window系统
   2928606 |  14 | Zh-yue 母语使用者
   1480103 |  14 | 中美关系 (清朝)
   5632418 |  14 | 南北線 (東京地下鐵)
   1097412 |  14 | 卡爾霍恩縣 (密西西比州)
   5561090 |  14 | 喜歡Monsta X的維基人
   4441535 |  14 | 地区 (中国行政区划)
   4441637 |  14 | 地区 (中国行政区划)
   5426840 |  14 | 奥林匹克里昂 (女子)
   5094715 |  14 | 幼稚園 (雜誌)
   3414906 |  14 | 文化廣播 (韓國)
   5682199 |  14 | 日本交流道 (按罗马拼音分类)
   1870560 |  14 | 日本鐵路車輛 (營運業者別)
   1930875 |  14 | 日本鐵路車輛 (營運業者別)
   1882618 |  14 | 日本鐵路車輛 (營運業者別)
   1928125 |  14 | 日本鐵路車輛 (營運業者別)
   1930752 |  14 | 日本鐵路車輛 (營運業者別)
   1930835 |  14 | 日本鐵路車輛 (營運業者別)
   5506998 |  14 | 東區 (福岡市)
   5376775 |  14 | 林堡省 (荷蘭)
   5624046 |  14 | 歷史修正主義 (否認)
   4432852 |  14 | 江原道 (北)鐵路車站
   3819557 |  14 | 没有ISO 639-3代码的语言
   1526097 |  14 | 組織 (生物)
   3127803 |  14 | 西區 (橫濱市)
   3734851 |  14 | 铁路机车 (按制造商)
   5587027 |  14 | 香港公司 (按行業分類)
   5587029 |  14 | 香港公司 (按行業分類)
   5587043 |  14 | 香港公司 (按行業分類)
   1169359 |  14 | 鼓楼区 (福州市)
*** Queueing link update jobs for the pages in cl_from...
Queueing link update jobs for the pages in cl_from...
*** Link update jobs have been added to the job queue.
Link update jobs have been added to the job queue.

*** Done!
Done!*** Cleaned up invalid DB keys on zhwiki!
 Cleaned up invalid DB keys on zhwiki!
Reedy added a comment.May 23 2017, 7:29 PM
reedy@terbium:~$ mwscript cleanupInvalidDbKeys.php zhwiktionary --fix
*** Looking for invalid page_title entries in page...
Looking for invalid page_title entries in page...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid rd_title entries in redirect...
Looking for invalid rd_title entries in redirect...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid ar_title entries in archive...
Looking for invalid ar_title entries in archive...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid log_title entries in logging...
Looking for invalid log_title entries in logging...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid pt_title entries in protected_titles...
Looking for invalid pt_title entries in protected_titles...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cat_title entries in category...
Looking for invalid cat_title entries in category...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid rc_title entries in recentchanges...
Looking for invalid rc_title entries in recentchanges...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid wl_title entries in watchlist...
Looking for invalid wl_title entries in watchlist...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid pl_title entries in pagelinks...
Looking for invalid pl_title entries in pagelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid tl_title entries in templatelinks...
Looking for invalid tl_title entries in templatelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cl_to entries in categorylinks...
Looking for invalid cl_to entries in categorylinks...
*** Number of invalid rows: 14
Number of invalid rows: 14
   cl_from |  ns | dbkey
   1363511 |  14 | 俄语 人
   1363122 |  14 | 日语 动物
   1185806 |  14 | 藏语 人
   1185808 |  14 | 藏语 人
   1188276 |  14 | 藏语 人
   1190876 |  14 | 藏语 人
   1190962 |  14 | 藏语 人
   1185811 |  14 | 藏语 人
   1190867 |  14 | 藏语 人
   1190870 |  14 | 藏语 人
   1190915 |  14 | 藏语 人
   1179669 |  14 | 藏语 动物
   1179642 |  14 | 藏语 动物
   1188254 |  14 | 藏语 动物
*** Queueing link update jobs for the pages in cl_from...
Queueing link update jobs for the pages in cl_from...
*** Link update jobs have been added to the job queue.
Link update jobs have been added to the job queue.

*** Done!
Done!*** Cleaned up invalid DB keys on zhwiktionary!
 Cleaned up invalid DB keys on zhwiktionary!
reedy@terbium:~$ mwscript cleanupInvalidDbKeys.php zhwikibooks --fix
*** Looking for invalid page_title entries in page...
Looking for invalid page_title entries in page...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid rd_title entries in redirect...
Looking for invalid rd_title entries in redirect...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid ar_title entries in archive...
Looking for invalid ar_title entries in archive...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid log_title entries in logging...
Looking for invalid log_title entries in logging...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid pt_title entries in protected_titles...
Looking for invalid pt_title entries in protected_titles...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cat_title entries in category...
Looking for invalid cat_title entries in category...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid rc_title entries in recentchanges...
Looking for invalid rc_title entries in recentchanges...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid wl_title entries in watchlist...
Looking for invalid wl_title entries in watchlist...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid pl_title entries in pagelinks...
Looking for invalid pl_title entries in pagelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid tl_title entries in templatelinks...
Looking for invalid tl_title entries in templatelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cl_to entries in categorylinks...
Looking for invalid cl_to entries in categorylinks...
*** Number of invalid rows: 3
Number of invalid rows: 3
   cl_from |  ns | dbkey
     15789 |  14 | Blender 3D︰从入门到精通
     15790 |  14 | Blender 3D︰从入门到精通
     15792 |  14 | Blender 3D︰从入门到精通
*** Queueing link update jobs for the pages in cl_from...
Queueing link update jobs for the pages in cl_from...
*** Link update jobs have been added to the job queue.
Link update jobs have been added to the job queue.

*** Done!
Done!*** Cleaned up invalid DB keys on zhwikibooks!
 Cleaned up invalid DB keys on zhwikibooks!
Reedy added a comment.May 23 2017, 8:10 PM
In T99736#3285790, @TTO wrote:
  • afwiki, arzwiki, azwiki, bewiki, bnwiki, bnwikisource, brwiki, cawikisource (same rows as before, with same primary key values) - so nothing in medium.log (F8114681) was fixed until it got down to cswiki

Fixed up

In T99736#3285790, @TTO wrote:
  • trwiki - 2 invalid titles in the page table, which need to be repaired by hand
Looking for invalid page_title entries in page...
*** Number of invalid rows: 2
Number of invalid rows: 2
   page_id |  ns | dbkey
   2123254 |   6 | Renoir,_Pierre-Auguste_-_The_Two_Sisters,_On_the_Terrace.jpg_
   2123693 |   6 | Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg_

And of course, the pages without the trailing _ exist too..

mysql:wikiadmin@db1076 [trwiki]> select page_id, page_title from page where page_namespace = 6 and page_title LIKE 'Renoir,_Pierre-Auguste_-_The_Two_Sisters,_On_the_Terrace.jpg%';
+---------+---------------------------------------------------------------+
| page_id | page_title                                                    |
+---------+---------------------------------------------------------------+
|  978989 | Renoir,_Pierre-Auguste_-_The_Two_Sisters,_On_the_Terrace.jpg  |
| 2123254 | Renoir,_Pierre-Auguste_-_The_Two_Sisters,_On_the_Terrace.jpg_ |
+---------+---------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql:wikiadmin@db1076 [trwiki]> select page_id, page_title from page where page_namespace = 6 and page_title LIKE 'Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg%';
+---------+-------------------------------------------------------------------+
| page_id | page_title                                                        |
+---------+-------------------------------------------------------------------+
|  439047 | Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg  |
| 2123693 | Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg_ |
+---------+-------------------------------------------------------------------+
2 rows in set (0.00 sec)

So I deleted those two via the API sandbox...

2017-05-23 20:09:02 [WSSW3gpAEKoAACxRJqMAAACN] mw1215 trwiki 1.30.0-wmf.1 exception ERROR: [WSSW3gpAEKoAACxRJqMAAACN] /wiki/%C3%96zel:G%C3%BCnl%C3%BCk/Reedy_(WMF)   Wikimedia\Assert\ParameterAssertionException from line 63 of /srv/mediawiki/php-1.30.0-wmf.1/vendor/wikimedia/assert/src/Assert.php: Bad value for parameter $dbkey: invalid DB key 'Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg_' {"exception_id":"WSSW3gpAEKoAACxRJqMAAACN","caught_by":"mwe_handler"}

Helpful :)

Running for trwiki again with --fix to fix the logging table... lol

Ankry added a comment.May 23 2017, 9:31 PM
*** Looking for invalid cl_to entries in categorylinks...
Looking for invalid cl_to entries in categorylinks...
*** Number of invalid rows: 3
Number of invalid rows: 3
   cl_from |  ns | dbkey
     15789 |  14 | Blender 3D︰从入门到精通
     15790 |  14 | Blender 3D︰从入门到精通
     15792 |  14 | Blender 3D︰从入门到精通
*** Queueing link update jobs for the pages in cl_from...
Queueing link update jobs for the pages in cl_from...
*** Link update jobs have been added to the job queue.
Link update jobs have been added to the job queue.

*** Done!
Done!*** Cleaned up invalid DB keys on zhwikibooks!
 Cleaned up invalid DB keys on zhwikibooks!

According to Quarry (https://quarry.wmflabs.org/query/18646), these entries still exist!
Maybe some extra actions are needed besides cleaning the database?

Reedy added a comment.May 23 2017, 9:41 PM
reedy@terbium:~$ mwscript cleanupInvalidDbKeys.php trwiki --fix
*** Looking for invalid page_title entries in page...
Looking for invalid page_title entries in page...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid rd_title entries in redirect...
Looking for invalid rd_title entries in redirect...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid ar_title entries in archive...
Looking for invalid ar_title entries in archive...
*** Number of invalid rows: 2
Number of invalid rows: 2
     ar_id |  ns | dbkey
   2640374 |   6 | Renoir,_Pierre-Auguste_-_The_Two_Sisters,_On_the_Terrace.jpg_
   2640375 |   6 | Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg_
*** Updating these rows, setting ar_title to the closest valid DB key...
Updating these rows, setting ar_title to the closest valid DB key...
ar_id=2640374: updating 'Renoir,_Pierre-Auguste_-_The_Two_Sisters,_On_the_Terrace.jpg_' to 'Renoir,_Pierre-Auguste_-_The_Two_Sisters,_On_the_Terrace.jpg'
ar_id=2640375: updating 'Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg_' to 'Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg'
*** Updated 2 rows on archive.
Updated 2 rows on archive.

*** Looking for invalid log_title entries in logging...
Looking for invalid log_title entries in logging...
*** Number of invalid rows: 2
Number of invalid rows: 2
    log_id |  ns | dbkey
   9453217 |   6 | Renoir,_Pierre-Auguste_-_The_Two_Sisters,_On_the_Terrace.jpg_
   9453220 |   6 | Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg_
*** Updating these rows, setting log_title to the closest valid DB key...
Updating these rows, setting log_title to the closest valid DB key...
log_id=9453217: updating 'Renoir,_Pierre-Auguste_-_The_Two_Sisters,_On_the_Terrace.jpg_' to 'Renoir,_Pierre-Auguste_-_The_Two_Sisters,_On_the_Terrace.jpg'
log_id=9453220: updating 'Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg_' to 'Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg'
*** Updated 2 rows on logging.
Updated 2 rows on logging.

*** Looking for invalid pt_title entries in protected_titles...
Looking for invalid pt_title entries in protected_titles...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cat_title entries in category...
Looking for invalid cat_title entries in category...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid rc_title entries in recentchanges...
Looking for invalid rc_title entries in recentchanges...
*** Number of invalid rows: 2
Number of invalid rows: 2
     rc_id |  ns | dbkey
  29357268 |   6 | Renoir,_Pierre-Auguste_-_The_Two_Sisters,_On_the_Terrace.jpg_
  29357270 |   6 | Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg_
*** Deleting invalid recentchanges rows...
Deleting invalid recentchanges rows...
*** Deleted 1 rows from recentchanges.
Deleted 1 rows from recentchanges.

*** Looking for invalid wl_title entries in watchlist...
Looking for invalid wl_title entries in watchlist...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid pl_title entries in pagelinks...
Looking for invalid pl_title entries in pagelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid tl_title entries in templatelinks...
Looking for invalid tl_title entries in templatelinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Looking for invalid cl_to entries in categorylinks...
Looking for invalid cl_to entries in categorylinks...
*** Number of invalid rows: 0
Number of invalid rows: 0

*** Done!
Done!*** Cleaned up invalid DB keys on trwiki!
 Cleaned up invalid DB keys on trwiki!
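
The "closest valid DB key" updates in the log above amount to dropping the trailing underscore. A rough sketch of that idea follows, covering only the kinds of bad keys seen in this thread (spaces instead of underscores, trailing underscores). Both function names are hypothetical, and this is an assumption about the general approach, not the script's actual code; MediaWiki's real title rules check considerably more.

```python
import re


def is_suspect_dbkey(key):
    """Flag the kinds of invalid keys visible in this thread (not exhaustive)."""
    return (
        " " in key
        or key.startswith("_")
        or key.endswith("_")
        or "__" in key
    )


def closest_valid_dbkey(key):
    """Normalise spaces and stray underscores; an illustrative approximation."""
    key = key.replace(" ", "_")      # DB keys use underscores, not spaces
    key = re.sub(r"_+", "_", key)    # collapse runs of underscores
    return key.strip("_")            # drop leading/trailing underscores


bad = "Supplicating_Pilgrim_at_Masjid_Al_Haram._Mecca,_Saudi_Arabia.jpg_"
print(is_suspect_dbkey(bad))
print(closest_valid_dbkey(bad))
```

As the trwiki page table shows, the normalised key may already belong to another row, which is why those two pages had to be handled by hand rather than simply renamed.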
Reedy added a comment.May 23 2017, 9:45 PM

According to Quarry (https://quarry.wmflabs.org/query/18646), these entries still exist!
Maybe some extra actions are needed besides cleaning the database?

Confirmed still there on the master

Some bug related to enqueuing jobs?

TTO added a comment.May 24 2017, 4:04 AM

Seems to be a MediaWiki bug. I created https://zh.wikibooks.org/wiki/User:This,_that_and_the_other/test (page id 23795) with the text [[Category:Blender 3D︰從入門到精通|插图集 ]] and a brand new invalid categorylinks row got added:

*** Looking for invalid cl_to entries in categorylinks...
*** Number of invalid rows: 4
   cl_from |  ns | dbkey
     15789 |  14 | Blender 3D︰从入门到精通
     15790 |  14 | Blender 3D︰从入门到精通
     15792 |  14 | Blender 3D︰从入门到精通
     23795 |  14 | Blender 3D︰从入门到精通     <<<<<< this one is new
*** Run with --fix to clean up these rows
TTO assigned this task to Reedy.May 24 2017, 6:10 AM
TTO closed this task as Resolved.

I'm going to mark this task resolved, as Special:Categories should now be working on all WMF wikis. I've created T166198: Invalid DB key being added to categorylinks table on zhwikibooks about the categorylinks issue. Thanks @Reedy, you're a legend!

Change 349490 abandoned by Bartosz Dziewoński:
Ignore broken entries on Special:Categories

https://gerrit.wikimedia.org/r/349490

Thank you both for working on this!