Special:Categories on wikisource.org -> 'Exception encountered, of type "Wikimedia\Assert\ParameterAssertionException"' error
Open, High · Public

Description

Special:Categories on wikisource.org returns the error 'Exception encountered, of type "Wikimedia\Assert\ParameterAssertionException"' when listing entries "near" Category:Norsk:

https://wikisource.org/w/index.php?title=Special:Categories&dir=prev&offset=Not_proofread&limit=1
https://wikisource.org/w/index.php?title=Special:Categories&offset=Norsk&limit=100
https://wikisource.org/w/index.php?title=Special:Categories&dir=prev&offset=User_ru&limit=500

Regards, Z.

Zdzislaw created this task. May 19 2015, 9:55 PM
Zdzislaw added a project: Wikisource.
Zdzislaw added a subscriber: Zdzislaw.
Restricted Application added a subscriber: Aklapper. May 19 2015, 9:55 PM
Legoktm triaged this task as "High" priority. May 19 2015, 9:58 PM
Legoktm set Security to None.
Krenair added a subscriber: Krenair.
2015-05-20 19:17:38 mw1067 sourceswiki exception INFO: [50321e84] /w/index.php?title=Special:Categories&dir=prev&offset=Not_proofread&limit=1   Wikimedia\Assert\ParameterAssertionException from line 63 of /srv/mediawiki/php-1.26wmf6/vendor/wikimedia/assert/src/Assert.php: Bad value for parameter $dbkey: invalid DB key
#0 /srv/mediawiki/php-1.26wmf6/includes/title/TitleValue.php(76): Wikimedia\Assert\Assert::parameter()
#1 /srv/mediawiki/php-1.26wmf6/includes/specials/SpecialCategories.php(174): TitleValue->__construct()
#2 /srv/mediawiki/php-1.26wmf6/includes/pager/IndexPager.php(436): CategoryPager->formatRow()
#3 /srv/mediawiki/php-1.26wmf6/includes/specials/SpecialCategories.php(170): IndexPager->getBody()
#4 /srv/mediawiki/php-1.26wmf6/includes/specials/SpecialCategories.php(85): CategoryPager->getBody()
#5 /srv/mediawiki/php-1.26wmf6/includes/specialpage/SpecialPage.php(384): SpecialCategories->execute()
#6 /srv/mediawiki/php-1.26wmf6/includes/specialpage/SpecialPageFactory.php(582): SpecialPage->run()
#7 /srv/mediawiki/php-1.26wmf6/includes/MediaWiki.php(285): SpecialPageFactory::executePath()
#8 /srv/mediawiki/php-1.26wmf6/includes/MediaWiki.php(603): MediaWiki->performRequest()
#9 /srv/mediawiki/php-1.26wmf6/includes/MediaWiki.php(431): MediaWiki->main()
#10 /srv/mediawiki/php-1.26wmf6/index.php(46): MediaWiki->run()
#11 /srv/mediawiki/w/index.php(3): include()
#12 {main} {"private":false}

There's a row for "Not proofread" (with a space instead of the _), and this trips the preg_match check in TitleValue. I guess someone with access to the DB should fix it manually?

+--------+---------------+-----------+-------------+-----------+
| cat_id | cat_title     | cat_pages | cat_subcats | cat_files |
+--------+---------------+-----------+-------------+-----------+
|  56001 | Not proofread |         9 |           0 |         0 |
|   1955 | Not_proofread |      7679 |           0 |         0 |
+--------+---------------+-----------+-------------+-----------+
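
To illustrate why the row with a space is rejected while the underscore form is accepted, here is a minimal Python sketch of a DB key check. The regular expression is an assumption inferred from the "invalid DB key" message and the comment above, not copied from TitleValue.php, and `to_dbkey` is a hypothetical helper that only covers the space-to-underscore part of title normalisation.

```python
import re

# Assumed rule: a DB key must not start or end with "_" and must not
# contain a space, CR, LF or tab. This is an illustration only.
_INVALID_DBKEY = re.compile(r"^_|[ \r\n\t]|_$")


def is_valid_dbkey(dbkey: str) -> bool:
    """Return True if the string passes the assumed DB key check."""
    return _INVALID_DBKEY.search(dbkey) is None


def to_dbkey(text: str) -> str:
    """Hypothetical helper: turn display text into DB key form."""
    return text.replace(" ", "_").strip("_")
```

With this check, `is_valid_dbkey("Not proofread")` is false and `is_valid_dbkey("Not_proofread")` is true, which matches the two rows shown above.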

We also need to know how it made it into the table in the first place.

This category is populated by the extension.

Krenair added a comment. Edited May 26 2015, 1:30 PM

I null edited one of the pages linking to category #56001... Seems to have fixed it. Certainly no need to manually fix the database, but let's see if we can work out why this happened before null editing the problem away.

Glaisher added a subscriber: Tpt. May 26 2015, 3:56 PM
Tpt added a comment. Jun 6 2015, 12:48 AM

I have no idea what the root cause of the issue is. The extension adds the category to the ParserOutput object and doesn't make any direct changes to this database table.

@hashar: As you moved this to "Being worked on" on the Wikimedia-log-errors workboard, any known assignee for this task?
Asking as this problem still happens.

TTO added a subscriber: TTO. Oct 5 2015, 9:39 AM

Noting that this is still happening despite Not proofread (with space) being listed as empty in the DB, though still present there:

MariaDB [enwikisource_p]> select * from category where cat_title like "% %";
+---------+--------------------------+-----------+-------------+-----------+
| cat_id  | cat_title                | cat_pages | cat_subcats | cat_files |
+---------+--------------------------+-----------+-------------+-----------+
|    7276 | Epic poetry              |       -11 |           0 |         0 |
|    7277 | Song lyrics              |        -8 |           0 |         0 |
|    7659 | Speedy deletion requests |        -2 |           0 |         0 |
|    7856 | Law, Islamic             |        -1 |           0 |         0 |
| 3614412 | Without text             |         0 |           0 |         0 |
| 3614486 | Not proofread            |         0 |           0 |         0 |
+---------+--------------------------+-----------+-------------+-----------+
6 rows in set (0.03 sec)

Notice the different cat_id value from above (Not proofread was 56001).

(I think the negative rows are a red herring; enwiki has a bunch of those as well, and they don't seem to cause problems there.)

Not marking this as a dupe for now, in case ProofreadPage is somehow to blame here.

We also need to know how it made it into the table in the first place.

For the ones with ProofreadPage categories (e.g. "Without text" or "Not proofread"), this was a bug that was fixed in 840064047c3f8e33e3a0ad8a5858e98696fdb609 (2013-12-04).

For the others, no idea, but the low ID numbers suggest this is ancient history… definitely before 2009 (deletion of Category:Law, Islamic), perhaps as early as 2005 (creation of Category:Epic poetry). It will be difficult to determine the root cause, which has clearly been fixed at some point since.

I can easily reproduce this on enwiki. https://en.wikipedia.org/wiki/Special:Categories?from=Righteous

This is not a ProofreadPage bug.

enwiki has 34 broken entries:

mysql:research@analytics-store.eqiad.wmnet [enwiki]> select * from category where cat_title like "% %";
+--------+-------------------------------------------------------------+-----------+-------------+-----------+
| cat_id | cat_title                                                   | cat_pages | cat_subcats | cat_files |
+--------+-------------------------------------------------------------+-----------+-------------+-----------+
| 636434 | Righteous Among the Nations                                 |       -23 |           0 |         0 |
| 641458 | Chicago, Illinois                                           |        -4 |           0 |         0 |
| 644770 | House of Welf                                               |        -5 |           0 |         0 |
| 645269 | Companies based in Peterborough                             |        -1 |          -1 |         0 |
| 649684 | WikiProject Linux articles                                  |        -3 |          -3 |         0 |
| 651939 | Welsh Americans                                             |      -388 |           0 |         0 |
| 653890 | Cities, towns and villages in the Punjab Region of Pakistan |        -1 |           0 |         0 |
| 653896 | Leonese-language writers                                    |        -2 |           0 |         0 |
| 653999 | Arab Female film directors                                  |        -2 |           0 |         0 |
| 661676 | Articles with unsourced statements since March 2009         |        -2 |           0 |         0 |
| 669151 | Behavioural sciences                                        |        -6 |           0 |         0 |
| 683386 | Germans of Polish descent                                   |        -4 |           0 |         0 |
| 685531 | Subud Members by nationality                                |        -2 |          -2 |         0 |
| 686663 | Athletic Bilbao footballers                                 |        -1 |           0 |         0 |
| 687103 | People from Yaoundé                                         |       -13 |           0 |         0 |
| 688746 | Military of Georgia (country)                               |       -20 |           0 |         0 |
| 690083 | Systems biology                                             |        -5 |           0 |         0 |
| 690354 | Torah people                                                |        -2 |           0 |         0 |
| 690474 | Lemony Snicket                                              |        -1 |           0 |         0 |
| 691155 | Irish television programmes                                 |       -27 |           0 |         0 |
| 692935 | Valencian Community geography stubs                         |       -44 |           0 |         0 |
| 693222 | Video game franchises                                       |        -7 |           0 |         0 |
| 694331 | Lemony Snicket stubs                                        |       -21 |           0 |         0 |
| 696725 | Swaminarayan sect of Hinduism                               |       -10 |           0 |       -10 |
| 698443 | Buildings and structures in Rhondda Cynon Taff              |        -3 |          -3 |         0 |
| 698897 | Schools in Rhondda Cynon Taff                               |        -1 |           0 |         0 |
| 702670 | Swiss-Romanian people                                       |        -1 |           0 |         0 |
| 703741 | Pigeons and doves                                           |       -72 |          -1 |        -5 |
| 703946 | Liechtensteinian footballers                                |        -1 |          -1 |         0 |
| 706752 | Daniel Dumile songs                                         |        -1 |           0 |         0 |
| 707288 | National sports teams of FYR of Macedonia                   |        -5 |           0 |         0 |
| 713511 | Companies listed on Rwanda Over The Counter Exchange        |        -1 |           0 |         0 |
| 719433 | Heat transfer                                               |        -1 |           0 |         0 |
| 721931 | Córdoba Province (Argentina)                                |        -1 |           0 |         0 |
+--------+-------------------------------------------------------------+-----------+-------------+-----------+
34 rows in set (0.64 sec)

@TTO Let's say that this is the most common special-case of T155091, and like you said there:

We could also do with a script that handles categories with negative cat_pages values. Literally all it'd have to do is $dbr->select( 'category', 'cat_id', [ 'cat_pages < 0' ], __METHOD__ ) then Category::newFromID( $row->cat_id )->refreshCounts() on each row.

…it could be handled with a simple special-case script.
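
As a rough illustration of that idea (select the rows with a negative cat_pages, then recount each one), here is a self-contained Python sketch against an in-memory SQLite database. The two-table schema is heavily simplified, the demo rows are made up, and `refresh_negative_counts` is a hypothetical stand-in: it is not what Category::refreshCounts() actually does. Deleting rows that turn out to have no members is an assumption made for the sketch.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with a simplified schema (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE category (
            cat_id INTEGER PRIMARY KEY, cat_title TEXT, cat_pages INTEGER);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        INSERT INTO category VALUES
            (1, 'Epic poetry', -11), (2, 'Song_lyrics', -3), (3, 'Not_proofread', 2);
        INSERT INTO categorylinks VALUES
            (10, 'Song_lyrics'), (11, 'Not_proofread'), (12, 'Not_proofread');
        """
    )
    return conn


def refresh_negative_counts(conn: sqlite3.Connection) -> list:
    """Recount every category row whose stored cat_pages is negative.

    Rows with no remaining members are deleted (an assumption of this
    sketch); the others get their real member count. Returns the ids touched.
    """
    rows = conn.execute(
        "SELECT cat_id, cat_title FROM category WHERE cat_pages < 0"
    ).fetchall()
    touched = []
    for cat_id, cat_title in rows:
        (actual,) = conn.execute(
            "SELECT COUNT(*) FROM categorylinks WHERE cl_to = ?", (cat_title,)
        ).fetchone()
        if actual == 0:
            conn.execute("DELETE FROM category WHERE cat_id = ?", (cat_id,))
        else:
            conn.execute(
                "UPDATE category SET cat_pages = ? WHERE cat_id = ?",
                (actual, cat_id),
            )
        touched.append(cat_id)
    conn.commit()
    return touched
```

Calling `refresh_negative_counts(make_demo_db())` touches only the two negative rows; the row with a correct positive count is left alone.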

We already have maintenance/cleanupEmptyCategories.php, which was run fairly recently (T140811). I propose that we change it to delete category entries with a negative count too (right now, it only handles zeroes) and re-run it. This should be easier for Operations and DBA to stomach than running an entirely new script which deletes things.

@Anomie Since you handled T140811… what do you think of this idea, and would you be able to help with getting the script rerun?

TTO added a subscriber: jcrespo. Feb 16 2017, 11:00 PM

I think there would be value in getting https://gerrit.wikimedia.org/r/333486/ reviewed and run, in part because these DB key errors exist in tables other than category as well. @jcrespo seemed quite open to the idea of it. At least for the category table, it's not doing any DELETEs that Category::refreshCounts wouldn't already do.

I also did https://gerrit.wikimedia.org/r/333917/ (for T18765: Write a maintenance script to refresh category member counts), which is also waiting for review. That would fix the problem on enwiki, since all the problematic categories have incorrect counts. I seem to remember that this is not the case on some smaller wikis, though.

You're of course welcome to write a third script to solve this problem if you think it would be useful, but I think there would be more value in getting at least one of those scripts reviewed and run.

@matmarex wrote:

This is not a ProofreadPage bug.

That's not what I said:

@TTO wrote:

...in case ProofreadPage is somehow to blame here.

It's possible that ProofreadPage is not always normalising category names. Clearly the enwiki problems are not related to that (so we should really be having this discussion in T155091: "Invalid DB key" errors on various special pages or another task) but there may still be something to solve in ProofreadPage.

Ankry added a subscriber: Ankry. Feb 17 2017, 1:07 PM
hashar removed a subscriber: hashar. Feb 17 2017, 1:56 PM

We already have maintenance/cleanupEmptyCategories.php, which was run fairly recently (T140811). I propose that we change it to delete category entries with a negative count too (right now, it only handles zeroes) and re-run it. This should be easier for Operations and DBA to stomach than running an entirely new script which deletes things.

@Anomie Since you handled T140811… what do you think of this idea,

I don't see a problem with it; I suppose these rows still exist because whatever code path would normally have cleaned them up when they went negative instead recounted the correctly-formatted version. But it sounds like TTO wants to make a more comprehensive solution, which is probably not a bad idea.

@jcrespo seemed quite open to the idea of it

I do not have a problem with this; you can do it however you want, and you do not need to block on me (I would appreciate reviews, of course).

The only things I recommend are: destructive operations, like batch updates and row deletions, should be scheduled in the weekly deployments section to avoid collisions with schema changes; they should have a dry-run option (or a --really-realy-break-the-db :-)); they should log modifications (e.g. ids) to a file for easier recovery if something goes wrong; and they should run in batches and wait for slaves to avoid lag. Very long maintenance tasks should be puppetized on a cron or similar and done in small runs. The script should be idempotent as much as possible. These are not rigid rules (just use common sense): there are more and less important tables/columns (page titles vs. category counts), and they keep me and the backups happy and undisturbed :-D.
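
Those recommendations (dry run by default, a log of affected ids, small batches, idempotent reruns) can be sketched roughly as follows in Python with SQLite. Everything here is hypothetical: `delete_rows_batched`, its parameters and the simplified `category` table are illustration only, and the `pause` sleep is merely a stand-in for waiting on replication, which a real MediaWiki maintenance script would handle with its own facilities.

```python
import sqlite3
import time


def delete_rows_batched(conn, cat_ids, *, dry_run=True, batch_size=100,
                        log_path=None, pause=0.0):
    """Delete category rows by id in small batches.

    Returns the number of rows deleted (or that would be deleted in a
    dry run). Rerunning is harmless: ids that are already gone match nothing.
    """
    cat_ids = list(cat_ids)
    affected = 0
    log = open(log_path, "a", encoding="utf-8") if log_path else None
    try:
        for start in range(0, len(cat_ids), batch_size):
            batch = cat_ids[start:start + batch_size]
            marks = ",".join("?" * len(batch))
            # Read the rows first so they can be logged for recovery.
            rows = conn.execute(
                "SELECT cat_id, cat_title, cat_pages FROM category "
                f"WHERE cat_id IN ({marks})",
                batch,
            ).fetchall()
            if log:
                action = "DRY-RUN" if dry_run else "DELETE"
                for cat_id, title, pages in rows:
                    log.write(f"{action}\t{cat_id}\t{title}\t{pages}\n")
            if not dry_run:
                conn.execute(
                    f"DELETE FROM category WHERE cat_id IN ({marks})", batch
                )
                conn.commit()
                time.sleep(pause)  # stand-in for "wait for slaves"
            affected += len(rows)
    finally:
        if log:
            log.close()
    return affected
```

Making the dry run the default means a destructive run has to be requested explicitly with `dry_run=False`, which is one way to get the effect of the joke flag above.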

Zdzislaw added a comment. Edited Wed, Apr 19, 10:03 PM

Any chance of solving this problem?
On pl ws:
https://pl.wikisource.org/w/index.php?title=Specjalna:Kategorie&offset=&limit=500

[WPfe6wrAEFcAABlogTAAAABV] 2017-04-19 22:04:27: Krytyczny wyjątek typu "Wikimedia\Assert\ParameterAssertionException" (in English: Fatal exception of type "Wikimedia\Assert\ParameterAssertionException")

Please,
category browsing is a (very) important way to search for content on pl ws.

Z.

TTO added a comment. Thu, Apr 20, 2:04 AM

It would be great if someone could review and potentially merge https://gerrit.wikimedia.org/r/333486! That way, we can fix these problems on all wikis.

Ankry awarded a token. Thu, Apr 20, 8:51 AM

Change 349490 had a related patch set uploaded (by Bartosz Dziewoński):
[mediawiki/core@master] Ignore broken entries on Special:Categories

https://gerrit.wikimedia.org/r/349490