
Fatal error when viewing some Wikisource pages: MWContentSerializationException: The serialization is an invalid JSON array.
Closed, Resolved, Public

Description

While attempting to read this page:
https://no.wikisource.org/wiki/Side:Presten_som_ikke_kunde_brukes/87

I got:

[W2rmugpAICkAACqcBy0AAACK] 2018-08-08 12:48:58: Fatal exception of type MWContentSerializationException

Any hints as to what could be happening here?

Event Timeline

Ankry created this task. Aug 8 2018, 12:52 PM
Restricted Application added subscribers: jeblad, Danmichaelo, Aklapper. Aug 8 2018, 12:52 PM
Reedy added a subscriber: Reedy.
2018-08-08 12:48:58 [W2rmugpAICkAACqcBy0AAACK] mw1320 nowikisource 1.32.0-wmf.15 exception ERROR: [W2rmugpAICkAACqcBy0AAACK] /wiki/Side:Presten_som_ikke_kunde_brukes/87   MWContentSerializationException from line 129 of /srv/mediawiki/php-1.32.0-wmf.15/extensions/ProofreadPage/includes/Page/PageContentHandler.php: The serialization is an invalid JSON array. {"exception_id":"W2rmugpAICkAACqcBy0AAACK","exception_url":"/wiki/Side:Presten_som_ikke_kunde_brukes/87","caught_by":"mwe_handler"} 
[Exception MWContentSerializationException] (/srv/mediawiki/php-1.32.0-wmf.15/extensions/ProofreadPage/includes/Page/PageContentHandler.php:129) The serialization is an invalid JSON array.
  #0 /srv/mediawiki/php-1.32.0-wmf.15/extensions/ProofreadPage/includes/Page/PageContentHandler.php(108): ProofreadPage\Page\PageContentHandler->unserializeContentInJson(string)
  #1 /srv/mediawiki/php-1.32.0-wmf.15/includes/Storage/RevisionStore.php(1298): ProofreadPage\Page\PageContentHandler->unserializeContent(string, string)
  #2 /srv/mediawiki/php-1.32.0-wmf.15/includes/Storage/RevisionStore.php(1224): MediaWiki\Storage\RevisionStore->loadSlotContent(MediaWiki\Storage\SlotRecord, NULL, NULL, NULL, integer)
  #3 [internal function]: Closure$MediaWiki\Storage\RevisionStore::emulateMainSlot_1_29#2(MediaWiki\Storage\SlotRecord)
  #4 /srv/mediawiki/php-1.32.0-wmf.15/includes/Storage/SlotRecord.php(306): call_user_func(Closure$MediaWiki\Storage\RevisionStore::emulateMainSlot_1_29#2;575, MediaWiki\Storage\SlotRecord)
  #5 /srv/mediawiki/php-1.32.0-wmf.15/includes/Storage/RevisionRecord.php(174): MediaWiki\Storage\SlotRecord->getContent()
  #6 /srv/mediawiki/php-1.32.0-wmf.15/includes/Revision.php(903): MediaWiki\Storage\RevisionRecord->getContent(string, integer, User)
  #7 /srv/mediawiki/php-1.32.0-wmf.15/includes/page/WikiPage.php(780): Revision->getContent(integer, User)
  #8 /srv/mediawiki/php-1.32.0-wmf.15/extensions/ProofreadPage/includes/Page/PageViewAction.php(31): WikiPage->getContent(integer, User)
  #9 /srv/mediawiki/php-1.32.0-wmf.15/includes/MediaWiki.php(500): ProofreadPage\Page\PageViewAction->show()
  #10 /srv/mediawiki/php-1.32.0-wmf.15/includes/MediaWiki.php(294): MediaWiki->performAction(Article, Title)
  #11 /srv/mediawiki/php-1.32.0-wmf.15/includes/MediaWiki.php(867): MediaWiki->performRequest()
  #12 /srv/mediawiki/php-1.32.0-wmf.15/includes/MediaWiki.php(524): MediaWiki->main()
  #13 /srv/mediawiki/php-1.32.0-wmf.15/index.php(42): MediaWiki->run()
  #14 /srv/mediawiki/w/index.php(3): include(string)
  #15 {main}
Aklapper renamed this task from "problem accessing a page in nowikisource" to "Error accessing a page in nowikisource: MWContentSerializationException: The serialization is an invalid JSON array." Aug 8 2018, 1:20 PM
Ankry added a comment (edited). Aug 8 2018, 1:21 PM

Also here:
https://no.wikisource.org/wiki/Presten_som_ikke_kunde_brukes/Juleaften

Probably the same source of error:
[W2rtiApAICgAAJnbMp8AAABR] 2018-08-08 13:18:00: Fatal exception of type "MWContentSerializationException"

Tpt added a subscriber: Tpt. Aug 8 2018, 1:36 PM

It seems that the content of the two pages can be parsed as JSON by PHP, so ProofreadPage assumes that they use the JSON serialization for Page: pages. Because they do not actually use the expected JSON serialization format, an exception is thrown.
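To illustrate the failure mode described above, here is a minimal Python sketch; it is not ProofreadPage's actual PHP code. Like PHP's json_decode, Python's json.loads accepts scalar documents such as 42, "text" or true, so a "does it parse?" test also matches content that is not the structured Page: serialization. The function names and the wikitext fallback are hypothetical.

```python
import json


def looks_like_json_naive(text: str) -> bool:
    """Hypothetical pre-fix heuristic: JSON if a JSON parser accepts it."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    return True


def unserialize_page_json(text: str) -> dict:
    """Hypothetical JSON unserializer that expects a JSON object."""
    data = json.loads(text)
    if not isinstance(data, dict):
        # Mirrors the message seen in the stack trace above.
        raise ValueError("The serialization is an invalid JSON array.")
    return data


def unserialize_page(text: str) -> dict:
    if looks_like_json_naive(text):
        return unserialize_page_json(text)
    # Hypothetical fallback for non-JSON (wikitext) serializations.
    return {"body": text}


# unserialize_page("Plain wikitext") returns {"body": "Plain wikitext"}.
# unserialize_page("42") raises ValueError: "42" is valid JSON (a number),
# so it is routed to the JSON unserializer, which then rejects it.
```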

Tagging Editing-team per https://www.mediawiki.org/wiki/Developers/Maintainers.

Note: Logstash has a record of one instance of this error from version php-1.32.0-wmf.12. In recent days the error has been appearing more frequently in Logstash (~40 times in one day for *wikisource.org). It was likely hidden previously due to T200960.

Ankry added a comment. Aug 10 2018, 7:11 AM

The page dates from the beginning of ProofreadPage (July 2009) and has no underlying scan. Its page_links_updated / page_touched value of 20151017054924 suggests that the problem appeared around that date.
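The page_touched value uses MediaWiki's 14-digit YYYYMMDDHHMMSS timestamp format, stored in UTC. A short Python sketch that decodes it (the helper name is illustrative):

```python
from datetime import datetime, timezone


def parse_mw_timestamp(ts: str) -> datetime:
    # MediaWiki database timestamps such as page_touched are
    # 14-digit YYYYMMDDHHMMSS strings in UTC.
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


print(parse_mw_timestamp("20151017054924").isoformat())
```

This decodes to 17 October 2015, 05:49:24 UTC.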

I tried to retrieve the page text:

  1. Retrieval failed via the API:

https://no.wikisource.org/w/api.php?action=query&prop=revisions&titles=Side%3APresten%20som%20ikke%20kunde%20brukes%2F87&rvprop=content

  2. but succeeded via [[Special:Export]]:

https://no.wikisource.org/wiki/Spesial:Eksporter/Page:Presten_som_ikke_kunde_brukes/87
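For reference, the API request above can be rebuilt programmatically. This Python sketch only uses the standard library; the added format=json parameter, the User-Agent string and the helper names are assumptions, not part of the original report. fetch_revision_json performs a live network request and is not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://no.wikisource.org/w/api.php"


def revision_content_url(title: str) -> str:
    """Build the same revisions query as the link above, plus format=json."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "format": "json",  # assumption: machine-readable JSON output
    }
    return API_URL + "?" + urlencode(params)


def fetch_revision_json(title: str) -> dict:
    """Perform the live request (not called here).

    At the time of the report, this query failed for the affected page.
    """
    req = Request(
        revision_content_url(title),
        headers={"User-Agent": "example-script/0.1 (hypothetical)"},
    )
    with urlopen(req) as resp:
        return json.load(resp)


print(revision_content_url("Side:Presten som ikke kunde brukes/87"))
```

urlencode encodes spaces as "+" rather than "%20"; both forms are accepted in a query string.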

It seems that this page has no ProofreadPage header/footer at all, even though its content model is set to proofread-page.
@Tpt, any idea how we can create a proper header/footer here (if this is the source of the problem)?

I have copied the content (text) from this page here:
https://no.wikisource.org/wiki/Side:Presten_som_ikke_kunde_brukes/87/retrieved

Tpt added a comment. Aug 10 2018, 9:06 AM

> @Tpt, any idea how we can create a proper header/footer here (if this is the source of the problem)?

The root cause of the error seems to me to be that the page content is considered valid JSON by PHP's json_decode function, so ProofreadPage tries, and fails, to parse it as JSON. I'm submitting a change that will hopefully solve the problem by treating as JSON only textual content that can be decoded into a PHP array (i.e. a JSON array or object).
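The fix described above can be sketched in Python as well; again, this is an illustration, not the merged PHP change. Because json_decode($text, true) returns a PHP array for both JSON objects and JSON arrays, "can be decoded into a PHP array" corresponds to dict or list in Python. The function name is hypothetical.

```python
import json


def looks_like_json(text: str) -> bool:
    """Hypothetical fixed heuristic: JSON only if it decodes to an object or array."""
    try:
        data = json.loads(text)
    except ValueError:  # not JSON at all, e.g. ordinary wikitext
        return False
    # Scalars such as 42, "text", true or null are valid JSON documents,
    # but they cannot be the structured Page: serialization.
    return isinstance(data, (dict, list))


for sample in ['{"body": "x"}', "[1, 2]", "42", '"text"', "null", "Plain wikitext"]:
    print(repr(sample), looks_like_json(sample))
```

Only the first two samples are treated as JSON; presumably everything else would then be handled by the non-JSON serialization path instead of raising an exception.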

Change 451834 had a related patch set uploaded (by Tpt; owner: Tpt):
[mediawiki/extensions/ProofreadPage@master] Guess that the Page: serialization is in JSON only if it an array or an object

https://gerrit.wikimedia.org/r/451834

Ankry added a comment (edited). Aug 10 2018, 11:57 AM

Probably the same problem with 2 pages in Hungarian Wikisource:
https://hu.wikisource.org/wiki/Oldal:Budenz-Szinnyei_-_Finn_nyelvtan.djvu/46
https://hu.wikisource.org/wiki/Oldal:Budenz-Szinnyei_-_Finn_nyelvtan.djvu/61

[W218-gpAMFQAABNZ94QAAACT] 2018-08-10 11:54:38: Fatal exception of type MWContentSerializationException

Ankry renamed this task from "Error accessing a page in nowikisource: MWContentSerializationException: The serialization is an invalid JSON array." to "Error accessing some page in wikisources: MWContentSerializationException: The serialization is an invalid JSON array." Aug 10 2018, 1:16 PM
Krinkle renamed this task from "Error accessing some page in wikisources: MWContentSerializationException: The serialization is an invalid JSON array." to "Fatal error when viewing some Wikisource pages: MWContentSerializationException: The serialization is an invalid JSON array."
Ankry added a comment. Aug 15 2018, 6:14 AM

@Krinkle, the same error appears when accessing the revision text via the API, so this is not only about "viewing".

The Norwegian page was "repaired" by deleting and recreating the page from scratch.
@Tpt, should we do the same with the Hungarian ones, or is the fix planned to be deployed soon?

Tpt added a comment. Aug 15 2018, 6:53 AM

@Ankry, the fix is planned to be deployed soon. Having some real pages still affected is a good way to check that the problem is indeed solved.

Change 451834 merged by jenkins-bot:
[mediawiki/extensions/ProofreadPage@master] Guess that the Page: serialization is in JSON only if it an array or an object

https://gerrit.wikimedia.org/r/451834

matmarex closed this task as Resolved. Aug 16 2018, 8:32 PM
matmarex claimed this task.
matmarex added a subscriber: matmarex.

The change has been merged and will be deployed to production wikis next week, 21-23 August, per the usual schedule. Hopefully it fixes the issues.

Sorry about the delay; there was a problem with failing unit tests that turned out to be unrelated (T202091).

Restricted Application added a project: User-Ryasmeen. Aug 16 2018, 8:32 PM
matmarex reassigned this task from matmarex to Tpt. Aug 16 2018, 8:32 PM
matmarex removed a project: VisualEditor.