
"success is not a function" JS exception on certain DjVu files
Closed, Resolved · Public · Bug Report

Description

On certain DjVu files, the following exception is thrown:

TypeError: success is not a function. (In 'success()', 'success' is an instance of Object)

In normal mode the call stack contains only ensureImageZoomInitialization; in debug mode I have been unable to reproduce the exception. However, something is clearly failing silently in debug mode too, because the "Proofread tools" in the editor toolbar do not show up there (they do in normal mode).

The user-visible effect is that the hidden text layer in the DjVu file is not loaded, presumably because the scripts that extract it abort on this exception before reaching that point.

On files affected by this I get the symptom (no OCR text) consistently, but the console message does not always show up (something timing-dependent there?). For example this page.

The above example file has a known provenance (I generated it myself from raw scans) and examining it with the DjVuLibre command-line tools reveals nothing obviously out of the ordinary (i.e. this is not just T219376).

A few pages have pathological output: a hundred or so distinct regions, each containing a single space character. In the djvutxt dump that retrieveMetaData() in DjVuImage.php processes, each of these shows up as \n\037\013. Since retrieveMetaData() does a simple regex replace (cf. T230415), it is conceivable that the extremely long runs of this sequence exceed a recursion, stack, or runtime limit in the regex engine, or trigger some similar problem.
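To test the run-length hypothesis before blaming the regex, one could measure the longest back-to-back run of the \n\037\013 sequence in a page's djvutxt dump. This is a minimal sketch assuming the dump has already been read into a string; longestSpaceRegionRun is a hypothetical diagnostic helper, not part of MediaWiki or DjVuLibre. It uses a plain linear scan so that the measurement itself cannot run into a backtracking limit.

```typescript
// Hypothetical diagnostic helper (not part of MediaWiki or DjVuLibre).
// The report's "\n\037\013" uses octal escapes; TypeScript string literals
// do not allow octal escapes, so the same bytes are written in hex here:
// \037 === \x1f (unit separator), \013 === \x0b (vertical tab).
const SPACE_REGION = "\n\x1f\x0b";

// Returns the length (in repetitions) of the longest unbroken run of
// SPACE_REGION in the dump, using a linear scan with no regex involved.
function longestSpaceRegionRun(dump: string): number {
  let longest = 0;
  let current = 0;
  let i = 0;
  while (i < dump.length) {
    if (dump.startsWith(SPACE_REGION, i)) {
      // Another repetition directly follows the previous one.
      current++;
      longest = Math.max(longest, current);
      i += SPACE_REGION.length;
    } else {
      // Any other character breaks the run.
      current = 0;
      i++;
    }
  }
  return longest;
}

// Synthetic example: a run of 100, then unrelated text, then a run of 3.
const sample = "abc" + SPACE_REGION.repeat(100) + "x" + SPACE_REGION.repeat(3);
console.log(longestSpaceRegionRun(sample)); // → 100
```

A value in the hundreds on affected pages, against small values on unaffected ones, would support (though not prove) the regex theory. On the PHP side, the relevant limits are the pcre.backtrack_limit and pcre.recursion_limit ini settings, and preg_replace() returns null on such a failure, which preg_last_error() can then explain.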

Another possible trigger is the scan resolution. All the files I have seen this on are high-resolution (roughly 2–3k × 3–4k pixels), with each page around 1 MB and the whole DjVu file between 100 MB and several hundred MB. Since the call stack for the exception points at image-zooming code, and the on-success callback holds a reference to an object (perhaps an error object that was not checked when it was first returned?) instead of a function, it is conceivable that large resolution or file size plays a role.

In any case, the first step here is probably for someone familiar with ensureImageZoomInitialization and its calling context to trace back where that success variable comes from and what might cause it to contain an object instead of a function.
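The failure mode itself is easy to reproduce in isolation: in JavaScript, calling a value that is an object rather than a function throws exactly this kind of TypeError (the parenthetical "'success' is an instance of Object" is how Safari's JavaScriptCore engine reports it). The TypeScript sketch below uses hypothetical functions (initUnsafe, initSafe, tryUnsafe), not ProofreadPage's actual ensureImageZoomInitialization, whose signature is not shown in this report. It contrasts an unguarded callback call with a typeof guard that fails gracefully.

```typescript
// Hypothetical stand-ins for a callback-taking initializer. These are NOT
// ProofreadPage's ensureImageZoomInitialization; they only model the bug class.
type Callback = () => void;

// Unguarded: behaves like plain JavaScript, so a non-function argument
// throws "TypeError: ... is not a function" at the call site.
function initUnsafe(success: unknown): void {
  (success as Callback)();
}

// Guarded: checks the runtime type first and reports instead of throwing.
function initSafe(success: unknown): boolean {
  if (typeof success !== "function") {
    console.warn(`initSafe: expected a function, got ${typeof success}`);
    return false;
  }
  (success as Callback)();
  return true;
}

// Classifies what happens when initUnsafe receives `arg`.
function tryUnsafe(arg: unknown): "ok" | "TypeError" | "other" {
  try {
    initUnsafe(arg);
    return "ok";
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other";
  }
}

// An options or error object landing where a callback was expected:
console.log(tryUnsafe({ message: "load failed" })); // → TypeError
console.log(tryUnsafe(() => {}));                    // → ok
console.log(initSafe({ message: "load failed" }));   // → false (after a warning)
```

A guard like this only hides the symptom: the real fix still requires finding out why an object reached the success parameter in the first place.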

Event Timeline

@Xover: Could you please share a link to an example page where the error message does show up, so that we have steps to reproduce? Thanks.

Change 550571 had a related patch set uploaded (by Tpt; owner: Tpt):
[mediawiki/extensions/ProofreadPage@master] Fixes the JS error "success is not a function"

https://gerrit.wikimedia.org/r/550571

Change 550571 merged by jenkins-bot:
[mediawiki/extensions/ProofreadPage@master] Fixes the JS error "success is not a function"

https://gerrit.wikimedia.org/r/550571

Ah thanks! Looks like I misinterpreted the line "but the console message does not always show up (something timing-dependent there?). For example this page."

Tpt changed the subtype of this task from "Task" to "Bug Report". (Nov 14 2019, 9:18 PM)

It looks like the JavaScript error mentioned above is gone from the console now (as of 1.35.0-wmf.10 today; I can't recall whether I checked on wmf.8, due to the recent hiccup in the deployment calendar), but the text layer is still not loading on the example page. Without any console messages I have no idea how to debug this further. Are there server-side logs that can be checked? A test system where it can be traced? Should I open a new task for that, and if so, with which tags or projects, so that it has a chance of reaching the right team's radar?

Tpt claimed this task.

It seems that my fix does indeed solve the "success is not a function" error, so I believe this task should be closed.

The text layer is not preloaded in JavaScript but on the PHP side. Opening a new task for this problem is the right way to go. This problem might be related to T204020.

@Tpt This is almost certainly not the same issue as T204020 (which is a duplicate of T219376, where some possible approaches to fixing it are suggested). T204020 is basically the result of a naïve text-layer extraction algorithm in MediaWiki-DjVu that fails badly when faced with unexpected data in a DjVu file, combined with the fact that the DjVuLibre tools will happily accept invalid input that they then choke on when asked to output it again. MediaWiki-DjVu can absolutely be modified so that it fails gracefully in these cases. But, crucially, such DjVu files can be reliably identified (djvutxt --detail=page file.djvu | egrep "^failed"), and the example file from this task does not exhibit that problem.

In any case, T240562 has been filed for the failed text layer extraction.