
Wikisource Export: Exports failing because subpages "could not be found"
Closed, Resolved | Public | BUG REPORT

Description

What is the problem?

This book does not export.

It returns:

Download failed. The book 'П'єси_і_переклади_співаної_поезії/Дальні_верховини' could not be found.

It might be because the subpage П'єси_і_переклади_співаної_поезії/Дальні_верховини is a redirect to a non-existent page.

However, this also fails, and the failing subpage does not appear to be a redirect.

These are the only two exports which I have found that failed so far.

It does export successfully in commit 1b426f19e6938b764a94c2611a3359f85227ba25 (I have not tried it with any other commits between then and now).

Steps to reproduce problem
  1. https://ws-export.wmcloud.org/?lang=uk&page=%D0%9F%27%D1%94%D1%81%D0%B8_%D1%96_%D0%BF%D0%B5%D1%80%D0%B5%D0%BA%D0%BB%D0%B0%D0%B4%D0%B8_%D1%81%D0%BF%D1%96%D0%B2%D0%B0%D0%BD%D0%BE%D1%97_%D0%BF%D0%BE%D0%B5%D0%B7%D1%96%D1%97&format=epub-3&fonts=

Expected behavior: Ebook exports successfully
Observed behavior: Error returned

Environment

WS Export: WS Export version 2.6.2.

QA Results - Wikisource

Event Timeline

dom_walden renamed this task from Wikisource Export: Export fails when subpages are redirects to non-existent pages to Wikisource Export: Exports failing because subpages "could not be found". Apr 16 2021, 2:26 PM
dom_walden updated the task description.

The issue depends on the special characters in the page name.

The same issue appears in https://nap.wikisource.org/wiki/L'Eneide

The link to the tool is https://ws-export.wmcloud.org/tool/book.php?lang=nap&format=pdf-a4&page=L%2527Eneide
which produces the error message: Download failed. The book 'L%27Eneide' could not be found.

However, the PDF is correctly generated if the link is corrected (by hand) to https://ws-export.wmcloud.org/tool/book.php?lang=nap&format=pdf-a4&page=L%27Eneide

This is on Wikisource Book Export version 2.7.4.

@Ruthven: the https://ws-export.wmcloud.org/tool/book.php?lang=nap&format=pdf-a4&page=L%2527Eneide link is double-encoded (i.e. %2527 is the encoded form of %27).

The one in the sidebar and download popup is https://ws-export.wmcloud.org/?format=epub&lang=nap&page=L%27Eneide, which works correctly. Typing L'Eneide into the form also works fine.

Oh, I see: it looks like Modello:WSExport/Link might be the source of the problem. It should be using {{PAGENAME}} rather than {{PAGENAMEE}}, because it's then doing urlencode on the result.
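The double encoding described above can be reproduced outside MediaWiki. The following is a minimal Python sketch, not the template's actual code: it assumes that {{PAGENAMEE}} and urlencode both behave roughly like percent-encoding with `urllib.parse.quote`, and `encode_once` is a hypothetical helper name.

```python
from urllib.parse import quote, unquote


def encode_once(title: str) -> str:
    """Percent-encode a page title a single time (no characters left as-is)."""
    return quote(title, safe="")


title = "L'Eneide"

# Roughly what urlencode does when given the raw {{PAGENAME}}.
once = encode_once(title)

# Roughly what happens when urlencode is applied to an already-encoded
# {{PAGENAMEE}}: the "%" of "%27" is itself encoded as "%25".
twice = encode_once(once)

print(once)            # L%27Eneide
print(twice)           # L%2527Eneide
print(unquote(twice))  # L%27Eneide (one decode step does not recover the title)
```

A server that decodes the `page` parameter once therefore ends up looking for a page literally named L%27Eneide, which matches the error message quoted earlier in this thread.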

@dmaza When I export the EPUB from that page on nap.wikisource, I still get the same error message, presumably because of how L'Eneide is encoded.

The issue on napwikisource was fixed in 2021.

It looks like the redirect-to-nonexisting bug still exists, although some of the above examples are now working (at least one because the redirect was subsequently fixed).

The following shows the issue (with this test page):

$ ./bin/console a:e -l beta --nocredits -t Links
04:51:12 DEBUG     [app] GET https://en.wikisource.beta.wmflabs.org/api/rest_v1/page/html/Links
04:51:13 DEBUG     [app] GET https://en.wikisource.beta.wmflabs.org/w/api.php?titles=Links&prop=categories&clshow=%21hidden&action=query&format=json
04:51:13 DEBUG     [app] Sending request for 2 titles
04:51:13 DEBUG     [app] GET https://en.wikisource.beta.wmflabs.org/api/rest_v1/page/html/Links%2FLorem
04:51:13 DEBUG     [app] GET https://en.wikisource.beta.wmflabs.org/api/rest_v1/page/html/Links%2FSubpage_redirects_to_non-existent_page
04:51:15 WARNING   [app] HTTP response 302
04:51:15 DEBUG     [app] GET https://en.wikisource.beta.wmflabs.org/api/rest_v1/page/html/Links%2FSubpage_doesn't_exist
04:51:15 WARNING   [app] HTTP response 404
04:51:15 DEBUG     [app] Got responses for 1 pages

In PageParser.php line 20:
                                                                                                                                                           
  [TypeError]
  App\PageParser::__construct(): Argument #1 ($doc) must be of type DOMDocument, null given, called in src/BookProvider.php on line 135
                                                                                                                                                           
Exception trace:
 at src/PageParser.php:20
 App\PageParser->__construct() at src/BookProvider.php:135
 App\BookProvider->getMetadata() at src/BookProvider.php:48
 App\BookProvider->get() at src/BookCreator.php:49

This used to be handled by showing an error like:

The book 'Links/Subpage_redirects_to_non-existent_page' could not be found.

This should probably say "The page x could not be found", because it's odd to call it a "book" when the missing page is not necessarily the root book title that was requested (although it might be).

Alternatively, we could include a blank page for the missing page, or a page with the above message on it.

Here's a patch to fix the above issue: https://github.com/wikimedia/ws-export/pull/511

In general, I think failing the export when a subpage doesn't exist is a reasonable approach. Otherwise, people will get an export that they assume is complete but that actually has holes in it.
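As a rough illustration of that fail-fast approach, here is a small sketch. It is written in Python rather than the tool's PHP and is not the code from the pull request; `PageNotFoundError`, `collect_pages`, and the shape of `responses` are all assumptions made for the example.

```python
from typing import Dict, List, Optional


class PageNotFoundError(Exception):
    """Raised when a requested page (root title or subpage) has no content."""

    def __init__(self, title: str) -> None:
        super().__init__(f"The page '{title}' could not be found.")
        self.title = title


def collect_pages(
    titles: List[str], responses: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Return the HTML for every title, or fail on the first missing one.

    `responses` maps a title to its HTML, or to None when the fetch failed
    (for example a 404, or a redirect whose target does not exist).
    """
    pages: Dict[str, str] = {}
    for title in titles:
        html = responses.get(title)
        if html is None:
            # Stop here instead of passing None on to a parser.
            raise PageNotFoundError(title)
        pages[title] = html
    return pages


# Hypothetical fetch results, loosely modelled on the test page in this task.
responses = {
    "Links": "<p>Root page</p>",
    "Links/Lorem": "<p>Lorem</p>",
    "Links/Subpage_redirects_to_non-existent_page": None,
}

try:
    collect_pages(list(responses), responses)
    message = ""
except PageNotFoundError as error:
    message = str(error)

print(message)
```

Raising a dedicated error at the point where the content is found to be missing keeps the user-facing message specific to the page that failed, whether that page is the root title or a subpage.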

Merged and released.

QA notes:

  • The above example book has been fixed, so it no longer shows the error.
  • In general, the same behaviour is expected for both a top-level work (i.e. the title provided by the user) and any subpage that's not found, including if it's the target of a redirect.