
Log messages at ERROR level on http channel: Special:Book unable to connect to https://tools.pediapress.com
Closed, Resolved · Public · Production Error

Description

After enabling ERROR logging for all channels on group0 wikis (T228838), errors like this appeared:

https://logstash.wikimedia.org/goto/34d754be37f634f93da57651fca054b2

POST https://tools.pediapress.com/mw-serve/ HTTP/1.1 - NULL cURL error 28: Connection timed out after 5000 milliseconds (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://tools.pediapress.com/mw-serve/

The errors are logged when users access Special:Book with different URL parameters.

The volume is low, but I'm filing a task anyway for evaluation, since we were not recording these logs before.

Event Timeline

Looks like a request to an external source from MediaWiki that doesn't make use of the url-downloader as a proxy, thus it's firewalled.

I've dug a bit into this, and it's a mess of historical reasons.

Namely, the Collection extension, which has had a code stewardship request open since 2019 per T224922, defaults to the https://tools.pediapress.com/mw-serve/ URL for sending whatever it sends to PediaPress. This can be overridden using $wgCollectionMWServeURL, and indeed it was in the distant past (10 years ago), when it was set to $wgCollectionMWServeURL = "http://pdf2.wikimedia.org:8080/mw-serve/";. However, in 017f07485ffcbdf304fa62fd11d749d9bdec471b it was switched to target OCG, a relatively short-lived service meant to replace that functionality. OCG didn't live long: 3 years after the above commit, it was fully removed by c0d76882def4190d0844fd7e0ae014e63e3544f2.

And the extension fell back to the default. Since 2017, whenever someone has tried to use that part of the functionality, the extension has tried to reach the default URL and failed.
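For reference, overriding the default would look roughly like the fragment below. This is only a sketch: the hostname is a placeholder made up for illustration, not a real render service, and nothing in this task suggests one should be set up.

```php
// Sketch only. Without this line, Collection falls back to its built-in
// default, https://tools.pediapress.com/mw-serve/.
// "render.example.org" is a hypothetical placeholder host.
$wgCollectionMWServeURL = 'https://render.example.org/mw-serve/';
```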

Just to make things a bit more confusing, there is a configuration to force the usage of urldownloader for collection urls, namely

	$wgCollectionCommandToServeURL = [
		'zip_post' => "{$wmgLocalServices['urldownloader']}|https://pediapress.com/wmfup/",
	];

per https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/master/wmf-config/CommonSettings.php#2449. However, the above URL doesn't match the default URL, so the config doesn't apply.

As far as I am concerned, the proper thing to do here is to either push forward with T224922 and undeploy the unmaintained and abandoned extension, or at the very least figure out how to make that part of the functionality work.

Diving a bit deeper into the rabbit hole.

The config above is for Collection's zip_post command, which actually works! Going e.g. to https://www.mediawiki.org/wiki/User:AKosiaris_(WMF)/Books/T374888 and clicking on "order printed book" apparently works fine. What didn't work was Collection's render_collection command, whose only caller that I've managed to find on www.mediawiki.org was the Saved Book template. I've gone ahead and updated it on that wiki: https://www.mediawiki.org/w/index.php?title=Template:Saved_book&diff=prev&oldid=6769935
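For illustration only: if the render commands were ever routed through the urldownloader the same way zip_post is, the entry would presumably mirror the existing one. This is an untested sketch. It assumes $wgCollectionCommandToServeURL accepts render_collection as a key in the same "proxy|URL" format, and it says nothing about whether the PediaPress endpoint would actually serve such a request.

```php
$wgCollectionCommandToServeURL = [
	// Existing production entry: goes through the proxy and works.
	'zip_post' => "{$wmgLocalServices['urldownloader']}|https://pediapress.com/wmfup/",
	// Hypothetical entry, NOT in production config: same "proxy|URL" shape,
	// pointing at the extension's default mw-serve URL.
	'render_collection' => "{$wmgLocalServices['urldownloader']}|https://tools.pediapress.com/mw-serve/",
];
```

Without an entry like the second one, render_collection requests go straight to the default URL with no proxy, which would explain the timeouts in the logs.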

Let's see how this goes.

Hmm, I've dug into Turnilo too and I fear my change isn't going to have any impact. Per https://w.wiki/BHjy, the top 5 user agents by request count all report Chrome versions that are at least ~2 years old (a sign of bot activity), send no Referer, and come from Singaporean IPs.

And https://w.wiki/BHk5 shows that all requests that timeout appear to be from Cloud Providers.

I am pretty confident that this is almost exclusively bot activity.

Change #1075157 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[mediawiki/extensions/Collection@master] Remove render_article, render_collection commands

https://gerrit.wikimedia.org/r/1075157

Change #1075157 merged by jenkins-bot:

[mediawiki/extensions/Collection@master] Remove render_article, render_collection commands

https://gerrit.wikimedia.org/r/1075157

matmarex assigned this task to akosiaris.

I'm also generally willing to review changes that remove code that hasn't worked in years. I'm just not as fast as James. ;)

Optimistically closing this. If we still see the errors after the next train deploy, we can re-open.

I wonder what would happen if that code were made to properly go through the URL downloader. Would it bring back the book rendering as PDF? Probably not, too many assumptions have been made that it doesn't work.

And the Collection extension seems to have third-party users, including of the functionality just removed, if T373556 is anything to go by.

In some better world I would probably do something about this. But everything is falling apart and I frankly no longer care.

> I'm also generally willing to review changes that remove code that hasn't worked in years. I'm just not as fast as James. ;)

> Optimistically closing this. If we still see the errors after the next train deploy, we can re-open.

Thanks!

> I wonder what would happen if that code were made to properly go through the URL downloader. Would it bring back the book rendering as PDF? Probably not, too many assumptions have been made that it doesn't work.

I wondered that too, but given that the extension is not owned, the person/team that would at least have an incentive to answer that question doesn't exist.

Even if it did work, from a user experience perspective it would be worse for the movement's wikis. There would be 2 different ways to get a PDF, with probably different looks. That would inevitably lead to confusion, recurring discussions between users to resolve that confusion, friction, frustration, etc. Not a recipe for a nice experience.

> And the Collection extension seems to have third-party users, including of the functionality just removed, if T373556 is anything to go by.

I am not surprised that there are users. The issue here is one of striking a balance between catering to the needs of third-party users vs the needs of Wikimedia wikis. If this wasn't an unmaintained and unowned extension, it could probably be solved in a much better way. With the situation as is, striking that balance isn't really possible.

> In some better world I would probably do something about this. But everything is falling apart and I frankly no longer care.

I've been looking at forking (or adding to) DownloadBook in order to provide rendering for Collection via Chrome on a smallish wiki (since, on a first pass, WeasyPrint and pandoc, the backends it currently offers, aren't suitable for my purposes). It's ugly, but shows promise.

I'd like to offer to take this extension from "unmaintained" to "maintained". Since the privilege policy implies that "ownership" is outmoded ("Previously, some extension maintainers were given ownership rights on the relevant project in Gerrit.... This model should not be used for new extensions") and I already have +2, I'm not sure what to do.

Change #1079670 had a related patch set uploaded (by MarkAHershberger; author: MarkAHershberger):

[mediawiki/extensions/Collection@master] Recover Collection book creation for DownloadBook

https://gerrit.wikimedia.org/r/1079670

> I'd like to offer to take this extension from "unmaintained" to "maintained".

I think a first step would be to make that offer in T224922: Code Stewardship Review: Collection Extension

Change #1088773 had a related patch set uploaded (by MarkAHershberger; author: MarkAHershberger):

[mediawiki/extensions/Collection@REL1_42] Recover Collection book creation for DownloadBook

https://gerrit.wikimedia.org/r/1088773

Change #1088773 abandoned by Reedy:

[mediawiki/extensions/Collection@REL1_42] Recover Collection book creation for DownloadBook

https://gerrit.wikimedia.org/r/1088773

Restricted Application changed the subtype of this task from "Task" to "Production Error". Jul 18 2025, 8:56 AM