[EPIC] Determine next steps with books functionality
Open, Needs Triage, Public

Description

Background

The books functionality within the Collection extension was paused when the OCG service was sunset. We would like to determine whether this functionality can be restored using the new Chromium PDF renderer. Based on Chromium's performance, we will decide on next steps for books as well as for the Collection extension as a whole.

Event Timeline

Books functionality will be returning via PediaPress. After investigating the new Chromium service (T181084: [EPIC] Deploy the mediawiki-services-chromium-render service (Proton)) in depth, we began looking for alternatives for bringing back the PDF books functionality on Wikimedia projects. We reached out to PediaPress, who were the original patrons of books on Wikipedia, to see if they would be interested in taking up PDF rendering for books once again. They have agreed, and we are currently working on the details and schedule. They will start with a temporary solution based on an older technology that has previously been used to create PDFs. This might have some drawbacks for graphical elements such as maps, but it will mean a faster working solution. They then plan to work on a new HTML-to-PDF renderer, based on feedback on the first implementation.

I'd really like to see this new direction extensively documented in terms of expectations and commitments (from both sides). In the past we have had problems with the Collection extension, to the point that the extension itself basically became technical debt. I understand that PediaPress had enough challenges already, but the fact that this extension sits in a sort of no man's land needs to be fixed. Maybe it requires a code stewardship review: https://www.mediawiki.org/wiki/Code_stewardship_reviews

Also, pinging @Tpt as the maintainer of wsexport.

More detailed documentation will come once we have a clearer idea of the technical side from PediaPress. Currently, though, we expect no changes to the Collection extension outside of the rendering portion. We might do some cleanup of the workflow, but books will function more or less as they do currently. The main difference would be that we'd redirect to PediaPress for the final PDF download. That said, you're right: this answers few, if any, questions about the future of the Collection extension as a whole in terms of ownership and maintenance. I agree that it's a good candidate for https://www.mediawiki.org/wiki/Code_stewardship_reviews.

Relevant W3C specification drafts: https://www.w3.org/TR/wpub/ and https://www.w3.org/TR/pwp/
Their aim is to make ePub and the web work together.
One possibility is to have an ePub/web publication output system and then use converters targeting multiple formats (PDF, mobi...). This is the approach used by wsexport.
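
To make the "one intermediate output, many converters" idea concrete, here is a minimal sketch. Everything in it is hypothetical: `Publication`, `converter`, and `export` are illustrative names, not part of wsexport, the Collection extension, or any W3C draft, and the two toy converters stand in for real EPUB-to-PDF or EPUB-to-mobi tools.

```python
import html
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Publication:
    """Hypothetical intermediate form (stands in for an ePub/web publication)."""
    title: str
    chapters: Tuple[Tuple[str, str], ...]  # (heading, html_body) pairs


# Registry: target format name -> function turning a Publication into bytes.
CONVERTERS: Dict[str, Callable[[Publication], bytes]] = {}


def converter(fmt: str):
    """Decorator that registers a converter for one target format."""
    def register(func):
        CONVERTERS[fmt] = func
        return func
    return register


@converter("txt")
def to_text(pub: Publication) -> bytes:
    # Toy converter: title, blank line, then each heading and body.
    lines = [pub.title, ""]
    for heading, body in pub.chapters:
        lines += [heading, body, ""]
    return "\n".join(lines).encode("utf-8")


@converter("html")
def to_html(pub: Publication) -> bytes:
    # Toy converter: escape headings, pass chapter bodies through unchanged.
    parts = [f"<h1>{html.escape(pub.title)}</h1>"]
    for heading, body in pub.chapters:
        parts.append(f"<h2>{html.escape(heading)}</h2>{body}")
    return "".join(parts).encode("utf-8")


def export(pub: Publication, fmt: str) -> bytes:
    """Build the book once, then dispatch to whichever converter is requested."""
    try:
        return CONVERTERS[fmt](pub)
    except KeyError:
        raise ValueError(f"no converter registered for {fmt!r}") from None
```

The point of the design is that adding a new output format only means registering one more converter; the step that assembles the book from wiki pages does not change.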

Books functionality will be returning via PediaPress.

Excellent. I suppose this means mwlib (with Parsoid HTML?). Once PDF is back as stable functionality, are we also going to get ZIM and ODT back? EPUB would be in order too.

A series of feature requests and bug reports in the Collection component, which we had triaged in 2014, have since been "retriaged" by some under the assumption that OCG (and others after OCG) was the only way. Please contact me if you need help with debugging and retriaging the rendering-only issues which will go back to PediaPress.

The code for concatenating pages should be removed (or cleaned up if it's intended to be used by Proton or PediaPress in some way).