[EPIC] Determine next steps with books functionality
Open, Needs Triage, Public

Assigned To: None

Authored By: ovasileva, Jan 11 2018, 9:41 PM

Description

Background

The books functionality within the Collection extension was paused when the OCG service was sunset. We would like to determine whether this functionality can be restored using the new Chromium PDF renderer. Based on Chromium's performance, we will decide on next steps for books as well as for the Collection extension as a whole.

Related Objects

Status	Assigned	Task
Resolved	ovasileva	T181079 [GOAL] Provide an expanded reading experience by improving the ways that users can download articles of interest for later consumption
Open	None	T184772 [EPIC] Determine next steps with books functionality
Invalid	None	T183161 Performance test books on chromium rendering service
Resolved	Johan	T177076 Keep the community informed about book PDF unavailability

Event Timeline

ovasileva created this task. Jan 11 2018, 9:41 PM

Restricted Application added a subscriber: Aklapper. Jan 11 2018, 9:41 PM

ovasileva added a subtask: T183161: Performance test books on chromium rendering service. Jan 11 2018, 9:41 PM

ovasileva added a subtask: T177076: Keep the community informed about book PDF unavailability.

ovasileva mentioned this in T181079: [GOAL] Provide an expanded reading experience by improving the ways that users can download articles of interest for later consumption. Jan 11 2018, 9:45 PM

Jdlrobson mentioned this in T183161: Performance test books on chromium rendering service. Feb 6 2018, 1:03 AM

Jdlrobson changed the status of subtask T183161: Performance test books on chromium rendering service from Open to Stalled.

Books functionality will be returning via PediaPress. After investigating the new Chromium service (T181084: [EPIC] Deploy the mediawiki-services-chromium-render service (Proton)) in depth, we began to look for alternatives for bringing back the PDF books functionality on Wikimedia projects. We reached out to PediaPress, who were the original patrons of books on Wikipedia, to see if they would be interested in taking up PDF rendering for books once again. They have agreed, and we are currently working on the details and schedule. They will start by working on a temporary solution based on an older technology that has previously been used to create PDFs. This might have some drawbacks when it comes to graphical elements, such as maps, but will mean a faster working solution. They then plan to work on a new HTML-to-PDF renderer, based on feedback on the first implementation.

ovasileva closed subtask T183161: Performance test books on chromium rendering service as Invalid. Apr 9 2018, 2:35 PM

ovasileva mentioned this in T186740: [EPIC] It should be possible to print a book using the Proton service.

ovasileva mentioned this in T167210: [EPIC] Adding PDF TOC with PDF page numbers to electron. Apr 9 2018, 2:38 PM

ovasileva mentioned this in T169738: [Spike 8hrs] Investigate ability of using post-processing approach with new print styles.

ovasileva mentioned this in T177993: Article concatenation fails on large books. Apr 9 2018, 2:42 PM

ovasileva mentioned this in T178095: [EPIC] Fix problems with the PHP concatenation special page service.

ovasileva mentioned this in T177994: Book generation fails for articles with '/' character in title.

ovasileva mentioned this in T175868: Deploy and test new book rendering (Remex + Electron).

ovasileva mentioned this in T177996: Article concatenation not resilient to curl errors. Apr 9 2018, 2:44 PM

ovasileva mentioned this in T178036: Book rendering database query error.

ovasileva mentioned this in T171832: Deploy new book renderer to all projects.

ovasileva mentioned this in T171836: Apply new print styles for books.

ovasileva mentioned this in T167955: Create PDF styles for books.

ovasileva mentioned this in T173015: Use PDF post-processing service to generate final PDF. Apr 9 2018, 2:47 PM

ovasileva mentioned this in T173579: Expose PDF post-processing scripts as a stateless web service.

ovasileva mentioned this in T171960: Create a library to post-process PDF and add page numbers and table of contents.

ovasileva mentioned this in T177805: [Spike] How do we render contributors and images section of books accurately?.

ovasileva mentioned this in T171834: Create page for testing new book renderer.

ovasileva mentioned this in T182230: [Spike] Explore ways of creating a stateless web service in Python. Apr 9 2018, 2:49 PM

In T184772#4116906, @ovasileva wrote:

Books functionality will be returning via PediaPress. After investigating the new Chromium service (T181084: [EPIC] Deploy the mediawiki-services-chromium-render service (Proton)) in depth, we began to look for alternatives for bringing back the PDF books functionality on Wikimedia projects. We reached out to PediaPress, who were the original patrons of books on Wikipedia, to see if they would be interested in taking up PDF rendering for books once again. They have agreed, and we are currently working on the details and schedule. They will start by working on a temporary solution based on an older technology that has previously been used to create PDFs. This might have some drawbacks when it comes to graphical elements, such as maps, but will mean a faster working solution. They then plan to work on a new HTML-to-PDF renderer, based on feedback on the first implementation.

I'd really like to see this new direction extensively documented in terms of expectations and commitments (from both sides). In the past we have had problems in the Collection extension to the point that the extension itself basically became technical debt. I understand that Pediapress had enough challenges already, but this extension residing in a sort of no man's land needs to be fixed. Maybe it requires https://www.mediawiki.org/wiki/Code_stewardship_reviews

Also, pinging @Tpt as the maintainer of wsexport.

I'd really like to see this new direction extensively documented in terms of expectations and commitments (from both sides). In the past we have had problems in the Collection extension to the point that the extension itself basically became technical debt. I understand that Pediapress had enough challenges already, but this extension residing in a sort of no man's land needs to be fixed. Maybe it requires https://www.mediawiki.org/wiki/Code_stewardship_reviews

More detailed documentation will come once we have a clearer idea of the technical side from PediaPress. For now, though, we expect no changes to the Collection extension outside of the rendering portion. We might do some cleanup of the workflow, but books will function more or less as they do currently. The main difference would be that we'd redirect to PediaPress for the final PDF download. That said, you're right: this answers few, if any, questions about the future of the Collection extension as a whole in terms of ownership and maintenance. I agree that it's a good candidate for https://www.mediawiki.org/wiki/Code_stewardship_reviews.

Two relevant W3C specification drafts: https://www.w3.org/TR/wpub/ and https://www.w3.org/TR/pwp/
Their aim is to make EPUB and the web work together.
One possibility is to have an EPUB/Web Publication output system and then use converters targeting multiple formats (PDF, mobi...). This is the approach used by wsexport.

Envlh subscribed. Apr 25 2018, 7:35 AM

Aklapper mentioned this in T135643: Show tables in pdfs (#9). May 16 2018, 12:50 PM

In T184772#4116906, @ovasileva wrote:

Books functionality will be returning via PediaPress.

Excellent. I suppose this means mwlib (with Parsoid HTML?). After PDF goes back to the stable functionality, are we also going to get ZIM and ODT back? EPUB would be in order too.

A series of feature requests and bug reports in the Collection component, which we had triaged in 2014, have since been "retriaged" by some under the assumption that OCG (and others after OCG) was the only way. Please contact me if you need help with debugging and retriaging the rendering-only issues which will go back to PediaPress.

Nemo_bis added projects: All-and-every-Wikisource, Collection. May 25 2018, 8:30 AM

Liuxinyu970226 subscribed. Jun 20 2018, 11:41 AM

The code for concatenating pages should be removed (or cleaned up if it's intended to be used by Proton or PediaPress in some way).

PFWOz mentioned this in T150872: Replace OCG in collection extension with Electron. Jan 8 2019, 12:56 AM

PFWOz subscribed. Jan 8 2019, 12:58 AM

ovasileva removed a project: Proton. Feb 14 2019, 4:20 PM

Liuxinyu970226 unsubscribed. Feb 15 2019, 2:26 PM

MJL subscribed. Apr 7 2019, 9:18 PM

How's this coming?

@Sj, this probably has the most recent information: https://www.mediawiki.org/wiki/Topic:Uxkv0ib36m3i8vol

Johan closed subtask T177076: Keep the community informed about book PDF unavailability as Resolved. Jul 15 2019, 1:50 PM
