Wikisource: migrate API to Parsoid's API
Closed, Resolved | Public | 8 Estimated Story Points | Nov 4 2020

Description

Background: After the investigation in T257902, it was concluded that the HTML produced by the current MediaWiki parser should be replaced with HTML from Parsoid's API, in order to improve reliability and simplify some of the HTML transformations.

Acceptance Criteria:
* WSexport uses Parsoid's API to get the HTML pages and converts them to an epub.
* The API response is parsed according to the newly retrieved data.
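
As a rough illustration of the first criterion, the sketch below builds the URL for a page's Parsoid HTML and fetches it. It assumes the Wikimedia REST endpoint `/api/rest_v1/page/html/{title}`; the function names, the example title, and the User-Agent string are hypothetical, and this is not WSexport's actual code.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def parsoid_html_url(domain: str, title: str) -> str:
    """Build the REST URL for a page's Parsoid HTML (assumed endpoint layout)."""
    # MediaWiki titles use underscores for spaces. Everything else that is
    # not URL-safe is percent-encoded, including "/" in subpage titles.
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"https://{domain}/api/rest_v1/page/html/{encoded}"


def fetch_parsoid_html(domain: str, title: str, timeout: float = 30.0) -> str:
    """Download the Parsoid HTML for one page and return it as text."""
    request = Request(
        parsoid_html_url(domain, title),
        # Hypothetical User-Agent; a real tool should identify itself properly.
        headers={"User-Agent": "parsoid-example/0.1 (illustrative sketch)"},
    )
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Calling `fetch_parsoid_html("en.wikisource.org", "The Time Machine/Chapter 1")` would request that subpage, if it exists. The returned HTML would then be the input to the epub conversion step.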

Details

Due Date
Nov 4 2020, 5:00 AM

Event Timeline

HMonroy renamed this task from "Wikisource: migrate API to Parsoid HTML" to "Wikisource: migrate API to Parsoid's API". Oct 6 2020, 7:35 PM
HMonroy updated the task description.
HMonroy updated the task description.
ifried set the point value for this task to 8. Oct 6 2020, 11:05 PM
ARamirez_WMF changed the subtype of this task from "Task" to "Deadline".
ARamirez_WMF changed Due Date from Oct 21 2020, 4:00 AM to Nov 4 2020, 5:00 AM. Oct 22 2020, 7:39 PM
dom_walden subscribed.

I compared the appearance of ebooks generated on test and production for a number of different ebooks, including those in the recently popular list.

I found a number of differences before and after this change. Many of these are unlikely to trouble users. For example, margins between paragraphs sometimes differ, some text is aligned differently, and some text is a different size.

Before | After
alignment_before.png (369×313 px, 11 KB) | alignment_after.png (360×308 px, 11 KB)
size_before.png (254×595 px, 41 KB) | size_after.png (253×595 px, 49 KB)

There are a few I think are worth fixing, and I raised those separately below:

There are probably more bugs which we will hopefully find in due course.

I mostly tested epub and pdf formats, but I briefly tested mobi and rtf as well.

Before | After
alignment_before.png (369×313 px, 11 KB) | alignment_after.png (360×308 px, 11 KB)

For these two works, it would be useful to know the URLs of the originals. For the Thai-language work, I cannot tell whether it is a block center gone wrong or something else.

ifried subscribed.

This is now on production. As noted by Dom, this work introduced some minor formatting issues for us to analyze. However, these issues can be addressed in separate tickets: T270367, T270372, T270373, T270395. For this reason, I am marking this work as Done.

Correction: This is on the test server, but not on production yet. While it appeared to be released, as per https://github.com/wsexport/tool/releases, this wasn't the case when we actually logged into the prod server to check. So I'll move it back to sign-off & mark it as Done when it's released to prod.

It looks like we've either never set up autodeployment on prod, or we removed it at some point. I think we should set it up again. Or were we purposefully sticking at 2.0.0 for a reason (sorry, I know I should remember…)?

This is now on production, so I am marking this ticket as Done. There are some bugs associated with this work, but they will be resolved in separate tickets. The team is already working on some of these tickets and further tickets will be discussed in our next estimation meeting.