Provide EPUB sanitizer


Author: stf

EPUB is an open format for e-books. Although EPUB files are not particularly easy to create, the XML-based design makes the format broadly usable. I expect a lot of Wikimedia-related EPUBs, e.g. from Wikipedia, Wikisource or Wikibooks pages, which would be nice to store right in the projects, next to their sources.

Version: unspecified
Severity: enhancement

bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz17858.
bzimport created this task. Via Legacy, Mar 8 2009, 11:09 AM
bzimport added a comment. Via Conduit, Mar 20 2009, 5:49 PM

jeluf wrote:

EPUB is a ZIP file containing (X)HTML files. We should not distribute these without sanitizing them first. Even though JavaScript is not part of the EPUB specification, we can't be sure that browser plugins properly disable the browser's JavaScript engine.
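
To make the concern above concrete, here is a minimal sketch of what a first-pass check might look like. It is not an existing MediaWiki component: the function names (`find_active_content`, `build_demo_epub`) and the list of things flagged are assumptions chosen for illustration. It only reports `<script>` elements, `on*` event-handler attributes and `javascript:` URLs in the (X)HTML members of the ZIP; a real sanitizer would need to cover much more (embedded SVG, CSS, external references) and should use a hardened XML parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: members with these extensions are treated as (X)HTML content.
XHTML_SUFFIXES = (".xhtml", ".html", ".htm")


def local_name(name):
    """Drop an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return name.rsplit("}", 1)[-1].lower()


def find_active_content(epub_bytes):
    """Return (member name, description) pairs for suspicious markup."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(XHTML_SUFFIXES):
                continue
            try:
                root = ET.fromstring(archive.read(name))
            except ET.ParseError:
                # Content that does not parse cannot be checked, so flag it.
                findings.append((name, "not well-formed XML"))
                continue
            for element in root.iter():
                if local_name(element.tag) == "script":
                    findings.append((name, "<script> element"))
                for attr, value in element.attrib.items():
                    attr_name = local_name(attr)
                    if attr_name.startswith("on"):
                        findings.append((name, f"event handler attribute {attr_name}"))
                    elif attr_name in ("href", "src") and \
                            value.strip().lower().startswith("javascript:"):
                        findings.append((name, f"javascript: URL in {attr_name}"))
    return findings


def build_demo_epub():
    """Build a tiny in-memory archive for the demo (not a complete, valid EPUB)."""
    ns = 'xmlns="http://www.w3.org/1999/xhtml"'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr(
            "OEBPS/clean.xhtml",
            f"<html {ns}><body><p>Hello</p></body></html>",
        )
        archive.writestr(
            "OEBPS/bad.xhtml",
            f'<html {ns}><body onload="x()"><script>alert(1)</script>'
            '<a href="javascript:void(0)">x</a></body></html>',
        )
    return buffer.getvalue()


if __name__ == "__main__":
    for member, problem in find_active_content(build_demo_epub()):
        print(f"{member}: {problem}")
```

An upload check built along these lines could reject a file whenever the list of findings is non-empty, rather than trying to rewrite the markup. That is one possible design choice, not something decided in this task.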

> changed bug summary, keywords, product

brion added a comment. Via Conduit, Mar 20 2009, 5:54 PM

Might be interesting, but as noted it would need some special support for inline reading, sanitization, etc.

Bawolff added a comment. Via Conduit, May 22 2010, 1:30 AM

There exists a tool to validate such files at which might be useful here.

Bawolff added a comment. Via Conduit, Jan 9 2012, 7:56 PM

I'm resetting the priority field. You really shouldn't be touching those unless you're a developer, and you definitely shouldn't mess with them without an explanation as to why.

Gilles added a project: Multimedia. Via Web, Nov 24 2014, 3:37 PM
Mrjohncummings added a subscriber: Mrjohncummings. (Edited) Via Web, Aug 21 2015, 9:18 AM

It would be really great if Commons could support EPUB. It is one of the main book formats on Project Gutenberg, which, with its partners, has 100,000 public domain books available. Also, as far as I understand, fixing the validation problem for EPUB will also fix the same problem for the OpenDocument format.

Restricted Application added subscribers: Steinsplitter, Matanya, Aklapper. Via Herald, Aug 21 2015, 9:18 AM
KRLS added a subscriber: KRLS. Via Web, Aug 29 2015, 3:45 PM
Jdforrester-WMF moved this task to Backlog on the Multimedia workboard. Via Web, Sep 4 2015, 6:08 PM
Alex_brollo added a subscriber: Alex_brollo. (Edited) Via Web, Nov 30 2015, 7:58 AM

I agree about the need for strong Commons support for ePub files. Commons can be seen as a shared multimedia repository, and books are "media" too. In my vision, Wikisource projects should be considered "the printing houses" and Commons "the library"; a central library could be managed with robust librarian techniques, combining the best skills of MediaWiki people.

There's another way to do this, with a creative use of existing DjVu files. I'm testing such a fuzzy idea; is anyone here interested?

Restricted Application added a project: Commons. Via Herald, Nov 30 2015, 7:58 AM
TheDJ added a subscriber: TheDJ. Via Web, Nov 30 2015, 9:21 AM

@Alex_brollo, I think plenty of people would be interested, and that things like this are mostly held back by the lack of people able to work on them. So if you want to experiment with this, by all means go ahead.

zhuyifei1999 moved this task to File format support on the Commons workboard. Via Web, Nov 30 2015, 11:18 AM
