Provide EPUB sanitizer
Open, Low, Public

Description

Author: stf

Description:
EPUB is an open format for e-books. Although EPUB files are not particularly easy to create, the XML-based design makes the format broadly usable. I expect a lot of Wikimedia-related EPUBs, e.g. from Wikipedia, Wikisource or Wikibooks pages, which would be nice to store right in the projects alongside their source.


Version: unspecified
Severity: enhancement

bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz17858.
bzimport created this task. Mar 8 2009, 11:09 AM

jeluf wrote:

EPUB is a ZIP file containing (X)HTML files. We should not distribute these without sanitizing them first. Even though JavaScript is not part of the EPUB specification, we can't be sure that browser plugins properly disable the browser's JavaScript engine.
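
To make the concern concrete, here is a minimal sketch of the detection half of such a sanitizer, assuming Python and only its standard library. It opens the EPUB as a ZIP archive and reports markup files that contain script elements, on* event-handler attributes or javascript: URLs. The names MARKUP_SUFFIXES, ScriptFinder and scan_epub are hypothetical, the list of file extensions is an assumed policy, and a real sanitizer would need a much stricter whitelist approach than this.

```python
import io
import zipfile
from html.parser import HTMLParser

# Assumed policy for this sketch: file extensions that are treated as markup.
MARKUP_SUFFIXES = (".xhtml", ".html", ".htm", ".svg")


class ScriptFinder(HTMLParser):
    """Collects script elements, on* attributes and javascript: URLs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases names; drop a prefix such as "svg:" if present.
        if tag.split(":")[-1] == "script":
            self.findings.append(f"<{tag}> element")
        for name, value in attrs:
            # Deliberately broad: any attribute starting with "on" is reported.
            if name.startswith("on"):
                self.findings.append(f"{name} attribute on <{tag}>")
            elif value and value.strip().lower().startswith("javascript:"):
                self.findings.append(f"javascript: URL in {name} on <{tag}>")


def scan_epub(source):
    """Return {member name: [findings]} for markup files in an EPUB archive.

    `source` is a path or a binary file-like object.
    """
    report = {}
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith(MARKUP_SUFFIXES):
                continue
            text = archive.read(info).decode("utf-8", errors="replace")
            finder = ScriptFinder()
            finder.feed(text)
            finder.close()
            if finder.findings:
                report[info.filename] = finder.findings
    return report


if __name__ == "__main__":
    # Build a tiny in-memory archive to exercise the scanner.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        demo.writestr("OEBPS/clean.xhtml", "<html><body><p>Hello</p></body></html>")
        demo.writestr(
            "OEBPS/bad.xhtml",
            '<html><body onload="x()"><script>x()</script></body></html>',
        )
    buffer.seek(0)
    for member, findings in scan_epub(buffer).items():
        print(member, findings)
```

The sketch only reports what it finds. Rejecting the upload or rewriting the offending files would be a separate policy decision.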

> changed bug summary, keywords, product

brion added a comment. Mar 20 2009, 5:54 PM

Might be interesting, but, as noted, it would need some special support for inline reading, sanitization, etc.

There exists a tool to validate such files at http://code.google.com/p/epubcheck/ which might be useful here.
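
For illustration only, here is a small Python sketch of the kind of container-level check that can run before a full validator. It does not reproduce epubcheck and is not based on its code. It checks only three basic packaging rules for EPUB archives: the first ZIP entry is named mimetype, that entry is stored uncompressed with the content application/epub+zip, and META-INF/container.xml is present. The function name container_problems is hypothetical.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def container_problems(source):
    """Return a list of basic packaging problems (empty if none were found).

    `source` is a path or a binary file-like object.
    """
    try:
        archive = zipfile.ZipFile(source)
    except zipfile.BadZipFile:
        return ["not a ZIP archive"]

    problems = []
    with archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            if entries[0].compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' entry is compressed")
            if archive.read(entries[0]) != EPUB_MIMETYPE:
                problems.append("'mimetype' content is not application/epub+zip")
        if "META-INF/container.xml" not in archive.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems


if __name__ == "__main__":
    # A deliberately incomplete archive: no mimetype entry, no container.xml.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        demo.writestr("book.xhtml", "<html/>")
    buffer.seek(0)
    print(container_problems(buffer))
```

Passing this check says nothing about whether the content is safe to serve; it only filters out files that are not plausibly EPUB containers.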

I'm resetting the priority field. You really shouldn't be touching those unless you're a developer, and you definitely shouldn't mess with them without an explanation as to why.

Mrjohncummings added a subscriber: Mrjohncummings. Edited Aug 21 2015, 9:18 AM

It would be really great if Commons could support EPUB. It is one of the main book formats on Project Gutenberg, which, together with its partners, has 100,000 public domain books available. Also, as far as I understand, fixing the validation problem for EPUB will also fix the same problem for the OpenDocument format.

Restricted Application added subscribers: Steinsplitter, Matanya, Aklapper. Aug 21 2015, 9:18 AM
KRLS added a subscriber: KRLS. Aug 29 2015, 3:45 PM
Jdforrester-WMF moved this task from Untriaged to Backlog on the Multimedia board. Sep 4 2015, 6:08 PM
Alex_brollo added a subscriber: Alex_brollo. Edited Nov 30 2015, 7:58 AM

I agree about the need for strong Commons support for ePub files. Commons can be seen as a shared multimedia repository, and books are "media" too. In my vision, the Wikisource projects should be considered "the typographies" and Commons "the library"; a central library could be managed with robust librarian techniques, drawing on the best skills of MediaWiki people.

There's another way to do this, with a creative use of existing DjVu files. I'm testing this still-fuzzy idea; is anyone here interested?

Restricted Application added a project: Commons. Nov 30 2015, 7:58 AM
TheDJ added a subscriber: TheDJ. Nov 30 2015, 9:21 AM

@Alex_brollo, I think plenty of people would be interested, and things like this are mostly held back by the lack of people able to work on them. So if you want to experiment with this, by all means go ahead.
