
Install the extension Html2wiki in ru:Wikisource
Open, Stalled, Needs Triage, Public

Description

We would like to install Html2wiki in https://ru.wikisource.org.

By definition, Wikisource is a library containing only texts that have already been published somewhere. Most of these texts are imported from HTML pages on various sites. Probably around 20% are imported from PDF scans, and a small percentage from other formats. Wikisource has no other sources of content.

In other words, users import most Wikisource content from HTML pages and convert each one to wiki markup by hand. This takes up valuable time that could be spent more productively on the project. In addition, text copied from HTML usually loses its formatting and styles, and restoring them by hand also takes a long time.

Installing this extension would be very helpful. According to its description, it can also upload illustrations. Many books contain tens or hundreds of them, so this would save a lot of time as well.

Event Timeline

Vladis13 created this task. Feb 9 2018, 5:30 PM
Restricted Application added a subscriber: Aklapper. Feb 9 2018, 5:30 PM
Vladis13 updated the task description. Feb 9 2018, 5:46 PM

As the extension's creator, I would be glad to work with the WMF to review, update, and secure this extension.

It was purpose-built and hasn't been fully vetted for production use "in the wild". In particular, it would benefit from a feature that lets users specify processing rules for the specifics of their source. For example, we have rules hard-coded for Google Docs HTML, but it would be great to let users define their own rules based on a preview.

Vladis13 updated the task description. Feb 9 2018, 6:06 PM

Hi, as this extension is not yet installed on Wikimedia sites, please see https://www.mediawiki.org/wiki/Review_queue for the steps required. Thanks.

Aklapper changed the task status from Open to Stalled. Feb 10 2018, 6:47 PM
freephile added a comment. Edited Feb 10 2018, 7:49 PM

Thanks, Andre, for pointing to that process. I hope to work on it as time permits, which right now is zero. (To anyone listening: I'd gladly accept any collaborators, or a paying client who needs customizations or improvements to the extension.)

Note: the underlying converter is pandoc, the Swiss Army knife of format conversion. My hope is to expand the extension's scope so that it becomes a wiki front end to pandoc, making MediaWiki capable of importing any format that pandoc understands.
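To illustrate the conversion step pandoc performs, here is a minimal Python sketch that pipes an HTML string through the pandoc command-line tool and returns MediaWiki markup. It assumes pandoc is installed and on PATH. The function names are hypothetical, and this is not Html2wiki's actual code.

```python
import shutil
import subprocess


def pandoc_command(source_format: str = "html", target_format: str = "mediawiki") -> list[str]:
    """Build a pandoc command line that reads from stdin and writes to stdout (hypothetical helper)."""
    # -f/--from names the input format; -t/--to names the output format.
    return ["pandoc", "-f", source_format, "-t", target_format]


def html_to_wikitext(html: str) -> str:
    """Convert an HTML string to MediaWiki markup by running the pandoc CLI (hypothetical helper)."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    # Feed the HTML on stdin and capture the wikitext that pandoc prints to stdout.
    result = subprocess.run(
        pandoc_command(),
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `html_to_wikitext("<p>Some <b>bold</b> text</p>")` should yield wikitext containing `'''bold'''`. Because pandoc selects its reader with `-f`, calling `pandoc_command("docx")` hints at how the broader "front end to pandoc" idea could accept other input formats.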

Vladis13 added a comment. Edited Feb 23 2018, 2:00 PM

As I understand it, the extension is ready. Perhaps it's time to go ahead and apply for a security review?