
Transform HTML to wikitext from a maintenance script
Closed, Resolved · Public · Feature

Description

Hi! I'm developing a maintenance script that needs to convert HTML to wikitext.

However, it seems https://mediawiki.solutions/w/api.php?action=help&modules=visualeditoredit no longer works (see https://phabricator.wikimedia.org/T234049), and the REST API doesn't have a method for transforming HTML to wikitext either.

It seems the preferred way is ParsoidExtensionAPI, but https://www.mediawiki.org/wiki/Parsoid/So_you_want_your_extension_to_work_with_Parsoid doesn't seem to cover maintenance scripts, so following the advice there, I'm opening this ticket.

So how can I transform HTML to wikitext from a maintenance script?

Note: the input HTML comes from https://mediawiki.solutions/w/api.php?action=help&modules=visualeditor

Event Timeline

Aklapper changed the subtype of this task from "Task" to "Feature Request". Apr 1 2022, 5:35 PM

Parsoid itself has a bin/parse.php that takes an argument, --html2wt
https://github.com/wikimedia/parsoid/blob/master/bin/parse.php#L59

Does that suffice for your purposes?

You might also want to add the --integrated or --domain argument to ensure that Parsoid knows the appropriate wiki configuration to use. --integrated works if it's a local wiki; --domain works via the MW external API.
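
For future visitors, here is a rough sketch of how a script might shell out to bin/parse.php with the flags mentioned above. The flags --html2wt, --integrated and --domain come from this thread; everything else is an assumption to verify against your Parsoid checkout: that the script reads the HTML from stdin and prints wikitext to stdout, that --domain accepts the `--domain=value` form, and the helper names `build_html2wt_command` and `html_to_wikitext`, which are made up for illustration.

```python
import subprocess
from typing import List, Optional


def build_html2wt_command(
    parse_script: str,
    integrated: bool = True,
    domain: Optional[str] = None,
) -> List[str]:
    """Build the argv for Parsoid's bin/parse.php in HTML-to-wikitext mode.

    parse_script is the path to bin/parse.php in a Parsoid checkout.
    If domain is given it takes precedence over integrated.
    """
    cmd = ["php", parse_script, "--html2wt"]
    if domain is not None:
        # Assumption: the option accepts the --domain=value form.
        cmd.append(f"--domain={domain}")
    elif integrated:
        cmd.append("--integrated")
    return cmd


def html_to_wikitext(
    html: str,
    parse_script: str,
    integrated: bool = True,
    domain: Optional[str] = None,
) -> str:
    """Pipe HTML into parse.php and return whatever it prints on stdout.

    Assumption: parse.php reads its input from stdin when no input file
    is given. check=True raises CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(
        build_html2wt_command(parse_script, integrated, domain),
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `html_to_wikitext("<p>Hello</p>", "/path/to/parsoid/bin/parse.php")` would then run the script in integrated mode; pass `domain="..."` instead to go through a remote wiki's API.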

There's also the /transform/html/to/wikitext endpoint exported by RESTBase (and by Parsoid if you turn on $wgParsoidEnableREST, although that endpoint is experimental and subject to change). That has the benefit of using the wiki configuration of the wiki which is exporting that endpoint. You can write a simple curl or other script to hit the REST endpoint; I think the rate limits are quite generous.
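
As an illustration of the "simple script" approach, here is a minimal sketch that POSTs HTML to the /transform/html/to/wikitext endpoint. Only the endpoint path comes from this thread. The `api_root` value is a placeholder for wherever your wiki exposes the REST API, sending the markup as a form field named `html` is an assumption to check against the endpoint's documentation, and the function names and User-Agent string are hypothetical.

```python
import urllib.parse
import urllib.request


def build_transform_request(api_root: str, html: str) -> urllib.request.Request:
    """Build a POST request for the html-to-wikitext transform endpoint.

    api_root is a placeholder for the REST API base URL of your wiki.
    Assumption: the endpoint accepts the markup in a form field named "html".
    """
    url = api_root.rstrip("/") + "/transform/html/to/wikitext"
    body = urllib.parse.urlencode({"html": html}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Hypothetical identifier; use one that describes your script.
            "User-Agent": "html2wt-example/0.1",
        },
        method="POST",
    )


def html_to_wikitext_rest(api_root: str, html: str, timeout: float = 30.0) -> str:
    """Send the request and return the response body decoded as UTF-8."""
    request = build_transform_request(api_root, html)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Splitting request construction from the network call keeps the URL and body easy to inspect before anything is sent, which helps when checking the script against a test wiki.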

ssastry subscribed.

Any reason not to decline this?

Sophivorus claimed this task.

@cscott Thanks so much for your thoughtful reply! I was finally able to resolve it via the API, so I haven't tried your solution yet. I still want to try it and reply here, but it seems @ssastry is in a bit of a hurry. I'm closing this task as resolved, and if I try your solution I'll post here for future visitors.