mwbot-rs/parsoid: fails transforming html to wikitext if title is italic
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Find an article with an italicized title (e.g. Father Ted).
  • Use get or get_revision (I haven't tested get, but I think they should work the same) to fetch that page as HTML.
  • Then try to transform it back to wikitext using transform_to_wikitext.

What happens?: It fails:

HTTP status client error (400 Bad Request) for url (https://en.wikipedia.org/api/rest_v1/transform/html/to/wikitext/%3Ci%3EFather%20Ted%3C%2Fi%3E/1129515675)
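
The title segment of the failing URL is the percent-encoded form of the display-title markup `<i>Father Ted</i>`, not of the plain page title. The sketch below reproduces that segment using only the standard library. `encode_path_segment` is a hypothetical helper written for illustration; how the parsoid crate actually encodes titles is not shown in this report.

```rust
/// Percent-encode one URL path segment, leaving only RFC 3986
/// unreserved characters (letters, digits, '-', '.', '_', '~') as-is.
/// Hypothetical helper for illustration; not the crate's own encoder.
fn encode_path_segment(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for byte in segment.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Everything else becomes %XX with uppercase hex digits.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

fn main() {
    // Title as extracted from the HTML: includes the <i> markup.
    println!("{}", encode_path_segment("<i>Father Ted</i>"));
    // Title as originally requested by the caller.
    println!("{}", encode_path_segment("Father Ted"));
}
```

The first line printed matches the segment in the error above (`%3Ci%3EFather%20Ted%3C%2Fi%3E`), which suggests the markup, rather than the requested title, was sent to the API.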

What should have happened instead?: Success

Software version (skip for WMF-hosted wikis like Wikipedia): parsoid = "0.7.4"

Other information (browser name/version, screenshots, etc.):

Details

Reference                 Source Branch  Dest Branch  Author   Title
repos/mwbot-rs/mwbot!14   title2-fix     main         legoktm  Have Wikicode store a title instead of trying to parse it
repos/mwbot-rs/mwbot!13   title-fix      main         legoktm  Extract usable title from Parsoid HTML

Event Timeline

0xDeadbeef edited projects, added mwbot-rs (parsoid); removed mwbot-rs.
0xDeadbeef edited subscribers, added: mwbot-rs (parsoid); removed: mwbot-rs.
Legoktm triaged this task as High priority. Jan 8 2023, 5:42 AM
Legoktm added a subscriber: Legoktm.

Hmm, I noticed part of this last month and filed T324431 ("Parsoid: displaytitle HTML now appearing in <title> element rather than page title"), but didn't realize it broke this too.

The main part of this problem is that converting Wikicode to ImmutableWikicode tries to parse the title out of the HTML and stores the result. Until this is fixed on Parsoid's API side, a temporary fix would be to store the originally provided title within Wikicode and use that when converting to the immutable one.
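
A minimal sketch of that idea, assuming simplified stand-in types: the `Wikicode` and `ImmutableWikicode` structs, their fields, `into_immutable`, and `title_from_html` below are all hypothetical and do not reflect the crate's real definitions. The sample HTML is also simplified so that the `<title>` element contains the raw markup.

```rust
/// Hypothetical stand-in for the crate's mutable document type.
struct Wikicode {
    html: String,
    /// Title exactly as the caller requested it, if known.
    title: Option<String>,
}

/// Hypothetical stand-in for the crate's immutable document type.
struct ImmutableWikicode {
    html: String,
    title: Option<String>,
}

/// Naive extraction of the text between <title> and </title>.
/// Illustration only; a real implementation would use an HTML parser.
fn title_from_html(html: &str) -> Option<String> {
    let start = html.find("<title>")? + "<title>".len();
    let end = html[start..].find("</title>")? + start;
    Some(html[start..end].to_string())
}

impl Wikicode {
    /// Prefer the stored title; parse the HTML only as a fallback.
    fn into_immutable(self) -> ImmutableWikicode {
        let Wikicode { html, title } = self;
        let title = title.or_else(|| title_from_html(&html));
        ImmutableWikicode { html, title }
    }
}

fn main() {
    let html = "<html><head><title><i>Father Ted</i></title></head></html>";

    // With the requested title stored, the markup never leaks through.
    let stored = Wikicode { html: html.to_string(), title: Some("Father Ted".to_string()) };
    println!("{:?}", stored.into_immutable().title);

    // Without it, the fallback picks up the display-title markup.
    let parsed = Wikicode { html: html.to_string(), title: None };
    let immutable = parsed.into_immutable();
    println!("{:?} ({} bytes of HTML)", immutable.title, immutable.html.len());
}
```

With this shape, the HTML-derived title is only consulted when no title was supplied, so the common get / get_revision path would never depend on what Parsoid puts in `<title>`.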

Yeah, that's a good idea too. I put up an MR fixing .title() if you want to take a look, but I like your idea, since with it .title() isn't even called in the common case. Do you want to work on a patch for that or should I?

parsoid 0.7.5 has been released with the fix.