
Fix broken URI in HTTP redirect (/parsoid prefix missing)
Closed, Resolved · Public

Description

When parsoid.wmflabs.org was "moved" to parsoid-tests.wikimedia.org, some scripts were relocated into the "parsoid" subdirectory. However, the code of these scripts has not been fixed, resulting in:

http://parsoid-tests.wikimedia.org/parsoid/enwiki/Wikipedia

returning the following error message:
Cannot GET /enwiki/Wikipedia?oldid=710924518

That's because the correct URI is: /parsoid/enwiki/Wikipedia?oldid=710924518
When I navigate my browser to such a fixed URI, everything works fine.
Please fix this issue, as I have a tool that relies on this feature (and, if possible, suggest a better interface for acquiring a live DOM representation of Wikipedia articles).

Event Timeline

Blahma created this task. Mar 19 2016, 11:24 PM
Restricted Application added a subscriber: Aklapper. Mar 19 2016, 11:24 PM
Blahma renamed this task from Fix broken URI in HTTP redirect (/parsoid missing) to Fix broken URI in HTTP redirect (/parsoid prefix missing). Mar 20 2016, 12:05 AM

Still demonstrating the same behavior (because, sadly, nobody has yet even triaged this in the month that has passed).

It just contributed to my gadget breaking again, this time because parsoid-tests.wikimedia.org started enforcing HTTPS and my work-around code was not expecting that extra redirect. The redirect could have been followed transparently, but because of this bug my work-around has to capture the error response, extract the oldid value from it and initiate another connection to the fixed URL.
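The work-around described above could look roughly like the following Python sketch. It is only an illustration, not the gadget's actual code: the function name `fixed_uri_from_error` is hypothetical, and the sketch assumes the error body contains the text "Cannot GET <path>" exactly as quoted in this task.

```python
import re
from typing import Optional

# Assumptions taken from this task: the test server's host and the missing prefix.
BASE = "https://parsoid-tests.wikimedia.org"
PREFIX = "/parsoid"


def fixed_uri_from_error(error_body: str) -> Optional[str]:
    """Build the corrected URL from a 'Cannot GET <path>' error body.

    Returns None when the body does not look like that error message.
    """
    # Stop at whitespace or "<" in case the message is wrapped in HTML.
    match = re.search(r"Cannot GET (/[^\s<]+)", error_body)
    if match is None:
        return None
    path = match.group(1)
    # Re-add the prefix that the broken redirect dropped.
    if not path.startswith(PREFIX + "/"):
        path = PREFIX + path
    return BASE + path
```

The caller would then open a second connection to the returned URL, which is the extra round trip this bug forces.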

Therefore, this bug not only takes time from me as a developer, it also has a direct impact on the productivity of my gadget's users and their editing performance. Yet it seems so easy to fix that I would have dared to do it myself, if I knew where to find the relevant source code.

ssastry added a subscriber: ssastry. Edited · Apr 14 2016, 8:09 PM

(and, if possible, suggest a better interface for acquiring a live DOM representation of Wikipedia articles).

Sorry .. this somehow escaped my attention and I have been lax lately triaging tasks.

But, you can get the latest Parsoid HTML via RESTBase @ https://<wiki>.wikipedia.org/api/rest_v1/page/html/<title> .. instead of needing to hit Parsoid directly for this.

Ex: https://en.wikipedia.org/api/rest_v1/page/html/Hospet
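As a minimal sketch of that suggestion, the URL pattern above can be built and fetched from Python as follows. The function names and the User-Agent string are hypothetical placeholders, and the sketch assumes a `*.wikipedia.org` wiki as in the example.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def rest_html_url(wiki: str, title: str) -> str:
    """Build the REST API URL for the Parsoid HTML of a page."""
    # Page titles use underscores instead of spaces; percent-encode the rest,
    # including "/" so that subpage titles stay a single path segment.
    encoded_title = quote(title.replace(" ", "_"), safe="")
    return f"https://{wiki}.wikipedia.org/api/rest_v1/page/html/{encoded_title}"


def fetch_parsoid_html(wiki: str, title: str) -> str:
    """Download the latest Parsoid HTML of a page (performs a network request)."""
    request = Request(
        rest_html_url(wiki, title),
        # Placeholder User-Agent; replace it with one identifying your tool.
        headers={"User-Agent": "example-gadget/0.1 (hypothetical contact)"},
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

For instance, `fetch_parsoid_html("en", "Hospet")` requests the example URL given above.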

cscott added a subscriber: cscott. Apr 14 2016, 8:10 PM

You should be using the REST API for getting live DOM representations of WP articles.

Formerly https://rest.wikimedia.org/, but I believe the preferred URL is now:
https://en.wikipedia.org/api/rest_v1/?doc

Out of curiosity -- what is the gadget you are working on? We're always interested in new uses of the Parsoid DOM...

@Blahma separately .. what is this gadget .. so that https://www.mediawiki.org/wiki/Parsoid/Users can be updated .. feel free to update it yourself.

Thank you for suggesting the REST API instead. I had kind of known about it, but did not realize it would also cover the reverse conversion that I need (after I modify the HTML, I need to get the corresponding wikitext back out of it). I will now try to modify my code to use it.
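The reverse conversion mentioned above might be requested along these lines. This is a sketch under assumptions: it presumes the `transform/html/to/wikitext` endpoint and its `html` form field as listed in the API documentation at /api/rest_v1/?doc, so the path and field name should be checked there before relying on them. The function names and the User-Agent string are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_html_to_wikitext_request(wiki: str, html: str) -> Request:
    """Prepare a POST request asking the REST API to turn HTML into wikitext."""
    # Assumed endpoint path; verify against the API docs before use.
    url = f"https://{wiki}.wikipedia.org/api/rest_v1/transform/html/to/wikitext"
    # Assumed form field name "html", sent as a URL-encoded form body.
    body = urlencode({"html": html}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Placeholder User-Agent; replace it with one identifying your tool.
            "User-Agent": "example-gadget/0.1 (hypothetical contact)",
        },
    )


def html_to_wikitext(wiki: str, html: str) -> str:
    """Send the request and return the response body (performs a network request)."""
    with urlopen(build_html_to_wikitext_request(wiki, html)) as response:
        return response.read().decode("utf-8")
```

Keeping the request construction separate from the network call makes it possible to inspect the URL and body without contacting the server.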

Mine is the first of the two gadgets currently listed on the "Users" page. The tool has led to the creation of 6000 Czech articles and 5000 Slovak articles by helping people translate between these two similar languages more efficiently (it preserves the markup and translates the words with high confidence, including links, which is my own addition employing Wikidata, similar to what ContentTranslation can do today as well). With the exception of some template handling (automatic interwiki conversion of some frequent infoboxes), the tool might actually become redundant once ContentTranslation gets machine translation of similar reliability for this language pair. And while ContentTranslation was announced in January 2015, my tool has been around since November 2013.

Arlolra triaged this task as Medium priority. Apr 15 2016, 8:52 PM
Arlolra added a project: Parsoid.

Right now, even the correct URL (such as https://parsoid-tests.wikimedia.org/parsoid/enwiki/Wikipedia) no longer works and results in the error message "Cannot GET /enwiki/Wikipedia". It seems that someone has played with URL rewrites on the server, breaking the rest of the available functionality.

I know that the solution is to start using the REST API (and I now urgently need to write such code while my gadget is out of order), but should parsoid-tests.wikimedia.org, or at least this part of it, not actually be shut down if it is deprecated and its code keeps getting more broken?

Restricted Application added a subscriber: TerraCodes. May 14 2016, 11:02 PM

The v1 API was removed in T100681.

The URL you're looking for is:
https://parsoid-tests.wikimedia.org/parsoid/en.wikipedia.org/v3/page/html/Wikipedia/720255687

But, yeah, please use the REST API for your gadget. As the subdomain implies, that's just a test server.

Arlolra closed this task as Resolved. Mar 29 2017, 5:54 PM
Arlolra claimed this task.