Page MenuHomePhabricator

Preparing MCS for Parsoid-PHP switch
Closed, ResolvedPublic0 Estimated Story Points

Description

This is a tracker task to figure out testing / qa needs and any work / changes needed to MCS/PCS to switchover from Parsoid/JS to Parsoid/PHP.

T229015: Tracking: Direct live production traffic at Parsoid/PHP is the tracker task for deployment. But, we don't anticipate a deploy before mid-September 2019 at this time. But, let us handle this as if we are going to be deploying around mid to end September.

Since MCS talks to RESTBase, there is nothing that will change on that end. However, T229025: Add ability to RESTBase to partition client traffic to Parsoid/PHP and T229018: RESTBase should be able to store Parsoid/PHP contents in Cassandra alongwith Parsoid/JS contents implies that MCS might have to handle cookies tracking which version of Parsoid generated its HTML unless we come with a mechanism for handling it in RESTBase / Parsoid that is transparent to clients. But, for now, it is probably safe to assume that MCS might need to handle a cookie related to this.

We will also do an early deployment to the beta cluster so that clients can do early testing there. But, please comment on the ticket / edit the description adding any other requirements to ensure we cover all our bases.

Event Timeline

@Mholloway @bearND Do we have specific questions or needs? As far as we discussed this will happen transparently through restbase and there will be no breaking changes so we should be good.

Assuming are no intended output changes, I can't think of anything for us to do until the partitioning scheme is in place and we have to add header/cookie handling. Our existing test suites should catch any significant inconsistencies with Parsoid/JS once we are able to start consuming the Parsoid/PHP rendering of production content.

Jhernandez claimed this task.

@ssastry It seems like we don't have anything specific to do for this. Please re-open if there is anything you will require us to tackle for the switch.

Now that Parsoid-PHP is available in beta, and you can query RESTBase for Parsoid-PHP content by passing a header ( see description of T229074 ), could you run your tests in Beta and confirm everything looks good?

ssastry triaged this task as High priority.Nov 4 2019, 4:12 AM

Hi Product Infra Team: We want to catch problems sooner than later, so checking in about updates on how testing is going here. Anything that you need from us? Or, is everything looking good on your end?

Hi @ssastry -

Two potential issues I noticed when running the tests with the header to request the php output:

  1. Previously scheme-less urls for videos now have a scheme. For example, on enwiki Hummingbird/810247947 the URLs for videos had their prefixes change from //upload.wikimedia.org/ to https://upload.wikimedia.org/
  2. The title in the href of the link tag with rel="dc:isVersionOf now uses spaces instead of underscores. For example, on enwiki User:BSitzmann_(WMF)/MCS/Test/Frankenstein/778666613 the href changed from //en.wikipedia.org/wiki/User%3ABSitzmann_(WMF)/MCS/Test/Frankenstein to //en.wikipedia.org/wiki/User%3ABSitzmann%20(WMF)/MCS/Test/Frankenstein

Hi @ssastry -

Two potential issues I noticed when running the tests with the header to request the php output:

Thanks!

  1. Previously scheme-less urls for videos now have a scheme. For example, on enwiki Hummingbird/810247947 the URLs for videos had their prefixes change from //upload.wikimedia.org/ to https://upload.wikimedia.org/

Yup, known. https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/550551 is the final part of the fix. It is right now going through testing and will be deployed today likely.

  1. The title in the href of the link tag with rel="dc:isVersionOf now uses spaces instead of underscores. For example, on enwiki User:BSitzmann_(WMF)/MCS/Test/Frankenstein/778666613 the href changed from //en.wikipedia.org/wiki/User%3ABSitzmann_(WMF)/MCS/Test/Frankenstein to //en.wikipedia.org/wiki/User%3ABSitzmann%20(WMF)/MCS/Test/Frankenstein

Both are valid, and we can see what caused the change here. But, is this a problem? Or, are you just flagging the change?

@ssastry 2. is a problem in that it breaks the links we generate for editing the page. We can work around it in mobileapps services if it's an expected change but ideally we'd rather the underlying output keep the canonical title

Got it. Looks like Parsoid/PHP normalizes titles whereas Parsoid/JS emits it as given.

[subbu@earth:~/work/wmf/parsoid] echo "abc" | parse.js --pageName "A B"
..
<link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/A%20B"/>
..
[subbu@earth:~/work/wmf/parsoid] echo "abc" | parse.js --pageName "A_B" 
..
<link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/A_B"/>
..
[subbu@earth:~/work/wmf/parsoid] echo "abc" | php bin/parse.php --pageName "A_B"
..
<link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/A%20B"/>
..
[subbu@earth:~/work/wmf/parsoid] echo "abc" | php bin/parse.php --pageName "A B"
..
<link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/A%20B"/>
...

So, I suppose RESTBase normalizes spaces to _ before posting to Parsoid. Otherwise, you would have to deal with the space issue already. On scandium.eqiad.wmnet, I could post requests to Parsoid/JS directly and could verify that if the title has spaces instead of _, I get back <link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/User%3ABSitzmann%20(WMF)/MCS/Test/Frankenstein"/>

Will look into this.

I think Parsoid/PHP is better code here since it handles titles consistently but let me check in with others to see what ought to be canonical title output here.

Change 550722 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Use underscores instead of spaces in dc:isVersionOf link tag

https://gerrit.wikimedia.org/r/550722

Change 550722 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Use underscores instead of spaces in dc:isVersionOf link tag

https://gerrit.wikimedia.org/r/550722

Hi @ssastry -

Two potential issues I noticed when running the tests with the header to request the php output:

  1. Previously scheme-less urls for videos now have a scheme. For example, on enwiki Hummingbird/810247947 the URLs for videos had their prefixes change from //upload.wikimedia.org/ to https://upload.wikimedia.org/
  2. The title in the href of the link tag with rel="dc:isVersionOf now uses spaces instead of underscores. For example, on enwiki User:BSitzmann_(WMF)/MCS/Test/Frankenstein/778666613 the href changed from //en.wikipedia.org/wiki/User%3ABSitzmann_(WMF)/MCS/Test/Frankenstein to //en.wikipedia.org/wiki/User%3ABSitzmann%20(WMF)/MCS/Test/Frankenstein

Both these issues have been fixed and deployed to both beta cluster and production. Let me know if you find anything else in testing.

test.wikipedia.org and test2.wikipedia.org have been switched to serve Parsoid/PHP HTML only. Please test and report any issues you spot.

ssastry claimed this task.

Per @JoeWalsh comment on a call y'day, MCS are happy for us to roll out. @JoeWalsh @Jhernandez please reopen task if you do any other testing and find issues.