Page MenuHomePhabricator

Parsoid v2 API selser might not be working correctly
Closed, ResolvedPublic

Description

Request:
{
  "method": "post",
  "uri": "http://parsoid-lb.eqiad.wikimedia.org/v2/en.wikipedia.test.local/wt/User%3AGWicke%2F_restbase_test/646859921",
  "headers": {
    "content-type": "application/json"
  },
  "body": {
    "original": {
      "html": {
        "headers": {
          "content-type": "text/html;profile=mediawiki.org/specs/html/1.0.0"
        },
        "body": "<!DOCTYPE html>\n<html prefix=\"dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/\" about=\"http://en.wikipedia.org/wiki/Special:Redirect/revision/646859921\"><head prefix=\"mwr: http://en.wikipedia.org/wiki/Special:Redirect/\"><meta property=\"mw:articleNamespace\" content=\"2\"/><link rel=\"dc:replaces\" resource=\"mwr:revision/0\"/><meta property=\"dc:modified\" content=\"2015-02-12T22:30:30.000Z\"/><meta about=\"mwr:user/11429869\" property=\"dc:title\" content=\"GWicke\"/><link rel=\"dc:contributor\" resource=\"mwr:user/11429869\"/><meta property=\"mw:revisionSHA1\" content=\"6417e5e59b2975e65eebb5104ea572913a61db7e\"/><meta property=\"dc:description\" content=\"selser test page\"/><meta property=\"mw:parsoidVersion\" content=\"0\"/><link rel=\"dc:isVersionOf\" href=\"//en.wikipedia.org/wiki/User%3AGWicke/_restbase_test\"/><title>User:GWicke/_restbase_test</title><base href=\"//en.wikipedia.org/wiki/\"/><link rel=\"stylesheet\" href=\"//en.wikipedia.org/w/load.php?modules=mediawiki.legacy.commonPrint,shared|mediawiki.skinning.elements|mediawiki.skinning.content|mediawiki.skinning.interface|skins.vector.styles|site|mediawiki.skinning.content.parsoid&amp;only=styles&amp;skin=vector\"/></head><body id=\"mwAA\" lang=\"en\" class=\"mw-content-ltr sitedir-ltr ltr mw-body mw-body-content mediawiki\" dir=\"ltr\"><div id=\"bar\">Selser test</div></body></html>"
      },
      "data-parsoid": {
        "headers": {
          "content-type": "application/json;profile=mediawiki.org/specs/data-parsoid/0.0.1"
        },
        "body": {
          "counter": 0,
          "ids": {
            "mwAA": {
              "dsr": [
                0,
                23,
                0,
                0
              ]
            },
            "bar": {
              "stx": "html",
              "autoInsertedEnd": true,
              "dsr": [
                0,
                23,
                12,
                0
              ]
            }
          }
        }
      },
      "revid": "646859921"
    },
    "html": "<!DOCTYPE html>\n<html prefix=\"dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/\" about=\"http://en.wikipedia.org/wiki/Special:Redirect/revision/646859921\"><head prefix=\"mwr: http://en.wikipedia.org/wiki/Special:Redirect/\"><meta property=\"mw:articleNamespace\" content=\"2\"/><link rel=\"dc:replaces\" resource=\"mwr:revision/0\"/><meta property=\"dc:modified\" content=\"2015-02-12T22:30:30.000Z\"/><meta about=\"mwr:user/11429869\" property=\"dc:title\" content=\"GWicke\"/><link rel=\"dc:contributor\" resource=\"mwr:user/11429869\"/><meta property=\"mw:revisionSHA1\" content=\"6417e5e59b2975e65eebb5104ea572913a61db7e\"/><meta property=\"dc:description\" content=\"selser test page\"/><meta property=\"mw:parsoidVersion\" content=\"0\"/><link rel=\"dc:isVersionOf\" href=\"//en.wikipedia.org/wiki/User%3AGWicke/_restbase_test\"/><title>User:GWicke/_restbase_test</title><base href=\"//en.wikipedia.org/wiki/\"/><link rel=\"stylesheet\" href=\"//en.wikipedia.org/w/load.php?modules=mediawiki.legacy.commonPrint,shared|mediawiki.skinning.elements|mediawiki.skinning.content|mediawiki.skinning.interface|skins.vector.styles|site|mediawiki.skinning.content.parsoid&amp;only=styles&amp;skin=vector\"/></head><body id=\"mwAA\" lang=\"en\" class=\"mw-content-ltr sitedir-ltr ltr mw-body mw-body-content mediawiki\" dir=\"ltr\"><div id=\"bar\">Selser test</div></body></html>"
  }
}

Expected:
"<div id=bar>Selser test"

Result:
"<div id=\"bar\">Selser test</div>"

Parsoid clearly fell back to non-selective serialization. The original HTML & the modified HTML are identical strings.

Is there anything wrong in the request, or is this an issue in Parsoid? At first sight it seems to match the test. Although, that test looks like it doesn't actually test selser, as it is not passing an oldid.

Event Timeline

GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description. (Show Details)
GWicke added projects: Parsoid, RESTBase.
GWicke added subscribers: GWicke, Arlolra.
GWicke updated the task description. (Show Details)
GWicke set Security to None.
GWicke updated the task description. (Show Details)
GWicke removed a subscriber: Aklapper.

@GWicke: You aren't supplying the wikitext so presumably parsoid-lb.eqiad.wikimedia.org is making a TemplateRequest to en.wikipedia.test.local for the revision source and that's probably unreachable. Without the source, selser falls back to non-selective serialization.

Re: the test you mention. That's a good point. Some of those v2 API tests aren't actually enforcing anything. I was just using them to document the expected API. With the move to spec'ing the API out in Swagger, I should harden them up. I'll add this normalization test for selser.

Ahhhh, now I see that broken domain prefix too. My fault, sorry. Let me see if fixing that gets it working.

With https://github.com/gwicke/restbase/commit/481e19401a1cc10ebf335a168cb39f59ca439bee, it still does not seem to work:

{
  "uri": "http://parsoid-lb.eqiad.wikimedia.org/v2/en.wikipedia.org/wt/User%3AGWicke%2F_restbase_test/646859921",
  "headers": {
    "content-type": "application/json"
  },
  "body": {
    "original": {
      "html": {
        "headers": {
          "content-type": "text/html;profile=mediawiki.org/specs/html/1.0.0"
        },
        "body": "<!DOCTYPE html>\n<html prefix=\"dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/\" about=\"http://en.wikipedia.org/wiki/Special:Redirect/revision/646859921\"><head prefix=\"mwr: http://en.wikipedia.org/wiki/Special:Redirect/\"><meta property=\"mw:articleNamespace\" content=\"2\"/><link rel=\"dc:replaces\" resource=\"mwr:revision/0\"/><meta property=\"dc:modified\" content=\"2015-02-12T22:30:30.000Z\"/><meta about=\"mwr:user/11429869\" property=\"dc:title\" content=\"GWicke\"/><link rel=\"dc:contributor\" resource=\"mwr:user/11429869\"/><meta property=\"mw:revisionSHA1\" content=\"6417e5e59b2975e65eebb5104ea572913a61db7e\"/><meta property=\"dc:description\" content=\"selser test page\"/><meta property=\"mw:parsoidVersion\" content=\"0\"/><link rel=\"dc:isVersionOf\" href=\"//en.wikipedia.org/wiki/User%3AGWicke/_restbase_test\"/><title>User:GWicke/_restbase_test</title><base href=\"//en.wikipedia.org/wiki/\"/><link rel=\"stylesheet\" href=\"//en.wikipedia.org/w/load.php?modules=mediawiki.legacy.commonPrint,shared|mediawiki.skinning.elements|mediawiki.skinning.content|mediawiki.skinning.interface|skins.vector.styles|site|mediawiki.skinning.content.parsoid&amp;only=styles&amp;skin=vector\"/></head><body id=\"mwAA\" lang=\"en\" class=\"mw-content-ltr sitedir-ltr ltr mw-body mw-body-content mediawiki\" dir=\"ltr\"><div id=\"bar\">Selser test</div></body></html>"
      },
      "data-parsoid": {
        "headers": {
          "content-type": "application/json;profile=mediawiki.org/specs/data-parsoid/0.0.1"
        },
        "body": {
          "counter": 0,
          "ids": {
            "mwAA": {
              "dsr": [
                0,
                23,
                0,
                0
              ]
            },
            "bar": {
              "stx": "html",
              "autoInsertedEnd": true,
              "dsr": [
                0,
                23,
                12,
                0
              ]
            }
          }
        }
      },
      "revid": "646859921"
    },
    "html": "<!DOCTYPE html>\n<html prefix=\"dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/\" about=\"http://en.wikipedia.org/wiki/Special:Redirect/revision/646859921\"><head prefix=\"mwr: http://en.wikipedia.org/wiki/Special:Redirect/\"><meta property=\"mw:articleNamespace\" content=\"2\"/><link rel=\"dc:replaces\" resource=\"mwr:revision/0\"/><meta property=\"dc:modified\" content=\"2015-02-12T22:30:30.000Z\"/><meta about=\"mwr:user/11429869\" property=\"dc:title\" content=\"GWicke\"/><link rel=\"dc:contributor\" resource=\"mwr:user/11429869\"/><meta property=\"mw:revisionSHA1\" content=\"6417e5e59b2975e65eebb5104ea572913a61db7e\"/><meta property=\"dc:description\" content=\"selser test page\"/><meta property=\"mw:parsoidVersion\" content=\"0\"/><link rel=\"dc:isVersionOf\" href=\"//en.wikipedia.org/wiki/User%3AGWicke/_restbase_test\"/><title>User:GWicke/_restbase_test</title><base href=\"//en.wikipedia.org/wiki/\"/><link rel=\"stylesheet\" href=\"//en.wikipedia.org/w/load.php?modules=mediawiki.legacy.commonPrint,shared|mediawiki.skinning.elements|mediawiki.skinning.content|mediawiki.skinning.interface|skins.vector.styles|site|mediawiki.skinning.content.parsoid&amp;only=styles&amp;skin=vector\"/></head><body id=\"mwAA\" lang=\"en\" class=\"mw-content-ltr sitedir-ltr ltr mw-body mw-body-content mediawiki\" dir=\"ltr\"><div id=\"bar\">Selser test</div></body></html>"
  }
}

Expected:
"<div id=bar>Selser test"

Result:
"<div id=\"bar\">Selser test</div>"

Is there another issue with the request?

gerritbot subscribed.

Change 190410 had a related patch set uploaded (by Arlolra):
Fix selser in the v2 API

https://gerrit.wikimedia.org/r/190410

Patch-For-Review

GWicke: There was a bug in there (see the attached patch) but I would have expected the current result to be:

<div id=\"bar\">Selser test

because you've got { "stx": "html", "autoInsertedEnd": true } in there, which the non-selective serializer should respect.

Oh, wait, nevermind. I was looking at the original data-parsoid. Not the new one, which doesn't exist. In that case, selser will assume they're all new nodes (no data-parsoid on them) and just serialize as such.

I suspect you'e about to say that data-parsoid won't change from the original and should be applied to the new html as well. And that this test that accepts a pagebundle doesn't seem right:

https://github.com/wikimedia/parsoid/blob/master/tests/mocha/api.js#L293-L320

But how do you distinguish between when to apply the original to the new, and when the new has the data-parsoid inlined?

But how do you distinguish between when to apply the original to the new, and when the new has the data-parsoid inlined?

Can you expand on this a bit? Is your concern somebody adding ids to elements that then pull in data-parsoid that originally belonged to another node?

Can you expand on this a bit? Is your concern somebody adding ids to elements that then pull in data-parsoid that originally belonged to another node?

Ha. And I'm also concerned that someone requested a pagebundle in the first place in that scenario.

Moving on.

Should we require that data-parsoid always be part of original in the post then?

Should we require that data-parsoid always be part of original in the post then?

I would say that if there is an original in the request, then data-parsoid should be used from it. Do you see a use case for supporting inline data-parsoid beyond a transition period at all? If there isn't one, then it would basically boil down to what you say.

I'm not sure about long-term inline data-parsoid support.

My question was about this test,
https://github.com/wikimedia/parsoid/blob/master/tests/mocha/api.js#L293-L320

Should we allow a post like that? Or should data-parsoid always be child of original?

Should we allow a post like that? Or should data-parsoid always be child of original?

Ah, now I get it. I don't really see a use case for passing in a modified data-parsoid, except maybe parsoids playing ping-pong? IMHO we also shouldn't encourage clients that aren't storage services like parsoid to work directly with pagebundles. It just encourages people to mess with private information when they shouldn't.

Sounds good. I'll update the patch or follow up with another to fix this.

@Arlolra: Awesome! Thank you (as always) for your swift and thorough response.

Change 190410 merged by jenkins-bot:
Fix selser in the v2 API

https://gerrit.wikimedia.org/r/190410

Guess you're just waiting on the Monday deploy now :)

When testing against the latest patch (b2c16b3628) I'm now getting something that looks like it's still not using selser:

"<div id=\"bar\">Selser test"

Those quotes shouldn't be there if selser was used.

It turns out that I need to also set fetchWT to true in the config to enable wikitext fetching for selser. The default value is off, and the documentation states:

/**
 * @property {boolean} fetchWT
 * When transforming from html to wt, fetch the original wikitext before.
 * Intended for use in round-trip testing.
 */
 ParsoidConfig.prototype.fetchWT = false;

This flag is not part of the default config file.

It is also not set in our production config.

How is this going to work in production?

Change 190601 had a related patch set uploaded (by GWicke):
Fetch wikitext if an original DOM is supplied in selser mode

https://gerrit.wikimedia.org/r/190601

Patch-For-Review

Change 190632 had a related patch set uploaded (by GWicke):
Fetch wikitext if original dom is passed in using v2 api

https://gerrit.wikimedia.org/r/190632

Patch-For-Review

Change 190601 abandoned by GWicke:
Fetch wikitext if an original DOM is supplied in selser mode

Reason:
Better solution in https://gerrit.wikimedia.org/r/#/c/190632/

https://gerrit.wikimedia.org/r/190601

Re-opened to reflect that the patch is not yet merged & deployed. Will close once that's the case.

Change 190632 merged by jenkins-bot:
Fetch wikitext if original dom is passed in using v2 api

https://gerrit.wikimedia.org/r/190632

GWicke renamed this task from Parsoid v2 API selser might not be working correctly to Parsoid v2 API selser might not be working correctly: fix in master, to be deployed.Feb 14 2015, 8:34 PM
ssastry renamed this task from Parsoid v2 API selser might not be working correctly: fix in master, to be deployed to Parsoid v2 API selser might not be working correctly.Feb 16 2015, 9:20 PM
ssastry closed this task as Resolved.
ssastry removed a project: Patch-For-Review.

Change 191674 had a related patch set uploaded (by Arlolra):
Add a v2 api test for fetching wikitext when serializing

https://gerrit.wikimedia.org/r/191674

Patch-For-Review

Change 191674 merged by jenkins-bot:
Add a v2 api test for fetching wikitext when serializing

https://gerrit.wikimedia.org/r/191674