Page MenuHomePhabricator

Parsoid transform - TypeError
Closed, ResolvedPublic

Description

I've just noticed this error in the RESTBase logs:

TypeError: Cannot call method 'hasOwnProperty' of undefined
    at applyToSiblings (/srv/deployment/parsoid/deploy/src/lib/mediawiki.DOMUtils.js:2028:18)
    at Object.DU.applyDataParsoid (/srv/deployment/parsoid/deploy/src/lib/mediawiki.DOMUtils.js:2038:3)
    at html2wt (/srv/deployment/parsoid/deploy/src/api/routes.js:258:7)
    at routes.v2Post (/srv/deployment/parsoid/deploy/src/api/routes.js:880:4)
    at callbacks (/srv/deployment/parsoid/deploy/node_modules/express/lib/router/index.js:272:11)
    at /srv/deployment/parsoid/deploy/src/api/routes.js:562:4
    at /srv/deployment/parsoid/deploy/node_modules/core-js/modules/es6.promise.js:67:39
    at /srv/deployment/parsoid/deploy/node_modules/core-js/modules/es6.promise.js:77:6
    at process._tickCallback (node.js:415:13)

This was thrown for the /fr.wikipedia.org/v1/transform/html/to/wikitext/Kei_Koito/115908268 RESTBase endpoint.

Full log entry - P771

Event Timeline

mobrovac raised the priority of this task from to Needs Triage.
mobrovac updated the task description. (Show Details)
mobrovac added a project: Parsoid.
mobrovac subscribed.

Seems to imply that data-parsoid was passed in but it didn't contain an ids property.
https://github.com/wikimedia/parsoid/blob/master/lib/mediawiki.DOMUtils.js#L2028

We can protect against that but the history looks like there was an edit war,
https://fr.wikipedia.org/w/index.php?title=Kei_Koito&action=history

Might be worth simulating edits in rapid succession to the same page with restbase and see how it holds up?

Arlolra set Security to None.

On our side, just checking that ids is set might be masking the problem. Would it be better to return 400 Bad Request?

I'm pretty sure that the underlying issue is insufficient error checking in RESTBase. The cassandra backend does not return a 404 on an empty result set. This is a problem if data-parsoid is not found, as the request should return an error instead of happily continuing to send an empty data-parsoid object to Parsoid.

The other bit of error checking that we could improve is on write. Currently we write both data-parsoid and html in parallel. If we instead stored data-parsoid first, and only stored html if the data-parsoid save went through, then we would avoid the situation where only html is successfully stored.

GWicke triaged this task as High priority.EditedJun 11 2015, 10:01 PM

@Arlolra, I think it would definitely be good to return a 400 if a passed-in data-parsoid object ends up being empty. That would avoid corrupting content.

I wonder if T98887 is because of missing data-parsoid causing Parsoid to effectively mark the whole page dirty (right now, we consider data-parsoid content during diffs).

Change 217730 had a related patch set uploaded (by Arlolra):
Return 400 if the passed in data-parsoid is empty

https://gerrit.wikimedia.org/r/217730

Change 217730 merged by jenkins-bot:
Return 400 if the passed in data-parsoid is empty

https://gerrit.wikimedia.org/r/217730

Assigning to Petr, he'll finish this up on the RESTBase end.

@Arlolra @ssastry has the Parsoid patch been deployed yet? The logs still show (from today):

TypeError: Cannot call method 'hasOwnProperty' of undefined
    at applyToSiblings (/srv/deployment/parsoid/deploy/src/lib/mediawiki.DOMUtils.js:1989:18)
    at Object.DU.applyDataParsoid (/srv/deployment/parsoid/deploy/src/lib/mediawiki.DOMUtils.js:1999:3)
    at html2wt (/srv/deployment/parsoid/deploy/src/api/routes.js:146:7)
    at routes.v2Post (/srv/deployment/parsoid/deploy/src/api/routes.js:780:4)
    at callbacks (/srv/deployment/parsoid/deploy/node_modules/express/lib/router/index.js:272:11)
    at /srv/deployment/parsoid/deploy/src/api/routes.js:450:4
    at /srv/deployment/parsoid/deploy/node_modules/core-js/modules/es6.promise.js:67:39
    at /srv/deployment/parsoid/deploy/node_modules/core-js/modules/es6.promise.js:77:6
    at process._tickCallback (node.js:415:13)

@Pchelolo's PR for checking the data-parsoid return is here .

@Arlolra has the Parsoid patch been deployed yet?

Yes, but it's proved insufficient. @GWicke and I discussed possible reasons why that might be on Friday but remains inconclusive. In any case, I've started another patch.

https://gerrit.wikimedia.org/r/#/c/219284/

Yes, but it's proved insufficient. @GWicke and I discussed possible reasons why that might be on Friday but remains inconclusive. In any case, I've started another patch.

It turns out the likely cause for this is some data corruption caused by RESTBase's revision policy retention mechanism, where data-parsoid was wrongfully double-encoded, so that we were storing JSON.stringify(JSON.stringify(data)), thus providing Parsoid with a String, not a JSON, cf. this RESTBase PR.

Change 219284 had a related patch set uploaded (by Arlolra):
T102117: Improve validating dp in the api

https://gerrit.wikimedia.org/r/219284

Change 219284 merged by jenkins-bot:
T102117: Improve validating dp in the api

https://gerrit.wikimedia.org/r/219284

This has been fixed on both ends, so resolving.