Page MenuHomePhabricator

"Properly" address missing srcText issues in PageConfigFrame
Open, Needs TriagePublic

Description

Currently Parsoid/PHP crashes if there's not a valid srcText specified in the top-level Frame (which is PageConfigFrame). This is used in both serialize and parse (see T234548 -- technically we use SelserData instead of Frame right now).

We sort of kludge around this by using the empty string '' as a srcText when we create the PageConfig/Env in a number of paths, but this masks cases where this could be an actual bug (ie, where srcText is required but we're not providing it).

One solution might be to add a boolean parameter to the PageConfigFactory (or something like that) which would come all the way from the API and indicate when we expect this to be a new page (in which case we won't have a srcText) vs where we expect there to be a previous revision and we should error out if we can't fetch the text of that.

Details

Related Gerrit Patches:

Event Timeline

cscott created this task.Oct 3 2019, 4:46 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 3 2019, 4:46 PM

Change 539428 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Ensure we have page content for the Api configs

https://gerrit.wikimedia.org/r/539428

Change 539428 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Ensure we have page content for the Api configs

https://gerrit.wikimedia.org/r/539428

Change 547013 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Return a clean 404 error if a non-existant title is requested via REST

https://gerrit.wikimedia.org/r/547013

Change 547013 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Return a clean 404 error if a non-existant title is requested via REST

https://gerrit.wikimedia.org/r/547013

Change 550863 had a related patch set uploaded (by C. Scott Ananian; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Add handling for new / missing pages in html2wt direction

https://gerrit.wikimedia.org/r/550863

Change 550863 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add handling for new / missing pages in html2wt direction

https://gerrit.wikimedia.org/r/550863

I'd like to keep the clarity of SelserData, that accessing the page frame while serializing should throw, unless we're doing selser.

I'd like to keep the clarity of SelserData, that accessing the page frame while serializing should throw, unless we're doing selser.

You and Scott disagree on that. I think Scott is advocating for all page source access to go through the page frame and that that is the way to properly handle VE-edited extensions with HTML updates, nested documents, etc. So, I think that should be resolved before additional changes in this code.

I'm not saying we shouldn't use the page frame (I think I've missed the boat on that), I'm saying that when we do, we add some guardrails around it.