Update ParsoidExtensionAPI to be a coherent and functional extension API to aid extension implementations
Open, HighPublic

Description

We should use Parsoid's native implementations of several extensions (Cite, Pre, Nowiki, Gallery, Poem) to identify a narrow but coherent and consistent extension API that is sufficient for extensions to do their job painlessly. This will require deciding which of Parsoid's internal concepts are first-class language / MediaWiki concepts and which are Parsoid-specific implementation concepts, and using that distinction to design an interface that can be published as a first-draft extension API.

But perhaps the first step is to audit which of Parsoid's implementation details are currently exposed to the natively implemented extensions.

Event Timeline

ssastry created this task. Jan 14 2020, 12:56 PM
Restricted Application added subscribers: jeblad, Aklapper. Jan 14 2020, 12:56 PM
[subbu@earth:~/work/wmf/parsoid/src/Ext] git grep -h '^use' | grep Parsoid | sort | uniq -c
      9 use Parsoid\Config\Env;
      7 use Parsoid\Config\ParsoidExtensionAPI;
      7 use Parsoid\Ext\Extension;
      8 use Parsoid\Ext\ExtensionTag;
      6 use Parsoid\Html2Wt\SerializerState;
      3 use Parsoid\Tokens\DomSourceRange;
      2 use Parsoid\Tokens\KV;
      2 use Parsoid\Tokens\SourceRange;
      5 use Parsoid\Utils\ContentUtils;
      8 use Parsoid\Utils\DOMCompat;
      8 use Parsoid\Utils\DOMDataUtils;
      8 use Parsoid\Utils\DOMUtils;
      4 use Parsoid\Utils\PHPUtils;
      2 use Parsoid\Utils\Title;
      2 use Parsoid\Utils\TokenUtils;
      4 use Parsoid\Utils\Util;
      3 use Parsoid\Utils\WTUtils;
      2 use Parsoid\Wt2Html\TT\Sanitizer;
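
The tally above can also be reproduced without a shell pipeline, which is handy if the audit needs to be rerun as part of a script. The following is a minimal sketch in Python (Parsoid itself is PHP; Python is used here only for brevity). The function name `count_parsoid_imports` and the sample input are illustrative assumptions, and the regular expression only approximates the `grep` filters used above.

```python
import re
from collections import Counter

# Matches a PHP import line such as "use Parsoid\Utils\DOMUtils;"
USE_LINE = re.compile(r"^use\s+(Parsoid\\[A-Za-z0-9_\\]+);")


def count_parsoid_imports(sources):
    """Tally `use Parsoid\\...;` lines across PHP file contents.

    `sources` is an iterable of strings, one per file.
    Returns a Counter mapping the imported class name to its count.
    """
    counts = Counter()
    for text in sources:
        for line in text.splitlines():
            match = USE_LINE.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts


# Hypothetical sample input standing in for files under src/Ext.
sample = [
    "<?php\nuse Parsoid\\Utils\\DOMUtils;\nuse Parsoid\\Config\\Env;\n",
    "<?php\nuse Parsoid\\Utils\\DOMUtils;\nuse DOMElement;\n",
]
for name, n in sorted(count_parsoid_imports(sample).items()):
    print(f"{n:7d} use {name};")
```

Classes with the highest counts are the ones most widely depended on by extension code, so they are natural candidates to review first.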

Change 565563 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: Start untangling Parsoid internals from extensions

https://gerrit.wikimedia.org/r/565563

Change 566417 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] No need to explicitly pass 'inTemplate' flag from extension code

https://gerrit.wikimedia.org/r/566417

Change 566417 merged by jenkins-bot:
[mediawiki/services/parsoid@master] No need to explicitly pass 'inTemplate' flag from extension code

https://gerrit.wikimedia.org/r/566417

The Gerrit patches above have made progress on this by cleaning up the extension code in the Parsoid repo to reduce its exposure to Parsoid internals. More work is needed, though; see the commit message of Gerrit change 565563.

LGoto assigned this task to ssastry. Tue, Jan 28, 10:00 PM
LGoto moved this task from To Do to Doing on the Parsing-critical-path board.
LGoto removed a subscriber: ssastry.

Change 569702 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Use extension config option for html2wt formatting of extension tags

https://gerrit.wikimedia.org/r/569702

Change 565563 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Start untangling Parsoid internals from extensions

https://gerrit.wikimedia.org/r/565563

Change 570919 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Cite: Remove more Parsoid internals knowledge

https://gerrit.wikimedia.org/r/570919

One thing I am finding is that Cite (and possibly the other extensions in the Parsoid codebase) knows about DOM state and about how Parsoid optimizes handling of data-* attributes by keeping them off to the side in a bag. We can resolve this in one of two ways:

  1. Extensions have no knowledge of the DOM state and needn't putz around with visitAndLoad / visitAndStore gymnastics for perf optimization; the API handles that transparently as far as possible. In this mode, we document that extensions shouldn't make any assumptions about how or where the data-* attributes of DOM nodes are stored, since the implementation details might change. We could require extensions to go through provided API helpers to read and write data-* attributes.
  2. We document the DOM states (raw DOM, post-processed DOM), make them first-class Parsoid DOM concepts, and require extensions to be aware of what state the DOM is in and to use the appropriate helper methods.

Not yet sure what is desirable.
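
To make the trade-off between the two options concrete, here is a minimal sketch in Python (Parsoid itself is PHP; Python is used only for brevity). Every name in it (`Node`, `DataBag`, `ExtensionDataAccess`, `DomState`, `get_data_stateful`) is hypothetical and does not correspond to an actual Parsoid class. The two states `SERIALIZED` and `LOADED` are simplified stand-ins for the DOM states mentioned above, not a description of how Parsoid defines them.

```python
import json
from enum import Enum


class Node:
    """Toy DOM element: a tag name plus string attributes."""
    def __init__(self, tag):
        self.tag = tag
        self.attrs = {}


class DataBag:
    """Side storage for parsed data-* values (hypothetical)."""
    def __init__(self):
        self._store = {}

    def load(self, node, name):
        # Move the serialized attribute off the node and into the bag.
        raw = node.attrs.pop(name, None)
        self._store[(id(node), name)] = json.loads(raw) if raw else {}

    def store(self, node, name):
        # Serialize the bagged value back onto the node.
        value = self._store.pop((id(node), name))
        node.attrs[name] = json.dumps(value)

    def has(self, node, name):
        return (id(node), name) in self._store

    def get(self, node, name):
        return self._store[(id(node), name)]


class ExtensionDataAccess:
    """Option 1: extensions go through this helper and never need to
    know whether a value currently lives in the bag or on the node."""
    def __init__(self, bag):
        self._bag = bag

    def get(self, node, name):
        if not self._bag.has(node, name):
            self._bag.load(node, name)  # loaded lazily, transparently
        return self._bag.get(node, name)


class DomState(Enum):
    SERIALIZED = "serialized"  # data-* values are string attributes
    LOADED = "loaded"          # data-* values live in the bag


def get_data_stateful(bag, node, name, state):
    """Option 2: the caller must know the DOM state and pick a path."""
    if state is DomState.SERIALIZED:
        return json.loads(node.attrs.get(name, "{}"))
    return bag.get(node, name)
```

With option 1 the extension calls `ExtensionDataAccess.get(node, "data-mw")` and is done; with option 2 every call site has to carry a `DomState` around and passing the wrong one is a bug the extension author has to avoid.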

ssastry added a comment. Edited Fri, Feb 7, 4:56 PM

Here is the list of helper utilities used across src/Ext/*. Some of them are just convenience helpers, but the visitAndLoadDataAttribs / visitAndStoreDataAttribs helpers as well as the ContentUtils helpers expose Parsoid implementation knowledge and need to be handled as per the previous comment.

ContentUtils::ppToDOM
ContentUtils::ppToXML
ContentUtils::shiftDSR
ContentUtils::toXML
DOMUtils::assertElt
DOMUtils::hasNChildren
DOMUtils::isBody
DOMUtils::isDiffMarker
DOMUtils::isElt
DOMUtils::isText
DOMUtils::matchTypeOf
DOMUtils::migrateChildrenBetweenDocs
DOMUtils::migrateChildren
DOMUtils::selectMediaElt
DOMDataUtils::addAttributes
DOMDataUtils::addTypeOf
DOMDataUtils::getDataMw
DOMDataUtils::getDataParsoid
DOMDataUtils::setDataParsoid
DOMDataUtils::setDataMw
DOMDataUtils::visitAndStoreDataAttribs
DOMDataUtils::visitAndLoadDataAttribs
TokenUtils::kvToHash
Util::clone
Util::decodeWtEntities
Util::entityEncodeAll
Util::parseMediaDimensions
Util::validateMediaParam
WTUtils::escapeNowikiTags
WTUtils::fromExtensionContent
WTUtils::isSealedFragmentOfType
WTUtils::isNewElt
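
As a rough aid for working through the list, the split described above can be expressed as a small filter. This is a sketch in Python with a hypothetical function name, `exposes_internals`, and it flags only the helpers the comment explicitly singles out (all `ContentUtils` helpers plus the two `visitAnd...DataAttribs` helpers). It makes no claim that the remaining helpers are safe to expose.

```python
def exposes_internals(helper):
    """True if `helper` ("Class::method") is one of those singled out
    above: any ContentUtils helper, or visitAndLoad/StoreDataAttribs."""
    cls, _, method = helper.partition("::")
    return cls == "ContentUtils" or method in (
        "visitAndLoadDataAttribs",
        "visitAndStoreDataAttribs",
    )


# Excerpt of the list above, used as sample input.
helpers = [
    "ContentUtils::ppToDOM",
    "DOMUtils::isElt",
    "DOMDataUtils::getDataMw",
    "DOMDataUtils::visitAndStoreDataAttribs",
    "DOMDataUtils::visitAndLoadDataAttribs",
    "WTUtils::isNewElt",
]
flagged = [h for h in helpers if exposes_internals(h)]
```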

Change 569702 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Use extension config option for html2wt formatting of extension tags

https://gerrit.wikimedia.org/r/569702

ssastry triaged this task as High priority. Mon, Feb 10, 12:55 AM

Change 570919 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Cite: Remove more Parsoid internals knowledge

https://gerrit.wikimedia.org/r/570919

Change 573766 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Remove direct access to Sanitizer from extension code

https://gerrit.wikimedia.org/r/573766

Change 573766 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Remove direct access to Sanitizer from extension code

https://gerrit.wikimedia.org/r/573766