There are two implementations of the preprocessor in the MediaWiki core codebase, `Preprocessor_DOM` and `Preprocessor_Hash`. We should deprecate one of them. Why?
* With the migration to Parsoid/PHP, we plan to first hook Parsoid into the preprocessor and later replace the legacy preprocessor entirely. Maintaining two copies of the preprocessor needlessly duplicates work, and risks subtle bugs, in code we are ultimately going to remove anyway.
* Our deprecation strategy calls for deprecating code before removing it. The Parsoid/PHP transition will be involved, and it may not give downstream users adequate notice before certain features of one preprocessor implementation can no longer be supported (see the comment on PS2 of https://gerrit.wikimedia.org/r/418198 for an example). Deprecating one implementation early in 1.33 is kinder to our downstreams and lets us find unnecessary uses of a specific preprocessor class (like https://gerrit.wikimedia.org/r/460200) before they become a problem for the Parsoid port.
So, if we deprecate one, which one should it be?
* The original reason for splitting the preprocessor seems to have been to avoid a dependency on PHP's standard `dom` extension. But present-day MediaWiki already depends on the `dom` extension elsewhere, for example in Remex-based tidy, the localisation cache, and SiteImporter. The `dom` extension ships with PHP and is enabled by default.
* A secondary reason was that the hash-based implementation performed better in early experiments with HipHop (and the WMF configuration contains a vague reference to [iffy memory allocation](https://github.com/wikimedia/operations-mediawiki-config/blob/60f87b7c5a1b06d38917274b4b09ea5b9dfb5534/wmf-config/InitialiseSettings.php#L12865)). But [HHVM will shortly end support for PHP](https://hhvm.com/blog/2018/09/12/end-of-php-support-future-of-hack.html), and MediaWiki is dropping HHVM support (T192166).
Since the point of this exercise is to facilitate a future Parsoid port, we recommend keeping the `Preprocessor_DOM` implementation. It should fit better with Parsoid, which is DOM-based, and is apparently faster on PHP 7, which we are moving to (T176370).
Note that we aren't committed to removing code just because we've deprecated it, although Parsoid/PHP will eventually replace the preprocessor entirely with a unified tokenizer.