We should enable the wmf_capped document size limiter profile so that the documents we index remain within a reasonable size.
A per-wiki approach does not seem necessary, so adding this config option to CirrusSearch-common.php seems appropriate.
The config variable is CirrusSearchDocumentSizeLimiterProfile, and it should be set to wmf_capped.
For testing: once the config is deployed, the request https://test.wikipedia.org/w/api.php?action=query&format=json&prop=cirrusbuilddoc&titles=Template%3ALong&formatversion=2&cbbuilders=content (which uses the default limiter profile) should work and produce the same output as https://test.wikipedia.org/w/api.php?action=query&format=json&prop=cirrusbuilddoc&titles=Template%3ALong&formatversion=2&cbbuilders=content&cblimiterprofile=wmf_capped (which explicitly requests the wmf_capped profile).
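A minimal Python sketch of this check, assuming the two cirrusbuilddoc requests above are all that is needed: it rebuilds both URLs and compares the parsed JSON responses. The function names (build_doc_url, fetch_json, outputs_match) and the User-Agent string are hypothetical, and the comparison assumes the responses contain no fields that legitimately differ between two requests.

```python
import json
import urllib.request
from urllib.parse import urlencode

API = "https://test.wikipedia.org/w/api.php"


def build_doc_url(title, limiter_profile=None):
    """Build the cirrusbuilddoc query used for testing (hypothetical helper).

    When limiter_profile is None, no cblimiterprofile parameter is sent,
    so the wiki's configured default profile applies.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "cirrusbuilddoc",
        "titles": title,
        "formatversion": "2",
        "cbbuilders": "content",
    }
    if limiter_profile is not None:
        params["cblimiterprofile"] = limiter_profile
    return API + "?" + urlencode(params)


def fetch_json(url):
    # Wikimedia APIs expect a descriptive User-Agent; this one is a placeholder.
    req = urllib.request.Request(
        url, headers={"User-Agent": "cirrus-limiter-check/0.1 (placeholder)"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def outputs_match(default_doc, capped_doc):
    # Compare parsed JSON so key order and whitespace do not matter.
    return default_doc == capped_doc


# Usage (makes two live requests to test.wikipedia.org):
#   default = fetch_json(build_doc_url("Template:Long"))
#   capped = fetch_json(build_doc_url("Template:Long", "wmf_capped"))
#   print(outputs_match(default, capped))
```

Before the config change, outputs_match would be expected to return False for a page long enough to be capped; afterwards it should return True.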
Once this profile is enabled, we may be able to drop the wgCirrusSearchMaxFileTextLength entry from wmf-config/InitialiseSettings.php, since the limiter should take care of it.
AC:
- CirrusSearch documents should remain under 4 MB when serialized to JSON
- This 4 MB cap should be visible in https://grafana-rw.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?orgId=1&viewPanel=58 after the config change is deployed
- wgCirrusSearchMaxFileTextLength is removed from wmf-config/InitialiseSettings.php
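A rough Python sketch for spot-checking the first criterion on a document fetched with the cirrusbuilddoc API. It assumes "4 MB" means 4 MiB and measures compact UTF-8 JSON; CirrusSearch may measure size differently (PHP's json_encode escapes non-ASCII characters by default, which produces more bytes), so treat the result as an approximation. SIZE_LIMIT_BYTES, serialized_size and within_limit are hypothetical names.

```python
import json

# Assumption: the "4 MB" acceptance criterion means 4 MiB of serialized JSON.
SIZE_LIMIT_BYTES = 4 * 1024 * 1024


def serialized_size(doc):
    """Bytes needed to serialize doc as compact UTF-8 JSON.

    This approximates, and does not reproduce, CirrusSearch's own measurement.
    """
    return len(
        json.dumps(doc, ensure_ascii=False, separators=(",", ":")).encode("utf-8")
    )


def within_limit(doc, limit=SIZE_LIMIT_BYTES):
    # "Remain under" the cap, so the comparison is strict.
    return serialized_size(doc) < limit
```

Running within_limit over a sample of built documents before and after deployment would give a quick sanity check alongside the Grafana panel.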