List of steps to reproduce (step by step, including full links if applicable):
- When compiling PHP, build it against a version of PCRE that has the build-time option PCRE_CONFIG_NEWLINE set to -1 (any).
- Visit a page like talk:ą and note that it is in the wrong namespace. This is especially noticeable on special pages in Polish.
What happens?:
- Any page with a Unicode character that has the byte 0x85 in its UTF-8 representation is misidentified (it ends up in the wrong namespace). I believe this is because the PCRE_CONFIG_NEWLINE setting causes 0x85 to be treated as a line-ending character.
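To make it easier to tell which titles are affected, here is a small Python sketch that checks whether a title's UTF-8 encoding contains the byte 0x85. The helper name `has_byte_0x85` and the sample titles are made up for illustration. This only inspects bytes; it does not reproduce the PCRE behaviour itself.

```python
def has_byte_0x85(title: str) -> bool:
    """Return True if the UTF-8 encoding of the title contains the byte 0x85."""
    return 0x85 in title.encode("utf-8")


# "ą" is U+0105, which UTF-8 encodes as the two bytes C4 85.
print("ą".encode("utf-8").hex())  # c4 85, printed as "c485"

# Hypothetical sample titles, not taken from a real wiki.
for title in ["talk:ą", "talk:a", "talk:ę"]:
    status = "affected" if has_byte_0x85(title) else "not affected"
    print(f"{title!r}: {status}")
```

Under this check, talk:ą is flagged because the second byte of ą is 0x85, while a plain ASCII title is not.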
What should have happened instead?:
- We should treat 0x85 as a normal character.
- We should prevent installation if we detect this configuration.
Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:
Allegedly this happens with OpenBSD's PHP package.
See also https://www.mediawiki.org/wiki/Topic:Wrdkyaid33xitqz2