I looked at all of these as best I could, and I decided on a general mapping to standard Arabic forms internally. Arabic does that to some degree, as does Persian! And for the languages without custom stemmers and stop word filters, the character used internally doesn't matter, as long as the desired words can find each other.
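As a rough illustration of why the internal character choice doesn't matter as long as both sides agree, here is a minimal sketch of folding variants to a single form at both index and query time. The two character pairs and the names `VARIANT_TO_STANDARD`, `fold_variants`, and `matches` are assumptions picked for the example, not the actual mapping described above.

```python
# Illustrative only: these pairs are assumptions, not the real mapping.
VARIANT_TO_STANDARD = {
    "\u06CC": "\u064A",  # ARABIC LETTER FARSI YEH -> ARABIC LETTER YEH
    "\u06A9": "\u0643",  # ARABIC LETTER KEHEH -> ARABIC LETTER KAF
}
_TABLE = str.maketrans(VARIANT_TO_STANDARD)


def fold_variants(text: str) -> str:
    """Replace each variant character with its chosen internal form."""
    return text.translate(_TABLE)


def matches(indexed: str, query: str) -> bool:
    """Two strings 'find each other' if they fold to the same internal form."""
    return fold_variants(indexed) == fold_variants(query)
```

Because the same folding is applied to indexed text and to queries, a query typed with either variant matches text written with the other.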
Tue, Apr 30
Mon, Apr 29
Wed, Apr 24
Apr 19 2024
In reading up on the ligatures, I found another ligature (yod-yod-patah ײַ) that has several variants, one using a ligature from above (double-yod + patah ײַ), one with separate characters (yod + yod + patah ייַ), and a less common variant with the patah in the middle (yod + patah + yod יַי). It looks like icu_normalizer already converts the single-character form (ײַ) to one using the double-yod ligature (ײַ).
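The variants can be told apart by code point. The sketch below uses Python's `unicodedata` with NFKC as a stand-in for the normalizer (an assumption: it is not the ICU plugin itself, and the constant names are made up for the example).

```python
import unicodedata

# The variants mentioned above, spelled out by code point.
SINGLE_CHAR = "\uFB1F"              # HEBREW LIGATURE YIDDISH YOD YOD PATAH
DOUBLE_YOD_PATAH = "\u05F2\u05B7"   # YIDDISH DOUBLE YOD ligature + PATAH
SEPARATE = "\u05D9\u05D9\u05B7"     # YOD + YOD + PATAH
PATAH_MIDDLE = "\u05D9\u05B7\u05D9"  # YOD + PATAH + YOD


def nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization (stand-in for the analysis chain)."""
    return unicodedata.normalize("NFKC", text)


def code_points(text: str) -> list[str]:
    """Show a string as U+XXXX code points, since the glyphs look alike."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The single-character form decomposes to double-yod ligature + patah.
print(code_points(nfkc(SINGLE_CHAR)))
# The other two spellings are not unified with it by normalization alone.
print(code_points(nfkc(SEPARATE)), code_points(nfkc(PATAH_MIDDLE)))
```

Normalization only covers the single-character form; the separate-character and patah-in-the-middle spellings would still need an explicit mapping to match the others.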
Apr 18 2024
Apr 17 2024
While we had planned to expand the deployment of the hiragana-to-katakana mapping from English to most other languages (though not Japanese), testing revealed that doing the mapping pre-tokenization interfered with the new ICU tokenizer's ability to parse Japanese text (on non-Japanese wikis).
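For reference, a hiragana-to-katakana mapping can be sketched as a fixed code point offset, since the two Unicode blocks are laid out in parallel. This is a minimal illustration with a hypothetical function name, not the production character filter.

```python
# Hiragana U+3041..U+3096 sits exactly 0x60 below the matching katakana
# U+30A1..U+30F6, so the mapping is a constant offset.
HIRAGANA_START = 0x3041
HIRAGANA_END = 0x3096
KATAKANA_OFFSET = 0x60


def hiragana_to_katakana(text: str) -> str:
    """Map each hiragana character to katakana; leave everything else alone."""
    return "".join(
        chr(ord(ch) + KATAKANA_OFFSET)
        if HIRAGANA_START <= ord(ch) <= HIRAGANA_END
        else ch
        for ch in text
    )
```

Applying a mapping like this before tokenization changes the script of the text the tokenizer sees, which is the kind of interference described above.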
Apr 16 2024
Apr 15 2024
In T362442#9714689, @Xaosflux wrote: #REDIRECT [[Page#Anchor]] - going to it should still lead you to the Anchor, not to #top of that page - as is being done here.
Showing the canonical page title in the suggestion was a design decision that was made for Vector 2022, though it can be confusing when the full title isn't obviously related to the redirect title. T303013 has some potential heuristics for deciding when to show the redirect info. Feel free to chime in over there if you don't feel like your use case would be covered (the specific artificial example here would be covered).
Is this substantially different from T331389? ("...<nowiki> output in search result descriptions")
Apr 11 2024
Apr 10 2024
In T358495#9705136, @NMW03 wrote: Not sure if this task fixes that, lowercasing I and dotted I (İ) returns different lowercase letters
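The behavior in the quote can be reproduced with default, locale-independent Unicode lowercasing. The sketch below uses Python's `str.lower()` as an example of that default; it is an assumption that the search analysis chain behaves the same way, and the helper name is made up.

```python
def code_points(text: str) -> list[str]:
    """Show a string as U+XXXX code points."""
    return [f"U+{ord(ch):04X}" for ch in text]


plain_lower = "I".lower()        # LATIN CAPITAL LETTER I
dotted_lower = "\u0130".lower()  # LATIN CAPITAL LETTER I WITH DOT ABOVE

print(code_points(plain_lower))   # ['U+0069']
print(code_points(dotted_lower))  # ['U+0069', 'U+0307']
```

With default lowercasing, İ becomes i followed by a combining dot above (U+0307), so the two results are different strings unless something later strips the combining dot.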
Apr 9 2024
Apr 5 2024
Apr 3 2024
Not getting automated tags for some reason, but this is included in 1.42.0-wmf.25, so it will be deployed soon.
Mar 29 2024
Mar 28 2024
Full write-up (and it's a lot!) is on MediaWiki.
Mar 21 2024
Mar 19 2024
The full write-up is on MediaWiki.
This is done for commons and wikidata for the production clusters (eqiad and codfw) as a result of T342444. (wikidata hasn't reindexed in cloudelastic yet, but it is in the queue.)
Mar 18 2024
Swapping Assignee and Other Assignee with Erik, since he's still working on reindexing cloudelastic via the new update pipeline backfill mechanism.
Mar 6 2024
Adding Erik as "other assignee" (never done that before) and increasing the points because Erik is doing more than usual for reindexing (watching the cloudelastic reindex using the new update pipeline backfilling mechanism), and I've been doing more than usual gathering stats while fretting a bit over reindex speed.
Mar 4 2024
@RKemper, it does look like it's deployed everywhere it should be!
Feb 27 2024
Feb 26 2024
Moving this back to the backlog in favor of a smaller next harmonization project.
Feb 21 2024
Feb 20 2024
Full write-up on MediaWiki.
Feb 13 2024
Feb 6 2024
T332337 has been committed, so this is ready to go.
Feb 5 2024
Jan 26 2024
More detailed writeup (which partially overlaps the plugin docs) on MediaWiki.
Jan 24 2024
Gerrit patch for the plugin (which wasn't added here automatically): https://gerrit.wikimedia.org/r/c/search/extra/+/972478
Dec 5 2023
This happens across all wikis, not just cswiki.