As far as CirrusSearch is concerned, a template and its redirects are the same page. I will have to find some time next week to look closer into this particular case, but in general there is no useful distinction in CirrusSearch between redirects and the pages they redirect to.
Well, the use case is that I want to edit those pages which transclude the redirect, in order to modify the template name and old parameters, rather than pages transcluding the generic template name, where new parameters are already used.
That works fine if the generic template name and the redirect name differ significantly.
I took a closer look at this and, indeed, the mapping lowercases all queries to the template field (and we only have a single analysis chain applied). We can probably simply change the analysis chain there; that will require a quick review that template boosting uses appropriately cased template names across the wikis that have configured it.
I'm not sure if there might be more knock-on effects though...
Change 512198 abandoned by EBernhardson:
Templates in search should be case sensitive
Discussed with Stas; we think a better way forward is to have the field indexed both ways (case-sensitive and case-insensitive). Unfortunately we are having some disk space problems, and adding new fields will have to wait for Q1, when the aging servers are replaced.
Thank you for now.
BTW, linksto: and incategory: are both using page names as well.
I do not expect categories to be distinguished by letter case only, but for linksto: there might be a difference between the BIOS and Bios articles, even more so on Wiktionary, where the first letter is significant.
High level plan:
- Adjust CirrusSearch mapping generation to add a case-sensitive multi-field to the template property
- Run an in-place reindex across all wikis and clusters
- Adjust the hastemplate keyword to utilize the new case-sensitive multi-field
Adjust CirrusSearch mapping generation to add a case-sensitive multi-field to the template property
- multi-fields: https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
- current mapping: https://en.wikipedia.org/w/api.php?action=cirrus-mapping-dump
- Entry point to mapping generation code: CirrusSearch\Maintenance\MappingConfigBuilder
- template field definition: ContentHandler::getFieldsForSearchIndex in mediawiki core
- factory that creates SearchIndexField instances: CirrusSearch\Search\CirrusSearchIndexFieldFactory
- concrete field class created for template field: CirrusSearch\Search\KeywordIndexField
Today the template field is defined as:
$fields['template'] = $engine->makeSearchFieldMapping( 'template', SearchIndexField::INDEX_TYPE_KEYWORD );
$fields['template']->setFlag( SearchIndexField::FLAG_CASEFOLD );
FLAG_CASEFOLD is used to tell the search engine that it should ignore case for this field. It seems like what we actually want to tell the search engine is that casefolding is convenient for default searches, but that identifying a specific template requires case-sensitive matching. Whatever name is chosen to indicate this, KeywordIndexField::getMapping will need to be adjusted to recognize the flag and generate an appropriate multi-field.
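To make the target shape concrete, here is a rough sketch (in Python, purely for illustration) of the kind of mapping fragment such a flag could produce. The field name `template` is from this task; the sub-field name `case_sensitive`, the normalizer name `lowercase_keyword`, and the use of a keyword normalizer rather than an analyzer are assumptions for the example, not what KeywordIndexField emits today.

```python
from typing import Any, Dict


def build_template_mapping(with_case_sensitive_subfield: bool = True) -> Dict[str, Any]:
    """Build an Elasticsearch mapping fragment for the `template` property.

    Hypothetical names: the `lowercase_keyword` normalizer and the
    `case_sensitive` sub-field are placeholders chosen for this sketch.
    """
    mapping: Dict[str, Any] = {
        "type": "keyword",
        # Main field stays case-folded, which is convenient for default searches.
        "normalizer": "lowercase_keyword",
    }
    if with_case_sensitive_subfield:
        # Multi-field: the same source value is indexed a second time,
        # without normalization, so exact-case lookups become possible.
        mapping["fields"] = {
            "case_sensitive": {"type": "keyword"},
        }
    return mapping


# Index settings the main field relies on (a custom lowercase normalizer).
ANALYSIS_SETTINGS: Dict[str, Any] = {
    "analysis": {
        "normalizer": {
            "lowercase_keyword": {"type": "custom", "filter": ["lowercase"]},
        }
    }
}
```

With this shape, existing queries against `template` keep their current behaviour, and only queries that explicitly target `template.case_sensitive` become case-sensitive.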
Run an in-place reindex across all wikis and clusters
Adjust the hastemplate keyword to utilize the new case-sensitive multi-field
Adjust CirrusSearch\Query\HasTemplateFeature::parseValue to recognize whatever syntax is agreed on to trigger case-sensitive matching, returning a 'case-sensitive' property along with the current templates. Use this value in HasTemplateFeature::doGetFilterQuery to decide the appropriate field to filter on.
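A possible outline of that flow, again as an illustrative Python sketch rather than the PHP that would land in HasTemplateFeature. Since the trigger syntax is not agreed on, the sketch takes the decision as a plain boolean argument; the sub-field name `template.case_sensitive` and the `|` separator for multiple templates are assumptions.

```python
from typing import Any, Dict, List

CASEFOLDED_FIELD = "template"
CASE_SENSITIVE_FIELD = "template.case_sensitive"  # hypothetical sub-field name


def parse_value(value: str, case_sensitive: bool) -> Dict[str, Any]:
    """Split a hastemplate: value into template names.

    `case_sensitive` stands in for whatever syntax ends up triggering
    exact-case matching; how it is detected is left open here.
    """
    templates: List[str] = [t.strip() for t in value.split("|") if t.strip()]
    return {"templates": templates, "case-sensitive": case_sensitive}


def get_filter_query(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Build a bool/should filter on the field that matches the requested mode."""
    field = CASE_SENSITIVE_FIELD if parsed["case-sensitive"] else CASEFOLDED_FIELD
    return {
        "bool": {
            "should": [
                {"match": {field: {"query": name}}} for name in parsed["templates"]
            ],
            "minimum_should_match": 1,
        }
    }
```

Keeping the mode in the parsed value means the filter builder only has to pick a field name; everything else about the query stays the same.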
Just a reminder:
- The flag needs three states: IGNORE, SENSITIVE_ALL, and SENSITIVE_2ND (or IGNORE1_SENSITIVE)
- For hastemplate: and incategory: IGNORE1_SENSITIVE is appropriate.
- On a Wiktionary, linksto: is SENSITIVE_ALL for main namespace, but any other linksto: is IGNORE1_SENSITIVE.
- Common text search is IGNORE.
- When accessing the database tables this is no problem at all: capitalizing the first character of the title part (not the namespace) will deliver the IGNORE1_SENSITIVE entries.
- There are config variables indicating which page names might need SENSITIVE_ALL.
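The three states from the list above could be sketched as follows (illustrative Python; the enum and function names are made up for this example, and Python's `upper()`/`lower()` only approximate MediaWiki's own first-letter and case-folding rules):

```python
from enum import Enum


class CaseMode(Enum):
    IGNORE = "ignore"                        # common text search
    IGNORE1_SENSITIVE = "ignore1_sensitive"  # first letter ignored, rest exact
    SENSITIVE_ALL = "sensitive_all"          # every letter significant


def normalize_title_text(text: str, mode: CaseMode) -> str:
    """Normalize the title part of a page name (without the namespace prefix)."""
    if mode is CaseMode.IGNORE:
        return text.lower()
    if mode is CaseMode.IGNORE1_SENSITIVE:
        # Capitalize only the first character; the remainder is left untouched.
        return text[:1].upper() + text[1:]
    return text  # SENSITIVE_ALL: compare exactly as written
```

Under IGNORE1_SENSITIVE, "bios" and "Bios" normalize to the same value while "BIOS" stays distinct; under IGNORE all three collapse into one.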
I suppose that the last remark refers to the $wgCapitalLinks and $wgCapitalLinkOverrides configuration variables.
When querying, CirrusSearch properly honors these parameters, so that searching for hastemplate:foo will actually search for Template:Foo on English Wikipedia but Template:foo on English Wiktionary.
For the indexed value I think a single flag is needed, because the wiki configuration will be taken into account by CirrusSearch when searching.
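Roughly, the query-time behaviour described here amounts to the following (an illustrative Python sketch, not the CirrusSearch code; the function names are hypothetical, and the two settings are passed in as plain arguments standing for $wgCapitalLinks and $wgCapitalLinkOverrides). Which values a given wiki actually sets is not asserted here.

```python
from typing import Dict

NS_TEMPLATE = 10  # MediaWiki namespace ID for Template:


def is_capitalized(namespace: int, capital_links: bool,
                   overrides: Dict[int, bool]) -> bool:
    """A per-namespace override wins; otherwise the wiki-wide default applies."""
    return overrides.get(namespace, capital_links)


def resolve_template_title(name: str, capital_links: bool,
                           overrides: Dict[int, bool]) -> str:
    """Turn the argument of hastemplate: into the title that gets searched."""
    if is_capitalized(NS_TEMPLATE, capital_links, overrides):
        name = name[:1].upper() + name[1:]
    return "Template:" + name


# A wiki with capitalized first letters searches for Template:Foo ...
capitalized = resolve_template_title("foo", capital_links=True, overrides={})
# ... while a wiki without them (and no override for templates) keeps Template:foo.
uncapitalized = resolve_template_title("foo", capital_links=False, overrides={})
```

Because the decision is made from the wiki configuration at query time, the index itself only needs to store the exact-case value once.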
Well, on a Wiktionary only pages in the main namespace are SENSITIVE_ALL; templates and categories behave as on every other wiki.
On any non-Wiktionary wiki, main namespace pages and all others are IGNORE1_SENSITIVE, afaik.
The patch will go out with the train in the last week of April (no train is running next week). The reindex that allows this to work has mostly completed; a few wikis have to be re-run, but that will hopefully be finished at or soon after the time this train rolls forward.