Tue, Mar 20
(sorry I'm very new to MCR)
How will this work regarding namespaces?
I mean can there be a mix of namespaces here or is there a single top level namespace somewhere?
Looks like it's a big win in all rounds for explorer. My takeaways are:
- despite having a higher ZRR (which was not affected by the test), explorer wins. ZRR directly impacts CTR, so it's bad luck for explorer, which could have had an even better CTR with a similar ZRR.
- CTR is significantly higher
- explorer wins in all interleaved tests (I think it's the first time we see a group winning on every day)
- there's a big win in CTR for explorer on day 11 which seems unnatural. Perhaps it is due to a big session that day.
- every other graph shows a preference for explorer except the "Return to make a different search" one. But that one is hard to interpret and everything is within the error bounds.
Mon, Mar 19
@Dvorapa thanks for your feedback, I'll decline the issue as we prefer to focus on new keywords that, unlike prefix, are not ambiguous.
Subpageof was not yet broadly advertised as we are still gathering feedback to know if it covers most of the use cases the prefix keyword covers. In this case your query perfectly fits the use cases we wanted to cover with subpageof.
Unfortunately the prefix keyword is a bit special as it will ''consume'' all the characters after it. In other words the limitation is that you can only use one and it must be at the end of the query. We attempted to fix this but we realized that this keyword was used in many pages to link search results and we preferred not to change how it works to avoid breaking those links.
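To illustrate the ''consume'' behavior described above, here is a minimal sketch in Python. It is a toy model, not the CirrusSearch parser: the function name `split_prefix_keyword` is hypothetical, and it assumes the keyword is written in lowercase as `prefix:`.

```python
def split_prefix_keyword(query):
    """Split a query into (free_text, prefix_value).

    Toy model (assumption): everything after the first "prefix:" belongs
    to the keyword, including spaces and any later "prefix:".
    """
    marker = "prefix:"
    pos = query.find(marker)
    if pos == -1:
        # No prefix keyword at all: the whole query is free text.
        return query.strip(), None
    return query[:pos].strip(), query[pos + len(marker):]


# Words typed after the keyword become part of its value.
print(split_prefix_keyword("foo prefix:Talk:Bar baz"))
# A second prefix: is not parsed as a keyword, it is swallowed by the first.
print(split_prefix_keyword("prefix:A prefix:B"))
```

This is why only one prefix keyword can be used and why it has to come last: anything placed after it is no longer seen as a separate search term.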
Fri, Mar 16
Wed, Mar 14
Tue, Mar 13
Mon, Mar 12
Fri, Mar 9
Closing as invalid since I don't see anything here that is not expected.
- with prefixsearch the namespaces selected using api params can be overwritten using a namespace prefix in the search query
- when asking the API to follow redirects the results may not be in the namespace requested in case of cross-namespace redirects
Thu, Mar 8
Oh my bad! This is using prefixsearch so please forget what I said about the inconsistency between Special:Search and API.
I may be wrong but I think this is the expected behavior: one can override the list of namespaces set in the API params (or the namespace filter in Special:Search) by prefixing their query with a namespace:
- file:foo will search foo in the File namespace disabling any namespace selection made with api params
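The override rule can be sketched roughly as follows. This is only an illustration of the behavior described above, not the MediaWiki implementation: `resolve_namespaces` is a hypothetical helper and `NAMESPACE_IDS` is a small hand-picked subset of namespaces.

```python
# Small illustrative subset of namespace names and their ids.
NAMESPACE_IDS = {"talk": 1, "file": 6, "help": 12}


def resolve_namespaces(query, requested):
    """Return (namespaces, search_terms) for a query.

    A recognized namespace prefix in the query replaces the namespaces
    requested through the API params; otherwise they are kept as-is.
    """
    head, sep, rest = query.partition(":")
    ns = NAMESPACE_IDS.get(head.strip().lower())
    if sep and ns is not None:
        return [ns], rest.strip()
    return list(requested), query


print(resolve_namespaces("file:foo", [0]))  # the File namespace wins
print(resolve_namespaces("foo", [0, 4]))    # API selection is kept
```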
What seems to be directly related is the html generated for file results vs text page results.
Text page results will have a div with a class mw-search-result-heading.
Image results don't have this one.
According to https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/WikimediaEvents/+/master/modules/all/ext.wikimediaEvents.searchSatisfaction.js#670
we rely on this div to identify SRP clicks.
At a glance I'd say that we ignore all image clicks (not only the clicks on the thumbnails).
Thanks for catching the problem!
Tue, Mar 6
@zeljkofilipin I'm interested but I don't consider myself a very knowledgeable developer for this kind of review, I'd be happy to help if I can :)
Mon, Mar 5
Wed, Feb 28
I'm in the process of refactoring all of this to include sane interfaces for keywords to change the highlighting behavior. Unless there's some urgency in fixing this I'd prefer to wait until I'm done with the refactoring.
Tue, Feb 27
Yes, but since this keyword will fail before all wikis are reindexed I think we should postpone any doc addition until the reindex is done. A task for adding it to the doc is a good idea.
Mon, Feb 26
Agreed on the documentation.
The limit applies to the website as well.
Are there any reasons you don't use &formatversion=2? The output is then clearer since the pageid is no longer an array index but a property that is only set on valid titles.
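As a rough illustration of the difference, the two dictionaries below are hand-written samples of what a response for one missing and one existing title can look like with and without formatversion=2. They are not captured API output, and the helper names are hypothetical.

```python
# Hand-written sample shapes (assumption), not real API output.
response_v1 = {"query": {"pages": {
    "-1": {"ns": 0, "title": "No such page", "missing": ""},
    "42": {"pageid": 42, "ns": 0, "title": "Example"},
}}}

response_v2 = {"query": {"pages": [
    {"ns": 0, "title": "No such page", "missing": True},
    {"pageid": 42, "ns": 0, "title": "Example"},
]}}


def existing_pageids_v1(resp):
    # Pages are keyed by page id; missing titles get a negative key.
    pages = resp["query"]["pages"].values()
    return [p["pageid"] for p in pages if "pageid" in p]


def existing_pageids_v2(resp):
    # Pages are a plain list; pageid is only present on existing pages.
    return [p["pageid"] for p in resp["query"]["pages"] if "pageid" in p]


print(existing_pageids_v1(response_v1), existing_pageids_v2(response_v2))
```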
On the beta cluster some of the elastic indices are not up to date because of T125976.
I'm running the script manually to see if it mitigates the issue.
The proper fix would be to explicitly check for the Title existence before returning it to the user.
Tue, Feb 20
@Lea_WMDE this is because subpageof also covers redirects e.g.:
Historical archive/Friends of Wikipedia/Websites using Wikipedia articles is a redirect to Wikipedia:Mirrors and forks.
Feb 15 2018
These two projects can be archived:
- search ltr is now maintained at https://github.com/o19s/elasticsearch-learning-to-rank jointly with OpenSource Connections
- repository-swift is no longer maintained by us, but see: https://github.com/BigDataBoutique/elasticsearch-repository-swift
Feb 14 2018
@Aklapper I think this is slightly different.
I think the 1.31.0-wmf21 was cut just before this was merged so this will go to production wikis next week and when the config https://gerrit.wikimedia.org/r/#/c/410242/ is deployed. Earliest would be Friday 23 if we swat the config on Thursday evening.
Feb 12 2018
This is in theory possible but the problem is that some profiles refer to some class implementations that may not be available on the host wiki.
So yes we could use the sister search logic with some adaptation but the main blocker will be that the builder implementation won't be available if the wikibase extension is not loaded on the host wiki.
We will have similar problems with SDoC search sooner or later since search on commons is available from all wikis. If SDoC search provides some custom implementations the code will have to be available on the host wiki.
In short, if the WikibaseClient imports the builder classes then it's probably fine, if not I think it'll be hard to do it.
Feb 9 2018
Feb 8 2018
The problem was due to the initial deployment of the refactoring of profile management in cirrus. I failed to anticipate this problem; it would have been fixed by rolling out .20 to group2 wikis. The fix provided by Stas is the right one for this situation. In theory it could be reverted once wmf20 is deployed everywhere (but this is not strictly needed).
The difficulty here (with crosswiki searches) is that an old version (group2) is accessing config from a new version (group1). This means that when we refactor we have to think about back compat not in terms of the code deployed in the new branch but in terms of the config generated by the new code.
This makes testing this kind of issue very hard: we would have to run searches with version N-1 of the code against the config generated by version N.
Anyways, I apologize for the troubles this has caused, thanks Stas and Erik for the quick fix.
Feb 6 2018
Thanks for the report.
Indeed elasticsearch 5.3.x or 5.4.x is needed if running MediaWiki 1.30 (https://www.mediawiki.org/wiki/Extension:CirrusSearch).
Elastic 5.5 and 5.6 will be supported by MW 1.31.
Plugging a stemmer into the completion suggester is possible but I'm having difficulty anticipating all the drawbacks that may occur when doing so.
The reason is that it's an autocomplete search, meaning that we do partial matching. In all the examples given above only fully written phrases have been studied, but we have to keep in mind that we still suggest pages when the phrase/word is not fully written. In other words we will apply a stemming algorithm to partially typed words.
Perhaps the best approach would be to set up a small demo so that Polish speakers could try this approach and tell us if it's worthwhile.
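One kind of drawback that could occur is sketched below. The stemmer is a deliberately naive English suffix stripper invented for this example (it is not a Polish stemmer, and not the one that would be plugged in), and `completes` is a hypothetical stand-in for the prefix matching done by the completion suggester.

```python
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    # Naive stemmer (assumption): strip the first matching suffix,
    # keeping at least three characters of the word.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


def completes(typed, title, stem=False):
    """Does the partially typed text match the start of the title?"""
    if stem:
        typed, title = toy_stem(typed), toy_stem(title)
    return title.startswith(typed)


# Gain: a different inflection now matches.
print(completes("runs", "running"), completes("runs", "running", stem=True))
# Loss: the partially typed word is not stemmed like the full word,
# so a plain prefix that used to match no longer does.
print(completes("runnin", "running"), completes("runnin", "running", stem=True))
```

With this toy stemmer the user gets the suggestion after typing "runn", loses it at "runnin", and gets it back at "running", which is the kind of behavior a demo would help evaluate.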
Feb 5 2018
This makes perfect sense. Then I'm not sure this problem is fixable. I'm tempted to mark this ticket as invalid unless you think there are solutions to make this more obvious for devs who read the Jenkins output. I don't see any great solutions other than a small information line in the build output prior to PHPCS that says: "The lines reported by PHPCS may not align with your patch as seen in gerrit if it was rebased".
Or perhaps outputting the faulty line in the PHPCS report if this is possible?
Anyways, please feel free to close this task as Invalid.
Feb 2 2018
Feb 1 2018
When enabling the role wikibase_repo it seems that wikibase is enabled for all the wikis.
This makes this role problematic when enabled with other extensions like CirrusSearch.
CirrusSearch needs to enable many wikis (multilang role) and it's not practical to have wikibase enabled for all of these wikis.
Would this task allow us to fix this problem and have wikibase enabled only for a small set of wikis rather than all of them?
Jan 22 2018
Currently the CI infra does not have a way to set up the env needed to run the CirrusSearch integration tests (T185462).
Jan 17 2018
@Gopavasanth index names mean the name of the indices created by the extension CirrusSearch in elasticsearch.
CirrusSearch is the extension that provides search functionalities using elasticsearch as a backend.
Jan 16 2018
@Oetterer I don't think this is related. This task is just to track progress on making the extension CirrusSearch compatible with the new extension registration process. It just lists the pieces of code that make this refactoring problematic, not actual problems regarding Config factories.
Perhaps we'll end up having the same issues but I don't think we have code that needs to be run just after the extension is loaded.
Jan 5 2018
Jan 3 2018
I remember that it's due to the type of API param we use. When setting an array as ApiBase::PARAM_TYPE a default must be provided IIRC.
The use of arrays was a way to expose the list of possible profiles to use but the drawback was that the API would fail if you provide an unknown param. I think this is wrong, I agree, cirrus should be able to know if a profile was explicitly set by the user.
Dec 21 2017
This is not only the namespaces selected and saved by the users but also the list of namespaces searched by default.
Currently when the extension is enabled you can encounter a strange behavior that looks like a bug:
Dec 20 2017
Same for me, I'd be in favor of trying to increase the refresh rate on wikidata_content.
Dec 19 2017
I ported elasticsearch-memory and elasticsearch-indexing.
Dec 18 2017
Q45825730 is me, I used this one just to test.
If a large majority of such use cases involve searching the entity id (QXXX) of the newly created item we can perform an additional db match to compensate for the lag of the search index.
It's what we do for normal wikis, a db match is run in addition to the query sent to the search index.
If users search for the label or aliases of the newly created item then this solution is pointless.
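The idea can be sketched as follows. This is only an illustration under assumptions: `index_search` and `db_has_entity` are hypothetical callables standing in for the search index query and the database lookup, and the regular expression is a simplified pattern for item ids.

```python
import re

# Simplified pattern for an item id such as Q45825730 (assumption).
ENTITY_ID = re.compile(r"Q[1-9]\d*", re.IGNORECASE)


def search(query, index_search, db_has_entity):
    """Merge index results with a direct db match on the entity id."""
    results = list(index_search(query))
    candidate = query.strip()
    if ENTITY_ID.fullmatch(candidate):
        entity_id = candidate.upper()
        # The index may lag behind, so check the database directly.
        if entity_id not in results and db_has_entity(entity_id):
            results.insert(0, entity_id)
    return results


# Simulate an item that exists in the db but is not indexed yet.
database = {"Q45825730"}
lagging_index = lambda q: []
print(search("Q45825730", lagging_index, database.__contains__))
print(search("some label", lagging_index, database.__contains__))
```

The second call shows the limitation mentioned above: a search on the label gets no help from the db match.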
Dec 14 2017
Dec 13 2017
The error EADDRINUSE /tmp/cirrussearch-integration-tagtracker means either that the tests are still running in the background or that we failed to clean up the socket when the tests finished or were killed.
It's perfectly fine to delete /tmp/cirrussearch-integration-tagtracker if you think the test is no longer running.
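If you prefer to check before deleting, a small sketch like the one below can tell a leftover socket from a live one. It assumes the path is a Unix domain socket; the helper name `remove_if_stale` is hypothetical and not part of the test suite.

```python
import os
import socket
import stat


def remove_if_stale(path):
    """Remove path if it is a Unix socket that nothing listens on.

    Returns True when a stale socket was removed, False otherwise.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return False
    if not stat.S_ISSOCK(mode):
        # Not a socket: leave it alone.
        return False
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except ConnectionRefusedError:
        # Nobody is listening: the socket file is a leftover.
        os.unlink(path)
        return True
    finally:
        probe.close()
    # The connection succeeded, so a process is still using the socket.
    return False


if __name__ == "__main__":
    print(remove_if_stale("/tmp/cirrussearch-integration-tagtracker"))
```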
Finding a decent place for profiles has always been a pain and I could not find something sane. I'd like to address (improve) this problem by adding a ProfileManager in cirrus.