Wed, Mar 3
T271387 covers the case where filters have an invalid value that the UI doesn't know how to handle (one that can't be chosen from any of the dropdowns). That can only happen when someone manually tampers with the URI (or when software mangles the link), since we provide no controls to construct a search like that.
This one, on the other hand, is actual search input (it can simply be typed into the search field) that the search engine just happens to fail to process. It's a rather extreme edge case, but it could plausibly be triggered by accidental user input.
(FWIW, I think we can give both the same generic error msg treatment)
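A minimal sketch of what that shared treatment could look like, in Python purely for illustration (the real code is PHP/JS). The filter names (`filemime`, `imageres`), their allowed values and the error text are all hypothetical, not the actual MediaSearch filters:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical filters and the values their dropdowns offer; the real
# MediaSearch filter names and options may differ.
ALLOWED_FILTER_VALUES = {
    "filemime": {"jpeg", "png", "svg", "gif", "tiff"},
    "imageres": {"<500", "500,1000", ">1000"},
}

# Hypothetical wording for the shared, generic error message.
GENERIC_ERROR = "Something went wrong with this search."


def invalid_filter_error(url: str) -> Optional[str]:
    """Return the generic error if a known filter has a value that no dropdown
    could have produced (i.e. the URI was edited or mangled), else None."""
    params = parse_qs(urlsplit(url).query)  # percent-decodes values
    for name, allowed in ALLOWED_FILTER_VALUES.items():
        for value in params.get(name, []):
            if value not in allowed:
                return GENERIC_ERROR
    return None


def search_error(url: str, backend_failed: bool) -> Optional[str]:
    """Both failure modes (a bad filter value, or input the search engine
    could not process) end up with the same generic message."""
    if backend_failed:
        return GENERIC_ERROR
    return invalid_filter_error(url)
```

The point is only that both paths converge on one message, so the UI doesn't need to distinguish them.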
Tue, Mar 2
FYI: I would absolutely love to see numbers in the other tabs. But that would require a separate search query for each tab, which would mean a five-fold increase in search traffic (and the page would take longer to render, because it would have to wait for all 5 results).
I think we'll need to look into other ways to draw attention to the other tabs.
Mon, Mar 1
I suspect the correct solution within MediaInfoHandler::getTitleForId would probably be to throw an InvalidArgumentException if there's no title for the given id (EntityHandler::getTitleForId is documented to throw that exception if $id refers to an entity of the wrong type).
Fri, Feb 26
I think we have 2 separate problems here:
Thu, Feb 25
Yeah, that's not the expected behavior.
This ticket required a change in 2 places.
I've asked here whether (and when) the next release is scheduled: T274252#6859705
Actually - is there a timeline for releasing the next version of this repo?
It looks like (except for this patch) nothing has happened in this repo for the past 8 months, so I'm worried it'll sit around for a long time.
If there's no plan to release this soon, that's ok - in that case MediaSearch can simply reimplement something like getLocationAgnosticMwApi.
Wed, Feb 24
I don't think running in parallel with different inputs would be a problem.
Ping @Mholloway in case he has thoughts.
Fri, Feb 19
This is a totally arbitrary format that already exists for several of Cirrus' on-wiki configurable settings (e.g. cirrussearch-boost-templates; see CirrusSearch\Util::parseSettingsInMessage).
The only deviation from the other existing messages is that the values can be split up over multiple lines, because the lists are expected to be long.
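A rough Python sketch of parsing such a message, assuming the usual handling of these messages (strip `#` comments, trim whitespace, drop empty lines) and assuming a trailing comma is what lets one value continue onto the next line. The continuation rule and the `key = value, value` layout are assumptions for illustration; the real format and parser may differ:

```python
import re
from typing import Dict, List


def parse_settings(message: str) -> Dict[str, List[str]]:
    # Remove '#' comments and surrounding whitespace, and drop empty lines.
    lines = [re.sub(r"#.*$", "", line).strip() for line in message.split("\n")]
    lines = [line for line in lines if line]

    # Assumption: a line ending in ',' continues on the next line, so one
    # logical entry can span several physical lines.
    entries: List[str] = []
    current = ""
    for line in lines:
        current += line
        if not current.endswith(","):
            entries.append(current)
            current = ""
    if current:
        entries.append(current)

    # Assumption: each entry is "key = value1, value2, ...".
    settings: Dict[str, List[str]] = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if not sep:
            continue  # not a key=value entry; ignore it
        settings[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return settings
```

For example, with hypothetical keys `example-a`/`example-b`, a long list for `example-a` can be spread over two lines and still parse into a single entry.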
Thu, Feb 18
Wed, Feb 17
I think that looks great
Tue, Feb 16
Note: this bit of code has already been merged (even though it has not yet undergone backlog grooming). Because this has minimal impact and I suspect the behavior is desirable, I will let it go ahead & be deployed.
If backlog grooming comes around and finds that this is undesirable, ping me & I will remove the relevant code. Otherwise, this can move straight to "Needs QA".
Mon, Feb 15
Fri, Feb 12
I've just tested the steps to reproduce, and even after a lot of scrolling, I didn't run into 429s. Yesterday was the same. Several hundreds of thumbnails (that obviously had to be generated on the fly, given their slow response time) loaded just fine.
I don't know whether it's still much of an issue.
I don't think "categories and pages" implies anything about their priority/order; I simply read it as "both of those things" (in no particular order).
We *could* prioritize categories over other pages, but that would have the effect of categories that are a very poor match for the search term outranking other pages that are obviously much better matches, which I don't think is desirable.
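A toy Python illustration of that trade-off. The titles and relevance scores are made up; it just contrasts plain relevance ordering with a hard "categories first" ordering:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    title: str
    score: float  # hypothetical relevance score from the search engine


def rank_by_relevance(results: List[Result]) -> List[Result]:
    return sorted(results, key=lambda r: r.score, reverse=True)


def rank_categories_first(results: List[Result]) -> List[Result]:
    # False sorts before True, so categories (flag False) come first,
    # regardless of how poorly they match; ties are broken by relevance.
    return sorted(
        results,
        key=lambda r: (not r.title.startswith("Category:"), -r.score),
    )
```

With a strong page match and a weak category match, the second ordering puts the weak category on top, which is exactly the undesirable effect described above.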
Thu, Feb 11
The tool will run the API to get 500 random unillustrated articles from each wiki and their image recommendations
@CBogen Is that 1 image recommendation per article, or however many the API returns by default?
(asking because it'll have a significant impact on the number of images that will need to be evaluated)
Wed, Feb 10
I have a patch in code review that deviates slightly from the task description.
I figured it'd be confusing or misleading to categorize media replacements as "add media" (but we probably also wouldn't want to ignore those), so instead of a singular "add media" tag, I've added 3: one for additions (only), one for removals (only), and another for changes (both additions & removals).
New media (including replacements) is thus covered by both "add media" and "change media".
Does that work?
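For clarity, the classification described above boils down to something like this Python sketch. It assumes we can compare the sets of media used on the page before and after the edit; the function and its inputs are hypothetical, and the actual tag identifiers in the patch may differ from these display names:

```python
from typing import Optional, Set


def media_change_tag(before: Set[str], after: Set[str]) -> Optional[str]:
    """Pick one of the three tags based on which media were added/removed."""
    added = after - before
    removed = before - after
    if added and removed:
        return "change media"  # e.g. a replacement: additions & removals
    if added:
        return "add media"     # additions only
    if removed:
        return "remove media"  # removals only
    return None                # media untouched by this edit


# "New media" (including replacements) = either of these two tags.
NEW_MEDIA_TAGS = {"add media", "change media"}
```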
Patch has been approved, but won't be testable until it's enabled on-wiki, which is blocked on security readiness review (T266513)
Moving to blocked until that is complete.
Tue, Feb 9
Yes, that is in the patch that has been merged.
Closing. For the most part, this is already part of T258053 as well (with the exception of non-English aliases for non-English searches, which we have no efficient way of fetching, and for which we likely have little relevant data anyway).
Thu, Feb 4
AIUI, this is no longer blocked. We can proceed to make mediasearch default (for searches within file namespace), right?
Wed, Feb 3
Feb 1 2021
Jan 29 2021
Jan 28 2021
I have updated the ticket. Please look over the changes (esp. the last acceptance criterion, which would give the community better control).
The map of these license statements can be found at https://commons.wikimedia.org/w/index.php?title=MediaWiki:Wikibasecirrus-license-mapping
These 2 were missing from the license map config:
P275=Q98755364, # copyright licence = Commons Attribution-ShareAlike 3.0 Italy
P275=Q98755344, # copyright licence = Commons Attribution-Share Alike 3.0 Serbia
I have added them, and the files in question no longer appear in search results they don't belong in.
We've manually been assessing thousands of search results, and the data that we have indicates that redirects are a pretty good signal (worse than titles, but better than text)
Ergo: we probably should not remove redirects from the data used by the algorithm just because some of them contain false information; that's true for all other fields as well.
I think the right thing to do in this specific case would be to remove the redirect (given that it's essentially false - you also wouldn't want "dog" to redirect to "mona lisa")
Jan 27 2021
Special:Search allows selecting multiple namespaces at once.
None of our current filters support that IIRC - what would that look like (if at all possible)?
Since I'm here, I figured it'd be worth pointing out a few caveats about the minifier, most of which probably won't really be a surprise:
- This grew from a few very tiny regular expressions. It is a lot more complex now, but it's essentially still just a bunch of regexes. They're usually faster than any PHP userland code that actually parses JS syntax could be, but they're limited in their ability to process code (e.g. finding matching closing brackets in nested structures is something regexes are not great at...)
- Since it's just a bunch of regexes, it does not parse or validate the original or produced code, so it may silently produce invalid results.
- One of the most common ways in which it could silently fail is when the PCRE limits (pcre.backtrack_limit & pcre.recursion_limit - see http://php.net/manual/en/pcre.configuration.php) are configured too low. I've optimized the regexes a bunch and have not recently seen bug reports that indicate it still happens frequently (if at all), but it is technically still possible, given low enough limits & code written in a specific way to trigger large pcre recursion/backtracking.
- A JS-based minifier should be far superior to this (or any other) PHP-based minifier. While I started building it for exactly the same reasons that apply to MediaWiki (being able to run with minimal fuss & additional infrastructure), it might make sense to attempt to shell out to a JS compiler for WMF sites (if at all possible), and use a PHP-based solution as a fallback when that infrastructure is not available (as for many 3rd party installs).
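A Python sketch of the "shell out, fall back to regexes" idea, which also shows how a regex-only approach can silently produce invalid output. The external command is a placeholder (no specific JS tool is assumed), and `naive_regex_minify` is a deliberately simplistic stand-in, far cruder than the actual minifier (which, for instance, does take care of strings):

```python
import re
import subprocess
from typing import Sequence


def naive_regex_minify(source: str) -> str:
    # Deliberately simplistic: strips '//' comments and collapses whitespace
    # with regexes, without parsing. It cannot tell a comment apart from '//'
    # inside a string literal, so it may silently break code.
    source = re.sub(r"//[^\n]*", "", source)
    return re.sub(r"\s+", " ", source).strip()


def minify_js(source: str, command: Sequence[str]) -> str:
    """Try an external (JS-based) minifier that reads stdin and writes stdout;
    fall back to the regex-based one if it's missing, fails or hangs."""
    try:
        completed = subprocess.run(
            list(command),
            input=source,
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
        return completed.stdout
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # OSError covers "executable not found" (FileNotFoundError).
        return naive_regex_minify(source)
```

E.g. `naive_regex_minify('var url = "http://example.com";')` returns `'var url = "http:'`: syntactically broken, with no error raised.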
(I had no idea this was going on - feel free to ping me if there's anything I can help with)
https://github.com/matthiasmullie/minify/commit/8538190f4ab21f77c938e51109547f0e943f7d44 would probably fix the slowness in that regex. Thanks for tracking that down!
Jan 26 2021
The mediasearch query param must remain until T262271 is resolved.
The API emits a warning that it's an unrecognized parameter (it's not an actively supported API param), but it is required to allow the mediasearch profile to be used until that profile becomes enabled by default.
Dropping the query param from the request would result in old (non-mediasearch) search profiles being used. That warning can simply be ignored until we can drop the query param from the request.
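For illustration, a Python sketch of such a request URL. `action`, `list=search`, `srsearch`, `srnamespace` and `format` are standard MediaWiki Action API parameters; the `mediasearch` param is the one discussed above, but the value `1` (and using `list=search` for this request at all) is an assumption:

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_search_url(term: str) -> str:
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srnamespace": 6,  # File namespace
        "format": "json",
        # Not a recognized API parameter (the API warns about it), but it is
        # what selects the mediasearch profile until T262271 is resolved.
        # The value "1" is an assumption.
        "mediasearch": 1,
    }
    return API_URL + "?" + urlencode(params)
```

Dropping the last entry would silently fall back to the old search profiles, which is why the warning is tolerated for now.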
Jan 25 2021
Jan 22 2021
Under normal conditions, it is not possible to submit invalid values like the one encountered here.
I suspect there must've been a federation configuration issue >1y ago where test-commons thought it was linked to production wikidata rather than testwikidata, thus allowing this invalid data.
Either way, I have a patch that should handle these kinds of issues (which shouldn't normally be possible in the first place) more gracefully, and that allows the invalid data to be fixed.
Jan 21 2021
The code still hasn't been removed, so we should probably keep this one open until the actual code is gone.
I had resolved T258055 because we have the answers that we need (but maybe I should've kept it open until all relevant code is also gone)