Yes! A duplicate! Sorry, resolving ...
@egardner can this be moved out of code review now?
Fri, Jul 30
Mon, Jul 26
Or ... better still - use a dismax of a field and its plain version when creating the query
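A rough sketch of what that dismax could look like as an Elasticsearch query body. The function name, the `title` field and the `.plain` subfield naming are assumptions for illustration only, not the actual query-building code:

```python
import json


def dismax_field_and_plain(field, text, tie_breaker=0.0):
    """Build a dis_max query over a field and its plain variant.

    Assumption: the plain version of the field is indexed as
    "<field>.plain". Adjust the name to match the real mapping.
    """
    return {
        "dis_max": {
            # With tie_breaker 0.0 only the best-scoring clause counts,
            # so a match in both variants is not counted twice.
            "tie_breaker": tie_breaker,
            "queries": [
                {"match": {field: {"query": text}}},
                {"match": {f"{field}.plain": {"query": text}}},
            ],
        }
    }


if __name__ == "__main__":
    # "title" and "red panda" are placeholder values.
    print(json.dumps(dismax_field_and_plain("title", "red panda"), indent=2))
```

The point of dis_max here is that a document scores by its best-matching variant of the field rather than by the sum of both.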
Any progress on this?
Fri, Jul 23
In the user testing Growth did in Dec 2020, we asked users to rate the different pieces of information for usefulness. The source (labeled "Suggestion reason" in the test UI) was one of the most highly rated pieces of information, particularly when the source was that it was used in the same article in another language Wikipedia, as this was easy to understand. This appears to be reflected in the Android MVP data, where ratings were 80% when this was the suggestion reason shown
Thu, Jul 22
In the future, we'd like to be able to filter by source type, e.g. only show Wikidata-based recommendations (which tend to be significantly more reliable). This is not needed for the first iteration, but since otherwise the search index data would probably not need to be reloaded between iterations, it might be easier to deal with it now.
So the easier-to-follow process for doing search tuning is
FWIW here's the code used to train/test the logistic regression model
tl;dr: gathering more labeled data does not look like it will measurably improve the precision of our results, so there's no point in making a big effort to do it
Tue, Jul 20
@ArielGlenn Can we close this now?
Mon, Jul 19
@egardner is this something the design systems team is already planning?
blocked by https://phabricator.wikimedia.org/T280368
Blocked by https://phabricator.wikimedia.org/T280368
Thu, Jul 15
Tue, Jul 13
Mon, Jul 12
This will get done as part of T280368
Jul 1 2021
So when querying the search API, you guys need to set srqiprofile=empty in the request, then the confidence score can be worked out as described
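For illustration, a minimal sketch of building such a request URL. The endpoint, the namespace restriction and the helper name are assumptions; only `srqiprofile=empty` comes from the comment above, and the confidence-score calculation itself is not shown here:

```python
from urllib.parse import urlencode

# Assumption: the search is run against Commons; swap in the wiki you query.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_search_url(search_term, limit=10):
    """Return a search API URL with the query-independent profile set to empty."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": search_term,
        "srnamespace": 6,          # assumption: restrict to the File namespace
        "srlimit": limit,
        "srqiprofile": "empty",    # the setting the comment above asks for
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # "red panda" is a placeholder search term.
    print(build_search_url("red panda"))
```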
Jun 30 2021
Jun 28 2021
Jun 25 2021
Jun 23 2021
Hmm or maybe not. maintenance/createFileListfromCategoriesAndTemplates.php doesn't handle sub-categories
Jun 22 2021
After some poking around I see that step 1 can be accomplished using the scripts
Jun 21 2021
@Vlad.shapik can this be resolved now?
Jun 16 2021
Turns out the problem was the clock on my phone had drifted! Working fine now, thanks @Urbanecm
Jun 14 2021
Jun 11 2021
Fix deployed, working now
Jun 8 2021
We could do. Should I make a ticket?
May 27 2021
I think we could do another cycle of tuning search results incorporating the data from the image-recommendation test, and then graph the data again and see where we are
We can, but it means going through all unillustrated articles and grabbing their scores, so it's going to need a ticket in itself. I'll make one
We have 2485 rated recommendations using mediasearch, from 984 search terms
May 26 2021
May 24 2021
Talked to @LSobanski and we don't think we need extra storage atm
May 14 2021
@John_Cummings - structured data on a commons File page is for describing the file. For example:
- what an image depicts
- the copyright licence associated with the file
- who created the file
May 13 2021
Comparison of results from the analysis tool before and after this change
Here's a csv of the results
May 11 2021
Yeah, that's expected behaviour for now - https://phabricator.wikimedia.org/T279072 has been raised to modify the behaviour of the dropdown so that it's more like desktop
May 10 2021
... so we just need a few more ratings to get to our target in ar, cs and ceb