Cool thanks @hashar
Thu, Sep 16
Are we expecting group1 wikis to be on 1.37.0-wmf.23 since yesterday? They don't (or at least Commons and Wikidata don't) seem to be.
Tue, Sep 14
Mon, Sep 13
Not moving forward with this for now - the code has been merged and reverted so it's in the commit log
Fri, Sep 10
Thu, Sep 9
Wed, Sep 8
Tue, Sep 7
Mon, Sep 6
@Zbyszko I think the MediaInfo RDF format could be considered stable now, but we cannot guarantee anything because:
- there is no team to oversee its stability (we're the structured data team, not the Commons team)
- the stable interface policy only covers Wikidata, so if it's to be used as a guarantee it either needs to be updated to include Commons, or we need a new doc for Commons
Fri, Sep 3
I mean consistent (or persistent) over time.
Hm, I disagree. Sometimes it will be necessary to have 'materialized views' of data outside of the monolith in separate places. Wikidata Query Service is a great example (and the basic event driven architecture is similar to what is proposed for this Knowledge Store, IIUC). It runs a transformed view of Wikibase context to serve a different query model.
I'd be wary of relying on surface transformations on the data instead of making the real knowledge store(s) more structured and granular, in a way that editors and editing tools can integrate with
Thu, Sep 2
Wed, Sep 1
Note that you need to be logged into production to do the first curl, and into a stat server to do the second
Results from AnalyzeResults.php:
Mon, Aug 30
@SJu we think it's a HotCat bug similar to the others linked, so closing
Fri, Aug 27
@dcausse it turns out this model can't be used with languages for which we don't have stemming, so I prepared another ranklib file that uses the .plain fields (where available)
Mon, Aug 23
This smells to me like HotCat isn't picking up the new revision id when the mediainfo slot on the page has been updated, and therefore a HotCat bug rather than a MediaWiki bug
This and T283535 both look like a problem with RandomImageGenerator.php that occurs while the random image generated for testing is being written to disk
Aug 19 2021
Aug 18 2021
Here's the notebook code used to work out the numbers
Ok here are some results, based on trawling through the image-suggestions api in August 2021, and using wiki snapshots from June
Aug 4 2021
Aug 3 2021
Yes! A duplicate! Sorry, resolving ...
@egardner can this be moved out of code review now?
Jul 30 2021
Jul 26 2021
Or ... better still - use a dismax of a field and its plain version when creating the query
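A minimal sketch of what that could look like, assuming an Elasticsearch-style `dis_max` query built as a plain dict. The function name and the `title` field in the usage line are hypothetical, and the `.plain` suffix follows the naming mentioned elsewhere in this thread; this is not the code that was actually used.

```python
from typing import Any, Dict


def dismax_field_and_plain(field: str, text: str, tie_breaker: float = 0.0) -> Dict[str, Any]:
    """Build a dis_max query over a field and its .plain variant.

    dis_max scores a document by its best-matching sub-query, so a match
    on both the stemmed and the plain field is not counted twice.
    `field` is a hypothetical field name supplied by the caller.
    """
    return {
        "dis_max": {
            "tie_breaker": tie_breaker,
            "queries": [
                {"match": {field: {"query": text}}},
                {"match": {f"{field}.plain": {"query": text}}},
            ],
        }
    }


# Hypothetical usage: "title" is only an illustrative field name.
query = dismax_field_and_plain("title", "cat")
```

With `tie_breaker` left at 0.0 only the better of the two matches contributes to the score; raising it lets the weaker match add a fraction of its score.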
Any progress on this?
Jul 23 2021
In the user testing Growth did in Dec 2020, we asked users to rate the different pieces of information for usefulness. The source (labeled "Suggestion reason" in the test UI) was one of the most highly rated, particularly when the source was that the image was used in the same article in another language Wikipedia, as this was easy to understand. This appears to be reflected in the Android MVP data, where ratings were 80% when this was the suggestion reason shown.
Jul 22 2021
In the future, we'd like to be able to filter by source type, e.g. only show Wikidata-based recommendations (which tend to be significantly more reliable). This is not needed for the first iteration, but since otherwise the search index data would probably not need to be reloaded between iterations, it might be easier to deal with it now.
So the easier-to-follow process for doing search tuning is
FWIW here's the code used to train/test the logistic regression model
tl;dr: gathering more labeled data does not look like it will measurably improve the precision of our results, so there's no point in making a big effort to do it
Jul 20 2021
@ArielGlenn Can we close this now?
Jul 19 2021
@egardner is this something the design systems team is already planning?
blocked by https://phabricator.wikimedia.org/T280368
Blocked by https://phabricator.wikimedia.org/T280368
Jul 15 2021
Jul 13 2021
Jul 12 2021
This will get done as part of T280368
Jul 1 2021
So when querying the search API, you guys need to set srqiprofile=empty in the request, then the confidence score can be worked out as described
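For illustration, a rough sketch of building such a request URL with only the standard library. The helper name, the Commons endpoint, and the search term are assumptions for the example; the confidence score calculation itself is not shown here because it is described elsewhere.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(api_url: str, search_terms: str, limit: int = 10) -> str:
    """Return a MediaWiki search API URL with srqiprofile=empty set.

    `api_url` and `search_terms` are supplied by the caller; the values
    used below are illustrative only.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": search_terms,
        "srqiprofile": "empty",  # the setting called for in the comment above
        "srlimit": limit,
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


# Hypothetical usage: endpoint and search term are assumptions.
example_url = build_search_url("https://commons.wikimedia.org/w/api.php", "cat")
# Round-trip the query string to check which parameters would be sent.
sent_params = parse_qs(urlsplit(example_url).query)
print(sent_params["srqiprofile"])
```

The URL can then be fetched with whatever HTTP client is already in use; the sketch stops short of making the request.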
Jun 30 2021
Jun 28 2021
Jun 25 2021
Jun 23 2021
Hmm or maybe not. maintenance/createFileListfromCategoriesAndTemplates.php doesn't handle sub-categories
Jun 22 2021
After some poking around I see that step 1 can be accomplished using the scripts
Jun 21 2021
@Vlad.shapik can this be resolved now?
Jun 16 2021
Turns out the problem was the clock on my phone had drifted! Working fine now, thanks @Urbanecm