Outcome Summary
WikiGapFinder can be accessed through this URL: https://recommend-large.wmflabs.org/?campaign=WikiGapFinder
The following choices were made:
- WikiGapFinder is the campaign name (though wikigap would have been better for monitoring via Content Translation). Passing this campaign name in the URL triggers the WikiGap-specific behavior (i.e. article filtering, descriptive text, language defaults).
- Though we've stuck to a single codebase for GapFinder and the WikiGapFinder campaign, a larger instance was started to help speed up processing and reduce latency for the end user
- Filtering is done by building the list of suggested translations as normal and then applying a final filtering step: Wikidata properties are gathered for each article, and an article is kept only if its instance-of property (P31) is human (Q5) and its sex-or-gender property (P21) is either female (Q6581072) or transgender female (Q1052281).
- Even with this filter, the number of items returned has been large enough to fill the default of 12 results and provide some diversity between searches. Thus, no work had to be done to expand the number of articles initially considered for translation.
- Seed articles can still be used to focus the results -- e.g.:
  - Women scientists (defined as similar to Marie Curie): https://recommend-large.wmflabs.org/?campaign=WikiGapFinder&seed=Marie%20Curie
  - Women activists (defined as similar to Greta Thunberg): https://recommend-large.wmflabs.org/?campaign=WikiGapFinder&seed=Greta%20Thunberg
  - Women artists (defined as similar to Carrie Mae Weems): https://recommend-large.wmflabs.org/?campaign=WikiGapFinder&seed=Carrie%20Mae%20Weems
  - Women politicians (defined as similar to Jacinda Ardern): https://recommend-large.wmflabs.org/?campaign=WikiGapFinder&seed=Jacinda%20Ardern
- TBD:
  - Long-term solution for this endpoint -- i.e. when this campaign is over, how can we continue to support editors looking to create articles about women in their language?
  - Simpler configuration for future campaigns
  - Coalescing of medium instance (https://recommend.wmflabs.org/) and large instance (https://recommend-large.wmflabs.org/) back to a single instance
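The final filtering step described in the choices above can be sketched as follows. This is a minimal illustration, not the deployed GapFinder code: it assumes each candidate's Wikidata entity has already been fetched in the JSON shape returned by the wbgetentities API, and the function names and the 'wikidata_id' key on recommendations are hypothetical.

```python
HUMAN = "Q5"
# Sex-or-gender values accepted by the WikiGap filter (from the summary above).
WOMEN_QIDS = {"Q6581072", "Q1052281"}


def claim_item_ids(entity, prop):
    """Return the set of item IDs stated for `prop` on a Wikidata entity dict."""
    ids = set()
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        # Skip "novalue"/"somevalue" snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            ids.add(snak["datavalue"]["value"]["id"])
    return ids


def is_woman_biography(entity):
    """True if instance-of (P31) includes human and sex-or-gender (P21) matches."""
    is_human = HUMAN in claim_item_ids(entity, "P31")
    is_woman = bool(claim_item_ids(entity, "P21") & WOMEN_QIDS)
    return is_human and is_woman


def filter_recommendations(recommendations, entities):
    """Keep only recommendations whose Wikidata entity passes the filter.

    `recommendations` is a list of dicts with a hypothetical 'wikidata_id' key;
    `entities` maps item IDs to entity dicts. Items with no entity are dropped.
    """
    return [
        rec for rec in recommendations
        if rec["wikidata_id"] in entities
        and is_woman_biography(entities[rec["wikidata_id"]])
    ]


def sample_entity(p31_ids, p21_ids):
    """Build a tiny entity dict in the wbgetentities shape, for illustration."""
    def statements(ids):
        return [
            {"mainsnak": {"snaktype": "value",
                          "datavalue": {"value": {"id": qid}}}}
            for qid in ids
        ]
    return {"claims": {"P31": statements(p31_ids), "P21": statements(p21_ids)}}
```

Because the filter runs after the normal recommendation list is built, it can only shrink that list; the observation above that 12 results are still filled is what makes this post-filtering approach workable without widening the initial query.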
Background
In support of WikiGap, which begins 6 March 2020, we would like to extend the existing GapFinder system to filter the results down to just biographies of women. Other women-related topics may also be of interest, but they are out of scope for this task: the boundaries of that topic are much harder to define, and the existing system can already partially support it (by intelligently choosing seed articles/categories).
Current state
GapFinder does not allow explicit filtering (e.g., based on Wikidata properties or ORES topics). Users can either provide a seed article for which to find similar articles (morelike) or a Wikipedia category to use in filtering. In practice, the former leads to a mix of articles about men and women, and the latter leads to either very few results or very generic article recommendations. For example, Category:20th-century women scientists returns no results, while the more generic Category:Women returns a few, but they are articles on generic topics rather than biographies of specific women.
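For readers unfamiliar with morelike, the sketch below builds a MediaWiki search API request that uses the CirrusSearch morelike: keyword to find articles similar to a seed. It only illustrates the kind of query involved; whether GapFinder issues exactly this request is an assumption, and the function name and defaults are hypothetical.

```python
from urllib.parse import urlencode


def morelike_search_url(seed_title, lang="en", limit=12):
    """Build a search API URL for articles similar to `seed_title`."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"morelike:{seed_title}",  # CirrusSearch similarity keyword
        "srnamespace": 0,                       # article namespace only
        "srlimit": limit,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


url = morelike_search_url("Marie Curie")
```

Nothing in such a query restricts results by gender, which is why a seed like Marie Curie still returns a mix of articles about men and women.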
Possible Endpoints
- Adjust the existing GapFinder endpoint (https://recommend.wmflabs.org/) to allow for filtering within the interface and API. This would reduce the number of endpoints that must be documented and maintained, but the UI may be more difficult to implement, the number of ad-hoc filters could grow over time, and our ability to adjust other aspects of the interface for a given campaign would be restricted.
- Stand up a second endpoint for WikiGap (WikiGapFinder being the suggested name -- major kudos to Eric). This allows for tweaks to the interface as well as additional filtering upfront on the backend. It is much more flexible, though we will want to be careful about building lots of new endpoints because that could make the code much more difficult to maintain.
Filtering for WikiGap
- Filter based on a configuration file that contains a WDQS query: this keeps all the filtering on the back-end and is simple and flexible in that the filtering can be adjusted just by changing a config file as opposed to the core code. It should allow us to maintain consistent codebases across endpoints as well.
- Filter on-the-fly: this approach would allow users to indicate what levels of filtering they want. Presumably we would want to restrict them to a few pre-set properties such as P21 (sex or gender), P27 (country of citizenship), and/or P106 (occupation). The challenge here is that this could clutter the interface, lead to lower numbers of results being returned (or require much larger initial queries) because the filtering is being done after the queries, and many users would find it challenging to use some of these properties -- i.e. they require some knowledge of Wikidata properties/values and especially for occupation could be very misleading given the large number of possible values.
- Not being considered: relying on the existing category filtering. As noted above, this does not work well in practice for the needs of WikiGap.
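The config-file option above could look roughly like the sketch below. The config keys, the $items placeholder, and the function name are all hypothetical; only the property and item IDs come from this task. The query is meant for the Wikidata Query Service, but the sketch stops at building the query string and does not send it.

```python
import json
import re
from string import Template

# Hypothetical campaign config; in practice this would live in its own file.
CAMPAIGN_CONFIG = json.loads("""
{
  "campaign": "WikiGapFinder",
  "wdqs_filter": "SELECT ?item WHERE { VALUES ?item { $items } ?item wdt:P31 wd:Q5 ; wdt:P21 ?gender . VALUES ?gender { wd:Q6581072 wd:Q1052281 } }"
}
""")

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def build_filter_query(config, candidate_qids):
    """Fill the config's SPARQL template with the candidate item IDs."""
    for qid in candidate_qids:
        # Reject anything that is not a plain item ID before it reaches SPARQL.
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    items = " ".join(f"wd:{qid}" for qid in candidate_qids)
    return Template(config["wdqs_filter"]).substitute(items=items)


query = build_filter_query(CAMPAIGN_CONFIG, ["Q7186", "Q42"])
```

Under this design, a future campaign would change only the wdqs_filter string (for example, swapping the P21 constraint for a different property), leaving the core code untouched.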
Possible Interface Adjustments
- Adjust the GapFinder interface text to welcome participants from WikiGap (or give a link back to WikiGap for people who find the interface through other channels)
- Set the campaign ID to be WikiGapFinder so that when users create articles via Content Translation, that information is stored in the edit tags and can be tracked by WikiGap.
- Consider changing the default source/target languages to English -> Swedish given that Wikimedia Sweden is running this campaign. Other language communities will also be taking part, though, so we should not restrict the language choices.