
Extend GapFinder to support WikiGap
Closed, Resolved · Public

Description

Outcome Summary

WikiGapFinder can be accessed through this URL: https://recommend-large.wmflabs.org/?campaign=WikiGapFinder
The following choices were made:

  • WikiGapFinder is the campaign name (though wikigap would have been better for monitoring via Content Translation). This triggers the WikiGap-specific behavior (i.e. article filtering, descriptive text, language defaults)
  • Though we've stuck to a single codebase for GapFinder and the WikiGapFinder campaign, a larger instance was started to help speed up processing and reduce latency for the end user
  • Filtering is done by building the list of suggested translations as normal and then applying a final filtering step: Wikidata claims are gathered for each article and checked so that the instance-of property (P31) is human (Q5) and the sex-or-gender property (P21) is either woman (Q6581072) or transgender female (Q1052281).
  • Even with this filter, the number of items returned has been large enough to fill in the default of 12 results and provide some diversity between searches. Thus, no work had to be done to expand the number of articles initially considered for translation.
  • Seed articles can still be used to focus the results -- e.g.:
  • TBD:
    • Long-term solution for this endpoint -- i.e. when this campaign is over, how can we continue to support editors looking to create articles about women in their language?
    • Simpler configuration for future campaigns
    • Coalescing of medium instance (https://recommend.wmflabs.org/) and large instance (https://recommend-large.wmflabs.org/) back to a single instance
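The final filtering step described above can be sketched as follows. This is an illustrative reconstruction, not the actual recommendation-api code: the function names (`is_wikigap_item`, `filter_by_campaign`) and the batching details are assumptions, though the Wikidata `wbgetentities` API and the claim structure it returns are real.

```python
import json
import urllib.parse
import urllib.request

# Sex-or-gender (P21) values accepted by the WikiGap filter:
# woman (Q6581072) and transgender female (Q1052281).
WIKIGAP_GENDERS = {"Q6581072", "Q1052281"}

def is_wikigap_item(claims):
    """Return True if an item's claims match the WikiGap criteria."""
    def values(prop):
        return {
            c["mainsnak"]["datavalue"]["value"]["id"]
            for c in claims.get(prop, [])
            if c["mainsnak"].get("datavalue")
        }
    # instance-of (P31) must include human (Q5), and P21 must
    # intersect the accepted gender identities.
    return "Q5" in values("P31") and bool(values("P21") & WIKIGAP_GENDERS)

def filter_by_campaign(qids):
    """Fetch claims for candidate items and keep only WikiGap matches."""
    params = urllib.parse.urlencode({
        "action": "wbgetentities",
        "ids": "|".join(qids),  # wbgetentities accepts up to 50 IDs per call
        "props": "claims",
        "format": "json",
    })
    with urllib.request.urlopen(
            "https://www.wikidata.org/w/api.php?" + params) as f:
        resp = json.load(f)
    return [qid for qid, entity in resp.get("entities", {}).items()
            if is_wikigap_item(entity.get("claims", {}))]
```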

Background

In support of WikiGap, which begins 6 March 2020, we would like to extend the existing GapFinder system to filter down the results to just biographies of women. It is possible that other women-related topics would be of interest but that is out-of-scope for this task given that it is much harder to define the boundaries of that topic and the existing system can already partially support that task (by intelligently choosing seed articles/categories).

Current state

GapFinder does not allow for explicit filtering -- e.g., based on Wikidata properties or ORES topics. Users can either provide a seed article for which to find similar articles (morelike) or a Wikipedia category to use in filtering, but in practice the former leads to a mix of articles about men and women and the latter leads to either very low numbers of results or very generic recommendations. For example, Category:20th-century women scientists returns no results, while the broader Category:Women returns a few, but those are generic articles about women as a topic rather than biographies of specific women.

Possible Endpoints

  • Adjust the existing GapFinder endpoint (https://recommend.wmflabs.org/) to allow for filtering within the interface and API. This would reduce the number of endpoints that must be documented and maintained, but the UI may be more difficult to implement, it could lead to a growing number of ad-hoc filters over time, and it restricts our ability to adjust other aspects of the interface for a given campaign.
  • Stand up a second endpoint for WikiGap (WikiGapFinder being the suggested name -- major kudos to Eric). This allows for tweaks to the interface as well as doing additional filtering upfront on the backend. It is much more flexible, though we will want to be careful about building lots of new endpoints because that could make maintaining the code much more difficult.

Filtering for WikiGap

  • Filter based on a configuration file that contains a WDQS query: this keeps all the filtering on the back-end and is simple and flexible in that the filtering can be adjusted just by changing a config file as opposed to the core code. It should allow us to maintain consistent codebases across endpoints as well.
  • Filter on-the-fly: this approach would allow users to indicate what levels of filtering they want. Presumably we would restrict them to a few pre-set properties such as P21 (sex or gender), P27 (country of citizenship), and/or P106 (occupation). The challenges: this could clutter the interface; it could lower the number of results returned (or require much larger initial queries) because the filtering happens after the queries; and many users would find these properties hard to use, since they require some knowledge of Wikidata properties and values, and occupation in particular could be very misleading given the large number of possible values.
  • Not being considered: relying on the existing category filtering. As noted above, this does not work well in practice for the needs of WikiGap.
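To make the first (config-file) option above concrete, here is a sketch of what a campaign configuration carrying a WDQS query might look like. The config structure and function name are hypothetical -- GapFinder's actual configuration format may differ -- but the SPARQL itself matches the filtering criteria discussed in this task.

```python
# Hypothetical campaign config: all filtering logic lives in data,
# not in the core code, so a new campaign only needs a new entry here.
CAMPAIGN_CONFIG = {
    "WikiGapFinder": {
        # WDQS query selecting items that are human (P31:Q5)
        # and woman (P21:Q6581072).
        "wdqs_query": """
            SELECT ?item WHERE {
              ?item wdt:P31 wd:Q5 ;
                    wdt:P21 wd:Q6581072 .
            }
        """,
    },
}

def get_campaign_filter(campaign):
    """Return the WDQS filter query for a campaign, or None if the
    campaign is unknown (i.e. no extra filtering should be applied)."""
    cfg = CAMPAIGN_CONFIG.get(campaign)
    return cfg["wdqs_query"] if cfg else None
```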

Possible Interface Adjustments

  • Adjust the GapFinder interface text to welcome participants from WikiGap (or give a link back to WikiGap for people who find the interface through other channels)
  • Set the campaign ID to be WikiGapFinder so that when users create articles via Content Translation, that information is stored in the edit tags and can be tracked by WikiGap.
  • Consider changing the default source/target languages to English -> Swedish given that Wikimedia Sweden is running this campaign. Other language communities will be participating as well, though, so we should not restrict the language choices.

Event Timeline

Change 571110 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[research/recommendation-api@master] Add welcome message and default language pair

https://gerrit.wikimedia.org/r/571110

@Isaac this is how the new interface will look on page load. Let me know if you want me to change anything in it.

scrot-area-2020-02-09_22:52:56.png (971×1 px, 449 KB)

this is how the new interface will look on page load. Let me know if you want me to change anything in it.

Only change I'd suggest at this point would be to also include a "For more information, see here: https://meta.wikimedia.org/wiki/WikiGap" as subtext to "Welcome to the 'WikiGapFinder' campaign!"

@Eric_Luth_WMSE should also feel free to make any requests regarding the specific text to include

Change 571110 merged by jenkins-bot:
[research/recommendation-api@master] Add welcome message and default language pair

https://gerrit.wikimedia.org/r/571110

@Isaac thanks. I made the change (which will be deployed once we're done with the remaining parts).

On another note, for this iteration, presumably we want to filter recommendations on the fly (2nd option in the task description) because it seems the fastest to implement. As I see it, no UI change will happen, and in the back-end we'll detect the campaign and filter the results accordingly. If this approach sounds good to you, then can you let me know the full set of restrictions (Wikidata properties) that we'll be implementing? Thanks.

thanks. I made the change (which will be deployed once we're done with the remaining parts).

Excellent, thanks!

On another note, for this iteration, presumably we want to filter recommendations on the fly (2nd option in the task description) because it seems the fastest to implement. As I see it, no UI change will happen, and in the back-end we'll detect the campaign and filter the results accordingly.

Yeah, I would agree that filtering on the fly (if it works) seems to make the most sense in this instance given the timeline. I am slightly worried that we won't end up with enough results in some cases, especially if we make the filters more restrictive. In my quick scan, it looks like the initial query to Search can request up to 500 articles and then all the filtering is applied to that set. It seems like if the post-hoc filtering that you are suggesting doesn't return enough results, we would have to consider making queries to WDQS instead?

If this approach sounds good to you, then can you let me know the full set of restrictions (Wikidata properties) that we'll be implementing? Thanks.

Right now, I think we want to just restrict to Wikidata items that have both: instance-of:human (P31:Q5) and sex-or-gender:female (P21:Q6581072). This matches the criteria that WikiGap suggests in their current guide for identifying articles to translate (the occupation criteria is just an example to help personalize to one's interests): https://meta.wikimedia.org/wiki/WikiGap/Wikidata_guide#Prepare_the_source_code

It seems like if the post-hoc filtering that you are suggesting doesn't return enough results, we would have to consider making queries to WDQS instead?

Probably. I'll see what the code is doing and let you know.

Probably. I'll see what the code is doing and let you know.

Ok, thanks -- that would obviously be a bigger rewrite and presumably would add latency too, so I'm still on board with trying post-hoc filtering first. There might also be a smart way to re-query the Search API (e.g., using articles from the first batch as seeds) to add more results if the first query does not provide 12 results after filtering.
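The re-query idea could look something like the sketch below: if post-hoc filtering leaves fewer than the 12 results the UI needs, issue follow-up morelike queries seeded with articles from the first batch. `search_candidates` and `apply_filters` are hypothetical stand-ins for the real pipeline steps, passed in here so the sketch stays self-contained.

```python
def recommend(seed, search_candidates, apply_filters, needed=12):
    """Top up filtered results by re-querying with first-batch articles
    as seeds until `needed` results are available (or seeds run out)."""
    results = apply_filters(search_candidates(seed))
    # Re-use surviving results from the first batch as new seeds.
    for extra_seed in list(results):
        if len(results) >= needed:
            break
        more = apply_filters(search_candidates(extra_seed))
        # Avoid duplicates across batches.
        results.extend(r for r in more if r not in results)
    return results[:needed]
```

The trade-off discussed above applies: each extra round trip adds latency, so this would only fire when the first query comes up short.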

Change 572722 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[research/recommendation-api@master] WikiGapFinder campaign: show women on page load

https://gerrit.wikimedia.org/r/572722

Change 572722 merged by Bmansurov:
[research/recommendation-api@master] WikiGapFinder campaign: show women on page load

https://gerrit.wikimedia.org/r/572722

Change 572753 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[research/recommendation-api@master] Make adjustments for code to run on labs

https://gerrit.wikimedia.org/r/572753

Change 572753 merged by Bmansurov:
[research/recommendation-api@master] Make adjustments for code to run on labs

https://gerrit.wikimedia.org/r/572753

@Isaac The labs instance has been updated with the latest code. The WikiGapFinder campaign users will see articles about women on the landing page. Subsequent searches will also filter out articles that are not about women. Below are some of the before and after screenshots.

Before (landing page)

scrot-area-2020-02-17_17:44:47.png (836×1 px, 362 KB)

After (landing page)

scrot-area-2020-02-17_18:57:10.png (982×1 px, 679 KB)

Before (searching for "Michelle Obama")

scrot-area-2020-02-17_17:45:00.png (893×1 px, 704 KB)

After (searching for "Michelle Obama")

scrot-area-2020-02-17_18:57:58.png (983×1 px, 787 KB)

Obviously, searching for something that's not related to women may not yield any results for this campaign.

Search is not fast and there's a noticeable delay before the user sees any results. One thing I'd suggest is to create a new labs instance that's bigger than the current instance (m1.medium) before the campaign starts. We can later turn off the larger instance and switch back to the current instance.

Let me know if I missed anything or if you want me to add other features.

@bmansurov When I switched the language (by choosing español instead of svenska), it surfaced non-WikiGap articles.

Nevermind, I purged my cache and it appears to be working now.

@bmansurov will you be able to track the number of uses during the campaign? It would be interesting to get the stats on effectiveness and usage.

@Astinson if I'm not mistaken the campaigns are already being tracked with EventLogging and also Content Translation. I can look into it if you need the details.

@bmansurov this is fantastic! Getting a larger instance makes sense to me given that we might expect spikes in traffic during events. Let me know if you would like any support there.

And a few other questions / comments:

  • Thinking about the increased delay for search results, do we know what is causing it? I suspect it's the addition of the claims parameter in the Wikidata query (as opposed to any additional processing that is going on). A few thoughts on how to reduce this burden:
    • Right now these claims seem to be fetched twice: once for checking if the article is missing in the target language (filter_by_missing) and once when the filtering down to women is done for the campaign (filter_by_campaign). The first call (filter_by_missing) could be adjusted to only request sitelinks and the second call (filter_by_campaign) only claims...or I suppose find a way to do a single call for both and store the results? We'll probably want to adjust the properties requested for get_wikidata_items_from_titles and get_titles_from_wikidata_items too.
    • If even a single call is painfully slow, would it be faster to store a list of Wikidata IDs that match the campaign criteria somewhere? I hesitate to go this route though because I assume that either requires setting up a specific endpoint for WikiGapFinder or another endpoint that acts as a simplified Wikidata API that only returns the claims we care about and no additional metadata.
  • The days parameter in the code here is doubled but that would only seem to change the date for which pageview data is being requested (from two days ago to four days ago) and not the number of articles. Wouldn't you need to change the max_candidates parameter here to increase the number of possible articles from 500 (to, I believe, a max of 1000)? I'm not sure I advocate for increasing to 1000 right now though, given the additional time that would presumably require for gathering Wikidata properties.
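The prop-splitting idea in the first bullet can be sketched like this. The helper name is hypothetical, but the `props` parameter is a real `wbgetentities` feature: requesting only `sitelinks` or only `claims` keeps each response to the data the respective filter actually needs, instead of fetching full entities twice.

```python
def build_wikidata_params(qids, props):
    """Build wbgetentities query parameters requesting only `props`.
    `props` is a wbgetentities value such as "sitelinks" or "claims"."""
    return {
        "action": "wbgetentities",
        "ids": "|".join(qids),
        "props": props,  # request only what the caller needs
        "format": "json",
    }

# filter_by_missing only needs sitelinks to check the target language...
missing_params = build_wikidata_params(["Q7186"], "sitelinks")
# ...while filter_by_campaign only needs claims for the P31/P21 check.
campaign_params = build_wikidata_params(["Q7186"], "claims")
```

As noted in the thread, whether this actually saves time depends on API-side caching, so it would need profiling before being adopted.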

if I'm not mistaken the campaigns are already being tracked with EventLogging and also Content Translation

Yep, a few additional details:

And a few other questions / comments:

  • Thinking about the increased delay for search results, do we know what is causing it? I suspect it's the addition of the claims parameter in the Wikidata query (as opposed to any additional processing that is going on).

Yes, that's true. Also, the site was already slow. I did some quick profiling; here's the data.

process                                time in seconds
Request processed                      5.425361
_ Getting most popular candidates      0.242289
_ Applying filters                     5.002784
___ Filtering by title                 0.000399
___ Filtering by missing               3.791824
___ Filtering by disambiguation        0.188154
___ Filtering by campaign              1.021533
  • Right now these claims seem to be fetched twice: once for checking if the article is missing in the target language (filter_by_missing) and once when the filtering down to women is done for the campaign (filter_by_campaign).

Not quite. After each filtering, the number of candidates potentially reduces, and subsequent filters will fetch less data. But, yes, we're fetching some of the data twice.

The first call (filter_by_missing) could be adjusted to only request sitelinks > and the second call (filter_by_campaign) only claims...

Sure, we can play around with these. Requesting specific items doesn't always mean faster response times, though. Maybe the MW API is fastest because of how it can cache results: if we request new properties, then we won't hit the cache. Or maybe that's not how it works. I'll have to profile everything properly before making any modifications to the code.

or I suppose find a way to do a single call for both and store the results? We'll probably want to adjust the properties requested for get_wikidata_items_from_titles and get_titles_from_wikidata_items too.

Yes, this may work too. I'll keep this in mind.

  • If even a single call is painfully slow, would it be faster to store a list of Wikidata IDs that match the campaign criteria somewhere? I hesitate to go this route though because I assume that either requires setting up a specific endpoint for WikiGapFinder or another endpoint that acts as a simplified Wikidata API that only returns the claims we care about and no additional metadata.

Do you mean just IDs or other info that we need? It's probably not worth going this route, as your intuition suggests.

  • The days parameter in the code here is doubled but that would only seem to change the date for which pageview data is being requested (from two days ago to four days ago) and not the number of articles.

Yes, you're right. Doubling the number of days doesn't change the output count. I should have checked this before implementing. I think I'll remove the doubling logic in the next iteration. We seem to be getting enough results.

Wouldn't you need to change the max_candidates parameter here to increase the number of possible articles from 500 (to, I believe, a max of 1000)?

I don't think that 500 is related to this query.

I'll try to find the exact bottlenecks and change code accordingly soon.

Thanks for the details @bmansurov ! Broadly that makes sense (I hadn't thought about the caching aspect)

Do you mean just IDs or other info that we need? It's probably not worth going this route, as your intuition suggests.

Yeah, I was just thinking a list of acceptable Wikidata IDs -- e.g., the result of a WDQS query that is stored in config -- and then just checking to see if a given item in a result set is in that set of acceptable IDs or not (as opposed to calling the wikidata API to get claims data). But I also agree that we shouldn't go down this route right now.

Yes, you're right. Doubling the number of days doesn't change the output count. I should have checked this before implementing. I think I'll remove the doubling logic in the next iteration. We seem to be getting enough results.

Sounds good.

I'll try to find the exact bottlenecks and change code accordingly soon.

Great, thanks!

Coding is not really my strength. When would you consider this tool ready to "launch"?

Coding is not really my strength. When would you consider this tool ready to "launch"?

I'll leave this decision to @bmansurov. My personal evaluation is that this version of WikiGapFinder is technically sound (does what we expect it to do) but we still need another week ideally to reduce how much time it takes to provide the recommendations and test more completely. Is that okay?

Also, @Eric_Luth_WMSE while I have your attention, could I get verification of a few things:

  • Right now we are only filtering down to women (as in the Wikidata list example given on WikiGap's page). Is that correct? We can also expand it to other gender identities if desired -- we would just need a list of identities to be included (see one-of constraint here for Wikidata's full list of possible gender identities).
  • We are using the campaign WikiGapFinder to tag articles translated when using this tool -- i.e. we pass this tag to Content Translation, which will then record it as an edit tag when the article is created. Is that okay? Will you be able to capture this through your dashboards still?
  • And ignoring the latency issue for the moment (you'll have to wait ~10 seconds to get results), you should verify that https://recommend.wmflabs.org/?campaign=WikiGapFinder works as you expect it to. I'll note that because sorting is done by pageview counts, entertainers tend to show up high in the results. If you also search for a woman -- e.g., Marie Curie -- in the "Search for articles to create" field, then that will bring in results of women generally whose occupation is similar -- i.e. physicists/chemists. If you like that type of result better, we can also provide links that start the user off with one of those searches. For example: https://recommend.wmflabs.org/?campaign=WikiGapFinder&seed=Marie%20Curie. We should be able to also set the source/target languages via the URL, but in my testing, that does not update the interface right now so I would not suggest doing so until we determine that that bug can be easily fixed.

Ok, thanks for that! I just need to know when/if I can mention it to the local organizers.

On your points:

  • I could see a point in adding e.g. transgender female (Q1052281), which would make it "everyone identifying as female" rather than just women. Would that be difficult, or would it make it take longer to provide recommendations?
  • I'll look this up, and ping @Alicia_Fagerving_WMSE in the meanwhile.
  • I showed it to a few colleagues and the coordinator at the ministry of foreign affairs. They all liked it very much but had the same comment: there are lots of entertainers. At the same time, it is quite easy to see that you can narrow the results down by adding keywords. One spontaneous thought: could the text in the search box be changed to clarify that you can use it to narrow down the search? Like "Enter keywords here for a refined search" or something?

Change 574256 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[research/recommendation-api@master] Update labs_setup.sh

https://gerrit.wikimedia.org/r/574256

Change 574256 merged by Bmansurov:
[research/recommendation-api@master] Update labs_setup.sh

https://gerrit.wikimedia.org/r/574256

Change 574260 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[research/recommendation-api@master] WikiGapFinder: Remove the day multiplier

https://gerrit.wikimedia.org/r/574260

Change 574260 merged by jenkins-bot:
[research/recommendation-api@master] WikiGapFinder: Remove the day multiplier

https://gerrit.wikimedia.org/r/574260

Change 574262 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[research/recommendation-api@master] Speed up Wikidata lookup

https://gerrit.wikimedia.org/r/574262

Change 574262 merged by Bmansurov:
[research/recommendation-api@master] Speed up Wikidata lookup

https://gerrit.wikimedia.org/r/574262

@Isaac I've made some changes and created a large labs instance. Here it is: http://recommend-large.wmflabs.org/?campaign=WikiGapFinder

You can still access the old instance at http://recommend.wmflabs.org/?campaign=WikiGapFinder

The new instance is not fast enough for my liking, but it's much faster than the regular instance. Maybe it's wise to use the large instance for this campaign as it's not announced anywhere (except here).

I think we can squeeze more performance out, but I wonder if that's worth the time invested. Also, let me know if I missed anything else.

  • We are using the campaign WikiGapFinder to tag articles translated when using this tool -- i.e. we pass this tag to Content Translation, which will then record it as an edit tag when the article is created. Is that okay? Will you be able to capture this through your dashboards still?

I don't know if we have that tag set up at this point; @pau and @Amire80 may need to do something on the translation tool to make sure it's working. You should be able to track those results in the dashboard if they are reported (i.e. the editors are registered, and/or the wikis edited are being tracked in a P&E Dashboard event). @Eric_Luth_WMSE

I've made some changes and created a large labs instance. Here it is: http://recommend-large.wmflabs.org/?campaign=WikiGapFinder

Yeah, that is noticeably faster. Thanks for setting that up. I'm not sure I fully understand why it's so much faster as I assumed that the API calls were the main issue but I'm definitely glad to see the improvement!

The new instance is not fast enough for my liking, but it's much faster than the regular instance. Maybe it's wise to use the large instance for this campaign as it's not announced anywhere (except here).

Agreed -- there's one tiny issue I want to work out with source/target languages (see below) but then I'm comfortable broadcasting the link.

I think we can squeeze more performance out, but I wonder if that's worth the time invested. Also, let me know if I missed anything else.

  • Thanks @bmansurov -- I'd agree that at this point there aren't any obvious ways to speed up performance and it's at a threshold that is workable.
  • Per Eric's comments above, I think we will be extending the list of acceptable Wikidata values for sex-or-gender here to include at least transgender female (Q1052281). I wanted to check on one other though: @Eric_Luth_WMSE the other sex-or-gender value that sees a good bit of usage is non-binary (Q48270). Is that part of WikiGap as well and can it be included?
  • I noticed that the English -> Swedish default for this campaign overrides s= and t= parameters that are passed in the URL. Is it easy to fix this so that we can override this English -> Swedish default in the URLs we pass around if need be?
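The precedence fix requested in the last bullet (and later merged as change 574727) amounts to letting explicit s=/t= query parameters win over campaign defaults. A minimal sketch, assuming a dict of parsed query args and a hypothetical campaign-defaults table:

```python
# Illustrative defaults table; the real values for WikiGapFinder are
# English source -> Swedish target, as discussed in this task.
CAMPAIGN_DEFAULTS = {"WikiGapFinder": {"s": "en", "t": "sv"}}

def resolve_languages(args, campaign=None):
    """Resolve source/target languages: explicit query parameters take
    precedence over campaign defaults, which take precedence over a
    plain English-source fallback."""
    defaults = CAMPAIGN_DEFAULTS.get(campaign, {})
    source = args.get("s") or defaults.get("s", "en")
    target = args.get("t") or defaults.get("t")
    return source, target
```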

Change 574724 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[research/recommendation-api@master] WikiGapFinder: include transgender females when filtering

https://gerrit.wikimedia.org/r/574724

Change 574724 merged by jenkins-bot:
[research/recommendation-api@master] WikiGapFinder: include transgender females when filtering

https://gerrit.wikimedia.org/r/574724

Change 574727 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[research/recommendation-api@master] Give precedence to language query parameters over campaign

https://gerrit.wikimedia.org/r/574727

Change 574727 merged by jenkins-bot:
[research/recommendation-api@master] Give precedence to language query parameters over campaign

https://gerrit.wikimedia.org/r/574727

  • Per Eric's comments above, I think we will be extending the list of acceptable Wikidata values for sex-or-gender here to include at least transgender female (Q1052281).

Done

  • I noticed that the English -> Swedish default for this campaign overrides s= and t= parameters that are passed in the URL. Is it easy to fix this so that we can override this English -> Swedish default in the URLs we pass around if need be?

Done

Thanks for the updates @bmansurov -- code makes sense. I don't see the language overrides on my end so I assume you're waiting to push to the server until we get verification of the other lingering questions.

In case I'm doing something wrong, example: https://recommend-large.wmflabs.org/?campaign=WikiGapFinder&t=de still gives me English -> Swedish even with a full refresh (cmd+shift+R on mac) but https://recommend-large.wmflabs.org/?t=de appropriately makes German the target.

Collecting the lingering questions / comments / etc.:

I showed it to a few colleagues and the coordinator at the ministry of foreign affairs. They all liked it very much but had the same comment: there are lots of entertainers. At the same time, it is quite easy to see that you can narrow the results down by adding keywords. One spontaneous thought: could the text in the search box be changed to clarify that you can use it to narrow down the search? Like "Enter keywords here for a refined search" or something?

@Eric_Luth_WMSE : unfortunately that's actually a pretty painful change to make. Text like "Search for articles to create" is defined early on so that we can get translations through translatewiki for other languages. For that particular phrase, we have 76 languages available, which really helps the interface make sense (and be inclusive) to users outside of English. The easier approach given that the campaigns are starting in about a week or so is to provide some example queries. For instance:

Regarding customization of links, we can also set the source and target languages. For instance, the above links can also have the source (what language do we use to find articles to be created) and target (what language will the articles be created in) set. To set the source, just add &s=<langcode> to the URL, so &s=de would set the source to be German Wikipedia. The same goes for target, but it's &t=<langcode>. Just one caveat: if you change the source language so it isn't English, we would potentially need to update the seed parameter in case the article has a different name -- e.g., the Marie Curie article is titled "Maria Skłodowska-Curie" in Polish, so if Polish is the source, we'd have to use this URL: https://recommend-large.wmflabs.org/?campaign=WikiGapFinder&seed=Maria%20Skłodowska-Curie&s=pl

Per Eric's comments above, I think we will be extending the list of acceptable Wikidata values for sex-or-gender here to include at least transgender female (Q1052281). I wanted to check on one other though: @Eric_Luth_WMSE the other sex-or-gender value that sees a good bit of usage is non-binary (Q48270). Is that part of WikiGap as well and can it be included?

Just let us know either way.

I don't know if we have that tag set up at this point; @pau and @Amire80 may need to do something on the translation tool to make sure it's working. You should be able to track those results in the dashboard if they are reported (i.e. the editors are registered, and/or the wikis edited are being tracked in a P&E Dashboard event).

This is still lingering. At this point, I will assume that WikiGapFinder as the campaign tag is acceptable. All articles created through WikiGapFinder (and by extension ContentTranslation) are obviously attributed to the editor so WikiGap should be able to track them if the editor is known to be linked to the campaign. I'll leave it up to you whether you want to work with the ContentTranslation folks to specifically have a dashboard for usage of WikiGapFinder but we can also calculate statistics after the fact by filtering on edit tags if that is ever desired.

I don't see the language overrides on my end so I assume you're waiting to push to the server until we get verification of the other lingering questions.

I just mentioned this but, for completeness, I'm including it again.

Thanks for the updates @bmansurov -- code makes sense. I don't see the language overrides on my end so I assume you're waiting to push to the server until we get verification of the other lingering questions.

In case I'm doing something wrong, example: https://recommend-large.wmflabs.org/?campaign=WikiGapFinder&t=de still gives me English -> Swedish even with a full refresh (cmd+shift+R on mac) but https://recommend-large.wmflabs.org/?t=de appropriately makes German the target.

Sorry, I forgot to update the server. It should be all good now.

@Isaac : Understood. What is possible in terms of adding info? I'm wondering whether it is possible to add a small fact box or something like that to explain what the tool is and the background. If that is hard to do, we could do it on a dedicated page for the tool on Meta, but having it in the tool itself would perhaps make it easier to understand for someone coming to the page from elsewhere.
The same goes for the word "WikiGap Finder campaign"; is the word "campaign" fixed here?

Thanks for all your impressive work, I think this will be a great add-on!

when it comes to the hashtag question, I am probably a bit too unaccustomed to Dashboard to understand how it would be integrated. I know how to search for a hashtag through Quarry, but can I see hashtags in Dashboard? Is there any documentation of how that works?
(@Alicia_Fagerving_WMSE @Ragesoss )

when it comes to the hashtag question, I am probably a bit too unaccustomed to Dashboard to understand how it would be integrated. I know how to search for a hashtag through Quarry, but can I see hashtags in Dashboard? Is there any documentation of how that works?
(@Alicia_Fagerving_WMSE @Ragesoss )

P&E Dashboard doesn't have any support for hashtags yet. It's a good idea and one I'm interested in adding at some point, but it's not on the near-term roadmap at this time. (If a developer is interested in working on this feature, I'd be happy to provide guidance.)

Understood. What is possible in terms of adding info? I'm thinking if it is possible to add a small fact box or something like that to explain what it is and the background... [removed content] The same goes for the word "WikiGap Finder campaign"; is the word "campaign" fixed here?

@Eric_Luth_WMSE yeah, we can adjust the phrasing / links etc. for Welcome to the 'WikiGapFinder' campaign! For more information, see here: https://meta.wikimedia.org/wiki/WikiGap. We have tried to keep this re-usable for other campaigns (right now the template is Welcome to the <campaign name> campaign. For more information, see here: <link>) but if you would like something different, we can look into updating it. Just provide the text and we'll do our best to see how to incorporate it. @bmansurov feel free to overrule me if that's more work than I'm expecting.

Bummer but hopefully not a blocker. As I said, if anyone wants to do the analyses about how many articles specifically came through this tool, the tag will be saved with edits so it'll always be possible even if it can't be immediately incorporated into the existing dashboards.

@Astinson can you please clarify what you need here? Shall we just configure a Content Translation campaign id (also known as "cta"), as we did for Wiki for Human Rights? Or something else?

Ok, @bmansurov thanks for your patience. A final request was put in to add some more descriptive text for the campaign. I know this would require changing the templating for the campaign taglines -- perhaps just to accept a block of HTML? Let me know if this is a larger change than I expect. The text is longer than is ideal, but it's good content. If you think it's too much and it's easy to make it dismissible (like the Privacy Policy info), that would be another option too. See below (with a screenshot of how it looked on my laptop when I inserted the text):

<p>
There are four times as many articles about men as there are about women. The figures vary regionally, but no matter how you look at it, the picture is clear: the information about women is less extensive than that about men, regardless of which language version of Wikipedia you read. We want to change this. WikiGap is an initiative that encourages people around the world to add more content to Wikipedia about women figures, experts and role models in various fields. Read more about WikiGap here: <a href="https://meta.wikimedia.org/wiki/WikiGap">https://meta.wikimedia.org/wiki/WikiGap</a>
</p>
<p>
The WikiGapFinder helps you discover articles about women that exist in one language but are missing in another. Start by selecting a source language and a target language. WikiGapFinder will find trending articles about women in the source that are missing in the target. If you are interested in a particular topic area, provide a seed article in the source language, and WikiGapFinder will find related articles missing in the target. Click on a card to take a closer look at a missing article to see if you would like to create it from scratch or translate it.
</p>

Screen Shot 2020-02-28 at 7.22.17 PM.png (1×2 px, 1 MB)

I also want to document that these requests were made (along with my reasons for pushing them off):

  • Replace the Wikipedia GapFinder Beta branding, which appears for all campaigns, with this campaign's name (otherwise the name overlap can be somewhat confusing)
    • Even though this is slightly confusing, I asked that we leave this in place as it helps to maintain a consistent identity for the general GapFinder tool.
  • Remove the word "campaign" from the "Welcome" message so that this portal can be used even after this particular WikiGap campaign is over:
    • I think this is probably part of a longer discussion. We can pledge to maintain this portal until this summer but then I want the flexibility to make larger changes if we continue to do research / make improvements in this space. At that point, we can discuss how to make this tool available more long-term.
  • I also want to document that when the campaign starts, I will plan to monitor https://meta.wikimedia.org/wiki/Research_talk:Increasing_article_coverage/Tool as that is where feedback is directed from the tool.

Once the above text is incorporated, I plan to close this task and open new ones for the remaining requests that will be put on backlog.

Change 575747 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[research/recommendation-api@master] WikiGapFinder: add campaign info

https://gerrit.wikimedia.org/r/575747

Change 575747 merged by jenkins-bot:
[research/recommendation-api@master] WikiGapFinder: add campaign info

https://gerrit.wikimedia.org/r/575747

Thanks so much! I think it starts to look great.
I went over the text again; to clarify a few things, would it be possible to change the second paragraph to the text below? Specifically, it now mentions that it is possible to search on names (as you mentioned).

"The WikiGapFinder helps you discover articles about women that exist in one language but are missing in another.
Start by selecting a source language and a target language. WikiGapFinder will find trending articles about women in the source that are missing in the target.
If you are interested in writing about women in a particular field, name the field or a woman in that field in the search bar.
Click on a card to take a closer look at a missing article to see if you would like to create it from scratch or translate it."

Please also tell me when we can communicate about the campaign!

Looks great -- thanks @bmansurov ! If you get a chance, updating the second paragraph of the description as noted above would be appreciated. I'll close this task out then.

@Eric_Luth_WMSE I think we are ready to share out then. This is the general URL to provide: http://recommend-large.wmflabs.org/?campaign=WikiGapFinder

If you want to provide language-specific URLs, you can do that by changing the t= URL parameter (s= if you want to change the source wiki). Just make sure campaign=WikiGapFinder is always included.
e.g., English -> Japanese: http://recommend-large.wmflabs.org/?campaign=WikiGapFinder&t=ja
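The URL scheme described above can be sketched with a small helper (the parameter names campaign, s, t, and seed come from the comments in this thread; the helper itself is illustrative):

```python
from urllib.parse import urlencode

BASE_URL = "https://recommend-large.wmflabs.org/"

def wikigapfinder_url(source=None, target=None, seed=None):
    """Build a WikiGapFinder link; campaign=WikiGapFinder is always included."""
    params = {"campaign": "WikiGapFinder"}
    if source:
        params["s"] = source   # source wiki language code
    if target:
        params["t"] = target   # target wiki language code
    if seed:
        params["seed"] = seed  # seed article to focus the results
    return BASE_URL + "?" + urlencode(params)

# English -> Japanese, matching the example above:
print(wikigapfinder_url(target="ja"))
# https://recommend-large.wmflabs.org/?campaign=WikiGapFinder&t=ja
```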

If you want to give starting examples of using other women to focus the results, here are a few quick ones (seed= parameter):

Change 576173 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[research/recommendation-api@master] WikiGapFinder: update campaign info

https://gerrit.wikimedia.org/r/576173

Change 576173 merged by jenkins-bot:
[research/recommendation-api@master] WikiGapFinder: update campaign info

https://gerrit.wikimedia.org/r/576173

The campaign info has been updated.

@Isaac and @bmansurov -- thank you so much, this is awesome!

Isaac claimed this task.

Yes, thanks @bmansurov !! Diff looks good, I'm just not seeing it reflected on https://recommend-large.wmflabs.org/?campaign=WikiGapFinder yet so if you could push it out, that'd be appreciated.

I'll close out the task as complete though.

@Isaac, can you clear your cache? This is what I see:

scrot-area-2020-03-03_19:31:19.png (984×1 px, 283 KB)

Oh I see. The new text should be live now.