Get search traffic breakdown for emerging language wikis
Closed, Resolved (Public)

Description

As a product manager, I want to understand the breakdown of user search traffic on emerging language Wikipedias, so I can estimate the scale of impact of the features planned as part of the special:search experiments.

Background and context

The Search and Structured Data teams plan to improve the special:search experience on emerging language wikis, which generally have less content and fewer articles than the bigger Wikipedias.
Use case: If there is no exact article match in the go bar, the reader is redirected to the special search page. Improve the user experience on the special search page by showing more/relevant content.
Users: Casual readers on emerging language wikipedias.

Previous analysis: There was an analysis in 2017 that produced the following diagram of results:

[Image: search-traffic-plot.png]

To understand the volume of potential impact, we would like to answer the following high-level questions: How many users end up on the special search page because there is no exact article match for what they are looking for? Does that happen often?

To further establish baselines of search performance, we would like to understand search engagement once users reach the special search page (click-through rate, etc.).

Requirements

Examples of questions we would like to answer:

  • Total search volume per wiki: What is the total number of searches in the go bar?
  • Autocomplete only
  • Go bar-to-special:search volume per wiki:
    • What is the amount/% of searches initiated in the go-bar that end up on the special page?
    • What is the amount/% of users that get redirected to the special search page after doing a go bar search?
    • What amount/percentage of queries that get redirected to special:search had no autocomplete suggestions?
    • What amount/percentage of queries that have no autocomplete suggestions also have zero full text search results (i.e. 0 autocomplete suggestions followed by 0 special:search results)? Inverse: what amount/percentage of queries with no autocomplete suggestions do have results in special:search?
  • Are users clicking through on special:search results, especially when they are redirected there and there are no autocomplete suggestions?
  • Search engagement: TBD

Languages of interest
We are interested in the following emerging languages for the search experimentations:

Priority 1:

  • Arabic
  • Bengali*
  • Spanish
  • Portuguese*
  • Russian

Priority 2:

  • French*
  • Korean*
  • Indonesian
  • Ukrainian
  • Thai*
  • Malaysian (?)
  • Hindi
  • Tagalog
  • Afrikaans
  • Cantonese
  • Malayalam
  • Telugu

*These languages are part of wikis to avoid (below)

Wikis to avoid:

Note that we want to avoid, or at least be mindful of, Wikipedias that have the new Vector skin deployed on desktop, as its go bar changes might affect our metrics.
List of early adopter wikis for the new Vector skin (and go bar improvements): https://www.mediawiki.org/wiki/Reading/Web/Desktop_Improvements#List_of_early_adopter_wikis_(test_wikis)

Dashboard:

  • Dashboard on Superset with monthly updated data and notes from the notebook.
  • Historical metrics are stored in a Hive table.

Supporting information

  • The Search team has a script in R to compute search traffic metrics on specific Wikipedias.
  • We should be able to get historical data for the last 90 days instead of waiting for 2 weeks, thanks to the new logging infrastructure.

Event Timeline

MPhamWMF updated the task description.
MPhamWMF renamed this task from DRAFT - Get search traffic breakdown for emerging language wikis to Get search traffic breakdown for emerging language wikis. Feb 17 2022, 7:33 PM
MPhamWMF updated the task description.
MPhamWMF moved this task from needs triage to later on... on the Discovery-Search board.
MPhamWMF moved this task from later on... to watching / waiting on the Discovery-Search board.
mpopov edited projects, added Product-Analytics; removed Product-Analytics (Kanban).
mpopov added a subscriber: mpopov.

Moving this out of PA's Kanban for triaging tomorrow (2022-02-28) at the team meeting

Please find the analysis for required metrics related to go-bar searches below. The numbers are from my pull of 1 week of data from 04/03/2022 to 04/10/2022.

The columns in the tables below contain the following metrics:

  • gobar_search: total number of searches in the go bar
  • autocomplete_only_search: number (%) of autocomplete-only searches in the go bar
  • gobar_to_special_search: number (%) of searches initiated in the go bar that end up on the special page
  • auto_zero_result: number (%) of searches redirected from the go bar to special:search that had no autocomplete suggestions
  • auto_special_zero_result: number (%) of searches redirected from the go bar to special:search that had no autocomplete suggestions and also had zero full-text search results
  • auto_zero_result_only: number (%) of searches redirected from the go bar to special:search that had no autocomplete suggestions but did have results in special:search
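For anyone wanting to reproduce the percentages from the raw counts, here is a minimal sketch using the bnwiki row of the Priority 1 table. It assumes (this is inferred from the numbers, not stated in the notebook) that the first two percentages are shares of all go bar searches and the last three are shares of the searches redirected to special:search. The helper name `pct` is made up for illustration.

```python
def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with one decimal place."""
    return f"{100 * part / whole:.1f}%"


# Counts for bnwiki, copied from the Priority 1 table.
gobar_search = 7_076
autocomplete_only_search = 4_384
gobar_to_special_search = 2_692
auto_zero_result = 2_196
auto_special_zero_result = 1_014
auto_zero_result_only = 1_182

# Assumed denominators: all go bar searches for the first two metrics,
# redirected searches for the remaining three.
shares = {
    "autocomplete_only_search": pct(autocomplete_only_search, gobar_search),
    "gobar_to_special_search": pct(gobar_to_special_search, gobar_search),
    "auto_zero_result": pct(auto_zero_result, gobar_to_special_search),
    "auto_special_zero_result": pct(auto_special_zero_result, gobar_to_special_search),
    "auto_zero_result_only": pct(auto_zero_result_only, gobar_to_special_search),
}

for name, value in shares.items():
    print(f"{name}: {value}")
```

Under those assumptions the output matches the bnwiki row (62.0%, 38.0%, 81.6%, 37.7%, 43.9%).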

Priority 1

| wiki | gobar_search | autocomplete_only_search | gobar_to_special_search | auto_zero_result | auto_special_zero_result | auto_zero_result_only |
| --- | --- | --- | --- | --- | --- | --- |
| arwiki | 57,338 | 36,867 (64.2%) | 20,471 (35.7%) | 15,615 (76.3%) | 2,153 (10.5%) | 13,462 (65.8%) |
| bnwiki* | 7,076 | 4,384 (62.0%) | 2,692 (38.0%) | 2,196 (81.6%) | 1,014 (37.7%) | 1,182 (43.9%) |
| eswiki | 831,582 | 683,817 (82.2%) | 147,765 (17.8%) | 107,394 (72.7%) | 10,869 (7.3%) | 96,525 (65.3%) |
| ptwiki* | 323,991 | 245,077 (75.6%) | 78,914 (24.4%) | 61,797 (78.3%) | 7,190 (9.1%) | 54,607 (69.2%) |
| ruwiki | 1,336,483 | 1,187,723 (88.8%) | 148,760 (11.1%) | 103,010 (69.2%) | 15,757 (10.6%) | 87,253 (58.7%) |

Priority 2

| wiki | gobar_search | autocomplete_only_search | gobar_to_special_search | auto_zero_result | auto_special_zero_result | auto_zero_result_only |
| --- | --- | --- | --- | --- | --- | --- |
| afwiki | 1,437 | 872 (60.7%) | 565 (39.3%) | 410 (72.6%) | 134 (23.7%) | 276 (48.8%) |
| frwiki* | 1,398,610 | 1,131,050 (80.9%) | 267,560 (19.1%) | 199,072 (74.4%) | 17,170 (6.4%) | 181,902 (68.0%) |
| hiwiki | 2,957 | 1,250 (42.3%) | 1,707 (57.7%) | 1,486 (87.1%) | 711 (41.7%) | 775 (45.4%) |
| idwiki | 64,386 | 42,880 (66.6%) | 21,506 (33.4%) | 17,102 (79.5%) | 2,470 (11.5%) | 14,632 (68.0%) |
| kowiki* | 277,048 | 190,712 (68.8%) | 86,336 (31.2%) | 54,911 (63.6%) | 8,622 (10.0%) | 46,289 (53.6%) |
| mlwiki | 1,298 | 858 (66.1%) | 440 (33.9%) | 336 (76.4%) | 170 (38.6%) | 166 (37.7%) |
| mswiki | 6,106 | 4,188 (68.6%) | 1,918 (31.4%) | 1,410 (73.5%) | 196 (10.2%) | 1,214 (63.3%) |
| tewiki | 1,125 | 571 (50.8%) | 554 (49.2%) | 465 (83.9%) | 209 (37.7%) | 256 (46.2%) |
| thwiki* | 22,348 | 15,228 (68.1%) | 7,120 (31.9%) | 5,486 (77.1%) | 653 (9.2%) | 4,833 (67.9%) |
| tlwiki | 3,266 | 2,306 (70.6%) | 960 (29.4%) | 716 (74.6%) | 226 (23.5%) | 490 (51.0%) |
| ukwiki | 72,831 | 58,119 (80.0%) | 14,712 (20.2%) | 9,925 (67.5%) | 1,919 (13.0%) | 8,006 (54.4%) |
| zh_yuewiki | 1,644 | 521 (31.7%) | 1,123 (68.3%) | 364 (69.9%) | 105 (20.2%) | 259 (23.1%) |

From the data, it looks like searches on smaller wikis (e.g. Bengali, Hindi) are more likely to end up on the special page and to return zero results.

Notes:

  • For the number of searches, I count every distinct searchSessionId + pageViewId combination. A search session can consist of multiple searches as the user types out their query, and this collapses them into a single unit.
  • Only desktop searches are included in this analysis.
  • Bots and test users are excluded.
  • Calculations are in this notebook.
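To illustrate the counting rule in the first note, here is a minimal sketch that collapses repeated events into one search per distinct searchSessionId + pageViewId pair. The event rows and the `count_searches` helper are made up for illustration; the real events carry more fields and the actual calculation lives in the notebook.

```python
from collections import defaultdict

# Hypothetical event rows: (wiki, searchSessionId, pageViewId).
events = [
    ("bnwiki", "s1", "p1"),  # first keystroke of a query
    ("bnwiki", "s1", "p1"),  # later keystroke, same search
    ("bnwiki", "s1", "p2"),  # same session, new page view: a new search
    ("hiwiki", "s2", "p1"),
]


def count_searches(rows):
    """Count distinct (searchSessionId, pageViewId) pairs per wiki."""
    seen = defaultdict(set)
    for wiki, session_id, page_view_id in rows:
        seen[wiki].add((session_id, page_view_id))
    return {wiki: len(pairs) for wiki, pairs in seen.items()}


searches = count_searches(events)
print(searches)
```

Here the two bnwiki events that share a session and page view count as a single search, so bnwiki ends up with 2 searches and hiwiki with 1.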

Please let me know if there are any questions or anything unclear.
As a next step, as mentioned in the description: do we need a dashboard for this set of metrics, or is this a one-time report? Also, would we want aggregate data across several Wikipedias?

Thanks @cchen ! This is great.

  1. is the zh_yuewiki data correct? It says autocomplete_only_search is 521, which is not 31.7% of 1,336,483.
  2. the original chart had clickthrough numbers as well; is that something that could also be added here? It would be good to know if users are clicking through on special:search results, especially when they are redirected there and there are no autocomplete suggestions
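The mismatch in point 1 is easy to check mechanically. Below is a small sketch that compares a stated percentage against count / total; the function name `share_matches` and the 0.1 percentage point tolerance are assumptions chosen for illustration.

```python
def share_matches(part: int, whole: int, stated_pct: float, tolerance: float = 0.1) -> bool:
    """Return True if part/whole agrees with the stated percentage.

    The tolerance (in percentage points) is an assumed allowance for rounding.
    """
    return abs(100 * part / whole - stated_pct) <= tolerance


# 521 autocomplete-only searches stated as 31.7% of 1,336,483 go bar searches.
print(share_matches(521, 1_336_483, 31.7))  # False

# The same count against a total of 1,644 go bar searches.
print(share_matches(521, 1_644, 31.7))  # True
```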

@MPhamWMF there was a copy and paste error, I updated the go bar searches count for zh_yuewiki.

And yes, I will add the click-through numbers for these wikis.

Adding clickthrough numbers for go bar searches and special:search results.

| wiki | gobar_search | gobar_search_clicks (CTR) | gobar_to_special_search | special_search_clicks (CTR) |
| --- | --- | --- | --- | --- |
| afwiki | 1,437 | 553 (38.48%) | 565 | 97 (17.17%) |
| arwiki | 57,338 | 27,048 (47.17%) | 20,471 | 4,521 (22.08%) |
| bnwiki* | 7,076 | 3,035 (42.89%) | 2,692 | 321 (11.92%) |
| eswiki | 831,582 | 429,048 (51.59%) | 147,765 | 37,388 (25.30%) |
| frwiki* | 1,398,610 | 799,744 (57.18%) | 267,560 | 77,005 (28.78%) |
| hiwiki | 2,957 | 576 (19.48%) | 1,707 | 300 (17.57%) |
| idwiki | 64,386 | 28,720 (44.61%) | 21,506 | 4,781 (22.23%) |
| kowiki* | 277,048 | 65,627 (23.69%) | 86,336 | 19,935 (23.09%) |
| mlwiki | 1,298 | 635 (48.92%) | 440 | 46 (10.45%) |
| mswiki | 6,106 | 2,107 (34.51%) | 1,918 | 325 (16.94%) |
| ptwiki* | 323,991 | 168,704 (52.07%) | 78,914 | 18,499 (23.44%) |
| ruwiki | 1,336,483 | 935,589 (70.00%) | 148,760 | 54,664 (36.75%) |
| tewiki | 1,125 | 381 (33.87%) | 554 | 96 (17.33%) |
| thwiki* | 22,348 | 10,743 (48.07%) | 7,120 | 1,364 (19.16%) |
| tlwiki | 3,266 | 992 (30.37%) | 960 | 87 (9.06%) |
| ukwiki | 72,831 | 41,680 (57.23%) | 14,712 | 4,078 (27.72%) |
| zh_yuewiki | 1,644 | 634 (38.56%) | 1,123 | 89 (17.08%) |

@cchen, it looks like the gobar_search numbers for ptwiki and ruwiki may be off.