
Find out which namespace combinations are used for searching
Closed, Resolved, Public

Description

Motivation
It would be great to offer searching specific namespace bundles. So far we don't really know which combinations are usually searched through together.

Task
Find out which namespace combinations are used by users of the advanced search prototype. Do not store any information about the users themselves.

Event Timeline

Out of my own experience (on which you should base absolutely 0)

  • Help + Wikipedia + Education + Template + Module + talks (Community)
  • MediaWiki + User (js/css) + Template + talks (Tech)
  • Main + Draft + Portal + Book + Category + File + timedText (content)

I'm not a power searcher, so if someone could kindly provide a few (say, 3 or 4) example search queries that illustrate multi-namespace searching, that would be super helpful. (Please and thank you.)

From my experience as a long-time volunteer:

  • Searching in both "Help:…" + "Wikipedia:…" at the same time is very common, because help pages are split into technical and project-specific ones, but users do not care that much about this split and should consider both in most cases.
  • Searching multiple talk namespaces is very common if you remember a phrase that was said in a discussion, but don't remember where. This typically includes the "Wikipedia:…" namespace because many discussions (e.g. deletion requests) happen there.
  • Template authors typically search both the "Template:…" and the "User:…" namespace, because template authors use the latter as a sandbox.
  • We do have wikis with multiple "content namespaces", e.g. all Wiktionaries, and wikis with an "Appendix:…" namespace (e.g. https://es.wikipedia.org/wiki/1996#Fallecimientos links to an "Anexo:…" page). I was never very active in such wikis, but I assume there is a need to search these "content namespaces".

@mpopov The namespace search we are talking about is on the Special:Search page. You can tick the boxes of all the namespaces in which your search term should be looked for (by default it searches only the article namespace). I'm not a search expert either, but looking at @TheDJ's examples, you could say for example that he selects the namespaces Main + Draft + Portal + Book + Category + File + timedText if he is looking for content, like "pineapple juice". Or he selects Help + Wikipedia + Education + Template + Module + talks when looking for the community discussion about "flow on talk pages". @TheDJ please correct me if I paraphrased your searches wrongly here :)

@thiemowmde & @Lea_WMDE: thank you very much!!! this is very helpful and I hope to have something tomorrow or early next week :)

Top combinations (that had >200 searches on 1 June 2017):

namespaces | searches | proportion
Book, Book talk, Category, Category talk, Draft, Draft talk, Education Program, Education Program talk, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, Help, Help talk, MediaWiki, MediaWiki talk, Module, Module talk, Portal, Portal talk, Project, Project talk, Talk, Template, Template talk, TimedText, TimedText talk, User, User talk | 3162 | 0.1108
Category, File | 1690 | 0.0592
Category, Creator, File, Help, Institution | 1125 | 0.0394
Category, Creator, File, Help, Institution, MediaWiki talk | 1122 | 0.0393
Category, Category talk, Education Program, Education Program talk, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, Help, Help talk, MediaWiki, MediaWiki talk, Module, Module talk, Project, Project talk, Talk, Template, Template talk, User, User talk | 1104 | 0.0387
Anexo, Category, Category talk, Education Program, Education Program talk, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, Help, Help talk, MediaWiki, MediaWiki talk, Module, Module talk, Portal, Project, Project talk, Talk, Template, Template talk, User, User talk, Wikiproyecto | 1092 | 0.0383
Category, Category talk, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, Help, Help talk, MediaWiki, MediaWiki talk, Module, Module talk, Portal, Portal Diskussion, Project, Project talk, Talk, Template, Template talk, User, User talk | 1035 | 0.0363
Campaign, Campaign talk, Category, Category talk, Creator, Creator talk, Data, Data talk, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, GWToolset, GWToolset talk, Help, Help talk, Institution, Institution talk, MediaWiki, MediaWiki talk, Module, Module talk, Project, Project talk, Sequence, Sequence talk, Talk, Template, Template talk, TimedText, TimedText talk, Translations, Translations talk, User, User talk | 999 | 0.0350
Category, Category talk, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, Help, Help talk, MediaWiki, MediaWiki talk, Module, Module talk, Project, Project talk, Talk, Template, Template talk, User, User talk | 651 | 0.0228
Category, Category talk, Draft, Draft talk, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, Help, Help talk, MediaWiki, MediaWiki talk, Module, Module talk, Portal, Portal talk, Project, Project talk, Talk, Template, Template talk, User, User talk | 575 | 0.0201
Campaign, Campaign talk, Category, Category talk, Creator, Creator talk, Data, Data talk, File, Gadget, Gadget definition, Gadget definition talk, Gadget talk, GWToolset, GWToolset talk, Help, Help talk, Institution, Institution talk, MediaWiki, MediaWiki talk, Module, Module talk, Sequence, Sequence talk, Template, Template talk, TimedText, TimedText talk, Translations, Translations talk | 558 | 0.0196
Anexo, Portal | 512 | 0.0179
Help, Project | 462 | 0.0162
Help | 457 | 0.0160
Help, Project, Template | 441 | 0.0155
Book, Book talk, Category, Category talk, Draft, Draft talk, Education Program, Education Program talk, File, File talk, Help, Help talk, MediaWiki, MediaWiki talk, Module, Module talk, Portal, Portal talk, Project, Project talk, Talk, Template, Template talk, TimedText, TimedText talk, User, User talk | 432 | 0.0151
Project | 390 | 0.0137
Category, Category talk, Education Program, Education Program talk, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, Help, Help talk, Livro, MediaWiki, MediaWiki talk, Module, Module talk, Portal, Project, Project talk, Talk, Template, Template talk, TimedText, TimedText talk, User, User talk | 378 | 0.0132
Category | 341 | 0.0119
Campaign, Campaign talk, Carte, Category, Category talk, Cod, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, Help, Help talk, MediaWiki, MediaWiki talk, Module, Module talk, Portal, Proiect, Project, Project talk, Talk, Template, Template talk, User, User talk | 270 | 0.0095
Category, File, MediaWiki, Project, User, User talk | 252 | 0.0088
Campaign, Campaign talk, Category, Category talk, Creator, Creator talk, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, GWToolset, GWToolset talk, Help, Help talk, Institution, Institution talk, MediaWiki, MediaWiki talk, Module, Module talk, Project, Project talk, Sequence, Sequence talk, Talk, Template, Template talk, TimedText, TimedText talk, Translations, Translations talk, User, User talk | 245 | 0.0086
Campaign, Campaign talk, Category, Category talk, Creator, Creator talk, Data, Data talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, GWToolset, GWToolset talk, Help, Help talk, Institution, Institution talk, MediaWiki, MediaWiki talk, Module, Module talk, Sequence, Sequence talk, Template, Template talk, TimedText, TimedText talk, Translations, Translations talk | 240 | 0.0084
MediaWiki talk, Template talk | 226 | 0.0079
Category, Category talk, Discussion Portail, Discussion Projet, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, Help, Help talk, MediaWiki, MediaWiki talk, Module, Module talk, Portail, Project, Project talk, Projet, Talk, Template, Template talk, User, User talk | 225 | 0.0079
Template | 218 | 0.0076
Category, Category talk, File, File talk, Gadget, Gadget definition, Gadget definition talk, Gadget talk, Help, Help talk, MediaWiki, MediaWiki talk, Module, Module talk, Pembicaraan Portal, Portal, Project, Project talk, Talk, Template, Template talk, User, User talk | 207 | 0.0073
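
For anyone who wants to reproduce this kind of tally, here is a minimal Python sketch of how identical namespace sets could be counted and turned into proportions. It is not the query that produced the numbers above; the function name `top_combinations`, the `min_searches` parameter, and the toy `sample` data are all made up for illustration.

```python
from collections import Counter

def top_combinations(searches, min_searches=200):
    """Count identical namespace sets and keep those above a threshold.

    `searches` is an iterable of namespace collections, one per search
    (hypothetical input format). Returns (combination, count, proportion)
    tuples, most frequent first.
    """
    # Sort each selection so the same set in a different order counts once.
    counts = Counter(tuple(sorted(ns)) for ns in searches)
    total = sum(counts.values())
    return [
        (", ".join(combo), n, round(n / total, 4))
        for combo, n in counts.most_common()
        if n > min_searches
    ]

# Toy data, not the real search log.
sample = (
    [["Category", "File"]] * 3
    + [["File", "Category"]] * 2
    + [["Help"]] * 4
    + [["Help", "Project"]] * 1
)
for row in top_combinations(sample, min_searches=1):
    print(row)
```

The proportion is relative to all searches in the input, not only to the rows that pass the threshold, which matches how the proportions in the table do not sum to 1.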

Full dataset with breakdown by project + language (where applicable):

See T165861#3313123 for more details.

I've been looking through this data and have some insights.

Trends

  • Different Projects and Languages have different preferred namespace combinations
    • for example, Commons uses fewer namespaces in a given search than WP
      • likely due to search goals based on the kind of content
  • The most common search combination across different language Wikipedias is to search in All namespaces
    • I think this is because they want to search in everything except the article namespace, but the article namespace does not appear in the data. The next point lends support to this hypothesis.
  • One very common behavior on WP is searching everything except a few namespaces that you know do not contain what you are looking for
    • Over 50% of WP advanced searches used more than 20 namespaces in a specific search
    • On English and German Wikipedias, the 2nd most commonly searched combination (after All) is All with a few things removed.
      • in the case of English WP, all namespaces to do with gadgets are removed
      • in the case of German WP all namespaces to do with gadgets are removed as well as Portal + Portal talk
    • This also makes me think that people are using advanced search instead of the Everything button in order to remove the Article namespace since what they are looking for in this scenario definitely isn't an article
      • it is also possible that users like the sense of control of actually being able to see that all namespaces are checked, or it is legacy behavior
  • The current assumption of adding the namespaces you want to look in specifically is also a common use case
    • 30% of searches used 4 or fewer namespaces
  • On the Wikis where the advanced search starts with namespaces chosen by default, those combinations are widely used
    • Spanish WP sets Anexo + Portal as the default advanced search namespaces and they were the 2nd most used combination (on es WP) and made up about 20% of searches on Spanish WP
    • Commons has Category + Creator + Help + File + Institution as defaults; this was also the 2nd most searched combination on Commons and made up about 15% of searches there
    • Are these defaults used so heavily because they are defaults or because they are useful?
      • I'm not sure, but it is clear that specific combinations as defaults are widely used.
  • The most commonly searched pair across all projects and languages is Project + Help
    • this is not limited to searches of exactly those two namespaces; e.g. a search in Help + Project + Category also counts
    • Project denotes the current project you are on, so on WP it is Wikipedia, on Wikidata, it is Wikidata, etc
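
As a rough illustration of how the size shares and the pair counts in the trends above could be computed, here is a small Python sketch. It assumes each search is available as a set of namespace names; `size_share`, `pair_counts`, and the sample data are hypothetical and not taken from the script used for this analysis.

```python
from collections import Counter
from itertools import combinations

def size_share(searches, predicate):
    """Fraction of searches whose namespace count satisfies `predicate`."""
    searches = list(searches)
    if not searches:
        return 0.0
    hits = sum(1 for ns in searches if predicate(len(set(ns))))
    return hits / len(searches)

def pair_counts(searches):
    """Count every unordered namespace pair contained in each search.

    A search in Help + Project + Category contributes to three pairs,
    including (Help, Project), so larger searches count too.
    """
    counts = Counter()
    for ns in searches:
        counts.update(combinations(sorted(set(ns)), 2))
    return counts

# Toy data, not the real search log.
sample = [
    {"Help", "Project"},
    {"Help", "Project", "Category"},
    {"Category", "File"},
    {"Template"},
]
print(size_share(sample, lambda n: n <= 4))  # share using 4 or fewer namespaces
print(size_share(sample, lambda n: n > 20))  # share using more than 20 namespaces
print(pair_counts(sample).most_common(1))    # most common pair
```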

Important Takeaways

  • people use the All (and maybe None) button and modify the namespaces after that
  • people use search differently across projects and across languages on the same projects
    • no silver bullet
  • people use defaults commonly

Recommendations

Definitely keep:

  • all button
  • defaults

Definitely try for beta:

  • buttons that add specific namespace combinations
    • General Help: Help + Project
    • All Talk: Project + All talks + Talk
  • buttons that remove specific namespace combinations
    • -Gadgets seems like a good candidate for this: it would remove Gadget, Gadget talk, Gadget definition, and Gadget definition talk if any are present
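
To make the add/remove buttons concrete, here is a minimal Python sketch that treats each button as a set operation on the current selection. The constant and function names are hypothetical, and the namespace names are the English ones from the table; a real implementation would more likely work with namespace IDs so that localized names are covered.

```python
# Hypothetical presets, using English namespace names for illustration.
GENERAL_HELP = frozenset({"Help", "Project"})
GADGETS = frozenset({
    "Gadget", "Gadget talk", "Gadget definition", "Gadget definition talk",
})

def add_preset(selected, preset):
    """Button that adds a namespace combination to the current selection."""
    return set(selected) | preset

def remove_preset(selected, preset):
    """Button that removes a combination; absent namespaces are ignored."""
    return set(selected) - preset

current = {"Category", "Gadget", "Gadget talk"}
current = remove_preset(current, GADGETS)   # the "-Gadgets" button
current = add_preset(current, GENERAL_HELP) # the "General Help" button
print(sorted(current))
```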

We should consider:

  • advanced search is often used by power users, and there aren't many clear combinations besides All on WP
    • can we allow users to define their own namespace sets/buttons?
  • I can generate a spreadsheet of values that says "if a user has just chosen this namespace, what are the chances that they want to pair it with this other namespace" that could be used to suggest other namespaces to add
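
A sketch of what such a "chance of pairing" table could look like in Python follows. It estimates, from logged searches, how often `other` was selected given that `chosen` was selected. The function name `pairing_chances` and the `max_namespaces` cut-off are assumptions for illustration; the cut-off is there because very large selections (All minus a few) would otherwise dominate the suggestions.

```python
from collections import Counter
from itertools import permutations

def pairing_chances(searches, max_namespaces=None):
    """Estimate P(other is selected | chosen is selected).

    Returns {chosen: {other: probability}}. Searches with more than
    `max_namespaces` namespaces are skipped when a limit is given.
    """
    seen = Counter()      # number of searches containing each namespace
    together = Counter()  # number of searches containing both (ordered pair)
    for ns in searches:
        ns = set(ns)
        if max_namespaces is not None and len(ns) > max_namespaces:
            continue
        seen.update(ns)
        together.update(permutations(sorted(ns), 2))
    chances = {}
    for (chosen, other), n in together.items():
        chances.setdefault(chosen, {})[other] = n / seen[chosen]
    return chances

# Toy data, not the real search log.
sample = [
    {"Help", "Project"},
    {"Help", "Project", "Template"},
    {"Help"},
    {"Template", "User"},
]
chances = pairing_chances(sample)
print(chances["Project"]["Help"])  # how often Help accompanies Project
```

Note that the table is not symmetric: in the toy data every Project search also includes Help, but not every Help search includes Project.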

I am open to feedback and especially interested in hearing from @thiemowmde and @TheDJ as well as any other Wikipedians on what I've written here.

Some stuff I used/made

This ugly python program that does calculations, outputs pair chance counts and percentages, and makes a single graph. (Uses the file that @mpopov uploaded)
This could be modified to ignore searches of more than 5 namespaces to offer useful suggestions.

A graph and some stats calculated by accumulating all Wikipedias

stats.png (220×700 px, 39 KB)

graph.png (475×636 px, 18 KB)

A csv of pairs made in WP (no accounting for language) when fewer than 6 namespaces are chosen

From T165861#3529293:

I've rewritten the query and recounted the namespace combos. I skipped profiles like Translations and Discussions since they're rare and weird and I don't have time to figure out how to deal with each one, but I included logic for content, multimedia, and everything profiles. Attaching counts from 2017-08-01:

Updated Findings

  • WP users overwhelmingly use the defaults: content (article), everything (all namespaces), and multimedia (file), generally in that order.
  • The other findings mostly still stand. The "percentages of total searches" from previous findings go down quite a lot since the default button options, which make up the large majority of searches, were included
  • wikis that have the "topic" namespace (I checked de, es, fr) often remove it (not actionable, just interesting)

It will be interesting to see whether these numbers look the same with buttons like multimedia no longer present: will users realize it was just searching files? will they adapt? have they just been doing it from the normal search bar (by typing file: ____) anyway?

Recommendations
same buttons as in previous post:

all button, general help (help + project), discussion (project and all talk namespaces), all gadgets (the namespaces that include gadgets, their definitions, and talks to do with them)

Since the all button has been implemented as a check box, the others should all be implemented that way too. When all namespaces associated with a checkbox are included, the box is checked, and clicking it removes them. When only some are included, the box is unchecked, and checking it adds the ones that haven't been checked yet.
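
The checkbox behavior described here can be sketched in a few lines of Python. This is only an illustration of the rule, not the extension's actual code; `is_checked` and `toggle` are hypothetical names.

```python
def is_checked(selected, group):
    """The box is checked only when every namespace in the group is selected."""
    return set(group) <= set(selected)

def toggle(selected, group):
    """Return the new selection after clicking the group's checkbox."""
    selected, group = set(selected), set(group)
    if group <= selected:
        # Fully checked: clicking removes the whole group.
        return selected - group
    # Unchecked or only partly selected: clicking adds the missing ones.
    return selected | group

general_help = {"Help", "Project"}
selection = {"Help", "File"}
print(is_checked(selection, general_help))      # only Help is selected so far
selection = toggle(selection, general_help)     # adds Project
selection = toggle(selection, general_help)     # removes Help and Project
print(sorted(selection))
```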

These recommendations should definitely be reevaluated based on how people use them and react to them during the beta, but I think they're what should be tried based on the numbers, testimony in this thread, and the bits and pieces I've heard from people reporting back from Wikimania.

(thanks for setting up the query and aggregating the data @mpopov !)

Lea_WMDE claimed this task.