It's not the first time, and it probably won't be the last: the goal of this task is to better understand the sources of fulltext search session abandonment, in hopes of finding opportunities for improvement when possible.
As a search stakeholder, when determining enhancements it would be helpful to understand which things are plausibly within the control of the search platform and which things are not easily addressed by the search platform.
Currently, fulltext search session abandonment appears to hover near 48% on desktop.
{F57534802, size=full}
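For clarity about what the chart above is measuring, here is a minimal sketch of one plausible way "fulltext search session abandonment" could be operationalized from session-level interaction events; the input file and column names (`search_session_id`, `source`, `action`) are illustrative placeholders, not a reference to a specific schema:

```
import pandas as pd

# Hypothetical per-event rows from search interaction logging.
# Assumed columns: search_session_id, source ("fulltext"/"autocomplete"),
# action ("searchResultPage", "click", "visitPage", ...).
events = pd.read_parquet("search_events_sample.parquet")  # placeholder input

fulltext = events[events["source"] == "fulltext"]

# One candidate definition: a session is abandoned if it was served at least
# one results page but never produced a click/visit event afterwards.
actions_per_session = fulltext.groupby("search_session_id")["action"].agg(set)
served = actions_per_session[actions_per_session.apply(lambda a: "searchResultPage" in a)]
abandoned = served.apply(lambda a: not ({"click", "visitPage"} & a))

print(f"Abandonment: {abandoned.mean():.1%} of {len(served)} fulltext sessions")
```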
Thinking out loud, possible components of this seemingly moderately high abandonment rate include:
- Automata
- Inadequate content for things that probably could realistically exist on the given Wikimedia project if notability criteria are someday met
- Inadequate content for things that probably won't realistically ever exist on a given Wikimedia project given its content policies
- Inadequate search terms
- Overabundant search terms (e.g., natural language query too long)
- Typing accidents
- Copy-paste accidents
- Bad spelling guesses
- "Wrong keyboard" issue
- Referred search sessions that aren't same-site organic fulltext search (the traffic may be organic, or it may be organized activity)
- Non-automata UA spurious calls
- User power tools spawning lots of sessions that are likely to be less prone to clicks
- Sister search results sidebar clicks (?)
- Users who were actually satisfied by looking at the SERP. {T375387} intends to help identify one kind of "satisfied" but abandoned searcher behavior. Others can be harder to track, such as when search snippets satisfy search intent or when the UA configuration reduces the effectiveness of well intentioned measurement instruments
- Data collection that inflates the denominator of fulltext search (perhaps when a page is served but the user didn't/couldn't see the SERP; maybe there are also unanticipated redirects or some such thing)
There are probably more potential components at play, but those are some that came to mind.
There can be overlap between these components. And there can sometimes be "fixes" above and beyond the current approach to mitigate these components, ranging from updating the measurement/visualization approach to applying different search or presentation strategies. Some of the potential components are likely observable in "fulltext head queries", and some likely are not.
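Since the acceptance criteria below involve hand coding sampled abandonments against this list, one option is to freeze the components above into a fixed label set up front, so coders and any later aggregation use the same vocabulary; the short names here are just shorthand for the bullets above, and a session may carry several labels given the overlap just noted. A minimal sketch:

```
# Shorthand labels for the candidate components listed above; a hand-coded
# session may carry more than one, since the components can overlap.
COMPONENT_LABELS = {
    "automata",
    "content_gap_plausible",       # could exist if notability criteria are met
    "content_gap_out_of_scope",    # won't exist given content policies
    "inadequate_search_terms",
    "overabundant_search_terms",   # e.g. natural language query too long
    "typing_accident",
    "copy_paste_accident",
    "bad_spelling_guess",
    "wrong_keyboard",
    "referred_not_same_site_organic",
    "non_automata_spurious_calls",
    "power_tool_sessions",
    "sister_search_sidebar_click",
    "satisfied_by_serp",
    "denominator_inflation",
    "other",                       # catch-all for anything not anticipated
}

def check_labels(labels: set[str]) -> set[str]:
    """Catch typos early so the hand-coded data stays consistent."""
    unknown = labels - COMPONENT_LABELS
    if unknown:
        raise ValueError(f"Unknown component label(s): {sorted(unknown)}")
    return labels
```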
Acceptance criteria:
- For the top 10 most frequently visited Wikipedias, classify the component(s) most likely responsible for X (50?) fulltext search session abandonments apiece (the sampling approach probably includes a mix of fulltext head queries and some form of random sampling; a rough sketch follows the notes below)
- As this is exploratory in part, identify any other obvious component(s) leading to fulltext search session abandonments
- Split the data based on the original source of the fulltext search, particularly whether a user started fulltext search from a namespace 0 article view or not (this could be coarse grained like "not namespace 0" or could be finer grained)
- If possible, indicate a possible alternative search or presentation strategy that could be used (this may pertain to both zero results and non-zero results cases) (BONUS if there's any way to automate this)
- Identify whether the search could be satisfied with an external search engine / conversational agent with Wikipedia/Wikimedia content (be careful to not overwhelm, and beware of filter bubbles)
- Identify whether the search could be satisfied with an external search engine / conversational agent with not-Wikipedia/Wikimedia content (be careful to not overwhelm, and beware of filter bubbles)
- Analyze and make a report
- Document the approach
- Describe potential next tickets to act on the data (or if the data are fully inconclusive or nonactionable, describe why)
Some notes:
- Commons, Wikidata, and other sister projects are excluded due to different interaction patterns
- The target for the analysis could be this task or a wiki page or both, and it seems likely an access controlled Jupyter notebook or Sheet will be necessary to hand code the data
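Referenced from the first acceptance criterion: a minimal sketch of the sampling step, assuming a per-session summary of abandoned fulltext sessions already exists; the wiki list, the 50-per-wiki figure, and the column names (`wiki`, `query`, `referrer_namespace`) are placeholders to adjust:

```
import pandas as pd

# Hypothetical summary: one row per abandoned fulltext session, with
# illustrative columns wiki, query, referrer_namespace (None if unknown).
sessions = pd.read_parquet("abandoned_fulltext_sessions.parquet")  # placeholder

TOP_WIKIS = ["enwiki", "jawiki", "dewiki"]  # placeholder: fill in the top 10 by traffic
PER_WIKI = 50  # the "X (50?)" figure from the acceptance criteria

samples = []
for wiki in TOP_WIKIS:
    w = sessions[sessions["wiki"] == wiki]
    # Half from head queries: one session per most-frequent abandoned query.
    head_queries = w["query"].value_counts().head(PER_WIKI // 2).index
    head = w[w["query"].isin(head_queries)].drop_duplicates("query")
    # Remainder from a uniform random sample, so tail behavior shows up too.
    rand = w.sample(n=min(PER_WIKI - len(head), len(w)), random_state=42)
    samples.append(pd.concat([head, rand]))

sample = pd.concat(samples, ignore_index=True)
# Coarse split on where the search started (namespace 0 article view vs. not).
sample["from_article_view"] = sample["referrer_namespace"] == 0
# Hand off for hand coding in a notebook or access-controlled sheet.
sample.to_csv("abandonment_hand_coding_sample.csv", index=False)
```

The 50/50 split between head queries and random sessions is arbitrary and easy to rebalance; the fixed random seed just keeps the pulled sample reproducible if the step is rerun.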