
Analysis of advanced search usage
Closed, ResolvedPublic

Description

This is the umbrella ticket for all tickets around the topic of analysing advanced search.

Event Timeline

Lea_WMDE triaged this task as Medium priority. Feb 7 2018, 11:15 AM
Lea_WMDE created this task.

@Lea_WMDE The Advanced Search Extension Dashboard now presents the data described in:

  • T187039 - Measure change in keyword usage with AdvancedSearch, on the Search Keywords page (fifth menu item in the Dashboard navigation), and
  • T187038 - Measure usefulness of special:search page with advancedSearch, on the Special: Search page (fourth menu item in the Dashboard navigation).

The sources of the data are described on each respective Dashboard page.

Please review this Dashboard in respect to T187039 and T187038 and let me know if these two tickets can be resolved now. Thanks!

Hi Goran,

in general: I like the overview statements that you added to all screens!
T187038 (special:search): I like what I see so far, but I don't always want to see only the last 90 days. I would prefer a "growing graph", where I might lose granularity, but not dates. So in a year it should present the data of the whole past year (though it could still be at a granularity of one data point per day or so).
T187039 (keyword use): Same here, it should be a "growing graph". Also, would it be possible to display all lines at the same time, and only focus on one if wanted? With the selector it feels a bit like looking for a needle in a haystack: you have to try everything out to get a general feel.
bonus: on the first tab (where you did not actually change anything) it looks like something is broken: All the graphs seem to consider the last day to be March 1st. And of the "last number figures" only the first one seems to be correct, too.

Thanks @Lea_WMDE!

-> "growing graphs"

  • I think I know the solution that will fit what you are describing. Reporting back as soon as I implement it on the Dashboard.

"Also, would it be possible to display all lines at the same time, and only focus on one, if wanted?"

  • Depends on (a) whether the above-mentioned visualization can support something like that, and (b) whether the data scales so that it makes sense to have different keywords represented on the same chart. I need to experiment a bit with this and then I can let you know. I will do my best to make it happen, of course.

... on the first tab (where you did not actually change anything) it looks like something is broken...

  • That means that something is wrong with the update module - fixing it is now a priority.

@Lea_WMDE

Please take a look at the Dashboard now. As for the update problem ("All the graphs seem to consider the last day to be March 1st.") - someone introduced a new eventLogging SQL schema for the AdvancedSearchExtension without notifying me; the Dashboard now relies on a concatenation of data from two consecutive eventLogging schemas on the first data intake, and then proceeds to work with the most recent version. All in all, everything should be fixed now.
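To illustrate the kind of concatenation described above, here is a minimal Python sketch. It is not the Dashboard's actual update code: the function name `merge_daily_counts`, the `(day, count)` row shape, and the numbers are all made up for illustration, and it assumes each event was logged to exactly one schema revision, so per-day counts from the two revisions can simply be summed.

```python
from collections import Counter
from datetime import date


def merge_daily_counts(*revisions):
    """Sum per-day event counts across several schema revisions.

    Each revision is an iterable of (day, count) pairs. Assumes an event
    is logged to exactly one revision, so summing does not double count.
    """
    totals = Counter()
    for rows in revisions:
        for day, count in rows:
            totals[day] += count
    # Return one chronologically ordered series.
    return sorted(totals.items())


# Hypothetical rows; the switch-over day appears in both revisions.
old_schema_rows = [(date(2018, 2, 28), 131), (date(2018, 3, 1), 40)]
new_schema_rows = [(date(2018, 3, 1), 135), (date(2018, 3, 2), 142)]

series = merge_daily_counts(old_schema_rows, new_schema_rows)
```

With these made-up rows, the switch-over day ends up as a single data point that combines both revisions, instead of the series stopping at the last day of the old schema.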

The dygraphs were introduced as an implementation of your "growing graphs" idea, please check it out and let me know what you think.
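As a side note on the "lose granularity, but not dates" trade-off, here is a small Python sketch of one possible way to keep the full date range while capping the number of plotted points. It is only an assumption about how this could be done and not how dygraphs or the Dashboard handle it; `downsample` and `max_points` are hypothetical names.

```python
from datetime import date, timedelta


def downsample(series, max_points):
    """Reduce a sorted list of (day, value) pairs to at most max_points.

    Consecutive days are grouped into equally sized chunks; each chunk is
    represented by its first day and the mean of its values, so the full
    date range stays covered while granularity drops.
    """
    if len(series) <= max_points:
        return list(series)
    chunk_size = -(-len(series) // max_points)  # ceiling division
    reduced = []
    for start in range(0, len(series), chunk_size):
        chunk = series[start:start + chunk_size]
        mean_value = sum(value for _, value in chunk) / len(chunk)
        reduced.append((chunk[0][0], mean_value))
    return reduced


# Ten made-up daily values, reduced to five points.
first_day = date(2018, 1, 1)
daily = [(first_day + timedelta(days=i), float(i)) for i in range(10)]
coarse = downsample(daily, max_points=5)
```

Averaging is just one choice here; for count data, summing each chunk instead may be more appropriate.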

NOTE. Every time I change the Dashboard update module, we lose data - because of the 90 day restriction for the wmf_raw database. We should decide whether the current version of the Dashboard and the data that it presents are fine or not, so that I can put the update procedures on crontab in production, automate them, and stop losing data points from the past. Thanks.

Hi @GoranSMilovanovic,
generally I like it very much :) Being able to restrict the timeframe from below is awesome! Just to make sure, since the explanatory text still mentions "90 days", we are passing the 90 days, right? Maybe we should adapt the text then.

And one more question to the special:search part: Right now we are displaying all searches that STARTED OFF at special:search, so all results displayed because of a redirect of the upper right search box are not included, right? (This is what we intend). Would it be possible, to exclude in another graph all searches that came after they were redirected from the upper right search box? (I would create another ticket for that then)

I don't fully understand your note, since I don't know which of your actions change the dashboard module. Shall we have a quick chat next week about that, and about improving the process so you are not surprised by SQL schema changes, plus your time frame for the other technical wishes tasks?

Hi @Lea_WMDE

Just to make sure, since the explanatory text still mentions "90 days", we are passing the 90 days, right? Maybe we should adapt the text then.

Fixed.

And one more question to the special:search part: Right now we are displaying all searches that STARTED OFF at special:search, so all results displayed because of a redirect of the upper right search box are not included, right? (This is what we intend). Would it be possible, to exclude in another graph all searches that came after they were redirected from the upper right search box? (I would create another ticket for that then)

The data are exactly defined by what is provided in the description of T187038 by @thiemowmde or @daniel. Please open a new ticket for this and make sure that someone describes the data exactly via the precise values of the URL parameters that should be looked up in the Cirrus or webrequest tables (or wherever in the Analytics Data Lake). Thanks.

Shall we have a quick chat next week about that, and about improving the process so you are not surprised by SQL schema changes, plus your time frame for the other technical wishes tasks?

That would be the best thing to do at this point, I believe. Please take into consideration that Wikidata and WDCM related tasks will be in focus in April - May 2018, and that the New Editors team is starting a new Banner Campaign in early May. Thus, now is the time to develop what needs to be developed for this Dashboard, because later in the spring the chances are slim that I will be able to focus on this. Thanks!

The essential thing to understand from my previous comment is this: the longer the development of this Dashboard takes, the more data from the past become unavailable to us. Simply put, search and webrequest data get purged from our Big Data storage systems after 90 days. Each additional day of development = one additional data point lost on all graphs.
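To make the point concrete, here is a minimal Python sketch of an incremental archive, assuming the update job stores its own daily aggregates and only ever reads the most recent window from the source. It is not the production update procedure; `update_archive`, the dict-based archive, and the numbers are hypothetical.

```python
from datetime import date


def update_archive(archive, fresh):
    """Merge freshly computed daily aggregates into a persistent archive.

    `archive` and `fresh` map a day to its aggregate value. Days already
    archived are kept even if the source has purged them; days present in
    `fresh` are added or overwritten with the newer computation.
    """
    merged = dict(archive)
    merged.update(fresh)
    return dict(sorted(merged.items()))


# Made-up values: the January day is no longer available in the source,
# but survives because it was archived by an earlier run.
archive = {date(2018, 1, 5): 310}
fresh = {date(2018, 4, 9): 298, date(2018, 4, 10): 305}
archive = update_archive(archive, fresh)
```

Once a job like this runs on a schedule, the purge in the source no longer costs data points; any day that was never archived before its data were purged stays lost.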