Extract some statistics on the use of the isBlank() function in wdqs query logs
Closed, Resolved (Public)

Description

It would be nice to have an idea of the percentage of queries that use the isBlank function. It might also be interesting to identify the tools using this function, so that we could contact their maintainers if we were to introduce a new function to replace isBlank.

Event Timeline

dcausse updated the task description.

As I was working on getting a better idea of the queries, I got some results relatively easily:
Since the beginning of the year:

  • Internal cluster: No request using isBlank(), 481202298 requests total
  • External cluster: 54669 requests using isBlank(), 202695416 requests total (0.03%)
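For anyone re-checking the figures, the shares follow directly from the two counts per cluster. A minimal sketch in Python; the helper name `share_percent` is only illustrative and is not part of any existing tooling:

```python
def share_percent(matching: int, total: int) -> float:
    """Return matching / total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * matching / total


# Counts copied from the list above.
internal = share_percent(0, 481202298)
external = share_percent(54669, 202695416)

print(f"internal cluster: {internal:.2f}%")
print(f"external cluster: {external:.2f}%")
```

Rounded to two decimals, the external share matches the 0.03% quoted above.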

I can provide more details as needed :)

dcausse added a subscriber: Lea_Lacroix_WMDE.

@Lea_Lacroix_WMDE the use of isBlank seems pretty low. Do you think we should still try to identify bots by grouping by user-agent, to see whether anything is identifiable?

dcausse triaged this task as Medium priority. Feb 27 2020, 2:12 PM
dcausse moved this task from Incoming to Waiting on the Discovery-Search (Current work) board.

Yes please :) It's a low percentage but it's still far from zero. Can we also look at the example queries?

Events using isBlank since the beginning of the year are now stored here: /user/joal/wdqs_queries/2020_use_isBlank/wdqs_use_is_blank_202002.json.
There are ~56k events, stored in JSON format in a single file to facilitate analysis.
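Grouping those events by user-agent, as suggested earlier in the thread, could look roughly like the sketch below. The event schema is not described here, so the field names `user_agent` and `query` are assumptions, and the sample events are made up purely to show the shape of the result:

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_user_agent(events: Iterable[Mapping], ua_field: str = "user_agent") -> Counter:
    """Count events per user-agent; ua_field is an assumed field name."""
    counts = Counter()
    for event in events:
        # Events without a usable user-agent are grouped under one label.
        counts[event.get(ua_field) or "(missing)"] += 1
    return counts


# Made-up events for illustration only, not taken from the real logs.
sample_events = [
    {"user_agent": "example-bot/1.0", "query": "SELECT ?o WHERE { ?s ?p ?o FILTER(isBlank(?o)) }"},
    {"user_agent": "example-bot/1.0", "query": "ASK { ?s ?p ?o FILTER(isBlank(?s)) }"},
    {"user_agent": "other-tool/0.2", "query": "SELECT ?s WHERE { ?s ?p ?o FILTER(!isBlank(?o)) }"},
    {"user_agent": None, "query": "SELECT * WHERE { ?s ?p ?o FILTER(isBlank(?o)) }"},
]

top_agents = count_by_user_agent(sample_events).most_common()
for agent, n in top_agents:
    print(f"{n:6d}  {agent}")
```

To run this on the real file, the events would first have to be loaded with the standard `json` module; whether that means one `json.load` call or one `json.loads` per line depends on the file layout, which is not specified here.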

Closing; will reopen if more details are needed.