Develop tests for phabricator search to detect regressions / search quality issues
Open, Stalled, Normal, Public


Extra credit: set up monitoring to periodically evaluate search performance and alert when problems are detected.

Inspired by @jcrespo's suggestions from the parent task:

  • Test: if upgrades can break phab regularly, let's run some production-level unit tests after upgrades
  • Monitor: let's set up some icinga monitoring to check for degradation so owners learn about issues before anyone else
  • Workaround: have MySQL as a fallback to be temporarily enabled if things go badly
mmodell created this task. Dec 5 2017, 11:43 PM
mmodell triaged this task as High priority.
mmodell updated the task description. Dec 5 2017, 11:48 PM

@mmodell I see that this task has the Browser-Tests tag. Let me know if you would like to use Selenium for this. I am available for setup, review and/or pairing.

@zeljkofilipin: Thanks! Yeah I think Selenium might be the way to go and I really am not sure where to begin. I'll read the docs first and then check with you when I get lost ;)

@mmodell in which repository should the tests live? I can create a simple search test to start with, so you can see how it works.

@zeljkofilipin would it make sense to put them in the scap deployment repo for Phab? That would be rPHDEP Phabricator Deployment

zeljkofilipin added a comment. Edited Dec 12 2017, 4:27 PM

Any repository would do as far as I am concerned! :)

The trouble I have is how to contribute to rPHDEP. I vaguely remember that there was a page on mediawiki that explained it, but I cannot find it now. I am reading Phabricator, Diffusion and Category:Phabricator but I do not see the instructions.

Should I just push to a topic branch?

Looks like this is it: Arcanist.

Argh. Looks like I am doing something wrong. arc works fine (as far as I can see) in my home folder.

~$ arc version
arcanist 58f254840efe4b29b8c89684804fae7e2dfa525b (11 Oct 2017)
libphutil cb945f0205fab3b7683efe32ddae65eeb5e2b9af (30 Nov 2017)

But it fails in phab-deployment.

~/Documents/phabricator/phab-deployment$ arc version
Source file "phabricator/src/__phutil_library_init__.php" failed to load.
(Run with `--trace` for a full exception trace.)
~/Documents/phabricator/phab-deployment$ arc version --trace
 ARGV  '/Users/z/Documents/phabricator/arcanist/bin/../scripts/arcanist.php' 'version' '--trace'
 LOAD  Loaded "phutil" from "/Users/z/Documents/phabricator/libphutil/src".
 LOAD  Loaded "arcanist" from "/Users/z/Documents/phabricator/arcanist/src".
Config: Reading user configuration file "/Users/z/.arcrc"...
Config: Did not find system configuration at "/etc/arcconfig".
Working Copy: Reading .arcconfig from "/Users/z/Documents/phabricator/phab-deployment/.arcconfig".
Working Copy: Path "/Users/z/Documents/phabricator/phab-deployment" is part of `git` working copy "/Users/z/Documents/phabricator/phab-deployment".
Working Copy: Project root is at "/Users/z/Documents/phabricator/phab-deployment".
Config: Did not find local configuration at "/Users/z/Documents/phabricator/phab-deployment/.git/arc/config".
Loading phutil library from '/Users/z/Documents/phabricator/arcanist/src'...
Loading phutil library from 'phabricator/src'...

[2017-12-12 17:15:04] EXCEPTION: (Exception) Source file "phabricator/src/__phutil_library_init__.php" failed to load. at [<phutil>/src/moduleutils/PhutilBootloader.php:242]
arcanist(head=wmf/stable, ref.wmf/stable=58f254840efe), phutil(head=wmf/stable, ref.wmf/stable=cb945f0205fa)
  #0 PhutilBootloader::executeInclude(string) called at [<phutil>/src/moduleutils/PhutilBootloader.php:208]
  #1 PhutilBootloader::loadLibrary(string) called at [<phutil>/src/moduleutils/core.php:12]
  #2 phutil_load_library(string) called at [<arcanist>/scripts/arcanist.php:624]
  #3 arcanist_load_libraries(array, boolean, string, ArcanistWorkingCopyIdentity) called at [<arcanist>/scripts/arcanist.php:172]

@zeljkofilipin You now have push on that repo. I have never used arcanist on that repo directly as it's not really for code, only for deployments (and we don't do code review on scap deployment repos, typically)

Ok, I can push. I have pushed a small commit containing only a sample readme to the T182160 branch.

zeljkofilipin added a comment. Edited Dec 15 2017, 12:51 PM

I have just pushed the first test to the T182160 branch.

EBernhardson added a subscriber: EBernhardson. Edited Jan 19 2018, 6:03 PM

If you are looking for search quality, typically what would be done is:

  • Create a dataset that has a set of queries you care about
  • For each query source some results and grade them on a scale of 0-3 for how good they are
  • At test time run all the queries and get the result lists
  • Calculate per-query ndcg@n, average together all the queries. You might look at ndcg@3 and ndcg@10, but depends on how users use the search and how willing they are to scan down the list
  • Monitor changes in ndcg
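The steps above can be sketched roughly as follows. This is only an illustration, assuming a linear gain and a log2 position discount (one common NDCG formulation); the function names, the `judgments` and `runs` structures, and the task IDs are hypothetical and not part of any existing test suite.

```python
import math


def dcg_at_n(grades, n):
    """Discounted cumulative gain over the first n grades.

    Assumes linear gain and a log2 discount; rank is 0-based,
    so the first result is divided by log2(2) == 1.
    """
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades[:n]))


def ndcg_at_n(result_ids, graded, n):
    """NDCG@n for one query.

    graded maps document id -> grade on the 0-3 scale.
    Documents without a grade are treated as grade 0.
    """
    gains = [graded.get(doc, 0) for doc in result_ids]
    ideal = sorted(graded.values(), reverse=True)
    ideal_dcg = dcg_at_n(ideal, n)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents were graded for this query
    return dcg_at_n(gains, n) / ideal_dcg


def mean_ndcg(runs, judgments, n):
    """Average the per-query NDCG@n over all graded queries."""
    scores = [ndcg_at_n(runs.get(q, []), judgments[q], n) for q in judgments]
    return sum(scores) / len(scores)


# Hypothetical graded dataset: query -> {task id: grade}
judgments = {
    "search regression": {"T1": 3, "T2": 1},
    "icinga monitoring": {"T3": 2},
}
# Hypothetical result lists collected at test time: query -> ranked task ids
runs = {
    "search regression": ["T2", "T1", "T9"],
    "icinga monitoring": ["T3", "T8"],
}

print(mean_ndcg(runs, judgments, 3))
```

To monitor changes, the averaged value from each run could be stored and compared against the previous run or against a fixed threshold chosen by the task owners.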

NDCG is probably the most common metric, but there might be other interesting ones. NDCG puts a lot of weight on the position of the result, but our users might only care whether the result is there or not. Precision@n might be interesting for this, as it disregards position and just checks whether docs are in the top N. This is basically the number of good docs in the top N divided by the number of possible good docs in the top N, per query. So for P@5, if a query has 1 good result and that result is in 3rd place, the query gets a 1. If there are 3 good results and only 2 make it into the top 5, it gets about 0.67.
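A minimal sketch of the position-independent metric as described in this comment. Note that it divides by the number of good docs that could possibly fit in the top N, whereas the textbook Precision@n divides by N itself; the function name and the document IDs are made up for illustration.

```python
def precision_at_n(result_ids, good_ids, n):
    """Fraction of the achievable good docs that appear in the top n.

    Denominator is min(number of good docs, n), following the
    description in the comment rather than the textbook P@n (hits / n).
    """
    good = set(good_ids)
    possible = min(len(good), n)
    if possible == 0:
        return 0.0  # nothing graded as good for this query
    found = len(set(result_ids[:n]) & good)
    return found / possible


# One good result, found in 3rd place: scores 1.0
print(precision_at_n(["a", "b", "g1", "c", "d"], ["g1"], 5))

# Three good results, only two within the top 5: scores 2/3
print(precision_at_n(["g1", "x", "g2", "y", "z", "g3"], ["g1", "g2", "g3"], 5))
```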

The first part of this is now finished. We have the selenium/webdriver/mocha framework running and all that remains is to write some tests for search results. I think I will use the production dataset and the production phabricator instance to run the tests against. Thanks @EBernhardson for the tips about search quality testing. I'll try to take this into account when building my tests. And thanks to @zeljkofilipin for setting up the testing framework and getting me up to speed on how this all works.

mmodell changed the task status from Open to Stalled. Feb 26 2018, 5:32 PM

Still need to develop a few more tests for this one.

I'm available for pairing and/or reviews! :)

This is currently not a priority due to other urgent stuff.

mmodell lowered the priority of this task from High to Normal. Jun 25 2018, 4:57 PM
mmodell added a project: User-MModell. Sep 10 2018, 4:51 PM
mmodell removed mmodell as the assignee of this task.