[Liftwing testing] - Post deployment testing
Open, Needs Triage, Public

Description

As an engineer,

I would like to have a process that tests the deployed model servers after each code change, in order to verify that the models work as expected in production.
This could be some basic testing of the endpoints (something like a smoke test). At the moment this can only be done manually, which is burdensome since we have over 100 models deployed.

A rollout plan for these tests could be the following:

  • Identify the inputs for each model and write one or more scripts that verify that we get a 200 response from the endpoints.
  • The scripts are run manually by the engineer after deployment.
  • In the future, once we verify that these tests work well and are sufficient, we can enrich them and add them to our CD process.
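As a rough illustration of the first step, a smoke test could look something like the sketch below. It is not the script from the patch: the function names, the shape of the `endpoints` mapping, and the convention of returning 0 when no HTTP response arrives are all assumptions made for this example. The `post` argument is injectable so the logic can be exercised without network access.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a function that POSTs a JSON payload and returns the HTTP status code.
PostFn = Callable[[str, dict], int]


def http_post_status(url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST `payload` as JSON to `url` and return the status code (0 if no HTTP response)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except urllib.error.URLError:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return 0


def smoke_test(endpoints: Dict[str, dict], post: PostFn = http_post_status) -> List[Tuple[str, int]]:
    """Return (url, status) for every endpoint that did not answer with 200."""
    failures = []
    for url, payload in endpoints.items():
        status = post(url, payload)
        if status != 200:
            failures.append((url, status))
    return failures
```

An empty list returned by `smoke_test` would mean every endpoint answered with 200; anything else lists the endpoints an engineer should look at after the deployment.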

Event Timeline

Change 884292 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] test: liftwing manual testing on deployment server

https://gerrit.wikimedia.org/r/884292

In the attached patch I added a Python script that hits all the deployed models in production and staging and verifies that a proper response is returned (a 200 status code and the word "probability" in the response text).
The script tries two revision ids per wiki; if both fail to give a proper response, we log an error with the appropriate info. The reason for testing two revision ids is that I got some errors in the editquality damaging model for pl wiki when I used a single rev id, so I thought this was a good "hack" to avoid false positives.
I also added two files used by the script:

  • a configuration file named deployed_models.yaml which lists all the deployed models in staging and production
  • a json file named rev_ids.json that holds two revision ids for every wiki language as found in the table event_sanitized.mediawiki_revision_score
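The two-revision-id check described above could be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch: `check_model`, `is_proper_response`, the logger name, and the `predict` callable (which stands in for the real HTTP request and returns a status code and body text) are all hypothetical names.

```python
import logging
from typing import Callable, Iterable, Tuple

logger = logging.getLogger("liftwing_smoke_test")  # hypothetical logger name

# Hypothetical type: takes (model_name, rev_id), returns (status_code, body_text).
PredictFn = Callable[[str, int], Tuple[int, str]]


def is_proper_response(status: int, body: str) -> bool:
    """A response counts as proper if it is a 200 and mentions 'probability'."""
    return status == 200 and "probability" in body


def check_model(model_name: str, rev_ids: Iterable[int], predict: PredictFn) -> bool:
    """Try each revision id in turn; succeed as soon as one gives a proper response."""
    attempts = []
    for rev_id in rev_ids:
        status, body = predict(model_name, rev_id)
        if is_proper_response(status, body):
            return True
        attempts.append((rev_id, status))
    # Only reached when every revision id failed.
    logger.error("Model %s failed for all revision ids: %s", model_name, attempts)
    return False
```

Returning on the first proper response means a single problematic revision id does not fail the whole model, which is the intent of testing with two ids.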

I got the revision ids by running this query via Presto:

SELECT database as wiki_db, MAX(rev_id) AS rev_id_1, MIN(rev_id) AS rev_id_2 
    FROM event_sanitized.mediawiki_revision_score
    WHERE 
        year = 2022 AND 
        page_namespace = 0 AND
        substr(database, -4) = 'wiki'
    GROUP BY database
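For illustration, the result of a query like this could be turned into a JSON file along the following lines. The actual layout of rev_ids.json is not shown here, so the structure below (wiki name mapped to a list of two ids) is an assumption, as are the function name, the CSV export step, and the sample values, which are made up.

```python
import csv
import io
import json
from typing import Dict, List


def rows_to_rev_ids(csv_text: str) -> Dict[str, List[int]]:
    """Map each wiki_db to [rev_id_1, rev_id_2] from a CSV export of the query result."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["wiki_db"]: [int(row["rev_id_1"]), int(row["rev_id_2"])]
        for row in reader
    }


# Made-up sample rows using the column names from the query; not real revision ids.
sample = "wiki_db,rev_id_1,rev_id_2\nenwiki,200,100\nplwiki,50,10\n"
print(json.dumps(rows_to_rev_ids(sample), indent=2, sort_keys=True))
```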

As discussed within the team, we want to proceed with httpbb, which is a more standard tool for this purpose. The Python script has been uploaded to the inference-services repo for reference and can be used until we make httpbb work.

At the moment there is an issue with httpbb not accepting JSON payloads; we have submitted a patch for this, which is currently under review.

Change 885990 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/puppet@production] httpbb: add tests for liftwing (prod/staging)

https://gerrit.wikimedia.org/r/885990

Change 885990 merged by Elukey:

[operations/puppet@production] httpbb: add tests for liftwing (prod/staging)

https://gerrit.wikimedia.org/r/885990

Change 886063 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/puppet@production] profile::httpbb: fix liftwing hosts

https://gerrit.wikimedia.org/r/886063

Change 886063 merged by Elukey:

[operations/puppet@production] profile::httpbb: fix liftwing hosts

https://gerrit.wikimedia.org/r/886063

Change 886375 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/puppet@production] httpbb: liftiwing add new API tests

https://gerrit.wikimedia.org/r/886375

Change 886375 merged by Elukey:

[operations/puppet@production] httpbb: liftiwing add new API tests

https://gerrit.wikimedia.org/r/886375