
Decide whether creating Phester is actually worthwhile
Closed, Resolved · Public · 1 Estimated Story Points

Description

Now that we have a better understanding of what Phester should do and how much work it would be to create it, we should reconsider using existing tools instead.

An initial survey turned up the following options:

  • strest (typescript, new/small project, yaml based, lacks some features)
  • tavern (python, new/small project, yaml based, lacks some features)
  • behat (php, established, feature rich & complex, focus on cucumber style logic tests)
  • codeception (php, established, feature rich & complex, focus on fluent style logic tests)

These should be discussed, and more should perhaps be found and considered.

Decision matrix (tentative): https://docs.google.com/spreadsheets/d/1G50XPisubSRttq4QhakSij8RDF5TBAxrJBwZ7xdBZG0/edit#gid=0

Requirements that seem particular to (or especially important for) the Wikimedia use case:

  • HTTP-centric paradigm, focusing on specifying the headers and body of requests, and running assertions against headers and body of the response.
  • Support for running assertions against parts of a structured (JSON) response (JSON-to-JSON comparison, with the ability to use the more human friendly YAML syntax)
  • filtering by tags (because we expect to have a large number of tests)
  • parallel execution (because we expect to have a large number of tests)
  • yaml based declarative tests and fixtures: tests should be language agnostic, so that people working in different language ecosystems and code bases can easily write them. This also avoids lock-in to a specific tool, since yaml is easy to parse and convert (see the sketch after this list).
  • generalized fixture creation, entirely API based, without the need to write "code" other than specifying requests in yaml.
  • randomized fixtures, so we can safely create privileged users on potentially public test systems.
  • control over cookies and sessions
  • ease of running in dev environments without the need to install additional tools / infrastructure (this might be a reason to switch to python for the implementation; node.js is also still in the race).
  • discovery of tests defined by extensions.
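
To make the requirements above more concrete, here is a rough sketch of what such a declarative test could look like. The format (the suite/fixtures/tests keys, the {{random}} placeholder, the request/response fields, the abbreviated API parameters) is hypothetical and only meant to illustrate the idea, not the actual Phester specification:

suite: action-api-smoke
fixtures:
  editor:
    # randomized credentials, so the fixture is safe to create
    # on a potentially public test system (parameters abbreviated)
    request:
      method: POST
      path: api.php
      form:
        action: createaccount
        username: "PhesterEditor-{{random}}"
        password: "{{random}}"
tests:
  - description: siteinfo returns JSON with the expected fields
    interaction:
      - request:
          method: GET
          path: api.php
          parameters:
            action: query
            meta: siteinfo
            format: json
        response:
          status: 200
          headers:
            content-type: application/json
          body:
            # JSON-to-JSON comparison expressed in YAML:
            # only the listed fields are asserted
            query:
              general:
                mainpage: "Main Page"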


Event Timeline

Very VERY brief assessment of the tools suggested by corey:

In terms of functionality, Dredd seems to be a good fit at a glance. Requiring node.js for running tests doesn't seem ideal, but that will probably become a lot less annoying with better containerization of the development and testing environments. If Dredd were written in Python or PHP, I'd probably go for it.

This week I was tasked with testing Dredd and I wanted to provide a full summary since we’ll be making the final decision soon.

Dredd accepts two different file types: OpenAPI and API Blueprint. OpenAPI has strict rules against duplicating endpoints and methods, so it's not surprising that Dredd errors out when you attempt to bypass them. That makes it a poor fit for the Action API, which has only one endpoint. Even if the Action API were a REST API, it would still limit the tests to targeting each HTTP method on a path only once.
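
For illustration (a schematic fragment, not our actual spec): in an OpenAPI document the paths object is a YAML mapping, so "/api.php" can only appear once, and each HTTP method can only appear once under it, which leaves room for exactly one GET and one POST transaction against the Action API:

paths:
  /api.php:
    get:
      # only one GET operation may be defined here, so only one
      # GET transaction can be derived for the whole API
      responses:
        '200':
          description: OK
    post:
      responses:
        '200':
          description: OK
  # adding a second "/api.php" key to express another test case
  # would be a duplicate mapping key and is rejected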

The other option is API Blueprint. I've wrestled with the specification over the last couple of days (mainly because it has VERY limited documentation and low adoption). Unlike OpenAPI/Dredd, API Blueprint doesn't error out when you provide duplicate endpoints or methods, but it does emit a warning. Also note that although API Blueprint seems to be more forgiving about certain violations, it would still tie the monitoring and integration tests to a specification that doesn't seem to have added much in three years.

To get a sense of how our simple Action API test looks with Dredd/API Blueprint, look here. Compared to the Phester version it's over 4x larger and requires a schema of the responses in order to compare results and provide regex support. For monitoring this isn't really a con, since you'll only be testing a handful of endpoints, but for a large and complex API like the Action API it can quickly grow into large files that are prone to errors.

One of the earlier selling points of Dredd for me was the variable extraction, as shown here. On one hand it provides a lot of flexibility, letting you run custom instructions before/after endpoints are tested; on the other hand, explicitly extracting and inserting local variables for multiple requests, as I expect we'd be doing for the Action API, doesn't lend itself to DRY practices.

Overall, I think Dredd is a workable solution for small APIs but not the best tool for testing large APIs like the Action API. In this case, I’d lean more towards further developing Phester.

WDoranWMF set the point value for this task to 2. May 28 2019, 7:13 PM
WDoranWMF subscribed.

It's obvious a few people have put a lot of thought and effort into this. Existing options were considered and the decision was made to build a prototype. As far as I can tell, it's working well enough to consider further development.

In general, I dislike tests that are not written in a programming language. I can see the value in this case, but we should be very careful.

Advantages

The biggest advantage of tests not written in a programming language is that people not familiar with that language can read and write them. That is only valuable if the people writing tests would not otherwise be familiar with the code. If all or most people writing and reading the tests already know the language (PHP), then writing tests in that language might make more sense.

Disadvantages

I do like yaml in general, but I am not sure it's a good choice for a big test suite. (As far as I understood it, phester would be used to create a big test suite.) To get a taste of why it might not be a good idea to write a big project in yaml, take a look at the [[ https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master | integration/config ]] repository, [[ https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master/jjb/ | jjb ]] folder. It contains Jenkins job definitions written in yaml, which (as far as I understand it) are transformed into the xml that Jenkins uses internally. It gets complicated quickly, and I fear the same could happen to tests written in yaml.
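
For a flavour of the indirection involved, here is a simplified sketch in the style of a jjb definition (not an actual job from that repository):

- job-template:
    # '{name}' and '{php}' are substituted per project below,
    # expanding into one Jenkins job per combination
    name: '{name}-phpunit-{php}'
    builders:
      - shell: 'composer phpunit'

- project:
    name: mediawiki-core
    php:
      - php72
      - php73
    jobs:
      - '{name}-phpunit-{php}'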

If a lot of tests are similar, with very small differences (as is the case in the jjb example), reusing existing code might get complicated.

Recommendation

My recommendation would be to implement a small but representative test suite in both yaml and a programming language (like php), using a testing framework like phpunit. The suite should have a small number of very simple tests (list all pages, list all users...), a small number of tests of usual complexity (create account > log in > edit page...), and a majority of complicated tests, running multiple fixtures, testing complicated workflows, and trying to reuse existing code with small differences. That is where I think the tool will either shine or crash. The duplication of effort (creating two test suites initially) should not take too much time, and comparing how easy it is to write and read the tests in both forms would, I think, be very valuable.

Even if phester doesn't support all the features that more complicated tests require, I think the tests should be written anyway, to develop the test format.

Thanks to @thcipriani for a humorous page explaining why you should use yaml for everything https://noyaml.com/

> Decision matrix (tentative): https://docs.google.com/spreadsheets/d/1G50XPisubSRttq4QhakSij8RDF5TBAxrJBwZ7xdBZG0/edit#gid=0

It doesn't look like Behat got scored on all criteria; I've used it extensively over the past few years and would be happy to talk through its strengths and weaknesses if anyone on CPT would like.

> It doesn't look like Behat got scored on all criteria; I've used it extensively over the past few years and would be happy to talk through its strengths and weaknesses if anyone on CPT would like.

I'd love to hear your impression. Can you start by putting comments into the matrix? Or values, if you feel like it.

So I played with codeception a bit. Here's a basic CRUD test:

<?php

class CRUDCest {
    public function _before( ApiTester $I ) {
    }

    // tests
    public function testCreateEditDelete( ApiTester $I ) {
        $I->wantTo( 'Create, edit, and delete' );

        // Create the page ('+\' is the anonymous edit token)
        $I->haveHttpHeader( 'Content-Type', 'application/x-www-form-urlencoded' );
        $I->sendPOST( 'api.php',
            [
                'action' => 'edit',
                'title' => 'Test',
                'createonly' => 'true',
                'format' => 'json',
                'summary' => 'some test',
                'text' => 'test text',
                'token' => '+\\',
            ]
        );
        $I->seeResponseCodeIs( \Codeception\Util\HttpCode::OK ); // 200
        $I->seeResponseContainsJson( [
            'edit' => [
                'result' => 'Success'
            ]
        ] );

        // Check that the new page parses to the expected text
        $I->sendGET( 'api.php',
            [
                'action' => 'parse',
                'page' => 'Test',
                'format' => 'json',
            ]
        );
        $I->seeResponseCodeIs( \Codeception\Util\HttpCode::OK ); // 200
        $I->seeResponseIsJson();
        $I->seeResponseMatches( '/test text/' );

        // Edit the page
        $I->haveHttpHeader( 'Content-Type', 'application/x-www-form-urlencoded' );
        $I->sendPOST( 'api.php',
            [
                'action' => 'edit',
                'title' => 'Test',
                'format' => 'json',
                'summary' => 'some edit',
                'text' => 'edited test text',
                'token' => '+\\',
            ]
        );
        $I->seeResponseCodeIs( \Codeception\Util\HttpCode::OK ); // 200
        $I->seeResponseContainsJson( [
            'edit' => [
                'result' => 'Success'
            ]
        ] );

        // Check that the edit is visible
        $I->sendGET( 'api.php',
            [
                'action' => 'parse',
                'page' => 'Test',
                'format' => 'json',
            ]
        );
        $I->seeResponseCodeIs( \Codeception\Util\HttpCode::OK ); // 200
        $I->seeResponseIsJson();
        $I->seeResponseMatches( '/edited test text/' );

        // Delete the page
        $I->haveHttpHeader( 'Content-Type', 'application/x-www-form-urlencoded' );
        $I->sendPOST( 'api.php',
            [
                'action' => 'delete',
                'title' => 'Test',
                'format' => 'json',
                'token' => '+\\',
            ]
        );
        $I->seeResponseCodeIs( \Codeception\Util\HttpCode::OK ); // 200
        $I->seeResponseContainsJson( [
            'delete' => [
                'title' => 'Test'
            ]
        ] );

        // Parsing the deleted page should fail
        $I->sendGET( 'api.php',
            [
                'action' => 'parse',
                'page' => 'Test',
                'format' => 'json',
            ]
        );
        $I->seeResponseContainsJson( [
            'error' => [
                'code' => 'missingtitle'
            ]
        ] );
    }
}

It's not too bad, but I find YAML more convenient for representing JSON structures and HTTP headers.
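
For comparison, the first edit-and-verify step from the Codeception test above might look roughly like this in a YAML form (the exact keys are hypothetical, not the actual Phester format):

- request:
    method: POST
    path: api.php
    headers:
      content-type: application/x-www-form-urlencoded
    form:
      action: edit
      title: Test
      createonly: true
      format: json
      summary: some test
      text: test text
      token: '+\'
  response:
    status: 200
    body:
      # JSON assertion expressed directly as YAML
      edit:
        result: Success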

Also, codeception produces over 5000 (!) lines of generated code as scaffolding around this.

Codeception is pretty flexible. I'm wondering how hard it would be to implement the functionality we want for phester (including YAML based test specs) on top of codeception.

If we were to use codeception for other things as well (perhaps instead of selenium, or as a wrapper around phpunit, both of which it supports), this would make sense. But if all we want to do is validate HTTP request/response pairs, codeception seems to be overkill, and gets in the way more than it helps. And I'm not sure how easy (or hard) it would be to make it play nicely with tests defined by extensions.

Also, codeception doesn't seem like a good fit for monitoring live services, though it's probably possible.

> I'd love to hear your impression. Can you start by putting comments into the matrix? Or values, if you feel like it.

It’s on my TODO list :)

Meanwhile you could have a look at https://github.com/deminy/behat-rest-testing

> Meanwhile you could have a look at https://github.com/deminy/behat-rest-testing

Thanks, played a bit with it. Here's the CRUD test in Behat:

Feature: action API
  In order to confidently refactor code
  as a developer
  I want to see if the action API works as expected

  Scenario: CRUD
    # Create the page ("%2B%5C" is the URL-encoded anonymous token "+\")
    When I send a POST request to "/api.php?action=edit&format=json" with form data:
      """
      title=BehatTest
      createonly=1
      summary=testing
      text=some+text
      token=%2B%5C
      """
    Then response code should be 200
    And field "edit/result" in the response should be "Success"

    When I send a GET request to "/api.php?action=parse&page=BehatTest&format=json"
    Then response code should be 200
    And the response should contain "some text"

    # Edit the page
    When I send a POST request to "/api.php?action=edit&format=json" with form data:
      """
      title=BehatTest
      summary=testing
      text=different+text
      token=%2B%5C
      """
    Then response code should be 200
    And field "edit/result" in the response should be "Success"

    When I send a GET request to "/api.php?action=parse&page=BehatTest&format=json"
    Then response code should be 200
    And the response should contain "different text"

    # Delete the page (the delete response has no edit/result field)
    When I send a POST request to "/api.php?action=delete&format=json" with form data:
      """
      title=BehatTest
      token=%2B%5C
      """
    Then response code should be 200
    And field "delete/title" in the response should be "BehatTest"

It's pretty compact, but only because I added logic to the RestContext.

Encoding the POST data as a string is awkward, especially because it requires manual URL encoding, but this could probably be fixed. There actually is a version that uses table syntax, but that has JSON encoding for the POST body hard coded, so I didn't use it.

Overall, the Cucumber approach of matching natural language with regular expressions and mapping that to PHP code seems error-prone for the use case of API tests. Does the step 'the response should contain "different text"' do a substring match or a regular expression match? Is it case sensitive? How does 'I send a POST request to "/api.php?action=edit&format=json" with form data' differ from 'I send a POST request to "/api.php?action=edit&format=json" with values'?

While being able to read the scenarios as English sentences is nice, it hides what's actually going on underneath. When testing an API, the high-level "behavior" is not the only thing under test; the other thing is compliance in the nitty-gritty details: cache control headers, content negotiation, all that. It's possible to do all of that in Cucumber, but the extra layer of indirection seems to get in the way more than it helps. But maybe it's just a matter of getting used to it?...

One thing that isn't clear to me is how I'd pass variables within a scenario. E.g. after creating a page, I want to extract the page's ID from the response and use it in the next step of the scenario. The only option I found was keeping state in the context object. That's fine for a login or something, but different scenarios may need completely different things passed between steps. If that required a specialized Context class, that would be a show stopper, I'm afraid.

I found another tool that is conceptually relatively close to phester: htt, the HTTP Test Tool (hosted on SourceForge - I didn't know that was still a thing). As far as I can tell, htt was created by the FSF in 2011 and has seen little activity between 2013 and 2019, but had a version 2.4 release this year.

Like phester, htt is purely declarative and centered on modeling HTTP requests and responses. It's very low level though, and I don't see anything like fixtures or variables. I don't think it's a viable alternative, but it's similar enough that we should look to it for inspiration and for pitfalls.

In other news, I'm investigating the possibility of making a codeception plugin for phester style tests. Seems quite doable, but I have only just started to poke around.

Fjalapeno changed the point value for this task from 2 to 1. Jul 23 2019, 1:47 PM