Session
- Track: Testing
- Topic: The test pyramid and speeding up your results
Description
Testing at different levels of the test pyramid (unit, integration, and system level) comes with benefits and trade-offs. One of the primary trade-offs is time to execution: the higher you go in the pyramid, the longer the tests take to complete. This session will describe current practices in splitting tests into their levels and future work on improving test execution time (NB: from the perspective of a test writer, not infrastructure).
Questions to answer and discuss
Question: Is the test pyramid concept valuable to Wikimedia and if so, why?
Significance: Depending on the consensus on this we may move the discussion in different directions.
Question: When we talk about the different levels of the test pyramid, is that easy to visualize in MediaWiki core's tests? What about extension/skin tests?
Significance:
Question: How often do you write unit tests? Integration tests? System tests? Why / why not? What are the pain points of each, are there things we can change in our tooling to help?
Related Issues
- ...
- ...
Optional pre-reading for all Participants
Optional pre-session activity
Bonus points if you spend 10 minutes analyzing an extension or a subsystem of core and attempt to figure out what the "test pyramid" (or is it an ice cream cone?) looks like for that extension/subsystem.
Remote participation
Please add your responses to the questions via the comments on this task. Also happy to add more discussion questions to guide this session.
Notes document(s)
https://etherpad.wikimedia.org/p/WMTC19-T234637
Notes and Facilitation guidance
https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019/NotesandFacilitation
Session Leader(s)
Session Scribes
- @WDoranWMF
- [name]
Session Facilitator
Session Style / Format
- [what type of format will this session be?]
Slides:
Post-event summary:
- In general, everybody agreed that the Test Pyramid is a reasonable model to use.
- That being said, there will and should probably be some variability in how the pyramid is applied.
- The use of the Testing Pyramid is not intended to demonize any area of testing.
- All three levels of testing have their purpose, strengths, and weaknesses.
- From the repos that were surveyed, many do actually fit the pyramid model today.
- There's been some good progress splitting unit and integration tests apart.
- However, the split is really unit tests vs. non-unit tests, meaning that not everything labelled an integration test actually is one.
- Broader understanding and education is needed.
- Traditional means of educating, though, are much less successful in our environment than tutorials and other forms of contextual knowledge sharing.
Post-event action items:
- More propagation of this model and general how-tos
- Investigate other means of education
Notes:
Wikimedia Technical Conference, Atlanta, GA, USA
November 12 - 15, 2019
Session Name / Topic
The test pyramid and speeding up your results
Session Leader: Kosta + Jean-Rene; Facilitator: Deb; Scribe: Will
https://phabricator.wikimedia.org/T234637
Session Attendees
Lars, JR, Kosta, Timo, James, Will, Gergo, Elena, DJ, Neslihan, Fisch, Jelko, Cormac
Summary ACTION items:
- ACTION: Explore our current practices for publishing a dashboard of testing-related data (coverage, speed, etc).
- ACTION: Using the above dashboard, celebrate positive outcomes/people/teams.
- ACTION: Explore the creation of a periodic email update on the impact and benefits of past efforts by teams/individuals/etc
Notes:
- Lowest layer: Unit, Middle layer: Integration, Peak layer: E2E
- Topics:
- What is the overall goal?
- How does this affect the build time for the software?
- How does this impact the type of code and architecture patterns developed given types of tests?
- The two highest layers can be the most straightforward to write, because unit tests can affect how the code itself has to be written
- Organisations that have the ice cream cone model may have invested in test engineers, which leads to fewer tests being written
- Q: Isn't there an approach where e2e is considered the most important?
- A: When something changes at the unit or root level, the failing test is closer to what needs to be fixed
- Are you familiar with the test pyramid?
- Majority yes
- Do you write:
- Unit: many
- Integration: 50%
- e2e: 3 people out of 12
- Do you think there is a clear distinction in the abstract between unit and integration?
- Yes
- Do you think there is a clear distinction in MW between unit and integration?
- Yes: 3/12
- No: 2/12
- Timo: There is no clear distinction in how developers write the unit tests.
- At the most extreme, you are only interacting with methods from a single class, and only lines of code from a given block may execute
- Testing as a real instantiation but you control
- Everything is essentially running but you are only testing against a specific class
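One way to picture the two ends of this distinction is the minimal Python sketch below. `PageRenderer` and `TitleStore` are hypothetical names invented for this illustration, not MediaWiki classes, and the sketch is not a statement of how MediaWiki's own tests are organised. The first test replaces the collaborator with a mock, so only code from the class under test executes; the second lets a real collaborator run while still asserting only on the class under test.

```python
import unittest
from unittest.mock import Mock


class TitleStore:
    """Hypothetical collaborator; stands in for a real service such as a DB lookup."""

    def __init__(self):
        self._titles = {1: "Main Page"}

    def get_title(self, page_id):
        return self._titles[page_id]


class PageRenderer:
    """Hypothetical class under test."""

    def __init__(self, store):
        self._store = store

    def heading(self, page_id):
        return "<h1>" + self._store.get_title(page_id) + "</h1>"


class PageRendererUnitTest(unittest.TestCase):
    """Unit level: the collaborator is mocked, so only PageRenderer code runs."""

    def test_heading_wraps_title(self):
        store = Mock()
        store.get_title.return_value = "Sandbox"
        self.assertEqual(PageRenderer(store).heading(7), "<h1>Sandbox</h1>")
        store.get_title.assert_called_once_with(7)


class PageRendererIntegrationTest(unittest.TestCase):
    """Integration level: a real TitleStore runs, but we assert on PageRenderer only."""

    def test_heading_with_real_store(self):
        renderer = PageRenderer(TitleStore())
        self.assertEqual(renderer.heading(1), "<h1>Main Page</h1>")
```

Both classes can be run with `python -m unittest` on the file; keeping them in separate directories is what makes the pyramid shape visible from the file tree.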
- [Interactive exercise -- take an extension or project and look at its tests.] Do the tests for the code you work on follow the test pyramid structure?
- Yes: 6/12
- No: 2/12
- What methodologies did you use to investigate?
- Wikibase media info
- Has some selenium tests, we counted 3 files
- The repo is being slowly split into integration and unit
- There are 3 files for explicit integration tests
- There are about 30 files in the unit domain
- Was this difficult or easy?
- Pretty straightforward
- Fileimporter
- A lot of PHPUnit tests
- Just started moving them to the right dirs, so it was hard to figure out which were unit or integration just from looking
- Ran PHPUnit: around 600 tests
- Most of them are real tests
- We have 3 or 4 real integration tests
- We have 1 quite basic e2e test
- Ease?
- Relatively straightforward
- It was doable from reading and interpreting
- Growth experiments
- 3 sub dirs in test
- Couldn't work out what type of tests those are
- PHPUnit has integration and unit folders, which roughly look the right shape
- Some selenium tests
- Roughly looks like the triangle
- Recent changes
- Counted the test methods rather than classes: 40 unit, 5 integration, and 1 Selenium
- Android App
- 8 files for unit, 1 for integration, 0 for e2e
- Ease?
- It would be nice to have a clearer report
- Can we use the Jenkins output to gather this information and produce a more structured report?
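The manual counting that participants did in this exercise could be roughed out as a script. The sketch below is only an illustration: the directory names in `LEVEL_DIRS` are an assumed layout (a repository may organise its tests differently), counting files is a crude proxy for counting tests, and the function names are invented here. It does not read Jenkins output.

```python
from pathlib import Path

# Assumed directory layout; adjust to the repository being surveyed.
LEVEL_DIRS = {
    "unit": "tests/phpunit/unit",
    "integration": "tests/phpunit/integration",
    "e2e": "tests/selenium",
}


def count_tests_by_level(repo_root):
    """Return {level: number of files} for each assumed test directory."""
    counts = {}
    for level, rel_dir in LEVEL_DIRS.items():
        directory = Path(repo_root) / rel_dir
        if directory.is_dir():
            # Count every regular file below the directory, at any depth.
            counts[level] = sum(1 for p in directory.rglob("*") if p.is_file())
        else:
            # A missing directory simply means no tests at that level.
            counts[level] = 0
    return counts


def looks_like_pyramid(counts):
    """True when unit >= integration >= e2e, i.e. the shape narrows toward the top."""
    return counts["unit"] >= counts["integration"] >= counts["e2e"]
```

With the Android App numbers reported above, `looks_like_pyramid({"unit": 8, "integration": 1, "e2e": 0})` returns `True`.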
- What are the strengths and weaknesses of following the test pyramid model?
- The time cost involved
- The feedback on progress when running it locally is very effective
- e2e tests are fragile and their size grows exponentially
- Alternate: Worked somewhere that used the ice cream cone, but it worked well as releases were slower
- e2e doesn't isolate at all
- With code to be refactored that has no tests you might start with the ice cream cone and work toward the pyramid
- If you have integration and unit tests, e2e should be a final confirmation test
- If you have neither then starting from the top makes sense
- e2e tests also make sense when you have separate systems that work well independently, and whose unit tests pass, but that fail together
- e2e tests should be cleaned up
- Strength of the pyramid is the feedback level, unit is seconds, integration in 10s of seconds, e2e can be a minute per test
- Weakness: it can require an effectively unbounded number of tests to reach 100%; aiming for 100% is different from pausing all development to get there
- You need the code base to support the model
- Major strength of e2e is how does the whole system run together
- What are the blockers and challenges to writing tests at each level (unit, integration, e2e)?
- Not used to writing these tests, for example, low familiarity with e2e here
- You may then avoid writing the tests as they appear like a burden
- It is hard to write unit tests for code without any, because the code resists
- If the code is not designed to be testable, writing tests is hard; in one case the code had to be rewritten to make it testable
- When trying to write selenium tests, it took a lot of time in part because there were missing deps on the CI servers
- The path to getting it to work was unclear
- Q: Do we have real integration tests in MW?
- A: we have tests that pre-date our real unit tests
- Fisch: on my team we are encouraged and given time to write tests.
- Going higher in the pyramid it is not super clear what would be needed to fulfill the higher layer tests
- System level tests running in environments
- If we drive coverage on unit and integration test, the quality of the testing is more relevant than the quality of the test code
- Even if you have the frameworks and structures, it can be unclear when and why to use the tests
- Why are you testing that button? Is it because you have inherent knowledge about its significance?
- If you have a wall between writing the code and writing the test that makes it difficult
- Experience levels differ across a team, meaning that some team members end up responsible for tests
- What can we do to educate / promote/ improve tooling or code?
- Annual survey :P
- Vote for the selenium unconf versions: Zejlko is willing to pair and train other teams
- Zejlko would like feedback on documentation, how it did or did not work?
- If you can't find the docs let me know
- If they aren't comprehensible let me know
- James: We have a lot of people who aren't full-time or do not have time to join optional trainings
- [ACTION] Shaming works. Publishing stats on level of testing had a noticeable impact
- [ACTION] If we expose a dashboard of data
- [ACTION] We can also celebrate people
- [ACTION] We can send an update email demonstrating the impact and benefit of the past effort
- Elena: Outreachy interns faced very minimal obstacles in getting started and running tests; this shows that the documentation is working
- Lars: People could see their own score and the average, rather than being public the information is personal
- DJ: We have conventions from the past that may make no sense nowadays, for example how we mark tests as using the DB
- We don't give good information on how to mock each service and how to mock data from each individual service
- For example, Laravel's docs
- Timo: Anecdotes of success stories tend to work well, but not on their own. The usability of the experience for the developer matters. For example, many people have Grafana dashboards but few have Logstash boards
- Is the experience intuitive and rewarding for the developer?
- Can the experience be gamed? You get two hundred wiki points if you make this change
- Is there feedback from CI telling you that you didn't make the tests worse?
- [ACTION] An initial commit that constructs the basic framing structure of a project was very helpful