Wikimedia Technical Conference 2019 Session: The test pyramid and speeding up your results
Closed, Resolved, Public

Description

Session

  • Track: Testing
  • Topic: The test pyramid and speeding up your results

Description

Testing at different levels of the test pyramid (unit, integration, and system level) comes with benefits and trade-offs. One of the primary trade-offs is time to execution: the higher you go in the pyramid, the longer the tests take to complete. This session will describe current practices in splitting tests into their levels and future work on improving test execution time (NB: from the perspective of a test writer, not infrastructure).

Questions to answer and discuss

Question: Is the test pyramid concept valuable to Wikimedia and if so, why?
Significance: Depending on the consensus on this we may move the discussion in different directions.

Question: When we talk about the different levels of the test pyramid, is that easy to visualize in MediaWiki core's tests? What about extension/skin tests?
Significance:

Question: How often do you write unit tests? Integration tests? System tests? Why / why not? What are the pain points of each, are there things we can change in our tooling to help?

Related Issues

  • ...
  • ...

Optional pre-reading for all Participants

Optional pre-session activity

Bonus points if you spend 10 minutes analyzing an extension or a subsystem of core and attempt to figure out what the "test pyramid" (or is it an ice cream cone?) looks like for that extension/subsystem.
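For this exercise, once the tests at each level have been counted (by file or by test method), the shape check can be made mechanical. The following is a minimal Python sketch, not an existing Wikimedia tool: `LevelCounts` and `classify_shape` are hypothetical names, and the rule it applies (each lower level has at least as many tests as the level above it) is an assumed heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelCounts:
    """Hand-counted tests (or test files) at each level of the pyramid."""
    unit: int
    integration: int
    e2e: int


def classify_shape(counts: LevelCounts) -> str:
    """Label a test distribution using an assumed, illustrative heuristic.

    - "pyramid": each level is at least as large as the level above it,
      and there are more unit tests than e2e tests.
    - "ice cream cone": the same ordering, reversed (e2e-heavy).
    - "mixed": anything else, e.g. many unit and e2e tests but few integration tests.
    """
    if min(counts.unit, counts.integration, counts.e2e) < 0:
        raise ValueError("test counts cannot be negative")
    if counts.unit >= counts.integration >= counts.e2e and counts.unit > counts.e2e:
        return "pyramid"
    if counts.e2e >= counts.integration >= counts.unit and counts.e2e > counts.unit:
        return "ice cream cone"
    return "mixed"


if __name__ == "__main__":
    # Example counts for hypothetical repositories.
    print(classify_shape(LevelCounts(unit=40, integration=5, e2e=1)))   # pyramid
    print(classify_shape(LevelCounts(unit=2, integration=5, e2e=30)))   # ice cream cone
    print(classify_shape(LevelCounts(unit=20, integration=1, e2e=10)))  # mixed
```

Counting by file and counting by test method can give different shapes for the same repository, so it helps to note which one was used.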

Remote participation

Please add your responses to the questions via the comments on this task. Also happy to add more discussion questions to guide this session.


Notes document(s)

https://etherpad.wikimedia.org/p/WMTC19-T234637

Notes and Facilitation guidance

https://www.mediawiki.org/wiki/Wikimedia_Technical_Conference/2019/NotesandFacilitation


Session Leader(s)

Session Scribes

Session Facilitator

Session Style / Format

  • [what type of format will this session be?]

Slides:

Post-event summary:

  • In general, everybody agreed that the Test Pyramid is a reasonable model to use.
    • That being said, there will and should probably be some variability in how the pyramid is applied.
    • The use of the Testing Pyramid is not intended to demonize any area of testing.
      • All three levels of testing have their purpose, strengths, and weaknesses.
  • From the repos that were surveyed, many do actually fit the pyramid model today.
  • There's been some good progress splitting unit and integration tests apart.
    • That said, the real split is unit tests vs. non-unit tests: not everything classified as an integration test actually is one.
  • Broader understanding and education are needed.
    • Traditional means of education, though, are much less successful in our environment than tutorials and other forms of contextual knowledge sharing.

Post-event action items:

  • Wider promotion of this model, plus general how-tos
  • Investigate other means of education

Notes:

Wikimedia Technical Conference, Atlanta, GA, USA
November 12 - 15, 2019

Session Name / Topic
The test pyramid and speeding up your results
Session Leader: Kosta + Jean-Rene; Facilitator: Deb; Scribe: Will
https://phabricator.wikimedia.org/T234637

Session Attendees
Lars, JR, Kosta, Timo, James, Will, Gergo, Elena, DJ, Neslihan, Fisch, Željko, Cormac

Summary ACTION items:

  • ACTION: Explore our current practices for publishing a dashboard of testing-related data (coverage, speed, etc).
  • ACTION: Using the above dashboard, celebrate positive outcomes/people/teams.
  • ACTION: Explore the creation of a periodic email update on the impact and benefits of past efforts by teams/individuals/etc.

Notes:

  • Lowest layer: Unit, Middle layer: Integration, Peak layer: E2E
  • Topics:
    • What is the overall goal?
    • How does this affect the build time for the software?
    • How does this impact the type of code and architecture patterns developed given types of tests?
  • The two highest layers can be the most straightforward to write, as unit tests can affect how code is created
  • Organisations that have the ice cream cone model may have invested in test engineers, which leads to fewer tests being written
  • Q: Isn't there an approach where e2e is considered the most important?
  • A: If a test fails at the unit or root level, the failure is close to what needs to be fixed
  • Are you familiar with the test pyramid?
    • Majority yes
  • Do you write:
    • Unit: many
    • Integration: 50%
    • e2e: 3 people out of 12
  • Do you think there is a clear distinction in the abstract between unit and integration?
    • Yes
  • Do you think there is a clear distinction in MW between unit and integration?
    • Yes: 3/12
    • No: 2/12
    • Timo: There is no clear distinction in how developers write the unit tests.
      • At the most extreme, you only interact with methods from a single class, and only the lines of code in a given block execute
      • Testing as a real instantiation but you control
      • Everything is essentially running but you are only testing against a specific class
  • [Interactive exercise: take an extension or project and look at its tests.] Do the tests for the code you work on follow the test pyramid structure?
    • Yes: 6/12
    • No: 2/12
  • What methodologies did you use to investigate?
    • WikibaseMediaInfo
      • Has some selenium tests, we counted 3 files
      • The repo is being slowly split into integration and unit
      • There are 3 files for explicit integration tests
      • There are about 30 files in the unit domain
    • Was this difficult or easy?
      • Pretty straightforward
    • FileImporter
      • A lot of PHPUnit tests
      • Just started moving them to the right dirs, so it was hard to figure out which were unit or integration tests just from looking
      • Ran PHPUnit: around 600 tests
        • Most of them are real tests
      • We have 3 or 4 real integration tests
      • We have 1 quite basic e2e test
      • Ease?
        • Relatively straightforward
        • It was doable from reading and interpreting
    • GrowthExperiments
      • 3 subdirectories in test
      • Couldn't work out what type of tests those are
      • PHPUnit has integration and unit folders; they roughly look the right shape
      • Some Selenium tests
      • Roughly looks like the triangle
    • Recent changes
      • Counted the test methods rather than classes: 40 unit, 5 integration, and 1 Selenium
    • Android app
      • 8 files for unit, 1 for integration, 0 for e2e
  • Ease?
    • It would be nice to have a clearer report
  • Can we use the Jenkins output to gather this information and produce a more structured report?
  • What are the strengths and weakness of following the test pyramid model?
    • The time cost involved
    • The feedback on progress when running it locally is very effective
    • e2e tests are fragile, and their size grows exponentially
      • Counterpoint: one attendee worked somewhere that followed the ice cream cone model, and it worked well because releases were slower
    • e2e tests don't isolate failures at all
    • With code to be refactored that has no tests you might start with the ice cream cone and work toward the pyramid
    • If you have integration and unit tests, e2e tests should be a final confirmation
      • If you have neither, then starting from the top makes sense
    • e2e tests also make sense when separate systems work well independently and pass their unit tests, but fail together
    • e2e tests should be cleaned up
    • A strength of the pyramid is the speed of feedback: unit tests take seconds, integration tests tens of seconds, and e2e can take a minute per test
    • A weakness: reaching 100% would require writing endless tests; there is a difference between aiming for 100% and pausing all development to get there
    • You need the code base to support the model
    • Major strength of e2e is how does the whole system run together
  • What are the blockers and challenges to writing tests at each level (unit, integration, e2e)?
    • Not being used to writing these tests, for example low familiarity with e2e here
      • You may then avoid writing the tests as they appear like a burden
    • It is hard to write unit tests for code that has none, because the code resists testing
      • If it's not designed to be testable, writing tests is hard; in one case, the code had to be rewritten to make it testable
    • When trying to write Selenium tests, it took a lot of time, in part because there were missing dependencies on the CI servers
      • The path to getting it to work was unclear
    • Q: Do we have real integration tests in MW?
    • A: We have tests that pre-date our real unit tests
    • Fisch: On my team we are encouraged and given time to write tests.
      • Going higher in the pyramid it is not super clear what would be needed to fulfill the higher layer tests
    • System level tests running in environments
    • If we drive coverage on unit and integration tests, the quality of the testing is more relevant than the quality of the test code
    • Even if you have the frameworks and structures, it can be unclear when and why to use the tests
      • Why are you testing that button? Is it because you have inherent knowledge about its significance?
      • If there is a wall between writing the code and writing the tests, that makes it difficult
    • Experience levels differ across the team, meaning that some team members end up responsible for the tests
  • What can we do to educate / promote / improve tooling or code?
    • Annual survey :P
    • Vote for the Selenium unconf versions: Željko is willing to pair with and train other teams
    • Željko would like feedback on the documentation: how did it or did it not work?
      • If you can't find the docs, let me know
      • If they aren't comprehensible, let me know
    • James: We have a lot of people who aren't full-time or do not have time to join optional trainings
      • [ACTION] Shaming works. Publishing stats on level of testing had a noticeable impact
      • [ACTION] We could expose a dashboard of the data
      • [ACTION] We can also celebrate people
      • [ACTION] We can send an update email demonstrating the impact and benefit of past efforts
    • Elena: Outreachy interns faced very minimal obstacles in getting started and running tests, this shows that documentation is working
    • Lars: People could see their own score and the average; rather than being public, the information would be personal
    • DJ: We have conventions in the past that may make no sense nowadays. How we mark tests as using the DB for example
      • We don't give good information on how to mock each service or how to mock data from each individual service
        • For example, Laravel's docs
    • Timo: Anecdotes of success stories tend to work well, but not on their own. The usability of the experience for the developer matters: for example, many people have Grafana dashboards but few have Logstash boards
      • Is the experience intuitive and rewarding for the developer?
    • Can the experience be gamed? You get two hundred wiki points if you make this change
      • Is there feedback from CI telling you that you didn't make the tests worse?
    • [ACTION] An initial commit that constructs the basic framing structure of a project was very helpful
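On the question of using the Jenkins output for a more structured report: CI jobs commonly archive JUnit-style XML, and PHPUnit can write such a file with its `--log-junit` option. The sketch below is an illustration under stated assumptions, not an existing CI job. It assumes each `<testcase>` element carries a `file` attribute (PHPUnit's JUnit log normally includes one) and that a test's level can be inferred from directory names such as `tests/phpunit/unit/`; the `LEVEL_MARKERS` mapping is hypothetical and would need adjusting per repository.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical mapping from a path fragment to a pyramid level; adjust per repo.
LEVEL_MARKERS = (
    ("/unit/", "unit"),
    ("/integration/", "integration"),
    ("/selenium/", "e2e"),
)


def level_for(path: str) -> str:
    """Infer a test's level from its file path; unmatched paths are 'unclassified'."""
    normalized = path.replace("\\", "/")
    for marker, level in LEVEL_MARKERS:
        if marker in normalized:
            return level
    return "unclassified"


def count_levels(junit_xml: str) -> Counter:
    """Count <testcase> elements per level in a JUnit-style XML report."""
    root = ET.fromstring(junit_xml)
    counts: Counter = Counter()
    # iter() also walks nested <testsuite> elements, which PHPUnit emits per class.
    for case in root.iter("testcase"):
        counts[level_for(case.get("file", ""))] += 1
    return counts


# Small hand-written sample in the JUnit format, for demonstration only.
SAMPLE = """<testsuites>
  <testsuite name="all">
    <testsuite name="FooTest" file="tests/phpunit/unit/FooTest.php">
      <testcase name="testA" file="tests/phpunit/unit/FooTest.php"/>
      <testcase name="testB" file="tests/phpunit/unit/FooTest.php"/>
    </testsuite>
    <testcase name="testDb" file="tests/phpunit/integration/BarTest.php"/>
    <testcase name="testLegacy" file="tests/phpunit/includes/BazTest.php"/>
  </testsuite>
</testsuites>"""

if __name__ == "__main__":
    for level, n in sorted(count_levels(SAMPLE).items()):
        print(f"{level}: {n}")
```

Keeping an explicit `unclassified` bucket reflects the observation that the practical split today is unit vs. non-unit tests; repositories that have not finished moving their tests would show up there rather than being silently counted as integration tests.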

Event Timeline

debt created this task. Oct 4 2019, 3:24 PM

We probably all know the "testing pyramid", but what we don't necessarily do is use the pyramid to guide our testing approaches or investments. I'm really interested in this topic and would like to see the concepts be more broadly applied.

kostajh added a subscriber: kostajh. Oct 9 2019, 9:34 AM

If others support the idea, it would be interesting to go beyond speed of execution and also talk about styles of code (specifically to use dependency injection) which facilitate writing unit tests, and how you can change existing code in core/extensions to be more friendly to unit testing.
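To make the dependency-injection point concrete, here is a minimal sketch in Python (MediaWiki itself is PHP, and every class and method name below is hypothetical). Because the service receives its collaborator through the constructor rather than fetching a global, a unit test can pass in an in-memory fake and exercise a single class without a database, which is the kind of isolation the unit level of the pyramid relies on.

```python
from __future__ import annotations

from typing import Protocol


class PageStore(Protocol):
    """Hypothetical dependency: something that can look up page text."""

    def get_text(self, title: str) -> str | None: ...


class WordCounter:
    """Hypothetical service that receives its dependency via the constructor."""

    def __init__(self, store: PageStore) -> None:
        self._store = store

    def count_words(self, title: str) -> int:
        text = self._store.get_text(title)
        if text is None:
            raise KeyError(f"no such page: {title}")
        return len(text.split())


class FakePageStore:
    """In-memory stand-in used by unit tests; no database required."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages

    def get_text(self, title: str) -> str | None:
        return self._pages.get(title)


def test_count_words_uses_injected_store() -> None:
    # Unit test: only WordCounter's own logic runs; the fake supplies the data.
    counter = WordCounter(FakePageStore({"Main Page": "Hello wiki world"}))
    assert counter.count_words("Main Page") == 3


if __name__ == "__main__":
    test_count_words_uses_injected_store()
    print("ok")
```

An integration test would instead construct `WordCounter` with a real, database-backed store; the production wiring is then the only place that decides which implementation to pass in.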

Tgr added a subscriber: Tgr. Oct 13 2019, 8:17 PM

... how you can change existing code in core/extensions to be more friendly to unit testing.

Also (and maybe more importantly) how to make it actually happen. Requests for comment/Dependency injection was accepted in 2015, and we have not seen a whole lot of dependency injection in the wild since then.

greg reassigned this task from greg to kostajh. Oct 16 2019, 3:55 PM
greg updated the task description.

@kostajh has agreed to lead this session.

Would anyone else like to co-lead with him?

I'd be happy to. @kostajh feel free to decline my offer :-)

kostajh updated the task description. Oct 17 2019, 11:32 AM

I'd be happy to. @kostajh feel free to decline my offer :-)

Added you, thanks @Jrbranaa!

debt triaged this task as Medium priority. Oct 22 2019, 6:57 PM
greg added a comment. Oct 23 2019, 9:37 PM

(Programming note)

This session was accepted and will be scheduled.

Notes to the session leader

  • Please continue to scope this session and post the session's goals and main questions into the task description.
    • If your topic is too big for one session, work with your Program Committee contact to break it down even further.
    • Session descriptions need to be completely finalized by November 1, 2019.
  • Please build your session collaboratively!
    • You should consider breakout groups with report-backs, using posters / post-its to visualize thoughts and themes, or any other collaborative meeting method you like.
    • If you need to have any large group discussions they must be planned out, specific, and focused.
    • A brief summary of your session format will need to go in the associated Phabricator task.
    • Some ideas from the old WMF Team Practices Group.
  • If you have any pre-session suggested reading or any specific ideas that you would like your attendees to think about in advance of your session, please state that explicitly in your session’s task.
    • Please put this at the top of your Phabricator task under the label “Pre-reading for all Participants.”

Notes to those interested in attending this session

(or those wanting to engage before the event because they are not attending)

  • If the session leader is asking for feedback, please engage!
  • Please do any pre-session reading that the leader would like you to do.
debt updated the task description. Oct 25 2019, 9:34 PM

If others support the idea, it would be interesting to go beyond speed of execution and also talk about styles of code (specifically to use dependency injection) which facilitate writing unit tests, and how you can change existing code in core/extensions to be more friendly to unit testing.

I would like to learn more about this. Not just for existing code, but for new code as well. On the PHP side I've managed to improve, but with frontend JavaScript I'm still struggling.

Does the testing pyramid also cover test maintenance costs? We had some browser tests in our extensions in the Language team, but eventually they all got abandoned and removed. Now there is a high barrier to having any browser tests, for fear it will happen again.

I would like to learn more about this. Not just for existing code, but for new code as well. On the PHP side I've managed to improve, but with frontend JavaScript I'm still struggling.

Yes, I think we plan to talk about this in a language-agnostic way.

Does the testing pyramid also cover test maintenance costs? We had some browser tests in our extensions in the Language team, but eventually they all got abandoned and removed. Now there is a high barrier to having any browser tests, for fear it will happen again.

Yes, I would say so. We'll have maintenance as part of the discussion too.

kostajh updated the task description. Nov 8 2019, 10:59 AM
greg updated the task description. Nov 11 2019, 5:02 PM
greg added subscribers: WDoranWMF, TheDJ.
kostajh updated the task description. Nov 14 2019, 10:21 PM
kostajh updated the task description. Nov 14 2019, 10:24 PM
greg updated the task description. Nov 15 2019, 3:15 PM
Jrbranaa updated the task description. Nov 15 2019, 7:31 PM
greg closed this task as Resolved. Dec 17 2019, 10:44 PM

Thanks for making this a good session at TechConf this year. Follow-up actions are recorded in a central planning spreadsheet (owned by me) and I'll begin farming them out to responsible parties in January 2020.