Page MenuHomePhabricator

Ensure we're testing appropriately and not over-testing across Wikimedia-deployed code
Open, Needs TriagePublic0 Estimated Story Points


Several things are true:

  • We want to be confident that we can deploy "master" of our code to production.
  • We want to cross-check our various deployed code bases with each other prior to them hitting production.
  • We want "gating" code checks to not take too long (~10 mins total), as that reduces developer productivity.
  • We want to be able to merge patches quickly and with confidence, especially for incident response.

This means we want all our code to be covered "adequately" by tests, that we test all our extensions / skins / etc. together, and that these tests not take forever. These are in moderate opposition to each other.

As of mid-April 2019, the interlock "gate" only covers 31 of the 190 extensions and skins in production, and yet is already far too slow (~25 minutes for a full test run; 12 minutes for the wmf-quibble-core-vendor-mysql-hhvm-docker job, of which 5 minutes is the actual PHP unit run). Requests to add new, critical extensions to the gate (like Scribunto in T125050) are objected to on the grounds that this would be the straw that would break the camel's back. Even with no further additions of new code bases, the slowness of gate increases as teams increase the depth and breadth of their code coverage, either as maintenance and response to discovered bugs, or particularly as they add features and configuration options.

Very roughly, we can approximate that every test you write steals time from every other developer, so you should have a good reason to add that twelfth edge case test.

From the most recent merge of MediaWiki core's PHP unit test report numbers, we have 77 test classes that take more than 1000ms, an entirely arbitrary round number:

ClassDuration ↑FailSkipPassTotal
CirrusSearch\SearcherTest1 min 18 sec0044
CirrusSearch\SearcherTest::testSearchText1 min 18 sec0012031203
Wikibase\Repo\Tests\Store\Sql\WikiPageEntityMetaDataLookupTest9.7 sec003636
Babel\Tests\BabelTest9.4 sec0088
Wikibase\Repo\Tests\Api\ApiUserBlockedTest8.6 sec0033
Wikibase\Lib\Tests\SimpleCacheWithBagOStuffTest8 sec003535
SpecialPageFatalTest7.5 sec0022
SpecialPageFatalTest::testSpecialPageDoesNotFatal7.5 sec00193193
Wikibase\Repo\Tests\Api\ApiUserBlockedTest::testBlock7.4 sec001818
CirrusSearch\LanguageDetectTest6.9 sec0033
CirrusSearch\LanguageDetectTest::testTextCatDetector6.8 sec0099
EchoDiscussionParserTest6.3 sec001111
Wikibase\Lib\Tests\Store\Sql\SqlEntityInfoBuilderTest5.9 sec002121
Wikibase\Repo\Tests\Store\StoreTest5.7 sec0033
Wikibase\Repo\Tests\Store\StoreTest::testRebuild5.7 sec0011
Wikibase\Repo\Tests\Api\EditEntityTest5.4 sec001010
AbuseFilterConsequencesTest5 sec0022
ResourcesTest5 sec0088
Wikibase\Repo\Tests\Actions\EditEntityActionTest3.9 sec0033
Wikibase\Repo\Tests\Api\GetEntitiesTest3.8 sec0055
ResourcesTest::testFileExistence3.7 sec0029472947
CirrusSearch\Maintenance\ScriptsRunnableTest::testScriptCanBeLoaded3.7 sec001414
Wikibase\Repo\Tests\Api\SetClaimTest3.7 sec0088
Wikibase\Repo\Tests\Api\GetEntitiesTest::testGetEntities3.5 sec00168168
Wikibase\Repo\Tests\Store\Sql\WikiPageEntityStoreTest3.4 sec002727
MessageIndexTest3.2 sec0022
MessageIndexTest::testMessageIndexImplementation3.2 sec0044
Wikibase\Repo\Tests\Api\SetSiteLinkTest3.2 sec0077
Wikibase\Repo\Tests\Api\EditEntityTest::testEditEntity3.1 sec002828
Wikibase\Repo\Tests\Api\SetAliasesTest3 sec001313
AutoLoaderStructureTest3 sec0044
EchoDiscussionParserTest::testGenerateEventsForRevision_mentionStatus3 sec001111
Wikibase\Client\Tests\Usage\UsageTrackingIntegrationTest3 sec0066
EchoDiscussionParserTest::testGenerateEventsForRevision2.7 sec0088
Wikibase\Repo\Tests\Api\ApiXmlFormatTest2.7 sec001212
AbuseFilterTest2.4 sec0044
Wikibase\Repo\Tests\Api\SetSiteLinkTest::testSetSiteLink2.3 sec001414
Wikibase\Repo\Tests\Specials\SpecialNewItemTest2.2 sec001010
Wikibase\Lib\Tests\Formatters\ItemPropertyIdHtmlLinkFormatterTest2 sec002222
Wikibase\Repo\Tests\Specials\SpecialNewPropertyTest2 sec0055
Wikibase\Repo\Tests\Api\BotEditTest2 sec0033
Wikibase\Repo\Tests\Api\BotEditTest::testBotEdits1.9 sec001414
Wikibase\Repo\Tests\Api\FormatSnakValueTest1.9 sec0044
Wikibase\Repo\Tests\Api\FormatSnakValueTest::testApiRequest1.8 sec001515
Wikibase\Repo\Tests\Api\SetDescriptionTest1.6 sec0088
Wikibase\Repo\Tests\Api\RemoveClaimsTest1.6 sec0044
Wikibase\Repo\Tests\Api\SetAliasesTest::testSetAliases1.6 sec001414
AbuseFilterTest::testGenerateTitleVars1.5 sec004242
AbuseFilterConsequencesTest::testFilterConsequences1.5 sec001616
ExtensionJsonValidationTest1.5 sec0011
ExtensionJsonValidationTest::testPassesValidation1.5 sec004343
Wikibase\Repo\Tests\Actions\EditEntityActionTest::testUndoRevisions1.5 sec0066
Wikibase\Repo\Tests\Actions\EditEntityActionTest::testUndoSubmit1.4 sec002020
Flow\Tests\PermissionsTest1.4 sec0022
Flow\Tests\PermissionsTest::testPermissions1.4 sec00104104
ApiCoreThankIntegrationTest1.4 sec001010
Wikibase\Repo\Tests\Api\PermissionsTest1.4 sec0033
Wikibase\Repo\Tests\Api\SetReferenceTest1.4 sec0066
Wikibase\Repo\Tests\Api\SetLabelTest1.4 sec0077
Wikibase\Lib\Tests\LanguageFallbackChainFactoryTest1.3 sec0044
ApiStructureTest1.3 sec0022
Flow\Tests\Collection\PostCollectionTest1.2 sec0088
LanguageSearchTest::testSearch1.2 sec001616
Wikibase\Repo\Tests\Store\Sql\WikiPageEntityRedirectLookupTest1.2 sec0088
Wikibase\Lib\Tests\LanguageFallbackChainFactoryTest::testNewFromLanguage1.2 sec002525
Wikibase\Repo\Tests\Api\RemoveReferencesTest1.2 sec0044
Babel\Tests\BabelTest::testGetUserLanguageInfo1.1 sec0022
Wikibase\Repo\Tests\Api\SetQualifierTest1.1 sec0033
Babel\Tests\BabelTest::testGetUserLanguages1.1 sec0022
Wikibase\Lib\Tests\Store\Sql\SqlEntityInfoBuilderTest::testGivenInvalidArguments_constructorThrowsException1.1 sec0055
ApiMobileViewConvertTitleTest1 sec0055
Babel\Tests\BabelTest::testRenderDefaultLevelNoCategory1 sec0022
Wikibase\Repo\Tests\Api\RemoveQualifiersTest1 sec0044
Wikibase\Repo\Tests\Store\Sql\PropertyInfoTableBuilderTest1 sec0044
Wikibase\Repo\Tests\Store\WikiPageEntityStorePermissionCheckerTest1 sec0044
Babel\Tests\BabelTest::testRenderDefaultLevel1 sec0022
Wikibase\Repo\Tests\Api\SetDescriptionTest::testSetDescription1 sec0099

Obviously these are not directly comparable for a variety of reasons:

  • These are shown based on class; some extensions, like Cirrus, merge all their tests into few classes, whereas others like Wikibase spread them out, making Cirrus look massive and flattering the huge number of tests that it creates.
  • Individual tests can be trivial (e.g. ResourcesTest::testFileExistence runs all 2947 file checks in 3.7 sec) or very complex (the solitary Wikibase\Repo\Tests\Store\StoreTest::testRebuild takes 5.7 sec).
  • Some tests are for features/behaviour that is particularly critical and it's reasonable to test every edge case we can think of; similarly, some tests are in areas where we've had repeated major issues, and we want to avoid further regressions.
  • Some tests are driven by use (e.g. ResourcesTest is a structure test for files registered by extensions and skins, and though might well be made more efficient is mostly driven by the rest of us).

However, it's worth reflecting on whether we're over-testing in the areas most likely to break and most important for users.

Event Timeline

Ensure we're testing appropriately and not over-testing across Wikimedia-deployed code

Does this task encompass Selenium as well? I had not looked closely at the Selenium tests until recently, but one thing that struck me as I did was that the purpose of these tests was not entirely clear to me. Given that they are the slowest part of the testing stack and the most error prone (citation needed, but I think that's very likely correct) , I think it would be helpful to document and clarify what the purpose of writing Selenium tests is. For example, maybe they are supposed to cover particular cases that aren't testable via unit/integration tests (e.g. does this JavaScript widget function in a certain way), or are they instead intended to cover general workflow like "Can I edit a page" in which case, maybe we could consider a much faster, non-Selenium based tool (like Behat).

Very roughly, we can approximate that every test you write steals time from every other developer, so you should have a good reason to add that twelfth edge case test.

I hear this but I also would not want to discourage developers from writing tests. Our existing testing infrastructure makes it relatively easy to write time consuming integration tests and writing unit tests is always more work. Once we have a clearer separation of unit tests and integration tests (coming soon!) I think it will be easier to encourage writing more of the former and being judicious about the latter.

Change 587883 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/CirrusSearch@master] [DNM] Only run Searcher & LanguageDetect tests in this repo, not in gate

Change 588488 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] layout: [mediawiki/extensions/CirrusSearch] Support standalone

Change 588488 merged by jenkins-bot:
[integration/config@master] layout: [mediawiki/extensions/CirrusSearch] Support standalone

I've added some documentation about the @group feature, with the warning about Standalone prominent; would appreciate people more expert in writing dev-facing documentation could improve it though.

Change 587883 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Only run certain big, isolated tests in this repo, not in gate