
mwcore-phpunit-coverage-master times out after 5 hours
Open, Needs Triage, Public

Description

From Icinga:

<wmf-insecte> Project mwcore-phpunit-coverage-master build #168: STILL FAILING in 5 hr 0 min:
 https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/168/

From the build time history (https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/buildTimeTrend), the job seems to take much longer whenever the build is scheduled on integration-slave-docker-1043.integration.eqiad.wmflabs:

| Status | Build | Duration | Agent |
| --- | --- | --- | --- |
| Failed | #168 | 5 hr 0 min | integration-slave-docker-1043 |
| Failed | #167 | 5 hr 0 min | integration-slave-docker-1043 |
| Success | #166 | 3 hr 40 min | integration-slave-docker-1052 |
| Failed | #165 | 5 hr 0 min | integration-slave-docker-1043 |
| Success | #164 | 3 hr 35 min | integration-slave-docker-1041 |
| Success | #163 | 3 hr 36 min | integration-slave-docker-1051 |
| Success | #162 | 4 hr 5 min | integration-slave-docker-1040 |
| Success | #161 | 3 hr 49 min | integration-slave-docker-1054 |
| Success | #160 | 3 hr 32 min | integration-slave-docker-1059 |
| Success | #159 | 3 hr 32 min | integration-slave-docker-1041 |
| Success | #158 | 3 hr 59 min | integration-slave-docker-1040 |
| Success | #157 | 3 hr 40 min | integration-slave-docker-1051 |
| Success | #156 | 3 hr 34 min | integration-slave-docker-1041 |
| Success | #155 | 3 hr 57 min | integration-slave-docker-1048 |
| Success | #154 | 3 hr 49 min | integration-slave-docker-1054 |
| Success | #153 | 3 hr 41 min | integration-slave-docker-1050 |
| Success | #152 | 3 hr 36 min | integration-slave-docker-1048 |
| Success | #151 | 3 hr 44 min | integration-slave-docker-1050 |
| Success | #150 | 3 hr 32 min | integration-slave-docker-1050 |
| Success | #149 | 3 hr 38 min | integration-slave-docker-1041 |
| Success | #148 | 3 hr 35 min | integration-slave-docker-1051 |
| Success | #147 | 3 hr 35 min | integration-slave-docker-1041 |
| Success | #146 | 3 hr 48 min | integration-slave-docker-1058 |
| Success | #145 | 3 hr 32 min | integration-slave-docker-1041 |
| Success | #144 | 3 hr 34 min | integration-slave-docker-1041 |
| Success | #143 | 3 hr 50 min | integration-slave-docker-1050 |
| Success | #142 | 3 hr 45 min | integration-slave-docker-1054 |
| Failed | #141 | 5 hr 0 min | integration-slave-docker-1043 |
| Success | #140 | 3 hr 27 min | integration-slave-docker-1040 |
| Success | #138 | 2 hr 48 min | integration-slave-docker-1050 |
| Success | #137 | 3 hr 15 min | integration-slave-docker-1041 |
| Success | #136 | 2 hr 44 min | integration-slave-docker-1050 |
| Success | #135 | 3 hr 52 min | integration-slave-docker-1050 |
| Success | #134 | 3 hr 43 min | integration-slave-docker-1041 |
| Success | #133 | 4 hr 18 min | integration-slave-docker-1040 |
| Success | #132 | 3 hr 43 min | integration-slave-docker-1048 |
| Success | #131 | 3 hr 44 min | integration-slave-docker-1052 |
| Success | #130 | 3 hr 37 min | integration-slave-docker-1052 |
| Success | #129 | 3 hr 51 min | integration-slave-docker-1058 |
| Success | #128 | 3 hr 32 min | integration-slave-docker-1051 |
| Success | #127 | 4 hr 19 min | integration-slave-docker-1043 |
| Success | #126 | 2 hr 42 min | integration-slave-docker-1058 |
| Success | #125 | 2 hr 58 min | integration-slave-docker-1041 |
| Success | #124 | 2 hr 40 min | integration-slave-docker-1050 |
| Success | #123 | 3 hr 3 min | integration-slave-docker-1041 |
| Success | #122 | 2 hr 46 min | integration-slave-docker-1050 |
| Success | #121 | 2 hr 45 min | integration-slave-docker-1052 |
| Success | #120 | 2 hr 52 min | integration-slave-docker-1054 |
| Success | #119 | 3 hr 4 min | integration-slave-docker-1040 |
| Success | #118 | 2 hr 38 min | integration-slave-docker-1041 |
| Success | #117 | 2 hr 38 min | integration-slave-docker-1050 |
| Success | #116 | 2 hr 47 min | integration-slave-docker-1051 |
| Success | #115 | 2 hr 45 min | integration-slave-docker-1051 |
| Success | #114 | 4 hr 5 min | integration-slave-docker-1043 |
| Success | #113 | 2 hr 47 min | integration-slave-docker-1054 |
| Success | #112 | 2 hr 37 min | integration-slave-docker-1048 |
| Success | #111 | 2 hr 28 min | integration-slave-docker-1050 |
| Success | #110 | 2 hr 26 min | integration-slave-docker-1041 |
| Success | #109 | 2 hr 35 min | integration-slave-docker-1054 |
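To make the per-agent pattern easier to see, the durations can be grouped by agent. The snippet below is a hypothetical helper (not part of the CI tooling); `ROWS` is a hand-copied subset of the table above, and the names `to_minutes` and `summarize` are made up for illustration. It converts each duration to minutes and reports build count, failures and mean duration per agent.

```python
import re
from collections import defaultdict
from statistics import mean

# Hand-copied subset of the build time trend above:
# (status, build number, duration, agent).
ROWS = [
    ("Failed", 168, "5 hr 0 min", "integration-slave-docker-1043"),
    ("Failed", 167, "5 hr 0 min", "integration-slave-docker-1043"),
    ("Success", 166, "3 hr 40 min", "integration-slave-docker-1052"),
    ("Failed", 165, "5 hr 0 min", "integration-slave-docker-1043"),
    ("Success", 164, "3 hr 35 min", "integration-slave-docker-1041"),
    ("Success", 163, "3 hr 36 min", "integration-slave-docker-1051"),
    ("Success", 162, "4 hr 5 min", "integration-slave-docker-1040"),
    ("Failed", 141, "5 hr 0 min", "integration-slave-docker-1043"),
    ("Success", 127, "4 hr 19 min", "integration-slave-docker-1043"),
    ("Success", 114, "4 hr 5 min", "integration-slave-docker-1043"),
]


def to_minutes(duration: str) -> int:
    """Convert a Jenkins-style duration such as '3 hr 40 min' to minutes."""
    match = re.fullmatch(r"(\d+) hr (\d+) min", duration)
    if match is None:
        raise ValueError(f"unexpected duration: {duration!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


def summarize(rows):
    """Group builds by agent and report count, failures and mean duration."""
    by_agent = defaultdict(list)
    for status, _build, duration, agent in rows:
        by_agent[agent].append((status, to_minutes(duration)))
    return {
        agent: {
            "builds": len(runs),
            "failed": sum(1 for status, _ in runs if status == "Failed"),
            "mean_minutes": mean(minutes for _, minutes in runs),
        }
        for agent, runs in by_agent.items()
    }


if __name__ == "__main__":
    summary = summarize(ROWS)
    # Print the slowest agents first.
    for agent, stats in sorted(summary.items(), key=lambda kv: -kv[1]["mean_minutes"]):
        print(f"{agent}: {stats['builds']} builds, "
              f"{stats['failed']} failed, mean {stats['mean_minutes']:.0f} min")
```

Failed builds are recorded at the 5 hour cap, so the mean shown for integration-slave-docker-1043 understates how long those runs would actually have taken.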

Event Timeline

hashar created this task. Sep 12 2019, 8:09 AM
Restricted Application added a subscriber: Aklapper. Sep 12 2019, 8:09 AM
hashar updated the task description. Sep 12 2019, 8:12 AM

Mentioned in SAL (#wikimedia-releng) [2019-09-12T08:14:08Z] <hashar> Marking integration-slave-docker-1043 offline , triggering mwcore-phpunit-coverage-master again # T232706

Mentioned in SAL (#wikimedia-releng) [2019-09-13T12:39:51Z] <hashar> integration-slave-docker-1043 : killed stall container # T232706

hashar closed this task as Resolved. Sep 13 2019, 12:41 PM
hashar claimed this task.

There was a stalled container on integration-slave-docker-1043 that was apparently using too much CPU. I have killed it, which should help.

hashar reopened this task as Open. Fri, Sep 27, 8:11 AM

https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/ still times out from time to time, depending on which host executes the job and on that host's workload:

| Status | Build | Duration | Instance |
| --- | --- | --- | --- |
| Failed | #199 | 5 hr 0 min | integration-agent-docker-1009 |
| Success | #198 | 4 hr 28 min | integration-agent-docker-1005 |
| Failed | #197 | 5 hr 0 min | integration-agent-docker-1006 |
| Success | #196 | 4 hr 39 min | integration-agent-docker-1006 |
| Failed | #195 | 5 hr 0 min | integration-agent-docker-1002 |
| Success | #194 | 4 hr 37 min | integration-agent-docker-1011 |
| Success | #193 | 4 hr 42 min | integration-agent-docker-1006 |
| Success | #192 | 4 hr 53 min | integration-agent-docker-1003 |
| Success | #191 | 3 hr 29 min | integration-slave-docker-1059 |
| Success | #190 | 3 hr 39 min | integration-slave-docker-1048 |
| Success | #189 | 3 hr 44 min | integration-slave-docker-1054 |
| Success | #188 | 3 hr 31 min | integration-slave-docker-1058 |
| Success | #187 | 3 hr 28 min | integration-slave-docker-1058 |
| Success | #186 | 3 hr 48 min | integration-slave-docker-1052 |
| Failed | #185 | 5 hr 0 min | integration-slave-docker-1043 |
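The instances above follow two hostname schemes (integration-slave-docker-* and integration-agent-docker-*), so a quick way to compare them is to group builds by hostname prefix. This is a hypothetical sketch: `ROWS` is copied by hand from the table above, and whether a prefix corresponds to a particular OS release is an assumption to verify separately. Only successful runs are averaged, since failed builds stop at the 5 hour timeout and their real duration is unknown.

```python
from collections import defaultdict
from statistics import mean

# All rows of the table above: (status, build number, duration, instance).
ROWS = [
    ("Failed", 199, "5 hr 0 min", "integration-agent-docker-1009"),
    ("Success", 198, "4 hr 28 min", "integration-agent-docker-1005"),
    ("Failed", 197, "5 hr 0 min", "integration-agent-docker-1006"),
    ("Success", 196, "4 hr 39 min", "integration-agent-docker-1006"),
    ("Failed", 195, "5 hr 0 min", "integration-agent-docker-1002"),
    ("Success", 194, "4 hr 37 min", "integration-agent-docker-1011"),
    ("Success", 193, "4 hr 42 min", "integration-agent-docker-1006"),
    ("Success", 192, "4 hr 53 min", "integration-agent-docker-1003"),
    ("Success", 191, "3 hr 29 min", "integration-slave-docker-1059"),
    ("Success", 190, "3 hr 39 min", "integration-slave-docker-1048"),
    ("Success", 189, "3 hr 44 min", "integration-slave-docker-1054"),
    ("Success", 188, "3 hr 31 min", "integration-slave-docker-1058"),
    ("Success", 187, "3 hr 28 min", "integration-slave-docker-1058"),
    ("Success", 186, "3 hr 48 min", "integration-slave-docker-1052"),
    ("Failed", 185, "5 hr 0 min", "integration-slave-docker-1043"),
]


def to_minutes(duration: str) -> int:
    """Convert '<h> hr <m> min' into a number of minutes."""
    hours, hr_label, minutes, min_label = duration.split()
    if (hr_label, min_label) != ("hr", "min"):
        raise ValueError(f"unexpected duration: {duration!r}")
    return int(hours) * 60 + int(minutes)


def summarize_by_prefix(rows):
    """Group builds by hostname prefix (the hostname minus its trailing number)."""
    groups = defaultdict(list)
    for status, _build, duration, instance in rows:
        prefix = instance.rsplit("-", 1)[0]
        groups[prefix].append((status, to_minutes(duration)))
    summary = {}
    for prefix, runs in groups.items():
        successes = [minutes for status, minutes in runs if status == "Success"]
        summary[prefix] = {
            "builds": len(runs),
            "failed": len(runs) - len(successes),
            # Failed runs are cut off by the timeout, so only successes count.
            "mean_success_minutes": mean(successes) if successes else None,
        }
    return summary


if __name__ == "__main__":
    for prefix, stats in summarize_by_prefix(ROWS).items():
        avg = stats["mean_success_minutes"]
        avg_text = "n/a" if avg is None else f"{avg:.1f} min"
        print(f"{prefix}: {stats['builds']} builds, "
              f"{stats['failed']} failed, mean successful run {avg_text}")
```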

The suite also lists a bunch of very slow tests:

You should really fix these slow tests (>50 ms)

1. 52277 ms ApiQuerySiteinfoTest:testContinuation (fixed T234016)
2. 23437 ms ApiMoveTest:testMoveSubpages
3. 23395 ms RefreshLinksPartitionTest:testRefreshLinks with data set #0
4. 22151 ms ApiQueryContinueTest:testGen2Prop2List1Meta
5. 21176 ms ApiQueryContinueTest:testSameGenList
6. 20822 ms ApiQueryContinueTest:testGen2Prop
7. 20026 ms MWDebugTest:testAppendDebugInfoToApiResultXmlFormat
8. 15395 ms DatabaseSQLTest:testInsertSelectBatching
9. 15351 ms MovePageTest:testMoveSubpagesIfAllowed
10. 15017 ms MovePageTest:testMoveSubpages
11. 14036 ms ApiQueryContinueTest:testSameGenAndProp
12. 13151 ms ApiQueryPrefixSearchTest:testOffsetContinue with data set "no offset"
13. 12404 ms MediaWiki\Auth\AuthManagerTest:testAutoAccountCreation
14. 12275 ms SpecialPageFatalTest:testSpecialPageDoesNotFatal with data set "Version"
15. 10898 ms ApiMainTest:testApiNoParam
16. 10705 ms SpecialPageFatalTest:testSpecialPageDoesNotFatal with data set "Preferences"
17. 10572 ms DefaultPreferencesFactoryTest:testIntvalFilter
18. 10195 ms ApiQueryContinueTest:testGen1Prop1List
19. 10171 ms ApiQueryPrefixSearchTest:testOffsetContinue with data set "with offset"
20. 10007 ms SpecialPreferencesTest:testT43337
21. 9867 ms SpecialPageFatalTest:testSpecialPageDoesNotFatal with data set "Recentchanges"
22. 9848 ms ApiQueryPrefixSearchTest:testOffsetContinue with data set "past end, no offset"
23. 9809 ms MovePageTest:testTitleMoveCompleteIntegrationTest
24. 9582 ms ApiQueryPrefixSearchTest:testOffsetContinue with data set "past end, with offset"
25. 9111 ms ApiQueryPrefixSearchTest:testValidCovers
26. 8953 ms SpecialPageFatalTest:testSpecialPageDoesNotFatal with data set "PasswordPolicies"
27. 8934 ms LinksUpdateTest:testOnAddingAndRemovingCategoryToTemplates_embeddingPagesAreIgnored
28. 8675 ms SpecialPageFatalTest:testSpecialPageDoesNotFatal with data set "Listgrants"
29. 8586 ms ApiMoveTest:testMoveTalk
30. 8365 ms ArticleTablesTest:testTemplatelinksUsesContentLanguage
31. 8351 ms ApiRevisionDeleteTest:testUnhidingOutput
32. 8078 ms ApiRevisionDeleteTest:testPartiallyBlockedPage
33. 8020 ms WikiPageMcrWriteBothDbTest:testDoRollback
34. 7955 ms WikiPageMcrDbTest:testDoRollback
35. 7841 ms WikiPagePreMcrDbTest:testDoRollback
36. 7829 ms ApiRevisionDeleteTest:testHidingRevisions
37. 7551 ms GlobalWithDBTest:testWfIsBadImage with data set "No context page"
38. 7547 ms ImportLinkCacheIntegrationTest:testImportForImportSource
39. 7480 ms WikiPageMcrReadNewDbTest:testDoRollback
40. 7475 ms SpecialPageFatalTest:testSpecialPageDoesNotFatal with data set "Listgrouprights"
41. 7250 ms MessageCacheTest:testLoadFromDB_fetchLatestRevision
42. 7193 ms SearchEngineTest:testFiltersMissing
43. 7188 ms MovePageTest:testMove with data set "Move between invalid names"
44. 7186 ms ApiQueryContinueTest:testGen1Prop
45. 7161 ms WikiPageNoContentModelDbTest:testDoRollback
46. 7110 ms MovePageTest:testMoveAbortedByTitleMoveHook
47. 7056 ms MovePageTest:testMove with data set "Aborted by hook"
48. 6869 ms ApiMoveTest:testMoveTalkFailed
49. 6854 ms SpecialPageFatalTest:testSpecialPageDoesNotFatal with data set "Newpages"
50. 6851 ms MovePageTest:testMove with data set "Doubly aborted by hook"
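When deciding which slow tests to tackle first, it can help to total the reported time per test class rather than per test. The sketch below is hypothetical: `REPORT` is a hand-copied subset of the list above, rewritten in an assumed "<ms> ms <TestClass>:<testMethod>" form without the list numbers, and `slow_time_per_class` is an illustrative name, not part of PHPUnit or MediaWiki. It keeps the report's ">50 ms" threshold and sums durations by the class name before the first colon.

```python
from collections import Counter

# Hand-copied subset of the slow-test report above, one test per line in
# the assumed form "<milliseconds> ms <TestClass>:<testMethod>[ details]".
REPORT = """\
52277 ms ApiQuerySiteinfoTest:testContinuation
23437 ms ApiMoveTest:testMoveSubpages
23395 ms RefreshLinksPartitionTest:testRefreshLinks with data set #0
22151 ms ApiQueryContinueTest:testGen2Prop2List1Meta
21176 ms ApiQueryContinueTest:testSameGenList
20822 ms ApiQueryContinueTest:testGen2Prop
15351 ms MovePageTest:testMoveSubpagesIfAllowed
15017 ms MovePageTest:testMoveSubpages
14036 ms ApiQueryContinueTest:testSameGenAndProp
8586 ms ApiMoveTest:testMoveTalk
"""


def slow_time_per_class(report: str, threshold_ms: int = 50) -> Counter:
    """Sum the reported run time (ms) of slow tests, grouped by test class."""
    totals = Counter()
    for line in report.splitlines():
        line = line.strip()
        if not line:
            continue
        millis, _sep, test = line.partition(" ms ")
        duration = int(millis)
        if duration <= threshold_ms:
            continue  # below the report's ">50 ms" threshold
        # The class name is everything before the first ":".
        test_class = test.split(":", 1)[0]
        totals[test_class] += duration
    return totals


if __name__ == "__main__":
    # most_common() sorts classes by total time, largest first.
    for test_class, total in slow_time_per_class(REPORT).most_common():
        print(f"{total / 1000:8.1f} s  {test_class}")
```

On this subset the four ApiQueryContinueTest entries together outweigh the single slowest test, which a per-test ranking does not show directly.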
hashar added a comment. Wed, Oct 2, 5:28 PM

There is another issue: builds that run on Stretch-based VMs are slower than the ones running on Jessie VMs. That also affects other jobs, so there is some infrastructure issue that has to be figured out.