Page MenuHomePhabricator

GenerateEntityDiffPatchOperationUnitTest fails randomly
Closed, ResolvedPublic

Description

18:28:30 - should break the pipeline *** FAILED ***
18:28:30   Unsatisfied expectation:
18:28:30   
18:28:30   Expected:
18:28:30   inAnyOrder {
18:28:30     <mock-1> WikibaseEntityRevRepositoryTrait.getEntityByRevision(Q1, 1) once (called once)
18:28:30     <mock-1> WikibaseEntityRevRepositoryTrait.getEntityByRevision(Q1, 2) once (never called - UNSATISFIED)
18:28:30   }
18:28:30   
18:28:30   Actual:
18:28:30     <mock-1> WikibaseEntityRevRepositoryTrait.getEntityByRevision(Q1, 1) (GenerateEntityDiffPatchOperationUnitTest.scala:21)

AC:

  • the build runs consistently

Side: S

Event Timeline

dcausse created this task.Jul 13 2020, 9:08 AM
Restricted Application added a project: Wikidata. · View Herald TranscriptJul 13 2020, 9:08 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

In:

val from: Future[Iterable[Statement]] = fetchAsync(item, fromRev)
val to: Future[Iterable[Statement]] = fetchAsync(item, revision)

val awaiting: Future[Unit] = from flatMap { fromIt =>
  to map {
    toIt => generateAndSendDiff(op, item, fromIt, toIt, revision, fromRev, origMeta, collector)
  }
}

Await.result(awaiting, Duration.Inf)

Await.result will return as soon as the first failure occurs, this is not what the test expects.
The test expects all future to be completed.
The reason we want all futures to be completed is that we'd like to keep the number of free threads in the ExecutorService to be 2 so that the next message never waits previous fetches.

dcausse added a comment.EditedJul 14 2020, 1:54 PM

Wanting the Executor to free of any previous task is perhaps something we want to reconsider. Ideally when we fetch 2 revisions at the same time if one fetch fails we want to fail (interrupt) the second fetch as soon as possible. Interrupting a thread while http client is running a request might not be the best idea as it might invalidate its underlying resources.

I suggest leaving the second fetch running while we propagate the failure (FailedOp) to the downstream operators. But in order to not penalize the next message I suggest increasing the thread pool with a couple more threads, possibly using directly ThreadPoolExecutor with e.g. corePoolSize=2, maximumPoolSize=12, keepAliveTime=5minutes, workQueue=new LinkedBlockedQueue(), threadFactory=the one already created.

The flaky test "an unexpected failure fetching one of the two revisions" should "break the pipeline" in
GenerateEntityDiffPatchOperationUnitTest will now have to stop expecting that the second call getEntityByRevision("Q1", 2) is always called but also accepts that this call can be omitted (possibly using scalamock noMoreThanOnce).

dcausse updated the task description. (Show Details)Jul 14 2020, 2:01 PM

Change 614791 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[wikidata/query/rdf@master] fix randomly failing test

https://gerrit.wikimedia.org/r/614791

Change 614791 merged by jenkins-bot:
[wikidata/query/rdf@master] fix randomly failing test

https://gerrit.wikimedia.org/r/614791

dcausse assigned this task to Mstyles.Jul 21 2020, 7:49 AM
dcausse triaged this task as Medium priority.
dcausse moved this task from Scaling to Current work on the Wikidata-Query-Service board.
dcausse moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Gehel closed this task as Resolved.Aug 17 2020, 12:44 PM