
Consider moving the complex logic in ApiPerformTest to a job, and instead reply immediately, firing the job on failure
Open, Medium, Public

Description

Right now, when a user views a Function, Implementation, or Test case page, the Vue UX tries to show the relevant test results. It does this by firing an API request to ApiPerformTest, which then replies synchronously with a blob of JSON with the results.

In the happy path, this is pulled straight from the DB cache and returns very swiftly (within a few milliseconds).

In the sad path, where some or all of the cached results aren't available, the API has to do the work itself: walk the relevant trees, trigger the function calls to determine the test results (which are then cached for future calls), splice them into whatever cached results it already has, and return the combined set.
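To make the cache-then-compute flow concrete, here is an illustrative sketch of the current synchronous sad path. The names (performTests, runTest, resultCache) are invented for the example and are not the real ApiPerformTest code:

```javascript
// Sketch of the synchronous sad path: look up cached results, compute
// the missing ones inline, cache them, and return the merged set.

// Hypothetical cache, pre-seeded with one already-cached result.
const resultCache = new Map([['T1', { test: 'T1', status: 'Pass' }]]);

// Stands in for the expensive orchestrator function call.
function runTest(testId) {
  return { test: testId, status: 'Pass' };
}

function performTests(testIds) {
  const results = [];
  for (const id of testIds) {
    let result = resultCache.get(id);
    if (result === undefined) {
      // Sad path: do the work synchronously, then cache for next time.
      result = runTest(id);
      resultCache.set(id, result);
    }
    results.push(result);
  }
  return results;
}
```

The problems listed below all stem from the `runTest` step happening inline, inside the API request.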

This has several problems:

  • It's complicated logic that's not ideal inside an API.
  • Multiple users can theoretically trigger the same work in parallel, as the Action API does no request de-duplication.
  • The set of function calls to trigger can be large and slow, tying up a PHP process that could otherwise be doing other work.
  • Each function call is already resource-intensive on the MW side (see T338242: Use Guzzle postAsync() rather than post() in OrchestratorRequest); having the overarching test API run synchronously ties up yet another process on top of this.
  • These calls can time out, and in some circumstances (e.g. when running against unpublished revisions) the work done so far is lost rather than cached.
  • As this gets triggered on page view, which may be very soon after publication, it's plausible that the orchestrator hasn't yet received its copy of the edited Objects, leading to Object cache misses there and general load/slowness.

We proposed moving the workload from ApiPerformTest into an asynchronous MediaWiki job that would run on the job runners, triggering the pair of function calls (the test case run and the validation step) for each combination. ApiPerformTest would then always return immediately, sometimes with missing content; the front-end JS would populate results as they became available, and would call the API again after a delay, when more content might be ready.
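The front-end side of this could look something like the following poll-until-complete loop. This is a minimal sketch with invented names (fetchTestResults, pollResults), not the real Vue code; the stub API returns a partial result set on the first call and the full set on the second, standing in for the job finishing in the background:

```javascript
// Hypothetical stand-in for a call to ApiPerformTest: partial results
// first (job still running), complete results on a later call.
let calls = 0;
async function fetchTestResults() {
  calls += 1;
  return calls === 1
    ? { results: [{ test: 'T1', status: 'Pass' }], complete: false }
    : {
        results: [
          { test: 'T1', status: 'Pass' },
          { test: 'T2', status: 'Fail' },
        ],
        complete: true,
      };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Render whatever is available now, then re-poll after a delay until
// the API reports a complete result set (or we give up).
async function pollResults(maxAttempts = 5, delayMs = 10) {
  let response = await fetchTestResults();
  let attempts = 1;
  while (!response.complete && attempts < maxAttempts) {
    await sleep(delayMs); // back off before asking again
    response = await fetchTestResults();
    attempts += 1;
  }
  return response;
}
```

In the real UI, each intermediate response would be rendered immediately, so users see results appear incrementally rather than waiting on one slow synchronous reply.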

By carefully crafting the jobs and setting the right flags, we could probably use MediaWiki's parameter-based job de-duplication to ensure we only run one attempt per revision of each Function's Implementation/Test case pair at a time. We might also want to tie this to the completion of the persistToCache() step, so that the orchestrator already has the new or altered Object cached before the requests arrive.
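As a toy model of the de-duplication behaviour we'd be relying on (in MediaWiki itself this is the job queue's duplicate-removal keyed on the job's parameters, not code we'd write), the queue below skips a push when an identical job is already pending. The parameter names and the key shape are illustrative assumptions:

```javascript
// Toy in-memory queue (plain JavaScript, not MediaWiki code) that
// de-duplicates jobs keyed on their parameter set, mirroring the idea
// of parameter-based job de-duplication.
class DedupingQueue {
  constructor() {
    this.pending = new Map(); // de-dup key -> job params
  }

  // Derive the de-dup key from the (hypothetical) job parameters:
  // one job per Function revision / Implementation / Test case triple.
  static key(params) {
    return [params.functionRevision, params.implementation, params.testCase].join('|');
  }

  // Returns true if the job was enqueued, false if an identical job
  // was already pending and this one was dropped as a duplicate.
  push(params) {
    const key = DedupingQueue.key(params);
    if (this.pending.has(key)) {
      return false;
    }
    this.pending.set(key, params);
    return true;
  }

  pop() {
    const [key, params] = this.pending.entries().next().value ?? [];
    if (key !== undefined) this.pending.delete(key);
    return params;
  }
}
```

Because the key includes the revision, a new edit produces a distinct job rather than being swallowed by de-duplication of the old one.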

Event Timeline

Perhaps we should have an indicator on the frontend, somewhere in the tests table, saying 'we are gathering all test results; this may take a while…', or a skeleton loader per item. Either way, this needs design work.