
Performance review of Wikifunctions
Closed, Resolved (Public)

Description

This is a request for a performance review of Wikifunctions from 1-June-2021 (Tuesday) to 14-June-2021 (Monday), inclusive.

The system is composed of a JavaScript Vue client, MediaWiki PHP middleware, and backend Node.js orchestration/evaluation (n.b., user-defined functions will be supported in Node.js and Python in early releases of Wikifunctions).

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/WikiLambda/+/refs/heads/master
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/function-orchestrator/+/refs/heads/master
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/function-evaluator/+/refs/heads/master

Preview environment

It is presently possible to run the MediaWiki extension WikiLambda in MediaWiki-Docker with minimal configuration, and work to wire up fuller orchestration and evaluation is in flight.

Additionally, Lucas Werkmeister has, in his spare time, been running an independent system (which, to be clear, is not the target of evaluation) - https://notwikilambda.toolforge.org/wiki/Main_Page

Which code to review

We're interested in an evaluation of performance across the tiers. Note that we expect this new Wikimedia project's user base, and therefore its computational needs, to be comparatively modest at first.

Performance assessment

Q&A:

  • What work has been done to ensure the best possible performance of the feature? So far, we haven't begun optimization for performance.
  • What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance? A snappy first-time user experience (FTUX) in the Vue client-side app, or at least effective management of user expectations for that FTUX; type conformance checking on the JSON user-generated content; and function execution bottlenecks along with in-memory or on-disk pressure points.
  • Are there potential optimisations that haven't been performed yet? Yes.
  • Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far. If you are unsure what to measure, ask the Performance Team for advice: performance-team@wikimedia.org. Not much. We do intend to have various profiling measures in place in the future, as profiling will likely inform different types of optimization (e.g., caching and parallelism bounds), and we'll need guards in place to monitor and address yet-to-be-defined SLOs; but this is MVP work.

Event Timeline

dr0ptp4kt updated the task description.
Gilles changed the task status from Open to Stalled. Sep 11 2020, 7:42 AM

Marking this as stalled for my own tracking of performance reviews that are currently actionable for us. Please move this back to open once there is something running for us to review.

Bumping this task. We would like to have a performance readiness review performed in Q4 of FY 2020-2021. The exact requested start date is to be determined and agreed upon within the Abstract Wikipedia team.

Heads up @Gilles, this is the request for the Q4 performance readiness review. As you probably know, the architecture has changed: the WikiLambda extension is (still) in MediaWiki, with an orchestrator/choreographer service in Node.js and programming-language evaluator components behind that. It may change in some ways, although that would likely happen after first launch, depending on real-world observed workload behavior.

We'll need to update the task Description to better reflect reality. But first, I wanted to get this in the queue for Q4. Looking forward to it!

Sounds good to me, thanks for requesting it early!

dr0ptp4kt renamed this task from Performance review of WikiLambda extension to Performance review of Wikifunctions. May 7 2021, 7:10 PM
dr0ptp4kt changed the task status from Stalled to Open.
dr0ptp4kt triaged this task as High priority.
dr0ptp4kt updated the task description.
dr0ptp4kt updated the task description.

@Gilles I touched up the Description a bit. As emailed, we'd like the review to run from the 1st of June (Tuesday) to the 14th of June (Monday), 2021, inclusive.

Thanks!

dr0ptp4kt updated the task description.
dr0ptp4kt updated the task description.

As promised, I've updated the main docker-compose instructions to hopefully cover both the main testing use case and the likely variants you'll encounter, and I've confirmed in a brand-new checkout that it works as I expect, using:

function-orchestrator:
  image: docker-registry.wikimedia.org/wikimedia/mediawiki-services-function-orchestrator:2021-05-26-191706-production
  ports:
    - 6254:6254
function-evaluator:
  image: docker-registry.wikimedia.org/wikimedia/mediawiki-services-function-evaluator:2021-05-27-170058-production
  ports:
    - 6927:6927

Special:CreateZObject page

  • A fair amount of content jumping around can be avoided by matching the no-JS placeholder's sizing to the actual final UI's. This should be optimised for the common case (the user has JS). The same is true for Special:EvaluateFunctionCall.
  • The language list query doesn't appear to be cached at all (Cache-Control: private, must-revalidate, max-age=0 response). I don't expect the list of languages to change that frequently. Even caching this for a few minutes server-side and client-side would help; see the example header after this list.
  • The loaded JS files coming from Ace are uncached as well (mode-json.js, theme-chrome.js, worker-json.js).
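
For concreteness, the kind of cached response header suggested above could look like the following (the exact max-age values are an illustrative suggestion, not a measured requirement):

Cache-Control: public, max-age=300, s-maxage=300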

I don't know why Ace needs a web worker at all, but I didn't see an obvious way to avoid it in their documentation. I guess that odd architecture choice is something we'll have to live with as long as Ace is what's used for syntax highlighting.

Orchestrator and evaluator

I benchmarked the orchestrator and evaluator by hammering the API endpoint with ab. There doesn't seem to be any memory leak in either service.

In terms of performance, the drawback of the single-threaded nature of Node starts to show pretty quickly as concurrency increases. With 10 concurrent requests, the median response time is around 600 ms; as expected for a single thread doing roughly 60 ms of work per request, with 100 concurrent requests it becomes about 6 s.

Having a dynamically sized pool of evaluator services is desirable for the production deployment. I assume that the orchestrator sends requests to the evaluator asynchronously and as a result shouldn't need to be pooled as well.

The issue might also be solvable inside the service, if the actual running of functions is done inside a Web Worker instead of directly on the main thread. But attempting to solve this in-service has downsides: less control over the concurrency, and observability becomes more complex as well.
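
For illustration only, here is a minimal sketch of what off-main-thread execution could look like; in a Node.js service the analogue of a browser Web Worker is the built-in worker_threads module. The function body and payload shape below are invented for the example and are not the service's actual code.

// run-in-worker.js (sketch only; the payload shape is invented for illustration)
const { Worker } = require('worker_threads');

function runFunctionInWorker(code, args) {
  return new Promise((resolve, reject) => {
    // eval: true lets the Worker run the inline string below as its entry script.
    const worker = new Worker(
      `const { parentPort, workerData } = require('worker_threads');
       const fn = new Function('...args', workerData.code);
       parentPort.postMessage(fn(...workerData.args));`,
      { eval: true, workerData: { code, args } }
    );
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// CPU-bound user code now runs off the service's main event loop.
runFunctionInWorker('let s = 0; for (let i = 0; i < args[0]; i++) s += i; return s;', [1e7])
  .then(result => console.log(result));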

I would like to test the timeout abilities, but I really can't figure out the UI in order to create something that would call a custom JS function. Is there a step-by-step tutorial I could follow in order to do that?

Hi, Gilles, a few questions/comments.

I assume that the orchestrator sends requests to the evaluator asynchronously and as a result shouldn't need to be pooled as well

Correct, all communication with the orchestrator and evaluator is asynchronous. Out of curiosity, with these concurrent requests, how much time is spent in the orchestrator vs. the evaluator?
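
(Purely as a sketch of how that split could be measured; callEvaluator is a hypothetical stand-in for however the orchestrator actually dispatches to the evaluator, not the real code.)

const { performance } = require('perf_hooks');

// Sketch only: callEvaluator is a hypothetical stand-in for the real dispatch.
async function orchestrate(request, callEvaluator) {
  const t0 = performance.now();
  // ... orchestrator-side work: validation, type checking, composition ...
  const t1 = performance.now();
  const result = await callEvaluator(request); // network + evaluator time
  const t2 = performance.now();
  console.log(`orchestrator: ${(t1 - t0).toFixed(1)} ms, evaluator (incl. network): ${(t2 - t1).toFixed(1)} ms`);
  return result;
}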

The issue might be solvable inside the service as well, if the actual running of functions is done inside a Web Worker instead of directly on the main thread

I did not know about this! Are you recommending this for the evaluator or the orchestrator?

I really can't figure out the UI in order to create something that would call a custom JS function

Do you want to call this from the wiki or insert your request somewhere deeper in the stack (e.g. orchestrator, evaluator)? If you check out the API sandbox, there is an example request that will let you run JS code. As for a step-by-step tutorial ... hmm, there isn't, but there should be. At present, I am happy to discuss this with you if you like!
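
(For reference, a rough sketch of what such a request could look like from outside the sandbox UI. The list name wikilambda_function_call matches the API response shown further down in this thread; the parameter name, port, and ZObject payload are assumptions for illustration, not the documented API.)

// Sketch only: parameter name, port, and payload are assumptions, not the documented API.
const zobject = JSON.stringify({ /* a ZObject describing the function call to run */ });

const params = new URLSearchParams({
  action: 'query',
  list: 'wikilambda_function_call',
  wikilambda_function_call_zobject: zobject, // assumed parameter name
  format: 'json'
});

// Default MediaWiki-Docker URL assumed; adjust to your local setup.
fetch('http://localhost:8080/w/api.php?' + params)
  .then(r => r.json())
  .then(data => console.log(data.query.wikilambda_function_call));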

I haven't checked what's causing the single-threaded behaviour. I presume it's happening in the evaluator, as you're probably running code on the main thread there?

I've tried modifying the python3 API sandbox example, but I suspect it's actually not fully working for me, as the output values don't have the result of the sum. I get:

{
    "query": {
        "wikilambda_function_call": {
            "Orchestrated": {
                "success": "",
                "data": "{\"Z1K1\":\"Z5\",\"Z5K1\":{\"Z1K1\":\"Z402\",\"Z402K1\":{\"Z1K1\":\"Z421\",\"Z421K1\":null}}}"
            }
        }
    }
}

Is this the expected output?

And trying to modify the Python code to call Python's time.sleep() doesn't seem to sleep at all. I have a feeling that the Python code is actually not being run and there's no visible error in the output. Is there an error log somewhere from the evaluator that I could inspect?

Hmm, I also notice that nothing is working at the moment. We've made a bunch of changes to the orchestrator recently, so I assume that's the reason. I will investigate today.

probably running code on the main thread

All code is run within promises, but in the main thread, yes.
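
(As a quick illustration of why promises alone don't help here; this is not code from the service.)

// Illustration only: a promise wrapping CPU-bound work still blocks the event
// loop, because the synchronous loop below runs on the main thread.
function runUserFunction() {
  return new Promise(resolve => {
    let s = 0;
    for (let i = 0; i < 1e9; i++) s += i; // seconds of blocking work
    resolve(s);
  });
}

// While runUserFunction() executes, this timer, and every other request handled
// by the same process, is stalled until the loop finishes.
setTimeout(() => console.log('event loop free again'), 0);
runUserFunction().then(() => console.log('done'));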

error log somewhere from the evaluator

Not yet; that is something I would very much like to have.
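
(For what it's worth, a minimal sketch of the kind of logging that could surface such errors, assuming the evaluator runs user Python code in a child process; that architectural assumption, and everything below, is illustrative rather than a description of the current code.)

// Sketch only: assumes Python user code runs in a child process, which is an
// assumption about the evaluator's architecture, not its current design.
const { spawn } = require('child_process');

function runPython(code) {
  return new Promise((resolve, reject) => {
    const py = spawn('python3', ['-c', code]);
    let out = '', err = '';
    py.stdout.on('data', chunk => { out += chunk; });
    py.stderr.on('data', chunk => { err += chunk; });
    py.on('close', exitCode => {
      // Surface Python tracebacks in the service log instead of dropping them.
      if (err) console.error('[evaluator] python stderr:', err);
      if (exitCode === 0) {
        resolve(out);
      } else {
        reject(new Error(err || 'exit code ' + exitCode));
      }
    });
  });
}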

Krinkle added a subscriber: Gilles.

Following up from off-thread discussion, we'll plan on the following based on what we know today:

  1. Deploying to the Beta Cluster. Work to get there will be happening over the near term.
  2. Sometime later in Q2, a further performance readiness review.
  3. In 2022, after working through pre-launch production considerations, MVP production launch.
Krinkle subscribed.
Krinkle assigned this task to Gilles.
Krinkle edited projects, added Performance-Team; removed Performance-Team (Radar).