Evaluator service dies on receiving a request
Closed, Resolved, Public

Description

I created a new wiki with Wikilambda, but when firing off the example query "Invoke native code" I get the following result

{
    "query": {
        "wikilambda_function_call": {
            "Orchestrated": {
                "success": "",
                "data": "{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z22\"},\"Z22K1\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z23\"}},\"Z22K2\":{\"Z1K1\":{\"Z1K1\":\"Z9\",\"Z9K1\":\"Z23\"}}}"
            }
        }
    }
}

and the evaluator has exited. (Are there any logs I can provide to help?)
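For anyone reading the response by hand: the `data` field is a JSON string nested inside the JSON response, so it has to be decoded a second time. The sketch below does that and reports the type ZID of the result and of its two slots. The helper name `type_of` is made up for illustration. Reading Z22 as the result-pair type and Z23 as a "nothing" placeholder is an assumption about this version of the function model, not something the report states.

```python
import json

# The "data" string copied from the API response in the report above.
raw_data = (
    '{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z22"},'
    '"Z22K1":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z23"}},'
    '"Z22K2":{"Z1K1":{"Z1K1":"Z9","Z9K1":"Z23"}}}'
)


def type_of(zobject):
    """Return the ZID named by an object's Z1K1 (type) key (hypothetical helper)."""
    z1k1 = zobject["Z1K1"]
    # In this expanded form the type is itself a Z9 reference object.
    return z1k1["Z9K1"] if isinstance(z1k1, dict) else z1k1


result = json.loads(raw_data)
summary = {
    "outer": type_of(result),
    "Z22K1": type_of(result["Z22K1"]),
    "Z22K2": type_of(result["Z22K2"]),
}
print(summary)  # {'outer': 'Z22', 'Z22K1': 'Z23', 'Z22K2': 'Z23'}
```

Both slots carry the same placeholder type, so the response contains neither a usable value nor an error message, which fits an evaluator that exited before answering.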

Event Timeline

This comment was removed by cmassaro.

Ignore previous comment :). I have replicated this with the latest builds. Investigating now.

Huh. Looks like the buster-node10js images are the culprit: the function-evaluator image isn't building at HEAD, but everything works if I use a different image. I guess that the version of Node packaged in those images doesn't work with some of our dependencies. This should have caused a ruckus and a din in function-evaluator CI. @Jdforrester-WMF, is there a way to get alerts when CI is failing in a repo?

Change 697660 had a related patch set uploaded (by Cory Massaro; author: Cory Massaro):

[mediawiki/services/function-evaluator@master] Update NPM dependencies and base Docker images in order to resolve Node compatibility problem.

https://gerrit.wikimedia.org/r/697660

> Huh. Looks like the buster-node10js images are the culprit: the function-evaluator image isn't building at HEAD, but everything works if I use a different image. I guess that the version of Node packaged in those images doesn't work with some of our dependencies.

Ah, are some of our dependencies not locked down enough?

> This should have caused a ruckus and a din in function-evaluator CI.

Indeed, it still doesn't in CI: the HEAD commit is bcc9d048945bf9f78aa9a06ae2a5bacb75062333 which passed a re-triggered CI run: https://integration.wikimedia.org/ci/job/service-pipeline-test/9028/

> @Jdforrester-WMF, is there a way to get alerts when CI is failing in a repo?

No. Failing CI won't let people merge patches, but we don't re-trigger CI systematically for such issues.

Really, we should ask SRE to make an image so that we can migrate to node 12 urgently, but they're somewhat resistant.

>> Huh. Looks like the buster-node10js images are the culprit: the function-evaluator image isn't building at HEAD, but everything works if I use a different image. I guess that the version of Node packaged in those images doesn't work with some of our dependencies.

> Ah, are some of our dependencies not locked down enough?

Maybe? Running npm update certainly bumped the versions. I don't know if there's a way to manage our dependencies more automatically.

>> This should have caused a ruckus and a din in function-evaluator CI.

> Indeed, it still doesn't in CI: the HEAD commit is bcc9d048945bf9f78aa9a06ae2a5bacb75062333 which passed a re-triggered CI run: https://integration.wikimedia.org/ci/job/service-pipeline-test/9028/

I have no idea how that CI run passed, hahaha. HEAD doesn't build for me; the generated Dockerfile produces a Node mismatch.

I don't think there's a way to fix the Node version thing in a base Docker image. Maybe the apt command can force a version update? I'll try that.

Change 697668 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/services/function-evaluator@master] build: Pin all dependencies and devDependencies exactly

https://gerrit.wikimedia.org/r/697668
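To illustrate what "pinned exactly" means in practice, here is a minimal sketch that flags any dependency spec that is not a bare `x.y.z` version. The package names and versions in the sample are invented, and `unpinned` is a hypothetical helper, not part of the function-evaluator repository.

```python
import re

# Hypothetical package.json contents; names and versions are made up.
package = {
    "dependencies": {"example-lib": "^1.2.3", "other-lib": "2.0.1"},
    "devDependencies": {"example-test-tool": "~4.5.0"},
}

# An exact pin is a plain x.y.z version, optionally with a prerelease suffix.
EXACT = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$")


def unpinned(pkg):
    """List (section, name, spec) for every spec that is a range, not a pin."""
    found = []
    for section in ("dependencies", "devDependencies"):
        for name, spec in sorted(pkg.get(section, {}).items()):
            if not EXACT.match(spec):
                found.append((section, name, spec))
    return found


for section, name, spec in unpinned(package):
    print(f"{section}: {name} uses range {spec!r}")
```

Pinning in package.json only covers direct dependencies; transitive versions are fixed by the lockfile, so a check like this is a first pass rather than a guarantee.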

Change 697660 had a related patch set uploaded (by Jforrester; author: Cory Massaro):

[mediawiki/services/function-evaluator@master] Switch base Docker image down to stretch temporarily to resolve Node compatibility problem

https://gerrit.wikimedia.org/r/697660

Change 697660 merged by jenkins-bot:

[mediawiki/services/function-evaluator@master] Update NPM dependencies and base Docker images in order to resolve Node compatibility problem.

https://gerrit.wikimedia.org/r/697660

Change 697668 merged by jenkins-bot:

[mediawiki/services/function-evaluator@master] build: Pin all dependencies and devDependencies exactly

https://gerrit.wikimedia.org/r/697668

Change 697692 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/services/function-orchestrator@master] build: Switch base image to stretch from buster for now due to broken npm

https://gerrit.wikimedia.org/r/697692

Change 697692 merged by jenkins-bot:

[mediawiki/services/function-orchestrator@master] build: Switch base image to stretch from buster for now due to broken npm

https://gerrit.wikimedia.org/r/697692

Sorry, re-opening this as this still happens for me.

It happens both with the following settings:

function-orchestrator:
  image: docker-registry.wikimedia.org/wikimedia/mediawiki-services-function-orchestrator:2021-06-02-161154-production
  ports:
    - 6254:6254
function-evaluator:
  image: docker-registry.wikimedia.org/wikimedia/mediawiki-services-function-evaluator:2021-06-01-223817-production
  ports:
    - 6927:6927

which are, as of now, the most current tags, and also when using the most current hashes:

function-orchestrator:
  image: docker-registry.wikimedia.org/wikimedia/mediawiki-services-function-orchestrator:f145b987ad54bd888561704527a5c81feb9da561
  ports:
    - 6254:6254
function-evaluator:
  image: docker-registry.wikimedia.org/wikimedia/mediawiki-services-function-evaluator:7e96a096a5ae1c9aa77c02d0927598ddcaa5ecb3
  ports:
    - 6927:6927

If instead of re-opening I should start a new bug, let me know.

The patch that fixed this was merged on June 2: https://gerrit.wikimedia.org/r/c/mediawiki/services/function-orchestrator/+/697692

The most recent evaluator image on the Docker registry is from June 1st. @Jdforrester-WMF, is there a way we can manually force a new build?
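The date gap can be read straight off the image tags. The sketch below parses the timestamp out of the two `-production` tags quoted earlier in this task and compares it with the merge date of the fix. The year 2021 for the merge date is inferred from the tags, and both function names are hypothetical.

```python
from datetime import date, datetime

# Image references copied from the docker-compose settings quoted in this task.
ORCHESTRATOR = (
    "docker-registry.wikimedia.org/wikimedia/"
    "mediawiki-services-function-orchestrator:2021-06-02-161154-production"
)
EVALUATOR = (
    "docker-registry.wikimedia.org/wikimedia/"
    "mediawiki-services-function-evaluator:2021-06-01-223817-production"
)

# "Merged on June 2"; the year is an assumption taken from the tags.
FIX_MERGED = date(2021, 6, 2)


def tag_build_date(image_ref):
    """Extract the build date from a YYYY-MM-DD-HHMMSS-production tag."""
    tag = image_ref.rsplit(":", 1)[1]
    stamp = tag.rsplit("-", 1)[0]  # drop the trailing "-production"
    return datetime.strptime(stamp, "%Y-%m-%d-%H%M%S").date()


def predates_fix(image_ref, fix_date=FIX_MERGED):
    """True when the image was built on a day before the fix was merged."""
    return tag_build_date(image_ref) < fix_date


print(predates_fix(EVALUATOR))     # True
print(predates_fix(ORCHESTRATOR))  # False
```

An image built on the same day as the merge cannot be classified by date alone, so this only proves the negative case: the June 1 evaluator image cannot contain the fix.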

At present, you will have to build a local image. From the function-evaluator directory:

blubber .pipeline/blubber.yaml development | docker build -t test-evaluator -f - .

Then, in docker-compose.override.yaml, update the image in your function-evaluator stanza as follows:

function-evaluator:
  image: test-evaluator:latest
  ports:
    - 6927:6927

Sorry this is such a pain!

> The patch that fixed this was merged on June 2: https://gerrit.wikimedia.org/r/c/mediawiki/services/function-orchestrator/+/697692
>
> The most recent evaluator image on the Docker registry is from June 1st. @Jdforrester-WMF, is there a way we can manually force a new build?

Yes, done, though from the original post-merge job on that patch:

+ docker tag 2b5f7ce84ff8c35b0075364a2b009330a534a5f58768e504b3633180fe970170 docker-registry.discovery.wmnet/wikimedia/mediawiki-services-function-orchestrator:2021-06-02-161154-production

… and then later:

Untagged: docker-registry.discovery.wmnet/wikimedia/mediawiki-services-function-orchestrator:2021-06-02-161154-production

Which is, err, unhelpful.

Hmm. It's done it again. 2021-06-04-220538-production pushed and then immediately untagged.

But even without the tags, shouldn't the git hashes be referring to the right version?

It's also deleting the git hash tag:

Untagged: docker-registry.discovery.wmnet/wikimedia/mediawiki-services-function-orchestrator:f145b987ad54bd888561704527a5c81feb9da561
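The tag-then-untag pattern in the job output is easy to search for once the log is saved as text. This is a rough sketch that assumes the log format matches the two lines quoted from the post-merge job; the regular expressions and the function name are illustrative, not taken from the pipeline code.

```python
import re

# The two lines quoted above from the post-merge job output.
log_lines = [
    "+ docker tag 2b5f7ce84ff8c35b0075364a2b009330a534a5f58768e504b3633180fe970170 "
    "docker-registry.discovery.wmnet/wikimedia/"
    "mediawiki-services-function-orchestrator:2021-06-02-161154-production",
    "Untagged: docker-registry.discovery.wmnet/wikimedia/"
    "mediawiki-services-function-orchestrator:2021-06-02-161154-production",
]

# Assumed line shapes: a shell-traced "docker tag" and Docker's "Untagged:" notice.
TAG = re.compile(r"^\+ docker tag \S+ (\S+)$")
UNTAG = re.compile(r"^Untagged: (\S+)$")


def tagged_then_untagged(lines):
    """Return image references that were tagged and later untagged in one log."""
    tagged = set()
    removed = []
    for line in lines:
        line = line.strip()
        match = TAG.match(line)
        if match:
            tagged.add(match.group(1))
            continue
        match = UNTAG.match(line)
        if match and match.group(1) in tagged:
            removed.append(match.group(1))
    return removed


for ref in tagged_then_untagged(log_lines):
    print("tagged, then untagged:", ref)
```

A scan like this only shows what the job printed; whether the tag also disappeared from the registry has to be confirmed against the registry itself.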

Two questions on this:

@DVrandecic: you mentioned that the instructions in the README don't help. The README describes how to build the local versions of the Docker containers. Can you suggest how the README could be clearer?
@Jdforrester-WMF: docker-registry still doesn't have a version of function-evaluator newer than June 1; is there a way I can manually upload a new image there or trigger a build?

Denny is saying that following the instructions doesn't work, as the latest tag in the registry is from 1 June. I'm saying that the job is deleting its tag immediately after pushing. I've asked RelEng for help.

Yes, agreed. I am just wondering if the fallback instructions in the README can be made more clear.

As to the latter, thank you--reading comprehension fail on my part.

For some reason https://gerrit.wikimedia.org/r/c/mediawiki/services/function-evaluator/+/698570 did cause a new image to be tagged, as 2021-06-07-210845-production, so this is Resolved. Will follow up with RelEng longer-term, as we don't want this to recur.