
Deployment Pipeline fails with CPS error for Kartotherian
Closed, Resolved (Public)


See the stack trace below. This is from

expected to call java.lang.RuntimeException.<init> but wound up catching org.wikimedia.integration.ExecutionGraph.toString; see:
	at com.cloudbees.groovy.cps.impl.ThrowBlock$1.receive(
	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(
	at com.cloudbees.groovy.cps.Next.step(
	at com.cloudbees.groovy.cps.Continuable$
	at com.cloudbees.groovy.cps.Continuable$
	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(
	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(
	at com.cloudbees.groovy.cps.Continuable.run0(
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$
	at hudson.remoting.SingleLaneExecutorService$
	at jenkins.util.ContextResettingExecutorService$
	at java.util.concurrent.Executors$
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$

Event Timeline

thcipriani triaged this task as Medium priority.Sep 19 2019, 2:29 PM
thcipriani moved this task from INBOX to Ready on the Release-Engineering-Team-TODO (201909) board.

There are a few things going on here.

First, the execution field for the kartotherian pipeline configuration is incorrect:

  - test
  - prep
  - production-tilerator

It should be given as a list of lists (graph branches, aka arcs/edges). In this case, I'm assuming they want a single serial branch of execution for the defined stages.

  - [test, prep, production-tilerator]

Also note that this is the default execution configuration (execute defined stages serially), so it can be omitted entirely in this case.

Secondly, there was no validation of the execution config, so the provided configuration resulted in a weird situation: each string in the list was itself treated as a list (Groovy's loose typing and duck-typing of collections made this possible) and every letter of each string became a graph node. In this case that produced a cyclic graph.

new ExecutionGraph(['test', 'prep', 'production-tilerator']).toString()
digraph { t -> e; t -> i; t -> o; e -> s; e -> p; e -> r; s -> t; p -> r; r -> e; r -> o; r -> a; o -> d; o -> n; o -> r; d -> u; u -> c; c -> t; i -> o; i -> l; n -> -; - -> t; l -> e; a -> t; }
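To make the letter-per-node behaviour concrete, here is a minimal sketch in Python (not pipelinelib's Groovy, and not the actual ExecutionGraph code). The function names `arcs_from_execution` and `to_dot` are made up for illustration, and the assumption is simply that arcs are formed from consecutive items of each branch. Python strings are iterable in the same way, so a bare stage name is walked character by character.

```python
def arcs_from_execution(execution):
    """Build adjacency lists by pairing consecutive items of each branch.

    Hypothetical helper: it assumes arcs connect neighbouring items and does
    no type checking, which is what lets a bare string slip through.
    """
    graph = {}
    for branch in execution:
        # A str is a sequence too, so zip() pairs up its characters.
        for src, dst in zip(branch, branch[1:]):
            targets = graph.setdefault(src, [])
            if dst not in targets:
                targets.append(dst)
    return graph


def to_dot(graph):
    """Render the adjacency lists in a one-line DOT-like form."""
    arcs = [f"{src} -> {dst};" for src, targets in graph.items() for dst in targets]
    return "digraph { " + " ".join(arcs) + " }"


if __name__ == "__main__":
    # Misconfigured: a flat list of stage names.
    print(to_dot(arcs_from_execution(["test", "prep", "production-tilerator"])))
    # Intended: one serial branch given as a list of lists.
    print(to_dot(arcs_from_execution([["test", "prep", "production-tilerator"]])))
```

Under these assumptions, the first call reproduces the letter graph quoted above, and the second yields the two arcs test -> prep and prep -> production-tilerator.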

Next, an exception was thrown due to the graph cycle being detected when ExecutionGraph.stack was called in PipelineBuilder.
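I have not checked how ExecutionGraph.stack detects the cycle, so the following is only a generic sketch of cycle detection by depth-first search, written in Python rather than Groovy. The name `find_cycle` and the adjacency-dict representation are assumptions made for the example.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes (start node repeated at the end), or None.

    `graph` maps each node to a list of its successors. This is a textbook
    three-colour depth-first search, not pipelinelib's implementation.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Reached a node that is still on the current path: a cycle.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(graph):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


if __name__ == "__main__":
    # The letters of "test" alone already loop back: t -> e -> s -> t.
    print(find_cycle({"t": ["e"], "e": ["s"], "s": ["t"]}))
    print(find_cycle({"test": ["prep"], "prep": ["production-tilerator"]}))
```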

Lastly, the exception tried to format the graph as a string using ExecutionGraph.toString(), whose implementation is apparently not compatible with Groovy CPS (continuation-passing style). The Jenkins (CloudBees actually) CPS plugin converts all node-executed Groovy to CPS, and so the entire pipeline imploded in the obtuse and disconcerting way that Groovy CPS implodes.

The solutions are:

  1. Annotate ExecutionGraph.toString() with @NonCPS
  2. Add validation for execution
  3. In the meantime, fix the offending kartotherian configuration.
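For the second item, the check itself is small. Here is a minimal sketch of what "a list of lists" validation could look like, again in Python rather than the Groovy used by pipelinelib. The function name `validate_execution` and the error messages are invented for the example and are not taken from the actual patch.

```python
def validate_execution(execution):
    """Raise ValueError unless `execution` is a list of lists of stage names.

    Illustrative only: the real validation lives in pipelinelib and may
    differ in both behaviour and wording.
    """
    if not isinstance(execution, list):
        raise ValueError("execution must be a list of branches")
    for i, branch in enumerate(execution):
        # Reject bare strings explicitly; they are iterable but not branches.
        if not isinstance(branch, list):
            raise ValueError(
                f"execution[{i}] must be a list of stage names, got {branch!r}"
            )
        for stage in branch:
            if not isinstance(stage, str):
                raise ValueError(
                    f"execution[{i}] contains a non-string stage: {stage!r}"
                )
    return execution


if __name__ == "__main__":
    validate_execution([["test", "prep", "production-tilerator"]])  # accepted
    try:
        validate_execution(["test", "prep", "production-tilerator"])
    except ValueError as err:
        print(err)
```

Failing early with a message that names the offending entry is the point; it replaces the cycle error and the CPS stack trace with something the config author can act on.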

Change 538093 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[integration/pipelinelib@master] Validate that execution configuration is a list of lists

Change 538088 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[integration/pipelinelib@master] Annotate ExecutionGraph.toString as NonCPS

Change 538093 merged by jenkins-bot:
[integration/pipelinelib@master] Validate that execution configuration is a list of lists

Change 538088 merged by jenkins-bot:
[integration/pipelinelib@master] Annotate ExecutionGraph.toString as NonCPS

Change 539209 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[mediawiki/services/kartotherian@master] pipeline: Fix execution configuration

Change 539209 merged by jenkins-bot:
[mediawiki/services/kartotherian@master] pipeline: Fix execution configuration

Looks like it can't find node_modules:

npm WARN Local package.json exists, but node_modules missing, did you mean to install?

@Mathew.onipe the build variant in kartotherian's blubber.yaml likely needs to include a copies: [local] directive—see the Hello Node user tutorial for an example. Starting with Blubber's v4 configuration, project files are no longer copied into the image filesystem by default.

Let me know if you'd like further direction on this.

Change 541820 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[mediawiki/services/kartotherian@master] Add copies directive to build stage

Change 541820 merged by jenkins-bot:
[mediawiki/services/kartotherian@master] Add copies directive to build stage

Now you're back to the lerna: not found error we had before. How did this ever work?

Looks like lerna is one of devDependencies and so npm install --production (command used when the blubber config has node: { env: production }) fails due to the missing lerna binary. This isn't an issue with the pipeline or Blubber and will have to be worked out on the project's end.
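A quick way to spot this class of problem is to check which tools the build relies on are listed only under devDependencies. This is a small Python sketch; the helper name `dev_only_packages` and the sample package.json contents (including the placeholder versions) are hypothetical and not taken from the kartotherian repo.

```python
def dev_only_packages(package_json, required):
    """Return the names in `required` that appear only in devDependencies.

    `package_json` is the parsed package.json as a dict. Such packages are
    skipped by `npm install --production`, so their binaries will be missing.
    """
    deps = package_json.get("dependencies", {})
    dev_deps = package_json.get("devDependencies", {})
    return [name for name in required if name in dev_deps and name not in deps]


if __name__ == "__main__":
    # Hypothetical manifest; versions are placeholders.
    manifest = {
        "dependencies": {"express": "x.y.z"},
        "devDependencies": {"lerna": "x.y.z"},
    }
    print(dev_only_packages(manifest, ["lerna", "express"]))
```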

There is one more issue with the pipeline config that can be addressed in this task before closing it out, however: I don't think you want/need a prep stage since the prep variant is not runnable—it has no entrypoint. It is an intermediate variant used to prepare the production image.

  variants:
    build:
      node: { requirements: [.] }
      copies: [local]
    prep:
      includes: [build]
      node: { env: production }
    production-tilerator:
      copies: [prep]
      node: { env: production }
      entrypoint: [node, packages/tilerator/server.js]

Configuring the production-tilerator variant with copies: [prep] tells Blubber to output a multi-stage Dockerfile, with prep building the application files and production-tilerator as the final runnable image.

blubber .pipeline/blubber.yaml production-tilerator
FROM AS prep
# (install all production _and_ development packages needed to prepare/build application files)
FROM AS production-tilerator
# (copy files over from prep to a more minimal production image)
COPY --chown=65533:65533 --from=prep ["/srv/service", "/srv/service"]
COPY --chown=65533:65533 --from=prep ["/opt/lib", "/opt/lib"]
# (this is the runnable image that includes an entrypoint)
ENTRYPOINT ["node", "packages/tilerator/server.js"]

Change 541932 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[mediawiki/services/kartotherian@master] allow npm install devDeps

Change 541932 merged by jenkins-bot:
[mediawiki/services/kartotherian@master] allow npm install devDeps

@dduvall Thanks! I removed the test stage and also forced devDependencies to install. We should definitely look at a better way to handle this later, but it's fine as it is.
Currently, the build is passing but not publishing yet. Do we need to enable the CI publish stage for the repo?

@Mathew.onipe no problem!

I suspect there's definitely a better way to handle these image builds. A few thoughts on that after looking more closely at the repo:

  1. Would it make sense to add lerna to dependencies in the root package.json? I ask because it seems to be a requirement to getting _any_ dependencies installed for the sub-projects, not just dev dependencies.
  2. Is each of the projects/* directories functional on its own after lerna has installed everything, or are there lateral dependencies among projects at runtime? (i.e. will cd projects/kartotherian; node server.js depend on anything outside of projects/kartotherian?) If they're standalone, we could change your blubber.yaml production variants around a bit to copy only the relevant sub-project root into /srv/service.

On the image publishing front, looks like there's still some configuration missing. Your publish stage doesn't specify an image to build before attempting to publish one. (Sorry, we really need to improve the validation here, and perhaps we should remove some of the shorthand configuration rules that lead to this kind of confusion.)

I'll submit another patch.

Change 542213 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[mediawiki/services/kartotherian@master] pipeline: Specify which variants to build/publish

Change 542213 merged by jenkins-bot:
[mediawiki/services/kartotherian@master] pipeline: Specify which variants to build/publish