
Deployment Pipeline fails with CPS error for Kartotherian
Closed, Resolved · Public

Description

See the stack trace below. This is from https://integration.wikimedia.org/ci/job/kartotherian-pipeline-tilerator/1/console

expected to call java.lang.RuntimeException.<init> but wound up catching org.wikimedia.integration.ExecutionGraph.toString; see: https://jenkins.io/redirect/pipeline-cps-method-mismatches/
java.lang.NullPointerException
	at com.cloudbees.groovy.cps.impl.ThrowBlock$1.receive(ThrowBlock.java:67)
	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:186)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:370)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:282)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:270)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:66)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Details

Related Gerrit Patches:
mediawiki/services/kartotherian : master : pipeline: Specify which variants to build/publish
mediawiki/services/kartotherian : master : allow npm install devDeps
mediawiki/services/kartotherian : master : Add copies directive to build stage
mediawiki/services/kartotherian : master : pipeline: Fix execution configuration
integration/pipelinelib : master : Annotate ExecutionGraph.toString as NonCPS
integration/pipelinelib : master : Validate that `execution` configuration is a list of lists

Event Timeline

Restricted Application added a subscriber: Aklapper. Sep 19 2019, 2:28 PM
thcipriani triaged this task as Medium priority. Sep 19 2019, 2:29 PM
thcipriani moved this task from INBOX to Ready on the Release-Engineering-Team-TODO (201909) board.
dduvall claimed this task. Sep 19 2019, 5:22 PM
dduvall moved this task from Ready to Doing on the Release-Engineering-Team-TODO (201909) board.

There are a few things going on here.

First, the execution field for the kartotherian pipeline configuration is incorrect:

execution:
  - test
  - prep
  - production-tilerator

It should be given as a list of lists (graph branches, aka arcs/edges). In this case, I'm assuming they want a single serial branch of execution for the defined stages.

execution:
  - [test, prep, production-tilerator]

Also note that this is the default execution configuration (execute defined stages serially), so it can be omitted entirely in this case.
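
For illustration, a configuration with more than one branch might look like the following. The lint stage here is hypothetical, purely to show how two branches can fan in to the same downstream stage:

execution:
  - [test, prep, production-tilerator]
  - [lint, prep]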

Secondly, there isn't validation on the execution config, and the provided configuration resulted in a weird situation: because of Groovy's loose typing and duck typing of collections, each string in the list was treated as a list and every character of each string became a graph node, which in this case produced a cyclic graph.

new ExecutionGraph(['test', 'prep', 'production-tilerator']).toString()
digraph { t -> e; t -> i; t -> o; e -> s; e -> p; e -> r; s -> t; p -> r; r -> e; r -> o; r -> a; o -> d; o -> n; o -> r; d -> u; u -> c; c -> t; i -> o; i -> l; n -> -; - -> t; l -> e; a -> t; }

Next, an exception was thrown due to the graph cycle being detected when ExecutionGraph.stack was called in PipelineBuilder.

Lastly, the exception tried to format the graph as a string using ExecutionGraph.toString(), whose implementation is apparently not compatible with Groovy CPS (continuation-passing style). The Jenkins CPS plugin (from CloudBees) converts pipeline Groovy to CPS, and so the entire pipeline imploded in the obtuse and disconcerting way that Groovy CPS implodes.

The solutions are:

  1. Annotate ExecutionGraph.toString() with @NonCPS
  2. Add validation for execution
  3. In the meantime, fix the offending kartotherian configuration.

Change 538093 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[integration/pipelinelib@master] Validate that execution configuration is a list of lists

https://gerrit.wikimedia.org/r/538093

Change 538088 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[integration/pipelinelib@master] Annotate ExecutionGraph.toString as NonCPS

https://gerrit.wikimedia.org/r/538088

Change 538093 merged by jenkins-bot:
[integration/pipelinelib@master] Validate that execution configuration is a list of lists

https://gerrit.wikimedia.org/r/538093

Change 538088 merged by jenkins-bot:
[integration/pipelinelib@master] Annotate ExecutionGraph.toString as NonCPS

https://gerrit.wikimedia.org/r/538088

Change 539209 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[mediawiki/services/kartotherian@master] pipeline: Fix execution configuration

https://gerrit.wikimedia.org/r/539209

Change 539209 merged by jenkins-bot:
[mediawiki/services/kartotherian@master] pipeline: Fix execution configuration

https://gerrit.wikimedia.org/r/539209

Looks like it can't find node_modules:

npm WARN Local package.json exists, but node_modules missing, did you mean to install?

@Mathew.onipe the build variant in kartotherian's blubber.yaml likely needs to include a copies: [local] directive—see the Hello Node user tutorial for an example. Starting with Blubber's v4 configuration, project files are no longer copied into the image filesystem by default.
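
For reference, a minimal sketch of what that could look like in the build variant. This mirrors the variant layout shown later in this task; the actual patch may differ:

variants:
  build:
    base: docker-registry.wikimedia.org/nodejs10-devel
    node: { requirements: [.] }
    copies: [local]  # copy the local project files into the image filesystem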

Let me know if you'd like further direction on this.

@dduvall Thanks. I will implement this.

Change 541820 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[mediawiki/services/kartotherian@master] Add copies directive to build stage

https://gerrit.wikimedia.org/r/541820

Change 541820 merged by jenkins-bot:
[mediawiki/services/kartotherian@master] Add copies directive to build stage

https://gerrit.wikimedia.org/r/541820

Now you're back to the lerna: not found error we had before. How did this ever work?

Looks like lerna is one of the devDependencies, so npm install --production (the command used when the blubber config has node: { env: production }) fails due to the missing lerna binary. This isn't an issue with the pipeline or Blubber and will have to be worked out on the project's end.

There is one more issue with the pipeline config that can be addressed in this task before closing it out, however: I don't think you want/need a prep stage since the prep variant is not runnable—it has no entrypoint. It is an intermediate variant used to prepare the production image.

variants:
  build:
    base: docker-registry.wikimedia.org/nodejs10-devel
    node: { requirements: [.] }
    copies: [local]
  prep:
    includes: [build]
    node: { env: production }
  production-tilerator:
    copies: [prep]
    node: { env: production }
    entrypoint: [node, packages/tilerator/server.js]

Configuring the production-tilerator variant with copies: [prep] tells Blubber to output a multi-stage Dockerfile, including prep to build application files and production-tilerator as the final runnable image.

blubber .pipeline/blubber.yaml production-tilerator
FROM docker-registry.wikimedia.org/nodejs10-devel AS prep
# (install all production _and_ development packages needed to prepare/build application files)
FROM docker-registry.wikimedia.org/nodejs10-slim AS production-tilerator
# (copy files over from prep to a more minimal production image)
COPY --chown=65533:65533 --from=prep ["/srv/service", "/srv/service"]
COPY --chown=65533:65533 --from=prep ["/opt/lib", "/opt/lib"]
# (this is the runnable image that includes an entrypoint)
ENTRYPOINT ["node", "packages/tilerator/server.js"]
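
In other words, with prep folded into the multi-stage image build, the pipeline's execution only needs the runnable stages. A sketch, assuming the stage names above (the project may pare this down further):

execution:
  - [test, production-tilerator]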

Change 541932 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[mediawiki/services/kartotherian@master] allow npm install devDeps

https://gerrit.wikimedia.org/r/541932

Change 541932 merged by jenkins-bot:
[mediawiki/services/kartotherian@master] allow npm install devDeps

https://gerrit.wikimedia.org/r/541932

@dduvall Thanks! I removed the test stage and also forced devDeps to install. We should definitely look at a better way to handle this later, but it's fine as it is.
Currently, the build is passing but not publishing yet. Do we need to enable the CI publish stage for the repo?

@Mathew.onipe no problem!

I suspect there's a better way to handle these image builds. A few thoughts on that after looking more closely at the repo:

  1. Would it make sense to add lerna to dependencies in the root package.json? I ask because it seems to be a requirement for getting _any_ dependencies installed for the sub-projects, not just dev dependencies.
  2. Is each of the projects/* directories functional on its own after lerna has installed everything, or are there lateral dependencies among projects at runtime? (i.e. will cd projects/kartotherian; node server.js depend on anything outside of projects/kartotherian?) If they're standalone, we could change your blubber.yaml production variants around a bit to copy only the relevant sub-project root into /srv/service.

On the image publishing front, it looks like there's still some configuration missing. Your publish stage doesn't specify an image to build before attempting to publish one. (Sorry, we really need to improve the validation here, and perhaps we should remove some of the shorthand configuration rules that lead to this kind of confusion.)

I'll submit another patch.
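
For illustration only, the missing piece might look roughly like the stage below in .pipeline/config.yaml. The field names are assumptions based on pipelinelib's configuration format at the time, not the exact contents of the patch:

stages:
  - name: production
    build: production-tilerator   # build this variant first (assumed syntax)
    publish:
      image: true                 # then publish the built image (assumed syntax)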

Change 542213 had a related patch set uploaded (by Dduvall; owner: Dduvall):
[mediawiki/services/kartotherian@master] pipeline: Specify which variants to build/publish

https://gerrit.wikimedia.org/r/542213

Change 542213 merged by jenkins-bot:
[mediawiki/services/kartotherian@master] pipeline: Specify which variants to build/publish

https://gerrit.wikimedia.org/r/542213

dduvall closed this task as Resolved.Oct 23 2019, 5:06 PM