
Toolforge build service: Can't process an image larger than 128 Mpx using ImageMagick
Closed, ResolvedPublicBUG REPORT

Description

Commons users often upload images larger than 128 Mpx. That's quite large, so they may want to crop the images using CropTool. However, CropTool uses ImageMagick from the heroku-22 stack, which sets a maximum pixel cache size of 128 Mpx.

/etc/ImageMagick-6/policy.xml:

<policymap>
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="map" value="512MiB"/>
  <policy domain="resource" name="width" value="16KP"/>
  <policy domain="resource" name="height" value="16KP"/>
  <policy domain="resource" name="area" value="128MP"/>
  <policy domain="resource" name="disk" value="1GiB"/>
  <policy domain="delegate" rights="none" pattern="URL" />
  <policy domain="delegate" rights="none" pattern="HTTPS" />
  <policy domain="delegate" rights="none" pattern="HTTP" />
  <policy domain="path" rights="none" pattern="@*"/>
  <policy domain="cache" name="shared-secret" value="passphrase" stealth="true"/>
</policymap>

This file comes from the Heroku base image, but the numerical limits are the same as the ones in the Debian package, which in turn are mostly the same as the ones in an example policy file in the upstream documentation.

memory means memory allocated from malloc(), and map means an anonymous mmap(). For security purposes, these are basically the same. They are different in this configuration file because it is an example not meant for production use.

Note the powers of two in the policy: 256MiB memory, 512MiB map, 1GiB disk. It's like that because it's an example. There is no rational reason for these values to be separated by factors of two.

The Debian patch file claims to be imposing these limits for safety, but the values they came up with do not differ from the example policy, and do not have a documented justification.

Given multiple configuration sources, ImageMagick always uses the lowest specified limit. So setting a higher limit by adding a configuration file, setting an environment variable, or calling the relevant API has no effect. Overriding this limit in an application as non-root would require some extreme hackery.

So for reasonable operation of ImageMagick on Debian or Heroku, it is necessary to customise or remove the policy.xml file.

[EDIT: this is wrong; see "Policy precedence" below]

For CropTool and other applications that use PHP's imagick extension with FPM, I believe it will prove useful to reduce the "memory" limit to a low value, say 10MB, and to set the "map" limit to almost all of the pod's memory quota. With FPM, memory allocated with malloc() will usually not be returned to the OS after request termination, so the average memory usage of the pod should be reduced by encouraging ImageMagick to use mmap() for large allocations. In Toolforge, the memory quota is controlled by the user -- an administrator building the base image can't know what quota will be active when the image is used.

Commons images frequently have a width and height of much more than 16,000 px.

In CropTool, I don't want the disk to be used at all, since it's just a tmpfs so disk usage will be counted towards the memory limit of the pod, which I want to fully consume with the "map" limit.
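As a sketch, an application-level policy along these lines would express that strategy. The values are illustrative: the map value is a placeholder for "almost all of the pod's memory quota", which only the tool author knows.

```xml
<policymap>
  <!-- Keep malloc()-backed allocations small, so FPM workers
       don't retain large heaps between requests -->
  <policy domain="resource" name="memory" value="10MiB"/>
  <!-- Let mmap()-backed allocations use most of the pod quota;
       4GiB is a placeholder for the actual quota -->
  <policy domain="resource" name="map" value="4GiB"/>
  <!-- No disk spill: the "disk" is tmpfs, so it counts as memory -->
  <policy domain="resource" name="disk" value="0"/>
</policymap>
```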

My suggestion for a new policy.xml file:

<policymap>
  <policy domain="resource" name="memory" value="1TiB"/>
  <policy domain="resource" name="map" value="1TiB"/>
  <policy domain="resource" name="width" value="1MP"/>
  <policy domain="resource" name="height" value="1MP"/>
  <policy domain="resource" name="area" value="1PP"/>
  <policy domain="resource" name="disk" value="1TiB"/>
  <policy domain="delegate" rights="none" pattern="URL" />
  <policy domain="delegate" rights="none" pattern="HTTPS" />
  <policy domain="delegate" rights="none" pattern="HTTP" />
  <policy domain="path" rights="none" pattern="@*"/>
  <policy domain="cache" name="shared-secret" value="passphrase" stealth="true"/>
</policymap>

This will allow the limits to be set by the application when building the image, or by using the API.
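For example, with a permissive policy.xml in place, an application could set its own effective limits through ImageMagick's standard resource environment variables (the variable names are ImageMagick's documented ones; the values are illustrative, following the FPM strategy above):

```shell
export MAGICK_MEMORY_LIMIT=10MiB   # malloc()-backed pixel cache
export MAGICK_MAP_LIMIT=3GiB       # mmap()-backed pixel cache
export MAGICK_DISK_LIMIT=0         # no disk spill (tmpfs counts as memory)
export MAGICK_AREA_LIMIT=1GP       # uncompressed image area

# If ImageMagick is installed, this shows the merged effective limits:
command -v convert >/dev/null && convert -list resource || true
```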

Setting limits in a policy.xml unnecessarily constrains what applications can do. It is not beneficial for security.

Policy precedence

Contrary to what I said when I opened the ticket, there is an order of precedence applied when loading policy files. ImageMagick builds a list of policies, and traverses the list for each limit type, applying the first match. So the first policy file which has a value for a given limit will be used, from the following list:

  • $MAGICK_CONFIGURE_PATH/policy.xml, if the environment variable exists
  • Various fixed system locations
  • Various user-configurable locations
  • Internal defaults

So it is possible for users to override the system policy.xml, by setting MAGICK_CONFIGURE_PATH and placing a policy.xml in that directory. The file needs to actually have values — if they are missing then it will fall back to the system policy.
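A minimal sketch of such a per-tool override (paths and values are examples, not a tested CropTool configuration): write a policy.xml with explicit values into a directory that MAGICK_CONFIGURE_PATH lists before the system location.

```shell
# Create a user policy that explicitly sets the limits to be raised.
mkdir -p "$HOME/.magick"
cat > "$HOME/.magick/policy.xml" <<'EOF'
<policymap>
  <policy domain="resource" name="area" value="1GP"/>
  <policy domain="resource" name="width" value="64KP"/>
  <policy domain="resource" name="height" value="64KP"/>
</policymap>
EOF

# List the override directory first, so its values win.
export MAGICK_CONFIGURE_PATH="$HOME/.magick:/etc/ImageMagick-6"

# If ImageMagick is installed, Area/Width/Height should now report
# the values above:
command -v identify >/dev/null && identify -list resource || true
```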

Event Timeline

Given multiple configuration sources, ImageMagick always uses the lowest specified limit. So setting a higher limit by adding a configuration file, setting an environment variable, or calling the relevant API has no effect.

Well that's super frustrating.

Setting limits in a policy.xml unnecessarily constrains what applications can do. It is not beneficial for security.

Yeah, I agree with your conclusion. Resource limits are already imposed by Kubernetes, so there's no need to restrict ImageMagick more specifically; tools that invoke it should impose their own limits or deal with the consequences of OOM, etc.

@aborrero tells me that the Debian maintainer is willing to remove the resource limits from policy.xml in the Debian package. That would be great to see, and will eventually help us to avoid production issues like T344233. It will also carry weight with Heroku if we decide to open a pull request there.

I'm assuming these upstream changes will take years to filter down to us. In the meantime, I'd like to have some sort of workaround for CropTool, which has been broken since the k8s transition in January, with users frequently complaining.

It would be possible to compile ImageMagick with a custom install prefix, since it looks for its configuration files in $PREFIX/etc. Since there's no support for compiling C code in the build system, I'd have to serve the binaries over NFS, a hack I also used for panoviewer. Then I could set LD_LIBRARY_PATH in the Procfile.

I'm not sure what the other options are. A custom stack (runimage)? Or fork the base-images repo and provide patched images with the same name as upstream? @dcaro: it's a bit surprising that builds-api/cli have no option to select the stack — what is the plan for upgrading to heroku-24?

I'm not sure what the other options are. A custom stack (runimage)? Or fork the base-images repo and provide patched images with the same name as upstream?

I created https://github.com/heroku/base-images/pull/316 as a starting point.

Yes, we can patch it ourselves for now, that's an option, until we upgrade to the 24 runner/builder (which is stable now, so that should happen soon).

@dcaro: it's a bit surprising that builds-api/cli have no option to select the stack — what is the plan for upgrading to heroku-24?

For most users it would be transparent, as it would not affect them at all. Some users will need more care, especially the ones using an Aptfile, as the available packages might change.

There's no clear schedule planned yet, but the general idea is to temporarily enable a '--next/--new-builder' or similar flag in the client, so you can use the newer builder and test your builds. Eventually that will become '--old-builder' and then disappear (that will also help us track any remaining usage, so we can reach out to those tools and help).

From that pull request (now closed), they suggest using:
https://help.heroku.com/RFDJQSG3/how-can-i-override-imagemagick-settings-in-a-policy-xml-file

That would mean setting that envvar and placing your override policy earlier in the path. I'm testing locally to see if we can change the default that way and avoid any changes on your side.

@tstarling I have a patch that implements the fix mentioned in the link above, do you have an image I can play with/test to make sure it works?

@tstarling I have a patch that implements the fix mentioned in the link above, do you have an image I can play with/test to make sure it works?

Today's project is my unmerged PR in croptool:

toolforge build start --ref memory-display-errors https://github.com/tstarling/croptool.git

Last build was croptool-buildpacks-pipelinerun-sw5fc. But it should work with any build service image, right? convert -list resource will give you the current merged settings.

tools.croptool@tools-bastion-13:~$ toolforge webservice shell
I have no name!@shell-1721826100:~$ convert -list resource
Resource limits:
  Width: 16KP
  Height: 16KP
  List length: unlimited
  Area: 128MP
  Memory: 256MiB
  Map: 512MiB
  Disk: 1GiB
  File: 786432
  Thread: 8
  Throttle: 0
  Time: unlimited

Hmm, first try using the fix mentioned above does not seem to change much:

heroku@c73de7ed0d69:/workspace$ echo $MAGICK_CONFIGURE_PATH
/app/.magick:/layers/heroku_procfile/imagemagick_fix:/etc/ImageMagick-6

heroku@c73de7ed0d69:/workspace$ cat /layers/heroku_procfile/imagemagick_fix/policy.xml 
<policymap>
  <policy domain="delegate" rights="none" pattern="URL" />
  <policy domain="delegate" rights="none" pattern="HTTPS" />
  <policy domain="delegate" rights="none" pattern="HTTP" />
  <policy domain="path" rights="none" pattern="@*"/>
  <policy domain="cache" name="shared-secret" value="passphrase" stealth="true"/>
</policymap>

heroku@c73de7ed0d69:/workspace$ convert -list resource
Resource limits:
  Width: 16KP
  Height: 16KP
  List length: unlimited
  Area: 128MP
  Memory: 256MiB
  Map: 512MiB
  Disk: 1GiB
  File: 786432
  Thread: 4
  Throttle: 0
  Time: unlimited

Looking

Wait, it does work, but only if you explicitly override a value:

heroku@c73de7ed0d69:~/.magick$ cat <<EOP >policy.xml
<policymap>
  <policy domain="delegate" rights="none" pattern="URL" />
  <policy domain="delegate" rights="none" pattern="HTTPS" />
  <policy domain="delegate" rights="none" pattern="HTTP" />
  <policy domain="path" rights="none" pattern="@*"/>
  <policy domain="cache" name="shared-secret" value="passphrase" stealth="true"/><policy domain="resource" name="width" value="20KP"/>
</policymap>
EOP

heroku@c73de7ed0d69:~/.magick$ convert -list resource
Resource limits:
  Width: 20KP      <- this changed
  Height: 16KP
  List length: unlimited
  Area: 128MP
  Memory: 256MiB
  Map: 512MiB
  Disk: 1GiB
  File: 786432
  Thread: 4
  Throttle: 0
  Time: unlimited

I can't seem to find a way to make it not use the default one:

heroku@c73de7ed0d69:~/.magick$ identify -list policy | grep ^Path
Path: /etc/ImageMagick-6/policy.xml
Path: /app/.magick/policy.xml
Path: /layers/heroku_procfile/imagemagick_fix/policy.xml
Path: /etc/ImageMagick-6/policy.xml
Path: /app/.magick/policy.xml
Path: [built-in]

heroku@c73de7ed0d69:~/.magick$ echo $MAGICK_CONFIGURE_PATH 
/app/.magick:/layers/heroku_procfile/imagemagick_fix

Nor can I find a way to tell it that a limit is unlimited (passing 0 just sets the limit to 0). @tstarling any preference on what big numbers to use to overwrite the limits?

Nor can I find a way to tell it that a limit is unlimited (passing 0 just sets the limit to 0). @tstarling any preference on what big numbers to use to overwrite the limits?

I updated the task description, suggesting some big numbers.

@aborrero: sorry for the confusion, see my updated task description. It is possible (although awkward) for applications to override the system policy. I think relaxing these limits in the Debian package is still warranted, but I don't want to mislead anyone about the strength of the rationale.

project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/455

builds-builder: bump to 0.0.114-20240729143738-4f948212

dcaro claimed this task.
dcaro triaged this task as Medium priority.
dcaro moved this task from Backlog to Toolforge iteration 13 on the Toolforge board.
dcaro edited projects, added Toolforge (Toolforge iteration 13); removed Toolforge.
dcaro moved this task from Next Up to Done on the Toolforge (Toolforge iteration 13) board.

This is deployed now. The memory limit used is now the one of the running worker node (and as high as the configured limit, which is way higher than anything we will provide any time soon). Feel free to reopen if it's not working for you for whatever reason :)