
Migrate panoviewer from Toolforge GridEngine to Toolforge Kubernetes
Closed, ResolvedPublic

Description

Kindly migrate your tool (https://grid-deprecation.toolforge.org/t/panoviewer) from Toolforge GridEngine to Toolforge Kubernetes.

Toolforge GridEngine is getting deprecated.
See: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/

Please note that a volunteer may perform this migration if this has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
Migrating Web Services from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
Python
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users


Event Timeline

My apologies if this ticket comes as a surprise to you. In order to ensure WMCS can provide a stable, secure and supported platform, it’s important we migrate away from GridEngine. I want to assure you that while it is WMCS’s intention to shut down GridEngine as outlined in the blog post https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/, a shutdown date for GridEngine has not yet been set. The goal is to move as many tools as possible onto Kubernetes and ensure as smooth a transition as possible for everyone. Once the majority of tools have migrated, a discussion of a shutdown date will be more appropriate. See T314664: [infra] Decommission the Grid Engine infrastructure.

As noted in https://techblog.wikimedia.org/2022/03/16/toolforge-gridengine-debian-10-buster-migration/, some use cases are already supported by Kubernetes and should be migrated. If your tool can migrate, please do plan a migration. Reach out if you need help or find you are blocked by missing features. Most of all, WMCS is here to support you.

However, it’s possible your tool needs a mixed runtime environment or some other feature that isn’t yet available in the Jobs framework (https://techblog.wikimedia.org/2022/03/18/toolforge-jobs-framework/). We’d love to hear about this or any other blocking issue so we can work with you once a migration path is ready. Thanks for your hard work as volunteers and help with this migration!

This is a reminder that the tool for which this ticket was created is still running on the Grid.
The grid is deprecated and all remaining tools need to migrate to Toolforge Kubernetes.

We've sent several emails to maintainers as we continue to make the move away from the Grid.
Many of the issues that have held users back from moving away from the Grid have been addressed in
the latest updates to Build Service. See: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Changelog

You might find the following resources helpful in migrating your tool:

  1. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Migrating_an_existing_tool
  2. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Tutorials_for_popular_languages

Don't hesitate to reach out to us using this ticket or via any of our support channels.

If you have already migrated this tool, kindly mark this ticket as 'resolved'.
To do this, click on the 'Add Action' dropdown above the comment text box, select 'Change Status', then 'Resolved'.
Click 'Submit'.

Thank you!

Tool synopsis (source):

  • The tool has a static HTML/JS frontend which embeds Pannellum, an upstream project that displays panoramic images.
  • The user specifies the Commons filename in a fragment parameter; a Commons template constructs the URLs.
  • JS requests a configuration blob from a PHP/SQL/NFS backend.
  • The PHP backend checks the source width in the WMCS replica. If the image is small, it downloads the file and delivers JSON linking to it. If it is large, it downloads a thumbnail and queues a tiling job.
  • The tiling job is a shell script that downloads the source image and then runs a Python script distributed with Pannellum (see the sketch after this list).
  • The Python script runs Hugin to reproject the source image from spherical equirectangular projection to cube faces, uses Pillow to cut the cube faces into tile sets at multiple resolutions, and then writes a JSON file describing what it did.
  • Meanwhile, the JS frontend polls PHP for an updated config blob. When the job completes and the Pannellum JSON file appears in NFS, PHP delivers the updated config. Pannellum is given the updated config and the displayed image is progressively enhanced.
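
A rough sketch of that tiling step (the URL, filenames and paths below are illustrative, not the tool's actual script; generate.py is Pannellum's multires generator):

$ # fetch the full-resolution source from Commons (URL shape is illustrative)
$ curl -sL -o source.jpg "https://upload.wikimedia.org/wikipedia/commons/x/xx/Example_panorama.jpg"
$ # reproject equirectangular to cube faces (via Hugin's nona) and cut multi-resolution tiles (via Pillow)
$ python3 generate.py -o tiles/Example_panorama source.jpg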

I kindly request that you extend the life of this tool to the February 2024 deadline. We will attempt to migrate it much earlier than that, preferably in December.

I was added as an additional maintainer to this tool well after its launch, so I was not part of its original configuration on GE. Therefore I have only recently noticed that it is at risk of being deleted.

I don't think it is at risk of being deleted. I think the worst-case scenario is that the tiling job stops working. This would mean that old images will continue to work as usual, and new images would be shown with a maximum resolution of 4000px width.

The tiling job was broken between 2021 and October 2023, and apparently nobody reported the issue.

Thanks, yeah I noticed the tiling was not working but didn't know why.

I think the worst-case scenario is that the tiling job stops working.

I guess the worst case scenario was actually T354949.

I would like to work on this, but I need to be added as a maintainer.

Also, I would like to be added as a maintainer of zoomviewer.

Tim, I just added you as a maintainer!

I hit a problem: Hugin is missing from Ubuntu 22.04, which happens to be the only distro available for Toolforge buildpacks. It's back in Ubuntu 23.04, and it's present in Debian. I couldn't find any explanation on Launchpad for why it was removed from that one release.
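
(As a sanity check, the package's availability per release can be confirmed from any machine with devscripts installed, e.g.:)

$ rmadison -u ubuntu hugin   # which Ubuntu releases ship the package
$ rmadison -u debian hugin   # same, for Debian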

The Aptfile plugin allows arbitrary text to be appended to /etc/apt/sources.list, but I couldn't find a way to use that to install from a PPA, since the PPA signing keys are missing and there is no way to install keys.
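
If I understand the Aptfile format correctly, adding a PPA would look something like the sketch below (the PPA path is made up; there is no official Hugin PPA), and it fails precisely because the signing key cannot be installed:

$ # hypothetical Aptfile: the ':repo:' line is what gets appended to sources.list,
$ # but apt rejects the repository because its signing key is not installed
$ cat Aptfile
:repo:deb https://ppa.launchpadcontent.net/someuser/hugin/ubuntu jammy main
hugin-tools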

I tried using an inline buildpack to install the package, but my project.toml file was apparently ignored.

The following solutions occur to me:

  1. Migrate the builder base image to Debian.
  2. Patch fagiani/apt to add a PPA feature, although a PPA is not ideal since there is no official Hugin PPA.
  3. There is an official Flatpak, so the problem could be solved by writing a Flatpak buildpack by analogy with the apt one.
  4. Install some kind of binary and use it over NFS.
  5. Port the relevant parts of Hugin to a language supported by the build service, or use some other pre-existing tool for spherical image projection that fits our needs, and migrate the glue code in Pannellum's generate.py to use that tool.

While option 5 might seem attractive for productionization, it would create a new maintenance responsibility. Hugin does the remapping itself rather than relying on a library, and it has GPU support, which roughly doubles the amount of relevant code if we want to keep that.

As for option 4, nona (the binary we want) has 73 shared library dependencies according to ldd. The main reason is that it links to huginbase, a library containing just about all the code for all the hugin-related tools. huginbase has 72 shared library dependencies according to ldd. Nona's direct dependencies (with objdump -p) are only huginbase, libpano13, libtiff and the standard libraries.
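
For reference, those counts come from commands along these lines (the binary path here is illustrative):

$ ldd /usr/bin/nona | wc -l                 # everything the dynamic loader pulls in, including transitive dependencies
$ objdump -p /usr/bin/nona | grep NEEDED    # only the direct DT_NEEDED entries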

Runtime performance over NFS will presumably depend on how many libraries it has to pull in, and their sizes. There's an unofficial Hugin AppImage from 2021 but I figure it would have poor performance due to its large size.

Hugin's CMake build system does have a static compilation option, so I could try using that to make a nona binary.

I couldn't find any explanation on Launchpad for why it was removed from one release.

The reason is probably LP#1960598.

I built a static variant of libvigraimpex 1.11.1. Then I built nona with CMAKE_BUILD_TYPE=Release, HUGIN_SHARED=0, HUGIN_SHARED_LIBS=0 and VIGRA_LIBRARIES set to the full path of my static libvigraimpex. I commented out the OpenEXR check in FindVIGRA.cmake which is non-functional for a static library. I added ${X11_X11_LIB} to nona's target_link_libraries().
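
In case anyone wants to reproduce it, the configure step was roughly as follows (paths are illustrative; the FindVIGRA.cmake and target_link_libraries() tweaks mentioned above are separate source edits):

$ cmake ../hugin \
      -DCMAKE_BUILD_TYPE=Release \
      -DHUGIN_SHARED=0 \
      -DHUGIN_SHARED_LIBS=0 \
      -DVIGRA_LIBRARIES=/path/to/static/libvigraimpex.a
$ make nona    # or just make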

The resulting binary was 7 MB. I uploaded it to bin/nona in the tool data directory. I added nona's shared dependencies to the Aptfile. I rebuilt the image in Toolforge. However, when I run a job, it seems to use the old image that I built on Tuesday.

When I inspect the image, the correct packages seem to be there in /layers, but not in the root.

$ become panoviewer
$ toolforge jobs run --image tool-panoviewer/tool-panoviewer:latest --mount all --command 'ldd /data/project/panoviewer/bin/nona' --wait 60 test ; toolforge jobs logs test
...
2024-01-31T05:37:22+00:00 [test-slbxp]  libexiv2.so.27 => not found
...
$ toolforge jobs run --image tool-panoviewer/tool-panoviewer:latest --mount all --command 'sh -c '\''ls -l /layers/fagiani_apt/apt/usr/lib/x86_64-linux-gnu/libexiv2.so.27 ; ls -l /usr/lib/x86_64-linux-gnu/libexiv2.so.27'\' --wait 60 test ; toolforge jobs logs test
...
2024-01-31T05:44:54+00:00 [test-dz67j] lrwxrwxrwx 1 heroku heroku 18 Jan  1  1980 /layers/fagiani_apt/apt/usr/lib/x86_64-linux-gnu/libexiv2.so.27 -> libexiv2.so.0.27.5
2024-01-31T05:44:54+00:00 [test-dz67j] ls: cannot access '/usr/lib/x86_64-linux-gnu/libexiv2.so.27': No such file or directory

The build log is at P55903.

Since I care about having a working build system and there's no particular hurry on this task, I didn't rebuild or attempt to purge caches; I didn't want to lose the test case.

I think this is an issue related to how the apt-buildpack "installs" packages: it just extracts them into the layers directory, so it does not run any post-install scripts or similar.
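
In other words, for each downloaded package it effectively does something like the following (illustrative command, not the buildpack's literal code), so maintainer scripts and ldconfig never run:

$ # extract the package contents into the layer; no postinst/maintainer scripts are executed
$ dpkg-deb -x libexiv2-27_*_amd64.deb /layers/fagiani_apt/apt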

In the last few days I have introduced several workarounds for that; they were added after your build:

  • Correcting symlinks
  • Better dependency and virtual package resolution

Can you rebuild and try again?

Note that, as I said, it does not install packages in the root directory but only in the layers one, and then uses LD_CONFIG and other variables to point to the installation path (not running as root during build is an intentional limitation of the buildpack system).

Also note that T355214: [apt-buildpack] Not sourcing /layers/fagiani_apt/apt/.profile.d/000_apt.sh is not yet fixed, so you might have to prepend launcher to the 'ad-hoc' commands passed to toolforge jobs, like launcher sh -c 'ls -l ...'
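
For example, adapting the ldd test from earlier (same image tag as above):

$ toolforge jobs run --image tool-panoviewer/tool-panoviewer:latest --mount all \
      --command 'launcher sh -c "ldd /data/project/panoviewer/bin/nona"' --wait 60 test
$ toolforge jobs logs test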

T355214 (not sourcing 000_apt.sh) has now been fixed and rolled out :), details in the docs here: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Job

000_apt.sh lacks a PYTHONPATH setting, which should be something like:

PYTHONPATH=/layers/fagiani_apt/apt/usr/lib/python3/dist-packages

This sort of thing would be easier to debug if the Toolforge documentation mentioned that packages are installed into a prefix and that efforts to make things work anyway may not be successful.

Added the docs, thanks for the feedback.

I'll open a task for the PYTHONPATH, though it might get tricky due to the python buildpack tweaking it too (one of the many reasons why I recommend using lang-specific buildpacks whenever possible).

Hmm, I think there might also be issues with programs that have a hardcoded Python path in their shebang if you install a different Python version :/. I will open a task for that as well; let me know if you hit that issue and I'll prioritize it.

A number of libraries are installed into subdirectories of /usr/lib/x86_64-linux-gnu, and usually they would be found by the ldconfig cache. But this fagiani_apt layer doesn't support ldconfig, so such libraries cannot be loaded unless their directories are manually added to LD_LIBRARY_PATH.

I got this error:

Traceback (most recent call last):
  File "/layers/fagiani_apt/apt/usr/lib/python3/dist-packages/numpy/core/__init__.py", line 22, in <module>
    from . import multiarray
  File "/layers/fagiani_apt/apt/usr/lib/python3/dist-packages/numpy/core/multiarray.py", line 12, in <module>
    from . import overrides
  File "/layers/fagiani_apt/apt/usr/lib/python3/dist-packages/numpy/core/overrides.py", line 7, in <module>
    from numpy.core._multiarray_umath import (
ImportError: libblas.so.3: cannot open shared object file: No such file or directory

It looks like I can resolve it like this:

$ launcher 'PYTHONPATH=/layers/fagiani_apt/apt/usr/lib/python3/dist-packages LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/layers/fagiani_apt/apt/usr/lib/x86_64-linux-gnu/blas:/layers/fagiani_apt/apt/usr/lib/x86_64-linux-gnu/lapack python'
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> 

Can't rebuild:

[step-export] 2024-02-02T12:47:50.127313093Z ERROR: failed to export: failed to write image to the following tags: [tools-harbor.wmcloud.org/tool-panoviewer/tool-panoviewer:latest: PUT https://tools-harbor.wmcloud.org/v2/tool-panoviewer/tool-panoviewer/blobs/uploads/ac1a3e77-63b7-4ccf-9ba7-2abb99655f01?_state=REDACTED&digest=sha256%3A99a441f5196cda3ebbef6f643416dbb95efa8a918b0066c5174946a634686c6a: DENIED: adding 15.3 MiB of storage resource, which when updated to current usage of 1014.3 MiB will exceed the configured upper limit of 1.0 GiB.]

I did a toolforge build clean and am trying again.

Looks good, thanks @dcaro. I reverted my temporary hacks in the tool.

There are some remaining follow-up tasks, but the migration of this tool to Kubernetes is complete.