
Create a superset container image using the PipelineLib framework
Closed, Resolved, Public

Description

The container image for Superset will be built using PipelineLib, the framework maintained by the Release Engineering team.

We will create a GitLab project at https://gitlab.wikimedia.org/repos/data-engineering/superset, and the build pipeline will be maintained primarily in two files: .gitlab-ci.yaml and blubber.yaml.

Details

Title | Reference | Author | Source Branch | Dest Branch
Add /app to the PYTHONPATH for superset | repos/data-engineering/superset!23 | btullis | add_app_to_pythonpath | main
Add our customisation to the WMF superset build | repos/data-engineering/superset!22 | btullis | add_wmf_requirements | main
Configure the npm proxy settings | repos/data-engineering/superset!9 | btullis | config_npm | main
Test the use of the https_proxy environment variable | repos/data-engineering/superset!8 | btullis | test_https_proxy | main
Use a plain node builder | repos/data-engineering/superset!7 | btullis | use_node_install | main
Use a node builder for the npm ci step | repos/data-engineering/superset!6 | btullis | use_node_builder | main
Set the npm proxy on the trusted runners | repos/data-engineering/superset!4 | btullis | fix_npm_proxy | main
Fix the location of the blubber file | repos/data-engineering/superset!3 | btullis | fix_publish_config | main
Add a publish stage to the superset image | repos/data-engineering/superset!2 | btullis | publish_superset | main
Add the data-engineering/superset project to trusted-runners | repos/releng/gitlab-trusted-runner!52 | btullis | add_superset_project | main
Add initial files for building superset | repos/data-engineering/superset!1 | btullis | begin_superset_build | main

Event Timeline

BTullis triaged this task as High priority.
BTullis moved this task from Incoming to Quarterly Goals on the Data-Platform-SRE board.
BTullis moved this task from Quarterly Goals to In Progress on the Data-Platform-SRE board.

I have made some progress on the Superset image in this MR.

I will check the repository settings and then make a request to gain access to the trusted runners for this project, as per these instructions.

To start with, this is just a vanilla interpretation of the upstream Dockerfile, using blubber and kokkuri.

I will make some further modifications to incorporate our specific requirements and the additional packages that we will need.

It took a while, but I've got past a real blocker that I had with publishing via the trusted runners.
I've now got what I feel is a good vanilla image of Superset that is ready for customisation.

I can do the following:

btullis@marlin:~$ openssl rand -base64 42
E22wJCmT74ZKrPHtks0D/uT9qdx3llrP33HYI/nG3bWEiKFExJC0FSrR

btullis@marlin:~$ docker run --env SUPERSET_SECRET_KEY=E22wJCmT74ZKrPHtks0D/uT9qdx3llrP33HYI/nG3bWEiKFExJC0FSrR -it docker-registry.wikimedia.org/repos/data-engineering/superset:latest
[2023-12-13 02:18:21 +0000] [7] [INFO] Starting gunicorn 20.1.0
[2023-12-13 02:18:21 +0000] [7] [INFO] Listening at: http://0.0.0.0:8088 (7)
[2023-12-13 02:18:21 +0000] [7] [INFO] Using worker: gthread
[2023-12-13 02:18:21 +0000] [8] [INFO] Booting worker with pid: 8
logging was configured successfully
2023-12-13 02:18:22,793:INFO:superset.utils.logging_configurator:logging was configured successfully
2023-12-13 02:18:22,797:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/home/superset/.local/lib/python3.9/site-packages/flask_limiter/extension.py:293: UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
  warnings.warn(
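The `openssl rand -base64 42` step above just produces 42 cryptographically random bytes encoded as base64. Where openssl is not to hand, the same value can be produced with the Python standard library; this is a minimal sketch, and the function name `generate_secret_key` is hypothetical. Upstream Superset's default configuration reads `SECRET_KEY` from the `SUPERSET_SECRET_KEY` environment variable, which appears to be why passing it with `--env` is enough for the container to start.

```python
"""Generate a value for SUPERSET_SECRET_KEY, equivalent to `openssl rand -base64 42`."""
import base64
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return num_bytes of secure random data, base64-encoded as text.

    42 is a multiple of 3, so the encoded key is exactly 4 * 42 / 3 = 56
    characters long with no '=' padding, the same length as the key above.
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    print(generate_secret_key())
```

The printed value can be passed to `docker run --env SUPERSET_SECRET_KEY=...` exactly as in the session above.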

The next step is to incorporate our specific Python packages from here.

I've made a good start with this, adding Kerberos and memcached support to the Superset container. Marking as ready for review.
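Having memcached client libraries in the image only helps once Superset is configured to use them. Superset imports a module named `superset_config` from the PYTHONPATH, and its caches are configured through Flask-Caching dictionaries such as `CACHE_CONFIG` and `DATA_CACHE_CONFIG`. The fragment below is a sketch under assumptions: the memcached address, the `MEMCACHED_SERVER` environment variable, the key prefixes and the timeouts are all placeholders, not values from this task. It also points Flask-Limiter at memcached through `RATELIMIT_STORAGE_URI` (the setting read by Flask-Limiter 3.x), which would address the in-memory storage warning in the log above.

```python
"""Hypothetical superset_config.py fragment using memcached for caching and rate limits.

All addresses, prefixes and timeouts here are placeholders.
"""
import os

# Placeholder "host:port"; MEMCACHED_SERVER is a hypothetical variable name.
MEMCACHED_SERVER = os.environ.get("MEMCACHED_SERVER", "127.0.0.1:11211")

ONE_DAY_SECONDS = 24 * 60 * 60


def _memcached_cache(prefix: str, timeout: int) -> dict:
    """Build a Flask-Caching config dict for the memcached backend."""
    return {
        "CACHE_TYPE": "MemcachedCache",
        "CACHE_MEMCACHED_SERVERS": [MEMCACHED_SERVER],
        "CACHE_KEY_PREFIX": prefix,
        "CACHE_DEFAULT_TIMEOUT": timeout,
    }


# Metadata cache and query-result cache, kept apart by distinct key prefixes.
CACHE_CONFIG = _memcached_cache("superset_metadata_", ONE_DAY_SECONDS)
DATA_CACHE_CONFIG = _memcached_cache("superset_data_", ONE_DAY_SECONDS)

# Shared storage for Flask-Limiter counters instead of per-process memory.
RATELIMIT_STORAGE_URI = f"memcached://{MEMCACHED_SERVER}"
```

Flask-Caching's `MemcachedCache` backend needs a client library such as pylibmc or python-memcached, while Flask-Limiter's memcached storage goes through the `limits` package and pymemcache, so more than one client library may be required; which ones were actually added to the image is not recorded here.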

I think we can call this done now. There will likely be some iteration on the image once we start testing it, but for now I'm happy that it has both the Debian packages and the Python libraries that we will need to begin working with it on dse-k8s.
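One quick way to confirm that the Python side of these dependencies made it into the image is to try locating each module from inside the container. This is a minimal sketch: the module names in `REQUIRED_MODULES` (`kerberos`, `pylibmc`) are assumptions standing in for whatever the WMF requirements actually contain, and `missing_modules` is a hypothetical helper.

```python
"""Report Python modules that cannot be found by the current interpreter.

The default module list is an assumption, not the image's real requirements.
"""
import importlib.util
import sys

# Hypothetical module names for Superset plus the Kerberos and memcached support.
REQUIRED_MODULES = ["superset", "kerberos", "pylibmc"]


def missing_modules(names):
    """Return the names that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except (ImportError, ValueError):
            # A dotted name raises ModuleNotFoundError if its parent package is absent.
            missing.append(name)
    return missing


def main(argv):
    """Check the names given on the command line, or the defaults; exit 1 if any are missing."""
    names = argv[1:] or REQUIRED_MODULES
    missing = missing_modules(names)
    for name in missing:
        print(f"missing: {name}", file=sys.stderr)
    return 1 if missing else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

It could be run with something like `docker run --rm --entrypoint python3 -v "$PWD":/check docker-registry.wikimedia.org/repos/data-engineering/superset:latest /check/check_modules.py` (the file name is hypothetical). Because `find_spec` only locates a module without executing it, a missing Debian shared library, such as the Kerberos libraries, would go unnoticed; swapping in `importlib.import_module` makes the check stricter at the cost of actually importing each package.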