
Install DjVuLibre and XPDF packages for Kubernetes containers on Tool Labs
Open, Stalled, Normal, Public

Description

A tool I'm developing now depends on the djvutoxml and pdftotext utilities. Both are installed and available in the main Tool Labs shells, but they are not available to Python apps running in Kubernetes with a Python 3 venv.

tools.zoomproof@tools-bastion-03:~$ whereis pdftotext
pdftotext: /usr/bin/pdftotext /usr/bin/X11/pdftotext /usr/share/man/man1/pdftotext.1.gz
tools.zoomproof@tools-bastion-03:~$ whereis djvutoxml
djvutoxml: /usr/bin/djvutoxml /usr/bin/X11/djvutoxml /usr/share/man/man1/djvutoxml.1.gz
tools.zoomproof@tools-bastion-03:~$ webservice --backend=kubernetes python shell
If you don't see a command prompt, try pressing enter.
tools.zoomproof@interactive:~$
tools.zoomproof@interactive:~$ python3 -m venv ~/www/python/venv
tools.zoomproof@interactive:~$ source ~/www/python/venv/bin/activate
(venv) tools.zoomproof@interactive:~$ whereis pdftotext
pdftotext:
(venv) tools.zoomproof@interactive:~$ whereis djvtoxml
djvtoxml:
(venv) tools.zoomproof@interactive:~$

Can we please install those utilities? They're legit, open source utilities, already available in main ToolLabs shell, so I expect no issues on this side.
If there's a way for me to mount/call those utilities from inside of Kubernetes virtualenv, I'd be thankful for a hint on how to do that.
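
For anyone hitting the same problem, the check above can also be done from inside the Python app itself. This is only a minimal sketch: the helper name find_missing_tools is hypothetical and not part of any Tool Labs tooling; it relies on the standard library's shutil.which, which searches the current PATH much like the shell does.

```python
import shutil


def find_missing_tools(names):
    """Return the names from `names` that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # The two utilities this task asks for.
    missing = find_missing_tools(["djvutoxml", "pdftotext"])
    if missing:
        print("Missing utilities:", ", ".join(missing))
    else:
        print("All required utilities are available.")
```

Running this at startup lets the app fail early with a clear message instead of failing later inside a subprocess call.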

Event Timeline

Xelgen created this task. Jun 4 2017, 1:39 PM
Restricted Application added a project: Cloud-Services. Jun 4 2017, 1:39 PM
Restricted Application added a subscriber: Aklapper.
Xelgen removed yuvipanda as the assignee of this task. Jun 5 2017, 7:07 PM
Xelgen updated the task description.
Xelgen added a subscriber: yuvipanda.
Krinkle renamed this task from Install request on ToolLabs inside Kubernetes containers for DjVuLibre and XPDF packages to Install DjVuLibre and XPDF packages for Kubernetes containers on Tool Labs. Jun 5 2017, 7:20 PM
Krinkle removed a project: Kubernetes.
Xelgen added a subscriber: Krinkle. Jun 5 2017, 7:24 PM

Can't it be installed via pip?

No, pip is only for Python packages. These are regular binaries that you'd install with your distro's package manager (apt-get, yum, etc.).

@Krinkle, thanks for the saner task description.

(oh I thought you meant install the python requests package)

Dzahn added a subscriber: Dzahn. Jun 5 2017, 8:00 PM

Packages get installed via Puppet. The list of packages installed in toollabs, on exec nodes, is:

puppet/modules/toollabs/manifests/exec_environ.pp in the operations/puppet repo. See the "package{}" resources in there.

Question is though how the currently installed packages are not listed there.

chasemp triaged this task as Normal priority. Jun 5 2017, 8:10 PM

The k8s packages are installed via Dockerfiles at rODIT.

bd808 added a subscriber: bd808. Jun 5 2017, 8:14 PM

Question is though how the currently installed packages are not listed there.

The packages for the Docker images used in the Kubernetes cluster are not handled by Puppet. They are instead managed with Dockerfiles generated by code in the rODIT operations-docker-images-toollabs-images repository.

chasemp added a subscriber: chasemp. Jun 5 2017, 8:15 PM

@Dzahn I believe this user is in the k8s environment which is handled separately from SGE

I imagine the current state is explained with:

tools-bastion-03:~$ whereis pdftotext

pdftotext: /usr/bin/pdftotext /usr/bin/X11/pdftotext /usr/share/man/man1/pdftotext.1.gz

tools-exec-1414:~$ whereis pdftotext

pdftotext: /usr/bin/pdftotext /usr/bin/X11/pdftotext /usr/share/man/man1/pdftotext.1.gz

tools-worker-1001:~$ whereis pdftotext

pdftotext:

Where tools-worker-1001 is an eligible worker to run a container in question. I'm not sure we have worked out a process for new packages that are system wide on the k8s nodes. I am marking this task to talk over in our team meeting tomorrow.

bd808 added a comment. Jun 7 2017, 12:54 AM

@Xelgen, can you run the tool you would like to build on Grid Engine rather than Kubernetes? It sounds like the software you need is available there today.

Xelgen added a comment. Edited Jun 7 2017, 11:05 AM

@bd808 Frankly, two weeks ago I found a workaround for djvutoxml: I copied the binary and its libraries from a Jessie installation into the Python venv bin folder, then altered the binary with patchelf so that it looks for its shared libraries in the new location. This works for me now.
But it's really a duct-tape solution, tedious to keep updated and secure in the future. I also believe there are at least half a dozen tools using djvutoxml, so it's something other people will need when they migrate to Kubernetes.

As far as I understand, in the long term we plan to move from Grid Engine to K8s. If that's the case, I may try to stick with K8s and apply the same trick to the pdftotext utility; when things are sorted out, I'll just have to delete those modified files, and the app will automatically switch to the system-wide binaries.
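
A minimal sketch of how that automatic switch could work on the Python side, assuming the app looks for a bundled copy in the venv's bin folder first and otherwise searches PATH. The function name resolve_tool and its bundled_dir parameter are hypothetical, used only for illustration.

```python
import os
import shutil
import sys


def resolve_tool(name, bundled_dir=None):
    """Return the path to `name`, preferring a bundled copy over a system-wide one.

    `bundled_dir` defaults to the bin folder of the active Python
    environment (the venv, when one is activated).
    Returns None if the tool is found in neither place.
    """
    if bundled_dir is None:
        bundled_dir = os.path.join(sys.prefix, "bin")
    candidate = os.path.join(bundled_dir, name)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # No bundled copy: fall back to whatever is installed on PATH.
    return shutil.which(name)
```

With this lookup order, deleting the patched files from the venv is enough to make the app pick up system-wide binaries once they exist in the container image.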

bd808 added a comment. Jun 7 2017, 3:57 PM

As far as I understand, in the long term we plan to move from Grid Engine to K8s. If that's the case, I may try to stick with K8s and apply the same trick to the pdftotext utility; when things are sorted out, I'll just have to delete those modified files, and the app will automatically switch to the system-wide binaries.

Kubernetes is certainly hoped to be the more stable and full-featured replacement for grid engine. We are at an awkward phase of the project today however due mostly to a lack of staff to work on proper solutions for the new platform. The current webservice --backend=kubernetes system is a reasonable match for the use case of a tool that runs a single web process and can use language specific library management (composer for PHP, pip + virtualenv for Python, etc) to get all the things it needs. The Docker containers we are making available are limited to a single language runtime and install a very minimal set of additional libraries and binaries. We have about 230 tools using Kubernetes today which is about 1/3rd of the tools that probably could move over with minimal changes to the tool's code.

A year ago we thought that we would be able to create a giant Docker container that replicated the 'everything for everyone' setup that the grid engine hosts have been using. About six months ago @yuvipanda finally got around to trying to build such a container (T152089). The outcome there is not well documented, but it failed badly. Since building a giant monolith container is now off of the table as an intermediate solution, we need to rethink the near term plan for Kubernetes. The hoped for long term solution is still to select a FLOSS Platform as a Service solution that works with Kubernetes as a backend (T136264).

Unfortunately, the cloud-services-team does not currently have the resources to work on the PaaS project for the foreseeable future. We are currently down one Operations Engineer (we are hiring!) and we plan to add one more person to the team in the upcoming Wikimedia Foundation fiscal year (July 2017-June 2018). Adding more people will help with some of our issues, but we are planning to focus on paying down some very long term technical debt in our OpenStack environment and doing more outreach to our current and future customers in the coming year's plan. There is more detail on this in the Foundation annual plan draft on meta and our team tracking board for annual plan work.

Without the time and people needed to evaluate, select, and deploy a proper PaaS layer over Kubernetes, we are left with a couple of choices. We can add new apt managed packages to the Docker containers reactively as people request them on tickets like this one. Each time we do that we are a step closer to the currently unknown tipping point of having an image that is too big to manage effectively. Alternately we can suggest that tools which have needs that cannot be met by the current Docker containers stick with grid engine as their deployment runtime. That path doesn't help the maintainers of those tools move to the more robust Kubernetes platform, but it also doesn't pile on more tech debt for the cloud-services-team to unwind in the future.

The ideal mid-term solution would be for a small group of volunteers who are very interested in seeing Kubernetes be more widely used to become involved in moving things forward. There are very few things that need to be done in the evaluation process that can only be handled by paid staff. The process of creating the list of features that should be compared in PaaS solutions (T136265) and testing products against those criteria are things that could largely be done by volunteers with a bit of support from the cloud-services-team. With a surge of support for this work from our community, I would be willing to change the team's priorities and goals to find resources to assist in making Kubernetes a more general solution sooner rather than later. Without that kind of infusion of energy and resources however, a better Kubernetes PaaS is going to have to wait until we finish other projects that have been determined to be more critical.

@Xelgen If you like, you can compile the two utilities locally for your tool instead of copying and altering the binaries themselves. I tested this and pasted the logs below (with much of the irrelevant output removed):

tools.zhuyifei1999-test@tools-bastion-02:~$ wget ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.04.tar.gz
--2017-06-09 19:03:27-- ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.04.tar.gz
 => ‘xpdf-3.04.tar.gz’
Resolving ftp.foolabs.com (ftp.foolabs.com)... 50.0.186.21
Connecting to ftp.foolabs.com (ftp.foolabs.com)|50.0.186.21|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/xpdf ... done.
==> SIZE xpdf-3.04.tar.gz ... 825519
==> PASV ... done. ==> RETR xpdf-3.04.tar.gz ... done.
Length: 825519 (806K) (unauthoritative)

100%[=====================================================================================================>] 825,519 95.7KB/s in 7.6s

2017-06-09 19:03:36 (107 KB/s) - ‘xpdf-3.04.tar.gz’ saved [825519]

tools.zhuyifei1999-test@tools-bastion-02:~$ tar -xzf xpdf-3.04.tar.gz
tools.zhuyifei1999-test@tools-bastion-02:~$ wget http://downloads.sourceforge.net/djvu/djvulibre-3.5.27.tar.gz
--2017-06-09 19:06:32-- http://downloads.sourceforge.net/djvu/djvulibre-3.5.27.tar.gz
Resolving downloads.sourceforge.net (downloads.sourceforge.net)... 216.34.181.59
Connecting to downloads.sourceforge.net (downloads.sourceforge.net)|216.34.181.59|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://downloads.sourceforge.net/project/djvu/DjVuLibre/3.5.27/djvulibre-3.5.27.tar.gz [following]
--2017-06-09 19:06:33-- http://downloads.sourceforge.net/project/djvu/DjVuLibre/3.5.27/djvulibre-3.5.27.tar.gz
Connecting to downloads.sourceforge.net (downloads.sourceforge.net)|216.34.181.59|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://iweb.dl.sourceforge.net/project/djvu/DjVuLibre/3.5.27/djvulibre-3.5.27.tar.gz [following]
--2017-06-09 19:06:33-- https://iweb.dl.sourceforge.net/project/djvu/DjVuLibre/3.5.27/djvulibre-3.5.27.tar.gz
Resolving iweb.dl.sourceforge.net (iweb.dl.sourceforge.net)... 192.175.120.182, 2607:f748:10:12::5f:2
Connecting to iweb.dl.sourceforge.net (iweb.dl.sourceforge.net)|192.175.120.182|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3648522 (3.5M) [application/x-gzip]
Saving to: ‘djvulibre-3.5.27.tar.gz’

100%[=====================================================================================================>] 3,648,522 10.9MB/s in 0.3s

2017-06-09 19:06:34 (10.9 MB/s) - ‘djvulibre-3.5.27.tar.gz’ saved [3648522/3648522]

tools.zhuyifei1999-test@tools-bastion-02:~$ tar -xzf djvulibre-3.5.27.tar.gz
tools.zhuyifei1999-test@tools-bastion-02:~$ webservice --backend=kubernetes python2 shell
If you don't see a command prompt, try pressing enter.
tools.zhuyifei1999-test@interactive:~$
tools.zhuyifei1999-test@interactive:~$ ls -F
djvulibre-3.5.27/ djvulibre-3.5.27.tar.gz logs/ replica.my.cnf xpdf-3.04/ xpdf-3.04.tar.gz
tools.zhuyifei1999-test@interactive:~$ export PATH=~/.local/bin:$PATH CPATH=~/.local/include LIBRARY_PATH=~/.local/lib PKG_CONFIG_PATH=~/.local/lib/pkgconfig
tools.zhuyifei1999-test@interactive:~$ cd djvulibre-3.5.27
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/djvulibre-3.5.27$ ./configure --prefix=${HOME}/.local
[...]
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/djvulibre-3.5.27$ make
[...]
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/djvulibre-3.5.27$ make install
[...]
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/djvulibre-3.5.27$ whereis djvutoxml
djvutoxml: /data/project/zhuyifei1999-test/.local/bin/djvutoxml
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/djvulibre-3.5.27$ djvutoxml
[1-18202] djvutoxml: unspecified input file name.
Usage: djvutoxml [options] <inputfile> <outputfile>
Options:
 --with[out]-anno
 --with[out]-text
 --page p
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/djvulibre-3.5.27$ cd ../xpdf-3.04
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/xpdf-3.04$ ./configure --prefix=${HOME}/.local
[...]
configure: WARNING: Couldn't find X
configure: WARNING: Couldn't find Motif
configure: WARNING: Couldn't find FreeType
configure: WARNING: -- You will be able to compile pdftops, pdftotext,
 pdfinfo, pdffonts, pdfdetach, and pdfimages, but not xpdf
 or pdftoppm
configure: WARNING: Couldn't find libpng -- you will not be able to build pdftohtml or pdftopng
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/xpdf-3.04$ make
[...]
g++ -g -O2 -DHAVE_CONFIG_H -I.. -I./.. -I./../goo -I./../fofi -I./../splash -I. -c TextOutputDev.cc
g++ -g -O2 -DHAVE_CONFIG_H -I.. -I./.. -I./../goo -I./../fofi -I./../splash -I. -c UnicodeTypeTable.cc
g++ -g -O2 -DHAVE_CONFIG_H -I.. -I./.. -I./../goo -I./../fofi -I./../splash -I. -c pdftotext.cc
g++ -g -O2 -DHAVE_CONFIG_H -I.. -I./.. -I./../goo -I./../fofi -I./../splash -I. -o pdftotext AcroForm.o Annot.o Array.o BuiltinFont.o BuiltinFontTables.o Catalog.o CharCodeToUnicode.o CMap.o Decrypt.o Dict.o Error.o FontEncodingTables.o Form.o Function.o Gfx.o GfxFont.o GfxState.o GlobalParams.o JArithmeticDecoder.o JBIG2Stream.o JPXStream.o Lexer.o Link.o NameToCharCode.o Object.o OptionalContent.o Outline.o OutputDev.o Page.o Parser.o PDFDoc.o PDFDocEncoding.o PSTokenizer.o SecurityHandler.o Stream.o TextOutputDev.o TextString.o UnicodeMap.o UnicodeTypeTable.o XFAForm.o XpdfPluginAPI.o XRef.o Zoox.o pdftotext.o \
 -L../goo -lGoo -L../fofi -lfofi -L../goo -lGoo -lm
g++ -g -O2 -DHAVE_CONFIG_H -I.. -I./.. -I./../goo -I./../fofi -I./../splash -I. -c HTMLGen.cc
HTMLGen.cc:30:17: fatal error: png.h: No such file or directory
 #include <png.h>
 ^
compilation terminated.
Makefile:48: recipe for target 'HTMLGen.o' failed
make[1]: *** [HTMLGen.o] Error 1
make[1]: Leaving directory '/data/project/zhuyifei1999-test/xpdf-3.04/xpdf'
Makefile:24: recipe for target 'all' failed
make: *** [all] Error 2
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/xpdf-3.04$ make pdftotext
cd goo; make
make[1]: Entering directory '/data/project/zhuyifei1999-test/xpdf-3.04/goo'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/data/project/zhuyifei1999-test/xpdf-3.04/goo'
cd fofi; make
make[1]: Entering directory '/data/project/zhuyifei1999-test/xpdf-3.04/fofi'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/data/project/zhuyifei1999-test/xpdf-3.04/fofi'
cd splash; make
make[1]: Entering directory '/data/project/zhuyifei1999-test/xpdf-3.04/splash'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/data/project/zhuyifei1999-test/xpdf-3.04/splash'
cd xpdf; make pdftotext
make[1]: Entering directory '/data/project/zhuyifei1999-test/xpdf-3.04/xpdf'
make[1]: 'pdftotext' is up to date.
make[1]: Leaving directory '/data/project/zhuyifei1999-test/xpdf-3.04/xpdf'
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/xpdf-3.04$ make install
mkdir -p /data/project/zhuyifei1999-test//.local/bin
/usr/bin/install -c xpdf/pdftops /data/project/zhuyifei1999-test//.local/bin/pdftops
/usr/bin/install -c xpdf/pdftotext /data/project/zhuyifei1999-test//.local/bin/pdftotext
/usr/bin/install -c xpdf/pdfinfo /data/project/zhuyifei1999-test//.local/bin/pdfinfo
/usr/bin/install: cannot stat ‘xpdf/pdfinfo’: No such file or directory
Makefile:85: recipe for target 'install' failed
make: *** [install] Error 1
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/xpdf-3.04$ whereis pdftotext
pdftotext: /data/project/zhuyifei1999-test/.local/bin/pdftotext
tools.zhuyifei1999-test@interactive:/data/project/zhuyifei1999-test/xpdf-3.04$ pdftotext
pdftotext version 3.04
Copyright 1996-2014 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
 -f <int> : first page to convert
 -l <int> : last page to convert
 -layout : maintain original physical layout
 -table : similar to -layout, but optimized for tables
 -lineprinter : use strict fixed-pitch/height layout
 -raw : keep strings in content stream order
 -fixed <fp> : assume fixed-pitch (or tabular) text
 -linespacing <fp> : fixed line spacing for LinePrinter mode
 -clip : separate clipped text
 -enc <string> : output text encoding name
 -eol <string> : output end-of-line convention (unix, dos, or mac)
 -nopgbrk : don't insert page breaks between pages
 -opw <string> : owner password (for encrypted files)
 -upw <string> : user password (for encrypted files)
 -q : don't print any messages or errors
 -cfg <string> : configuration file to use in place of .xpdfrc
 -v : print copyright and version info
 -h : print usage information
 -help : print usage information
 --help : print usage information
 -? : print usage information

Some of the downsides would be:

  • every time the binaries are called, or further compilation is needed, several environment variables must be set for them to load/compile correctly (PATH=~/.local/bin:$PATH CPATH=~/.local/include LIBRARY_PATH=~/.local/lib PKG_CONFIG_PATH=~/.local/lib/pkgconfig).
  • there are no automatic upgrades via apt, although you might be able to write a script that reruns the download, compile, and install commands.
  • every time the system has a major upgrade (e.g. jessie => stretch), backwards compatibility may break and the programs may need to be recompiled.
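
For a Python tool, the first downside can be softened by building the environment once in code rather than exporting it by hand each time. The following is only a sketch under stated assumptions: the helper names local_prefix_env and pdf_to_text are hypothetical, the prefix defaults to the ~/.local used in the log above, and the variables simply mirror the export line from that log.

```python
import os
import shutil
import subprocess


def local_prefix_env(prefix=None, base_env=None):
    """Return a copy of the environment with the local install prefix added.

    Mirrors: PATH=~/.local/bin:$PATH CPATH=~/.local/include
             LIBRARY_PATH=~/.local/lib PKG_CONFIG_PATH=~/.local/lib/pkgconfig
    """
    env = dict(os.environ if base_env is None else base_env)
    if prefix is None:
        prefix = os.path.join(os.path.expanduser("~"), ".local")
    bin_dir = os.path.join(prefix, "bin")
    old_path = env.get("PATH", "")
    env["PATH"] = bin_dir + os.pathsep + old_path if old_path else bin_dir
    env["CPATH"] = os.path.join(prefix, "include")
    env["LIBRARY_PATH"] = os.path.join(prefix, "lib")
    env["PKG_CONFIG_PATH"] = os.path.join(prefix, "lib", "pkgconfig")
    return env


def pdf_to_text(pdf_path, txt_path, prefix=None):
    """Run the locally installed pdftotext as: pdftotext <PDF-file> <text-file>."""
    env = local_prefix_env(prefix)
    # Resolve the binary against the adjusted PATH so the local copy is found.
    exe = shutil.which("pdftotext", path=env["PATH"])
    if exe is None:
        raise FileNotFoundError("pdftotext not found under the local prefix or on PATH")
    subprocess.run([exe, pdf_path, txt_path], env=env, check=True)
```

Only PATH matters for finding the binary at call time; the other three variables are kept so the same environment can be reused when compiling further software against the local prefix.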

PS: This is the fourth time I've done such local compiling/installing on tool labs. Most C/C++/Assembly programs, including larger programs like FFmpeg (which I compiled for grid for the embeddeddata bot), should work fine when installed this way.

I just remembered that Gentoo has a package manager, Portage, that installs packages by compiling them locally. If we would not allow users to define apt installs (because it requires root), would emerge be a solution?

I just remembered that Gentoo has a package manager, Portage, that installs packages by compiling them locally. If we would not allow users to define apt installs (because it requires root), would emerge be a solution?

If there is a non-root emerge equivalent for Debian Jessie, yes. Another option would be to install something like GNU stow in the images to at least make custom compilation a bit easier. The real solution however is T136264: Evaluate Kubernetes based workflow replacement options for SGE and ensuring that the selected solution supports some reasonable mechanism for building containers with custom software packages included. An example of this in other PaaS products is heroku-buildpack-apt.

bd808 edited projects, added Kubernetes; removed Cloud-Services. Jul 28 2017, 11:01 PM
Krinkle removed a subscriber: Krinkle. Jul 29 2017, 1:12 AM
bd808 changed the task status from Open to Stalled. Aug 25 2017, 10:31 PM