Document how to add library support in PAWS
Closed, ResolvedPublic

Description

Add documentation on using pip to install python libraries in PAWS notebooks here https://wikitech.wikimedia.org/wiki/PAWS and perhaps also here https://www.mediawiki.org/wiki/PAWS

Original description:

Hello,

So I am starting to learn tensorflow, and thought I could use PAWS to run my scripts, all the more as I would like to see what I can do with this tool for Wikimedia projects.

But so far, even the most basic script trying to import the required libraries fails for me.

Is there a way to install a library, like saying to jupyter "pip install tensorflow", or any other way?

Maybe I missed it, but I didn't find explanations on that topic in the PAWS documentation.

Cheers

Event Timeline

Chicocvenancio moved this task from Backlog to Good first tasks on the PAWS board.
Chicocvenancio subscribed.

This specific request can be solved by adding a !pip install tensorflow at the beginning of the notebook.
But yeah, we need to document this.
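
For reference, the notebook form is simply a cell containing !pip install tensorflow placed before the cell that imports the library. Below is a minimal plain-Python sketch of the same idea, assuming you want to call pip from code rather than through the ! shell escape. The helper names pip_install_command and pip_install are hypothetical, chosen only for illustration. Using sys.executable targets the interpreter the notebook kernel is running on, which is usually what you want.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the pip command for the interpreter running this code."""
    # sys.executable is the kernel's own Python, so the package lands
    # in the environment the notebook actually imports from.
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run pip; raises subprocess.CalledProcessError if pip exits non-zero."""
    subprocess.check_call(pip_install_command(package))


# In a notebook cell, before the imports that need the library:
# pip_install("tensorflow")
```

The install call is left commented out so that running the cell has no side effects until you opt in.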

Actually, while the command does try to install the missing module, it fails to successfully do so:

Building wheels for collected packages: gast, termcolor, absl-py
  Running setup.py bdist_wheel for gast ... error
  Complete output from command /srv/paws/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-vfle1knv/gast/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpso9dvo1ypip-wheel- --python-tag cp36:
  usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
     or: -c --help [cmd1 cmd2 ...]
     or: -c --help-commands
     or: -c cmd --help
  
  error: invalid command 'bdist_wheel'
  
  ----------------------------------------
  Failed building wheel for gast
  Running setup.py clean for gast
  Running setup.py bdist_wheel for termcolor ... error
  Complete output from command /srv/paws/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-vfle1knv/termcolor/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmp41_9n52bpip-wheel- --python-tag cp36:
  usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
     or: -c --help [cmd1 cmd2 ...]
     or: -c --help-commands
     or: -c cmd --help
  
  error: invalid command 'bdist_wheel'
  
  ----------------------------------------
  Failed building wheel for termcolor
  Running setup.py clean for termcolor
  Running setup.py bdist_wheel for absl-py ... error
  Complete output from command /srv/paws/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-vfle1knv/absl-py/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmp5q76k_wrpip-wheel- --python-tag cp36:
  /usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
    warnings.warn(msg)
  usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
     or: -c --help [cmd1 cmd2 ...]
     or: -c --help-commands
     or: -c cmd --help
  
  error: invalid command 'bdist_wheel'
  
  ----------------------------------------
  Failed building wheel for absl-py
  Running setup.py clean for absl-py
Failed to build gast termcolor absl-py
Installing collected packages: wheel, gast, h5py, keras-applications, termcolor, keras-preprocessing, absl-py, pbr, mock, tensorflow-estimator, protobuf, markdown, grpcio, tensorboard, astor, tensorflow
  Running setup.py install for gast ... done
  Running setup.py install for termcolor ... done
  Running setup.py install for absl-py ... done
Successfully installed absl-py-0.7.1 astor-0.7.1 gast-0.2.2 grpcio-1.19.0 h5py-2.9.0 keras-applications-1.0.7 keras-preprocessing-1.0.9 markdown-3.1 mock-2.0.0 pbr-5.1.3 protobuf-3.7.1 tensorboard-1.13.1 tensorflow-1.13.1 tensorflow-estimator-1.13.0 termcolor-1.1.0 wheel-0.33.1

Is there something I can do about it @Chicocvenancio ?

Oops, my bad: despite the error messages, it actually works. I guess the errors are related to some admin-specific (wheel) stuff that isn't really required to get a module installed in a functional manner.

Thank you again!

@Psychoslave yeah, that's just telling you it could not install some of the requirements by one method and tried others successfully. Mind that you will have to run the cell again if/when your PAWS server is restarted.
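
Since the install has to be repeated after a server restart, one option is a first cell that installs the library only when it is missing. This is a rough sketch under that assumption, not an official PAWS recipe; is_installed and ensure_installed are hypothetical helper names, and the check only handles top-level module names.

```python
import importlib
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


def ensure_installed(module_name, package_name=None):
    """Install with pip only when the module is missing.

    Returns True if an install was attempted, False if nothing was needed.
    package_name covers cases where the pip name differs from the module name.
    """
    if is_installed(module_name):
        return False
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", package_name or module_name]
    )
    # Let the import system notice the newly installed files.
    importlib.invalidate_caches()
    return True


# First cell of the notebook, re-run after each server restart:
# ensure_installed("tensorflow")
```

Re-running this cell when the library is already present skips pip entirely, so it is cheap to leave at the top of a notebook.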

Thank you for the feedback @Chicocvenancio.

Almost off topic, but I thought I might document that on the wiki. Maybe I already have a developer account, but I didn't find the relevant password, nor a quick way to reset it if that is the case. I saw that there is some documentation on the topic, but I haven't had time to go through it yet. Hopefully I can find a bit of time for that this weekend, although I would rather focus on improving my skills with tensorflow and more, so any guidance that would help me get through the account setup/reset more quickly would be very welcome.

I can help you set up a developer account on another task or on Telegram, if you wish. But the preferred wiki for documentation would be mediawiki.org. PAWS is the first point of contact for a lot of Wikimedia developers, and mediawiki.org is a friendlier and more visible alternative.

By the way @Chicocvenancio, what about making the most popular modules preinstalled?

I don't know how popular tensorflow is among PAWS users, but surely there are modules that are popular and that would be convenient to have installed out of the box.

It's a trade-off: every new package pre-installed into a server means a longer startup time and greater resource usage for all servers. In general I would want to use the minimum number of packages, while leaving novices able to do their work without an added step. Tensorflow, in my view, does not add sufficient value for novice users to offset the added cost on every server startup.

I could be convinced otherwise with more users calling for it to be installed.

srishakatux changed the visibility from "Public (No Login Required)" to "acl*outreachy-mentors (Project)". Feb 26 2020, 9:18 PM
srishakatux subscribed.

(we'll open this task on March 5th - when the contribution period opens)

srishakatux changed the visibility from "acl*outreachy-mentors (Project)" to "Public (No Login Required)". Mar 5 2020, 6:24 PM

Hi everyone! I'm Karma Dolkar, a sophomore at the Indian Institute of Technology, Roorkee. I'm an Outreachy 2020 applicant and would like to take this issue up. Where exactly do I need to mention !pip install tensorflow ? Can anyone please guide me?

@Karma2902 hello and welcome! In the task description it is clear:

Add documentation on using pip to install python libraries in PAWS notebooks here https://wikitech.wikimedia.org/wiki/PAWS and perhaps also here https://www.mediawiki.org/wiki/PAWS

Hi, I'm Diksha, an Outreachy applicant. Can I take this task? And may I know how I should create my own sandbox page on MediaWiki, so that I can make all the changes on that page only?

Hello @Dikshagupta99! I see that @Karma2902 has already started on this one, though they did not claim it (assign it) to themselves. It might be possible to work together a bit on it, but you may want to check out the other tasks as well.

As for a sandbox page, the best place to work on that is under your Wikitech user page, presuming you've created an account there. For example, https://wikitech.wikimedia.org/wiki/User:Bstorm/sandbox is my own sandbox under my own user page.

@Karma2902, thanks! That's the right information, but I think it is in the wrong location. It looks like the best place would be a new sub-page under https://wikitech.wikimedia.org/wiki/PAWS#Documentation where we should start general user documentation with this detail. There isn't much there right now.

@Karma2902 thanks again for the work. If it is just that short, that can be entered directly in https://wikitech.wikimedia.org/wiki/PAWS#Documentation under Use without a subpage. Then we can delete the subpage.

Is there anything more that needs to be done? Please let me know. :) @Bstorm

@Chicocvenancio Revisiting this thread and question for a follow-up question and perhaps some clarification on future direction. I agree that Tensorflow, while quite popular for AI/ML, is likely not a common package for the typical Wikimedia user doing Wikimedia things. The container size and startup time have to be capped at some point. Since Tensorflow is quite heavily CPU bound, a long-term question might be what scale of computing task is appropriate for the generic PAWS/notebook container, and what is starting to be out of scope for the generic configuration we have? What are the parameters for determining the appropriate use of PAWS, and when do we start considering things beyond the currently supported profile?

PS: I know it has been two years, but @Psychoslave, the free Google Colab Jupyter system may be useful for Tensorflow, as it even has support for using their Tensor hardware acceleration for free.

Frostly subscribed.

Looks like this isn’t completely done. I can do this.

I've created a documentation page for this purpose, and linked it from the main examples/recipes page. Cheers!