
Remove hack from ML's blubber files
Closed, Resolved (Public)

Description

In all our Blubber files we have the following:

builder:
  # FIXME: path hack - see: https://phabricator.wikimedia.org/T267685
  command: ["PYTHONPATH=/opt/lib/python/site-packages", "python3", "-m",
  "nltk.downloader", "omw", "sentiwordnet", "stopwords", "wordnet"]

The FIXME points to a task that has since been resolved. The fix should be (IIUC):

runs:
  environment:
    NLTK_DATA: '/home/somebody/nltk_data'
builder:
  command: [ "PYTHONPATH=/opt/lib/python/site-packages", "python3.7", "-m", "nltk.downloader", "omw", "sentiwordnet", "stopwords", "wordnet" ]

See https://gerrit.wikimedia.org/r/plugins/gitiles/research/mwaddlink/+/refs/heads/main/.pipeline/blubber.yaml#6

Event Timeline

We discussed this task during the team meeting, and we are going to split the work as follows:

  • @isarantopoulos will check if we can fix the new revscoring single Docker image directly, so that we'll be able to deprecate the old ones containing the hack.
  • @achou will check if we need nltk for outlink and fix the transformer image accordingly.

This seems to work!

builder:
  command: ["python3.7", "-m", "nltk.downloader", "omw", "sentiwordnet", "stopwords", "wordnet"]

Since only one version of python3 is installed, we can use python3 instead of python3.7.
I built the revscoring image and tested it. The NLTK_DATA env var is redundant, since it already defaults to /home/user/nltk_data.
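
To illustrate why setting NLTK_DATA adds nothing here, a minimal sketch follows. It is a simplified model and not NLTK's actual lookup code: the helper name resolve_nltk_data_dir is hypothetical, and it assumes the only two inputs that matter are the NLTK_DATA variable and the home directory of the user running the builder command.

```python
import posixpath


def resolve_nltk_data_dir(env, home):
    """Hypothetical helper: pick the directory the corpora would land in.

    Simplified assumption, not NLTK's real search logic: use the first
    NLTK_DATA entry when the variable is set, otherwise fall back to
    <home>/nltk_data.
    """
    configured = env.get("NLTK_DATA", "")
    if configured:
        # NLTK_DATA may hold several paths separated by ":" on POSIX.
        return configured.split(":")[0]
    return posixpath.join(home, "nltk_data")


# Without NLTK_DATA, the fallback is the per-user default directory.
print(resolve_nltk_data_dir({}, "/home/user"))
# Setting NLTK_DATA to that same location changes nothing.
print(resolve_nltk_data_dir({"NLTK_DATA": "/home/user/nltk_data"}, "/home/user"))
```

Under these assumptions both calls resolve to the same directory, which is why the variable can be dropped from the Blubber file.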

Change 868131 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] outlink: fix mwapi session host headers

https://gerrit.wikimedia.org/r/868131

Change 868131 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] outlink: fix mwapi session host headers

https://gerrit.wikimedia.org/r/868131

Done! Thanks Aiko and Ilias :)