Install enchant on stat1002 and stat1003
Closed, Resolved (Public)

Description

In order to build misspelled language features for machine learning and natural language processing on the stat machines, I need enchant and the available dictionaries installed.

sudo apt-get install enchant myspell-*

You might get an error for myspell-* because apt will select packages that don't exist (happened on my local 14.04 install). In that case, please install all available dictionaries. Here's what I see:

myspell-af              myspell-es              myspell-lt              myspell-st
myspell-bg              myspell-et              myspell-lv              myspell-sv-se
myspell-ca              myspell-fa              myspell-nb              myspell-sw
myspell-cs              myspell-fo              myspell-nl              myspell-th
myspell-da              myspell-fr              myspell-nn              myspell-tl
myspell-de-at           myspell-fr-gut          myspell-nr              myspell-tn
myspell-de-ch           myspell-ga              myspell-ns              myspell-tools
myspell-de-de           myspell-gd              myspell-pl              myspell-ts
myspell-de-de-oldspell  myspell-gv              myspell-pt              myspell-uk
myspell-el-gr           myspell-he              myspell-pt-br           myspell-ve
myspell-en-au           myspell-hr              myspell-pt-pt           myspell-xh
myspell-en-gb           myspell-hu              myspell-ru              myspell-zu
myspell-en-us           myspell-hy              myspell-sk              
myspell-en-za           myspell-it              myspell-sl              
myspell-eo              myspell-ku              myspell-ss
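If the glob does fail as described above, one possible workaround is to ask apt which myspell package names it knows about and pass only those to apt-get. This is a rough sketch, not something that was run for this task: `dictionary_packages` is a made-up helper name, and it relies on `apt-cache pkgnames PREFIX`, which prints the known package names starting with PREFIX (this can include names without an install candidate, so the result may still need trimming).

```shell
#!/bin/sh
# Sketch: build the install list from package names apt reports,
# instead of relying on the "myspell-*" glob.

# dictionary_packages (hypothetical helper): reads package names on stdin,
# keeps the ones starting with "myspell-", removes duplicates, and prints
# them sorted on a single space-separated line.
dictionary_packages() {
    grep '^myspell-' | sort -u | paste -s -d ' ' -
}

# Intended use on a Debian/Ubuntu host (left commented out so that
# sourcing this file has no side effects):
#   pkgs=$(apt-cache pkgnames myspell- | dictionary_packages)
#   sudo apt-get install enchant $pkgs
```

Note that `$pkgs` is deliberately left unquoted in the commented usage so that the shell splits it back into separate package arguments.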

Event Timeline

Halfak raised the priority of this task from to Needs Triage.
Halfak updated the task description.
Halfak added a subscriber: Halfak.

We would install packages via puppet, not manually. So this needs a puppet code change with a list of all the package names.

There is no common role that is used on both stat1002 and stat1003. So i'm wondering where this should be added. Is it definitely needed on both of them?

stat1003 is only a "statistics::cruncher" while stat1002 is "statistics::private", "analytics::refinery", "analytics::clients", "analytics::rsyncd", "analytics::refinery:*.. "

Which of these does enchant belong to the most?

Change 210846 had a related patch set uploaded (by Dzahn):
add class to install enchant and myspell packages

https://gerrit.wikimedia.org/r/210846

Change 210847 had a related patch set uploaded (by Dzahn):
stats: incl enchant role in statistics::cruncher

https://gerrit.wikimedia.org/r/210847

@Halfak here's the list i get on stat1002:

https://gerrit.wikimedia.org/r/#/c/210846/1/modules/statistics/manifests/enchant.pp

including that in the "statistics::cruncher" role would only install them on stat1002 though not stat1003 so far

I was planning to use them on stat1003, but stat1002 fills a similar use-case for me. It's hard for me to answer this question, but if I were to choose one machine over the other, I'd choose stat1003 since I generally use that for machine learning on public data (my current project) more often.

Might it make sense to create a task for a role common to stat machines? It seems that this is a common problem.

For the moment, I'm a bit confused about what you're saying about the role statistics::cruncher. In a previous comment, you said that 'stat1003 is only a "statistics::cruncher"', but in the last comment, you said that 'including that in the "statistics::cruncher" role would only install them on stat1002 though not stat1003'. Are those statements incompatible?

> Might it make sense to create a task for a role common to stat machines? It seems that this is a common problem.

Yes it is, i agree. We can just make a new role and apply it to both machines. Let's just find a meaningful name for it.

> For the moment, I'm a bit confused about what you're saying about the role statistics::cruncher. In a previous comment, you said that 'stat1003 is only a "statistics::cruncher"', but in the last comment, you said that 'including that in the "statistics::cruncher" role would only install them on stat1002 though not stat1003'. Are those statements incompatible?

They are incompatible. My bad, sorry. The correct version is:

stat1002.eqiad.wmnet has all these roles:

role statistics::private
role::analytics::refinery
role::analytics::clients
role::analytics::rsyncd
role::analytics::refinery::data::check::email
role::analytics::refinery::guard
role::analytics::password::research

stat1003.eqiad.wmnet just has this one role:

role statistics::cruncher

And my patch would add it to the statistics::cruncher role.

Let me change that so that we have a single new role called "enchant" (reasonable?) that we can apply to both nodes.

I wonder if statistics::cruncher would be a good common role between machines. To me, it sounds like an appropriate name for use-cases common to both of these machines and the machines we have planned to add (new "stats" / "computing" machines in 2015Q1)

If we put the existing statistics::cruncher role on stat1002 now, we would effectively change that server and install everything that role installs that it has not had before.

The description of that role is "Statistics general compute node (non private data)", so i suppose it was for a reason that stat1002 does not have that (because it has private data?).

I didn't set this up myself but i assume it was on purpose that stat1002 and stat1003 are different in that way. (@Ottomata ?)

It seems easier to me for right now to have a new role specifically for "enchant/myspell" and include it on both nodes, without changing other existing roles.

Note that both role::statistics::cruncher and role::statistics::private include the module class statistics::compute.

I would just ensure these packages are installed directly in the statistics::compute class, along with the rest of the hodgepodge of packages that are being installed on statistics nodes.

Change 210847 merged by Dzahn:
statistics: add spell checker packages

https://gerrit.wikimedia.org/r/210847

@Ottomata thanks, added to statistics::compute. also, used "ensure_packages" like for the other existing ones

@Halfak done, the packages have been installed on both stat1002 and stat1003, you can see the full list in the gerrit link below.


https://gerrit.wikimedia.org/r/#/c/210847/9/modules/statistics/manifests/compute.pp

puppet on stat1002 said:

Notice: /Stage[main]/Statistics::Compute/Package[enchant]/ensure: ensure changed 'purged' to 'present'
Notice: /Stage[main]/Statistics::Compute/Package[myspell-ga]/ensure: ensure changed 'purged' to 'present'
Notice: /Stage[main]/Statistics::Compute/Package[myspell-uk]/ensure: ensure changed 'purged' to 'present'
Notice: /Stage[main]/Statistics::Compute/Package[myspell-ca]/ensure: ensure changed 'purged' to 'present'
Notice: /Stage[main]/Statistics::Compute/Package[myspell-el-gr]/ensure: ensure changed 'purged' to 'present'
Notice: /Stage[main]/Statistics::Compute/Package[myspell-af]/ensure: ensure changed 'purged' to 'present'
... etc..
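To double check which packages a run like this actually installed, the names can be pulled out of the Notice lines. A minimal sketch, assuming the puppet output has been saved somewhere and that every relevant line has the `Package[name]/ensure: ensure changed ... to 'present'` shape shown above; `installed_from_notices` is a hypothetical helper name, not part of puppet.

```shell
#!/bin/sh
# installed_from_notices (hypothetical helper): reads puppet output on stdin
# and prints the name of every package whose ensure state changed to
# 'present', one name per line, sorted and de-duplicated.
# Lines that do not match (such as "... etc..") are skipped.
installed_from_notices() {
    sed -n "s|.*Package\[\([^]]*\)]/ensure: ensure changed .* to 'present'.*|\1|p" | sort -u
}
```

For example, `installed_from_notices < puppet-run.log`, where `puppet-run.log` is a placeholder for wherever the output was saved. The result can then be compared against the package list in the gerrit change.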

Change 210846 abandoned by Dzahn:
add class to install enchant and myspell packages

Reason:
superseded by https://gerrit.wikimedia.org/r/#/c/210847/

https://gerrit.wikimedia.org/r/210846

Change 258041 had a related patch set uploaded (by Dzahn):
statistics: remove package myspell-de-de-oldspell

https://gerrit.wikimedia.org/r/258041

Change 258041 merged by Dzahn:
statistics: remove package myspell-de-de-oldspell

https://gerrit.wikimedia.org/r/258041