ORES articlequality for euwiki works differently in production
Closed, Resolved, Public

Assigned To

Authored By

	Halfak
	Dec 5 2019, 6:51 PM

Description

It looks like we are extracting different counts of English words.

In beta (https://ores-beta.wmflabs.org/scores/euwiki/?models=articlequality&revids=7239990 ), we get:

{
  "euwiki": {
    "models": {
      "articlequality": {
        "version": "0.8.1"
      }
    },
    "scores": {
      "7239990": {
        "articlequality": {
          "features": {
            "feature.euwiki.revision.category_links": 12.0,
            "feature.euwiki.revision.cn_templates": 0.0,
            "feature.euwiki.revision.image_links": 2.0,
            "feature.euwiki.revision.infobox_templates": 0.0,
            "feature.euwiki.revision.paragraphs_without_refs_total_length": 0.0,
            "feature.len(<datasource.basque.dictionary.revision.dict_words>)": 3220.0,
            "feature.len(<datasource.english.dictionary.revision.dict_words>)": 631.0,
            "feature.len(<datasource.spanish.dictionary.revision.dict_words>)": 904.0,
            "feature.len(<datasource.wikitext.revision.words>)": 3994.0,
            "feature.wikitext.revision.chars": 37639.0,
            "feature.wikitext.revision.content_chars": 19054.0,
            "feature.wikitext.revision.external_links": 39.0,
            "feature.wikitext.revision.headings_by_level(2)": 10.0,
            "feature.wikitext.revision.headings_by_level(3)": 5.0,
            "feature.wikitext.revision.ref_tags": 79.0,
            "feature.wikitext.revision.wikilinks": 132.0
          },
          "score": {
            "prediction": "GA",
            "probability": {
              "B": 0.3166585497835499,
              "C": 0.03317708333333332,
              "FA": 0.12386093073593067,
              "GA": 0.5158198051948053,
              "Start": 0.010483630952380953,
              "Stub": 0.0
            }
          }
        }
      }
    }
  }
}

In production (https://ores.wikimedia.org/scores/euwiki/?models=articlequality&revids=7239990 ), we sometimes get something different:

{
  "euwiki": {
    "models": {
      "articlequality": {
        "version": "0.8.1"
      }
    },
    "scores": {
      "7239990": {
        "articlequality": {
          "features": {
            "feature.euwiki.revision.category_links": 12.0,
            "feature.euwiki.revision.cn_templates": 0.0,
            "feature.euwiki.revision.image_links": 2.0,
            "feature.euwiki.revision.infobox_templates": 0.0,
            "feature.euwiki.revision.paragraphs_without_refs_total_length": 0.0,
            "feature.len(<datasource.basque.dictionary.revision.dict_words>)": 3220.0,
            "feature.len(<datasource.english.dictionary.revision.dict_words>)": 563.0,
            "feature.len(<datasource.spanish.dictionary.revision.dict_words>)": 904.0,
            "feature.len(<datasource.wikitext.revision.words>)": 3994.0,
            "feature.wikitext.revision.chars": 37639.0,
            "feature.wikitext.revision.content_chars": 19054.0,
            "feature.wikitext.revision.external_links": 39.0,
            "feature.wikitext.revision.headings_by_level(2)": 10.0,
            "feature.wikitext.revision.headings_by_level(3)": 5.0,
            "feature.wikitext.revision.ref_tags": 79.0,
            "feature.wikitext.revision.wikilinks": 132.0
          },
          "score": {
            "prediction": "GA",
            "probability": {
              "B": 0.3101853354978356,
              "C": 0.03265624999999999,
              "FA": 0.12594426406926404,
              "GA": 0.5207305194805196,
              "Start": 0.010483630952380953,
              "Stub": 0.0
            }
          }
        }
      }
    }
  }
}

Other times we get this:

{
  "euwiki": {
    "models": {
      "articlequality": {
        "version": "0.8.1"
      }
    },
    "scores": {
      "7239990": {
        "articlequality": {
          "features": {
            "feature.euwiki.revision.category_links": 12.0,
            "feature.euwiki.revision.cn_templates": 0.0,
            "feature.euwiki.revision.image_links": 2.0,
            "feature.euwiki.revision.infobox_templates": 0.0,
            "feature.euwiki.revision.paragraphs_without_refs_total_length": 0.0,
            "feature.len(<datasource.basque.dictionary.revision.dict_words>)": 3220.0,
            "feature.len(<datasource.english.dictionary.revision.dict_words>)": 712.0,
            "feature.len(<datasource.spanish.dictionary.revision.dict_words>)": 904.0,
            "feature.len(<datasource.wikitext.revision.words>)": 3994.0,
            "feature.wikitext.revision.chars": 37639.0,
            "feature.wikitext.revision.content_chars": 19054.0,
            "feature.wikitext.revision.external_links": 39.0,
            "feature.wikitext.revision.headings_by_level(2)": 10.0,
            "feature.wikitext.revision.headings_by_level(3)": 5.0,
            "feature.wikitext.revision.ref_tags": 79.0,
            "feature.wikitext.revision.wikilinks": 132.0
          },
          "score": {
            "prediction": "GA",
            "probability": {
              "B": 0.31277958152958163,
              "C": 0.03202380952380952,
              "FA": 0.12197104978354974,
              "GA": 0.5208669282106781,
              "Start": 0.012358630952380953,
              "Stub": 0.0
            }
          }
        }
      }
    }
  }
}

But we also sometimes get a result consistent with ores-beta. All of the differences seem to be due to the count of English-language words (631 in beta, and sometimes 563 or 712 in production).
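Diffing 16 features by eye is error-prone, so a few lines of Python can compare two responses directly. This is a minimal sketch: `features_of` and `differing_features` are hypothetical helper names, and only four of the features from the responses above are copied in to keep it short. With live data you would `json.loads` the bodies fetched from the beta and production URLs and pass them through `features_of`.

```python
EN = "feature.len(<datasource.english.dictionary.revision.dict_words>)"

# Subset of the feature values from the beta response above.
BETA = {
    "feature.len(<datasource.basque.dictionary.revision.dict_words>)": 3220.0,
    EN: 631.0,
    "feature.len(<datasource.spanish.dictionary.revision.dict_words>)": 904.0,
    "feature.len(<datasource.wikitext.revision.words>)": 3994.0,
}
# The two production variants differ from beta only in the English count.
PROD_A = {**BETA, EN: 563.0}
PROD_B = {**BETA, EN: 712.0}


def features_of(response, wiki="euwiki", rev_id="7239990", model="articlequality"):
    """Return the feature dict nested inside a /scores/ response like the ones above."""
    return response[wiki]["scores"][rev_id][model]["features"]


def differing_features(a, b):
    """Return {feature: (value_in_a, value_in_b)} for every feature that differs."""
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b))
            if a.get(name) != b.get(name)}


# In both comparisons only the English dictionary-word count differs.
print(differing_features(BETA, PROD_A))
print(differing_features(BETA, PROD_B))
```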

Details

	Subject	Repo	Branch	Lines +/-
	Standardizes English dictionaries on hunspell for English in ORES	operations/puppet	production	+5 -5


Related Objects

Status	Assigned	Task
Resolved	Halfak	T240725 ORES deployment mid-Dec. 2019
Resolved	Halfak	T239942 ORES articlequality for euwiki works differently in production
Resolved	Halfak	T240724 Retrain enwiki and dewiki models with revscoring-2.6.2

Event Timeline

Halfak created this task. Dec 5 2019, 6:51 PM

Restricted Application added a project: artificial-intelligence. Dec 5 2019, 6:51 PM

Restricted Application added a subscriber: Aklapper.

@kevinbazira and I looked into this. We ran the following command to look up the installed versions of the enchant dictionaries we would like to use.

$ ssh ores1001.eqiad.wmnet -y "apt-cache policy hunspell-en-*" > ores1001.output
...
$ ssh ores2001.codfw.wmnet -y "apt-cache policy hunspell-en-*" > ores2001.output
...

Then we used diff to look for differences between the outputs:

$ diff ores1001.output ores1002.output
...
$ diff ores2001.output ores2002.output
...
$ diff ores1001.output ores2001.output
...

No differences were reported.

Halfak triaged this task as High priority. Dec 5 2019, 6:54 PM

I confirmed that all nodes have the same git hash.

I also checked to see if any other *spell-en* dicts were installed. Turns out we have aspell-en installed on all of the hosts.

I was finally able to reproduce the discrepancy by running the extractor directly on each host:

$ ssh ores1002.eqiad.wmnet "/srv/deployment/ores/deploy/venv/bin/python -c 'from revscoring.languages import english; from revscoring.dependencies import solve; import mwapi; from revscoring.extractors import api; print(api.Extractor(mwapi.Session(\"https://eu.wikipedia.org\", user_agent=\"ahalfaker@wikimedia.org\")).extract(7239990, english.dictionary.revision.dict_words))'"
563.0
$ ssh ores1003.eqiad.wmnet "/srv/deployment/ores/deploy/venv/bin/python -c 'from revscoring.languages import english; from revscoring.dependencies import solve; import mwapi; from revscoring.extractors import api; print(api.Extractor(mwapi.Session(\"https://eu.wikipedia.org\", user_agent=\"ahalfaker@wikimedia.org\")).extract(7239990, english.dictionary.revision.dict_words))'"
563.0
$ ssh ores1004.eqiad.wmnet "/srv/deployment/ores/deploy/venv/bin/python -c 'from revscoring.languages import english; from revscoring.dependencies import solve; import mwapi; from revscoring.extractors import api; print(api.Extractor(mwapi.Session(\"https://eu.wikipedia.org\", user_agent=\"ahalfaker@wikimedia.org\")).extract(7239990, english.dictionary.revision.dict_words))'"
563.0
$ ssh ores1005.eqiad.wmnet "/srv/deployment/ores/deploy/venv/bin/python -c 'from revscoring.languages import english; from revscoring.dependencies import solve; import mwapi; from revscoring.extractors import api; print(api.Extractor(mwapi.Session(\"https://eu.wikipedia.org\", user_agent=\"ahalfaker@wikimedia.org\")).extract(7239990, english.dictionary.revision.dict_words))'"
631.0
$ ssh ores1006.eqiad.wmnet "/srv/deployment/ores/deploy/venv/bin/python -c 'from revscoring.languages import english; from revscoring.dependencies import solve; import mwapi; from revscoring.extractors import api; print(api.Extractor(mwapi.Session(\"https://eu.wikipedia.org\", user_agent=\"ahalfaker@wikimedia.org\")).extract(7239990, english.dictionary.revision.dict_words))'"
563.0
$ ssh ores1007.eqiad.wmnet "/srv/deployment/ores/deploy/venv/bin/python -c 'from revscoring.languages import english; from revscoring.dependencies import solve; import mwapi; from revscoring.extractors import api; print(api.Extractor(mwapi.Session(\"https://eu.wikipedia.org\", user_agent=\"ahalfaker@wikimedia.org\")).extract(7239990, english.dictionary.revision.dict_words))'"
631.0
$ ssh ores1008.eqiad.wmnet "/srv/deployment/ores/deploy/venv/bin/python -c 'from revscoring.languages import english; from revscoring.dependencies import solve; import mwapi; from revscoring.extractors import api; print(api.Extractor(mwapi.Session(\"https://eu.wikipedia.org\", user_agent=\"ahalfaker@wikimedia.org\")).extract(7239990, english.dictionary.revision.dict_words))'"
712.0
$ ssh ores1009.eqiad.wmnet "/srv/deployment/ores/deploy/venv/bin/python -c 'from revscoring.languages import english; from revscoring.dependencies import solve; import mwapi; from revscoring.extractors import api; print(api.Extractor(mwapi.Session(\"https://eu.wikipedia.org\", user_agent=\"ahalfaker@wikimedia.org\")).extract(7239990, english.dictionary.revision.dict_words))'"
712.0

Note that repeated runs against the same host return a consistent count. At least now I know that I can compare ores1001 to ores1009 and expect to see differences *somewhere*.
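The eight near-identical ssh invocations above can be collapsed into a loop that also groups hosts by result. This is a sketch, not tooling that exists in the ORES deploy: `run_on_host` and `group_by_output` are hypothetical names, it assumes Python 3.7+ on the machine you run it from plus working ssh access to the hosts, and it drops the unused `solve` import from the one-liner.

```python
import shlex
import subprocess
from collections import defaultdict

PYTHON = "/srv/deployment/ores/deploy/venv/bin/python"
# Same extraction as the one-liner above (minus the unused `solve` import).
EXTRACT = (
    "from revscoring.languages import english; "
    "import mwapi; from revscoring.extractors import api; "
    "print(api.Extractor(mwapi.Session('https://eu.wikipedia.org', "
    "user_agent='ahalfaker@wikimedia.org'))"
    ".extract(7239990, english.dictionary.revision.dict_words))"
)


def run_on_host(host):
    """Run the extraction on one host over ssh and return its stripped stdout."""
    # shlex.quote protects the embedded single quotes from the remote shell.
    remote = PYTHON + " -c " + shlex.quote(EXTRACT)
    result = subprocess.run(["ssh", host, remote], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def group_by_output(outputs):
    """Map each distinct output to the sorted list of hosts that produced it."""
    groups = defaultdict(list)
    for host, out in outputs.items():
        groups[out].append(host)
    return {out: sorted(hosts) for out, hosts in groups.items()}
```

For example, `group_by_output({h: run_on_host(h) for h in hosts})` with `hosts = [f"ores100{i}.eqiad.wmnet" for i in range(1, 10)]` would print the three populations of hosts (563, 631 and 712 above) in one go.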

Alright! I figured out how to reproduce the issue across machines without using any of our code, by calling the enchant CLI directly. When run as below, the utility prints out the misspelled words. So when it prints nothing, it thinks the word is a correctly spelled English word; when it prints the word, it thinks the word is misspelled.

$ ssh ores1001.eqiad.wmnet "echo 'de' | enchant -l -d en"
$ ssh ores1001.eqiad.wmnet "echo 'eta' | enchant -l -d en"
eta
$ ssh ores1009.eqiad.wmnet "echo 'de' | enchant -l -d en"
de
$ ssh ores1009.eqiad.wmnet "echo 'eta' | enchant -l -d en"

This shows that ores1001 and ores1009 disagree on what is English!
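With more than a couple of test words, comparing what `enchant -l -d en` printed on each host becomes a set operation. A minimal sketch with a hypothetical `disagreements` helper: feed it, per host, the set of words the CLI printed (for example, the whitespace-split output of `ssh HOST "echo 'de eta' | enchant -l -d en"`), and it returns the words the hosts do not agree on. The data below is the two-word test above.

```python
def disagreements(flagged_by_host):
    """Return words flagged as misspelled on some hosts but not on all of them.

    flagged_by_host maps a host name to the set of words that
    `enchant -l -d en` printed (i.e. rejected) on that host.
    """
    flagged_sets = [set(words) for words in flagged_by_host.values()]
    if not flagged_sets:
        return set()
    union = set().union(*flagged_sets)
    intersection = set.intersection(*flagged_sets)
    return union - intersection


observed = {
    "ores1001.eqiad.wmnet": {"eta"},  # accepted "de", rejected "eta"
    "ores1009.eqiad.wmnet": {"de"},   # accepted "eta", rejected "de"
}
print(sorted(disagreements(observed)))  # ['de', 'eta']
```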

I made extra-double sure there was nothing different about what spelling packages were installed.

$ ssh ores1009.eqiad.wmnet "apt-cache policy *spell-*" > ores1009.output
$ ssh ores1001.eqiad.wmnet "apt-cache policy *spell-*" > ores1001.output
$ diff ores1001.output ores1009.output

No difference.

$ ssh ores1001.eqiad.wmnet "/srv/deployment/ores/deploy/venv/bin/python -c 'import enchant; print(enchant.Dict(\"en\").provider)'"
<Enchant: Myspell Provider>
$ ssh ores1009.eqiad.wmnet "/srv/deployment/ores/deploy/venv/bin/python -c 'import enchant; print(enchant.Dict(\"en\").provider)'"
<Enchant: Myspell Provider>

No difference there.

$ ssh ores1001.eqiad.wmnet "echo 'de' | strace -e open,openat enchant -l -d en" &> ores1001.strace
$ ssh ores1009.eqiad.wmnet "echo 'de' | strace -e open,openat enchant -l -d en" &> ores1009.strace
$ diff ores1001.strace ores1009.strace 
34c34
< open("/usr/lib/x86_64-linux-gnu/enchant/libenchant_ispell.so", O_RDONLY|O_CLOEXEC) = 4
---
> open("/usr/lib/x86_64-linux-gnu/enchant/libenchant_aspell.so", O_RDONLY|O_CLOEXEC) = 4
35a36
> open("/usr/lib/x86_64-linux-gnu/libaspell.so.15", O_RDONLY|O_CLOEXEC) = 4
39,41d39
< open("/usr/lib/x86_64-linux-gnu/enchant/libenchant_hspell.so", O_RDONLY|O_CLOEXEC) = 4
< open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4
< open("/lib/x86_64-linux-gnu/libz.so.1", O_RDONLY|O_CLOEXEC) = 4
45c43
< open("/usr/lib/x86_64-linux-gnu/enchant/libenchant_aspell.so", O_RDONLY|O_CLOEXEC) = 4
---
> open("/usr/lib/x86_64-linux-gnu/enchant/libenchant_hspell.so", O_RDONLY|O_CLOEXEC) = 4
47c45,46
< open("/usr/lib/x86_64-linux-gnu/libaspell.so.15", O_RDONLY|O_CLOEXEC) = 4
---
> open("/lib/x86_64-linux-gnu/libz.so.1", O_RDONLY|O_CLOEXEC) = 4
> open("/usr/lib/x86_64-linux-gnu/enchant/libenchant_ispell.so", O_RDONLY|O_CLOEXEC) = 4
57,59c56,58
< open("/usr/share/hunspell/en_AU.aff", O_RDONLY) = 3
< open("/usr/share/hunspell/en_AU.dic", O_RDONLY) = 3
< open("/usr/share/hunspell/en_AU.aff", O_RDONLY) = 3
---
> open("/usr/share/hunspell/en_US.aff", O_RDONLY) = 3
> open("/usr/share/hunspell/en_US.dic", O_RDONLY) = 3
> open("/usr/share/hunspell/en_US.aff", O_RDONLY) = 3
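Most of that diff is provider-loading noise. A small filter that keeps only the dictionary files each host opened isolates the difference that matters (en_AU versus en_US). This is a sketch that assumes the strace lines look like the ones above; `dictionary_files` is a hypothetical helper name.

```python
import re

# Captures the path argument of open()/openat() calls in strace output.
# openat lines typically look like: openat(AT_FDCWD, "/path", O_RDONLY) = 3
OPEN_RE = re.compile(r'open(?:at)?\((?:AT_FDCWD, )?"([^"]+)"')


def dictionary_files(strace_lines):
    """Return the .aff/.dic paths opened, in order, skipping provider .so loads."""
    paths = []
    for line in strace_lines:
        match = OPEN_RE.search(line)
        if match and match.group(1).endswith((".aff", ".dic")):
            paths.append(match.group(1))
    return paths
```

Calling `dictionary_files(open("ores1001.strace"))` would then list only the hunspell files that host loaded, which is where the two hosts actually diverge.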

@akosiaris, could you look at my work here and see what you think? The above commands give you a nice, minimal example of the problematic differences between some of the ORES hosts.

Platonides subscribed. Dec 5 2019, 11:24 PM

I had a quick look. I agree with the above investigation (I reproduced it as well), and I can add (in case it's not obvious) that for some reason libenchant on ores1009 seems to prefer aspell over ispell. It gets even more complicated because hunspell is also in use, further down in the strace output. I dug into it for about an hour without an obvious cause or solution revealing itself.

I wonder if there is something we can do from our end to minimize the issue in the short term.

One idea I have is to explicitly load different dictionaries within our code. E.g.,

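import enchant


class MultidictChecker:
    """Accept a word if any of the wrapped enchant dictionaries accepts it."""

    def __init__(self, *dicts):
        self.dicts = dicts

    def check(self, word):
        return any(d.check(word) for d in self.dicts)


try:
    # Request each English variant by its full tag instead of the bare "en".
    us_dict = enchant.Dict('en_US')
    gb_dict = enchant.Dict('en_GB')  # "en_UK" is not a valid tag; Britain is GB
    au_dict = enchant.Dict('en_AU')  # assumes "en_AS" was meant as Australian

    multidict = MultidictChecker(us_dict, gb_dict, au_dict)
except enchant.errors.DictNotFoundError as e:
    raise ImportError("Missing English dictionary variant: {0}".format(e))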

Change 556023 had a related patch set uploaded (by Halfak; owner: Halfak):
[operations/puppet@production] Standardizes English dictionaries on hunspell for English in ORES

https://gerrit.wikimedia.org/r/556023

gerritbot added a project: Patch-For-Review. Dec 9 2019, 6:49 PM

Bam: https://github.com/wikimedia/revscoring/pull/460

I've given this one more go. I still don't have a solution, but here's the TL;DR of the investigation:

- Packages and package versions are the same on all 9 machines.
- The output of echo "de" | enchant -l -d en differs on ores1008 and ores1009 from the other 7 machines.
- The same goes for the output of echo "en" | enchant -l -d en.
- The available locales and the locale in use are the same on all machines.
- The strace output of the enchant command above varies across the 9 machines. This is caused by the differing directory-entry ordering of /usr/lib/x86_64-linux-gnu/enchant, which is filesystem- and machine-specific and not something we can change. It is also visible in the output of the enchant-lsmod command.
- enchant has a mechanism for ordering dictionary backends. It is governed by /usr/share/enchant/enchant.ordering, which is identical across all machines.

I think the strace output of enchant -l -d en is a red herring after all. It shows differences in the order in which the various backend dictionaries are loaded, but that doesn't matter because of the enchant.ordering mechanism.
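To make the ordering argument concrete, here is a rough model of how an ordering file picks providers for a tag, independent of the order in which provider libraries were loaded. It is a sketch built on assumptions: that each line of enchant.ordering has the form tag:provider1,provider2, that * is the fallback entry, and that lookup tries the exact tag, then the language part before _, then *. `parse_ordering` and `providers_for` are hypothetical names, and the sample contents are made up rather than copied from the ORES hosts.

```python
def parse_ordering(text):
    """Parse enchant.ordering-style text into {tag: [provider, ...]}.

    Assumes one `tag:provider1,provider2` entry per line; blank or
    malformed lines are skipped.
    """
    ordering = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        tag, providers = line.split(":", 1)
        ordering[tag.strip()] = [p.strip() for p in providers.split(",") if p.strip()]
    return ordering


def providers_for(ordering, tag):
    """Return the provider order for a tag: exact tag, then its language, then '*'."""
    for key in (tag, tag.split("_", 1)[0], "*"):
        if key in ordering:
            return ordering[key]
    return []


# Hypothetical file contents, for illustration only.
SAMPLE = """
*:myspell,aspell,ispell
he:hspell,myspell
"""
ordering = parse_ordering(SAMPLE)
print(providers_for(ordering, "en"))     # ['myspell', 'aspell', 'ispell']
print(providers_for(ordering, "he_IL"))  # ['hspell', 'myspell']
```

Under this model, every host with the same ordering file picks the same provider for en, which matches both hosts reporting the Myspell provider above.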

Digging into this a bit more with @MoritzMuehlenhoff, it's becoming clear that en resolves differently on the 2 sets of machines: on ores1001 it is en_AU and on ores1009 it is en_US. It's not yet clear why, but it's a myspell/hunspell issue. Passing -d en_US or -d en_AU makes enchant behave consistently. At the filesystem level, en does not even exist. This is not the case for, e.g., es, which has multiple variants, all of them symlinks to es at the filesystem level.
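One hypothesis consistent with these observations (not confirmed anywhere in this task) is that, when no en.dic exists, the provider accepts the first en_* dictionary it sees in directory-listing order, which varies by host. The sketch below models that hypothesis with a hypothetical `resolve_tag` function; its real point is that an explicit tag such as en_US resolves the same way regardless of listing order.

```python
def resolve_tag(tag, dic_files_in_dir_order):
    """Hypothetical model of dictionary lookup for a language tag.

    Use an exact `<tag>.dic` if present; otherwise fall back to the first
    `<tag>_*.dic` in whatever order the directory listing returned.
    """
    if tag + ".dic" in dic_files_in_dir_order:
        return tag
    for name in dic_files_in_dir_order:
        if name.startswith(tag + "_") and name.endswith(".dic"):
            return name[: -len(".dic")]
    return None


# Same files, two different directory orders, as on two different hosts.
host_a = ["en_AU.dic", "en_GB.dic", "en_US.dic"]
host_b = ["en_US.dic", "en_GB.dic", "en_AU.dic"]
print(resolve_tag("en", host_a), resolve_tag("en", host_b))        # en_AU en_US
print(resolve_tag("en_US", host_a), resolve_tag("en_US", host_b))  # en_US en_US
```

The es case fits the same model: because es itself exists on disk, the exact match wins and listing order never comes into play.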

Something like the approach at T239942#5725060 would probably work fine.