Update Analysis Analysis Tools (July 2018)
Closed, Resolved, Public

Description

While reviewing the Esperanto morphological libraries (T197240), I found a few errors in my analysis code that conflated post-analysis types and pre-analysis types, which led to some obviously absurd stats on very small numbers (like 137% of types being affected by a change). Previous analysis percentages were thus incorrect, but probably not by too much in most cases.

The primary goal is to fix that counting error, though lots of other little fixes and improvements will come along for the ride.
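
To illustrate how conflating the two kinds of types can push a percentage past 100%, here is a minimal Python sketch. It is not the code in compare_counts.pl (which is Perl); the function names, the toy Esperanto-like word list, and the choice of denominator are all assumptions made for illustration. The sketch counts pre-analysis types that ended up in a larger group under a new analyzer, then divides by the pre-analysis total (consistent) and by the post-analysis total (conflated).

```python
def group_by_output(analysis):
    """Group pre-analysis types by the post-analysis type they map to."""
    groups = {}
    for pre_type, post_type in analysis.items():
        groups.setdefault(post_type, set()).add(pre_type)
    return groups


def collided_pre_types(old, new):
    """Return pre-analysis types whose new group strictly contains their old group.

    Assumes `old` and `new` map the same set of pre-analysis types.
    """
    old_groups = group_by_output(old)
    new_groups = group_by_output(new)
    return {
        pre for pre in old
        if old_groups[old[pre]] < new_groups[new[pre]]  # strict subset test
    }


# Hypothetical toy data: the old analyzer leaves words alone,
# the new one strips noun endings.
old = {"kato": "kato", "katoj": "katoj", "katon": "katon", "hundo": "hundo"}
new = {"kato": "kat", "katoj": "kat", "katon": "kat", "hundo": "hund"}

affected = collided_pre_types(old, new)
pre_total = len(old)                     # number of pre-analysis types
post_total = len(group_by_output(new))   # number of post-analysis types

# Consistent: pre-analysis count over pre-analysis total.
correct_pct = 100 * len(affected) / pre_total
# Conflated: pre-analysis count over post-analysis total.
conflated_pct = 100 * len(affected) / post_total

print(f"consistent: {correct_pct:.0f}%  conflated: {conflated_pct:.0f}%")
```

With only four input types, the mismatched denominator is enough to produce a figure above 100%, which matches the observation that the absurd stats showed up on very small numbers.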

Event Timeline

Change 444130 had a related patch set uploaded (by Tjones; owner: Tjones):
[wikimedia/discovery/relevanceForge@master] Update Analysis Analysis Tools

https://gerrit.wikimedia.org/r/444130

Important changes in the patch (more complete list in the commit message on Gerrit):

compare_counts.pl:

  • fixed conflation of pre-analysis and post-analysis types in collision and split stats
  • separated counting of lost/gained tokens for unchanged types
  • improved group loss/gain categories in "Changed Groups", adding "mixed" for the messy ones

README.md:

  • rewrote collisions/splits and added token count gains/losses to reflect updates
  • updated analysis example based on new output
  • added command line example for Polish

Other stuff:

  • regenerated sample comparison outputs
  • added Polish self analysis output

Change 444130 merged by jenkins-bot:
[wikimedia/discovery/relevanceForge@master] Update Analysis Analysis Tools

https://gerrit.wikimedia.org/r/444130