Consistent ordering in model_info files
Open, Normal, Public

Description

Labels are reported in an inconsistent order across the sections of model_info output, which makes changes hard to review.

Note the order of labels reported in the following example:

$ revscoring model_info models/enwiki.nettrom_wp10.gradient_boosting.model 
Model Information:
	 - type: GradientBoosting
	 - version: 0.8.1
	 ...
	 - python_version: '3.5.3'
	 - release: '4.9.0-9-amd64'
	
	Statistics:
	counts (n=32400):
		label       n         ~Stub    ~Start    ~C    ~B    ~GA    ~FA
		-------  ----  ---  -------  --------  ----  ----  -----  -----
		'Stub'   5477  -->     4635       803    26    12      1      0
		'Start'  5469  -->      704      3498   857   339     70      1
		'C'      5479  -->       75       987  2712  1028    584     93
		'B'      5484  -->       40       664  1379  2155    894    352
		'GA'     5495  -->        3        42   331   329   3509   1281
		'FA'     4996  -->        1         2    23   232    930   3808
	rates:
		              'Stub'    'Start'    'C'    'B'    'GA'    'FA'
		----------  --------  ---------  -----  -----  ------  ------
		sample         0.169      0.169  0.169  0.169    0.17   0.154
		population     0.576      0.322  0.054  0.035    0.01   0.003
	match_rate (micro=0.386, macro=0.189):
		   GA     FA    Stub    Start      B      C
		-----  -----  ------  -------  -----  -----
		0.097  0.065   0.501    0.269  0.083  0.119

In this case, "counts" and "rates" list the labels in the correct order, but the label order gets shuffled for "match_rate".

We have an ordered array of labels that we can work with. See https://github.com/wikimedia/revscoring/blob/master/revscoring/scoring/statistics/classification/classification.py#L27

Here's where we format the block you see for "match_rate" in the example: https://github.com/wikimedia/revscoring/blob/master/revscoring/scoring/statistics/classification/micro_macro_stats.py#L50

It looks like we lose ordering here. Maybe we could use an OrderedDict instead. The stats variable is an OrderedDict, so it seems like we can trust the ordering that comes from here: https://github.com/wikimedia/revscoring/blob/master/revscoring/scoring/statistics/classification/classification.py#L91.
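
A minimal sketch of the idea, not the actual revscoring code: rebuild the per-label stats as an OrderedDict keyed in the order of the label list before formatting. The helper name order_by_labels and the variable names are hypothetical. The values are copied from the "match_rate" row in the example above. Note that the model was built with Python 3.5.3, where a plain dict does not preserve insertion order, which may be how the shuffle arises.

```python
from collections import OrderedDict


def order_by_labels(label_stats, labels):
    """Return per-label stats as an OrderedDict keyed in the order of `labels`.

    Hypothetical helper, not part of revscoring. Labels missing from
    `label_stats` are skipped. Keys that are not in `labels` are appended
    at the end (sorted by repr) so nothing is silently dropped.
    """
    ordered = OrderedDict()
    for label in labels:
        if label in label_stats:
            ordered[label] = label_stats[label]
    for label in sorted(set(label_stats) - set(labels), key=repr):
        ordered[label] = label_stats[label]
    return ordered


# The ordered label list, as it appears in "counts" and "rates".
labels = ['Stub', 'Start', 'C', 'B', 'GA', 'FA']

# The shuffled "match_rate" values from the example output.
match_rate = {'GA': 0.097, 'FA': 0.065, 'Stub': 0.501,
              'Start': 0.269, 'B': 0.083, 'C': 0.119}

ordered = order_by_labels(match_rate, labels)
header = list(ordered.keys())    # column headers, in label order
row = list(ordered.values())     # matching values, in label order
```

With something like this in place, the "match_rate" columns would follow the same Stub, Start, C, B, GA, FA order as the other blocks, whatever order the input mapping happened to iterate in.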

Event Timeline

Halfak created this task. Jul 31 2019, 4:27 PM
Restricted Application added a project: artificial-intelligence. Jul 31 2019, 4:27 PM
Restricted Application added a subscriber: Aklapper.
Halfak updated the task description. Aug 8 2019, 9:55 PM
Halfak moved this task from Untriaged to Maintenance/cleanup on the Scoring-platform-team board.
Halfak triaged this task as Normal priority. Wed, Sep 11, 9:14 PM
Halfak added a project: good first bug.