Labels get out of order in the `revscoring model_info` output, which makes it hard to review changes.
Note the order of labels reported in the following example:
```
$ revscoring model_info models/enwiki.nettrom_wp10.gradient_boosting.model
Model Information:
- type: GradientBoosting
- version: 0.8.1
...
- python_version: '3.5.3'
- release: '4.9.0-9-amd64'
Statistics:
counts (n=32400):
label       n         ~Stub    ~Start    ~C    ~B    ~GA    ~FA
-------  ----  ---  -------  --------  ----  ----  -----  -----
'Stub'   5477  -->     4635       803    26    12      1      0
'Start'  5469  -->      704      3498   857   339     70      1
'C'      5479  -->       75       987  2712  1028    584     93
'B'      5484  -->       40       664  1379  2155    894    352
'GA'     5495  -->        3        42   331   329   3509   1281
'FA'     4996  -->        1         2    23   232    930   3808
rates:
              'Stub'    'Start'    'C'    'B'    'GA'    'FA'
----------  --------  ---------  -----  -----  ------  ------
sample         0.169      0.169  0.169  0.169    0.17   0.154
population     0.576      0.322  0.054  0.035    0.01   0.003
match_rate (micro=0.386, macro=0.189):
   GA     FA    Stub    Start      B      C
-----  -----  ------  -------  -----  -----
0.097  0.065   0.501    0.269  0.083  0.119
```
In this case, "counts" and "rates" list the labels in the correct order, but the label order gets shuffled for "match_rate".
We have an ordered array of labels that we can work with. See https://github.com/wikimedia/revscoring/blob/master/revscoring/scoring/statistics/classification/classification.py#L27
Here's where we format the block you see for "match_rate" in the example: https://github.com/wikimedia/revscoring/blob/master/revscoring/scoring/statistics/classification/micro_macro_stats.py#L50
It looks like we lose ordering [here](https://github.com/wikimedia/revscoring/blob/master/revscoring/scoring/statistics/classification/micro_macro_stats.py#L47). Maybe we could use an OrderedDict instead. The `stats` variable is an OrderedDict, so it seems like we can trust the ordering that comes from here: https://github.com/wikimedia/revscoring/blob/master/revscoring/scoring/statistics/classification/classification.py#L91.
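
To illustrate the OrderedDict idea, here is a minimal sketch rather than a patch against revscoring. The names `LABELS` and `ordered_label_stats` are hypothetical and do not come from the codebase. It only shows that rebuilding the per-label values from the ordered label list gives a stable order, whatever order the intermediate dict happened to have.

```python
from collections import OrderedDict

# Hypothetical stand-in for the ordered label array; not revscoring code.
LABELS = ['Stub', 'Start', 'C', 'B', 'GA', 'FA']


def ordered_label_stats(labels, stats_by_label):
    """Return the per-label values keyed in the order given by `labels`."""
    return OrderedDict((label, stats_by_label[label]) for label in labels)


# match_rate values from the example above, keyed in the shuffled order.
shuffled = {'GA': 0.097, 'FA': 0.065, 'Stub': 0.501,
            'Start': 0.269, 'B': 0.083, 'C': 0.119}

match_rate = ordered_label_stats(LABELS, shuffled)
print(list(match_rate.keys()))
```

Iterating over the label list instead of the dict's own keys means the formatted block would not depend on dict ordering, which matters on Python 3.5 where plain dicts do not guarantee insertion order.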