Maniphest T190050

Train and test wp10 model for fawiki
Closed, Resolved · Public

Assigned To

	Ladsgroup

Authored By

	Halfak
	Mar 19 2018, 2:59 PM

Related Objects

Status	Assigned	Task
Resolved	awight	T201518 ORES deployment (Early August)
Resolved	Ladsgroup	T190050 Train and test wp10 model for fawiki
Resolved	Halfak	T174684 Article quality campaign for Persian Wikipedia

Event Timeline

Halfak created this task. Mar 19 2018, 2:59 PM

Restricted Application added a project: artificial-intelligence. Mar 19 2018, 2:59 PM

Restricted Application added a subscriber: Aklapper.

Halfak added a subtask: T174684: Article quality campaign for Persian Wikipedia. Mar 19 2018, 2:59 PM

Halfak assigned this task to Ladsgroup. Mar 28 2018, 2:46 PM

Halfak edited projects, added Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team.

Restricted Application added a project: User-Ladsgroup. Mar 28 2018, 2:46 PM

@Ladsgroup will be working on a merge_labels utility for wp10/wikilabels.

Model Information:
	 - type: GradientBoosting
	 - version: 0.6.0
	 - params: {'presort': 'auto', 'population_rates': None, 'learning_rate': 0.01, 'min_weight_fraction_leaf': 0.0, 'loss': 'deviance', 'max_depth': 7, 'labels': ['Stub', 'Start', 'C', 'B', 'GA', 'FA'], 'verbose': 0, 'init': None, 'min_samples_leaf': 1, 'max_leaf_nodes': None, 'center': True, 'scale': True, 'random_state': None, 'multilabel': False, 'n_estimators': 700, 'label_weights': None, 'subsample': 1.0, 'max_features': 'log2', 'warm_start': False, 'min_samples_split': 2}
	Environment:
	 - revscoring_version: '2.1.0'
	 - platform: 'Linux-4.9.0-6-amd64-x86_64-with-debian-9.4'
	 - machine: 'x86_64'
	 - version: '#1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02)'
	 - system: 'Linux'
	 - processor: ''
	 - python_build: ('default', 'Jan 19 2017 14:11:04')
	 - python_compiler: 'GCC 6.3.0 20170118'
	 - python_branch: ''
	 - python_implementation: 'CPython'
	 - python_revision: ''
	 - python_version: '3.5.3'
	 - release: '4.9.0-6-amd64'
	
	Statistics:
	counts (n=665):
		label      n         ~Stub    ~Start    ~C    ~B    ~GA    ~FA
		-------  ---  ---  -------  --------  ----  ----  -----  -----
		'Stub'    24  -->       14         3     0     0      6      1
		'Start'   22  -->        4         2     3     6      7      0
		'C'       26  -->        1         1     2     1     18      3
		'B'       66  -->        1         0     1     9     41     14
		'GA'     273  -->        0         0     1    10    189     73
		'FA'     254  -->        1         1     0     6     80    166
	rates:
		              'Stub'    'Start'    'C'    'B'    'GA'    'FA'
		----------  --------  ---------  -----  -----  ------  ------
		sample         0.036      0.033  0.039  0.099   0.411   0.382
		population     0.036      0.033  0.039  0.099   0.411   0.382
	match_rate (micro=0.365, macro=0.167):
		  Stub      B     FA     GA      C    Start
		------  -----  -----  -----  -----  -------
		 0.032  0.048  0.386  0.513  0.011    0.011
	filter_rate (micro=0.635, macro=0.833):
		  Stub      B     FA     GA      C    Start
		------  -----  -----  -----  -----  -------
		 0.968  0.952  0.614  0.487  0.989    0.989
	recall (micro=0.574, macro=0.372):
		  Stub      B     FA     GA      C    Start
		------  -----  -----  -----  -----  -------
		 0.583  0.136  0.654  0.692  0.077    0.091
	!recall (micro=0.751, macro=0.888):
		  Stub      B     FA     GA      C    Start
		------  -----  -----  -----  -----  -------
		 0.989  0.962  0.779  0.612  0.992    0.992
	precision (micro=0.547, macro=0.453):
		  Stub      B     FA     GA      C    Start
		------  -----  -----  -----  -----  -------
		 0.667  0.281  0.646  0.554  0.286    0.286
	!precision (micro=0.799, macro=0.892):
		  Stub     B     FA     GA      C    Start
		------  ----  -----  -----  -----  -------
		 0.984  0.91  0.784  0.741  0.964     0.97
	f1 (micro=0.551, macro=0.388):
		  Stub      B    FA     GA      C    Start
		------  -----  ----  -----  -----  -------
		 0.622  0.184  0.65  0.616  0.121    0.138
	!f1 (micro=0.773, macro=0.889):
		  Stub      B     FA    GA      C    Start
		------  -----  -----  ----  -----  -------
		 0.987  0.935  0.781  0.67  0.978    0.981
	accuracy (micro=0.736, macro=0.858):
		  Stub     B     FA     GA      C    Start
		------  ----  -----  -----  -----  -------
		 0.974  0.88  0.731  0.645  0.956    0.962
	fpr (micro=0.249, macro=0.112):
		  Stub      B     FA     GA      C    Start
		------  -----  -----  -----  -----  -------
		 0.011  0.038  0.221  0.388  0.008    0.008
	roc_auc (micro=0.735, macro=0.806):
		  Stub      B     FA     GA      C    Start
		------  -----  -----  -----  -----  -------
		 0.987  0.663  0.749  0.694  0.811    0.934
	pr_auc (micro=0.537, macro=0.414):
		  Stub      B     FA     GA      C    Start
		------  -----  -----  -----  -----  -------
		 0.674  0.205  0.647  0.563  0.147    0.249
	
	 - score_schema: {'type': 'object', 'title': 'Scikit learn-based classifier score with probability', 'properties': {'probability': {'type': 'object', 'description': 'A mapping of probabilities onto each of the potential output labels', 'properties': {'Stub': 'number', 'B': 'number', 'FA': 'number', 'GA': 'number', 'C': 'number', 'Start': 'number'}}, 'prediction': {'type': 'string', 'description': 'The most likely label predicted by the estimator'}}}
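
As a sanity check on the statistics above, the per-class recall and precision can be recomputed directly from the counts table. This is a minimal sketch in plain Python, not revscoring's own code: the names `CONFUSION`, `per_class_recall`, and `per_class_precision` are made up for illustration, and it assumes rows are the true labels and columns are the predictions (which is how the `-->` table reads).

```python
# Labels in the order used by the counts table (rows = true, columns = predicted).
LABELS = ["Stub", "Start", "C", "B", "GA", "FA"]

# Counts copied from the "counts (n=665)" table in the model output.
CONFUSION = [
    [14, 3, 0, 0, 6, 1],      # true Stub
    [4, 2, 3, 6, 7, 0],       # true Start
    [1, 1, 2, 1, 18, 3],      # true C
    [1, 0, 1, 9, 41, 14],     # true B
    [0, 0, 1, 10, 189, 73],   # true GA
    [1, 1, 0, 6, 80, 166],    # true FA
]


def per_class_recall(matrix):
    """Correct predictions for a class divided by its true count (row total)."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def per_class_precision(matrix):
    """Correct predictions for a class divided by its predicted count (column total)."""
    column_totals = [sum(column) for column in zip(*matrix)]
    return [matrix[i][i] / column_totals[i] for i in range(len(matrix))]


recall = dict(zip(LABELS, per_class_recall(CONFUSION)))
precision = dict(zip(LABELS, per_class_precision(CONFUSION)))

# Macro recall: unweighted mean over classes.
macro_recall = sum(recall.values()) / len(LABELS)

# Micro recall: all correct predictions over all observations.
total = sum(sum(row) for row in CONFUSION)
micro_recall = sum(CONFUSION[i][i] for i in range(len(LABELS))) / total

for label in LABELS:
    print(f"{label:>5}  recall={recall[label]:.3f}  precision={precision[label]:.3f}")
print(f"macro recall={macro_recall:.3f}  micro recall={micro_recall:.3f}")
```

The rounded values agree with the recall and precision rows reported above (for example, Stub recall is 14/24 and Start precision is 2/7). The other micro averages in the report are not reproduced here.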

Ready for review: https://github.com/wiki-ai/articlequality/pull/63

Ladsgroup moved this task from Parked to Review on the Machine-Learning-Team (Active Tasks) board. Apr 14 2018, 4:01 PM

Ladsgroup mentioned this in rOWC30a618080690: Add fawiki wp10 model. Apr 14 2018, 4:02 PM

This is somehow missing a ton of observations for the lower classes. Those should come out of one of the labeling campaigns.
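
To put a number on the imbalance, here is a rough sketch that computes each label's share of the 665 observations, using the per-label counts from the model output. The variable names are illustrative only.

```python
# Per-label observation counts, copied from the "n" column of the counts table.
counts = {"Stub": 24, "Start": 22, "C": 26, "B": 66, "GA": 273, "FA": 254}

total = sum(counts.values())

# Fraction of all observations carrying each label.
shares = {label: n / total for label, n in counts.items()}

# Combined share of the two highest quality classes.
ga_fa_share = shares["GA"] + shares["FA"]

for label, share in shares.items():
    print(f"{label:>5}: {counts[label]:>3} observations ({share:.1%})")
print(f"GA + FA combined: {ga_fa_share:.1%}")
```

GA and FA together account for roughly 79% of the observations, while Stub, Start, and C are each under 4%.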

That actually surprised me as well...

So, one of the labeling campaigns is probably not getting pulled in. Want to check on that?

I double-checked that. Both are pulled in.

Halfak mentioned this in rOWC801f4cb8afa6: Add fawiki wp10 model. Apr 28 2018, 12:09 PM

I checked on this with @Ladsgroup, and it looks like we accidentally had people label the GA/FA set rather than the 600-observation sample. So I've loaded the 600-observation sample (see http://labels.wmflabs.org/ui/fawiki/). On the bright side, we'll have better data about the 300-observation GA/FA sample (not all were labeled GA/FA).