Paste P76289

RevertRisk Threshold Analysis for all Wikis

Authored by gkyziridis on May 19 2025, 9:05 AM.
Referenced Files: F60224008 (RevertRisk Threshold Analysis for all Wikis)

============ - cywiki - ============
- Raw data shape: (200511, 17)
- Duplicate rows found and removed: 1265
- Clean data shape: (199246, 17)
- Unique revision_ids: 199246 | Data Shape: 199246 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (190747, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_cywiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.11076630651950836
- confusion_matrix_cywiki.png saved!
- False Positive Rate is: 0.14999340149691756
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted        135259     23868
reverted             15169     16451
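The paste logs each wiki's threshold but not the code that chose it. Below is a minimal stdlib sketch of one way to pick a threshold at a target FPR. It assumes label 1 means reverted, that an edit is predicted reverted when its revert_risk_score is at or above the threshold, and that the chosen point is the ROC point whose FPR is closest to 15%. The logged FPRs land on both sides of 0.15 (for example 0.1505 for iswiki and 0.1493 for pawiki), which fits a nearest-point rule better than a strict cap. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (threshold, fpr, tpr) for one ROC operating point.
RocPoint = Tuple[float, float, float]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[RocPoint]:
    """Operating points for every distinct score, predicting 'reverted'
    (label 1) when score >= threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both reverted and not-reverted edits")
    # Walk edits from highest to lowest score, so each step lowers the threshold.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points: List[RocPoint] = []
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Admit every edit tied at this score before recording the point.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((thr, fp / neg, tp / pos))
    return points


def threshold_for_fpr(scores: Sequence[float], labels: Sequence[int],
                      target_fpr: float = 0.15) -> RocPoint:
    """Assumed rule: the point whose FPR is closest to target_fpr,
    breaking ties in favour of the higher TPR."""
    return min(roc_points(scores, labels),
               key=lambda p: (abs(p[1] - target_fpr), -p[2]))


# Toy data: 3 reverted (1) and 7 not-reverted (0) edits.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(threshold_for_fpr(scores, labels))  # (0.6, 0.14285714285714285, 1.0)
```

Breaking ties toward the higher TPR is a design choice: when two thresholds give the same FPR, the lower one catches more reverted edits at no extra cost in false positives.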
============ - simplewiki - ============
- Raw data shape: (312209, 17)
- Duplicate rows found and removed: 41914
- Clean data shape: (270295, 17)
- Number of duplicated revision_ids found: 9
- Unique revision_ids: 270289 | Data Shape: 270289 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (246893, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_simplewiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.9065559506416321
- confusion_matrix_simplewiki.png saved!
- False Positive Rate is: 0.1500198513981056
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted        179832     31740
reverted             13622     21699
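In simplewiki the log reports 9 duplicated revision_ids, yet the unique count is only 6 below the clean shape (270295 to 270289). The same pattern appears for mrwiki (2 vs 1), bnwiki (10 vs 7), and trwiki and azwiki (4 vs 2). That fits counting every row in a duplicated group (as pandas `duplicated(keep=False)` does) and then keeping one row per id. This is an inference from the numbers only; the script is not in the paste. A stdlib sketch of the two counts, with a hypothetical helper name:

```python
from collections import Counter
from typing import Iterable, Tuple


def duplicate_id_stats(revision_ids: Iterable[int]) -> Tuple[int, int]:
    """Return (rows sharing a repeated id, rows dropped by keeping one per id)."""
    ids = list(revision_ids)
    counts = Counter(ids)
    # Every occurrence of a repeated id, like pandas duplicated(keep=False).sum().
    rows_in_dup_groups = sum(c for c in counts.values() if c > 1)
    # Rows that disappear when only one row per id survives.
    rows_dropped = len(ids) - len(counts)
    return rows_in_dup_groups, rows_dropped


# Three ids appearing three times each reproduces simplewiki's 9 vs 6.
print(duplicate_id_stats([10, 10, 10, 11, 11, 11, 12, 12, 12, 13]))  # (9, 6)
# One id appearing twice reproduces mrwiki's 2 vs 1.
print(duplicate_id_stats([5, 5, 6]))  # (2, 1)
```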
============ - bewiki - ============
- Raw data shape: (80609, 17)
- Duplicate rows found and removed: 6221
- Clean data shape: (74388, 17)
- Unique revision_ids: 74388 | Data Shape: 74388 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (73969, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_bewiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.572495698928833
- confusion_matrix_bewiki.png saved!
- False Positive Rate is: 0.15014515086058638
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         61770     10913
reverted               183      1103
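The logged FPR is FP / (FP + TN) taken from the matrix (rows are actual, columns are predicted, "reverted" is the positive class); for bewiki, 10913 / (61770 + 10913) gives the logged 0.15014515. The log does not print recall or precision, but the same matrix yields them. For bewiki, about 86% of reverted edits are flagged while only about 9% of flagged edits were actually reverted, because reverted edits are roughly 1.7% of the filtered data (1286 of 73969). The helper name below is hypothetical.

```python
from typing import Dict


def confusion_rates(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Rates from a 2x2 matrix laid out as in the log:
    rows = actual, columns = predicted, 'reverted' is the positive class."""
    return {
        "fpr": fp / (fp + tn),        # share of good edits flagged
        "recall": tp / (tp + fn),     # share of reverted edits flagged (TPR)
        "precision": tp / (tp + fp),  # share of flagged edits actually reverted
    }


# bewiki matrix from the log above.
m = confusion_rates(tn=61770, fp=10913, fn=183, tp=1103)
print({k: round(v, 4) for k, v in m.items()})
```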
============ - kkwiki - ============
- Raw data shape: (82268, 17)
- Duplicate rows found and removed: 16708
- Clean data shape: (65560, 17)
- Unique revision_ids: 65560 | Data Shape: 65560 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (64276, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_kkwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.6048707962036133
- confusion_matrix_kkwiki.png saved!
- False Positive Rate is: 0.14999318290272
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         49875      8801
reverted              1475      4125
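Each block's counts can be cross-checked against each other: the clean shape is the raw shape minus removed duplicates, and the four confusion-matrix cells should sum to the row count left after reverts are removed. For kkwiki, 82268 - 16708 = 65560, then 1284 edits are dropped as reverts, and the matrix sums to 64276. Duplicates are a sizeable share of the raw data in some wikis (about 20% for kkwiki, about 15.5% for trwiki). A sketch with a hypothetical helper name; for wikis that also log duplicated revision_ids, `reverts_removed` would include those rows too.

```python
from typing import Dict, List


def reconcile(raw_rows: int, dup_removed: int, after_filter: int,
              matrix: List[List[int]]) -> Dict[str, object]:
    """Check that one wiki's logged counts are mutually consistent."""
    clean = raw_rows - dup_removed
    matrix_total = sum(sum(row) for row in matrix)
    return {
        "clean": clean,                           # should equal 'Clean data shape'
        "reverts_removed": clean - after_filter,  # rows dropped after cleaning
        "matrix_total": matrix_total,
        "matrix_matches": matrix_total == after_filter,
        "dup_share": dup_removed / raw_rows,      # fraction of raw rows duplicated
    }


# kkwiki values from the log above.
r = reconcile(82268, 16708, 64276, [[49875, 8801], [1475, 4125]])
print(r["clean"], r["reverts_removed"], r["matrix_matches"], round(r["dup_share"], 3))
```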
============ - nnwiki - ============
- Raw data shape: (25248, 17)
- Duplicate rows found and removed: 4213
- Clean data shape: (21035, 17)
- Unique revision_ids: 21035 | Data Shape: 21035 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (20392, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_nnwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.4162459373474121
- confusion_matrix_nnwiki.png saved!
- False Positive Rate is: 0.15001312680493567
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         16188      2857
reverted                48      1299
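Each block logs the same preprocessing sequence: drop exact duplicate rows, drop edits that are themselves reverts, then confirm that revert_risk_score, user_edit_count and time_to_revert have no missing values. The "(rows, 17)" shapes suggest a pandas DataFrame, but the script is not shown, so this is a stdlib sketch over a list of dicts. The `is_revert` column is hypothetical: the paste does not say how reverts were identified.

```python
from typing import Dict, List

Row = Dict[str, object]
# Columns the log checks for missing values.
CHECKED = ("revert_risk_score", "user_edit_count", "time_to_revert")


def drop_exact_duplicates(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent, hashable row key
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def drop_reverts(rows: List[Row]) -> List[Row]:
    """Remove edits that are themselves reverts (hypothetical 'is_revert' flag)."""
    return [r for r in rows if not r["is_revert"]]


def na_report(rows: List[Row]) -> Dict[str, bool]:
    """True for a column if any row lacks a value (None or missing key)."""
    return {col: any(r.get(col) is None for r in rows) for col in CHECKED}


base = {"revision_id": 1, "is_revert": False, "revert_risk_score": 0.2,
        "user_edit_count": 5, "time_to_revert": 0}
rows = [base, dict(base),                                   # exact duplicate
        {**base, "revision_id": 2, "is_revert": True},      # a revert
        {**base, "revision_id": 3, "revert_risk_score": 0.7}]
clean = drop_reverts(drop_exact_duplicates(rows))
print([r["revision_id"] for r in clean], na_report(clean))
```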
============ - mkwiki - ============
- Raw data shape: (54028, 17)
- Duplicate rows found and removed: 8215
- Clean data shape: (45813, 17)
- Unique revision_ids: 45813 | Data Shape: 45813 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (44585, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_mkwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.505830705165863
- confusion_matrix_mkwiki.png saved!
- False Positive Rate is: 0.15001292809627906
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         36161      6382
reverted               507      1535
============ - lawiki - ============
- Raw data shape: (27151, 17)
- Duplicate rows found and removed: 3948
- Clean data shape: (23203, 17)
- Unique revision_ids: 23203 | Data Shape: 23203 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (22893, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_lawiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.6340628266334534
- confusion_matrix_lawiki.png saved!
- False Positive Rate is: 0.14979973297730306
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         19104      3366
reverted                80       343
============ - afwiki - ============
- Raw data shape: (21996, 17)
- Duplicate rows found and removed: 3614
- Clean data shape: (18382, 17)
- Unique revision_ids: 18382 | Data Shape: 18382 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (17768, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_afwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.7369382977485657
- confusion_matrix_afwiki.png saved!
- False Positive Rate is: 0.1498459410174181
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         13520      2383
reverted               214      1651
============ - tewiki - ============
- Raw data shape: (97488, 17)
- Duplicate rows found and removed: 5150
- Clean data shape: (92338, 17)
- Unique revision_ids: 92338 | Data Shape: 92338 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (91883, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_tewiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.36725547909736633
- confusion_matrix_tewiki.png saved!
- False Positive Rate is: 0.1500330323717243
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         77194     13626
reverted               242       821
============ - mrwiki - ============
- Raw data shape: (42535, 17)
- Duplicate rows found and removed: 3930
- Clean data shape: (38605, 17)
- Number of duplicated revision_ids found: 2
- Unique revision_ids: 38604 | Data Shape: 38604 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (37677, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_mrwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.8673532605171204
- confusion_matrix_mrwiki.png saved!
- False Positive Rate is: 0.15002842524161455
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         29902      5278
reverted               575      1922
============ - swwiki - ============
- Raw data shape: (10831, 17)
- Duplicate rows found and removed: 682
- Clean data shape: (10149, 17)
- Unique revision_ids: 10149 | Data Shape: 10149 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (9971, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_swwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.7482092380523682
- confusion_matrix_swwiki.png saved!
- False Positive Rate is: 0.1493978517955951
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted          7840      1377
reverted               322       432
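The achieved FPR never lands exactly on 0.15 because it moves in steps of 1 / (number of not-reverted edits). swwiki has 9217 not-reverted edits, so one extra false positive shifts the FPR by about 0.00011, yet the logged 0.14940 is about 5.5 false positives short of 15%. That gap is larger than one step, so the curve apparently jumped past 0.15 at that point. Two plausible causes, neither confirmed by the paste: several edits sharing the same score, or, if scikit-learn's `roc_curve` was used, its default `drop_intermediate=True` removing collinear points. The helper below is hypothetical arithmetic on the logged matrices.

```python
def fpr_granularity(tn: int, fp: int, target: float = 0.15) -> dict:
    """How finely the FPR can move for a wiki with tn + fp not-reverted edits."""
    neg = tn + fp
    return {
        "negatives": neg,
        "step": 1 / neg,                          # FPR change per extra false positive
        "achieved_fpr": fp / neg,
        "fp_short_of_target": target * neg - fp,  # negative means above target
    }


sw = fpr_granularity(7840, 1377)      # swwiki
tr = fpr_granularity(438769, 77423)   # trwiki, a much finer grid
print(sw["negatives"], round(sw["fp_short_of_target"], 2), tr["negatives"])
```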
============ - mlwiki - ============
- Raw data shape: (32931, 17)
- Duplicate rows found and removed: 5245
- Clean data shape: (27686, 17)
- Unique revision_ids: 27686 | Data Shape: 27686 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (27032, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_mlwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.9362650513648987
- confusion_matrix_mlwiki.png saved!
- False Positive Rate is: 0.15019467495182287
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         21608      3819
reverted               810       795
============ - iswiki - ============
- Raw data shape: (17947, 17)
- Duplicate rows found and removed: 4129
- Clean data shape: (13818, 17)
- Unique revision_ids: 13818 | Data Shape: 13818 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (13452, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_iswiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.8892603516578674
- confusion_matrix_iswiki.png saved!
- False Positive Rate is: 0.15050732807215333
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         10549      1869
reverted               430       604
============ - pawiki - ============
- Raw data shape: (20662, 17)
- Duplicate rows found and removed: 2817
- Clean data shape: (17845, 17)
- Unique revision_ids: 17845 | Data Shape: 17845 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (17030, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_pawiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.5458896160125732
- confusion_matrix_pawiki.png saved!
- False Positive Rate is: 0.14929753356228537
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         13624      2391
reverted                88       927
============ - hawiki - ============
- Raw data shape: (142582, 17)
- Duplicate rows found and removed: 11926
- Clean data shape: (130656, 17)
- Unique revision_ids: 130656 | Data Shape: 130656 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (130286, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_hawiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.4823181927204132
- confusion_matrix_hawiki.png saved!
- False Positive Rate is: 0.15009778484218073
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted        109079     19264
reverted              1329       614
============ - tlwiki - ============
- Raw data shape: (29823, 17)
- Duplicate rows found and removed: 2465
- Clean data shape: (27358, 17)
- Unique revision_ids: 27358 | Data Shape: 27358 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (26356, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_tlwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.607416570186615
- confusion_matrix_tlwiki.png saved!
- False Positive Rate is: 0.15018641595072135
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted         20970      3706
reverted               176      1504
============ - bnwiki - ============
- Raw data shape: (330764, 17)
- Duplicate rows found and removed: 29591
- Clean data shape: (301173, 17)
- Number of duplicated revision_ids found: 10
- Unique revision_ids: 301166 | Data Shape: 301166 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (292405, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_bnwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.6465859413146973
- confusion_matrix_bnwiki.png saved!
- False Positive Rate is: 0.15002477700693756
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted        233274     41174
reverted              4019     13938
============ - trwiki - ============
- Raw data shape: (749190, 17)
- Duplicate rows found and removed: 116314
- Clean data shape: (632876, 17)
- Number of duplicated revision_ids found: 4
- Unique revision_ids: 632874 | Data Shape: 632874 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (581675, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_trwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.6082413196563721
- confusion_matrix_trwiki.png saved!
- False Positive Rate is: 0.14998876387080776
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted        438769     77423
reverted             10108     55375
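Holding the FPR at 15% does not give comparable precision across wikis, because precision also depends on recall and on how common reverted edits are. By Bayes' rule, precision = TPR * p / (TPR * p + FPR * (1 - p)), where p is the share of reverted edits. trwiki (p about 0.11, recall about 0.85) ends up near 42% precision, while hawiki (p about 0.015, recall about 0.32) is near 3%. The sketch below recomputes precision from the rates and agrees with TP / (TP + FP) read straight off the matrix; helper names are hypothetical.

```python
def precision_from_rates(tpr: float, fpr: float, prevalence: float) -> float:
    """Bayes' rule: share of flagged edits that were actually reverted."""
    flagged_true = tpr * prevalence
    flagged_false = fpr * (1 - prevalence)
    return flagged_true / (flagged_true + flagged_false)


def summarize(tn: int, fp: int, fn: int, tp: int):
    n = tn + fp + fn + tp
    prevalence = (fn + tp) / n      # share of reverted edits
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return prevalence, tpr, fpr, precision_from_rates(tpr, fpr, prevalence)


for wiki, cells in {"trwiki": (438769, 77423, 10108, 55375),
                    "hawiki": (109079, 19264, 1329, 614)}.items():
    prev, tpr, fpr, prec = summarize(*cells)
    print(f"{wiki}: prevalence={prev:.3f} tpr={tpr:.3f} fpr={fpr:.3f} precision={prec:.3f}")
```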
============ - azwiki - ============
- Raw data shape: (224309, 17)
- Duplicate rows found and removed: 23643
- Clean data shape: (200666, 17)
- Number of duplicated revision_ids found: 4
- Unique revision_ids: 200664 | Data Shape: 200664 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (194127, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_azwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.5366107821464539
- confusion_matrix_azwiki.png saved!
- False Positive Rate is: 0.14990706525346845
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted        153673     27099
reverted              2012     11343
Optimal Thresholds calculated at 16-05-2025T19:24:01
{'cywiki': 0.11076631, 'simplewiki': 0.90655595, 'bewiki': 0.5724957, 'kkwiki': 0.6048708, 'nnwiki': 0.41624594, 'mkwiki': 0.5058307, 'lawiki': 0.6340628, 'afwiki': 0.7369383, 'tewiki': 0.36725548, 'mrwiki': 0.86735326, 'swwiki': 0.74820924, 'mlwiki': 0.93626505, 'iswiki': 0.88926035, 'pawiki': 0.5458896, 'hawiki': 0.4823182, 'tlwiki': 0.60741657, 'bnwiki': 0.64658594, 'trwiki': 0.6082413, 'azwiki': 0.5366108}
Time taken: 4278.109 secs
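The final dictionary holds each wiki's threshold, shown with fewer digits than the per-wiki lines (0.11076631 for the 0.11076630651950836 logged for cywiki). Below is a minimal sketch of applying it. It assumes an edit is flagged when its score is at or above the wiki's threshold, which is the usual convention but is not stated in the paste; `is_flagged` is a hypothetical name. Because thresholds range from 0.11 (cywiki) to 0.94 (mlwiki), the same score can be flagged on one wiki and not on another.

```python
# Per-wiki thresholds copied from the log above.
THRESHOLDS = {
    'cywiki': 0.11076631, 'simplewiki': 0.90655595, 'bewiki': 0.5724957,
    'kkwiki': 0.6048708, 'nnwiki': 0.41624594, 'mkwiki': 0.5058307,
    'lawiki': 0.6340628, 'afwiki': 0.7369383, 'tewiki': 0.36725548,
    'mrwiki': 0.86735326, 'swwiki': 0.74820924, 'mlwiki': 0.93626505,
    'iswiki': 0.88926035, 'pawiki': 0.5458896, 'hawiki': 0.4823182,
    'tlwiki': 0.60741657, 'bnwiki': 0.64658594, 'trwiki': 0.6082413,
    'azwiki': 0.5366108,
}


def is_flagged(wiki: str, score: float) -> bool:
    """Assumed rule: flag an edit when score >= the wiki's calibrated threshold."""
    if wiki not in THRESHOLDS:
        raise KeyError(f"no calibrated threshold for {wiki!r}")
    return score >= THRESHOLDS[wiki]


print(is_flagged("cywiki", 0.5), is_flagged("mlwiki", 0.5))  # True False
print(min(THRESHOLDS, key=THRESHOLDS.get), max(THRESHOLDS, key=THRESHOLDS.get))  # cywiki mlwiki
```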