Paste P80968

Threshold Analysis for enwiki

Authored by gkyziridis on Aug 7 2025, 12:05 PM.
============ - enwiki - ============
- Date Window: 2024-12-01 00:00:00 - 2025-01-05 00:00:00
- Raw data shape: (3866111, 17)
- Duplicate rows found and removed: 180911
- Clean data shape: (3685200, 17)
- Unique revision_ids: 3685200 | Data Shape: 3685200 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (3522442, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_enwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.8609287142753601
- confusion_matrix_enwiki.png saved!
- False Positive Rate is: 0.14999987694942868
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted       2763092    487604
reverted             84670    187076
Optimal Thresholds calculated
{'enwiki': 0.8609287}
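
The "Optimal threshold for 15.0% FPR" line can be read as: pick the score cutoff at which about 15% of the edits that were not reverted would still be flagged. The script that produced this log is not included in the paste, so the following is only a minimal sketch of one way to do that with NumPy. The function name `threshold_at_fpr` and its arguments are hypothetical, and the rule "predict reverted when score >= cutoff" is an assumption.

```python
import numpy as np


def threshold_at_fpr(scores, labels, target_fpr=0.15):
    """Return the score cutoff whose false positive rate is closest to target_fpr.

    Assumption: an edit is predicted 'reverted' when its score >= cutoff,
    and labels are True for edits that were actually reverted.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    # Sorted scores of the negative class (edits that were not reverted).
    neg = np.sort(scores[~labels])
    if neg.size == 0:
        raise ValueError("need at least one not-reverted edit to compute FPR")

    # Every distinct score is a candidate cutoff.
    candidates = np.unique(scores)

    # searchsorted(..., side="left") counts negatives strictly below each
    # candidate, so the remainder is the share of negatives flagged at it.
    fpr = 1.0 - np.searchsorted(neg, candidates, side="left") / neg.size

    return float(candidates[np.argmin(np.abs(fpr - target_fpr))])
```

Called with the `revert_risk_score` column as `scores` and a boolean "was reverted" column as `labels`, this would return one cutoff per date window. A library ROC routine could be swapped in for the manual FPR computation without changing the idea.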
============ - enwiki - ============
- Date Window: 2025-01-01 00:00:00 - 2025-02-05 00:00:00
- Raw data shape: (3954832, 17)
- Duplicate rows found and removed: 216729
- Clean data shape: (3738103, 17)
- Unique revision_ids: 3738103 | Data Shape: 3738103 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (3559861, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_enwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.8726013898849487
- confusion_matrix_enwiki.png saved!
- False Positive Rate is: 0.15000013720110358
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted       2787878    491979
reverted             85382    194622
Optimal Thresholds calculated
{'enwiki': 0.8726014}
============ - enwiki - ============
- Date Window: 2025-02-01 00:00:00 - 2025-03-05 00:00:00
- Raw data shape: (3465504, 17)
- Duplicate rows found and removed: 180513
- Clean data shape: (3284991, 17)
- Unique revision_ids: 3284991 | Data Shape: 3284991 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (3113856, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_enwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.8680040240287781
- confusion_matrix_enwiki.png saved!
- False Positive Rate is: 0.15000008746932014
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted       2429423    428722
reverted             78568    177143
Optimal Thresholds calculated
{'enwiki': 0.868004}
============ - enwiki - ============
- Date Window: 2025-03-01 00:00:00 - 2025-04-05 00:00:00
- Raw data shape: (3821024, 17)
- Duplicate rows found and removed: 203813
- Clean data shape: (3617211, 17)
- Unique revision_ids: 3617211 | Data Shape: 3617211 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (3439901, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_enwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.7739927768707275
- confusion_matrix_enwiki.png saved!
- False Positive Rate is: 0.1499996065068035
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted       2700175    476500
reverted             75224    188002
Optimal Thresholds calculated
{'enwiki': 0.7739928}
============ - enwiki - ============
- Date Window: 2025-04-01 00:00:00 - 2025-05-05 00:00:00
- Raw data shape: (3616972, 17)
- Duplicate rows found and removed: 210338
- Clean data shape: (3406634, 17)
- Unique revision_ids: 3406634 | Data Shape: 3406634 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (3247157, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_enwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.779203474521637
- confusion_matrix_enwiki.png saved!
- False Positive Rate is: 0.15000018345178204
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted       2548353    449710
reverted             73117    175977
Optimal Thresholds calculated
{'enwiki': 0.7792035}
Average Threshold: {'enwiki': 0.83092}
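
The confusion matrices above are enough to re-derive the logged false positive rate for each window, along with recall and precision at the chosen threshold. The following is a small sketch for that check. The counts are copied from the log, the helper name `rates` is hypothetical, and the cell layout (rows are actual, columns are predicted, "reverted" is the positive class) is assumed from the matrix headers.

```python
# (window start, TN, FP, FN, TP) copied from the confusion matrices in the log.
# Assumed layout: rows = actual, columns = predicted, positive class = reverted.
windows = [
    ("2024-12-01", 2763092, 487604, 84670, 187076),
    ("2025-01-01", 2787878, 491979, 85382, 194622),
    ("2025-02-01", 2429423, 428722, 78568, 177143),
    ("2025-03-01", 2700175, 476500, 75224, 188002),
    ("2025-04-01", 2548353, 449710, 73117, 175977),
]


def rates(tn, fp, fn, tp):
    """Derive standard rates from the four confusion-matrix cells."""
    return {
        "fpr": fp / (fp + tn),        # share of non-reverted edits flagged
        "recall": tp / (tp + fn),     # share of reverted edits caught
        "precision": tp / (tp + fp),  # share of flagged edits actually reverted
    }


for start, tn, fp, fn, tp in windows:
    r = rates(tn, fp, fn, tp)
    print(
        f"{start}  FPR={r['fpr']:.4f}  "
        f"recall={r['recall']:.4f}  precision={r['precision']:.4f}"
    )
```

Each row's four cells should also sum to the "New Shape" row count logged for that window, which makes a quick consistency check on the transcription.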