P80968
Threshold Analysis for enwiki
Authored by gkyziridis on Aug 7 2025, 12:05 PM.
Referenced Files: F65721769: Threshold Analysis for enwiki (Aug 7 2025, 12:05 PM; 2025-08-07 12:05:58 UTC+0)
============
- enwiki -
============
- Date Window: 2024-12-01 00:00:00 - 2025-01-05 00:00:00
- Raw data shape: (3866111, 17)
- Duplicate rows found and removed: 180911
- Clean data shape: (3685200, 17)
- Unique revision_ids: 3685200 | Data Shape: 3685200 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (3522442, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_enwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.8609287142753601
- confusion_matrix_enwiki.png saved!
- False Positive Rate is: 0.14999987694942868
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted       2763092    487604
reverted             84670    187076
Optimal Thresholds calculated {'enwiki': 0.8609287}
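The "Optimal threshold for 15.0% FPR" line above comes from a script that is not part of this paste. As a rough illustration only, the sketch below shows one way such a threshold could be picked: among the scores of not-reverted edits, choose the cut-off whose false positive rate lands closest to the target. The function name `threshold_for_fpr`, the 0/1 label encoding, and the "closest to target" rule are assumptions, not the author's code.

```python
from bisect import bisect_left


def threshold_for_fpr(scores, labels, target_fpr=0.15):
    """Return (threshold, achieved_fpr) for the rule `score >= threshold`.

    Hypothetical helper: labels are assumed to be 1 for reverted edits
    and 0 for not-reverted edits. Only the not-reverted scores matter
    for the false positive rate.
    """
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not negatives:
        raise ValueError("need at least one not-reverted example")
    n = len(negatives)

    best_threshold, best_fpr = None, None
    # Every distinct not-reverted score is a candidate cut-off.
    for candidate in sorted(set(negatives)):
        # Not-reverted edits scoring >= candidate would be flagged.
        flagged = n - bisect_left(negatives, candidate)
        fpr = flagged / n
        if best_fpr is None or abs(fpr - target_fpr) < abs(best_fpr - target_fpr):
            best_threshold, best_fpr = candidate, fpr
    return best_threshold, best_fpr
```

Called with the revert risk scores and a reverted / not-reverted label per edit, this returns the cut-off together with the false positive rate it actually achieves, which is why the logged rate can sit slightly above or below 0.15.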
============
- enwiki -
============
- Date Window: 2025-01-01 00:00:00 - 2025-02-05 00:00:00
- Raw data shape: (3954832, 17)
- Duplicate rows found and removed: 216729
- Clean data shape: (3738103, 17)
- Unique revision_ids: 3738103 | Data Shape: 3738103 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (3559861, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_enwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.8726013898849487
- confusion_matrix_enwiki.png saved!
- False Positive Rate is: 0.15000013720110358
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted       2787878    491979
reverted             85382    194622
Optimal Thresholds calculated {'enwiki': 0.8726014}
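The logged false positive rate can be re-derived from the confusion matrix, which also makes the cell layout explicit. The sketch below assumes rows are actual classes and columns are predicted classes, both in the order "not reverted", "reverted"; that reading reproduces the logged rate for the 2025-01-01 to 2025-02-05 window. The helper name `rates` is illustrative.

```python
def rates(tn, fp, fn, tp):
    """Derive standard rates from the four confusion-matrix cells."""
    return {
        # Share of not-reverted edits that were flagged as reverted.
        "fpr": fp / (fp + tn),
        # Share of reverted edits that were flagged (true positive rate).
        "recall": tp / (tp + fn),
        # Share of flagged edits that were actually reverted.
        "precision": tp / (tp + fp),
    }


# Counts copied from the 2025-01-01 to 2025-02-05 window.
window = rates(tn=2787878, fp=491979, fn=85382, tp=194622)
print(window)
```

With these counts the false positive rate matches the logged 0.15000013720110358, recall comes out at roughly 0.70, and precision at roughly 0.28.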
============
- enwiki -
============
- Date Window: 2025-02-01 00:00:00 - 2025-03-05 00:00:00
- Raw data shape: (3465504, 17)
- Duplicate rows found and removed: 180513
- Clean data shape: (3284991, 17)
- Unique revision_ids: 3284991 | Data Shape: 3284991 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (3113856, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_enwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.8680040240287781
- confusion_matrix_enwiki.png saved!
- False Positive Rate is: 0.15000008746932014
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted       2429423    428722
reverted             78568    177143
Optimal Thresholds calculated {'enwiki': 0.868004}
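Each window goes through the same cleaning steps before the threshold is computed: drop duplicate rows, confirm that revision_ids are unique, remove edits that are themselves reverts, and check three columns for missing values. The following is a minimal sketch of those steps on plain Python dictionaries. The original appears to work on a DataFrame (`df`) and its code is not shown, so the function name `clean_rows` and the `is_revert` field are hypothetical; only `revision_id`, `revert_risk_score`, `user_edit_count` and `time_to_revert` are names taken from the log.

```python
def clean_rows(rows):
    """Apply the logged cleaning steps to a list of dict records."""
    # 1. Drop exact duplicate rows, keeping the first occurrence.
    seen = set()
    deduped = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(row)

    # 2. After deduplication each row should have its own revision_id.
    unique_ids = {row["revision_id"] for row in deduped}
    ids_unique = len(unique_ids) == len(deduped)

    # 3. Remove edits that are reverts ("is_revert" is an assumed field).
    kept = [row for row in deduped if not row["is_revert"]]

    # 4. Report whether any required column still has a missing value.
    required = ("revert_risk_score", "user_edit_count", "time_to_revert")
    has_na = {col: any(row[col] is None for row in kept) for col in required}

    report = {
        "duplicates_removed": len(rows) - len(deduped),
        "ids_unique": ids_unique,
        "has_na": has_na,
    }
    return kept, report
```

The returned report mirrors the log lines above: a duplicate count, the "Same?" uniqueness check, and one NA flag per column.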
============
- enwiki -
============
- Date Window: 2025-03-01 00:00:00 - 2025-04-05 00:00:00
- Raw data shape: (3821024, 17)
- Duplicate rows found and removed: 203813
- Clean data shape: (3617211, 17)
- Unique revision_ids: 3617211 | Data Shape: 3617211 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (3439901, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_enwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.7739927768707275
- confusion_matrix_enwiki.png saved!
- False Positive Rate is: 0.1499996065068035
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted       2700175    476500
reverted             75224    188002
Optimal Thresholds calculated {'enwiki': 0.7739928}
============
- enwiki -
============
- Date Window: 2025-04-01 00:00:00 - 2025-05-05 00:00:00
- Raw data shape: (3616972, 17)
- Duplicate rows found and removed: 210338
- Clean data shape: (3406634, 17)
- Unique revision_ids: 3406634 | Data Shape: 3406634 | Same? : -> True
- Removing edits that are reverts from df | New Shape: (3247157, 17)
- Is any revert_risk_score NA? : False
- Is any user_edit_count NA? : False
- Is any time_to_revert NA? : False
- ROC_enwiki.png saved!
- Optimal threshold for 15.0% FPR is: 0.779203474521637
- confusion_matrix_enwiki.png saved!
- False Positive Rate is: 0.15000018345178204
- CONFUSION MATRIX -
Predicted     not reverted  reverted
Actual
not reverted       2548353    449710
reverted             73117    175977
Optimal Thresholds calculated {'enwiki': 0.7792035}
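The row counts in the five windows can be cross-checked against each other. The sketch below copies the numbers from the log and tests two identities: raw rows minus removed duplicates should equal the clean row count, and the four confusion-matrix cells should sum to the row count left after reverts are removed. The tuple layout and variable names are illustrative.

```python
# (window start, raw rows, duplicates removed, clean rows,
#  rows after removing reverts, confusion-matrix cells TN, FP, FN, TP)
windows = [
    ("2024-12-01", 3866111, 180911, 3685200, 3522442, (2763092, 487604, 84670, 187076)),
    ("2025-01-01", 3954832, 216729, 3738103, 3559861, (2787878, 491979, 85382, 194622)),
    ("2025-02-01", 3465504, 180513, 3284991, 3113856, (2429423, 428722, 78568, 177143)),
    ("2025-03-01", 3821024, 203813, 3617211, 3439901, (2700175, 476500, 75224, 188002)),
    ("2025-04-01", 3616972, 210338, 3406634, 3247157, (2548353, 449710, 73117, 175977)),
]

checks = {}
for start, raw, dups, clean, after_reverts, cells in windows:
    checks[start] = {
        # Deduplication accounts for the full raw-to-clean difference.
        "dedup_consistent": raw - dups == clean,
        # Every remaining edit appears in exactly one matrix cell.
        "matrix_consistent": sum(cells) == after_reverts,
    }

for start, result in checks.items():
    print(start, result)
```

Both identities hold for all five windows, so the confusion matrices cover every edit that survived cleaning.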
Average Threshold: {'enwiki': 0.83092}
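For reference, the sketch below takes the plain arithmetic mean of the five per-window thresholds as printed in the "Optimal Thresholds calculated" lines. How the script computes its average is not shown in this paste, so treating it as an unweighted mean of these five values is an assumption.

```python
from statistics import fmean

# Rounded per-window thresholds, copied from the log in window order.
thresholds = [0.8609287, 0.8726014, 0.868004, 0.7739928, 0.7792035]

mean_threshold = fmean(thresholds)
spread = max(thresholds) - min(thresholds)

print(f"mean={mean_threshold:.5f} spread={spread:.5f}")
```

The unweighted mean comes out near 0.83095, slightly above the logged 0.83092, so the script may average differently (for example with weights or over other inputs). The thresholds also span almost 0.10 across windows, with the last two windows sitting well below the first three.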
Event Timeline
gkyziridis created this paste. Aug 7 2025, 12:05 PM (2025-08-07 12:05:58 UTC+0)
gkyziridis mentioned this in T400590: Investigate revertrisk threshold generation for enwiki. Aug 7 2025, 1:46 PM (2025-08-07 13:46:57 UTC+0)