
Run analysis of revert time and number changes over time for wikidata
Closed, Resolved · Public

Description

We want to measure the impact of the ORES rollout (and of anti-vandalism bots) on Wikidata over time. To do so, we can run the mwreverts tool on a dump of Wikidata. It'll be interesting

Event Timeline


Analysing the dump has finished, and all of the reverted/reverting edits (metadata only) come to a 2.7GB file. I need to download it from stat1005 and run some scripts on top of it to make some histograms and data.

This is the result of the analysis (note that I omitted any revert that took more than 48 hours from the averages):

| Month | Number of reverts | Average revert time (seconds) | Average revert time (hours) |
| --- | --- | --- | --- |
| 2013-02 | 6821 | 7916.928602844161 | 2.199146834123378 |
| 2013-03 | 17917 | 10196.750739521081 | 2.8324307609780783 |
| 2013-04 | 34763 | 28276.717889710388 | 7.854643858252885 |
| 2013-05 | 32516 | 39669.284905892346 | 11.019245807192318 |
| 2013-06 | 24598 | 28412.00073176681 | 7.892222425490781 |
| 2013-07 | 7555 | 25306.385837193895 | 7.0295516214427485 |
| 2013-08 | 8052 | 22222.630029806296 | 6.172952786057304 |
| 2013-09 | 7114 | 17918.620185549586 | 4.977394495985996 |
| 2013-10 | 9260 | 41637.55712742978 | 11.565988090952718 |
| 2013-11 | 10504 | 24780.01675552181 | 6.883337987644947 |
| 2013-12 | 15180 | 58733.141172595526 | 16.31476143683209 |
| 2014-01 | 14679 | 49131.30029293547 | 13.647583414704297 |
| 2014-02 | 17485 | 21576.300085787825 | 5.993416690496618 |
| 2014-03 | 12635 | 20303.894182825494 | 5.639970606340415 |
| 2014-04 | 12522 | 25891.221210669202 | 7.192005891852556 |
| 2014-05 | 15133 | 37105.93543910653 | 10.307204288640703 |
| 2014-06 | 19371 | 42999.95833978641 | 11.944432872162892 |
| 2014-07 | 34054 | 61557.150525635836 | 17.09920847934329 |
| 2014-08 | 15098 | 41466.03225592792 | 11.518342293313312 |
| 2014-09 | 11549 | 27786.04164862755 | 7.718344902396542 |
| 2014-10 | 13090 | 36266.25974025998 | 10.073961038961105 |
| 2014-11 | 34496 | 70138.88665352523 | 19.483024070423674 |
| 2014-12 | 16044 | 27151.371291448515 | 7.542047580957921 |
| 2015-01 | 23252 | 36368.31291931882 | 10.102309144255228 |
| 2015-02 | 21412 | 25898.333971604807 | 7.193981658779113 |
| 2015-03 | 27341 | 47517.56753593505 | 13.199324315537513 |
| 2015-04 | 30730 | 63382.18747152633 | 17.606163186535092 |
| 2015-05 | 14039 | 43292.318398746276 | 12.025643999651743 |
| 2015-06 | 14073 | 29394.263554323876 | 8.16507320953441 |
| 2015-07 | 18497 | 42179.56392928587 | 11.716545535912742 |
| 2015-08 | 17770 | 42958.667979741156 | 11.932963327705878 |
| 2015-09 | 18581 | 27615.422636026044 | 7.670950732229457 |
| 2015-10 | 26102 | 35043.90464332238 | 9.734417956478438 |
| 2015-11 | 45213 | 54039.81697741788 | 15.011060271504967 |
| 2015-12 | 28413 | 42409.553795797714 | 11.78043160994381 |
| 2016-01 | 44766 | 60458.55738730213 | 16.794043718695036 |
| 2016-02 | 23914 | 26297.45002927153 | 7.304847230353203 |
| 2016-03 | 27773 | 24362.23450833534 | 6.767287363426483 |
| 2016-04 | 30416 | 36819.45900184132 | 10.227627500511478 |
| 2016-05 | 21610 | 25219.591022674755 | 7.005441950742988 |
| 2016-06 | 23020 | 27558.87528236315 | 7.655243133989765 |
| 2016-07 | 31846 | 29005.43437166382 | 8.05706510323995 |
| 2016-08 | 18751 | 23268.41123140103 | 6.463447564278064 |
| 2016-09 | 23275 | 24590.70552094509 | 6.8307515335958575 |
| 2016-10 | 31119 | 28939.623316944635 | 8.038784254706844 |
| 2016-11 | 25024 | 24583.590153452697 | 6.828775042625749 |
| 2016-12 | 26135 | 38474.23145207599 | 10.687286514465553 |
| 2017-01 | 60342 | 78197.64391302878 | 21.721567753619105 |
| 2017-02 | 35700 | 29089.152072829147 | 8.08032002023032 |
| 2017-03 | 48812 | 31783.109829550092 | 8.82864161931947 |
| 2017-04 | 27049 | 29351.943029317208 | 8.153317508143669 |
| 2017-05 | 34073 | 33599.270800927814 | 9.333130778035503 |
| 2017-06 | 61925 | 35400.11262010486 | 9.833364616695794 |
| 2017-07 | 44485 | 44070.653793413476 | 12.241848275948188 |
| 2017-08 | 38466 | 30320.607263557435 | 8.422390906543733 |
| 2017-09 | 31954 | 30141.586436752805 | 8.372662899098001 |
| 2017-10 | 54046 | 27575.77075084184 | 7.659936319678288 |
| 2017-11 | 87467 | 26761.449563835475 | 7.4337359899542985 |
| 2017-12 | 42113 | 32499.155343955306 | 9.027543151098696 |
| 2018-01 | 518 | 29127.7335907336 | 8.09103710853711 |
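A minimal sketch of how a table like this could be produced. It is not the script used here: it assumes the revert records have already been reduced to hypothetical `(reverted_at, seconds_to_revert)` pairs, and it applies the 48-hour cutoff per revert.

```python
from collections import defaultdict
from datetime import datetime

MAX_SECONDS = 48 * 3600  # reverts slower than 48 hours are omitted


def monthly_average(reverts):
    """Group hypothetical (reverted_at, seconds_to_revert) pairs by month.

    Returns rows of (month, number of reverts, average seconds, average hours).
    """
    buckets = defaultdict(list)
    for reverted_at, seconds in reverts:
        if seconds > MAX_SECONDS:
            continue  # apply the 48-hour cutoff
        buckets[reverted_at.strftime("%Y-%m")].append(seconds)

    rows = []
    for month in sorted(buckets):
        values = buckets[month]
        average = sum(values) / len(values)
        rows.append((month, len(values), average, average / 3600))
    return rows


# Made-up sample data, only to show the shape of the input.
sample = [
    (datetime(2013, 2, 3), 60.0),
    (datetime(2013, 2, 9), 7140.0),
    (datetime(2013, 2, 20), 3 * 86400.0),  # slower than 48 hours, dropped
    (datetime(2013, 3, 1), 1800.0),
]
for row in monthly_average(sample):
    print(row)
```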

I think we need a plot of this data.

I'd also suggest using the geometric mean for looking at time-to-revert. E.g. geometric_mean = function(x){ exp(mean(log(x))) }

In python, I'd do:

>>> from statistics import mean
>>> from math import log, exp
>>> 
>>> def geo_mean(x):
...     return exp(mean(log(x_val) for x_val in x))
... 
>>> 
>>> mean([1,2,3,4,5,6])
3.5
>>> geo_mean([1,2,3,4,5,6])
2.993795165523909
>>> mean([1, 100, 2, 3, 76, 88])
45
>>> geo_mean([1, 100, 2, 3, 76, 88])
12.605921135923992

This is the new data:

| Month | Number of reverts | Average (hours) | First quartile (hours) | Median (hours) | Last quartile (hours) | Geo mean (seconds) |
| --- | --- | --- | --- | --- | --- | --- |
| 2013-02 | 6821 | 2.1991468341233773 | 0.00527777777778 | 0.0216666666667 | 0.151944444444 | 132.86284580032776 |
| 2013-03 | 17917 | 2.8346632775824325 | 0.0102777777778 | 0.0561111111111 | 0.539166666667 | 321.6465459454957 |
| 2013-04 | 34762 | 7.8544935721637295 | 0.00666666666667 | 0.759722222222 | 8.64027777778 | 1085.6034573069921 |
| 2013-05 | 32516 | 11.01927656128265 | 0.0825 | 4.93055555556 | 19.1100694444 | 4761.213122588603 |
| 2013-06 | 24598 | 7.892222425490781 | 0.353680555556 | 2.09916666667 | 15.7945833333 | 4825.439276953202 |
| 2013-07 | 7555 | 7.029551621442753 | 0.0127777777778 | 0.188333333333 | 5.74958333333 | 929.3304425969944 |
| 2013-08 | 8052 | 6.172952786057294 | 0.0172222222222 | 0.24625 | 5.33756944444 | 1036.062441273601 |
| 2013-09 | 7114 | 4.9773944959860055 | 0.0183333333333 | 0.245138888889 | 3.665 | 925.7537805061206 |
| 2013-10 | 9260 | 11.560696514278858 | 0.0190972222222 | 0.673194444444 | 16.8545138889 | 1740.5125758122417 |
| 2013-11 | 10504 | 6.883337987644919 | 0.0211111111111 | 0.473055555556 | 8.48506944444 | 1353.9937700205217 |
| 2013-12 | 15180 | 16.31476143683209 | 0.0847222222222 | 6.01055555556 | 35.0089583333 | 6189.446694505918 |
| 2014-01 | 14679 | 13.647583414704302 | 0.0619444444444 | 2.14694444444 | 30.0533333333 | 4273.771618016276 |
| 2014-02 | 17485 | 5.993416690496616 | 0.0338888888889 | 0.540277777778 | 4.63694444444 | 1399.4303378428422 |
| 2014-03 | 12631 | 5.632182065289103 | 0.0236111111111 | 0.453888888889 | 5.76013888889 | 1323.240246068031 |
| 2014-04 | 12520 | 7.185814474618389 | 0.0263888888889 | 0.554861111111 | 9.00458333333 | 1640.7044790992202 |
| 2014-05 | 15133 | 10.307204288640719 | 0.0413888888889 | 2.02472222222 | 19.0961111111 | 3174.742202212586 |
| 2014-06 | 19371 | 11.944432872162855 | 0.0615277777778 | 3.22138888889 | 22.2073611111 | 4228.159227770243 |
| 2014-07 | 34054 | 17.099208479343265 | 0.140555555556 | 12.1006944444 | 33.4865972222 | 6291.251250359374 |
| 2014-08 | 15098 | 11.518342293313315 | 0.0505555555556 | 3.49458333333 | 20.9404861111 | 3833.1045376465695 |
| 2014-09 | 11549 | 7.718344902396552 | 0.0127777777778 | 0.796666666667 | 8.52111111111 | 1057.2491634998469 |
| 2014-10 | 13161 | 10.263731078354397 | 0.104722222222 | 4.39722222222 | 17.5827777778 | 4853.667500504064 |
| 2014-11 | 34496 | 19.482995081555348 | 4.50513888889 | 22.10625 | 28.3934027778 | 21719.050783607843 |
| 2014-12 | 16044 | 7.542047580957921 | 0.0544444444444 | 1.02208333333 | 9.06951388889 | 2290.8116706971905 |
| 2015-01 | 23252 | 10.102309144255214 | 0.0805555555556 | 1.43944444444 | 13.2475694444 | 3350.2806461362784 |
| 2015-02 | 21412 | 7.1939816587790855 | 0.0588888888889 | 0.651388888889 | 7.31041666667 | 2060.9280488912323 |
| 2015-03 | 27340 | 13.225668434528163 | 0.105 | 2.54138888889 | 31.1161805556 | 4996.953295070048 |
| 2015-04 | 30730 | 17.606195728025455 | 0.825902777778 | 14.1227777778 | 30.2775 | 13399.019492677255 |
| 2015-05 | 14039 | 12.025643999651765 | 0.0580555555556 | 2.26444444444 | 23.8316666667 | 3857.3118494680843 |
| 2015-06 | 14073 | 8.165073209534413 | 0.0330555555556 | 0.749444444444 | 10.8336111111 | 1905.6095435477175 |
| 2015-07 | 18497 | 11.716545535912731 | 0.0875 | 2.74361111111 | 22.2741666667 | 4284.532531253856 |
| 2015-08 | 17770 | 11.932963327705872 | 0.0572222222222 | 1.95888888889 | 23.3471527778 | 3721.9030917244063 |
| 2015-09 | 18581 | 7.670950732229458 | 0.0494444444444 | 1.11944444444 | 9.69555555556 | 2352.5438921601585 |
| 2015-10 | 26106 | 9.724980528103373 | 0.0691666666667 | 1.99444444444 | 16.3405555556 | 3431.9107897312483 |
| 2015-11 | 45213 | 15.011060271505 | 0.246944444444 | 3.8275 | 28.3911111111 | 5573.570488066568 |
| 2015-12 | 28413 | 11.780431609943806 | 0.0816666666667 | 2.56361111111 | 23.0080555556 | 4132.71109472196 |
| 2016-01 | 44766 | 16.79404371869524 | 0.432222222222 | 14.7036111111 | 31.6809722222 | 10813.405643138178 |
| 2016-02 | 23914 | 7.30484723035321 | 0.0453472222222 | 0.647222222222 | 7.54131944444 | 1943.457141571479 |
| 2016-03 | 27758 | 6.758088599082547 | 0.0697222222222 | 0.770277777778 | 9.60395833333 | 2410.452891434275 |
| 2016-04 | 30416 | 10.227660377944357 | 0.133541666667 | 3.32569444444 | 17.6611805556 | 4972.091989111171 |
| 2016-05 | 21610 | 7.005441950742969 | 0.0277777777778 | 0.527777777778 | 6.04305555556 | 1506.158973248446 |
| 2016-06 | 23020 | 7.655243133989767 | 0.0541666666667 | 1.21430555556 | 7.88798611111 | 2356.769406591105 |
| 2016-07 | 31846 | 8.057065103239898 | 0.0897222222222 | 1.29625 | 6.07784722222 | 2542.8589566948995 |
| 2016-08 | 18751 | 6.463447564278053 | 0.0211111111111 | 0.553611111111 | 6.46736111111 | 1354.002299741235 |
| 2016-09 | 23275 | 6.830751533595895 | 0.0372222222222 | 0.832222222222 | 10.2781944444 | 1839.9137458364835 |
| 2016-10 | 31125 | 8.03764581883088 | 0.0813888888889 | 1.65666666667 | 11.1947222222 | 2858.483665156888 |
| 2016-11 | 25024 | 6.828695119352089 | 0.0447222222222 | 0.850833333333 | 7.94972222222 | 2000.6514309612098 |
| 2016-12 | 26135 | 10.68728651446549 | 0.0466666666667 | 1.60472222222 | 18.06625 | 2982.7133700511777 |
| 2017-01 | 60342 | 21.721567753619187 | 0.605625 | 18.8195833333 | 43.1910416667 | 13661.402602844963 |
| 2017-02 | 35700 | 8.080320020230314 | 0.0161111111111 | 0.657777777778 | 8.55388888889 | 1372.9292490882046 |
| 2017-03 | 48801 | 8.826341109813324 | 0.0519444444444 | 1.34361111111 | 12.1497222222 | 2719.2814204540837 |
| 2017-04 | 27049 | 8.153317508143656 | 0.0361111111111 | 0.888888888889 | 9.54 | 2026.0390930816654 |
| 2017-05 | 34073 | 9.333130778035395 | 0.0372222222222 | 0.733611111111 | 17.1202777778 | 1996.5815620862377 |
| 2017-06 | 61925 | 9.833364616695825 | 0.0219444444444 | 3.52111111111 | 23.3766666667 | 2184.1250102359068 |
| 2017-07 | 44485 | 12.241848275948197 | 0.0116666666667 | 0.953055555556 | 14.2180555556 | 1386.5594154053178 |
| 2017-08 | 38466 | 8.422390906543729 | 0.0584027777778 | 1.80902777778 | 9.99465277778 | 2832.105630765698 |
| 2017-09 | 31954 | 8.372662899098009 | 0.0472222222222 | 1.83152777778 | 9.99805555556 | 2682.296850277629 |
| 2017-10 | 54068 | 7.668291523842404 | 0.0788888888889 | 2.86527777778 | 8.93118055556 | 3174.2841649084194 |
| 2017-11 | 87467 | 7.433713124187789 | 1.10083333333 | 2.98388888889 | 7.57902777778 | 6615.386286727896 |
| 2017-12 | 42113 | 9.027543151098763 | 0.0322222222222 | 1.49083333333 | 16.9333333333 | 2610.847848926223 |
| 2018-01 | 518 | 8.091037108537108 | 0.0311805555556 | 0.545138888889 | 12.1175 | 2030.5197775763806 |
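For reference, here is a sketch of how the columns of one row above could be computed from a month's revert times. The function name `summarize_month`, the "inclusive" quartile method, and the handling of zero-second reverts are all assumptions for illustration; the actual script may have differed.

```python
from math import exp, log
from statistics import fmean, quantiles


def summarize_month(seconds):
    """Summarize one month's revert times (given in seconds)."""
    hours = sorted(s / 3600 for s in seconds)
    # Assumption: quartiles interpolate linearly between data points.
    q1, median, q3 = quantiles(hours, n=4, method="inclusive")
    # Assumption: log(0) is undefined, so zero-second reverts are left
    # out of the geometric mean.
    positive = [s for s in seconds if s > 0]
    geo_mean_seconds = exp(fmean([log(s) for s in positive]))
    return {
        "number": len(hours),
        "average_hours": fmean(hours),
        "first_quartile_hours": q1,
        "median_hours": median,
        "last_quartile_hours": q3,
        "geo_mean_seconds": geo_mean_seconds,
    }


# Made-up sample: reverts after 1, 2, 3, 4 and 5 hours.
summary = summarize_month([3600, 7200, 10800, 14400, 18000])
print(summary)
```

The geometric mean is pulled far less by a few very slow reverts than the arithmetic mean is, which is why the two columns can tell different stories for the same month.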

More data:

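| Month | Number of users reverting | Average number of reverts per user | First quartile | Median | Last quartile |
| --- | --- | --- | --- | --- | --- |
| 2013-03 | 1198 | 10.473288814691152 | 1.0 | 1.0 | 4.0 |
| 2013-04 | 1133 | 19.50397175639894 | 1.0 | 2.0 | 4.0 |
| 2013-05 | 1153 | 15.333044232437121 | 1.0 | 2.0 | 4.0 |
| 2013-06 | 1134 | 12.961199294532628 | 1.0 | 1.0 | 3.0 |
| 2013-07 | 830 | 5.760240963855422 | 1.0 | 1.0 | 3.0 |
| 2013-08 | 828 | 5.929951690821256 | 1.0 | 1.0 | 3.0 |
| 2013-09 | 678 | 6.020648967551622 | 1.0 | 1.0 | 3.0 |
| 2013-10 | 831 | 6.439229843561973 | 1.0 | 1.0 | 3.0 |
| 2013-11 | 950 | 6.95578947368421 | 1.0 | 1.5 | 4.0 |
| 2013-12 | 1160 | 7.764655172413793 | 1.0 | 1.0 | 3.0 |
| 2014-01 | 1370 | 6.793430656934307 | 1.0 | 1.0 | 3.0 |
| 2014-02 | 1305 | 8.224521072796934 | 1.0 | 1.0 | 3.0 |
| 2014-03 | 1430 | 6.046153846153846 | 1.0 | 1.0 | 3.0 |
| 2014-04 | 1437 | 5.627000695894224 | 1.0 | 1.0 | 3.0 |
| 2014-05 | 1465 | 6.653242320819112 | 1.0 | 1.0 | 3.0 |
| 2014-06 | 1387 | 10.374909877433309 | 1.0 | 1.0 | 3.0 |
| 2014-07 | 1271 | 16.61369000786782 | 1.0 | 1.0 | 3.0 |
| 2014-08 | 1284 | 7.235981308411215 | 1.0 | 1.0 | 3.0 |
| 2014-09 | 961 | 8.49219562955255 | 1.0 | 1.0 | 3.0 |
| 2014-10 | 1028 | 8.092412451361868 | 1.0 | 1.0 | 3.0 |
| 2014-11 | 1109 | 25.963931469792605 | 1.0 | 1.0 | 3.0 |
| 2014-12 | 1307 | 8.651874521805661 | 1.0 | 1.0 | 3.0 |
| 2015-01 | 1576 | 8.96256345177665 | 1.0 | 1.0 | 3.0 |
| 2015-02 | 1543 | 8.955930006480882 | 1.0 | 1.0 | 3.0 |
| 2015-03 | 1692 | 9.546690307328605 | 1.0 | 1.0 | 3.0 |
| 2015-04 | 1602 | 12.559925093632959 | 1.0 | 1.0 | 3.0 |
| 2015-05 | 1308 | 6.4984709480122325 | 1.0 | 1.0 | 3.0 |
| 2015-06 | 1437 | 5.907446068197634 | 1.0 | 1.0 | 3.0 |
| 2015-07 | 1555 | 5.819935691318328 | 1.0 | 1.0 | 3.0 |
| 2015-08 | 1781 | 6.595171252105558 | 1.0 | 1.0 | 3.0 |
| 2015-09 | 1731 | 6.52686308492201 | 1.0 | 1.0 | 3.0 |
| 2015-10 | 1783 | 9.151430173864274 | 1.0 | 1.0 | 3.0 |
| 2015-11 | 1771 | 14.502540937323547 | 1.0 | 1.0 | 3.0 |
| 2015-12 | 1876 | 10.396588486140725 | 1.0 | 1.0 | 3.0 |
| 2016-01 | 2078 | 13.617901828681424 | 1.0 | 1.0 | 3.0 |
| 2016-02 | 2040 | 7.348039215686274 | 1.0 | 1.0 | 3.0 |
| 2016-03 | 2033 | 9.179045745204132 | 1.0 | 1.0 | 3.0 |
| 2016-04 | 1976 | 10.891194331983806 | 1.0 | 1.0 | 3.0 |
| 2016-05 | 2041 | 6.458108770210681 | 1.0 | 1.0 | 3.0 |
| 2016-06 | 1991 | 7.016574585635359 | 1.0 | 1.0 | 3.0 |
| 2016-07 | 2059 | 9.47887323943662 | 1.0 | 1.0 | 3.0 |
| 2016-08 | 1632 | 6.544117647058823 | 1.0 | 1.0 | 3.0 |
| 2016-09 | 1683 | 8.282234105763518 | 1.0 | 2.0 | 3.0 |
| 2016-10 | 1866 | 10.188638799571276 | 1.0 | 1.0 | 3.0 |
| 2016-11 | 1977 | 7.801213960546282 | 1.0 | 1.0 | 3.0 |
| 2016-12 | 1962 | 8.120285423037716 | 1.0 | 1.0 | 3.0 |
| 2017-01 | 2290 | 10.577729257641922 | 1.0 | 1.0 | 3.0 |
| 2017-02 | 2177 | 9.859439595774 | 1.0 | 1.0 | 3.0 |
| 2017-03 | 2374 | 12.600252737994944 | 1.0 | 1.0 | 4.0 |
| 2017-04 | 1862 | 7.81203007518797 | 1.0 | 1.0 | 3.0 |
| 2017-05 | 2029 | 10.611138491867916 | 1.0 | 1.0 | 3.0 |
| 2017-06 | 2091 | 24.705882352941178 | 1.0 | 1.0 | 3.0 |
| 2017-07 | 2325 | 10.858494623655915 | 1.0 | 1.0 | 3.0 |
| 2017-08 | 2338 | 9.887082976903336 | 1.0 | 2.0 | 3.0 |
| 2017-09 | 2323 | 7.921222557038313 | 1.0 | 2.0 | 3.0 |
| 2017-10 | 2444 | 13.107610474631752 | 1.0 | 2.0 | 4.0 |
| 2017-11 | 2575 | 18.28504854368932 | 1.0 | 2.0 | 4.0 |
| 2017-12 | 2524 | 9.716719492868462 | 1.0 | 1.0 | 3.0 |
| 2018-01 | 111 | 2.963963963963964 | 1.0 | 1.0 | 2.0 |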

(Same structure but for users who have reverted more than five edits per month)

| Month | Number of users reverting | Average number of reverts per user | First quartile | Median | Last quartile |
| --- | --- | --- | --- | --- | --- |
| 2013-02 | 114 | 34.90350877192982 | 8.0 | 13.0 | 25.0 |
| 2013-03 | 215 | 50.730232558139534 | 8.0 | 15.0 | 33.5 |
| 2013-04 | 201 | 101.68159203980099 | 9.0 | 13.0 | 35.0 |
| 2013-05 | 209 | 76.77990430622009 | 8.0 | 15.0 | 32.0 |
| 2013-06 | 173 | 75.76300578034682 | 8.0 | 13.0 | 31.0 |
| 2013-07 | 146 | 25.23972602739726 | 7.25 | 12.0 | 25.0 |
| 2013-08 | 141 | 27.085106382978722 | 8.0 | 12.0 | 30.0 |
| 2013-09 | 111 | 28.27027027027027 | 8.0 | 14.0 | 30.5 |
| 2013-10 | 146 | 29.054794520547944 | 8.0 | 11.0 | 24.75 |
| 2013-11 | 162 | 32.48765432098765 | 8.0 | 14.0 | 30.75 |
| 2013-12 | 158 | 46.563291139240505 | 8.0 | 13.0 | 27.0 |
| 2014-01 | 160 | 45.8625 | 8.0 | 13.0 | 21.25 |
| 2014-02 | 173 | 51.47398843930636 | 8.0 | 14.0 | 33.0 |
| 2014-03 | 161 | 40.857142857142854 | 9.0 | 14.0 | 34.0 |
| 2014-04 | 162 | 37.135802469135804 | 8.0 | 13.0 | 33.75 |
| 2014-05 | 196 | 39.295918367346935 | 7.0 | 12.0 | 22.0 |
| 2014-06 | 194 | 64.14948453608247 | 8.25 | 14.0 | 32.75 |
| 2014-07 | 189 | 102.63492063492063 | 8.0 | 15.0 | 33.0 |
| 2014-08 | 172 | 43.45348837209303 | 8.0 | 12.0 | 25.5 |
| 2014-09 | 134 | 50.798507462686565 | 8.0 | 13.0 | 28.0 |
| 2014-10 | 150 | 46.04 | 8.0 | 14.0 | 31.0 |
| 2014-11 | 166 | 164.51807228915663 | 8.0 | 13.0 | 34.75 |
| 2014-12 | 182 | 52.120879120879124 | 8.0 | 12.0 | 32.0 |
| 2015-01 | 211 | 56.43601895734597 | 8.0 | 14.0 | 34.0 |
| 2015-02 | 205 | 56.94146341463415 | 8.0 | 14.0 | 35.0 |
| 2015-03 | 235 | 58.46808510638298 | 8.0 | 13.0 | 32.0 |
| 2015-04 | 217 | 82.24884792626727 | 7.0 | 11.0 | 27.0 |
| 2015-05 | 171 | 38.98830409356725 | 7.0 | 11.0 | 26.0 |
| 2015-06 | 188 | 34.20744680851064 | 7.0 | 11.0 | 23.25 |
| 2015-07 | 194 | 34.9639175257732 | 8.0 | 12.0 | 26.0 |
| 2015-08 | 230 | 39.80434782608695 | 8.0 | 10.0 | 23.0 |
| 2015-09 | 250 | 34.78 | 7.0 | 11.0 | 21.0 |
| 2015-10 | 290 | 47.5 | 8.0 | 13.0 | 25.75 |
| 2015-11 | 272 | 84.99632352941177 | 7.0 | 12.0 | 27.0 |
| 2015-12 | 279 | 60.11469534050179 | 7.0 | 12.0 | 24.0 |
| 2016-01 | 310 | 81.48387096774194 | 8.0 | 12.0 | 29.75 |
| 2016-02 | 314 | 38.503184713375795 | 8.0 | 12.0 | 22.0 |
| 2016-03 | 326 | 48.20245398773006 | 7.0 | 12.0 | 23.0 |
| 2016-04 | 313 | 59.77316293929712 | 7.0 | 11.0 | 25.0 |
| 2016-05 | 315 | 32.37777777777778 | 7.5 | 11.0 | 23.0 |
| 2016-06 | 322 | 34.52795031055901 | 7.0 | 11.0 | 24.75 |
| 2016-07 | 315 | 52.55873015873016 | 8.0 | 13.0 | 23.0 |
| 2016-08 | 245 | 34.03265306122449 | 7.0 | 11.0 | 23.0 |
| 2016-09 | 263 | 43.53231939163498 | 8.0 | 13.0 | 26.0 |
| 2016-10 | 278 | 58.726618705035975 | 8.0 | 11.5 | 25.0 |
| 2016-11 | 279 | 44.6415770609319 | 7.0 | 11.0 | 27.0 |
| 2016-12 | 304 | 43.01315789473684 | 7.0 | 12.0 | 24.0 |
| 2017-01 | 354 | 58.91242937853107 | 8.0 | 12.0 | 29.0 |
| 2017-02 | 343 | 53.57434402332362 | 8.0 | 12.0 | 25.0 |
| 2017-03 | 399 | 66.40350877192982 | 8.0 | 12.0 | 28.0 |
| 2017-04 | 279 | 42.473118279569896 | 7.0 | 13.0 | 29.5 |
| 2017-05 | 322 | 57.857142857142854 | 8.0 | 12.0 | 28.0 |
| 2017-06 | 340 | 143.38823529411764 | 8.0 | 13.0 | 32.0 |
| 2017-07 | 372 | 58.895161290322584 | 7.0 | 12.0 | 25.0 |
| 2017-08 | 377 | 52.206896551724135 | 7.0 | 12.0 | 26.0 |
| 2017-09 | 380 | 39.48421052631579 | 7.0 | 12.0 | 25.0 |
| 2017-10 | 428 | 66.61915887850468 | 7.0 | 12.0 | 28.0 |
| 2017-11 | 434 | 99.97235023041475 | 8.0 | 12.0 | 27.75 |
| 2017-12 | 403 | 51.67245657568238 | 8.0 | 14.0 | 28.0 |
| 2018-01 | 12 | 15.75 | 7.0 | 10.0 | 20.0 |
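The per-user tables can be sketched along the following lines. This is an illustration rather than the script that produced them: it assumes hypothetical `(month, reverting_user)` pairs as input, and the "inclusive" quartile method is a guess.

```python
from collections import Counter, defaultdict
from statistics import mean, quantiles


def reverts_per_user(reverts, min_reverts=1):
    """Summarize reverts per user per month.

    `reverts` is an iterable of hypothetical (month, reverting_user) pairs.
    Users with fewer than `min_reverts` reverts in a month are ignored, so
    min_reverts=6 keeps only users with more than five reverts.
    """
    per_month = defaultdict(Counter)
    for month, user in reverts:
        per_month[month][user] += 1

    rows = []
    for month in sorted(per_month):
        counts = sorted(c for c in per_month[month].values() if c >= min_reverts)
        if len(counts) < 2:
            continue  # quantiles() needs at least two data points
        q1, median, q3 = quantiles(counts, n=4, method="inclusive")
        rows.append((month, len(counts), mean(counts), q1, median, q3))
    return rows


# Made-up sample: four users with 1, 2, 7 and 9 reverts in one month.
sample = (
    [("2017-01", "A")] * 1
    + [("2017-01", "B")] * 2
    + [("2017-01", "C")] * 7
    + [("2017-01", "D")] * 9
)
all_users = reverts_per_user(sample)
heavy_users = reverts_per_user(sample, min_reverts=6)
print(all_users)
print(heavy_users)
```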

We discussed getting the plots cleaned up and adding English/Spanish Wikipedia at our sync meeting.

This is great. Please graph the results, write a report, and give a description of the weirdness in the Spanish Wikipedia dump files.

This is the plot of the geometric mean of revert time in Wikidata:

Figure_1.png

This is the median number of reverts made by users who made more than five reverts in that month:
Figure_2.png

This is the number of users who made more than five reverts in the month:
Figure_5.png

OK I made an epic to cover other work we should do before we speak publicly about what we found. See T200898: Analyze the effects of ORES deployments on counter-vandalism behavior

I think this will make for a great follow-up paper to When the Levee Breaks. But more immediately, we'll get a better view of what counter-vandalism looks like on various wikis.

Oh! And to the point of reviewing this specific task, please limit your aggregate analysis to 12 months. This will help account for seasonality. E.g., December/January and September look weird and can appear twice in a 17 month sample.
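One way to apply the 12-month limit, sketched with hypothetical helpers (`last_n_months`, `weighted_average`) over rows shaped like `(month, number of reverts, average)`. The sample values are made up.

```python
def last_n_months(rows, end_month, n=12):
    """Keep the n most recent monthly rows up to and including end_month.

    Months are 'YYYY-MM' strings, so plain string comparison orders them.
    """
    eligible = sorted(row for row in rows if row[0] <= end_month)
    return eligible[-n:]


def weighted_average(rows):
    """Combine monthly averages, weighting each month by its revert count."""
    total = sum(count for _, count, _ in rows)
    return sum(count * average for _, count, average in rows) / total


# Synthetic 17-month sample (2016-08 to 2017-12) with made-up values.
months = [f"{2016 + (7 + i) // 12}-{(7 + i) % 12 + 1:02d}" for i in range(17)]
rows = [(month, 100, 10.0) for month in months]

window = last_n_months(rows, "2017-12")
print(window[0][0], window[-1][0], weighted_average(window))
```

With a 12-month window every calendar month appears exactly once, so an unusual December or September is not counted twice. Note that count-weighting is only exact for arithmetic means; medians and geometric means would need to be recomputed from the underlying reverts.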