We need the following number for a blog post about TWL citation data.
What is the net change (added - removed) in links added by users on the user list, limited to namespace 0?
Still working on https://phabricator.wikimedia.org/T387887, but posting the metrics I got so far.
I'll sum all the logged results at the end to give the final answer for this ticket; for now I'm just logging the findings as I process the archives. I can't process all the archives at once, because doing so breaks the database and makes performance much worse.
The query on the links_linkevent table:
select ll.change, count(1)
from links_linkevent ll
where ll.id between <range> -- this range represents the chunks as I run the "load" and "migration" scripts from the archives
  and ll.page_namespace = 0
  and ll.on_user_list = 1
group by ll.change;
links_linkevent.change = 0 (REMOVED)
links_linkevent.change = 1 (ADDED)
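For anyone who wants to reproduce the arithmetic, here is a minimal sketch of how the per-chunk counts from the query above could be combined into a net change (added - removed). It is not the script used for this ticket. It assumes an in-memory SQLite database as a stand-in for the real one, a table reduced to the four columns the query touches, and made-up rows. The helper names `count_chunk`, `net_change`, and `demo` are hypothetical.

```python
import sqlite3

# Values of links_linkevent.change, as listed above.
REMOVED, ADDED = 0, 1

# Same filter as the query above, with the id range passed as parameters.
QUERY = """
    SELECT ll.change, COUNT(1)
    FROM links_linkevent ll
    WHERE ll.id BETWEEN ? AND ?
      AND ll.page_namespace = 0
      AND ll.on_user_list = 1
    GROUP BY ll.change
"""


def count_chunk(conn, first_id, last_id):
    """Return {REMOVED: n, ADDED: m} for one inclusive id range."""
    counts = {REMOVED: 0, ADDED: 0}
    counts.update(dict(conn.execute(QUERY, (first_id, last_id)).fetchall()))
    return counts


def net_change(chunk_counts):
    """Sum per-chunk counts and return (added, removed, added - removed)."""
    added = sum(c[ADDED] for c in chunk_counts)
    removed = sum(c[REMOVED] for c in chunk_counts)
    return added, removed, added - removed


def demo():
    """Run the query over two chunks of made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    # Assumed, simplified schema: only the columns the query uses.
    conn.execute(
        'CREATE TABLE links_linkevent ('
        'id INTEGER PRIMARY KEY, "change" INTEGER, '
        'page_namespace INTEGER, on_user_list INTEGER)'
    )
    rows = [
        (1, ADDED, 0, 1),
        (2, ADDED, 0, 1),
        (3, REMOVED, 0, 1),
        (4, ADDED, 2, 1),  # excluded: not namespace 0
        (5, ADDED, 0, 0),  # excluded: user not on the user list
        (6, ADDED, 0, 1),
    ]
    conn.executemany("INSERT INTO links_linkevent VALUES (?, ?, ?, ?)", rows)
    chunks = [(1, 3), (4, 6)]  # hypothetical id ranges, one per loaded batch
    return net_change([count_chunk(conn, lo, hi) for lo, hi in chunks])


if __name__ == "__main__":
    print(demo())
```

Because the id ranges do not overlap, summing the per-chunk counts gives the same totals as a single query over the whole table would, without having to load every archive at once.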
I'll keep posting updates as I process the batches.
Adding more results from yesterday and this morning:
Summary from our archives in GDrive - from 2019-07 to 2024-10:
Adding sums from 2024-11-01 to 2025-02-12:
Total counts:
Adding a final count that now includes the 5 files that were rejected in the migration process (mentioned in https://phabricator.wikimedia.org/T387887#10702024):
Total counts:
With that, I believe this is the final count for all data from our archives.