Page MenuHomePhabricator

What is the net change in links added by TWL users in the article namespace?
Closed, ResolvedPublic

Description

We need the following number for a blog post about TWL citation data.

What is the net change (added - removed) in links added by users on the user list, limited to namespace 0?

Event Timeline

Still working on https://phabricator.wikimedia.org/T387887, but posting the metrics I got so far.

I'll sum up all the logs at the end to provide the final answer to this ticket, just logging the findings as I process the archives (I can't process all archives at once as it breaks the database and makes the performance much worse).

The query on the links_linkevent table:

select ll.change, count(1)
from links_linkevent ll 
where ll.id between <range> -- -> this range represents the chunks as I run the "load" and "migration" scripts from the archives
  and ll.page_namespace = 0
  and ll.on_user_list = 1
group by ll.change;

links_linkevent.change = 0 (REMOVED)
links_linkevent.change = 1 (ADDED)

  • From date 2019-07-01 to 2021-08-12 - from ID: 1 to 2618526
    • Result: change = 0, count = 12875 | change = 1, count 121267
  • From date 2021-08-13 to 2022-03-28 - from ID: 2618527 to 3698248
    • Result: change = 0, count = 6502 | change = 1, count = 59885
  • From date 2022-03-29 to 2022-12-27 - from ID: 3698249 to 4803636
    • Result: change = 0, count = 11773 | change = 1, count = 82543
  • From date 2022-12-28 to 2023-02-12 - from ID: 4803637 to 5559299
    • Result: change = 0, count = 10627 | change = 1, count = 22912

I'll keep posting updates as I process the batches.

Adding more results from yesterday + today's morning:

  • From date 2023-02-13 to 2023-03-31 - from ID: 5559300 to 6158857
    • Result: change = 0, count = 9974 | change = 1, count = 18943
  • From date 2023-04-01 to 2023-05-13 - from ID: 6158858 to 6711692
    • Result: change = 0, count = 2175 | change = 1, count = 13060
  • From date 2023-05-14 to 2023-05-25 - from ID: 6711693 to 7363239
    • Result: change = 0, count = 374 | change = 1, count = 2793
  • From date 2023-05-26 to 2023-06-25 - from ID: 6711693 to 7363239
    • Result: change = 0, count = 374 | change = 1, count = 2793
  • From date 2023-06-26 to 2023-06-26 - from ID: 8241013 to 8442154
    • Result: change = 0, count = 32 | change = 1, count = 247
Mimurawil changed the task status from Stalled to In Progress.Mar 26 2025, 7:31 PM
Mimurawil claimed this task.
  • From date 2023-06-27 to 2023-06-29 - from ID: 8442155 to 9020957
    • Result: change = 0, count = 156 | change = 1, count = 703
  • From date 2023-07-01 to 2023-07-18 - from ID: 9020958 to 9421373
    • Result: change = 0, count = 739 | change = 1, count = 6261
  • From date 2023-07-19 to 2023-07-21 - from ID: 9421374 to 9726862
    • Result: change = 0, count = 140 | change = 1, count = 757
  • From date 2023-07-22 to 2023-07-31 - from ID: 9726863 to 10776213
    • Result: change = 0, count = 844 | change = 1, count = 3405
  • From date 2023-08-01 to 2023-08-03 - from ID: 10776214 to 11166680
    • Result: change = 0, count = 207 | change = 1, count = 970
  • From date 2023-08-04 to 2023-08-20 - from ID: 11166681 to 11673113
    • Result: change = 0, count = 882 | change = 1, count = 6165
  • From date 2023-08-21 to 2023-08-22 - from ID: 11673114 to 12100012
    • Result: change = 0, count = 315 | change = 1, count = 734
  • From date 2023-08-23 to 2023-08-26 - from ID: 11673114 to 12100012
    • Result: change = 0, count = 355 | change = 1, count = 3605
  • From date 2023-08-27 to 2023-08-29 - from ID: 12584644 to 14147477
    • Result: change = 0, count = 333 | change = 1, count = 937
  • From date 2023-08-30 to 2023-08-31 - from ID: 14147478 to 15149195
    • Result: change = 0, count = 148 | change = 1, count = 597
  • From date 2023-09-01 to 2023-09-30 - from ID: 15149196 to 23173260
    • Result: change = 0, count = 1595 | change = 1, count = 7757
  • From date 2023-10-01 to 2024-01-31 - from ID: 23173261 to 27923227
    • Result: change = 0, count = 7917 | change = 1, count = 51359
  • From date 2024-02-01 to 2024-10-30 - from ID: 27923228 to 39687576
    • Result: change = 0, count = 17106 | change = 1, count = 142238

Summary from our archives in GDrive - from 2019-07 to 2024-10:

  • change = 0 (removed), count = 85443
  • change = 1 (added), count = 549931
  • delta = 464488

Adding sums from 2024-11-01 to 2025-02-12

  • change = 0 (removed), count = 15107
  • change = 1 (added), count = 54112

Total counts:

  • added: 604043
  • removed: 100550
  • delta: 503493

Just adding a final count, now considering the 5 files that got rejected in the migration process (mentioned in https://phabricator.wikimedia.org/T387887#10702024)

  • change = 0 (removed), count = 1129
  • change = 1 (added), count = 7203

Total counts:

  • added: 611246
  • removed: 101679
  • delta: 509567

With that, I believe this is the final count for all data from our archives.

Kgraessle subscribed.

This looks good to me, moving to QA.

@sjvipin does this look good to you? If so, please resolve this task.

Yes, @Kgraessle shared the numbers I needed. Closing this task. Thanks!