
Find out number of deleted wikibases
Closed, Resolved (Public)

Assigned To
Authored By
Charlie_WMDE
Jan 31 2024, 2:07 PM
Referenced Files
Restricted File
Feb 7 2024, 7:49 AM
Restricted File
Feb 6 2024, 3:37 PM
F41793982: image.png
Feb 6 2024, 3:21 PM

Description

Before starting work on the parent ticket, we would like to make an educated guess at how many people we could actually reach who could give us feedback.

The data needed:

  • a dataset of Wikibase instance deletion events per short period of time (probably per day or month), excluding instances that were created and deleted by WMDE users for testing purposes (we already have the deleted instances, but without the ones created by the WMDE team filtered out)
  • how many of the deleted instances belonged to accounts that still have active instances (i.e. instances that had at least one edit in the last 30 days)
  • how many of the deleted instances belonged to accounts that no longer have any (active) instances
  • the number of abandoned instances (Charlie asks Conny)
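The two account-based buckets in the list above can be sketched in plain PHP. This is only an illustration under stated assumptions: the function name `classifyDeletedWikis`, the array shapes, and the sample emails are made up for this example and do not reflect the platform's actual data model.

```php
<?php
// Hypothetical sketch: bucket deleted wikis by whether any of their managers
// still has a wiki that was edited within the last 30 days.

/**
 * @param array $deletedWikis       list of ['domain' => string, 'managers' => string[]]
 * @param array $lastEditsByManager manager email => last-edit timestamps (ISO 8601)
 *                                  of the wikis that manager still has
 */
function classifyDeletedWikis(array $deletedWikis, array $lastEditsByManager, DateTimeImmutable $now): array
{
    $cutoff = $now->modify('-30 days');
    $counts = ['active_managers' => 0, 'inactive_managers' => 0];

    foreach ($deletedWikis as $wiki) {
        $active = false;
        foreach ($wiki['managers'] as $email) {
            foreach ($lastEditsByManager[$email] ?? [] as $timestamp) {
                if (new DateTimeImmutable($timestamp) > $cutoff) {
                    $active = true;
                    break 2; // one recently edited wiki is enough
                }
            }
        }
        $counts[$active ? 'active_managers' : 'inactive_managers']++;
    }

    return $counts;
}

$now = new DateTimeImmutable('2024-02-06T00:00:00Z');
$deleted = [
    ['domain' => 'a.example', 'managers' => ['alice@example.org']],
    ['domain' => 'b.example', 'managers' => ['bob@example.org']],
];
$lastEdits = [
    'alice@example.org' => ['2024-02-01T10:00:00Z'], // edited within the last 30 days
    'bob@example.org' => ['2023-11-01T10:00:00Z'],   // last edit is months old
];
$counts = classifyDeletedWikis($deleted, $lastEdits, $now);
print_r($counts);
```

A deleted wiki counts towards `active_managers` as soon as one of its managers has a recently edited wiki, so every deleted wiki lands in exactly one bucket.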

Event Timeline

Tarrow updated the task description.

I came up with the following script for collecting these metrics:

use App\Wiki;
use Carbon\Carbon;
use Carbon\CarbonPeriod;
use Illuminate\Support\Facades\Http;

// Start at the first day of the month in which the oldest existing wiki was created.
$startDate = Wiki::oldest('created_at')->first()->created_at->copy()->startOfMonth();

$result = [];

foreach (CarbonPeriod::create($startDate, '1 month', Carbon::today()) as $month) {
    $id = $month->format('m-Y');
    $result[$id] = [];
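    // Carbon instances are mutable, so compute the bounds on copies of $month.
    $monthStart = $month->copy()->startOfMonth();
    $monthEnd = $month->copy()->endOfMonth();
    $wikisDeletedInMonth = Wiki::onlyTrashed()
        ->with('wikiManagers')
        // Keeps wikis that have at least one manager whose email does not contain "wikimedia".
        ->whereRelation('wikiManagers', 'email', 'not like', '%wikimedia%')
        // Full timestamps keep deletions on the last day of the month in range;
        // a bare date string would be compared as midnight of that day.
        ->whereBetween('deleted_at', [$monthStart, $monthEnd])
        ->get();
    $result[$id]['count'] = $wikisDeletedInMonth->count();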

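    $wikisWhereAllManagersInactive = 0;
    $wikisWhereManagersStillActive = 0;
    foreach ($wikisDeletedInMonth as $wiki) {
        // Use the eager-loaded relation instead of querying the managers again.
        foreach ($wiki->wikiManagers as $user) {
            // All non-deleted wikis this manager still has.
            $matches = Wiki::whereRelation('wikiManagers', 'email', '=', $user->email)->get();
            foreach ($matches as $match) {
                try {
                    // Recent changes are listed newest first, so one entry is enough.
                    $res = Http::get('https://'.$match->domain.'/w/api.php', [
                        'action' => 'query',
                        'list' => 'recentchanges',
                        'rclimit' => 1,
                        'format' => 'json',
                    ]);
                    $lastEdited = data_get($res->json(), 'query.recentchanges.0.timestamp');
                    if ($lastEdited && Carbon::parse($lastEdited)->greaterThan(Carbon::now()->subDays(30))) {
                        // One active wiki is enough: count this deletion and
                        // continue with the next deleted wiki.
                        $wikisWhereManagersStillActive++;
                        continue 3;
                    }
                } catch (\Exception $ex) {
                    // Ignore: the wiki probably does not resolve.
                }
            }
        }
        // No manager has a wiki with an edit in the last 30 days.
        $wikisWhereAllManagersInactive++;
    }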

    $result[$id]['active_managers'] = $wikisWhereManagersStillActive;
    $result[$id]['inactive_managers'] = $wikisWhereAllManagersInactive;
}

$output = ["month,deletions,active_managers,inactive_managers"];
foreach ($result as $month => $stats) {
    $output[] = implode(
        ',',
        [
            $month,
            $stats['count'],
            $stats['active_managers'],
            $stats['inactive_managers'],
        ]
    );
}
echo implode(PHP_EOL, $output).PHP_EOL;

The script can be run in a backend API pod like this:

kubectl exec -ti deployments/api-app-backend -- php artisan tinker --execute "$(cat script.php)"

For production this yields:

month,deletions,active_managers,inactive_managers
02-2022,0,0,0
03-2022,0,0,0
04-2022,0,0,0
05-2022,0,0,0
06-2022,17,11,6
07-2022,3,3,0
08-2022,3,2,1
09-2022,4,2,2
10-2022,5,4,1
11-2022,5,1,4
12-2022,5,4,1
01-2023,71,70,1
02-2023,3,0,3
03-2023,10,5,5
04-2023,3,1,2
05-2023,1,0,1
06-2023,5,3,2
07-2023,3,2,1
08-2023,7,3,4
09-2023,3,2,1
10-2023,13,1,12
11-2023,8,0,8
12-2023,15,0,15
01-2024,11,2,9
02-2024,2,1,1

{F41801075}
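To compare months with very different deletion counts, it can help to look at the share of deletions where no manager is active any more. Below is a minimal sketch in plain PHP, assuming the CSV layout printed by the script; the function name `inactiveShareByMonth` is made up for this example, and only three rows of the production output are used as sample input.

```php
<?php
// Hypothetical helper: per-month share of deletions whose managers are all inactive.
function inactiveShareByMonth(string $csv): array
{
    $lines = preg_split('/\R/', trim($csv));
    array_shift($lines); // drop the header row

    $shares = [];
    foreach ($lines as $line) {
        [$month, $deletions, $active, $inactive] = explode(',', $line);
        // Months without deletions have no meaningful share.
        $shares[$month] = (int) $deletions > 0
            ? round((int) $inactive / (int) $deletions, 2)
            : null;
    }

    return $shares;
}

$csv = <<<CSV
month,deletions,active_managers,inactive_managers
05-2022,0,0,0
01-2023,71,70,1
12-2023,15,0,15
CSV;

$shares = inactiveShareByMonth($csv);
print_r($shares);
```

Months with zero deletions map to `null` rather than `0`, so they are not mistaken for months in which every deleting account was still active.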

Fring removed Fring as the assignee of this task. Feb 6 2024, 3:29 PM
Fring moved this task from Doing to In Review on the Wikibase Cloud (Kanban board Q1 2024) board.
Fring subscribed.

@Charlie_WMDE Do the numbers in the comment above suit your needs?

Hey @Fring, thanks for the numbers!

Do we know where the increase in deletions since last October came from?

We thought we should ask you (@Charlie_WMDE), but maybe it would also be meaningful to split the numbers by the users who did the deleting. For example, more than 95% of the Jan 2023 deletion spike was due to a single user.
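Splitting by user could look roughly like the sketch below. It is plain PHP over made-up records: the `deleted_by` field and the function name `topDeleterShare` are assumptions for illustration, since the thread does not show how the deleting user is stored.

```php
<?php
// Hypothetical sketch: find the user responsible for the most deletions in a
// period and the share of that period's deletions they account for.
function topDeleterShare(array $deletions): ?array
{
    if ($deletions === []) {
        return null;
    }
    $perUser = array_count_values(array_column($deletions, 'deleted_by'));
    arsort($perUser); // highest count first
    $user = array_key_first($perUser);

    return ['user' => $user, 'share' => $perUser[$user] / count($deletions)];
}

$sample = [
    ['domain' => 'a.example', 'deleted_by' => 'user-1'],
    ['domain' => 'b.example', 'deleted_by' => 'user-1'],
    ['domain' => 'c.example', 'deleted_by' => 'user-1'],
    ['domain' => 'd.example', 'deleted_by' => 'user-2'],
];
$top = topDeleterShare($sample);
print_r($top);
```

A high share for a single user in one month suggests a bulk cleanup rather than a general trend.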

> Do we know where the increase in deletions since last October came from?

@Charlie_WMDE I would think this is related to going into public beta in mid-September, but that's mostly a guess.

I created a subticket to generate the additional data point as Tom suggested and am moving this ticket to done. Thanks for figuring this out, @Fring!