📈conversionMetrics data contains incorrect edit dates
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):
Access conversion metric data at https://api.wikibase.cloud/wikiConversionData

What happens?:
The last_edited_time for socioinsight.wikibase.cloud is listed as 2023-10-03 07:27:28

What should have happened instead?:
The last_edited_time for socioinsight.wikibase.cloud should have been 2023-10-12T02:08:20Z, the time of the most recent edit. To verify, open

https://socioinsight.wikibase.cloud/w/api.php?action=query&list=recentchanges&format=json

and observe an edit with the timestamp 2023-10-12T02:08:20Z.

Other information (browser name/version, screenshots, etc.):
This appears to happen because there are multiple WikiLifecycleEvents rows for socioinsight.wikibase.cloud. The conversion metric controller returns the first of these (here the oldest, id 600).

See the tinker session below:

> Wiki::where(['domain' => 'socioinsight.wikibase.cloud'])->first()->wikiLifecycleEvents()->get()
= Illuminate\Database\Eloquent\Collection {#7641
    all: [
      App\WikiLifecycleEvents {#7643
        +id: 600,
        +created_at: "2023-10-03 19:06:31",
        +updated_at: "2023-10-03 19:06:31",
        first_edited: "2023-10-03 06:56:06",
        last_edited: "2023-10-03 07:27:28",
        +wiki_id: 657,
      },
      App\WikiLifecycleEvents {#7277
        +id: 615,
        +created_at: "2023-10-04 19:06:36",
        +updated_at: "2023-10-04 19:06:36",
        first_edited: "2023-10-03 06:56:06",
        last_edited: "2023-10-04 08:21:04",
        +wiki_id: 657,
      },
      App\WikiLifecycleEvents {#7524
        +id: 695,
        +created_at: "2023-10-10 19:04:02",
        +updated_at: "2023-10-10 19:04:02",
        first_edited: "2023-10-03 06:56:06",
        last_edited: "2023-10-10 01:09:06",
        +wiki_id: 657,
      },
      App\WikiLifecycleEvents {#7546
        +id: 726,
        +created_at: "2023-10-12 19:05:04",
        +updated_at: "2023-10-12 19:05:04",
        first_edited: "2023-10-03 06:56:06",
        last_edited: "2023-10-12 02:08:20",
        +wiki_id: 657,
      },
    ],
  }

This is surprising: we are using the updateOrCreate method, so I would have expected us to have only one event per wiki.
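As an illustration of the failure mode (in Python rather than the API's PHP), the following minimal sketch picks the event with the greatest last_edited instead of whichever row comes first. The function name and the dict representation are assumptions for illustration; the values are copied from the tinker session above.

```python
from datetime import datetime

# Timestamp layout used in the tinker output above.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def latest_lifecycle_event(events):
    """Return the event with the most recent last_edited, or None if there is none."""
    dated = [e for e in events if e.get("last_edited")]
    if not dated:
        return None
    return max(dated, key=lambda e: datetime.strptime(e["last_edited"], TIME_FORMAT))


# The four rows for wiki_id 657, reduced to the fields that matter here.
events = [
    {"id": 600, "last_edited": "2023-10-03 07:27:28"},
    {"id": 615, "last_edited": "2023-10-04 08:21:04"},
    {"id": 695, "last_edited": "2023-10-10 01:09:06"},
    {"id": 726, "last_edited": "2023-10-12 02:08:20"},
]

latest = latest_lifecycle_event(events)
print(latest["id"], latest["last_edited"])  # 726 2023-10-12 02:08:20
```

Taking the first row instead yields id 600 and the stale 2023-10-03 07:27:28 value reported in this task.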

Event Timeline

Tarrow updated the task description.

It would be great if whoever takes this on also makes sure that the other columns in this CSV are calculated correctly and are not affected by the same problem (I am referring to first_edited and created_at).

The API call now returns information in which some Wikibases have the last edit date, but do not have the first edit date (I suspected this might happen in my comment from May 21, but didn't review it properly).
Could someone please check this?

I'm not sure why that is, but it seems those wikis do not have any revision objects at all (e.g. https://monacan.wikibase.cloud/w/api.php?action=query&format=json&prop=revisions ), which means our approach of getting revision 1 and picking its date fails. The wiki also has no deleted revisions.

If you manually walk up the possible ids, you will hit a revision at some point, so the bug is that we rely on revision 1 always being present, which does not hold.
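A minimal Python sketch of the alternative: take the lowest-numbered revision that actually exists instead of assuming revision 1. The function name, the flat list of revision dicts, and the sample values are all hypothetical and not taken from any real wiki.

```python
def first_revision_timestamp(revisions):
    """Timestamp of the lowest-numbered revision present, or None if there are none."""
    if not revisions:
        return None
    return min(revisions, key=lambda r: r["revid"])["timestamp"]


# Hypothetical wiki where revision ids 1 to 3 do not exist.
revisions = [
    {"revid": 7, "timestamp": "2024-05-02T10:00:00Z"},
    {"revid": 4, "timestamp": "2024-05-01T09:30:00Z"},
]

print(first_revision_timestamp(revisions))  # 2024-05-01T09:30:00Z
print(first_revision_timestamp([]))         # None
```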

Anton.Kokh renamed this task from conversionMetrics data contains incorrect last edit dates to conversionMetrics data contains incorrect edit dates. Jun 11 2024, 12:12 PM
Tarrow removed Fring as the assignee of this task. Jun 21 2024, 6:02 PM
Tarrow moved this task from Done to To do on the Wikibase Cloud (Kanban Board Q2 2024) board.
Tarrow added a subscriber: Fring.

I was asked to look at this again by @Anton.Kokh since he reports there is still an issue, specifically with wikis that have a last-edited date but no first-edited date.

An example is test-lm-uat.wikibase.cloud.

I had a look and I believe there are still issues but we're very much playing whack-a-mole with getting this right. In the previous case we had weirdly missing first-edits; now we have weirdly extra last-edits.

We consider any recent change to count as a "last edit". Both historically and after #814, we made a call like https://test-lm-uat.wikibase.cloud/w/api.php?action=query&list=recentchanges&format=json which returns:

{"batchcomplete":"","query":{"recentchanges":[{"type":"log","ns":2,"title":"User:Second Tester","pageid":0,"revid":0,"old_revid":0,"rcid":1,"timestamp":"2022-06-24T12:07:22Z"}]}}

In this error case we are seeing wikis that have never had an edit but have had some user action (or perhaps something else) logged in their RC table.

Perhaps we should instead adjust this to be https://test-lm-uat.wikibase.cloud/w/api.php?action=query&format=json&list=recentchanges&rctype=edit%7Cnew i.e. filter for only edits to existing pages and the creation of new pages.
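To illustrate the effect of that filter, here is a minimal Python sketch that applies the same edit/new restriction to an already-fetched recentchanges response. The function name is an assumption; the sample is the response quoted above.

```python
import json

# Recent-change types that count as an edit: edits to existing pages and page creations.
EDIT_TYPES = {"edit", "new"}


def last_edit_timestamp(response):
    """Newest timestamp among edit/new entries, or None if the wiki has none."""
    changes = response.get("query", {}).get("recentchanges", [])
    stamps = [c["timestamp"] for c in changes if c.get("type") in EDIT_TYPES]
    # Timestamps share one UTC ISO 8601 layout, so string order matches time order.
    return max(stamps) if stamps else None


sample = json.loads(
    '{"batchcomplete":"","query":{"recentchanges":[{"type":"log","ns":2,'
    '"title":"User:Second Tester","pageid":0,"revid":0,"old_revid":0,"rcid":1,'
    '"timestamp":"2022-06-24T12:07:22Z"}]}}'
)

print(last_edit_timestamp(sample))  # None
```

With the filter in place, the lone "log" entry no longer produces a last-edited date for a wiki that has never been edited.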

Alternatively we could consider moving the logic to something more analogous to the first-edited logic: https://test-lm-uat.wikibase.cloud/w/api.php?action=query&format=json&list=allrevisions%7Calldeletedrevisions&arvprop=ids%7Ctimestamp

In any case I suspect we're going to step on another rake in some unforeseen way around people who delete, or revdelete or suppress as the last action we look at.

Anton.Kokh renamed this task from conversionMetrics data contains incorrect edit dates to 📈conversionMetrics data contains incorrect edit dates. Jul 1 2024, 7:37 AM

last_edited_time still includes platform user changes

For some instances, first_edited_time does too, for example: https://antontest4.wikibase.cloud/

We talked about this and we think both should rely on the first and last revisions, excluding the MainPage automatically created by the PlatformReservedUser. Other actions by this user should be counted if they were triggered by an additional user action, e.g. importing entities.
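A minimal Python sketch of that rule, under loudly stated assumptions: revisions arrive as dicts with user, title, parentid and timestamp fields; the automatic page is titled "Main Page"; and a page creation has parentid 0. The sample data is hypothetical, and this is not necessarily how the API implements it.

```python
PLATFORM_USER = "PlatformReservedUser"  # user name as written in this thread
MAIN_PAGE_TITLE = "Main Page"           # assumed title of the auto-created page


def is_platform_main_page_creation(rev):
    """True for the automatic Main Page creation by the platform user (assumed shape)."""
    return (
        rev.get("user") == PLATFORM_USER
        and rev.get("title") == MAIN_PAGE_TITLE
        and rev.get("parentid") == 0  # assumption: parentid 0 marks a page creation
    )


def first_and_last_edit(revisions):
    """(first, last) timestamps over the revisions that count, or (None, None)."""
    stamps = sorted(
        r["timestamp"] for r in revisions if not is_platform_main_page_creation(r)
    )
    if not stamps:
        return (None, None)
    return (stamps[0], stamps[-1])


# Hypothetical revisions: the automatic Main Page, an import, and a user edit.
revisions = [
    {"user": PLATFORM_USER, "title": "Main Page", "parentid": 0,
     "timestamp": "2024-01-01T00:00:00Z"},
    {"user": PLATFORM_USER, "title": "Item:Q1", "parentid": 0,
     "timestamp": "2024-01-02T08:00:00Z"},
    {"user": "ExampleUser", "title": "Item:Q1", "parentid": 2,
     "timestamp": "2024-01-03T09:15:00Z"},
]

print(first_and_last_edit(revisions))
```

The import by the platform user still counts as the first edit here, while the automatic Main Page creation does not.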

It appears that part of the issue is probably stale WikiLifecycleStats objects. We probably want to clean these all up and let them be regenerated. If we want to normalise on the most recent/oldest revision, that will require a follow-up patch. This logic will also be broken by the importing entities stuff. Resolving that is going to be slightly trickier since we will need to differentiate between PlatformReservedUser edits of different types.

https://github.com/wbstack/api/pull/854 is a PR to try and normalise this a little.

After deploying this I believe the easiest thing would be to *drop* all the LifecycleStats and allow them to be recreated so they are using this consistent method.

I think we should then *probably* follow up and have importing done by a different user than the PlatformReservedUser; perhaps PlatformImporter. We should reserve the PlatformReservedUser for actions triggered purely by us as platform operators, not for events or actions triggered by users.

@Tarrow Would you like to proceed with the follow-up you suggested, so that we don't build up tech debt?
Could you also please check whether your changes are also taken into account when looking for 'active' wikis? ( https://phabricator.wikimedia.org/T371408 )

Seems to have worked fine in staging; ran the migration with
kubectl exec -it deployment/api-app-backend -- php artisan migrate

and then confirmed after a moment that there were the same number (57) of lifecycle events as wikis.