
Instrument the Wikistories share feature
Closed, Resolved, Public

Description

Wikistories has a new feature where each story has a special share link. People who follow the link are able to view the story even though stories are otherwise visible only to registered users with the beta feature enabled.

Requests

  • We want to track when a user accesses one of the share links from the editor (story builder) view OR the reader (story viewer) view
  • We want to know how many users consume wikistories through a shared link, and whether they are logged-in or anonymous users.
  • We want to know the channel for every Share (whatsapp, gmail etc)

Steps

  • Add event types that capture these interactions to the wikistories_consumption_event and wikistories_contribution_event schemas.
  • Update the code to log the information when appropriate and use the new schema version.
  • Verify that the Data Lake is receiving the new data.

Acceptance criteria

In wikistories_contribution_event

  • event_type = "story_share" when users access the shared link

Add "shared_channel" column to record shared channels (NULL for other event_type)

In wikistories_consumption_event

  • event_type = "story_share" when users access the shared link
  • Add "user_is_anonymous" column to track whether the users are logged in or not

  • referrer_type = "shared_link" (specify shared channels if possible) if the story view or story impression is from a shared link. event_type = 'story_share' is sufficient to know that this consumption event is from a URL that was shared.
Add "shared_channel" column to record shared channels (NULL for other event_type)

Event Timeline

cchen triaged this task as Medium priority. Aug 2 2023, 7:16 AM
cchen moved this task from Triage to Current Quarter on the Product-Analytics board.
cchen moved this task from Current Quarter to Kanban on the Product-Analytics board.
cchen edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
cchen moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

@cchen this looks good but I don't think there is any way to get the shared_channel at the time of sharing.

However, we may be able to get that information by looking at the referrer field for the shared urls (with ?action=storyview) rows in the pageviews table.
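
As a rough illustration of the referrer idea, the sketch below maps a referrer URL to a channel label. The host-to-channel mapping and the function name are hypothetical and not part of Wikistories or the pageviews pipeline; many shares (for example from native apps) may carry no referrer at all, so "unknown" is likely to be a common result.

```python
from urllib.parse import urlparse

# Hypothetical mapping from referrer host to a channel label.
# These entries are illustrative assumptions, not a vetted list.
CHANNEL_BY_HOST = {
    "whatsapp.com": "whatsapp",
    "mail.google.com": "gmail",
    "facebook.com": "facebook",
}


def channel_from_referrer(referrer):
    """Return a channel label for a referrer URL, or 'unknown'."""
    if not referrer:
        return "unknown"
    host = (urlparse(referrer).hostname or "").lower()
    for suffix, channel in CHANNEL_BY_HOST.items():
        # Match the host itself or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return channel
    return "unknown"
```

Applied to the referrer of each row whose query string contains action=storyview, this would give an approximate per-channel breakdown after the fact, without changing the instrument.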

cchen updated the task description.

@SBisson thanks for reviewing the requirements. I will go ahead and update the ticket and the schema.

Change 965845 had a related patch set uploaded (by Conniecc1; author: Conniecc1):

[schemas/event/secondary@master] T343183 add story_share event and bump to version 1.2.0

https://gerrit.wikimedia.org/r/965845

Change 965846 had a related patch set uploaded (by Conniecc1; author: Conniecc1):

[schemas/event/secondary@master] T343183 add "stoty share" event; add "user_is_anonymous" field and bump to version 1.1.0

https://gerrit.wikimedia.org/r/965846

@SBisson I've submitted updates to the schemas.

SBisson moved this task from Backlog to Dev on the Inuka-Team (Kanban) board.
SBisson added a subscriber: cchen.

@SBisson I've submitted updates to the schemas.

Thank you. I will work on the Wikistories part now

Change 966880 had a related patch set uploaded (by Sbisson; author: Sbisson):

[mediawiki/extensions/Wikistories@master] Share feature instrumentation

https://gerrit.wikimedia.org/r/966880

I think I have completed the code on the Wikistories side. This is blocked until the schema updates are merged and deployed.

Change 965845 had a related patch set uploaded (by Bearloga; author: Conniecc1):

[schemas/event/secondary@master] wikistories_contribution_event: add story_share event type

https://gerrit.wikimedia.org/r/965845

Change 965845 merged by jenkins-bot:

[schemas/event/secondary@master] wikistories_contribution_event: add story_share event type

https://gerrit.wikimedia.org/r/965845

Change 965846 merged by jenkins-bot:

[schemas/event/secondary@master] T343183 add "story share" event; add "user_is_anonymous" field and bump to version 1.1.0

https://gerrit.wikimedia.org/r/965846

PWaigi-WMF changed the task status from Open to In Progress. Nov 17 2023, 5:40 PM

Change 966880 merged by jenkins-bot:

[mediawiki/extensions/Wikistories@master] Share feature instrumentation

https://gerrit.wikimedia.org/r/966880

For testing, @PWaigi-WMF did some shares and opened shared links today.

Events I checked:

  1. In wikistories_contribution_event, event_type = "story_share" when users access the shared link
    • No events were found in the table using the following query:
select * from event.mediawiki_wikistories_contribution_event where event_type = 'story_share' and year = 2024 and month = 2
  2. In wikistories_consumption_event, event_type = "story_share" when users access the shared link
    • No events were found in the table using the following query:
select * from event.mediawiki_wikistories_consumption_event where event_type = 'story_share' and year = 2024 and month = 2
  3. In wikistories_consumption_event, referrer_type = "shared_link" if the story view or story impression is from a shared link
    • No events were found in the table using the following query:
select * from event.mediawiki_wikistories_consumption_event where referrer_type = 'shared_link' and year = 2024 and month = 2
    • However, I was able to find the story views from shares in wmf.pageview_actor using the following query:
select cast(substr(dt,1,10) AS DATE), count(*) from wmf.pageview_actor
where year = 2024 and month = 2 AND uri_host = 'id.wikipedia.org' AND uri_query like '%action=storyview%'
group by cast(substr(dt,1,10) AS DATE)

Hi @cchen, I just tried the share option from the story viewer and I could see the event going out. However, due to the use of the hasty parameter I don't know if it was properly accepted by the backend. Is there a place you can check for events that failed schema validation?

On contribution side:
In some cases the instrument sends event data containing activity_session_id and dt fields which are not present in version 1.2.0 of the contribution schema: https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/jsonschema/analytics/mediawiki/wikistories_contribution_event/1.2.0.yaml so a lot of events aren't passing schema validation.

See https://logstash.wikimedia.org/app/discover#/view/AXMlVWkuMQ_08tQas2Xi?_g=h@7c81a5b&_a=h@998eb81 (errored_stream_name = "mediawiki.wikistories_contribution_event")

On consumption side:
No schema validation errors but
https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&var-service=eventgate-analytics-external&var-stream=mediawiki.wikistories_consumption_event&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos&var-site=All&from=now-7d&to=now is showing that there are HTTP 400 & 404 errors but I'm not sure if those are specific to the stream or in general.

@mpopov Thanks for the troubleshooting!

The activity_session_id field was never in the schema but always in the code. It only started failing with the latest schema update because additionalProperties: false was added. That's an easy fix.

However, the mysterious dt is added by MetricsClient.processSubmitCall. Can you shed some light on why it's there and what is the proper way to handle it?
EDIT: According to the docs, we're supposed to define dt at the root and it is somehow different from meta.dt. I'll just go ahead and do that then.
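
To make the failure mode concrete: with additionalProperties: false, any top-level key the schema does not declare causes the event to be rejected. The sketch below is a hand-rolled check of that one rule only, not a JSON Schema validator; the declared property list and the sample event are hypothetical and abbreviated, not copied from the 1.2.0 schema.

```python
def unexpected_fields(event, declared_properties):
    """Return the sorted top-level keys of `event` that the schema does not declare."""
    return sorted(set(event) - set(declared_properties))


# Hypothetical, abbreviated list of declared properties (assumption).
DECLARED = {"$schema", "meta", "event_type"}

# Hypothetical event resembling what the instrument was sending.
sample_event = {
    "$schema": "/analytics/mediawiki/wikistories_contribution_event/1.2.0",
    "meta": {"stream": "mediawiki.wikistories_contribution_event"},
    "event_type": "story_share",
    "activity_session_id": "abc123",
    "dt": "2024-02-20T12:00:00Z",
}

extra = unexpected_fields(sample_event, DECLARED)
# With additionalProperties: false, a non-empty list means the event is rejected.
print(extra)
```

Running a check like this against captured events before deploying a schema version would surface fields such as activity_session_id and dt early.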

Change 1006014 had a related patch set uploaded (by Sbisson; author: Sbisson):

[mediawiki/extensions/Wikistories@master] Remove unexpected field 'activity_session_id' from contribution events

https://gerrit.wikimedia.org/r/1006014

@SBisson: root-level dt is the client-side timestamp, while meta.dt is the server-side timestamp. Both are useful to have for analysis.

Re: activity_session_id I actually would recommend to update the schema to include it, rather than removing it from the instrument. Having a token for linking events together in a single session is especially useful for analysis.

For changes like that, please check with @cchen first.

Change 1006060 had a related patch set uploaded (by Sbisson; author: Sbisson):

[schemas/event/secondary@master] Wikistories contribution: declare dt field

https://gerrit.wikimedia.org/r/1006060

So, pending @cchen's opinion, the proposed changes would be:

  • Add activity_session_id to the contribution schema and keep it populated in the code
  • Add dt to the contribution schema and let it be populated by MetricsClient
  • Stop populating meta.dt on the client since this is supposed to be server time
  • Possibly add dt to the consumption schema and disallow additional properties to follow the guidelines

@SBisson can we keep both dt and meta.dt?

If I understand correctly, we should stop populating meta.dt on the client-side so it gets populated on the server side when the event is received. This way we would have both available for analysis.
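
A minimal sketch of that split, under stated assumptions: the client sets only the root-level dt, and the receiving side fills meta.dt when it is absent. The function names are hypothetical, and the "fill only when absent" behaviour is an assumption for illustration, not a description of the actual intake service.

```python
from datetime import datetime, timezone


def now_iso():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def build_client_event(event_type, stream):
    """Client side: set root-level dt, leave meta.dt unset."""
    return {
        "meta": {"stream": stream},
        "dt": now_iso(),  # client-side timestamp
        "event_type": event_type,
    }


def receive_on_server(event):
    """Server side (assumed behaviour): fill meta.dt if the client did not send it."""
    received = dict(event)
    received["meta"] = dict(event.get("meta", {}))
    received["meta"].setdefault("dt", now_iso())  # server-side timestamp
    return received


event = build_client_event("story_share", "mediawiki.wikistories_contribution_event")
stored = receive_on_server(event)
```

After this, stored carries both timestamps, so analysis can compare client time (dt) against receipt time (meta.dt).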

@SBisson sounds good, other proposed changes look good to me.

@cchen The schema change is here: https://gerrit.wikimedia.org/r/1006060 Please review so it can eventually be merged and deployed. Thanks!

Change 1006060 merged by jenkins-bot:

[schemas/event/secondary@master] Fix Wikistories schemas

https://gerrit.wikimedia.org/r/1006060

Change 1006014 merged by jenkins-bot:

[mediawiki/extensions/Wikistories@master] Stop populating meta.dt so it get populated server-side

https://gerrit.wikimedia.org/r/1006014

In wikistories_consumption_event, referrer_type = "shared_link" if the story view or story impression is from a shared link

  • No events were found in the table using the following query, and the referrer_type column is NULL for all the events.
select * from event.mediawiki_wikistories_consumption_event where referrer_type = 'shared_link' and year = 2024 and month = 4
  • However, I was able to find the story views from shares in wmf.pageview_actor using the following query.
select cast(substr(dt,1,10) AS DATE), count(*) from wmf.pageview_actor
where year = 2024 and month = 4 AND uri_host = 'id.wikipedia.org' AND uri_query like '%action=storyview%'
group by cast(substr(dt,1,10) AS DATE)

In wikistories_consumption_event, referrer_type = "shared_link" if the story view or story impression is from a shared link

  • No events were found in the table using the following query, and the referrer_type column is NULL for all the events.

As far as I can tell, referrer_type was never populated by the instrumentation code, and this is acknowledged by a comment in the schema.

In this particular case, since we cannot know where the story is being shared (email, social media, etc.), referrer_type = "shared_link" would be redundant with event_type = "story_share", so we suggest using the latter for querying in both the contribution and consumption event tables.

Change #1028886 had a related patch set uploaded (by Sbisson; author: Sbisson):

[mediawiki/extensions/Wikistories@master] Fix name of story content js var so instrumentation can access it

https://gerrit.wikimedia.org/r/1028886

Change #1028886 merged by jenkins-bot:

[mediawiki/extensions/Wikistories@master] Fix name of story content js var so instrumentation can access it

https://gerrit.wikimedia.org/r/1028886

There is one criterion that has not been met:
The share event in wikistories_contribution_event (event_type = "story_share") is not tracked when users access the shared link.

We have decided to pause this work and close this ticket.