How are people finding campaigns now?
Closed, Resolved, Public

Event Timeline

Query:

query_vars['pv_page_title'] = tuple(('Event:' + selects['event_page_title'].astype(str)).unique().tolist())

event_referer = '''
SELECT
   page_title, SUM(view_count) AS pageviews, referer_class, referer_name, year, month
FROM wmf.pageview_hourly
WHERE
  year       = {year}              AND
  month      >= 1                  AND  -- January through June
  month      < 7                   AND
  agent_type = 'user'              AND  -- excludes spider/automated traffic
  project    = 'meta.wikimedia'    AND
  page_title IN {pv_page_title}
GROUP BY page_title, referer_class, referer_name, year, month
    '''

campaign_referers = spark.run(event_referer.format(**query_vars))
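# Remove the "Event:" namespace prefix. str.lstrip('Event:') would strip any leading
# E, v, e, n, t or : characters rather than the prefix, so match the prefix explicitly.
campaign_referers['page_title'] = campaign_referers['page_title'].str.replace(r'^Event:', '', regex=True)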
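Formatting a Python tuple straight into `IN {pv_page_title}` breaks in two cases: a one-element tuple renders as `('Event:X',)`, which is not valid SQL, and a title containing an apostrophe leaves an unbalanced quote. A minimal sketch of a hypothetical helper, `sql_in_list`, that avoids both, assuming Spark SQL's default backslash escaping inside string literals:

```python
def sql_in_list(values):
    """Render strings as a Spark SQL IN-list such as ('a', 'b').

    Hypothetical helper: de-duplicates while keeping first-seen order,
    escapes backslashes and single quotes with a backslash (assumed
    Spark SQL default), and never emits a trailing comma.
    """
    unique = list(dict.fromkeys(values))
    if not unique:
        raise ValueError("IN-list must contain at least one value")
    quoted = []
    for v in unique:
        escaped = str(v).replace("\\", "\\\\").replace("'", "\\'")
        quoted.append(f"'{escaped}'")
    return "(" + ", ".join(quoted) + ")"


# A one-element tuple would have rendered as "('Event:Solo',)".
print(sql_in_list(["Event:Solo"]))                       # ('Event:Solo')
print(sql_in_list(["Event:A", "Event:A", "Event:B's"]))  # ('Event:A', 'Event:B\'s')
```

It could then replace the tuple construction, e.g. `query_vars['pv_page_title'] = sql_in_list('Event:' + t for t in titles)`, where `titles` stands for the event page titles.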

pageview_hourly documentation: on DataHub, on Wikitech

referer_name = name of the referer when referer_class is external (search engine) or external (media sites)
referer_class = one of internal, external, external (search engine), external (media sites), none, or unknown
| referer_class | % of total pageviews |
| ----- | ----- |
| external | 1.1 |
| external (media sites) | 1.4 |
| external (search engine) | 2.0 |
| internal | 45.1 |
| none | 50.3 |

none = direct traffic (no referer was recorded)
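The percentages above come from summing pageviews per referer_class and dividing by the grand total. A minimal stdlib sketch, assuming the `event_referer` result has been converted to a list of dicts with `referer_class` and `pageviews` keys (with pandas this would be a `groupby`); the sample rows are made up:

```python
from collections import defaultdict


def share_by_referer_class(rows):
    """Percentage of total pageviews per referer_class, rounded to 0.1."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["referer_class"]] += row["pageviews"]
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {cls: round(100 * n / grand_total, 1) for cls, n in totals.items()}


# Toy rows with made-up numbers, not the campaign data.
rows = [
    {"referer_class": "internal", "pageviews": 30},
    {"referer_class": "none", "pageviews": 60},
    {"referer_class": "internal", "pageviews": 10},
]
print(share_by_referer_class(rows))  # {'internal': 40.0, 'none': 60.0}
```

Because each share is rounded independently, the column need not add up to exactly 100 (the table above sums to 99.9).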

Iflorez updated the task description.

Follow-up questions posed by team members:

  1. What is meant by internal? > Internal means that a reader arrived at the event page from another Wikimedia wiki page; this includes banner traffic.
    1. What is an example of this? > I don't have an example yet. I'll follow the conversation in this Slack channel to see whether I can pull examples, ideally a report on which pages or topics are driving traffic.
  2. If someone learns about an event on Twitter, for example, and follows a direct link to the event page from someone's tweet, is that traffic considered "social media" or "none"? > It would be logged as "external (media sites)".
  3. Note: automated bot traffic usually shows up as none (direct traffic). What are bots? Wikipedia's content is read by humans and also by code scripts with different levels of sophistication (often called bots). Identifying types of scripts, such as spiders, is crucial for distinguishing human traffic from automated traffic. The pageview_hourly queries filter on agent_type = 'user', which excludes traffic already classified as spider or automated, though undetected automation can remain. We do not currently believe that traffic to event pages is driven by automation, but it's important to keep this in mind as we learn more.

Aklapper renamed this task from "How are ppl finding campaigns now?" to "How are people finding campaigns now?" (Jul 13 2023, 8:32 AM).

Thank you, @Iflorez! One more question: What would traffic from CentralNotice banners fall under? Would that be direct links, links from one wiki to another, or something else?

Banner (CentralNotice) traffic comes in as referer_class = internal.

Facebook, Instagram, and Twitter referers are classed as referer_class = external (media sites).
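To make the categories concrete, here is a deliberately simplified classifier that maps a raw referer URL to the referer_class labels discussed above. It is an illustration only, not the production logic that populates the pageview tables: the domain lists (`WIKIMEDIA_DOMAINS`, `SEARCH_ENGINES`, `MEDIA_SITES`) are hypothetical and far shorter than the real ones, and the unknown class is omitted.

```python
from urllib.parse import urlsplit

# Hypothetical, abbreviated domain lists for illustration only.
WIKIMEDIA_DOMAINS = ("wikipedia.org", "wikimedia.org", "wikidata.org")
SEARCH_ENGINES = {"google.com", "bing.com", "duckduckgo.com"}
MEDIA_SITES = {"facebook.com", "m.facebook.com", "instagram.com", "twitter.com", "t.co"}


def classify_referer(referer):
    """Map a raw referer string to a simplified referer_class label."""
    if not referer or referer == "-":
        return "none"  # nothing recorded: usually read as direct traffic
    host = (urlsplit(referer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if any(host == d or host.endswith("." + d) for d in WIKIMEDIA_DOMAINS):
        return "internal"  # another Wikimedia page, including banner clicks
    if host in SEARCH_ENGINES:
        return "external (search engine)"
    if host in MEDIA_SITES:
        return "external (media sites)"
    return "external"


for ref in ["https://meta.wikimedia.org/wiki/Main_Page",
            "https://www.google.com/",
            "https://t.co/abc123",
            "https://linktr.ee/someone",
            "-"]:
    print(ref, "->", classify_referer(ref))
```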

I've created a new task, T342155, for questions about specific traffic tagging or traffic types.

Queries:

event_referer = '''
SELECT
   -- hour bucket formatted as YYYY-MM-DD-HH
   CONCAT(CAST(year AS STRING), '-',
          LPAD(CAST(month AS STRING), 2, '0'), '-',
          LPAD(CAST(day AS STRING), 2, '0'), '-',
          LPAD(CAST(hour AS STRING), 2, '0')
          ) AS timestamp,
   page_title,
   page_id,
   -- hourly pageviews per page/referer; view_count is summed rather than grouped on,
   -- because pageview_hourly splits each hour across other dimensions
   -- (access method, country, user agent, ...)
   SUM(view_count) AS pageviews,
   referer_class,
   referer_name
FROM wmf.pageview_hourly
WHERE
  year       = {year}              AND
  month      >= 1                  AND
  month      < 7                   AND
  agent_type = 'user'              AND
  project    = 'meta.wikimedia'    AND
  page_title IN {pv_page_title}
GROUP BY page_title, page_id, referer_class, referer_name, year, month, day, hour
    '''
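The `timestamp` column built with `CONCAT` and `LPAD` is a zero-padded `YYYY-MM-DD-HH` string, so it sorts chronologically as text and parses with a single `strptime` format. A small sketch rolling hourly rows up to daily totals, assuming the result has been converted to dicts with `timestamp` and `pageviews` keys; the sample rows are made up:

```python
from collections import Counter
from datetime import datetime


def daily_pageviews(rows):
    """Roll hourly rows up to calendar days (keys are ISO dates)."""
    per_day = Counter()
    for row in rows:
        hour = datetime.strptime(row["timestamp"], "%Y-%m-%d-%H")
        per_day[hour.date().isoformat()] += row["pageviews"]
    return dict(sorted(per_day.items()))


rows = [
    {"timestamp": "2023-03-01-00", "pageviews": 5},
    {"timestamp": "2023-03-01-23", "pageviews": 2},
    {"timestamp": "2023-03-02-07", "pageviews": 4},
]
print(daily_pageviews(rows))  # {'2023-03-01': 7, '2023-03-02': 4}
```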
event_pva = '''
SELECT
   actor_signature,
   CONCAT(CAST(year AS STRING), '-',
          LPAD(CAST(month AS STRING), 2, '0'), '-',
          LPAD(CAST(day AS STRING), 2, '0'), '-',
          LPAD(CAST(hour AS STRING), 2, '0')
          ) AS timestamp,
   page_id,
   referer_class,
   referer,
   uri_host,
   uri_query
FROM wmf.pageview_actor pva
WHERE
  year         = {year}                  AND
  -- three-month window; "month = {month_int}" alone made the lower bound redundant.
  -- Assumes the window does not cross a year boundary.
  month        BETWEEN {month_int_less_two} AND {month_int} AND
  -- desktop host only; mobile views arrive on meta.m.wikimedia.org
  pva.uri_host = 'meta.wikimedia.org'    AND
  page_id IN {pv_page_id}
-- no aggregate: GROUP BY returns one row per distinct combination
GROUP BY
   page_id,
   pva.referer,
   referer_class,
   pva.uri_host,
   pva.uri_query,
   actor_signature,
   year, month, day, hour
    '''
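Because pageview_actor keeps only about 90 days of data, the query targets roughly the current month and the two before it. Filtering on a single year with a month range breaks when that window crosses New Year (month_int = 1 would give a lower bound of -1). A sketch of two hypothetical helpers, `last_three_months` and `partition_filter`, that build explicit (year, month) pairs and a matching partition filter instead:

```python
from datetime import date


def last_three_months(today):
    """Return (year, month) pairs for today's month and the two before it."""
    pairs = []
    y, m = today.year, today.month
    for _ in range(3):
        pairs.append((y, m))
        m -= 1
        if m == 0:  # step back across New Year
            y, m = y - 1, 12
    return sorted(pairs)


def partition_filter(pairs):
    """Build a WHERE fragment selecting exactly these (year, month) partitions."""
    clauses = [f"(year = {y} AND month = {m})" for y, m in pairs]
    return "(" + " OR ".join(clauses) + ")"


pairs = last_three_months(date(2023, 2, 1))
print(pairs)                    # [(2022, 12), (2023, 1), (2023, 2)]
print(partition_filter(pairs))
```

Three calendar months can span up to 92 days, so the earliest days of the oldest month may already have been deleted; counts for that month can be partial.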

Data source: pageview_actor table.
Data source notes: The wmf.pageview_actor table (available on Hive) contains webrequest data filtered to keep only pageviews and redirects to pageviews. It keeps most dimensions from webrequest, has an updated agent_type value flagging traffic estimated to be automated, and offers the actor_signature field, which facilitates in-project session fingerprinting. It is stored in the Parquet columnar file format and partitioned by (year, month, day, hour). As with webrequest, the data is deleted after 90 days.
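Since actor_signature approximates one reader within a project (a fingerprint, not an account), counting distinct signatures gives a rough "unique readers" figure alongside raw pageviews. A stdlib sketch over rows returned by `event_pva`, assuming they have been converted to dicts; the helper name and sample signatures are made up:

```python
from collections import defaultdict


def distinct_actors(rows, key="referer_class"):
    """Count distinct actor_signature values per value of `key`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row["actor_signature"])
    return {k: len(v) for k, v in sorted(seen.items())}


rows = [
    {"actor_signature": "a1", "referer_class": "internal"},
    {"actor_signature": "a1", "referer_class": "internal"},
    {"actor_signature": "b2", "referer_class": "internal"},
    {"actor_signature": "c3", "referer_class": "none"},
]
print(distinct_actors(rows))  # {'internal': 2, 'none': 1}
```

A reader who arrives through two different referer classes is counted once in each, so the per-class counts can add up to more than the total number of distinct readers.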

External (media sites) sources:

  1. Facebook
  2. Instagram
  3. Twitter

External pageview sources:

  1. mail.google.com
  2. google.com
  3. bing.com
  4. lens.google.com
  5. linktr.ee
  6. web.telegram.org
  7. startpage.com
  8. qr.page
  9. guc.toolforge.org
  10. docs.google.com
  11. tinyurl.com
  12. eventdata.crossref.org
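A host list like the one above can be produced from the `referer` column of `event_pva` by reducing each URL to its hostname and counting rows. A sketch with a hypothetical `top_referer_hosts` helper, assuming the referers are available as a list of strings (the sample values are illustrative, not taken from the data):

```python
from collections import Counter
from urllib.parse import urlsplit


def top_referer_hosts(referers, n=10):
    """Rank referer hostnames by row count, ignoring a leading 'www.'."""
    hosts = Counter()
    for ref in referers:
        host = urlsplit(ref).hostname  # None for '-', empty or host-less values
        if host:
            hosts[host[4:] if host.startswith("www.") else host] += 1
    return hosts.most_common(n)


sample = [
    "https://www.google.com/",
    "https://mail.google.com/mail/u/0/",
    "https://google.com/search?q=x",
    "-",
]
print(top_referer_hosts(sample))  # [('google.com', 2), ('mail.google.com', 1)]
```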

Internal pageview sources:

  1. https://meta.wikimedia.org/wiki/Main_Page/fr
  2. https://meta.wikimedia.org/wiki/Main_Page
  3. https://meta.wikimedia.org
  4. Meta search page
  5. https://commons.wikimedia.org/
  6. https://meta.wikimedia.org/wiki/WikiForHumanRights/Join_Community_Events
  7. Other similar event pages
  8. https://meta.wikimedia.org/wiki/Accueil
  9. https://pt.wikipedia.org/
  10. https://meta.wikimedia.org/wiki/Special:PageTranslation
  11. User group page
  12. Event talk page
  13. Meta Special:All Event Pages
  14. https://fr.wikipedia.org/
  15. https://meta.m.wikimedia.org/
  16. https://meta.wikimedia.org/wiki/Wiki_Kilti_Ayiti_2023
  17. User pages
  18. https://en.wikipedia.org/
  19. https://meta.wikimedia.org/wiki/Special:GlobalPreferences
  20. https://meta.wikimedia.org/wiki/WikiLinguila
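Some entries above are full referer URLs while others are page types (user pages, talk pages). Grouping internal referers into such types starts with splitting each URL into its host and wiki page title. A sketch of a hypothetical helper, `wiki_page_from_referer`, that handles only `/wiki/<Title>` paths; `index.php` URLs and bare domains return no title:

```python
from urllib.parse import unquote, urlsplit


def wiki_page_from_referer(referer):
    """Return (host, page title) for a /wiki/ URL, or (host, None) otherwise."""
    parts = urlsplit(referer)
    if parts.path.startswith("/wiki/"):
        # Percent-decode so titles like Sp%C3%A9cial:... become readable.
        return parts.hostname, unquote(parts.path[len("/wiki/"):])
    return parts.hostname, None


print(wiki_page_from_referer("https://meta.wikimedia.org/wiki/Main_Page/fr"))
print(wiki_page_from_referer("https://pt.wikipedia.org/"))
```

From the title, a prefix check (for example `User:` or `Talk:`) could then bucket referers into page types.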

See the pageview actor query, adapted for simplicity, in action on Superset.