I'm not sure how I feel about this yet, but I wanted to start a conversation/task about potentially adding a field to pageview_actor that records whether a particular actor signature showed evidence of editing activity in its session.
Why:
- In research, for privacy reasons, we often filter out editors from reader session datasets given that their pageview history can be partially reconstructed from edit history -- e.g., the covid dataset. While the logged-in parameter can help with this, that's an imperfect proxy (you can have an account but not edit; you can edit but not have an account).
- The clause for determining "did edit" depends on webrequests not marked as pageviews, so the data is in the webrequests table but not in the pageview_actor table.
- Precomputing this in pageview_actor would also help to normalize this pattern and make it more accessible to folks working with session data
- While surfacing it as a field makes it easier to identify editors in the data, I don't see this as a major privacy concern given that pageview_actor also has the same 90-day limit and access restrictions as webrequests
Potential reasons not to do it:
- People who want this filter can always go back to working with webrequests instead.
The logic that we currently use for this is:
(uri_query LIKE '%action=edit%') # desktop wikitext editor
(uri_query LIKE '%action=visualeditor%') # desktop and mobile visualeditor
(uri_query LIKE '%&intestactions=edit&intestactionsdetail=full&uiprop=options%') # mobile wikitext editor
Notes:
- These clauses have to sweep through non-pageview requests, which is why this has to be done during the creation of pageview_actor rather than ad hoc afterwards.
- I haven't checked these clauses recently but hopefully they are still correct :) Nothing prevents changes to the API calls that would break these though...
- We don't have a clause for the apps because we traditionally leave them out of research datasets. As the apps grow in popularity, though, this will become less acceptable, and we'll want to figure out how to include them in these clauses.
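To make the idea concrete, here is a rough Python sketch of how the three clauses above could be rolled up into a per-actor flag. It is only an illustration, not the actual pageview_actor job: the row shape (actor_signature, uri_query, is_pageview) and the function names are assumptions I made up for the example, and each LIKE '%...%' is mirrored as a plain case-sensitive substring check.

```python
from collections import defaultdict

# Substrings copied from the LIKE clauses above. LIKE '%x%' is treated here
# as "x occurs anywhere in uri_query" (none of these contain LIKE wildcards).
EDIT_MARKERS = (
    "action=edit",  # desktop wikitext editor
    "action=visualeditor",  # desktop and mobile visualeditor
    "&intestactions=edit&intestactionsdetail=full&uiprop=options",  # mobile wikitext editor
)


def is_edit_request(uri_query):
    """Return True if uri_query matches any of the edit clauses."""
    return any(marker in uri_query for marker in EDIT_MARKERS)


def flag_editing_actors(requests):
    """Map each actor signature to whether any of its requests looked like an edit.

    `requests` is an iterable of (actor_signature, uri_query, is_pageview)
    tuples. This row shape is hypothetical. Non-pageview rows must be
    included, since the edit requests are not marked as pageviews.
    """
    flags = defaultdict(bool)
    for actor_signature, uri_query, _is_pageview in requests:
        # Treat a missing uri_query as an empty string so it never matches.
        flags[actor_signature] = flags[actor_signature] or is_edit_request(uri_query or "")
    return dict(flags)
```

The point of the sketch is that the flag has to be computed over all of an actor's requests, including the non-pageview ones, and then attached to the actor, which is why it can't be derived from pageview rows alone.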