We want to tag the following types of Wikipedia Preview webrequests in the logs, so they can be counted by an Oozie job:
- API requests for preview content, triggered when a user hovers or taps on a Wikipedia Preview link
- Regular requests for a Wikipedia article, triggered when a user clicks the link in the preview
The tag must not be applied to other types of webrequests generated by Wikipedia Preview. Ideally, both types of requests would carry the same tag, but this isn't strictly necessary. There are two ways to add the tag:
- Adding a value to the X-Analytics header, whose key-value pairs are formatted as in this example: ns=-1;special=Userlogin;WMF-Last-Access=09-May-2017;WMF-Last-Access-Global=09-May-2017;https=1. Webrequest parses these pairs into a map field, so they would be easy to access. However, we cannot change the headers of requests generated by following a link, so this option would not cover the article requests.
- Adding a query parameter to the URL. This is harder to access: webrequest copies the query portion of the URL into a dedicated field but doesn't parse it. Still, this is the preferred option, since a query parameter can be added to both API URLs and article links. We will try it first and reconsider if it doesn't work.
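A minimal Python sketch of the two tagging options, assuming a hypothetical tag key wp_preview with value 1 (the actual key and value have not been chosen). add_tag appends the tag to an API URL or article link without re-encoding existing parameters, is_tagged checks a raw, unparsed query string of the kind webrequest stores, and parse_x_analytics splits a header like the example in the first option into a dictionary, roughly as the map field does. All names here are illustrative, not part of webrequest or Wikipedia Preview.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode, parse_qsl

# Hypothetical tag: the real key and value have not been decided.
TAG_KEY = "wp_preview"
TAG_VALUE = "1"


def add_tag(url: str) -> str:
    """Append the hypothetical tag to a URL, keeping existing parameters and any fragment."""
    parts = urlsplit(url)
    tag = urlencode({TAG_KEY: TAG_VALUE})
    # Append rather than rebuild the query, so existing parameters keep their encoding.
    query = f"{parts.query}&{tag}" if parts.query else tag
    return urlunsplit(parts._replace(query=query))


def is_tagged(raw_query: str) -> bool:
    """Check a raw query string (stored unparsed) for the hypothetical tag."""
    # Strip a leading '?' in case the stored query keeps it.
    params = parse_qsl(raw_query.lstrip("?"), keep_blank_values=True)
    return (TAG_KEY, TAG_VALUE) in params


def parse_x_analytics(header: str) -> dict:
    """Split an X-Analytics header of the form k1=v1;k2=v2 into a dict."""
    result = {}
    for item in header.split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip empty or malformed items without '='
            result[key] = value
    return result


if __name__ == "__main__":
    print(add_tag("https://en.wikipedia.org/wiki/Cat#History"))
    print(is_tagged("?action=view&wp_preview=1"))
```

Because the query field is stored unparsed, the Oozie job would need an equivalent string match or parsing step to count tagged requests, whereas an X-Analytics tag could be read directly from the map field.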
We also need to detect which partner site a request comes from. The options are:
- Relying on the referrer field. This is set automatically, but it would cause difficulties if a partner has content on multiple domains or subdomains. The referrer field isn't parsed into components, so the domain would have to be extracted from the full URL. This is the preferred option. We will try it first and reconsider if it doesn't work.
- Using the tag (whether in X-Analytics or the query string) to indicate the partner. This requires a little manual configuration for each partner, but would produce cleaner data.
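A Python sketch of both partner-detection options, under stated assumptions. PARTNER_DOMAINS is a hypothetical, manually maintained registry; matching a listed domain also accepts its subdomains, which is one way to handle partners whose content spans several hosts. partner_from_tag assumes, hypothetically, that the tag's value is the partner name (for example wp_preview=example-partner). None of these names come from an existing system.

```python
from typing import Optional
from urllib.parse import urlsplit, parse_qsl

# Hypothetical registry: each partner lists the domains it publishes on.
# A listed domain also matches all of its subdomains.
PARTNER_DOMAINS = {
    "example-partner": ["example.com", "example.org"],
    "other-partner": ["news.example.net"],
}

# Hypothetical tag key whose value would carry the partner name.
TAG_KEY = "wp_preview"


def partner_from_referrer(referrer: str) -> Optional[str]:
    """Map a raw referrer URL to a partner name, or None if no partner matches."""
    host = (urlsplit(referrer).hostname or "").lower()
    for partner, domains in PARTNER_DOMAINS.items():
        for domain in domains:
            # Exact match, or a subdomain such as www.example.com.
            if host == domain or host.endswith("." + domain):
                return partner
    return None


def partner_from_tag(raw_query: str) -> Optional[str]:
    """Read the partner name from the hypothetical tag in a raw query string."""
    params = dict(parse_qsl(raw_query.lstrip("?")))
    return params.get(TAG_KEY)


if __name__ == "__main__":
    print(partner_from_referrer("https://www.example.com/articles/1"))
    print(partner_from_tag("?wp_preview=example-partner"))
```

In this sketch the referrer version needs the registry only to group several domains under one partner, while the tag version moves that configuration into each partner's embed instead.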