Please provide all the following information:
- Context.
For FY 25-26, WE5 (Responsible Use of Infrastructure) is being supported from Product Analytics by me (@KCVelaga_WMF). Apart from specific analyses, one of the key questions we are trying to answer as part of analytics support is what can currently be measured, what can't be, and to what extent we can reasonably measure or track with the tooling currently available. As an example, at the moment we are working to segment API traffic into various potential cohorts using a mix of clustering and heuristics. For most measurements in this area, the typical analytics instrumentation we have for product metrics is not available. Instrumentation does exist (mostly maintained by SRE), but the majority of the logs are folded into the webrequest table (for example, through the X-Analytics header) rather than exposed as, say, an event stream that can be easily queried for specific actions taken by users. This means the webrequest table will be the primary data source for most analytics needs within the objective area for the time being.
In my conversations with @fkaelin during the last few weeks, I realized that he (along with previous work with Muniza) has developed some guides, explorations, and reusable functions for working with webrequest data. Discussing lessons learned from working with the webrequest table has also been extremely helpful in speeding up my work. Examples include this notebook about creating intermediate datasets to speed up queries, an exploration of entropy metrics, and various functions to break down the URI query parts. All of this has been quite helpful, as I didn't need to spend a significant amount of time figuring things out from scratch.
- Description.
I have been informally talking to Fabian, but it would be great to have more formal support. The request is for consultation during this quarter and the next, as I work through the analysis needs of the objective area. As I will be working with webrequest logs for the most part, having this consultation support would help me to 1) avoid spending time rediscovering what is already known to work and what doesn't, 2) reuse existing code where available and apply it well, and 3) think through the application of webrequest data in general and how it can be improved or built upon for work within WE5 and also some things in SDS.
The ask is for consultation on a regular basis, plus some pair programming sessions. My best guess is that this would take at most 8-10 hours per month on average, but that may change if more involvement is needed, in which case we can discuss and update the request. The goal of the request is also to have visibility and documentation for the support.
- Expected Deliverable.
Mostly what I mentioned above. The result would be faster analytics support for WE5 (hard to quantify, I guess), and eventually, across analyses, I can share how these consults and the previous work done in this area have been helpful to build upon.
- Estimated Effort.
2-3 hrs/week at most, and usually less than that.
- Priority. Please indicate the priority of your task and give a short description of what it would unlock for you. We ask you to leave this task as “needs triage” since your request will go through a Backlog refinement process where our team will prioritize the work.
I need this task resolved in:
- 1 month.
- 3 months.
- 6 months.
- Whenever you get to it :-)
- Other. Do you have any other questions or comments?
For use by WMF Research team; please leave everything below as it is:
- Does the request serve one of the existing Research team's audiences? If yes, choose the primary audience. (1 of 4)
- What is the type of work requested?
- What is the impact of responding to this request?
- Support a technology or policy need of one or more WM projects
- Advance the understanding of the WM projects.
- Something else. If you choose this option, please explain briefly the impact below.
Tasks needing support
This section will be updated as needs emerge.
- Scaling Semantic URI generation API calls (from webrequests)
- Scaling IP and UA segmentation