
[Hypothesis] 5.3.3: Attribution API
Open, HighPublic

Description

If we provide content reusers with canonical, low-friction retrieval paths, we will lower the effort required to adopt trust signals from Wikimedia’s Attribution Framework, avoid the standardization of suboptimal retrieval paths, and unlock our ability to reliably measure attribution adoption.

Background

WE 5.3 focuses on creating an Attribution Framework. The framework is designed to: a) Ensure that content reusers are appropriately meeting the requirements set by Wikimedia content licenses, b) Surface Wikimedia project brands to promote content trust and continually build brand awareness across distributed ecosystems, and c) Amplify signals that may motivate users to engage with Wikimedia's mission directly as a reader, contributor, or donor.

The framework website is currently in an alpha stage, where it is being shared with specific partners for feedback. However, the page itself is publicly accessible and could be discovered by additional audiences. Additionally, the current approach faces the following challenges (in rough priority order):
Measurement: Overloading existing endpoints prevents us from differentiating attribution-specific usage from other use cases. Dedicated endpoints will enable us to accurately track attribution adoption metrics. Shared partner data may cover part of this gap, but we would also like to know who within the community is attributing.
Performance & Cacheability: Some data points (reference count, contributor count) are expensive to calculate on the fly, especially via the Action API. While it’s nice that we offer multiple options for getting the data, this is an instance where more opinionated paths may reduce unexpected consequences and user confusion in the long run. Dedicated, RESTful endpoints could offer responses that are well-formatted and responsive, and unlikely to impact our infrastructure negatively because we can easily cache the data. Two particularly risky examples in the current model are:

  • Contributor count on active pages (like Donald Trump) could require 100+ Action API requests to tally all editors, including calls to multiple actions to get both anonymous and registered editors.
  • Reference counts rely on parsing rendered HTML (including in REST), risking expensive reparses, cache pollution, and brittleness when things like templates change.
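To make the contributor-count cost concrete, the sketch below shows the pagination loop a client must run today. The continuation shape follows the Action API's `prop=contributors` model; the `fetch` callable stands in for an HTTP request to `/w/api.php`, and the helper name is illustrative, not an existing library function.

```python
# Sketch of the client-side work needed today to tally contributors via
# the Action API's prop=contributors continuation model. `fetch` stands in
# for an HTTP call to /w/api.php; the function name is illustrative.

def count_contributors(fetch, title):
    """Tally registered plus anonymous contributors for one page.

    Each request returns at most one batch of registered contributors,
    so heavily edited pages can require a very large number of round trips.
    """
    params = {
        "action": "query",
        "prop": "contributors",
        "titles": title,
        "pclimit": "max",
        "format": "json",
    }
    registered = 0
    anonymous = 0
    while True:
        data = fetch(params)
        page = next(iter(data["query"]["pages"].values()))
        registered += len(page.get("contributors", []))
        # anoncontributors is reported as a count, not a list
        anonymous = page.get("anoncontributors", anonymous)
        if "continue" not in data:
            break
        # merge continuation tokens (e.g. pccontinue) into the next request
        params = {**params, **data["continue"]}
    return registered + anonymous
```

A dedicated attribution endpoint could serve this number precomputed and cached, collapsing the loop above into a single request.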

Adoption Friction: The harder we make it, the less likely people are to actually adopt the standards. Requiring substantial client-side parsing (HTML, URL) across multiple data points is probably enough to drive some reusers away.
Implementation Stickiness: Once developers implement a "good enough" solution, migrating them to something different, even if it’s better for everyone, becomes difficult. Third-party reusers in particular will not want to reinvest in the same solution multiple times.
Reputation: While certainly far from the top priority, a common product sentiment is “Fix anything that makes us look bad.” For some of these signals, the current experience frankly makes us look bad.

To ensure that developers do not erroneously invest in non-recommended workflows and can more meaningfully engage with the recommendations, we need to move quickly to provide a more user-friendly and cohesive experience.

Problem statement

Although many of the identified attribution requirements and trust signals are technically available through existing APIs, the developer experience of retrieving them is far from perfect. The current workflows put too much responsibility on clients to work across many endpoints and make implementation judgement calls to stitch together what is ultimately required. Additionally, leveraging existing endpoints prevents us from understanding who is engaging with and adhering to our framework recommendations: those endpoints are already used for other purposes, so we cannot explicitly differentiate attribution usage.

Impacted users
The Attribution Framework asks content reusers both within and outside of the mission to appropriately attribute the reused content, while also ensuring that their users know that the content originated on Wikimedia projects. While the main use case was motivated by third-party reuse beyond the mission (eg: AI chatbots, Search summary results), the requirements still apply to Wikimedia community members who are reusing content off platform, such as in educational games, or even sharing content within their social networks.

Scope

The primary scope of this project is to create user-friendly and observable endpoint(s) that can be used to implement the recommendations surfaced through the Attribution Framework. The final outcome of this work does not necessarily have to be the final version of the endpoint(s), but should instead act as a means of promoting use and feedback so that we may validate attribution workflows and learn how to continually improve them.
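As a strawman for discussion, a dedicated endpoint could bundle the core signals into one cacheable response. Everything below is hypothetical: neither the route nor the field names are a committed contract, and the values are invented for illustration.

```python
# Illustrative response shape only: the route and fields are NOT final.
# A single GET could replace several Action/REST calls plus client-side parsing.
# Hypothetical route: GET /w/rest.php/v1/page/{title}/attribution

EXAMPLE_ATTRIBUTION_RESPONSE = {
    "title": "Example",
    "project": "en.wikipedia.org",
    "canonical_url": "https://en.wikipedia.org/wiki/Example",
    "license": {
        "name": "CC BY-SA 4.0",
        "url": "https://creativecommons.org/licenses/by-sa/4.0/",
    },
    "contributor_count": 1500,   # precomputed and cached server-side
    "reference_count": 90,       # no client-side HTML parsing required
    "last_modified": "2025-11-01T12:00:00Z",
}
```

The point of the sketch is the shape: every signal a reuser needs for attribution arrives in one well-formatted, cache-friendly response.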

Milestones/Epics:

  • Release MVP that returns basic attribution information and can be used for initial partner feedback.
  • Expand MVP to include all requested fields and required logic.
    • License information & coverage for media files
    • Author information returned on media files
  • Implement reference count
  • Improve contributor count
  • Refine cache logic/support
  • Complete OpenAPI spec that meets all API Platform requirements (eg: Linting, test coverage, monitoring, sandbox visibility)
  • Research and make recommendation for best way to host & return Wikimedia brand marks
  • Update attribution framework documentation to reference new API workflows
  • [Stretch] Update CTA logic
  • [Stretch] Implement trending indicator

Out of Scope

  • Deprecating existing endpoints that may offer similar capabilities. Technical investigation should surface whether this can and should be considered for future work, and whether there is an opportunity to have the REST interface flow through existing backend code paths; however, the intention is to create dedicated REST endpoints for fetching attribution data.
  • The scope of work this quarter focuses on releasing a beta endpoint/API module. We may elevate the work to an official v1 this quarter, but will likely postpone until a future quarter so that we may 1) Gather and respond to feedback from partners and community members who try using the endpoint, and 2) Ideally launch this API in a way that reflects canonical routing and/or other standardized API structures (eg: error messages).
  • Enterprise use cases are outside of scope for this work — we assume that Enterprise will create similar workflows for their users to utilize (some of which may ultimately rely on our endpoint and/or related event stream data). We also do not expect community members to create Enterprise accounts simply for leveraging attribution information.
  • Monitoring downstream actions. This API will return the attribution information, but will not handle keeping track of users who actually return to our site as a result of the presented information. There is a separate workstream to accomplish that, and we may commit to future items to support their desired workflows.

Expected Outcomes

The main goal of this work is to make it as easy as possible for external developers to appropriately attribute Wikimedia content. 

  • Increased adoption: The easier we make it, the more likely partners and community members are to actually adopt the recommended signals. Particularly in our situation, where the content is freely available through open licensing, we are ultimately asking developers to follow these standards out of a sense of goodwill. Removing friction will increase the chances that partners and community tools adopt all signals, instead of ignoring them or cherry-picking only the most straightforward or licence-required ones.
  • Adoption observability: By creating dedicated endpoints, we can more directly observe and monitor adoption. The current approach of offering a smattering of existing solutions will make it difficult to differentiate who is using data for attribution purposes versus those who are using it for other purposes. Additionally, dedicated endpoints will more easily support future enhancements like native conversion tracking.
  • Flexibility through abstraction: By creating an API contract that is abstracted from the backend, we are able to maintain a consistent interface for callers while enabling flexibility on the backend to change how we calculate certain values. For example, we may want to change how contributors are tallied over time, with outstanding decisions for whether we include bots as editors, reverted changes, and other nuanced interpretations of what ‘contributor’ means.
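The abstraction described in that bullet can be sketched as a stable response field backed by a swappable counting policy. All names here are hypothetical; the point is that the contract (one integer) survives changes to how bots and reverted edits are treated.

```python
# Sketch: a stable API field (`contributor_count`) whose backend policy can
# change without breaking consumers. All names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class ContributorPolicy:
    """Backend-side knobs; the API contract only ever exposes one number."""
    include_bots: bool = False
    include_reverted: bool = False

def contributor_count(edits, policy=ContributorPolicy()):
    """Return the value served as `contributor_count` in the API response.

    `edits` is an iterable of dicts like
    {"user": str, "bot": bool, "reverted": bool}.
    Changing the policy changes the tally; the contract stays the same.
    """
    users = set()
    for e in edits:
        if e["bot"] and not policy.include_bots:
            continue
        if e["reverted"] and not policy.include_reverted:
            continue
        users.add(e["user"])
    return len(users)
```

For example, the same edit history yields different counts under different policies, while callers always read one `contributor_count` field.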
  • Improved developer experience: The current experience relies heavily on client-side parsing and business logic to get to the final signals. That makes it difficult for external developers to implement and ultimately maintain, particularly in cases where they are expected to parse HTML or rely on non-use case specific endpoints that may change over time. A dedicated API will allow us to take a more direct and opinionated approach to the signals themselves, while also making it easier for content reusers to adopt them.
  • Partner feedback: Rapid delivery of initial endpoints through a ‘beta’ API module will allow partners to engage more quickly with the recommendations, which will lead to faster and more meaningful feedback loops. This feedback will allow us to improve both the API and the attribution framework itself, such as refining the definitions of specific signals.

Known risks & limitations

Signal clarity: Some signals do not have clear enough definitions for implementation. The attribution framework is new, and some signals (eg: Trending; CTAs) are still aspirational in nature, or are high in priority but loosely defined technically. Additional refinement is required from both 5.3 leadership and engineering.

  • Mitigation: We plan to release the initial version of the API as a beta module, so that partners may interact with it and give us feedback to improve the experience. We intend to make the API contract as stable as possible, yet provide enough abstraction that we can change how specific signals are calculated without impacting consumers and end users.

Wikimedia project customizations: Every Wikimedia project has custom configurations; in some cases, these customizations go to extreme measures and fundamentally change how and where certain data points are stored. For example, the notion of an Author for an original work is structured and stored very differently between Commons and Wikisource.

  • Mitigation: We will prioritize delivering data for Wikipedia first. Looking beyond Wikipedia, we will perform additional feasibility studies and propose technical designs where needed. 

Balancing rapid delivery with building it “right”: We need to move quickly with this rollout so that we neither block the broader delivery of the Attribution Framework nor introduce undue confusion for the tech partners participating in initial discovery.

Related Work

KR 5.3 and related hypotheses.
Enterprise is creating similar workflows for Enterprise users.

Dependencies

  • Some signals do not yet have a sufficient definition of intent or structure to guide technical implementation. We will work with the WE 5.3 leaders and respond to additional requirements or feedback as necessary.
  • We will likely need support from DPE for implementing the 'trending' indicator.
  • Fundraising should guide (and ultimately be responsible for) what is returned in the donation CTA.
  • This work requires beta module definitions to be completed and available.

Next steps

After releasing the beta module and receiving feedback from partner developers, we will upgrade the endpoint to a production version. Ideally, this will coincide with V2 pathing patterns so that we do not need to go through a deprecation cycle beyond the beta upgrade.

Deadline

The deadline for the initial MVP of this work is the end of January 2026. All key signals, with acceptable logic, should be included by the end of the quarter.

Related Objects

(Subtask table omitted; task titles did not survive extraction. Statuses include Open, Resolved, Declined, and In Progress items assigned to pmiazga, Mooeypoo, Atieno, and AGhirelli-WMF.)

Event Timeline

HCoplin-WMF added projects: OKR-Work, Epic.
HCoplin-WMF updated the task description.

A quick note about the trending indicator: in T409601, DPE onboarded a data pipeline for WME that classifies articles as trending according to (I think) this methodology. You can see a sample query here. Naturally, your definition of trending doesn't have to agree with that, but this could be a starting point.