
[Epic] Bring Test Kitchen Kotlin SDK to functional parity with JS + PHP
Open, Medium, Public

Description

Derived from T412028: [SPIKE] Scope bringing Kotlin SDK up to date and T401023: [XL] xLab Client Library: Convert to Kotlin and bring into app repo.

Work related to FY25-26 SDS2.4 User Adoption - Android and the Test Kitchen Kotlin SDK

Objective/Hypothesis

If we add an app-install enrollment authority to the Kotlin SDK and provide corresponding support for it in Test Kitchen, then the Android team can run experiments using Test Kitchen, which will enable them to configure instruments/experiments more efficiently and to access automated analytics.

How does this objective/hypothesis relate to organizational goals?

This hypothesis supports the Product and Technology department's 2025-2026 fiscal year annual plan, specifically SDS2 Objective KR2.4, which states:

SDS Objective:

Product managers can quickly, easily, and confidently evaluate the impacts of product features on Wikipedia.

KR:

At least 14 out of 20 product teams have used Test Kitchen to inform a strategic decision for an OKR initiative, by the end of Q4.

Description

The Test Kitchen Kotlin SDK (Android) currently functions as a core event submission client following the Java-to-Kotlin refactor in T401023: [XL] xLab Client Library: Convert to Kotlin and bring into app repo. While it supports event submission, configuration, context, and sampling, it does not yet provide experiment- or instrument-shaped abstractions, nor does it match the public API exposed by the Test Kitchen JavaScript and PHP SDKs.

This epic defines the work required to bring the Kotlin SDK to functional parity with JS + PHP SDKs, aligned with current Test Kitchen platform requirements, while explicitly deferring exposure logging and certain product-level decisions as follow-up work.

Goals

The Kotlin SDK should:

  • Expose the same core conceptual building blocks as the JS and PHP SDKs
  • Provide experiment-shaped and instrument-shaped abstractions that produce correctly-shaped events
  • Allow the Android app to:
    • construct valid events
    • attach required contextual attributes
    • submit events reliably to EventGate
  • Fetch and interpret instrument and experiment configuration from the Test Kitchen UI
  • Support deterministic enrollment and assignment via a clearly defined Enrollment Authority
  • Be structured so that future platform requirements (e.g., exposure logging) can be added without re-architecting the SDK

Out of Scope

The following are intentionally not included and will be addressed in follow-up work:

Public SDK

All higher-level abstractions should be developed to provide a minimal, stable public API that allows callers to:

  • Initialize and configure the SDK (endpoints, environment, identifiers, state)
  • Construct a valid event (action, payload, schema/stream)
  • Attach required context (platform, app, version, instrument/experiment config)
  • Submit events deterministically (buffering, batching, retry, drop, flush)
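The submission requirement above (buffering, batching, retry, drop, flush) can be made concrete with a small Kotlin sketch. It assumes a bounded FIFO buffer of already-serialized events and a sender that reports whether a batch was accepted; `BatchSender`, `EventBuffer`, and the default limits (128 buffered events, batches of 16, 3 attempts) are hypothetical, not decided SDK API.

```kotlin
// Hypothetical sketch: BatchSender, EventBuffer and the default limits are
// illustrative assumptions, not part of any existing Test Kitchen SDK.

// Returns true when the receiver (e.g. EventGate) accepted the whole batch.
fun interface BatchSender {
    fun send(batch: List<String>): Boolean
}

class EventBuffer(
    private val sender: BatchSender,
    private val maxBufferSize: Int = 128, // assumption: oldest events dropped beyond this
    private val batchSize: Int = 16,      // assumption: events per request
    private val maxAttempts: Int = 3      // assumption: tries per batch before dropping it
) {
    // FIFO queue of serialized events, so submission order is deterministic.
    private val queue = ArrayDeque<String>()

    var droppedCount = 0
        private set

    val pendingCount: Int
        get() = queue.size

    fun enqueue(eventJson: String) {
        if (queue.size >= maxBufferSize) {
            queue.removeFirst() // drop the oldest event to keep the buffer bounded
            droppedCount++
        }
        queue.addLast(eventJson)
    }

    // Drains the buffer in batches; a batch that still fails after
    // maxAttempts tries is dropped rather than retried forever.
    fun flush() {
        while (queue.isNotEmpty()) {
            val batch = queue.take(batchSize)
            repeat(batch.size) { queue.removeFirst() }
            var attempts = 0
            var accepted = false
            while (!accepted && attempts < maxAttempts) {
                accepted = sender.send(batch)
                attempts++
            }
            if (!accepted) droppedCount += batch.size
        }
    }
}
```

Retries here are immediate; a real sender would add backoff and run off the main thread. Dropping the oldest event on overflow is one policy among several, and whichever the SDK picks should be aligned with the JS/PHP SDKs where applicable.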
Functional parity requirements

Core conceptual components

The Kotlin SDK should model the following concepts (semantically equivalent to JS/PHP where applicable):

  • SDK entry point (initialization & configuration)
  • Event (atomic, structured unit)
  • Contextual attributes (metadata decorating events)
  • Instrument (producer of metric events)
  • Experiment (provider of variation & assignment)
  • Assignment (subject-to-variant mapping)
  • Submission / Send (delivery to EventGate)
  • Lifecycle & state management
  • Event validation against expected schemas

Kotlin SDK Components
  1. Core SDK
  • Implement EventFactory for event construction
  • Implement ContextualAttributesFactory as single source of truth
  • Implement EventSender interface
  • Remove event construction from TestKitchenClient.submitMetricsEvent()
  • Remove TestKitchenClient altogether?
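A rough sketch of the Core SDK split, under assumptions: EventFactory builds every event, ContextualAttributesFactory is the only source of context, and EventSender only delivers. Those three names come from the list above; the fields on `ContextualAttributes` and `Event`, the `RecordingSender` test double, and all signatures are hypothetical.

```kotlin
import java.time.Instant

// Sketch only: field names and signatures are assumptions; only the names
// EventFactory, ContextualAttributesFactory and EventSender come from the task.

data class ContextualAttributes(
    val platform: String,
    val appVersion: String,
    val appInstallId: String,
    val experiments: Map<String, String> = emptyMap() // experiment name -> assigned group
)

// Single source of truth for context: every event reads it from here,
// so no caller assembles contextual attributes by hand.
class ContextualAttributesFactory(private val snapshot: () -> ContextualAttributes) {
    fun create(): ContextualAttributes = snapshot()
}

data class Event(
    val action: String,
    val stream: String,
    val schema: String,
    val payload: Map<String, Any?>,
    val context: ContextualAttributes,
    val dt: String // ISO-8601 timestamp
)

class EventFactory(
    private val contextFactory: ContextualAttributesFactory,
    private val now: () -> Instant = { Instant.now() }
) {
    fun create(
        action: String,
        stream: String,
        schema: String,
        payload: Map<String, Any?> = emptyMap()
    ): Event {
        require(action.isNotBlank()) { "action must not be blank" }
        return Event(action, stream, schema, payload, contextFactory.create(), now().toString())
    }
}

// Delivery is a separate concern from construction.
fun interface EventSender {
    fun send(event: Event)
}

// In-memory sender for tests; a production sender would POST to EventGate.
class RecordingSender : EventSender {
    val sent = mutableListOf<Event>()
    override fun send(event: Event) {
        sent.add(event)
    }
}
```

Since context is read through one factory at construction time, moving event construction out of TestKitchenClient.submitMetricsEvent() leaves no second place where context could drift.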
  2. Experiment abstraction & enrollment
  • Implement Experiment interface: Experiment, UnenrolledExperiment, OverriddenExperiment
  • Implement ExperimentManager to map enrollment results to experiment objects
  • Implement EnrollmentAuthority interface
    • Initial implementation: remote MediaWiki-based enrollment for logged-in-only experiments? (To be confirmed)
  • Implement EnrollmentRequest and EnrollmentResult similar to PHP's Coordination features
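A minimal sketch of how ExperimentManager could turn an EnrollmentResult into experiment objects. The class names Experiment, UnenrolledExperiment, OverriddenExperiment, ExperimentManager, EnrollmentAuthority, EnrollmentRequest, and EnrollmentResult are from this task; the `ExperimentInterface` name, the fields, the method names, and the rule that overrides win over normal enrollment are assumptions rather than confirmed JS/PHP behaviour.

```kotlin
// Sketch only: ExperimentInterface and every field/method shape here are hypothetical.

data class EnrollmentRequest(val subjectId: String, val experimentNames: List<String>)

data class EnrollmentResult(
    val enrolled: Map<String, String>,              // experiment name -> assigned group
    val overrides: Map<String, String> = emptyMap() // forced assignments, e.g. for QA
)

// Source of enrollment decisions (remote MediaWiki, app-install bucketing, ...).
fun interface EnrollmentAuthority {
    fun enroll(request: EnrollmentRequest): EnrollmentResult
}

sealed interface ExperimentInterface {
    val name: String
    fun getAssignedGroup(): String?
    fun isAssignedGroup(vararg groups: String): Boolean {
        val assigned = getAssignedGroup() ?: return false
        return assigned in groups
    }
}

class Experiment(override val name: String, private val group: String) : ExperimentInterface {
    override fun getAssignedGroup(): String = group
}

class UnenrolledExperiment(override val name: String) : ExperimentInterface {
    override fun getAssignedGroup(): String? = null
}

class OverriddenExperiment(override val name: String, private val group: String) : ExperimentInterface {
    override fun getAssignedGroup(): String = group
}

// Maps the latest EnrollmentResult to experiment objects; overrides win (assumption).
class ExperimentManager(private val authority: EnrollmentAuthority) {
    private var result = EnrollmentResult(enrolled = emptyMap())

    fun refresh(subjectId: String, experimentNames: List<String>) {
        result = authority.enroll(EnrollmentRequest(subjectId, experimentNames))
    }

    fun getExperiment(name: String): ExperimentInterface {
        result.overrides[name]?.let { return OverriddenExperiment(name, it) }
        result.enrolled[name]?.let { return Experiment(name, it) }
        return UnenrolledExperiment(name)
    }
}
```

Returning an UnenrolledExperiment instead of null lets calling code check `isAssignedGroup(...)` unconditionally, and the EnrollmentAuthority can be swapped without touching callers.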
  3. Instrument abstraction
  • Implement Instrument interface:
    • getInstrument()
    • send()
  • Enforce stream/schema binding and payload shape
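A minimal sketch of the Instrument abstraction, assuming each instrument config binds exactly one stream and one schema and lists the payload keys it may carry. `InstrumentConfig`, `InstrumentRegistry`, `EventSink`, and the `allowedPayloadKeys` check are hypothetical; the `$schema` and `meta.stream` fields follow the usual Event Platform event layout. In this sketch `getInstrument()` returns null for unknown names, and `send()` refuses payloads with keys outside the bound shape.

```kotlin
// Sketch only: InstrumentConfig, InstrumentRegistry, EventSink and
// allowedPayloadKeys are hypothetical names.

data class InstrumentConfig(
    val name: String,
    val stream: String, // the stream this instrument is bound to
    val schema: String, // the schema its events must conform to
    val allowedPayloadKeys: Set<String>
)

typealias EventSink = (Map<String, Any?>) -> Unit

class Instrument(private val config: InstrumentConfig, private val sink: EventSink) {
    // Stream and schema come from the config, never from the caller, so an
    // instrument cannot emit to the wrong stream. Payloads with keys outside
    // the allowed shape are rejected and nothing is sent.
    fun send(action: String, payload: Map<String, Any?> = emptyMap()): Boolean {
        val unexpected = payload.keys - config.allowedPayloadKeys
        if (unexpected.isNotEmpty()) return false
        val event: Map<String, Any?> = mapOf(
            "\$schema" to config.schema,
            "meta" to mapOf("stream" to config.stream),
            "action" to action
        ) + payload
        sink(event)
        return true
    }
}

class InstrumentRegistry(configs: List<InstrumentConfig>, private val sink: EventSink) {
    private val byName = configs.associateBy { it.name }

    // Returns null when no instrument with this name is configured.
    fun getInstrument(name: String): Instrument? = byName[name]?.let { Instrument(it, sink) }
}
```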
  4. Configuration fetching
  • Implement TestKitchenConfigsFetcher (async HTTP client)
  • Add config resolution layer to map JSON to SDK models
  • Define caching strategy (TTL + invalidation)
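One way the TTL + invalidation strategy could work, sketched under assumptions: `ConfigsFetcher` stands in for TestKitchenConfigsFetcher and is shown as a synchronous function returning already-resolved models (the real fetcher is async and would sit behind the JSON-to-model resolution layer), and `ExperimentConfig` and `CachedConfigs` are hypothetical. The policy shown: serve cached configs until the TTL expires, refetch afterwards, fall back to stale configs if the refetch fails, and let `invalidate()` force a refetch.

```kotlin
// Sketch only: ConfigsFetcher stands in for TestKitchenConfigsFetcher (shown
// synchronous for brevity); ExperimentConfig and CachedConfigs are hypothetical.

data class ExperimentConfig(val name: String, val trafficFraction: Double, val groups: List<String>)

fun interface ConfigsFetcher {
    fun fetch(): List<ExperimentConfig>
}

class CachedConfigs(
    private val fetcher: ConfigsFetcher,
    private val ttlMillis: Long,
    private val clock: () -> Long = System::currentTimeMillis
) {
    private var cached: List<ExperimentConfig>? = null
    private var fetchedAt = 0L

    // Serves cached configs while fresh; refetches once the TTL has elapsed.
    // If a refetch fails, stale configs are served rather than none at all.
    fun get(): List<ExperimentConfig> {
        val current = cached
        val now = clock()
        if (current != null && now - fetchedAt < ttlMillis) return current
        return try {
            fetcher.fetch().also {
                cached = it
                fetchedAt = now
            }
        } catch (e: Exception) {
            current ?: emptyList()
        }
    }

    // Forces the next get() to refetch.
    fun invalidate() {
        cached = null
    }
}
```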
  5. Validation
  • Define event validation contract:
    • required fields
    • failure handling (log/drop)
    • parity with JS/PHP where applicable
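One possible shape for the validation contract, sketched in Kotlin. It assumes validation only checks that required top-level fields are present and non-null, and that failure means log and drop; the field list (`$schema`, `meta`, `action`, `dt`), `ValidationResult`, and both function names are illustrative assumptions, and the real list should follow the JS/PHP SDKs.

```kotlin
// Sketch only: the required-field list and all names here are assumptions.

sealed class ValidationResult {
    object Valid : ValidationResult()
    data class Invalid(val missingFields: List<String>) : ValidationResult()
}

// Illustrative required top-level fields; not a confirmed list.
val REQUIRED_FIELDS: List<String> = listOf("\$schema", "meta", "action", "dt")

fun validate(event: Map<String, Any?>, required: List<String> = REQUIRED_FIELDS): ValidationResult {
    val missing = required.filter { event[it] == null } // absent or explicitly null
    return if (missing.isEmpty()) ValidationResult.Valid else ValidationResult.Invalid(missing)
}

// Failure policy in this sketch: log, then drop. Returns true if the event may be submitted.
fun validateOrDrop(event: Map<String, Any?>, log: (String) -> Unit = { println(it) }): Boolean =
    when (val result = validate(event)) {
        is ValidationResult.Valid -> true
        is ValidationResult.Invalid -> {
            log("Dropping invalid event; missing fields: ${result.missingFields}")
            false
        }
    }
```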
  6. Test Kitchen updates

UI:

  • Update User identifier type to Subject identifier type
  • Add app-install as an identifier type
    • Enrollment authorities use MW user ID, edge unique ID, app install ID

Endpoint:

  • Add authority=mobile-apps
  7. Testing
  • Run an Android instrument to collect metric events
    • correctly-shaped instrument events
    • queries validate expected metric events
  • Run a synthetic Android experiment:
    • stable assignment
    • correctly-shaped experiment events
    • queries validate enrollment & assignment
  8. Deprecation / Cleanup
  • Remove the legacy Test Kitchen client (formerly known as the Java MPC); see T413855: Remove Java Metrics Platform/Test Kitchen code
  • Remove unused or redundant Kotlin classes:
    • ContextController
    • StreamConfigCollection
    • CurationController
  • Remove unused Android stream configs deployed for Java MPC
  • Remove Java library releases from Maven Central and Archiva
  9. Documentation
  • Update the Experiment and Instrument workflows/APIs in the Test Kitchen docs

Questions

The following should be resolved but are not implementation tasks in this epic:

  • What is the canonical subject identity for Android experiments?
    • device / app install
    • MW Central ID (logged-in)
    • hybrid?
  • Should Android support logged-in-only experiments?
    • not right now; eventually, yes
  • Do Android experiments require per-wiki traffic allocation?
    • yes
  • Are language / geography targeting required at this stage?
    • yes
  • Are platform-specific experiment config endpoints needed in Test Kitchen UI?
    • yes: authority=mobile-apps

Success metrics

  • Kotlin SDK exposes experiment- and instrument-shaped APIs equivalent in capability to JS + PHP
  • All event construction goes through EventFactory + ContextualAttributesFactory
  • Experiment enrollment and assignment are deterministic and testable
  • A sample instrument and a synthetic experiment validate the SDK end to end
  • Legacy Java MPC-like code and releases are removed

Dependencies / Order

Decision: Subject Identity >> Decision: Logged-in-only experiments? >> EnrollmentAuthority >> ExperimentManager >> Experiment API >> Synthetic Experiment Test

Event Timeline

Moving this here from the spike:

Some other questions/notes coming out of Slack/Phab:

  • Presumably we're still working out how we want instruments and experiments to inherit, either from each other or from a parent abstract class; pending that resolution, I'll update/add tickets accordingly
  • Apps may need per-wiki traffic allocations and language/geography targeting needs - pending discussion
  • Apps will need device added as a sampling unit in the TK UI
  • Create API endpoint in TK UI to serve experiments by platform (MW, Varnish, Devices)
cjming updated the task description.

[whoops, copying from the spike task:]
I'll try to answer or comment on some of these points, with my very limited conception😅 of what this new SDK will provide:

How do apps run A/B tests?
Presumably it works like: device/app identity decides deterministic bucketing?

Basically yes: the unique app_install_id is used to decide which bucket the user falls into. All of the bucketing is done client-side.
The conditions for whether a user is included/excluded from an experiment could also include any kind of data that's available locally: the user's primary app language, their geo location, whether they're logged in, whether they are an editor, whether they are a donor, whether they have any recent reading activity, and so on.
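To illustrate the client-side bucketing described above, here is a minimal sketch that hashes the experiment name together with the app_install_id, maps the hash to a number in [0, 1), enrolls the install if that number falls below the experiment's traffic fraction, and then splits the enrolled slice evenly across groups. The choice of SHA-256, the `name:id` key format, and the `ExperimentSpec`/`LocalContext` types are assumptions; the `eligible` predicate stands in for the locally available conditions mentioned (language, logged-in status, and so on).

```kotlin
import java.security.MessageDigest

// Sketch only: hash function, key format and types are assumptions.

data class LocalContext(val primaryLanguage: String, val isLoggedIn: Boolean)

data class ExperimentSpec(
    val name: String,
    val trafficFraction: Double, // share of eligible installs to enroll, 0.0..1.0
    val groups: List<String>,    // e.g. listOf("control", "treatment")
    val eligible: (LocalContext) -> Boolean = { true }
)

// Deterministically maps (experiment, install) to a number in [0, 1).
fun bucketOf(experimentName: String, appInstallId: String): Double {
    val digest = MessageDigest.getInstance("SHA-256")
        .digest("$experimentName:$appInstallId".toByteArray(Charsets.UTF_8))
    var value = 0L
    for (i in 0 until 4) {
        // Treat the first 4 bytes as an unsigned 32-bit integer.
        value = (value shl 8) or (digest[i].toLong() and 0xFFL)
    }
    return value / 4_294_967_296.0 // divide by 2^32
}

// Returns the assigned group, or null if the install is ineligible or not enrolled.
fun assignGroup(spec: ExperimentSpec, appInstallId: String, ctx: LocalContext): String? {
    if (!spec.eligible(ctx) || spec.groups.isEmpty()) return null
    val bucket = bucketOf(spec.name, appInstallId)
    if (bucket >= spec.trafficFraction) return null
    // Rescale the enrolled slice [0, trafficFraction) to [0, 1) and split it evenly.
    val index = ((bucket / spec.trafficFraction) * spec.groups.size).toInt()
    return spec.groups[index.coerceAtMost(spec.groups.size - 1)]
}
```

Salting the hash with the experiment name keeps assignments independent across experiments, so the same install does not always land in the same group everywhere.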

How do apps do remote configuration?

We have a REST endpoint from which apps get remote configuration for certain features. So far, this has not been used for A/B test purposes.

Do apps need to be able to do different bucketing based on logged-in status?

Sure, I can definitely foresee experiments where only logged-in users are considered, or only temp-account-holders, etc.

Do apps run server-side experiments?
Do apps ever run hybrid (client-side + server-side) experiments?

No; I'm not sure what a server-side experiment would mean in the context of apps.


For Kotlin, then, and for parity, it seems like we also want to move away from a TestKitchenClient, or do we? We need a way to initialize the SDK.

I think, by definition, the SDK is a "client" in the sense that it needs to read configuration from the server (stream configs, plus instrument and experiment configs?).

Yes, the SDK needs to provide a way for the app to initialize it, so that it can wake up and restore its configuration state, whether it's reading stream configs from the network or from local cache.
Ideally the SDK should provide these affordances:

  • A way to plug in the app's existing network stack, so that it doesn't duplicate Retrofit or OkHttp objects, which are rather resource-heavy.
  • A way to initialize with existing stream configs. Since it's likely that this SDK will need to coexist, at least for a while, with our current Event Platform logic, and since Event Platform uses the same stream configs, it would be nice if the fetching of configs wasn't duplicated.
  • Callbacks for getting things like AgentData, PerformerData, and PageData on the fly, since all of those things can change dynamically during the lifetime of the app, and during the instance of the SDK.
  • Hooks into the app's lifecycle, e.g. whenever an Activity is paused/resumed, which is likely necessary for session management.
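A rough sketch of an initialization surface offering those affordances. Everything here is hypothetical: `HttpTransport` is the seam where the app would adapt its existing OkHttp/Retrofit client, `streamConfigs` takes the configs Event Platform already fetched, the three provider lambdas are read on every call so changing values are picked up, and `SessionController` shows how Activity pause/resume callbacks could drive session rotation. The 15-minute session timeout is an assumption, and the data classes are placeholders with only a few fields.

```kotlin
import java.util.UUID

// Sketch only: every name below is hypothetical, and the data classes are
// placeholders carrying just a few fields.

// The app adapts its existing OkHttp/Retrofit client to this interface, so the
// SDK never builds its own HTTP stack. Returns the HTTP status code.
fun interface HttpTransport {
    fun post(url: String, body: String): Int
}

data class AgentData(val appInstallId: String, val appFlavor: String)
data class PerformerData(val isLoggedIn: Boolean, val languagePrimary: String)
data class PageData(val title: String)

class TestKitchenSdkConfig(
    val transport: HttpTransport,
    // Stream configs the app already fetched for Event Platform, reused as-is.
    val streamConfigs: Map<String, Map<String, Any>>,
    // Providers are invoked on every read, so values that change while the
    // app runs (login state, current page, ...) are always current.
    val agentData: () -> AgentData,
    val performerData: () -> PerformerData,
    val pageData: () -> PageData?,
    val sessionTimeoutMillis: Long = 15 * 60 * 1_000L, // assumption
    val clock: () -> Long = System::currentTimeMillis
)

fun currentContext(config: TestKitchenSdkConfig): Map<String, Any?> = mapOf(
    "agent" to config.agentData(),
    "performer" to config.performerData(),
    "page" to config.pageData()
)

// Driven by Activity onPause/onResume: a long enough pause starts a new session.
class SessionController(private val config: TestKitchenSdkConfig) {
    var sessionId: String = UUID.randomUUID().toString()
        private set
    private var pausedAt: Long? = null

    fun onActivityPaused() {
        pausedAt = config.clock()
    }

    fun onActivityResumed() {
        val paused = pausedAt ?: return
        if (config.clock() - paused >= config.sessionTimeoutMillis) {
            sessionId = UUID.randomUUID().toString()
        }
        pausedAt = null
    }
}
```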