
[SPIKE] Create a plan for dogfooding MPIC for a test instrument deployed to the Beta Cluster
Open, Needs Triage · Public · Spike

Description

Background

From T366312: [MPIC][User Story] As Product Manager I am able to judge efficacy of dogfooding for MPIC based on selected test instrument:

As part of evaluating MPIC, we want to test the system end-to-end with a real-world use case to verify that it meets its requirements.

AC

  • Determine which test instrument will be used?
    • Will we need to build a test instrument?
  • The plan is published to a wiki
    • metawiki?

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Spike". · Mon, Jun 3, 10:51 AM
> • Determine which test instrument will be used?
>   • Will we need to build a test instrument?

We will need to build a new instrument or modify an existing one. We deliberately designed MPIC so that it doesn't override any predefined stream configurations. In other words, instruments that are currently deployed and configured via EventStreamConfig cannot be configured via MPIC unless we first remove their configuration, which would disable the instrument for a while.
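To make the precedence rule concrete, here is a minimal Python sketch, assuming both sources can be modelled as dicts keyed by stream name. The function name `resolve_stream_configs`, the config shape, and the stream names are hypothetical; this is not the MetricsPlatform extension's actual merging code.

```python
from typing import Any, Dict

# Stream name -> stream configuration (shape is illustrative only).
StreamConfigs = Dict[str, Dict[str, Any]]


def resolve_stream_configs(
    event_stream_config: StreamConfigs,
    mpic_configs: StreamConfigs,
) -> StreamConfigs:
    """Combine static EventStreamConfig entries with MPIC-provided ones.

    A stream already defined in EventStreamConfig keeps its static
    definition; MPIC can only add streams that are not defined there.
    """
    resolved = dict(event_stream_config)
    for stream_name, config in mpic_configs.items():
        # setdefault never overwrites an existing (predefined) entry.
        resolved.setdefault(stream_name, config)
    return resolved


static = {"mediawiki.existing_instrument": {"sample": {"rate": 0.1}}}
from_mpic = {
    "mediawiki.existing_instrument": {"sample": {"rate": 1.0}},
    "product_metrics.test_instrument": {"sample": {"rate": 1.0}},
}
resolved = resolve_stream_configs(static, from_mpic)
print(resolved["mediawiki.existing_instrument"]["sample"]["rate"])  # 0.1 (static wins)
print(resolved["product_metrics.test_instrument"]["sample"]["rate"])  # 1.0 (from MPIC)
```

In this model, an MPIC entry for an already-configured stream only takes effect once the static entry is removed.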

Thinking strategically, would it make sense to build and deploy a standard instrument, e.g. a heartbeat instrument similar to SessionTick? I have a patch for one here: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/1030064
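For discussion, a minimal Python sketch of what a heartbeat instrument's core logic could look like: while the page is active it emits one "tick" event per elapsed interval, and it emits nothing while inactive. The class name, stream name, 60-second interval, and event fields are all hypothetical, and the sketch does not claim to mirror SessionTick or the linked patch; a real instrument would run in the browser.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class HeartbeatInstrument:
    """Hypothetical heartbeat instrument: one "tick" per elapsed interval.

    The interval, stream name, and event fields are illustrative
    assumptions, not SessionTick's actual behaviour.
    """

    send: Callable[[str, Event], None]
    stream_name: str = "product_metrics.heartbeat_test"  # hypothetical
    interval_seconds: float = 60.0  # hypothetical
    _tick_count: int = field(default=0, init=False)
    _last_tick_at: float = field(default=0.0, init=False)
    _active: bool = field(default=False, init=False)

    def start(self, now: float) -> None:
        """Start counting intervals from `now` (e.g. page became visible)."""
        self._active = True
        self._last_tick_at = now

    def stop(self) -> None:
        """Stop ticking (e.g. page hidden or closed)."""
        self._active = False

    def poll(self, now: float) -> None:
        """Emit one tick for each full interval elapsed since the last tick."""
        if not self._active:
            return
        while now - self._last_tick_at >= self.interval_seconds:
            self._last_tick_at += self.interval_seconds
            self._tick_count += 1
            self.send(self.stream_name, {"action": "tick", "tick_count": self._tick_count})


sent: List[Event] = []
hb = HeartbeatInstrument(send=lambda stream, event: sent.append(event))
hb.start(now=0.0)
hb.poll(now=150.0)  # two full 60 s intervals have elapsed
hb.stop()
hb.poll(now=300.0)  # inactive: nothing is sent
print([e["tick_count"] for e in sent])  # [1, 2]
```

Injecting `send` and passing `now` explicitly keeps the sketch testable without real timers.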


The steps involved in this are relatively straightforward:

  1. Deploy the MetricsPlatform extension to the Beta Cluster
  2. Configure the MetricsPlatform extension on the Beta Cluster
    • $wgMetricsPlatformInstrumentConfiguratorBaseUrl needs to be set to the internal base URL of MPIC
  3. Agree on the instrument and stream names for the instrument
  4. Build the example instrument
  5. Deploy the example instrument (via a train deployment)
  6. Enable the instrument via MPIC

1 & 2 are covered by T366315: [MPIC][User Story] As software Engineer, it is clear how to deploy a fully functional system in the beta env.
4 is covered by T364548: [SPIKE] Design API for the standardised page lifecycle instrument mixin
4 & 5 can be done in parallel.
6 can be done by anyone.

☝️ We moved this back to In Progress so that we could fold in detail from a discussion between Ben Tullis and Andrew Otto about what is possible with the Event Platform deployment on the Beta Cluster.

Alright. @WDoranWMF and @cjming have also been doing some work on this recently.

What we've found out

1

The app servers cannot make requests to services on the internet. This restriction also applies to the Beta Cluster.

We could run an instance of MPIC on the Beta Cluster but that would also require running a MariaDB instance, which increases operational overhead.

We can ask SRE ServiceOps to add configuration that allows the app servers to reach MPIC at mpic-next.svc.eqiad.wmnet.

Alternatively, we can host a MediaWiki instance, an EventGate instance, and an MPIC instance in Toolforge so that we can do our own end-to-end testing in isolation.

My preference would be the former, even if it means coordinating with more teams, because it would mean that other teams could experiment with the system freely and as part of their development workflow. However, the latter allows us to move faster.
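If MPIC is reached via the ServiceOps route, `$wgMetricsPlatformInstrumentConfiguratorBaseUrl` has to point at a host the app servers can actually reach. A minimal Python sketch of a pre-deployment sanity check, assuming "internal" can be approximated as a `.wmnet` hostname such as mpic-next.svc.eqiad.wmnet; the function name and the suffix rule are assumptions, not an existing check.

```python
from urllib.parse import urlsplit

# Assumption: internal services live under the .wmnet domain,
# e.g. mpic-next.svc.eqiad.wmnet.
INTERNAL_SUFFIX = ".wmnet"


def is_internal_base_url(base_url: str) -> bool:
    """Return True if base_url is an http(s) URL on an internal (.wmnet) host."""
    parts = urlsplit(base_url)
    host = parts.hostname or ""  # hostname excludes any port
    return parts.scheme in ("http", "https") and host.endswith(INTERNAL_SUFFIX)


print(is_internal_base_url("https://mpic-next.svc.eqiad.wmnet"))  # True
print(is_internal_base_url("https://example.org"))  # False
```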

2

There are data-lake-like parts on the Beta Cluster: events submitted to the Beta Cluster EventGate instance are produced to the Beta Cluster Kafka instance and expire some time later.

Changes to the plan

(To confirm with @WDoranWMF)

We are going to host a MediaWiki instance, an EventGate instance, and an MPIC instance in Toolforge and do our own end-to-end testing in isolation.

Sources

  1. https://wikimedia.slack.com/archives/C055QGPTC69/p1718646585576599
  2. https://wikimedia.slack.com/archives/C055QGPTC69/p1718635516439119

> We are going to host a MediaWiki instance, an EventGate instance, and an MPIC instance in Toolforge

Cool! Where will the EventGate instance produce to?

> We could run an instance of MPIC on the Beta Cluster but that would also require running a MariaDB instance, which increases operational overhead.

FWIW, this sounds like less overhead than running your own MW, EventGate, and MPIC in Toolforge? But I suppose Toolforge makes things very easy?

In case this helps: https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate/Administration#Beta_/_deployment-prep

> We are going to host a MediaWiki instance, an EventGate instance, and an MPIC instance in Toolforge

> Cool! Where will the EventGate instance produce to?

It won't. Per @VirginiaPoundstone, it's sufficient to know that events are being sent to and validated by an EventGate instance.
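For the Toolforge setup, "sent to and validated by an EventGate instance" amounts to an HTTP POST of a JSON array of events to EventGate's `/v1/events` endpoint, where each event names its schema in `$schema` and its destination stream in `meta.stream`. A minimal Python sketch that only builds the request; the base URL, schema URI, and stream name are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Hypothetical base URL of the Toolforge EventGate instance.
EVENTGATE_BASE_URL = "https://eventgate.example.org"


def build_eventgate_request(base_url: str, events: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build (but do not send) a POST of a batch of events to EventGate."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/events",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event = {
    "$schema": "/analytics/test_instrument/1.0.0",  # hypothetical schema URI
    "meta": {"stream": "product_metrics.test_instrument"},  # hypothetical stream
    "action": "tick",
}
req = build_eventgate_request(EVENTGATE_BASE_URL, [event])

# Sending it would look like this (not executed here):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)  # expect 201 when every event is accepted
```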

> FWIW, this sounds like less overhead than running your own MW, EventGate, and MPIC in Toolforge? But I suppose Toolforge makes things very easy?

Toolforge does make simple things very easy. That said, more folks would benefit from improving the Beta Cluster.

> it's sufficient to know that events are being sent to and validated by an EventGate instance.

k! let me know if I can help in any way.

BTW, if you wanted, you could probably produce to EventGate in beta.

Then you'd have https://stream-beta.wmflabs.org/v2/ui/#/ there to help validate that events are sent and validated as expected.
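If events are produced via the Beta Cluster EventGate, their arrival could also be checked programmatically. EventStreams serves each stream as Server-Sent Events under `/v2/stream/<stream name>`; the sketch below assumes the test stream is exposed there (EventStreams only serves streams configured for it), and the stream name is hypothetical. The parser is a simplified sketch of the SSE format, not a full client.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Beta Cluster EventStreams stream endpoint (the UI lives under /v2/ui/).
EVENTSTREAMS_BETA_BASE = "https://stream-beta.wmflabs.org/v2/stream/"


def stream_url(stream_name: str) -> str:
    """SSE endpoint for one stream (assumes it is exposed via EventStreams)."""
    return EVENTSTREAMS_BETA_BASE + stream_name


def parse_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the JSON payload of each Server-Sent Event.

    Simplified: only "data" fields are used; other fields (event, id)
    and comment lines are ignored. A blank line ends an event, and an
    unterminated event at end of input is dropped.
    """
    data_lines: List[str] = []
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)


sample = [
    "event: message",
    'data: {"meta": {"stream": "product_metrics.test_instrument"}, "action": "tick"}',
    "",
]
for received in parse_sse_events(sample):
    print(received["meta"]["stream"], received["action"])
# prints: product_metrics.test_instrument tick

# Against the live endpoint, something like this (untested):
# import urllib.request
# with urllib.request.urlopen(stream_url("product_metrics.test_instrument")) as resp:
#     for received in parse_sse_events(raw.decode("utf-8") for raw in resp):
#         print(received)
```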