
Commons Impact Metrics AQS 2.0 Deployment to Staging and Production
Open, In Progress, High, Public

Description

For the May 2024 Wikimedia Hackathon, the Data Products team has produced a new AQS 2.0 service.

This service needs to be deployed to both staging and production. Based on discussion with @Kappakayala, we're submitting a ticket outlining the request.

The service itself is still under construction, but we are reaching out a month ahead of the expected launch so that the SRE side of things can be scheduled as needed.

Please let us know what the next steps are.

Required:

  • Add blubber and pipeline configuration to service repo
  • Integrate service image building in CI pipeline
  • Add Commons Impact Metrics Cassandra user
  • Configure Cassandra secrets for user
  • Create deployment-charts helmfile.d configuration entries
  • Add service catalogue entry for service
  • Run smoke tests against staging and certify service is working properly
  • Deploy service to LVS with ingress config in state service_setup
  • Move service to production in LVS
  • Add service to the REST Gateway
  • Route public requests via ATS

Event Timeline

Restricted Application added a subscriber: Aklapper.
WDoranWMF updated the task description.

Which external paths should we route to which internal paths for this service?

Additionally, two timeline questions and one Cassandra question:

  • When do you anticipate having a minimal binary that successfully builds? (to unblock build pipeline setup)
  • When do you anticipate having the service ready to launch publicly? (to unblock final steps that would make it publicly reachable)
  • What Cassandra tables will the new user need access to?

Hi @Scott_French!

> When do you anticipate having a minimal binary that successfully builds? (to unblock build pipeline setup)

Does this binary need to implement some of the endpoints, or just the frame of the service?

> When do you anticipate having the service ready to launch publicly? (to unblock final steps that would make it publicly reachable)

I think it will be in 5 to 7 weeks.

> What Cassandra tables will the new user need access to?

The tables are not yet in Cassandra. The plan is to create the set of tables described in the "Serving layer design" section of the document below, but we haven't yet discussed the details with the Cassandra owners, so there may be small changes:
https://docs.google.com/document/d/1sWPJzO9J6nwfhzJAwrbWZQBcvwiT8UhBBfn2kFW3pNQ/edit?pli=1#heading=h.trp207kw0b93

Thanks, @mforns!

> Does this binary need to implement some of the endpoints, or just the frame of the service?

A minimally complete binary that listens on the expected port and responds OK to health checks should be sufficient.

The idea would be to get your service up and running as early as possible, accessible only internally, so you can iterate.
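For illustration, here is a minimal sketch of such a binary in Python, using only the standard library. The health-check path `/healthz` and port 8080 are hypothetical placeholders, not values from this task; the real service's path, port, and implementation language are whatever its own deployment configuration specifies.

```python
"""Minimal sketch of a service that only answers health checks.

Assumptions (hypothetical, not taken from the task): the health-check
path is /healthz and the service listens on port 8080.
"""
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

HEALTH_PATH = "/healthz"  # hypothetical health-check path
PORT = 8080               # hypothetical listening port


def health_response(path: str) -> Tuple[int, bytes]:
    """Return (status, body) for a request path.

    Any query string is ignored; only the health path returns 200 OK,
    everything else returns 404 because no endpoints exist yet.
    """
    bare_path = path.split("?", 1)[0]
    if bare_path == HEALTH_PATH:
        return 200, b"OK"
    return 404, b"not found"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main() -> None:
    # Bind on all interfaces so health probes from outside the process can reach it.
    server = HTTPServer(("0.0.0.0", PORT), HealthHandler)
    server.serve_forever()
```

Calling `main()` starts the server; a request to `http://localhost:8080/healthz` should then return `OK`, which is enough for the deployment to pass its health checks while the real endpoints are still being written.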

> I think it will be in 5 to 7 weeks.

Ah, that's good to know. That does give us a bit more breathing room.

Next steps:

I think the main things we need are the following:

  1. Naming: I see the repo is called commons-impact-analytics and the linked doc uses commons-analytics for the leading part of the API path. Any preference on what we call this service internally?
  2. Images: Once you've configured CI image builds and have a minimal one ready, let us know and we can start bootstrapping the service.
  3. Tables: Once the Cassandra tables are created, let us know and we can create the user and credentials, wire them into the service, etc.
  4. API paths (external and internal): We'll need to know what the external and internal API paths will look like. I see commons-analytics in the linked doc, but I'm not sure what prefixes that (e.g., whether continuing to use the "RESTBase-ish" /api/rest_v1/metrics API paths for consistency with existing services vs. something else).

Edit: I just came across [0], which may partly answer #4 (it suggests commons analytics will be served at /metrics/commons, by analogy with existing services).

[0] https://docs.google.com/spreadsheets/d/1nl-4zjd5OfbgINsVGwEc5jh5_xEexz8H7-c5ZIFpopk/preview

> The idea would be to get your service up and running as early as possible so you can iterate, but accessible only internally.

Thanks! @SGupta-WMF and @Milimetric are currently working on completing the first endpoint MR. We're hoping to have it merged by tomorrow. With that, the service itself will be functional; however, we will still lack data.

> Next steps:

> I think the main things we need are the following:

> 1. Naming: I see the repo is called commons-impact-analytics and the linked doc uses commons-analytics for the leading part of the API path. Any preference on what we call this service internally?

I think our preference would be commons-impact-analytics.

> 2. Images: Once you've configured CI image builds and have a minimal one ready, let us know and we can start bootstrapping the service.

@SGupta-WMF and @Milimetric will update once we have these.

> 3. Tables: Once the Cassandra tables are created, let us know and we can create the user and credentials, wire them into the service, etc.

@Milimetric, is there any interim stage where we would have "test" data? Given that this is likely not the case, @mforns, do you expect the tables to change much from this point?

> 4. API paths (external and internal): We'll need to know what the external and internal API paths will look like. I see commons-analytics in the linked doc, but I'm not sure what prefixes that (e.g., whether continuing to use the "RESTBase-ish" /api/rest_v1/metrics API paths for consistency with existing services vs. something else).

> Edit: I just came across [0], which may partly answer #4 (it suggests commons analytics will be served at /metrics/commons, by analogy with existing services).

That makes sense to me; is that right, @SGupta-WMF?

@WDoranWMF Yep, it makes sense. I confirmed the API paths with @mforns, and we agreed on metrics/commons-analytics. Regarding the prefix, all the AQS services use /api/rest_v1/metrics, so we could use that for consistency. But I would still like @VirginiaPoundstone to confirm.
Also, we are developing the CIM AQS services in GitLab. Would that make any significant difference? Should we raise a ticket for setting up CI for them?
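To make the external-to-internal routing question concrete, here is a minimal sketch of the mapping as a pure prefix rewrite. It assumes the external prefix is /api/rest_v1/metrics/commons-analytics/ (the shared AQS prefix combined with the agreed metrics/commons-analytics segment) and that the gateway forwards requests with the /api/rest_v1 part stripped. That internal layout and the helper name `to_internal_path` are hypothetical, chosen for illustration; they are not a confirmed REST Gateway rule.

```python
from typing import Optional

# Assumed external prefix: shared AQS prefix plus the agreed path segment.
EXTERNAL_PREFIX = "/api/rest_v1/metrics/commons-analytics/"
# Hypothetical internal prefix: the same path with /api/rest_v1 stripped.
INTERNAL_PREFIX = "/metrics/commons-analytics/"


def to_internal_path(external_path: str) -> Optional[str]:
    """Map an external request path to the service's internal path.

    Returns None when the path is not under the commons-analytics
    prefix, i.e. the request should be routed to some other service.
    """
    if not external_path.startswith(EXTERNAL_PREFIX):
        return None
    # Keep everything after the prefix (endpoint name and arguments) unchanged.
    return INTERNAL_PREFIX + external_path[len(EXTERNAL_PREFIX):]
```

In this sketch, endpoints added later under commons-analytics need no new routing entries, because everything after the prefix is passed through untouched.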

@WDoranWMF and @SGupta-WMF, thank you both for the follow-up.

As for image builds on GitLab: the Blubber config should be the same as what you would have used with Gerrit. What will differ is using GitLab CI to trigger image builds, which I believe is documented in [0].

Specifically, see the sections up through "Publishing an image for use in production", which includes an example of triggering production builds when a protected tag is created.

[0] https://www.mediawiki.org/wiki/GitLab/Workflows/Deploying_services_to_production

Scott_French changed the task status from Open to In Progress. Wed, Apr 24, 9:52 PM

Thanks, all, for the details shared thus far.

While turning up the service itself is blocked on a couple of open items (see below), I'm going to start moving ahead with some of my pending patches that can happen sooner.

@mforns or @SGupta-WMF - If you could let me know when the Cassandra tables have been created, that will unblock configuring user / grants, secrets, etc.

@SGupta-WMF - Let me know if you want to discuss GitLab CI integration further. While the configuration you adopt will depend on your development workflow, I might be able to help answer questions or find the right folks to answer them :)

Change #1023956 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] admin_ng: add namespace for commons-impact-analytics

https://gerrit.wikimedia.org/r/1023956

Change #1023957 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] DNM: services: add commons-impact-analytics service helmfile configs

https://gerrit.wikimedia.org/r/1023957

Change #1023958 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/deployment-charts@master] DNM: rest-gateway: route commons-analytics via rest-gateway

https://gerrit.wikimedia.org/r/1023958

Change #1023959 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] kubernetes: add usernames for commons-impact-analytics to deployment server

https://gerrit.wikimedia.org/r/1023959

Change #1023960 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] DNM: cassandra: add commons_impact_analytics user

https://gerrit.wikimedia.org/r/1023960

Change #1023961 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] service: add commons-impact-analytics AQS 2.0 service

https://gerrit.wikimedia.org/r/1023961

Change #1023962 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] DNM: service: move commons-impact-analytics service to production state

https://gerrit.wikimedia.org/r/1023962

Change #1023964 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/dns@master] wmnet: add CNAME records for commons-impact-analytics (k8s ingress)

https://gerrit.wikimedia.org/r/1023964

Change #1023959 merged by Scott French:

[operations/puppet@production] kubernetes: add usernames for commons-impact-analytics to deployment server

https://gerrit.wikimedia.org/r/1023959

Change #1023956 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: add namespace for commons-impact-analytics

https://gerrit.wikimedia.org/r/1023956

Change #1023964 merged by Scott French:

[operations/dns@master] wmnet: add CNAME records for commons-impact-analytics (k8s ingress)

https://gerrit.wikimedia.org/r/1023964

Mentioned in SAL (#wikimedia-operations) [2024-04-29T18:50:42Z] <swfrench-wmf> running authdns-update on dns1004 for T361835

Change #1023961 merged by Scott French:

[operations/puppet@production] service: add commons-impact-analytics AQS 2.0 service

https://gerrit.wikimedia.org/r/1023961

I believe that's everything that can be done for now, pending resolution of the open items in T361835#9742947.

@Scott_French Thank you! We are in the process of creating the Cassandra tables described in this task. We will update the task once the Cassandra tables are defined and a minimal build is ready.