New Service Request Generated Datasets: Image Suggestions Service
Closed, ResolvedPublic

Description

This task is to put the service on the Service Ops radar. Platform Engineering will update the ticket once Hugh is back.

A Go service that provides access to Image Suggestions data in Cassandra, covering both the Image Suggestion and its associated feedback. An Image Suggestion is a mapping between an unillustrated Wiki Article and one or more Images. The Image Suggestion data is based on the output of an algorithm developed by the Structured Data team.
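
To make the definition above concrete, here is a minimal Go sketch of the mapping. All type and field names (ImageSuggestion, Feedback, Wiki, PageID and so on) are assumptions made for illustration; they are not taken from the service's code or its Cassandra schema.

```go
package main

import (
	"errors"
	"fmt"
)

// Feedback is a hypothetical record of a decision made about a suggestion.
type Feedback struct {
	Accepted bool
	Comment  string
}

// ImageSuggestion maps one unillustrated article to one or more candidate
// images. The field names are illustrative only, not the real schema.
type ImageSuggestion struct {
	Wiki     string
	PageID   int
	Images   []string
	Feedback []Feedback
}

// ErrNoImages is returned when a suggestion carries no images at all.
var ErrNoImages = errors.New("an image suggestion needs at least one image")

// Validate enforces the "one or more Images" part of the definition.
func (s ImageSuggestion) Validate() error {
	if len(s.Images) == 0 {
		return ErrNoImages
	}
	return nil
}

func main() {
	s := ImageSuggestion{Wiki: "enwiki", PageID: 42, Images: []string{"Example.jpg"}}
	fmt.Println("valid:", s.Validate() == nil)
}
```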

The repo is here:
https://gerrit.wikimedia.org/r/admin/repos/generated-data-platform/datasets/image-suggestions

Timeline: Targeting deployment at the start of Q4
Diagram:

Screenshot 2022-03-30 at 08.19.03.png (681×1 px, 143 KB)

Technologies: Go, Cassandra
Point person: @WDoranWMF @Eevans

Acceptance criteria

  • Move repo from GitLab to Gerrit
  • Add DNS records
  • Write a Helm chart
  • Docker container is built
  • Integrate container creation into the pipeline
  • Test in minikube
  • Benchmark/load test the service
  • Provide metrics gathered via Prometheus and dashboards to monitor and assess the service
  • Deploy
  • Documentation updated

Event Timeline

WDoranWMF renamed this task from New Service Request Image Suggestions Feedback Service to New Service Request Generated Data Image Suggestions Service. Mar 28 2022, 7:53 PM
WDoranWMF renamed this task from New Service Request Generated Data Image Suggestions Service to New Service Request Generated Datasets: Image Suggestions Service.
herron triaged this task as Medium priority. Mar 30 2022, 3:26 PM
herron added a project: serviceops.

Mentioned in SAL (#wikimedia-operations) [2022-03-31T20:40:52Z] <mutante> reserving port 4017 for new k8s service request 'image-suggestions' T304891

found out the "add dummy tokens to labs/private" step is not needed anymore

Change 775964 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/deployment-charts@master] add a namespace for new service image-suggestions

https://gerrit.wikimedia.org/r/775964

Mentioned in SAL (#wikimedia-operations) [2022-03-31T20:40:52Z] <mutante> reserving port 4017 for new k8s service request 'image-suggestions' T304891

No need to reserve a port as this will use Ingress from the get-go.

found out the "add dummy tokens to labs/private" step is not needed anymore

We still have those in labs/private hieradata/common/profile/kubernetes.yaml. As far as I know, everything from https://wikitech.wikimedia.org/wiki/Kubernetes#Add_a_new_service is still needed (apart from reserving a NodePort/Service Port in case of ingress).

After discussion in the meeting yesterday, we concluded that:

  • We will create a generic chart for cassandra-http-gateway-based apps
  • We will by default make those use the ingress gateway and not set up LVS for them
  • The deployment will be called image-suggestion and use the image for this service.

Mentioned in SAL (#wikimedia-operations) [2022-04-20T23:36:14Z] <mutante> kubernetes/puppetmaster: added deployment/user tokens for new service image-suggestion T304891

Change 784791 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] kubernetes::deployment_server: add new service image-suggestion

https://gerrit.wikimedia.org/r/784791

Change 784791 merged by Dzahn:

[operations/puppet@production] kubernetes::deployment_server: add new service image-suggestion

https://gerrit.wikimedia.org/r/784791

We still have those in labs/private hieradata/common/profile/kubernetes.yaml. As far as I know, everything from https://wikitech.wikimedia.org/wiki/Kubernetes#Add_a_new_service is still needed (apart from reserving a NodePort/Service Port in case of ingress).

  • (step 1) - skipped
  • (step 2) added deployment/user tokens like this (2 different tokens, random 22 characters from pwgen, additional user with -deploy suffix; assuming "groups: - deploy" is correct here, is it?)
image-suggestion:
  token: ...
  groups:
    - deploy
image-suggestion-deploy:
  token: ...
  • (step 3) added to deployment_server / helmfile defaults
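
Step 2 above generated the tokens with pwgen. As an illustration only, the same idea (two different random 22-character tokens, one per user) could be sketched in Go as follows. The alphabet and the helper name randomToken are assumptions; this is not the tool that was actually used.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// alphabet is an assumed character set (letters and digits).
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// randomToken returns n characters drawn uniformly from alphabet,
// using the operating system's cryptographic random source.
func randomToken(n int) (string, error) {
	out := make([]byte, n)
	limit := big.NewInt(int64(len(alphabet)))
	for i := range out {
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		out[i] = alphabet[idx.Int64()]
	}
	return string(out), nil
}

func main() {
	// One token per user, matching the two entries in the YAML above.
	for _, user := range []string{"image-suggestion", "image-suggestion-deploy"} {
		tok, err := randomToken(22)
		if err != nil {
			panic(err)
		}
		// Only the length is printed; a real token should never be logged.
		fmt.Printf("%s: generated a %d-character token\n", user, len(tok))
	}
}
```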

https://gerrit.wikimedia.org/r/c/operations/puppet/+/784791/

https://puppet-compiler.wmflabs.org/pcc-worker1002/34931/deploy1002.eqiad.wmnet/index.html

deployed. various /etc/kubernetes/image-suggestion* files have been created now on deploy1002.eqiad.wmnet.

  • The deployment will be called image-suggestion and use the image for this service.

ACK, using "image-suggestion" without the "s", singular form.

Change 784794 had a related patch set uploaded (by Dzahn; author: Dzahn):

[labs/private@master] kubernetes: add dummy tokens for image-suggestion service

https://gerrit.wikimedia.org/r/784794

Change 784794 merged by Dzahn:

[labs/private@master] kubernetes: add dummy tokens for image-suggestion service

https://gerrit.wikimedia.org/r/784794

Change 786426 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] add image-suggestion.discovery.wmnet and point to ingress-wikikube

https://gerrit.wikimedia.org/r/786426

Change 775964 merged by jenkins-bot:

[operations/deployment-charts@master] add a namespace for new service image-suggestion

https://gerrit.wikimedia.org/r/775964

Mentioned in SAL (#wikimedia-operations) [2022-04-27T22:09:36Z] <mutante> running puppet on kubemasters - adding namespace to kubernetes for new service image-suggestion (T304891, T305155)

Mentioned in SAL (#wikimedia-operations) [2022-04-27T22:34:36Z] <mutante> kubernetes - Upgrading release=namespaces/namspace-certificates which added developer-portal and image-suggestion namespaces - but only on staging-codfw - (T304891, T305155, T297140)

Also see T297140#7886240, where 2 new namespaces were added: one for developer-portal and this one for image-suggestion.

I confirmed the namespace image-suggestion now exists in all 4 environments/clusters.

quotas and limits are as follows, staging:

root@deploy1002:~# kubectl describe ns image-suggestion
Name:         image-suggestion

...

Resource Quotas
 Name:            quota-compute-resources
 Resource         Used  Hard
 --------         ---   ---
 limits.cpu       0     20
 limits.memory    0     10Gi
 requests.cpu     0     20
 requests.memory  0     10Gi

Resource Limits
 Type       Resource  Min    Max  Default Request  Default Limit  Max Limit/Request Ratio
 ----       --------  ---    ---  ---------------  -------------  -----------------------
 Container  cpu       100m   8    100m             100m           -
 Container  memory    100Mi  3Gi  100Mi            100Mi          -
 Pod        cpu       100m   9    -                -              -
 Pod        memory    100Mi  5Gi  -

production:

Resource Quotas
 Name:            quota-compute-resources
 Resource         Used  Hard
 --------         ---   ---
 limits.cpu       0     90
 limits.memory    0     100Gi
 requests.cpu     0     90
 requests.memory  0     100Gi

Resource Limits
 Type       Resource  Min    Max  Default Request  Default Limit  Max Limit/Request Ratio
 ----       --------  ---    ---  ---------------  -------------  -----------------------
 Container  cpu       100m   8    100m             100m           -
 Container  memory    100Mi  3Gi  100Mi            100Mi          -
 Pod        cpu       100m   9    -                -              -
 Pod        memory    100Mi  5Gi  -
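
As a rough worked example of what these numbers mean in practice, the Go sketch below checks a container's limits against the per-container bounds shown above and estimates how many identical containers would fit under each namespace quota. The 500m / 512Mi container is hypothetical, and the calculation assumes one container per pod with requests equal to limits, which is a simplification.

```go
package main

import "fmt"

// Range is an inclusive min/max bound.
type Range struct{ Min, Max int64 }

// Contains reports whether v lies within the bound.
func (r Range) Contains(v int64) bool { return v >= r.Min && v <= r.Max }

// Per-container bounds copied from the "Resource Limits" output above:
// cpu 100m to 8 cores (in millicores), memory 100Mi to 3Gi (in MiB).
var (
	containerCPU = Range{Min: 100, Max: 8000}
	containerMem = Range{Min: 100, Max: 3 * 1024}
)

// fitsContainerLimits checks CPU (millicores) and memory (MiB) against the bounds.
func fitsContainerLimits(cpuMilli, memMiB int64) bool {
	return containerCPU.Contains(cpuMilli) && containerMem.Contains(memMiB)
}

// maxReplicas estimates how many identical containers fit under a quota.
// It returns 0 when the container itself is outside the allowed bounds.
func maxReplicas(cpuMilli, memMiB, quotaCPUMilli, quotaMemMiB int64) int64 {
	if !fitsContainerLimits(cpuMilli, memMiB) {
		return 0
	}
	byCPU := quotaCPUMilli / cpuMilli
	byMem := quotaMemMiB / memMiB
	if byCPU < byMem {
		return byCPU
	}
	return byMem
}

func main() {
	const cpu, mem = 500, 512 // hypothetical limits: 500m CPU, 512Mi memory

	fmt.Println(fitsContainerLimits(cpu, mem))            // true
	fmt.Println(maxReplicas(cpu, mem, 20_000, 10*1024))   // staging quota: 20
	fmt.Println(maxReplicas(cpu, mem, 90_000, 100*1024))  // production quota: 180
}
```

In staging the memory quota is the binding constraint (10Gi / 512Mi = 20), while in production the CPU quota is (90 / 0.5 = 180).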

Change 788753 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] service: add image-suggestion ingress service

https://gerrit.wikimedia.org/r/788753

Change 788814 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] add service records for new service image-suggestion

https://gerrit.wikimedia.org/r/788814

Change 788814 merged by Dzahn:

[operations/dns@master] add svc and discovery records for new service image-suggestion

https://gerrit.wikimedia.org/r/788814

Mentioned in SAL (#wikimedia-operations) [2022-05-03T22:36:26Z] <mutante> ns0: authdns-update - deploying DNS change,add new svc and discovery records for image-suggestion T304891

OK - authdns-update successful on all nodes!

[authdns1001:~] $ host image-suggestion.discovery.wmnet
image-suggestion.discovery.wmnet is an alias for k8s-ingress-wikikube-ro.discovery.wmnet.
k8s-ingress-wikikube-ro.discovery.wmnet has address 10.2.2.70

[authdns1001:~] $ host image-suggestion.svc.eqiad.wmnet
image-suggestion.svc.eqiad.wmnet is an alias for k8s-ingress-wikikube.svc.eqiad.wmnet.
k8s-ingress-wikikube.svc.eqiad.wmnet has address 10.2.2.70

[authdns1001:~] $ host image-suggestion.svc.codfw.wmnet
image-suggestion.svc.codfw.wmnet is an alias for k8s-ingress-wikikube.svc.codfw.wmnet.
k8s-ingress-wikikube.svc.codfw.wmnet has address 10.2.1.70
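
The checks above can also be expressed as a small Go sketch. The helper names (discoveryName, svcName, aliasMatches, fakeLookup) are made up for illustration, and fakeLookup only replays the answers from the host output above so the example runs without access to the internal network. Inside the network, the standard library's net.LookupCNAME has the same signature and could be passed in its place.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupFunc has the same signature as net.LookupCNAME.
type lookupFunc func(host string) (string, error)

// discoveryName builds the discovery record name for a service.
func discoveryName(service string) string {
	return service + ".discovery.wmnet"
}

// svcName builds the per-datacenter svc record name for a service.
func svcName(service, dc string) string {
	return service + ".svc." + dc + ".wmnet"
}

// aliasMatches reports whether host is an alias for want.
// A trailing dot on the returned name is ignored.
func aliasMatches(lookup lookupFunc, host, want string) (bool, error) {
	cname, err := lookup(host)
	if err != nil {
		return false, err
	}
	return strings.TrimSuffix(cname, ".") == want, nil
}

// fakeLookup replays the aliases shown in the host output above.
func fakeLookup(host string) (string, error) {
	records := map[string]string{
		"image-suggestion.discovery.wmnet": "k8s-ingress-wikikube-ro.discovery.wmnet",
		"image-suggestion.svc.eqiad.wmnet": "k8s-ingress-wikikube.svc.eqiad.wmnet",
		"image-suggestion.svc.codfw.wmnet": "k8s-ingress-wikikube.svc.codfw.wmnet",
	}
	if cname, ok := records[host]; ok {
		return cname + ".", nil
	}
	return "", fmt.Errorf("no record for %s", host)
}

func main() {
	for _, dc := range []string{"eqiad", "codfw"} {
		host := svcName("image-suggestion", dc)
		ok, err := aliasMatches(fakeLookup, host, "k8s-ingress-wikikube.svc."+dc+".wmnet")
		fmt.Println(host, ok, err)
	}
}
```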

Change 786426 abandoned by Dzahn:

[operations/dns@master] add image-suggestion.discovery.wmnet and point to ingress-wikikube

Reason:

superseded by https://gerrit.wikimedia.org/r/c/operations/dns/+/788814

https://gerrit.wikimedia.org/r/786426

Mentioned in SAL (#wikimedia-operations) [2022-05-03T22:58:42Z] <mutante> added image-suggestion to kube_services.certs.yaml in private repo, generated new certs and git committed them T304891

@WDoranWMF @hnowlan

docs have been updated.

https://wikitech.wikimedia.org/wiki/Kubernetes#Add_a_new_service
https://wikitech.wikimedia.org/wiki/Kubernetes/Ingress#Add_a_new_service_under_Ingress

I have created the DNS records (svc, discovery) and the certificates for the service proxy covering these names:

https://wikitech.wikimedia.org/wiki/Kubernetes/Enabling_TLS#Create_and_place_certificates

They have been deployed in the private repo and on the deployment server there is now:

[deploy1002:/etc/helmfile-defaults/private/main_services/image-suggestion] $

I think that concludes the steps that were blocking you from getting this moving.

It should have unblocked https://gerrit.wikimedia.org/r/c/operations/puppet/+/788753 and https://wikitech.wikimedia.org/wiki/Kubernetes/Enabling_TLS#Add_support_to_the_chart

also see email

Change 789876 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] New service: image-suggestion

https://gerrit.wikimedia.org/r/789876

Change 790695 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[generated-data-platform/datasets/image-suggestions@main] Install wmf-certificates package to get puppetca cert

https://gerrit.wikimedia.org/r/790695

Change 790695 merged by jenkins-bot:

[generated-data-platform/datasets/image-suggestions@main] Install wmf-certificates package to get puppetca cert

https://gerrit.wikimedia.org/r/790695

Change 791324 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] Add helmfile configuration for image-suggestion

https://gerrit.wikimedia.org/r/791324

Change 789876 merged by jenkins-bot:

[operations/deployment-charts@master] New service: image-suggestion

https://gerrit.wikimedia.org/r/789876

Change 791324 merged by jenkins-bot:

[operations/deployment-charts@master] Add helmfile configuration for image-suggestion

https://gerrit.wikimedia.org/r/791324

Change 793745 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[generated-data-platform/datasets/image-suggestions@main] Install wmf-certificates for production

https://gerrit.wikimedia.org/r/793745

Change 793745 merged by jenkins-bot:

[generated-data-platform/datasets/image-suggestions@main] Install wmf-certificates for production

https://gerrit.wikimedia.org/r/793745

Change 793747 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] image-suggestion: bump image version

https://gerrit.wikimedia.org/r/793747

Change 793747 merged by jenkins-bot:

[operations/deployment-charts@master] image-suggestion: bump image version

https://gerrit.wikimedia.org/r/793747

Change 793839 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] aqs: allow Kubernetes nodes access to cassandra

https://gerrit.wikimedia.org/r/793839

Change 793839 merged by Hnowlan:

[operations/puppet@production] aqs: allow Kubernetes nodes access to cassandra

https://gerrit.wikimedia.org/r/793839

Change 799283 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] cassandra-http-gateway: add missing log level

https://gerrit.wikimedia.org/r/799283

Change 788753 merged by Hnowlan:

[operations/puppet@production] service: add image-suggestion ingress service

https://gerrit.wikimedia.org/r/788753

Change 799357 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] service: image-suggestion state to lvs_setup

https://gerrit.wikimedia.org/r/799357

Change 799358 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] service: image-suggestion state to lvs_setup

https://gerrit.wikimedia.org/r/799358

Change 799998 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] service: image-suggestion state to production

https://gerrit.wikimedia.org/r/799998

Change 799283 merged by jenkins-bot:

[operations/deployment-charts@master] cassandra-http-gateway: add missing log level

https://gerrit.wikimedia.org/r/799283

Change 799357 merged by Hnowlan:

[operations/puppet@production] service: image-suggestion state to lvs_setup

https://gerrit.wikimedia.org/r/799357

Change 799358 merged by Hnowlan:

[operations/puppet@production] service: image-suggestion state to monitoring_setup

https://gerrit.wikimedia.org/r/799358

Change 802964 had a related patch set uploaded (by Jbond; author: Jbond):

[operations/puppet@production] service: image-suggestion state to production with paging disabled

https://gerrit.wikimedia.org/r/802964

Change 799998 merged by Hnowlan:

[operations/puppet@production] service: image-suggestion state to production

https://gerrit.wikimedia.org/r/799998
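
The merged patches above move the service state through lvs_setup, then monitoring_setup, then production. A minimal Go sketch of that ordering follows. It models only the three states that appear in this task; the real service catalog may define other states or allow other transitions, and the function names are illustrative.

```go
package main

import "fmt"

// rollout lists the states seen in this task, in the order they were applied.
var rollout = []string{"lvs_setup", "monitoring_setup", "production"}

// indexOf returns the position of state in rollout, or -1 if it is unknown.
func indexOf(state string) int {
	for i, s := range rollout {
		if s == state {
			return i
		}
	}
	return -1
}

// canAdvance reports whether moving from one state to another is
// exactly one step forward in the rollout order.
func canAdvance(from, to string) bool {
	i, j := indexOf(from), indexOf(to)
	return i >= 0 && j >= 0 && j == i+1
}

func main() {
	fmt.Println(canAdvance("lvs_setup", "monitoring_setup"))  // true
	fmt.Println(canAdvance("monitoring_setup", "production")) // true
	fmt.Println(canAdvance("lvs_setup", "production"))        // false: skips a step
}
```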

Change 802964 abandoned by Jbond:

[operations/puppet@production] service: image-suggestion state to production with paging disabled

Reason:

no longer necessary

https://gerrit.wikimedia.org/r/802964

This is pretty much done. We currently only have two main metrics for the service so there's a very basic dashboard at https://grafana-rw.wikimedia.org/d/-u8RHiCnk/image-suggestion

hnowlan claimed this task.
hnowlan updated the task description.

@lbowmaker @hnowlan does this service have a page on Wikitech?

I've added some basic service configuration docs from an ops perspective here (as opposed to anything more specific about the internals of the service)