
puppetdb-api micro service doesn't work well with large queries
Closed, Resolved · Public

Description

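The PuppetDB micro service puppetdb-api.discovery.wmnet does not handle large queries very well. This can be demonstrated by the following two queries: the first goes to localhost:8080, and the second goes through the micro service and returns a 504 Gateway Time-out after about one minute.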

$ time curl -X POST http://localhost:8080/pdb/query/v4/resources --data  '{"query":  ["=", "type", "File"]}' -H 'Content-Type: application/json' &> /dev/null 
curl -X POST http://localhost:8080/pdb/query/v4/resources --data  -H  &>   2.03s user 2.85s system 1% cpu 7:17.45 total
$ time curl -X POST https://puppetdb-api.discovery.wmnet:8090/pdb/query/v4/resources --data  '{"query": ["=", "type", "File"]}' -H 'Content-Type: application/json'            
<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.14.2</center>
</body>
</html>
curl -X POST https://puppetdb-api.discovery.wmnet:8090/pdb/query/v4/resources  0.02s user 0.02s system 0% cpu 1:00.06 total

The issue is that the Flask service reads all of the data into memory, then iterates over it and modifies it before sending it to the client, which causes a crash, probably from OOM. It would be better if we could stream the data directly to the client and modify it on the fly.
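A minimal sketch of the "modify it on the fly" idea, using only the standard library. It assumes the upstream response is a JSON array of resource objects arriving in arbitrary text chunks. The names iter_json_array, redact, stream_redacted and the KEEP_FIELDS allow-list are hypothetical and are not taken from puppetdb-microservice.py. Only one partially received object is buffered at a time, rather than the whole result set.

```python
import json
from typing import Iterable, Iterator

# Hypothetical allow-list: the real service's redaction rules are not shown in this task.
KEEP_FIELDS = ("certname", "type", "title")


def iter_json_array(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield the objects of a JSON array that arrives split into text chunks."""
    decoder = json.JSONDecoder()
    buf = ""
    started = False
    for chunk in chunks:
        buf += chunk
        while True:
            # Skip whitespace and the commas that separate array elements.
            buf = buf.lstrip(" \t\r\n,")
            if not buf:
                break
            if not started:
                if buf[0] != "[":
                    raise ValueError("expected a JSON array")
                buf, started = buf[1:], True
                continue
            if buf[0] == "]":
                break  # end of the array
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # object is incomplete, wait for the next chunk
            yield obj
            buf = buf[end:]


def redact(resource: dict) -> dict:
    """Keep only the allow-listed fields of one resource (assumed policy)."""
    return {k: resource[k] for k in KEEP_FIELDS if k in resource}


def stream_redacted(chunks: Iterable[str]) -> Iterator[str]:
    """Re-emit the array piece by piece, redacting each object as it passes."""
    yield "["
    first = True
    for resource in iter_json_array(chunks):
        if not first:
            yield ","
        yield json.dumps(redact(resource))
        first = False
    yield "]"
```

A generator such as stream_redacted could then be handed to the web framework as the response body (Flask accepts a generator for a streamed response), with the chunks coming from a streamed upstream request. How that wiring would look in the actual service is not covered here.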

The main effect this has is that cuminunpriv is unable to look up very generic resources or classes that are used in a lot of places. I also suspect that general queries will be slower on cuminunpriv than on cumin (which goes directly to PuppetDB).

Event Timeline

jbond triaged this task as Medium priority. Jul 21 2023, 6:08 PM
jbond created this task.
Restricted Application added a subscriber: Aklapper.

Change 940403 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] (WIP) puppetdb-microservice: update puppetdb micro service so it streams data

https://gerrit.wikimedia.org/r/940403

Plan of action:

Example: replace resources { title = 'foobar' and type = 'file' } with resources[certname] { title = 'foobar' and type = 'file' }

Script lives here: modules/profile/files/puppetdb/puppetdb-microservice.py in the Puppet repo. Script gets deployed to PuppetDB hosts.

We likely also want to add the group by clause, e.g.:

resources[certname] { type = 'File' group by certname }

It may also be easier to modify the incoming request to simply add the AST extract and group_by modifiers. This may be easier than trying to transpose the incoming AST query into a PQL query.
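A rough sketch of that request rewrite, assuming the incoming body has the same shape as the curl examples in the description ({"query": [...]}) and that PuppetDB's AST accepts an extract wrapper with a trailing group_by clause. The function name restrict_to_certnames is hypothetical and the merged patches may do this differently.

```python
import json


def restrict_to_certnames(body: dict) -> dict:
    """Wrap an incoming AST query so that only certnames are returned."""
    query = body["query"]
    # Assumption: some clients send the AST as a JSON-encoded string.
    if isinstance(query, str):
        query = json.loads(query)
    wrapped = ["extract", ["certname"], query, ["group_by", "certname"]]
    # Return a copy so the caller's request body is left untouched.
    return {**body, "query": wrapped}
```

For the query used in the description, ["=", "type", "File"], this produces ["extract", ["certname"], ["=", "type", "File"], ["group_by", "certname"]], which is the AST counterpart of the PQL form resources[certname] { type = 'File' group by certname }.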

Change 951965 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetdb-api-microservice: redact one the puppetdb side

https://gerrit.wikimedia.org/r/951965

Change 940403 abandoned by Jbond:

[operations/puppet@production] (WIP) puppetdb-microservice: update puppetdb micro service so it streams data

Reason:

https://gerrit.wikimedia.org/r/c/operations/puppet/+/951965

https://gerrit.wikimedia.org/r/940403

Change 951965 merged by Jbond:

[operations/puppet@production] puppetdb-api-microservice: redact one the puppetdb side

https://gerrit.wikimedia.org/r/951965

Change 952216 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetdb-api-microservice: need to convert current query to json

https://gerrit.wikimedia.org/r/952216

Change 952216 merged by Jbond:

[operations/puppet@production] puppetdb-api-microservice: need to convert current query to json

https://gerrit.wikimedia.org/r/952216

It seems like John fixed this.