
Create a "state of the cloud" monthly report
Open, Needs Triage, Public


Idea from @chasemp in a labs-admin email thread:

I really don't want to call these target metrics or anything, but "total instances in Cloud VPS", "Total tools in Tools", "% on Grid", "% on k8s", "new ldap user count", "new Tools user count", "nova-fullstack failures (? or uptime?)"... If this was run on a cron and sent via email to the labs-admin list before the meeting, that would be ideal I guess.

We have a similar weekly or monthly stats generator for Phab (user count, issue count, etc.) and over time it creates a view of normal. We would notice if we suddenly have 100 new ldap users where we usually have 10, or if new Tools hasn't gone up in a few weeks, etc. These are hard-to-quantify baselines that, I believe, we will be happy we put in front of our faces over time.

Event Timeline

The trick to automating this is figuring out where we can gather the interesting metrics from, and possibly where we could store them for month-over-month comparisons.

Counting the number of active Kubernetes namespaces requires a privileged account and, I think, can only be done from tools-k8s-master at the moment. We could probably set up something that would figure this out daily and post it into graphite/prometheus, where it would be more accessible.
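A rough sketch of what the "post it into graphite" half could look like, assuming Graphite's plaintext protocol (one `<path> <value> <epoch-seconds>` line per metric, conventionally accepted on port 2003). The function name, the metric path `cloud.tools.k8s.namespaces`, and the host `graphite.example.org` are all hypothetical placeholders; the namespace count itself would still have to come from a privileged query on the k8s master.

```shell
#!/bin/sh
# Format one data point in Graphite's plaintext protocol:
#   <metric path> <value> <epoch seconds>
# $1 = metric path (hypothetical naming), $2 = value,
# $3 = optional epoch timestamp (defaults to the current time)
graphite_line() {
    printf '%s %s %s\n' "$1" "$2" "${3:-$(date +%s)}"
}

# Example only (not executed here): send a count to a hypothetical host.
# graphite_line cloud.tools.k8s.namespaces "$count" | nc -w 1 graphite.example.org 2003
```

Keeping the formatting separate from the network send makes it easy to dry-run the daily job and inspect what would be posted.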

The metric(s) that come out of T167556: Define a metric to track OpenStack system availability would probably be reasonable to include on this report.

LDAP account creations since date:

$ ldapsearch -xLLL -P 3 -E pr=40000/noprompt -o ldif-wrap=no -b"ou=people,dc=wikimedia,dc=org" '(&(objectClass=posixaccount)(createTimestamp>=20170701000000Z))' dn | grep dn: | wc -l

Tool account creations since date:

$ ldapsearch -xLLL -P 3 -E pr=40000/noprompt -o ldif-wrap=no -b"ou=people,ou=servicegroups,dc=wikimedia,dc=org" '(&(objectClass=posixaccount)(createTimestamp>=20170701000000Z))' dn | grep dn: | wc -l
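For a monthly cron run, the hard-coded `20170701000000Z` in the two queries above would need to be computed. A minimal sketch, assuming GNU `date` (the `-d` option is not portable to BSD `date`); the function names are illustrative and not part of any existing tool:

```shell
#!/bin/sh
# Convert a calendar date (e.g. 2017-07-01) into the LDAP generalized-time
# form used by createTimestamp, e.g. 20170701000000Z.
# Assumes GNU date; -u makes both parsing and output use UTC.
ldap_time() {
    date -u -d "$1" +%Y%m%d%H%M%SZ
}

# Build the same filter used in the queries above for a given start date.
created_since_filter() {
    printf '(&(objectClass=posixaccount)(createTimestamp>=%s))' "$(ldap_time "$1")"
}

# Example only (not executed here): accounts created since the first of this month.
# ldapsearch -xLLL -P 3 -E pr=40000/noprompt -o ldif-wrap=no \
#     -b"ou=people,dc=wikimedia,dc=org" \
#     "$(created_since_filter "$(date -u +%Y-%m-01)")" dn | grep -c '^dn:'
```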

I have often found the number of unique logins to a system per month to be a useful stat. (Drop the wc to see the distribution of logins per user.)

$ last | cut -f1 -d" " | head -n -2 | sort | uniq -c | sort -nr | wc -l
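The pipeline above relies on `head -n -2`, which is a GNU extension, and it counts the `reboot` pseudo-user as a login. A possible variant, swapping in `awk` to do the filtering and counting, is sketched below. The function name is made up for illustration, and skipping `reboot` entries is my assumption about what is wanted, not something stated in this task.

```shell
#!/bin/sh
# Count unique login names in `last`-style output read from stdin.
# Skips blank lines, the trailing "<file> begins ..." line, and the
# "reboot" pseudo-user (the last exclusion is an assumption).
unique_logins() {
    awk 'NF && $2 != "begins" && $1 != "reboot" && !seen[$1]++ { n++ }
         END { print n + 0 }'
}

# Example only (not executed here):
# last | unique_logins
```

Reading from stdin means the same function can be pointed at a rotated log, for example via `last -f` on an older wtmp file.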

It can also help to track the directions folks take after account activation. Combining this with tools account creations:

$ ldapsearch -xLLL -P 3 -E pr=40000/noprompt -o ldif-wrap=no -b"ou=people,dc=wikimedia,dc=org" '(&(objectClass=posixaccount)(createTimestamp>=20170701000000Z))' uid | awk -F: '/^uid/ {print $2}' | xargs last | cut -f1 -d" " | head -n -2 | sort | uniq -c | sort -nr