bd808 (Bryan Davis) Administrator
Engineering Manager, Wikimedia Cloud Services


User Details

User Since: Oct 3 2014, 2:36 PM (193 w, 6 d)
Roles: Administrator
Availability: Available
IRC Nick: bd808
LDAP User: BryanDavis
MediaWiki User: BDavis (WMF)

I'm BDavis (WMF) on wiki, bd808 on irc, and BryanDavis on Gerrit and Wikitech.

I've got a thing for 🦄s. Don't judge.

I work for or provide services to the Wikimedia Foundation, but this is my only Phabricator account. Edits, statements, or other contributions made from this account are my own, and may not reflect the views of the Foundation.

Recent Activity

Yesterday

bd808 created T197910: OAuth approval dialog CSS and timeless do not play well together.
Thu, Jun 21, 10:20 PM · Timeless, MediaWiki-extensions-OAuth
bd808 awarded T179677: Propose a logo for the PAWS project a Love token.
Thu, Jun 21, 7:34 PM · PAWS (JupyterHub 0.9), Design, Google-Code-in-2017
bd808 added a comment to T196171: Developer account creation without OpenStackManager.

This sounds hacky, but until we provision a new identity management application, couldn't we just tell people to sign up via Striker?

Thu, Jun 21, 5:32 PM · cloud-services-team, wikitech.wikimedia.org, MediaWiki-extensions-OpenStackManager
bd808 added a comment to T182070: tools-webgrid-lighttpd have ~ 90 procs stuck at 100% CPU time (mostly tools.jembot).

As the author of Croptool, I see that the tool hangs from time to time and I don't really understand why. If a user requests a very large image to be cropped, it will naturally cause high CPU usage for some time, but I would expect that the PHP process eventually would be killed / time out. Let me know if there are settings I should try changing!

Thu, Jun 21, 3:29 PM · Toolforge
bd808 triaged T194332: [Epic] Make Toolforge a proper platform as a service with push-to-deploy and build packs as Normal priority.
Thu, Jun 21, 3:29 PM · Epic, Toolforge
bd808 added a parent task for T136265: Develop evaluation criteria for comparing Platform as a Service (PaaS) solutions: T194332: [Epic] Make Toolforge a proper platform as a service with push-to-deploy and build packs.
Thu, Jun 21, 3:28 PM · Kubernetes, Community-Tech-Tool-Labs, Tools-Kubernetes, Toolforge
bd808 added a subtask for T194332: [Epic] Make Toolforge a proper platform as a service with push-to-deploy and build packs: T136265: Develop evaluation criteria for comparing Platform as a Service (PaaS) solutions.
Thu, Jun 21, 3:28 PM · Epic, Toolforge

Wed, Jun 20

bd808 added a project to T197800: Welcome new contributor welcomes old contributors on every edit since today: Tools.
Wed, Jun 20, 9:13 PM · Tools
bd808 merged T189639: Separate LDAP account creation bits from Striker to create a new identity management platform into T179463: Create a single application to provision and manage developer (LDAP) accounts.
Wed, Jun 20, 4:52 PM · LDAP, Operations, Developer-Relations, Cloud-Services
bd808 merged task T189639: Separate LDAP account creation bits from Striker to create a new identity management platform into T179463: Create a single application to provision and manage developer (LDAP) accounts.
Wed, Jun 20, 4:52 PM · Striker, LDAP
bd808 added a comment to T196171: Developer account creation without OpenStackManager.

Doesn't Striker support doing this? We could just take the account creation logic out of Striker and have people create accounts that way.

Wed, Jun 20, 4:50 PM · cloud-services-team, wikitech.wikimedia.org, MediaWiki-extensions-OpenStackManager
bd808 added a comment to T182070: tools-webgrid-lighttpd have ~ 90 procs stuck at 100% CPU time (mostly tools.jembot).

Here are the clush commands I have been using to first check for and then kill processes that have leaked out of grid engine due to some cleanup failure:

find-orphans
$ clush -w @exec -w @webgrid -b 'ps axwo user:20,ppid,pid,cmd | grep -Ev "^($USER|root|daemon|diamond|_lldpd|messagebus|nagios|nslcd|ntp|prometheus|statd|syslog|Debian-exim|www-data|sgeadmin)"|grep -v perl|grep -E "     1 "'
kill-orphans
$ clush -w @exec -w @webgrid -b 'ps axwo user:20,ppid,pid,cmd | grep -Ev "^($USER|root|daemon|diamond|_lldpd|messagebus|nagios|nslcd|ntp|prometheus|statd|syslog|Debian-exim|www-data|sgeadmin)"|grep -v perl|grep -E "     1 "|awk "{print \$3}"|xargs sudo kill -9'
Wed, Jun 20, 3:16 PM · Toolforge
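The filtering step in the two clush commands above can be sketched as a small shell function. This is only an illustration under assumptions: the name filter_orphans and the comma-separated ignore list are hypothetical, and the sketch does not reproduce the extra "grep -v perl" step. It reads output in the format of "ps axwo user:20,ppid,pid,cmd" on stdin and prints the lines whose parent PID is 1 and whose owner is not in the ignore list.

```shell
# Hypothetical helper (not the command used on the grid): print processes
# that have been reparented to PID 1 and are not owned by an ignored account.
# Input format: user, ppid, pid, command (as printed by
# `ps axwo user:20,ppid,pid,cmd`).
filter_orphans() {
    # $1: comma-separated list of accounts to ignore (assumed format)
    awk -v skip="$1" '
        BEGIN {
            n = split(skip, names, ",")
            for (i = 1; i <= n; i++) ignore[names[i]] = 1
        }
        # Column 2 is the parent PID; the header line never equals 1.
        $2 == 1 && !($1 in ignore) { print }
    '
}
```

It might be used as: ps axwo user:20,ppid,pid,cmd | filter_orphans "$USER,root,daemon"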

Tue, Jun 19

bd808 added a comment to T194186: rack/setup/install cloudelastic100[1-4].eqiad.wmnet systems.

@chasemp please let me know network requirements.

Tue, Jun 19, 10:42 PM · ops-eqiad, cloud-services-team, Cloud-VPS, Operations

Mon, Jun 18

bd808 closed T197517: Create redirect from partnermetrics.wmflabs.org to Toolforge as Resolved.

The redirects project is now handling the 302 response to send https://partnermetrics.wmflabs.org/ to https://tools.wmflabs.org/mediaviews-api. This redirect blindly appends the path from the original request to the target URL, so the mediaviews-api tool will need to handle treating https://tools.wmflabs.org/mediaviews-api/mediaplaycounts properly itself. This can probably be done in the tool's ~/.lighttpd.conf by adding a url.rewrite-if-not-file or similar config stanza.

Mon, Jun 18, 10:13 PM · cloud-services-team (Kanban), VPS-Projects
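The "blindly appends the path" behavior described in the comment above can be modeled in a few lines of shell. This is a sketch of the described behavior only, not the redirects project's actual configuration; the function name redirect_target is hypothetical.

```shell
# Hypothetical model of the redirect: the path of the original request is
# appended unchanged to the target base URL named in the comment.
redirect_target() {
    # $1: path of the original request, for example /mediaplaycounts
    printf '%s%s\n' 'https://tools.wmflabs.org/mediaviews-api' "$1"
}
```

Under this model a request for /mediaplaycounts on the old host maps to https://tools.wmflabs.org/mediaviews-api/mediaplaycounts, which is the URL the tool itself then has to handle.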
bd808 added a project to T197517: Create redirect from partnermetrics.wmflabs.org to Toolforge: cloud-services-team (Kanban).
Mon, Jun 18, 9:59 PM · cloud-services-team (Kanban), VPS-Projects
bd808 claimed T197517: Create redirect from partnermetrics.wmflabs.org to Toolforge.
Mon, Jun 18, 9:52 PM · cloud-services-team (Kanban), VPS-Projects
bd808 closed T195589: Reset 2-factor authentication on Wikitech for Susannaanas as Resolved.

@Nikerabbit confirmed with @Susannaanas and then relayed that confirmation to me.

Mon, Jun 18, 8:03 PM · cloud-services-team (Kanban), wikitech.wikimedia.org, Trust-and-Safety
bd808 added a subtask for T174469: LDAP account that is not attached on wikitech has no means for password reset: T197612: Attach Developer Account to Wikitech, to enable Reset password.
Mon, Jun 18, 5:47 PM · Striker, wikitech.wikimedia.org
bd808 added a parent task for T197612: Attach Developer Account to Wikitech, to enable Reset password: T174469: LDAP account that is not attached on wikitech has no means for password reset.
Mon, Jun 18, 5:47 PM · cloud-services-team (Kanban), Toolforge, wikitech.wikimedia.org
bd808 closed T197612: Attach Developer Account to Wikitech, to enable Reset password as Resolved.
$ ssh labweb1002.wikimedia.org
$ mwscript extensions/OpenStackManager/maintenance/attachLdapUser.php --wiki=labswiki --user Olem --email <email>
Mon, Jun 18, 5:47 PM · cloud-services-team (Kanban), Toolforge, wikitech.wikimedia.org

Fri, Jun 15

bd808 closed T101687: Explicitly document policies for requesting new projects, a subtask of T101659: Run a documentation sprint for Cloud VPS and Toolforge, as Resolved.
Fri, Jun 15, 10:58 PM · Developer-Wishlist (2017), Developer-Relations, Documentation, Toolforge
bd808 closed T101687: Explicitly document policies for requesting new projects as Resolved.

https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_project plus the Cloud-VPS (Project-requests) intro text mostly cover this now. More detail can be added as needed.

Fri, Jun 15, 10:58 PM · cloud-services-team (Kanban), Cloud-Services, Documentation
bd808 moved T196095: Request creation of wbaas VPS project from Inbox to Declined on the Cloud-VPS (Project-requests) board.
Fri, Jun 15, 10:20 PM · Cloud-VPS (Project-requests)
bd808 moved T195214: Request creation of wikibox VPS project from Inbox to Declined on the Cloud-VPS (Project-requests) board.
Fri, Jun 15, 10:20 PM · Cloud-VPS (Project-requests)
bd808 created E910: bd808 @ Wikimania.
Fri, Jun 15, 6:07 PM · events

Thu, Jun 14

bd808 closed T192129: Redirect living-style-guide.wmflabs.org to design.wikimedia.org/style-guide as Resolved.
Thu, Jun 14, 9:31 PM · cloud-services-team (Kanban), VPS-Projects
bd808 closed T192129: Redirect living-style-guide.wmflabs.org to design.wikimedia.org/style-guide, a subtask of T166012: Mark the living style guide as deprecated, as Resolved.
Thu, Jun 14, 9:31 PM · Technical-Debt, Documentation, UI-Standardization
bd808 closed T192128: Redirect livingstyleguide.wmflabs.org to design.wikimedia.org/style-guide as Resolved.

Sorry this took so long @Prtksxna. I lost track of the task and just rediscovered it in my todo list today.

Thu, Jun 14, 9:31 PM · cloud-services-team (Kanban), VPS-Projects
bd808 closed T192128: Redirect livingstyleguide.wmflabs.org to design.wikimedia.org/style-guide, a subtask of T166012: Mark the living style guide as deprecated, as Resolved.
Thu, Jun 14, 9:31 PM · Technical-Debt, Documentation, UI-Standardization
bd808 moved T196545: Create new labs project for development & testing of WMDE technical wishlist features/extensions from Inbox to Discussion needed on the Cloud-VPS (Project-requests) board.
Thu, Jun 14, 9:18 PM · Cloud-VPS (Project-requests)
bd808 edited projects for T196545: Create new labs project for development & testing of WMDE technical wishlist features/extensions, added: Cloud-VPS (Project-requests); removed Cloud-Services.
Thu, Jun 14, 9:18 PM · Cloud-VPS (Project-requests)
bd808 added a comment to T197264: Help accessing wikitech for CCogdill (WMF).

(no (WMF) which is normal for developer accounts)

Thu, Jun 14, 8:51 PM · wikitech.wikimedia.org
bd808 added a comment to T197264: Help accessing wikitech for CCogdill (WMF).

The wikitech user Ccogdill (no (WMF) which is normal for developer accounts) exists. That is the only LDAP account using @CCogdill_WMF's @wikimedia.org email address.

Thu, Jun 14, 8:40 PM · wikitech.wikimedia.org
bd808 updated subscribers of T193806: Uncategorized articles.

According to https://toolsadmin.wikimedia.org/tools/id/dplbot, @Dispenser and @russblau also have maintainer rights for the dplbot tool in addition to @JaGa. They may or may not have time to help fix this particular task, but that could make it easier to get a new co-maintainer added if one can be found.

Thu, Jun 14, 5:45 PM · Tools
bd808 added a comment to T193806: Uncategorized articles.

@Bearcat there is a process documented at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Abandoned_tool_policy for either adopting or usurping a tool where the maintainer has gone missing. There is also T159595: Make sure abandoned useful tools are properly advertised so potentially interested new maintainers could find them and the Toolforge-standards-committee (Maintainer needed) Phabricator project for advertising for new maintainers. Folks from the Toolforge-standards-committee may be able to help you with this route.

Thu, Jun 14, 5:42 PM · Tools
bd808 added a project to T168580: Neutron implementation of routing_source_ip definition: cloud-services-team (Kanban).
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), Patch-For-Review, Epic, Cloud-Services
bd808 added a project to T87001: Provide basic page view metrics for individual tools on toollabs: cloud-services-team (Kanban).
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), Toolforge
bd808 edited projects for T193655: rack/setup/install labstore1008 & labstore1009, added: cloud-services-team (Kanban); removed cloud-services-team.
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
bd808 added a project to T171394: Better monitoring for labstore backup crons: cloud-services-team (Kanban).
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), Data-Services
bd808 added a project to T196752: Replace labtestnet2001 with labtestnet2003 and decommission labtestnet2001: cloud-services-team (Kanban).
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), Cloud-VPS
bd808 added a project to T196209: "Associate Floating IP" button next to instance broken: cloud-services-team (Kanban).
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), Horizon
bd808 edited projects for T192098: Add tmpreaper to all tools execute nodes, if appropriate, added: cloud-services-team (Kanban); removed cloud-services-team.
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), Toolforge
bd808 added a project to T192128: Redirect livingstyleguide.wmflabs.org to design.wikimedia.org/style-guide: cloud-services-team (Kanban).
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), VPS-Projects
bd808 added a project to T192129: Redirect living-style-guide.wmflabs.org to design.wikimedia.org/style-guide: cloud-services-team (Kanban).
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), VPS-Projects
bd808 edited projects for T192156: Review encoding of all OpenStack databases, added: cloud-services-team (Kanban); removed cloud-services-team.
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), Cloud-VPS
bd808 added a project to T194782: What is Cloud Services and why should I care?: cloud-services-team (Kanban).
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), Wikimedia-Hackathon-2018
bd808 edited projects for T197245: Move toolsdb and wikilabels cluster servers for datacenter reconfiguration, added: cloud-services-team (Kanban); removed cloud-services-team.
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban), Toolforge, Wikilabels, Scoring-platform-team
bd808 edited projects for T197244: Move analytics wiki replica cluster for switch and data center reconfigure, added: cloud-services-team (Kanban); removed cloud-services-team.
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban)
bd808 edited projects for T197246: Move OpenStreetMaps postgresql cluster servers for datacenter reconfiguration, added: cloud-services-team (Kanban); removed cloud-services-team.
Thu, Jun 14, 4:54 PM · cloud-services-team (Kanban)
bd808 added a comment to T190707: Creation of tools with wikimedia-related names blocked by global title blacklist.

The global title blacklist has a different target use, and I don't think the whole list should be used for blacklisting tool names.

Thu, Jun 14, 3:42 PM · Striker

Mon, Jun 11

Liuxinyu970226 awarded T127792: Enable Flow on wikitech (labswiki and labtestwiki), then turn on for Tool talk namespace a Love token.
Mon, Jun 11, 8:23 AM · DBA, Patch-For-Review, Wikimedia-Extension-setup, Community-Tech-Tool-Labs, Toolforge, Cloud-Services, StructuredDiscussions, Collaboration-Team-Triage, wikitech.wikimedia.org

Sun, Jun 10

bd808 placed T193848: https://wikitech.wikimedia.org/view/ no longer redirects to /wiki up for grabs.

@Aklapper how is this resolved? https://wikitech.wikimedia.org/view/Server_Admin_Log still doesn't work.

Sun, Jun 10, 3:00 PM · cloud-services-team, Wikimedia-Apache-configuration, Regression, wikitech.wikimedia.org
bd808 created E905: bd808 @ wikilead.
Sun, Jun 10, 2:45 PM · events

Fri, Jun 8

bd808 closed T196559: Create "JupyterHub 0.9" milestone for PAWS project as Resolved.

https://phabricator.wikimedia.org/project/view/3414/

Fri, Jun 8, 4:37 PM · PAWS, Project-Admins
bd808 created PAWS (JupyterHub 0.9).
Fri, Jun 8, 4:37 PM

Thu, Jun 7

bd808 added a comment to T196589: Some kubernetes webservices stuck in CrashLoopBackOff after cluster restart.

I guess I should look into the restart logic we use for the Kubernetes backend and see if it always or only conditionally tears down and recreates the entire deployment.

Thu, Jun 7, 8:05 PM · cloud-services-team (Kanban), Toolforge
bd808 added a comment to T183920: 2018-01-02: labstore Tools and Misc share very full.

I suppose these are the outputs of grid job commands, and as far as I know nothing deletes them by default. Could we rotate the logs of all job outputs by default?

Thu, Jun 7, 6:15 PM · cloud-services-team (Kanban), Operations, Cloud-VPS
bd808 claimed T196589: Some kubernetes webservices stuck in CrashLoopBackOff after cluster restart.
Thu, Jun 7, 4:54 PM · cloud-services-team (Kanban), Toolforge
bd808 added a comment to T196589: Some kubernetes webservices stuck in CrashLoopBackOff after cluster restart.

The deployment restarted the pod using the old deployment configuration but with the new Docker image, causing the failure.

Thu, Jun 7, 4:52 PM · cloud-services-team (Kanban), Toolforge
bd808 lowered the priority of T196589: Some kubernetes webservices stuck in CrashLoopBackOff after cluster restart from High to Normal.

After more automated and manual cleanup, there are currently 0 pods in CrashLoopBackOff state. Lowering priority of task now. I'll keep it open to see if we have more occurrences of the /etc/wmcs-project mount failure in the near future.

Thu, Jun 7, 12:33 AM · cloud-services-team (Kanban), Toolforge
bd808 created T196597: Gerrit login failure for user SMcCandlish.
Thu, Jun 7, 12:24 AM · LDAP, Gerrit

Wed, Jun 6

bd808 added a comment to T156626: k8s webservice restart failure with `ValueError: get() more than one object; use filter`.

I found a new way this can be broken today. While doing some CrashLoopBackOff state cleanup I found a tool which had 1 pod, no deployments, and 7 replicasets. Running webservice stop was crashing with the "get() more than one object; use filter" error and I'm very sure it was due to the multiple replicaset objects in the namespace. This looked to be the result of some cluster state corruption from ~1 year ago.

Wed, Jun 6, 11:57 PM · Kubernetes, Tools-Kubernetes, Toolforge
bd808 merged T196595: "webservice --backend kubernetes nodejs restart" command fails at times into T156626: k8s webservice restart failure with `ValueError: get() more than one object; use filter`.
Wed, Jun 6, 11:54 PM · Kubernetes, Tools-Kubernetes, Toolforge
bd808 merged task T196595: "webservice --backend kubernetes nodejs restart" command fails at times into T156626: k8s webservice restart failure with `ValueError: get() more than one object; use filter`.
Wed, Jun 6, 11:54 PM · Toolforge
bd808 merged T196595: "webservice --backend kubernetes nodejs restart" command fails at times into T140415: `webservice restart` does not always wait for service to stop before trying to start again.
Wed, Jun 6, 11:53 PM · Kubernetes, Toolforge, Tools-Kubernetes
bd808 merged task T196595: "webservice --backend kubernetes nodejs restart" command fails at times into T140415: `webservice restart` does not always wait for service to stop before trying to start again.
Wed, Jun 6, 11:53 PM · Toolforge
bd808 closed T196568: 502 Bad Gateway using multiple WMF Labs Tools as Resolved.

Most things should be back up and running now. Our planned maintenance took a bit longer than expected and there were some additional complications with Toolforge webservices running on our Kubernetes cluster.

Wed, Jun 6, 11:06 PM · Toolforge, Tools
bd808 added a comment to T196568: 502 Bad Gateway using multiple WMF Labs Tools.

See T196589: Some kubernetes webservices stuck in CrashLoopBackOff after cluster restart for info about lots of tools that did not start as expected following system maintenance. Most are resolved now.

Wed, Jun 6, 10:39 PM · Toolforge, Tools
bd808 added a comment to T196589: Some kubernetes webservices stuck in CrashLoopBackOff after cluster restart.

On the third pass I used this helper script to only restart webservices that are logging the missing mounted file:

/tmp/klogs
#!/usr/bin/env bash
kubectl logs po/$(kubectl get po|grep CrashLoopBackOff|awk '{print $1}') |
grep wmcs-project &&
webservice restart
Wed, Jun 6, 10:14 PM · cloud-services-team (Kanban), Toolforge
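The decision the helper script above makes (restart only when the pod log mentions the missing file) can be isolated into a small predicate. This is a sketch under assumptions: the function name mentions_wmcs_project is hypothetical, and it only inspects log text supplied on stdin rather than calling kubectl itself.

```shell
# Hypothetical predicate: exit 0 when the log text on stdin mentions
# wmcs-project (the missing mounted file), non-zero otherwise.
mentions_wmcs_project() {
    grep -q 'wmcs-project'
}
```

With a pod name held in a hypothetical variable, it could be combined with the commands from the script as: kubectl logs "po/$pod" | mentions_wmcs_project && webservice restart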
bd808 added a comment to T196589: Some kubernetes webservices stuck in CrashLoopBackOff after cluster restart.

After the second round of mass restarts, 48 tools are still in CrashLoopBackOff state.

Wed, Jun 6, 10:00 PM · cloud-services-team (Kanban), Toolforge
bd808 triaged T196589: Some kubernetes webservices stuck in CrashLoopBackOff after cluster restart as High priority.

Triaging as high initially. I have been trying to automate restarts for pods in the CrashLoopBackOff state with some success. The initial 175 went down to 59 after the first pass. A second pass of restarts is happening now.

Wed, Jun 6, 9:39 PM · cloud-services-team (Kanban), Toolforge
bd808 created T196589: Some kubernetes webservices stuck in CrashLoopBackOff after cluster restart.
Wed, Jun 6, 9:36 PM · cloud-services-team (Kanban), Toolforge
bd808 added a comment to P7220 Tools stuck in CrashLoopBackOff after cluster reboots.
$ sudo kubectl get --all-namespaces pods --sort-by='.status.containerStatuses[0].restartCount' -o wide | grep CrashLoopBackOff | awk '{print $1}' | sort | uniq
addshore-dev
android-maven-repo
apt-browser
article
articlerequest
articlerequest-dev
ash-django
autolist
awmd-stats
bash
best-image
blankpages
bldrwnsch
blog
book2scroll
category-by-uploaders
catfood
catnap
catscan2
cats-php
cdnjs
cdnjs-beta
checker
citations-dev
cite-o-meter
commons-app-stats
commons-campaign-commander
commonsedge
commonshelper
enwnbot
extreg-wos
forrestbot
ft
gsoc-petscan-query-articles
guc
hartman
hashtags-test
hasteurbot
hennalabs
heritage
himo
hub
icommons
ifttt
ifttt-testing
ios-crashes
ircredirect
ircredirector
isbn2wiki
isbn-usage
isin
jayprakashbot
joanjoc
kasper-data-translator
ksamsok-rest
lingua-libre
list
lolrrit-wm
loltools
lyan
not-in-the-other-language
openrefine-wikidata
r96340-bot
readmore
redirtest
refill-api
reviewers
sejmedits
shields
sibu
sibutest
sighting
similarity
sistercities
snapshots
sparqlblocks
sphinxcapt-leaderboard
spiarticleanalyzer
static
statistics
strephit
tabletop
templatecheck
tesseract-ocr-service
tfaprotbot
thankyou
threed2commons
tool-db-usage
toolschecker
toolscript
verification-pages
wikidata-exports
wm-commons-emoji-bot
Wed, Jun 6, 9:11 PM
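The grep, awk, sort, and uniq stages of the pipeline above can be sketched as one function that works on captured output. This is a sketch, assuming the default column order of "kubectl get pods --all-namespaces" (NAMESPACE, NAME, READY, STATUS, ...); the function name crashloop_namespaces is hypothetical.

```shell
# Hypothetical helper: list each namespace once if it has at least one pod
# whose STATUS column (assumed to be column 4) is CrashLoopBackOff.
crashloop_namespaces() {
    awk '$4 == "CrashLoopBackOff" { print $1 }' | sort -u
}
```

Matching on the STATUS column rather than grepping the whole line avoids picking up a pod or namespace that merely has CrashLoopBackOff in its name.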
bd808 added a comment to T195780: Try to identify new developers (via assignee field) in Phab tasks and potentially follow up.

@Aklapper have you looked to see if it would be possible to get the data you need from https://dumps.wikimedia.org/other/misc/phabricator_public.dump and avoid the sql layer entirely?

Wed, Jun 6, 8:54 PM · Patch-For-Review, Developer-Relations (Apr-Jun-2018), Phabricator
bd808 added a comment to P7220 Tools stuck in CrashLoopBackOff after cluster reboots.

for t in $(cat restart-us.txt); do echo $t; sudo become $t; webservice restart; sleep 5; done

Wed, Jun 6, 8:24 PM
bd808 created P7220 Tools stuck in CrashLoopBackOff after cluster reboots.
Wed, Jun 6, 8:22 PM
mmodell awarded T72792: Set up puppet exported resources to collect ssh host keys for beta a Love token.
Wed, Jun 6, 5:54 PM · Patch-For-Review, Puppet, Beta-Cluster-Infrastructure

Tue, Jun 5

bd808 created T196495: Limit ability of a single user/tool to overwhelm job grid.
Tue, Jun 5, 6:29 PM · Toolforge
bd808 closed T196486: Concurrent generated jobs from a single user overloaded grid engine as Resolved.
Tue, Jun 5, 6:09 PM · cloud-services-team (Kanban), Toolforge
bd808 added a comment to T196486: Concurrent generated jobs from a single user overloaded grid engine.

@Debenben, I think you can resume your work. The job grid should now automatically limit you to 2 concurrent jobs regardless of how many you submit at the same time. Jobs that are queued for later execution will show with state qw in the output of qstat. The current limit of 2 is very conservative. We can revisit it if you find that running only 2 at a time will make your project take weeks or months to finish. I don't think we can go much above 6-8 concurrent jobs, however, and still be fair to others, since your parallel dumps parsing will be so IO intensive for the NFS servers.

Tue, Jun 5, 6:07 PM · cloud-services-team (Kanban), Toolforge
bd808 added a comment to T196486: Concurrent generated jobs from a single user overloaded grid engine.

The quota seems to work:

$ for n in $(seq 1 9); do jsub -N test-$n test-concurrency.sh; done
Your job 6900851 ("test-1") has been submitted
Your job 6900852 ("test-2") has been submitted
Your job 6900853 ("test-3") has been submitted
Your job 6900854 ("test-4") has been submitted
Your job 6900855 ("test-5") has been submitted
Your job 6900856 ("test-6") has been submitted
Your job 6900857 ("test-7") has been submitted
Your job 6900858 ("test-8") has been submitted
Your job 6900859 ("test-9") has been submitted
$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
6900851 0.30000 test-1     debenben     r     06/05/2018 17:50:53 task@tools-exec-1430.tools.eqi     1
6900852 0.30000 test-2     debenben     r     06/05/2018 17:50:53 task@tools-exec-1418.tools.eqi     1
6900853 0.30000 test-3     debenben     qw    06/05/2018 17:50:53                                    1
6900854 0.30000 test-4     debenben     qw    06/05/2018 17:50:53                                    1
6900855 0.30000 test-5     debenben     qw    06/05/2018 17:50:53                                    1
6900856 0.30000 test-6     debenben     qw    06/05/2018 17:50:53                                    1
6900857 0.30000 test-7     debenben     qw    06/05/2018 17:50:53                                    1
6900858 0.30000 test-8     debenben     qw    06/05/2018 17:50:53                                    1
6900859 0.30000 test-9     debenben     qw    06/05/2018 17:50:54                                    1
$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
6900855 0.30000 test-5     debenben     r     06/05/2018 17:51:55 task@tools-exec-1430.tools.eqi     1
6900856 0.30000 test-6     debenben     r     06/05/2018 17:51:56 task@tools-exec-1418.tools.eqi     1
6900857 0.30000 test-7     debenben     qw    06/05/2018 17:50:53                                    1
6900858 0.30000 test-8     debenben     qw    06/05/2018 17:50:53                                    1
6900859 0.30000 test-9     debenben     qw    06/05/2018 17:50:54                                    1
$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
6900859 0.30000 test-9     debenben     r     06/05/2018 17:52:57 task@tools-exec-1430.tools.eqi     1
$ qstat
$ 
Tue, Jun 5, 5:54 PM · cloud-services-team (Kanban), Toolforge
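The check performed by eye in the qstat listings above (no more than 2 jobs in state r at once) can be sketched as a tally over the state column. This is a sketch, assuming the qstat layout shown above, where the state is the fifth whitespace-separated column and the first two lines are the header and a dashed separator; the function name count_job_states is hypothetical.

```shell
# Hypothetical helper: count running (r) and queued (qw) jobs in qstat
# output read from stdin.
count_job_states() {
    # Skip the header and the dashed separator, then tally column 5.
    awk 'NR > 2 { count[$5]++ }
         END { printf "running=%d queued=%d\n", count["r"], count["qw"] }'
}
```

Fed the first qstat listing above, it would report running=2 queued=7, which matches the quota of 2 slots.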
bd808 added a comment to T196486: Concurrent generated jobs from a single user overloaded grid engine.

Following the advice from https://serverfault.com/a/184214/6479, I am adding a quota limiting Debenben's user account to 2 simultaneous jobs:

$ sudo -i qconf -srqs debenben_max_slots
{
   name         debenben_max_slots
   description  "Limit user debenben to 2 slots"
   enabled      TRUE
   limit        users {debenben} hosts * to slots=2
}
Tue, Jun 5, 5:36 PM · cloud-services-team (Kanban), Toolforge
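A resource quota set like the one shown above can be generated for any user and slot count with a few printf calls. This is a sketch only: the function name make_user_slot_quota is hypothetical, and the layout simply mirrors the qconf -srqs output above.

```shell
# Hypothetical generator for a per-user slot quota definition in the same
# layout as the `qconf -srqs` output above.
make_user_slot_quota() {
    # $1: user name, $2: maximum number of slots
    printf '{\n'
    printf '   name         %s_max_slots\n' "$1"
    printf '   description  "Limit user %s to %s slots"\n' "$1" "$2"
    printf '   enabled      TRUE\n'
    printf '   limit        users {%s} hosts * to slots=%s\n' "$1" "$2"
    printf '}\n'
}
```

The output could be written to a file and loaded with qconf's add-from-file option for resource quota sets (-Arqs in the grid engine versions I am aware of; check the local man page before relying on it).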
bd808 created T196486: Concurrent generated jobs from a single user overloaded grid engine.
Tue, Jun 5, 5:32 PM · cloud-services-team (Kanban), Toolforge
bd808 added a comment to T196028: Up quota for shinken project.

+1

Tue, Jun 5, 3:57 PM · Cloud-VPS (Quota-requests), Shinken
bd808 moved T193185: Increase quota for wikidata-federation project from Inbox to Discussion needed on the Cloud-VPS (Quota-requests) board.
Tue, Jun 5, 3:51 PM · Cloud-VPS (Quota-requests)

Mon, Jun 4

bd808 created T196418: Google Maps keyless usage sunset on 2018-06-11.
Mon, Jun 4, 10:21 PM · Toolforge, Tools
bd808 added a comment to T196137: toolforge: prometheus issue is filling up email queue.

Overall, the key here is that getent passwd prometheus gets a different UID than what it is in passwd.

Mon, Jun 4, 7:53 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge
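The mismatch described in the comment above (the UID returned by getent passwd differing from the one in the local passwd file) can be checked by extracting the third field from both sources. This is a sketch, assuming standard passwd(5) colon-separated lines; the function name passwd_uid is hypothetical.

```shell
# Hypothetical helper: print the UID (third field) of the named account
# from passwd(5)-format lines read on stdin.
passwd_uid() {
    # $1: account name to look up
    awk -F: -v user="$1" '$1 == user { print $3; exit }'
}
```

A comparison could then look like: [ "$(getent passwd prometheus | passwd_uid prometheus)" = "$(passwd_uid prometheus < /etc/passwd)" ] || echo "UID mismatch"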
bd808 added a project to T196137: toolforge: prometheus issue is filling up email queue: cloud-services-team (Kanban).
Mon, Jun 4, 7:51 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge
bd808 renamed T168433: Deprecate DSA (ssh-dss) SSH keys for Cloud VPS and Toolforge users from Deprecate DSA (ssh-dss) SSH keys for Labs users to Deprecate DSA (ssh-dss) SSH keys for Cloud VPS and Toolforge users.
Mon, Jun 4, 4:43 PM · Cloud-VPS, Toolforge, cloud-services-team (Kanban)
bd808 closed T196321: Toolforge user Patriccck having password reset problems as Resolved.
Mon, Jun 4, 2:31 PM · cloud-services-team (Kanban), wikitech.wikimedia.org, Toolforge
bd808 added a comment to T194093: Setup a demo wiki with ponydocs.

Possibly even worse is https://github.com/splunk/ponydocs/blob/master/MediaWiki.patch which is obviously not something we can deploy on a production wiki. If we demo it and find out that things are actually highly useful then we can consider if the benefit is worth finding a way to update the code.

Mon, Jun 4, 2:19 AM · cloud-services-team (Kanban), Documentation, Cloud-Services
bd808 added a comment to T148872: Make webservice command read default cli arguments from ~/.webservicerc.

Copied from comment on https://gerrit.wikimedia.org/r/#/c/435691/ where I wrote:

Mon, Jun 4, 1:03 AM · Patch-For-Review, Toolforge
bd808 moved T165337: Add new users to 'bastion' project via a Keystone hook from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Jun 4, 12:03 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-Services
bd808 moved T188994: toolforge: package upgrades as part of the new workflow from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Jun 4, 12:01 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge
bd808 moved T181523: labtest puppetmaster is not working for clients from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Jun 4, 12:01 AM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS, Epic
bd808 moved T184259: labspuppetmaster1001: have consistency in owner of git repos from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Jun 4, 12:01 AM · cloud-services-team (Kanban), Patch-For-Review
bd808 moved T189871: labmon1002 as cold standby for labmon1001 from To-Do to Doing on the cloud-services-team (Kanban) board.
Mon, Jun 4, 12:01 AM · cloud-services-team (Kanban), Patch-For-Review, Cloud-VPS
bd808 moved T188681: Maintain-dbusers should handle failures due to replicas being in maintenance from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Jun 4, 12:01 AM · Patch-For-Review, cloud-services-team (Kanban), Data-Services
bd808 moved T189871: labmon1002 as cold standby for labmon1001 from Inbox to To-Do on the cloud-services-team (Kanban) board.
Mon, Jun 4, 12:00 AM · cloud-services-team (Kanban), Patch-For-Review, Cloud-VPS
bd808 moved T191445: Document clear guidelines for what is and is not a good Cloud VPS project from Inbox to To-Do on the cloud-services-team (Kanban) board.
Mon, Jun 4, 12:00 AM · cloud-services-team (Kanban), Documentation, Cloud-VPS