Phamhi (Phamhi)
Operations Engineer at Wikimedia Cloud Services Team

User Details

User Since
Aug 5 2019, 1:02 PM (14 w, 19 h)
Availability
Available
LDAP User
Phamhi
MediaWiki User
HPham (WMF) [ Global Accounts ]

Operations Engineer with the Wikimedia Cloud Services Team

Recent Activity

Fri, Nov 8

Phamhi added a comment to T237768: Could not find dependency Package[python-yaml] error in profile::toolforge::grid::node::web .

That makes sense..thanks for pointing it out Krenair.

Fri, Nov 8, 9:55 PM · cloud-services-team
Phamhi created T237768: Could not find dependency Package[python-yaml] error in profile::toolforge::grid::node::web .
Fri, Nov 8, 9:27 PM · cloud-services-team
Phamhi added a comment to T230961: Install a version of Python newer than 3.5.3 in Toolforge.

I tested these images and confirmed working.

Fri, Nov 8, 3:57 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)
Phamhi added a comment to T230961: Install a version of Python newer than 3.5.3 in Toolforge.

Updating tools-webservice to add the new buster image options to webservice: golang111, jdk11, php73, python37 and ruby25

Fri, Nov 8, 3:31 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)
Phamhi closed T193560: Tool keeps falling into permanent 500 error as Resolved.
Fri, Nov 8, 12:46 PM · cloud-services-team (Kanban), Toolforge

Wed, Nov 6

Phamhi added a comment to T234656: Systems and service continuity and availability constraints.

Updated documentation based on feedback from Arturo.

Wed, Nov 6, 2:43 PM · cloud-services-team (Kanban)
Phamhi updated the task description for T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.
Wed, Nov 6, 12:24 PM · cloud-services-team (Kanban), Toolforge
Phamhi updated the task description for T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.
Wed, Nov 6, 11:57 AM · cloud-services-team (Kanban), Toolforge

Tue, Nov 5

Phamhi added a comment to T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.

All of the lighttpd based k8s pods have been restarted

Tue, Nov 5, 5:41 PM · cloud-services-team (Kanban), Toolforge
Phamhi added a comment to T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.

I am looking for a sane way to restart all of the lighttpd powered webservices to force the change to take effect...

Tue, Nov 5, 1:00 PM · cloud-services-team (Kanban), Toolforge
Phamhi updated the task description for T234656: Systems and service continuity and availability constraints.
Tue, Nov 5, 12:33 PM · cloud-services-team (Kanban)
Phamhi updated the task description for T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.
Tue, Nov 5, 12:12 PM · cloud-services-team (Kanban), Toolforge

Mon, Nov 4

Phamhi added a comment to T237270: The spacemedia tool keeps crashing and filling kubernetes nodes.

I think our best bet is to set the max-size and max-file options for the json-file logging driver

Mon, Nov 4, 3:59 PM · Tool-spacemedia, cloud-services-team (Kanban)
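
The json-file options named in the T237270 comment above can be sketched as a Docker daemon configuration. This is a minimal illustration, not the change that was actually deployed: the helper name `build_daemon_config` and the limits `10m` and `3` are assumptions chosen for the example. Docker expects every value under `log-opts` to be a string, so the file count is quoted.

```python
import json


def build_daemon_config(max_size="10m", max_file="3"):
    """Build a daemon.json structure that caps json-file log growth.

    max_size: size at which a container log file is rotated (e.g. "10m").
    max_file: number of rotated files to keep, passed as a string.
    Both defaults are illustrative assumptions, not values from the task.
    """
    return {
        "log-driver": "json-file",
        "log-opts": {
            "max-size": max_size,
            "max-file": max_file,
        },
    }


if __name__ == "__main__":
    # Print the JSON that would go into the Docker daemon configuration file.
    print(json.dumps(build_daemon_config(), indent=2))
```

On Linux the daemon normally reads this from /etc/docker/daemon.json. The new limits apply only to containers created after the daemon is restarted, so existing containers keep their old logging settings until they are recreated.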
Phamhi created T237257: Add "forgot your password" link on https://toolsadmin.wikimedia.org/auth/login/.
Mon, Nov 4, 2:14 PM · Striker
Phamhi updated the task description for T228942: Onboard Hieu Pham to Wikimedia Foundation as SRE in Cloud Services.
Mon, Nov 4, 12:09 PM · cloud-services-team (Kanban)

Sat, Nov 2

Phamhi committed rODITdcc521f2c64e: Docker-images: create new docker images based on buster. (authored by Phamhi).
Sat, Nov 2, 12:37 AM

Fri, Nov 1

Phamhi added a comment to T236349: PAWS instance stopped working.

Thanks for letting us know.

Fri, Nov 1, 12:08 PM · cloud-services-team (Kanban), PAWS

Wed, Oct 30

Phamhi claimed T193560: Tool keeps falling into permanent 500 error.

Hey @Magnus ... we are just doing some ticket cleanup and noticed that the last update was back in March... and it looks like the ticket had been resolved. Is it ok if we close this ticket?

Wed, Oct 30, 10:23 AM · cloud-services-team (Kanban), Toolforge

Tue, Oct 29

Phamhi added a comment to T236349: PAWS instance stopped working.

Hey @Chicocvenancio ... I have manually force killed the jupyter--43riscod container on tools-paws-worker-1007. Let me know if you need anything else.

Tue, Oct 29, 5:55 PM · cloud-services-team (Kanban), PAWS
Phamhi added a comment to T236446: Cloud Services shared IP (static NAT for external communications) often rate limited by YouTube for video downloads.

@Fae ... Try running it in one of the Kubernetes Python shells

Tue, Oct 29, 11:47 AM · cloud-services-team (Kanban), Cloud-VPS, video2commons
Phamhi added a comment to T234656: Systems and service continuity and availability constraints.

The first draft of the documentation has been posted. Will ask the cteam for the initial review.

Tue, Oct 29, 10:47 AM · cloud-services-team (Kanban)

Mon, Oct 28

Phamhi closed T169263: 502s and kubernetes-based tool labs service not restarting as Resolved.

This ticket is closed as part of scheduled clean-up. Please let us know if this issue needs to be re-opened.

Mon, Oct 28, 12:38 PM · cloud-services-team (Kanban), Toolforge
Phamhi claimed T169263: 502s and kubernetes-based tool labs service not restarting.
Mon, Oct 28, 12:36 PM · cloud-services-team (Kanban), Toolforge
Phamhi closed T232769: Document some etcd cluster operations for Toolforge, a subtask of T232536: Toolforge Kubernetes internal API down, causing `webservice` and other tooling to fail, as Resolved.
Mon, Oct 28, 12:06 PM · Wikimedia-Incident, cloud-services-team (Kanban), Toolforge
Phamhi closed T232769: Document some etcd cluster operations for Toolforge as Resolved.

Marking as done, as the documentation meets the scope of the request.

Mon, Oct 28, 12:06 PM · cloud-services-team (Kanban), Toolforge, Wikimedia-Incident
Phamhi updated the task description for T232769: Document some etcd cluster operations for Toolforge.
Mon, Oct 28, 12:05 PM · cloud-services-team (Kanban), Toolforge, Wikimedia-Incident

Fri, Oct 25

Phamhi added a project to T236445: Request increased quota for video Cloud VPS project: cloud-services-team.
Fri, Oct 25, 10:13 AM · cloud-services-team, Cloud-VPS (Quota-requests)
Phamhi closed T183090: Kubernetes should support basic CGI scripts and virtual environments as Resolved.
Fri, Oct 25, 10:09 AM · cloud-services-team (Kanban), Toolforge

Thu, Oct 24

Phamhi closed T228942: Onboard Hieu Pham to Wikimedia Foundation as SRE in Cloud Services as Resolved.
Thu, Oct 24, 4:54 PM · cloud-services-team (Kanban)
Phamhi added a comment to T228942: Onboard Hieu Pham to Wikimedia Foundation as SRE in Cloud Services.

It's fixed. I have admin access to Icinga now.

Thu, Oct 24, 4:54 PM · cloud-services-team (Kanban)
Phamhi added a comment to T228942: Onboard Hieu Pham to Wikimedia Foundation as SRE in Cloud Services.

It looks like I still don't have admin access to icinga.. will attempt to update the cfg.cfg file

Thu, Oct 24, 3:27 PM · cloud-services-team (Kanban)
Phamhi reopened T228942: Onboard Hieu Pham to Wikimedia Foundation as SRE in Cloud Services as "Open".
Thu, Oct 24, 3:27 PM · cloud-services-team (Kanban)
Phamhi updated the task description for T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.
Thu, Oct 24, 2:29 AM · cloud-services-team (Kanban), Toolforge

Wed, Oct 23

Phamhi updated the task description for T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.
Wed, Oct 23, 8:01 PM · cloud-services-team (Kanban), Toolforge
Phamhi added a comment to T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.

The new version of toollabs-webservice package 0.47 has been pushed out to:

Wed, Oct 23, 1:40 PM · cloud-services-team (Kanban), Toolforge
Phamhi closed T217025: webservice: misleading error message `Pod resisted shutdown` as Resolved.

I am closing this issue as I have confirmed that kmlexport uses the grid, not Kubernetes (since CGI.pm isn't available). As the service is a simple lighttpd-based one serving a single kmlexport.pl file, this issue was most likely a one-off.

Wed, Oct 23, 11:35 AM · cloud-services-team (Kanban), Toolforge
Phamhi closed T169283: Homedir for user cosmiclattes is very large (>60G) as Resolved.

Jephpaul responded to the email letting us know that they have actioned the ticket. Confirmed that /home/cosmiclattes now only occupies 16GB of disk space.

Wed, Oct 23, 10:35 AM · cloud-services-team (Kanban), Toolforge

Tue, Oct 22

Phamhi moved T183090: Kubernetes should support basic CGI scripts and virtual environments from Inbox to Doing on the cloud-services-team (Kanban) board.
Tue, Oct 22, 11:52 AM · cloud-services-team (Kanban), Toolforge
Phamhi moved T234656: Systems and service continuity and availability constraints from Inbox to Doing on the cloud-services-team (Kanban) board.
Tue, Oct 22, 11:45 AM · cloud-services-team (Kanban)
Phamhi moved T234656: Systems and service continuity and availability constraints from Backlog to Kanban on the cloud-services-team board.
Tue, Oct 22, 11:44 AM · cloud-services-team (Kanban)

Mon, Oct 21

Phamhi added a comment to T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.

Hi @bd808 .. I think I have a grip on creating a new Debian package, uploading it to the repo and updating/pushing new docker images. Can we proceed with this step: "Merge and deploy the updated webservice code including Docker image rebuilds"?

Mon, Oct 21, 6:20 PM · cloud-services-team (Kanban), Toolforge
Phamhi updated Phamhi.
Mon, Oct 21, 6:04 PM
Phamhi closed T218461: `webservice --backend=kubernetes python restart` starts a php5.6 webservice as Resolved.
Mon, Oct 21, 5:51 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
Phamhi added a comment to T218461: `webservice --backend=kubernetes python restart` starts a php5.6 webservice.

All jessie and stretch docker images have been rebuilt with the toollabs-webservice package installed. Updated images have been pushed to the docker registry.

Mon, Oct 21, 5:50 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
Phamhi added a comment to T194859: Toolforge maintain-kubeusers doesn't fail well when LDAP servers are unreachable.

Hi @Bstorm... do you think it's safe to say that this workaround works and we can close this issue?

Mon, Oct 21, 2:42 PM · cloud-services-team (Kanban), Toolforge
Phamhi moved T217025: webservice: misleading error message `Pod resisted shutdown` from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Oct 21, 2:38 PM · cloud-services-team (Kanban), Toolforge
Phamhi moved T169283: Homedir for user cosmiclattes is very large (>60G) from Inbox to Doing on the cloud-services-team (Kanban) board.
Mon, Oct 21, 2:38 PM · cloud-services-team (Kanban), Toolforge
Phamhi added a comment to T183090: Kubernetes should support basic CGI scripts and virtual environments.

I was able to successfully follow bd808's instructions. Does it meet your original requirements?

Mon, Oct 21, 2:35 PM · cloud-services-team (Kanban), Toolforge
Phamhi claimed T183090: Kubernetes should support basic CGI scripts and virtual environments.
Mon, Oct 21, 2:34 PM · cloud-services-team (Kanban), Toolforge
Phamhi added a comment to T169283: Homedir for user cosmiclattes is very large (>60G).

Notified the user by email.

Mon, Oct 21, 1:31 PM · cloud-services-team (Kanban), Toolforge
Phamhi added a comment to T169283: Homedir for user cosmiclattes is very large (>60G).

We have tracked you down as the owner of the cosmiclattes account.

Mon, Oct 21, 1:16 PM · cloud-services-team (Kanban), Toolforge
Phamhi claimed T217025: webservice: misleading error message `Pod resisted shutdown`.
Mon, Oct 21, 11:32 AM · cloud-services-team (Kanban), Toolforge
Phamhi claimed T169283: Homedir for user cosmiclattes is very large (>60G).
Mon, Oct 21, 11:23 AM · cloud-services-team (Kanban), Toolforge
Phamhi added a comment to T218461: `webservice --backend=kubernetes python restart` starts a php5.6 webservice.

I will update the docker images to include this fix.

Mon, Oct 21, 11:08 AM · cloud-services-team (Kanban), Kubernetes, Toolforge

Fri, Oct 18

Phamhi added a comment to T224585: Migrate labmon* to Stretch (or Buster, better yet!).

Please disregard my last comment. Moritz has just let us know that Grafana6 is now available on Buster... https://debmonitor.wikimedia.org/packages/grafana

Fri, Oct 18, 1:05 PM · Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban), Operations
Phamhi added a comment to T224585: Migrate labmon* to Stretch (or Buster, better yet!).

Hi @CDanis, could you please let me know the timeline for getting the Grafana package into the Buster repo?

Fri, Oct 18, 10:16 AM · Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban), Operations

Thu, Oct 17

Phamhi updated the task description for T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.
Thu, Oct 17, 4:04 PM · cloud-services-team (Kanban), Toolforge
Phamhi added a comment to T218461: `webservice --backend=kubernetes python restart` starts a php5.6 webservice.

The new version of toollabs-webservice package 0.46 with this fix has been pushed out to:

Thu, Oct 17, 2:51 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
Phamhi claimed T196815: Document uploading SSH keys to Striker / Toolsadmin.
Thu, Oct 17, 11:16 AM · cloud-services-team (Kanban), Documentation, Toolforge
Phamhi added a comment to T196815: Document uploading SSH keys to Striker / Toolsadmin.

Hi @SalixAlba . We are just doing some clean up... do you still need help with this issue? If not, please let us know if we can close this ticket.

Thu, Oct 17, 11:16 AM · cloud-services-team (Kanban), Documentation, Toolforge
Phamhi closed T235629: Port toollabs-webservice to buster-tools, a subtask of T230961: Install a version of Python newer than 3.5.3 in Toolforge, as Resolved.
Thu, Oct 17, 11:08 AM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)
Phamhi closed T235629: Port toollabs-webservice to buster-tools as Resolved.

This task is done. The package toollabs-webservice 0.46 has been ported to buster-tools repo.

Thu, Oct 17, 11:08 AM · cloud-services-team (Kanban), Toolforge (Software install/update)

Wed, Oct 16

Phamhi created T235629: Port toollabs-webservice to buster-tools.
Wed, Oct 16, 12:05 PM · cloud-services-team (Kanban), Toolforge (Software install/update)
Phamhi added a comment to T230961: Install a version of Python newer than 3.5.3 in Toolforge.

It looks like toollabs-webservice is not in buster-tools... will ask for help from arturo again to port this package

Wed, Oct 16, 11:53 AM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)

Tue, Oct 15

Phamhi moved T230961: Install a version of Python newer than 3.5.3 in Toolforge from Needs discussion to Doing on the cloud-services-team (Kanban) board.
Tue, Oct 15, 4:37 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)
Phamhi moved T230961: Install a version of Python newer than 3.5.3 in Toolforge from Inbox to Needs discussion on the cloud-services-team (Kanban) board.
Tue, Oct 15, 4:31 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)
Phamhi added a project to T230961: Install a version of Python newer than 3.5.3 in Toolforge: cloud-services-team (Kanban).
Tue, Oct 15, 4:30 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)
Phamhi moved T218461: `webservice --backend=kubernetes python restart` starts a php5.6 webservice from Needs discussion to Doing on the cloud-services-team (Kanban) board.
Tue, Oct 15, 4:29 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
Phamhi moved T218461: `webservice --backend=kubernetes python restart` starts a php5.6 webservice from Doing to Needs discussion on the cloud-services-team (Kanban) board.
Tue, Oct 15, 4:06 PM · cloud-services-team (Kanban), Kubernetes, Toolforge
Phamhi added a comment to T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.

Wikitech documentation https://w.wiki/9go has been updated

Tue, Oct 15, 2:54 PM · cloud-services-team (Kanban), Toolforge
Phamhi updated the task description for T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.
Tue, Oct 15, 2:53 PM · cloud-services-team (Kanban), Toolforge
Phamhi added a comment to T230961: Install a version of Python newer than 3.5.3 in Toolforge.

Woot woot... confirmed.. thanks arturo for unblocking me

Tue, Oct 15, 12:28 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)
Phamhi added a comment to T230961: Install a version of Python newer than 3.5.3 in Toolforge.

I might as well extend the scope of this ticket to update all docker images based on buster.

Tue, Oct 15, 11:48 AM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)
Phamhi updated the task description for T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.
Tue, Oct 15, 10:23 AM · cloud-services-team (Kanban), Toolforge

Oct 12 2019

Phamhi updated subscribers of T230961: Install a version of Python newer than 3.5.3 in Toolforge.

I think similar to https://phabricator.wikimedia.org/T200660, we need to forward-port python-pykube to buster. Will talk to @aborrero for help.

Oct 12 2019, 2:57 AM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)

Oct 11 2019

Phamhi added a comment to T230961: Install a version of Python newer than 3.5.3 in Toolforge.

I found a couple of previous tickets related to this:

Oct 11 2019, 9:31 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)
Phamhi added a comment to T230961: Install a version of Python newer than 3.5.3 in Toolforge.

python37/web (specifically toollabs-webservice) needs python-pykube which doesn't look like it's available in buster

Oct 11 2019, 9:30 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)
Phamhi added a comment to T230961: Install a version of Python newer than 3.5.3 in Toolforge.

stretch doesn't have python3.6 so as per suggestion from bstorm, I'm going to skip to python3.7 since it's available on buster

Oct 11 2019, 8:25 PM · cloud-services-team (Kanban), Patch-For-Review, Toolforge (Software install/update)

Oct 9 2019

Phamhi updated the task description for T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.
Oct 9 2019, 2:36 PM · cloud-services-team (Kanban), Toolforge
Phamhi updated the task description for T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.
Oct 9 2019, 2:34 PM · cloud-services-team (Kanban), Toolforge

Oct 8 2019

Phamhi updated the task description for T233347: Remove access.log generation from default lighttpd.conf generated by `webservice`.
Oct 8 2019, 8:06 PM · cloud-services-team (Kanban), Toolforge

Oct 7 2019

Phamhi added a comment to T234656: Systems and service continuity and availability constraints.

Ongoing documentation can be found here https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Systems_and_Service_Continuity

Oct 7 2019, 5:15 PM · cloud-services-team (Kanban)
Phamhi closed T234819: tools-sgewebgrid-lighttpd-0902 at critical disk space (<20.00%) as Resolved.
Oct 7 2019, 2:21 PM · cloud-services-team
Phamhi added a comment to T234819: tools-sgewebgrid-lighttpd-0902 at critical disk space (<20.00%).

The /tmp mount has been cleaned.

Oct 7 2019, 2:20 PM · cloud-services-team
Phamhi added a comment to T234819: tools-sgewebgrid-lighttpd-0902 at critical disk space (<20.00%).

Confirmed that /tmp is at 99% disk usage. Will attempt to cleanup unused resources.

Oct 7 2019, 12:43 PM · cloud-services-team
Phamhi created T234819: tools-sgewebgrid-lighttpd-0902 at critical disk space (<20.00%).
Oct 7 2019, 12:42 PM · cloud-services-team

Oct 4 2019

Phamhi added a project to T234656: Systems and service continuity and availability constraints: cloud-services-team.
Oct 4 2019, 3:45 PM · cloud-services-team (Kanban)
Phamhi created T234656: Systems and service continuity and availability constraints.
Oct 4 2019, 3:44 PM · cloud-services-team (Kanban)
Phamhi added a comment to T224585: Migrate labmon* to Stretch (or Buster, better yet!).

It looks like both grafana and python-whisper are not available on buster

Oct 4 2019, 2:43 PM · Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban), Operations

Oct 3 2019

Phamhi added a comment to T224585: Migrate labmon* to Stretch (or Buster, better yet!).

I commented out the lines related to puppetdb (34 to 44 of https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/prometheus/manifests/class_config.pp) and was able to run it locally on a test labmon VM with buster installed

Oct 3 2019, 4:12 PM · Cloud-VPS (Debian Jessie Deprecation), cloud-services-team (Kanban), Operations
Phamhi closed T187812: https://tools.wmflabs.org/?tool=wikihistory is missing as Resolved.

It looks like it's been resolved since I no longer see the No such tool? Trying to guess, are you? error.

Oct 3 2019, 1:20 PM · cloud-services-team (Kanban), Tool-admin, Toolforge
Phamhi claimed T187812: https://tools.wmflabs.org/?tool=wikihistory is missing.
Oct 3 2019, 1:19 PM · cloud-services-team (Kanban), Tool-admin, Toolforge
Phamhi added a member for cloud-services-team: Phamhi.
Oct 3 2019, 1:15 PM
Phamhi added a watcher for cloud-services-team (Kanban): Phamhi.
Oct 3 2019, 1:14 PM
Phamhi removed a watcher for cloud-services-team (FY2019-20): Phamhi.
Oct 3 2019, 1:13 PM
Phamhi added a watcher for cloud-services-team (FY2019-20): Phamhi.
Oct 3 2019, 1:13 PM
Phamhi updated Phamhi.
Oct 3 2019, 1:09 PM
Phamhi updated Phamhi.
Oct 3 2019, 1:08 PM
Phamhi closed T136197: icelab is using 245G in Tools, a subtask of T136212: Contact tool maintainters using large amounts of disk space, as Resolved.
Oct 3 2019, 1:05 PM · Goal, Toolforge, Cloud-Services
Phamhi closed T136197: icelab is using 245G in Tools as Resolved.

Closed as the directory looks like it's already been cleaned.

Oct 3 2019, 1:05 PM · cloud-services-team (Kanban), Toolforge