Tue, Aug 6

akosiaris committed rDEPLOYCHARTSbb7b60b556f9: Revert Add resources stanza to prometheus-metrics-exporter (authored by akosiaris).
Revert Add resources stanza to prometheus-metrics-exporter
Tue, Aug 6, 4:33 PM
akosiaris committed rDEPLOYCHARTS570e9797a096: blubberoid/sessionstore: Bump requests/limits (authored by akosiaris).
blubberoid/sessionstore: Bump requests/limits
Tue, Aug 6, 4:32 PM
akosiaris committed rDEPLOYCHARTSa57261b4dd8e: Fixup limitranges for citoid,cxserver (authored by akosiaris).
Fixup limitranges for citoid,cxserver
Tue, Aug 6, 4:15 PM
akosiaris committed rDEPLOYCHARTS1d231a283771: codfw: Bump all LimitRanges and ResourceQuotas (authored by akosiaris).
codfw: Bump all LimitRanges and ResourceQuotas
Tue, Aug 6, 3:52 PM
akosiaris committed rDEPLOYCHARTS447742bb881f: staging: Bump all LimitRanges and ResourceQuotas (authored by akosiaris).
staging: Bump all LimitRanges and ResourceQuotas
Tue, Aug 6, 3:37 PM
akosiaris added a comment to T199219: WDQS should use internal endpoint to communicate to Wikidata.

The other thing I want to mention, which was missing here, is the overhead of encryption and TLS handshakes. @BBlack's example still uses TLS, but a plain HTTP request is considerably faster because it avoids the encryption and decryption overhead entirely:

ladsgroup@mwmaint1002:~$ time curl -H 'Host: www.wikidata.org' 'http://appservers-ro.discovery.wmnet/wiki/Special:EntityData/Q7251.ttl?revision=992109551&flavor=dump' > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  123k    0  123k    0     0   508k      0 --:--:-- --:--:-- --:--:--  510k
real	0m0.256s
user	0m0.008s
sys	0m0.004s

Unless there's any reason to encrypt requests internally, I think this would help us greatly.
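
For anyone who wants to script the comparison instead of using curl, here is a minimal Python sketch of the same request. It assumes it runs on a host inside the production network (like mwmaint1002 in the example), since appservers-ro.discovery.wmnet does not resolve elsewhere; the function names are hypothetical. The key detail is that the URL targets the internal endpoint over plain HTTP while the Host header names the public wiki, exactly as `curl -H 'Host: ...'` does.

```python
import time
import urllib.request

# Internal read-only appserver endpoint copied from the curl example above.
INTERNAL_URL = (
    "http://appservers-ro.discovery.wmnet/wiki/Special:EntityData/Q7251.ttl"
    "?revision=992109551&flavor=dump"
)


def build_internal_request(url: str, public_host: str) -> urllib.request.Request:
    """Target the internal endpoint over plain HTTP, but send the public
    wiki's Host header so the appservers route it to the right wiki."""
    # urllib keeps an explicitly supplied Host header instead of deriving
    # one from the URL, which mirrors curl's -H 'Host: ...'.
    return urllib.request.Request(url, headers={"Host": public_host})


def timed_fetch(req: urllib.request.Request, timeout: float = 10.0) -> tuple[int, float]:
    """Fetch the request; return (body size in bytes, elapsed seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    return len(body), time.perf_counter() - start
```

Calling `timed_fetch(build_internal_request(INTERNAL_URL, "www.wikidata.org"))` from a production host should give numbers comparable to the `real` time above; repeating it against an HTTPS URL would show the TLS cost, assuming an internal HTTPS listener is available for that comparison.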

Tue, Aug 6, 2:07 PM · Performance-Team (Radar), Wikidata, Wikidata-Query-Service
akosiaris committed rDEPLOYCHARTS00a179a855e5: mathoid: Take tiller into account as well (authored by akosiaris).
mathoid: Take tiller into account as well
Tue, Aug 6, 12:57 PM
akosiaris committed rDEPLOYCHARTS22c738922b64: mathoid: Partialy revert 269abb124130e0f (authored by akosiaris).
mathoid: Partialy revert 269abb124130e0f
Tue, Aug 6, 11:56 AM
akosiaris added a reverting change for rDEPLOYCHARTS269abb124130: k8s, codfw: disabling quotas on some namespaces.: rDEPLOYCHARTS22c738922b64: mathoid: Partialy revert 269abb124130e0f.
Tue, Aug 6, 11:56 AM
akosiaris committed rDEPLOYCHARTS1c9ec1b6c9bc: calico: add all kafka-main hosts to k8s eventgate policy (authored by akosiaris).
calico: add all kafka-main hosts to k8s eventgate policy
Tue, Aug 6, 11:50 AM
akosiaris added a comment to T229287: Profile wikifeeds memory usage for Helm chart.

Thanks for running this.
My only point is that the max CPU looks suspiciously close to 1, which is the default value in https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/_scaffold/values.yaml#25. This could be artificially limiting the app and could explain the errors. If you have already set it to higher values during your benchmarking, disregard the next sentence. Otherwise, you might want to bump it (considerably, say to 10) and rerun the benchmark.
Great work, thanks a lot!

I ran into that problem in the beginning, then I set the CPU limit to 2 and memory to 6GB. Do you think CPU should be even higher?
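
The reasoning above (a peak CPU that sits right at the limit suggests throttling, not real demand) can be written down as a small check. This is a hypothetical Python helper, not part of the chart tooling: it converts a Kubernetes CPU quantity such as `"1"`, `"2"` or `"500m"` (millicores) into cores and flags a peak that is within an assumed 5% of the limit.

```python
def parse_cpu_quantity(q: str) -> float:
    """Convert a Kubernetes CPU quantity ("2", "0.5", "500m") to cores."""
    q = q.strip()
    if q.endswith("m"):
        # "m" means millicores: 1000m == 1 core.
        return int(q[:-1]) / 1000.0
    return float(q)


def near_cpu_limit(max_usage_cores: float, limit: str, tolerance: float = 0.05) -> bool:
    """Heuristic: True when the observed peak usage is within `tolerance`
    (a fraction, 5% by default) of the limit, i.e. the container is more
    likely being throttled than genuinely needing only that much CPU."""
    limit_cores = parse_cpu_quantity(limit)
    return max_usage_cores >= limit_cores * (1.0 - tolerance)
```

With the scaffold default of 1 core, a measured peak of 0.98 cores trips the check, while the same peak under a limit of 10 cores does not; that is the signal that a higher limit is worth testing.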

Tue, Aug 6, 10:39 AM · Patch-For-Review, Wikifeeds, Product-Infrastructure-Team-Backlog (Kanban)
akosiaris added a comment to T229287: Profile wikifeeds memory usage for Helm chart.

Thanks for running this.

Tue, Aug 6, 9:47 AM · Patch-For-Review, Wikifeeds, Product-Infrastructure-Team-Backlog (Kanban)
akosiaris committed rDEPLOYCHARTSb2ecd2e4fec0: mathoid: Fix metrics exporter livenessProbe (authored by akosiaris).
mathoid: Fix metrics exporter livenessProbe
Tue, Aug 6, 8:44 AM
akosiaris committed rDEPLOYCHARTS22443a07e9c4: mathoid: Align limitranges/resourcequotas in staging (authored by akosiaris).
mathoid: Align limitranges/resourcequotas in staging
Tue, Aug 6, 8:44 AM
akosiaris committed rDEPLOYCHARTSf87e8ff73633: Add resources stanza to prometheus-metrics-exporter (authored by akosiaris).
Add resources stanza to prometheus-metrics-exporter
Tue, Aug 6, 8:19 AM
akosiaris awarded T224559: Migrate Failoid hosts to Stretch/Buster a Like token.
Tue, Aug 6, 7:39 AM · Traffic, serviceops, Operations
akosiaris added a comment to T229903: eqiad/codfw: One VM for Failoid.

LGTM. Naming-wise, I'd say let's do failoid{1,2}001.(eqiad|codfw).wmnet instead of the less obvious tureis/roentgenium that we have now.

Tue, Aug 6, 7:35 AM · vm-requests, Operations

Mon, Aug 5

akosiaris committed rDEPLOYCHARTS5723d5fb98a5: Increase mathoid resourcequotas (authored by akosiaris).
Increase mathoid resourcequotas
Mon, Aug 5, 4:56 PM
akosiaris updated the task description for T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC).
Mon, Aug 5, 4:08 PM · DC-Ops, Operations, ops-eqiad
akosiaris closed T200832: remove mathoid from scb as Resolved.

I see 'mathoid' => 'http://deployment-docker-mathoid01.eqiad.wmflabs:10044', so that last part has been addressed and yes, we can close this.

Mon, Aug 5, 3:55 PM · Beta-Cluster-Infrastructure, Core Platform Team Legacy (Watching / External), Services (watching), SCB, Mathoid, Operations
akosiaris added a comment to T223953: Deploy the RESTBase front-end service (RESTRouter) to Kubernetes.

restrouter was temporarily deployed in the staging cluster today. The deployment was rolled back because it was failing: it tries to reach restbase on port 7233, where restbase is not listening yet. As soon as we figure out the exact details of the migration plan, this should be ready to go. Those are

Mon, Aug 5, 12:57 PM · CPT Initiatives (RESTBase Split (CDP2)), Patch-For-Review, Release Pipeline, Kubernetes, serviceops, Service-deployment-requests, Operations
akosiaris added a parent task for T223953: Deploy the RESTBase front-end service (RESTRouter) to Kubernetes: T228676: Self-service Deployment Pipeline.
Mon, Aug 5, 12:53 PM · CPT Initiatives (RESTBase Split (CDP2)), Patch-For-Review, Release Pipeline, Kubernetes, serviceops, Service-deployment-requests, Operations
akosiaris added a subtask for T228676: Self-service Deployment Pipeline: T223953: Deploy the RESTBase front-end service (RESTRouter) to Kubernetes.
Mon, Aug 5, 12:53 PM · Goal, Operations, Release Pipeline, Release-Engineering-Team (Pipeline), serviceops
akosiaris committed rDEPLOYCHARTS4c37caaffc1a: Restrouter: Specify the correct image (authored by akosiaris).
Restrouter: Specify the correct image
Mon, Aug 5, 9:03 AM
akosiaris committed rDEPLOYCHARTSe2b48335ae50: Realign restrouter limitranges (authored by akosiaris).
Realign restrouter limitranges
Mon, Aug 5, 8:45 AM

Fri, Aug 2

akosiaris added a comment to T102099: Fix IPv6 autoconf issues once and for all, across the fleet..

This still leaves all the currently installed servers that have a MAC-based SLAAC address, i.e. those without interface::add_ip6_mapped. It seems to me that it would be useful to add interface::add_ip6_mapped to the standard module so that it is applied universally. However, we still have 965 machines in this state, so changing everything in one go may pose too high a risk.
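
To illustrate why a MAC-based SLAAC address is fragile, here is a minimal Python sketch of the standard modified EUI-64 derivation: flip the universal/local bit of the first MAC octet, insert `ff:fe` in the middle, and place the result in the low 64 bits of the /64 prefix. The prefix (the 2001:db8::/64 documentation range) and MAC are made-up examples, and the sketch does not model how interface::add_ip6_mapped builds its address.

```python
import ipaddress


def slaac_eui64(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Derive the modified EUI-64 SLAAC address for `mac` inside a /64 `prefix`."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC requires a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac}")
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    iid = int.from_bytes(eui64, "big")  # 64-bit interface identifier
    return ipaddress.IPv6Address(int(net.network_address) | iid)
```

For example, `slaac_eui64("2001:db8::/64", "52:54:00:12:34:56")` yields `2001:db8::5054:ff:fe12:3456`. Because the interface identifier comes straight from the MAC, replacing a NIC (or a mainboard) silently changes the host's IPv6 address, which is the motivation for a mapped, MAC-independent address.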

Fri, Aug 2, 1:48 PM · Patch-For-Review, Traffic, netops, Operations, IPv6
akosiaris committed rDEPLOYCHARTSbc22511d75ba: restrouter: Switch to event_service_uri (authored by akosiaris).
restrouter: Switch to event_service_uri
Fri, Aug 2, 8:53 AM
akosiaris committed rDEPLOYCHARTS590f152b70d6: restrouter: Add helmfile stanzas (authored by akosiaris).
restrouter: Add helmfile stanzas
Fri, Aug 2, 8:53 AM
akosiaris added a comment to T229287: Profile wikifeeds memory usage for Helm chart.

So, I finished the setup and started some preliminary tests with the endpoint /v1/feed/onthisday; you can see the results in the following image.

Fri, Aug 2, 8:45 AM · Patch-For-Review, Wikifeeds, Product-Infrastructure-Team-Backlog (Kanban)

Thu, Aug 1

akosiaris added a comment to T229287: Profile wikifeeds memory usage for Helm chart.

I am now facing an odd issue that seems related to the k8s instance I'm running. When hitting some endpoints I got the following error:

{
  status: 504,
  type: "internal_http_error",
  detail: "Error: unable to get local issuer certificate",
  method: "post",
  uri: "https://en.wikipedia.org/w/api.php"
}

It looks like a service-runner requirement that is missing, @akosiaris do you have any ideas why this could be happening? cc/ @Pchelolo

Thu, Aug 1, 3:19 PM · Patch-For-Review, Wikifeeds, Product-Infrastructure-Team-Backlog (Kanban)
akosiaris added a comment to T229287: Profile wikifeeds memory usage for Helm chart.

For posterity's and transparency's sake, pasting the answer I already gave to @MSantos via email

Thu, Aug 1, 2:34 PM · Patch-For-Review, Wikifeeds, Product-Infrastructure-Team-Backlog (Kanban)
akosiaris committed rLPRI5621322074f4: Correctly structure the restrouter private data (authored by akosiaris).
Correctly structure the restrouter private data
Thu, Aug 1, 9:35 AM
akosiaris committed rLPRI5a06325b14c2: Add restrouter dummy secrets (authored by akosiaris).
Add restrouter dummy secrets
Thu, Aug 1, 9:19 AM

Wed, Jul 31

akosiaris committed rLPRI6a030d3fe838: Add restrouter kubernetes private data (authored by akosiaris).
Add restrouter kubernetes private data
Wed, Jul 31, 10:10 AM
akosiaris added a comment to T229236: Investigate if the code of Graphoid uses a proper user agent header.

@Lydia_Pintscher @alaa_wmde So the current user agent of the graphoid service is graphoid (yurik at wikimedia). Yurik left Wikimedia a couple of years ago, I think. I made a patch to fix it, but it fails because blubber version 3 is not supported anymore (does that mean we can't merge anything in graphoid right now? @akosiaris knows better).

Wed, Jul 31, 8:51 AM · Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), User-Ladsgroup, Patch-For-Review, Graphoid, Wikidata

Tue, Jul 30

akosiaris committed rDEPLOYCHARTS20c1649df1e1: Add anycast recdns to calico filters (authored by akosiaris).
Add anycast recdns to calico filters
Tue, Jul 30, 9:18 AM
akosiaris closed T227640: Migrate ORES pool counters to Buster as Resolved.
Tue, Jul 30, 8:27 AM · Scoring-platform-team, ORES, SRE-tools
akosiaris added a comment to T227640: Migrate ORES pool counters to Buster.

Hosts removed from the puppet code and from puppetdb, certs revoked and VMs removed from the cluster. Resolving.

Tue, Jul 30, 8:27 AM · Scoring-platform-team, ORES, SRE-tools

Mon, Jul 29

akosiaris triaged T229209: Strengthen backup infrastructure and support as Normal priority.
Mon, Jul 29, 8:50 AM · Goal, DBA, serviceops, Operations
akosiaris created T229209: Strengthen backup infrastructure and support.
Mon, Jul 29, 8:50 AM · Goal, DBA, serviceops, Operations
akosiaris added a comment to T208566: puppet.git rake fails with ruby 2.5.

The Gemfile had Puppet 4.8.2 to match the version provided by Debian Jessie:

Mon, Jul 29, 7:49 AM · Patch-For-Review, Continuous-Integration-Config, Operations, Puppet
akosiaris triaged T227640: Migrate ORES pool counters to Buster as Normal priority.
Mon, Jul 29, 6:38 AM · Scoring-platform-team, ORES, SRE-tools

Thu, Jul 25

akosiaris added a comment to T208566: puppet.git rake fails with ruby 2.5.

I've updated it to 4.10.2 today using https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/524525/, fwiw. I had not seen this task, sorry about that. Feel free to revert if I caused any problems (although a fleet-wide PCC said OK).

Thu, Jul 25, 3:39 PM · Patch-For-Review, Continuous-Integration-Config, Operations, Puppet
akosiaris moved T207200: Revisit the logging work done on Q1 2017-2018 for the standard pod setup from Backlog to Doing on the serviceops board.
Thu, Jul 25, 3:21 PM · serviceops, Release-Engineering-Team (Pipeline), Release-Engineering-Team-TODO, Core Platform Team Legacy (Watching / External), Services (watching), Release Pipeline, Operations
akosiaris added a project to T207200: Revisit the logging work done on Q1 2017-2018 for the standard pod setup: serviceops.
Thu, Jul 25, 3:21 PM · serviceops, Release-Engineering-Team (Pipeline), Release-Engineering-Team-TODO, Core Platform Team Legacy (Watching / External), Services (watching), Release Pipeline, Operations
akosiaris closed T228403: eqiad: One VM request for identity provider as Resolved.

idp1001.wikimedia.org has been installed and is up and running; I'll resolve this.

Thu, Jul 25, 1:10 PM · Patch-For-Review, vm-requests, Operations
akosiaris closed T228733: Add more SREs to gerritadmin LDAP group as Resolved.

+1. Thanks @Joe, thanks @Dzahn. I've added both of you to the gerritadmin group.

Thu, Jul 25, 7:59 AM · Release-Engineering-Team-TODO, Release-Engineering-Team (Development services), Gerrit, LDAP-Access-Requests, Operations
akosiaris updated subscribers of T228926: rack/setup/instal (4) CI ganeti nodes.

Despite the designation as CI, we will be treating these uniformly as far as ganeti goes (we will be handling the capacity allocations within ganeti), so:

Thu, Jul 25, 7:01 AM · ops-eqiad, Operations
akosiaris added a comment to T228924: rack/setup/install ganeti10([09]|1[0-8[).eqiad.wmnet.

Indeed the refreshes are for ganeti100[1-4] so row C it is. Try to spread them across 1G racks.

Thu, Jul 25, 6:58 AM · ops-eqiad, vm-requests, Operations

Jul 23 2019

akosiaris added a comment to T227529: Request rename of "waldir" to "waldyrious" on LDAP.

Your gerrit username (the one depicted in the username line on F29825695) cannot change. It is stored in gerrit itself and there is nothing we can do about that (aside from creating a new account, and even then deleting an account is not safe). What can change is the Full name (which is what most screens/dashboards show). That should happen when you click that reload button, after which you should be logged out of gerrit and forced to log in again.

Jul 23 2019, 1:27 PM · LDAP-Access-Requests
akosiaris committed rDEPLOYCHARTSf22617d2d3d5: Redeploy on stream-config changes (authored by akosiaris).
Redeploy on stream-config changes
Jul 23 2019, 12:48 PM
akosiaris lowered the priority of T225199: Fatal error during RecentChange::notifyEdit (deferred update) from ORES/RecentChangeSaveHookHandler from Unbreak Now! to High.

Lowering priority from UBN, since that was on Jun 19th and it's been 4 days already.

Jul 23 2019, 12:41 PM · Growth-Team (Current Sprint), MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Scoring-platform-team, WMF-JobQueue, ORES, Wikimedia-production-error
akosiaris triaged T228700: helmfile apply with values.yaml file change did not deploy new k8s pods as Normal priority.
Jul 23 2019, 12:15 PM · Patch-For-Review, Analytics, serviceops, EventBus
akosiaris added a comment to T228700: helmfile apply with values.yaml file change did not deploy new k8s pods.

I think the issue is in the stream-config.yaml file, not the config.yaml template. Using .Files.Get means the file is included as is and not rendered as a template. I've uploaded a change to fix that.
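
A related detail behind "helmfile apply did not deploy new pods": a change to a ConfigMap alone does not alter the Deployment's pod template, so Kubernetes has no reason to roll the pods. A common Helm pattern (whether rDEPLOYCHARTSf22617d2d3d5 uses exactly this is an assumption) is to annotate the pod template with a checksum of the rendered config, e.g. via Sprig's `sha256sum`, and to render files through `tpl (.Files.Get "...") .` when they should be treated as templates. The Python below only mimics the checksum logic; the helper names are hypothetical.

```python
import hashlib


def config_checksum(rendered_config: str) -> str:
    """Hex SHA-256 of the rendered config, like Sprig's `sha256sum`."""
    return hashlib.sha256(rendered_config.encode("utf-8")).hexdigest()


def pod_annotations(rendered_config: str) -> dict[str, str]:
    """Pod-template annotations carrying the config checksum.

    The annotation is part of the pod spec, so any config change alters
    the spec and the Deployment controller rolls out new pods; an
    unchanged config yields an identical spec and no rollout."""
    return {"checksum/config": config_checksum(rendered_config)}
```

Editing a single stream definition changes the annotation value, which is what turns a config-only change into a rollout.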

Jul 23 2019, 12:15 PM · Patch-For-Review, Analytics, serviceops, EventBus
akosiaris added a comment to T224572: Migrate pool counters to Buster.

poolcounter1004 has just been added

Jul 23 2019, 9:11 AM · serviceops, Operations
akosiaris triaged T228733: Add more SREs to gerritadmin LDAP group as Normal priority.
Jul 23 2019, 9:02 AM · Release-Engineering-Team-TODO, Release-Engineering-Team (Development services), Gerrit, LDAP-Access-Requests, Operations
akosiaris created T228733: Add more SREs to gerritadmin LDAP group.
Jul 23 2019, 8:59 AM · Release-Engineering-Team-TODO, Release-Engineering-Team (Development services), Gerrit, LDAP-Access-Requests, Operations
akosiaris committed rDEPLOYCHARTSeca24e6a11af: Fix bug in scaffold configmap.yaml and deployment.yaml (authored by jeena).
Fix bug in scaffold configmap.yaml and deployment.yaml
Jul 23 2019, 8:01 AM
akosiaris added a comment to T227140: a4-eqiad pdu refresh.

Sigh, this was already done. I just hope the info added will be useful at some point in the future as a guide

Jul 23 2019, 7:05 AM · DC-Ops, Operations, ops-eqiad
akosiaris updated the task description for T227133: a8-eqiad pdu refresh (Thursday 9/19 @11am UTC).
Jul 23 2019, 7:03 AM · DC-Ops, Operations, ops-eqiad
akosiaris updated the task description for T227143: a7-eqiad pdu refresh.
Jul 23 2019, 7:02 AM · DC-Ops, Operations, ops-eqiad
akosiaris updated subscribers of T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC).

sudo gnt-node migrate -f ganeti1006

Jul 23 2019, 7:02 AM · DC-Ops, Operations, ops-eqiad
akosiaris updated the task description for T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC).
Jul 23 2019, 7:00 AM · DC-Ops, Operations, ops-eqiad
akosiaris updated subscribers of T227141: a5-eqiad pdu refresh.

sudo gnt-node migrate -f ganeti1008

Jul 23 2019, 6:58 AM · DC-Ops, Operations, ops-eqiad
akosiaris updated the task description for T227141: a5-eqiad pdu refresh.
Jul 23 2019, 6:56 AM · DC-Ops, Operations, ops-eqiad
akosiaris updated subscribers of T227140: a4-eqiad pdu refresh.
Jul 23 2019, 6:53 AM · DC-Ops, Operations, ops-eqiad
akosiaris updated the task description for T227140: a4-eqiad pdu refresh.
Jul 23 2019, 6:51 AM · DC-Ops, Operations, ops-eqiad
akosiaris updated subscribers of T227139: a3-eqiad pdu refresh.
Jul 23 2019, 6:49 AM · DC-Ops, Operations, ops-eqiad
akosiaris updated the task description for T227139: a3-eqiad pdu refresh.
Jul 23 2019, 6:44 AM · DC-Ops, Operations, ops-eqiad
akosiaris added a comment to T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC).

conf1001 is fine to power down (no depool necessary): perform all wanted actions and then power it on, as it will repool itself automatically.

Jul 23 2019, 6:41 AM · DC-Ops, Operations, ops-eqiad
akosiaris updated the task description for T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC).
Jul 23 2019, 6:39 AM · DC-Ops, Operations, ops-eqiad

Jul 22 2019

akosiaris moved T228676: Self-service Deployment Pipeline from Backlog to Goal tasks on the serviceops board.
Jul 22 2019, 3:27 PM · Goal, Operations, Release Pipeline, Release-Engineering-Team (Pipeline), serviceops
akosiaris triaged T228676: Self-service Deployment Pipeline as Normal priority.
Jul 22 2019, 3:23 PM · Goal, Operations, Release Pipeline, Release-Engineering-Team (Pipeline), serviceops
akosiaris created T228676: Self-service Deployment Pipeline.
Jul 22 2019, 3:21 PM · Goal, Operations, Release Pipeline, Release-Engineering-Team (Pipeline), serviceops

Jul 19 2019

akosiaris closed T220235: Migrate Beta cluster services to use Kubernetes as Resolved.

Agreed with @Krenair, closing for now.

Jul 19 2019, 8:47 AM · Core Platform Team (Needs Cleaning - Services Operations), Editing-team, Kubernetes, Release Pipeline, serviceops, Beta-Cluster-Infrastructure
akosiaris triaged T226814: Create termbox release for test.wikidata.org as High priority.
Jul 19 2019, 8:44 AM · Wikibase-Termbox-Iteration-20, Wikidata-Termbox-Iteration-19, serviceops
akosiaris moved T226814: Create termbox release for test.wikidata.org from Backlog to Doing on the serviceops board.
Jul 19 2019, 8:44 AM · Wikibase-Termbox-Iteration-20, Wikidata-Termbox-Iteration-19, serviceops
akosiaris moved T226237: Investigate outgoing discarded packets in the codfw kubernetes cluster from Doing to Next up on the serviceops board.
Jul 19 2019, 8:43 AM · serviceops

Jul 18 2019

akosiaris added a comment to T227833: Everything fails with unable to load the docker file.

Good to know. Sorry for misunderstanding this!

Jul 18 2019, 3:49 PM · Operations, Wikimedia-production-error (Shared Build Failure)
akosiaris added a comment to T226814: Create termbox release for test.wikidata.org.

I've brought this up in the weekly SRE meeting. Overall there are a number of concerns. I'll list them below in no particular order:

Jul 18 2019, 2:49 PM · Wikibase-Termbox-Iteration-20, Wikidata-Termbox-Iteration-19, serviceops
akosiaris closed T206339: Separate Traffic layer caches for PHP7/HHVM, a subtask of T206336: SRE quarterly goal: Ability to serve a fraction of the production traffic from PHP7, as Resolved.
Jul 18 2019, 2:34 PM · Operations
akosiaris closed T206339: Separate Traffic layer caches for PHP7/HHVM as Resolved.

I think we can resolve this, right? I am gonna be bold and resolve it, feel free to reopen if needed

Jul 18 2019, 2:34 PM · Traffic, Operations
akosiaris added a comment to T226236: Upload docker-ce 18.06.3 upstream package for Stretch.

Thanks, I can confirm the component is around and it addresses the concern of mixing up upgrades with Toolforge. However, it imports 18.09.7, and we need the previous version, 18.06.x, for now :-\

Jul 18 2019, 2:30 PM · serviceops, Operations, Continuous-Integration-Infrastructure (phase-out-jessie)
akosiaris added a comment to T228403: eqiad: One VM request for identity provider.

LGTM

Jul 18 2019, 12:16 PM · Patch-For-Review, vm-requests, Operations
akosiaris awarded T228403: eqiad: One VM request for identity provider a Like token.
Jul 18 2019, 12:16 PM · Patch-For-Review, vm-requests, Operations
akosiaris added a comment to T227833: Everything fails with unable to load the docker file.

Is this still ongoing? Do I understand correctly that the change to Resolved status was erroneous? Should we reopen?

Jul 18 2019, 11:18 AM · Operations, Wikimedia-production-error (Shared Build Failure)
akosiaris closed T223458: mgmt outages for cloud* systems seem to page everyone as Resolved.

Change merged, this should be resolved now.

Jul 18 2019, 8:36 AM · Patch-For-Review, cloud-services-team (Kanban)
akosiaris added a comment to T222866: Ores hosts: mwparserfromhell tokenizer random segfault.

Indeed. Done. Thanks!

Jul 18 2019, 8:27 AM · Scoring-platform-team (Current), Patch-For-Review, serviceops, ORES
akosiaris changed the visibility for T222866: Ores hosts: mwparserfromhell tokenizer random segfault.
Jul 18 2019, 8:27 AM · Scoring-platform-team (Current), Patch-For-Review, serviceops, ORES
akosiaris added a comment to T227529: Request rename of "waldir" to "waldyrious" on LDAP.

^ Worth documenting this somewhere™ (or in more places) to avoid repeating it?

Jul 18 2019, 8:26 AM · LDAP-Access-Requests
akosiaris updated subscribers of T227529: Request rename of "waldir" to "waldyrious" on LDAP.

I am resolving this, feel free to reopen if something is amiss

The cn and sn for uid=waldir,ou=people,dc=wikimedia,dc=org are both waldyrious (lower-case w). Wikitech will never be able to authenticate this account, as MediaWiki canonicalizes the username to start with a capital W and wikitech is configured to enforce same-case matching for user account lookups.
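
The mismatch can be reproduced with a simplified model of the two steps described above. This Python sketch is hypothetical and deliberately simplified: it mimics MediaWiki's username canonicalization (underscores become spaces, first character upper-cased) and a case-sensitive comparison against the LDAP cn, ignoring Unicode edge cases and other MediaWiki title rules.

```python
def mediawiki_canonical(username: str) -> str:
    """Simplified MediaWiki username canonicalization: underscores become
    spaces and the first character is upper-cased."""
    name = username.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def ldap_lookup_matches(wiki_username: str, ldap_cn: str) -> bool:
    """Case-sensitive match of the canonical wiki name against the LDAP cn,
    as wikitech's account lookup is described as doing."""
    return mediawiki_canonical(wiki_username) == ldap_cn
```

Whatever the user types, the wiki side becomes `Waldyrious`, which never equals the stored cn `waldyrious`; only an LDAP cn starting with an upper-case W can match.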

Jul 18 2019, 8:20 AM · LDAP-Access-Requests
akosiaris raised the priority of T228196: docker-registry: some layers has been corrupted due to deleting other swift containers from Normal to High.

For that particular image I can recreate locally:

$ docker pull docker-registry.wikimedia.org/releng/composer-test:0.1.7-s1
0.1.7-s1: Pulling from releng/composer-test
8d22d214682d: Already exists 
dd5d82f356b7: Already exists 
e74dee1208c4: Verifying Checksum 
69208455aa1f: Download complete 
f1cba75babe0: Verifying Checksum 
2cd12524c0dc: Download complete 
3d3adb31207d: Download complete 
e17ba03e55ec: Waiting 
e9dd2befc159: Verifying Checksum 
filesystem layer verification failed for digest sha256:e9dd2befc159629b6a09232c6478fa48bedfc34117be1a04c1e337ebd0a46d27
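
The final error comes from content addressing: each layer blob is identified by the SHA-256 digest of its bytes, and after download the client recomputes the hash and rejects the layer if it differs from the digest in the manifest. A blob that was corrupted in the backing storage therefore fails every pull. A minimal Python sketch of that check (the function name is hypothetical):

```python
import hashlib


def verify_layer(blob: bytes, digest: str) -> bool:
    """Return True if `blob` matches a registry digest like 'sha256:<hex>'."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256" or not expected:
        raise ValueError(f"unsupported digest: {digest}")
    # The digest covers the blob exactly as stored and transferred.
    return hashlib.sha256(blob).hexdigest() == expected.lower()
```

Any single flipped byte in the stored blob changes the hash, so the pull fails deterministically for every client until the layer is re-pushed.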
Jul 18 2019, 8:03 AM · Patch-For-Review, Release-Engineering-Team-TODO (201907), Operations, Wikimedia-Incident, serviceops

Jul 17 2019

akosiaris triaged T228296: mw2269 rebooted/crashed unexpectedly on Jul 17th ~15:30UTC as Normal priority.
Jul 17 2019, 3:59 PM · Operations
akosiaris created T228296: mw2269 rebooted/crashed unexpectedly on Jul 17th ~15:30UTC.
Jul 17 2019, 3:59 PM · Operations
akosiaris added a comment to T224794: Degraded RAID on helium.

@akosiaris or @Volans - we can order drive replacements for this, since it's out of warranty, but I was trying to figure out how this correlates with the new replacement of backup1001. Do you need replacement drives on helium, to be able to complete the migration of data over to backup1001? I'll follow up on IRC with you later tonight as well. Thanks, Willy

Jul 17 2019, 3:40 PM · ops-eqiad, Operations
akosiaris closed T223698: Request access to deployment cluster for Alaa Sarhan as Resolved.

User has been added to the cluster. Resolving, feel free to reopen

Jul 17 2019, 2:51 PM · Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO, Operations, SRE-Access-Requests
akosiaris added a comment to T211881: graphoid: Code stewardship request.

the hardware and the Operating System powering this service are going out of support.

What's the timeline for that?

Jul 17 2019, 11:16 AM · Release-Engineering-Team-TODO (201908), Release-Engineering-Team (Code Health), Core Platform Team Legacy (Watching / External), Services (watching), Operations, Code-Stewardship-Reviews, Graphoid

Jul 16 2019

akosiaris added a comment to T198939: Decommission servermon.

What about the puppet database on m1?

The database should all be ephemeral data about past server state, so no need to retain, but adding @akosiaris for confirmation given he's the primary upstream author.

Jul 16 2019, 12:20 PM · Patch-For-Review, Operations

Jul 15 2019

akosiaris added a comment to T228061: Subscribe Urbanecm to ops@lists.wikimedia.org.

Sure, just done. I also removed the @wikimedia.org one, I hope that's what you wanted.

Jul 15 2019, 3:10 PM · Wikimedia-Mailing-lists, Operations
akosiaris closed T227529: Request rename of "waldir" to "waldyrious" on LDAP as Resolved.

I am resolving this, feel free to reopen if something is amiss

Jul 15 2019, 2:52 PM · LDAP-Access-Requests