
fgiunchedi (Filippo Giunchedi)
/* No comment */

Projects (18)

User Details

User Since
Oct 3 2014, 8:06 AM (425 w, 1 d)
Availability
Available
IRC Nick
godog
LDAP User
Filippo Giunchedi
MediaWiki User
Filippo Giunchedi [ Global Accounts ]

Recent Activity

Yesterday

fgiunchedi moved T303154: Upgrade Thanos to latest version from Backlog to Up next on the User-fgiunchedi board.
Fri, Nov 25, 2:19 PM · Patch-For-Review, User-fgiunchedi, Observability-Metrics
fgiunchedi committed rODTH537bc1002ba6: Merge tag 'upstream/0.29.0' into debian/buster-wikimedia (authored by fgiunchedi).
Merge tag 'upstream/0.29.0' into debian/buster-wikimedia
Fri, Nov 25, 12:45 PM
fgiunchedi awarded T312235: [L] Image suggestions data pipeline monitoring a Like token.
Fri, Nov 25, 11:17 AM · Patch-For-Review, Structured-Data-Backlog (Current Work), Image-Suggestion-API, Image-Suggestions
fgiunchedi removed a project from T211018: Move restbase cassandra checks to Prometheus: User-fgiunchedi.
Fri, Nov 25, 8:52 AM · RESTBase-Cassandra
fgiunchedi removed a project from T301110: Ingest webrequest sampled 1000 into logstash: User-fgiunchedi.
Fri, Nov 25, 8:51 AM · SRE, Observability-Logging
fgiunchedi removed a project from T228380: Tech debt: sunsetting of Graphite (part 1) : User-fgiunchedi.
Fri, Nov 25, 8:46 AM · Observability-Metrics

Thu, Nov 24

fgiunchedi added a comment to T310266: Move mgmt SSH checks from Icinga to Prometheus/Alertmanager.

@fgiunchedi sounds great, but quick question: will the ticket go directly to dcops? Or would it start with the team that is responsible for that service first?

Thu, Nov 24, 1:43 PM · Patch-For-Review, User-fgiunchedi, SRE Observability (FY2022/2023-Q2)
fgiunchedi moved T323714: [alerting] Configure production karma to show alerts from metricsinfra alertmanager from Backlog to Radar on the User-fgiunchedi board.
Thu, Nov 24, 10:54 AM · User-fgiunchedi, User-dcaro, cloud-services-team (Kanban)
fgiunchedi updated subscribers of T310266: Move mgmt SSH checks from Icinga to Prometheus/Alertmanager.

With https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/804575 merged we can start opening tasks when mgmt has been unresponsive to ssh for more than 12h. The alert will open tasks to the correct dcops project in phab. @wiki_willy @Papaul @Cmjohnson @Jclark-ctr let me know what you think!
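The 12h rule described above can be sketched as a simple threshold check. This is only an illustration of the decision logic, not the merged cookbook or the alerting rule itself; `should_open_task` and `UNRESPONSIVE_THRESHOLD` are hypothetical names, and the timestamp of the last successful SSH probe is assumed to be available.

```python
from datetime import datetime, timedelta

# Threshold taken from the comment above: mgmt unresponsive to ssh for more than 12h.
UNRESPONSIVE_THRESHOLD = timedelta(hours=12)


def should_open_task(last_ssh_success: datetime, now: datetime,
                     threshold: timedelta = UNRESPONSIVE_THRESHOLD) -> bool:
    """Return True when the mgmt interface has been unreachable longer than the threshold.

    Hypothetical helper: assumes we know when the last successful SSH probe happened.
    """
    return now - last_ssh_success > threshold


now = datetime(2022, 11, 24, 10, 37)
print(should_open_task(now - timedelta(hours=13), now))  # unreachable for 13h
print(should_open_task(now - timedelta(hours=2), now))   # unreachable for 2h
```

Using a strict "greater than" comparison means a host that has been unreachable for exactly 12h does not yet trigger a task.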

Thu, Nov 24, 10:37 AM · Patch-For-Review, User-fgiunchedi, SRE Observability (FY2022/2023-Q2)
fgiunchedi moved T323718: decommission graphite2003.codfw.wmnet from Doing to Radar on the User-fgiunchedi board.
Thu, Nov 24, 10:06 AM · SRE, ops-codfw, User-fgiunchedi, decommission-hardware
fgiunchedi updated subscribers of T323718: decommission graphite2003.codfw.wmnet.

@Papaul host is ready for decom

Thu, Nov 24, 9:42 AM · SRE, ops-codfw, User-fgiunchedi, decommission-hardware
fgiunchedi updated the task description for T323718: decommission graphite2003.codfw.wmnet.
Thu, Nov 24, 9:42 AM · SRE, ops-codfw, User-fgiunchedi, decommission-hardware

Wed, Nov 23

fgiunchedi moved T323718: decommission graphite2003.codfw.wmnet from Backlog to Doing on the User-fgiunchedi board.
Wed, Nov 23, 4:25 PM · SRE, ops-codfw, User-fgiunchedi, decommission-hardware
fgiunchedi moved T318903: Put graphite1005 in service from Up next to Doing on the User-fgiunchedi board.
Wed, Nov 23, 4:25 PM · Patch-For-Review, User-fgiunchedi, SRE Observability (FY2022/2023-Q3)
fgiunchedi closed T315524: Put graphite2004 in service as Resolved.

Host is in service for all intents and purposes. graphite2003 will be decom in https://phabricator.wikimedia.org/T323718

Wed, Nov 23, 4:25 PM · Patch-For-Review, User-fgiunchedi, SRE Observability (FY2022/2023-Q3)
fgiunchedi created T323718: decommission graphite2003.codfw.wmnet.
Wed, Nov 23, 4:10 PM · SRE, ops-codfw, User-fgiunchedi, decommission-hardware
fgiunchedi added a comment to T284213: Improve AlertManager dashboard.

Now the "alert source links" work as expected even for prometheus alerts, i.e. they link to the correct site + prometheus instance for navigation/exploration of the alert

Wed, Nov 23, 9:33 AM · Observability-Alerting, Patch-For-Review, User-fgiunchedi

Tue, Nov 22

fgiunchedi added a comment to T175087: Create a navtiming processor for Prometheus.

Following up from a chat between @fgiunchedi @Krinkle and @Peter I have moved the webperf metrics scraping to the ext Prometheus instance and confirmed metrics are being scraped again. This in practice should be all transparent (esp because there isn't much usage of prometheus webperf in grafana yet).

Tue, Nov 22, 4:02 PM · Patch-For-Review, NavigationTiming, Performance-Team
fgiunchedi added a comment to T321099: ProbeSlow alerts for Wikifunctions on Beta Cluster.

I've silenced/acked the ProbeSlow alert for wikifunctions

Tue, Nov 22, 1:13 PM · Abstract Wikipedia team (Phase θ – Throttling)
fgiunchedi moved T315524: Put graphite2004 in service from Up next to Doing on the User-fgiunchedi board.
Tue, Nov 22, 1:09 PM · Patch-For-Review, User-fgiunchedi, SRE Observability (FY2022/2023-Q3)
fgiunchedi moved T320973: [wmcs][alerting] Allow volunteer admins silencing alerts from cloudvps/toolforge/paws/quarry from Doing to Radar on the User-fgiunchedi board.
Tue, Nov 22, 1:09 PM · User-fgiunchedi, User-dcaro, cloud-services-team (Kanban)
fgiunchedi moved T320973: [wmcs][alerting] Allow volunteer admins silencing alerts from cloudvps/toolforge/paws/quarry from Backlog to Doing on the User-fgiunchedi board.
Tue, Nov 22, 10:27 AM · User-fgiunchedi, User-dcaro, cloud-services-team (Kanban)

Mon, Nov 21

fgiunchedi added a project to T320973: [wmcs][alerting] Allow volunteer admins silencing alerts from cloudvps/toolforge/paws/quarry: User-fgiunchedi.
Mon, Nov 21, 4:28 PM · User-fgiunchedi, User-dcaro, cloud-services-team (Kanban)
fgiunchedi updated the task description for T320973: [wmcs][alerting] Allow volunteer admins silencing alerts from cloudvps/toolforge/paws/quarry.
Mon, Nov 21, 4:09 PM · User-fgiunchedi, User-dcaro, cloud-services-team (Kanban)
fgiunchedi updated the task description for T320973: [wmcs][alerting] Allow volunteer admins silencing alerts from cloudvps/toolforge/paws/quarry.
Mon, Nov 21, 3:45 PM · User-fgiunchedi, User-dcaro, cloud-services-team (Kanban)
fgiunchedi closed T314353: Icinga downtimes not working as Resolved.

Except for that spike, it looks like check latency is under control (and going down, as we progressively remove more and more checks from icinga). I'm optimistically resolving the task

Mon, Nov 21, 12:00 PM · User-fgiunchedi, Observability-Alerting, SRE

Fri, Nov 18

fgiunchedi added a comment to T156955: Standardizing our partman recipes.

I was reviewing this work again and realized the audit command should be updated. The situation in puppet.git as of 348f4a06ed is reported below.

Fri, Nov 18, 2:21 PM · Patch-For-Review, User-fgiunchedi, SRE
fgiunchedi added a comment to T310266: Move mgmt SSH checks from Icinga to Prometheus/Alertmanager.

We're basically ready to go and start opening tasks, however we should make sure hiera data is synced as part of the decom cookbook or we'll end up with false positives for decom hosts when the mgmt interface becomes unreachable. AFAICT that's not yet the case @Volans ?

Fri, Nov 18, 10:51 AM · Patch-For-Review, User-fgiunchedi, SRE Observability (FY2022/2023-Q2)
fgiunchedi added a comment to T321874: Consider alternative configuration management tooling.

I can definitely relate to the long (and stressful!) cycles of Puppet patches you mention @bking, and that was one of my main motivations for starting Pontoon almost three years ago now.

Fri, Nov 18, 8:51 AM · Infrastructure-Foundations, Puppet
fgiunchedi awarded T321120: turn up 'aux' k8s cluster for o11y and other "ancillary"/"supportive" services a Like token.
Fri, Nov 18, 7:26 AM · Patch-For-Review, Observability-Tracing

Thu, Nov 17

fgiunchedi closed T319163: Add PKI support to Pontoon as Resolved.

This is now a thing! I've added bootstrap instructions at https://wikitech.wikimedia.org/wiki/Puppet/Pontoon#PKI and optimistically resolving the task

Thu, Nov 17, 1:04 PM · User-fgiunchedi, Patch-For-Review, Pontoon, SRE
fgiunchedi created P40102 (An Untitled Masterwork).
Thu, Nov 17, 12:24 PM
fgiunchedi created P40088 (An Untitled Masterwork).
Thu, Nov 17, 10:15 AM
fgiunchedi added a comment to T319214: Evaluate Benthos as stream processor.

And now (before the merge of the above patch) the data is back in sync. Hence it looks to me something that depends on live factors/loads.

Thu, Nov 17, 9:58 AM · Patch-For-Review, Event-Platform Value Stream, Data-Engineering-Planning, Observability-Logging, Machine-Learning-Team, observability
fgiunchedi added a comment to T319214: Evaluate Benthos as stream processor.

@fgiunchedi @elukey I'm seeing some strange behaviour of the data in the dashboard, not sure if it's a me problem or a data problem, but reporting it in case it's the latter.

Basically it seems like the data for the text records is delayed by ~7-8 minutes at the moment while the ones for upload are all much less behind.
Is that at all possible?

Thu, Nov 17, 8:07 AM · Patch-For-Review, Event-Platform Value Stream, Data-Engineering-Planning, Observability-Logging, Machine-Learning-Team, observability

Tue, Nov 15

Volans awarded T314981: Add a webrequest sampled topic and ingest into druid/turnilo a 100 token.
Tue, Nov 15, 6:11 PM · Patch-For-Review, Traffic, Data Pipelines, User-fgiunchedi, Data-Engineering-Planning, Foundational Technology Requests
fgiunchedi added a comment to T323129: Simulate client dispatch in a single scrape.

+1 on the simulation! As an additional data point on the other big constraint (i.e. ingestion/processing), to give a sense of scale on the cardinality numbers we're talking about: in eqiad the prometheus ops instance (our biggest instance) ingests around 150k samples/s, all more or less on a 60s schedule.
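As a rough back-of-the-envelope reading of those numbers, the ingestion rate and scrape schedule imply an approximate count of active series. The sketch below assumes every series is scraped exactly once per interval, which is a simplification; `estimated_active_series` is a hypothetical helper name.

```python
# Figures quoted in the comment above.
SAMPLES_PER_SECOND = 150_000  # approximate ingestion rate of the eqiad ops instance
SCRAPE_INTERVAL_S = 60        # "more or less on a 60s schedule"


def estimated_active_series(samples_per_second: int, scrape_interval_s: int) -> int:
    """Estimate active series, assuming one sample per series per scrape interval."""
    return samples_per_second * scrape_interval_s


print(estimated_active_series(SAMPLES_PER_SECOND, SCRAPE_INTERVAL_S))  # 9000000
```

Under that assumption the instance would be tracking on the order of 9 million series.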

Tue, Nov 15, 5:51 PM · NavigationTiming, Performance-Team
fgiunchedi added a comment to T288196: Retire Prometheus 'global' instance.

This would help reclaim some space, at least in eqiad: we have ~700G still free in the vg and getting rid of global would give us another 900G

Tue, Nov 15, 5:11 PM · Observability-Metrics, Performance-Team (Radar)
fgiunchedi moved T319163: Add PKI support to Pontoon from Backlog to Doing on the User-fgiunchedi board.
Tue, Nov 15, 4:25 PM · User-fgiunchedi, Patch-For-Review, Pontoon, SRE
fgiunchedi added a project to T319163: Add PKI support to Pontoon: User-fgiunchedi.
Tue, Nov 15, 4:25 PM · User-fgiunchedi, Patch-For-Review, Pontoon, SRE
fgiunchedi committed rLPRIc0b6141e337c: secrets: remove old pki intermediate keys names (authored by fgiunchedi).
secrets: remove old pki intermediate keys names
Tue, Nov 15, 1:53 PM

Mon, Nov 14

fgiunchedi committed rLPRI3173beff947c: secrets: new pki intermediates keys location (authored by fgiunchedi).
secrets: new pki intermediates keys location
Mon, Nov 14, 3:40 PM
fgiunchedi updated the task description for T313229: Production Dispatch Infrastructure.
Mon, Nov 14, 1:14 PM · SRE Observability (FY2022/2023-Q2), Patch-For-Review, User-fgiunchedi
fgiunchedi added a comment to T322795: Requesting access to analytics-privatedata-users for ryasmeen (superset access with no server access).

Interesting, thank you for the heads up @Volans . There's an obvious disconnect between CI validation of data.yaml and what cross-validate-accounts expects. Should we make sure all accounts have ssh_keys (even if empty) or should cross-validate-accounts tolerate a missing ssh_keys ? I've gone with the CI route but easy enough to change
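The two options mentioned above can be sketched side by side. This is only an illustration: the account layout, `accounts_missing_ssh_keys`, and `ssh_keys_for` are hypothetical and do not reflect the actual CI checks or the cross-validate-accounts code.

```python
from typing import Any


def accounts_missing_ssh_keys(users: dict[str, dict[str, Any]]) -> list[str]:
    """Strict option: report every account that lacks an 'ssh_keys' field entirely."""
    return sorted(name for name, attrs in users.items() if "ssh_keys" not in attrs)


def ssh_keys_for(attrs: dict[str, Any]) -> list[str]:
    """Tolerant option: treat a missing (or null) 'ssh_keys' field as an empty list."""
    return list(attrs.get("ssh_keys") or [])


# Hypothetical account data, loosely modelled on a users mapping.
users = {
    "alice": {"ssh_keys": ["ssh-ed25519 AAAA... alice@example"]},
    "bob": {"ssh_keys": []},  # no server access, but the field is present
    "carol": {},              # field missing altogether
}

print(accounts_missing_ssh_keys(users))  # ['carol']
print(ssh_keys_for(users["carol"]))      # []
```

The strict check keeps the data uniform at the cost of requiring an empty `ssh_keys` entry for accounts without server access, while the tolerant reader pushes that handling into every consumer.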

Mon, Nov 14, 9:55 AM · User-Ryasmeen, SRE, SRE-Access-Requests
fgiunchedi added a comment to T322147: Requesting access to analytics-privatedata-users & Kerberos identity for Ilooremeta.

@fgiunchedi what would the email read like, please? I think I might have lost it in the many updates

Mon, Nov 14, 8:58 AM · SRE, SRE-Access-Requests

Thu, Nov 10

fgiunchedi committed rLPRIfe58ccc0ba20: pki: fix k8s_dse auth keys entry (authored by fgiunchedi).
pki: fix k8s_dse auth keys entry
Thu, Nov 10, 5:54 PM
fgiunchedi added a project to T310266: Move mgmt SSH checks from Icinga to Prometheus/Alertmanager: User-fgiunchedi.
Thu, Nov 10, 3:12 PM · Patch-For-Review, User-fgiunchedi, SRE Observability (FY2022/2023-Q2)
fgiunchedi added a comment to T318209: Volunteer WMF-NDA access for Wangombe.

Thank you @Wangombe ! I see you have a -ctr@wikimedia.org email, I'm assuming that means you are currently contracting with the Foundation ? If that's the case I think we can skip the C-level signoff since the checklist is for volunteers. @Nikerabbit could you also confirm that's the case ? Thank you

Thu, Nov 10, 2:25 PM · WMF-NDA-Requests
fgiunchedi updated the task description for T318209: Volunteer WMF-NDA access for Wangombe.
Thu, Nov 10, 2:24 PM · WMF-NDA-Requests
fgiunchedi closed T322795: Requesting access to analytics-privatedata-users for ryasmeen (superset access with no server access) as Resolved.

Thank you @Ottomata ! @Ryasmeen this is complete, access will be live in the next 30min. I'm resolving the task, though feel free to reopen if sth is amiss

Thu, Nov 10, 2:20 PM · User-Ryasmeen, SRE, SRE-Access-Requests
fgiunchedi triaged T322795: Requesting access to analytics-privatedata-users for ryasmeen (superset access with no server access) as Medium priority.
Thu, Nov 10, 9:25 AM · User-Ryasmeen, SRE, SRE-Access-Requests
fgiunchedi updated subscribers of T322795: Requesting access to analytics-privatedata-users for ryasmeen (superset access with no server access).

Request looks good to me, @Ottomata @odimitrijevic I'm seeking approval for the above! Thank you

Thu, Nov 10, 9:25 AM · User-Ryasmeen, SRE, SRE-Access-Requests
fgiunchedi updated the task description for T322795: Requesting access to analytics-privatedata-users for ryasmeen (superset access with no server access).
Thu, Nov 10, 9:23 AM · User-Ryasmeen, SRE, SRE-Access-Requests
fgiunchedi assigned T322670: Requesting access to analytics-privatedata-users for David.pujol to Jcross.

Thank you @Dzahn and @Htriedman, I'm assigning to @Jcross for final approval

Thu, Nov 10, 9:14 AM · Patch-For-Review, SRE, SRE-Access-Requests
fgiunchedi closed T228497: Review sizing of maps cluster as Declined.

I'm going to be bold and decline the task -- while it seems valid in general, I don't think anyone is actively working on it. Of course reopen as needed!

Thu, Nov 10, 9:12 AM · Product-Infrastructure-Team-Backlog, Sustainability (Incident Followup), SRE, Maps

Wed, Nov 9

fgiunchedi moved T314981: Add a webrequest sampled topic and ingest into druid/turnilo from Backlog to Up next on the User-fgiunchedi board.
Wed, Nov 9, 3:56 PM · Patch-For-Review, Traffic, Data Pipelines, User-fgiunchedi, Data-Engineering-Planning, Foundational Technology Requests
fgiunchedi moved T322523: Check confd last index in a mw-on-k8s world from Backlog to Radar on the User-fgiunchedi board.
Wed, Nov 9, 3:56 PM · MW-on-K8s, SRE Observability (FY2022/2023-Q2), User-fgiunchedi
fgiunchedi edited projects for T320403: Run the timezone update script periodically in prod and in beta, added: serviceops; removed SRE.

Thank you for reaching out @Daimona, I'll move this to serviceops for their opinion

Wed, Nov 9, 10:49 AM · Patch-For-Review, serviceops, Campaign-Tools (Campaign-Tools-Sprint-25), Wikimedia-Site-requests, CampaignEvents, Campaign-Registration
fgiunchedi closed T322723: Add new SSH key for Sam Smith as Resolved.

I've confirmed the request on Meet and patch is merged, new access will be live in the next 30 min. I'm resolving the task though feel free to reopen if sth is amiss

Wed, Nov 9, 9:32 AM · SRE-Access-Requests, SRE
fgiunchedi triaged T322670: Requesting access to analytics-privatedata-users for David.pujol as Medium priority.
Wed, Nov 9, 9:22 AM · Patch-For-Review, SRE, SRE-Access-Requests
fgiunchedi triaged T322723: Add new SSH key for Sam Smith as Medium priority.
Wed, Nov 9, 9:22 AM · SRE-Access-Requests, SRE
fgiunchedi added a comment to T322670: Requesting access to analytics-privatedata-users for David.pujol.

I have copy/pasted the expiry dates from other @tmlt.io folks, please @Htriedman confirm I got that right on https://gerrit.wikimedia.org/r/854952

Wed, Nov 9, 9:03 AM · Patch-For-Review, SRE, SRE-Access-Requests
fgiunchedi updated the task description for T322670: Requesting access to analytics-privatedata-users for David.pujol.
Wed, Nov 9, 8:51 AM · Patch-For-Review, SRE, SRE-Access-Requests
fgiunchedi updated subscribers of T322670: Requesting access to analytics-privatedata-users for David.pujol.

Hello @David.pujol ! I'll be processing this request. Overall looks good to me, though note that since you are a contractor we'll be adding you to the nda group, not wmf. In practice I don't expect that to make a whole lot of difference in terms of access though!

Wed, Nov 9, 8:48 AM · Patch-For-Review, SRE, SRE-Access-Requests
fgiunchedi closed T322147: Requesting access to analytics-privatedata-users & Kerberos identity for Ilooremeta as Resolved.

I have sent the temporary credentials via email following https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos#Create_a_principal_for_a_real_user please check, and change your password!

Wed, Nov 9, 8:36 AM · SRE, SRE-Access-Requests
fgiunchedi closed T322145: Requesting access to analytics-privatedata-users & Kerberos identity for Hghani as Resolved.
Wed, Nov 9, 8:36 AM · SRE, SRE-Access-Requests
fgiunchedi added a comment to T322145: Requesting access to analytics-privatedata-users & Kerberos identity for Hghani.

I have sent the temporary credentials via email following https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos#Create_a_principal_for_a_real_user please check, and change your password!

Wed, Nov 9, 8:36 AM · SRE, SRE-Access-Requests
fgiunchedi added a comment to T322146: Requesting access to analytics-privatedata-users & Kerberos identity for Hibashaath.

I have sent the temporary credentials via email following https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos#Create_a_principal_for_a_real_user please check, and change your password!

Wed, Nov 9, 8:35 AM · SRE, SRE-Access-Requests

Tue, Nov 8

fgiunchedi closed T273026: Errors for ifup@ens5.service after rebooting Ganeti VMs as Resolved.

I'll optimistically resolve the task, though of course reopen if sth is amiss!

Tue, Nov 8, 5:25 PM · Infrastructure-Foundations, netops, Analytics-Radar
fgiunchedi added a comment to T273026: Errors for ifup@ens5.service after rebooting Ganeti VMs.

I have deployed the bandaid above, so ifup failures will be reset (only once) on Ganeti VMs two minutes after boot. We're of course papering over the problem, though given the previous context in the task this seems the lesser evil, at least until we have ifupdown

Tue, Nov 8, 5:23 PM · Infrastructure-Foundations, netops, Analytics-Radar
fgiunchedi closed T322154: Grant Access to ldap/wmf for KMorgan as Resolved.

Yes. (On a meta level, I wonder how to make folks rely less on their memory but establish checking docs as docs may have changed over the years.)

[slight offtopic] Personally, if I do something often, I remember the steps even if I don't want to. I think the key would be to have a script that would take care of all the steps (something like add-wmf-bits kmorgan, which would do the magic).

Tue, Nov 8, 1:11 PM · SRE, LDAP-Access-Requests
fgiunchedi triaged T322154: Grant Access to ldap/wmf for KMorgan as Medium priority.
Tue, Nov 8, 10:51 AM · SRE, LDAP-Access-Requests
fgiunchedi added a comment to T322154: Grant Access to ldap/wmf for KMorgan.

If I'm not mistaken wmf-nda phab group membership was the only missing bit, does that seem correct @Aklapper ?

Tue, Nov 8, 10:50 AM · SRE, LDAP-Access-Requests
fgiunchedi reopened T322154: Grant Access to ldap/wmf for KMorgan as "Open".

Thank you @Aklapper, I'll reopen the task and finish the steps

Tue, Nov 8, 10:47 AM · SRE, LDAP-Access-Requests
fgiunchedi added a member for WMF-NDA: KMorgan-WMF.
Tue, Nov 8, 10:46 AM
fgiunchedi added a comment to T322339: Requesting access to ops and analytics for stevemunene.

@fgiunchedi - Are you happy for me to add Steve to the wmf and ops LDAP groups? I realise that we didn't specify them above, but I think they are required so I have reopened the ticket.

I will also create a kerberos identity and icinga contact etc with reference to this ticket, if that's ok with you.

Tue, Nov 8, 10:44 AM · SRE, SRE-Access-Requests
fgiunchedi triaged T322591: Requesting access to analytics-privatedata-users for Dasm as Medium priority.
Tue, Nov 8, 9:25 AM · Patch-For-Review, SRE, SRE-Access-Requests
fgiunchedi closed T322556: Site: codfw VM %request for dispatch-be2001, a subtask of T313229: Production Dispatch Infrastructure, as Resolved.
Tue, Nov 8, 9:22 AM · SRE Observability (FY2022/2023-Q2), Patch-For-Review, User-fgiunchedi
fgiunchedi closed T322556: Site: codfw VM %request for dispatch-be2001 as Resolved.

VM is up and running, for reference I've created it with:

Tue, Nov 8, 9:22 AM · vm-requests, Infrastructure-Foundations, SRE

Mon, Nov 7

fgiunchedi committed rOSNE05faaeb2bf85: hiera_export: skip mgmt for non-production tenants (authored by fgiunchedi).
hiera_export: skip mgmt for non-production tenants
Mon, Nov 7, 4:57 PM
fgiunchedi added a comment to T322556: Site: codfw VM %request for dispatch-be2001.

Filing the task for tracking purposes, I'm creating the VM ATM

Mon, Nov 7, 4:42 PM · vm-requests, Infrastructure-Foundations, SRE
fgiunchedi added a subtask for T313229: Production Dispatch Infrastructure: T322556: Site: codfw VM %request for dispatch-be2001.
Mon, Nov 7, 4:23 PM · SRE Observability (FY2022/2023-Q2), Patch-For-Review, User-fgiunchedi
fgiunchedi added a parent task for T322556: Site: codfw VM %request for dispatch-be2001: T313229: Production Dispatch Infrastructure.
Mon, Nov 7, 4:23 PM · vm-requests, Infrastructure-Foundations, SRE
fgiunchedi renamed T322556: Site: codfw VM %request for dispatch-be2001 from Site: codfw VM %request for dispatch-be to Site: codfw VM %request for dispatch-be2001.
Mon, Nov 7, 4:23 PM · vm-requests, Infrastructure-Foundations, SRE
fgiunchedi created T322556: Site: codfw VM %request for dispatch-be2001.
Mon, Nov 7, 4:23 PM · vm-requests, Infrastructure-Foundations, SRE
fgiunchedi closed T322339: Requesting access to ops and analytics for stevemunene as Resolved.

Thank you all! Patch is merged and will be fully effective in ~30min. Resolving task as completed, please reopen if something is missing and welcome again @Stevemunene !

Mon, Nov 7, 4:13 PM · SRE, SRE-Access-Requests
fgiunchedi updated the task description for T322339: Requesting access to ops and analytics for stevemunene.
Mon, Nov 7, 4:12 PM · SRE, SRE-Access-Requests
fgiunchedi closed T319299: Investigate longer run time for hiera_export netbox script, a subtask of T310266: Move mgmt SSH checks from Icinga to Prometheus/Alertmanager, as Invalid.
Mon, Nov 7, 3:17 PM · Patch-For-Review, User-fgiunchedi, SRE Observability (FY2022/2023-Q2)
fgiunchedi closed T319299: Investigate longer run time for hiera_export netbox script as Invalid.

I'm optimistically marking the task as invalid, feel free to reopen!

Mon, Nov 7, 3:17 PM · netbox, Infrastructure-Foundations, Observability-Alerting, User-fgiunchedi
fgiunchedi added a comment to T322339: Requesting access to ops and analytics for stevemunene.

Not AFAIK, we'll just need stamp of approval from @odimitrijevic and we're good to go I think

Mon, Nov 7, 1:09 PM · SRE, SRE-Access-Requests
fgiunchedi created T322523: Check confd last index in a mw-on-k8s world.
Mon, Nov 7, 10:50 AM · MW-on-K8s, SRE Observability (FY2022/2023-Q2), User-fgiunchedi
fgiunchedi added a comment to T313229: Production Dispatch Infrastructure.

I have started https://wikitech.wikimedia.org/wiki/Dispatch with some initialization steps, the page will need expanding of course!

Mon, Nov 7, 10:29 AM · SRE Observability (FY2022/2023-Q2), Patch-For-Review, User-fgiunchedi
fgiunchedi added a comment to T318209: Volunteer WMF-NDA access for Wangombe.

@Wangombe please check and sign L2 for this request and let us know when done!

Mon, Nov 7, 10:21 AM · WMF-NDA-Requests

Fri, Nov 4

fgiunchedi added a comment to T321684: haproxy::site doesn't work as expected on the first puppet run.

Thank you for the investigation and the context -- appreciate it!

Fri, Nov 4, 12:40 PM · Cloud-Services, Thumbor, Infrastructure-Foundations, Puppet, Data-Persistence
fgiunchedi created P38118 (An Untitled Masterwork).
Fri, Nov 4, 10:46 AM

Thu, Nov 3

fgiunchedi closed T171122: librenms: consider using Distributed Poller with multiple netmon servers as Declined.

I'm going to be bold and decline the task as we don't have any plans to tackle this

Thu, Nov 3, 1:26 PM · SRE
fgiunchedi edited projects for T320721: Decide whether decom'ing hosts mgmt DNS entry should stay or not, added: Infrastructure-Foundations; removed SRE Observability (FY2022/2023-Q1), Observability-Alerting, User-fgiunchedi.

I have gone ahead and excluded decommissioning hosts from syncing hiera data, will let @Volans take care of the cookbook bits

Thu, Nov 3, 9:20 AM · Patch-For-Review, Infrastructure-Foundations, DC-Ops

Wed, Nov 2

fgiunchedi updated the task description for T313229: Production Dispatch Infrastructure.
Wed, Nov 2, 1:55 PM · SRE Observability (FY2022/2023-Q2), Patch-For-Review, User-fgiunchedi
fgiunchedi closed T169860: Replace smokeping with a Prometheus-based solution as Resolved.

I've updated the references to "smokeping" on wikitech to point them to the replacement dashboards, and removed puppet + dns smokeping references. With all of that done, I'll call this task resolved!

Wed, Nov 2, 1:48 PM · SRE Observability (FY2022/2023-Q2), Patch-For-Review, Observability-Metrics, User-fgiunchedi, Prometheus-metrics-monitoring
fgiunchedi updated subscribers of T301944: Web interface to navigate Prometheus alerts and their status.

It was noticed today by @JMeybohm that the prometheus web interface is currently cached (and shouldn't be, for obvious reasons)

Wed, Nov 2, 11:28 AM · Patch-For-Review, Observability-Metrics