fgiunchedi (Filippo Giunchedi)
/* No comment */

Projects (18)

User Details

User Since
Oct 3 2014, 8:06 AM (237 w, 4 d)
Availability
Available
IRC Nick
godog
LDAP User
Filippo Giunchedi
MediaWiki User
Filippo Giunchedi

Recent Activity

Today

fgiunchedi closed T221202: kafka-logging __consumer_offsets topic traffic increased as Resolved.

To summarize: I would close this task and re-open it if any increase in traffic is noticed. At that point we'll be able to quickly check what is going on and debug further.

Tue, Apr 23, 3:33 PM · Wikimedia-Logstash
fgiunchedi moved T141704: Unable to delete certain files due to "inconsistent state within the internal storage backends" from Backlog to Radar on the User-fgiunchedi board.
Tue, Apr 23, 12:29 PM · MediaWiki-File-management, Wikimedia-production-error, User-fgiunchedi, Multimedia, Commons, Operations, media-storage
fgiunchedi moved T214734: PHP Fatal error: The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp) from Backlog to Radar on the User-fgiunchedi board.
Tue, Apr 23, 12:29 PM · User-fgiunchedi, Operations, PHP 7.2 support
fgiunchedi moved T213933: PoC alert/notification functionality with Elastic Stack from Backlog to Doing on the User-fgiunchedi board.
Tue, Apr 23, 12:29 PM · User-fgiunchedi, Patch-For-Review, Restricted Project, Security-Team, Wikimedia-Logstash
fgiunchedi added a comment to T220907: Degraded RAID on ms-be1013.

@fgiunchedi do you want to power off, unplug, and power on... that will clear the issue

Tue, Apr 23, 12:18 PM · ops-eqiad, Operations
fgiunchedi added a comment to T221450: Broken puppet in the 'logging' project.

Fixed filippo-log-jessie01!

Tue, Apr 23, 12:12 PM · Operations
fgiunchedi placed T221450: Broken puppet in the 'logging' project up for grabs.
Tue, Apr 23, 12:12 PM · Operations
fgiunchedi created T221618: ores icinga check for grafana alert returns 404.
Tue, Apr 23, 12:10 PM · Scoring-platform-team, ORES
fgiunchedi added a comment to T220326: prometheus1004 /srv/prometheus/ops almost full.

The specific check this was about, disk on prometheus1004, now has the Icinga link:

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=prometheus1004

And all other DISK checks on all hosts will also link to

https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space

Does this resolve the ticket or was there something else left to be done?

Tue, Apr 23, 8:25 AM · Patch-For-Review, monitoring, Operations
fgiunchedi added a comment to T221202: kafka-logging __consumer_offsets topic traffic increased.

Also note that since the last reset on 16/04, __consumer_offsets isn't increasing anymore. I'm wondering if it has to do with the logstash upgrades (and mismatching versions? Thinking out loud).

Tue, Apr 23, 8:22 AM · Wikimedia-Logstash

Thu, Apr 18

fgiunchedi added a comment to T187987: 100% of Prometheus traffic served by Prometheus v2.

Both prometheus1004 and prometheus2004 are now in service with Prometheus v2! So far no issues, syncing the whole storage from their counterparts took ~2h each.

Thu, Apr 18, 7:59 AM · Patch-For-Review, monitoring, Operations

Wed, Apr 17

fgiunchedi updated the task description for T221202: kafka-logging __consumer_offsets topic traffic increased.
Wed, Apr 17, 8:33 AM · Wikimedia-Logstash
fgiunchedi created T221202: kafka-logging __consumer_offsets topic traffic increased.
Wed, Apr 17, 8:27 AM · Wikimedia-Logstash

Tue, Apr 16

fgiunchedi created T221068: decom ms-be201[345].
Tue, Apr 16, 10:45 AM · User-fgiunchedi, Operations

Mon, Apr 15

fgiunchedi moved T220787: Fix RAID handler alert and puppet facter to work with Gen10 hosts and ssacli tool from Backlog to In progress on the monitoring board.
Mon, Apr 15, 3:20 PM · Patch-For-Review, Operations, Icinga, monitoring
fgiunchedi moved T216985: google safe browsing icinga checks sporadic UNKNOWN due to 404 from Up next to In progress on the monitoring board.
Mon, Apr 15, 3:16 PM · Patch-For-Review, monitoring, Operations
fgiunchedi added a comment to T216985: google safe browsing icinga checks sporadic UNKNOWN due to 404.

Consensus at the monitoring meeting is to remove this old check, as it is basically useless now.

Mon, Apr 15, 3:16 PM · Patch-For-Review, monitoring, Operations
fgiunchedi moved T213157: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6) from In Dev/Progress to Backlog on the Wikimedia-Logstash board.
Mon, Apr 15, 2:54 PM · User-fgiunchedi, User-herron, Operations, Wikimedia-Logstash
fgiunchedi moved T220103: TEC6: Logging infrastructure (Q4 2018/19 goal) from Backlog to In Dev/Progress on the Wikimedia-Logstash board.
Mon, Apr 15, 2:54 PM · Patch-For-Review, Wikimedia-Logstash, User-fgiunchedi, Operations, Goal
fgiunchedi edited projects for T220103: TEC6: Logging infrastructure (Q4 2018/19 goal), added: Wikimedia-Logstash; removed monitoring.
Mon, Apr 15, 2:54 PM · Patch-For-Review, Wikimedia-Logstash, User-fgiunchedi, Operations, Goal
fgiunchedi moved T220500: logstash1012 lock up caused central logging stuck from Up next to In Dev/Progress on the Wikimedia-Logstash board.
Mon, Apr 15, 2:53 PM · User-herron, Wikimedia-Logstash, Operations
fgiunchedi moved T220500: logstash1012 lock up caused central logging stuck from Backlog to Up next on the Wikimedia-Logstash board.
Mon, Apr 15, 2:53 PM · User-herron, Wikimedia-Logstash, Operations
fgiunchedi moved T187987: 100% of Prometheus traffic served by Prometheus v2 from Backlog to In progress on the monitoring board.
Mon, Apr 15, 2:50 PM · Patch-For-Review, monitoring, Operations
fgiunchedi moved T213918: Investigate distributed and long term storage solutions for Prometheus from Backlog to Up next on the monitoring board.
Mon, Apr 15, 2:40 PM · User-fgiunchedi, Goal, monitoring, Operations
fgiunchedi moved T220116: Migrate all metrics originated by PoPs from statsd to Prometheus from Backlog to In progress on the monitoring board.
Mon, Apr 15, 2:39 PM · User-fgiunchedi, Operations, monitoring, Goal
fgiunchedi moved T220709: Upgrade statsd_exporter to 0.9 from Backlog to In progress on the monitoring board.
Mon, Apr 15, 2:39 PM · Core Platform Team Backlog (Watching / External), Services (watching), Analytics, EventBus, monitoring, User-fgiunchedi, Operations
fgiunchedi moved T205862: Expand modern metrics infrastructure coverage (2018-19 Q2 goal) from In progress to Backlog on the monitoring board.
Mon, Apr 15, 2:39 PM · Patch-For-Review, User-fgiunchedi, monitoring, Operations
fgiunchedi moved T213288: TEC6: Upgrade metrics monitoring infrastructure core components (Q3 2018/19 goal) from In progress to Backlog on the monitoring board.
Mon, Apr 15, 2:38 PM · User-fgiunchedi, Goal, monitoring, Operations
fgiunchedi awarded T220838: Upgrade grafana to 6.1 a Love token.
Mon, Apr 15, 2:07 PM · monitoring, Operations
fgiunchedi added a comment to T220907: Degraded RAID on ms-be1013.

Tried rebooting the host, hoping the controller had freaked out and a reboot would "fix" it or at least reset it. However, the host isn't coming back, and the console says "No more sessions are available for this type of connection!". An "mc reset cold" as per https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/Dell_PowerEdge_RN20_Gen8#Troubleshooting also didn't seem to help.

Mon, Apr 15, 2:00 PM · ops-eqiad, Operations

Fri, Apr 12

fgiunchedi assigned T220785: Requesting deployment access for santhosh to akosiaris.

Followed up with Alex, assigning to him.

Fri, Apr 12, 3:16 PM · Patch-For-Review, Operations, SRE-Access-Requests
fgiunchedi moved T220641: Change ownership of wikimania-program@lists.wikimedia.org from Backlog to List maintenance on the Wikimedia-Mailing-lists board.
Fri, Apr 12, 2:10 PM · Wikimedia-Mailing-lists, Operations
fgiunchedi moved T220104: TEC6: Metrics monitoring infrastructure (Q4 2018/19 goal) from Backlog to In progress on the monitoring board.
Fri, Apr 12, 1:15 PM · User-fgiunchedi, Operations, monitoring, Goal
fgiunchedi added a comment to T220784: Investigate if a Prometheus exporter for the AMD GPU(s) can be easily created.

+1, something that parses the JSON and writes metrics in text format for node-exporter to pick up sounds good to me
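
A minimal sketch of that idea, for illustration only. The JSON shape ({card: {field: value}}), the amd_gpu metric prefix, and the function names are assumptions and not taken from the task; only the overall approach (parse JSON, emit Prometheus text format into a file that node-exporter's textfile collector reads) comes from the comment above.

```python
import json
import os
import re
import tempfile


def json_to_prom_text(raw_json, prefix="amd_gpu"):
    """Render numeric fields of a {card: {field: value}} JSON document
    as Prometheus text-format lines. The JSON shape is an assumption."""
    data = json.loads(raw_json)
    lines = []
    for card, fields in sorted(data.items()):
        for name, value in sorted(fields.items()):
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # skip non-numeric fields such as model names
            # Metric names may only contain letters, digits and underscores.
            metric = re.sub(r"[^a-zA-Z0-9_]", "_", f"{prefix}_{name}").lower()
            lines.append(f'{metric}{{card="{card}"}} {number}')
    return "".join(line + "\n" for line in lines)


def write_prom_file(text, path):
    """Write to a temp file, then rename, so a half-written file is never read."""
    directory = os.path.dirname(path) or "."
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.write(text)
    os.replace(tmp.name, path)
```

The output file would need a .prom suffix and live in the directory passed to node-exporter via --collector.textfile.directory; the actual path depends on the host's configuration.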

Fri, Apr 12, 9:46 AM · User-Elukey, Operations, Analytics
fgiunchedi reassigned T220641: Change ownership of wikimania-program@lists.wikimedia.org from fgiunchedi to ICueva.

This is done, please let us know if you need a new password for the list as well!

Fri, Apr 12, 9:10 AM · Wikimedia-Mailing-lists, Operations

Thu, Apr 11

Ottomata awarded T220709: Upgrade statsd_exporter to 0.9 a Like token.
Thu, Apr 11, 3:27 PM · Core Platform Team Backlog (Watching / External), Services (watching), Analytics, EventBus, monitoring, User-fgiunchedi, Operations
fgiunchedi created T220709: Upgrade statsd_exporter to 0.9.
Thu, Apr 11, 3:25 PM · Core Platform Team Backlog (Watching / External), Services (watching), Analytics, EventBus, monitoring, User-fgiunchedi, Operations
fgiunchedi assigned T220691: Add Rosalie to the ldap/wmde group to RStallman-legalteam.

@RStallman-legalteam I'm assigning this task to you for NDA processing, thanks!

Thu, Apr 11, 12:48 PM · WMF-Legal, LDAP-Access-Requests, Operations, WMF-NDA-Requests
fgiunchedi updated the task description for T213899: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs.
Thu, Apr 11, 12:41 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
fgiunchedi added a comment to T220641: Change ownership of wikimania-program@lists.wikimedia.org.

We certainly can! (I'm on SRE clinic duty this week, hence handling ML requests too)

Thu, Apr 11, 8:50 AM · Wikimedia-Mailing-lists, Operations

Wed, Apr 10

fgiunchedi added a comment to T217142: [WIP] [Proposal] Use the Kafka-Logstash logging infrastructure to log client-side errors.

Notes from today's meeting https://etherpad.wikimedia.org/p/clients-error-logging

Wed, Apr 10, 3:48 PM · User-herron, Reading-Infrastructure-Team-Backlog, Wikimedia-Logstash
fgiunchedi added a project to T213933: PoC alert/notification functionality with Elastic Stack: User-fgiunchedi.
Wed, Apr 10, 1:35 PM · User-fgiunchedi, Patch-For-Review, Restricted Project, Security-Team, Wikimedia-Logstash
fgiunchedi added a project to T219764: jessie rsyslog upgrade problems: User-fgiunchedi.
Wed, Apr 10, 1:34 PM · User-fgiunchedi, Operations
fgiunchedi updated the task description for T213899: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs.
Wed, Apr 10, 1:30 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
fgiunchedi updated the task description for T213899: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs.
Wed, Apr 10, 1:28 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
fgiunchedi created T220590: Decom ms-be101[345].
Wed, Apr 10, 10:00 AM · User-fgiunchedi, media-storage, Operations
fgiunchedi closed T217368: rack/setup/deploy restbase2019 and restbase2020 as Resolved.

Hosts are in service, resolving

Wed, Apr 10, 8:10 AM · Patch-For-Review, Operations, ops-codfw

Tue, Apr 9

fgiunchedi added a comment to T220510: Removal of If-Cached VCL support.

+1 on my side to remove If-Cached, as we're not normally using swiftrepl. When we do use swiftrepl, however, it doesn't go through varnish anyway but swift <-> swift directly.

Tue, Apr 9, 2:34 PM · Patch-For-Review, Traffic, Operations
fgiunchedi added a comment to T220500: logstash1012 lock up caused central logging stuck.

For reasons still unknown, it looks like traffic to lithium/wezen also stopped around the same time logstash1012 went offline.

Tue, Apr 9, 2:00 PM · User-herron, Wikimedia-Logstash, Operations
fgiunchedi renamed T220500: logstash1012 lock up caused central logging stuck from logstash1012 lock up to logstash1012 lock up caused central logging stuck.
Tue, Apr 9, 1:49 PM · User-herron, Wikimedia-Logstash, Operations
fgiunchedi added a comment to T220500: logstash1012 lock up caused central logging stuck.

Additionally it looks like syslog traffic towards central logs (wezen/lithium) dropped at around the same time, which is unexpected as the two destinations (central syslog + kafka) should be independent.

Tue, Apr 9, 1:05 PM · User-herron, Wikimedia-Logstash, Operations
fgiunchedi created T220500: logstash1012 lock up caused central logging stuck.
Tue, Apr 9, 12:23 PM · User-herron, Wikimedia-Logstash, Operations
fgiunchedi triaged T217355: Revoke production prometheus fundraising access as Normal priority.
Tue, Apr 9, 8:38 AM · Operations, netops, Patch-For-Review, fundraising-tech-ops
fgiunchedi triaged T153068: Consider mounting labs NFS labstore1003.eqiad.wmnet:/scratch for server-side uploads as Normal priority.
Tue, Apr 9, 8:38 AM · cloud-services-team (Kanban), Data-Services, Operations, video2commons
fgiunchedi triaged T218686: Create Gerrit Administrator right policy as Normal priority.
Tue, Apr 9, 8:38 AM · Operations, Release-Engineering-Team, Gerrit
fgiunchedi triaged T218995: re-enable deprecation warning logger on elasticsearch once issues are solved as Normal priority.
Tue, Apr 9, 8:38 AM · CirrusSearch, Discovery-Search, Operations
fgiunchedi triaged T218994: Epic: Deprecation warning on elasticsearch 6 as Normal priority.
Tue, Apr 9, 8:38 AM · Epic, Discovery-Search, CirrusSearch, Operations
fgiunchedi triaged T219129: Allow directing a percentage of API traffic to PHP7 as Normal priority.
Tue, Apr 9, 8:38 AM · User-jijiki, Traffic, Operations, serviceops
fgiunchedi triaged T219150: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters as Normal priority.
Tue, Apr 9, 8:38 AM · User-jijiki, Operations, serviceops
fgiunchedi triaged T219507: Create cookbook to reindex into elasticsearch / cirrus as Normal priority.
Tue, Apr 9, 8:38 AM · Operations, Discovery-Search
fgiunchedi triaged T219486: Send peering requests to AS with the worst TTFB as Normal priority.
Tue, Apr 9, 8:38 AM · Traffic, Performance-Team, Operations
fgiunchedi triaged T219400: Make authdns-update compatible with local emergency changes as Normal priority.
Tue, Apr 9, 8:38 AM · Traffic, Operations
fgiunchedi triaged T219274: cronspam: cross-validate-accounts as Normal priority.
Tue, Apr 9, 8:38 AM · Operations
fgiunchedi triaged T219589: Analyze and amend (if necessary) workflow of user reporting and detecting large regressions/outages as Normal priority.
Tue, Apr 9, 8:38 AM · Operations, Release-Engineering-Team, Wikimedia-Incident, Phabricator
fgiunchedi triaged T219586: Degraded RAID on cp4032 as Normal priority.
Tue, Apr 9, 8:38 AM · Operations, ops-ulsfo
fgiunchedi triaged T219764: jessie rsyslog upgrade problems as Normal priority.
Tue, Apr 9, 8:38 AM · User-fgiunchedi, Operations
fgiunchedi triaged T219921: Move cxserver logging to new logging pipeline as Normal priority.
Tue, Apr 9, 8:38 AM · Patch-For-Review, CX-cxserver, Core Platform Team Backlog (Watching / External), Services (watching), service-runner, Wikimedia-Logstash, Operations
fgiunchedi triaged T219799: Create cookbook to reset readonly indices on elasticsearch clusters as Normal priority.
Tue, Apr 9, 8:38 AM · Patch-For-Review, Operations, Wikimedia-Incident, Discovery-Search (Current work)
fgiunchedi triaged T219854: Broken disk on ms-be2026 as Normal priority.
Tue, Apr 9, 8:38 AM · Patch-For-Review, Operations, ops-codfw
fgiunchedi triaged T219927: Move parsoid logging to new logging pipeline as Normal priority.
Tue, Apr 9, 8:38 AM · Parsoid, Core Platform Team Backlog (Watching / External), Services (watching), service-runner, Wikimedia-Logstash, Operations
fgiunchedi triaged T219989: mwdebug2001 and mwdebug2002 "/" almost full as Normal priority.
Tue, Apr 9, 8:38 AM · Release-Engineering-Team (Backlog), Operations
fgiunchedi triaged T220103: TEC6: Logging infrastructure (Q4 2018/19 goal) as Normal priority.
Tue, Apr 9, 8:37 AM · Patch-For-Review, Wikimedia-Logstash, User-fgiunchedi, Operations, Goal
fgiunchedi triaged T220004: netbox: User's groups not updated as Normal priority.
Tue, Apr 9, 8:37 AM · Operations
fgiunchedi triaged T220116: Migrate all metrics originated by PoPs from statsd to Prometheus as Normal priority.
Tue, Apr 9, 8:37 AM · User-fgiunchedi, Operations, monitoring, Goal
fgiunchedi triaged T220193: Degraded RAID on cp3041 as Normal priority.
Tue, Apr 9, 8:37 AM · ops-esams, Operations
fgiunchedi triaged T220104: TEC6: Metrics monitoring infrastructure (Q4 2018/19 goal) as Normal priority.
Tue, Apr 9, 8:37 AM · User-fgiunchedi, Operations, monitoring, Goal
fgiunchedi triaged T220194: Degraded RAID on cp3034 as Normal priority.
Tue, Apr 9, 8:37 AM · ops-esams, Operations
fgiunchedi triaged T220355: decom netmon1003 as Normal priority.
Tue, Apr 9, 8:37 AM · Patch-For-Review, Operations
fgiunchedi triaged T220361: Audit our infrastructure for authenticated services as Normal priority.
Tue, Apr 9, 8:37 AM · Operations
fgiunchedi triaged T220362: Evaluate SSO solutions as Normal priority.
Tue, Apr 9, 8:37 AM · Operations
fgiunchedi triaged T220380: Upload Zuul 2.5.1-wmf7 package to apt.wikimedia.org as Normal priority.
Tue, Apr 9, 8:37 AM · Continuous-Integration-Infrastructure, Operations
fgiunchedi triaged T220390: Audit existing Kafka main producers/consumers and document their configuration and use cases as Normal priority.
Tue, Apr 9, 8:37 AM · Operations
fgiunchedi triaged T220389: Review current architecture/capacity and establish plan for Kafka main cluster upgrade/refresh to cover needs for next 2-3 years as Normal priority.
Tue, Apr 9, 8:37 AM · Operations
fgiunchedi triaged T220391: Establish guideline documentation for Kafka cluster use cases (main, jumbo, logging, etc.) as Normal priority.
Tue, Apr 9, 8:37 AM · Operations
fgiunchedi triaged T220416: Reset password for wikimedia-gh mailing list as Normal priority.
Tue, Apr 9, 8:37 AM · Operations, Wikimedia-Mailing-lists
fgiunchedi triaged T219898: Add WDoranWMF to `wmf` LDAP group as Normal priority.
Tue, Apr 9, 8:35 AM · Operations, LDAP-Access-Requests
fgiunchedi triaged T220226: LDAP access to the wmf group for Evan Prodromou as Normal priority.
Tue, Apr 9, 8:35 AM · Patch-For-Review, Operations, LDAP-Access-Requests
fgiunchedi triaged T219086: Add legoktm to gerritadmin LDAP group (restoring previously held access) as Normal priority.
Tue, Apr 9, 8:34 AM · Release-Engineering-Team (Kanban), User-greg, LDAP-Access-Requests
fgiunchedi moved T220416: Reset password for wikimedia-gh mailing list from Backlog to List maintenance on the Wikimedia-Mailing-lists board.
Tue, Apr 9, 8:29 AM · Operations, Wikimedia-Mailing-lists
fgiunchedi added a comment to T220416: Reset password for wikimedia-gh mailing list.

@Nkansahrexford you should have received the new password via email, please confirm!

Tue, Apr 9, 8:28 AM · Operations, Wikimedia-Mailing-lists

Mon, Apr 8

fgiunchedi added a comment to T218367: Create MoveCom mailing list for Movement communications group.

Actually, this will essentially be a replacement for the existing ComCom list. Perhaps for archive preservation it would be better to rename that list? Is that possible?

Mon, Apr 8, 1:36 PM · Wikimedia-Mailing-lists, Operations
fgiunchedi added a comment to T219923: Move graphoid logging to new logging pipeline.

Apparently graphoid is still using service::node::config and not the config template in the deployment repo.

Given that graphoid will be switched to k8s soon,

It will? We are still waiting on T211881 for an owner for the service to appear before that is even considered.

should we just postpone this until the switch, move graphoid to deployment-repo config or do the puppet work to enable rsyslog in service::node::config? What do you think @akosiaris @fgiunchedi @mobrovac ?

I would postpone any work until T211881 is completed tbh, unless we can't do otherwise.

Mon, Apr 8, 12:25 PM · Core Platform Team Kanban (Blocked Externally), Services (blocked), Core Platform Team (Security, stability, performance and scalability (TEC1)), service-runner, Wikimedia-Logstash, Operations
fgiunchedi removed a project from T219446: Terminate Wikimetrics: Wikimedia-Mailing-lists.

Removing mailing-lists tag since work there is done.

Mon, Apr 8, 12:21 PM · Operations, Analytics
fgiunchedi added a project to T218367: Create MoveCom mailing list for Movement communications group: Wikimedia-Mailing-lists.
Mon, Apr 8, 12:20 PM · Wikimedia-Mailing-lists, Operations
fgiunchedi removed a project from T218367: Create MoveCom mailing list for Movement communications group: Wikimedia-Mailing-lists.
Mon, Apr 8, 12:20 PM · Wikimedia-Mailing-lists, Operations
fgiunchedi removed a project from T211835: Sunset Wikimetrics : Wikimedia-Mailing-lists.

Removing mailing-lists tag since work there is done.

Mon, Apr 8, 12:20 PM · Operations, Patch-For-Review, Analytics-Kanban, Analytics
fgiunchedi moved T219898: Add WDoranWMF to `wmf` LDAP group from Backlog to Awaiting User Input on the LDAP-Access-Requests board.
Mon, Apr 8, 11:44 AM · Operations, LDAP-Access-Requests
fgiunchedi added a comment to T219898: Add WDoranWMF to `wmf` LDAP group.

@WDoranWMF access to the wmf ldap group is granted, please confirm/verify!

Mon, Apr 8, 11:44 AM · Operations, LDAP-Access-Requests
fgiunchedi moved T220226: LDAP access to the wmf group for Evan Prodromou from Backlog to Awaiting User Input on the LDAP-Access-Requests board.
Mon, Apr 8, 11:26 AM · Patch-For-Review, Operations, LDAP-Access-Requests
fgiunchedi added a comment to T220226: LDAP access to the wmf group for Evan Prodromou.

@EvanProdromou you should now have access to wmf ldap group, please confirm/verify!

Mon, Apr 8, 10:25 AM · Patch-For-Review, Operations, LDAP-Access-Requests