Feed Advanced Search


Feb 20 2024

mark added a comment to T357847: Requesting access to analytics-privatedata-users for sdeckelmann-wmf.

This is approved.

Feb 20 2024, 11:57 AM · SRE, SRE-Access-Requests

Sep 8 2023

mark added a comment to T345877: Requesting shell access, deployment and analytics-privatedata-users rights for acooper.

Approved.

Sep 8 2023, 11:16 AM · SRE-Access-Requests, SRE

Sep 7 2023

mark added a comment to T344509: Security Issue Access Request for (Kappakayala).

Approved.

Sep 7 2023, 10:47 AM · SecTeam-Processed, Security-Team, Security

Apr 19 2023

mark updated subscribers of T315426: Audit abuse filter wikireplica view rules.

@BTullis @odimitrijevic Given that this is an ongoing privacy leak, could we get some clarity on whether we can get this deployed soon, or how other teams may be able to help if needed?

Apr 19 2023, 11:39 AM · Data-Platform-SRE, Data-Engineering-Planning, SecTeam-Processed, Vuln-Infoleak, Data-Services, Security, Security-Team

Jul 18 2022

mark moved T313102: Uncaught TimeoutError from inactivedc_request caused swift-proxy to wedge itself from Inbox to Backlog on the SRE-swift-storage board.

Jul 18 2022, 12:13 PM · Patch-For-Review, SRE-swift-storage

Jul 11 2022

mark moved T256217: Swift sends ETAG without double-quotes from Inbox to Backlog on the SRE-swift-storage board.

Jul 11 2022, 1:35 PM · Wikimedia-Performance-recommendation, Traffic-Icebox, SRE-swift-storage, SRE, affects-Kiwix-and-openZIM

mark added a comment to T256217: Swift sends ETAG without double-quotes.

This appears to be configurable now in Swift 2.24.0 and later (we currently seem to be running 2.26.0 on 6/8 of frontends...), by enabling a piece of middleware and configuring RFC compliant ETag responses for specific Swift user accounts or containers:
https://docs.openstack.org/swift/latest/middleware.html#module-swift.common.middleware.etag_quoter

Jul 11 2022, 1:31 PM · Wikimedia-Performance-recommendation, Traffic-Icebox, SRE-swift-storage, SRE, affects-Kiwix-and-openZIM

mark added a comment to T256217: Swift sends ETAG without double-quotes.

In T256217#7960730, @Krinkle wrote:

I'm not sure since when, but based on us having <14 days ats-be storage, and based on there still being ETag headers on cached responses, I am guessing this is a pretty recent regression.

I'm finding that upload.wikimedia.org responses have neither ETag nor Last-Modified. This is also observed in T295556, but I'm skeptical of whether it is the same given the above caching, but perhaps these headers are both kept in Swift and still used post-Swift upgrade for pre-existing objects?

Jul 11 2022, 1:28 PM · Wikimedia-Performance-recommendation, Traffic-Icebox, SRE-swift-storage, SRE, affects-Kiwix-and-openZIM

Feb 24 2022

mark added a parent task for T292322: Support large files in Shellbox: T302430: <Tech Initiative> Commons Copy-by-URL Image Uploads Slowdown (Shellbox).

Feb 24 2022, 4:37 PM · MW-1.38-notes (1.38.0-wmf.21; 2022-02-07), SRE-swift-storage, Shellbox, serviceops, MW-on-K8s

mark added a subtask for T302430: <Tech Initiative> Commons Copy-by-URL Image Uploads Slowdown (Shellbox): T292322: Support large files in Shellbox.

Feb 24 2022, 4:37 PM · Foundational Technology Requests

Jul 26 2021

mark raised the priority of T263220: Limit concurrency of DPL queries from Medium to High.

Given that the underlying problem that this change might help with has already caused multiple full outages (all wikis affected) in the past year alone and the extension is deployed on quite a few wikis, I'd like to ask for this to be looked into again in the near term. Raising priority to 'high'. Would this be in scope for PET's Clinic Duty? How can SRE help?

Jul 26 2021, 12:31 PM · SRE-Sprint-Week-Sustainability-March2023, serviceops-radar, Wikimedia-Slow-DB-Query, SecTeam-Processed, Security, Vuln-DoS, Sustainability (Incident Followup), Platform Team Workboards (Clinic Duty Team), MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), Performance Issue, DynamicPageList (Wikimedia)

Feb 19 2021

mark added a comment to T274459: Eqiad: 2 VM request for GitLab.

In T274459#6841122, @thcipriani wrote:

Whoa, catching up on scrollback overnight. My question is: is this the first anyone in SRE has heard about any of this?

Feb 19 2021, 10:18 AM · GitLab (Initialization), Patch-For-Review, User-brennen, vm-requests, SRE

Jan 22 2021

mark added a comment to T272686: print a list of backed up directories in the MOTD of production servers.

It's purely an idea I've had for a long time, to make it immediately obvious to anyone logging in what is backed up, and what isn't. That should help to:

Jan 22 2021, 11:43 AM · Data-Persistence-Backup, SRE

Oct 8 2020

mark added a comment to T264398: 8-10% response start regression (Varnish 5.1.3-1wm15 -> 6.0.6-1wm1).

Hi all,

Oct 8 2020, 11:52 AM · Patch-For-Review, Performance-Team (Radar), SRE, Traffic

Sep 4 2020

mark added a comment to T262042: Security Issue Access Request for LSobanski.

Approved.

Sep 4 2020, 1:44 PM · Security-Team, Security

Sep 1 2020

mark added a comment to T261760: Requesting access to Production for lsobanski.

Approved.

Sep 1 2020, 3:24 PM · SRE, SRE-Access-Requests

mark added a comment to T261626: Requesting access to Production for klausman.

Approved.

Sep 1 2020, 10:34 AM · SRE, SRE-Access-Requests

Aug 20 2020

mark updated subscribers of T260764: backup2001 RAID controller failure, unable to post 2020-08-19.

@wiki_willy @Papaul It seems we've had an ongoing pattern of crashes with this (rather important) backup host, which means we are not yet able to trust it. Until we are able to resolve this, we cannot decommission the older hosts (that this replaces) either. At the moment the system doesn't even boot. Are there any steps we can take soon to debug this issue? Anything we can help with? Thanks!

Aug 20 2020, 10:35 AM · SRE, ops-codfw

Jul 10 2020

mark added a comment to T256451: Security Issue Access Request for Kormat.

Approved.

Jul 10 2020, 9:17 AM · User-Kormat, Security-Team, Security

May 27 2020

mark added a project to T247028: Database 'INSERT' query rate doubled (module_deps regression?): Platform Team Workboards (Clinic Duty Team).

May 27 2020, 10:43 AM · MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Sustainability (Incident Followup), Performance Issue, MediaWiki-ResourceLoader, Performance-Team

Apr 7 2020

mark added a project to T157651: sql.php must not run LoadExtensionSchemaUpdates: DBA.

Apr 7 2020, 12:11 PM · Sustainability (Incident Followup), MW-1.35-notes (1.35.0-wmf.30; 2020-04-28), Wikidata, Growth-Team, StructuredDiscussions, Platform Team Workboards (Clinic Duty Team), Patch-For-Review, Performance-Team, MediaWiki-Maintenance-system

Feb 21 2020

mark added a comment to T245520: 2*10G optics down on cr2-esams.

I am pretty sure there are a bunch of optics (of various kinds) in the "spare" switches, in the bottom of rack OE15. Unfortunately those switches are not powered up, and certainly not configured and remote manageable - something we should probably fix on next visit.

Feb 21 2020, 12:17 PM · ops-esams, netops, SRE

Feb 18 2020

mark added a comment to T245520: 2*10G optics down on cr2-esams.

There are multiple 10G LR optics on-site for sure. Longer distance ones, less so.

Feb 18 2020, 2:59 PM · ops-esams, netops, SRE

Feb 13 2020

mark added a comment to T245060: Pybal should reject a confctl configuration that indicates only one cp-text is pooled.

Personally I don't think Pybal should be rejecting that; it's a valid configuration from a technical standpoint, and there can be valid reasons to have it, at least temporarily. But we may decide that in our specific environment that should be avoided at all costs, so perhaps that logic should be implemented elsewhere - in the code that manages pooling state.

Feb 13 2020, 11:49 AM · SRE-Sprint-Week-Sustainability-March2023, Traffic, Traffic-Icebox, Sustainability (Incident Followup), PyBal

Feb 12 2020

mark added a comment to T236437: (Need By Dec 20) rack/setup/install mw13[49-84].eqiad.wmnet.

@wiki_willy With Chris having been ill the past few days, what's a realistic new ETA for this?

Feb 12 2020, 4:47 PM · serviceops, SRE

Dec 10 2019

mark added a comment to T238909: Proposal: simplify set up of a new load-balanced service on kubernetes.

In T238909#5727693, @akosiaris wrote:

Dec 10 2019, 12:02 PM · SRE, Prod-Kubernetes, PyBal, Traffic, serviceops

mark added a comment to T238909: Proposal: simplify set up of a new load-balanced service on kubernetes.

I agree - it seems that PyBal adds no real value here, because it's essentially load balancing the k8s load balancers. Why couldn't our caching layer do that directly, and know about all the k8s proxies/nodes directly and do health checks for them?

Dec 10 2019, 11:43 AM · SRE, Prod-Kubernetes, PyBal, Traffic, serviceops

Nov 28 2019

mark moved T237041: wipe backup-array1 from Backlog to Blocked on the ops-esams board.

Nov 28 2019, 11:34 AM · ops-esams, SRE

mark moved T174637: Setup esams atlas anchor from Racking Tasks to Blocked on the ops-esams board.

Nov 28 2019, 11:34 AM · SRE, netops, ops-esams

Nov 27 2019

mark added a comment to T184066: rack/setup/install ps[12]-oe1[456]-esams.

In T184066#5695891, @RobH wrote:

In T184066#5694288, @Papaul wrote:

qfx5100-spare1, psu 0 {#20156} to ps2-oe15-esams:17
qfx5100-spare2, psu 0 {#20157} to ps2-oe15-esams:16
qfx5100-spare1, psu 1 {#20159} to ps1-oe15-esams:2
qfx5100-spare2, psu 1 {#20158} to ps1-oe15-esams:3
asw2-oe16-esams:psu0 {#20162} to ps2-oe16-esams:26
asw2-oe16-esams:psu1 {#20164} to ps1-oe16-esams:26

All the above are done, but NOT

scs1-oe15-esams:psu1 {#20163} to ps2-oe15-esams:34
scs1-oe15-esams:psu2 {#20164} to ps1-oe15-esams:34

as there is no scs-oe15-esams, not sure what that is. Mark's comment T184066#5694430 covers scs-oe16-esams.

Nov 27 2019, 11:05 AM · SRE, ops-esams

Nov 26 2019

mark added a comment to T184066: rack/setup/install ps[12]-oe1[456]-esams.

scs1-oe16-esams:psu1 {#20163} to ps2-oe16-esams:34
scs1-oe16-esams:psu2 {#20164} to ps1-oe16-esams:34

Nov 26 2019, 5:54 PM · SRE, ops-esams

mark closed T238835: apply asset tags to cable managers as Resolved.

Nov 26 2019, 4:39 PM · SRE, ops-esams

mark moved T237009: Add missing labels for equipment and cables from Procurement to Blocked on the ops-esams board.

Nov 26 2019, 4:38 PM · DC-Ops, ops-esams, SRE

mark updated the task description for T237009: Add missing labels for equipment and cables.

Nov 26 2019, 4:12 PM · DC-Ops, ops-esams, SRE

mark added a comment to T237009: Add missing labels for equipment and cables.

cr3-esams now has its power cables labeled:

Nov 26 2019, 4:11 PM · DC-Ops, ops-esams, SRE

mark added a comment to T237009: Add missing labels for equipment and cables.

cr2-esams now has its power cables labeled:

Nov 26 2019, 3:58 PM · DC-Ops, ops-esams, SRE

mark closed T237006: Relabel cables with duplicate IDs as Resolved.

All duplicate ids have been fixed, labels replaced for one pair and updated in netbox.

Nov 26 2019, 3:00 PM · SRE, ops-esams

mark updated the task description for T237009: Add missing labels for equipment and cables.

Nov 26 2019, 2:36 PM · DC-Ops, ops-esams, SRE

mark added a comment to T237009: Add missing labels for equipment and cables.

I've filled out all red cells in the (original) bootstrap spreadsheet.

Nov 26 2019, 2:36 PM · DC-Ops, ops-esams, SRE

mark added a comment to T238835: apply asset tags to cable managers.

All 7 cable managers have been asset tagged and put into Netbox with the appropriate info and rack position.

Nov 26 2019, 2:26 PM · SRE, ops-esams

mark added a comment to T237009: Add missing labels for equipment and cables.

All SERVER power cords have been audited in this sheet: https://docs.google.com/spreadsheets/d/1RMb6lMCc94wUj6MgSm1yYdnAC3SUsZIRj8zHLtxRx4o/edit?usp=sharing

Nov 26 2019, 1:26 PM · DC-Ops, ops-esams, SRE

mark updated the task description for T237009: Add missing labels for equipment and cables.

Nov 26 2019, 1:25 PM · DC-Ops, ops-esams, SRE

mark closed T237014: Update spare QFX labels as Resolved.

Done.

Nov 26 2019, 10:15 AM · ops-esams, SRE

Nov 25 2019

mark updated the task description for T237030: Setup new MX204 in knams.

Nov 25 2019, 6:14 PM · netops, SRE, ops-esams

mark updated the task description for T237030: Setup new MX204 in knams.

Nov 25 2019, 5:43 PM · netops, ops-esams, SRE

Nov 4 2019

mark moved T234450: Special:Contributions requests with a high &limit= caused excessive database load from Done to Discussing on the Platform Team Workboards (Clinic Duty Team) board.

CPT: please take a new look, thanks :)

Nov 4 2019, 5:17 PM · User-notice-archive, MW-1.31-release-notes, MW-1.33-notes, MW-1.34-notes, Platform Engineering, Security, MW-1.35-notes (1.35.0-wmf.5; 2019-11-05), Vuln-DoS, Performance Issue, MediaWiki-Special-pages, Wikimedia-production-error

Oct 24 2019

mark added a comment to T232887: The phabricator server, WMF7426, was given to us temporarily, we would like to make it permanent.

I'm a bit confused; as far as I know the old plan was always to have HA of Phabricator between eqiad and codfw, and the linked task T190572 also talks about that. So is that no longer the case, and if so, why is that? I believe there have been blockers & complications for that deployment, but are they documented anywhere? How does this task relate to those plans, why do we feel failover within eqiad is (also) needed?

Oct 24 2019, 3:57 PM · SRE, hardware-requests, Release-Engineering-Team (Development services), serviceops, Phabricator

Oct 22 2019

mark added projects to T234450: Special:Contributions requests with a high &limit= caused excessive database load: Platform Engineering, Platform Team Workboards (Clinic Duty Team).

Could CPT take a look at this please? Thanks!

Oct 22 2019, 9:54 AM · User-notice-archive, MW-1.31-release-notes, MW-1.33-notes, MW-1.34-notes, Platform Engineering, Security, MW-1.35-notes (1.35.0-wmf.5; 2019-11-05), Vuln-DoS, Performance Issue, MediaWiki-Special-pages, Wikimedia-production-error

Sep 17 2019

mark added a comment to T231387: Updating DNS records (pr.wikimedia.org).

What's the status of this? Is this done and working?

Sep 17 2019, 12:25 PM · Mail, WMF-Communications, SRE

Sep 12 2019

mark added a project to T231387: Updating DNS records (pr.wikimedia.org): Mail.

Sep 12 2019, 12:55 PM · Mail, WMF-Communications, SRE

mark updated subscribers of T231387: Updating DNS records (pr.wikimedia.org).

In T231387#5471833, @Varnent wrote:

@mark - Thank you very much for that thoughtful and helpful reply!

Talking it over, we would like to try the first option if you believe that will work.

So how do we go about getting this setup?

Anusha Alikhan
aalikhan@pr.wikimedia.org

Samantha Lien
slien@pr.wikimedia.org

Sep 12 2019, 12:54 PM · Mail, WMF-Communications, SRE

Sep 5 2019

mark changed the status of T231387: Updating DNS records (pr.wikimedia.org) from Stalled to Open.

Sep 5 2019, 2:32 PM · Mail, WMF-Communications, SRE

mark added a comment to T231387: Updating DNS records (pr.wikimedia.org).

Hi Anusha, Greg,

Sep 5 2019, 2:32 PM · Mail, WMF-Communications, SRE

Aug 9 2019

mark added a comment to T229755: csw2-esams's VCP link flapped.

EX4200 can also have any port converted as VC - just won't be as fast, max 10Gbps.

Aug 9 2019, 10:02 AM · SRE, netops

Aug 6 2019

mark added a comment to T229860: SRE Onboarding for Sukhbir Singh.

Approved for access.

Aug 6 2019, 11:13 AM · SRE-Access-Requests, Traffic, SRE

Jul 23 2019

mark raised the priority of T228720: stub for enwiki broken, attempt to load content for bad rev during sha1 retrieval from High to Unbreak Now!.

Because this means that stub dump generation for (at least) enwiki, dewiki and several others is broken right now, we have only a few days to fix this before the dumps need to be done at the end of the month. Setting UBN...

Jul 23 2019, 1:40 PM · Platform Team Initiatives (MCR), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Dumps-Generation

Apr 16 2019

mark renamed T218570: DB planning: include a writeable (?) misc DB cluster in codfw for WMCS from DB planning: include a misc cluster in codfw to DB planning: include a writeable (?) misc DB cluster in codfw for WMCS.

Apr 16 2019, 10:43 AM · DBA, cloud-services-team (Kanban)

Apr 5 2019

mark updated the task description for T219805: Investigate Doctrine DBAL usage possibility.

Apr 5 2019, 11:13 AM · User-Addshore, Wikidata-Trailblazing-Exploration, Wikidata, TechCom, Patch-For-Review

mark added a comment to T219805: Investigate Doctrine DBAL usage possibility.

While I agree with Daniel and others that the use of the MediaWiki db connection/load balancing layer is an absolute minimum requirement, there are quite a few other potential problems that could affect the security/privacy, reliability or maintainability of our data and services, if Doctrine is to be used to access MediaWiki's existing databases in any way (it's definitely easier if done in separate, not connected database clusters). However, this ticket so far is very sparse on details, and we don't have the information we need to make an informed decision. I requested access to the linked document yesterday, but it hasn't been granted yet. Alternatively, could this perhaps be replicated here on Phabricator so everyone involved can build an informed opinion? Thanks. :)

Apr 5 2019, 11:13 AM · User-Addshore, Wikidata-Trailblazing-Exploration, Wikidata, TechCom, Patch-For-Review

Apr 1 2019

mark added a project to T190379: RFC: Re-establish the development policies: DBA.

There has been some concern from our DBAs that the archiving of the old policy will make it even harder for developers to find out what database-related requirements their code should fulfill, and what the processes would be to get any schema or query changes deployed (such as a link to the Schema_changes page). The old information on database-related requirements, while admittedly a bit outdated, was discussed as an RFC at the time: https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2015-09-16

Apr 1 2019, 1:08 PM · DBA, Performance-Team, TechCom-RFC (TechCom-RFC-Closed), TechCom

Mar 22 2019

Effie Mouzeli <effie@wikimedia.org> committed rMSCA5e1eced094fe: Add unit testing of scap main.py (authored by mark).

Add unit testing of scap main.py

Mar 22 2019, 11:33 AM

Mar 21 2019

Mill <mill@mail.com> committed rMSCA135f64c71c56: 3%5eaaaaaaaaaaaa (authored by mark).

3%5eaaaaaaaaaaaa

Mar 21 2019, 12:11 AM

Mar 6 2019

Effie Mouzeli <effie@wikimedia.org> committed rMSCA8d204fe0b7a9: WiP: Add unit testing of scap main.py (authored by mark).