Eevans (Eric Evans)
Staff Site Reliability Engineer

User Details

User Since
Feb 27 2015, 10:47 PM (405 w, 2 d)
Availability
Available
IRC Nick
urandom
LDAP User
Eevans
MediaWiki User
Unknown

Recent Activity

Fri, Dec 2

Eevans updated subscribers of T324113: Provision restbase-dev200[1-3], decommission restbase-dev100[4-6].

@hnowlan Question: Do we still use this machine to run RESTBase on for anything, or is it purely for Cassandra these days? Can we provision the new machines as say cassandra-dev200[1-3]?

Fri, Dec 2, 1:55 AM · Cassandra

Thu, Dec 1

Eevans added a comment to T307035: Relocate hosts: aqs10[3-5].

@Eevans take your time, I just want to make sure that we're not falling behind on-site. Let me know whenever you're ready.

Thu, Dec 1, 7:08 PM · SRE, DC-Ops, ops-eqiad, Cassandra, User-Eevans
Eevans updated the task description for T307035: Relocate hosts: aqs10[3-5].
Thu, Dec 1, 7:06 PM · SRE, DC-Ops, ops-eqiad, Cassandra, User-Eevans
Eevans removed a project from T204024: Store WikibaseQualityConstraint check data in persistent storage instead of in the cache: Cassandra.
Thu, Dec 1, 7:01 PM · User-ItamarWMDE, User-Addshore, Dependency-Tracking, Platform Team Legacy (Designing), Services (designing), wdwb-tech, Wikibase-Quality-Constraints, Wikibase-Quality, Wikidata
Eevans closed T253244: Upstream gocql bug effects Kask as Resolved.

This is complete with the deployment of Kask v1.0.10

Thu, Dec 1, 7:00 PM · Patch-For-Review, Sustainability (Incident Followup), User-Eevans, Platform Team Workboards (Clinic Duty Team), Cassandra
Eevans closed T283838: Kask: gocql: no hosts available in the pool errors as Resolved.

This is complete with the deployment of Kask v1.0.10

Thu, Dec 1, 7:00 PM · Cassandra
Eevans closed T302278: Final cleanup tasks related to the AQS cluster migration as Resolved.

Done.

Thu, Dec 1, 6:56 PM · Shared-Data-Infrastructure, Cassandra
Eevans closed T302278: Final cleanup tasks related to the AQS cluster migration, a subtask of T249755: Cassandra3 migration for Analytics AQS, as Resolved.
Thu, Dec 1, 6:56 PM · Platform Team Workboards (Platform Engineering Reliability), Epic, Data-Engineering, Cassandra
Eevans updated the task description for T302278: Final cleanup tasks related to the AQS cluster migration.
Thu, Dec 1, 6:56 PM · Shared-Data-Infrastructure, Cassandra
Eevans triaged T302278: Final cleanup tasks related to the AQS cluster migration as Medium priority.
Thu, Dec 1, 3:10 PM · Shared-Data-Infrastructure, Cassandra
Eevans updated the task description for T302278: Final cleanup tasks related to the AQS cluster migration.
Thu, Dec 1, 3:09 PM · Shared-Data-Infrastructure, Cassandra

Wed, Nov 30

Eevans closed T324128: Can not log in, log out, or save edits to the beta cluster (session failures) as Resolved.

This was my bad™, a misconfiguration of the sessionstore VM (profile::java::java_packages not set correctly) caused Cassandra to be down.

Wed, Nov 30, 9:03 PM · Cassandra, User-zeljkofilipin, Beta-Cluster-reproducible, Beta-Cluster-Infrastructure
Eevans triaged T324113: Provision restbase-dev200[1-3], decommission restbase-dev100[4-6] as Medium priority.
Wed, Nov 30, 2:43 PM · Cassandra
Eevans added a project to T324113: Provision restbase-dev200[1-3], decommission restbase-dev100[4-6]: Cassandra.
Wed, Nov 30, 2:43 PM · Cassandra
Eevans created T324113: Provision restbase-dev200[1-3], decommission restbase-dev100[4-6].
Wed, Nov 30, 2:43 PM · Cassandra
Eevans moved T302278: Final cleanup tasks related to the AQS cluster migration from Backlog to Next on the Cassandra board.
Wed, Nov 30, 1:58 AM · Shared-Data-Infrastructure, Cassandra
Eevans closed T307641: AQS multi-datacenter cluster expansion as Resolved.

Complete.

Wed, Nov 30, 1:52 AM · Data-Engineering-Radar, Cassandra
Eevans updated the task description for T307641: AQS multi-datacenter cluster expansion.
Wed, Nov 30, 1:52 AM · Data-Engineering-Radar, Cassandra
Eevans moved T305102: Erroneous node placement (AQS Cassandra cluster) from Backlog to Next on the Cassandra board.
Wed, Nov 30, 1:51 AM · Cassandra, User-Eevans
Eevans moved T253244: Upstream gocql bug effects Kask from Backlog to In-Progress on the Cassandra board.
Wed, Nov 30, 1:51 AM · Patch-For-Review, Sustainability (Incident Followup), User-Eevans, Platform Team Workboards (Clinic Duty Team), Cassandra
Eevans moved T317364: [M] Stop unbounded image suggestions dataset growth and clean up legacy results from Backlog to Next on the Cassandra board.
Wed, Nov 30, 1:51 AM · Structured-Data-Backlog (Current Work), Growth-Team (Current Sprint), Structured Data Engineering, Cassandra
Eevans moved T307035: Relocate hosts: aqs10[3-5] from Next to In-Progress on the Cassandra board.
Wed, Nov 30, 1:51 AM · SRE, DC-Ops, ops-eqiad, Cassandra, User-Eevans
Eevans moved T283838: Kask: gocql: no hosts available in the pool errors from Next to In-Progress on the Cassandra board.
Wed, Nov 30, 1:50 AM · Cassandra
Eevans closed T307802: Bootstrap new Cassandra nodes (eqiad) as Resolved.

Complete.

Wed, Nov 30, 1:48 AM · Data-Engineering-Radar, Cassandra
Eevans closed T307802: Bootstrap new Cassandra nodes (eqiad), a subtask of T307641: AQS multi-datacenter cluster expansion, as Resolved.
Wed, Nov 30, 1:48 AM · Data-Engineering-Radar, Cassandra
Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Wed, Nov 30, 1:48 AM · Data-Engineering-Radar, Cassandra

Tue, Nov 29

Eevans added a project to T323692: Create puppet defined type for adding/updating/deleting secrets or other small files on HDFS: Cassandra.
Tue, Nov 29, 3:54 PM · Cassandra, Data-Engineering-Planning
Eevans added a comment to T306895: Write dedicated cassandra authorization code to read password from file when loading.
Tue, Nov 29, 3:53 PM · Data Pipelines (Sprint 04), Data-Engineering-Planning, Patch-For-Review, Cassandra
Eevans added a comment to T306895: Write dedicated cassandra authorization code to read password from file when loading.

Thank you @BTullis and @Ottomata for taking over this - this will be very useful :)

Tue, Nov 29, 3:44 PM · Data Pipelines (Sprint 04), Data-Engineering-Planning, Patch-For-Review, Cassandra

Sun, Nov 27

Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Sun, Nov 27, 10:06 PM · Data-Engineering-Radar, Cassandra

Sat, Nov 26

Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Sat, Nov 26, 9:34 PM · Data-Engineering-Radar, Cassandra

Fri, Nov 25

Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Fri, Nov 25, 5:37 PM · Data-Engineering-Radar, Cassandra

Wed, Nov 23

Eevans created P40873 (An Untitled Masterwork).
Wed, Nov 23, 11:16 PM
Eevans created P40867 (An Untitled Masterwork).
Wed, Nov 23, 10:45 PM
Eevans created P40863 (An Untitled Masterwork).
Wed, Nov 23, 10:29 PM
Eevans created P40856 cloud.yaml.
Wed, Nov 23, 9:48 PM
Eevans triaged T323733: Move Kask to Debian build of golang-github-gocql-gocql as Low priority.
Wed, Nov 23, 8:38 PM · Cassandra
Eevans created T323733: Move Kask to Debian build of golang-github-gocql-gocql.
Wed, Nov 23, 8:38 PM · Cassandra
Eevans added a comment to T323561: Remove old data from instanceof_cache and title_cache in image_suggestions.suggestions in Cassandra.

I think that I've gotten what I need from instanceof_cache & title_cache, they can be cleaned up.

Wed, Nov 23, 8:17 PM · Structured-Data-Backlog, Image-Suggestions
Eevans added a comment to T317364: [M] Stop unbounded image suggestions dataset growth and clean up legacy results.

I assume that whatever process loaded data into the suggestions table also loaded data into instanceof_cache and title_cache; is this assumption safe?

Yes, I think so ... can't be 100% sure because that process might not have always been working from complete or consistent datasets when we were developing the thing, but it definitely seems like an approach that has a good chance of working

Wed, Nov 23, 8:08 PM · Structured-Data-Backlog (Current Work), Growth-Team (Current Sprint), Structured Data Engineering, Cassandra
Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Wed, Nov 23, 5:39 PM · Data-Engineering-Radar, Cassandra

Tue, Nov 22

Eevans added a comment to T323561: Remove old data from instanceof_cache and title_cache in image_suggestions.suggestions in Cassandra.

Per T317364#8415200, let's hold off until we're sure this data can't be used to suss out T317364

Tue, Nov 22, 9:42 PM · Structured-Data-Backlog, Image-Suggestions
Eevans added a comment to T317364: [M] Stop unbounded image suggestions dataset growth and clean up legacy results.

New ticket for cleaning up the other tables T323561

I'll close this if there are no objections @Eevans @kostajh @JAllemandou

Tue, Nov 22, 9:41 PM · Structured-Data-Backlog (Current Work), Growth-Team (Current Sprint), Structured Data Engineering, Cassandra
Eevans added a comment to T320835: Split IPA audio generation into multiple hooks.

@Eevans you mentioned Swift expiry in T320675#8317876. Would it be acceptable to set X-Delete-At to something like 3 months in the future, and if the file gets used a week prior to the expiration date, use describe to reset X-Delete-At for another 3 months? (Date ranges are TBD and can be adjusted as we go)

Wouldn't this solve any potential leakage?

Here is what I mean https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Phonos/+/858423

Tue, Nov 22, 8:52 PM · Community-Tech (CommTech-Sprint-37), MediaWiki-extensions-Phonos
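
The expiry-refresh idea floated in the T320835 comment above (set X-Delete-At roughly 3 months out, and push it out again if the file is used within a week of expiring) can be sketched as follows. This is a minimal illustration, not the Phonos implementation: the function name `next_delete_at` is hypothetical, "3 months" is assumed to be 90 days, and the thread itself notes the date ranges are TBD. Swift's X-Delete-At header takes a Unix timestamp in seconds, which is what the function returns.

```python
from datetime import datetime, timedelta, timezone

# Assumed values taken loosely from the comment; the thread says ranges are TBD.
RETENTION = timedelta(days=90)      # "something like 3 months"
REFRESH_WINDOW = timedelta(days=7)  # "a week prior to the expiration date"


def next_delete_at(current_delete_at: int, now: datetime) -> int:
    """Return the X-Delete-At value (Unix seconds) to keep on an object used at `now`.

    If the object would expire within REFRESH_WINDOW, the expiry is pushed out
    by RETENTION from `now`; otherwise the existing value is left unchanged.
    """
    expires = datetime.fromtimestamp(current_delete_at, tz=timezone.utc)
    if expires - now <= REFRESH_WINDOW:
        return int((now + RETENTION).timestamp())
    return current_delete_at


if __name__ == "__main__":
    now = datetime(2022, 11, 22, tzinfo=timezone.utc)
    soon = int((now + timedelta(days=3)).timestamp())
    later = int((now + timedelta(days=30)).timestamp())
    print(next_delete_at(soon, now) != soon)    # expiry is refreshed
    print(next_delete_at(later, now) == later)  # expiry is left alone
```

A file that is never read inside the final week still expires on schedule, so under this scheme only files in active use are retained.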
Eevans added a comment to T253244: Upstream gocql bug effects Kask.

This has been deployed to sessionstore (production). It still needs to be deployed to:

Tue, Nov 22, 7:58 PM · Patch-For-Review, Sustainability (Incident Followup), User-Eevans, Platform Team Workboards (Clinic Duty Team), Cassandra

Mon, Nov 21

Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Mon, Nov 21, 3:15 PM · Data-Engineering-Radar, Cassandra

Sun, Nov 20

Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Sun, Nov 20, 8:29 PM · Data-Engineering-Radar, Cassandra

Sat, Nov 19

Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Sat, Nov 19, 9:40 PM · Data-Engineering-Radar, Cassandra

Fri, Nov 18

Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Fri, Nov 18, 2:29 PM · Data-Engineering-Radar, Cassandra

Wed, Nov 16

Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Wed, Nov 16, 3:38 PM · Data-Engineering-Radar, Cassandra

Tue, Nov 15

Eevans reopened T317364: [M] Stop unbounded image suggestions dataset growth and clean up legacy results as "Open".

@Cparle are you able to post the code you used for this? Just spot checking I found some older entries.

Tue, Nov 15, 10:10 PM · Structured-Data-Backlog (Current Work), Growth-Team (Current Sprint), Structured Data Engineering, Cassandra
Eevans committed rGDIS50ba843a399b: cassandra_schema.cql: change type of rejection reason(s) (authored by Eevans).
cassandra_schema.cql: change type of rejection reason(s)
Tue, Nov 15, 9:39 PM
Eevans updated subscribers of T306895: Write dedicated cassandra authorization code to read password from file when loading.

We now have a custom AuthConfFactory that will be passed as a parameter to the job using spark.cassandra.auth.conf.factory. This should be deployed today. @Ottomata @Eevans The next step would be to have the file containing the Cassandra password loaded to HDFS.

Tue, Nov 15, 8:13 PM · Data Pipelines (Sprint 04), Data-Engineering-Planning, Patch-For-Review, Cassandra
Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Tue, Nov 15, 4:14 PM · Data-Engineering-Radar, Cassandra
Eevans closed T322392: Store rejection reasons as a set in Cassandra as Resolved.

@Tgr do you have concerns?

No, sounds good.

@Eevans please go ahead with adding rejected_reasons as set<text> when it's convenient for you. Thanks!

Sounds good; Let's shoot for tomorrow morning (US timezone) when we can do it during a lull in eqiad bootstraps (T307802).

Tue, Nov 15, 4:13 PM · Growth-Team, Cassandra, Image-Suggestions

Mon, Nov 14

Eevans committed rMSKS21cc94eb43a5: Upgrade build environment & dependencies (authored by Eevans).
Upgrade build environment & dependencies
Mon, Nov 14, 11:25 PM
Eevans added a comment to T322392: Store rejection reasons as a set in Cassandra.

@Tgr do you have concerns?

No, sounds good.

@Eevans please go ahead with adding rejected_reasons as set<text> when it's convenient for you. Thanks!

Mon, Nov 14, 9:20 PM · Growth-Team, Cassandra, Image-Suggestions
Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Mon, Nov 14, 5:23 PM · Data-Engineering-Radar, Cassandra

Sat, Nov 12

Eevans updated the task description for T307802: Bootstrap new Cassandra nodes (eqiad).
Sat, Nov 12, 10:50 PM · Data-Engineering-Radar, Cassandra

Mon, Nov 7

Eevans added a comment to T305570: Q4:(Need By: TBD) rack/setup/install aqs1016-aqs1021.

I've also run the sre.dns.netbox cookbook, the DNS records are now live.

Mon, Nov 7, 7:30 PM · Cassandra, SRE, ops-eqiad, DC-Ops

Nov 4 2022

Eevans added a project to T322392: Store rejection reasons as a set in Cassandra: Cassandra.

As the description notes, this is a breaking change. That said, I only see a couple of items in the feedback table, presumably test values of some sort? If there is no user-facing client code that would break as a result of a schema change, I could drop the rejected_reason attribute, and add rejected_reasons as set<text>.

Nov 4 2022, 3:04 PM · Growth-Team, Cassandra, Image-Suggestions
Eevans added a comment to T320675: Establish Phonos production storage requirements.

Ok, so let's try to move this in a (more) constructive direction:

Nov 4 2022, 12:11 AM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech

Nov 2 2022

Eevans updated subscribers of T320835: Split IPA audio generation into multiple hooks.

@Eevans we can't really do this "multi-hook" approach. It doesn't work.

That's disappointing; What changed?

Nov 2 2022, 8:42 PM · Community-Tech (CommTech-Sprint-37), MediaWiki-extensions-Phonos
Eevans added a comment to T320675: Establish Phonos production storage requirements.

I originally read T320675#8330640 as giving the go-ahead, but with the shared understanding there will be proper instrumentation and monitoring on our end moving forward. We can certainly commit to that. Our first pilot wikis (T316013) are comparatively very small, but will give us a better idea of the costs you're asking about, monetary or otherwise.

Nov 2 2022, 8:39 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech

Nov 1 2022

Eevans updated subscribers of T320675: Establish Phonos production storage requirements.

Since T320835: Split IPA audio generation into multiple hooks appears to be in jeopardy (see: T320835#8361202), perhaps we can revisit some of these concerns:

Nov 1 2022, 10:07 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech
Eevans updated subscribers of T320835: Split IPA audio generation into multiple hooks.

@Eevans we can't really do this "multi-hook" approach. It doesn't work.

Nov 1 2022, 9:55 PM · Community-Tech (CommTech-Sprint-37), MediaWiki-extensions-Phonos

Oct 31 2022

Eevans added a comment to T283838: Kask: gocql: no hosts available in the pool errors.

[ ... ]

Updating to the latest gocql driver release (1.2.1 at the time of writing) will roughly require:

  • Packaging golang-github-pierrec-lz4.v4-dev (not currently in any version of Debian, but package source already exists on Salsa), and uploading it to sid
  • ...
Oct 31 2022, 9:38 PM · Cassandra
Eevans added a comment to T283838: Kask: gocql: no hosts available in the pool errors.

Kask's dependencies are sourced entirely from Debian, the rationale for which can be found documented here. The most current version of the gocql driver in any version of Debian is 0.0~git20191102.0.9faa4c0-4 (the version we are already using); Continuing this practice will mean creating an updated package and adding it to a repository (preferably Debian, but possibly our own in the near-term).

Oct 31 2022, 9:14 PM · Cassandra

Oct 27 2022

Eevans moved T320831: Section Level Image Suggestions - Data Persistence Request from Backlog to Next on the Cassandra board.
Oct 27 2022, 7:56 PM · Data-Engineering-Planning, Section-Level-Image-Suggestions, Cassandra, Image-Suggestions
Eevans added a comment to T320831: Section Level Image Suggestions - Data Persistence Request.

[ ... ]

Size and Growth:

  • Still being investigated as the final algorithm is worked out -> https://phabricator.wikimedia.org/T315976
  • Size is not known yet but realistically would be a multiple of the existing page level image suggestions (10x?)

What is it that makes it a multiple of the existing (page-based) suggestions? Will the algorithm be somehow producing that many more results? Will we be (for example) storing N * num_sections suggestions (where N is the current per-page limit)? I guess what I'm wondering is whether this will result in a corresponding change to the size of result responses as well.

[ ... ]

  • I don’t think we need to have the new section id field as part of the key. Requests would still be on wiki/page_id (requestors wouldn't likely know the section id - SD correct me if I'm wrong?) - Growth may need to add a filter on items with a section id

Yes, this is important. If there is a need to query by-section, then changes to the data model will be needed. The sooner we establish this, the better.

Oct 27 2022, 7:56 PM · Data-Engineering-Planning, Section-Level-Image-Suggestions, Cassandra, Image-Suggestions

Oct 25 2022

Eevans added a comment to T320831: Section Level Image Suggestions - Data Persistence Request.

The existing image suggestions data pipeline suggests images at an article level. There is a new data pipeline being built that will suggest images at an article section level.

The output of the new data pipeline is expected to be the same as the article level suggestions but with the addition of a field to contain the section identifier.

When the new data pipeline is built the existing article level data pipeline will continue to run and the output consumed as it's done currently.

Write Frequency and Method:

  • Weekly bulk from Airflow > Cassandra connector job

Size and Growth:

  • Still being investigated as the final algorithm is worked out -> https://phabricator.wikimedia.org/T315976
  • Size is not known yet but realistically would be a multiple of the existing page level image suggestions (10x?)
Oct 25 2022, 10:10 PM · Data-Engineering-Planning, Section-Level-Image-Suggestions, Cassandra, Image-Suggestions
Eevans moved T283838: Kask: gocql: no hosts available in the pool errors from Backlog to Next on the Cassandra board.
Oct 25 2022, 12:35 AM · Cassandra

Oct 24 2022

Eevans committed rGDIS81eba4b46598: Updated dependencies (authored by Eevans).
Updated dependencies
Oct 24 2022, 10:44 PM

Oct 21 2022

Eevans closed T313991: Investigate sessionstore Cassandra utilization improvements as Resolved.

By this point, things seem to have stabilized. Utilization ebbs and flows between a floor of ~50G and a ceiling of ~100G. While this is considerably more than what we assume the live set to be (the extra being TTL'd tombstones), it still only represents a worst-case of ~30% utilization. Since this is a single-purpose cluster, there is no value to be had from additional free space (and freeing that space would generate additional I/O, and shorten the lifespan of the SSDs).

Oct 21 2022, 7:47 PM · Cassandra
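
The utilization figures in the T313991 closing comment above can be checked with a little arithmetic. The sketch below only uses the rounded numbers from the comment (a ~50G floor, a ~100G ceiling, and ~30% worst-case utilization); the capacity it derives is an inference from those numbers, not a figure stated in the task, and the function names are illustrative.

```python
def implied_capacity_gb(used_gb: float, utilization: float) -> float:
    """Capacity implied by a usage figure and the utilization it represents."""
    return used_gb / utilization


def utilization(used_gb: float, capacity_gb: float) -> float:
    """Fraction of capacity in use."""
    return used_gb / capacity_gb


if __name__ == "__main__":
    # ~100G at ~30% utilization implies roughly a third of a terabyte of capacity
    # (inferred, not stated in the comment).
    capacity = implied_capacity_gb(100, 0.30)
    print(f"implied capacity: ~{capacity:.0f}G")
    # The ~50G floor would then sit at about half the worst-case utilization.
    print(f"utilization at floor: ~{utilization(50, capacity):.0%}")
```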

Oct 19 2022

Eevans added a comment to T320675: Establish Phonos production storage requirements.

To summarize what was discussed during a Data Persistence meeting earlier today (in no particular order):

Oct 19 2022, 9:11 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech
Eevans added a comment to T306349: Public-facing API for image suggestions data.

Actually ... now that I think about it, @VirginiaPoundstone and @BPirkle is there scope to move some of the business logic into the API, so that each client doesn't have to implement it separately? IMO it'd make a lot of sense to do that

If there is scope, then Growth and us and Android need to have a little chat to figure out what we need ...

Oct 19 2022, 3:06 PM · serviceops, Patch-For-Review, API Platform, Wikipedia-Android-App-Backlog (Android Release FY2022-23), Structured-Data-Backlog, Image-Suggestions, Foundational Technology Requests

Oct 18 2022

Eevans added a comment to T320739: [SPIKE] Explore using GrowthExperiments as a proxy for bringing image recommendations into the Android app.

[ ... ]

TL;DR If such a service is going to accept POSTs for feedback submission, it would handle them by submitting an event.

AIUI we need to proxy the feedback event through MediaWiki unless we adjust the EventGate configuration for the relevant stream to allow for external event creation.

Oct 18 2022, 6:35 PM · Patch-For-Review, Data-Persistence (work done), Image-Suggestions, API Platform, Wikipedia-Android-App-Backlog (Android Release FY2022-23)
Eevans removed a project from T283838: Kask: gocql: no hosts available in the pool errors: Platform Team Workboards (Clinic Duty Team).
Oct 18 2022, 1:18 AM · Cassandra

Oct 17 2022

Eevans raised the priority of T283838: Kask: gocql: no hosts available in the pool errors from Medium to High.
Oct 17 2022, 4:25 PM · Cassandra
Eevans updated Eevans.
Oct 17 2022, 1:53 PM
Eevans added a member for Data-Persistence: Eevans.
Oct 17 2022, 1:51 PM

Oct 14 2022

Eevans added a comment to T320675: Establish Phonos production storage requirements.

To summarize what was discussed on Slack:

Oct 14 2022, 7:50 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech

Oct 13 2022

Eevans added a comment to T320739: [SPIKE] Explore using GrowthExperiments as a proxy for bringing image recommendations into the Android app.

I feel like ImageSuggestions is growing beyond an "experiment" at this stage, and that this sort of API falls naturally under the purview of the API platform team rather than inside Product

While I don't disagree, I'm the only developer on API Platform with MediaWiki development experience, and my time is already pretty much spoken for. If this task involves MediaWiki development (as presumably it does if we're collecting user-specific feedback), then we'd need to reshuffle some things for API Platform to take it on. I have no objections to that - I'm here to do whatever is most needed - so I'll let the various folks with "Manager" in their title sort that out. :-)

Oct 13 2022, 7:44 PM · Patch-For-Review, Data-Persistence (work done), Image-Suggestions, API Platform, Wikipedia-Android-App-Backlog (Android Release FY2022-23)
Eevans added a project to T320739: [SPIKE] Explore using GrowthExperiments as a proxy for bringing image recommendations into the Android app: Data-Persistence (work done).
Oct 13 2022, 6:58 PM · Patch-For-Review, Data-Persistence (work done), Image-Suggestions, API Platform, Wikipedia-Android-App-Backlog (Android Release FY2022-23)
Eevans added a comment to T320739: [SPIKE] Explore using GrowthExperiments as a proxy for bringing image recommendations into the Android app.

So if I understand this correctly, @kostajh 's proposal is for the Growth team to

  1. write and maintain an Image Suggestions http api to proxy GET requests to the existing image suggestions cassandra gateway inside the GrowthExperiments extension
  2. possibly also provide a POST endpoint which would involve writing to Cassandra (or maybe just would write an event to EventGate that would eventually propagate to Cassandra)

If the growth team have the people/time to do this then it might expedite things, but I'm not sure it's a good idea. I feel like ImageSuggestions is growing beyond an "experiment" at this stage, and that this sort of API falls naturally under the purview of the API platform team rather than inside Product

Oct 13 2022, 6:57 PM · Patch-For-Review, Data-Persistence (work done), Image-Suggestions, API Platform, Wikipedia-Android-App-Backlog (Android Release FY2022-23)
Eevans added a comment to T320675: Establish Phonos production storage requirements.

Is there no point when a pronunciation is altered or replaced, invalidating the mp3? Or removed?

I suppose that's where the deleteOldPhonosFiles.php script comes into play. Seemingly this is the only thing Extension:Score does to guard against the same concern.

Oct 13 2022, 12:27 AM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech

Oct 12 2022

Eevans added a comment to T320675: Establish Phonos production storage requirements.

Oh, I missed these entirely! Would you rather we reopen and use one of these (T314789 presumably), or continue here?

No worries! Here is fine. Those old tasks are out of date as it is. For one, we're not using WAV anymore, and:

There is also some concern about how retention is managed, if we're treating this as a cache.

Let me be clear that this is NOT a proper caching technique (though we did envision it that way initially). What we want is persistent storage, pretty much exactly like Extension:Score. Files are shared globally, last indefinitely (but note we do have deleteOldPhonosFiles.php), etc.

Oct 12 2022, 9:51 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech
Eevans added a comment to T320675: Establish Phonos production storage requirements.

Thanks for the info; Apologies for not following up on this sooner. I asked about utilization because we (Data-Persistence) are trying to engage with projects that need storage earlier, understand the requirements, be in a position to offer feedback, and plan accordingly. Ideally, earlier would be earlier than where we are now, but it would still be great to run through your storage requirements.

Since this seems out of scope for this ticket (sorry about that) -and since I didn't find a suitable existing issue- I've stubbed out T320675 for this.

I should have linked to it in my reply at T317417#8280934, but we have T314789 and T309315 from much earlier in this project where we tried to consult with Data Persistence.

Oct 12 2022, 8:31 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech
Eevans renamed T320675: Establish Phonos production storage requirements from Establish Phonos production storage requirements. to Establish Phonos production storage requirements.
Oct 12 2022, 7:57 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech
Eevans updated the task description for T320675: Establish Phonos production storage requirements.
Oct 12 2022, 7:56 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech
Eevans added a parent task for T320675: Establish Phonos production storage requirements: T316011: Rollout plan for Phonos.
Oct 12 2022, 7:51 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech
Eevans added a subtask for T316011: Rollout plan for Phonos: T320675: Establish Phonos production storage requirements.
Oct 12 2022, 7:51 PM · Community-Tech, MediaWiki-extensions-Phonos
Eevans added a comment to T317417: Phonos links to an unauthorized URL.

How big are the files themselves (min, max, average)? How many do we expect; What's the anticipated total storage? Read & write rates?

We aren't setting a maximum file size currently, but we do have a maximum amount of IPA that can be passed to the parser tag. That is currently set to 300 bytes (T316641). The generated MP3 in that case is somewhere in the neighborhood of 40-50kb, but on average files will probably be around 3-5kb. The number of files generated depends on how well the communities adopt this feature. If fully rolled out across all wikis (we use Phonos everywhere we show IPA), we're probably looking at many hundreds of thousands of files, but I think it will be quite a while before we reach that point. Read rates are estimated at around 1.8 million a month (T307625). Write rates are harder to estimate, again depending on how communities adopt this feature. Since most wikis have a Template:IPA or something similar, the Phonos parser tag will likely be put there. Thus, the initial rollout will see a very high number of writes (directly proportional to the number of transclusions of Template:IPA), but we are building our own type of job and making it go slower than the normal job queue rate (T318086). After the rollout, writes will likely be relatively rare by comparison. Note, however, that some communities might prefer to opt in to using Phonos on a case-by-case basis, making the rollout longer and hence lowering write rates.

Oct 12 2022, 7:50 PM · Community-Tech (CommTech-Sprint-36), MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), SRE-swift-storage, Beta-Cluster-Infrastructure, MediaWiki-extensions-Phonos
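
The sizing answer in the T317417 comment above can be turned into rough totals. This is a back-of-the-envelope sketch under stated assumptions: 500,000 files is a hypothetical stand-in for "many hundreds of thousands", 4 kB is the midpoint of the quoted 3-5kb average, "kb" is read as 1,000 bytes, and a month is taken as 30 days. None of these specific numbers come from the thread.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def total_storage_bytes(file_count: int, avg_file_bytes: int) -> int:
    """Total bytes needed to store `file_count` files of the given average size."""
    return file_count * avg_file_bytes


def reads_per_second(reads_per_month: int) -> float:
    """Average read rate per second for a given monthly read count."""
    return reads_per_month / SECONDS_PER_MONTH


if __name__ == "__main__":
    files = 500_000  # hypothetical file count
    typical = total_storage_bytes(files, 4_000)   # average-sized files
    worst = total_storage_bytes(files, 50_000)    # every file at the ~50kb upper bound
    print(f"typical: {typical / 1e9:.1f} GB, worst case: {worst / 1e9:.1f} GB")
    print(f"average reads: {reads_per_second(1_800_000):.2f} per second")
```

Under these assumptions even the worst case stays in the tens of gigabytes, and the quoted read estimate averages out to less than one request per second.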
Eevans added a project to T320675: Establish Phonos production storage requirements: Community-Tech.
Oct 12 2022, 7:48 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech
Eevans added a project to T320675: Establish Phonos production storage requirements: Data-Persistence.
Oct 12 2022, 7:42 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech
Eevans created T320675: Establish Phonos production storage requirements.
Oct 12 2022, 7:42 PM · MediaWiki-extensions-Phonos, SRE-swift-storage, MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Community-Tech

Oct 6 2022

Eevans added a comment to T317364: [M] Stop unbounded image suggestions dataset growth and clean up legacy results.

Deletion has been done this morning using the following actions:

# On stat machine, in a screen, launch a spark-shell
spark3-shell \
  --master yarn \
  --executor-memory 8G \
  --executor-cores 2 \
  --driver-memory 8G \
  --driver-cores 2 \
  --conf spark.dynamicAllocation.maxExecutors=90 \
  --jars hdfs:///wmf/cache/artifacts/airflow/analytics/spark-cassandra-connector-assembly-3.2.0-WMF-1.jar \
  --conf spark.cassandra.connection.host=aqs1010-a.eqiad.wmnet:9042 \
  --conf spark.cassandra.auth.username=aqsloader \
  --conf spark.cassandra.auth.password=cassandra

# Then execute scala code

import com.datastax.spark.connector._

val r = spark.sql("""
SELECT DISTINCT
  -- Selecting C* primary key columns (except for image, not needed)
  wiki, page_id, id
FROM analytics_platform_eng.image_suggestions_suggestions
WHERE
  -- Snapshot list extracted from /user/hive/warehouse/analytics_platform_eng.db/image_suggestions_suggestions
  -- Keeping only snapshots from before Sept 9th, since data loaded after that date normally has a TTL
  snapshot IN ('2022-05-02', '2022-05-16', '2022-06-13', '2022-06-20', '2022-06-27', '2022-07-04', '2022-07-11',
               '2022-07-18', '2022-07-25', '2022-08-01', '2022-08-08', '2022-08-15', '2022-08-22', '2022-08-29', '2022-09-05')
""").repartition(6).rdd

val f = new RDDFunctions(r)

f.deleteFromCassandra("image_suggestions", "suggestions", keyColumns = SomeColumns("wiki", "page_id", "id"))
Oct 6 2022, 6:56 PM · Structured-Data-Backlog (Current Work), Growth-Team (Current Sprint), Structured Data Engineering, Cassandra

Oct 3 2022

Eevans added a comment to T317417: Phonos links to an unauthorized URL.

Reading the discussion on https://gerrit.wikimedia.org/r/c/operations/puppet/+/831955/ I'm left a bit confused - is the aim to use swift as essentially a cache for these sound files? If so, how is expiring them being managed?

Correct (though we've been reminded "persistent storage" is more appropriate, as we kept referring to it as "cache"). Due to the small size of these files, and the relatively low cost of storage space, expiring is being handled by a maintenance script run infrequently (yearly?)

Oct 3 2022, 6:44 PM · Community-Tech (CommTech-Sprint-36), MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), SRE-swift-storage, Beta-Cluster-Infrastructure, MediaWiki-extensions-Phonos

Sep 26 2022

Eevans added a comment to T253244: Upstream gocql bug effects Kask.

gocql/commit/312a614 looks...promising.

Sep 26 2022, 7:58 AM · Patch-For-Review, Sustainability (Incident Followup), User-Eevans, Platform Team Workboards (Clinic Duty Team), Cassandra

Sep 23 2022

Eevans updated the task description for T318407: Cassandra multi-tenant access configuration.
Sep 23 2022, 3:25 PM · Cassandra