Access to AQS keyspaces for cassandra
Closed, Resolved · Public

Assigned To

Authored By

	Sfaci
	Apr 5 2023, 6:49 PM

Description

To help create an AQS test environment, I need access to the Cassandra AQS keyspaces to get some sample data.
The same access you provided for fgoodwin in https://phabricator.wikimedia.org/T334099 would be enough.

My user is 'sfaci'

Thank you!

Related Objects

Mentioned In: T334851: Define a procedure/pattern to populate test environments
Mentioned Here: T334099: Enable Cassandra access to AQS keyspaces for user fgoodwin

Event Timeline

Sfaci created this task. Apr 5 2023, 6:49 PM

Restricted Application added a subscriber: Aklapper. Apr 5 2023, 6:49 PM

Approved.

As discussed with @Sfaci elsewhere: @FGoodwin had previously gone through the process of obtaining AQS cluster access separately, and had used that to access Cassandra via a default role before we locked it down. Creating their dedicated Cassandra role was just fixing that regression.

Before we create another Cassandra role, @Sfaci would need shell access to make use of it, and that would require following https://wikitech.wikimedia.org/wiki/SRE/Production_access#Access_Request_Process (at a minimum, submitting a new access request ticket).

That said, @FGoodwin's access here seems exceptional. Once the new AQS services are in production, will there be any on-going need? If not, we'd likely consider removing it at that time. If that's the case, my recommendation would be to limit the number of people with this access.

Once the new AQS services are in production, will there be any on-going need?

Possibly. At least, there will be an on-going need for additional testing data. I've seen informal discussion of additional AQS endpoints, which would require additional data to develop against. It is also reasonable to assume that we'll at least occasionally find/create a bug for which we'd like additional tests to prevent future regressions, and for which we need additional test data to run local tests.

Now, that doesn't necessarily mean we need to continue extracting production data in the way we are now. Maybe there's another way to get the data. Or maybe we switch to using mock data representative of production rather than actual extracted data.

From the API Platform side, at least, our processes, implementation, and understanding are all pretty formative. So I'd be interested in a conversation about how we best do all this going forward that meets everyone's needs and best practices.

We're also not that far away from having similar needs related to Druid. So if there are similar considerations, or better ways of doing things that would apply to both datastores, now is a great time to figure it out, before we recreate an undesirable situation that we have to quickly refactor.

In T334130#8771980, @BPirkle wrote:

Once the new AQS services are in production, will there be any on-going need?

Possibly. At least, there will be an on-going need for additional testing data. I've seen informal discussion of additional AQS endpoints, which would require additional data to develop against. It is also reasonable to assume that we'll at least occasionally find/create a bug for which we'd like additional tests to prevent future regressions, and for which we need additional test data to run local tests.

Now, that doesn't necessarily mean we need to continue extracting production data in the way we are now. Maybe there's another way to get the data. Or maybe we switch to using mock data representative of production rather than actual extracted data.

From the API Platform side, at least, our processes, implementation, and understanding are all pretty formative. So I'd be interested in a conversation about how we best do all this going forward that meets everyone's needs and best practices.

Speaking as the person that started us down this path (using test data that had been queried from the production database): The primary appeal at the time was that we were going to be implementing services to replace existing ones. Using actual data like this meant that integration tests could be interchangeably run against both production and the new implementations, and provide confidence of a matching contract. That wouldn't be the case for new services.
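To illustrate the "matching contract" idea above, here is a minimal sketch of such a check. It is not the actual AQS test suite: the `Fetcher` type, the function names, and the `/example/a` path are all hypothetical, and the fetchers are stand-ins for whatever would really issue requests against the existing and the new implementation.

```python
import json
from typing import Callable, Iterable, List

# Hypothetical type: a fetcher takes a request path and returns the raw JSON body.
Fetcher = Callable[[str], str]


def normalize(body: str) -> str:
    """Parse a JSON body and re-serialize it with sorted keys, so that
    key order and whitespace differences do not count as mismatches."""
    return json.dumps(json.loads(body), sort_keys=True)


def contract_mismatches(paths: Iterable[str], old: Fetcher, new: Fetcher) -> List[str]:
    """Return the paths for which the two implementations disagree."""
    return [p for p in paths if normalize(old(p)) != normalize(new(p))]


if __name__ == "__main__":
    # Stand-in fetchers backed by dicts; real ones would issue HTTP requests.
    old_data = {"/example/a": '{"items": [1, 2], "total": 2}'}
    new_data = {"/example/a": '{"total": 2, "items": [1, 2]}'}
    # The bodies differ only in key order, so no mismatch is reported.
    print(contract_mismatches(old_data, old_data.__getitem__, new_data.__getitem__))  # prints []
```

Because the fetchers are passed in, the same list of paths can be run against either backend, which is the property that only holds when both are serving the same underlying data.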

We should also be very careful here: these services don't have any PII, but will we always be able to say that? I don't think we'd want to establish this as a best practice.

BPirkle mentioned this in T334851: Define a procedure/pattern to populate test environments. Apr 17 2023, 3:11 PM

Eevans triaged this task as Low priority. Feb 22 2024, 5:49 PM

In T334130#8771980, @BPirkle wrote:

Once the new AQS services are in production, will there be any on-going need?

Possibly. At least, there will be an on-going need for additional testing data. I've seen informal discussion of additional AQS endpoints, which would require additional data to develop against. It is also reasonable to assume that we'll at least occasionally find/create a bug for which we'd like additional tests to prevent future regressions, and for which we need additional test data to run local tests.

Now, that doesn't necessarily mean we need to continue extracting production data in the way we are now. Maybe there's another way to get the data. Or maybe we switch to using mock data representative of production rather than actual extracted data.

Since we are a year on, and the AQS 2.0 services are now in production, is this issue still relevant?

I'm going to close this. Feel free to re-open if you feel that is in error.
