
Define and then implement a way for a future service owner to provide the info required to have a new service brought into production
Closed, Resolved · Public

Description

We need a clear way for new service owners to communicate to Services and Ops the intent to create a new service and bring it into production.

Taking ideas from https://phabricator.wikimedia.org/T90487#1227730 I am suggesting the following:

  • A clear description of what this service does. Preferably a link to a wiki page that clearly states what the service is for.
  • A desired timeline for the introduction of the service into production.
  • A link to a simplified proposed architecture diagram (possibly in the same wiki page as the description). The diagram should have:
    • Request flow from:
      1. The browser (end user) to MediaWiki (if any)
      2. MediaWiki (or the relevant extension) to the service (if any)
      3. The browser to the service (if any)
      4. The service to any other WMF service (if any)
      5. The service to any external entity e.g. translation APIs, web sites that could be used as citation etc. (if any)
    • Jobs that might need to run via jobrunners (if any)
    • Data store dependencies (if any)
    • Anything else architecturally significant not covered by the above

For simplicity's sake, no intermediate HTTP caching layers should be included in the diagram, but if HTTP caching is of the essence it should be noted.

Lower level caching layers like memcached/redis should be added. It is highly preferable that the service should continue working if those are unavailable but in case this is impossible it should be clearly noted.

The idea is to have a very clear diagram (or several, if the service is complicated, which should raise eyebrows anyway) that everyone can understand at first glance, so it can serve as documentation.

  • Who's running point

Probably the service owner, but it might differ so we want that info.
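As a rough illustration only, the information requested above could be captured in a small record with a completeness check. This is a sketch under assumptions: the class name `ServiceRequest`, its field names, and the helper `missing_fields` are all hypothetical and do not correspond to any existing Phabricator form or WMF tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceRequest:
    """Hypothetical record of the info a future service owner would supply."""

    # Items that the description asks for in every request.
    description_url: str = ""    # wiki page stating what the service is for
    desired_timeline: str = ""   # when the owner hopes to reach production
    diagram_url: str = ""        # simplified architecture diagram
    point_of_contact: str = ""   # who is running point

    # Items marked "(if any)" in the description, so they may stay empty.
    jobrunner_jobs: List[str] = field(default_factory=list)
    datastore_dependencies: List[str] = field(default_factory=list)
    # Lower-level caches (memcached/redis) the service cannot work without.
    hard_cache_dependencies: List[str] = field(default_factory=list)


def missing_fields(request: ServiceRequest) -> List[str]:
    """Return the names of always-required fields that are still blank."""
    required = (
        "description_url",
        "desired_timeline",
        "diagram_url",
        "point_of_contact",
    )
    return [name for name in required if not getattr(request, name).strip()]


# Example: a request that still lacks a timeline and a diagram.
draft = ServiceRequest(
    description_url="https://example.org/wiki/MyService",
    point_of_contact="alice",
)
print(missing_fields(draft))
```

Keeping the optional items as lists that default to empty mirrors the "(if any)" wording: an empty list means "none", while a blank required field means the request is incomplete.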

Event Timeline

akosiaris raised the priority of this task from to Medium.
akosiaris updated the task description.
akosiaris added projects: acl*sre-team, Services.
akosiaris added a subscriber: akosiaris.
Restricted Application added a subscriber: Aklapper. Apr 23 2015, 3:18 PM
akosiaris added a subscriber: Unknown Object (User). Jun 8 2015, 1:31 PM

After contemplating this quite a bit, I think the best way to get this done is via a project in Phabricator (service-requests?). People should be able to file tasks in that project announcing their intention to create a new service and filling in the basic info described above. That way we have:

  • A clear way for future service owners to inform everyone beforehand about their intention to create a new service
  • A way for future service owners to provide the info we request
  • Everyone can see/search a list of "pending" services
  • Everyone can see/search a list of all in production services and the corresponding info
  • Everyone can see/search deprecated/phased out services (we should assume services will be phased out at some point).

The last point assumes we use the workboard feature of Phabricator for that.

@mobrovac, @chasemp, @glavagetto what do you think?

I'll request the project from the project creators (https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects#New_projects) if you are OK with it.

I'm all for it. The only inconvenient thing I see with this approach is that the Phabricator task status for each service will not be in line with the status of the service. Or am I mistaken and is it possible to change these labels?

Another point we should probably be discussing is keeping the big-picture info somewhere/somehow, but that's definitely outside the scope of this task.

I'm all for it. The only inconvenient thing I see with this approach is that the Phabricator task status for each service will not be in line with the status of the service. Or am I mistaken and is it possible to change these labels?

It is possible according to https://secure.phabricator.com/T1812, but I am afraid it is not on a per project level. In our installation they are listed here: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/phabricator/data/fixed_settings.yaml btw.

Another point we should probably be discussing is keeping the big-picture info somewhere/somehow, but that's definitely outside the scope of this task.

Yes, we should, but in another task

I imagine a tag with a workboard is pretty good. I don't have a super awesome grasp of the workflow in place now though :)

A few other questions I am conditioned to ask:

  • New technologies or packages introduced to support this service (things no one uses in production to date we intend to now support forever)
  • How will the service be deployed (This may require looping in Releng)
  • What is the release schedule (if there is one)

I imagine a tag with a workboard is pretty good. I don't have a super awesome grasp of the workflow in place now though :)

There is no real workflow yet, that is what we are trying to define here.

A few other questions I am conditioned to ask:

  • New technologies or packages introduced to support this service (things no one uses in production to date we intend to now support forever)

That probably will show up in reviews that are bound to happen anyway. I can add it though

  • How will the service be deployed (This may require looping in Releng)

Hopefully we will have one tool to rule them all by next quarter

  • What is the release schedule (if there is one)

Good point, although we should just have this referred to RelEng and the usual deployment schedule.

I imagine a tag with a workboard is pretty good. I don't have a super awesome grasp of the workflow in place now though :)

There is no real workflow yet, that is what we are trying to define here.

The way I see it as it pertains / maps to Phabricator:

| status | workboard | ticket state |
| --- | --- | --- |
| active in prod | prod | stalled |
| to be activated | waiting | open |
| moved out of prod | inactive | closed |
| waiting to be in prod, but not gonna happen | backlog | declined |

Names are subject to change, ofc, this is just a general idea I'm having.
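A minimal sketch of the mapping above as a lookup table, in case it helps when reasoning about it. The names `BoardState`, `SERVICE_STATUS_MAP`, and `board_state_for` are hypothetical, and the status strings are copied from the proposal, which itself says the names are subject to change.

```python
from typing import Dict, NamedTuple


class BoardState(NamedTuple):
    """Workboard column and ticket state proposed for a service status."""

    workboard_column: str
    ticket_state: str


# Service status -> (workboard column, Phabricator ticket state),
# transcribed from the proposed table.
SERVICE_STATUS_MAP: Dict[str, BoardState] = {
    "active in prod": BoardState("prod", "stalled"),
    "to be activated": BoardState("waiting", "open"),
    "moved out of prod": BoardState("inactive", "closed"),
    "waiting to be in prod, but not gonna happen": BoardState("backlog", "declined"),
}


def board_state_for(status: str) -> BoardState:
    """Look up where a task should sit for a given service status."""
    try:
        return SERVICE_STATUS_MAP[status]
    except KeyError:
        raise ValueError(f"unknown service status: {status!r}") from None


print(board_state_for("to be activated").workboard_column)
```

Raising on unknown statuses instead of guessing a default keeps any later renaming of the labels from silently misplacing a task.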

A few other questions I am conditioned to ask:

  • New technologies or packages introduced to support this service (things no one uses in production to date we intend to now support forever)

That probably will show up in reviews that are bound to happen anyway. I can add it though

Yup, should show up even before code review, during the planning stage (as in why do you want to use technology X?).

  • How will the service be deployed (This may require looping in Releng)

Hopefully we will have one tool to rule them all by next quarter

+1 !

  • What is the release schedule (if there is one)

Good point, although we should just have this referred to RelEng and the usual deployment schedule.

Right. I'll check with RelEng about that, but in general they are only/mostly responsible for MW. The current state of affairs is that each group responsible for a service comes up with its own schedule, e.g.:

  • Parsoid - Mondays and Wednesdays
  • RESTBase - whenever there are improvements (the exception is schema changes, which need careful Cassandra monitoring)
  • Citoid (inc. Zotero) - Mondays
  • CXServer - once a week ???

The main challenge here is, of course, ensuring compatibility between various services and their relation to MW deploys. IMHO, that warrants a separate ticket entirely.

  • New technologies or packages introduced to support this service (things no one uses in production to date we intend to now support forever)

That probably will show up in reviews that are bound to happen anyway. I can add it though

Yup, should show up even before code review, during the planning stage (as in why do you want to use technology X?).

All good, guys :)

The experience I have had (not here) is roughly:

"why did you guys start using x?"
"Well it was already in prod"
...and then we have to track back to find the one review on some other project where someone said "ehh...I guess installing X is ok".

Better to have it all explicit and declared at the time of service pitch and then, even better, it raises questions on why requirements are changing (if/when they do).

But just a thought based on my particular paranoia

Better to have it all explicit and declared at the time of service pitch and then, even better, it raises questions on why requirements are changing (if/when they do).
But just a thought based on my particular paranoia

That's where the Services team comes in through guidance and mentoring of other teams to make sure we keep these situations contained.

Joe added a subscriber: Joe. Jun 9 2015, 8:22 AM

I agree with chase that asking people to be as explicit as possible is good for everyone's clarity.

I want to add a constraint I stated previously here:

https://wikitech.wikimedia.org/wiki/User:Giuseppe_Lavagetto/MicroServices

We don't want more than two, at most three, different environments to take care of. Preferably excluding the JVM.

Since we actually already have 3 different envs in production (node, HHVM, python/uwsgi) I would probably set this as an additional policy. Any deviation from this should be possible, but only if justified (e.g. "I need to use lucene/some other big java library that has no equivalent in other languages").
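To make the proposed policy concrete, here is a small sketch of how such a check could look. It is illustrative only: `SUPPORTED_ENVIRONMENTS` and `runtime_allowed` are hypothetical names, and treating any non-empty justification as sufficient is an assumption, since in practice a justification would be reviewed by people rather than accepted automatically.

```python
# Environments already in production according to the comment above.
SUPPORTED_ENVIRONMENTS = frozenset({"node", "hhvm", "python/uwsgi"})


def runtime_allowed(runtime: str, justification: str = "") -> bool:
    """Return True if the runtime fits the policy or a deviation is justified.

    Assumption: any non-empty justification counts here; a real process
    would have reviewers judge whether the justification is good enough.
    """
    if runtime.strip().lower() in SUPPORTED_ENVIRONMENTS:
        return True
    return bool(justification.strip())


print(runtime_allowed("HHVM"))
print(runtime_allowed("jvm"))
print(runtime_allowed("jvm", "needs lucene, no equivalent elsewhere"))
```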

Hello,

I've updated the service-deployment-requests description with the various issues you all thankfully pointed out here. I've left out the deployment system for now, as it is up to services and ops to define at this point, as well as the release/deployment schedule, as this changes very easily and the info in the ticket is bound to be old and confusing quite soon. I did add the technologies used part. I've also installed a workboard using @mobrovac's proposed table as is.

We will probably be revisiting some of these things in the future, but for now I think we are done. Unless someone would like to add anything, I'll resolve this.

akosiaris closed this task as Resolved. Jun 15 2015, 6:17 PM
akosiaris claimed this task.

Resolving.