Meet with Security and Ops about a long-term productionization plan
Closed, Resolved · Public

Description

After talking with @csteipp, it's clear that the road to full productionization of the Program Dashboard Rails application—deployment to the production cluster, after deployment to labs by the end of this month—is going to entail a substantial amount of review, coordination, and some level of long-term ops commitment/support. Let's get this all on the table before the month is up, so we can communicate the realities to our stakeholders (and other jargon).

Agenda:

  • Identify immediate options for and limitations to running more reliably in labs (database consistency, replication, failover, varnish cache, etc.)
  • Identify security review requirements and commitments so we can scope a rough long-term productionization timeline.
  • How can we manage Rails application dependencies to allow for security review while maintaining a level of flexibility in deployment?
    • Bundler packaging of gems?
    • Precompilation of native extensions?
    • What will definitely require Debian packaging, etc.?
  • Are there options for some level of isolation in production that will mitigate security concerns? (Ganeti?)
  • What level of Ops commitment would we need to run a Rails application in production long-term? Can we get that commitment elsewhere?

Event Timeline

dduvall created this task. · Feb 18 2016, 11:00 PM
Restricted Application added subscribers: StudiesWorld, Base, Aklapper. · Feb 18 2016, 11:00 PM
dduvall updated the task description. · Feb 18 2016, 11:25 PM

Who can Ops spare for a 55 min meeting next week to help us flesh this out? @fgiunchedi or @akosiaris, are either of you available?

Another thing to throw into the mix is the non-production APIs that the dashboard interfaces with. The Wiki Ed version currently uses these three:

Minor note, we could get rid of the custom APIs with direct database access, T127404 (and another task for ORES), then disable the plagiabot as ragesoss suggested.

Sorry, I will not be around next week. Maybe the week after?

Question: I assume the Program Dashboard Rails application is this http://wikiedu.org/. Am I correct?

@dduvall the week after would work for me too. Can you give us more context on what Program Dashboard is and what it will do, where the source is, etc.? Personally, it is the first time I've heard of it.

That works! I'll go ahead and schedule something.

I'll try to give you the gist of the project; I'm also pretty new to it. :) Check out the GitHub project README as well.

It's an upstream project from the Wiki Education Foundation that was developed to help educators integrate Wikimedia contribution into classroom curricula. We (a skunkworks-y team on temporary loan from our usual depts, since CE lacks enough tech resources) are working in collaboration with WikiEd this month to refactor the dashboard for more generalized use among other Wikimedia teams/projects, build in multi-wiki and multi-language support, and host an instance of the application on WMF resources.

The dashboard will serve as a replacement for the EducationProgram extension, which has languished in security and support over the years. We're hoping that the dashboard will be more maintainable and secure over the long term, since it will have both a highly invested/talented/dedicated upstream and hopefully some direct support from WMF in the near future.

I went ahead and scheduled a meeting for March 1st at 1900 UTC. @akosiaris, that seems especially late for you and we probably just need one of y'all opsen to help, so no worries if you can't make it. Alternatively, I could bump my meeting at 1600 UTC if that's not too early for @csteipp.

1600 UTC is fine for me.

Great. I've moved it to 1600 UTC to better accommodate ops.

Thanks again for meeting today, everyone. The general summary of this meeting is that in order to move forward with a full security review, we'll need a firm maintenance commitment from Ops and PC&L. Barring any changes in the annual plan that would grant the latter an actual tech budget, we should move forward with the assumption that the dashboard will continue to run in labs and that we should work to improve reliability/performance with a shared read/write application database and possibly a Redis instance for the Rails cache—ToolLabs was also mentioned as an option that could provide more scalability atop the Labs infrastructure.

Below are the notes from the etherpad:

Security review

  • subscribe to gem CVE
  • Rails CVE
  • oauth login
  • should run on separate domain
  • would want a commitment from ops before going forward with security review

ToolLabs

  • have we considered Tool Labs?
  • is it stable? performant?
    • Filippo will look into questions of HA, replication, etc.
    • we should also talk to Yuvi

Ganeti

  • separate VM but still on prod network
  • means we'd still have to endure the same security review, etc.
  • we don't need direct access to wiki DBs, etc.
  • more isolation the better
  • est 10,000 users in the first year

Ops concerns

  • what are security concerns?
  • where to put it?
  • what kind of access does it need?
    • doesn't access MySQL replica, but it could/should to relieve performance bottleneck
    • would be read-only
  • how often does it read/write to MySQL
    • lots of reads/writes
  • current bottleneck is the service that wraps the DB replica

TODO

  • compile list of gems that can be satisfied via Debian packages and which might require native compilation
  • Filippo to see if we can have a read/write shared application database in Labs (not on our own instances)
  • talk with Yuvi about whether ToolLabs might be a good fit
  • Dan to profile some application requests to see what kind of read/write DB traffic we should expect
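
For the first TODO item, a minimal sketch of how the gem list might be split, assuming that "might require native compilation" can be approximated by a gem declaring build extensions in its specification. The helper names `partition_by_native_extensions` and `sample_spec` are hypothetical and not part of the dashboard codebase. The sketch does not check whether a Debian package exists for a gem; that would still need to be looked up separately.

```ruby
require 'rubygems'

# Split gem specifications into those that declare native extensions
# (candidates for precompilation or Debian packaging) and pure-Ruby gems.
# Returns a hash of sorted gem names under :native and :pure.
def partition_by_native_extensions(specs)
  native, pure = specs.partition { |spec| spec.extensions.any? }
  { native: native.map(&:name).sort, pure: pure.map(&:name).sort }
end

# Hypothetical helper: builds a stand-in Gem::Specification so the
# partitioning can be tried without a real Gemfile.
def sample_spec(name, extensions = [])
  Gem::Specification.new do |s|
    s.name = name
    s.version = '0.0.1'
    s.extensions = extensions
  end
end
```

Inside the application's bundle, the input would presumably come from something like `Bundler.load.specs` rather than from hand-built specifications; that has not been verified against the dashboard itself.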

Elitre added a subscriber: Elitre. · Mar 1 2016, 4:06 PM

2c. For one reason or another, tools on toollabs may be occasionally down.
It's more or less OK when this means I won't be able to check my edit count for the day, but it's definitely not a good circumstance for a tool which is meant to be used in courses and classes where people can't just say "Well, the website doesn't work, I guess that means I'll see you all tomorrow".

dduvall triaged this task as Normal priority. · Mar 3 2016, 7:52 PM
dduvall moved this task from Next Up to In Progress on the Program Dashboard Sprint 1 board.

Tool Labs stability is something we discussed briefly in the meeting but I'll make a point of asking for more information regarding uptime trends, etc., when talking with @yuvipanda. Now that we've some level of dashboard provisioning in ops/puppet, I'm leaning towards just going forward with bare Labs instances for now. Of course, that also hinges on the availability of a stable database platform. If we can't get that without Tool Labs, the latter may still be worth looking into barring uptime concerns.

dduvall closed this task as Resolved. · Mar 4 2016, 9:55 PM

All the TODOs from our meeting have been split out into separate tasks.