Paste P2837

Log from RFC Meeting: Architecture office hour (2016-03-30, #wikimedia-office)

Authored by RobLa-WMF on Mar 30 2016, 10:37 PM.
21:01:30 <robla> #startmeeting
21:01:30 <wm-labs-meetbot> Meeting started Wed Mar 30 21:01:30 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at
21:01:30 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:30 <wm-labs-meetbot> The meeting name has been set to 'https___phabricator_wikimedia_org_e152'
21:01:49 <robla> #topic Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs:
21:02:13 <robla> hi folks!
21:02:58 * robla begins to wonder if he's going to be the only one at this office hour ;-)
21:03:11 <Scott_WUaS> Hi Robla!
21:03:26 <robla> hi Scott_WUaS
21:05:19 <Scott_WUaS> @robla: What in particular do you want to focus on today?
21:05:47 <robla> this is really just going to be more of an office hour in a somewhat traditional sense. we only had a couple of ArchCom folks at the telecon last hour (gwicke and Krinkle)
21:07:01 <Scott_WUaS> sounds good ... and perhaps an opportunity to get things done in a different way, with relatively few participants
21:07:28 <robla> I listed a few RFCs that I'm shepherding in that I'm specifically happy to answer questions about, but really, no locked down agenda. Scott_WUaS, any specific questions you have?
21:07:57 <Scott_WUaS> Yes, thanks ...
21:08:30 <ostriches> robla: I'm around too if we need to discuss the Gerrit/Phab one a bit too
21:08:55 <robla> ostriches! o/
21:09:05 <Scott_WUaS> WUaS which donated WUaS to Wikidata last autumn is curious what the process is for communicating about further developing WUaS in Wikidata / MediaWiki and re ArchCom?
21:10:29 <robla> Scott_WUaS: I think your donation is something we can discuss in a different venue, and I'm happy to do so in the hour after this meeting
21:10:51 <Scott_WUaS> WUaS is currently talking with former CC MIT OCW Executive Director MIT Dean of Online Learning, Cecilia d'Oliveira and have received Creative Commons' permissions from her to develop and adapt MIT OCW in 7 languages and in Wikidata
21:11:02 <robla> ostriches: we touched on the Gerrit->Phab migration conversation in our last meeting
21:11:14 <Scott_WUaS> robla: thanks
21:12:10 <ostriches> robla: Yeah I saw. Was there any followup needed on that? I think the only question really is the status. It's not really in draft, it's under implementation now if we consider it accepted
21:12:47 <matt_flaschen> robla, I had one question about T123753
21:12:47 <stashbot> T123753: Establish retrospective reports for #security and #performance incidents
21:12:48 <robla> ostriches: is there help you need from ArchCom? I think the "done"ness is something we can discuss a little bit
21:13:26 <ostriches> I don't think we really need much in the way of help from ArchCom at this point. Considering the outcome of the various discussions we've had so far I think there's consensus for it.
21:13:36 <ostriches> (for it to be accepted and move forward, that is)
21:14:37 <legoktm> ostriches: Are we going to see a Gerrit upgrade happen before we dump it? :)
21:14:46 <robla> ostriches: I wouldn't go so far as to consider it "accepted", but that did get us into a general conversation about what does "accepted" mean by ArchCom. I don't think anyone in ArchCom wants to block it
21:14:55 <ostriches> legoktm: Yes, I've been working on that this week. "Soon"
21:15:03 <legoktm> <3
21:15:37 * TimStarling is partially online this hour, as well as looking after kids
21:15:47 <robla> o/ TimStarling :-)
21:15:48 <ostriches> hi TimStarling :)
21:17:09 <ostriches> robla: In which case I think we're good then? I don't think we need to bikeshed over the template status too much :)
21:17:28 <ostriches> As long as ArchCom doesn't need to block and we've got general consensus based on passed discussions, I think RelEng can move ahead
21:17:37 <ostriches> *past
21:18:31 <robla> ostriches: I really appreciate that y'all wrote up an RFC on this, as I think having that written up is going to help the migration go more smoothly. there's some nitpicking we can do about the RFC about the "how" and the "when", but I don't personally see any problems with the "what"
21:19:00 <ostriches> I think some more of the how/when will become clear in the coming quarter.
21:19:05 <robla> Krinkle: do you mind if I further paraphrase what you said in the past hour?
21:19:16 <ostriches> It's going to be like the Gerrit RFC insofar as this one isn't going to be "done" for a long time.
21:19:49 <Krinkle> robla: OK
21:20:36 <robla> ostriches: I think the how/when questions need to be clear in order for it to be marked "approved" (under the current ArchCom process)
21:21:41 <robla> marking ArchCom-RFCs as "approved" is a subject that sends me down the process wonk rabbithole
21:22:02 <ostriches> robla: We do have as a result of our annual planning.
21:22:08 <ostriches> Which should be incorporated into the RFC.
21:22:15 * robla looks
21:23:02 <greg-g> (it's linked to from the RFC, as I was tired of copy/pasting tables all last week ;) )
21:23:29 <robla> greg-g: I understand, truly :-)
21:23:40 <ostriches> So many tables that was.
21:24:04 <greg-g> the outcome of planning is great, the process can sometimes be... subpar :)
21:25:59 <robla> ostriches: lemme see if I can paraphrase... Phase 1: T130418 done hopefully June 30 (and then do the same quarter math for Phase 2 and Phase 3)
21:25:59 <stashbot> T130418: Goal: Phase 1 repository migrations
21:26:51 <ostriches> I think that's how the quarter math works out
21:27:20 <robla> Phase 2: T130420 done hopefully by December 31 of this year
21:27:21 <stashbot> T130420: Goal: Phase 2 repository migrations
21:28:12 <robla> phase 3: T130421 done hopefully by 2017-03-31
21:28:13 <stashbot> T130421: Goal: Phase 3 repository migrations
21:28:35 <robla> does that sum up the plan about right?
21:28:39 <greg-g> yup, and the KPIs might be helpful to understand what we consider {{done}} along the way
21:28:44 <greg-g> (they're at the bottom of the doc)
21:29:41 <greg-g> ie: what "phase X" means :)
21:30:35 <robla> "Q1: By the end of Q1 we plan to have a system in place to manage Differential and Nodepool/Continuous Integration interaction, from the baseline of no system in place." (Q1 ends 2016-09-30, so the middle of Phase2, right?)
21:30:54 <greg-g> FY
21:31:18 <greg-g> the system will be in place before phase 2
21:31:33 <greg-g> phase 2 is in Q2, semantically luckily enough
21:31:59 <robla> there a numberless phase to this project? ;-)
21:32:15 <greg-g> I'm confused by the "in the middle of phase2" part
21:32:43 <greg-g> phase2 happens in Q2, building the glue happens in Q1...
21:32:59 * greg-g goes to get his hoodie he left outside, his office is surprisingly cold
21:33:23 <robla> greg-g: my apologies, I was extrapolating phases based on when the endpoints were
21:33:48 * greg-g nods
21:34:30 <greg-g> I was worried I misaligned something along the way and was not looking forward to copy/pasting a lot more
21:34:33 <greg-g> :)
21:34:42 <robla> since Phase 1 hopefully ends 2016-06-30, and Phase 2 hopefully ends by 2016-12-31, I put 2016-09-30 in the "middle of Phase 2"
21:35:03 <greg-g> ah, I see what you mean, yeah
21:35:49 <greg-g> as we imagined it (correct me if I'm wrong, ostriches ) is that there'd be a period of "build integration and respond to our phase1 users" before starting the phase 2, er phase
21:36:36 <ostriches> phase1 completion requires integration work to be done, yeah.
21:36:55 <robla> so, phase 1.1 ;-)
21:37:24 <greg-g> 1.uhoh? ;)
21:37:58 <greg-g> (the release after 1.0, to fix the inevitable bug you missed, for those that don't get the joke/context)
21:39:45 <robla> Phase 1: hopefully ends 2016-06-30, Phase 1.1: hopefully ends 2016-09-30, Phase 2: hopefully ends by 2016-12-31, Phase 3: hopefully ends by 2017-03-31
21:40:15 * greg-g nods
21:41:00 * robla looks for the RFC number for this RFC
21:41:16 <robla> T119908
21:41:16 <stashbot> T119908: [RfC]: Migrate code review / management to Phabricator from Gerrit
21:41:44 <robla> #info T119908: Phase 1: hopefully ends 2016-06-30, Phase 1.1: hopefully ends 2016-09-30, Phase 2: hopefully ends by 2016-12-31, Phase 3: hopefully ends by 2017-03-31
21:41:44 <stashbot> T119908: [RfC]: Migrate code review / management to Phabricator from Gerrit
21:42:26 <robla> alright should we talk about the other RFCs, or is that one the most interesting to get cleared up?
21:43:04 <greg-g> reminder of other topics:
21:43:24 <greg-g> I think matt_flaschen had a question about T123753
21:43:25 <stashbot> T123753: Establish retrospective reports for #security and #performance incidents
21:43:37 <greg-g> robla: anything else you want/curious about from ostriches and I?
21:43:44 <robla> greg-g: ah, right, thanks for the reminder
21:44:08 * robla doesn't have any followup right now for the Gerrit->Phab stuff
21:44:22 * greg-g nods
21:46:02 <robla> by the way, gwicke, Krinkle , and I discussed putting the mbstring requirement RFC into last call....I'll bring that up after I answer matt_flaschen 's question
21:46:08 <robla> matt_flaschen: your question?
21:47:19 <matt_flaschen> robla, what I mentioned on the task: How would these new retros relate to the Incident reports we have currently?
21:48:41 <robla> matt_flaschen: I'm hoping we figure out some social norms around this
21:48:44 <robla> so...
21:49:51 <robla> what I would envision happening is the WMF Security Team being able to flag things as "this should have a retrospective"
21:50:11 <robla> key word being "should". I don't envision there would be 100% compliance
21:50:13 <Scott_WUaS> Hi Megan!
21:50:37 <robla> (WMF Performance Team would be able to do the same)
21:52:06 <robla> the point would be that it would not be socially ok to create many security issues and never write a retrospective. at the same time, if the WMF Security Team got really fussy, I wouldn't envision there being 100% of the retrospectives written they suggest.
21:52:45 <robla> (same holds true for Performance)
21:53:11 <robla> matt_flaschen: does that make sense?
21:53:21 <matt_flaschen> robla, do you think we should do them at ? Potential advantage: As I mentioned on the task, line between "really bad performance" and "outage" is not always clear cut.
21:53:44 <Krinkle> matt_flaschen: I don't think robla is asking for a duplicative reporting. Take the save-timing regression as example. When this happened it was mostly on the performance team to do the full investigation and (in later stages) (maybe) delegate some actionables to the relevant maintainers of the code in regression.
21:53:58 <Krinkle> That's not a healthy or maintainable way of working.
21:54:54 <matt_flaschen> Krinkle, so are you saying "Incident documentation" should be for documenting the immediate response, and there should be a separate retrospective of the full solution?
21:55:56 <matt_flaschen> If a performance problem could also be considered an outage.
21:56:01 <matt_flaschen> Which depends on the severity.
21:56:20 <Krinkle> I imagine if the regression is result of regular deployment, it is subsequently reverted and the relevant author/merger/maintainer should do the investigation (probably on Phabricator). The deployer (if they notice the regression) could write an immediate response on wikitech, but I'm not sure it's all that useful. It depends on how big/obvious the
21:56:21 <Krinkle> regression is. In most cases (at least until we have better automated measurements) it will be noticed hours/days later, in which case I think using wikitech/incident is overkill.
21:56:49 <Krinkle> matt_flaschen: I agree, but I'd say the severity threshold is at "If the deployer observed it" (in logs/alerts etc.)
21:57:21 <Krinkle> Which will slowly become a lower threshold as our infrastructure improves
21:57:21 <robla> I think the credibility of the Security and Performance teams is tied up in how frequently they suggest postmortems are needed. It's very subjective, and that seems ok to me.
21:57:33 <Krinkle> +1
21:57:53 <gwicke> the issue with many of the big systemic issues is that it would take a lot of time to write a proper description & evaluate possible solutions
21:58:34 <matt_flaschen> Thanks, Krinkle, that answers my question. Basically, do an incident report for severe perf issues (if you notice immediately when deploying), and do a retrospective on Phabricator if the Performance team asks for it (I would add "or if your team thinks it's a good idea").
21:59:06 <robla> by the way, we're coming up on the end of our hour, so I feel bad about ending the official part right on the top of the hour. I may run over a couple minutes, but probably not more
22:00:04 <Krinkle> matt_flaschen: Yeah, I don't think it's worthwhile pursuing a really strict rule that one can autonomously follow. It's mostly a quest to adopt and accept this as a normal social behaviour going forward. And to not interpret it as an assignment of blame.
22:00:22 <robla> #info general discussion, most of the hour on T119908 , and then the end of the hour on T123753
22:00:23 <stashbot> T123753: Establish retrospective reports for #security and #performance incidents
22:00:23 <stashbot> T119908: [RfC]: Migrate code review / management to Phabricator from Gerrit
22:02:02 <robla> #info T129435 ( RFC: drop support for running without mbstring) is going to be heading into last call
22:02:02 <stashbot> T129435: RFC: drop support for running without mbstring
22:02:38 <robla> thanks everyone!
22:02:43 <robla> #endmeeting