Paste P4092


Authored by RobLa-WMF on Sep 21 2016, 10:36 PM.
21:01:41 <robla> #startmeeting ArchCom Meeting about Multi-Content Revisions (T107595)
21:01:41 <wm-labs-meetbot> Meeting started Wed Sep 21 21:01:41 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at
21:01:41 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:41 <wm-labs-meetbot> The meeting name has been set to 'archcom_meeting_about_multi_content_revisions__t107595_'
21:01:41 <stashbot> T107595: [RFC] Multi-Content Revisions -
21:01:42 <DanielK_WMDE> hm, I'm still wondering whether we should go for the details questions first to get stuff done, or the broader questions first, for guidance...
21:02:11 <robla> #topic Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs:
21:02:53 <robla> hi everyone
21:03:08 <DanielK_WMDE> robla: do you think it would be ok to talk about schema details for half an hour, and then cut off and move on to discussing the migration?
21:03:55 <robla> DanielK_WMDE: possibly. what are you hoping we accomplish in today's conversation?
21:04:19 <DanielK_WMDE> 1) sort out the remaining details of what the schema should look like
21:04:35 <DanielK_WMDE> 2) get feedback about whether the migration plan is sane
21:05:21 <Scott_WUaS> (Hello:)
21:06:02 <robla> DanielK_WMDE: I'm assuming we're not ready to actually resolve the schema in the course of this hour though, correct?
21:06:39 <DanielK_WMDE> not as a final decision. i do hope to get opinions on my questions.
21:06:42 <TimStarling> that plan sounds good to me
21:06:52 <DanielK_WMDE> and perhaps even answers :)
21:07:06 <DanielK_WMDE> so, the most important question regarding the schema is whether we should add one layer of indirection, or two. Adding only one layer of indirection means repeating the meta-data about the content of each slot for every revision.
21:07:44 <Scott_WUaS> Can you please post an example URL - re "The idea of this RFC is to allow multiple Content objects to be associated with a single revision (one per "slot"), resulting in multiple content "streams" for each page"? In what ways are Wikidata Q items involved here?
21:07:47 <DanielK_WMDE> Doing it that way keeps the schema simpler, but means a lot of redundant data. The basic schema is then:
21:08:02 <DanielK_WMDE> Scott_WUaS: they are not involved
21:08:22 <Scott_WUaS> Thanks
21:08:24 <DanielK_WMDE> The "basic" version of the schema looks like this:
21:08:26 <DanielK_WMDE> [page] --page_current--> [revision] <--cont_revision-- [content] --cont_address--> (text|external)
21:08:38 <Scott_WUaS> ok
21:09:08 <DanielK_WMDE> As an alternative, we can add another table, the "slot" table, to tell us which content belongs to which revision, so the content-meta-data can be re-used for multiple (typically consecutive) revisions
21:09:44 <DanielK_WMDE> so if we store categories in a separate slot, and the categories are not touched by 10 edits, we would recycle the meta-data about the content of the category slot 10 times.
21:09:51 <DanielK_WMDE> the schema would look like this:
21:09:57 <DanielK_WMDE> [page] --page_current--> [revision] <--slot_revision-- [slots] --slot_content--> [content] --cont_address--> (text|external)
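A minimal sketch of the two-layer layout described above, using an in-memory SQLite database. The names slot_revision, slot_content and cont_address come from the log; rev_id, rev_page, cont_id, cont_role and the "text:NNN" address format are assumptions made only for illustration, not the proposed schema.

```python
import sqlite3

# Names from the log: slot_revision, slot_content, cont_address.
# Hypothetical names: rev_id, rev_page, cont_id, cont_role, "text:NNN" addresses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
CREATE TABLE content (cont_id INTEGER PRIMARY KEY, cont_role TEXT, cont_address TEXT);
CREATE TABLE slots (
    slot_revision INTEGER REFERENCES revision(rev_id),
    slot_content INTEGER REFERENCES content(cont_id),
    PRIMARY KEY (slot_revision, slot_content)
);
""")


def add_content(cont_id, role, address):
    """Store the meta-data row for one content object."""
    conn.execute("INSERT INTO content VALUES (?, ?, ?)", (cont_id, role, address))


def add_revision(rev_id, page_id, content_ids):
    """Create a revision and bind it to its slot contents via the slots table."""
    conn.execute("INSERT INTO revision VALUES (?, ?)", (rev_id, page_id))
    conn.executemany(
        "INSERT INTO slots VALUES (?, ?)", [(rev_id, c) for c in content_ids]
    )


def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Revision 1 has a main slot and a categories slot.
add_content(1, "main", "text:101")
add_content(2, "categories", "text:102")
add_revision(1, 7, [1, 2])

# Revision 2 edits only the main text; the categories content row is reused.
add_content(3, "main", "text:103")
add_revision(2, 7, [3, 2])

content_rows = count("content")  # one row per distinct content object
slot_rows = count("slots")       # one row per (revision, slot) pair
```

With only one layer of indirection, the second revision would need its own copy of the categories meta-data row, so the content table would grow by one row per slot per revision. Here the narrow slots table absorbs that growth instead.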
21:10:10 <TimStarling> I guess we have no jynus this week
21:10:12 <Scott_WUaS> (DanielK_WMDE: Is there an existing example URL which you may develop further?)
21:11:01 <DanielK_WMDE> schema details:
21:11:03 <DanielK_WMDE> #link
21:11:15 <DanielK_WMDE> #link
21:11:31 <Scott_WUaS> thanks
21:11:48 <DanielK_WMDE> TimStarling: looks like it... who else would have an opinion on the schema?
21:12:01 <robla> DanielK_WMDE: is there an asynchronous conversation that is still moving forward?
21:12:20 <DanielK_WMDE> no. not with me anyway
21:12:52 <TimStarling> I can try to be surrogate jynus and raise a few of his points
21:13:19 <brion> great :)
21:13:20 <DanielK_WMDE> TimStarling: that would be helpful.
21:13:24 <robla> my fear is that most of the asynchronous conversation has been in private email. that makes it hard to then hope for a good public IRC conversation
21:13:29 <robla> TimStarling: thanks!
21:13:35 <TimStarling> surrogate jynus says: you want to store media info in a slot. Let's have a media_info table
21:13:47 <brion> yeah need to distill it down, the email convos were pretty high-bandwidth :)
21:13:48 <TimStarling> then that table will be small and easy to handle
21:14:04 <SMalyshev> DanielK_WMDE:I wonder if it's good to hold current and old content in the same place...
21:14:14 <DanielK_WMDE> TimStarling: what would the media_info table contain? the actual json blob?
21:14:17 <TimStarling> in history, present a union between revision and media_info if users really really want that
21:14:34 <TimStarling> unclear
21:14:38 <brion> SMalyshev: that's actually a good point leading -> to ideas about partitioning 'hot' and 'cold' data. for another time probably but we need to be thinking about it at some point
21:14:54 <SMalyshev> if we're already refactoring DB structure...
21:14:59 <DanielK_WMDE> SMalyshev: so far, the answer looks like yes: moving data between tables when the current version becomes an archived version is a major pain.
21:15:10 <tgr> (nitpick: if the slot table is only used as a many-to-many binding between revision and content, can we just call it revision_content? it's hard to keep up with the terminology)
21:15:34 <DanielK_WMDE> SMalyshev: we (tim, mostly) moved main storage away from that 10 years ago, we are now planning to move image meta data away from it too. but it's a possible parameter for partitioning.
21:15:43 <James_F> tgr: I think the idea is that some of the slots are revision_content_derivedcontent though.
21:15:52 <bblack> from my perspective, what I'm really lacking about this MCR thing is any context on its higher-level purpose and utility. All of the details are deep, but no simple big picture about why we're doing this.
21:16:06 <James_F> tgr: E.g. revision 3 -> wikitext -> JSON representation of the template or whatever.
21:16:12 <DanielK_WMDE> tgr: it was called that, I changed it to be in line with the use of "slots" in the conceptual model. i don't care about the name
21:16:27 <brion> bblack: at a high level, we want to be able to break things out of wikitext into structured data that's still atomically versioned with the wikitext
21:16:35 <SMalyshev> DanielK_WMDE: what's the idea behind reusable content? I.e. is that useful for something?
21:16:46 <bblack> brion: higher-level than that :)
21:16:56 <TimStarling> bblack: there's a list of use cases
21:17:00 <brion> :)
21:17:02 <bblack> I mean, wikitext does have some kind of structure. a single content can have internal structure in general
21:17:07 <TimStarling>
21:17:21 <James_F> bblack: "We want to move awat from MW's 1:1 relationship between "page" and "content"."
21:17:24 <James_F> Err. Away.
21:17:24 <DanielK_WMDE> TimStarling: that "unclear" bit is the problem i have with discussing the "store in dedicated table" option. how will the content be versioned?
21:17:47 <DanielK_WMDE> bblack:
21:17:48 <TimStarling> DanielK_WMDE: it would be linked to page and have its own timestamp
21:17:59 <TimStarling> like a clone of revision
21:18:08 <DanielK_WMDE> TimStarling: and its own edit comment, reference to user, and so on?
21:18:09 <James_F> TimStarling: So we'd JOIN on string-matched timestamps?
21:18:13 <TimStarling> yes
21:18:15 <James_F> Eww.
21:18:22 <TimStarling> no
21:18:30 <TimStarling> yes to DanielK_WMDE, no to James_F
21:18:34 <James_F> Ah.
21:18:36 <brion> a related alternative would be to have each 'slot' live in a separate table, but all use the same revision key with metadata in revision. thus text edits would (or could) live in a separate table from revision too
21:18:37 <DanielK_WMDE> TimStarling: so we would duplicate the revision table for each kind of content, and use unions everywhere we want to list revisions?
21:18:39 <James_F> So it would have the revision_id in it?
21:18:46 <brion> but you'd have a consistent revision_id and place to search on
21:19:16 <brion> but there's some benefit in consistency and normalization, especially when we need to bulk-fetch data for dumps or otherwise handle them opaquely
21:19:22 <TimStarling> at the SQL level you'd have several totally distinct revision concepts, like how oldimage and revision are separate now
21:19:34 <DanielK_WMDE> TimStarling: i can't see that working, it sounds hideously complex to me. but maybe i'm just not seeing the elegance of it all.
21:19:35 <robla> #chair robla brion DanielK_WMDE TimStarling
21:19:35 <wm-labs-meetbot> Current chairs: DanielK_WMDE TimStarling brion robla
21:19:37 <TimStarling> at the application layer these may optionally be merged by a UNION
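A rough sketch of what merging separate per-content history tables with a UNION could look like, again in SQLite. The media_info table name comes from the log; every column name (rev_page, rev_timestamp, mi_page, mi_timestamp) and the sample rows are assumptions for illustration only.

```python
import sqlite3

# Two distinct "revision concepts" at the SQL level, each linked to a page
# and carrying its own timestamp. All column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_timestamp TEXT);
CREATE TABLE media_info (mi_id INTEGER PRIMARY KEY, mi_page INTEGER, mi_timestamp TEXT);
INSERT INTO revision VALUES (1, 7, '20160921210000');
INSERT INTO revision VALUES (2, 7, '20160921213000');
INSERT INTO media_info VALUES (1, 7, '20160921211500');
""")


def page_history(page_id):
    """Return (source, timestamp) pairs for one page, newest first."""
    return conn.execute(
        """
        SELECT 'revision' AS source, rev_timestamp AS ts
          FROM revision WHERE rev_page = ?
        UNION ALL
        SELECT 'media_info', mi_timestamp
          FROM media_info WHERE mi_page = ?
        ORDER BY ts DESC
        """,
        (page_id, page_id),
    ).fetchall()


history = page_history(7)
```

Every listing that wants a combined history has to repeat this pattern and agree on a common set of columns, which is the complexity raised in the replies.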
21:19:42 <Scott_WUaS> (what are the implications for multiple languages and translation here in Multi-Content Revisions, if any?)
21:20:01 <DanielK_WMDE> brion: so, have one revision table, but basically one "content" table per slot?
21:20:02 * robla steps afk for 2 minutes
21:20:18 <brion> Scott_WUaS: interesting question. one _could_ store multiple wikitext Content items as well, one per language
21:20:19 <DanielK_WMDE> brion: that's more doable, but still needs big joins or unions.
21:20:23 <James_F> Scott_WUaS: "Complicated". There are options to fundamentally re-work Translate and parallel translation based on MCR, but this is a bit out of scope.
21:20:29 <brion> though i'm not sure it's ideal for the way translations get versioned
21:20:33 <James_F> brion: *cough*DOM-based translation*cough*
21:20:36 <bblack> FWIW, I think most of those use-cases sound like metadata more than parallel alternative content, except for the ones that seem like they could just be separate objects (e.g. template+css), or embedded documentation
21:20:48 <Scott_WUaS> thanks
21:20:59 <brion> bblack: the big reason i want MCR for 'separate objects' is atomic versioning
21:21:11 <TimStarling> having a high-level abstraction in MW around several similar tables is an idea that was mentioned in that book jynus was passing around
21:21:13 <brion> template + css, gadget js+css, etc
21:21:20 <TimStarling> you know, feature table and bug table
21:21:26 <James_F> bblack: File description (wikitext), meta-data (JSON), and file (pointer to the BLOB) versioned together is the ambition.
21:21:30 <DanielK_WMDE> TimStarling, brion: can we assume that the revision or content tables that would exist per slot would all contain *exactly* the same fields?
21:21:42 <TimStarling> no
21:22:05 <brion> i think if we had separate tables they'd explicitly want to be different, otherwise it's only a partitioning mechanism
21:22:15 <brion> but that changes the interfaces
21:22:17 <DanielK_WMDE> brion: that's what i'm thinking
21:22:19 <TimStarling> if they're exactly the same then you have sharding, and jynus doesn't really seem keen on sharding
21:22:22 <DanielK_WMDE> i just don't see how they would be different
21:22:30 <TimStarling> I'll switch back from being pseudo-jynus to TimStarling for a second
21:22:30 <brion> and for data where the structured data would go straight into a table that makes sense
21:22:37 <TimStarling> let's do sharding, I like sharding
21:22:38 <brion> for where everything's a big blob, i don't see the benefit of splitting
21:22:39 <brion> :)
21:22:52 <brion> what's your preferred axis to shard on here tim?
21:22:58 <James_F> TimStarling: Do we have a plan for stopping the current tables from getting "too long" other than sharding? (Ignoring this change, which might make the rate of growth faster.)
21:23:18 <DanielK_WMDE> TimStarling: yes, +1 for sharding/partitioning. let's have an RFC about that
21:23:33 <brion> yups
21:24:23 <TimStarling> well, the existing recentchanges partitioning hack splits on user ID
21:24:27 <bblack> brion: to what level do you expect it to be atomic? you'd still be fetching js+css as 2x http fetches, right? it seems like there are ways to solve the problem of always fetching synced revs of such things simpler...
21:24:31 <brion> (i like the idea of a 'hot'/'cold' separation with a union-like interface, with a consistent revision id lineage so most things won't notice the difference other than potentially issuing two queries and combining them)
21:24:35 <robla> #info discussion of sharding for much of the first part of the meeting
21:24:40 <TimStarling> which optimises for contributions queries
21:24:43 <DanielK_WMDE> brion: re "everything is a big blob": if we want to move away from that, we need a document oriented db. the content models we have would be a pain to model on an rdbms. not to mention that they would create absolutely humongous tables.
21:24:45 <James_F> I've been lazily assuming that at some point we'd shard revision based on something (modulo the page_id?) but I don't know what's ideal.
21:24:52 <brion> bblack: http? oh no i mean inside, like the parser
21:25:02 <brion> or the html that specified which js/css to load
21:25:36 <brion> anyway i think we should address sharding/partitioning later, more explicitly
21:25:37 <DanielK_WMDE> i would prefer to shard by mod(page_id). or timestamp blocks.
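As a small illustration of sharding by mod(page_id), assuming a fixed shard count and a hypothetical "revision_N" naming scheme (neither is specified in the log):

```python
NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


def shard_for_page(page_id: int, num_shards: int = NUM_SHARDS) -> str:
    """Pick the revision shard for a page by taking page_id modulo the shard count."""
    return f"revision_{page_id % num_shards}"


# All revisions of one page map to the same shard, so a page-history query
# touches a single table.
history_shard = shard_for_page(10)

# A user's edits span many pages, so a contributions query may have to visit
# every shard under this scheme.
contribution_shards = sorted({shard_for_page(p) for p in (10, 11, 12, 13)})
```

This keeps history lookups local to one shard, while per-user queries fan out across shards.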
21:25:56 <James_F> Yeah, let's fork that to another RfC.
21:26:12 <TimStarling> one possibility is to duplicate the revision table: once with user-based sharding (for contributions), and again with page/timestamp sharding (for history)
21:26:22 <TimStarling> denormalize the revision table, in other words
21:26:22 <DanielK_WMDE> so, if that's for another rfc, can we move forward with this one?
21:26:44 <brion> bblack: so the alternative to atomic updates of multiple content blobs in one revision is to build another versioning abstraction on top of multiple pages
21:26:59 <brion> bblack: which is certainly possible too
21:27:08 <DanielK_WMDE> TimStarling: basically, duplicate it. yea.
21:27:08 <DanielK_WMDE> so, key question: is it ok to maintain the meta-data for all slot content in a single table?
21:27:23 <DanielK_WMDE> with sharding to be discussed?
21:27:44 <TimStarling> I think the key question is project order: does sharding/partitioning block MCR?
21:27:45 <bblack> brion: or question why we're trying to version-sync css+js inside wiki articles in the first place...
21:27:51 <brion> DanielK_WMDE: i say yes, as long as we keep it compact and have a future plan to shard that won't explode based on our changes :D
21:28:12 <brion> bblack: well "scratch mediawiki, just use github" is a third option ;)
21:28:30 <DanielK_WMDE> TimStarling: that's also an important question, yes, though i think we can decide on the schema without knowing whether implementation is blocked on sharding
21:28:34 <TimStarling> I suspect jynus is on the verge of vetoing MCR until we have better scalability
21:28:43 <brion> it seems to be ok to have _lots of rows_ (tall tables) as long as those table rows are small (narrow)
21:29:09 <TimStarling> data size is a relevant metric, yes
21:29:16 <DanielK_WMDE> TimStarling: i'm fine with him vetoing implementation on these grounds. but i need to know whether and how i should change the design.
21:29:36 <DanielK_WMDE> implementation on the cluster = deployment
21:29:38 <TimStarling> for example, you have to copy all the data in a table during ALTER TABLE, and that is becoming a problem
21:29:54 <TimStarling> remember it was a problem in the olden days too
21:30:09 <bblack> brion: or any of the thousands of saner ways to develop->deploy css and js than "do it inside the wiki it's meant to operate on, shoe-horning it in as if it's like article content, and then remodel the wiki software to support that use case poorly"
21:31:01 <DanielK_WMDE> bblack: if you want it to be user-maintained, i don't really see an alternative. but the css/js use case isn't really at the focus of this.
21:31:06 <bblack> (not entirely fair, but as fair as your github retort)
21:31:22 <brion> bblack: oh sure, you're not wrong. :) there's tradeoffs in all these directions
21:31:41 <brion> and honestly using a git-oriented backend for code? not an awful idea at all
21:31:53 <DanielK_WMDE> i'm still trying to find out whether i can go ahead with implementing the revision<-slot->content schema
21:32:04 <James_F> brion: It's on the backlog. Let's not get further distracted from the RfC. ;-)
21:32:07 <brion> but even if we broke out gadgets/userscripts we've got these on-wiki data objects :D
21:32:10 <brion> yep
21:32:12 <DanielK_WMDE> or whether all work on this needs to rest until we have an rfc on optimizing revision storage & sharding
21:32:24 <TimStarling> I don't see how you can implement it if you can't deploy it
21:32:25 <DanielK_WMDE> or whether there is a concrete request to change the db schema i propose
21:32:35 <SMalyshev> I get an impression that jynus has to answer that :)
21:33:17 <brion> jynus is always reluctant to use the veto power we keep wanting to give him :)
21:33:18 <brion> be gentle
21:33:27 <DanielK_WMDE> TimStarling: we can get the code ready for deployment while we are also working on, or deciding on, optimization strategies for revision storage.
21:33:37 <TimStarling> I don't think we're going to get on board with jynus's idea of splitting the revision concept
21:33:54 <TimStarling> but I think we should work by consensus
21:34:03 <brion> *nod*
21:34:24 <robla> is jynus's idea spelled out somewhere?
21:34:36 <DanielK_WMDE> so if we want consensus but won't get on board with his idea, then we need to convince him?...
21:34:45 <brion> we've got some bits of discussions, no concrete alt proposal
21:34:59 <TimStarling> robla: no, not really, he was reluctant to dive in and do fully worked schema
21:35:12 <TimStarling> DanielK_WMDE: right
21:35:35 <DanielK_WMDE> i have tried and failed
21:36:38 <robla> DanielK_WMDE: I think one thing that may be slowing this conversation down is it getting too bogged down in details
21:36:59 <robla> there's a *lot* to sort through here:
21:37:04 <TimStarling> I don't want to get into detail about tactics in this discussion
21:37:40 <TimStarling> how would it work to implement it but not deploy it? would you be able to have a feature flag in MW? or would it have to be a branch?
21:37:52 <DanielK_WMDE> robla: yes, that's why I announced only the schema bit as today's topic:
21:37:58 <James_F> Branch or unmerged commit.
21:38:21 <DanielK_WMDE> robla: that's already quite a bit, but I think it is manageable.
21:38:32 <bblack> where are we meant to have the bigger discussion? I just don't get architecting the details before having some consensus that this is the right model for some real use-cases. The use-cases section mentions its own speculative nature, many of them are more metadata than parallel separate content, which is an entirely simpler case to handle. the rest are questionable, IMHO...
21:38:58 <bblack> maybe that's for my lack of information, but still
21:39:01 <DanielK_WMDE> TimStarling: We will need feature flags for the migration/transition anyway. So, yes.
21:39:22 <TimStarling> it would be nice to have say two initial use cases which will be initially implemented
21:39:31 <DanielK_WMDE> TimStarling: hopefully, if/then/else cruft can be kept to a minimum by swapping in alternative implementations of the relevant components.
21:40:13 * brion hmms
21:40:18 <DanielK_WMDE> TimStarling: the first two in the list: MediaInfo and PageAssessments.
21:40:29 <James_F> That could work.
21:40:39 <DanielK_WMDE> MassMessage is also a hot candidate I think
21:40:44 <James_F> And TemplateData. ;-)
21:40:55 <James_F> (As it's so simple.)
21:41:00 <brion> ok i think i'm going to try fleshing out an alt proposal along some, but not all, of jynus and surrogate-jynus's lines, and we can just compare that
21:41:06 <TimStarling> presumably we will have an MCR-aware API, and all the if/else will be in the implementation of that API
21:41:08 <brion> it'll be good to have some key use cases to go along with that
21:41:29 <TimStarling> RevisionLookup
21:41:33 <DanielK_WMDE> bblack: if it's editable and versioned, it's not meta-data
21:41:50 <brion> cause if we do concentrate on cases where the secondary slots are special kinds of data, maybe extra tables aren't too awful. but maybe they are ;)
21:42:02 <DanielK_WMDE> TimStarling: yes, exactly
21:42:34 <TimStarling> maybe we should start moving towards rev_id being opaque rather than an auto-increment integer
21:42:55 <AaronSchulz> but still an integer?
21:42:57 <DanielK_WMDE> brion: i'm not thinking of secondary (derived) slots any more. just primary user editable content.
21:43:06 <TimStarling> a UUID might make more sense if it is sharded
21:43:07 <brion> right, sorry wrong term :)
21:43:13 <brion> i mean non-main-wikitext slots
21:43:25 <TimStarling> but yes, still an integer initially
21:43:30 <TimStarling> but maybe type-hinted as a string
21:43:34 <DanielK_WMDE> TimStarling: or a time-uuid. gabriel loves those.
21:43:38 <brion> TimStarling: for multi-master insert that can be important yes
21:43:48 <DanielK_WMDE> But they are big. We are trying to make that table smaller, right?
21:43:58 <brion> bigints are smaller :)
21:44:00 <DanielK_WMDE> (but we are discussing the revision table again)
21:44:02 <brion> i will just warn about Bigints and the JavaScript/node 53-bit limit though
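The 53-bit limit mentioned above comes from JavaScript numbers being IEEE 754 doubles. Python floats use the same representation, so the effect can be sketched as follows; the helper name and sample values are made up for the example.

```python
# Largest integer that a double can represent without gaps between neighbours
# (the value JavaScript exposes as Number.MAX_SAFE_INTEGER).
MAX_SAFE_INTEGER = 2**53 - 1


def survives_double_round_trip(n: int) -> bool:
    """True if n is unchanged after being stored as an IEEE 754 double."""
    return int(float(n)) == n


small_id_ok = survives_double_round_trip(MAX_SAFE_INTEGER)

# 2**53 + 1 has no exact double representation and is rounded to a neighbour.
just_over_ok = survives_double_round_trip(2**53 + 1)

# A full-width signed 64-bit identifier is affected in the same way.
wide_id_ok = survives_double_round_trip(2**63 - 1)
```

An ID that exceeds this range would need to travel as a string in JSON for clients that parse numbers as doubles.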
21:44:16 <tgr> brion: extra tables would need some PHP-layer abstraction on top of our current DB abstraction, for all code that needs to search or iterate all content. That seems scary.
21:44:29 <DanielK_WMDE> tgr: very.
21:44:30 <AaronSchulz> reminds me of
21:44:45 * AaronSchulz almost forgot about that, haha
21:44:49 <brion> tgr: yeah, at least some would need to add to the tables joined on things. others would not actually need to touch those tables, though, and would only care about what's in revision i think
21:44:50 <TimStarling> brion: you are worried that we will exceed 2^53 rows in a table? ;)
21:45:07 <AaronSchulz> (of course half of that was wild experimentation that would never be used)
21:45:12 <brion> depends how fine-grained we make editing ;)
21:45:16 <James_F> AaronSchulz: PTSD flashbacks to that code? ;-)
21:45:17 <DanielK_WMDE> AaronSchulz: that's basically home grown partitioning, right?
21:45:26 <SMalyshev> non-sequential revids may be problematic as it'd be impossible to know the order
21:45:37 <brion> mainly i was thinking if we do something clever like a 64-bit mini uuid
21:46:02 <DanielK_WMDE> i'm getting worried that I'm stranded with this with no way to actively move forward.
21:46:11 <brion> yeah :(
21:47:00 <DanielK_WMDE> can i at least get some feedback on "super tall content table" vs "not-so-tall content table + super tall slots table"?
21:47:17 <brion> i am strongly in favor of super tall slots table
21:47:22 <SMalyshev> I like the second one better
21:47:23 <DanielK_WMDE> as in
21:47:24 <brion> lets us keep the content table much smaller
21:47:26 <TimStarling> bblack: maybe you can discuss your concerns on ?
21:47:43 <SMalyshev> if we're going to have huge table, it's better to have it as "narrow" as possible
21:47:46 <tgr> basically this proposal is blocked on deciding how to handle very tall tables, which is something that needs to be decided soon anyway, right?
21:48:00 <tgr> so maybe just give up for now and make that decision happen as soon as possible?
21:48:01 <DanielK_WMDE> ok. how about I work on some strawman code that allows us to look at the schema with some data in it, maybe on labs?
21:48:13 <DanielK_WMDE> would that help, or would it be a waste of time?
21:48:14 <subbu> is there a wiki page / talk page / phab task that discusses ops concerns with the MCR proposal?
21:48:30 <robla> subbu: I'm not aware of any
21:48:43 <DanielK_WMDE> tgr: i'm not sure there is a generic answer to that question. it may very much depend on the table.
21:49:41 <DanielK_WMDE> subbu: there is one comment by jynus:
21:49:58 <DanielK_WMDE> i frankly can't extract much guidance from it
21:50:56 <TimStarling> "I will create an alternative one" -- maybe we just need to nag jynus to write that
21:51:03 <robla> thanks for the refresher about the link, DanielK_WMDE
21:51:16 <DanielK_WMDE> #link
21:51:27 <DanielK_WMDE> TimStarling: please do, i'm quite curious
21:52:01 <robla> well, on the tactics front, I'm hoping that ArchCom doesn't become NagCom ;-)
21:52:08 <brion> heh
21:52:21 <DanielK_WMDE> or ArgCom...
21:52:47 <robla> I think it may be a useful conversation starter to *attempt* to come up with what jynus is shooting for
21:53:00 <AaronSchulz> DanielK_WMDE: I'm up for discussing partitioning, since I still remember thinking about that a lot in the past. My inclination is tall-and-narrow metadata => sharded blobs though
21:53:21 <DanielK_WMDE> robla: i honestly can't imagine how it would work. if i could, i would have proposed it.
21:53:50 <DanielK_WMDE> AaronSchulz: yes, i'm with you there. And I also think we should discuss sharding.
21:53:53 <TimStarling> I mentioned some ideas about making revision narrower, jynus was receptive to those
21:54:02 <DanielK_WMDE> AaronSchulz: who's going to drive that conversation?
21:54:40 <TimStarling> like splitting out rev_comment, you know we have a bug to make rev_comment be larger than 255 bytes
21:54:56 * AaronSchulz shrugs...probably would be good to know about what parameters jynus wants
21:55:00 <brion> yeah rev_comment and rev_user_text are easy wins
21:55:03 <Scott_WUaS> (Hoping all can keep this helpful conversation going - )
21:55:43 <DanielK_WMDE> AaronSchulz: yes, that would be good
21:56:04 <DanielK_WMDE> TimStarling, brion: so, who's going to write an rfc about optimizing row size in revision?
21:56:18 <James_F> TimStarling: Would we make rev_comment just another slot?
21:56:41 <brion> i can do that if TimStarling isn't excited about it, we have some good ideas from last week's offline discussion
21:57:13 * brion compacts ALL the rows!
21:57:19 <DanielK_WMDE> brion: yay :)
21:57:27 <TimStarling> ok brion, compact away, I will comment on it
21:57:31 <DanielK_WMDE> i'm happy to help and give input, but i don't see me driving this
21:57:33 <DanielK_WMDE> too much on my plate
21:57:51 <brion> no worries
21:57:55 <DanielK_WMDE> the problem with pausing MCR is: i have made room for this in my schedule *now*
21:57:56 <robla> #info 14:55:00 <brion> yeah rev_comment and rev_user_text are easy wins
21:58:06 <DanielK_WMDE> if we drop this for 3 months, I have *no* idea when i can get back on working on it
21:58:11 <brion> great i'll write those up next couple days
21:58:15 <DanielK_WMDE> it also pushes back the schedule for structured commons
21:58:31 <SMalyshev> do we need MCR for structured commons?
21:58:39 <Scott_WUaS> (Thanks, All!)
21:58:51 <SMalyshev> I mean need like "no way we can do structured commons without it"?
21:58:56 <marktraceur> It would be super if we could not delay that again...
21:59:06 <DanielK_WMDE> #info re "super tall content table" vs "not-so-tall content table + super tall slots table": <brion> i am strongly in favor of super tall slots table <SMalyshev> I like the second one better
21:59:43 <brion> #info brion to write up additional RfC on compacting rows in revision table (should apply with or without MCR)
22:00:06 <robla> ok...should we end the official part of this meeting on that?
22:00:09 <DanielK_WMDE> brion: will partitioning be part of that?
22:00:23 * robla plans to hit #endmeeting in 120 seconds
22:00:36 <DanielK_WMDE> SMalyshev: pretty much, yes.
22:00:41 <brion> DanielK_WMDE: not explicitly but i'll mention some related concerns
22:01:06 <brion> can expand to that if we decide we must super-prioritize it
22:01:12 <DanielK_WMDE> SMalyshev: at least if we want to stick to the product requirements as set out by the WMF back in the day.
22:01:17 <robla> brion, thanks for taking that on!
22:01:36 <TimStarling> DanielK_WMDE: well, you say you can implement it with a feature switch, which should be relatively uncontroversial
22:01:36 <brion> :D
22:01:36 <subbu> so, reading that talk page topic, iiuc, jynus is objecting to using a single unified table for all slots and prefers different tables for different slots?
22:02:09 <robla> we can continue the conversation in #wikimedia-tech for those that want to
22:02:27 <robla> thanks all!
22:02:32 <robla> #endmeeting