
ArchCom RFC Meeting W38: Multi-Content Revisions (2016-09-21, #wikimedia-office)
Active, Public

Hosted by daniel on Sep 21 2016, 9:00 PM - 10:00 PM.

Description

Agenda

  • Location: #wikimedia-office IRC channel
  • Meeting type: TBD
  • Time: Weekly, Wednesday 21:00 UTC (2pm PDT, 23:00 CEST)
  • Topic:
    • T107595 - Multi-Content Revisions

This meeting is mainly about the Content Meta-Data Storage for MCR.

Detail questions:

  • Do we want a single "names" table, or separate tables for different kinds of names, i.e. content_model, content_format, etc?
  • Can we drop cont_hash (or cont_sha1) and cont_logical_size?
  • Do we re-use or copy content rows?
  • If we re-use, does the role live in the content or in the slot label?
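To make the re-use question concrete, here is a minimal sketch of the "revision <- slots -> content" layout, assuming SQLite as a stand-in database and heavily simplified tables. Only slot_revision, slot_content and cont_address are names taken from the proposal; rev_page, cont_id, slot_role and the address strings are hypothetical. Placing the role in the slots table is just one of the two options asked about above, not a decision.

```python
import sqlite3


def build_demo(edits=10):
    """Simulate `edits` revisions of one page whose category slot never changes."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        -- Hypothetical, simplified tables (not the proposed production schema).
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
        CREATE TABLE content  (cont_id INTEGER PRIMARY KEY, cont_address TEXT);
        CREATE TABLE slots (
            slot_revision INTEGER,
            slot_role     TEXT,     -- assumption: the role lives in the slot row
            slot_content  INTEGER,
            PRIMARY KEY (slot_revision, slot_role)
        );
        """
    )
    # The category content is written once and re-used by every revision.
    cat_id = db.execute(
        "INSERT INTO content (cont_address) VALUES (?)", ("text:categories-v1",)
    ).lastrowid
    for n in range(1, edits + 1):
        rev_id = db.execute("INSERT INTO revision (rev_page) VALUES (1)").lastrowid
        # The main slot changes on every edit, so it gets a fresh content row.
        main_id = db.execute(
            "INSERT INTO content (cont_address) VALUES (?)", (f"text:main-v{n}",)
        ).lastrowid
        db.executemany(
            "INSERT INTO slots VALUES (?, ?, ?)",
            [(rev_id, "main", main_id), (rev_id, "categories", cat_id)],
        )
    return db


def row_counts(db):
    """Return the number of rows in each of the three demo tables."""
    return {
        table: db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in ("revision", "content", "slots")
    }
```

In this toy setup the slots table grows by two narrow rows per edit, while the content table grows by one row per edit plus a single shared category row. Copying content rows instead would add two content rows per edit.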

Broader questions:

  • Are the scaling and efficiency estimates correct?
  • What options do we have for optimization?
  • Will the proposed migration plan work?

Meeting summary

Meeting ended at 22:02:32 UTC.

People present (lines said)

  • DanielK_WMDE (92)
  • brion (61)
  • TimStarling (52)
  • robla (25)
  • James_F (21)
  • bblack (10)
  • Scott_WUaS (10)
  • SMalyshev (9)
  • AaronSchulz (6)
  • tgr (4)
  • wm-labs-meetbot (4)
  • wm-labs-meetbot` (4)
  • subbu (2)
  • stashbot (1)
  • marktraceur (1)

Log

21:01:41 <robla> #startmeeting ArchCom Meeting about Multi-Content Revisions (T107595)
21:01:41 <wm-labs-meetbot> Meeting started Wed Sep 21 21:01:41 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:41 <wm-labs-meetbot> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:41 <wm-labs-meetbot> The meeting name has been set to 'archcom_meeting_about_multi_content_revisions__t107595_'
21:01:41 <stashbot> T107595: [RFC] Multi-Content Revisions - https://phabricator.wikimedia.org/T107595
21:01:41 <wm-labs-meetbot`> Meeting started Wed Sep 21 21:01:41 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:41 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:41 <wm-labs-meetbot`> The meeting name has been set to 'archcom_meeting_about_multi_content_revisions__t107595_'
21:01:42 <DanielK_WMDE> hm, I'm still wondering whether we should go for the detail questions first to get stuff done, or the broader questions first, for guidance...
21:02:11 <robla> #topic Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: https://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:02:53 <robla> hi everyone
21:03:08 <DanielK_WMDE> robla: do you think it would be ok to talk about schema details for half an hour, and then cut off and move on to discussing the migration?
21:03:55 <robla> DanielK_WMDE: possibly. what are you hoping we accomplish in today's conversation?
21:04:19 <DanielK_WMDE> 1) sort out the remaining details of what the schema should look like
21:04:35 <DanielK_WMDE> 2) get feedback about whether the migration plan is sane
21:05:21 <Scott_WUaS> (Hello:)
21:06:02 <robla> DanielK_WMDE: I'm assuming we're not ready to actually resolve the schema in the course of this hour though, correct?
21:06:39 <DanielK_WMDE> not as a final decision. i do hope to get opinions on my questions.
21:06:42 <TimStarling> that plan sounds good to me
21:06:52 <DanielK_WMDE> and perhaps even answers :)
21:07:06 <DanielK_WMDE> so, the most important question regarding the schema is whether we should add one layer of indirection, or two. Adding only one layer of indirection means repeating the meta-data about the content of each slot for every revision.
21:07:44 <Scott_WUaS> Can you please post an example URL - re "The idea of this RFC is to allow multiple Content objects to be associated with a single revision (one per "slot"), resulting in multiple content "streams" for each page"? In what ways are Wikidata Q items involved here?
21:07:47 <DanielK_WMDE> Doing it that way keeps the schema simpler, but means a lot of redundant data. The basic schema is then:
21:08:02 <DanielK_WMDE> Scott_WUaS: they are not involved
21:08:22 <Scott_WUaS> Thanks
21:08:24 <DanielK_WMDE> The "basic" version of the schema looks like this:
21:08:26 <DanielK_WMDE> [page] --page_current--> [revision] <--cont_revision-- [content] --cont_address--> (text|external)
21:08:38 <Scott_WUaS> ok
21:09:08 <DanielK_WMDE> As an alternative, we can add another table, the "slot" table, to tell us which content belongs to which revision, so the content-meta-data can be re-used for multiple (typically consecutive) revisions
21:09:44 <DanielK_WMDE> so if we store categories in a separate slot, and the categories are not touched by 10 edits, we would recycle the meta-data about the content of the category slot 10 times.
21:09:51 <DanielK_WMDE> the schema would look like this:
21:09:57 <DanielK_WMDE> [page] --page_current--> [revision] <--slot_revision-- [slots] --slot_content--> [content] --cont_address--> (text|external)
21:10:10 <TimStarling> I guess we have no jynus this week
21:10:12 <Scott_WUaS> (DanielK_WMDE: Is there an existing example URL which you may develop further?)
21:11:01 <DanielK_WMDE> schema details: https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data#Database_Schema
21:11:03 <DanielK_WMDE> #link https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data#Database_Schema
21:11:15 <DanielK_WMDE> #link https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data#Re-using_Content_Rows
21:11:31 <Scott_WUaS> thanks
21:11:48 <DanielK_WMDE> TimStarling: looks like it... who else would have an opinion on the schema?
21:12:01 <robla> DanielK_WMDE: is there an asynchronous conversation that is still moving forward?
21:12:20 <DanielK_WMDE> no. not with me anyway
21:12:52 <TimStarling> I can try to be surrogate jynus and raise a few of his points
21:13:19 <brion> great :)
21:13:20 <DanielK_WMDE> TimStarling: that would be helpful.
21:13:24 <robla> my fear is that most of the asynchronous conversation has been in private email. that makes it hard to then hope for a good public IRC conversation
21:13:29 <robla> TimStarling: thanks!
21:13:35 <TimStarling> surrogate jynus says: you want to store media info in a slot. Let's have a media_info table
21:13:47 <brion> yeah need to distill it down, the email convos were pretty high-bandwidth :)
21:13:48 <TimStarling> then that table will be small and easy to handle
21:14:04 <SMalyshev> DanielK_WMDE: I wonder if it's good to hold current and old content in the same place...
21:14:14 <DanielK_WMDE> TimStarling: what would the media_info table contain? the actual json blob?
21:14:17 <TimStarling> in history, present a union between revision and media_info if users really really want that
21:14:34 <TimStarling> unclear
21:14:38 <brion> SMalyshev: that's actually a good point leading -> to ideas about partitioning 'hot' and 'cold' data. for another time probably but we need to be thinking about it at some point
21:14:54 <SMalyshev> if we're already refactoring DB structure...
21:14:59 <DanielK_WMDE> SMalyshev: so far, the answer looks like yes: moving data between tables when the current version becomes an archived version is a major pain.
21:15:10 <tgr> (nitpick: if the slot table is only used as a many-to-many binding between revision and content, can we just call it revision_content? it's hard to keep up with the terminology)
21:15:34 <DanielK_WMDE> SMalyshev: we (tim, mostly) moved main storage away from that 10 years ago, we are now planning to move image meta data away from it too. but it's a possible parameter for partitioning.
21:15:43 <James_F> tgr: I think the idea is that some of the slots are revision_content_derivedcontent though.
21:15:52 <bblack> from my perspective, what I'm really lacking about this MCR thing is any context on its higher-level purpose and utility. All of the details are deep, but no simple big picture about why we're doing this.
21:16:06 <James_F> tgr: E.g. revision 3 -> wikitext -> JSON representation of the template or whatever.
21:16:12 <DanielK_WMDE> tgr: it was called that, I changed it to be in line with the use of "slots" in the conceptual model. i don't care about the name
21:16:27 <brion> bblack: at a high level, we want to be able to break things out of wikitext into structured data that's still atomically versioned with the wikitext
21:16:35 <SMalyshev> DanielK_WMDE: what's the idea behind reusable content? I.e. is that useful for something?
21:16:46 <bblack> brion: higher-level than that :)
21:16:56 <TimStarling> bblack: there's a list of use cases
21:17:00 <brion> :)
21:17:02 <bblack> I mean, wikitext does have some kind of structure. a single content can have internal structure in general
21:17:07 <TimStarling> https://www.mediawiki.org/wiki/Multi-Content_Revisions#Use_Cases
21:17:21 <James_F> bblack: "We want to move awat from MW's 1:1 relationship between "page" and "content"."
21:17:24 <James_F> Err. Away.
21:17:24 <DanielK_WMDE> TimStarling: that "unclear" bit is the problem i have with discussing the "store in dedicated table" option. how will the content be versioned?
21:17:47 <DanielK_WMDE> bblack: https://www.mediawiki.org/wiki/Multi-Content_Revisions#Use_Cases
21:17:48 <TimStarling> DanielK_WMDE: it would be linked to page and have its own timestamp
21:17:59 <TimStarling> like a clone of revision
21:18:08 <DanielK_WMDE> TimStarling: and its own edit comment, reference to user, and so on?
21:18:09 <James_F> TimStarling: So we'd JOIN on string-matched timestamps?
21:18:13 <TimStarling> yes
21:18:15 <James_F> Eww.
21:18:22 <TimStarling> no
21:18:30 <TimStarling> yes to DanielK_WMDE, no to James_F
21:18:34 <James_F> Ah.
21:18:36 <brion> a related alternative would be to have each 'slot' live in a separate table, but all use the same revision key with metadata in revision. thus text edits would (or could) live in a separate table from revision too
21:18:37 <DanielK_WMDE> TimStarling: so we would duplicate the revision table for each kind of content, and use unions everywhere we want to list revisions?
21:18:39 <James_F> So it would have the revision_id in it?
21:18:46 <brion> but you'd have a consistent revision_id and place to search on
21:19:16 <brion> but there's some benefit in consistency and normalization, especially when we need to bulk-fetch data for dumps or otherwise handle them opaquely
21:19:22 <TimStarling> at the SQL level you'd have several totally distinct revision concepts, like how oldimage and revision are separate now
21:19:34 <DanielK_WMDE> TimStarling: i can't see that working, it sounds hideously complex to me. but maybe i'm just not seeing the elegance of it all.
21:19:35 <robla> #chair robla brion DanielK_WMDE TimStarling
21:19:35 <wm-labs-meetbot> Current chairs: DanielK_WMDE TimStarling brion robla
21:19:35 <wm-labs-meetbot`> Current chairs: DanielK_WMDE TimStarling brion robla
21:19:37 <TimStarling> at the application layer these may optionally be merged by a UNION
21:19:42 <Scott_WUaS> (what are the implications for multiple languages and translation here in Multi-Content Revisions, if any?)
21:20:01 <DanielK_WMDE> brion: so, have one revision table, but basically one "content" table per slot?
21:20:02 * robla steps afk for 2 minutes
21:20:18 <brion> Scott_WUaS: interesting question. one _could_ store multiple wikitext Content items as well, one per language
21:20:19 <DanielK_WMDE> brion: that's more doable, but still needs big joins or unions.
21:20:23 <James_F> Scott_WUaS: "Complicated". There are options to fundamentally re-work Translate and parallel translation based on MCR, but this is a bit out of scope.
21:20:29 <brion> though i'm not sure it's ideal for the way translations get versioned
21:20:33 <James_F> brion: *cough*DOM-based translation*cough*
21:20:36 <bblack> FWIW, I think most of those use-cases sound like metadata more than parallel alternative content, except for the ones that seem like they could just be separate objects (e.g. template+css), or embedded documentation
21:20:48 <Scott_WUaS> thanks
21:20:59 <brion> bblack: the big reason i want MCR for 'separate objects' is atomic versioning
21:21:11 <TimStarling> having a high-level abstraction in MW around several similar tables is an idea that was mentioned in that book jynus was passing around
21:21:13 <brion> template + css, gadget js+css, etc
21:21:20 <TimStarling> you know, feature table and bug table
21:21:26 <James_F> bblack: File description (wikitext), meta-data (JSON), and file (pointer to the BLOB) versioned together is the ambition.
21:21:30 <DanielK_WMDE> TimStarling, brion: can we assume that the revision or content tables that would exist per slot would all contain *exactly* the same fields?
21:21:42 <TimStarling> no
21:22:05 <brion> i think if we had separate tables they'd explicitly want to be different, otherwise it's only a partitioning mechanism
21:22:15 <brion> but that changes the interfaces
21:22:17 <DanielK_WMDE> brion: that's what i'm thinking
21:22:19 <TimStarling> if they're exactly the same then you have sharding, and jynus doesn't really seem keen on sharding
21:22:22 <DanielK_WMDE> i just don't see how they would be different
21:22:30 <TimStarling> I'll switch back from being pseudo-jynus to TimStarling for a second
21:22:30 <brion> and for data where the structured data would go straight into a table that makes sense
21:22:37 <TimStarling> let's do sharding, I like sharding
21:22:38 <brion> for where everything's a big blob, i don't see the benefit of splitting
21:22:39 <brion> :)
21:22:52 <brion> what's your preferred axis to shard on here tim?
21:22:58 <James_F> TimStarling: Do we have a plan for stopping the current tables from getting "too long" other than sharding? (Ignoring this change, which might make the rate of growth faster.)
21:23:18 <DanielK_WMDE> TimStarling: yes, +1 for sharding/partitioning. let's have an RFC about that
21:23:33 <brion> yups
21:24:23 <TimStarling> well, the existing recentchanges partitioning hack splits on user ID
21:24:27 <bblack> brion: to what level do you expect it to be atomic? you'd still be fetching js+css as 2x http fetches, right? it seems like there are ways to solve the problem of always fetching synced revs of such things simpler...
21:24:31 <brion> (i like the idea of a 'hot'/'cold' separation with a union-like interface, with a consistent revision id lineage so most things won't notice the difference other than potentially issuing two queries and combining them)
21:24:35 <robla> #info discussion of sharding for much of the first part of the meeting
21:24:40 <TimStarling> which optimises for contributions queries
21:24:43 <DanielK_WMDE> brion: re "everything is a big blob": if we want to move away from that, we need a document oriented db. the content models we have would be a pain to model on an rdbms. not to mention that they would create absolutely humongous tables.
21:24:45 <James_F> I've been lazily assuming that at some point we'd shard revision based on something (modulo the page_id?) but I don't know what's ideal.
21:24:52 <brion> bblack: http? oh no i mean inside, like the parser
21:25:02 <brion> or the html that specified which js/css to load
21:25:36 <brion> anyway i think we should address sharding/partitioning later, more explicitly
21:25:37 <DanielK_WMDE> i would prefer to shard by mod(page_id). or timestamp blocks.
21:25:56 <James_F> Yeah, let's fork that to another RfC.
21:26:12 <TimStarling> one possibility is to duplicate the revision table: once with user-based sharding (for contributions), and again with page/timestamp sharding (for history)
21:26:22 <TimStarling> denormalize the revision table, in other words
21:26:22 <DanielK_WMDE> so, if that's for another rfc, can we move forward with this one?
21:26:44 <brion> bblack: so the alternative to atomic updates of multiple content blobs in one revision is to build another versioning abstraction on top of multiple pages
21:26:59 <brion> bblack: which is certainly possible too
21:27:08 <DanielK_WMDE> TimStarling: basically, duplicate it. yea.
21:27:23 <DanielK_WMDE> so, key question: is it ok to maintain the meta-data for all slot content in a single table?
21:27:35 <DanielK_WMDE> with sharding to be discussed?
21:27:44 <TimStarling> I think the key question is project order: does sharding/partitioning block MCR?
21:27:45 <bblack> brion: or question why we're trying to version-sync css+js inside wiki articles in the first place...
21:27:51 <brion> DanielK_WMDE: i say yes, as long as we keep it compact and have a future plan to shard that won't explode based on our changes :D
21:28:12 <brion> bblack: well "scratch mediawiki, just use github" is a third option ;)
21:28:30 <DanielK_WMDE> TimStarling: that's also an important question, yes, though i think we can decide on the schema without knowing whether implementation is blocked on sharding
21:28:34 <TimStarling> I suspect jynus is on the verge of vetoing MCR until we have better scalability
21:28:43 <brion> it seems to be ok to have _lots of rows_ (tall tables) as long as those table rows are small (narrow)
21:29:09 <TimStarling> data size is a relevant metric, yes
21:29:16 <DanielK_WMDE> TimStarling: i'm fine with him vetoing implementation on these grounds. but i need to know whether and how i should change the design.
21:29:36 <DanielK_WMDE> implementation on the cluster = deployment
21:29:38 <TimStarling> for example, you have to copy all the data in a table during ALTER TABLE, and that is becoming a problem
21:29:54 <TimStarling> remember it was a problem in the olden days too
21:30:09 <bblack> brion: or any of the thousands of saner ways to develop->deploy css and js than "do it inside the wiki it's meant to operate on, shoe-horning it in as if it's like article content, and then remodel the wiki software to support that use case poorly"
21:31:01 <DanielK_WMDE> bblack: if you want it to be user-maintained, i don't really see an alternative. but the css/js use case isn't really at the focus of this.
21:31:06 <bblack> (not entirely fair, but as fair as your github retort)
21:31:22 <brion> bblack: oh sure, you're not wrong. :) there's tradeoffs in all these directions
21:31:41 <brion> and honestly using a git-oriented backend for code? not an awful idea at all
21:31:53 <DanielK_WMDE> i'm still trying to find out whether i can go ahead with implementing the revision<-slot->content schema
21:32:04 <James_F> brion: It's on the backlog. Let's not get further distracted from the RfC. ;-)
21:32:07 <brion> but even if we broke out gadgets/userscripts we've got these on-wiki data objects :D
21:32:10 <brion> yep
21:32:12 <DanielK_WMDE> or whether all work on this needs to rest until we have an rfc on optimizing revision storage & sharding
21:32:24 <TimStarling> I don't see how you can implement it if you can't deploy it
21:32:25 <DanielK_WMDE> or whether there is a concrete request to change the db schema i propose
21:32:35 <SMalyshev> I get an impression that jynus has to answer that :)
21:33:17 <brion> jynus is always reluctant to use the veto power we keep wanting to give him :)
21:33:18 <brion> be gentle
21:33:27 <DanielK_WMDE> TimStarling: we can get the code ready for deployment while we are also working on, or deciding on, optimization strategies for revision storage.
21:33:37 <TimStarling> I don't think we're going to get on board with jynus's idea of splitting the revision concept
21:33:54 <TimStarling> but I think we should work by consensus
21:34:03 <brion> *nod*
21:34:24 <robla> is jynus's idea spelled out somewhere?
21:34:36 <DanielK_WMDE> so if we want consensus but won't get on board with his idea, then we need to convince him?...
21:34:45 <brion> we've got some bits of discussions, no concrete alt proposal
21:34:59 <TimStarling> robla: no, not really, he was reluctant to dive in and do fully worked schema
21:35:12 <TimStarling> DanielK_WMDE: right
21:35:35 <DanielK_WMDE> i have tried and failed
21:36:38 <robla> DanielK_WMDE: I think one thing that may be slowing this conversation down is it getting too bogged down in details
21:36:59 <robla> there's a *lot* to sort through here: https://www.mediawiki.org/wiki/Multi-Content_Revisions
21:37:04 <TimStarling> I don't want to get into detail about tactics in this discussion
21:37:40 <TimStarling> how would it work to implement it but not deploy it? would you be able to have a feature flag in MW? or would it have to be a branch?
21:37:52 <DanielK_WMDE> robla: yes, that's why I announced only the schema bit as today's topic: https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data
21:37:58 <James_F> Branch or unmerged commit.
21:38:21 <DanielK_WMDE> robla: that's already quite a bit, but I think it is manageable.
21:38:32 <bblack> where are we meant to have the bigger discussion? I just don't get architecting the details before having some consensus that this is the right model for some real use-cases. The use-cases section mentions its own speculative nature, many of them are more metadata than parallel separate content, which is an entirely simpler case to handle. the rest are questionable, IMHO...
21:38:58 <bblack> maybe that's for my lack of information, but still
21:39:01 <DanielK_WMDE> TimStarling: We will need feature flags for the migration/transition anyway. So, yes.
21:39:22 <TimStarling> it would be nice to have say two initial use cases which will be initially implemented
21:39:31 <DanielK_WMDE> TimStarling: hopefully, if/then/else cruft can be kept to a minimum by swapping in alternative implementations of the relevant components.
21:40:13 * brion hmms
21:40:18 <DanielK_WMDE> TimStarling: the first two in the list: MediaInfo and PageAssessments.
21:40:29 <James_F> That could work.
21:40:39 <DanielK_WMDE> MassMessage is also a hot candidate I think
21:40:44 <James_F> And TemplateData. ;-)
21:40:55 <James_F> (As it's so simple.)
21:41:00 <brion> ok i think i'm going to try fleshing out an alt proposal along some, but not all, of jynus and surrogate-jynus's lines, and we can just compare that
21:41:06 <TimStarling> presumably we will have an MCR-aware API, and all the if/else will be in the implementation of that API
21:41:08 <brion> it'll be good to have some key use cases to go along with that
21:41:29 <TimStarling> RevisionLookup
21:41:33 <DanielK_WMDE> bblack: if it's editable and versioned, it's not meta-data
21:41:50 <brion> cause if we do concentrate on cases where the secondary slots are special kinds of data, maybe extra tables aren't too awful. but maybe they are ;)
21:42:02 <DanielK_WMDE> TimStarling: yes, exactly
21:42:34 <TimStarling> maybe we should start moving towards rev_id being opaque rather than an auto-increment integer
21:42:55 <AaronSchulz> but still an integer?
21:42:57 <DanielK_WMDE> brion: i'm not thinking of secondary (derived) slots any more. just primary user editable content.
21:43:06 <TimStarling> a UUID might make more sense if it is sharded
21:43:07 <brion> right, sorry wrong term :)
21:43:13 <brion> i mean non-main-wikitext slots
21:43:25 <TimStarling> but yes, still an integer initially
21:43:30 <TimStarling> but maybe type-hinted as a string
21:43:34 <DanielK_WMDE> TimStarling: or a time-uuid. gabriel loves those.
21:43:38 <brion> TimStarling: for multi-master insert that can be important yes
21:43:48 <DanielK_WMDE> But they are big. We are trying to make that table smaller, right?
21:43:58 <brion> bigints are smaller :)
21:44:00 <DanielK_WMDE> (but we are discussing the revision table again)
21:44:02 <brion> i will just warn about Bigints and the JavaScript/node 53-bit limit though
21:44:16 <tgr> brion: extra tables would need some PHP-layer abstraction on top of our current DB abstraction, for all code that needs to search or iterate all content. That seems scary.
21:44:29 <DanielK_WMDE> tgr: very.
21:44:30 <AaronSchulz> reminds me of https://gerrit.wikimedia.org/r/#/c/16696/20/includes/rdbstore/RDBStore.php
21:44:45 * AaronSchulz almost forgot about that, haha
21:44:49 <brion> tgr: yeah, at least some would need to add to the tables joined on things. others would not actually need to touch those tables, though, and would only care about what's in revision i think
21:44:50 <TimStarling> brion: you are worried that we will exceed 2^53 rows in a table? ;)
21:45:07 <AaronSchulz> (of course half of that was wild experimentation that would never be used)
21:45:12 <brion> depends how fine-grained we make editing ;)
21:45:16 <James_F> AaronSchulz: PTSD flashbacks to that code? ;-)
21:45:17 <DanielK_WMDE> AaronSchulz: that's basically home grown partitioning, right?
21:45:26 <SMalyshev> non-sequential revids may be problematic as it'd be impossible to know the order
21:45:37 <brion> mainly i was thinking if we do something clever like a 64-bit mini uuid
21:46:02 <DanielK_WMDE> i'm getting worried that I'm stranded with this with no way to actively move forward.
21:46:11 <brion> yeah :(
21:47:00 <DanielK_WMDE> can i at least get some feedback on "super tall content table" vs "not-so-tall content table + super tall slots table"?
21:47:17 <brion> i am strongly in favor of super tall slots table
21:47:22 <SMalyshev> I like the second one better
21:47:23 <DanielK_WMDE> as in https://www.mediawiki.org/wiki/Multi-Content_Revisions/Content_Meta-Data#Re-using_Content_Rows
21:47:24 <brion> lets us keep the content table much smaller
21:47:26 <TimStarling> bblack: maybe you can discuss your concerns on https://www.mediawiki.org/wiki/Talk:Multi-Content_Revisions ?
21:47:43 <SMalyshev> if we're going to have huge table, it's better to have it as "narrow" as possible
21:47:46 <tgr> basically this proposal is blocked on deciding how to handle very tall tables, which is something that needs to be decided soon anyway, right?
21:48:00 <tgr> so maybe just give up for now and make that decision happen as soon as possible?
21:48:01 <DanielK_WMDE> ok. how about I work on some strawman code that allows us to look at the schema with some data in it, maybe on labs?
21:48:13 <DanielK_WMDE> would that help, or would it be a waste of time?
21:48:14 <subbu> is there a wiki page / talk page / phab task that discusses ops concerns with the MCR proposal?
21:48:30 <robla> subbu: I'm not aware of any
21:48:43 <DanielK_WMDE> tgr: i'm not sure there is a generic answer to that question. it may very much depend on the table.
21:49:41 <DanielK_WMDE> subbu: there is one comment by jynus: https://www.mediawiki.org/wiki/Topic:Tb6fok3z43ar16fe
21:49:58 <DanielK_WMDE> i frankly can't extract much guidance from it
21:50:56 <TimStarling> "I will create an alternative one" -- maybe we just need to nag jynus to write that
21:51:03 <robla> thanks for the refresher about the link, DanielK_WMDE
21:51:16 <DanielK_WMDE> #link https://www.mediawiki.org/wiki/Topic:Tb6fok3z43ar16fe
21:51:27 <DanielK_WMDE> TimStarling: please do, i'm quite curious
21:52:01 <robla> well, on the tactics front, I'm hoping that ArchCom doesn't become NagCom ;-)
21:52:08 <brion> heh
21:52:21 <DanielK_WMDE> or ArgCom...
21:52:47 <robla> I think it may be a useful conversation starter to *attempt* to come up with what jynus is shooting for
21:53:00 <AaronSchulz> DanielK_WMDE: I'm up for discussing partitioning, since I still remember thinking about that a lot in the past. My inclination is tall-and-narrow metadata => sharded blobs though
21:53:21 <DanielK_WMDE> robla: i honestly can't imagine how it would work. if i could, i would have proposed it.
21:53:50 <DanielK_WMDE> AaronSchulz: yes, i'm with you there. And I also think we should discuss sharding.
21:53:53 <TimStarling> I mentioned some ideas about making revision narrower, jynus was receptive to those
21:54:02 <DanielK_WMDE> AaronSchulz: who's going to drive that conversation?
21:54:40 <TimStarling> like splitting out rev_comment, you know we have a bug to make rev_comment be larger than 255 bytes
21:54:56 * AaronSchulz shrugs...probably would be good to know about what parameters jynus wants
21:55:00 <brion> yeah rev_comment and rev_user_text are easy wins
21:55:03 <Scott_WUaS> (Hoping all can keep this helpful conversation going - https://www.mediawiki.org/wiki/Topic:Tb6fok3z43ar16fe )
21:55:43 <DanielK_WMDE> AaronSchulz: yes, that would be good
21:56:04 <DanielK_WMDE> TimStarling, brion: so, who's going to write an rfc about optimizing row size in revision?
21:56:18 <James_F> TimStarling: Would we make rev_comment just another slot?
21:56:41 <brion> i can do that if TimStarling isn't excited about it, we have some good ideas from last week's offline discussion
21:57:13 * brion compacts ALL the rows!
21:57:19 <DanielK_WMDE> brion: yay :)
21:57:27 <TimStarling> ok brion, compact away, I will comment on it
21:57:31 <DanielK_WMDE> i'm happy to help and give input, but i don't see me driving this
21:57:33 <DanielK_WMDE> too much on my plate
21:57:51 <brion> no worries
21:57:55 <DanielK_WMDE> the problem with pausing MCR is: i have made room for this in my schedule *now*
21:57:56 <robla> #info 14:55:00 <brion> yeah rev_comment and rev_user_text are easy wins
21:58:06 <DanielK_WMDE> if we drop this for 3 months, I have *no* idea when i can get back on working on it
21:58:11 <brion> great i'll write those up next couple days
21:58:15 <DanielK_WMDE> it also pushes back the schedule for structured commons
21:58:31 <SMalyshev> do we need MCR for structured commons?
21:58:39 <Scott_WUaS> (Thanks, All!)
21:58:51 <SMalyshev> I mean need like "no way we can do structured commons without it"?
21:58:56 <marktraceur> It would be super if we could not delay that again...
21:59:06 <DanielK_WMDE> #info re "super tall content table" vs "not-so-tall content table + super tall slots table": <brion> i am strongly in favor of super tall slots table <SMalyshev> I like the second one better
21:59:43 <brion> #info brion to write up additional RfC on compacting rows in revision table (should apply with or without MCR)
22:00:06 <robla> ok...should we end the official part of this meeting on that?
22:00:09 <DanielK_WMDE> brion: will partitioning be part of that?
22:00:23 * robla plans to hit #endmeeting in 120 seconds
22:00:36 <DanielK_WMDE> SMalyshev: pretty much, yes.
22:00:41 <brion> DanielK_WMDE: not explicitly but i'll mention some related concerns
22:01:06 <brion> can expand to that if we decide we must super-prioritize it
22:01:12 <DanielK_WMDE> SMalyshev: at least if we want to stick to the product requirements as set out by the WMF back in the day.
22:01:17 <robla> brion, thanks for taking that on!
22:01:36 <TimStarling> DanielK_WMDE: well, you say you can implement it with a feature switch, which should be relatively uncontroversial
22:01:36 <brion> :D
22:01:36 <subbu> so, reading that talk page topic, iiuc, jynus is objecting to using a single unified table for all slots and prefers different tables for different slots?
22:02:09 <robla> we can continue the conversation in #wikimedia-tech for those that want to
22:02:27 <robla> thanks all!
22:02:32 <robla> #endmeeting
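
Note on the "JavaScript/node 53-bit limit" raised at 21:44:02: JavaScript numbers are IEEE 754 doubles, so integers above 2^53 - 1 are not all exactly representable. The sketch below illustrates the effect using Python floats, which are the same double format. The helper name is hypothetical, and the sketch says nothing about how rev_id would actually be encoded.

```python
# Largest integer n such that both n and n + 1 are exactly representable as doubles.
MAX_SAFE_INTEGER = 2**53 - 1


def survives_double_round_trip(n: int) -> bool:
    """Return True if n is unchanged after being stored in an IEEE 754 double."""
    return int(float(n)) == n
```

An ID of 2**53 - 1 survives the round trip, while 2**53 + 1 silently becomes 2**53. This is why 64-bit IDs are commonly passed to JavaScript clients as strings.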

Other meetings

Architecture meetings
13:00 PT ArchCom Planning Meetings: upcoming, all since 2016-03-30
14:00 PT ArchCom-RFC Meetings: upcoming, all since 2015-09-09

Recurring Event

Event Series
This event is an instance of E66: ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office), and repeats every week.

Event Timeline

RobLa-WMF renamed this event from ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office) to ArchCom RFC Meeting W38: Multi-Content Revisions (2016-09-21, #wikimedia-office). (Sep 7 2016, 11:37 PM)
RobLa-WMF updated the event description.
daniel updated the event description. (Sep 19 2016, 6:21 PM)
RobLa-WMF updated the event description. (Sep 21 2016, 10:41 PM)
daniel renamed this event from ArchCom RFC Meeting W38: Multi-Content Revisions (2016-09-21, #wikimedia-office) to ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office). (Nov 21 2016, 6:11 PM)
daniel changed the host of this event from RobLa-WMF to daniel.
daniel updated the event description.
daniel updated the event description. (Dec 9 2016, 7:41 AM)
daniel renamed this event from ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office) to ArchCom RFC Meeting W38: Multi-Content Revisions (2016-09-21, #wikimedia-office).