21:04:09 #startmeeting oldimage/image discussion T589
21:04:09 Meeting started Wed Aug 31 21:04:09 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:04:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:04:09 The meeting name has been set to 'oldimage_image_discussion_t589'
21:04:09 T589: RfC: image and oldimage tables - https://phabricator.wikimedia.org/T589
21:04:18 #topic Wikimedia meeting channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:04:51 hi everyone!
21:05:09 hi
21:05:29 hi All!
21:06:52 hi
21:06:58 I am a bit confused by this topic. I think most people agreed in general with the idea; implementation details could be filled in and discussed offline?
21:07:13 Oh herp derp I'm here :)
21:07:23 it may be a short discussion then :D
21:07:25 * ostriches puts on his releng hat
21:07:48 wait, it was off the rest of the day?
21:08:12 have you seen the comment Krinkle wrote after the last meeting?
21:08:17 especially because I didn't see there were any concerns, just a question
21:08:20 jynus: we found that you were giving a lot of great feedback. Krinkle was hoping for more input on release strategy
21:08:25 greg-g: I was wearing https://www.amazon.com/Adult-Propeller-Beanie-Hat-Made/dp/B001QK4RZC instead
21:08:29 Hi
21:08:44 So the main questions for this meeting are written here - https://phabricator.wikimedia.org/T589#2541630
21:08:46 no, that's a mistake; stop listening to what I say! :-)
21:08:51 These were also on wikitech-l
21:08:57 3 open questions.
21:09:26 robla: Shall I start with the first one?
21:09:31 sure
21:10:22 So before I do, I want to help set the stage and clarify any misunderstanding about the scope of this RFC. This was raised in the previous IRC meeting. We cleared it up at the time, but not everyone may be caught up with that. https://www.mediawiki.org/wiki/Requests_for_comment/image_and_oldimage_tables#Problems
21:10:41 We've reduced the scope to two core problems we want to address.
21:10:48 1. File revisions should have better unique identifiers than "current file title + timestamp".
21:10:54 2. Uploading file revisions must not involve rows moving across tables, or rows being replaced.
21:11:26 The first one is primarily from an architecture perspective: to simplify a lot of our internal classes and API consumers, as well as to remove conflicts in race conditions.
21:11:55 The second one is because of an anti-pattern in our database schema that jynus wants us to get rid of (and I wholeheartedly agree)
21:12:08 2 is to stop commons from exploding like it does every 6 months
21:12:19 :)
21:12:29 jynus: exploding as in replication failure?
21:12:29 and "make Labs great again"
21:12:44 yes, TimStarling
21:13:11 it has to be a special case for it to happen in production, but it definitely happens on labs
21:13:22 how is this related to multi-content revisions? it seems like a plan to refactor something that will need to be refactored again not much later
21:13:23 like a master failover
21:14:19 tgr: I think multi-content revisions is for the future right now. It's too big a step to move into right away. However, once MCR is stable and used elsewhere, by doing this RFC we'll be in a position where migrating to MCR is fairly straightforward.
21:14:32 Krinkle, +1
21:15:06 question 1 was what fields to keep in image?
21:15:09 yea, what Krinkle said... is discussing the MCR option in scope of this meeting? I like the idea, but it would probably mean shelving this until we have MCR, and have some experience with it (let's say, in a year).
21:15:29 Yeah, we more or less shelved MCR for file tables in the last meeting.
21:15:36 But it's a good question to ask.
21:15:54 I think it is worth it, once I propose a solution for 3; independently of the status of MCR
21:15:54 migrating twice is extra work, but this sounds urgent; going for the more immediate options seems wise for now.
21:16:21 TimStarling: Yeah. So question 1: Given that we'll put all image revisions in one table (instead of one for current and one for older ones), which fields do we want to keep in our generic 'image' table (the equivalent of 'page', basically)?
21:16:48 I notice that the following fields have indexes: img_name, img_timestamp, img_user_text, img_sha1, img_media_type, img_major_mime, img_minor_mime
21:16:56 The minimal set of fields would be: img_id, img_name, img_latest
21:17:08 DanielK_WMDE: urgent?
21:17:38 img_sha1 is used to provide duplicate upload warnings
21:17:52 AaronSchulz: if i understood correctly, jynus says commons blows up because of data being moved between tables
21:18:29 "blows up" could be clarified
21:18:30 AaronSchulz: the point being: the problem is urgent. while managing file revisions in mcr is a nice idea, it's blocked for a while. so let's do something else for now
21:18:33 it's probably still needed for that purpose
21:18:40 Yeah, so we'll need to keep that in sync. This is not unprecedented, as we do this for page as well (page_touched, page_is_redirect, page_len) whenever we update page_latest. But ideally fewer is better, I guess.
21:18:42 AaronSchulz, https://wikitech.wikimedia.org/wiki/Incident_documentation/20160705-commons-replication
21:19:06 We also have a query page (special page and API) for finding files by mime type.
21:19:08 isn't that mostly just INSERT SELECT
21:19:11 ?
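The two-table split under discussion (a generic 'file' table as the analogue of 'page', plus a 'filerevision' table holding one row per upload) might look roughly like the sketch below. This is illustrative only: all table and column names (file_id, fr_id, etc.) are placeholder guesses based on the discussion, not the RFC's final schema.

```sql
-- Sketch only: names, types, and field set are illustrative guesses.
CREATE TABLE file (
  file_id     INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,  -- stable identifier
  file_name   VARBINARY(255) NOT NULL,                  -- current title
  file_latest INT UNSIGNED NOT NULL,                    -- points at filerevision.fr_id
  file_sha1   VARBINARY(32) NOT NULL DEFAULT '',        -- kept for duplicate-upload warnings
  UNIQUE KEY file_name (file_name)
) ENGINE=InnoDB;

CREATE TABLE filerevision (
  fr_id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, -- unique id per upload
  fr_file      INT UNSIGNED NOT NULL,                   -- points at file.file_id
  fr_timestamp BINARY(14) NOT NULL,
  fr_user      INT UNSIGNED NOT NULL,
  fr_sha1      VARBINARY(32) NOT NULL DEFAULT '',
  fr_deleted   TINYINT UNSIGNED NOT NULL DEFAULT 0,     -- rev-deletion bitfield, like oi_deleted
  KEY fr_file_timestamp (fr_file, fr_timestamp)
) ENGINE=InnoDB;
```

This mirrors the page/revision relationship: uploading a new version only inserts a filerevision row and updates file_latest, instead of moving rows between tables.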
21:19:21 the img_user_text image is for contributions, that will be replaced by a filerevision index
21:19:22 AaronSchulz, yes
21:19:25 I would still like to see content-based image ids (hashes) rather than other random ids, if that's feasible to do while we are at it
21:19:29 s/image/index/
21:19:46 gwicke: For the revisions?
21:19:53 the img_media_mime is a joke, do we seriously use that?
21:19:59 yes, for each image original
21:20:09 gwicke: We do currently have duplicates, however.
21:20:22 gwicke, don't we already have a hash column?
21:20:22 And given we are not separating content from revision right now, that wouldn't be semantically possible.
21:20:34 jynus: we only do that for moving the current row to the oldimage table and the current row to the archive table (e.g. 1 row). The others now do the SELECT in the app.
21:20:35 #info agreement to shelve the idea to manage upload history using MCR. Discussing the two-table file / file_revision option for now.
21:20:42 since after a rollback it'd be the same content, but a different timestamp+user.
21:21:10 Krinkle: how does the current proposal support revision deletion? (do we support rev-deletion for images at all right now?)
21:21:25 Krinkle: documenting those considerations on the RFC might be useful
21:21:32 jynus: https://gerrit.wikimedia.org/r/#/c/307078/ will banish even those
21:21:42 for later reference, as well
21:22:26 gwicke: Even if we separate them later, we still need a revision ID. Which is the first step (this step).
21:22:56 TimStarling: the original idea was to allow people to search for "videos" or "bitmaps". never happened, should be done with elastic now. img_media_type can die in the db. the concept may still be useful in code, even if we just feed it to the search engine.
21:23:05 AaronSchulz, I say thanks, but why move things in the first place?
21:23:05 unless we decide that having two different copies of the same thing & under possibly different licenses isn't really useful
21:23:17 Special:MIMEsearch.php mentions that the index is wrong
21:23:21 SMalyshev: does cirrus index media type (not mime) for files?
21:23:22 but, agreed, that brings up a whole can of worms
21:23:22 we do support rev-deletion for images, but that just updates the value of a bitfield; does not seem problematic
21:23:26 DanielK_WMDE: Elastic specific? Or possible in mysql default search too?
21:23:26 it doesn't contain a sorting field
21:23:27 jynus: oh, the moving still sucks, I'm just saying the urgent part isn't apparent to me
21:23:38 DanielK_WMDE: do you mean normal deletion of single revisions?
21:23:46 tgr: single revision.
21:23:48 I'm all on board for refactoring to have filerevision
21:23:50 AaronSchulz, I want things fixed, I don't care how
21:23:56 DanielK_WMDE: hmm not sure
21:24:02 I think he means 'revision deletion' (hiding of revisions within the same table, like rev_deleted)
21:24:06 DanielK_WMDE, SMalyshev: I don't believe we do
21:24:10 I don't think a lot is indexed for files right now
21:24:17 let me check
21:24:18 And yes, oi_deleted exists.
21:24:31 Krinkle: we have a field-based system for feeding the search engine now. core could expose the field, but the default (sql) search engine would just ignore it for now
21:24:32 that would ideally need to be integrated with the SearchEngine interface in core
21:24:43 AaronSchulz, I wasn't the one saying it was urgent; however, I think MCR is a bit too ambitious to block lots of small changes on it
21:25:20 TimStarling: SMalyshev has recently implemented an interface that could be used to do that, I think
21:25:41 Sidenote: SearchEngine & friends are terrible and need a lot of cleanup.
21:25:44 (never got to that)
21:25:45 Current fields we are discussing: https://www.mediawiki.org/wiki/Manual:Image_table
21:25:47 jynus, DanielK_WMDE: I just want us not to feel a need to rush it :)
21:25:55 Krinkle: that's being done via img_deleted / oi_deleted / fa_deleted currently; does not seem hard to migrate
21:26:13 so let's suppose that Special:MIMEsearch can be deleted from core
21:26:28 that would let us remove that index
21:26:29 * gwicke would be very interested in a summary outlining how this will integrate with the longer term image naming roadmap, including things like content based addressing
21:26:29 tgr: Yeah, oi_deleted is just a revision property. That'll migrate just fine and isn't part of the generic image object.
21:26:32 but normal deletion does move rows to filearchive (and page revisions work identically); do we intend to change that?
21:26:35 AaronSchulz: well, if it's ok to push this back for a year, I'm all for the MCR option :)
21:26:35 DanielK_WMDE: I don't think for files we index more than text
21:26:38 (the same way there is no rev_del in the page table)
21:26:39 the only thing related to fields that I am not clear on is the whole old-revision handling
21:26:54 tgr: No, archiving is not in the scope of this RFC.
21:26:57 SMalyshev: there is some special case code for file pages, let me find it...
21:26:57 does that require some tuning on indexes, extra fields, etc?
21:27:25 DanielK_WMDE: we could of course add more fields, shouldn't be hard
21:27:52 do we currently have issues with lots of old revisions of the same file?
21:28:00 gwicke: I'd say the outline is that it'll be a whole lot easier once we have primary keys for images and their revisions. It's mostly just technical debt; removing it makes everything easier afterwards.
21:28:30 jynus: For some files, yes. Most files have very few revisions.
21:28:34 So, long tail.
21:28:45 We should think about those things a little, but not much, imho, as there isn't much we could do wrong or would do differently in that respect. We're just moving the table around, basically.
21:28:47 from what I saw, there were relatively very few old revisions
21:28:55 Yeah, that surprised me too.
21:29:06 I expected there to be more old revisions than current revisions.
21:29:12 But we didn't include filearchive here
21:29:13 meh, I've known that for a while
21:29:16 That doesn't surprise me.
21:29:20 DanielK_WMDE: though I wonder if there's intersection between this and structured commons work...
21:29:26 The vast majority of files get uploaded and never edited again
21:29:28 yeah, it makes sense that most images don't have multiple revisions.
21:29:32 I remember when I added paging to the file history due to big file histories
21:29:44 On the other hand, there are some images that get edited all the time (think SVGs with routinely updated data)
21:29:45 SMalyshev: ah, right, file_text is just one field. there are a few more useful things we could index for use on commons, i think. we could supper type:video or resolution:>800
21:29:51 I don't recall any scaling issues we are facing right *now* aside from moving all those files around
21:29:53 *support
21:30:00 but that's not affected by this work
21:30:14 jynus: there are some edge cases, like the image that the smoke tests update every day, but they are extremely rare
21:30:15 DanielK_WMDE, SMalyshev: Yeah, we never stuffed a ton of data about files into Elastic originally
21:30:24 Mainly file_text, so we could search inside of pdf/djvu files
21:30:24 So back to the original question :) Aside from img_latest and img_sha1 (for duplicate detection), any other fields we need to keep?
21:30:26 I suppose a pdf with many revisions could hurt
21:30:29 SMalyshev: i see no overlap with structured commons work tbh. it's orthogonal
21:30:32 DanielK_WMDE: it should be pretty easy to do now that we have infrastructure to add fields easily
21:30:35 or djvu... all those big blobs of text
21:30:42 If we don't keep img_media_type, we need to migrate Special:MIMESearch first. Do we want to block on that? How trivial would it be?
21:30:43 in oi_metadata
21:30:50 SMalyshev: yay! make a ticket?
21:31:03 Will we be able to make that special page work with default mysql search?
21:31:20 DanielK_WMDE: and we also have much easier keyword adding in Cirrus, btw - it's now proper structure instead of one humongous switch
21:31:24 Or do we want to remove it from core, or somehow disable it for simple backends?
21:31:30 can I be pedantic and suggest, as usual, not to store as text things that only have a few values?
21:31:42 like mime types
21:31:49 Krinkle: Not easily, with the way things are right now.
21:31:54 DanielK_WMDE: so yeah, I think making a ticket and figuring out what we want to put there won't hurt
21:31:55 I doubt there is anyone who really relies on that special page
21:31:56 I would be +10000 to removing it from core.
21:31:57 jynus: It's currently ENUM
21:32:03 Krinkle, good!
21:32:09 /**
21:32:09  * The index is on (img_media_type, img_major_mime, img_minor_mime)
21:32:09  * which unfortunately doesn't have img_name at the end for sorting.
21:32:09  * So tell db to sort it however it wishes (Its not super important
21:32:09  * that this report gives results in a logical order).
21:32:23 this is a special page which gives you images of a given type, in random order
21:32:29 Krinkle: one question that could have an impact on how we want to evolve things is which metadata should be associated with the image itself, and which should be associated with a name + description
21:32:39 DanielK_WMDE: btw there's a big reindex coming IIRC sometime soon... so maybe worth discussing with the whole search team if we want to add stuff before it
21:32:54 We also have https://commons.wikimedia.org/wiki/Special:MediaStatistics
21:32:56 DanielK_WMDE: because adding stuff after would take a long time to become useful
21:33:01 Which is arguably less random than MIMESearch
21:33:14 can we go to question 2?
21:33:42 Yeah, time is moving. OK. So mime remains an open question for later. No other fields?
21:34:12 question 2: What about the img_metadata field? It's fairly large. Do we just move it to the new imagerevision (like it is now), or handle it differently?
21:34:26 Krinkle: MIMESearch is basically only useful for the less-used mimetypes :)
21:34:31 This field is a blob of serialised PHP (typically representing the Exif data of an image).
21:34:42 Like, who on earth is doing [[Special:MIMESearch/image/jpeg]]
21:34:44 ;-)
21:34:44 it's pretty horrible for djvu files especially
21:34:48 megabytes of xml
21:34:53 Krinkle: maybe external store support?
21:35:00 caused some outages
21:35:06 EXIF data is usually pretty small; the problem is that on wikisource it stores the OCR text of enormous scanned books
21:35:10 I mean, if we still have filearchive, we don't want to be moving those around even after a big refactor
21:35:13 #info Krinkle: question 2: What about img_metadata field. It's fairly large. Do we just move it to the new imagerevision? (like it is now). Or differently?
21:35:17 that would be a missed opportunity
21:35:49 It should've been normalized when Brian redid our Exif/etc support a few years back; we just didn't get to that point.
21:35:56 TimStarling: yeah, tiny ones should be able to stay, and a flag field could say if it uses ES or something, perhaps
21:36:12 I'd like to avoid another 'text' table.
21:36:25 well, we have metadata upgrades, which means we need to load it all the time
21:36:54 this comes back to my earlier question about which metadata should be associated with the image itself, vs. the name -- exif data and OCR XML look more like a property of the image itself
21:36:54 we probably need to load it for other things...
21:37:10 gwicke: image revision itself?
21:37:17 no, image
21:37:19 we load it for thumbnails, I think
21:37:31 the same image under different names would still have the same OCR XML
21:37:41 just use blob storage for the extracted text?
21:37:42 might be nice to have metadata that can point to more metadata
21:37:50 'image' = equivalent to page, not image content. There is no image content right now.
21:37:51 so thumbnails could render from the minimal metadata
21:37:53 or jpg metadata, for that matter
21:37:54 that's going to be its eventual fate anyway
21:37:58 and pages of djvu from the rest
21:38:04 gwicke: One image with two revisions needs two metadata blobs
21:38:21 to record user / timestamp / comment, yeah
21:38:24 It can't be in 'image'.
21:38:26 but the XML etc can be unchanged
21:38:34 do we currently store "metadata" for old revisions? it is not clear to me
21:38:37 and the minimal metadata would fit in cache with the rest of the filerevision fields
21:38:45 jynus: we do
21:38:49 so, if that was keyed on content hash, then we'd get the right association
21:38:54 A new revision means a new upload, which would always bring new Exif. It could be unchanged, but it's part of the image file.
21:38:55 ok, then keep doing it
21:39:01 & avoid the need to touch it when reverting
21:39:20 Can we deterministically hash Exif and other metadata?
21:39:27 we can improve it in the future, as we probably will, but I think that is out of scope?
21:39:29 (Do we normalise first?)
21:39:31 we can hash the original
21:39:41 and just reuse the existing extracted metadata
21:39:54 what's wrong with having another text table?
21:40:07 Either way, yeah, we could have a separate image_metadata table. Would that table associate primary keys with blobs and/or references to external store, which we then reference in imagerevision? Or do we reference it directly?
21:40:28 TimStarling, seems like a good idea: even if there is a 1:1 relationship, you will not always read or write the entire row
21:40:51 we said years ago that we would change the image / oldimage structure
21:40:52 could the image revision metadata reference a hash, and the blob metadata then key on hash as well?
21:40:55 maybe metadata for most things could be in filerevision, and the DjVu handler could have its own special pointer to ES
21:41:17 that way you can efficiently load things like video duration when it's needed for list display
21:41:21 gwicke: we currently cache djvu tree metadata by base file hash
21:41:21 TimStarling: I meant that if we need a generic place to store blobs in a way that won't be moved, altered or deleted, we may want to re-use text instead of creating another one. I don't know if that's beneficial or not, though.
21:41:32 Given we already have sharding etc. and a scalable solution for that.
21:41:33 gwicke, you keep saying hash, and whether it is a hash or another kind of id doesn't matter
21:42:00 when djvu OCR was introduced, we put it in img_metadata because there wasn't really a simple alternative
21:42:02 we can store the hash separately or as an id
21:42:02 jynus: it matters a lot for dedup and cache invalidation
21:42:09 ^
21:42:33 it would be nice to have ids for images
21:42:34 TimStarling: I like that. So we have reason to avoid a separate table because we typically need to query both anyway?
21:42:56 Krinkle, that is the right question
21:43:25 separate, whether it is logically or physically separated
21:43:28 And that balances okay with the cost of having to read/write both, since we would mostly only be writing it once? (aside from archiving). In the current schema in production, we'd avoid this because it'd need to move the metadata when a new revision is uploaded. But we won't have that problem anymore.
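The "flag field" idea floated above (small serialized metadata stays inline in the revision row; oversized blobs such as DjVu OCR text move to External Store, with the row keeping only an address) could be sketched as below. This is purely an illustration of the idea, not an agreed design; the column names are hypothetical.

```sql
-- Hypothetical sketch: inline vs. externalized metadata per file revision.
ALTER TABLE filerevision
  ADD COLUMN fr_metadata MEDIUMBLOB NOT NULL,
  -- flag: 0 = fr_metadata holds the serialized PHP blob itself,
  --       1 = fr_metadata holds an External Store address instead
  ADD COLUMN fr_metadata_external TINYINT UNSIGNED NOT NULL DEFAULT 0;

-- With fr_metadata_external = 1, fr_metadata would hold an address
-- such as 'DB://cluster25/12345', similar to how text rows point at
-- ExternalStore blobs today.
```

This keeps the common case (small Exif blobs) cheap to load for thumbnails and listings, while the long tail of multi-megabyte blobs no longer bloats the table.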
21:43:30 I'm not sure of the exact numbers, but file description pages, for example, obviously need metadata loaded, for all revisions of the file
21:43:48 gwicke: you could always change the ES id if you change the metadata. We probably shouldn't focus on hash vs ID too much for that :)
21:43:53 however, maybe that could be a decision that we do not need to take now?
21:44:01 things that use ImageListPager probably need metadata too, since they generate thumbs
21:44:05 yeah, OCR text should be split out of generic metadata
21:44:11 Okay, so that means for the current migration it is orthogonal. (Supporting external store for img_metadata)
21:44:14 that's T32906 / T99263 fwiw
21:44:15 T32906: Store DjVu extracted text in a structured table instead of img_metadata - https://phabricator.wikimedia.org/T32906
21:44:15 T99263: Store Pdf extracted text in a structured table instead of img_metadata - https://phabricator.wikimedia.org/T99263
21:44:19 hmm, and using images from within the parser will generate thumbs and so need metadata
21:44:21 TimStarling: Should we do it before or after the schema change?
21:44:45 it can be separate; the only question we need to answer right now is whether filerevision.fr_metadata will exist
21:44:51 and I guess the answer to that is yes
21:45:01 #info discussion of q2 has been about the historic reason for having it (e.g. djvu)
21:45:08 we don't have nearly enough time for question 3 now, you know
21:45:22 but we can give it a go
21:45:28 Yeah :)
21:45:34 SMalyshev: https://phabricator.wikimedia.org/T144447
21:45:38 So, last question 3: Migration.
21:45:45 haha
21:45:47 #info question 3 discussion starts. Migration
21:46:02 tgr: perhaps OCR text can wait for MCR? seems like an obvious candidate
21:46:04 Background in 3. Migration at https://phabricator.wikimedia.org/T589#2541630
21:46:25 I chatted with Tim about a potential path on the data side; I think I summarized it in the ticket response
21:47:06 I'd like to avoid adding temporary logic to MediaWiki for ignoring 'in-mid-air' rows, since it's complicated technical debt to have to maintain. Especially since we support upgrades from older versions.
21:47:15 keeping the X table doesn't mean we have to keep the names
21:47:19 I fear we'd have to keep it forever.
21:47:39 last time jynus mentioned views
21:48:04 Good point.
21:48:07 I don't have a fully worked-out plan for using them, but it seems like that could avoid the need for MW migration code
21:48:07 AaronSchulz: I agree that any blob id will help with cache invalidation, but there are differences in dedup, multi-repo use (commons vs. local project) etc; numeric ids are basically less efficient for a lot of those problems, but have the advantage of being slightly shorter
21:48:52 So that would allow for migrating rows *to* the new table while keeping the old MediaWiki logic live.
21:49:01 how does non-duplication interact with 1) het deploy of MW versions and 2) the ability to back it out if it fails?
21:49:03 can you insert into a view?
21:49:19 TimStarling: no
21:49:21 TimStarling, yes
21:49:26 what?
21:49:30 :D
21:49:32 if you have an "insertable view"
21:49:48 https://dev.mysql.com/doc/refman/5.7/en/view-updatability.html
21:49:48 which basically means a 1:1 relationship with the original table
21:49:53 * DanielK_WMDE just googled
21:49:54 robla: it's not like we are running two versions of commons at the same time right now
21:50:22 it mostly means you cannot insert into a "GROUP BY" view
21:50:39 #info can you insert into a view? if you have an "insertable view"
21:50:40 We always require backward compatibility indeed, because of FileRepo on wmf wikis querying commons.
21:50:48 Which can be one version ahead of or behind Commons
21:50:53 robla: thanks
21:50:59 but you can insert into a "WHERE deleted=0" one
21:51:22 I had created https://phabricator.wikimedia.org/T134827 precisely because I thought (after testing) that you could not insert into views
21:51:33 ha ha
21:51:51 so that is another reason to keep the original tables around (even if they are renamed)
21:51:51 So if we temporarily add a 'fixme' column, and migrate rows from image to oldimage (to be renamed).
21:52:00 actually
21:52:05 the plan was the other way
21:52:10 yeah, the other way
21:52:13 I suppose we'd do that in the background, and then keep doing that until it's small enough, and then we go into read-only mode while we update the software?
21:52:14 as image was larger
21:52:19 Sorry, yeah, that's what I meant.
21:52:22 rename image to filerevision, create an image view that filters filerevision
21:52:44 I hope we don't do any "*" selects anywhere
21:52:45 how efficient are table renames, jynus?
21:52:51 table renames are O(1)
21:52:55 renaming tables costs "0" in mysql * (* conditions may apply)
21:53:02 #info rename image to filerevision, create an image view that filters filerevision
21:53:03 xD
21:53:14 TimStarling: depends on the storage
21:53:15 aka metadata lock, but that is an infra issue
21:53:29 I think there was something to take into account
21:53:31 Given we need fewer fields in the new table, should we do the view the other way around?
21:53:42 just like renaming a db should be O(1)
21:53:46 but is actually not supported
21:54:00 Krinkle, for performance reasons, you do not want to write heavily into a view
21:54:09 poor indexing support
21:54:28 Sure
21:54:33 views do not have indexes, only the underlying table does
21:54:36 if it's really necessary, we can disable uploads for a few hours while migration takes place
21:54:42 but if it works, it works
21:54:52 (but it is a pain in labs)
21:54:58 how does that matter for writes?
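The rename-plus-view plan ("rename image to filerevision, create an image view that filters filerevision") could look something like this in MySQL. A sketch only: the fr_* column names and the fr_is_current flag are hypothetical, and a real view would alias every img_* column.

```sql
-- RENAME TABLE is a metadata-only operation, hence effectively O(1):
-- the old table keeps its data under the new name.
RENAME TABLE image TO filerevision;

-- Back-compat view under the old name. A view built from a single
-- table with plain column references and a simple WHERE (no GROUP BY,
-- aggregates, DISTINCT, or derived columns) is updatable/insertable
-- in MySQL, so old MediaWiki code could keep reading and writing it.
CREATE VIEW image AS
  SELECT
    fr_name      AS img_name,       -- aliases preserve the legacy names
    fr_size      AS img_size,
    fr_timestamp AS img_timestamp
    -- ... remaining img_* columns aliased the same way
  FROM filerevision
  WHERE fr_is_current = 1;          -- hypothetical "latest revision" flag
```

As noted in the discussion, writes through a view carry a performance cost and views have no indexes of their own, so this would be a transitional device rather than a permanent arrangement.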
21:54:59 I think the whole migration will take longer than a few hours, though, no?
21:55:08 We need to do some of it in the background, ideally
21:55:40 what is the slowest part? copying all rows of commons.oldimage to commons.filerevision
21:55:42 ?
21:55:46 image->filerevision, create 'image' view for back-compat, add all oldimage rows to image.
21:55:54 Right
21:56:06 And adding primary keys
21:56:07 can that be done with INSERT SELECT?
21:56:21 on a depooled slave?
21:56:37 that's why I say hours
21:56:40 What?
21:57:00 TimStarling: You want to do a musical chairs dance?
21:57:23 I think the "new" table didn't have but an id and a few fields?
21:57:33 a title, maybe
21:58:10 a page_id, maybe?
21:58:11 thanks, All!
21:58:16 sorry, I do not have the structure in front of me right now
21:58:18 yes
21:58:18 the link between file and page is a bit strange right now, no?
21:58:19 Hm.. If we project old 'image' on top of new 'filerevision', we still need to create the new 'file' table.
21:58:21 as well.
21:58:31 you can't do INSERT SELECT on tens of millions of rows on the live master, can you?
21:58:59 Okay, I'll summarise input from questions 1 and 2 on the task.
21:59:00 TimStarling, I would say you cannot do it on a slave either - how do you do that and keep inserting?
21:59:08 we're just about out of scheduled time, but we'll go over 5 minutes (or longer if everyone prefers)
21:59:16 I think that would lag slaves
21:59:17 we should wrap up
21:59:36 #info Okay, I'll summarise input from questions 1 and 2 on the task.
21:59:36 We can continue talking in #wikimedia-tech, jynus and TimStarling, if you have time, and anyone else is welcome to join.
21:59:54 I can't stay for long today
22:00:11 #info We can continue talking in #wikimedia-tech jynus and TimStarling if you have time, and anyone else is welcome to join.
22:00:15 TimStarling: I'd like to make sure I understand what you meant, though.
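On the INSERT SELECT concern above (copying tens of millions of oldimage rows in one statement would lag replicas), a common mitigation is to backfill in small batches driven by a script that sleeps and checks replication lag between iterations. A hedged sketch, with hypothetical column names; oldimage has no primary key, so the batch cursor keys on (oi_name, oi_timestamp):

```sql
-- Hypothetical batched backfill of oldimage into filerevision.
-- A driver script would run this in a loop, updating the
-- @last_name / @last_timestamp cursor from the final row of each
-- batch and pausing between batches, until no rows remain.
INSERT INTO filerevision (fr_name, fr_timestamp, fr_size /* , ... */)
SELECT oi_name, oi_timestamp, oi_size /* , ... */
FROM oldimage
WHERE (oi_name, oi_timestamp) > (@last_name, @last_timestamp)
ORDER BY oi_name, oi_timestamp
LIMIT 1000;
```

Each small transaction replicates quickly, so replicas never fall far behind, at the cost of a longer wall-clock migration, which fits the "do some of it in the background" suggestion.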
22:00:52 it really just needs design work
22:01:05 whether it can be done in SQL should be more obvious once the design is further along
22:01:24 Yeah, I think migration is clear enough for now; we know we have several options.
22:01:38 My advice: avoid sql and insert selects
22:01:43 We know that it can be done in a performant way. The details can be figured out in code review and elsewhere.
22:02:09 For the actual migration, that is. As long as we know we can do it.
22:02:09 ok, is that what we should wrap up on?
22:02:30 60 second warning
22:02:42 I guess it's a bit late for last-call now. I'll finalise over the coming week and propose for next week (no rfc meeting, but just the last-call announcement)
22:02:50 sounds good?
22:03:07 ah, right, yeah, I think that sounds good
22:03:27 we don't have a full design to approve at the moment; there are a lot of (literal) question marks in the phab task
22:03:45 Krinkle: you're proposing that you do the design, put it up, give people one week in last call?
22:03:45 we can discuss procedure in next week's committee meeting
22:04:30 robla: one week from next week.
22:04:34 yup. ok. now 30 second warning :-)
22:04:47 thanks everyone!
22:04:58 TimStarling: Yeah, but good stuff to think about :)
22:05:09 Thanks!
22:05:12 #endmeeting