21:01:31 #startmeeting ArchCom RFC meeting - Markdown support | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:01:31 Meeting started Wed Jun 22 21:01:31 2016 UTC and is due to finish in 60 minutes. The chair is brion. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:31 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:31 The meeting name has been set to 'archcom_rfc_meeting___markdown_support___wikimedia_meetings_channel___please_note__channel_is_logged_and_publicly_posted__do_not_remove_this_note____logs__http___bots_wmflabs_org__wm_bot_logs__23wikimedia_office_'
21:01:46 i hope that wasn't too many bits for poor meetbot
21:02:19 #link https://phabricator.wikimedia.org/E218 Phab event for this week's meeting
21:02:40 #info discussing https://phabricator.wikimedia.org/T137946 develop Markdown support strategy for MediaWiki
21:03:20 #link https://www.mediawiki.org/wiki/Requests_for_comment/Markdown this week's RFC
21:03:30 * robla wipes brow
21:03:41 robla, care to chat a bit on the background?
21:04:33 sure, this is asking "what should our Markdown strategy be?", where pretty much any answer is valid
21:04:56 :D
21:05:12 why I'm asking that: there are many, many flavors of "wiki syntax" out there, of which MediaWiki wikitext is only one
21:06:00 (but ours is the _real_ wikisyntax... :P )
21:06:04 many implementations claim "Markdown support", but the interpretation varies quite a bit based on implementation
21:06:43 YairRand: :-D I think that actually gets to the heart of it
21:07:52 YairRand: do you (or anyone out there) believe that all other implementations will "see the light" and start using our format? should they?
21:08:59 a different question is: will all the disparate markdown efforts go beyond "simple" markdown and eventually arrive at the wikitext level of complexity?
21:09:25 even if the syntax will probably not be wikitext syntax itself.
21:09:59 (taking off my chair hat momentarily) what's a reason a given wiki might have for choosing to use markdown? preference, or compatibility with existing data or other tools, or?
21:10:21 (that might affect how one would go about such support)
21:10:56 I think both questions are very good, and now I'm having trouble choosing :-)
21:11:00 :D
21:11:08 let's do em in turn
21:11:11 migrating from a github wiki to mediawiki might be one reason to want markdown page source
21:11:54 *nod*
21:11:54 bd808: yup
21:11:54 are there any serious limitations regarding wikitext that are solved in other syntaxes? are they pretty freely convertible?
21:12:09 (Is there a question here about how Wikimedia markdown talked about now will interface with SQID and Wikidata?)
21:12:35 YairRand: the Pandoc folks aspire to provide complete interchangeability
21:12:58 #info open question: reasons for choosing markdown? example: moving hosting of a github wiki
21:12:59 robla: ...
21:13:06 * subbu is looking at http://pandoc.org/README.html#pandocs-markdown and sees that it is a pretty long spec
21:13:48 #info open question: complexity and extensions to the markup? example: would we need a syntax extension for templates/parserfunctions/lua/wikidata/etc?
21:14:39 easy things are easy to convert, hard things are .......
well that's the question isn't it :D
21:15:04 one good reason to entertain this markdown question for mediawiki is that it might let us abstract the markup / parsing parts of the codebase behind an interface.
21:15:26 #info for convertibility of markdownish things, see pandoc http://pandoc.org/README.html#pandocs-markdown
21:15:41 what does cut and paste support mean for users in practice?
21:15:52 agreed. getting serious about multiple markup formats would lead to cleaning up a lot of entangled cruft in core
21:15:58 subbu: good point. also, how much do we rely on wikitext eg in the user interface?
21:16:00 i toyed with that interface idea in https://www.mediawiki.org/wiki/User:SSastry_(WMF)/Notes/Wikitext#Core_ideas
21:16:35 brion, yes, wikitext in the UI is tricky ...
21:16:49 site messages are another i guess.
21:17:25 TimStarling: I know what it means for me, but that's probably a better question for the folks who work with VE regularly, since my understanding is that cut-n-paste bugs happen a lot
21:17:28 #info question: heavy use of wikitext in UI may require core parser. implications for alternate formats?
21:17:48 * robla goes to find the Phab component for cut-n-paste issues
21:17:58 brion, is this (wikitext in UI) used a lot in non-wmf installs of mediawiki?
21:18:27 https://phabricator.wikimedia.org/project/view/898/ VisualEditor copypaste component in Phab
21:18:29 would markdown be a third editing mode, after "source" and VE?
21:18:33 #link https://phabricator.wikimedia.org/project/view/898/ VisualEditor copypaste component in Phab
21:19:00 TimStarling, I would think not.
21:19:12 would you have an "insert markdown" toolbar button which gives you a box for pasting markdown?
21:19:18 subbu: at least some yes, sentences and paragraphs allowing bold, links, etc on various special pages. don't know how scary they are
21:19:27 as in .. i see robla's proposal as that of using it as an interchange format for copy-paste
21:20:12 #info question: would cut-and-paste and interchange for markdown add a third editing mode beyond source/visual?
21:20:38 agreed. getting serious about multiple markup formats would lead to cleaning up a lot of entangled cruft in core
21:20:44 or it could be done as a ContentHandler
21:21:17 yeah. then you could have a mixed wiki if you wanted
21:21:29 then you wouldn't even touch $wgParser or create a Parser base class
21:21:31 i don't see a use case for mixed-markup-format wikis.
21:21:31 #info tim sez "getting serious about multiple markup formats would lead to cleaning up a lot of entangled cruft in core"
21:21:35 that would be pretty confusing.
21:21:44 no, I was quoting bd808
21:21:51 #info whoops bd808 sez that
21:22:04 * brion quote parsing error ;)
21:22:07 * bd808 denies it all
21:22:41 it can be the default content handler if you like, the point of doing it as a content handler is that it gives you a convenient pre-existing hook point
21:22:53 i can see particular uses, such as when a wiki is used as a source repository of documents to be reused.... but they get scary ;)
21:22:59 (for mixed modes)
21:23:00 pretty much everything about wikitext has already been abstracted there, for wikidata's benefit
21:23:36 things like links table updates, redirect syntax, PST and parsing itself
21:23:42 #info tim is pretty sure ContentHandler can implement a markdown mode well. should already be well-factored. can be used as default contenthandler in theory
21:23:48 i see ...
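The ContentHandler route Tim describes above would look roughly like the sketch below. This is not code from any existing extension: `MarkdownConverter::toHtml()` stands in for whatever Markdown library a wiki actually chose, the 'markdown' model id is arbitrary, and the method signatures follow the TextContent/TextContentHandler interfaces roughly as they stood around MediaWiki 1.27.

```php
// Hypothetical Markdown content model built on the existing ContentHandler hook point.
class MarkdownContent extends TextContent {
	public function __construct( $text ) {
		parent::__construct( $text, 'markdown' );
	}

	protected function fillParserOutput( Title $title, $revId,
		ParserOptions $options, $generateHtml, ParserOutput &$output
	) {
		if ( $generateHtml ) {
			// MarkdownConverter is a placeholder for any CommonMark/Markdown renderer.
			$output->setText( MarkdownConverter::toHtml( $this->getNativeData() ) );
		}
	}
}

class MarkdownContentHandler extends TextContentHandler {
	public function __construct( $modelId = 'markdown' ) {
		parent::__construct( $modelId, [ CONTENT_FORMAT_TEXT ] );
	}

	protected function getContentClass() {
		return MarkdownContent::class;
	}
}

// LocalSettings.php: register the model; $wgNamespaceContentModels could then make it
// the default for particular namespaces.
$wgContentHandlers['markdown'] = MarkdownContentHandler::class;
```

Doing it this way leaves $wgParser untouched, which is why the site-message question in the next part of the discussion stays on the wikitext content type.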
21:24:51 that wouldn't affect site messages because the message system grabs onto $wgParser
21:25:12 but maybe that's not a bad thing
21:25:17 but they'd still have to be written in wikitext if they are stored in a wiki page, right?
21:25:19 yeah, that's the point
21:25:47 site messages could have the wikitext content type, so you could even preview them using wikitext
21:26:27 we already support default content types that vary depending on namespace
21:26:27 #info example of needing core parser: messages in MediaWiki: namespace, such as site notices. force them to use wikitext CH
21:26:34 again for wikidata's benefit
21:26:34 is some sort of wikitext always going to be at the heart of MediaWiki or is T112999 foreseeable?
21:26:35 T112999: Let MediaWiki operate entirely without wikitext - https://phabricator.wikimedia.org/T112999
21:27:09 robla: it's conceivable but we'd have to eliminate or make optional the remaining wikitext users ;)
21:27:36 brion, i don't think robla is saying get rid of wikitext .. but whether mediawiki might support an option without wikitext.
21:27:48 allowing the parser for site messages to change would be like adding a language variant to every i18n language which seems unlikely to turn out well
21:28:22 right you'd basically have to change them to plaintext or plaintext with a very limited markup that is not full wikitext
21:28:31 * subbu is trying to grok what bd808 just said
21:28:34 I don't think it would really be helpful to attempt to translate i18n into some other markup language
21:28:46 but we've got all sorts of fun things like grammatical plural and gender markers done via a subset of wiki markup
21:28:47 you know, i18n really drove the development of a lot of parser features
21:28:48 subbu: en-wikitext && en-markdown
21:29:30 #info i18n is heavily dependent on a subset of the core parser for plurals, genders, and other message variants... but that doesn't have to be used for content if you don't want
21:29:36 let's say that the version of wikitext we have now is "wikitext 1.0". is "wikitext 1.1" something we could do? (and still support i18n)
21:30:10 * brion ponders
21:30:36 could we, or would we want to, split a wikitext spec into 'the bits used for i18n' and 'extra fancy-ass markup used in wikipedia-like content'
21:30:37 ?
21:30:45 robla, wikitext has evolved over the years .. so, i guess the qn. you are asking is if explicit versioning is needed?
21:30:48 or is that even worse :D
21:30:49 i18n of course is a mix of formats
21:31:05 preprocessed plain text, preprocessed HTML and true wikitext
21:31:07 plaintext, plaintext plus, wikitext, html, .... oh helllllls
21:31:14 subbu: yeah, I think so
21:31:49 well, except the qqq language which is pretty consistently wikitext
21:32:52 #info question: is explicit versioning needed? can/should we make a 'wikitext 1.1' that is always implemented for i18n and ui messages?
21:33:18 #info note i18n messages are a mix of plaintext+preprocess, HTML+preprocess, and pure wikitext
21:34:42 robla, are you proposing any role for markdown on WMF wikis?
21:35:10 (What are the implications of these MediaWiki markdown choices/decisions re ContentTranslation and Wikipedia's 358 languages, and security questions especially?)
21:35:47 TimStarling: I think it potentially has a role in normalizing CopyPaste issues, but the path toward that is complicated
21:35:59 #info question: implications of markdown choices on other tools like CT, need for i18n, and security?
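To make the i18n point above concrete (the "grammatical plural and gender markers done via a subset of wiki markup" at 21:28:46): the message system itself resolves these constructs, so even a wiki whose content pages were not wikitext would still depend on that subset. The message key and text below are invented for illustration.

```php
// A hypothetical message, e.g. in an extension's i18n/en.json:
//   "example-unread-count": "You have {{PLURAL:$1|one unread message|$1 unread messages}}."
// The same machinery handles {{GENDER:...}} and {{GRAMMAR:...}} in other messages.

$count = 3;
echo wfMessage( 'example-unread-count' )->numParams( $count )->text();
// -> "You have 3 unread messages."
```

This is roughly the subset that a "wikitext 1.1" for i18n and UI messages would have to keep, whatever happens to the content markup.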
21:36:19 that requires browsers, doc-creating systems (word, etc.) to support conversion to "standard" markdown.
21:36:45 it seems very limited as an interchange format
21:37:02 compared to RTF, HTML, PDF, etc.
21:37:22 if I were going to copy-paste from a markdown wiki page, bug report, or readme file on github for instance, my choices are to copy-paste the source, or copy-paste the rendered HTML
21:37:39 subbu: I think at a base level, we have a number of applications that claim "text/html" during copy/paste operations, but the text/html they produce can be pretty much anything
21:37:52 we know that pasting text/html is way harder than it should be ;) but we already support it in VE
21:38:07 brion, from some sources, yes.
21:38:13 pasting HTML into VE is already good enough to be useful
21:38:14 brion: we support it today, but it's an arms race, isn't it?
21:38:17 benefits of source copy?
21:38:18 I have used it a few times
21:38:19 hehe yep
21:39:00 no one (that I'm aware of) has defined a useful subset of HTML that is safe for copy/paste operations
21:39:08 but so is markdown isn't it?
21:39:22 if we support github's extensions, next we get asked about someone else's extensions
21:40:18 #info question: is the HTML copy-paste "arms race" good enough vs markup-specific paste converter tools for markdown etc?
21:40:36 HTML paste is likely to work if the HTML is very simple
21:41:00 for example if you're copying from a github README.md you'd expect it to work
21:41:29 TimStarling: is there a "very simple" subset of HTML we can get browser makers to support?
21:41:43 (for copy/paste purposes)?
21:41:47 robla, you linked to https://tools.ietf.org/html/draft-ietf-appsawg-text-markdown-12 ... what are your thoughts on how likely it is to be adopted?
21:42:24 robla: no... but then browsers can't export to markdown either
21:42:31 #link https://tools.ietf.org/html/draft-ietf-appsawg-text-markdown-12
21:42:55 subbu: I think something like that could happen
21:43:18 our original goal for parsoid html2wt (which is still there as a comment in the serialization code) is to be able to accept arbitrary html and convert it to "acceptable" wikitext. but we haven't quite worked on that goal for a while now since we are mostly behind clients whose output is more controlled.
21:44:05 subbu: what do you mean by "output is more controlled"?
21:44:27 as in .. VE/CX/Flow etc. don't generate arbitrary html.
21:44:39 ah, got it
21:45:25 but, if you, say, took the html from a bbc article and gave it to parsoid to convert to wikitext, the output isn't pretty.
21:45:35 so...basically, the copy/paste code works when we can control the generation of the HTML, but most implementations don't conform to our spec
21:45:51 no, VE does its own handling of copy-pasted HTML .. it doesn't go through parsoid.
21:46:20 fun :D
21:46:29 you mean it cleans up the HTML before it hands it to parsoid for serialization?
21:46:41 but, we've talked about creating a library for normalization and cleanup.
21:47:01 #info for comparison, the HTML paste handling in VE is done by normalizing HTML on the VE end, before it eventually lands in parsoid during save/serialization
21:47:10 TimStarling, as far as i know ... they strip unrecognized / unsupported attributes.
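As an illustration of the attribute stripping subbu describes at 21:47:10 (this is not VE's actual implementation, and the allowlist below is invented): a paste-cleanup step can walk the incoming HTML and drop every attribute it does not recognize before the document is ever handed on for serialization.

```php
// Illustrative only: a tiny allowlist-based attribute scrubber, not VE's real code.
function stripUnknownAttributes( string $html, array $allowed ): string {
	$doc = new DOMDocument();
	// Hide warnings from real-world, not-quite-well-formed paste input.
	@$doc->loadHTML( '<?xml encoding="utf-8"?>' . $html );

	foreach ( ( new DOMXPath( $doc ) )->query( '//*' ) as $element ) {
		// Collect names first; removing attributes while iterating over them skips entries.
		$names = [];
		foreach ( $element->attributes as $attr ) {
			$names[] = $attr->name;
		}
		foreach ( $names as $name ) {
			if ( !in_array( $name, $allowed, true ) ) {
				$element->removeAttribute( $name );
			}
		}
	}
	// saveHTML() re-wraps the fragment in <html><body>; a real cleanup step would unwrap it.
	return $doc->saveHTML();
}

echo stripUnknownAttributes(
	'<p style="color:red" onclick="evil()">Hi <a href="/wiki/Foo" data-x="1">Foo</a></p>',
	[ 'href', 'src', 'alt', 'title' ]
);
// style, onclick and data-x are dropped; href survives.
```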
21:47:30 #info ideally the parsoid html2wt would take any html and produce 'acceptable' wikitext but is not fully exercised at that right now
21:49:28 things like html2wt are going to be necessary for a long time, I imagine, but it seems to me we should at least start pulling people toward a world where html2wt isn't necessary
21:50:32 well, there's the html-only world possibility :)
21:50:47 where you'd still have some validation stage
21:50:56 but not a major reparse i guess
21:51:14 (and presumably a stage to handle composition of templates, media etc)
21:51:56 for parsoid to accept arbitrary html, we would need to run a sanitization pass on the html and strip unrecognized attributes, normalize html, etc.
21:52:09 I think we live in a world where wikitext is sanitized and tries to be safe, and HTML is known unsafe
21:52:28 indeed we'd have "inside html" and "outside html" at the least
21:52:32 which is also something that needs to happen with a html-only wiki .. sanitization at the very least.
21:52:32 never, EVER mix em :D
21:52:44 there's no "sanitized HTML" spec
21:53:09 :)
21:53:14 #info an HTML-only storage world needs to carefully sanitize between "outside HTML" and "safe inside HTML".... but there's no spec! we'd need one
21:53:37 there's the old HTML email spec
21:53:58 (but yeah, that's not really a good alternative)
21:54:38 https://en.wikipedia.org/wiki/HTML_email
21:55:34 probably we need to spec out our extensions as well, such as how you extract the file name from a usage, a wiki page from a link, a template reference and parameter set from a big ol' blob of divs or whatever
21:55:46 I think if VE's HTML paste can produce reasonable wikitext markup for any HTML generated from original markdown, then that more or less replaces the need for direct markdown paste
21:56:15 i tend to agree
21:56:36 "original markdown" as in http://daringfireball.net/projects/markdown/syntax
21:56:45 which is much simpler than pandoc markdown
21:57:28 commonmark would be the modern simple version, I think
21:57:47 http://commonmark.org/
21:57:50 ok we're getting low on time
21:58:09 any action items to pursue? decisions made?
21:58:21 T127329 is the placeholder for the parsoid side work to consolidate html-import/cleanup code into a library for use by whoever.
21:58:21 T127329: Using Parsoid as a wikitext bridge for importing content into wikitext format - https://phabricator.wikimedia.org/T127329
21:58:50 #link https://phabricator.wikimedia.org/T127329 related parsoid bridge for html-import-to-wikitext
21:59:19 Thanks All!
21:59:26 so I'm fairly skeptical about the idea of direct markdown paste as being superior to markdown->html->wikitext
21:59:28 subbu: my understanding is that you're working on RFCs as a goal soon, right?
21:59:28 i was interested in the markdown strategy as a potential benefit for refactoring some code in mediawiki .. but looks like that is mostly already in place?
21:59:50 yay wikidata -> contenthandler \o/
22:00:17 robla, rfcs for .. that task i pasted above?
22:00:21 #info tim is skeptical of direct paste; html import seems to serve well
22:00:33 subbu: something related to T112999?
22:00:34 T112999: Let MediaWiki operate entirely without wikitext - https://phabricator.wikimedia.org/T112999
22:00:43 #action someone should revise the RfC, probably drop the cut-paste
22:00:44 ah, cscott territory.
22:00:49 yes.
22:00:58 #action update T112999 for the ContentHandler era
22:00:58 T112999: Let MediaWiki operate entirely without wikitext - https://phabricator.wikimedia.org/T112999
22:01:15 i'll chat with him about it.
22:01:36 #action subbu will chat with cscott
22:01:38 thanks all!
22:01:41 #endmeeting
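As a postscript on the markdown -> html -> wikitext route Tim favours (21:55:46, 21:59:26): once the markdown has been rendered to HTML by any converter, the existing Parsoid/RESTBase transform endpoint can turn that HTML into wikitext. A minimal PHP sketch, assuming the public REST API as deployed on Wikimedia wikis; the sample HTML and the expected output are illustrative only.

```php
// HTML as it might come out of a Markdown renderer (illustrative input).
$html = '<h2>Example</h2><p>Some <strong>markdown-derived</strong> text and a <a href="https://example.org">link</a>.</p>';

// POST it to the html-to-wikitext transform endpoint (RESTBase in front of Parsoid).
$ch = curl_init( 'https://en.wikipedia.org/api/rest_v1/transform/html/to/wikitext' );
curl_setopt_array( $ch, [
	CURLOPT_POST => true,
	CURLOPT_POSTFIELDS => [ 'html' => $html ],
	CURLOPT_RETURNTRANSFER => true,
] );
$wikitext = curl_exec( $ch );
curl_close( $ch );

echo $wikitext;
// Roughly: == Example ==
//          Some '''markdown-derived''' text and a [https://example.org link].
```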