Paste P3299


Authored by RobLa-WMF on Jun 22 2016, 10:04 PM.
Referenced Files
F4193047: ArchCom-RFC-2016W25-irc-E218.txt
Jun 22 2016, 10:04 PM
21:01:31 <brion> #startmeeting ArchCom RFC meeting - Markdown support | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs:
21:01:31 <wm-labs-meetbot`> Meeting started Wed Jun 22 21:01:31 2016 UTC and is due to finish in 60 minutes. The chair is brion. Information about MeetBot at
21:01:31 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:31 <wm-labs-meetbot`> The meeting name has been set to 'archcom_rfc_meeting___markdown_support___wikimedia_meetings_channel___please_note__channel_is_logged_and_publicly_posted__do_not_remove_this_note____logs__http___bots_wmflabs_org__wm_bot_logs__23wikimedia_office_'
21:01:46 <brion> i hope that wasn't too many bits for poor meetbot
21:02:19 <robla> #link Phab event for this week's meeting
21:02:40 <brion> #info discussing develop Markdown support strategy for MediaWiki
21:03:20 <robla> #link this week's RFC
21:03:30 * robla wipes brow
21:03:41 <brion> robla, care to chat a bit on the background?
21:04:33 <robla> sure, this is asking "what should our Markdown strategy be?", where pretty much any answer is valid
21:04:56 <brion> :D
21:05:12 <robla> why I'm asking that: there are many, many flavors of "wiki syntax" out there, of which MediaWiki wikitext is only one
21:06:00 <YairRand> (but ours is the _real_ wikisyntax... :P )
21:06:04 <robla> many implementations claim "Markdown support", but the interpretation varies quite a bit based on implementation
21:06:43 <robla> YairRand: :-D I think that actually gets to the heart of it
21:07:52 <robla> YairRand: do you (or anyone out there) believe that all other implementations will "see the light" and start using our format? should they?
21:08:59 <subbu> a different question is: will all the disparate markdown efforts to go beyond "simple" markdown eventually arrive at the wikitext level of complexity?
21:09:25 <subbu> even if the syntax will probably not be wikitext syntax itself.
21:09:59 <brion> (taking off my chair hat momentarily) what's a reason a given wiki might have for choosing to use markdown? preference, or compatibility with existing data or other tools, or?
21:10:21 <brion> (that might affect how one would go about such support)
21:10:56 <robla> I think both questions are very good, and now I'm having trouble choosing :-)
21:11:00 <brion> :D
21:11:08 <brion> let's do em in turn
21:11:11 <bd808> migrating from a github wiki to mediawiki might be one reason to want markdown page source
21:11:54 <brion> *nod*
21:11:54 <robla> bd808: yup
21:11:54 <YairRand> are there any serious limitations regarding wikitext that are solved in other syntaxes? are they pretty freely convertible?
21:12:09 <Scott_WUaS> (Is there a question here about how Wikimedia markdown talked about now will interface with SQID and Wikidata?)
21:12:35 <robla> YairRand: the Pandoc folks aspire to provide complete interchangeability
21:12:58 <brion> #info open question: reasons for choosing markdown? example: moving hosting of a github wiki
21:12:59 <YairRand> robla: ... <clap clap clap>
21:13:06 * subbu is looking at and sees that it is a pretty long spec
21:13:48 <brion> #info open question: complexity and extensions to the markup? example: would we need a syntax extension for templates/parserfunctions/lua/wikidata/etc?
21:14:39 <brion> easy things are easy to convert, hard things are ....... well that's the question isn't it :D
21:15:04 <subbu> one good reason to entertain this markdown question for mediawiki is that it might let us abstract the markup / parsing parts of the codebase behind an interface.
21:15:26 <brion> #info for convertibility of markdownish things, see pandoc
21:15:41 <TimStarling> what does cut and paste support mean for users in practice?
21:15:52 <bd808> agreed. getting serious about multiple markup formats would lead to cleaning up a lot of entangled cruft in core
21:15:58 <brion> subbu: good point. also, how much do we rely on wikitext eg in the user interface?
21:16:00 <subbu> i toyed with that interface idea in
21:16:35 <subbu> brion, yes, wikitext in the UI is tricky ...
21:16:49 <subbu> site messages are another i guess.
21:17:25 <robla> TimStarling: I know what it means for me, but that's probably a better question for the folks who work with VE regularly, since my understanding is that cut-n-paste bugs happen a lot
21:17:28 <brion> #info question: heavy use of wikitext in UI may require core parser. implications for alternate formats?
21:17:48 * robla goes to find the Phab component for cut-n-paste issues
21:17:58 <subbu> brion, is this (wikitext in UI) used a lot in non-wmf installs of mediawiki?
21:18:27 <robla> VisualEditor copypaste component in Phab
21:18:29 <TimStarling> would markdown be a third editing mode, after "source" and VE?
21:18:33 <robla> #link VisualEditor copypaste component in Phab
21:19:00 <subbu> TimStarling, I would think not.
21:19:12 <TimStarling> would you have an "insert markdown" toolbar button which gives you a box for pasting markdown?
21:19:18 <brion> subbu: at least some yes, sentences and paragraphs allowing bold, links, etc on various special pages. don't know how scary they are
21:19:27 <subbu> as in .. i see robla's proposal as that of using it as an interchange format for copy-paste
21:20:12 <brion> #info question: would cut-and-paste and interchange for markdown add a third editing mode beyond source/visual?
21:20:38 <TimStarling> <bd808> agreed. getting serious about multiple markup formats would lead to cleaning up a lot of entangled cruft in core
21:20:44 <TimStarling> or it could be done as a ContentHandler
21:21:17 <bd808> yeah. then you could have a mixed wiki if you wanted
21:21:29 <TimStarling> then you wouldn't even touch $wgParser or create a Parser base class
21:21:31 <subbu> i don't see a use case for mixed-markup-format wikis.
21:21:31 <brion> #info tim sez "getting serious about multiple markup formats would lead to cleaning up a lot of entangled cruft in core"
21:21:35 <subbu> that would be pretty confusing.
21:21:44 <TimStarling> no, I was quoting bd808
21:21:51 <brion> #info whoops bd808 sez that
21:22:04 * brion quote parsing error ;)
21:22:07 * bd808 denies it all
21:22:41 <TimStarling> it can be the default content handler if you like, the point of doing it as a content handler is that it gives you a convenient pre-existing hook point
21:22:53 <brion> i can see particular uses, such as when a wiki is used as a source repository of documents to be reused.... but they get scary ;)
21:22:59 <brion> (for mixed modes)
21:23:00 <TimStarling> pretty much everything about wikitext has already been abstracted there, for wikidata's benefit
21:23:36 <TimStarling> things like links table updates, redirect syntax, PST and parsing itself
21:23:42 <brion> #info tim is pretty sure ContentHandler can implement a markdown mode well. should already be well-factored. can be used as default contenthandler in theory
21:23:48 <subbu> i see ...
21:24:51 <bd808> that wouldn't affect site messages because the message system grabs onto $wgParser
21:25:12 <bd808> but maybe that's not a bad thing
21:25:17 <brion> but they'd still have to be written in wikitext if they are stored in a wiki page, right?
21:25:19 <TimStarling> yeah, that's the point
21:25:47 <TimStarling> site messages could have the wikitext content type, so you could even preview them using wikitext
21:26:27 <TimStarling> we already support default content types that vary depending on namespace
21:26:27 <brion> #info example of needing core parser: messages in MediaWiki: namespace, such as site notices. force them to use wikitext CH
21:26:34 <TimStarling> again for wikidata's benefit
21:26:34 <robla> is some sort of wikitext always going to be at the heart of MediaWiki or is T112999 foreseeable?
21:26:35 <stashbot> T112999: Let MediaWiki operate entirely without wikitext -
21:27:09 <brion> robla: it's conceivable but we'd have to eliminate or make optional the remaining wikitext users ;)
21:27:36 <subbu> brion, i don't think robla is saying get rid of wikitext .. but whether mediawiki might support an option without wikitext.
21:27:48 <bd808> allowing the parser for site messages to change would be like adding a language variant to every i18n language which seems unlikely to turn out well
21:28:22 <brion> right you'd basically have to change them to plaintext or plaintext with a very limited markup that is not full wikitext
21:28:31 * subbu is trying to grok what bd808 just said
21:28:34 <TimStarling> I don't think it would really be helpful to attempt to translate i18n into some other markup language
21:28:46 <brion> but we've got all sorts of fun things like grammatical plural and gender markers done via a subset of wiki markup
21:28:47 <TimStarling> you know, i18n really drove the development of a lot of parser features
21:28:48 <bd808> subbu: en-wikitext && en-markdown
21:29:30 <brion> #info i18n is heavily dependent on a subset of the core parser for plurals, genders, and other message variants... but that doesn't have to be used for content if you don't want
21:29:36 <robla> let's say that the version of wikitext we have now is "wikitext 1.0". is "wikitext 1.1" something we could do? (and still support i18n)
21:30:10 * brion ponders
21:30:36 <brion> could we, or would we want to, split a wikitext spec into 'the bits used for i18n' and 'extra fancy-ass markup used in wikipedia-like content'
21:30:37 <brion> ?
21:30:45 <subbu> robla, wikitext has evolved over the years .. so, i guess the qn. you are asking is if explicit versioning is needed?
21:30:48 <brion> or is that even worse :D
21:30:49 <TimStarling> i18n of course is a mix of formats
21:31:05 <TimStarling> preprocessed plain text, preprocessed HTML and true wikitext
21:31:07 <brion> plaintext, plaintext plus, wikitext, html, .... oh helllllls
21:31:14 <robla> subbu: yeah, I think so
21:31:49 <TimStarling> well, except the qqq language which is pretty consistently wikitext
21:32:52 <brion> #info question: is explicit versioning needed? can/should we make a 'wikitext 1.1' that is always implemented for i18n and ui messages?
21:33:18 <brion> #info note i18n messages are a mix of plaintext+preprocess, HTML+preprocess, and pure wikitext
21:34:42 <TimStarling> robla, are you proposing any role for markdown on WMF wikis?
21:35:10 <Scott_WUaS> (What are the implications of these MediaWiki markdown choices/decisions re ContentTranslation and Wikipedia's 358 languages, and security questions especially?)
21:35:47 <robla> TimStarling: I think it potentially has a role in normalizing CopyPaste issues, but the path toward that is complicated
21:35:59 <brion> #info question: implications of markdown choices on other tools like CT, need for i18n, and security?
21:36:19 <subbu> that requires browsers, doc-creating systems (word, etc.) to support conversion to "standard" markdown.
21:36:45 <TimStarling> it seems very limited as an interchange format
21:37:02 <TimStarling> compared to RTF, HTML, PDF, etc.
21:37:22 <brion> if I were going to copy-paste from a markdown wiki page, bug report, or readme file on github for instance, my choices are to copy-paste the source, or copy-paste the rendered HTML
21:37:39 <robla> subbu: I think at a base level, we have a number of applications that claim "text/html" during copy/paste operations, but text/html copy pasting pretty much anything
21:37:52 <brion> we know that pasting text/html is way harder than it should be ;) but we already support it in VE
21:38:07 <subbu> brion, from some sources, yes.
21:38:13 <TimStarling> pasting HTML into VE is already good enough to be useful
21:38:14 <robla> brion: we support it today, but it's an arms race, isn't it?
21:38:17 <brion> benefits of source copy?
21:38:18 <TimStarling> I have used it a few times
21:38:19 <brion> hehe yep
21:39:00 <robla> no one (that I'm aware of) has defined a useful subset of HTML that is safe for copy/paste operations
21:39:08 <brion> but so is markdown isn't it?
21:39:22 <brion> if we support github's extensions, next we get asked about someone else's extensions
21:40:18 <brion> #info question is the HTML copy-paste "arms race" good enough vs markup-specific paste converter tools for markdown etc?
21:40:36 <TimStarling> HTML paste is likely to work if the HTML is very simple
21:41:00 <TimStarling> for example if you're copying from a github you'd expect it to work
21:41:29 <robla> TimStarling: is there a "very simple" subset of HTML we can get browser makers to support?
21:41:43 <robla> (for copy/paste purposes)?
21:41:47 <subbu> robla, you linked to ... what are your thoughts on how likely it is to be adopted?
21:42:24 <TimStarling> robla: no... but then browsers can't export to markdown either
21:42:31 <brion> #link
21:42:55 <robla> subbu: I think like that could happen
21:43:18 <subbu> our original goal for parsoid html2wt (which is still there as a comment in the serialization code) is to be able to accept arbitrary html and convert it to "acceptable" wikitext. but we haven't quite worked on that goal for a while now since we are mostly behind clients whose output is more controlled.
21:44:05 <robla> subbu: what do you mean by "output is more controlled"?
21:44:27 <subbu> as in .. VE/CX/Flow etc. don't generate arbitrary html.
21:44:39 <robla> ah, got it
21:45:25 <subbu> but, if you say, took the html from a bbc article and gave it to parsoid to convert to wikitext, the output isn't pretty.
21:45:35 <robla> so...basically, the copy/paste code works when we can control the generation of the HTML, but most implementations don't conform to our spec
21:45:51 <subbu> no, VE does its own handling of copy-pasted HTML .. it doesn't go through parsoid.
21:46:20 <brion> fun :D
21:46:29 <TimStarling> you mean it cleans up the HTML before it hands it to parsoid for serialization?
21:46:41 <subbu> but, we've talked about creating a library for normalization and cleanup.
21:47:01 <brion> #info for comparison, the HTML paste handling in VE is done by normalizing HTML on the VE end, before it eventually lands in parsoid during save/serialization
21:47:10 <subbu> TimStarling, as far as i know ... they strip unrecognized / unsupported attributes.
21:47:30 <brion> #info ideally the parsoid html2wt would take any html and produce 'acceptable' wikitext but is not fully exercised at that right now
21:49:28 <robla> things like html2wt are going to be necessary for a long time, I imagine, but it seems to me we should at least start pulling people toward a world where html2wt isn't necessary
21:50:32 <brion> well, there's the html-only world possibility :)
21:50:47 <brion> where you'd still have some validation stage
21:50:56 <brion> but not a major reparse i guess
21:51:14 <brion> (and presumably a stage to handle composition of templates, media etc)
21:51:56 <subbu> for parsoid to accept arbitrary html, we would need to run a sanitization pass on the html and strip unrecognized attributes, normalize html, etc.
21:52:09 <robla> I think we live in a world where wikitext is sanitized and tries to be safe, and HTML is known unsafe
21:52:28 <brion> indeed we'd have "inside html" and "outside html" at the least
21:52:32 <subbu> which is also something that needs to happen with a html-only wiki .. sanitization at the very least.
21:52:32 <brion> never, EVER mix em :D
21:52:44 <robla> there's no "sanitized HTML" spec
21:53:09 <subbu> :)
21:53:14 <brion> #info an HTML-only storage world needs to carefully sanitize between "outside HTML" and "safe inside HTML".... but there's no spec! we'd need one
21:53:37 <robla> there's the old HTML email spec
21:53:58 <robla> (but yeah, that's not really a good alternative)
21:54:38 <robla>
21:55:34 <brion> probably we need to spec out our extensions as well, such as how you extract the file name from a usage, a wiki page from a link, a template reference and parameter set from a big ol' blob of divs or whatever
21:55:46 <TimStarling> I think if VE's HTML paste can produce reasonable wikitext markup for any HTML generated from original markdown, then that more or less replaces the need for direct markdown paste
21:56:15 <brion> i tend to agree
21:56:36 <TimStarling> "original markdown" as in
21:56:45 <TimStarling> which is much simpler than pandoc markdown
21:57:28 <robla> commonmark would be the modern simple version, I think
21:57:47 <robla>
21:57:50 <brion> ok we're getting low on time
21:58:09 <brion> any action items to pursue? decisions made?
21:58:21 <subbu> T127329 is the placeholder for the parsoid side work to consolidate html-import/cleanup code into a library for use by whoever.
21:58:21 <stashbot> T127329: Using Parsoid as a wikitext bridge for importing content into wikitext format -
21:58:50 <brion> #link related parsoid bridge for html-import-to-wikitext
21:59:19 <Scott_WUaS> Thanks All!
21:59:26 <TimStarling> so I'm fairly skeptical about the idea of direct markdown paste as being superior to markdown->html->wikitext
21:59:28 <robla> subbu: my understanding is that you're working on RFCs as a goal soon, right?
21:59:28 <subbu> i was interested in the markdown strategy as a potential benefit for refactoring some code in mediawiki .. but looks like that is mostly already in place?
21:59:50 <brion> yay wikidata -> contenthandler \o/
22:00:17 <subbu> robla, rfcs for .. that task i pasted above?
22:00:21 <brion> #info tim is skeptical of direct paste; html import seems to serve well
22:00:33 <robla> subbu: something related to T112999?
22:00:34 <stashbot> T112999: Let MediaWiki operate entirely without wikitext -
22:00:43 <brion> #action someone should revise the RfC, probably drop the cut-paste
22:00:44 <subbu> ah, cscott territory.
22:00:49 <subbu> yes.
22:00:58 <brion> #action update T112999 for the ContentHandler era
22:00:58 <stashbot> T112999: Let MediaWiki operate entirely without wikitext -
22:01:15 <subbu> i'll chat with him about it.
22:01:36 <brion> #action subbu will chat with cscott
22:01:38 <brion> thanks all!
22:01:41 <brion> #endmeeting
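
Illustration of the cleanup step mentioned at 21:47:10 and 21:51:56 (stripping unrecognized attributes before pasted HTML is handed on for serialization). This is a minimal sketch using only the Python standard library. It is not VE's or Parsoid's actual code, and the allowlist, the class name AllowlistSanitizer, and the function sanitize are all hypothetical names chosen for the example. It should not be treated as a security-grade sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration: tag -> attributes kept on that tag.
ALLOWED = {
    "p": set(),
    "b": set(),
    "i": set(),
    "ul": set(),
    "li": set(),
    "a": {"href"},
}

# Elements whose text content is dropped entirely.
SKIPPED = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; keep text of unknown tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1
            return
        if tag not in ALLOWED:
            return  # drop the unknown tag itself, but keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            # Assumption: only plain http(s) links are acceptable.
            if name == "href" and not value.strip().lower().startswith(
                ("http://", "https://")
            ):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text with non-allowlisted tags and attributes removed."""
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

The allowlist approach matches the "inside HTML" versus "outside HTML" distinction raised at 21:52:28: anything not explicitly recognized is dropped instead of being passed through.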