ArchCom RFC Meeting: Replace Tidy in MW parser with HTML 5 parse/reserialize (2016-06-08, #wikimedia-office)
Hosted by daniel on Jun 8 2016, 9:00 PM - 10:00 PM.

Description

Meeting summary

  • Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ (robla, 21:02:12)
    • LINK: https://phabricator.wikimedia.org/E203 <-Phab event info, with links (robla, 21:03:17)
    • LINK: https://www.mediawiki.org/wiki/Html5Depurate (robla, 21:12:22)
    • LINK: https://github.com/wikimedia/html5depurate/blob/master/src/main/java/org/wikimedia/html5depurate/CompatibilitySerializer.java is a first step to teasing out exactly what tidy is doing, although I'd like to eventually document this properly on-wiki in english, not in code. (cscott, 21:12:54)
    • LINK: https://github.com/tstarling/remex-html (TimStarling, 21:16:01)
    • LINK: https://github.com/tstarling/remex-html parser that Tim is working on (robla, 21:16:47)
    • first 15-20 minutes of the meeting have been about different Tidy alternatives (robla, 21:17:50)
    • <subbu> we are replacing tidy with html5depurate .. that is where are headed right now. (robla, 21:18:16)
    • <gwicke> we will have to do something about third party use sooner rather than later <robla> this is a topic for a different meeting (robla, 21:38:10)
    • <cscott> i have (the start of a) slower pure-PHP implementation which might help with third party use. (note for that different meeting) (cscott, 21:39:18)
    • <TimStarling> first it should be deployed to all wikis as a gadget or other opt-in tool (robla, 21:42:45)
    • Tim intends to make the new html available to users as an experiment, by clicking a button to replace the old html with the new html, on request. (DanielK_WMDE__, 21:46:24)
    • <TimStarling> we could have a special page as well, to make it linkable; /wiki/Special:NewParser/Barack_Obama or something (DanielK_WMDE__, 21:47:10)
    • Tim suggests to run one instance of html5depurate on each app server, so mw can access it on localhost; html5depurate runs an integrated webserver (grizzly). (DanielK_WMDE__, 21:48:30)
    • <robla> so, the order is 1) rollout html5depurate instances 2) rollout special page+gadget 3) rollout to first wikis 4) plan full rollout based on step 3 (?) (robla, 21:55:29)
    • <robla> prerequisite for step 1: discussion with ops. prerequisite for step 3: discussion with liaisons, correct? <subbu> robla, sounds about right. (robla, 21:55:54)
    • <DanielK_WMDE__> robla: (2) should also be discussed with comcom/liaisons, i think (robla, 21:57:21)
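
The per-host setup noted above (one html5depurate instance per app server, reached by MediaWiki over localhost) can be illustrated with a minimal client-side sketch. The port (4339), the path (/document), the form field name (text) and the form encoding are assumptions made for illustration; none of them appear in this log, so check the Html5Depurate documentation before relying on them. The sketch only builds the HTTP request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTION: port, path and field name are illustrative, not from the log.
DEPURATE_URL = "http://localhost:4339/document"


def build_depurate_request(html: str, url: str = DEPURATE_URL) -> Request:
    """Build (but do not send) a POST request carrying an HTML fragment."""
    # ASSUMPTION: the service accepts the fragment in a form field named "text".
    body = urlencode({"text": html}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded; charset=utf-8"
        },
    )


if __name__ == "__main__":
    req = build_depurate_request("<p>unclosed <b>bold")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it to the local instance. Keeping the URL as a parameter reflects the point made in the meeting that the instances can live wherever ops decides.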

Meeting ended at 22:00:19 UTC.

People present (lines said)

  • TimStarling (53)
  • subbu (37)
  • cscott (37)
  • robla (36)
  • gwicke (33)
  • DanielK_WMDE__ (23)
  • brion (6)
  • wm-labs-meetbot` (3)
  • stashbot (2)
  • bd808 (1)
  • Scott_WUaS (1)
  • ebernhardson (1)

21:01:58 <robla> #startmeeting ArchCom RFC Meeting: T89331 Replace Tidy in MW parser with HTML 5 parse/reserialize
21:01:58 <wm-labs-meetbot`> Meeting started Wed Jun 8 21:01:58 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:58 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:58 <wm-labs-meetbot`> The meeting name has been set to 'archcom_rfc_meeting__t89331_replace_tidy_in_mw_parser_with_html_5_parse_reserialize'
21:01:59 <stashbot> T89331: Replace Tidy in MW parser with HTML 5 parse/reserialize - https://phabricator.wikimedia.org/T89331
21:02:12 <robla> #topic Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:02:34 <robla> hi everyone!
21:02:35 <subbu> o/
21:03:17 <robla> #link https://phabricator.wikimedia.org/E203 <-Phab event info, with links
21:03:41 <cscott> hey there
21:03:44 <TimStarling> I wrote a bit of an update in the task description
21:04:25 <robla> TimStarling: you had wanted to have this discussion be about the migration, and not so much about the design choices, right?
21:04:51 <TimStarling> yes
21:05:25 <TimStarling> I think the design was decided last year, and we've gone ahead and implemented it, but there's still a fair bit of work to do to get it out the door
21:05:48 <gwicke> could you perhaps give a short summary of how this fits into your larger-picture roadmap?
21:06:35 <subbu> whose roadmap specifically?
21:06:42 <TimStarling> in the foreseeable future, we plan on maintaining both the MW parser and Parsoid
21:06:45 <gwicke> the parsing team roadmap
21:07:15 <TimStarling> so it makes sense to do some work to bring them closer together in terms of syntax
21:08:14 <robla> there are two questions Tim brought up at the end of the description of T89331. Question 1 ...
21:08:15 <stashbot> T89331: Replace Tidy in MW parser with HTML 5 parse/reserialize - https://phabricator.wikimedia.org/T89331
21:08:17 <robla> "Are we close enough now in visual diff testing to call that part of the project done? (96.79% showed less than 1% differences, 93.35% rendered with pixel-perfect accuracy.)"
21:08:46 <subbu> I think there is benefit in and of itself to replace Tidy ... it has been requested for a long time. Plus, what TimStarling said about this bringing Parsoid and PHP parser output closer together.
21:08:50 <gwicke> are you planning to add more DOM functionality in Html5Depurate?
21:09:32 <TimStarling> cscott has some ideas about maybe using it for #balance
21:09:40 <TimStarling> in which case I suppose it would need more DOM functionality
21:09:46 <subbu> but, why is that relevant to this discussion?
21:10:11 <gwicke> I'm mainly trying to understand where you are planning to go with Html5Depurate
21:10:42 <subbu> ok ..
21:11:02 <cscott> (one option for implementing #balance is to do the actual balancing in tidy or post-tidy, when we have clean serialized html5 with all tags matched, etc.)
21:11:10 <gwicke> depending on your answers, it will be more or less required for third party users
21:11:34 <TimStarling> currently we do not actually build a DOM in depurate, the parser gives us an event stream which we serialize
21:12:10 <cscott> (one of *my* goals for a tidy replacement is obtaining better semantics for what the "tidy phase" does, exactly. tidy's actual transformations are not written down anywhere, and WMF wikis depend on their exact behavior.)
21:12:22 <robla> #link https://www.mediawiki.org/wiki/Html5Depurate
21:12:46 <subbu> gwicke, we want a tidy-replacement solution for 3rd party users and that replacement solution right now is html5depurate. but, other options are not ruled out for someone wanting to provide them.
21:12:54 <cscott> https://github.com/wikimedia/html5depurate/blob/master/src/main/java/org/wikimedia/html5depurate/CompatibilitySerializer.java is a first step to teasing out exactly what tidy is doing, although I'd like to eventually document this properly on-wiki in english, not in code.
21:13:18 <subbu> whatever that solution is has to solve the problems that we are solving right now, and for deployment on the wmf cluster, we need to solve the migration / how-to-roll-out problem.
21:14:14 <brion> have we looked at https://github.com/Masterminds/html5-php for a pure-php solution?
21:14:15 <cscott> if we can separate the "tidy compatibility" part from the html5 parsing/serialization part, we can (a) more easily use different html5 parsing/serialization solutions, depending on performance, ops considerations, etc, and (b) gradually deprecate the strangest corners of the "tidy compatibility" part.
21:14:25 <gwicke> so, are you planning to basically develop html5depurate to eventually be equivalent to Parsoid's DOM passes?
21:14:38 <TimStarling> brion: yes, I think I had some notes about it on the task
21:14:45 <brion> nice, i'll read up
21:15:10 <TimStarling> yeah, show older comments and then search for it
21:15:36 <TimStarling> I'm actually working on my own HTML 5 parser in PHP now
21:15:42 <cscott> afaik, html5depurate is intended to be html5 parse+serialize only, with "as little as possible" of mediawiki-specific compat stuff. this means that some of what tidy is doing we end up moving into php.
21:15:53 <subbu> gwicke, we haven't gotten that far yet since the core parser doesn't provide the functionality needed to implement Parsoid's DOM passes on top of that.
21:16:01 <TimStarling> https://github.com/tstarling/remex-html
21:16:22 <gwicke> so, you are planning to have a DOM in PHP, too?
21:16:28 <subbu> so, short answer .. we won't be implementing those dom passes in depurate.
21:16:34 <cscott> for example, https://gerrit.wikimedia.org/r/286928 moves some tidy-specific self-closing tag functionality into the sanitizer, so it doesn't have to be part of tidy (any of the possible tidy implementations)
21:16:47 <robla> #link https://github.com/tstarling/remex-html parser that Tim is working on
21:17:04 <subbu> gwicke, still trying to figure how that discussion is relevant to tidy replacement.
21:17:08 <brion> ah looks like it (html5-php) may not handle all error corrections correctly per spec, which is worrying :)
21:17:21 <gwicke> subbu: still trying to figure out where you are headed with this
21:17:28 <cscott> and i think tim's doBlockLevels work is also intended to move more of the fixups done by tidy (sometimes incorrectly) into core. right?
21:17:33 <subbu> we are replacing tidy with html5depurate .. that is where are headed right now.
21:17:50 <robla> #info first 15-20 minutes of the meeting have been about different Tidy alternatives
21:18:16 <robla> #info <subbu> we are replacing tidy with html5depurate .. that is where are headed right now.
21:18:35 <TimStarling> doBlockLevels is actually generating invalid HTML from valid input
21:18:43 <TimStarling> creating an awful mess for tidy to clean up
21:18:44 <cscott> brion: just for completeness, https://gerrit.wikimedia.org/r/#/c/279669/7/includes/tidy/Balancer.php is another html5 "parser" (actually just the treebuilder phase) written in PHP.
21:18:44 <cscott> "pp
21:18:49 <TimStarling> I would like it to not do that in the first place
21:18:59 <brion> cscott: tx
21:19:44 <cscott> right, so the general idea is to gradually move to the place where "tidy" is just a standards-compliant html5 parse-and-reserialize, and all the other fixups are done in core (or avoided entirely, like with the doBlockLevels fixes)
21:19:59 <gwicke> this sounds a lot like parsoid
21:20:01 <cscott> we're not going to get there in one leap, the initial html5depurate will have some tidy compatibility hacks still.
21:20:41 <cscott> gwicke: one goal is easier compatibility between php and parsoid, yes. but parsoid doesn't have a separate 'tidy' phase really.
21:20:52 <TimStarling> I wonder if we should make Sanitizer into a proper balancing HTML parser
21:21:26 <gwicke> it does beg the question if you are planning to gradually convert the PHP parser into a Parsoid port
21:21:30 <TimStarling> it would simplify the main pass if the input were valid
21:21:49 <cscott> well, parsoid is just differently structured. parsoid does token stream -> manipulations -> tree builder phase -> final manipulations. php does wikitext -> partially parsed html -> sanitizer -> doBlockLevels -> tidy -> languageconverter. or something like that.
21:22:02 <TimStarling> gwicke: we've already talked about that twice, I don't really want to get into it again
21:22:39 <robla> should we talk about visual diff test pass requirements?
21:22:52 <subbu> i think this discussion is side-tracking into all the other things we can do with depurate / what the parsing team is doing / might be doing .. and is not about whether html5depurate is an acceptable tidy replacement and what blocks its deployment.
21:23:02 <gwicke> TimStarling: it would be interesting if you could clarify what your answer is
21:23:04 <cscott> TimStarling: the input to removeHTMLtags is not fully-parsed HTML, so we can't actually do a proper HTML5 parse at that point. I tried that once.
21:23:07 <TimStarling> robla: I think there are no comments on that
21:23:21 <cscott> TimStarling: OTOH arlo has been gradually fixing the attribute regexps in the sanitizer (for example) to be html5-spec-compliant.
21:23:38 <subbu> but https://phabricator.wikimedia.org/T134469#2281710 are my thoughts.
21:24:15 <subbu> i'll prepare a wiki page addressing questions about parsing team roadmap / 2 parsers / etc .. i don't think that discussion is relevant to this rfc.
21:24:27 <TimStarling> we need to start getting the community involved in text migration
21:24:47 <gwicke> subbu: it matters wrt third party support, which is related to the migration
21:24:53 <TimStarling> then they will have opinions on diff targets
21:25:33 <subbu> gwicke, as it stands today .. TimStarling has written an abstract interface into which a tidy replacement can be dropped, and html5depurate is one viable option.
21:25:37 <robla> TimStarling: I guess that's what I meant by requirements. I'm assuming you're trying to figure out how to get editors engaged on fixing the few bugs that are past the point of diminishing returns, right?
21:26:27 <subbu> if a pure php alternative is available that can be used .. if parsoid is available, then parsoid can be used ... but if parsoid is used for read views there, tidy is irrelevant ... so many possibilities.
21:26:48 <subbu> but, for mediawiki installs that don't want parsoid, parsoid is not a solution for replacing tidy.
21:26:49 <TimStarling> robla: yes, although diffs are not necessarily bugs
21:27:28 <TimStarling> we don't really want to have exactly the same behaviour as tidy because we don't actually like tidy's behaviour
21:27:49 <TimStarling> so it comes down to rolling out changes gradually, providing tools for editors
21:27:57 <subbu> robla, https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy has a classification of diffs we currently see and possible strategies for addressing them.
21:28:08 <TimStarling> perhaps James_F would have a sense of how the project would go
21:28:27 <robla> do you envision a heterogeneous deployment for this? (i.e. one specific wiki first, then rolling out more widely)?
21:28:28 <cscott> again, for completeness: https://gerrit.wikimedia.org/r/282733 is 90% of a pure-PHP implementation of html5depurate. It's not as fast as html5depurate, so WMF probably wouldn't use that in production, but it could ease the political issue for non-WMF users.
21:28:52 <gwicke> if html5depurate is the equivalent of a HTML5 round-trip, then there are many alternative implementations
21:29:15 <gwicke> I'm mostly worried about other features that require a DOM
21:29:27 <TimStarling> it's an HTML parse with a custom serializer
21:29:41 <TimStarling> *not* a compliant serializer
21:30:00 <cscott> gwicke: sure. but only a pure PHP implementation really helps with the political problem.
21:30:05 <subbu> right now, html5depurate implements some additional simple tidy-like transforms to ease the migration from tidy .. otherwise the diffs would be numerous. the plan is to remove those passes one at a time using visual diffing and the process we come up with now ... that will migrate syntax gradually.
21:30:21 <cscott> TimStarling: "not a compliant serializer" meaning the tidy-compatibility hacks? or the XHTML-compatibility hacks? or something else?
21:30:35 <TimStarling> the tidy compatibility hacks
21:30:39 <cscott> ok. sure.
21:30:50 <TimStarling> the XHTML hacks will give you roughly the same thing if you reparse it
21:31:01 <gwicke> cscott: there are two schools of thought there- one is that really the most important thing for third party users is a) resource needs, and b) setup / maintenance complexity
21:31:08 <gwicke> the other is that it has to be PHP
21:31:58 <cscott> +1 to subbu's comment about gradually migrating syntax.
21:32:01 <brion> those are of course two separate things :) but related, in that extra daemons/languages/binaries add to the setup cost
21:32:08 <DanielK_WMDE__> gwicke: the argument being that PHP still provides the best bang for the buck in terms of (b)
21:32:11 <gwicke> how html5depurate fits into either remains to be seen; for the first case, it depends a lot on its resource needs
21:32:25 <cscott> the "90% implementation" i mention above doesn't have tim's tidy compatibility hacks yet, that's the part that's still missing there.
21:33:08 <gwicke> we can solve the setup issue with containers & automation (see mediawiki-containers)
21:33:14 <subbu> gwicke, this is rehashing the first RFC discussion.
21:33:38 <gwicke> what is the expected memory usage of html5depurate?
21:33:41 <subbu> what are you getting at exactly?
21:34:06 <subbu> are you saying we shouldn't deploy html5depurate on the wmf cluster?
21:34:20 <cscott> i didn't mean to distract us with my mention of the pure php option. in fact, what i meant was the opposite -- we should feel free to ignore some of the political issues wrt services because we might be able to do an end-run around them.
21:34:30 <gwicke> memory isn't an issue on the cluster, but it is a critical resource for small VMs
21:34:47 <cscott> so hopefully we can concentrate on wmf needs, and then have the "what's best for other deployments" as a separate discussion
21:34:47 <robla> ok, I think this meeting is about deployment to the wmf cluster, no?
21:35:33 <robla> so...I'd like to reask my earlier question: which production wiki would you plan to deploy to first?
21:35:49 <DanielK_WMDE__> i'm not quite clear on where the service is going to run. it's not going to be a lonely tomcat instance somewhere, will it?
21:36:20 <gwicke> we will have to do something about third party use sooner rather than later
21:36:22 <TimStarling> it can run with one instance per MW host
21:36:27 <TimStarling> listening on localhost
21:36:33 <DanielK_WMDE__> can i read the plan for the service setup somewhere?
21:36:34 <robla> gwicke: not in this meeting, though
21:37:06 <gwicke> certainly not *doing*, but I think we'll need a plan soon
21:37:09 <TimStarling> the plan is to basically install the existing debian packaging, and use it with the bundled configuration
21:37:55 <DanielK_WMDE__> silly question: *are* we using tomcat? is it decent these days?
21:38:10 <robla> #info <gwicke> we will have to do something about third party use sooner rather than later <robla> this is a topic for a different meeting
21:38:17 <DanielK_WMDE__> (another silly question: can we use unix sockets instead of tcp-over-loopback?)
21:38:27 <TimStarling> no, it's grizzly, it does its own daemonization and has an embedded grizzly webserver
21:38:29 <subbu> TimStarling, can you also address robla's questions about deployment?
21:38:50 <TimStarling> I started off in tomcat but switched to grizzly to make it easier to manage
21:38:53 <DanielK_WMDE__> what servlet engine are we using for other stuff?
21:38:55 <gwicke> if you are committing to addressing this issue as a team in a way that doesn't result in a significant change in overall MW system resource requirements, then this works for me
21:39:13 <TimStarling> there is still a servlet class in the source tree but I'm not sure if it still works
21:39:18 <cscott> #info <cscott> i have (the start of a) slower pure-PHP implementation which might help with third party use. (note for that different meeting)
21:39:37 <bd808> DanielK_WMDE__: we don't have other servlets AFAIK
21:39:43 <TimStarling> most java services do the same, they run an embedded webserver
21:39:51 <TimStarling> that's how cirrus and gerrit work
21:40:44 <DanielK_WMDE__> TimStarling: ah ok, i only saw that you talked about tomcat on the ticket, and I got worried ;)
21:41:09 <robla> subbu: I think the answer to your question is "no" :-)
21:41:21 <TimStarling> <robla> so...I'd like to reask my earlier question: which production wiki would you plan to deploy to first?
21:41:30 <DanielK_WMDE__> TimStarling: also "is tomcat decent these days" - "no, it's grizzly" actually makes sense in its own way ;)
21:41:47 <TimStarling> first it should be deployed to all wikis as a gadget or other opt-in tool
21:42:12 <TimStarling> when it is fully deployed, I'm not sure, maybe it would just follow the release train
21:42:44 <TimStarling> the weekly train is not a bad deployment sequence
21:42:45 <robla> #info <TimStarling> first it should be deployed to all wikis as a gadget or other opt-in tool
21:42:45 <DanielK_WMDE__> TimStarling: how would the opt in work? split or skip the parser cache based on user preferences, or a cookie or something?
21:42:51 <cscott> why not deploy to meta first?
21:43:05 <gwicke> performance would likely be poor when used as a beta feature, because of parser cache misses
21:43:06 <subbu> I think what we don't have full clarity is at what point we pull the switch on this being the default .. since there will be some rendering diffs that require fixing syntax on pages.
21:43:29 <subbu> and when do we just deploy it and have some pages look broken and rely on editors fixing up the wikitext / templates.
21:43:42 <TimStarling> maybe fetch a DocumentFragment via an API, switch it out when you click a button
21:43:52 <gwicke> there are communities that are very open to trying new things, like Catalan
21:43:53 <DanielK_WMDE__> i feel per-user opt-in makes this pretty tricky, and will be confusing to users. they can't easily show issues to each other, either...
21:44:05 <TimStarling> maybe have a position:fixed button hovering over the page so you can flip back and forth
21:44:13 <subbu> our visual diffing results and documentation on https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy gives us a good basis to make claims about what kind of things might possibly break.
21:44:30 <gwicke> mw.org could also be a good testing ground
21:44:31 <DanielK_WMDE__> TimStarling: ah, you want this on demand, with user interaction. that would be ok I guess.
21:44:44 <subbu> gwicke, yes .. mw.org and catalan wikis are a good idea.
21:44:48 <DanielK_WMDE__> otherwise, i'd have suggested to do per-namespace experiments
21:45:01 <subbu> and cscott mentioned meta as well.
21:45:35 <TimStarling> we could have a special page as well, to make it linkable
21:45:58 <TimStarling> /wiki/Special:NewParser/Barack_Obama or something
21:45:59 <cscott> commons ought to be safe-ish as well, since in theory most of the content there should be reasonably well-formed.
21:46:05 * robla tries to remember who at WMF is most appropriate to loop into a rollout discussion
21:46:22 <gwicke> liaisons and james might have other ideas for communities to target
21:46:24 <DanielK_WMDE__> #info Tim intends to make the new html available to users as an experiment, by clicking a button to replace the old html with the new html, on request.
21:46:37 <brion> nice
21:47:10 <DanielK_WMDE__> #info <TimStarling> we could have a special page as well, to make it linkable; /wiki/Special:NewParser/Barack_Obama or something
21:47:50 <robla> thanks DanielK_WMDE__ and thanks TimStarling . That clarifies the strategy a lot
21:48:29 <subbu> TimStarling, robla what about deployment to individual wikis at some point .. like mw.org, meta .. before doing a full rollout?
21:48:30 <DanielK_WMDE__> #info Tim suggests to run one instance of html5depurate on each app server, so mw can access it on localhost; html5depurate runs an integrated webserver (grizzly).
21:49:35 <robla> subbu: yeah, I think that'll be a must. I think we'll need to get someone from liaisons to weigh on rollout ordering
21:49:45 <subbu> k
21:50:16 <DanielK_WMDE__> who is going to decide on where service instances live, how many we need, how they get managed, etc?
21:50:21 <gwicke> mw.org has a good dogfooding factor
21:50:22 <DanielK_WMDE__> that would be the next thing that needs planning, right?
21:50:41 <gwicke> operational ownership as well
21:50:43 <DanielK_WMDE__> hmmm dog food...
21:50:58 <robla> om nom nom
21:51:14 <TimStarling> yes, we will need to have a conversation with ops about it
21:51:39 <gwicke> are there already metrics & logging?
21:51:40 <TimStarling> I imagine the instances would be fully rolled out before the gadget deployment
21:52:01 <TimStarling> logging yes, metrics no
21:52:22 <TimStarling> the package installs a local log file
21:52:27 <ebernhardson> w
21:52:40 <DanielK_WMDE__> TimStarling: one per app server makes communication and setup easy, but may be overkill. do you think we need that many instances?
21:53:28 <robla> so, the order is 1) rollout html5depurate instances 2) rollout special page+gadget 3) rollout to first wikis 4) plan full rollout based on step 3 (?)
21:54:16 <DanielK_WMDE__> hm, most parsing doesn't happen on web accessible hosts, but on the jobqueue runners, right?
21:54:22 <TimStarling> DanielK_WMDE__: I don't mind, I think that plan came out of a previous IRC discussion but it can go wherever ops wants it to go
21:54:24 <subbu> step 0: talk with ops, puppetize, etc. i suspect.
21:54:50 <robla> prerequisite for step 1: discussion with ops. prerequisite for step 3: discussion with liaisons, correct?
21:54:51 <DanielK_WMDE__> TimStarling: yea, i'm just saying, we'll need a plan for that :)
21:55:04 <subbu> robla, sounds about right.
21:55:10 <cscott> 3) is a wmf-config change, should be easy to quickly rollback if there are any problems.
21:55:29 <robla> #info <robla> so, the order is 1) rollout html5depurate instances 2) rollout special page+gadget 3) rollout to first wikis 4) plan full rollout based on step 3 (?)
21:55:39 <gwicke> subbu: are you planning to help with third-party support work?
21:55:54 <robla> #info <robla> prerequisite for step 1: discussion with ops. prerequisite for step 3: discussion with liaisons, correct? <subbu> robla, sounds about right.
21:56:14 <DanielK_WMDE__> robla: (2) should also be discussed with comcom/liaisons, i think
21:56:28 <cscott> 2a) is publicize the gadget well in tech news so that folks can help fix up bad markup, know what to look for, etc.
21:56:34 <robla> we're coming up on the top of the hour
21:57:08 <subbu> gwicke, we can have additional conversation about that .. but initially, a simple solution is: tidy replacement is not recommended for 3rd parties till we have some experience with this .. that is just my thought .. cscott and TimStarling might have other ideas.
21:57:21 <robla> #info <DanielK_WMDE__> robla: (2) should also be discussed with comcom/liaisons, i think
21:57:58 <TimStarling> yeah, not many benefits for third parties at this point
21:58:13 <gwicke> vagrant is another question
21:58:20 <cscott> there's a bit of a messaging question there: do we pitch this as "just a faster tidy, don't worry about it if you don't have a huge wiki"?
21:58:27 <robla> TimStarling: anything else you want to make sure we cover in this hour, or should I plan on hitting #endmeeting in a minute?
21:58:27 <Scott_WUaS> good discussion!
21:58:39 <cscott> if we pitch it as "cleaner markup" or something like that, then folks may wish to run it immediately
21:58:44 <subbu> ya .. adventurous wikis might decide to go with it, but it is at their own risk (like broken rendering, need to fix wikitext, templates, etc.).
21:58:51 <TimStarling> there's no time for new topics
21:59:03 <cscott> we'll have it packaged, so there's nothing *stopping* 3rd parties from running it
21:59:11 <subbu> config option
21:59:14 <robla> discussion can continue on #mediawiki-parosid?
21:59:14 <cscott> it's just we won't push it as a "you should run this". not yet at least.
21:59:23 <cscott> at some point in the future we'll probably deprecate the WMF fork of tidy.
21:59:24 <subbu> robla, works for me.
21:59:29 <robla> parsoid even? :-)
21:59:30 <TimStarling> #mediawiki-parsoid perhaps
21:59:32 <cscott> (since WMF doesn't even run stock tidy)
21:59:46 * cscott has to turn into a pumpkin
21:59:52 * DanielK_WMDE__ is getting parasoid
21:59:53 <gwicke> to clarify, I strongly support the general direction of moving to HTML5 parsing over tidy; my concerns are mostly about the details of the implementation
21:59:59 <cscott> i'm just saying let's be careful about messaging.
22:00:00 <subbu> gwicke, thanks.
22:00:02 <robla> alright meeting turning into a pumpkin in a few seconds :-)
22:00:17 <robla> thanks everyone!
22:00:19 <robla> #endmeeting
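
The "abstract interface into which a tidy replacement can be dropped" that subbu mentions in the log can be pictured with a small sketch. This is not MediaWiki's actual interface (which is PHP); every name below (TidyDriver, PassthroughDriver, ToyBalancingDriver, make_driver) is hypothetical, and the toy balancer only closes unclosed tags. It is not an HTML5-compliant parse/reserialize and does not reproduce any Tidy behaviour.

```python
from html.parser import HTMLParser
from typing import Dict, List, Protocol

# Elements that never take an end tag (a small illustrative subset).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class TidyDriver(Protocol):
    """Hypothetical interface: anything with tidy(html) -> html can be plugged in."""

    def tidy(self, html: str) -> str: ...


class PassthroughDriver:
    """Does nothing; stands in for 'no tidy phase at all'."""

    def tidy(self, html: str) -> str:
        return html


class _Balancer(HTMLParser):
    """Re-emits the input while tracking open tags so they can be closed."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)
        self.out: List[str] = []
        self.stack: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # drop a stray end tag
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


class ToyBalancingDriver:
    """Toy stand-in for a parse/reserialize step. NOT spec-compliant."""

    def tidy(self, html: str) -> str:
        parser = _Balancer()
        parser.feed(html)
        parser.close()
        # Close whatever is still open, innermost first.
        while parser.stack:
            parser.out.append(f"</{parser.stack.pop()}>")
        return "".join(parser.out)


# Hypothetical registry, selected by a configuration string.
DRIVERS: Dict[str, TidyDriver] = {
    "passthrough": PassthroughDriver(),
    "toy-balancer": ToyBalancingDriver(),
}


def make_driver(name: str) -> TidyDriver:
    return DRIVERS[name]


if __name__ == "__main__":
    print(make_driver("toy-balancer").tidy("<p>unclosed <b>bold"))
```

Selecting the driver by name mirrors cscott's remark that switching a wiki over is a configuration change that can be rolled back quickly.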

Architecture meetings
13:00 PT: ArchCom Planning Meetings (upcoming; all since 2016-03-30)
14:00 PT: ArchCom-RFC Meetings (upcoming; all since 2015-09-09)

Recurring Event

Event Series
This event is an instance of E66: ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office), and repeats every week.

Event Timeline

Tentative choice: Consensus meeting about T135963

I wrote:

Tentative choice: (something wrong)

My mistake. The plan we agreed to was T89331: Replace HTML4 Tidy in MW parser with an equivalent HTML5 based tool. Sorry for the slow correction to the problem.

RobLa-WMF renamed this event from ArchCom RFC Meeting: <topic TBD> (<see "Starts" field>, #wikimedia-office) to ArchCom RFC Meeting: Replace Tidy in MW parser with HTML 5 parse/reserialize (2016-06-08, #wikimedia-office). (Jun 6 2016, 11:30 PM)
RobLa-WMF updated the event description.
RobLa-WMF added projects: Tidy, Parsing-Team.
RobLa-WMF updated the event description. (Jun 8 2016, 10:12 PM)
daniel renamed this event from ArchCom RFC Meeting: Replace Tidy in MW parser with HTML 5 parse/reserialize (2016-06-08, #wikimedia-office) to ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office). (Nov 21 2016, 6:11 PM)
daniel changed the host of this event from RobLa-WMF to daniel.
daniel updated the event description.
daniel updated the event description. (Dec 9 2016, 7:43 AM)
daniel renamed this event from ArchCom RFC Meeting Wxx: <topic TBD> (<see "Starts" field>, #wikimedia-office) to ArchCom RFC Meeting: Replace Tidy in MW parser with HTML 5 parse/reserialize (2016-06-08, #wikimedia-office).