bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz48891.
TheDJ created this task.Via LegacyMay 28 2013, 12:34 PM
Jdforrester-WMF added a comment.Via ConduitMay 28 2013, 12:47 PM

Yes, the Translate extension needs to be extended to support VisualEditor editing, but the core issue is that Parsoid doesn't recognise <translate> as an extension block; should it take the local list of registered extensions, and for ones it hasn't been told how to deal with just wrap it as a typeof="mw:Object/Unknown" or something similar?

ssastry added a comment.Via ConduitMay 28 2013, 5:19 PM

Parsoid does fetch a list of registered extensions from the config and uses that to wrap the block with mw:Object/Extension/<ext-name-here> type. All other unknown tags are pass through as plain text.

The problem is that mediawiki does not report <translate> as an installed extension. Check http://www.mediawiki.org/w/api.php?action=query&meta=siteinfo&format=jsonfm&siprop=extensiontags

Any idea why mw.org is not reporting <translate> as a registered extension?

Jdforrester-WMF added a comment.Via ConduitMay 28 2013, 6:31 PM

(In reply to comment #2)

Parsoid does fetch a list of registered extensions from the config and uses
that to wrap the block with mw:Object/Extension/<ext-name-here> type. All
other unknown tags are pass through as plain text.

Ah, my apologies; I thought you were doing this, but it didn't even occur to me that the Translate extension might fail to do this. Re-filing.

Nikerabbit added a comment.Via ConduitMay 29 2013, 6:41 AM

<translate> and <tvar> are not a normal extension tags. Please explain why I should register them.

ssastry added a comment.Via ConduitAug 1 2013, 5:32 PM

Parsoid needs to know if something is a valid extension tag so that it can process them rather than escape them to literal text. This will also let VE support them in the future and protect them from being modified right now.

Is there any other way to discover that <translate>, <tvar>, <language> are valid tags on the wiki, i.e. what is the mediawiki api endpoint? So far we've been using siprop and fetching installed extension tags and use that to detect these.

ssastry added a comment.Via ConduitAug 1 2013, 5:34 PM
  • Bug 52408 has been marked as a duplicate of this bug. ***
Esanders added a comment.Via ConduitAug 1 2013, 6:02 PM

@Niklas are there issues with registering them?

Nikerabbit added a comment.Via ConduitAug 1 2013, 6:06 PM

I don't know. Either the registration would just be symbolic without any side effects, or it could also interfere with the current functionality.

Nikerabbit added a comment.Via ConduitAug 1 2013, 7:28 PM

Does someone want to try registering them and see what happens? I probably don't have time before Wikimania.

ssastry added a comment.Via ConduitAug 22 2013, 11:33 PM

Ping.

gerritbot added a comment.Via ConduitSep 9 2013, 6:20 PM

Change 83427 had a related patch set uploaded by Nikerabbit:
Register translate and tvar to the parser

https://gerrit.wikimedia.org/r/83427

Nikerabbit added a comment.Via ConduitSep 9 2013, 6:24 PM

See the above commit. It is not thoroughly tested. I still don't understand what would the benefit.

ssastry added a comment.Via ConduitSep 9 2013, 7:20 PM

Niklas: If VE has to support editing for <translate>, <tvar>, etc. tags, Parsoid has to know about those tags. Parsoid finds out about installed tags by querying mediawiki API. So, either MW API should expose this information for hooks automatically or we need a different API endpoint, or you have to register them.

Nikerabbit added a comment.Via ConduitSep 10 2013, 12:19 AM

In the long term yes, but wont this render VE useless on translatable pages in the short them?

Jdforrester-WMF added a comment.Via ConduitSep 10 2013, 12:33 AM

(In reply to comment #14)

In the long term yes, but wont this render VE useless on translatable pages
in the short them?

Yes. On MW.org you can use the alien node editor to edit any extension node's contents (but they won't be rich), but on other wikis it will make them un-editable in VE.

However, making a Translation extension editor in VE shouldn't be hard at all - I've created that as bug 53974 along the lines of other bugs.

Qgil added a comment.Via ConduitSep 27 2013, 11:21 PM

fwiw this bug has been mentioned at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#VisualEditor_weekly_update_-_2013-09-26_.28MW_1.22wmf19.29

I have also seen this problem here and there while editing at mediawiki.org. Then again these tags are not used in most Wikimedia projects or MediaWikis there so fair enough.

siebrand added a comment.Via ConduitOct 4 2013, 7:13 AM

Assigning to James, hoping this may get a parsoid person to review the patch that's been available for nearly a month now.

Jdforrester-WMF added a comment.Via ConduitOct 4 2013, 4:13 PM

(In reply to comment #17)

Assigning to James, hoping this may get a parsoid person to review the patch
that's been available for nearly a month now.

Oh. We've been patiently waiting for you to merge it. :-(

Will ping the team.

Elitre added a comment.Via ConduitOct 5 2013, 11:25 AM

I think I already did it yesterday.

Nemo_bis added a comment.Via ConduitOct 5 2013, 11:49 AM

(In reply to comment #19)

I think I already did it yesterday.

There is a comment, but now a followup question requires addresssing

gerritbot added a comment.Via ConduitOct 5 2013, 12:37 PM

Change 83427 merged by jenkins-bot:
Register translate and tvar to the parser

https://gerrit.wikimedia.org/r/83427

Jdforrester-WMF added a comment.Via ConduitOct 7 2013, 3:51 PM

I don't deserve the credit for Niklas's fix. :-)

GWicke added a comment.Via ConduitOct 7 2013, 7:14 PM

That patch won't work well with Parsoid, see the comment in the patch set.

siebrand added a comment.Via ConduitOct 9 2013, 8:43 AM

This issue appears to suffer from scope creep. The requirements of the current summary have been met. See https://translatewiki.net/wiki/Special:Version?uselang=en: Section "Parser extension tags" now shows "<translate> and <tvar>"

Nikerabbit added a comment.Via ConduitOct 9 2013, 9:39 AM

Misunderstanding or miscommunication about the actual requirements is not scope creep. The patch which was made does not add value. I changed the title to reflect the issue with my understanding of it.

ssastry added a comment.Via ConduitOct 9 2013, 3:14 PM

The patch will work with Parsoid right now, because Parsoid always goes through the PHP parser when dealing with extensions and tag hooks. But, there are plans for Parsoid to bypass the PHP parser and call the extensions/hooks directly.

Right now, The translate code preprocesses the page by stripping the <translate> and <tvar> tags before the PHP parser actually parses the page source (by registering for ParserBeforeStrip hook). So, even though the tag hooks are registered (with a callback that bombs if called), the PHP parser never gets to calling them.

If Parsoid is able to mimic the behavior (by calling all hooks, not just extension callback hooks), the existing code will continue to work. However, if Parsoid only supports tag extensions directly, then, the ParserBeforeStrip code won't be invoked by Parsoid at that time and this will be a problem at that point.

So, this is not an issue *right now*, but could be at a later point depending on what functionality Parsoid will implement natively and what it will continue to defer to the PHP parser.

I think Gabriel was responding to that future concern and also hoping that we can use the opportunity of Parsoid's ongoing development to cleanup some of these interfaces and mechanisms. We could streamline extensions to go through narrow interfaces rather than continuing to use all sort of hooks into various points of the parsing timeline. Anything this is something we could discuss more.

Hope this summarizes where we stand now.

ssastry added a comment.Via ConduitOct 9 2013, 3:26 PM

Actually, now that I wrote that, I might have misspoken about whether it will work now as well. Niklas is right. For <translate> and <tvar> tags found in top-level pages, since we never go through PHP parser at all, this may still not work.

Parsoid will recognize the hooks because of the registration. In turn, it will call the PHP parser to translate the extension content, which in turn may callback into the translate extension rather than the ParserBeforeStrip callback (which is what <translate> and <tvar> relies on), and that will bomb.

One of us (Gabriel or me) will experiment with the updated code locally and update this bug.

ssastry added a comment.Via ConduitOct 9 2013, 3:57 PM

Okay, we tested and all is good for now. You can ignore #c27 :-).

As I indicated in #c26, this is a problem for when we start updating Parsoid code to call extensions directly rather than go through the full parse pipeline (which was meant to be a temporary hack while we figured out how to call extensions directly).

GWicke added a comment.Via ConduitOct 9 2013, 4:01 PM

To summarize:

  • We currently work around the missing tag hook API by calling the full action=parse pipeline for each extension tag. This is quite expensive as you can imagine.
  • This lets the translate hack work for now. At least it won't crash.
  • The translate extension will crash once we actually call the registered tag hook.

So there is some time to fix this up, but we should not pretend that it is fixed right now.

Can you provide more information on what stops you from using the actual tag hook?

Nikerabbit added a comment.Via ConduitOct 17 2013, 8:07 AM

I just found out that if edit is made via API, the ParserBeforeStrip hook I am using does not get called and the exception is shown. Does anybody have an idea why?

ssastry added a comment.Via ConduitOct 17 2013, 3:13 PM

Niklas: To me, it looks like an existing bug got exposed. Can you provide more details? Maybe full url of the API call? And/or, if you hop onto #mediawiki-parsoid, we could investigate.

ssastry added a comment.Via ConduitOct 17 2013, 4:11 PM

Niklas: So, here is what I found:

https://meta.wikimedia.org/w/api.php?action=parse&text=<translate>foo</translate> works and does not crash.

https://meta.wikimedia.org/w/api.php?action=parse&text=<translate><!--T:1-->foo</translate> throws the exception.

I am a bit baffled why the comment in the extension content should change what hook is called. I'll let you investigate from here. It might be worth using this opportunity to use a regular extension hook (rather than the ParserBeforeStrip hook), if that might work for you.

bzimport added a comment.Via ConduitOct 23 2013, 10:50 AM

gmaruzz wrote:

bump

Nikerabbit added a comment.Via ConduitOct 24 2013, 8:14 AM

Trying the parser hook solution now. I have code like:

		$parser->setHook( 'translate', function ( $input, $params, $parser, $frame ) {
			$re = '~<tvar\|([^>]+)>(.*?)</>~u';
			$output = preg_replace( $re, '\2', $input );

			$output = $parser->recursiveTagParse( $output, $frame );
			$output = trim( $output );
			return $output;
		} );

This seems to work okay on simple text, with the exception of section edit links.

I'm still trying with more complex pages with templates as well extensions and our tutorial.

gerritbot added a comment.Via ConduitOct 24 2013, 9:56 AM

Change 91583 had a related patch set uploaded by Nikerabbit:
Add $wgTranslatePageTranslationUseParserHook

https://gerrit.wikimedia.org/r/91583

Nikerabbit added a comment.Via ConduitOct 24 2013, 10:00 AM

Issues I have found:

  • [MAJOR] Disabling edit section links does not work for headings inside <translate> tags
  • [NORMAL] Edit section links don't work. Not a regression, but would be nice fix. Above issue make this a bigger issue.
  • [BLOCKER] Table of contents does not include links
  • [MINOR] White space trimming behavior has changed. The original logic cannot be implemented with information provided for the parser hook. You can see the buildup at bottom of http://dev.translatewiki.net/wiki/Extension:Translate compare with http://www.mediawiki.org/wiki/Help:Extension:Translate . This is annoying and would be nice to fix, but the workaround is to edit page source. Documentation about whitespace before has to be update if this way is chosen.

Please help me solve the remaining issues.

Nikerabbit added a comment.Via ConduitOct 24 2013, 10:04 AM

Also, please test the patch if you can to see if you find any other changes in behavior.

Nikerabbit added a comment.Via ConduitOct 24 2013, 10:07 AM

Editing translatable pages via API seems to work now.

GWicke added a comment.Via ConduitOct 25 2013, 11:03 PM

(In reply to comment #36)

Issues I have found:

  • [MAJOR] Disabling edit section links does not work for headings inside <translate> tags

You could try marking your output as HTML according to https://www.mediawiki.org/wiki/Manual:Tag_extensions#How_can_I_avoid_modification_of_my_extension.27s_HTML_output.3F

I'm not sure whether that suppresses section edit links too, but worth a try IMO.

  • [NORMAL] Edit section links don't work. Not a regression, but would be nice fix. Above issue make this a bigger issue.
  • [BLOCKER] Table of contents does not include links

Can you check whether your headings are matched by the formatHeadings regexp?

Ideally we'd move both TOC and section edit links to JS / CSS. That is our plan for Parsoid output, but not going to happen over night.

Nikerabbit added a comment.Via ConduitOct 26 2013, 10:01 AM

(In reply to comment #39)

(In reply to comment #36)
> Issues I have found:
>
> * [MAJOR] Disabling edit section links does not work for headings inside
> <translate> tags

You could try marking your output as HTML according to
https://www.mediawiki.org/wiki/Manual:
Tag_extensions#How_can_I_avoid_modification_of_my_extension.27s_HTML_output.
3F

This causes (lack of) whitespace issues and breaks all lists.

I'm not sure whether that suppresses section edit links too, but worth a try
IMO.

It doesn't.

> * [NORMAL] Edit section links don't work. Not a regression, but would be nice
> fix. Above issue make this a bigger issue.
> * [BLOCKER] Table of contents does not include links

Can you check whether your headings are matched by the formatHeadings regexp?

I'm assuming it processes the html I'm returning, which is:
<h2><span class="mw-headline" id="Culture">Culture</span><mw:editsection page="Pupu" section="1">Culture</mw:editsection></h2>

Not sure whether that matches or not.

Ideally we'd move both TOC and section edit links to JS / CSS. That is our
plan
for Parsoid output, but not going to happen over night.

Any changes for interim solutions?

gerritbot added a comment.Via ConduitApr 16 2014, 3:51 PM

Change 91583 abandoned by Siebrand:
Add $wgTranslatePageTranslationUseParserHook

Reason:
Abandoned as there is no progress on this patch set. Can be re-activated if there is outlook on getting this in a mergable state.

https://gerrit.wikimedia.org/r/91583

Jdforrester-WMF added a comment.Via ConduitMay 10 2014, 9:58 AM

No patch any more. :-(

Nemo_bis added a comment.Via ConduitMay 11 2014, 9:55 AM

(In reply to James Forrester from comment #42)

No patch any more. :-(

Hmm I'm not sure that's what "Can be re-activated if there is outlook on getting this in a mergable state" means, PATCH_TO_REVIEW seems a sensible way to tag such stale patches in need of TLC.

Jdforrester-WMF added a comment.Via ConduitMay 11 2014, 9:56 AM

(In reply to Nemo from comment #43)

(In reply to James Forrester from comment #42)
> No patch any more. :-(

Hmm I'm not sure that's what "Can be re-activated if there is outlook on
getting this in a mergable state" means, PATCH_TO_REVIEW seems a sensible
way to tag such stale patches in need of TLC.

No. There is no patch to review for merge. There is patch to re-*do*.

Nikerabbit added a comment.Via ConduitOct 6 2014, 6:38 AM

To be clear: there is nothing wrong with that patch itself, it just doesn't work due to limitations in the parser itself. I cannot continue with this patch/bug until I get help from other people to modify parser/parsoid.

Kabhi2104 added a subscriber: Kabhi2104.Via WebJan 24 2015, 4:26 AM
Nemo_bis added a comment.Via WebJan 24 2015, 3:12 PM

Kabhi2104 asked on #mediawiki-i18n:

Nemo_bis: i had one doubt, what does normal parser tags mean ?

Explained briefly from a non-technical perspective, this means that <translate> is a "tag" for the Translate extension, but the parser "doesn't know" it. A symptom is the absence from the list on Special:Version.

Concretely, the parser deals with tags in ways that interfere with the Translate syntax, with some consequences identified in T50891#524894. But even in the current situation there are some limitations, especially for the newline, which is meaningful for the parser and perhaps shouldn't be required.

More broadly, one could speculate that most wikitext markup is meant to add or modify the HTML of the resulting page, or properties of the page in the database: while translate tags don't do either and are properties of the wikitext itself.

Nikerabbit lowered the priority of this task from "Normal" to "Low".Via WebJan 28 2015, 5:07 AM
Nikerabbit moved this task to page translation on the MediaWiki-extensions-Translate workboard.
Nikerabbit raised the priority of this task from "Low" to "Normal".Via WebJan 28 2015, 5:46 AM
Nemo_bis changed the task status from "Open" to "Stalled".Via WebJan 28 2015, 6:57 AM
Nemo_bis set Security to None.
Nikerabbit added a comment.Via WebJan 28 2015, 10:01 PM

I had a chat with Subbu about this. In the short tem (this year) we could add configurable support for <translate> tags in parsoid. That support will preserve the tags and comments when pages are edited with VE. In the long term we will implement support for translatable pages without any kind of markup. Marking those pages for translation would only work with VE and it needs special page translation module in VE.

ssastry added a comment.Via WebJan 29 2015, 9:36 PM

Since <translate> is a special extension that hooks into the parser pipeline in a way that is not visible to Parsoid, and also since it doesn't render anything new, but marks DOM fragments for translation, the ideal way to deal with this in the short term (till translation goes with a VE based approach) is to implement a native version of translate in Parsoid (just like Cite and planned Gallery extension).

This native translate will be relatively straightforward and will not do anything special beyond recognizing the <translate> and associated tags and marking up DOM fragments with a special typeof that clients like VE or gadgets or anyone could use to enable translation. In addition, it would also have to register a html2wt handler for serializing it back to wikitext.

This should not be a lot of work -- most of it will be in figuring out if there are any tricky things that the translate extension does. Talking to Niklas, we figured this is not high priority, but something worth tackling over the next 6 months.

ssastry added a project: Parsoid.Via WebJan 29 2015, 9:37 PM
ssastry moved this task to Requests from VE/Flow/others on the Parsoid workboard.Via WebJan 31 2015, 12:42 AM
marcoil changed the title from "<translate> and <tvar> do not work with Parsoid because they are not normal parser tags" to "Parsoid should support <translate> and <tvar> natively as they are not normal parser tags".Via WebFeb 10 2015, 2:49 PM
marcoil added a project: Blocked-on-Parsoid.
marcoil moved this task to Backlog on the Parsoid workboard.Via WebFeb 13 2015, 12:44 PM
awight added a subscriber: awight.Via WebFri, May 8, 3:22 AM

Add Comment