Android app edit summary becomes URL encoded
Open, In Progress, Low | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

What happens?:

Edit summaries for sections with Chinese headings become URL-encoded.

image.png (1×2 px, 779 KB)

What should have happened instead?:

The UTF-8 characters should be displayed instead.

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:

Event Timeline

@cooltey do you have any idea if this information comes from the Android App logic or mobile-html output?

Hi @MSantos

Looks like the id in mobile-html for each section has been encoded (see below).

1654278133144.jpg (1×1 px, 606 KB)

Compare this to the desktop version, where each section contains both a regular id and an encoded id.

1654278293781.jpg (949×838 px, 295 KB)

Change 803476 had a related patch set uploaded (by Vadim Kovalenko; author: Vadim Kovalenko):

[mediawiki/services/mobileapps@master] Android app edit summary becomes URL encoded

https://gerrit.wikimedia.org/r/803476

The patch above is intended to test a possible solution. It seems that Parsoid propagates already-encoded id values, and I'm able to decode them on mobile apps (only for zh wiki). Parsoid output: https://zh.wikipedia.org/api/rest_v1/page/html/%E5%BC%80%E5%B9%B3%E5%B8%82/71986109 (see the screenshot as well).

Screenshot 2022-06-07 at 16.47.06.png (1×1 px, 563 KB)

@MSantos , @cooltey

This appears correctly on desktop, e.g.: https://zh.wikipedia.org/w/index.php?title=2011%E5%B9%B4%E5%A4%A7%E8%A5%BF%E6%B4%8B%E9%A3%93%E9%A3%8E%E5%AD%A3%E6%97%B6%E9%97%B4%E8%BD%B4&action=history

Both the legacy parser and Parsoid output "legacy" IDs as well as "html5" IDs, like so:

  • Legacy Parser output:
<h2><span id=".E9.A3.8E.E6.9A.B4.E6.97.B6.E9.97.B4.E8.BD.B4"></span><span class="mw-headline" id="风暴时间轴">風暴時間軸</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="https://zh.wikipedia.org/w/index.php?title=2011%E5%B9%B4%E5%A4%A7%E8%A5%BF%E6%B4%8B%E9%A3%93%E9%A3%8E%E5%AD%A3%E6%97%B6%E9%97%B4%E8%BD%B4&amp;section=1&amp;veaction=editsource" title="Edit section: 風暴時間軸">edit source</a><span class="mw-editsection-bracket">]</span></span></h2>
  • Parsoid output:
<h2 id="foo=bar"><span id="foo.3Dbar" typeof="mw:FallbackId"></span>foo=bar</h2>

Somebody seems to be picking the "wrong" ID. For the legacy parser you should use the ID on the span with the "mw-headline" class; for Parsoid you should use the ID on the <h> tag and *not* the one on the span with typeof="mw:FallbackId".
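To illustrate how the two IDs in each example above relate, here is a minimal sketch. It assumes the "legacy" ID is the UTF-8 percent-encoding of the "html5" ID with "." used in place of "%", which is what the two samples in this comment suggest. decodeLegacyId is a hypothetical helper written for this illustration, not a function from mobileapps or MediaWiki.

```javascript
// Hypothetical helper: turn a legacy-style ID such as "foo.3Dbar" back into
// readable text, assuming "." followed by two uppercase hex digits stands for
// one percent-encoded UTF-8 byte.
function decodeLegacyId( legacyId ) {
	const percentEncoded = legacyId.replace( /\.([0-9A-F]{2})/g, '%$1' );
	try {
		return decodeURIComponent( percentEncoded );
	} catch ( e ) {
		// Not valid percent-encoded UTF-8 after all; return the input unchanged.
		return legacyId;
	}
}

console.log( decodeLegacyId( '.E9.A3.8E.E6.9A.B4.E6.97.B6.E9.97.B4.E8.BD.B4' ) ); // 风暴时间轴
console.log( decodeLegacyId( 'foo.3Dbar' ) ); // foo=bar
```

Decoding like this is ambiguous in general: a heading whose text literally contains a "." followed by two hex digits would be rewritten as well. That is one more reason to read the unencoded ID directly rather than decode the fallback one.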

But it looks like the mobile-html API is returning the right (ie, not encoded) version of this ID, eg:

https://zh.wikipedia.org/api/rest_v1/page/mobile-sections/2011%E5%B9%B4%E5%A4%A7%E8%A5%BF%E6%B4%8B%E9%A3%93%E9%A3%8E%E5%AD%A3%E6%97%B6%E9%97%B4%E8%BD%B4

The bug seems to be in lib/transformations/wrapSections.js in mobileapps, which is trying to recreate section information from the legacy HTML output.
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/mobileapps/+/refs/heads/master/lib/transformations/wrapSections.js#33

const makeSection = ( heading, id, text ) => {
	const headingLabel = heading ? heading.querySelector( 'span[id]' ) : null;
	return {
		id,
		anchor: headingLabel ? headingLabel.getAttribute( 'id' ) : '',
		toclevel: heading ? getTocLevel( heading ) : undefined,
		line: heading ? heading.textContent : undefined,
		text
	};
};

I'm pretty sure the heading.querySelector( 'span[id]' ) should be heading.querySelector( 'span.mw-headline[id]' ) or something like that.
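A rough sketch of what that change could look like, for illustration only. The getTocLevel stub and the fake heading object are hypothetical stand-ins so the snippet runs without a DOM; the real helper and the real elements in wrapSections.js may behave differently.

```javascript
// Hypothetical stand-in for the real getTocLevel helper: maps H2 to 1, H3 to 2, ...
const getTocLevel = ( heading ) => parseInt( heading.tagName.slice( 1 ), 10 ) - 1;

const makeSection = ( heading, id, text ) => {
	// Select the headline span explicitly; a bare 'span[id]' matches the empty
	// fallback span first, and that span carries the encoded legacy ID.
	const headingLabel = heading ? heading.querySelector( 'span.mw-headline[id]' ) : null;
	return {
		id,
		anchor: headingLabel ? headingLabel.getAttribute( 'id' ) : '',
		toclevel: heading ? getTocLevel( heading ) : undefined,
		line: heading ? heading.textContent : undefined,
		text
	};
};

// Fake heading mimicking the legacy parser output quoted earlier: the fallback
// span comes first, the mw-headline span second.
const fakeSpan = ( id, className ) => ( {
	className,
	getAttribute: ( name ) => ( name === 'id' ? id : null )
} );
const fakeHeading = {
	tagName: 'H2',
	textContent: '風暴時間軸',
	spans: [
		fakeSpan( '.E9.A3.8E.E6.9A.B4.E6.97.B6.E9.97.B4.E8.BD.B4', '' ),
		fakeSpan( '风暴时间轴', 'mw-headline' )
	],
	// Toy querySelector: only distinguishes 'span[id]' from 'span.mw-headline[id]'.
	querySelector( selector ) {
		const wantHeadline = selector.includes( '.mw-headline' );
		return this.spans.find( ( s ) => !wantHeadline || s.className === 'mw-headline' ) || null;
	}
};

const section = makeSection( fakeHeading, 1, '' );
console.log( section.anchor ); // 风暴时间轴
```

With the original 'span[id]' selector, the same fake heading would yield the dot-encoded ID as the anchor, which matches the symptom reported in this task.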

Thank you @cscott, I've added a fix for parsing the legacy HTML output and submitted the patch for code review.

Change 803476 merged by jenkins-bot:

[mediawiki/services/mobileapps@master] Android app edit summary becomes URL encoded

https://gerrit.wikimedia.org/r/803476

I don't know whether the merged patch above solved this problem, but it seems edits from the Android app are still wrongly encoded, an example.

Looks like this fix is still waiting to be deployed. @MSantos Do you know when it is scheduled to be deployed?