
wfMsgWikiHtml does not ensure XHTML validity
Closed, Invalid (Public)

Description

Author: bim2007

Description:
[Fatal Error] :43:3: The element type "br" must be terminated by the matching
end-tag "</br>".
c:\x\arstaticwiki\ar\!\!\!\صورة~!!!!ユニセフ0195.JPG_c267.html
org.xml.sax.SAXParseException: The element type "br" must be terminated by the
matching end-tag "</br>".
[Error] :133:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :133:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :133:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :133:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :102:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :114:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".
[Error] :114:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :119:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :162:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Fatal Error] :44:3: The element type "br" must be terminated by the matching
end-tag "</br>".
c:\x\arstaticwiki\ar\(\2\6\صورة~(2691)_Tel_Aviv.jpg_40ef.html
org.xml.sax.SAXParseException: The element type "br" must be terminated by the
matching end-tag "</br>".
[Error] :172:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".
[Error] :82:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :172:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".
[Error] :114:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Fatal Error] :43:3: The element type "br" must be terminated by the matching
end-tag "</br>".
c:\x\arstaticwiki\ar\-\3\4\صورة~-34_sibirien_sviatoinos_bucht.JPG.JPG_07d9.html
org.xml.sax.SAXParseException: The element type "br" must be terminated by the
matching end-tag "</br>".
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".
[Error] :114:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :84:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :187:51: Attribute value "p-المشاركة و المساعدة" of type ID must be an
NCName when namespaces are enabled.
[Error] :6:8: The content of element type "head" is incomplete, it must match
"((script|style|meta|link|object|isindex)*,((title,(script|style|meta|link|object|isindex)*,(base,(script|style|meta|link|object|isindex)*)?)|(base,(script|style|meta|link|object|isindex)*,(title,(script|style|meta|link|object|isindex)*))))".


Version: unspecified
Severity: minor
Platform: PC

Details

Reference
bz9880

Related Objects

Status    Subtype  Assigned  Task
Resolved           None
Invalid            None

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 9:39 PM)
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz9880.
bzimport added a subscriber: Unknown Object (MLST).
  1. Check with the current version
  2. Provide sample input to produce this
  3. Compare with existing bug entries for ID issues

bim2007 wrote:

Hmm. I'm just consuming what's coming out of the public Wikipedia site, which, I
presume, doesn't run quite the current version.

So, much as I'd like to be a good citizen here, I'm not sure how to proceed.

Let me ask this question: is the claim that the current version would prevent
the unclosed br tags? Those are the big problem for me. If that's the claim, I
might be able to try an experiment to see if there is still a hole allowing
people to create them.

Please provide URLs to the pages you're checking, then.

ayg wrote:

We don't validate any IDs whatsoever, including those produced by the
interface; that issue is known. The invalid IDs produced there would be
typical for those involving the portal of an Arabic-alphabet site. See bug

It should be completely impossible for Wikipedia, which has HTML Tidy enabled,
to have unclosed <br> tags. I can't see any on ar.wikipedia's Main Page or at
[[ar:تل أبيب]] (the Tel Aviv article that you appear to have been using).

bim2007 wrote:

I'm working from the most recent AR static dump (April). Is it likely that the
quality of the tidy processing has gone up materially since then?

I'll attach a file ... I've yet to succeed in finding a live page to match one
of the filenames. The page I've got here isn't the straight Tel Aviv page; it's
some special JPG-rights-explaining page.

bim2007 wrote:

File with an unclosed br.

Attached:

ayg wrote:

I see the problem. [[ar:MediaWiki:Sharedupload]] is at fault. Probably we
should run its output through Tidy or Sanitizer or something (does Sanitizer
fix unclosed <br>s?), if that's not too slow. As a site-specific workaround,
you can ask a sysop there to edit the message to begin with <br
style="clear:both" /> instead of <br style="clear:both">, or sed your files to
kill that string, but this should probably be fixed in the function itself?
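
For anyone post-processing a static dump, the sed-style workaround could be sketched roughly as below. This is only an illustration: the helper name close_br_tags and the regular expression are assumptions made for this example, not anything MediaWiki provides. It rewrites the tag as self-closing rather than deleting it, and it assumes attribute values never contain a literal ">".

```python
import re

# Matches a <br ...> tag that is not already self-closed (<br/> or <br ... />).
# Hypothetical pattern for this sketch; it assumes no ">" inside attribute values.
UNCLOSED_BR = re.compile(r"<br\b([^<>]*?)\s*(?<!/)>", re.IGNORECASE)


def close_br_tags(html: str) -> str:
    """Return html with every unclosed <br ...> rewritten as <br ... />."""
    # The tag name is emitted in lowercase, which is what XHTML expects.
    return UNCLOSED_BR.sub(lambda m: f"<br{m.group(1)} />", html)


if __name__ == "__main__":
    # The string from [[ar:MediaWiki:Sharedupload]] quoted above.
    print(close_br_tags('<br style="clear:both">'))
```

Tags that are already self-closed are left untouched, so the function should be safe to run more than once over the same file.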

bim2007 wrote:

Thank you for tracking this down from my less than informative breadcrumbs.

If these are relatively uninteresting pages, I can switch on XML parsing and
ignore pages that flunk due to this problem.
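
A minimal sketch of that "parse as XML and skip what fails" approach, using Python's standard xml.etree.ElementTree. The function name partition_well_formed and the in-memory dict of pages are assumptions for illustration; a real run would read the dump files from disk. Note that this checks well-formedness only, not DTD validity, so it would catch the unclosed br but not the NCName or head-content errors, and pages that use HTML named entities may be rejected as well.

```python
import xml.etree.ElementTree as ET


def partition_well_formed(pages: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split page names into (parsed, skipped) by XML well-formedness."""
    parsed, skipped = [], []
    for name, markup in pages.items():
        try:
            ET.fromstring(markup)  # raises ParseError on e.g. an unclosed <br>
        except ET.ParseError:
            skipped.append(name)
        else:
            parsed.append(name)
    return parsed, skipped


if __name__ == "__main__":
    sample = {
        "ok.html": "<html><body><br /></body></html>",
        "bad.html": "<html><body><br></body></html>",
    }
    print(partition_well_formed(sample))
```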

See the configuration settings for tidy usage; we have it disabled for UI
messages for performance reasons.