T106578: Update Sanitizer to match legal HTML5 character entities.

Invalid HTML5 character entities are replaced with UTF8_REPLACEMENT,
so also make checkCSS detect that character and emit the proper
human-friendly sanitization notice.

Change-Id: I76cef7c772b1e3eba0af8dab6403e9100beab03a