Single letter tokens suffixed to article text in search

Authored By
EBernhardson
May 1 2024, 4:41 PM
Size
350 B
Referenced Files
None
Subscribers
None

On https://en.wikipedia.org/wiki/Waiting_at_the_Royal the indexed `text` field is suffixed with `v t e v t e`. This creates single-letter tokens that end up being used by morelike. Nothing on the rendered webpage obviously explains these tokens, so something in the parser's HTML-to-plaintext conversion is likely incorrect.
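
A rough way to check other pages for the same symptom is to look for a run of single-letter tokens at the end of the indexed `text` value. The sketch below is only an illustration: `trailing_single_letter_tokens` is a hypothetical helper, not part of the search code, the sample string is a made-up placeholder rather than the real field content, and whitespace splitting is an assumption standing in for whatever analyzer the index actually uses.

```python
def trailing_single_letter_tokens(text: str) -> list[str]:
    """Return the run of single-letter tokens at the very end of `text`.

    Hypothetical helper: splits on whitespace, which only approximates
    the tokenization done by a real search analyzer.
    """
    run = []
    # Walk backwards from the last token until a longer token is found.
    for token in reversed(text.split()):
        if len(token) == 1 and token.isalpha():
            run.append(token)
        else:
            break
    run.reverse()
    return run


# Placeholder text standing in for an indexed `text` field.
sample = "Example article body text. v t e v t e"
print(trailing_single_letter_tokens(sample))
```

A non-empty result for a page whose rendered body does not visibly end in single letters would suggest the same conversion problem.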

File Metadata

Mime Type
text/plain; charset=utf-8
Storage Engine
blob
Storage Format
Raw Data
Storage Handle
16597230
Default Alt Text
Single letter tokens suffixed to article text in search (350 B)
