Remove line feed characters in title (and possible other fields?)
Open, Low | Public | 0 Estimated Story Points

Description

The title on pages from this website contains line feed characters, which cause template errors.

http://www.juntadeandalucia.es/presidencia/portavoz/gobierno/114816/susana/diaz/destaca/denominacion/origen/montilla/moriles/sinonimo/riqueza/calidad

Remove line feed characters from all fields (except perhaps the 'abstract' field?).
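A minimal sketch of what that cleanup could look like, written in Python purely for illustration. It is not Citoid's actual code: the function name `strip_line_feeds`, the `keep` parameter, and the dictionary-shaped citation are all assumptions made for this example. Each run of line breaks (with any whitespace around it) is replaced by a single space, and fields listed in `keep` are left untouched.

```python
import re

# One or more CR/LF characters, together with any whitespace around them.
_LINE_BREAKS = re.compile(r"\s*[\r\n]+\s*")


def strip_line_feeds(citation, keep=("abstract",)):
    """Return a copy of `citation` with line breaks removed from string fields.

    `citation` is a hypothetical dict of field name -> value. Fields named in
    `keep` and non-string values are copied unchanged.
    """
    cleaned = {}
    for field, value in citation.items():
        if isinstance(value, str) and field not in keep:
            # Replace each line-break run with one space, then trim the ends.
            cleaned[field] = _LINE_BREAKS.sub(" ", value).strip()
        else:
            cleaned[field] = value
    return cleaned


example = {"title": "Newline\ntest", "abstract": "Line one\nLine two"}
print(strip_line_feeds(example))
```

Replacing with a space rather than an empty string avoids gluing together words that were only separated by the line break.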

Event Timeline

Elitre raised the priority of this task to Needs Triage.
Elitre updated the task description.
Elitre added a project: Citoid.
Elitre subscribed.
Mvolz renamed this task from Results of a test with 10 random .es URLs on the beta cluster to Remove line feed characters in title (and possible other fields?). Sep 19 2016, 3:52 PM
Mvolz triaged this task as Low priority.
Mvolz updated the task description.

I re-checked all of these; all have since been resolved except the one I've edited into the description. (The date format is ISO, which is the most compatible format across languages.)

I've tried to reproduce this issue, but the URL in the task description (http://www.juntadeandalucia.es/presidencia/portavoz/gobierno/114816/susana/diaz/destaca/denominacion/origen/montilla/moriles/sinonimo/riqueza/calidad) now returns a 404 error, so Citoid can't generate anything from it. The URL in the duplicate task I just merged (http://www.superheromoviesnews.com/2014/02/x-men-producer-lauren-shuler-donner.html) also returns a 404.

I tested with a temporary page I hosted, and this is still a problem. You can reproduce it with a page containing this HTML:

<title>Newline
test</title>
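
To check the reproduction case without hosting a page, the same HTML can be fed to a parser locally. The sketch below uses Python's standard `html.parser` module as a stand-in for whatever scraper Citoid actually uses; the `TitleExtractor` class and `extract_title` helper are hypothetical names for this example. It shows that the line feed inside `<title>` survives extraction unchanged, and that collapsing whitespace runs gives a single-line title.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the raw text found inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_title(html):
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title


page = "<title>Newline\ntest</title>"
raw_title = extract_title(page)
print(repr(raw_title))                      # raw title, line feed included
print(repr(" ".join(raw_title.split())))    # whitespace runs collapsed
```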