Page MenuHomePhabricator

HHVM fills logstash with junk (due to not handling multiline errors?)
Closed, ResolvedPublicPRODUCTION ERROR

Description

The production logstash fatalmonitor is full of stuff like this:

\xa9partement_de_la_Meuse','Portail_d\xc3\xa9veloppement','Portail_d\xc3\xa9veloppement_durable','Portail_electro','Portail_empire_Ottoman','Portail_empire_du_Japon','Portail_empire_ottoman','Portail_enseignement_sup\xc3\xa9rieur','Portail_entreprise','Portail_environnementalisme','Portail_espace_vert','Portail_espagne','Portail_espionnage','Portail_europe','Portail_fables','Portail_fantastique','Portail_fantasy','Portail_femme','Portail_film_fran\xc3\xa7ais','Portail_finances','Portail_flibustier','Portail_fluvial','Portail_football_ga\xc3\xa9lique','Portail_formule_1','Portail_for\xc3\xaat','Portail_for\xc3\xaats','Portail_france','Portail_francophonie','Portail_fran\xc3\xa7ais','Portail_funk','Portail_f\xc3\xa9lin','Portail_f\xc3\xaates','Portail_gastronomie','Portail_gay','Portail_gestion','Portail_grande_vitesse','Portail_guerre','Portail_guerres','Portail_g\xc3\xa9ophysique','Portail_haut_Moyen_\xc3\x82ge','Portail_haute-Garonne','Portail_hd','Portail_hip_hop','Portail_horticulture','Portail_humanitaire','Portail_humour_qu\xc3\xa9b\xc3\xa9cois','Portail_ichthyologie','Portail_infirmi\xc3\xa8re','Portail_insectes','Portail_internet','Portail_jardinage','Portail_jeu','Portail_jeux_olympiques','Portail_jeux_vid\xc3\xa9o','Portail_journalisme','Portail_juif','Portail_la_R\xc3\xa9union','Portail_lac','Portail_lacs','Portail_langue_fran\xc3\xa7aise_et_Francophonie','Portail_litt\xc3\xa9rature_de_jeunesse','Portail_litt\xc3\xa9rature_francophone','Portail_logiciel_libre','Portail_lorraine','Portail_l\xc3\xa9gion_\xc3\xa9trang\xc3\xa8re','Portail_mammalogie','Portail_manga','Portail_maths','Portail_mer','Portail_mers','Portail_micro_et_nanotechnologies','Portail_militaire','Portail_min\xc3\xa9ralogie','Portail_min\xc3\xa9raux','Portail_mollusque','Portail_mollusques','Portail_monarchie_de_Juillet','Portail_monde_maritime','Portail_monument_historique','Portail_moto','Portail_moyen_age_ancien','Portail_musique_au_Qu\xc3\xa9bec','Portail_musique_du_monde','Portail_musique_qu\xc3\xa9b\xc3\xa9coise','Portail_musiques_\
','370934711','370931934','370931890','370681997','370681841','370681623','370681014')
<Classification>Books & Manuscripts</Classification>
^

They all have type:hhvm, level:info. Presumably something generates errors with multiline and/or ridiculously long messages which get broken up and reported line-by-line as lots of separate errors.

Event Timeline

The HHVM logging transport is syslog, so anything that makes multi-line messages is going to look like garbage. I think the problem here is encoding however. Those look like parts of SlowTimer messages that have completely confused the logstash rules for parsing the HHVM syslog messages. Do they all have a _grokparsefailure tag as well?

Do they all have a _grokparsefailure tag as well?

The ones I checked all did. Then again, most HHVM messages seem to have that tag, even the seemingly single-line ones.

Is it possible to just switch to some different transport? (I feel we had this discussion already but can't remember where...)

Some related tickets: T63785 (proposes Kafka), T97297 (proposes Lumberjack)

Is it possible to just switch to some different transport? (I feel we had this discussion already but can't remember where...)

Syslog is actually natively how we get logs out of HHVM. The state of HHVM custom logging is discussed in some detail at T89169#1573731. The setup we have today is that each HHVM container that is running MediaWiki code is configured to log via HHVM's built in syslog logger to a local rsyslog service on the same host. That rsyslog is then configured to forward the HHVM log events both the logstash and to a central rsyslog aggregator on the log server (e.g. fluorine).

Some related tickets: T63785 (proposes Kafka), T97297 (proposes Lumberjack)

Kafka and Lumberjack solve two different problems. Kafka is about reliable transport (and possibly load balancing consumers). Lumberjack is about parsing raw log files from disk for applications that can't be made to send logs directly to Logstash.

If we wanted to replace the current HHVM log aggregation pipeline we would either need something like Lumberjack to parse log events written to disk by HHVM and send them to Logstash and the log aggregation server, or to find/build another logger implementation that we could compile into HHVM that included an alternate formatter and transport.

Googling around a bit, it seems that at least rsyslog supports multiline messages so we should figure out at which point do those get broken up.

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:10 PM
Krinkle subscribed.

Closing per T176370 and T229792.