Investigate stopping logging api-feature-usage channel in Logstash
Closed, Resolved · Public

Description

MediaWiki logs to the api-feature-usage bucket, which ends up in Logstash. That represents maybe 1/6 of the messages logged, and I am not sure whether we need them to be stored in Logstash.

There is a special page to query them: https://commons.wikimedia.org/wiki/Special:ApiFeatureUsage

InitialiseSettings.php has it set to debug, and I have tracked that down to August 2014: https://gerrit.wikimedia.org/r/#/c/154096/

Not sure whether it is still needed?

Similar request for AbuseFilter/StashEdit 146697

Event Timeline

hashar created this task. Sep 23 2016, 1:32 PM
Restricted Application added a subscriber: Aklapper. Sep 23 2016, 1:32 PM

I find this channel extremely useful when attempting to determine the usage level of deprecated API features to inform decisions on removing those features. It was used to generate the lists of users to notify for T96858: Use "new" continuation by default and T105794: Insecure POST traffic, and likely will be again for T145649: Deprecate and remove use of API action=purge with GET requests and other changes.

Also, it looks like just one company is responsible for around 38% of the recent entries in this channel: https://logstash.wikimedia.org/goto/82cfc871be7cdb1b34cdc9913e2b4461

bd808 triaged this task as Lowest priority. Sep 26 2016, 8:51 PM
bd808 added a subscriber: bd808.

Are we having storage issues with Logstash that I'm not aware of? As @Anomie mentioned, this log channel is valuable for API feature deprecation. It is also used to populate indices in the CirrusSearch Elasticsearch servers that power https://en.wikipedia.org/wiki/Special:ApiFeatureUsage

hashar closed this task as Resolved. Sep 26 2016, 8:53 PM
hashar claimed this task.

I was wondering what it was for and what the use case is. It looks like it is definitely helpful for fixing issues and maintaining the API, so let's keep it :]

Later we might want to have MediaWiki use Monolog to emit nicely formatted messages. That would make it easier to search in Logstash.

Thank you @Anomie for the detailed replies!

@bd808 we raced on replying! I am not aware of any Logstash issue. It is just that having so many logs is sometimes cumbersome when digging for production issues, but then I can easily filter out the bits I am not interested in.

I was wondering where the data of Special:ApiFeatureUsage comes from. Looks like we do not duplicate collection and definitely rely on Logstash gathering.

Sorry if the task got interpreted as "please stop doing that!" when I really just wanted to know whether it was at least used by someone/somehow :-]

As for the user consuming a lot of API calls, we probably want to file a private task about it based on Brad's report https://logstash.wikimedia.org/goto/82cfc871be7cdb1b34cdc9913e2b4461

I was wondering where the data of Special:ApiFeatureUsage comes from. Looks like we do not duplicate collection and definitely rely on Logstash gathering.

In more detail: The API writes messages into the logging system, which feeds Logstash. The raw messages go into Kibana's ES like normal, and a filtered version[1] also goes into a different ES index[2] that's queried by ApiFeatureUsage.

[1]: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/files/logstash/filter-apifeatureusage.conf
[2]: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/manifests/role/logstash.pp;21b46a74ca8bc6e062f523f871ba74dee5dfeb52$265-306
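
For anyone skimming, the flow described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Logstash configuration: the index names, the kept field list, and the `route` helper are all hypothetical, and the real filter in [1] may select or rewrite fields differently.

```python
from __future__ import annotations

from typing import Any

# All names below are assumptions for illustration only.
RAW_INDEX = "logstash-raw"            # hypothetical: the index Kibana reads
FEATURE_INDEX = "apifeatureusage"     # hypothetical: the index ApiFeatureUsage queries
FEATURE_CHANNEL = "api-feature-usage"
KEPT_FIELDS = ("timestamp", "feature", "agent")  # hypothetical field selection


def route(event: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Return the (index, document) pairs one log event would be written to."""
    # Every event is stored unmodified in the raw index.
    outputs = [(RAW_INDEX, dict(event))]
    # Events from the feature-usage channel also get a trimmed copy
    # written to the separate index.
    if event.get("channel") == FEATURE_CHANNEL:
        slim = {key: event[key] for key in KEPT_FIELDS if key in event}
        outputs.append((FEATURE_INDEX, slim))
    return outputs
```

The point of the sketch is that the second index is derived from the same log stream, so dropping the channel from Logstash would also empty Special:ApiFeatureUsage.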

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:11 PM