Instrument MediaWiki on the WMF production cluster to send structured Action API request information to Hadoop via Kafka.
Data to collect:
| Description | Field | Type |
| Timestamp | ts | int (unix epoch seconds) |
| Time spent processing request (ms resolution) | timeSpentBackend | int |
| Were errors encountered? | hadError | boolean |
| List of error codes | errorCodes | array<string> |
| Request parameters (name=value pairs) | params | map<string,string> |
Data will be collected by adding a new debug logging channel (ApiRequest) that carries structured data in the PSR-3 context. Any MediaWiki deployment can then choose where and how to route these log messages.
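As a sketch of what one ApiRequest log event's structured PSR-3 context would hold, using the field names and types from the table above (the values here are illustrative only, not taken from production):

```python
import time

# One ApiRequest event's structured context, matching the schema table.
# Field names come from the schema; the values are made up for illustration.
event = {
    "ts": int(time.time()),      # int: unix epoch seconds
    "timeSpentBackend": 38,      # int: backend processing time, ms resolution
    "hadError": True,            # boolean: were errors encountered?
    "errorCodes": ["badtoken"],  # array<string>: error codes, if any
    "params": {                  # map<string,string>: name=value request params
        "action": "edit",
        "format": "json",
    },
}
```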
For the WMF production cluster, introduce configuration to route this log channel to the local Kafka cluster in a topic that can be loaded into Hadoop.
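On the wire, a routed event becomes a message in a Kafka topic (the plan below names it mediawiki_ApiAction). A minimal sketch of serializing the structured context into message bytes, assuming a JSON wire format; the actual format is whatever the handler configured for the ApiRequest channel emits:

```python
import json

def encode_event(event: dict) -> bytes:
    """Serialize a structured log event to JSON bytes for a Kafka message.

    A sketch only: the real wire format depends on the log handler
    configured for the ApiRequest channel, not on this function.
    """
    return json.dumps(event, sort_keys=True).encode("utf-8")

payload = encode_event({
    "ts": 1458000000,
    "timeSpentBackend": 12,
    "hadError": False,
    "errorCodes": [],
    "params": {"action": "query", "format": "json"},
})
# payload is what a Kafka producer would publish to the topic
```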
These steps are intended to be done in order.
- Commit schema to mediawiki/event-schemas repository (gerrit)
- Commit submodule bump to analytics/refinery/source repository (gerrit)
- Commit Oozie job to create partitions to analytics/refinery repository (gerrit)
- Commit property changes for Camus to operations/puppet repository (gerrit)
- Wait for analytics to deploy new versions of refinery and refinery-source to analytics cluster
- T129889: Create mediawiki_ApiAction Kafka topic
- Go back and fix things that were done incorrectly (T108618#2132875)
- Commit submodule bump along with proper configuration to operations/mediawiki-config repository (gerrit)
- Deploy initial mediawiki-config patch to production with a sampling rate of a few events per minute for testing
- Verify events in Kafka are as expected. Check mediawiki logs for errors.
- After enough time has passed (Camus runs once per hour), verify that events are showing up in HDFS
- Create table in Hive pointing at the events in HDFS (T129886: Create wmf_raw.ApiAction table)
- Submit coordinator to Oozie to auto-create partitions
- Adjust (or remove) sampling of events in operations/mediawiki-config repository
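The initial deployment caps volume by sampling events (a few per minute for testing) before sampling is adjusted or removed. A sketch of that kind of gate, assuming a 1-in-N sampling rate; the name `sample_rate` and its semantics are assumptions for illustration, not the actual mediawiki-config variable:

```python
import random

def should_sample(sample_rate: int, rng=random.random) -> bool:
    """Return True for roughly 1 in `sample_rate` requests.

    Illustrates the kind of sampling gate the mediawiki-config patch
    would control; a rate <= 1 means sampling is disabled and every
    request is logged.
    """
    if sample_rate <= 1:
        return True
    return rng() < 1.0 / sample_rate
```

Injecting `rng` keeps the gate deterministic in tests; lowering `sample_rate` to 1 corresponds to the final step of removing sampling entirely.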
Original task description
log user agent in api.log
We tell clients to use an interesting user agent, but don't log it to api.php.
In T102079#1417411 Anomie commented
User agent could be included easily enough, but would need to be run by Ops for the text logfile and @bd808 for logstash (if it wouldn't already be there) to verify that it wouldn't make a prohibitive difference to the storage requirements.
Seems a simple change to ApiMain->logRequest(); ApiBase->logFeatureUsage() already logs the user agent to api-feature-usage.log.