Instrument MediaWiki on the WMF production cluster to send structured Action API request information to Hadoop via Kafka.
**Data to collect:**
| Data | Name | Type |
| ---- | ---- | ---- |
| Timestamp | ts | int (Unix epoch seconds) |
| Requesting IP | ip | string |
| Requesting User-Agent | userAgent | string |
| Wiki (`wfWikiId()`) | wiki | string |
| Time spent processing the request | timeSpentBackend | int (milliseconds) |
| Were errors encountered? | hadError | boolean |
| List of error codes | errorCodes | array<string> |
| Request parameters (name=value pairs) | params | map<string,string> |
Data will be collected by adding a new debug logging channel (`ApiRequest`) with structured data in the [[https://www.mediawiki.org/wiki/Structured_logging#Add_structured_data_to_logging_context|PSR-3 context]]. Any MediaWiki deployment can then choose where and how to route these log messages.
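As a rough illustration of what that instrumentation could look like, the sketch below builds a PSR-3 context from the fields in the table above and emits it on the `ApiRequest` channel. The surrounding locals (`$request`, `$milliseconds`, `$errorCodes`) are assumptions about what `ApiMain::logRequest()` has at hand, not the final implementation.

```lang=php
use MediaWiki\Logger\LoggerFactory;

// Sketch only: assemble the structured context for one API request.
// Field names follow the "Data to collect" table above.
$logCtx = [
	'ts' => time(), // Unix epoch seconds
	'ip' => $request->getIP(),
	'userAgent' => $request->getHeader( 'User-Agent' ),
	'wiki' => wfWikiId(),
	'timeSpentBackend' => (int) round( $milliseconds ), // ms spent in the backend
	'hadError' => $errorCodes !== [],
	'errorCodes' => $errorCodes, // array<string>
	'params' => $request->getValues(), // map<string,string>
];

// Emit on the new channel; routing is decided by the deployment's config.
LoggerFactory::getInstance( 'ApiRequest' )->info( 'API request', $logCtx );
```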
For the WMF production cluster, introduce configuration to route this log channel to the local Kafka cluster in a topic that can be loaded into Hadoop.
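For a non-WMF deployment, the routing could be expressed with MediaWiki's `MonologSpi` configuration; the sketch below only illustrates the shape of that config. The `kafka` handler here is a stand-in `StreamHandler`: in production the channel is pointed at a handler that produces to the local Kafka cluster, and the actual wiring lives in operations/mediawiki-config rather than being reproduced here.

```lang=php
$wgMWLoggerDefaultSpi = [
	'class' => \MediaWiki\Logger\MonologSpi::class,
	'args' => [ [
		'loggers' => [
			// Route only the ApiRequest channel to the 'kafka' handler.
			'ApiRequest' => [
				'handlers' => [ 'kafka' ],
			],
		],
		'handlers' => [
			'kafka' => [
				// Stand-in: production replaces this with a handler that
				// produces to Kafka, in a topic Camus imports into Hadoop.
				'class' => \Monolog\Handler\StreamHandler::class,
				'args' => [ 'php://stderr' ],
				'formatter' => 'json',
			],
		],
		'formatters' => [
			'json' => [
				// Serialize the message plus PSR-3 context as JSON so the
				// structured fields survive the trip downstream.
				'class' => \Monolog\Formatter\JsonFormatter::class,
			],
		],
	] ],
];
```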
Deployment Checklist
--------------------
These steps are intended to be done in order.
# [x] Commit schema to mediawiki/event-schemas repository ([[https://gerrit.wikimedia.org/r/#/c/265164|gerrit]])
# [x] Commit submodule bump to analytics/refinery/source repository ([[https://gerrit.wikimedia.org/r/#/c/273556/|gerrit]])
# [x] Commit Oozie job to create partitions to analytics/refinery repository ([[https://gerrit.wikimedia.org/r/#/c/273557/|gerrit]])
# [x] Commit property changes for Camus to operations/puppet repository ([[https://gerrit.wikimedia.org/r/#/c/273558/|gerrit]])
# [x] Wait for analytics to deploy new versions of refinery and refinery-source to analytics cluster
# [x] {T129889}
# [x] Go back and fix things that were done incorrectly (T108618#2132875)
# [x] Commit submodule bump along with proper configuration to operations/mediawiki-config repository ([[https://gerrit.wikimedia.org/r/#/c/278347/|gerrit]])
# [x] Deploy initial mediawiki-config patch to production with a sampling rate of a few events per minute for testing
# [x] Verify events in Kafka are as expected. Check mediawiki logs for errors.
# [ ] After enough time has passed (Camus runs once per hour), verify that events are showing up in HDFS
# [ ] Create table in Hive pointing at the events in HDFS ({T129886})
# [ ] Submit coordinator to Oozie to auto-create partitions
# [ ] Adjust (or remove) sampling of events in operations/mediawiki-config repository (see the sketch below)
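For reference, in a vanilla MediaWiki install per-channel sampling can be expressed with the `sample` option of `$wgDebugLogGroups` (keep 1 in N messages); the production cluster does the equivalent through its own logging configuration in operations/mediawiki-config, so the destination and rate below are placeholders.

```lang=php
// Placeholder values: keep roughly 1 in 100 ApiRequest messages.
$wgDebugLogGroups['ApiRequest'] = [
	'destination' => 'udp://127.0.0.1:8420/ApiRequest',
	'sample' => 100,
];
```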
---
Original task description
-------------------------
**log user agent in api.log**
We [[ https://www.mediawiki.org/wiki/API:Main_page#Identifying_your_client | tell clients to use an interesting user agent]], but don't log it in `api.log`.
In T102079#1417411, Anomie commented:
> User agent could be included easily enough, but would need to be run by Ops for the text logfile and @bd808 for logstash (if it wouldn't already be there) to verify that it wouldn't make a prohibitive difference to the storage requirements.
This seems like a simple change to `ApiMain->logRequest()`; `ApiBase->logFeatureUsage()` already logs the user agent to `api-feature-usage.log`.
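A hedged sketch of that addition: read the header from the current `WebRequest` inside `ApiMain::logRequest()` and include it in whatever message or context the method already builds (the existing log line format is not reproduced here, and `$logMsg` is a hypothetical local).

```lang=php
// Inside ApiMain::logRequest() (sketch).
$request = $this->getRequest();
$userAgent = $request->getHeader( 'User-Agent' );
// e.g. append to the existing line: $logMsg .= ' UA=' . $userAgent;
```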