The WMF's analytics team provides files with near-real-time data on WMDE banner impressions, updated every 15 minutes. The data is publicly hosted in an indexed directory on analytics.wikimedia.org.
**Acceptance Criteria**
* The application has an entry point responsible for picking up the data.
* A file is only picked up/processed once.
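To make "picked up only once" concrete, a minimal sketch of a processed-file log, written here in Python for illustration (the actual implementation will be a PHP class, and a database table may be a better store than the JSON file assumed below; all names are made up):

```python
import json
from pathlib import Path


class ProcessedFileLog:
    """Remembers which dataset files were already picked up.

    Persists the set of seen filenames to a small JSON file so that
    restarts of the entry point do not cause double processing.
    """

    def __init__(self, state_file: str = "processed_files.json"):
        self.state_file = Path(state_file)
        if self.state_file.exists():
            self.seen = set(json.loads(self.state_file.read_text()))
        else:
            self.seen = set()

    def is_new(self, filename: str) -> bool:
        return filename not in self.seen

    def mark_processed(self, filename: str) -> None:
        self.seen.add(filename)
        # Rewrite the whole state file; fine for a small set of names.
        self.state_file.write_text(json.dumps(sorted(self.seen)))
```

The entry point would check `is_new()` before downloading a file and call `mark_processed()` after a successful parse.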
**Notes**
Example of a publicly hosted dataset provided by the Analytics Team of the WMF:
https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/browser/
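Since the directory is auto-indexed, discovering files means parsing the index page's links. A hedged sketch (it assumes the typical Apache/Nginx auto-index markup with `<a href="...">` entries; the exact HTML on analytics.wikimedia.org may differ and should be verified):

```python
import re


def list_dataset_files(index_html: str) -> list[str]:
    """Extract file names from an auto-indexed directory page.

    Assumes each entry is an <a href="..."> link and that directory
    entries (including the parent link) end with "/".
    """
    hrefs = re.findall(r'<a href="([^"]+)"', index_html)
    return [h for h in hrefs if not h.endswith("/")]
```

The downloader class could fetch the index page and feed it through a function like this to find new files.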
**Questions**
How are files created?
* File gets appended to every 15 minutes
* File gets overwritten every 15 minutes
* One file per date range
How often are files cleaned up?
What exactly does the format look like?
**Implementation Hints**
* The outcome of this task is a set of classes for downloading the data, parsing it, and remembering which files have already been processed.
* When the ticket is finished, the entry point script should call the downloader and parser classes and leave a TODO for processing/storing the data.
* Guzzle should be used for the HTTP requests, for testability, and may even help with parsing the files.
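How the pieces could fit together, sketched in Python for brevity (the real implementation will be PHP with a Guzzle client injected into the downloader; the tab-separated `banner<TAB>count` format is an assumption, since the actual file format is still an open question above):

```python
class Downloader:
    """Fetches one file from the dataset directory.

    The injected fetch function stands in for a Guzzle client,
    which keeps the HTTP layer easy to mock in tests.
    """

    def __init__(self, fetch):
        self._fetch = fetch

    def download(self, url: str) -> str:
        return self._fetch(url)


class Parser:
    """Parses one impression file.

    Assumes a simple 'banner<TAB>count' line format for illustration.
    """

    def parse(self, raw: str) -> list[tuple[str, int]]:
        rows = []
        for line in raw.strip().splitlines():
            banner, count = line.split("\t")
            rows.append((banner, int(count)))
        return rows


def main(downloader: Downloader, parser: Parser, urls: list[str]) -> list[tuple[str, int]]:
    records = []
    for url in urls:
        records.extend(parser.parse(downloader.download(url)))
    # TODO: process/store the parsed records (left for a follow-up ticket)
    return records
```

Injecting the HTTP client as a constructor argument is what makes the Guzzle-based downloader unit-testable without network access.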