In T331201#8690623, @santhosh shared documentation for a public API to read configuration:
https://cxserver.wikimedia.org/v2/list/mt exposes cxserver's MT capabilities via an API with JSON output. This output is the source of truth for production, because the config files in the repository are amended at deployment by production configuration. https://cxserver.wikimedia.org/v2?doc is the API spec for cxserver.
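A minimal sketch of reading that endpoint in Python with only the standard library. It assumes the response is a JSON object keyed by MT provider, where each provider maps source language codes to lists of target language codes; check that against the live output before relying on it. The function names and the User-Agent string are illustrative, not part of cxserver.

```python
import json
import urllib.request

MT_LIST_URL = "https://cxserver.wikimedia.org/v2/list/mt"


def fetch_mt_config(url=MT_LIST_URL, timeout=30):
    """Download and parse the MT capability listing."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "cxserver-config-check/0.1"}  # illustrative value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def count_pairs_per_provider(config):
    """Return {provider: number of (source, target) pairs}.

    Assumes config looks like {provider: {source: [target, ...]}}.
    """
    return {
        provider: sum(len(targets) for targets in sources.values())
        for provider, sources in config.items()
    }
```

Calling `count_pairs_per_provider(fetch_mt_config())` gives a quick per-provider total, which is a useful first sanity check before any detailed comparison.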
We want to see whether information is lost or changed by the config scraper, and one way to do that is to compare the API result with the scraper output. To be compared, the data needs to be transformed into the same shape, in one direction or the other. This is a one-time operation with little reusable logic, so it doesn't matter which direction the transformation goes.
- Read and parse JSON from the cxserver mt endpoint.
- Select one of the CSV output files included in the contributions for T331201 (Extract cxserver configuration and export to CSV) and download it to your machine, either by cloning the repository or from the web using GitHub's "raw" mode.
- Transform data so it has the same shape. Note that sort order may also affect comparability.
- Compare the configuration structures.
- We don't need a detailed list of the differences, if any; an overview of what you see is enough.
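The transform and compare steps above could be sketched as follows. This is one possible approach, not a prescribed one: it flattens both sides into sets of (source, target, provider) tuples, which removes sort order from the comparison. The API shape ({provider: {source: [targets]}}) and the CSV column names are assumptions; adjust the column arguments to match the header of the CSV file you picked.

```python
import csv
import io


def pairs_from_api(config):
    """Flatten {provider: {source: [targets]}} into (source, target, provider) tuples."""
    return {
        (source, target, provider)
        for provider, sources in config.items()
        for source, targets in sources.items()
        for target in targets
    }


def pairs_from_csv(
    csv_file,
    source_col="source language",      # assumed header name
    target_col="target language",      # assumed header name
    provider_col="translation engine",  # assumed header name
):
    """Read (source, target, provider) tuples from an open CSV file with a header row."""
    reader = csv.DictReader(csv_file)
    return {
        (row[source_col].strip(), row[target_col].strip(), row[provider_col].strip())
        for row in reader
    }


def summarize_differences(api_pairs, csv_pairs):
    """Return counts plus a few sample rows instead of a full diff."""
    only_api = api_pairs - csv_pairs
    only_csv = csv_pairs - api_pairs
    return {
        "in_both": len(api_pairs & csv_pairs),
        "only_in_api": len(only_api),
        "only_in_csv": len(only_csv),
        "sample_only_in_api": sorted(only_api)[:5],
        "sample_only_in_csv": sorted(only_csv)[:5],
    }
```

To use it, pass the parsed JSON from the mt endpoint to `pairs_from_api`, open the downloaded CSV with `open(path, newline="", encoding="utf-8")` and pass the file object to `pairs_from_csv`, then hand both sets to `summarize_differences`. Because sets discard duplicates, repeated rows in the CSV will not show up as differences, and provider names are compared case-sensitively, so a spelling mismatch between the two sources will appear as rows missing on both sides.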
Nice to haves:
If there are differences, can they be explained by something in the cxserver source code, or by a quirk of the scraper?