
[Commons Impact Metrics] Create API documentation
Open, Medium, Public, 3 Estimated Story Points

Description

We should write the documentation for the Commons Impact Metrics API (AQS2) on Wikitech.

Things to document:

  • Overview
  • Endpoints
  • Examples
  • Limitations

To do:

Details

Other Assignee
mforns

Event Timeline

mforns set the point value for this task to 3.

@apaskulin Hi! Do you think these new endpoints fit into the AQS docs work you are currently active on?

There will be a few changes based on the new data modeling guidelines before we are ready to do these docs.

Hi! This would be a great opportunity to test our self-service process next month. I'll keep you updated on when that will be ready.

@apaskulin We are ready to start working on the docs. Should we start drafting and then connect with you on publishing, or does the self-service process include the drafting stage?

@VirginiaPoundstone, the self-service process won't be ready until the end of next week (around June 13), but I can start putting something together sooner if you'd like to start the docs work now. Let me know!

@apaskulin It looks like @SGupta-WMF picked up this task today. Is there anything she can do to advance this while you get the self-service process ready?

@SGupta-WMF and @VirginiaPoundstone, I updated the task description with the first step to move this forward. It looks like there's already been some work done on steps 1-3 in the linked process, which is great! Please add me as a reviewer to the patch once it's ready, and I can help with testing.

Hi @apaskulin, the docs are already on the main branch, but you may review them in this patch as well:
https://gitlab.wikimedia.org/repos/generated-data-platform/aqs/commons-impact-analytics/-/merge_requests/6

Thanks, @SGupta-WMF! Now that you have the spec files committed to the repo in that patch, the next step is to create an endpoint to serve the spec publicly so it can be consumed and displayed by the docs site. See step 7 in the docs on wiki.

I have a few questions to help clarify the docs:

  • For the endpoints that return a time series without monthly in the path, what is the granularity of the data?
  • For the ranking endpoints, is there a limit on how many results are returned? For example, the top 100?

I have created a Wikitech document: https://wikitech.wikimedia.org/wiki/AQS_2.0/CommonImpactAnalytics

Thank you @SGupta-WMF! The initial page looks good to me! 2 questions:

  • Should we rename the page from CommonImpactAnalytics to CommonsImpactAnalytics (the plural Commons, as opposed to the adjective Common)?
  • Won't the URLs have a commons-impact-analytics reference? Like:
https://wikimedia.org/api/rest_v1/metrics/commons-impact-analytics/edits-per-category-monthly/1974_State_Visit_to_the_USSR_photo_kit_at_the_Gerald_R._Ford_Presidential_Library/shallow/all-edit-types/20221101/20231201

instead of

https://wikimedia.org/api/rest_v1/metrics/edits-per-category-monthly/1974_State_Visit_to_the_USSR_photo_kit_at_the_Gerald_R._Ford_Presidential_Library/shallow/all-edit-types/20221101/20231201
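
For illustration only, here is a minimal Python sketch of how the first of the two URLs above could be assembled from its parts, assuming the commons-impact-analytics form is the one that ends up being served. The function name and the parameter names (category, scope, edit_type, start, end) are guesses made for this sketch based on the example path, not names taken from the API spec.

```python
from urllib.parse import quote

# Assumed base path, copied from the example URL in this thread.
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/commons-impact-analytics"


def edits_per_category_monthly_url(category, scope, edit_type, start, end):
    """Build a request URL for the edits-per-category-monthly endpoint.

    The parameter names are hypothetical; only the segment order is taken
    from the example URL.
    """
    segments = [
        "edits-per-category-monthly",
        # Percent-encode the category so that reserved characters such as
        # "/" or "?" cannot break it into extra path segments.
        quote(category, safe=""),
        scope,
        edit_type,
        start,
        end,
    ]
    return BASE_URL + "/" + "/".join(segments)


url = edits_per_category_monthly_url(
    "1974_State_Visit_to_the_USSR_photo_kit_at_the_Gerald_R._Ford_Presidential_Library",
    "shallow",
    "all-edit-types",
    "20221101",
    "20231201",
)
print(url)
```

The sketch only builds the string and makes no request, so it does not depend on whether the endpoint is live yet.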

@apaskulin For endpoints without monthly in the path, the data will display all records within the specified start and end dates, without any specific granularity. Additionally, for the ranking endpoints, there is no defined limit on the number of results shown. @mforns Please correct me if this is not accurate.

@apaskulin

For the endpoints that return a time series without monthly in the path, what is the granularity of the data?

AQS and the API service do not aggregate; they just pass through all records from the underlying data that belong to the selected interval.
However, the underlying data is aggregated monthly, so in the end, the results are indeed monthly for all the endpoints.

For the ranking endpoints, is there a limit on how many results are returned? For example, the top 100?

Same as above: AQS and the API service do not apply any threshold; they just pass through all the underlying data.
However, the source data is already cut to the top 100. So, the final result will be the top 100.
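
To make the pass-through behavior described above concrete, here is a minimal Python sketch, not the actual service code. It assumes records carry a zero-padded YYYYMMDD timestamp and that both ends of the interval are inclusive; the field names and the sample rows are hypothetical.

```python
def select_interval(records, start, end):
    """Return the records whose timestamp lies in [start, end], unchanged.

    No aggregation and no top-N cut happens here; whatever granularity or
    ranking limit the result has comes from the source data itself.
    """
    # Zero-padded YYYYMMDD strings sort chronologically under plain string
    # comparison, so no date parsing is needed for this sketch.
    return [r for r in records if start <= r["timestamp"] <= end]


# Hypothetical rows: one per month, as if already aggregated upstream.
monthly_rows = [
    {"timestamp": "20221001", "edit_count": 4},
    {"timestamp": "20221101", "edit_count": 7},
    {"timestamp": "20221201", "edit_count": 2},
    {"timestamp": "20230101", "edit_count": 5},
]

selected = select_interval(monthly_rows, "20221101", "20221231")
for row in selected:
    print(row["timestamp"], row["edit_count"])
```

Because the input rows are already monthly, the output is monthly too, even though the filter itself knows nothing about months. The same reasoning applies to the rankings: a source table already cut to 100 entries yields at most 100 results.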

@mforns Is it not 1000?

@SGupta-WMF Originally, it was 1000. But looking at data size in Cassandra, we saw that some tables were pretty big, and that cutting to just 100 elements per top would reduce the number of rows significantly.
Here's the reasoning https://phabricator.wikimedia.org/T364583#9843977
And the change: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1037167

We must change it in this document.

Oh! You're right. I've updated the doc to have 100 instead of 1000.
Thanks for the heads-up!

@SGupta-WMF I've updated the task description with the remaining steps to complete and moved this back to in process. Let me know if you have any questions.

apaskulin updated the task description.