
Updates to docs and data dictionary
Closed, Resolved | Public | 5 Estimated Story Points

Description

As a WME, I want the API docs and Data Dictionary to have up-to-date, clear, and consistent descriptions. This will help clients understand the API and simplify onboarding.

We can split this ticket so that one developer does the YAML changes and another takes on the Data Dictionary (spreadsheet) updates.

Note: Ruairi added comments to clarify some tasks; they are towards the end of the page. If you need more context, chat with Chuck or Ruairi.

Event Timeline

/snapshots/ docs vs api resp fields discrepancy

The current site docs for /v2/snapshot show the following in the example box (top-level fields only, not the sub-items):

{
    "identifier": "string",
    "name": "string",
    "version": "string",
    "in_language": {},
    "is_part_of": {},
    "namespace": {},
    "size": {}
}

When I hit that endpoint (using Postman), I get those fields, but:
A: I don't get the name field in the response.
B: The site docs example is missing a field I do get in the response, called date_modified (screenshot).
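
A quick way to make this kind of comparison repeatable is to diff the two field lists. This is only a sketch: the lists below are copied from the docs example and from the Postman observation in this comment, and `diff_fields` is a hypothetical helper, not something in our repos.

```python
def diff_fields(documented, actual):
    """Return (documented but not returned, returned but not documented)."""
    documented, actual = set(documented), set(actual)
    return sorted(documented - actual), sorted(actual - documented)


# Top-level fields shown in the docs example box.
DOCUMENTED = ["identifier", "name", "version", "in_language",
              "is_part_of", "namespace", "size"]
# Top-level fields seen in the Postman response described above.
RETURNED = ["identifier", "version", "in_language",
            "is_part_of", "namespace", "size", "date_modified"]

missing, undocumented = diff_fields(DOCUMENTED, RETURNED)
print("in docs, not in response:", missing)
print("in response, not in docs:", undocumented)
```

The same helper could be pointed at the /v1/login example if someone wants to re-check that one as well.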

docs-snapshot-diff.png (1×838 px, 168 KB)

prabhat updated the task description.
creynolds renamed this task from Update docs and data dictionary with the customer feedback to Updates to docs and data dictionary. May 7 2024, 9:07 PM
creynolds updated the task description.
Snapshot identifier description and examples

In the Snapshot /snapshots/{identifier}/download docs we only show the description "Snapshot identifier", which is not very specific. The getting started example calls page says the following, which also needs to be updated to be more helpful:

Note the project identifier (e.g. “enwiki” for English Wikipedia) as that is the identifier you will use to identify that project in other calls and responses.

From Slack thread:

The identifier is a combination of project and namespace.
There are different things here:
project identifier : enwiki
namespace identifier : 0
snapshot identifier: enwiki_namespace_0
language code : en
project code : wiki

The idea is to update the Snapshot path params description so it is more helpful in understanding how to find or formulate the identifier.
Similarly, the docs examples should be more helpful too.
Furthermore, the Data Dictionary identifier description should also be updated with this in mind.
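
To make the composition from the Slack thread concrete, here is a minimal sketch that builds a snapshot identifier from its parts. The function name is made up for illustration; the pattern itself (project identifier = language code + project code, followed by `_namespace_` and the namespace number) is taken from the list above.

```python
def snapshot_identifier(language_code: str, project_code: str, namespace: int) -> str:
    """Compose a snapshot identifier such as 'enwiki_namespace_0'."""
    # Project identifier, e.g. "en" + "wiki" -> "enwiki".
    project_identifier = f"{language_code}{project_code}"
    return f"{project_identifier}_namespace_{namespace}"


print(snapshot_identifier("en", "wiki", 0))
```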

/docs/authentication/ login resp fields discrepancy
  • /v1/login/ Login Response example shows six fields, when in production it actually responds with only four.
    • Docs show: challenge_name, id_token, access_token, refresh_token, session, expires_in
    • Actual resp: id_token, access_token, refresh_token, expires_in
    • So challenge_name and session are the issues

Note: https://enterprise.wikimedia.com/docs/#getting-api-keys "Login Response" is accurate to response

/authentication/ endpoints description updates

These /authentication/ endpoint descriptions need copy edits for clarity/grammar:

Login:

  • orig: By receiving username and password creates new access, id and refresh tokens.
  • change to: Create new access, id, and refresh tokens with your valid username and password.

Refresh Token:

  • orig: By receiving refresh token and username provides new access and id tokens.
  • change to: Provides new access and id tokens with your valid username and refresh token.

Revoke Token:

  • orig: By receiving refresh token revokes access for all of its access tokens. After the token is revoked, you can not use the revoked token to access authenticated APIs.
  • change to: Revokes the associated access tokens with your valid refresh token.

Forgot Password:

  • orig: By receiving username sends confirmation code that is required to change the user's password (look into /v1/forgot-password-confirm).
  • change to: Sends a confirmation code via email. That code is then required to change the user's password using /v1/forgot-password-confirm.

Forgot Password Confirmation:

  • orig: By receiving username, new password and confirmation code (see /v1/forgot-password) changes user password.
  • change to: Changes the user password using your valid username, the confirmation_code emailed to you after calling the /v1/forgot-password endpoint, and a new password.

Change Password:

  • orig: Changes user password by receiving access token, previous password, and proposed password.
  • change to: Changes user password with your valid access token, current password, and a new proposed password.

New Password Required:

  • orig: Responds NEW_PASSWORD_REQUIRED challenge by receiving username and session token and setting new password.
  • proposed: NEED ENG INSIGHT HERE -- Is this still needed???
    • why would it respond with this? anything helpful here? Feel free to ping chuck in slack with draft when ready.
    • How do we explain that better in the description?

Description from @HShaikh "The new password requirement was I think a last minute addition so really needs a deeper look on how that works. If I remember correctly, it was due to mark from internet archive losing password and having us redo his password for him"

  • Forgot password, action needed? - this should be part of the same workflow and contextual
/docs/realtime/ Intro and Descriptions and FAQs
  • Article Updates (Streaming):
    • orig: Returns a stream of new articles, updates, or name changes across all supported projects
    • Proposed [Prabhat]: Returns a stream of new articles, updates, name changes, deletes, visibility changes across all supported projects. The type of event can be discerned by article.event.type.
      • Since the only event types are update, delete, and visibility-change, I think the description should also include the exact type strings. That helps explain that the update event covers new articles, updates to existing articles, and name changes.
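
For whoever drafts the description, a small sketch of how a client might branch on the event type. It assumes the type is reachable at `article["event"]["type"]` (inferred from "article.event.type" above); the meanings are paraphrased from this comment, and `describe_event` is a hypothetical name.

```python
# The three event type strings named in this thread and what each covers.
EVENT_MEANINGS = {
    "update": "new article, update to an existing article, or name change",
    "delete": "article deleted",
    "visibility-change": "visibility of the article changed",
}


def describe_event(article: dict) -> str:
    """Return a human-readable meaning for the article's event type."""
    event_type = article.get("event", {}).get("type")
    if event_type not in EVENT_MEANINGS:
        raise ValueError(f"unexpected event type: {event_type!r}")
    return EVENT_MEANINGS[event_type]


print(describe_event({"event": {"type": "update"}}))
```
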
/data-dictionary/ Description updates
  • Event: Shows six fields, but only three are in On-demand responses; the last three (date_published, partition, offset) are ONLY in Realtime. We should make that clear in the Data Dictionary. Any other caveats?
  • Visibility: The last part of the description mentions that visibility only occurs in the visibility-change event.type. Should it also mention that this field is specific to the Realtime API and no others?
POST Request Body examples have additional characters

Metadata, On-demand, Snapshot, and Realtime POST sections have characters in Request Body example JSON that we shouldn't be displaying. Remove "\" and "\n" and double check that what's in those JSON boxes can be copy/pasted by users/customers to use in postman (or other) request body.
Examples:

{
  "fields": "[\"name\",\"identifier\"]\n",
  "filters": "[{\"field\":\"identifier\",\"value\":\"wiki\"}]\n"
}
{
  "fields": "[\"name\",\"identifier\"]\n",
  "filters": "[{\"field\":\"in_language.identifier\",\"value\":\"en\"}]\n",
  "limit": 3
}
{
  "since": "2006-01-02T15:04:05Z",
  "fields": [
    "name",
    "identifier"
  ],
  "filters": "[{\"field\":\"in_language.identifier\",\"value\":\"en\"}]\n",
  "parts": [
    0,
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
    9
  ],
  "offsets": {
    "\u201c0\u201d": 3614782,
    "\u201c4\u201d": 3593806,
    "\u201c8\u201d": 3588693
  },
  "since_per_partition": {
    "\u201c1\u201d": "2023-06-05T12:00:00Z",
    "\u201c2\u201d": "2023-06-05T12:00:00Z"
  }
}
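
To show where the backslashes come from, here is a minimal sketch using only Python's standard `json` module. It does not settle which form the API expects (that is still open in this thread); it only shows that `\"` appears when a value is serialized to a string first and then embedded in the body. Stripping the curly quotes from the offsets keys is likewise an assumption about what the example intended.

```python
import json

filters = [{"field": "identifier", "value": "wiki"}]

# Nested form: the filter list is embedded directly in the request body.
nested_body = json.dumps({"filters": filters})

# String-encoded form: the list is serialized first and embedded as a string,
# so the outer encoder must escape every inner double quote as \".
encoded_filters = json.dumps(filters, separators=(",", ":"))
string_body = json.dumps({"filters": encoded_filters})

print(nested_body)
print(string_body)

# Decoding the string-encoded form takes two passes to get the list back.
round_trip = json.loads(json.loads(string_body)["filters"])

# "\u201c" and "\u201d" are typographic quotes; stripping them leaves plain keys.
offsets = {"\u201c0\u201d": 3614782}
plain_offsets = {key.strip("\u201c\u201d"): value for key, value in offsets.items()}
print(plain_offsets)
```
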
Add GET Request Examples

Our POST sections all have "request body" examples; we should add GET url params JSON examples too where available

DeDupe Path Descriptions & Operations Description or only use one

We use "operation descriptions" on /authentication/ docs but we use both "path description" and "operation description" on all other pages and endpoints. Unless there's a strong requirement to have both, we should nuke the "operation description" and just use "path description" on everything. If there is a strong requirement to use both, they should not be dupes of the path description and should be unique per operation.

Screenshot 2024-05-07 at 16.28.04.png (498×2 px, 448 KB)

Screenshot 2024-05-07 at 16.28.17.png (598×1 px, 199 KB)

creynolds added a subscriber: JArguello-WMF.

All done adding items for /docs/** updates. cc @prabhat @JArguello-WMF

Adding on another request:
API Response Codes & Descriptors

Reference: https://api.enterprise.wikimedia.com/spec/spec.yaml

WHAT TO DO:
As a user I should be able to have a reference guide for response codes encountered while using WME APIs. As a Dev this stuff is implicit knowledge but I feel it's good to explain, in human language, what these codes mean and why a user may have received them specific to WME APIs. More focus here on WHY and how to fix it. Like WHY would a user encounter a 422 in our API specifically (not a general description of it)… it's probably a misplaced comma or something but we should have a short content piece that will be helpful.

mega rough init draft:

200: ok
Your request was processed successfully, and the server returned the expected result. Everything is good.

401: unauthorized_error
The server couldn't verify your credentials. This usually means your user credentials are invalid or missing, or access_token is invalid or expired. Make sure you're sending the correct credentials with your request.

403: forbidden_error
You’ve been authenticated, but your permissions aren’t sufficient to complete this operation.

404: not_found_error
The server can't locate the resource you're requesting. This could be due to a typo in the endpoint path.

422: unprocessable_entity_error
The server understood your request, but there's a semantic issue with the data you've provided. This often happens if required fields are missing, or the data format doesn't match what the API expects. Review the request payload for any inconsistencies.

500: internal_server_error
The server encountered an unexpected condition that prevented it from fulfilling the request. Retry the request.
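
If it helps whoever writes the final copy, the draft can be kept as a small lookup table so the wording lives in one place. This is a sketch only: the codes and error names come from the draft above, the hints are shortened paraphrases, and `explain_status` is a hypothetical helper.

```python
# Status code -> (error name, short hint), condensed from the rough draft.
STATUS_HINTS = {
    200: ("ok", "Request processed successfully."),
    401: ("unauthorized_error",
          "Credentials or access_token are missing, invalid, or expired."),
    403: ("forbidden_error",
          "Authenticated, but permissions are insufficient for this operation."),
    404: ("not_found_error",
          "Resource not found; check the endpoint path for typos."),
    422: ("unprocessable_entity_error",
          "Required fields are missing or the data format is not what the API expects."),
    500: ("internal_server_error",
          "Unexpected server condition; retry the request."),
}


def explain_status(code: int) -> str:
    """Return 'code name: hint', or a fallback for codes not in the draft."""
    name, hint = STATUS_HINTS.get(code, ("unknown", "Not covered by the draft list."))
    return f"{code} {name}: {hint}"


print(explain_status(422))
```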

prabhat updated the task description.

Say "Project Identifier" and give examples such as enwiki, frwikinews, and eswikivoyage. Say that the list of project identifiers is available from the Project list API.

Snapshot identifier description and examples

In the Snapshot /snapshots/{identifier}/download docs we only show the description "Snapshot identifier", which is not very specific. The getting started example calls page says the following, which also needs to be updated to be more helpful:

Note the project identifier (e.g. “enwiki” for English Wikipedia) as that is the identifier you will use to identify that project in other calls and responses.

From Slack thread:

The identifier is a combination of project and namespace.
There are different things here:
project identifier : enwiki
namespace identifier : 0
snapshot identifier: enwiki_namespace_0
language code : en
project code : wiki

Idea is to update the Snapshot path params description to be more helpful in understanding how to find/formulate that.
Similarly in docs examples we should be more helpful too.
Furthermore, the Data Dictionary identifier description should also be updated with this in mind.

Partially done by Chuck already; the description is improved. Check that the 3 event types are described, along with how they differ.

/docs/realtime/ Intro and Descriptions and FAQs
  • Article Updates (Streaming):
    • orig: Returns a stream of new articles, updates, or name changes across all supported projects
    • Proposed [Prabhat]: Returns a stream of new articles, updates, name changes, deletes, visibility changes across all supported projects. The type of event can be discerned by article.event.type.
      • Since the only event types are update, delete, and visibility-change, I think the description should also include the exact type strings. That helps explain that the update event covers new articles, updates to existing articles, and name changes.

Remove the \n from the example JSON. Explain why we need the \" on the inner double quotes in JSON syntax.

POST Request Body examples have additional characters

Metadata, On-demand, Snapshot, and Realtime POST sections have characters in Request Body example JSON that we shouldn't be displaying. Remove "\" and "\n" and double check that what's in those JSON boxes can be copy/pasted by users/customers to use in postman (or other) request body.

Check the Go code for more error codes that we should list in the docs and a description

Adding on another request:
API Response Codes & Descriptors

Reference: https://api.enterprise.wikimedia.com/spec/spec.yaml

WHAT TO DO:
As a user I should be able to have a reference guide for response codes encountered while using WME APIs. As a Dev this stuff is implicit knowledge but I feel it's good to explain, in human language, what these codes mean and why a user may have received them specific to WME APIs. More focus here on WHY and how to fix it. Like WHY would a user encounter a 422 in our API specifically (not a general description of it)… it's probably a misplaced comma or something but we should have a short content piece that will be helpful.

ROdonnell-WMF changed the point value for this task from 3 to 5.
JArguello-WMF raised the priority of this task from Low to Medium. Nov 11 2024, 4:02 PM
Snapshot identifier description and examples

@creynolds I don't see the "Getting Started Page" in our Repos. Maybe it's in the CRM?

Can you update this paragraph from:

Next, try using the Snapshot API. Run this cURL command to download a compressed file containing every article in English Wikipedia (it’s large).

To:

Next, try using the Snapshot API. Run this cURL command to download a compressed file containing every article in English Wikipedia (it’s large).  Note: the "Snapshot identifier" looks like `<language><project_name>_namespace_<number>`, examples: `dewiki_namespace_14` downloads categories used in de.wikipedia.org, `enwiki_namespace_0` downloads articles used in en.wikipedia.org, `frwikivoyage_namespace_10` downloads wikitext templates used in fr.wikivoyage.org. Here are links to our current [languages](https://api.enterprise.wikimedia.com/v2/languages), [projects](https://api.enterprise.wikimedia.com/v2/codes) and [namespaces](https://api.enterprise.wikimedia.com/v2/namespaces).
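
Going the other way, a client could split a snapshot identifier back into its project identifier and namespace number. A minimal sketch, assuming identifiers always follow the `<language><project_name>_namespace_<number>` pattern described above; `split_snapshot_identifier` is a hypothetical name.

```python
def split_snapshot_identifier(snapshot_identifier: str) -> tuple[str, int]:
    """Split 'enwiki_namespace_0' into ('enwiki', 0)."""
    project_identifier, separator, namespace = snapshot_identifier.rpartition("_namespace_")
    if not separator or not project_identifier or not namespace.isdigit():
        raise ValueError(f"not a snapshot identifier: {snapshot_identifier!r}")
    return project_identifier, int(namespace)


print(split_snapshot_identifier("frwikivoyage_namespace_10"))
```

The language and project parts are deliberately left joined, since the thread does not define a reliable way to split the project identifier further.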

About your comment on the Data Dictionary:

Furthermore, the Data Dictionary identifier description should also be updated with this in mind.

The Data Dictionary describes the output JSON. It doesn't describe the filter parameters for the API. I think it would make the semantics more confusing if we did a long description of what "identifier" means in our data dictionary.

In the docs, we use identifier for 4+ different contexts: snapshot identifier, project identifier, article identifier, and revision identifier and those are only the cases where we're describing input filters. In the case of "Getting Started" the wording is correct.

New Password Required:

  • orig: Responds NEW_PASSWORD_REQUIRED challenge by receiving username and session token and setting new password.
  • proposed: NEED ENG INSIGHT HERE -- Is this still needed???
    • why would it respond with this? anything helpful here? Feel free to ping chuck in slack with draft when ready.
    • How do we explain that better in the description?

Description from @HShaikh "The new password requirement was I think a last minute addition so really needs a deeper look on how that works. If I remember correctly, it was due to mark from internet archive losing password and having us redo his password for him"

  • Forgot password, action needed? - this should be part of the same workflow and contextual

The AWS documentation shows it's an API call in Cognito to allow users to change their password: https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-identity-provider_example_cognito-identity-provider_RespondToAuthChallenge_section.html

I don't know the history of the code; Stephan wrote it 3 years ago.

This is my understanding: In AWS Cognito, RespondToAuthChallenge API is used as part of the custom authentication flow or multi-factor authentication (MFA) process.

Custom Authentication Flows: When using a custom authentication flow, Cognito can send challenges (e.g., "CUSTOM_CHALLENGE") to the client during login. With RespondToAuthChallenge, you send back the response to the challenge, such as a password, code, or other verification data, as defined in your custom authentication flow. It looks like Stephan repurposed the typical MFA flow to allow users to reset their password using a challenge code. In the MFA case, RespondToAuthChallenge allows you to pass the MFA code to Cognito to complete the authentication. Users can also update their passwords (e.g., after a password reset) using the "NEW_PASSWORD_REQUIRED" challenge: the user responds with the new password using RespondToAuthChallenge.

Basic Flow with RespondToAuthChallenge:

  1. User initiates sign-in using a method like InitiateAuth.
  2. Cognito sends a challenge (e.g., MFA code request or custom challenge).
  3. User responds to the challenge using RespondToAuthChallenge with the required data.

This API is essential for enabling more customized and secure authentication experiences in AWS Cognito.
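
The flow above can be sketched as follows. This is my reading of the Cognito API, not a description of what our backend does: `client` is assumed to be a Cognito identity provider client (for example one created with `boto3.client("cognito-idp")`), the USER_PASSWORD_AUTH flow is assumed to be enabled for the app client, and `sign_in_with_new_password` is a hypothetical helper.

```python
def sign_in_with_new_password(client, client_id, username, password, new_password):
    """Sign in; if Cognito demands a new password, answer that challenge."""
    # Step 1: the user initiates sign-in.
    response = client.initiate_auth(
        ClientId=client_id,
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": username, "PASSWORD": password},
    )
    # Steps 2 and 3: Cognito returns a challenge and the user responds to it.
    if response.get("ChallengeName") == "NEW_PASSWORD_REQUIRED":
        response = client.respond_to_auth_challenge(
            ClientId=client_id,
            ChallengeName="NEW_PASSWORD_REQUIRED",
            Session=response["Session"],
            ChallengeResponses={"USERNAME": username, "NEW_PASSWORD": new_password},
        )
    result = response.get("AuthenticationResult")
    if result is None:
        # Any other challenge (e.g. MFA) is out of scope for this sketch.
        raise RuntimeError(f"unhandled challenge: {response.get('ChallengeName')}")
    return result
```

Passing the client in as a parameter keeps the sketch testable with a stub and avoids assuming how our service constructs its Cognito client.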

My guess is that it's for corporate account logins, where someone leaves and the rest of the team don't have the old password and it's encrypted in some secrets manager. Then the team need a way to reset the password using a valid session Auth token.

We should not explain this in our documentation. I see it as a workaround for customers that have a niche issue.

Add GET Request Examples

Our POST sections all have "request body" examples; we should add GET url params JSON examples too where available

We should not have GET and POST for the same API operations; this was a poor API design. Given that the request payload can be more than a few hundred bytes, I'd strongly recommend that clients do NOT use GET requests to our APIs. Proxy servers can cut short long URLs for security reasons. I'd prefer to nudge our clients towards the POST endpoints, and not give examples of GET requests.
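
To put a rough number on the URL-length concern, here is a sketch comparing the same parameters as a POST body and as a GET query string. It assumes each GET parameter would carry its value as compact JSON, which is my guess at the format and not something confirmed here; the base URL is only used for string building and no request is sent.

```python
import json
from urllib.parse import urlencode

params = {
    "fields": ["name", "identifier"],
    "filters": [{"field": "in_language.identifier", "value": "en"}],
    "limit": 3,
}

# POST: the parameters travel as a compact JSON body.
body = json.dumps(params, separators=(",", ":"))

# GET: each value is JSON-encoded and then percent-encoded into the URL,
# so every quote, bracket, and brace grows from one character to three.
query = urlencode({key: json.dumps(value, separators=(",", ":"))
                   for key, value in params.items()})
url = f"https://api.enterprise.wikimedia.com/v2/codes?{query}"

print(len(body), "characters as a POST body")
print(len(query), "characters as a GET query string")
```

Even for this small example the query string is longer than the body, and the gap widens as filters are added.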

For the moment, I'm not accepting this suggestion as part of this ticket.

DeDupe Path Descriptions & Operations Description or only use one

We use "operation descriptions" on /authentication/ docs but we use both "path description" and "operation description" on all other pages and endpoints. Unless there's a strong requirement to have both we should nuke the "operation description" and just use "path description" on everything. If there is a strong requirement to use them; they should not be dupes of path desc and should be unique per operation.

Screenshot 2024-05-07 at 16.28.04.png (498×2 px, 448 KB)

Screenshot 2024-05-07 at 16.28.17.png (598×1 px, 199 KB)

This is an HTML/CSS code request, not an improvement that clients will see. The CSS style rules are brittle, and I'm not confident that renaming these two CSS class names won't have side effects on other HTML sections. I'd prefer to leave this as a low priority.

API Response Codes & Descriptors
mega rough init draft:

200: ok
Your request was processed successfully, and the server returned the expected result. Everything is good.
...

I added this wording to the /v2/codes response descriptions, which is the first endpoint in the documentation. Do you want to repeat this throughout the docs?

@ROdonnell-WMF as part of this task did you do any changes to the data dictionary spreadsheet?

Yes, and I mentioned it to Chuck in the eng channel

RE: "API Response Codes & Descriptors"
Is that 'bulky' wording for response codes something we want in every single area of the docs? I think the original intent was to have a single location with the long descriptions (maybe as part of the data dictionary) rather than adding all that text to every response section, because it takes up a lot of space. Thoughts? I don't know whether the YAML lets us add a text link from the response section to that location, but that's an option.

The API docs look better without the bulky text. The YAML uses a reference to re-use the same wording throughout the HTML. The best option is to put the Status Code descriptions in a new Web page and leave the Data Dictionary for JSON field descriptions. You know best how to optimise the Web page content.

Maybe when the Status Codes are on the website, I can add links in the Swagger YAML to these descriptions.

Can we close this ticket? Are all your suggestions/recommendations covered in my changes? There is a load and I've lost track of the updates!

Couple things I noticed still @ROdonnell-WMF

  • in /docs/realtime/ in /v2/articles 403 and 404 are in the wrong order
  • 204 is a response code in /docs/authentication/ /v1/ token-revoke, forgot-password, forgot-password-confirm, change-password endpoints but not included with descriptors.

To note: some of the 200s and 204s respond differently based on endpoint or condition, so we'd want to make sure the description makes sense for those.

Also, is the duplicated line at line 106 intentional? (Similar in other 200 response examples, like lines 275, 380, etc.)

example: >
        {"identifier":"string","name":"string","description":"string"}

        {"identifier":"string","name":"string","description":"string"}

dev site updated with latest commit; reverts/shortens Resp code desc's and keeps other changes.

  1. site openapi parser updated to run path & param descriptions through backtick checker.
  2. deduping descriptions is still an issue
  • parser vers updated to 1.1 for markdown backticks... merged into dev
  • duped descriptions fixed; on dev
  • double checking all issues to confirm completeness.

Issues still unresolved:

  1. /snapshots/ docs vs api resp fields discrepancy (comment #9500552)
  2. /docs/authentication/ login resp fields discrepancy (comment #9779250)
  3. /authentication/ endpoints description updates comment #9779257 is all done with the exception of v1/new-password-required. This was left open for ENG to make a call on 1. if we still need that endpoint in docs, 2. what the description should be. @ROdonnell-WMF w/ updated intel here in comment 10311882. cc @HShaikh again or @prabhat to make a call please.
  4. add GET request examples -- @ROdonnell-WMF updated comment here about not wanting GET requests in docs. That is definitely something I asked a LONG time ago with Stephan and (pretty sure) @prabhat. This one should be extracted out into a NEW Phab ticket for exploration. Agree? Confirm and Ruairi or I will make that happen. I don't want to let that convo die in this thread.

On me:

  • chuck/me still has to go through DD sheet to get that stuff out.
  • FYI ✅ "Snapshot identifier description and examples" in main /docs/ page updated/done.

@ROdonnell-WMF there are still some unresolved issues, I put it in progress on the current sprint. Can you take a look please?

Marking as done:

Issues still unresolved:

  1. /snapshots/ docs vs api resp fields discrepancy (comment #9500552)
  2. /docs/authentication/ login resp fields discrepancy (comment #9779250)
  3. /authentication/ @creynolds I think we should leave v1/new-password-required as it is. It's not causing confusion for clients and is still part of our API.
  4. add GET request examples @creynolds I think we should discuss it with Tech Leads as possible future work. It's a dramatic change to the docs and WME API philosophy.

RE: #4 "add GET request examples" yeah... need to make a stub ticket to get convo goin.
I'll test and get this all on dev soon. standby.

Okay DEV updated. Looks good to merge into main && PROD.