
Malformed URL in infobox Structured Contents API
Closed, Resolved | Public | 5 Estimated Story Points | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Query the Structured Contents API for Josephine Baker:
curl -X 'POST' \
  'https://api.enterprise.wikimedia.com/v2/structured-contents/Josephine_Baker' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
  "fields": [
    "name",
    "identifier",
    "infobox"
  ],
  "filters": [
    {
      "field": "in_language.identifier",
      "value": "en"
    }
  ],
  "limit": 1
}'

What happens?:
The API returns the following response:

[
  {
    "identifier": 255083,
    "name": "Josephine Baker",
    "infobox": [
      {
        "name": "Infobox person\n",
        "type": "infobox",
        "has_parts": [
          {
            "name": "Josephine Baker",
            "type": "section",
            "has_parts": [
              {
                "type": "image",
                "value": "Baker in 1940",
                "images": [
                  {
                    "content_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Baker_Harcourt_1940_2.jpg/220px-Baker_Harcourt_1940_2.jpg",
                    "caption": "Baker in 1940",
                    "height": 220,
                    "width": 220
                  }
                ]
              },
              {
                "name": "Born",
                "type": "field",
                "value": "Freda Josephine McDonald June 3, 1906 St. Louis, Missouri, US",
                "links": [
                  {
                    "url": "St._Louis",
                    "text": "St. Louis"
                  },
                  {
                    "url": "Missouri",
                    "text": "Missouri"
                  }
                ]
              },
              {
                "name": "Died",
                "type": "field",
                "value": "April 12, 1975 (aged 68) Paris, France",
                "links": [
                  {
                    "url": "Paris",
                    "text": "Paris"
                  }
                ]
              },
              {
                "name": "Resting place",
                "type": "field",
                "value": "Monaco Cemetery",
                "links": [
                  {
                    "url": "Monaco_Cemetery",
                    "text": "Monaco Cemetery"
                  }
                ]
              },
              {
                "name": "Nationality",
                "type": "field",
                "value": "American (renounced) French (1937–1975)"
              },
              {
                "name": "Occupation(s)",
                "type": "field",
                "value": "Vedette, singer, dancer, actress, civil rights activist, French Resistance agent",
                "links": [
                  {
                    "url": "Vedette_(cabaret)",
                    "text": "Vedette"
                  },
                  {
                    "url": "Civil_rights_activist",
                    "text": "civil rights activist"
                  },
                  {
                    "url": "French_Resistance",
                    "text": "French Resistance"
                  }
                ]
              },
              {
                "name": "Years active",
                "type": "field",
                "value": "1921–1975"
              },
              {
                "name": "Spouses",
                "type": "list",
                "values": [
                  "Willie Wells (m. 1919; div. 1919)",
                  "William Baker (m. 1921; div. 1925)",
                  "Jean Lion (m. 1937; div. 1940)",
                  "Jo Bouillon (m. 1947; div. 1961)"
                ],
                "links": [
                  {
                    "url": "Jo_Bouillon",
                    "text": "Jo Bouillon"
                  }
                ]
              },
              {
                "name": "Partner(s)",
                "type": "field",
                "value": "Robert Brady (1973–1975)"
              },
              {
                "name": "Children",
                "type": "field",
                "value": "12; Jean-Claude Baker presented himself as her foster son (contested by the Baker children)",
                "links": [
                  {
                    "url": "Jean-Claude_Baker",
                    "text": "Jean-Claude Baker"
                  },
                  {
                    "url": "Josephine_Baker#cite_note-1"
                  },
                  {
                    "url": "Josephine_Baker#cite_note-2"
                  }
                ]
              },
              {
                "type": "field",
                "value": "Musical career"
              },
              {
                "name": "Genres",
                "type": "field",
                "value": "Cabaret music hall French pop French jazz",
                "links": [
                  {
                    "url": "Cabaret",
                    "text": "Cabaret"
                  },
                  {
                    "url": "Music_hall",
                    "text": "music hall"
                  },
                  {
                    "url": "French_pop_music",
                    "text": "French pop"
                  },
                  {
                    "url": "French_jazz",
                    "text": "French jazz"
                  }
                ]
              },
              {
                "name": "Instrument(s)",
                "type": "field",
                "value": "Vocals"
              },
              {
                "name": "Labels",
                "type": "field",
                "value": "Columbia Mercury RCA Victor",
                "links": [
                  {
                    "url": "Columbia_Records",
                    "text": "Columbia"
                  },
                  {
                    "url": "Mercury_Records",
                    "text": "Mercury"
                  },
                  {
                    "url": "RCA_Records",
                    "text": "RCA Victor"
                  }
                ]
              }
            ]
          },
          {
            "name": "Signature",
            "type": "section",
            "has_parts": [
              {
                "type": "image",
                "images": [
                  {
                    "content_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Signature_de_Jos%C3%A9phine_Baker_-_Archives_nationales_%28France%29.png/150px-Signature_de_Jos%C3%A9phine_Baker_-_Archives_nationales_%28France%29.png",
                    "height": 150,
                    "width": 150
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]

What should have happened instead?:
The URLs inside the infobox links are not properly constructed: each one is a relative page title rather than an absolute URL.
For example, the link whose text property is St. Louis has the url property St._Louis; it should be https://en.wikipedia.org/wiki/St._Louis.
Here is the part of the payload that highlights the issue:

[
  {
    "identifier": 255083,
    "name": "Josephine Baker",
    "infobox": [
      {
        "name": "Infobox person\n",
        "type": "infobox",
        "has_parts": [
          {
            "name": "Josephine Baker",
            "type": "section",
            "has_parts": [
              ...
              {
                "name": "Born",
                "type": "field",
                "value": "Freda Josephine McDonald June 3, 1906 St. Louis, Missouri, US",
                "links": [
                  {
                    "url": "St._Louis",
                    "text": "St. Louis"
                  },
                  {
                    "url": "Missouri",
                    "text": "Missouri"
                  }
                ]
              },
              ...
            ]
          }
        ]
      }
    ]
  }
]
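
Until the API returns absolute URLs, a consumer could work around the issue on the client side. The following is a minimal Python sketch, not the fix applied in the API itself. It assumes that relative link values are English Wikipedia page titles that resolve against https://en.wikipedia.org/wiki/, as the expected value above suggests. The names WIKI_BASE, absolutize, and fix_links are hypothetical and are not part of the Structured Contents API.

```python
# Assumption: relative link values are page titles on English Wikipedia.
WIKI_BASE = "https://en.wikipedia.org/wiki/"


def absolutize(url, base=WIKI_BASE):
    """Return url unchanged if it is already absolute, else prefix it with base."""
    if url.startswith(("http://", "https://")):
        return url
    return base + url


def fix_links(part, base=WIKI_BASE):
    """Recursively rewrite every links[].url under an infobox part, in place."""
    for link in part.get("links", []):
        if "url" in link:
            link["url"] = absolutize(link["url"], base)
    for child in part.get("has_parts", []):
        fix_links(child, base)
    return part


# The "Born" field from the payload above, nested as in the response.
born = {
    "name": "Born",
    "type": "field",
    "links": [
        {"url": "St._Louis", "text": "St. Louis"},
        {"url": "Missouri", "text": "Missouri"},
    ],
}
infobox = {
    "type": "infobox",
    "has_parts": [{"type": "section", "has_parts": [born]}],
}

fix_links(infobox)
print(born["links"][0]["url"])  # https://en.wikipedia.org/wiki/St._Louis
```

Plain string prefixing is used here instead of a general URL resolver so that titles containing a colon are not mistaken for a URL scheme. Other language editions would need a different base.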

Event Timeline

SDelbecque-WMF renamed this task from Malformed URL ins Structured Contents API to Malformed URL in infobox Structured Contents API. Nov 22 2023, 2:59 PM
ROdonnell-WMF subscribed.

I'll pick this up. I think I know where the issue is, and it should be a fast fix.

Not working in dev; I will look at the CI/CD pipeline to see why.

ROdonnell-WMF triaged this task as Medium priority.
JArguello-WMF changed the task status from In Progress to Open. Feb 26 2024, 2:34 PM