
SPIKE Image Browsing: Determine what is possible for including text content alongside images
Closed, Resolved | Public | 3 Estimated Story Points | Spike

Description

Background

Some of the potential designs for the Image Browsing feature include snippets of the surrounding text from wherever an article image was embedded on a given page.

We need to investigate how feasible it would be to build something like this. Wikipedia articles are notoriously unstructured, and are essentially "tag soup" that is meant to be understood by a human reader. But maybe there are some heuristics we can rely on to get reasonably-useful information most of the time?

We could consider both back-end and front-end approaches – a back-end approach might involve doing something with image metadata or parser output; a front-end approach might rely on jQuery or DOM elements to grab text near where an image was placed, in the hope that it is relevant.

This investigation should be time-boxed to a single sprint, and we should be mindful about practical returns on time investment of any proposed approach here.

If there is no feasible solution (at least not one that we can fit into the scope of our MVP prototype) then we should drop this aspect of the design for now.

Requirements

  • Investigate ways to get relevant and useful text content for images.
  • Consider frontend and backend approaches.
  • Document findings and file a follow-up task if needed.

Event Timeline

egardner triaged this task as Medium priority. Jul 8 2025, 5:44 PM
ovasileva renamed this task from "Image Browsing: Determine what is possible for including text content alongside images" to "SPIKE Image Browsing: Determine what is possible for including text content alongside images". Jul 22 2025, 4:37 PM
ovasileva set the point value for this task to 3. Jul 22 2025, 4:43 PM
ovasileva moved this task from Incoming/Inbox to Ready on the Reader Growth Team board.

If we follow the HTML extraction method for T398992, it should be straightforward to get _the caption if present_ and a bit less straightforward to get _an adjacent paragraph_. Captions should be extremely useful in context and easy to display; however, adjacent paragraphs could be quite long and might not contain the relevant context.

Another thing to consider: if extracting textual context is too hard, should we plan to link to the context instead (perhaps a floating gallery sidebar that scrolls you to the positions in the article)? This may have design impacts.

lwatson changed the task status from Open to In Progress. Jul 25 2025, 4:46 PM
lwatson updated the task description.

Notes

Summary of the different sources of image-related text on wikis:

Source | Notes
Image alt text | Read the image's alt attribute. Alt text is often missing and, when provided, may not be useful.
Infobox | Find the infobox <table> element (class infobox). The image and its caption (div.infobox-caption) sit inside the <td> element with class infobox-image. The <img> tag may have the CSS class mw-file-element. Only 1 infobox image per article. May include alt text and a caption. A caption is most likely provided; alt text is provided inconsistently due to old practices.
Figcaption | Find the <figcaption> element. It is a child of the <figure> element that contains the image and caption.
Gallery | Gallery image captions are optional. Find the <div> elements with classes thumb (the image) and gallerytext (the thumb caption). They are nested inside the parent gallery <ul> element (ul.gallery).
Thumb image | Find the <div> elements with these CSS classes: thumbimage and thumbcaption.
Adjacent paragraph | Find the adjacent paragraph(s) in the same section. Experimental and may produce unfavorable results. Requires additional logic to determine relevancy.
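
To make the caption sources above concrete, here is a minimal sketch that pulls caption text out of parsed article HTML. It uses Python's standard html.parser rather than the DOM API, and the names CaptionExtractor, CAPTION_SOURCES and the sample markup are hypothetical. It assumes well-formed HTML (every non-void tag is closed), which may not hold for every page.

```python
from html.parser import HTMLParser

# Caption containers from the table above, as (tag, CSS class) pairs.
# A class of None means the tag alone identifies the caption.
CAPTION_SOURCES = {
    ("figcaption", None): "figcaption",
    ("div", "infobox-caption"): "infobox",
    ("div", "gallerytext"): "gallery",
    ("div", "thumbcaption"): "thumb",
}
# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CaptionExtractor(HTMLParser):
    """Collects (source, caption text) pairs. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        self.captions = []
        self._depth = 0             # current element nesting depth
        self._capture_depth = None  # depth of the caption being captured
        self._source = None
        self._chunks = []

    def _match(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for (wanted_tag, wanted_class), source in CAPTION_SOURCES.items():
            if tag == wanted_tag and (wanted_class is None or wanted_class in classes):
                return source
        return None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self._depth += 1
        if self._capture_depth is None:
            source = self._match(tag, attrs)
            if source:
                self._capture_depth = self._depth
                self._source = source
                self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._capture_depth is not None and self._depth == self._capture_depth:
            # The caption element itself is closing: normalize whitespace and store.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.captions.append((self._source, text))
            self._capture_depth = None
        self._depth -= 1

    def handle_data(self, data):
        if self._capture_depth is not None:
            self._chunks.append(data)


def extract_captions(html):
    parser = CaptionExtractor()
    parser.feed(html)
    parser.close()
    return parser.captions


sample = """
<table class="infobox"><tr><td class="infobox-image">
  <img class="mw-file-element" src="a.jpg" alt="">
  <div class="infobox-caption">Skyline at <b>dusk</b></div>
</td></tr></table>
<figure><img src="b.jpg"><figcaption>A red panda</figcaption></figure>
<ul class="gallery"><li><div class="thumb"><img src="c.jpg"></div>
  <div class="gallerytext">Gallery item</div></li></ul>
"""
print(extract_captions(sample))
```

The sketch only reports captions; pairing each caption with its image would need the image's src tracked in the same container.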

Ways to get the data:

  • DOM API: Use the HTML DOM API to get all images in an article (with some exclusions, such as flags and icons). To get the images, use querySelectorAll or the document.images property, which returns an HTMLCollection of all the images in the current HTML document. DOM parsing can be done client-side (JS/Vue) or server-side (Node.js).
  • MediaWiki API: Use the MediaWiki Action API to fetch article content, images, and metadata, either as wikitext or as parsed HTML.
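
As a rough server-side illustration of both routes, the sketch below builds an Action API request URL for a page's parsed HTML (action=parse with prop=text) and then lists the images in an HTML string while skipping small ones. It uses Python's html.parser as a stand-in for the DOM API. The endpoint, the MIN_WIDTH threshold and the helper names are assumptions, and a width cutoff is only one possible way to exclude flags and icons.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"  # assumed target wiki
MIN_WIDTH = 50  # hypothetical cutoff (px) for treating an image as a flag or icon


def parse_url(title):
    """Build an Action API URL that returns the parsed HTML of a page."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


class ImageCollector(HTMLParser):
    """Collects src and alt for every <img>, skipping narrow images."""

    def __init__(self, min_width=MIN_WIDTH):
        super().__init__()
        self.min_width = min_width
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        width = attr.get("width") or ""
        if width.isdigit() and int(width) < self.min_width:
            return  # probably a flag or icon
        self.images.append({"src": attr.get("src"), "alt": attr.get("alt") or ""})


def collect_images(html, min_width=MIN_WIDTH):
    collector = ImageCollector(min_width)
    collector.feed(html)
    collector.close()
    return collector.images


html = (
    '<img src="flag.png" width="20" alt="">'
    '<img src="panda.jpg" width="250" alt="A red panda in a tree">'
    '<img src="nowidth.jpg">'
)
print(parse_url("Red panda"))
print(collect_images(html))
```

Images without a width attribute are kept, since there is no basis for excluding them in this sketch.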

Related projects:

Approaches to use adjacent paragraphs

Note that this feature is experimental and may produce unfavorable results. We should test the accuracy of the relevant text content and iterate as needed.

Option 1: First section paragraph
  • Use the first paragraph of the section where the image is located. It may or may not be relevant to the image. Bonus: add a link to that section's header, which brings the reader to the article section to learn more.
  • Low cost
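
A minimal sketch of Option 1, assuming sections are delimited by h2 to h6 headings and that the heading's id attribute can serve as the link anchor (both are assumptions about the parser output; older markup may place the id elsewhere or put edit links inside the heading). SectionIndexer and first_paragraph_by_image are hypothetical names.

```python
from html.parser import HTMLParser

HEADINGS = {"h2", "h3", "h4", "h5", "h6"}


def _new_section(anchor=None):
    return {"heading": "", "anchor": anchor, "first_paragraph": "", "images": []}


class SectionIndexer(HTMLParser):
    """Groups images with the first non-empty paragraph of their section."""

    def __init__(self):
        super().__init__()
        self.sections = [_new_section()]  # index 0 is the lead section
        self._target = None               # field currently being filled, if any
        self._chunks = []

    def _begin(self, target):
        self._target = target
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.sections.append(_new_section(dict(attrs).get("id")))
            self._begin("heading")
        elif tag == "p" and self._target is None and not self.sections[-1]["first_paragraph"]:
            self._begin("first_paragraph")
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sections[-1]["images"].append(src)

    def handle_endtag(self, tag):
        closes_heading = tag in HEADINGS and self._target == "heading"
        closes_paragraph = tag == "p" and self._target == "first_paragraph"
        if closes_heading or closes_paragraph:
            self.sections[-1][self._target] = " ".join("".join(self._chunks).split())
            self._target = None

    def handle_data(self, data):
        if self._target:
            self._chunks.append(data)


def first_paragraph_by_image(html):
    """Map each image src to (heading, anchor, first paragraph) of its section."""
    indexer = SectionIndexer()
    indexer.feed(html)
    indexer.close()
    return {
        src: (s["heading"], s["anchor"], s["first_paragraph"])
        for s in indexer.sections
        for src in s["images"]
    }


article = """
<p>The red panda is a small mammal.</p>
<h2 id="Habitat">Habitat</h2>
<figure><img src="forest.jpg"><figcaption>Bamboo forest</figcaption></figure>
<p>Red pandas live in <a href="#">temperate forests</a>.</p>
<p>Second paragraph.</p>
"""
print(first_paragraph_by_image(article))
```

Because the image often appears before the paragraph in the markup, the mapping is built only after the whole document has been indexed.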
Option 2: Keyword matching
  • Verify that the paragraph mentions or describes the image by cross-referencing with the image metadata.
  • Requires logic that compares the image metadata to the adjacent paragraphs.
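
One simple way to sketch Option 2 is a keyword-overlap score: the share of metadata keywords (for example from the file name, caption or alt text) that also appear in a paragraph. The stopword list, the 0.3 threshold and the function names are all hypothetical, and there is no stemming, so "panda" and "pandas" count as different words.

```python
import re

# Hypothetical stopword list; includes file-name noise such as extensions.
STOPWORDS = frozenset({
    "a", "an", "and", "at", "by", "file", "for", "in", "jpg", "jpeg",
    "of", "on", "png", "svg", "the", "to", "with",
})


def keywords(text):
    """Lowercase alphanumeric tokens longer than two characters, minus stopwords."""
    return {
        word
        for word in re.findall(r"[a-z0-9]+", text.lower())
        if len(word) > 2 and word not in STOPWORDS
    }


def overlap_score(metadata, paragraph):
    """Fraction of metadata keywords that also occur in the paragraph."""
    meta = keywords(metadata)
    if not meta:
        return 0.0
    return len(meta & keywords(paragraph)) / len(meta)


def best_paragraph(metadata, paragraphs, threshold=0.3):
    """Return the best-scoring paragraph, or None if nothing reaches the threshold."""
    scored = [(overlap_score(metadata, p), p) for p in paragraphs]
    score, paragraph = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return paragraph if score >= threshold else None


metadata = "File:Red panda in bamboo forest.jpg"
paragraphs = [
    "The species was first described in 1825.",
    "Red pandas feed mainly on bamboo in forest habitats.",
]
print(best_paragraph(metadata, paragraphs))
```

Returning None below the threshold lets the caller fall back to the caption alone rather than show an unrelated paragraph.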
Option 3: Image analysis
  • Use AI-based tools to interpret the image, its metadata, and the adjacent paragraph text in order to determine the relevant text content.
  • One advantage is that AI tools can interpret the meaning of words and phrases. As per https://www.ibm.com/think/topics/natural-language-processing: “Instead of relying solely on keyword matching, NLP-powered search engines analyze the meaning of words and phrases, making it easier to find information even when queries are vague or complex.”

Future considerations

  • Brooke: If extracting textual context is too hard, we should plan to link to the context instead (perhaps a floating gallery sidebar that scrolls you to the positions in the article). This may have design impacts.
  • Ensure that the adjacent text is from the same section to increase relevance.
  • Further investigate AI-based tooling for images, including which tools can interpret images.
  • Find out more from external teams that have used AI with images, for example, Future Audiences.

Open questions

  • Is there a minimum or maximum length for the text content?
  • [add more]

Follow-up

  • SPIKE task that investigates AI-based tools for interpreting images.
  • Determine an approach.

ovasileva subscribed.

Discussed in planning: T401159 ([SPIKE] Image Browsing: Investigate a keyword-based approach) will be used to create a PoC for Approach 1 outlined above. We will return to the list of approaches here if that doesn't work. Resolving this in the meantime.