
Revisit HTML included in pastes from popular LLMs
Open, Needs Triage, Public

Description

In T379908, we learned that popular LLMs (e.g. Gemini, Claude, and ChatGPT) include unique HTML elements along with the text people paste from these services.

Through this investigation, we also concluded (T379908#10419299) that the HTML these services generate is likely to evolve over time and is thus not reliable/stable enough to be used as a signal to configure Paste Check with.

This ticket covers revisiting the HTML these popular LLMs include in text people paste from them, in order to:

  • Learn if and how the HTML has changed since we first investigated this in November 2024
  • Decide whether we think the HTML is stable enough to be used as a signal for configuring Paste Check
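
One possible way to approach the first question is to compare the markup of a clipboard snapshot saved during the earlier investigation with a fresh one. The sketch below is only an illustration under stated assumptions: the two snapshot strings and their `data-x-old` / `data-x-new` attributes are made up and are not real output from any service, and the helper names (`MarkupFingerprint`, `fingerprint`, `diff_snapshots`) are hypothetical. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class MarkupFingerprint(HTMLParser):
    """Collect every (tag, attribute-name) pair seen in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.pairs = set()

    def handle_starttag(self, tag, attrs):
        # Record the bare tag, then one entry per attribute name.
        self.pairs.add((tag, None))
        for name, _value in attrs:
            self.pairs.add((tag, name))


def fingerprint(html):
    """Return the set of (tag, attribute-name) pairs in the fragment."""
    parser = MarkupFingerprint()
    parser.feed(html)
    parser.close()
    return parser.pairs


def diff_snapshots(old_html, new_html):
    """Return (pairs that disappeared, pairs that appeared)."""
    old, new = fingerprint(old_html), fingerprint(new_html)
    return old - new, new - old


# Hypothetical snapshots for illustration only; not real LLM output.
OLD_SNAPSHOT = '<p data-x-old="1">Hello world</p>'
NEW_SNAPSHOT = '<p data-x-new="1">Hello world</p>'

removed, added = diff_snapshots(OLD_SNAPSHOT, NEW_SNAPSHOT)
```

A non-empty `removed` or `added` set for the same service would suggest that its markup has drifted between the two snapshots, which speaks to the stability question in the second bullet.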

Event Timeline

One ChatGPT artifact that started to emerge: :contentReference[oaicite:1]{index=1} or similar (example). Confirmed by... ChatGPT itself.
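
For illustration, an artifact like this could be matched with a regular expression. This is a minimal sketch: the pattern is generalized from the single example quoted above on the assumption that only the two numbers vary, so the "or similar" variants may not match it. `OAICITE_RE` and `find_oaicite_artifacts` are hypothetical names, not part of Paste Check.

```python
import re

# Assumed shape, generalized from one observed example:
#   :contentReference[oaicite:<number>]{index=<number>}
OAICITE_RE = re.compile(r":contentReference\[oaicite:\d+\]\{index=\d+\}")


def find_oaicite_artifacts(text):
    """Return every substring of `text` that matches the assumed artifact shape."""
    return OAICITE_RE.findall(text)


sample = "The sky is blue. :contentReference[oaicite:1]{index=1} More text."
matches = find_oaicite_artifacts(sample)
```

Because this artifact shows up in the pasted text itself rather than in the clipboard HTML, a check like this would apply to the wikitext after pasting.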

You need to use the "Share" button in ChatGPT to create a shareable link. The regular URL is private.

Edited the original post.

I realize the task might not have been understood correctly. The title is "Revisit HTML...", that is, the "code" stored in the clipboard prior to copy-pasting it into a wiki (like T376306#10197664 or T379908#10322254).

The linked resources, on the other hand, deal with how the text reads after it has been pasted (e.g., wording).

Editors of the English Wikipedia have been collecting empirical signs of machine-generated content, including some markup quirks.