In https://phabricator.wikimedia.org/T130651#3416860, @Nirmos asked if we could have parsedcomment in the recentchange stream data. It has never been in RCFeed/RCStream, so I'm not familiar with parsedcomment. This task is to determine whether we can and should add it to the recentchange data, and possibly also to other relevant event schemas (mediawiki/revision/create, etc.).
|mediawiki/extensions/EventBus : master||Recentchange: Populate the parsedcomment field|
|mediawiki/event-schemas : master||Recentchange: Add optional parsedcomment to the schema.|
I'm not familiar with parsedcomment
comment is the pseudo-wikitext that comes from user input. parsedcomment is the HTML that this comment produces. For example, in https://sv.wikipedia.org/w/index.php?diff=40470304, the comment is
/* heading */ [[link]]
but the parsedcomment is
<a href=\"/wiki/Anv%C3%A4ndare:Nirmos/sandl%C3%A5da#heading\" title=\"Användare:Nirmos/sandlåda\">→</a><span dir=\"auto\"><span class=\"autocomment\">heading: </span> <a href=\"/wiki/Link\" title=\"Link\">link</a></span>
It's available in the RecentChanges API described on https://www.mediawiki.org/wiki/API:RecentChanges
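For comparison, here is a minimal sketch of how a tool could request both fields from that API today. The helper name `recentchanges_url` and the choice of `rcprop` values are just for illustration; the only parts taken from the API documentation are `list=recentchanges` and the `comment` / `parsedcomment` property names.

```python
from urllib.parse import urlencode


def recentchanges_url(api_base: str, limit: int = 5) -> str:
    """Build a RecentChanges API query that asks for both comment forms.

    `api_base` is the wiki's api.php endpoint (an assumption for this
    sketch, e.g. "https://sv.wikipedia.org/w/api.php").
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        # comment = raw pseudo-wikitext, parsedcomment = rendered HTML
        "rcprop": "title|ids|comment|parsedcomment",
        "rclimit": limit,
        "format": "json",
    }
    # urlencode percent-encodes the "|" separators as %7C
    return f"{api_base}?{urlencode(params)}"
```

Fetching the resulting URL with any HTTP client should return entries that carry both `comment` and `parsedcomment`, which is the pairing being requested for the stream.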
It would be super useful to have for tools that use EventStreams.
Apologies in advance if this is not what you meant.
The fact that we don't do it right now doesn't mean we shouldn't consider doing it in the future. I can see how access to this information can be really useful for any UI built around the recentchanges stream, so why not?
We use the non-parsed comment for matching some Wikidata change prop rules. This seems like the situation we have with extract and extract_html, so I don't see a lot of harm in giving both. The data size is small, so it won't create any technical problems.
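If both fields are emitted and parsedcomment is optional in the schema, a consumer might handle events roughly like this. It is only a sketch: the function name `comment_html` is made up, and the fallback (HTML-escaping the raw comment) is an assumption that does not reproduce the links or autocomment markup the parser would generate.

```python
import html
import json


def comment_html(event_json: str) -> str:
    """Return display HTML for the edit summary of one recentchange event."""
    event = json.loads(event_json)
    parsed = event.get("parsedcomment")
    if parsed is not None:
        # The producer already rendered the comment to HTML.
        return parsed
    # Older events (or producers without the field): show the raw
    # pseudo-wikitext safely escaped instead of rendered.
    return html.escape(event.get("comment", ""))
```

Rule matching of the kind mentioned above would keep reading `comment` directly; only display code needs the HTML form.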
Hm, not as trivial as I thought. In Event-Platform we rely on the core RecentChange behavior, using the default formatter and sending it to the EventBus. We could generically add parsedcomment to the core RecentChange, but I don't think that's a good idea - inside MediaWiki the comment could be easily parsed if needed. We could add some custom logic in the formatter, which I suppose is a better way to go.
Another question is whether we want to include it across the board or only in recentchange.
Especially now that revision-create is available in EventStreams (we should do an announcement about this, maybe after we settle the revision-score stuff), I'd like to do things that encourage folks to use revision-create rather than recentchange, where possible. So, I'm for adding it to revision-create, and other streams too.