In https://phabricator.wikimedia.org/T130651#3416860, @Nirmos asked if we could have parsedcomment in the recentchange stream data. It has never been in RCFeed/RCStream, so I'm not familiar with parsedcomment. This task is to determine if we can / should add it to recentchange data, and possibly also other relevant event schemas (mediawiki/revision/create, etc.)
I'm not familiar with parsedcomment
comment is the pseudo-wikitext that comes from user input. parsedcomment is the HTML that this comment produces. For example, in https://sv.wikipedia.org/w/index.php?diff=40470304, the comment is
/* heading */ [[link]]
but the parsedcomment is
<a href=\"/wiki/Anv%C3%A4ndare:Nirmos/sandl%C3%A5da#heading\" title=\"Användare:Nirmos/sandlåda\">→</a><span dir=\"auto\"><span class=\"autocomment\">heading: </span> <a href=\"/wiki/Link\" title=\"Link\">link</a></span>
It's available in the RecentChanges API described on https://www.mediawiki.org/wiki/API:RecentChanges
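For reference, a minimal sketch of how a client might ask that API for both fields. The `rcprop` values `comment` and `parsedcomment` come from the API:RecentChanges page; the helper name `recentchanges_url` and the chosen limit are only illustrative, and the sketch builds the request URL without sending it.

```python
from urllib.parse import urlencode


def recentchanges_url(api_base, limit=5):
    """Build a list=recentchanges query that returns comment and parsedcomment."""
    params = {
        "action": "query",
        "list": "recentchanges",
        # Ask for the raw comment and its HTML rendering side by side.
        "rcprop": "title|ids|comment|parsedcomment",
        "rclimit": limit,
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


url = recentchanges_url("https://sv.wikipedia.org/w/api.php")
print(url)
```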
It would be super useful to have for tools that use EventStreams.
Apologies in advance if this is not what you meant.
We don't include any HTML-formatted content in our events, so I'm with you: I don't think we should include parsedcomment.
The fact that we don't do it right now doesn't mean we shouldn't consider doing it in the future. I can see how access to this information can be really useful for any UI built around the recentchanges stream, so why not?
I don't have a firm stand on this, but it seems like verbose information that is already present in the comment field. Should we then get rid of comment?
Should we then get rid of comment?
I don't think we should get rid of comment. Is there an easy way for someone to parse the comment text, via an API maybe?
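One possible route, sketched under assumptions: the action API's `action=parse` module accepts a `summary` parameter for parsing an edit summary. The helper name `parse_summary_url` is hypothetical, the sketch only builds the request URL, and the output may not match the stored parsedcomment exactly, since rendering of the `/* heading */` part may depend on page context.

```python
from urllib.parse import urlencode


def parse_summary_url(api_base, summary):
    """Build an action=parse request that asks the wiki to render an edit summary."""
    params = {
        "action": "parse",
        "summary": summary,
        # Empty prop: no page content is being parsed, only the summary.
        "prop": "",
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


summary_url = parse_summary_url(
    "https://sv.wikipedia.org/w/api.php", "/* heading */ [[link]]"
)
print(summary_url)
```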
We use the non-parsed comment for matching of some Wikidata change prop rules. This seems like the situation we have with extract and extract_html, so I don't see a lot of harm in giving both. The data size is small, so it won't create any technical problems.
Hm, not as trivial as I thought. In Event-Platform we rely on the core RecentChange behavior, using the default formatter and sending it to the EventBus. We could generically add parsedcomment to the core RecentChange, but I don't think that's a good idea: inside MediaWiki the comment could be easily parsed if needed. We could add some custom logic in the formatter, which I suppose is a better way to go.
Another question is whether we want to include it across the board or only in recent change?
Especially now that revision-create is available in EventStreams (we should do an announcement about this, maybe after we settle the revision-score stuff), I'd like to do things that encourage folks to use revision-create rather than recentchange, where possible. So, I'm for adding it to revision-create, and other streams too.
Change 364600 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/event-schemas@master] Recentchange: Add optional parsedcomment to the schema.
Change 364602 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/extensions/EventBus@master] Recentchange: Populate the parsedcomment field
Change 364600 merged by jenkins-bot:
[mediawiki/event-schemas@master] Recentchange: Add optional parsedcomment to the schema.
Change 364602 merged by jenkins-bot:
[mediawiki/extensions/EventBus@master] Recentchange: Populate the parsedcomment field
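Since the schema change adds parsedcomment as an optional field, consumers probably should not assume it is always present. A minimal sketch of how a tool might handle that, assuming an event has already been decoded from the stream into a dict; the function name `comment_html` and the sample events are illustrative only, and the fallback simply HTML-escapes the raw comment rather than parsing wikitext.

```python
import html
import json


def comment_html(event):
    """Return display-ready HTML for an event's edit comment."""
    parsed = event.get("parsedcomment")
    if parsed is not None:
        # Field is present: use the HTML produced by MediaWiki as-is.
        return parsed
    # Field is absent: escape the raw comment so it is safe to embed.
    return html.escape(event.get("comment", ""))


# Hypothetical events, shaped like decoded recentchange messages.
with_parsed = json.loads(
    '{"comment": "[[link]]", '
    '"parsedcomment": "<a href=\\"/wiki/Link\\" title=\\"Link\\">link</a>"}'
)
without_parsed = {"comment": "a <b> [[link]]"}

print(comment_html(with_parsed))
print(comment_html(without_parsed))
```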