
Add parsedcomment to recentchange stream
Closed, Resolved (Public)

Description

In https://phabricator.wikimedia.org/T130651#3416860, @Nirmos asked if we could have parsedcomment in the recentchange stream data. It has never been in RCFeed/RCStream, so I'm not familiar with parsedcomment. This task is to determine if we can / should add it to recentchange data, and possibly also other relevant event schemas (mediawiki/revision/create, etc.)

Event Timeline

Nuria moved this task from Incoming to Dashiki on the Analytics board.

I'm not familiar with parsedcomment

comment is the pseudo-wikitext that comes from user input. parsedcomment is the HTML that this comment produces. For example, in https://sv.wikipedia.org/w/index.php?diff=40470304, the comment is

/* heading */ [[link]]

but the parsedcomment is

<a href=\"/wiki/Anv%C3%A4ndare:Nirmos/sandl%C3%A5da#heading\" title=\"Användare:Nirmos/sandlåda\">→</a>‎<span dir=\"auto\"><span class=\"autocomment\">heading: </span> <a href=\"/wiki/Link\" title=\"Link\">link</a></span>

It's available in the RecentChanges API described on https://www.mediawiki.org/wiki/API:RecentChanges
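For reference, a minimal sketch of how a tool could ask the RecentChanges API for both fields today. The `rcprop=comment|parsedcomment` values come from the API:RecentChanges page linked above; the helper names and the `api.php` endpoint passed in are assumptions for illustration only.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def recentchanges_url(api_base: str, limit: int = 5) -> str:
    """Build a RecentChanges query that returns comment and parsedcomment."""
    params = {
        "action": "query",
        "list": "recentchanges",
        # Ask for the raw comment and its HTML rendering side by side.
        "rcprop": "title|ids|comment|parsedcomment",
        "rclimit": limit,
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


def comments_from_response(response: dict) -> List[Tuple[str, str]]:
    """Return (comment, parsedcomment) pairs from a decoded JSON response."""
    changes = response.get("query", {}).get("recentchanges", [])
    return [(rc.get("comment", ""), rc.get("parsedcomment", "")) for rc in changes]
```

Calling `recentchanges_url("https://sv.wikipedia.org/w/api.php")` (endpoint assumed) gives a URL that can be fetched with any HTTP client; the decoded JSON can then be passed to `comments_from_response`.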

It would be super useful to have for tools that use EventStreams.

Apologies in advance if this is not what you meant.

@mobrovac, @Pchelolo, any thoughts on this? We don't include parsed wikitext anywhere else in events (AFAIK), so I'm not so sure we should include it here.

We don't include any HTML-formatted content in our events, so I'm with you: I don't think we should include parsedcomment.

We don't include any HTML-formatted content in our events, so I'm with you: I don't think we should include parsedcomment.

The fact that we don't do it right now doesn't mean we shouldn't consider doing it in the future. I can see how access to this information can be really useful for any UI built around the recentchanges stream, so why not?

I don't have a firm stand on this, but it seems like verbose information that is already present in the comment field. Should we then get rid of comment?

Should we then get rid of comment?

Don't think we should get rid of comment. Is there an easy way for someone to parse the comment text, via an API maybe?

Hm, I guess they could just get it from RecentChanges API, but mehhhhhh :/
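One other possibility, sketched here under assumptions: the `action=parse` API module accepts a `summary` parameter for parsing an edit summary. The helper names are hypothetical, and the response shape handled below (`parse.parsedsummary`, either a plain string or an object keyed by `*`) should be checked against the API:Parse documentation. Without page context the result may also differ from what `parsedcomment` contains, for example for `/* heading */` section links.

```python
from typing import Optional
from urllib.parse import urlencode


def parse_summary_url(api_base: str, comment: str) -> str:
    """Build an action=parse request that parses only an edit summary."""
    params = {
        "action": "parse",
        "summary": comment,
        "prop": "",  # empty prop: we only want the parsed summary back
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


def parsed_summary(response: dict) -> Optional[str]:
    """Extract the parsed summary HTML; the response shape is an assumption."""
    value = response.get("parse", {}).get("parsedsummary")
    if isinstance(value, dict):
        # Older JSON format wraps the text in an object under "*".
        return value.get("*")
    return value
```

This costs one extra request per event, which is part of why having the field in the stream itself would be more convenient for consumers.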

I don't have a firm stand on this, but it seems like verbose information that is already present in the comment field. Should we then get rid of comment?

We use the non-parsed comment for matching of some Wikidata change prop rules. This seems like the situation we have with extract and extract_html, so I don't see a lot of harm in giving both. The data size is small, so it won't create any technical problems.

I'm not opposed. Feels a little redundant to me, but I also don't have a firm stand.

So, in my understanding, @Ottomata is -0.5, @mobrovac is also -0.5, I'm +0.5 and I guess @Nirmos who requested this is +1.0. Overall we have +0.5 for including. Did I summarize correctly?

Hahahaha, yeah I think so. Can we easily get this from MW when the event is emitted?

It seems so from the code. Lemme try it in Vagrant.

Hm, not as trivial as I thought. In Event-Platform we rely on the core RecentChange behavior, using the default formatter and sending it to the EventBus. We could generically add parsedcomment to the core RecentChange, but I don't think that's a good idea - inside MediaWiki the comment could be easily parsed if needed. We could add some custom logic in the formatter, which I suppose is a better way to go.

Another question is whether we want to include it across the board or only in recentchange?

Especially now that revision-create is available in EventStreams (we should do an announcement about this, maybe after we settle the revision-score stuff), I'd like to do things that encourage folks to use revision-create rather than recentchange, where possible. So, I'm for adding it to revision-create, and other streams too.

Change 364600 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/event-schemas@master] Recentchange: Add optional parsedcomment to the schema.

https://gerrit.wikimedia.org/r/364600

Change 364602 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/extensions/EventBus@master] Recentchange: Populate the parsedcomment field

https://gerrit.wikimedia.org/r/364602

Change 364600 merged by jenkins-bot:
[mediawiki/event-schemas@master] Recentchange: Add optional parsedcomment to the schema.

https://gerrit.wikimedia.org/r/364600

Change 364602 merged by jenkins-bot:
[mediawiki/extensions/EventBus@master] Recentchange: Populate the parsedcomment field

https://gerrit.wikimedia.org/r/364602
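Since the schema change adds parsedcomment as an optional field, consumers should probably not assume it is present on every event. A minimal consumer-side sketch, assuming events arrive as server-sent-event `data:` lines carrying JSON; the function names are hypothetical, and the escaped-text fallback is only a safe-to-display approximation, not a reimplementation of MediaWiki's comment parsing.

```python
import html
import json
from typing import Optional


def event_from_sse_line(line: str) -> Optional[dict]:
    """Decode one SSE 'data:' line into an event dict; ignore other lines."""
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def comment_html(event: dict) -> str:
    """Prefer parsedcomment; otherwise fall back to the HTML-escaped raw comment."""
    parsed = event.get("parsedcomment")
    if parsed is not None:
        return parsed
    # Fallback: no links or autocomment markup, just text that is safe to embed.
    return html.escape(event.get("comment", ""))
```

A UI built on the stream could call `comment_html(event)` for each decoded event and render the result directly.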

fdans claimed this task.