Include image/file changes in page-links-change
Open, Needs Triage · Public · Feature

Description

I would like to be able to see the addition/removal of images (`[[File:Example.png]]`) in an EventStream.

Rather than create an EventStream for page-images-change, and given that the addition/removal of a file is "just" a link, it appears to make sense to add this to the pre-existing page-links-change.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 904315 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/EventBus@master] EventBusHooks: Merge File changes with internal link changes

https://gerrit.wikimedia.org/r/904315

Cool, thanks for the patch. Let's involve some other users of this stream in a discussion before we decide to do this.

@Isaac , @achou I think you use these events? What do you think? Can you also ping others that might have opinions?

Also, relevant task: T331399: Create new mediawiki links change streams based on fragment/mediawiki/state/change/page

What do you think?

Hmm...what's the use-case for having wikilinks to articles and images in the same stream? On one hand, assuming the stream specifies the link namespace explicitly, it simplifies things to only have one place to check for link changes. On the other hand, it could force folks to filter a lot of events just to get to the ones that interest them, and it opens the door to other questions like whether the intent is to also include templatelinks, categorylinks, etc. As a potential end-user, my gut feeling is to keep them separate like the mediawiki tables because personally I'm not generally working with models that use both links and images (and if I am, I'd likely prefer to just listen for the more generic page-change events because I'm probably watching for a lot of other things like references that aren't link-specific). Curious to hear other perspectives though.

Can you also ping others that might have opinions?

FYI @Miriam @MGerlach as folks who work on recommender systems that deal with images/links.

My (weak) justification is primarily that a [[wikilink]] and [[File:Link.png]] are both "just internal links" — I would expect modifications to links on a page to be included in page-links-change. The use-case which prompted me to look at this was globally watching for the use of potentially disruptive image additions to templates — using the generic page-change events would require me to fetch the revision content, whereas page-links-change would include the added (or removed) images.

This being said, I'm certainly not opposed to the creation of a separate page-images-change stream 🙂
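To illustrate the use case described above, here is a minimal sketch of how a consumer might pick file additions and removals out of a link-change event without fetching revision content. The field names (`added_links`, `removed_links`, `link`, `external`) and the `/wiki/File:` prefix are assumptions made for the example, and the sample event is hypothetical: per this task, file links are not currently part of page-links-change.

```python
# Hypothetical event shape; field names are assumptions, not a confirmed schema.
FILE_PREFIX = "/wiki/File:"


def file_links(event: dict) -> dict:
    """Return the File: links added to and removed from a page in one event."""

    def pick(entries):
        # Keep only internal links that point into the File namespace.
        return [
            entry["link"]
            for entry in entries
            if not entry.get("external", False)
            and entry.get("link", "").startswith(FILE_PREFIX)
        ]

    return {
        "added": pick(event.get("added_links", [])),
        "removed": pick(event.get("removed_links", [])),
    }


# Made-up sample event for a template edit that adds an image.
sample_event = {
    "page_title": "Template:Example",
    "added_links": [
        {"link": "/wiki/File:Example.png", "external": False},
        {"link": "/wiki/Main_Page", "external": False},
    ],
    "removed_links": [{"link": "https://example.org/", "external": True}],
}

print(file_links(sample_event))
```

With a dedicated image stream instead, the same consumer would not need the prefix filter at all.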

@TheresNoTime thanks for explaining. I think I still lean towards separate streams all things equal then but ultimately I'm fine with whatever is decided so long as it enables your use case.

opens the door to other questions like whether the intent is to also include templatelinks, categorylinks, etc.

Can you say more about this? IIUC, these are different kinds of links, yes? The page and image links are similar as @TheresNoTime says, since they are both internal hyperlinks. Is a link to a category or a template kind of the same, or are those very different?

@Ottomata fair question and I'll try to better explain myself: in theory, "links" cover a lot of interconnections between pages where changes might be useful to know about for an end-user. There are lots of ways to categorize them (intrawiki vs. interwiki vs. external; what syntax to use for creating them; how they're stored in Mediawiki; how they're used; etc.). Given that this link stream question depends on mediawiki code, I'll do my best to categorize them according to a mixture of what they do and how they're indexed on the backend. Apologies if I get any details wrong/missing in trying to do this quickly:

| Type | Mediawiki table | Details |
| --- | --- | --- |
| Classic wikilinks | pagelinks | Blue text link to other pages on a wiki but doesn't have any special rendering, follow-on effects etc. Can be to any namespace and generally added via wikitext but can also be transcluded via templates etc. |
| Interwiki links | iwlinks | Like classic wikilinks but to a page on another wiki. |
| Template links | templatelinks | Not wikilink to template namespace but template invocation via {{...}}. Still, has same mediawiki database pattern as wikilinks of source+target page information. |
| Category links | categorylinks | Not wikilink to category namespace but category invocation that sets a category for a page. Has same mediawiki database pattern of source+target page information with a little added complication related to category hierarchies etc. |
| Image links | imagelinks | Not wikilink to file namespace but image invocation that adds an image/audio/etc. to a page. Has same mediawiki database pattern of source+target page information. |
| Language links | langlinks | Generally updated via Wikidata now but langlinks still stores the information and they can still be added to wikitext to link articles across projects/languages as being about the same subject. |
| External links | externallinks | Also blue text link but to non-wikimedia page so more possibilities as to what they go to and treated very differently by end-users as a result. Also externallinks table truncates the link text column length which can result in incomplete links being stored in mediawiki's backend even as the wikitext retains the true value. |

Theoretically, page-links-change could have all of these because they all have ...links mediawiki tables that I assume have update hooks when they're changed (caveat: I don't know this for sure). They're also really all just the same few pieces of information and so could probably be stored via a common (if hacky) schema if desired: source page, target page, type of link. In reality, these links serve vastly different functions though, and so I suspect that people who are interested in e.g., when template links change generally aren't also interested in knowing when language links change. For me, given how many types there are that change at very different frequencies (langlinks very rarely to pagelinks very frequently), I default to separate streams because I think it's easier on the end-user apps and clearly tracks the Mediawiki database layout. That layout might not be obvious/known to every end user but it is a reasonable way to split up these types of links. If we start grouping them arbitrarily though, I think it would be semi-confusing to group some but not all because you'd be defining new ways of categorizing the types of links. It's not the end of the world (could have clear documentation that says e.g., page-links = pagelinks+imagelinks) but it's not clear to me that it's better than all separate or all together.
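As a rough sketch of the "source page, target page, type of link" idea above, the record below models one link change and splits a batch of changes by link type, which is roughly what separate per-table streams would give a consumer. The class and function names (`LinkType`, `LinkChange`, `split_by_type`) are hypothetical and not part of any existing schema; only the table names come from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    # Values mirror the mediawiki link tables listed above.
    PAGE = "pagelinks"
    INTERWIKI = "iwlinks"
    TEMPLATE = "templatelinks"
    CATEGORY = "categorylinks"
    IMAGE = "imagelinks"
    LANGUAGE = "langlinks"
    EXTERNAL = "externallinks"


@dataclass(frozen=True)
class LinkChange:
    """Hypothetical common record: the same fields for every link type."""
    source_page: str
    target: str
    link_type: LinkType
    added: bool  # True if the link was added, False if it was removed


def split_by_type(changes):
    """Group changes by link type, as separate streams would."""
    streams = defaultdict(list)
    for change in changes:
        streams[change.link_type].append(change)
    return dict(streams)


changes = [
    LinkChange("Foo", "Bar", LinkType.PAGE, added=True),
    LinkChange("Foo", "File:Example.png", LinkType.IMAGE, added=True),
    LinkChange("Foo", "Baz", LinkType.PAGE, added=False),
]
by_type = split_by_type(changes)
```

A consumer that only cares about images reads `by_type[LinkType.IMAGE]` and never sees the higher-volume pagelinks changes.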

Wow thanks Isaac, very helpful. I'm also inclined to treat these as separate streams then. We could however use a common schema designed in T331399 for each of these streams. This would help with downstream joining/querying for folks that do want to consume e.g. both page and image links together.

I'll link this comment from that ticket.
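To show why a common schema helps downstream joining, here is a small sketch that merges two separate streams into one time-ordered sequence. The `LinkEvent` fields and the sample events are invented for illustration and do not reflect the schema being designed in T331399. It assumes each input is already ordered by timestamp and that timestamps are ISO 8601 UTC strings, which sort lexicographically.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class LinkEvent(NamedTuple):
    """Hypothetical shared record used by every link stream."""
    timestamp: str  # ISO 8601 UTC, e.g. "2023-04-01T10:00:00Z"
    source_page: str
    target: str
    link_type: str


def merge_streams(*streams: Iterable[LinkEvent]) -> Iterator[LinkEvent]:
    """Lazily merge already-sorted streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda event: event.timestamp)


# Made-up events standing in for two separate streams.
page_links = [
    LinkEvent("2023-04-01T10:00:00Z", "Foo", "Bar", "pagelinks"),
    LinkEvent("2023-04-01T10:05:00Z", "Foo", "Baz", "pagelinks"),
]
image_links = [
    LinkEvent("2023-04-01T10:02:00Z", "Foo", "File:Example.png", "imagelinks"),
]

merged = list(merge_streams(page_links, image_links))
```

Because both streams share the same fields, the merge needs no per-stream conversion step.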

Change 904315 abandoned by Samtar:

[mediawiki/extensions/EventBus@master] EventBusHooks: Merge File changes with internal link changes

Reason:

T333497#8775767 (and prev.)

https://gerrit.wikimedia.org/r/904315

To build on @Isaac's excellent statement, I'll point to 004cb43c5667406cdb068a516b359d58b5592dd0 which tried to unify the different link types internally. Missing from @Isaac's list was ParserOutputLinkTypes::SPECIAL (a list of special pages transcluded by the current page) and "image links" are more properly thought of as "media links".

TY!

which tried to unify the different link types internally

"tried". Did you succeed? :)