Metrics request on portal namespace usage
Open, Stalled, Needs TriagePublic

Description

The Request

In preparation for a new RfC on enwiki regarding new guidelines for the Portal system, we (WikiProject Portals) would like to gain additional insight into how much the portal system is currently used. We can already get simple metrics like view counts, but those do not capture more detailed aspects, such as:

  • How many people clicked on a portal link from an article, or a category?
    • Which articles, and which categories generated the highest traffic?
  • How many people followed through to another article from the portal, or just closed the tab?
    • Which articles and portals had the highest engagement? (useful for replicating their success)
  • How many people even scrolled to the bottom of the article and had the opportunity to notice the portal link?
    • About what article size tends to result in zero portal link visibility?

I'm not sure how detailed the analytics the WMF currently collects are, so some of this may not be feasible at this time. I'm basing this on the various studies I've seen run, like the recent citation usage study. Granted, that is a more involved example, and I'm not asking you to make schema changes to facilitate this or anything.

Background

This whole thing stems from the infamous RfC back in April, calling for the deletion of the Portal namespace, which was closed with no consensus. Since then, we've rebooted WikiProject Portals, and have overhauled most of the portals (which currently number around 5200) with new tech we've been building to make portal creation and maintenance easier, as well as adding new features that were not possible over a decade ago when the namespace was created.

The original problem hasn't disappeared, however. There are those who would like to expand the number of portals significantly, while others would see portals removed, or at least reduced by an order of magnitude. The mechanism for this would be the portal guidelines, which would determine the criteria a portal must meet in order to exist. Before we propose a new set of guidelines, we want to collect some actual data to base our discussion on, rather than just opinion.

Event Timeline

Restricted Application added a subscriber: Aklapper. Sep 28 2018, 1:29 AM
Ottomata added a subscriber: Ottomata.

@AfroThundr3007730 the Product-Analytics team works on things like this, so we added them. Hopefully someone there can get back to you.

Ottomata moved this task from Incoming to Radar on the Analytics board. Oct 4 2018, 5:14 PM
Ottomata added a project: Analytics.
Ottomata raised the priority of this task from Normal to Needs Triage.

@AfroThundr3007730 Hi! I don't think we've crossed paths yet, but you are clearly very active on both the wikis and phab. I'm a director of product at the foundation and the interim manager for the product analytics team. Thanks for reaching out and for drafting such a well-thought-out request. As you likely suspected, the team is incredibly overloaded right now: our current mandate is to provide guidance to the product development teams here at the Wikimedia Foundation, and all of them have unanswered questions and would like more support from us.

However, you are the first (yes the first!) community member to make a request of us since we became a formal team in January. Furthermore, some of it looks both straightforward and reliant on access to data that is not public. So, a member of our team, Tilman (@Tbayer), has offered to squeeze out some time to tackle one or more of these questions. There is no way he will be able to answer more than a few.

Our first question to you is, when do you need this by? If the RFC is tomorrow, I don't think we can help, but a longer time frame is likely doable. I'll let you and @Tbayer take it from here.

Thanks for looking into this. We're still in the process of discussing the construction of the RfC and the criteria for the guidelines, so we have time. We want to be thorough and build a proposal that will withstand close community scrutiny. You can see the ongoing discussion for details on how that's progressing.

We'd like to collect enough useful data to frame the upcoming discussion. Ideally, we'd have data on portal traffic and reader engagement in general, along with detailed data for the top-level portals.

We'd also like to find the "hot" portals, and analyze what it is that makes them successful, so we could use them as a model of where we need to aim with the others.

I don't want to add too much to your plate, so I guess we go for the low-hanging fruit? I'm not sure how much work each of those questions would cost you guys.

I agree with AfroThundr3007730 on the probable usefulness and scope of this request. Cheers, P

@AfroThundr3007730 Thanks for the additional background!
I should be able to get you some data for the first two groups of questions soon, based on the internal referrer data we have available.

For the first question, this will be pageviews to portals (defined as pages whose name starts with "Portal:") with a referrer that is not the main page, distinguished by whether the referring page is a category page or not.
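
As an illustration only, the definition above could be expressed roughly as follows. This is a minimal sketch, not the query the Product-Analytics team actually ran: the function names, the `/wiki/` URL layout, and the `Main_Page` title are assumptions made for the example, and it presumes the input has already been limited to internal referrers.

```python
from urllib.parse import urlsplit, unquote

# Assumption: the main page title on enwiki, as it appears in URLs.
MAIN_PAGE = "Main_Page"


def page_title(url):
    """Return the page title from a /wiki/<title> URL, or None if it has none."""
    path = urlsplit(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        return None
    return unquote(path[len(prefix):])


def classify_portal_view(viewed_title, referrer_url):
    """Classify one pageview under the definition sketched above.

    Returns "category" or "other" for a portal view that counts,
    and None for views that are out of scope.
    """
    # Only pages whose name starts with "Portal:" are considered portals.
    if not viewed_title.startswith("Portal:"):
        return None
    ref_title = page_title(referrer_url) if referrer_url else None
    # Drop views with no usable referrer and views coming from the main page.
    if ref_title is None or ref_title == MAIN_PAGE:
        return None
    return "category" if ref_title.startswith("Category:") else "other"
```

Counting the returned labels over all pageviews would give the two totals (category referrers versus all other referring pages) that the first question asks about.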

Regarding the third question, there is some general information on this page which may be of interest: https://meta.wikimedia.org/wiki/Research:Which_parts_of_an_article_do_readers_read (and in the Wikimania slide deck that is linked there, which @Pbsouthwood is already familiar with). But unfortunately I won't have time right now to dig deeper into what the (sparse) available data could tell us specifically about portals in that regard.

Tbayer moved this task from Triage to Doing on the Product-Analytics board. Oct 11 2018, 8:11 PM
Tbayer claimed this task.

@Tbayer Just curious if the analytics team had time to pull any data for this yet.

Thanks for the ping! I spent some time working on this a couple of weeks ago, but encountered an unexpected issue with the referer data, which gave rise to some questions about its validity in general (basically, an implausibly large number of referers are HTTP instead of HTTPS URLs), and I ran out of the allotted time while investigating this. I think I'll be able to get back to that and wrap this task up (possibly with somewhat less accurate results) by early next week.
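
For readers wondering what such a sanity check might look like, here is a minimal sketch that measures the share of each URL scheme among a list of referrer strings. The function name and the sample input are hypothetical, and this is not the investigation that was actually carried out.

```python
from collections import Counter
from urllib.parse import urlsplit


def scheme_shares(referrers):
    """Return the fraction of referrer URLs per scheme (e.g. http vs https)."""
    counts = Counter(
        urlsplit(ref).scheme.lower() or "(none)"  # label scheme-less values
        for ref in referrers
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {scheme: n / total for scheme, n in counts.items()}
```

On a site served only over HTTPS, a large `http` share among internal referrers would be the kind of implausible result described above.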

Just got to work on this again in between other things - sorry about the delay, but figuring out what is going on with those referrers that shouldn't exist turned out to be a bit more complicated than expected. And your request had the bad fortune of being the first time that this issue in our existing data surfaced...

No worries, I figured things would slow down over the holidays.

And your request had the bad fortune of being the first time that this issue in our existing data surfaced...

Nice. Gotta keep you guys on your toes somehow. :)

I'm going to set aside some time again on Monday (Feb 25) to wrap this up, including documenting what we now know about the data issue with the referrers (and whether/how it might affect the validity of the results for this request). Let me know in case the needs here have changed in the meantime, or also if anything else occurred to you that should be considered in the analysis.

@Tbayer Sorry to keep bugging you, but have you been able to find time to finish this? We've been making waves on the Village Pump (again), and with the new discussions underway, I think this data would help shed some light on how the namespace is used. Off the top of my head, I can't think of anything else to add to the above request - not until we see the data, at least. Thanks again for looking into this.

kzimmerman changed the task status from Open to Stalled. Apr 18 2019, 7:59 PM
kzimmerman added a subscriber: kzimmerman.

Hi @AfroThundr3007730, I'm Head of Product Analytics (joined Wikimedia Foundation toward the end of 2018). Tilman is no longer at the Foundation and I'm working with my team on transitioning or closing out his open tasks.

Given the data issues Tilman previously ran into and our team being down one person, I don't think we'll be able to resolve your questions. But, before I decline this task, I wanted to reach out to you and see if I should revisit our priority list. Please let me know your thoughts on the urgency of this, and what decisions may be blocked by not having the data. Thank you!

kzimmerman removed Tbayer as the assignee of this task. Apr 18 2019, 7:59 PM
kzimmerman moved this task from Doing to Stalled on the Product-Analytics board.

@kzimmerman First off, thanks for checking up on this.

As mentioned previously, the primary reason for gathering this information was to provide more concrete and in-depth usage data on portals to the enwiki community, so that we could better shape the governing guidelines. Portals have recently become quite a controversial topic (again), as anyone who frequents the Village Pump or Administrator's Noticeboard will have no doubt noticed. I think that this data would help put things into perspective and (hopefully) allow for more rational logic-based discussions to take place.

We would, of course, like to see this ticket completed, but if the Product Analytics team is unable to devote further resources to it at this time, that is understandable. The discussions will continue in the absence of the requested data, and the community will eventually arrive at a consensus regarding portal matters. (Hopefully, one that everyone can live with, or at least hate equally...) If you find time to revisit this at a later date, we would still be interested in what you come up with. Despite recent developments, I doubt portals are going anywhere anytime soon.