Page MenuHomePhabricator

Consider recording the page where the question was asked
Closed, ResolvedPublic

Description

Should the data collected include the page/URL where the question was asked?

On the one hand, it's possibly useful to know that a person who was struggling was at [[Example article]], since there may be some problem (e.g., invalid wikitext) at that article. But recording this detail has some privacy risks, since the username of a person who just edited [[Example article]] at 18:42 UTC on 1 January 2018" is usually easy to determine from the page history.

Options:

  • Make it possible to do this, but only turn it on if it's needed (e.g., Special:Search results).
  • Record only the namespace (perhaps adding specific Special pages, such as Special:Search, that aren't associated with publicly logged actions)

Event Timeline

For most questions I can think of, the just-record-the-namespace idea is fine. One could also think of an abstraction layer that could potentially extract some other information like article length, number of editors etc. in chunks (like percentiles, S/M/L/XL or so) and put it into a structured format (JSON or so), resulting into data like

Length: [in] 42[th percentile]
Number of editors: 10-20
Namespace: Article

matmarex subscribed.

Removing MediaWiki-Page-editing so that this doesn't clutter searches related to actually editing pages in MediaWiki. This project should only be on the parent task, probably (T89970). [batch edit]

There should be a concrete use case before we implement such a feature.

But I wanted to mention that we're already recording PII in the event capsule, include a detailed browser fingerprint and geolocation. I'm not thrilled about this reality, but it does suggest that page title isn't a particularly dangerous field to add. The danger is more in the way of querying and making data public, which is already considered in any analysis.

awight claimed this task.

Okay, the punchline is that this *was* implemented in cb41111. The page metadata is only recorded for responses and not for the initial impression, which seems right to me. Marking resolved.