
Consider keeping user entered URL and removing tracking parameters
Open, Medium, Public

Description

We started using the canonical URL rather than the user-entered URL for security reasons (T107322), but users regularly complain that this is destructive: it removes anchors (T212608), page numbers in Google Books links, and other important query parameters. Websites also sometimes redirect us to a weird place, like the home page, if we aren't emulating a browser well enough.

An alternative is to keep the user-entered URL and remove tracking parameters based on a blacklist (e.g. https://en.wikipedia.org/wiki/UTM_parameters for some), defaulting to leaving a parameter in if we don't "know" that it isn't needed.
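A minimal sketch of the blacklist idea, written in Python purely for illustration. It is not the service's actual code: the function name `strip_tracking_params` and the contents of `TRACKING_PARAMS` are assumptions (the list holds only the five standard UTM parameters). Anything not on the list, including the fragment, is left in.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical blacklist: the five standard UTM parameters.
# A real list would probably be longer; unknown parameters are kept.
TRACKING_PARAMS = frozenset({
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
})


def strip_tracking_params(url: str) -> str:
    """Return the URL with blacklisted query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in TRACKING_PARAMS
    ]
    # Path and fragment (anchor) are passed through untouched.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical Google Books style link with a page number and an anchor.
print(strip_tracking_params(
    "https://books.google.com/books?id=abc123&pg=PA42&utm_source=newsletter#v=onepage"
))
```

One caveat of this approach: decoding and re-encoding the query string can change how the remaining parameters are percent-escaped, so a stricter version might edit the raw query string instead.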

Cons

  • Prohibited URLs that users enter, such as private IP addresses that we don't allow, could be left in (this can probably be ameliorated if we disallow this particular case).
  • User tracking parameters that aren't on our blacklist would be left in.
  • The metadata might not actually be from the URL they used. For instance, in T210871 the metadata is from a splash page, and in this bug here the metadata is from the home page, not the intended URL. Putting the "bad" URL in indicates to the user that they need to fix the metadata manually, and also notes the actual source of the metadata.
  • The actual source of the metadata is discarded (the resolved / canonical url, as opposed to the user entered one)
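The first con above suggests disallowing private IP addresses outright. A rough sketch of such a check, again in Python and under stated assumptions: `is_disallowed_host` is a hypothetical helper, and it only inspects literal IP addresses in the URL. Hostnames that resolve to private addresses are not caught here and would need a check after DNS resolution.

```python
import ipaddress
from urllib.parse import urlsplit


def is_disallowed_host(url: str) -> bool:
    """Return True if the URL has no host or names a private/loopback/link-local IP."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no usable host at all
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address; this sketch does not resolve hostnames.
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local


print(is_disallowed_host("http://192.168.1.10/admin"))
print(is_disallowed_host("https://en.wikipedia.org/wiki/UTM_parameters"))
```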

Pros

  • Leaves important page query parameters in
  • Leaves anchors in
  • Creates a citation that needs slightly less fixing if the metadata is bad, because the URL field already contains the URL the user entered
  • Less confusing for users who expect that the url they enter will go in the url field as written

Event Timeline

Mvolz added a subscriber: Xaosflux.

@Xaosflux I think this was what you were trying to report on the other task.

@Mvolz in T210871 the problem isn't so much about "removing parameters"; it seems that the process is following third-party redirects and then replacing the entire path (and, coincidentally, in the example this results in adding parameters). In replacing the entire path, editorial control of references is being lost (and in the example it also results in a reference that is useless for readers and editors).