
Consider supporting hosts with port in Web2Cit
Open, Lowest · Public · Feature

Description

Feature summary (what you would like to be able to do and where):

Consider adding support for hosts with a port, for example www.example.com:8080.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

For some websites it may be necessary to specify a port in addition to a hostname. However, Web2Cit currently ignores the host's port.
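To illustrate the problem, here is a minimal Python sketch. It is not Web2Cit's actual code: reversed_domain_dir is a hypothetical helper that assumes configuration directories are derived from the hostname alone, in the com/example/www/ layout mentioned in this task.

```python
from urllib.parse import urlsplit


def reversed_domain_dir(url: str) -> str:
    """Build a com/example/www/ style path from the hostname only.

    Hypothetical helper: the port is parsed by urlsplit but never used,
    which mirrors the behaviour described in this task.
    """
    hostname = urlsplit(url).hostname or ""
    return "/".join(reversed(hostname.split("."))) + "/"


with_port = "https://www.example.com:8080/article"
without_port = "https://www.example.com/article"

print(urlsplit(with_port).port)           # the port is available after parsing
print(reversed_domain_dir(with_port))     # but it does not appear in the path
print(reversed_domain_dir(without_port))  # same path as the line above
```

Under this assumption both URLs resolve to the same directory, so a site served on port 8080 cannot have its own configuration.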

Benefits (why should this be implemented?):

However, note that of the more than 300k URL references extracted by Web2Cit-Research, only 101 have a port.

Event Timeline

diegodlh triaged this task as Lowest priority. Edited Aug 11 2022, 3:58 PM

Note that it may be even more important to support schemes other than https.

Maybe we could address all of this by changing the way we save configuration files, from com/example/www/ to simply www.example.com/.

This way, ports may be specified as www.example.com:8080.

Schemes may be less straightforward to specify, as https://www.example.com/ would not be a valid path. Maybe just https:www.example.com.
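A rough Python sketch of this flat naming, only to make the proposal concrete. flat_config_dir and its include_scheme flag are hypothetical names, and the scheme: prefix is just the tentative form suggested above, not a settled format.

```python
from urllib.parse import urlsplit


def flat_config_dir(url: str, include_scheme: bool = False) -> str:
    """Return a flat directory name such as www.example.com:8080/.

    Hypothetical helper: the port is appended only when the URL states one,
    and the scheme is prefixed as "scheme:" only when include_scheme is True.
    """
    parts = urlsplit(url)
    name = parts.hostname or ""
    if parts.port is not None:
        name = f"{name}:{parts.port}"
    if include_scheme and parts.scheme:
        name = f"{parts.scheme}:{name}"
    return name + "/"


print(flat_config_dir("https://www.example.com/page"))
print(flat_config_dir("https://www.example.com:8080/page"))
print(flat_config_dir("https://www.example.com/page", include_scheme=True))
```

The scheme is kept optional here because, as noted, how to encode it is the least settled part of the idea.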

The reason why we have paths defined as com/example/www/ is probably because we wanted Web2Cit to use configuration files hierarchically (i.e., use com.example.com.sub, if not available use com.example.com, else use com.example). But then plans changed.
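The hierarchical lookup that the reversed layout was presumably meant to enable could look roughly like this in Python. This is a sketch of the idea only: fallback_candidates, resolve_config_dir and the min_labels cutoff are assumptions, not existing Web2Cit functions.

```python
from typing import Iterable, List, Optional


def fallback_candidates(hostname: str, min_labels: int = 2) -> List[str]:
    """List reversed-domain paths from most specific to least specific.

    min_labels is an assumed cutoff so the lookup stops at com/example/
    instead of falling back all the way to com/.
    """
    labels = hostname.lower().split(".")[::-1]
    return [
        "/".join(labels[:n]) + "/"
        for n in range(len(labels), min_labels - 1, -1)
    ]


def resolve_config_dir(hostname: str, available: Iterable[str]) -> Optional[str]:
    """Return the first candidate present in `available`, or None."""
    existing = set(available)
    for candidate in fallback_candidates(hostname):
        if candidate in existing:
            return candidate
    return None


print(fallback_candidates("sub.www.example.com"))
print(resolve_config_dir("sub.www.example.com", ["com/example/www/"]))
```

With the reversed layout each fallback is a parent directory of the previous one, which is what makes this kind of lookup natural.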

Finally, note that we could still support these and other use cases with the current approach. For example: com:8080/example/http:diegodlh@www/

Or, alternatively, to preserve the hierarchy, because scheme, port and userinfo refer to ways of accessing the same host:

  • com/example/www/http:/
  • com/example/www/diegodlh@/
  • com/example/www/:8080/
  • any combination: com/example/www/http:diegodlh@:8080/
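The hierarchy-preserving variant can be sketched in Python as follows. hierarchical_config_dir is a hypothetical helper, and treating https as the default scheme that needs no segment is an assumption made for this example.

```python
from urllib.parse import urlsplit


def hierarchical_config_dir(url: str, default_scheme: str = "https") -> str:
    """Build com/example/www/ plus an optional access segment.

    The access segment concatenates, in this order, "scheme:" (only if it
    differs from default_scheme, an assumption), "userinfo@" and ":port".
    """
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")[::-1]
    host_path = "/".join(labels) + "/"

    access = ""
    if parts.scheme and parts.scheme != default_scheme:
        access += f"{parts.scheme}:"
    if parts.username:
        access += f"{parts.username}@"
    if parts.port is not None:
        access += f":{parts.port}"

    return host_path + access + "/" if access else host_path


print(hierarchical_config_dir("https://www.example.com/page"))
print(hierarchical_config_dir("http://www.example.com/page"))
print(hierarchical_config_dir("http://diegodlh@www.example.com:8080/page"))
```

Because the access segment always sits below the host directory, a lookup can still fall back from com/example/www/:8080/ to com/example/www/ when no port-specific configuration exists.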