[spike] Investigation on which geolocation API
Closed, Resolved | Public | 3 Estimated Story Points

Description

User stories:
As a campaign organizer, I would like to be able to easily enter the address of my in-person event.
As a campaign participant, I would like to be able to quickly know where the in-person event is taking place.


This ticket captures the investigation work to determine which Geolocation API Provider we can use to provide the experiences outlined above 👆. However, we will not be including the actual implementation work in v0.

Event Timeline

Restricted Application added a subscriber: Aklapper. Apr 8 2022, 2:33 PM
vyuen renamed this task from Investigation on which geolocation API to [spike] Investigation on which geolocation API. Apr 8 2022, 2:33 PM
vyuen moved this task from Backlog to V0 on the Campaign-Registration board.

In order to more easily choose a provider, here are the features that we may want (or at least were mentioned), and how important they are:

  • Being able to filter events by country (seems very important, given that the event database is central)
  • Provide address autocompletion when creating/updating a registration (nice to have)
  • Show only relevant info in the event page bar, e.g. the building name, see T301692#7797849 (very important if we want to show relevant info only; otherwise we need an alternative solution)
  • Show events happening nearby (not important for now, could become more relevant in later versions)
  • Show where the event is happening on a map (probably a nice to have)
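As a rough illustration of what "filter by country" and "show events nearby" would need from stored data, here is a minimal sketch. It assumes each event ends up with a country code and coordinates once an address has been geocoded; the `Event` fields and function names are hypothetical and not part of any existing schema.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Event:
    # Hypothetical record shape; the real schema is still to be decided.
    name: str
    country_code: str  # assumed to be ISO 3166-1 alpha-2, upper case
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def events_in_country(events, country_code):
    """Keep only the events whose country code matches (case-insensitive input)."""
    wanted = country_code.upper()
    return [e for e in events if e.country_code == wanted]


def events_nearby(events, lat, lon, radius_km):
    """Events within radius_km of the given point, closest first."""
    scored = [(haversine_km(lat, lon, e.lat, e.lon), e) for e in events]
    scored.sort(key=lambda pair: pair[0])
    return [e for distance, e in scored if distance <= radius_km]
```

The point of the sketch is that both features only work if the country and coordinates are stored in a normalized form, which is why the address structure matters before any provider is chosen.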

Even if this is not for V0, I believe that we should address this now. The reason is that once we start using a geocoding library, there may be backwards compatibility issues with existing events (e.g. someone entered an invalid address or country, or used a format that the library does not recognize), and this could be very problematic.
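To make the backwards-compatibility concern concrete, here is a minimal sketch of an audit over existing events that flags country values a stricter parser might reject. The lookup tables are a tiny illustrative subset, and the `(event_id, raw_country)` input shape is an assumption, not the actual storage format.

```python
from typing import Iterable, List, Optional, Tuple

# Illustrative subset only; a real check would use a complete country list.
ISO_CODES = {"FR", "IT", "DE"}
NAME_TO_CODE = {"france": "FR", "italy": "IT", "germany": "DE"}


def normalize_country(raw: str) -> Optional[str]:
    """Return a country code for free-text input, or None if unrecognized."""
    cleaned = raw.strip()
    if cleaned.upper() in ISO_CODES:
        return cleaned.upper()
    return NAME_TO_CODE.get(cleaned.casefold())


def audit_legacy_events(events: Iterable[Tuple[int, str]]) -> List[int]:
    """IDs of events whose stored country value cannot be normalized."""
    return [event_id for event_id, raw in events if normalize_country(raw) is None]
```

Running something like this before switching to a geocoding library would show how many existing events need manual cleanup.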

From sprint planning: we won't tackle this yet, instead we will first spend some time deciding on how we should structure addresses (T306202).

The team found that Pelias seems to be perfect for our use case: it's open source and it does exactly what we need (parsing addresses, providing coordinates, autocompletion...). They have a web service and offer discounted/free plans for open-source projects. However, I am slightly concerned because we would be making HTTP requests to a third party, and I believe this would at least require clearance from legal. As such, I think it would be better if we could self-host it. This could also help with other things, e.g. controlling access and manually managing the dataset. Some documentation is available here and here. As you can read, we would need a lot of disk space and computational power to get it up and running decently. I was wondering what the next steps would be, and particularly who we need to talk to.

@Daimona Should we create a follow-up ticket to explore self-hosting for the Pelias solution?

@Daimona thank you for the summary!

@ifried I think a good next step is to determine whether we can use the 3rd party APIs. I imagine legal or other parties may need to be involved? If using this service is acceptable, that would free us from having to self-host, which is an expensive option, especially considering the maintenance work that will come with it.

Thanks, @vyuen! In that case, perhaps @Daimona or one of the engineers can write up a proposal for what we plan, and once it is ready, @ldelench_wmf can help us connect with Legal for them to review it.

While writing the above I was implicitly thinking that it may be possible to make this opt-in (so we make requests to 3rd parties only if the user agrees), but on second thought, I'm not sure if it's possible.

What kind of write-up would you need? In short, we would be making HTTP requests to a third party (https://geocode.earth, maybe?) in our client code, hence sending the user's personal data (IP and UA) to that service. I don't think this is quite possible.

Perhaps, instead of self-hosting, we could set up a proxy that forwards requests from users to the 3rd party, hence hiding the user's data.
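The proxy idea could look roughly like the sketch below: the server rebuilds the upstream request from scratch, so the third party only sees the proxy's IP and User-Agent. The upstream URL, the `api_key` query parameter, the proxy User-Agent string, and the header allowlist are all assumptions for illustration and would need to be checked against the provider's documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed upstream endpoint; not confirmed anywhere in this task.
UPSTREAM_AUTOCOMPLETE = "https://api.geocode.earth/v1/autocomplete"
# Hypothetical identifier for the proxy itself.
PROXY_USER_AGENT = "CampaignEventsGeocodeProxy/0.1"
# Only these client headers are ever passed on (compared in lower case).
FORWARDABLE_HEADERS = {"accept"}


def build_upstream_request(client_headers: dict, text: str, api_key: str) -> Request:
    """Build the request the proxy sends upstream on behalf of a user.

    Client headers such as User-Agent, Cookie, and X-Forwarded-For are
    dropped; only allowlisted headers and the query text are forwarded.
    """
    headers = {
        name: value
        for name, value in client_headers.items()
        if name.lower() in FORWARDABLE_HEADERS
    }
    headers["User-Agent"] = PROXY_USER_AGENT
    query = urlencode({"text": text, "api_key": api_key})
    return Request(f"{UPSTREAM_AUTOCOMPLETE}?{query}", headers=headers)
```

Because the outgoing connection is opened by the server, the user's IP never reaches the third party. The typed text itself is still sent, so the write-up for Legal would need to cover that.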

Note that this is mostly for the autocompletion part. We will also need to make requests to that site for other things (i.e. parsing the address when creating the registration), but those are made from PHP and do not contain user data.
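For the server-side parsing step, the sketch below shows how a structured address might be pulled out of a geocoder response. It assumes a Pelias-style GeoJSON `FeatureCollection` whose features carry `label`, `name`, and `country_a` properties and `[lon, lat]` coordinates. The production code would be PHP; Python is used here only for illustration, and the `ParsedAddress` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedAddress:
    # Hypothetical shape for what a registration might store.
    label: str
    venue_name: Optional[str]
    country_a: Optional[str]  # country abbreviation as returned upstream
    lat: float
    lon: float


def parse_first_feature(response: dict) -> Optional[ParsedAddress]:
    """Extract the best match from a GeoJSON FeatureCollection, if any."""
    features = response.get("features") or []
    if not features:
        return None  # the geocoder did not recognize the address
    feature = features[0]
    props = feature.get("properties", {})
    # GeoJSON stores coordinates as [longitude, latitude].
    lon, lat = feature["geometry"]["coordinates"][:2]
    return ParsedAddress(
        label=props.get("label", ""),
        venue_name=props.get("name"),
        country_a=props.get("country_a"),
        lat=lat,
        lon=lon,
    )
```

Returning `None` for an empty result gives the registration form a clear signal to ask the organizer to correct the address instead of storing something unparseable.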

It sounds like we're still figuring stuff out, so perhaps we're not ready to share a write-up? Is this correct? Anyway, whenever we are ready, by "write-up," I mean a short description of what we plan that we can share with Legal. Lauren can then help us with coordination with Legal to receive a response.

I think we are ready, as long as it doesn't have to be too detailed / formal-ish. We could perhaps discuss this next week.

Okay, good to know. I don't know exactly what Legal would want (in terms of format), but a short write-up should be sufficient for now. If they have any follow-up questions, we can then add more details.

Pulling info into a one-pager for Legal, Trust & Safety Policy, and Security, and will ping the team for review.

ldelench_wmf changed the task status from Open to In Progress. May 10 2022, 3:41 PM
ldelench_wmf moved this task from Backlog to V1 (MVP) on the Campaign-Registration board.

Optimistically pulling into V1 until we learn more!

ldelench_wmf set the point value for this task to 3. May 16 2022, 1:53 PM

Added points per meeting notes

Investigation is complete & captured here. Reviews are tracked in T309325.