[GDPR🔎] Implement a full text search for personal data in the error logging database
Open, Needs Triage | Public | 2 Estimated Story Points

Description

If an error occurs during a donation or membership application, personal data may be logged to provide context for the error. The data is usually stored only for a short time, but we still need to be able to look up and/or delete personal data on request.

Acceptance Criteria

  • An API endpoint in the FOC returns matches given a combination of different address fields.
  • As search parameters the endpoint accepts
    • first names
    • last names
    • street addresses
    • postal codes
    • city names
    • email addresses
  • The endpoint returns a list of matches that includes all of the above address fields and an identifier to be used for further action.
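A minimal sketch of the request and response shapes described above, in Python for illustration only (the FOC itself is not written in Python). The names firstName, lastName, street and email are taken from the curl examples in this ticket; postcode, city, the id key and both function names are assumptions.

```python
from typing import Dict

# firstName, lastName, street and email appear in the ticket's curl examples;
# "postcode" and "city" are assumed names for the remaining address fields.
SEARCH_FIELDS = ("firstName", "lastName", "street", "postcode", "city", "email")


def build_search_params(raw: Dict[str, str]) -> Dict[str, str]:
    """Keep only known search fields that have a non-empty value."""
    params = {}
    for name in SEARCH_FIELDS:
        value = raw.get(name)
        if isinstance(value, str) and value.strip():
            params[name] = value.strip()
    return params


def build_match(notice_id: str, fields: Dict[str, str]) -> Dict[str, str]:
    """Return one result row: all address fields plus an identifier."""
    match = {"id": notice_id}
    for name in SEARCH_FIELDS:
        # Missing fields are returned as empty strings so every row has the same keys.
        match[name] = fields.get(name, "")
    return match
```

Every result row has the same keys, so a later delete action only needs the id.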

Implementation Notes
This ticket has the following sub-tasks that can be worked on individually or split out into a separate child ticket to fill a sprint.

  • Investigate and document the database structure of the errbit database. You can use docker exec to run the MongoDB client inside errbit's mongodb container (see the comments below for details).
  • Find out whether membership and donation logs use different fields and therefore need different search "paths" (e.g. http_params.address vs. http_params.strasse). Ideally only the names differ, not the nesting structure.
  • Add a MongoDB container to the FOC docker-compose setup. For development you can fill it with some data; otherwise it can be empty.
  • We should have a controller, a use case and a database abstraction interface. You might need to pull in a composer dependency to communicate with MongoDB.
  • The database "client" should map the "deep and complicated" JSON structure from MongoDB to our desired result data structure.
  • The JSON returned by the controller should resemble the JSON returned by the other endpoints and contain only the field names and values of the search parameters plus a unique ID.
  • Request and retrieve data from errbit's database.
  • The FOC is hosted on all three web servers; errbit runs only on web03.
  • We might need to expose the database port to the host without exposing it to the world. This is fine because the server firewall should block all non-SSH traffic coming "from the internet". If the firewall also blocks internal traffic from web01 and web02, we need to write a ticket to our hosting provider.
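The mapping from the nested notice document to a flat result could look roughly like the sketch below (Python for illustration, not the actual FOC code). The paths http_params.address and http_params.strasse are the examples from the note above; all other paths, the FIELD_PATHS table and the function names are hypothetical until the real structure has been investigated.

```python
from typing import Any, Dict, Optional, Tuple

# Candidate paths per result field. Only the two "street" paths come from the
# ticket; the rest are placeholders that must be checked against real notices.
FIELD_PATHS: Dict[str, Tuple[str, ...]] = {
    "firstName": ("http_params.firstName",),
    "lastName": ("http_params.lastName",),
    "street": ("http_params.address", "http_params.strasse"),
    "email": ("http_params.email",),
}


def get_path(doc: Dict[str, Any], dotted: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    current: Any = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def map_notice(doc: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Flatten one notice document into the result structure."""
    result: Dict[str, Optional[str]] = {"id": str(doc["_id"]) if "_id" in doc else None}
    for field, paths in FIELD_PATHS.items():
        # Take the first candidate path that yields a value.
        values = (get_path(doc, path) for path in paths)
        result[field] = next((v for v in values if v is not None), None)
    return result
```

If donation and membership notices only differ in field names, adding a second candidate path per field is enough; a different nesting structure would need a separate mapping.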

Testing

You can trigger an errbit entry with personal data with the following command:

curl -H "Content-type: multipart/form-data" \
-X POST \
-F interval=0 \
-F paymentType=UEB \
-F amount=1299 \
-F addressType=person \
-F firstName=Tom \
-F lastName=Tester \
-F street=Teststr \
-F email=tom@example.com \
-v \
https://spenden.wikimedia.de/donation/add

Since we don't have the UI yet (see T357519: [GDPR🔎] Add a GDPR request section to the FOC), we need to test with curl HTTP requests, using curl's cookie jar functionality and a form token taken from a browser session. Then run the following command:

curl -H "Content-type: multipart/form-data" \
-X POST \
-b cookies.txt \
-c cookies.txt \
-F firstName=Tom \
-F formToken=TOKEN_FROM_HTML_HERE \
-v \
https://backend.wikimedia.de/search/errbitnotices

Event Timeline

kai.nissen set the point value for this task to 8.

To access the errbit MongoDB you have to
-> go to our web server and cd into the sites/errbit folder
-> run docker compose exec mongodb bash
That will let you inspect the MongoDB inside the docker container.
Once you're inside the docker container, run:
mongo # starts the interactive shell

You can use commands like
show dbs
help
use admin # errbit error entries are stored here
show collections
to explore the database.
To view or filter the error messages that errbit stores, run
db.notices.find().pretty() # "notices" contains the actual data we need
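To filter instead of listing everything, find() accepts a filter document, and nested fields can be addressed with MongoDB's dot notation. The sketch below (Python, illustration only) builds such a filter document from search parameters. The paths under http_params are assumptions, not the verified errbit structure, and exact matching is only one possible strategy.

```python
from typing import Dict

# Hypothetical mapping from search parameter to the dotted path inside a notice.
FIELD_PATHS: Dict[str, str] = {
    "firstName": "http_params.firstName",
    "lastName": "http_params.lastName",
    "email": "http_params.email",
}


def build_filter(params: Dict[str, str]) -> Dict[str, str]:
    """Build an exact-match filter document; unknown parameters are ignored."""
    return {FIELD_PATHS[name]: value for name, value in params.items() if name in FIELD_PATHS}


# All conditions in one filter document are combined with a logical AND.
notice_filter = build_filter({"firstName": "Tom", "formToken": "ignored"})
print(notice_filter)
```

The resulting document corresponds to what you would pass to db.notices.find(...) in the shell.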

gabriel-wmde changed the point value for this task from 8 to 3. Apr 23 2024, 9:53 AM
gabriel-wmde changed the point value for this task from 3 to 5.

@kai.nissen product question:
we have the problem that MongoDB does not support partial search terms
(-> looking for "Donald" will find "Donald" but not "Donaldine").
(but: e.g. looking for "Benz" will find "Mercedes-Benz")

Is this a critically important feature to have for the search or is it fine to live with this limitation for errbit notices?

That's fine. The limitations should be documented, though.
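For the documentation of this limitation, the behaviour can be modelled as whole-word matching: the stored text is split into words, and a search term only matches a complete word. The sketch below is a simplified Python model of that behaviour, not MongoDB's actual implementation (it ignores stemming and stop words, for example); the function names are made up.

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    """Split text into lowercase words; hyphens and spaces act as separators."""
    return {word.lower() for word in re.findall(r"\w+", text)}


def whole_word_match(term: str, text: str) -> bool:
    """True only if the term equals one complete word of the text."""
    return term.lower() in tokenize(text)


print(whole_word_match("Donald", "Donald"))       # True
print(whole_word_match("Donald", "Donaldine"))    # False, "Donaldine" is one word
print(whole_word_match("Benz", "Mercedes-Benz"))  # True, the hyphen splits the words
```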

kai.nissen changed the point value for this task from 5 to 3. Jun 4 2024, 10:14 AM
kai.nissen changed the point value for this task from 3 to 2. Tue, Jun 18, 10:08 AM
CorinnaHillebrand_WMDE renamed this task from Implement a full text search for personal data in the error logging database to [GDPRdel] Implement a full text search for personal data in the error logging database. Wed, Jun 19, 4:25 PM
CorinnaHillebrand_WMDE renamed this task from [GDPRdel] Implement a full text search for personal data in the error logging database to [GDPR🔎] Implement a full text search for personal data in the error logging database.

@CorinnaHillebrand_WMDE
When requesting the endpoint, the server responds:

Fatal error: Uncaught Error: Class "MongoDB\Driver\Manager" not found

Unfortunately the MongoDB extension is not installed on our production servers. I've written an email to the hosting providers.