The current database schema the following problems:
- German column names - This leads to confusion for non-German speaking developers, diminishing developer experience, making onboarding more involved and leading to subtle errors: adresstyp vs address_type, note the different amount of "d"
- Large Object (LOB) Columns (data fields) - This makes it hard to export, query and update (purge) existing data.
Persistence layer breaks the boundaries of bounded contexts. Example: Both Donations and Memberships have a relation to Address Changes, but Address Change is its own bounded context and the Donation and Membership Entities (Business Objects) don't have references to Address Change. The reference is only there as a convenience, to trigger the creation of an AddressChange record when we save Donations/Memberships. Here, the framework/storage layer bleeds into the BC.- Overloaded Status - For donations, the payment status (waiting for payment, received payment notification from external provider), donation status (new, deleted in Fundraising Operation Center, Canceled on the Confirmation Page)
and moderation status (Address contains suspicious words, donation comment contains suspicious words)`` are kept in one field, leading to data loss on status changes because the status has no "permutations" for thethree~~ two different aspects. For memberships the situation is slightly better: Status is kept as flags in bit field on the database level, but that makes it hard to comprehend & has errors in the implementation. - Strings as monetary amounts - in donations the amount field is defined as a nullable, 250 character string. This is wrong on several levels - it could introduce errors with the decimal points and it's way too long. The correct type should be an integer (donation amount in cents)
- Inefficient Export status - We use nullable date fields as indicators for a donation being exported. This makes the index bloated (we'll never query for specific export dates) and slow.
- Too many nullable values - using NULL instead of empty strings for optional or anonymizable values leads to unnecessary null checks in properly typed PHP (e.g. fullName)
- Nullable IDs - Our repositories (mis)use the null value to determine if they need to create a new donation or update an existing one. If we wanted to be type safe, we'd need to litter our code with null checks, because the rest of the code only deals with loaded donations where we assume the ID to be always set (but don't check explicitly).
- Read-only IDs - Some of our entities (e.g. BucketLog, BucketLogBucket) have an id property that we don't set, because we're relying on Doctrine to autogenerate it from the database. This is database logic bleeding into the domain layer, mostly for convenience's sake.
- Signed IDs - We don't have negative IDs and should reflect that in our database schema with an UNSIGNED modifier
- Tracking data cannot be queried - For donations, we can't query a tracking field, because the tracking is hidden away in the data blob. For memberships and subscriptions, campaigns and keywords are "smushed together" in a nullable field, making it hard to query
During the refactoring, these problems should be rectified, with the following constraints:
- The FundraisingFrontend, Fundraising Operation Center and Export (SpendenDumper) must be operational at all times.
To enforce the separation of the bounded contexts, we want to get rid of FundraisingStore "library" dependency in the fundraising-donation, fundraising-memberships and fundraising-subscriptions code repositories. Each bounded context should manage its own database schema.- Entities in different bounded contexts are not allowed to reference each other (creating a dependency on entities outside the bounded context). If there is a reference on the storage level, for example to aggregate data for export, the reference should be inserted/changed at the storage level and should not show up in the entity definition.
- We want to keep the code of our domain objects unaffected by database technology - no ORM annotations on them. Use a separate XML file instead
Acceptance Criteria:
- All column names are in English
- If using ENUM or ENUM-Like columns (e.g. address_type), the values are in English.
- The data stored in the LOB is split out into tables
- The data stored in the "status" field is split among the entities, with a separate database field for each purpose.
- The export flags are booleans.
- Amounts have INTEGER or DECIMAL type (with correct conversion)
- All database access to Domain entities is behind one or more interfaces, using the Repository Pattern.
- We're still able to do database migrations across all bounded contexts.
- Entity IDs are non-nullable unsigned integers either set through required constructor arguments or through repository checks that generate a new integer ID when the id is 0.
- String and integer values are non-nullable and have empty strings or zero as defaults.
- We have a clear model of the various states the data can be in, with optional relationships depending on the state:
- Unconfirmed donations with external providers might have no payment data
- Anonymized donations don't have donor data
- Anonymized membership applications don't have applicant data
- etc
- The unit tests prove that the new repositories can transparently load and update data in the various states, without losing data or accidentally creating optional data
- Use utf8mb4_unicode_ci collation on all VARCHAR and TEXT fields to ensure maximum character support, case-insensitive search/indexing and good collation. See https://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci/766996#766996
Implementation Notes:
- To keep the Fundraising Operation Center and export working without huge refactorings, we could store the data redundantly, using the StranglerApplication pattern:
- Copy the DoctrineDonationRepository and DoctrineMembershipApplicationRepository to the FundraisingFrontend, change the Doctrine prefix to Legacy
- Change the DoctrineDonationRepository in fundraising-donation and fundraising-membership to map the Domain entities to database tables.
- In FundraisingFrontend, create DonationRepositoryWrapper and MembershipApplicationRepositoryWrapper classes that implement the repository interfaces and have the Legacy and newly created repositories as dependencies. Whenever an interface method is called, both repositories must receive the method call. There must be a "mapping table" that maps between the automatically generated IDs for the legacy and "new" entities.
- For more information on where to put the code for ID generation, see https://matthiasnoback.nl/2018/05/when-and-where-to-determine-the-id-of-an-entity/
- Not all data, especially the data for displaying information to the user (e.g. confirmation page or comment list) needs the full entities. In fact passing entities is an anti-pattern. Ideally, we would pass simple value objects instead. These value objects can also be queried without the ORM, with a SQL query builder, see https://twitter.com/nikolaposa/status/1141987195160616960
Possible tables/entities (to be discussed)
- donation
- Separate state into several states:
- payment state (kept in payment entity instead of donation entity)
- soft delete (in FOC). Soft-deleted entries could be hard deleted after a period of time, see T271731: Permanently delete soft-deleted donations and memberships.
- moderation of donor data and amount. Moderation of comments should go into the comments table. See also T137704: Add `isApproved` field for comments.
- export
- anonymized (could be a special AnonymizedDonor entity instead of a field)
- Maybe the concept of a "followup donation" (for recurring payments) that's linked to a previous donation and shares its entities. But maybe that's too much and should be implemented as a detail of the payment domain.
- Separate state into several states:
- donor
- Our donor classes are now polymorphic (see T220367: Split e-mail address out of postal address section), we need to find a way to put them either into the same table or in different tables, see Doctrine Inheritance Mapping.
- When purging the data, we'll probably replace the existing donor with an AnonymizedDonor data object. The existence of this object can act as a "has been anonymized" state for the donation. It should retain non-identifiable information that we use for statistics in the FOC (e.g. salutation and original donor type)
- This is a 1:1 relation. That means we don't get that many benefit as with other normalizations (where the normalization allows for better consistency and space savings on disk). The only benefit we would be getting from extracting this relationship would be that we could throw away lots of our PHP code (DonorFactory, DonorFieldMapper, some or all parts of LegacyToDomainConverter and DomainToLegacyConverter) and rely on the Doctrine ORM code.
- payment (we'll have to see how we model the different payment types & their metadata), see T192323: Improve expressiveness of Payment domain and reduce coupling and https://github.com/wmde/FundraisingFrontend/blob/master/doc/Planning_for_Payment_refactoring.md
- comment
- tracking (impcount, keyword and campaign from banner). See T134327: Clean up TrackingInfo for deprecated columns
- buckets (bucket and name campaign from banner)
- membership_applications
- disentangle status into single fields
- soft delete / canceled (see above)
- anonymized
- Exported
- Moderated
- disentangle status into single fields
- membership_applicants (We should follow the pattern in donations: different applicants types and an anonymized applicant type)
- subscriptions
- address_change
- Change Log (Which user in the Fundraising Operations Center triggered which status changes, payment notifications, cancellations by user, exports). Entries can come from all Applications.
- Users (for FOC)
- Impression counting (should be its own domain/entities in the backend, depending on the outcome of T243092: Find a better way for WMDE to get impression counts for their banners.
We don't need a 1:1 mapping between Domain object and tables. Objects (like DonorName) could be implemented as Doctrine Mapping Types.
For nullable relations, we should investigate if it makes sense to use null objects instead of null, to avoid excessive null checks.
Resources: