## Background
MediaWiki's link database tables are among the largest tables of any WMF production database. Optimizing the way we store and retreive this would generally see benefits such as:
* lower diskspace per database: slower growth, less hardware in the long run (co-locate more wikis together), and ensures each individual replica host can continue to fit and perform well with a full core database of a given wiki.
* faster backup and recovery,
* lower replication lag (smaller queries, smaller binlogs).
This is a continuation in spirit of {T222224} and {T300222}, where a similar optimization and major win was made to the `templatelinks` table (with upcoming work for `pagelinks`, `imagelinks` and `categorylinks`).
## externalinks
This table has grown a lot in production. It's one of the biggest tables for Commons, at 200GB, and will cause more issues in the future.
Unlike other links table, this table can't be normalized using the `linktarget` feature (T299417). We currently store each external link upto 8 times as part of a single row (and its indexes).
For example, this is a typical row today:
```
*************************** 4. row ***************************
el_id: 9
el_from: 379228
el_to: http://www.iau.org/public_press/themes/naming/#minorplanets
el_index: http://org.iau.www./public_press/themes/naming/#minorplanets
el_index_60: http://org.iau.www./public_press/themes/naming/#minorplanets
```
One external link is three ways in three columns of a given row, and those columns in turn are used in **five** different indexes.
**Option A: Create new table**
Create a new table holding the domai names (call it, `externaldomain`?) and it would hold the domain and its inverse.
Something like:
```
ed_id: 1234
ed_domain: http://www.iau.org
ed_index: http://org.iau.www.
```
The row externallinks would be reduced to
```
el_id: 9
el_from: 379228
el_to_domain: 1234 # Key to externaldomain.ed_id
el_to_path: public_press/themes/naming/#minorplanets
```
**Option B:**
Avoid creating a new table by storing the domain inline, using only the index variant. We go from 8 to 3 copies (1 in the columns, plus two large indexes).
```
el_id: 9
el_from: 379228
el_to_domain_index: http://org.iau.www.
el_to_path: /public_press/themes/naming/#minorplanets
```
Change the indexes:
```lang=diff
- INDEX el_from (el_from, el_to(40)),
- INDEX el_from_index_60 (el_from, el_index_60, el_id),
+ INDEX el_page_domain_id (el_from, el_to_domain_index, el_id),
- INDEX el_to (el_to(60), el_from),
+ INDEX el_domain_page (el_to_domain_index, el_from),
- INDEX el_index (el_index(60)),
- INDEX el_index_60 (el_index_60, el_id),
```
**Option B-2** (chosen)
Per T312666#8146447, even further minimised the index. This reduces from 8 to 2 copies (1 in the columns, and 1 large index).
```lang=yaml
el_id: 9
el_from: 379228
el_to_domain_index: http://org.iau.www.
el_to_path: /public_press/themes/naming/#minorplanets
```
Change the indexes:
```lang=diff
- INDEX el_from (el_from, el_to(40)),
- INDEX el_from_index_60 (el_from, el_index_60, el_id),
+ INDEX el_from (el_from, el_id),
- INDEX el_to (el_to(60), el_from),
+ INDEX el_to (el_to_domain_index, el_from),
- INDEX el_index (el_index(60)),
- INDEX el_index_60 (el_index_60, el_id),
```
Prior art: {T59176}