**Requirements**
Based on an initial examination of section topics, some sections should not have topics generated and stored in the section topics pipeline:
- (M) References
- (M) External links
- (M) Further reading
- ~~(XS) last section~~ - **Update: this rule is too strong. It actually wipes out plenty of potentially useful sections; see `/user/mfossati/section_topics/last_section_titles` on HDFS for a detailed dataset**
- We can decide which sections to exclude in a similar way to the add links task, T279519
Excluded:
- (XL) tree type sections (if possible?)
- (S) Infoboxes - can be used to understand what are important sections on the article level
- (L) Sections without textual content
Should we exclude templates?
**Usage Note:**
Note that section topics will be used for section-level image suggestions, and certain sections are to be excluded from receiving image suggestions as per https://phabricator.wikimedia.org/T311730. The sections excluded from having topics are a subset of the sections excluded from having images recommended.
==Estimated complexity breakdown==
Complexity varies depending on what we want to exclude:
* sections without textual content may be tricky. L complexity to figure that out
* references, external links, further reading can be tackled with section alignments machine-learned by Research. M complexity
* infoboxes should be easy - S
* last section with category links is trivial - XS
* tree-type sections look like the upper bound, as I have no idea - XL
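
As a rough illustration of the two cheaper exclusions above (title-based and no textual content), here is a minimal sketch. It is not the pipeline's implementation: the names `EXCLUDED_TITLES`, `has_textual_content`, and `should_generate_topics` are hypothetical, the denylist is English-only (the section alignments from Research would be needed for other languages), and the regex-based wikitext stripping is a crude heuristic that ignores tables, HTML tags, and file links with nested links in their captions.

```python
import re

# Hypothetical English-only denylist taken from the requirements above.
# Equivalents in other languages would have to come from section alignments.
EXCLUDED_TITLES = {"references", "external links", "further reading"}

# Matches a template that contains no nested braces, i.e. an innermost one.
INNERMOST_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")
# Matches self-closing <ref ... /> tags and <ref>...</ref> pairs.
REF = re.compile(r"<ref[^>]*?/>|<ref[^>]*>.*?</ref>", re.DOTALL | re.IGNORECASE)
# Matches file, image, and category links without nested links inside.
FILE_OR_CATEGORY_LINK = re.compile(
    r"\[\[(?:File|Image|Category):[^\[\]]*\]\]", re.IGNORECASE
)


def has_textual_content(wikitext: str) -> bool:
    """Heuristic: True if anything alphanumeric survives markup stripping."""
    text = REF.sub("", wikitext)
    # Strip templates from the inside out until nothing changes,
    # so that nested templates are removed as well.
    previous = None
    while previous != text:
        previous = text
        text = INNERMOST_TEMPLATE.sub("", text)
    text = FILE_OR_CATEGORY_LINK.sub("", text)
    return any(ch.isalnum() for ch in text)


def should_generate_topics(title: str, wikitext: str) -> bool:
    """Apply the title denylist first, then the textual-content check."""
    if title.strip().casefold() in EXCLUDED_TITLES:
        return False
    return has_textual_content(wikitext)
```

For example, `should_generate_topics("References", "{{reflist}}")` is rejected by the title check, while a "Gallery" section holding only a file link and a template is rejected by the content check. Infoboxes and tree-type sections are not covered by this sketch.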