Now that T145231: TextExtracts exception on very long repetitive content has been fixed, the TextExtracts API can respond to requests for large numbers of characters without falling over. This is great (!), but we should define and enforce limits on the number of characters and sentences a client can request, in order to limit memory consumption.
- Define sensible limits for the number of characters/sentences that can be requested.
- If a client request exceeds those limits, then:
  - Limit the extract, e.g. return 1050 characters of a 3000-character extract.
  - Add a warning to the output notifying the client that the extract was limited.
Characters: 1050 – Page Previews, for example, requests 525.
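
The behaviour described above could be sketched roughly as follows. This is an illustration in Python and not the extension's actual code: the names `MAX_CHARS`, `clamp_request` and `build_extract`, the response shape, and the warning wording are all hypothetical. Only the 1050-character limit comes from this task.

```python
# Hypothetical sketch of "clamp and warn"; not the TextExtracts implementation.
MAX_CHARS = 1050  # character limit proposed in this task


def clamp_request(requested, limit=MAX_CHARS, name="exchars"):
    """Return (effective_value, warnings) for a requested extract size.

    `name` is only used to label the warning message.
    """
    warnings = []
    if requested > limit:
        # Tell the client what was asked for and what they actually get.
        warnings.append(
            f"{name} was set to {requested}, which exceeds the limit; "
            f"limited to {limit}."
        )
        requested = limit
    return requested, warnings


def build_extract(text, requested_chars):
    """Build a response dict with the (possibly limited) extract."""
    chars, warnings = clamp_request(requested_chars)
    response = {"extract": text[:chars]}
    if warnings:
        # Assumed response shape: warnings are attached only when present.
        response["warnings"] = warnings
    return response
```

With this sketch, a request for 3000 characters would yield a 1050-character extract plus one warning, while a request for 525 characters (as Page Previews makes) would pass through unchanged with no warning. A sentence limit could reuse `clamp_request` with a different `limit` and `name` once a value is agreed.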