Please do a security review of Wikibase-Quality-Constraints
Thank you very much!
= Project =
* wikibase/wikibase, wikibase/data-model, wikibase/quality
= Review =
**Did we work with the team during the design phase?** No
== Architecture issues
* (T101303) The update process is a manual ETL, and has to be run by a deployer for each data update. It's not clear why the database and tools need to live on the cluster.
* The design behind FormatChecker makes the safety of the processing rely on the CSV not being hostile. Unless there's a way to indicate what needs to be in the regex and you build a known good regex, I'm not sure that check can be run in production. Also noted below as T101467.
* (T101305) Escape SQL input in the functions that interact with the DB
* SpecialConstraintReport should use an HTMLForm, which will add csrf protection
* (T101306) Use escaped() instead of text() when inserting messages into raw html
* (T101308) Escape $entityId->getSerialization()
* Document which function parameters are expected to be sanitized HTML, and when functions return HTML. If the safety of the returned HTML relies on an input parameter being properly sanitized, document that assumption.
* (T101468 / T101469) CommonsLinkChecker: proxying user-controlled values is dangerous-- need to url encode any user controlled values, and the url needs to be http**s**://. Need to check with ops that the url is accessible, I think you need to use the outbound proxy to access commons like that. Also, get_headers doesn't validate tls certificate, you need to use something that does.
* UniqueValueChecker - if this isn't working, the class should be removed.
* (T101467) FormatChecker executes a regex from the Contstraint table-- stored regex DoS.
* The result messages are all hardcoded English-- those should be translatable.
* The performance characteristics of this do not look good; you may want to work with @aaron to find better ways to do some of these things