
Review the entire flow of interaction between VisualEditor Citoid extension, the Citoid service and Zotero
Open, Needs Triage, Public, Feature

Description

When an editor enters a URI into the Citoid extension, it is passed to the Citoid service, which performs preliminary checks and then typically hands over to Zotero to download and analyze the resource. However, the details of how this interaction works are complex, and a careful review may let us improve and simplify the experience for both the editor and the resource provider.

Priorities

  • Clean separation of responsibilities between modules
  • Sufficient data flow for detailed error reporting and logging
  • Enough flexibility to ensure a good experience for the editor (e.g. identifying errors and offering suitable fallbacks)
  • Enough flexibility to ensure a good experience for the resource provider (e.g. avoiding multiple downloads and request flooding)
  • Readable code

Some relevant technical details

The typical logic flow for a URL request is:

  1. The editor enters a URL into the VisualEditor Citoid extension.
  2. Citoid's CitoidService.js detects that the resource given is a URL, and calls requestFromURL.
  3. Citoid's requestFromURL calls hostIsAllowed.js to do a DNS check. (This dates back to a 2015 security fix, T98533)
    • If the address is not allowed, then requestFromURL rejects with a CitoidError wrapping an AddressError.
  4. Citoid's requestFromURL then calls unshorten.js to resolve redirects, by making HTTP requests directly to the URL host. (This dates back to 2014, at which time Zotero apparently could not resolve redirects itself)
    • unshorten calls hostIsAllowed again for every redirect, repeating the DNS check at each hop.
    • If unshorten receives any response with a status code >= 400, then requestFromURL rejects with a CitoidError wrapping the underlying HTTP error from preq.
  5. Once redirects are resolved, Zotero is passed the resulting URL.
    • If Zotero receives an error response from the URL host, it responds with a generic 500 status code, so the Citoid service cannot see the details of the original error response. (@zoe has submitted a fix to Zotero to return the original status code)
  6. If Zotero receives a successful response it can parse, the Citoid service passes the result back to the VisualEditor Citoid extension to provide to the editor.
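Steps 3 to 5 can be sketched roughly as follows. This is a hypothetical illustration in Node.js (18 or later, for the global fetch), not Citoid's actual hostIsAllowed.js or unshorten.js: the private-address rules, the names checkHost, resolveRedirects and prepareForZotero, the redirect limit, and the use of fetch instead of preq are all assumptions. The DNS resolver and HTTP client are injectable, so the hop-by-hop sequence of DNS checks and requests can be exercised without network access.

```javascript
'use strict';
// Hypothetical sketch of steps 3-5 (NOT Citoid's hostIsAllowed.js/unshorten.js).
// Requires Node.js 18+ for the global fetch; preq is replaced by fetch here.
const dns = require('node:dns').promises;
const net = require('node:net');

class AddressError extends Error {}
class HttpError extends Error {
  constructor(status, url) {
    super(`HTTP ${status} from ${url}`);
    this.status = status; // keep the upstream status for error reporting
  }
}
// Stand-in for Citoid's CitoidError; the underlying error is kept as `cause`.
class CitoidError extends Error {}

// Rough private/loopback test (assumed rules); a real blocklist covers more ranges.
function isPrivateAddress(address) {
  if (net.isIPv4(address)) {
    const [a, b] = address.split('.').map(Number);
    return a === 0 || a === 10 || a === 127 ||
      (a === 169 && b === 254) ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168);
  }
  const lower = address.toLowerCase();
  return lower === '::1' || lower.startsWith('fc') || lower.startsWith('fd') ||
    lower.startsWith('fe80');
}

// Step 3: resolve the host and refuse it if any of its addresses is private.
async function checkHost(url, lookup) {
  const host = new URL(url).hostname.replace(/^\[|\]$/g, ''); // strip IPv6 brackets
  const records = await lookup(host, { all: true });
  if (records.some((r) => isPrivateAddress(r.address))) {
    throw new AddressError(`${host} resolves to a disallowed address`);
  }
}

// Default HTTP client: one request per hop, redirects are NOT followed automatically.
async function defaultRequest(url) {
  const res = await fetch(url, { redirect: 'manual' });
  await res.body?.cancel(); // only the status and headers are needed
  return res;
}

// Step 4: follow redirects one hop at a time, re-running the DNS check at every hop.
async function resolveRedirects(url, {
  lookup = (host, opts) => dns.lookup(host, opts),
  request = defaultRequest,
  maxRedirects = 5, // assumed limit
} = {}) {
  let current = url;
  for (let hop = 0; hop <= maxRedirects; hop++) {
    await checkHost(current, lookup);
    const res = await request(current);
    if (res.status >= 400) throw new HttpError(res.status, current);
    const location = res.headers.get('location');
    if (res.status < 300 || !location) return current; // no further redirect
    current = new URL(location, current).href; // Location may be relative
  }
  throw new Error(`more than ${maxRedirects} redirects from ${url}`);
}

// Steps 3-5 together: returns the URL that would be handed to Zotero.
async function prepareForZotero(url, deps) {
  try {
    return await resolveRedirects(url, deps);
  } catch (err) {
    throw new CitoidError(`could not prepare ${url}: ${err.message}`, { cause: err });
  }
}
```

Because the original error travels along as cause (including the upstream status on HttpError), a caller can still distinguish an address rejection from, say, a 404 when reporting errors to the editor.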

Event Timeline

Something to note: the preq library automatically retries requests in certain situations, so potentially every preq request counts as two.
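To make the retry point concrete, here is a minimal sketch of a generic retry wrapper that counts how often the upstream host is contacted for one logical request. It is hypothetical: withRetry and countHits are illustrative names, and it does not reproduce preq's actual retry conditions or delays. It only shows how a single retry turns one failing request into two hits on the resource host.

```javascript
'use strict';
// Hypothetical retry wrapper: NOT preq's code, only an illustration of the effect.
async function withRetry(send, { retries = 1, shouldRetry = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      // Give up once the retry budget is spent or the error is not retryable.
      if (attempt >= retries || !shouldRetry(err)) throw err;
    }
  }
}

// Counts how many times the upstream host is hit for one logical request.
async function countHits(failing) {
  let hits = 0;
  const send = async () => {
    hits += 1;
    if (failing) throw new Error('upstream error');
    return 'ok';
  };
  await withRetry(send).catch(() => {}); // the final failure is ignored here
  return hits;
}

countHits(true).then((n) => console.log(`hits for one failing request: ${n}`));
```

Combined with the per-hop requests made by unshorten, this is why a single editor action can reach the resource provider more often than the number of URLs in the redirect chain suggests.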

Also: we could considerably cut down the number of requests by porting this to Zotero, but then we would lose the ability to log, as logging is not enabled for Zotero. And fixing logging in Zotero to make it suitable for our production environment would touch a lot of code. :/

@dchan suggested I pop the details of how I've been tinkering with zotero/citoid in here.

  • For citoid
npm install
node server.js
  • For Zotero
npm install
npm start
  • Then open http://localhost:1970/?doc, or call the API directly, e.g.:
curl -X 'GET' \
  'http://localhost:1970/api?format=mediawiki&search=https%3A%2F%2Fmembers.parliament.uk%2Fmember%2F1524%2Fcontact' \
  -H 'accept: application/json; charset=utf-8;'
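The same API call can be made from Node.js (18 or later, for the global fetch). This is a minimal sketch assuming citoid is listening on localhost:1970 as above; citoidApiUrl and fetchCitation are hypothetical helper names. It mainly shows that the target URL has to be percent-encoded into the search parameter, exactly as in the curl command.

```javascript
'use strict';
// Hypothetical helpers mirroring the curl example. Assumes Node.js 18+ for fetch.
const CITOID_BASE = 'http://localhost:1970'; // local citoid started with `node server.js`

function citoidApiUrl(search, format = 'mediawiki') {
  const url = new URL('/api', CITOID_BASE);
  // URLSearchParams percent-encodes ':' and '/' in the target URL, as in the curl example.
  url.search = new URLSearchParams({ format, search }).toString();
  return url.href;
}

async function fetchCitation(search, format) {
  const res = await fetch(citoidApiUrl(search, format), {
    headers: { accept: 'application/json; charset=utf-8;' },
  });
  if (!res.ok) throw new Error(`citoid responded with ${res.status}`);
  return res.json();
}

console.log(citoidApiUrl('https://members.parliament.uk/member/1524/contact'));
```

For example, fetchCitation('https://members.parliament.uk/member/1524/contact') should resolve to the same JSON as the curl command, provided both citoid and Zotero are running.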

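In addition, it's possible to see the full request flow by adding OpenTelemetry instrumentation to the mix (this needs the @opentelemetry/auto-instrumentations-node package installed in both the citoid and Zotero checkouts):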

  • Jaeger
docker run --rm --name jaeger \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:1.60
  • Citoid
env OTEL_NODE_DISABLED_INSTRUMENTATIONS=fs OTEL_SERVICE_NAME=citoid OTEL_TRACES_EXPORTER=otlp OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4318/v1/traces node --require @opentelemetry/auto-instrumentations-node/register server.js
  • Zotero
env OTEL_NODE_DISABLED_INSTRUMENTATIONS=fs OTEL_SERVICE_NAME=zotero OTEL_TRACES_EXPORTER=otlp OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4318/v1/traces node --require @opentelemetry/auto-instrumentations-node/register src/server.js

and then visiting http://localhost:16686/search to see the traces
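If restarting both services with these variables gets repetitive, they can be launched from a small Node.js script. This is a hypothetical helper (otelEnv and launchTraced are illustrative names): it only reuses the environment variables, the --require flag and the OTLP endpoint from the commands above, and assumes @opentelemetry/auto-instrumentations-node is installed in each checkout.

```javascript
'use strict';
// Hypothetical helper: starts citoid or Zotero with the same OpenTelemetry
// settings as the env commands above, so both report to the local Jaeger.
const { spawn } = require('node:child_process');

const OTLP_TRACES_ENDPOINT = 'http://localhost:4318/v1/traces'; // Jaeger's OTLP/HTTP port

function otelEnv(serviceName, base = process.env) {
  return {
    ...base, // keep PATH etc. from the parent environment
    OTEL_NODE_DISABLED_INSTRUMENTATIONS: 'fs',
    OTEL_SERVICE_NAME: serviceName,
    OTEL_TRACES_EXPORTER: 'otlp',
    OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: OTLP_TRACES_ENDPOINT,
  };
}

// cwd is the local checkout; entry is server.js (citoid) or src/server.js (Zotero).
function launchTraced(serviceName, cwd, entry) {
  return spawn(
    process.execPath, // the same node binary that runs this script
    ['--require', '@opentelemetry/auto-instrumentations-node/register', entry],
    { cwd, env: otelEnv(serviceName), stdio: 'inherit' },
  );
}
```

For example, launchTraced('citoid', '/path/to/citoid', 'server.js') and launchTraced('zotero', '/path/to/zotero', 'src/server.js'), where the paths are placeholders for your local checkouts; both services then appear as separate entries in the Jaeger search view.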