
Strip trailing slashes in url field
Open, Low · Public · 0 Estimated Story Points

Description

URLs sent to the service sometimes come back with a trailing / and sometimes without.

Reproduce --

  1. Start VE
  2. Open Citoid inspector
  3. Type the string 'amazon.com' in the lookup input
  4. Click 'lookup'
  5. Examine the network response
  6. Repeat 2-3 times

Result -- the response sometimes shows the url with a trailing slash (http://amazon.com/) and sometimes without (http://amazon.com)

These are two instances of the object stored in the document store for the same reference node, created by Citoid for the query "amazon.com". One url has a trailing / and the other doesn't, so the second time the store treated the reference as new instead of pulling it from the cache.

[{"mw":{"parts":[{"template":{"params":{"accessdate":{"wt":"2015-03-19"},"title":{"wt":"Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more"},"url":{"wt":"http://amazon.com"}},"target":{"href":"Template:Cite web","wt":"Cite web"}}}]},"type":"mwTransclusionBlock"},null]

vs

[{"mw":{"parts":[{"template":{"params":{"accessdate":{"wt":"2015-03-19"},"title":{"wt":"Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more"},"url":{"wt":"http://amazon.com/"}},"target":{"href":"Template:Cite web","wt":"Cite web"}}}]},"type":"mwTransclusionBlock"},null]
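A minimal sketch of how stripping the trailing slash before using the url as a lookup key would make the two objects above resolve to the same entry. This is an illustration in Python, not Citoid's or the document store's actual code; `strip_trailing_slash`, `store_reference` and the `cache` dict are hypothetical names. It also assumes that dropping the slash is acceptable for every path, which is not guaranteed for all servers.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_slash(url: str) -> str:
    """Return url with trailing slashes removed from its path.

    The query string and fragment are left untouched.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# A plain dict stands in for the document store (assumption for illustration).
cache = {}


def store_reference(url: str, reference: dict) -> bool:
    """Store reference under the normalized url.

    Returns True if the entry was new, False if it was already cached.
    """
    key = strip_trailing_slash(url)
    is_new = key not in cache
    cache.setdefault(key, reference)
    return is_new
```

With this normalization, storing "http://amazon.com" and then "http://amazon.com/" produces one cache entry, and the second call reports a hit rather than a new object.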

Event Timeline

Mooeypoo set the priority of this task to Needs Triage.
Mooeypoo updated the task description.
Mooeypoo added a project: Citoid.
Mooeypoo subscribed.
Mooeypoo set Security to None.

Aha, you found a Zotero instability issue. What's happening is that sometimes Zotero translates this url successfully, and sometimes it gives me a 500, and then the url gets kicked to the native scraper, which uses the url without the slash.
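The fallback described above could be sketched roughly as follows, with the url field normalized after either path so the result no longer depends on which backend answered. This is a hedged Python illustration, not the service's real code: `lookup`, `TranslationError`, and the stub backends `zotero_ok`, `zotero_500` and `native` are all hypothetical, and the stubs simply return the urls seen in this task.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit, urlunsplit

Citation = Dict[str, str]
Backend = Callable[[str], Citation]


class TranslationError(Exception):
    """Hypothetical error for a failed translation, e.g. an HTTP 500."""


def strip_trailing_slash(url: str) -> str:
    """Remove trailing slashes from the path, keeping query and fragment."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path.rstrip("/"), parts.query, parts.fragment)
    )


def lookup(url: str, zotero: Backend, native_scraper: Backend) -> Citation:
    """Try Zotero first; on failure fall back to the native scraper."""
    try:
        citation = dict(zotero(url))
    except TranslationError:
        citation = dict(native_scraper(url))
    # Normalize regardless of which backend produced the citation.
    citation["url"] = strip_trailing_slash(citation["url"])
    return citation


# Stub backends that mimic the two behaviours reported in this task.
def zotero_ok(url: str) -> Citation:
    return {"url": "http://amazon.com/"}


def zotero_500(url: str) -> Citation:
    raise TranslationError("500 Internal Server Error")


def native(url: str) -> Citation:
    return {"url": "http://amazon.com"}
```

Calling `lookup("amazon.com", zotero_ok, native)` and `lookup("amazon.com", zotero_500, native)` then yields the same url value in both cases.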

Example of success and failure. Success:

zotero(5)(+0021483): POST /web HTTP/1.1
host: localhost:1969
accept: application/json
content-type: application/json
content-length: 82
Connection: keep-alive



zotero(3)(+0000001): Translators: Looking for translators for http://amazon.com/

zotero(3)(+0000000): Loading http://amazon.com/

zotero(5)(+0000001): CookieSandbox: Managing cookies for amazon.com

zotero(5)(+0000000): CookieSandbox: Added cookies for request to amazon.com

zotero(5)(+0000238): CookieSandbox: Managing cookies for amazon.com

zotero(5)(+0000000): CookieSandbox: Managing cookies for www.amazon.com

zotero(5)(+0000000): CookieSandbox: Cleared cookies to be sent to www.amazon.com

zotero(5)(+0000157): CookieSandbox: Managing cookies for www.amazon.com

zotero(5)(+0000000): CookieSandbox: Rejected cookies from www.amazon.com

zotero(3)(+0001494): Translators: Looking for translators for http://amazon.com/

zotero(4)(+0000001): Translate: Binding sandbox to http://amazon.com/

zotero(4)(+0000003): Translate: Parsing code for Amazon

zotero(4)(+0000017): Translate: Parsing code for unAPI

zotero(4)(+0000006): Translate: Parsing code for COinS

zotero(4)(+0000008): Translate: Parsing code for DOI

zotero(4)(+0000009): Translate: Parsing code for Embedded Metadata

zotero(3)(+0000010): Translate: Embedded Metadata: found 12 meta tags.

zotero(3)(+0000010): Translate: Creating translate instance of type import in sandbox

zotero(4)(+0000000): Translate: Binding sandbox to http://amazon.com/

zotero(4)(+0000002): Translate: Parsing code for RDF

zotero(3)(+0000005): Translate: Initializing RDF data store

zotero(3)(+0000003): Translate: All translator detect calls and RPC calls complete

zotero(5)(+0000000): Translate: Running handler 0 for translators

zotero(4)(+0000000): Translate: Parsing code for Embedded Metadata

zotero(3)(+0000005): Translate: Beginning translation with Embedded Metadata

zotero(3)(+0000007): Translate: Embedded Metadata: found 12 meta tags.

zotero(3)(+0000008): Translate: Creating translate instance of type import in sandbox

zotero(4)(+0000000): Translate: Binding sandbox to http://amazon.com/

zotero(4)(+0000003): Translate: Parsing code for RDF

zotero(3)(+0000004): Translate: Initializing RDF data store

zotero(5)(+0000033): Translate: Running handler 0 for itemDone

zotero(3)(+0000016): Translate: Title was not found in meta tags. Using document title as title

zotero(3)(+0000000): Translate: Looking for authors in byline, vcard

zotero(3)(+0000001): Translate: Found 1 elements with 'byline' class

zotero(3)(+0000001): Translate: Found 0 elements with 'vcard' class

zotero(3)(+0000001): Translate: Extracting author(s) from byline: 

zotero(3)(+0000002): Translate: Translation successful

zotero(5)(+0000000): Translate: Running handler 0 for done

zotero(3)(+0000001): itemToServerJSON: Discarded field libraryCatalog: field not valid for type webpage

zotero(5)(+0000000): HTTP/1.0 200 OK
Content-Type: application/json

[[{"itemKey":"SK6JPMSV","itemVersion":0,"itemType":"webpage","creators":[],"tags":[],"abstractNote":"Online shopping from the earth's biggest selection of books, magazines, music, DVDs, videos, electronics, computers, software, apparel & accessories, shoes, jewelry, tools & hardware, housewares, furniture, sporting goods, beauty & personal care, broadband & dsl, gourmet food & just about anything else.","url":"http://amazon.com/","title":"Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more","accessDate":"CURRENT_TIMESTAMP","shortTitle":"Amazon.com"}]]

Failure:

zotero(5)(+0006672): POST /web HTTP/1.1
host: localhost:1969
accept: application/json
content-type: application/json
content-length: 82
Connection: keep-alive



zotero(3)(+0000000): Translators: Looking for translators for http://amazon.com/

zotero(3)(+0000001): Loading http://amazon.com/

zotero(5)(+0000000): CookieSandbox: Managing cookies for amazon.com

zotero(5)(+0000000): CookieSandbox: Added cookies for request to amazon.com

zotero(5)(+0000217): CookieSandbox: Managing cookies for amazon.com

zotero(5)(+0000000): CookieSandbox: Managing cookies for www.amazon.com

zotero(5)(+0000000): CookieSandbox: Cleared cookies to be sent to www.amazon.com

zotero(5)(+0000144): CookieSandbox: Managing cookies for www.amazon.com

zotero(5)(+0000001): CookieSandbox: Rejected cookies from www.amazon.com

zotero(3)(+0001720): Translators: Looking for translators for http://amazon.com/

zotero(4)(+0000000): Translate: Binding sandbox to http://amazon.com/

zotero(4)(+0000002): Translate: Parsing code for Amazon

zotero(3)(+0000050): Translate: All translator detect calls and RPC calls complete

zotero(5)(+0000000): Translate: Running handler 0 for translators

zotero(4)(+0000000): Translate: Parsing code for Amazon

zotero(3)(+0000003): Translate: Beginning translation with Amazon

zotero(3)(+0000006): Translate: Scraping from Page

zotero(3)(+0000002): invalid 'in' operand elements at chrome://translation-server/content/utilities.js:1039

zotero(2)(+0000000): Translate: Translation using Amazon failed: 
string => TypeError: invalid 'in' operand elements
stack => Zotero.Utilities.xpath@chrome://translation-server/content/utilities.js:1039
Zotero.Utilities.xpathText@chrome://translation-server/content/utilities.js:1110
Zotero.Translate.SandboxManager.prototype.importObject/attachTo[localKey]@chrome://translation-server/content/translation/translate_firefox.js:476
scrape@Amazon:186
doWeb@Amazon:90
Zotero.Translate.Base.prototype._translateTranslatorLoaded@chrome://translation-server/content/translation/translate.js:1156
Zotero.Translate.Web.prototype._translateTranslatorLoaded@chrome://translation-server/content/translation/translate.js:1724
Zotero.Translate.Base.prototype.translate/<@chrome://translation-server/content/translation/translate.js:1123
Zotero.Translate.Base.prototype._loadTranslator@chrome://translation-server/content/translation/translate.js:1479
Zotero.Translate.Base.prototype.translate@chrome://translation-server/content/translation/translate.js:1123
Zotero.Translate.Web.prototype.translate@chrome://translation-server/content/translation/translate.js:1715
Zotero.Server.Translation.Web.prototype.translators@chrome://translation-server/content/server_translation.js:250
Zotero.Translate.Base.prototype._runHandler@chrome://translation-server/content/translation/translate.js:966
Zotero.Translate.Base.prototype._detectTranslatorsCollected@chrome://translation-server/content/translation/translate.js:1445
Zotero.Translate.Base.prototype.complete@chrome://translation-server/content/translation/translate.js:1287
Zotero.Translate.Web.prototype.complete@chrome://translation-server/content/translation/translate.js:1846
Zotero.Translate.Base.prototype.decrementAsyncProcesses@chrome://translation-server/content/translation/translate.js:937
Zotero.Translate.Base.prototype._detectTranslatorLoaded@chrome://translation-server/content/translation/translate.js:1436
Zotero.Translate.Base.prototype._detect/<@chrome://translation-server/content/translation/translate.js:1417
Zotero.Translate.Base.prototype._loadTranslator@chrome://translation-server/content/translation/translate.js:1479
Zotero.Translate.Base.prototype._detect@chrome://translation-server/content/translation/translate.js:1417
Zotero.Translate.Base.prototype._getTranslatorsTranslatorsReceived@chrome://translation-server/content/translation/translate.js:1072
Zotero.Translate.Web.prototype._getTranslatorsGetPotentialTranslators/<@chrome://translation-server/content/translation/translate.js:1676
Zotero.Translators.CodeGetter.prototype.getCodeFor@chrome://translation-server/content/connector/translator.js:351
Zotero.Translators.CodeGetter@chrome://translation-server/content/connector/translator.js:343
Zotero.Translators</this.getWebTranslatorsForLocation@chrome://translation-server/content/connector/translator.js:219
Zotero.Translate.Web.prototype._getTranslatorsGetPotentialTranslators@chrome://translation-server/content/translation/translate.js:1673
Zotero.Translate.Base.prototype.getTranslators@chrome://translation-server/content/translation/translate.js:1025
Zotero.Server.Translation.Web.prototype.init/</<@chrome://translation-server/content/server_translation.js:211
Zotero.HTTP.processDocuments/onLoad@chrome://translation-server/content/hacks.js:126

url => http://amazon.com/
downloadAssociatedFiles => undefined
automaticSnapshots => undefined

zotero(5)(+0000001): Translate: Running handler 0 for done

zotero(5)(+0000000): HTTP/1.0 500 Internal Server Error
Content-Type: text/plain

An error occurred during translation. Please check translation with Zotero client.

@Mooeypoo, were you using the server in production or on localhost?

Mvolz renamed this task from "Citoid's formatting of URLs should be consistent" to "Strip trailing urls in url field". Mar 21 2015, 12:30 PM
Mvolz triaged this task as Medium priority.
Mvolz renamed this task from "Strip trailing urls in url field" to "Strip trailing slashes in url field". Mar 21 2015, 12:46 PM
Mvolz lowered the priority of this task from Medium to Low. Mar 27 2015, 1:39 PM
Mvolz removed Mvolz as the assignee of this task. Aug 5 2015, 8:18 AM
Restricted Application added a subscriber: TerraCodes.