Page MenuHomePhabricator

Date extraction fails on Jessie && Node 4.2
Closed, ResolvedPublic

Description

One test in the Citoid test suite is failing on Debian Jessie running on Node 4.2:

Expected:
"2004-05-01"
Result:
"2004-01-01"

  1) scraping  using native scraper Article with doi and DublinCore metadata:

      AssertionError: false == true
      + expected - actual

      -false
      +true
      
      at Object.deepEqual (test/utils/assert.js:50:20)
      at Request.<anonymous> (test/features/scraping/index.js:354:24)
      at Request.tryCatcher (node_modules/bluebird/js/main/util.js:24:31)
      at Promise._settlePromiseFromHandler (node_modules/bluebird/js/main/promise.js:582:31)
      at Promise._settlePromiseAt (node_modules/bluebird/js/main/promise.js:727:18)
      at Promise._settlePromises (node_modules/bluebird/js/main/promise.js:845:14)
      at Async._drainQueue (node_modules/bluebird/js/main/async.js:79:16)
      at Async._drainQueues (node_modules/bluebird/js/main/async.js:89:10)
      at Async.drainQueues (node_modules/bluebird/js/main/async.js:14:14)

The test is seemingly successful on Trusty with Node 4.2. We need to investigate why is that happening.

Event Timeline

mobrovac raised the priority of this task from to High.
mobrovac updated the task description. (Show Details)
mobrovac added a project: Citoid.
mobrovac added subscribers: mobrovac, Mvolz.

Interestingly, fixDate() receives the string 2004/05/01 in Trusty, but on Jessie for the same citation only 2014 is received.

Inserting a console.trace() in fixDate() produces the following stack on Trusty:

at fixDate (/home/doorman/code/wikimiki/citoid/lib/Exporter.js:488:17)
at Object.translateProp [as translate] (/home/doorman/code/wikimiki/citoid/lib/translators/util/index.js:22:11)
at /home/doorman/code/wikimiki/citoid/lib/Scraper.js:171:25
at Array.forEach (native)
at translate (/home/doorman/code/wikimiki/citoid/lib/Scraper.js:167:24)
at doSyncMethods (/home/doorman/code/wikimiki/citoid/lib/Scraper.js:255:10)
at Request.<anonymous> (/home/doorman/code/wikimiki/citoid/lib/Scraper.js:285:11)
at Request.tryCatcher (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/util.js:24:31)
at Promise._settlePromiseFromHandler (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/promise.js:582:31)
at Promise._settlePromiseAt (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/promise.js:727:18)
at Async._drainQueue (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/async.js:84:12)
at Async._drainQueues (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/async.js:89:10)
at Async.drainQueues (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/async.js:14:14)
at doNTCallback0 (node.js:417:9)
at process._tickCallback (node.js:346:13)

But, on Jessie it yields:

at fixDate (/srv/citoid/lib/Exporter.js:488:17)
at /srv/citoid/lib/Exporter.js:430:16
at tryCatcher (/srv/citoid/node_modules/bluebird/js/main/util.js:24:31)
at /srv/citoid/node_modules/bluebird/js/main/promise.js:186:34
at Request.<anonymous> (/srv/citoid/lib/ZoteroService.js:99:12)
at Request.tryCatcher (/srv/citoid/node_modules/bluebird/js/main/util.js:24:31)
at Promise._settlePromiseFromHandler (/srv/citoid/node_modules/bluebird/js/main/promise.js:582:31)
at Promise._settlePromiseAt (/srv/citoid/node_modules/bluebird/js/main/promise.js:727:18)
at Promise._settlePromises (/srv/citoid/node_modules/bluebird/js/main/promise.js:845:14)
at Async._drainQueue (/srv/citoid/node_modules/bluebird/js/main/async.js:79:16)
at Async._drainQueues (/srv/citoid/node_modules/bluebird/js/main/async.js:89:10)
at Async.drainQueues (/srv/citoid/node_modules/bluebird/js/main/async.js:14:14)
at doNTCallback0 (node.js:419:9)
at process._tickCallback (node.js:348:13)

So it seems like different code paths are taken.

Without testing anything, my analysis:

2004 implies the date is coming from crossRef because they only give us year.

Full date implies the date is coming from the dublinCore data.

We do the crossRef data first though, so we get the bad date preferentially. But line 336 should be deleting the translator for the date as a hack for the fact that the date is bad from crossRef.

Wish I could test this, but what I'd suggest is to see if the citation has the data after the crossRef data has been added (either 336 or in Scraper.prototype.parseHTML) and if it does, we have a problem with modifying the js translator object.

Could be totally wrong, just an idea.

So, my local version (Trusty, Node 4.2) fails to get the data from Zotero and resorts to the native scraper, as seen in this trace:

[2016-01-20T01:37:34.082Z] TRACE: citoid/7943 on wiki-istria: incoming request (levelPath=trace/req, request_id=60ea1f11-bf16-11e5-909c-a0d6d74b8912, req.remoteAddress=127.0.0.1, req.remotePort=37335)
    GET /api?format=mediawiki&search=http%3A%2F%2Fmic.sgmjournals.org%2Fcontent%2Fjournal%2Fmicro%2F10.1099%2Fmic.0.26954-0 HTTP/1.1
    content-length: 0
    x-request-id: 60ea1f11-bf16-11e5-909c-a0d6d74b8912
    --
    req.params: {
      "0": "/api"
    }
    --
    req.query: {
      "format": "mediawiki",
      "search": "http://mic.sgmjournals.org/content/journal/micro/10.1099/mic.0.26954-0"
    }
[2016-01-20T01:37:34.086Z] TRACE: test-log/7943 on wiki-istria: Resolving hostname mic.sgmjournals.org
[2016-01-20T01:37:34.092Z] TRACE: test-log/7943 on wiki-istria: lookupAsync ran: 93.91.26.109
[2016-01-20T01:37:34.094Z] DEBUG: test-log/7943 on wiki-istria: Resolved mic.sgmjournals.org to 93.91.26.109
[2016-01-20T01:37:34.095Z] DEBUG: test-log/7943 on wiki-istria: mic.sgmjournals.org is public, and is allowed
[2016-01-20T01:37:34.105Z] DEBUG: test-log/7943 on wiki-istria: Zotero request made for: http://mic.sgmjournals.org/content/journal/micro/10.1099/mic.0.26954-0
[2016-01-20T01:37:34.105Z] TRACE: test-log/7943 on wiki-istria: No Zotero translator found, looking for redirects
[2016-01-20T01:37:34.106Z] TRACE: test-log/7943 on wiki-istria: Resolving hostname mic.sgmjournals.org
[2016-01-20T01:37:34.107Z] TRACE: test-log/7943 on wiki-istria: lookupAsync ran: 93.91.26.109
[2016-01-20T01:37:34.107Z] DEBUG: test-log/7943 on wiki-istria: Resolved mic.sgmjournals.org to 93.91.26.109
[2016-01-20T01:37:34.108Z] DEBUG: test-log/7943 on wiki-istria: mic.sgmjournals.org is public, and is allowed
[2016-01-20T01:37:34.539Z] TRACE: test-log/7943 on wiki-istria: Resolving hostname mic.microbiologyresearch.org
[2016-01-20T01:37:34.540Z] TRACE: test-log/7943 on wiki-istria: lookupAsync ran: 93.91.26.109
[2016-01-20T01:37:34.540Z] DEBUG: test-log/7943 on wiki-istria: Resolved mic.microbiologyresearch.org to 93.91.26.109
[2016-01-20T01:37:34.541Z] DEBUG: test-log/7943 on wiki-istria: mic.microbiologyresearch.org is public, and is allowed
[2016-01-20T01:37:35.964Z] TRACE: test-log/7943 on wiki-istria: Redirect detected to http://mic.microbiologyresearch.org/content/journal/micro/10.1099/mic.0.26954-0
[2016-01-20T01:37:35.971Z] DEBUG: test-log/7943 on wiki-istria: Zotero request made for: http://mic.microbiologyresearch.org/content/journal/micro/10.1099/mic.0.26954-0
[2016-01-20T01:37:35.972Z] DEBUG: test-log/7943 on wiki-istria: No Zotero response available.
[2016-01-20T01:37:35.973Z] DEBUG: test-log/7943 on wiki-istria: Using native scraper on http://mic.sgmjournals.org/content/journal/micro/10.1099/mic.0.26954-0
Trace: 

DATE: 2004/05/01
    at fixDate (/home/doorman/code/wikimiki/citoid/lib/Exporter.js:488:17)
    at Object.translateProp [as translate] (/home/doorman/code/wikimiki/citoid/lib/translators/util/index.js:22:11)
    at /home/doorman/code/wikimiki/citoid/lib/Scraper.js:172:25
    at Array.forEach (native)
    at translate (/home/doorman/code/wikimiki/citoid/lib/Scraper.js:168:24)
    at doSyncMethods (/home/doorman/code/wikimiki/citoid/lib/Scraper.js:256:10)
    at Request.<anonymous> (/home/doorman/code/wikimiki/citoid/lib/Scraper.js:286:11)
    at Request.tryCatcher (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/util.js:24:31)
    at Promise._settlePromiseFromHandler (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/promise.js:582:31)
    at Promise._settlePromiseAt (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/promise.js:727:18)
    at Async._drainQueue (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/async.js:84:12)
    at Async._drainQueues (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/async.js:89:10)
    at Async.drainQueues (/home/doorman/code/wikimiki/citoid/node_modules/bluebird/js/main/async.js:14:14)
    at doNTCallback0 (node.js:417:9)
    at process._tickCallback (node.js:346:13)
   #: 2004-05-01
[2016-01-20T01:37:37.705Z] DEBUG: test-log/7943 on wiki-istria: Sucessfully scraped resource at http://mic.sgmjournals.org/content/journal/micro/10.1099/mic.0.26954-0
[2016-01-20T01:37:37.981Z] DEBUG: test-log/7943 on wiki-istria: PubMed query made for: http://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=citoid&email=citoid@mediawiki&format=json&ids=10.1099%2Fmic.0.26954-0

On the other hand, the Labs instance is quite happy with what it's getting from Zotero:

[2016-01-20T01:37:16.822Z] TRACE: citoid/24723 on citoid-jessie-test: incoming request (levelPath=trace/req, request_id=569fd811-bf16-11e5-8953-9a9344c3c42f, req.remoteAddress=127.0.0.1, req.remotePort=44498)
    GET /api?format=mediawiki&search=http%3A%2F%2Fmic.sgmjournals.org%2Fcontent%2Fjournal%2Fmicro%2F10.1099%2Fmic.0.26954-0 HTTP/1.1
    content-length: 0
    x-request-id: 569fd811-bf16-11e5-8953-9a9344c3c42f
    --
    req.params: {
      "0": "/api"
    }
    --
    req.query: {
      "format": "mediawiki",
      "search": "http://mic.sgmjournals.org/content/journal/micro/10.1099/mic.0.26954-0"
    }
[2016-01-20T01:37:16.838Z] TRACE: test-log/24723 on citoid-jessie-test: Resolving hostname mic.sgmjournals.org
[2016-01-20T01:37:16.908Z] TRACE: test-log/24723 on citoid-jessie-test: lookupAsync ran: 93.91.26.97
[2016-01-20T01:37:16.912Z] DEBUG: test-log/24723 on citoid-jessie-test: Resolved mic.sgmjournals.org to 93.91.26.97
[2016-01-20T01:37:16.914Z] DEBUG: test-log/24723 on citoid-jessie-test: mic.sgmjournals.org is public, and is allowed
[2016-01-20T01:37:16.982Z] DEBUG: test-log/24723 on citoid-jessie-test: Zotero request made for: http://mic.sgmjournals.org/content/journal/micro/10.1099/mic.0.26954-0
[2016-01-20T01:37:16.984Z] TRACE: test-log/24723 on citoid-jessie-test: No Zotero translator found, looking for redirects
[2016-01-20T01:37:16.985Z] TRACE: test-log/24723 on citoid-jessie-test: Resolving hostname mic.sgmjournals.org
[2016-01-20T01:37:16.986Z] TRACE: test-log/24723 on citoid-jessie-test: lookupAsync ran: 93.91.26.97
[2016-01-20T01:37:16.989Z] DEBUG: test-log/24723 on citoid-jessie-test: Resolved mic.sgmjournals.org to 93.91.26.97
[2016-01-20T01:37:16.990Z] DEBUG: test-log/24723 on citoid-jessie-test: mic.sgmjournals.org is public, and is allowed
[2016-01-20T01:37:17.194Z] TRACE: test-log/24723 on citoid-jessie-test: Resolving hostname mic.microbiologyresearch.org
[2016-01-20T01:37:17.265Z] TRACE: test-log/24723 on citoid-jessie-test: lookupAsync ran: 93.91.26.97
[2016-01-20T01:37:17.269Z] DEBUG: test-log/24723 on citoid-jessie-test: Resolved mic.microbiologyresearch.org to 93.91.26.97
[2016-01-20T01:37:17.271Z] DEBUG: test-log/24723 on citoid-jessie-test: mic.microbiologyresearch.org is public, and is allowed
[2016-01-20T01:37:18.528Z] TRACE: test-log/24723 on citoid-jessie-test: Redirect detected to http://mic.microbiologyresearch.org/content/journal/micro/10.1099/mic.0.26954-0
[2016-01-20T01:37:19.934Z] DEBUG: test-log/24723 on citoid-jessie-test: Zotero request made for: http://mic.microbiologyresearch.org/content/journal/micro/10.1099/mic.0.26954-0
Trace: 
DATE: 2004

    at fixDate (/srv/citoid/lib/Exporter.js:488:17)
    at /srv/citoid/lib/Exporter.js:430:16
    at tryCatcher (/srv/citoid/node_modules/bluebird/js/main/util.js:24:31)
    at /srv/citoid/node_modules/bluebird/js/main/promise.js:186:34
    at Request.<anonymous> (/srv/citoid/lib/ZoteroService.js:99:12)
    at Request.tryCatcher (/srv/citoid/node_modules/bluebird/js/main/util.js:24:31)
    at Promise._settlePromiseFromHandler (/srv/citoid/node_modules/bluebird/js/main/promise.js:582:31)
    at Promise._settlePromiseAt (/srv/citoid/node_modules/bluebird/js/main/promise.js:727:18)
    at Promise._settlePromises (/srv/citoid/node_modules/bluebird/js/main/promise.js:845:14)
    at Async._drainQueue (/srv/citoid/node_modules/bluebird/js/main/async.js:79:16)
    at Async._drainQueues (/srv/citoid/node_modules/bluebird/js/main/async.js:89:10)
    at Async.drainQueues (/srv/citoid/node_modules/bluebird/js/main/async.js:14:14)
    at doNTCallback0 (node.js:419:9)
    at process._tickCallback (node.js:348:13)
   #: 2004-01-01

[2016-01-20T01:37:19.949Z] DEBUG: test-log/24723 on citoid-jessie-test: Successfully retrieved body from Zotero
[2016-01-20T01:37:19.993Z] DEBUG: test-log/24723 on citoid-jessie-test: PubMed query made for: http://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=citoid&email=citoid@mediawiki&format=json&ids=10.1099%2Fmic.0.26954-0

Ok, nevermind. It was the fault of Zotero, naturally :(

mobrovac claimed this task.