
The GenerateFileList function in data/site-stats.js seems to be breaking
Open, Needs Triage, Public

Description

In this patch (https://gerrit.wikimedia.org/r/c/wikimedia/portals/+/1008095) we migrated away from the npm package moment. Now it seems to be breaking, with the error logs shown below:

[07:47:52] Using gulpfile ~/Desktop/portals/gulpfile.js
[07:47:52] Starting 'update-stats'...
TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-000000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-000000

TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-010000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-010000

TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-020000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-020000

TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-030000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-030000

TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-040000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-040000

TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-050000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-050000

TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-060000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-060000

TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-070000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-070000

TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-080000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-080000

TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-090000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-090000

TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-100000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-100000

TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-110000
Unhandled rejection TypeError: fetch failed requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-03/projectviews-20240312-110000

I don't think that patch is the cause, as the URL seems to be valid. I also reverted the above-mentioned patch, and the error still persists.

Event Timeline

Punith.nyk updated the task description.
Reedy renamed this task from The GenerateFileList function in data/site-stats.js seems to be braking to The GenerateFileList function in data/site-stats.js seems to be breaking. Mar 13 2024, 2:54 AM
Reedy updated the task description.

I tried a random URL from your stack trace, and it opens correctly in my browser, so it doesn’t sound like 64f44423eedd91160c727f8642c8ebce0dabc3e9 would be the culprit. 6fe410491f1e1f285ef48e7855a984e8d0345674, however, could cause issues. fetch is experimental in Node.js 18 LTS and 20 LTS, becoming stable only in Node.js 21 (non-LTS) and, consequently, in 22 LTS (to be released next month). Which Node.js version do you use?
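To answer the version question quickly, a small check along these lines could be run in the same environment as the gulp task. It is only a sketch: the function name describeFetchSupport is made up for illustration and is not part of the portals repository.

```javascript
// Hypothetical helper (not from the portals repo): reports the running
// Node.js major version and whether a global fetch() is available.
function describeFetchSupport() {
  // process.versions.node is a version string such as "21.6.1".
  var major = parseInt( process.versions.node.split( '.' )[ 0 ], 10 );
  return {
    nodeMajor: major,
    hasGlobalFetch: typeof globalThis.fetch === 'function'
  };
}

console.log( describeFetchSupport() );
```

If hasGlobalFetch is false, the fetch-based httpGet cannot work at all in that environment, regardless of the URL.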

// Node.js core modules used to perform the request.
var http = require('http');
var https = require('https');

function httpGet(url) {
  // Pick the client module that matches the URL scheme.
  var protocol = url.startsWith('https://') ? https : http;

  var options = {
    headers: { 'User-Agent': 'Wikimedia portals updater' }
  };

  return new Promise(function (resolve, reject) {
    // protocol.get() sends the request immediately, so no explicit req.end() is needed.
    var req = protocol.get(url, options, function (res) {
      var responseData = [];

      // Collect the body chunk by chunk.
      res.on('data', function (chunk) {
        responseData.push(chunk);
      });

      res.on('end', function () {
        var response = {
          status: res.statusCode,
          headers: res.headers,
          body: Buffer.concat(responseData)
        };

        // Debug output; resolves with the raw body as a Buffer.
        console.log(response.body);
        resolve(response.body);
      });
    });

    req.on('error', function (err) {
      // Add the URL to the error message so failures are traceable.
      var msg = err.toString() + ' requesting ' + url;
      console.error(msg);
      reject(msg);
    });
  });
}

Okay, by doing a simple layman's analysis I got to this. I compared how preq and BBPromise worked (which I should have done earlier).

I tried a random URL from your stack trace, and it opens correctly in my browser

… but it doesn’t return JSON, so maybe we shouldn’t blindly call response.json() in httpGet.
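The difference between the two body readers can be reproduced without any network access. The sketch below assumes a Node.js version that exposes the global Response class (18 or later), and the sample body is invented for illustration; it is not copied from a real projectviews file.

```javascript
// Hypothetical sample body: space-separated text, not JSON.
var sampleBody = 'en - 5 100\n';

// A Response body can only be consumed once, so each read builds its own Response.
function readAsJson( body ) {
  return new Response( body ).json();
}

function readAsText( body ) {
  return new Response( body ).text();
}

// json() rejects because the body is not valid JSON.
readAsJson( sampleBody ).catch( function ( err ) {
  console.log( 'json() rejected with', err.name );
} );

// text() resolves with the body unchanged.
readAsText( sampleBody ).then( function ( text ) {
  console.log( 'text() resolved with', JSON.stringify( text ) );
} );
```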

I get a similar error (as does @ehughes)

[11:48:54] Using gulpfile ~/repos/portals/gulpfile.js
[11:48:54] Starting 'update-stats'...
SyntaxError: Unexpected non-whitespace character after JSON at position 3 (line 1 column 4) requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-04/projectviews-20240409-000000
Unhandled rejection SyntaxError: Unexpected non-whitespace character after JSON at position 3 (line 1 column 4) requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-04/projectviews-20240409-000000

SyntaxError: Unexpected non-whitespace character after JSON at position 3 (line 1 column 4) requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-04/projectviews-20240409-010000
Unhandled rejection SyntaxError: Unexpected non-whitespace character after JSON at position 3 (line 1 column 4) requesting https://dumps.wikimedia.org/other/pageviews/2024/2024-04/projectviews-20240409-010000

...

Node version v21.6.1

@Punith.nyk, @Tacsipacsi is right, we shouldn't be blindly calling response.json() in httpGet, but converting the response to JSON when necessary is still a nice convenience to have. We can make the following edit to httpGet() to wrap the JSON.parse() call in a try...catch statement:

function httpGet( url ) {
  var options = { headers: { 'User-Agent': 'Wikimedia portals updater' } };
  return fetch( url, options )
    .then( response => response.text() )
    .then( responseText => {
      try {
        const responseJSON = JSON.parse( responseText );
        return Promise.resolve( responseJSON );
      } catch {
        return Promise.resolve( responseText );
      }
    } )
    .catch( err => {
      var msg = err.toString() + ' requesting ' + url;
      console.error( msg ); // eslint-disable-line no-console
      return Promise.reject( msg );
    } );
}

Even with this edit though, I'm still getting a blank (no wiki) output when running the portal build locally, so I'll look into this some more.
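The fallback parsing in the snippet above can be isolated as a pure function, which makes its behaviour easy to try without the network. This is a sketch only; the name parseJsonOrText and the sample strings are invented for illustration.

```javascript
// Hypothetical helper: returns the parsed value when the text is valid JSON,
// otherwise returns the text unchanged.
function parseJsonOrText( text ) {
  try {
    return JSON.parse( text );
  } catch ( e ) {
    return text;
  }
}

console.log( parseJsonOrText( '{"views": 5}' ) ); // parsed into an object
console.log( parseJsonOrText( 'en - 5 100' ) ); // not JSON, returned as text
console.log( typeof parseJsonOrText( '42' ) ); // valid JSON, so parsed into a number
```

One thing to keep in mind with this approach: any plain-text body that happens to be valid JSON, such as a bare number, silently comes back as a different type, so callers cannot rely on always getting a string.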

Instead of testing whether the response is JSON, I think httpGet should have a second parameter that specifies whether it is. I tried this:

diff --git a/data/site-stats.js b/data/site-stats.js
index 8b43cf73..ef712437 100644
--- a/data/site-stats.js
+++ b/data/site-stats.js
@@ -23,11 +23,16 @@ var BBPromise = require( 'bluebird' ),
                zero: ''
        };
 
-function httpGet( url ) {
+/**
+ * @param {string} url
+ * @param {boolean} json Whether the return value should be interpreted as JSON.
+ *  Otherwise it’s interpreted as UTF-8 text.
+ */
+function httpGet( url, json = true ) {
        var options = { headers: { 'User-Agent': 'Wikimedia portals updater' } };
 
        return fetch( url, options )
-               .then( response => BBPromise.resolve( response.json() ) )
+               .then( response => BBPromise.resolve( json ? response.json() : response.text() ) )
                .catch( err => {
                        // I can haz error message that makes sense?
                        var msg = err.toString() + ' requesting ' + url;
@@ -134,7 +139,7 @@ function getViewsData() {
                } catch ( ex ) {
                        if ( !content ) {
                                promise = promise.then( function () {
-                                       return httpGet( hour.url )
+                                       return httpGet( hour.url, false )
                                                .then( function ( text ) {
                                                        if ( !text ) {
                                                                return;

…and the cache files were generated without errors, but then I ran into a seemingly unrelated problem (Error: error:0308010C:digital envelope routines::unsupported). (I may not have run this project on my new computer yet, so it’s probably a setup issue.)

(By the way, it was me who proposed calling .json() in https://gerrit.wikimedia.org/r/c/wikimedia/portals/+/1008109/comment/62e991ae_ea557d44/ without thinking enough about it. Sorry.)
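For trying out the explicit-flag approach from the diff without hitting dumps.wikimedia.org, a standalone variant could accept the fetch implementation as an extra argument. This is a sketch under assumptions: the third parameter fetchImpl, the stub fakeFetch and the example URL are all hypothetical and are not part of the actual patch.

```javascript
// Sketch of httpGet with an explicit format flag. The third parameter is a
// hypothetical addition that allows a stub to replace the global fetch().
function httpGet( url, json, fetchImpl ) {
  var options = { headers: { 'User-Agent': 'Wikimedia portals updater' } };
  var doFetch = fetchImpl || fetch;
  // Default to JSON, matching the default in the diff above.
  var asJson = json === undefined ? true : json;

  return doFetch( url, options )
    .then( function ( response ) {
      return asJson ? response.json() : response.text();
    } )
    .catch( function ( err ) {
      var msg = err.toString() + ' requesting ' + url;
      return Promise.reject( msg );
    } );
}

// Hypothetical stub: mimics only the two Response methods used above.
function fakeFetch() {
  var body = 'en - 5 100\n';
  return Promise.resolve( {
    text: function () {
      return Promise.resolve( body );
    },
    json: function () {
      return Promise.resolve().then( function () {
        return JSON.parse( body );
      } );
    }
  } );
}

// Reading as text succeeds; reading the same body as JSON is rejected.
httpGet( 'https://example.org/hypothetical', false, fakeFetch )
  .then( function ( text ) {
    console.log( 'text:', JSON.stringify( text ) );
  } );
httpGet( 'https://example.org/hypothetical', true, fakeFetch )
  .catch( function ( msg ) {
    console.log( 'rejected:', msg );
  } );
```

With the flag, the caller states the expected format up front, so a non-JSON body requested as JSON fails loudly instead of being passed on as a string.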

Change #1022151 had a related patch set uploaded (by Jdrewniak; author: Jdrewniak):

[wikimedia/portals@master] Convert HTTP response to JSON Conditionally

https://gerrit.wikimedia.org/r/1022151

Change #1022157 had a related patch set uploaded (by Jdrewniak; author: Jdrewniak):

[wikimedia/portals@master] Fix Stats.format() function

https://gerrit.wikimedia.org/r/1022157

Change #1022151 merged by jenkins-bot:

[wikimedia/portals@master] Convert HTTP response to JSON Conditionally

https://gerrit.wikimedia.org/r/1022151

Change #1022157 merged by jenkins-bot:

[wikimedia/portals@master] Fix Stats.format() function

https://gerrit.wikimedia.org/r/1022157