
Beta Cluster search box displays nonexistent pages as results
Closed, ResolvedPublic

Description

Per title. Enter "S", for example, on https://deployment.wikimedia.beta.wmflabs.org and you'll get spam titles of pages that no longer exist. This has been happening for a long time now.

Event Timeline

  • Force reindexed the page from deployment-deploy01: mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php commonswiki --fromId 63986 --toId 63987
  • Page now exists.

So the question is why force-indexing works but edit-time indexing seems to be broken. I would expect errors of this nature to generate some logs, but performing edits and investigating logstash-beta.wmflabs.org isn't showing me anything interesting. Needs more investigation; next I will try to find the appropriate jobs in the Kafka topics and re-run them manually from a REPL.

Pulled a job from the job queue by logging into deployment-kafka-main-2 and running:

kafkacat -b localhost:9092 -t eqiad.mediawiki.job.cirrusSearchLinksUpdatePrioritized -o -100 -c 100 | jq 'if (.database == "commonswiki") then . else empty end'
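The jq expression above keeps only messages whose `database` field is `commonswiki`. For illustration, the same filtering step can be sketched in Python — the `raw_messages` lines below are hypothetical stand-ins for kafkacat's one-JSON-object-per-line output, not real jobs:

```python
import json

# Illustrative stand-ins for kafkacat output: one JSON job message per line.
raw_messages = [
    '{"database": "commonswiki", "type": "cirrusSearchLinksUpdatePrioritized"}',
    '{"database": "enwiki", "type": "cirrusSearchLinksUpdatePrioritized"}',
]

# Equivalent of: jq 'if (.database == "commonswiki") then . else empty end'
jobs = [
    job for job in map(json.loads, raw_messages)
    if job.get("database") == "commonswiki"
]

for job in jobs:
    print(job["type"], job["database"])
```

As an aside, jq's `select()` does the same thing more tersely: `jq 'select(.database == "commonswiki")'`.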

one job:

{
  "database": "commonswiki",
  "mediawiki_signature": "af99ee855340fcc1c080f1710ade2c4c17d1b2d77de6c5371ce0b46908c6d379",
  "meta": {
    "domain": "commons.wikimedia.beta.wmflabs.org",
    "dt": "2019-02-05T15:55:43+00:00",
    "id": "7e0daf34-295e-11e9-94bf-fa163ee123fd",
    "request_id": "XFmx-awQBGoAAEUIE34AAABW",
    "schema_uri": "mediawiki/job/2",
    "topic": "mediawiki.job.cirrusSearchLinksUpdatePrioritized",
    "uri": "https://commons.wikimedia.beta.wmflabs.org/wiki/File:Super.pdf"
  },
  "page_namespace": 6,
  "page_title": "File:Super.pdf",
  "params": {
    "addedLinks": [],
    "cluster": null,
    "prioritize": true,
    "removedLinks": [],
    "requestId": "XFmx-awQBGoAAEUIE34AAABW"
  },
  "sha1": "8d9c9e8d485c50d1caf0db03458e0077aeb22502",
  "type": "cirrusSearchLinksUpdatePrioritized"
}

Running this job in mwrepl via:

$title = Title::newFromText("File:Super.pdf");
$params = ["addedLinks"=>[],"cluster"=>null,"prioritize"=>true,"removedLinks"=>[],"requestId"=>"XFmx-awQBGoAAEUIE34AAABW"];
Job::factory('cirrusSearchLinksUpdatePrioritized', $title, $params)->run();

The page now exists in the index as well, suggesting that the jobs don't always fail when run. I'm not sure where to look next. @Pchelolo, is there anything related to the job queue I could look at to see what the result of these jobs is? AFAICT they work when run manually, but I'm not seeing the results of the jobs show up on-wiki when they run through the job queue.

Ulala...

curl -XPOST http://deployment-jobrunner03.deployment-prep.eqiad.wmflabs:9005/rpc/RunSingleJob.php

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /rpc/RunSingleJob.php was not found on this server.</p>
</body></html>

Seems like the job runner has been moved to a new VM and it's not configured correctly. The PHP file itself exists, so I assume the Apache configs are incorrect.

Based on the Apache config I see inside the server, port 9005 only accepts health checks; RunSingleJob.php is accepted on port 9006. This looks to have been done as part of the jobrunner PHP 7 support in https://gerrit.wikimedia.org/r/481866. Prior to that patch it looks like all .php endpoints were accepted; after it, requests have to match a specific rewrite rule.
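A minimal sketch of what such a restriction can look like in an Apache vhost — hypothetical directives for illustration, not the actual configuration from the patch:

```apache
# Before: effectively any .php path under /rpc/ was served.
# After: only a request matching this exact rule reaches the
# job runner entry point; everything else falls through to a 404.
RewriteEngine On
RewriteRule ^/rpc/RunSingleJob\.php$ /srv/mediawiki/rpc/RunSingleJob.php [L]
```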

The problem looks to be that in production the jobrunners have nginx running on port 443 proxying requests to the $local_only port 9006. On deployment-jobrunner03, nginx is not installed. This is because `role::mediawiki::jobrunner` has the following block of code. The beta cluster doesn't have LVS, so TLS is not installed; without TLS installed, nothing can make requests against RunSingleJob.php.

# TODO: change role used in beta
if hiera('has_lvs', true) {
    include ::role::lvs::realserver
    include ::profile::mediawiki::jobrunner_tls
}
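For illustration, the hiera value steering this branch might look like the following in deployment-prep's hiera data — a hedged sketch; the actual key location and file layout in beta's config may differ:

```yaml
# deployment-prep (beta): no LVS in front of the jobrunners, so the
# guard above skips both role::lvs::realserver and jobrunner_tls —
# which also means nginx (and with it the route to RunSingleJob.php)
# never gets installed.
has_lvs: false
```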

I'm not sure who exactly to pass this off to, but I think fixing the deployment-prep job queue installation is a bit beyond me.

Testing Structured Data on Commons on beta is blocked by this. @Pchelolo, is this something you're able to look at?

Triaging as High, but I suspect that if this is blocking SDoC it could perhaps be Unbreak Now?


It looks like this got resolved by Joe in the child ticket.