Search engines do not index some pages
Closed, Invalid
Public

Description

Author: qleah

Description:
This makes it very difficult to find some specific discussions when full-text
search is disabled. It may be that many talk pages are "orphaned" from Google's
perspective, and will never be found during a regular crawl of the website. For
example, searching for "Discussions about the MediaWiki namespace" produces two
results only, ''on a mirror''.


Version: unspecified
Severity: major
URL: http://www.yourencyclopedia.net/Wikipedia_talk:MediaWiki_namespace_text.html

Details

Reference
bz546

Event Timeline

bzimport raised the priority of this task to Medium (Nov 21 2014, 6:57 PM).
bzimport set Reference to bz546.
bzimport added a subscriber: Unknown Object (MLST).

mshiltonj wrote:

I am having this problem as well.

I was running 1.4 and a few days ago I upgraded to 1.5beta3. I ran the two
command-line PHP upgrade scripts in order: upgrade1_5.php, then update.php.

The upgrade seemed to have been completely transparent and successful. Then late
Friday afternoon (it's always a late Friday afternoon, isn't it?), people began
to notice that searching wasn't working.

I'm not that familiar with mediawiki's internals yet, but I think I narrowed it
down to this:

Search works, but new articles created since the upgrade are not getting into
the index. I even dropped and rebuilt the search index (with
rebuildtextindex.php) hoping that would fix it, but I still get the same
results: only content created while still running the 1.4 version shows up in
the search results, even after a rebuild.

I *think*, but am not really sure, that it has something to do with how stuff
gets into the index. Is the index update triggered by data in the
'recentchanges' table? The updateSearchIndex.php script seems to select against
the recentchanges table to find out what to put in the search index.
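
To make that hypothesis concrete, here is a minimal sketch of an index update driven by a change log. It is not the actual updateSearchIndex.php logic. The schema is a simplified, hypothetical stand-in (the `page` and `searchindex` tables, `make_demo_db` and `update_search_index` are invented for illustration; only `rc_cur_id` and `rc_timestamp` are taken from the query shown in this report), and it uses an in-memory SQLite database rather than MySQL.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_text TEXT);
        CREATE TABLE recentchanges (rc_cur_id INTEGER, rc_timestamp TEXT);
        CREATE TABLE searchindex (si_page INTEGER PRIMARY KEY, si_text TEXT);
        -- Page 1 has a recentchanges row; page 2 (created later) has none.
        INSERT INTO page VALUES (1, 'Old article'), (2, 'New article');
        INSERT INTO recentchanges VALUES (1, '20050801000000');
        """
    )
    return conn


def update_search_index(conn, since):
    """Re-index only pages that have a recentchanges row newer than `since`."""
    rows = conn.execute(
        "SELECT DISTINCT p.page_id, p.page_text"
        " FROM recentchanges rc JOIN page p ON p.page_id = rc.rc_cur_id"
        " WHERE rc.rc_timestamp > ?",
        (since,),
    ).fetchall()
    for page_id, text in rows:
        conn.execute(
            "INSERT OR REPLACE INTO searchindex (si_page, si_text) VALUES (?, ?)",
            (page_id, text.lower()),
        )
    return len(rows)


conn = make_demo_db()
print(update_search_index(conn, "20050101000000"))  # count of pages re-indexed
print(conn.execute("SELECT si_page FROM searchindex").fetchall())
```

Under these assumptions, any page without a matching recentchanges row never reaches the index, which would match the symptom of new articles being unsearchable.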

In trying to track it down further, I found that running the
'rebuildrecentchanges.php' script dies with this error:

php rebuildrecentchanges.php
Loading from CUR table...
Loading from OLD table...
A database error has occurred
Query: INSERT INTO recentchanges
(rc_timestamp,rc_cur_time,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type)
SELECT
old_timestamp,cur_timestamp,old_user,old_user_text,old_namespace,old_title,old_comment,old_minor_edit,0,0,cur_id,old_id,0,0
FROM old,cur WHERE old_namespace=cur_namespace AND old_title=cur_title ORDER
BY old_timestamp DESC LIMIT 5000
Function:
Error: 1146 Table 'midev_wiki_db.old' doesn't exist

That is correct, the 'old' table does not exist in my schema. Further, when I
ran the script, *I lost all of the recent changes data made since the upgrade.*
The content and history of individual articles were not lost, and new edits do
still go into the recentchanges table. But I'm afraid I just lost several days
of recent changes data by running a maintenance script.
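
The data loss would be consistent with the script clearing recentchanges before running the INSERT ... SELECT that failed, though that ordering is an assumption and has not been confirmed here. As a rough sketch of how a rebuild could guard against this, the example below checks that the source tables exist before touching anything. The function names, the reduced column list, and the use of SQLite's `sqlite_master` catalog (in place of MySQL) are all illustrative assumptions, not the real rebuildrecentchanges.php code.

```python
import sqlite3

REQUIRED_TABLES = ("old", "cur")  # tables named in the failing query


def missing_tables(conn, required=REQUIRED_TABLES):
    """Return the required tables that are absent (SQLite catalog lookup)."""
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    return [name for name in required if name not in existing]


def rebuild_recentchanges(conn):
    """Clear and repopulate recentchanges, but only if the sources exist."""
    missing = missing_tables(conn)
    if missing:
        # Abort before the destructive step so existing rows survive.
        raise RuntimeError("missing source tables: " + ", ".join(missing))
    conn.execute("DELETE FROM recentchanges")
    conn.execute(
        "INSERT INTO recentchanges (rc_timestamp, rc_cur_id)"
        " SELECT old_timestamp, cur_id FROM old, cur"
        " WHERE old_namespace = cur_namespace AND old_title = cur_title"
    )
    conn.commit()


# Demo: a database that has recentchanges but no 'old' or 'cur' tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recentchanges (rc_timestamp TEXT, rc_cur_id INTEGER)")
conn.execute("INSERT INTO recentchanges VALUES ('20050801000000', 1)")
try:
    rebuild_recentchanges(conn)
except RuntimeError as err:
    print(err)
print(conn.execute("SELECT COUNT(*) FROM recentchanges").fetchone()[0])
```

With the check in place, the rebuild refuses to run against a schema that lacks the tables it reads from, and the existing recentchanges rows are left alone.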

I will continue to investigate as time allows.