
Lucene search 2 uses GNU-specific cp options when snapshotting indexes
Closed, Resolved, Public

Description

Testing a local copy of the lucene-search-2 module on my MacBook, I noticed this error dropping out while building the indexes:

2383 [main] INFO org.wikimedia.lsearch.index.IndexThread - Making snapshot for trunkwiki
2412 [main] WARN org.wikimedia.lsearch.util.Command - Got exit value 64 while executing /bin/cp -lr /opt/web/lsearch/import/trunkwiki/_8t.cfs /opt/web/lsearch/snapshot/trunkwiki/20070828164041/_8t.cfs
2414 [main] ERROR org.wikimedia.lsearch.index.IndexThread - Error making snapshot /opt/web/lsearch/snapshot/trunkwiki/20070828164041: Error executing command: cp: illegal option -- l
usage: cp [-R [-H | -L | -P]] [-f | -i | -n] [-pv] src target
       cp [-R [-H | -L | -P]] [-f | -i | -n] [-pv] src1 ... srcN directory

The -l option to cp seems to be a GNU extension, and isn't present in the Mac OS X version of cp. Needless to say, /bin/cp may also not be present on non-Unix systems. ;)

cp -lr is also used in UpdateThread, as are rm -rf and /usr/bin/rsync (if rsync is in use).

It might be wise to make the snapshots with portable file-manipulation functions rather than by shelling out to cp. The rsync path should be customizable; in theory it would be nice if lsearch ran on Windows as well as name-brand *nixes.
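As a rough illustration of the "portable file-manipulation functions" idea, here is a minimal sketch that hard-links each index file into the snapshot directory and falls back to a plain copy where linking is not possible. It assumes Java 7 or later (for java.nio.file) and a flat index directory; the class and method names are hypothetical and are not part of lucene-search-2.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not lucene-search-2 code.
final class PortableSnapshot {
    private PortableSnapshot() {}

    /**
     * Hard-links (or, failing that, copies) every regular file in indexDir
     * into snapshotDir. Returns the number of files handled.
     * Assumes the index directory is flat (no subdirectories to recurse into).
     */
    static int snapshot(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path source : files) {
                if (!Files.isRegularFile(source)) {
                    continue; // skip anything that is not a plain file
                }
                Path target = snapshotDir.resolve(source.getFileName());
                try {
                    // Argument order: the new link first, the existing file second.
                    Files.createLink(target, source);
                } catch (UnsupportedOperationException | IOException e) {
                    // Linking unsupported or failed: fall back to a slower copy.
                    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                }
                count++;
            }
        }
        return count;
    }
}
```

A call such as PortableSnapshot.snapshot(Paths.get("import/trunkwiki"), Paths.get("snapshot/trunkwiki/20070828164041")) would stand in for the cp -lr invocation, with no dependency on /bin/cp or GNU options.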


Version: unspecified
Severity: enhancement

Details

Reference
bz11103

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:51 PM
bzimport set Reference to bz11103.
bzimport added a subscriber: Unknown Object (MLST).

rainman wrote:

Currently lsearch requires Linux. With some minor modifications it
could work on Mac OS, since afaik it supports hard and soft links.

Windows is more difficult: one would probably need something like
info files that describe the links, or just use a slower
copy instead of linking.

So the above cp would become cp + ln. Hopefully a future release
will have customizable paths and will be able to detect
the OS and use the most efficient solution for it.

Creating the destination directory, then shelling out to ln to link each file should be compatible with any UNIXy system (Linux, Mac OS X, *BSD, Solaris, etc) and not a very difficult change, so that'd be my recommendation for a short-term change.
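The short-term approach described here (create the destination directory, then run ln once per file) might look roughly like the sketch below. It assumes an ln binary is available on the PATH and that the index directory is flat; the class and method names are hypothetical, and ProcessBuilder.inheritIO needs Java 7 or later.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper for illustration only; not lucene-search-2 code.
final class LnSnapshot {
    private LnSnapshot() {}

    /** Creates snapshotDir if needed, then hard-links each regular file into it via ln. */
    static void linkAll(File indexDir, File snapshotDir)
            throws IOException, InterruptedException {
        if (!snapshotDir.isDirectory() && !snapshotDir.mkdirs()) {
            throw new IOException("Cannot create " + snapshotDir);
        }
        File[] files = indexDir.listFiles();
        if (files == null) {
            throw new IOException("Cannot list " + indexDir);
        }
        for (File source : files) {
            if (!source.isFile()) {
                continue; // assumes a flat index directory
            }
            File target = new File(snapshotDir, source.getName());
            // Plain "ln existing new" with no options, which avoids the
            // GNU-only -l flag of cp. Assumes ln is on the PATH.
            Process p = new ProcessBuilder(
                    "ln", source.getAbsolutePath(), target.getAbsolutePath())
                    .inheritIO()
                    .start();
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IOException("ln exited with " + exit + " for " + source);
            }
        }
    }
}
```

Passing the arguments as separate strings, rather than one command line, means paths containing spaces do not need any shell quoting.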

Windows does support hardlinks of some sort on NTFS, though I couldn't tell you if it's worth the bother. :)

I did a little googling yesterday for Java code supporting creation of hardlinks; this util class from an Apache project has a function which allegedly can create hardlinks on either Unix or Windows by shelling out the appropriate low-level command:

http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/java/org/apache/hadoop/fs/FileUtil.java?view=markup

Haven't tested it meself. :)

robchur wrote:

(In reply to comment #2)

> Windows does support hardlinks of some sort on NTFS, though I couldn't tell you
> if it's worth the bother. :)

NTFS Junction Points? No, it's not worth the bother, and there's no standard toolkit to create or manage them - Windows uses them internally, and exposes them via Win32, but doesn't ship with any kind of command-line utility comparable to ln.

*ahem*

C:\Documents and Settings\brion> fsutil hardlink create bar.txt foo.txt
Hardlink created for C:\Documents and Settings\brion\bar.txt <<===>> C:\Documents and Settings\brion\foo.txt

:)
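Note that fsutil takes the new link name first and the existing file second, the reverse of ln. A minimal sketch of picking the right command per platform, assuming the caller passes in the value of the os.name system property; the class and method names are hypothetical, and on some Windows versions fsutil may need elevated privileges:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not lucene-search-2 code.
final class HardLinkCommand {
    private HardLinkCommand() {}

    /**
     * Builds the external command that hard-links `link` to `existing`.
     * osName is expected to be the value of System.getProperty("os.name").
     */
    static List<String> forOs(String osName, String existing, String link) {
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            // fsutil hardlink create <new link> <existing file>
            return Arrays.asList("fsutil", "hardlink", "create", link, existing);
        }
        // ln <existing file> <new link>
        return Arrays.asList("ln", existing, link);
    }
}
```

The returned list can be handed straight to new ProcessBuilder(command).start(), so the argument-order difference stays in one place.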

robchur wrote:

Well, in my defence, what I said was more or less accurate; that's not an NTFS Junction Point at all. What's actually indefensible here is that I read "hardlink" and thought of the (dodgy) equivalent of a symlink.

Points off all round.

[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]