
Segfault in strtr()
Closed, ResolvedPublic

Description

Reproduction case:

<?php
$map = array(
  'ﭐ' => 'ٱ', 'ﭑ' => 'ٱ', 'ﭒ' => 'ٻ', 'ﭓ' => 'ٻ', 'ﭔ' => 'ٻ', 'ﭕ' => 'ٻ', 'ﭖ' => 'پ',
  'ﭗ' => 'پ', 'ﭘ' => 'پ', 'ﭙ' => 'پ', 'ﭚ' => 'ڀ', 'ﭛ' => 'ڀ', 'ﭜ' => 'ڀ', 'ﭝ' => 'ڀ',
  'ﭞ' => 'ٺ', 'ﭟ' => 'ٺ', 'ﭠ' => 'ٺ', 'ﭡ' => 'ٺ', 'ﭢ' => 'ٿ', 'ﭣ' => 'ٿ', 'ﭤ' => 'ٿ',
  'ﭥ' => 'ٿ', 'ﭦ' => 'ٹ', 'ﭧ' => 'ٹ', 'ﭨ' => 'ٹ', 'ﭩ' => 'ٹ', 'ﭪ' => 'ڤ', 'ﭫ' => 'ڤ',
  'ﭬ' => 'ڤ', 'ﭭ' => 'ڤ', 'ﭮ' => 'ڦ', 'ﭯ' => 'ڦ', 'ﭰ' => 'ڦ', 'ﭱ' => 'ڦ', 'ﭲ' => 'ڄ',
  'ﭳ' => 'ڄ', 'ﭴ' => 'ڄ', 'ﭵ' => 'ڄ', 'ﭶ' => 'ڃ', 'ﭷ' => 'ڃ', 'ﭸ' => 'ڃ', 'ﭹ' => 'ڃ',
  'ﭺ' => 'چ', 'ﭻ' => 'چ', 'ﭼ' => 'چ', 'ﭽ' => 'چ', 'ﭾ' => 'ڇ', 'ﭿ' => 'ڇ', 'ﮀ' => 'ڇ',
  'ﮁ' => 'ڇ', 'ﮂ' => 'ڍ', 'ﮃ' => 'ڍ', 'ﮄ' => 'ڌ', 'ﮅ' => 'ڌ', 'ﮆ' => 'ڎ', 'ﮇ' => 'ڎ',
  'ﮈ' => 'ڈ', 'ﮉ' => 'ڈ', 'ﮊ' => 'ژ', 'ﮋ' => 'ژ', 'ﮌ' => 'ڑ', 'ﮍ' => 'ڑ', 'ﮎ' => 'ک',
  'ﮏ' => 'ک', 'ﮐ' => 'ک',
);
strtr( 'ab', $map );

I haven't isolated it further, but it only happens when count( $map ) > 64 and when the subject string is two characters long or fewer.

Event Timeline

ori raised the priority of this task to Needs Triage.
ori updated the task description.
ori added a project: HHVM.
ori added subscribers: ori, EBernhardson.

Almost certainly the new algo; will poke at it.

ori assigned this task to EBernhardson.
ori set Security to None.

Still happening:

Host: mw1254
ProcessID: 7147
ThreadID: 7f82457ff700
ThreadPID: 11838
Name: unknown program
Type: Segmentation fault
Runtime: hhvm
Version: 1440023731_994787068
DebuggerCount: 0

Server: en.wikipedia.org
ThreadType: Web Request
Server_SERVER_NAME: en.wikipedia.org
URL: /wiki/BCH_code

# 0  std::_Function_base::_Base_manager<void (*)(char const*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation) at /usr/bin/hhvm:0
# 1  HPHP::StringData::releaseDataSlowPath() at /usr/bin/hhvm:0
# 2  std::_Sp_counted_ptr<HPHP::WuManberReplacement*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() at /usr/bin/hhvm:0
# 3  tbb::interface5::concurrent_hash_map<long, HPHP::ConcurrentLRUCache<long, std::shared_ptr<HPHP::WuManberReplacement>, tbb::tbb_hash_compare<long> >::HashMapValue, tbb::tbb_hash_compare<long>, tbb::tbb_allocator<std::pair<long, HPHP::ConcurrentLRUCache<long, std::shared_ptr<HPHP::WuManberReplacement>, tbb::tbb_hash_compare<long> >::HashMapValue> > >::exclude(tbb::interface5::concurrent_hash_map<long, HPHP::ConcurrentLRUCache<long, std::shared_ptr<HPHP::WuManberReplacement>, tbb::tbb_hash_compare<long> >::HashMapValue, tbb::tbb_hash_compare<long>, tbb::tbb_allocator<std::pair<long, HPHP::ConcurrentLRUCache<long, std::shared_ptr<HPHP::WuManberReplacement>, tbb::tbb_hash_compare<long> >::HashMapValue> > >::const_accessor&) at /usr/bin/hhvm:0
# 4  HPHP::ConcurrentLRUCache<long, std::shared_ptr<HPHP::WuManberReplacement>, tbb::tbb_hash_compare<long> >::evict() at /usr/bin/hhvm:0
# 5  HPHP::ConcurrentLRUCache<long, std::shared_ptr<HPHP::WuManberReplacement>, tbb::tbb_hash_compare<long> >::insert(long const&, std::shared_ptr<HPHP::WuManberReplacement> const&) at /usr/bin/hhvm:0
# 6  HPHP::f_strtr(HPHP::String const&, HPHP::Variant const&, HPHP::Variant const&) at /usr/bin/hhvm:0
# 7  HPHP::jit::x64::BackEnd::enterTCHelper(unsigned char*, HPHP::ActRec*) at /usr/bin/hhvm:0
# 8  HPHP::jit::MCGenerator::enterTC(unsigned char*, HPHP::ActRec*) at /usr/bin/hhvm:0
# 9  HPHP::ExecutionContext::invokeFunc(HPHP::TypedValue*, HPHP::Func const*, HPHP::Variant const&, HPHP::ObjectData*, HPHP::Class*, HPHP::VarEnv*, HPHP::StringData*, HPHP::ExecutionContext::InvokeFlags) at /usr/bin/hhvm:0
# 10 HPHP::ExecutionContext::invokeUnit(HPHP::TypedValue*, HPHP::Unit const*) at /usr/bin/hhvm:0
# 11 HPHP::PackedArray::ToMixedCopyReserve(HPHP::ArrayData const*, unsigned long) at /usr/bin/hhvm:0
# 12 HPHP::include_impl_invoke(HPHP::String const&, bool, char const*) at /usr/bin/hhvm:0
# 13 HPHP::hphp_invoke(HPHP::ExecutionContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, HPHP::Array const&, HPHP::VRefParamValue const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool, bool, bool) at /usr/bin/hhvm:0
# 14 HPHP::HttpRequestHandler::executePHPRequest(HPHP::Transport*, HPHP::RequestURI&, HPHP::SourceRootInfo&, bool) at /usr/bin/hhvm:0
# 15 HPHP::HttpRequestHandler::handleRequest(HPHP::Transport*) at /usr/bin/hhvm:0
# 16 HPHP::ServerWorker<std::shared_ptr<HPHP::FastCGIJob>, HPHP::FastCGITransportTraits>::doJobImpl(std::shared_ptr<HPHP::FastCGIJob>, bool) at /usr/bin/hhvm:0
# 17 HPHP::ServerWorker<std::shared_ptr<HPHP::FastCGIJob>, HPHP::FastCGITransportTraits>::doJob(std::shared_ptr<HPHP::FastCGIJob>) at /usr/bin/hhvm:0
# 18 HPHP::JobQueueWorker<std::shared_ptr<HPHP::FastCGIJob>, HPHP::Server*, true, false, HPHP::JobQueueDropVMStack>::start() at /usr/bin/hhvm:0
# 19 HPHP::AsyncFuncImpl::ThreadFunc(void*) at /usr/bin/hhvm:0
# 20 HPHP::start_routine_wrapper(void*) at /usr/bin/hhvm:0
# 21 start_thread at /build/buildd/eglibc-2.19/nptl/pthread_create.c:312
# 22 clone at /build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Full PHP trace (1.1 MB):

From the backtrace it appears that this happens when attempting to insert a value into the pattern cache when the pattern cache is full, causing an existing item to be evicted. The cache size is pretty small -- is it possible that the item being evicted is still referenced by another request which is being executed at the same time?

Even if evicted, I think the shared_ptr semantics will ensure nothing funny happens there. I could be improperly using the ConcurrentLRUCache, though. It's the same data structure that backs APC, but I could have easily gotten something wrong; will look around a bit.

You should be careful assigning the result of dereferencing a ConstAccessor to a non-const shared_ptr. That pointer is now shared between threads, so initTables() could be run more than once. prefix.reserve() doesn't change the size, so you have a race-condition window open from the "prefix.size() == 0" check until the first prefix.push_back() call.

Maybe you could move the initTables() call to the constructor and call it unconditionally instead of deferring it; that way there would be no race condition. Then make "replacer" a const variable and translate() a const function, so that any other inappropriate shared writes will be flagged by the compiler.
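The suggestion above (eager initialization in the constructor, then const everywhere) removes the race entirely, because the object is fully built before it is ever shared between threads. A minimal C++ sketch; the class and member names here are illustrative stand-ins, not HHVM's actual code:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative sketch: eager initialization avoids the lazy-init race.
// Replacer, initTables, translate are stand-ins for the real names.
class Replacer {
 public:
  explicit Replacer(std::vector<std::string> patterns)
      : patterns_(std::move(patterns)) {
    initTables();  // runs exactly once, before the object can be shared
  }

  // const member: the compiler now rejects accidental writes to shared state.
  // Counts patterns whose 2-byte prefix matches the start of `subject`.
  std::size_t translate(const std::string& subject) const {
    std::size_t hits = 0;
    for (const auto& pre : prefix_)
      if (subject.compare(0, pre.size(), pre) == 0) ++hits;
    return hits;
  }

 private:
  void initTables() {
    // The racy version checked "prefix.size() == 0" on every call; with
    // constructor-time init, no such check (and no lock) is needed.
    for (const auto& p : patterns_)
      prefix_.push_back(p.substr(0, 2));
  }

  std::vector<std::string> patterns_;
  std::vector<std::string> prefix_;
};
```

With the object const after construction, a second thread can never observe a half-initialized table.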

I'm not sure how the memcmp() avoids reading past the end of "source". The inequality only seems to prevent it from reading past the end of "pat", and the lastpos check would only prevent overflow if all the patterns were the same size. There is the hash check, but the hash is only 16 bits so presumably collides all the time.

Speaking of hash collisions, I see you are using a 64-bit non-cryptographic hash as the cache key, which seems a bit dangerous. What if the cache holds trusted HTML for one module and untrusted user input for another? There is PHP_SHA1Init() etc., or you could use the whole array as the key and define custom hash and equality functions.

I noticed the problem with initTables() when ori first noticed this problem, and applied a similar fix[1] in my initial attempt, which ori put into prod a little over a week ago. I only moved the call and didn't follow up with the rest of your suggestions; I will now.

I'm pretty sure the first part of the conditional, pnr->pat.size() > source.size() - pos, prevents overrunning source as intended: if source.size() is 100 and pos is 96, we skip the memcmp and go straight to continue for all patterns longer than 4 bytes.
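The argument can be checked in isolation: with unsigned sizes (and assuming pos <= source.size()), the condition is exactly "pattern longer than the remaining bytes", so the memcmp that follows cannot read past the end of source. A sketch of the guard under discussion, not the actual HHVM code:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Sketch of the bounds check: returns true only when `pat` fits entirely
// inside `source` starting at `pos`, so the memcmp cannot overrun `source`.
// Assumes pos <= source.size(); all sizes are unsigned.
bool matchesAt(const std::string& source, std::size_t pos,
               const std::string& pat) {
  if (pat.size() > source.size() - pos)  // remaining bytes < pattern length
    return false;  // e.g. size 100, pos 96: any pattern over 4 bytes bails
  return std::memcmp(source.data() + pos, pat.data(), pat.size()) == 0;
}
```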

The hashes are small because they are used to index into a table. It's almost always a hash of a 2- or 3-byte prefix/suffix of the pattern (the maximum is set by the shortest pattern length). In the case of the function with the memcmp, that indexes into a 10-bit table, so on our larger pattern sets (~15k for zh2hant) there is a good bit of collision. The table size is a trade-off between quick initialization with small pattern sets and efficient operation with larger ones (and memory usage). It would actually be nice to refactor so that those sizes scale on a bounded sliding scale with the number of patterns, but I haven't had the time.
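That trade-off (a small fixed-size table indexed by a short hash of a pattern prefix, with colliding patterns collected per bucket) can be sketched as follows. The 10-bit table and 2-byte prefix mirror the numbers above, but the hash function itself is only an illustration, not the one HHVM uses:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative 10-bit table indexed by a hash of a 2-byte prefix.
constexpr std::size_t kTableBits = 10;
constexpr std::size_t kTableSize = 1u << kTableBits;  // 1024 buckets

// Folds a 2-byte prefix into 10 bits; assumes s.size() >= 2. With ~15k
// patterns (zh2hant) collisions are unavoidable, hence buckets are lists.
std::size_t prefixHash(const std::string& s) {
  std::uint32_t h = (std::uint8_t)s[0] * 257u + (std::uint8_t)s[1];
  return h & (kTableSize - 1);
}

std::vector<std::vector<std::string>> buildTable(
    const std::vector<std::string>& patterns) {
  std::vector<std::vector<std::string>> table(kTableSize);
  for (const auto& p : patterns)
    table[prefixHash(p)].push_back(p);  // colliding patterns share a bucket
  return table;
}
```

A larger table initializes slower and uses more memory but shortens the collision chains scanned per position, which is the sliding scale described above.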

Good point about not using murmur for the hash function. If we upgrade to a cryptographic hash, I wonder if it will make the LRU less worthwhile. As is, it saves ~75% of runtime on shorter strings with larger pattern sets; most of that remaining runtime is hashing time. I wonder if the LRU cache is truly worthwhile.

For the best efficiency this LRU isn't our best bet anyway; the hashing itself takes a significant portion of the runtime. The best efficiency would come from caching the WuManberReplacement by a user-definable string. We might be better off creating a custom extension (except, this was supposed to kill a custom extension...) that holds the WuManberReplacement pointer within a PHP object (or resource? probably object), and fetches/stores the cached values directly in APC. If I understand APC in HHVM correctly, it takes advantage of COW and just directly stores the variable representations. I'm not sure exactly how to go about that (although I'm sure I can figure it out). More to the point, I'm not sure we really want to change from fss to yet another custom extension.

Ideas/thoughts?

[1] https://github.com/facebook/hhvm/pull/6071

> Good point about not using murmur for the hash function. If we upgrade to a cryptographic hash, I wonder if it will make the LRU less worthwhile. As is, it saves ~75% of runtime on shorter strings with larger pattern sets; most of that remaining runtime is hashing time. I wonder if the LRU cache is truly worthwhile.

Using a cryptographic hash is a simple but inefficient solution. The efficient way to do it is to use the whole array as the key, and to cache the murmur hash value in the key object, like what I did in LRUCacheKey. That way you only compute the hash once per insert/fetch. Using the same underlying storage for the key and value avoids copying. The additional overhead would then just be an equality check on fetch, consisting of a pair of string comparisons for each array element.

And as I type that, the words "avoids copying" are reverberating in my head...

You are just inserting String objects into the shared cache, without copying. If the string is dynamically constructed, not a string literal, then the underlying StringData will be allocated from the request-local allocator and it will be destroyed at the end of the request. So that is a bug and at least explains Ori's backtrace above.

I think the best solution for this is to copy. In theory you could save a few cycles for string literals and strings fetched from APC by doing something analogous to APCHandle::Create(), but it doesn't seem worth the effort when the replacement strings are just going to be copied into the destination string. You can use std::string or whatever.
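The copy-on-insert idea can be sketched as follows: anything placed in a process-wide cache must own its bytes, and copying into std::string detaches them from any request-local allocator, so they survive the end of the request that created them. Types and names here are illustrative:

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Sketch: deep-copy replacement pairs out of request-local storage before
// caching. CachedReplacement and makeShared are illustrative names.
struct CachedReplacement {
  std::vector<std::pair<std::string, std::string>> pairs;  // owned copies
};

std::shared_ptr<CachedReplacement> makeShared(
    const std::vector<std::pair<const char*, const char*>>& requestLocal) {
  auto out = std::make_shared<CachedReplacement>();
  for (const auto& kv : requestLocal)
    out->pairs.emplace_back(kv.first, kv.second);  // copies the bytes
  return out;
}
```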

> For the best efficiency this LRU isn't our best bet anyway; the hashing itself takes a significant portion of the runtime. The best efficiency would come from caching the WuManberReplacement by a user-definable string. We might be better off creating a custom extension (except, this was supposed to kill a custom extension...) that holds the WuManberReplacement pointer within a PHP object (or resource? probably object), and fetches/stores the cached values directly in APC. If I understand APC in HHVM correctly, it takes advantage of COW and just directly stores the variable representations. I'm not sure exactly how to go about that (although I'm sure I can figure it out). More to the point, I'm not sure we really want to change from fss to yet another custom extension.

I think this would be justified only if profiling clearly shows that hashing is an important user-visible performance issue.

Looks like Tim was on the right track for the crasher. This script consistently segfaults HHVM on the 11th web request after a restart:

<?php
$alphabet = 'ﭐٱﭑٱﭒٻﭓٻﭔٻﭕٻﭖپ ﭗپﭘپﭙپﭚڀﭛڀﭜڀﭝڀ ﭞٺﭟٺﭠٺﭡٺﭢٿﭣٿﭤٿ ﭥٿﭦٹﭧٹﭨٹﭩٹﭪڤﭫڤ ﭬڤﭭڤﭮڦﭯڦﭰڦﭱڦﭲڄ ﭳڄﭴڄﭵڄﭶڃﭷڃﭸڃﭹڃ ﭺچﭻچﭼچﭽچﭾڇﭿڇﮀڇ ﮁڇﮂڍﮃڍﮄڌﮅڌﮆڎﮇڎ ﮈڈﮉڈﮊژﮋژﮌڑﮍڑﮎک ﮏکﮐک';

$map = array();
$alphaLen = strlen( $alphabet );
for( $i = 0; $i < 1001; ++$i ) {
        $len = mt_rand( 3, $alphaLen );
        $key = substr( str_shuffle( $alphabet ), 0, $len );
        $map[$key] = $key;
}
strtr( implode( '', array_keys( $map ) ), $map );
echo ".";

After switching from String to std::string, this problem looks to be solved. I have made most of Tim's suggested updates here: https://github.com/ebernhardson/hiphop-php/commit/dd8fe894be6f3a784c287ce7af4e39e39863e29c

This is still missing the LRU cache key handling adjustments. I'm pretty sure I follow what you are suggesting now after looking over LRUCacheKey, but I didn't have time to work that out tonight.

This has been closed, but I don't see any commit to the HHVM repository, just a version of HHVM in production (without the corresponding source, so I can't even backport whatever is in there).

@ori, did I miss the open change? If not, please submit it to git so that I can build a new package.

Joe reassigned this task from EBernhardson to ori.
Joe triaged this task as High priority.

The relevant patch is still stalled at https://reviews.facebook.net/D44973

No issues reported there; it's just taking time to do the back-and-forth dance once we figured out the last part of the problem.

ori reassigned this task from ori to EBernhardson.

Just to follow up: this was merged to HHVM today.

A segfault in the same place (strtr, inside ReplacementArray) is now occurring with HHVM 3.10.0; see https://github.com/facebook/hhvm/issues/6384.

I am not sure whether this is exactly the same issue or just similar.

I advise against updating to HHVM 3.10.0 (it caused us some downtime) until the issue is fixed.

It is the same issue, and jwatzman is pushing out a 3.10.1 soon.