diff mbox

OSD memory usage during startup - advice needed

Message ID 564E316A.8050601@beyond.pl (mailing list archive)
State New, archived
Headers show

Commit Message

Marcin Gibu?a Nov. 19, 2015, 8:30 p.m. UTC
> Judging from debug output, the problem is in journal recovery, when it
> tries to delete object with huge (several milion keys - it is radosgw
> index* for bucket with over 50mln objects) amount of keys, using
> leveldb's rmkeys_by_prefix() method.
>
> Looking at the source code, rmkeys_by_prefix() batches all operations
> into one list and then submit_transaction() executes them all atomically.
>
> I'd love to write a patch for this issue, but it seems unfixable (or is
> it?) with current API and method behaviour. Could you offer any advice
> on how to proceed?

Answering myself, could anyone verify if attached patch looks ok? Should 
reduce memory footprint a bit.

When I first read this code, I assumed that data pointed by 
leveldb::Slice have to be reachable until db->Write is called.

However, looking into leveldb and into its source code, there is no such 
requirement - leveldb makes its own copy of key, so we're effectivly 
doubling memory footprint for no reason.

Comments

Samuel Just Nov. 19, 2015, 10:26 p.m. UTC | #1
Actually, looks like Xiaoxi beat you to it for infernalis!
42a3ab95ec459042e92198fb061c8393146bd1b4
-Sam

On Thu, Nov 19, 2015 at 12:30 PM, Marcin Gibu?a <m.gibula@beyond.pl> wrote:
>> Judging from debug output, the problem is in journal recovery, when it
>> tries to delete object with huge (several milion keys - it is radosgw
>> index* for bucket with over 50mln objects) amount of keys, using
>> leveldb's rmkeys_by_prefix() method.
>>
>> Looking at the source code, rmkeys_by_prefix() batches all operations
>> into one list and then submit_transaction() executes them all atomically.
>>
>> I'd love to write a patch for this issue, but it seems unfixable (or is
>> it?) with current API and method behaviour. Could you offer any advice
>> on how to proceed?
>
>
> Answering myself, could anyone verify if attached patch looks ok? Should
> reduce memory footprint a bit.
>
> When I first read this code, I assumed that data pointed by leveldb::Slice
> have to be reachable until db->Write is called.
>
> However, looking into leveldb and into its source code, there is no such
> requirement - leveldb makes its own copy of key, so we're effectivly
> doubling memory footprint for no reason.
>
> --
> mg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/src/os/LevelDBStore.cc
+++ b/src/os/LevelDBStore.cc
@@ -156,9 +156,8 @@  void LevelDBStore::LevelDBTransactionImpl::set(
   buffers.push_back(to_set_bl);
   bufferlist &bl = *(buffers.rbegin());
   string key = combine_strings(prefix, k);
-  keys.push_back(key);
-  bat.Delete(leveldb::Slice(*(keys.rbegin())));
-  bat.Put(leveldb::Slice(*(keys.rbegin())),
+  bat.Delete(leveldb::Slice(key));
+  bat.Put(leveldb::Slice(key),
          leveldb::Slice(bl.c_str(), bl.length()));
 }

@@ -166,8 +165,7 @@  void LevelDBStore::LevelDBTransactionImpl::rmkey(const string &prefix,
                                                 const string &k)
 {
   string key = combine_strings(prefix, k);
-  keys.push_back(key);
-  bat.Delete(leveldb::Slice(*(keys.rbegin())));
+  bat.Delete(leveldb::Slice(key));
 }

 void LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix(const string &prefix)
@@ -177,8 +175,7 @@  void LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix(const string &prefix
        it->valid();
        it->next()) {
     string key = combine_strings(prefix, it->key());
-    keys.push_back(key);
-    bat.Delete(*(keys.rbegin()));
+    bat.Delete(key);
   }
 }

diff --git a/src/os/LevelDBStore.h b/src/os/LevelDBStore.h
index 4617c5c..dd248dd 100644
--- a/src/os/LevelDBStore.h
+++ b/src/os/LevelDBStore.h
@@ -175,7 +175,6 @@  public:
   public:
     leveldb::WriteBatch bat;
     list<bufferlist> buffers;
-    list<string> keys;
     LevelDBStore *db;

     LevelDBTransactionImpl(LevelDBStore *db) : db(db) {}