From patchwork Mon Apr 27 15:50:13 2015
X-Patchwork-Submitter: Dan van der Ster
X-Patchwork-Id: 6281651
From: Dan van der Ster
Date: Mon, 27 Apr 2015 17:50:13 +0200
Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
To: Sage Weil
Cc: Alexandre DERUMIER, ceph-users, ceph-devel, Venkateswara Rao Jujjuri, Milosz Tanski
List-ID: ceph-devel@vger.kernel.org

Hi Sage, Alexandre et al.,

Here's another data point... we noticed something similar a while ago. After
we restart our OSDs, the "4 kB object write latency" [1] temporarily drops
from ~8-10 ms down to around 3-4 ms, then slowly climbs back to 8-10 ms over
time. How long the OSDs stay at low latency is a function of how much work
they are doing: on our idle test cluster they stay fast for a couple of
hours; on our production cluster the latency is high again almost
immediately.

We also attributed this to the tcmalloc::ThreadCache::ReleaseToCentralCache
issue, since that function always ranks very high in perf top. Today we
finally managed to get the fixed tcmalloc [2] onto our el6 servers and tried
a larger thread cache. As expected, with a 128 MB cache size [3] the latency
stays low (actually below 3 ms on the test cluster, vs 9 ms earlier today).
We should probably send a patch making this configurable via an init script
option.

Cheers, Dan

[1] rados bench -p test -b 4096 -t 1
[2] rpmbuild --rebuild https://kojipkgs.fedoraproject.org//packages/gperftools/2.4/1.fc23/src/gperftools-2.4-1.fc23.src.rpm
[3] via the init script patch appended at the end of this mail
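By the way, for anyone who wants to try this without patching the init
script: exporting the variable by hand before starting a daemon should have
the same effect (a minimal sketch; the osd id and cluster name are just
examples, and if I recall correctly the gperftools default cache is 32 MB):

    # give tcmalloc a 128 MB total thread cache instead of the default
    export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
    /usr/bin/ceph-osd --cluster ceph -i 0 -f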
On Mon, Apr 27, 2015 at 5:27 PM, Sage Weil wrote:
> On Mon, 27 Apr 2015, Alexandre DERUMIER wrote:
>> >>If I want to use librados API for performance testing, are there any
>> >>existing benchmark tools which directly access librados (not through
>> >>rbd or the gateway)?
>>
>> you can use "rados bench" from the ceph packages
>>
>> http://ceph.com/docs/master/man/8/rados/
>>
>> "
>> bench seconds mode [ -b objsize ] [ -t threads ]
>> Benchmark for seconds. The mode can be write, seq, or rand. seq and rand
>> are read benchmarks, either sequential or random. Before running one of
>> the reading benchmarks, run a write benchmark with the --no-cleanup
>> option. The default object size is 4 MB, and the default number of
>> simulated threads (parallel writes) is 16.
>> "
>
> This one creates whole objects. You might also look at ceph_smalliobench
> (in the ceph-tests package) which is a bit more featureful but less
> friendly to use.
>
> Also, fio has an rbd driver.
>
> sage
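To make that concrete: assuming a pool named "test" (any pool should work),
a small-object write pass followed by a random-read pass would look like
this; sizes and thread counts are only examples:

    # 60 s of 4 KiB writes; --no-cleanup keeps the objects for the read pass
    rados bench -p test 60 write -b 4096 -t 16 --no-cleanup
    # random 4 KiB reads over the objects written above
    rados bench -p test 60 rand -t 16

And a minimal job file for fio's rbd engine, as Sage suggests, might look
like this (pool, image, and client names are assumptions; the image must
already exist):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio-test
    rw=randread
    bs=4k
    iodepth=32

    [rbd_iodepth32-test]
    numjobs=10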
>
>> ----- Original message -----
>> From: "Venkateswara Rao Jujjuri"
>> To: "aderumier"
>> Cc: "Mark Nelson", "ceph-users", "ceph-devel", "Milosz Tanski"
>> Sent: Monday, 27 April 2015 08:12:49
>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>>
>> If I want to use librados API for performance testing, are there any
>> existing benchmark tools which directly access librados (not through
>> rbd or the gateway)?
>>
>> Thanks in advance,
>> JV
>>
>> On Sun, Apr 26, 2015 at 10:46 PM, Alexandre DERUMIER wrote:
>> >>> I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>> >
>> > Ok, I really think I had patched tcmalloc wrongly.
>> > I have repatched and reinstalled it, and now I'm getting 195k iops with
>> > a single osd (10 fio rbd jobs, 4k randread).
>> >
>> > So better than jemalloc.
>> >
>> > ----- Original message -----
>> > From: "aderumier"
>> > To: "Mark Nelson"
>> > Cc: "ceph-users", "ceph-devel", "Milosz Tanski"
>> > Sent: Monday, 27 April 2015 07:01:21
>> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >
>> > Hi,
>> >
>> > also another big difference:
>> >
>> > I can now reach 180k iops with a single jemalloc osd (data in buffer)
>> > vs 50k iops max with tcmalloc.
>> >
>> > I'll retest tcmalloc, because I was pretty sure I had patched it correctly.
>> >
>> > ----- Original message -----
>> > From: "aderumier"
>> > To: "Mark Nelson"
>> > Cc: "ceph-users", "ceph-devel", "Milosz Tanski"
>> > Sent: Saturday, 25 April 2015 06:45:43
>> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >
>> >>> We haven't done any kind of real testing on jemalloc, so use at your own
>> >>> peril. Having said that, we've also been very interested in hearing
>> >>> community feedback from folks trying it out, so please feel free to give
>> >>> it a shot. :D
>> >
>> > Some feedback: I have run the bench all night, no speed regression.
>> >
>> > And I see a speed increase with fio with more jobs (with tcmalloc, it
>> > seems to be the reverse):
>> >
>> > with tcmalloc:
>> >
>> > 10 fio-rbd jobs = 300k iops
>> > 15 fio-rbd jobs = 290k iops
>> > 20 fio-rbd jobs = 270k iops
>> > 40 fio-rbd jobs = 250k iops
>> >
>> > (all with up-and-down values during the fio bench)
>> >
>> > with jemalloc:
>> >
>> > 10 fio-rbd jobs = 300k iops
>> > 15 fio-rbd jobs = 320k iops
>> > 20 fio-rbd jobs = 330k iops
>> > 40 fio-rbd jobs = 370k iops (could get more; currently only 1 client
>> > machine, with 20 cores at 100%)
>> >
>> > (all with constant values during the fio bench)
>> >
>> > ----- Original message -----
>> > From: "Mark Nelson"
>> > To: "Stefan Priebe", "aderumier"
>> > Cc: "ceph-users", "ceph-devel", "Somnath Roy", "Milosz Tanski"
>> > Sent: Friday, 24 April 2015 20:02:15
>> > Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >
>> > We haven't done any kind of real testing on jemalloc, so use at your own
>> > peril. Having said that, we've also been very interested in hearing
>> > community feedback from folks trying it out, so please feel free to give
>> > it a shot. :D
>> >
>> > Mark
>> >
>> > On 04/24/2015 12:36 PM, Stefan Priebe - Profihost AG wrote:
>> >> Is jemalloc recommended in general? Does it also work for firefly?
>> >>
>> >> Stefan
>> >>
>> >> Excuse my typos, sent from my mobile phone.
>> >>
>> >> On 24.04.2015 at 18:38, Alexandre DERUMIER wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I have finished rebuilding ceph with jemalloc;
>> >>> everything seems to be working fine.
>> >>>
>> >>> I'm getting a constant 300k iops for the moment, so no speed regression.
>> >>>
>> >>> I'll do longer benchmarks next week.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Alexandre
>> >>>
>> >>> ----- Original message -----
>> >>> From: "Irek Fasikhov"
>> >>> To: "Somnath Roy" <Somnath.Roy@sandisk.com>
>> >>> Cc: "aderumier", "Mark Nelson", "ceph-users", "ceph-devel", "Milosz Tanski"
>> >>> Sent: Friday, 24 April 2015 13:37:52
>> >>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >>>
>> >>> Hi, Alexandre!
>> >>> Have you tried changing the parameter vm.min_free_kbytes?
>> >>>
>> >>> 2015-04-23 19:24 GMT+03:00 Somnath Roy <Somnath.Roy@sandisk.com>:
>> >>>
>> >>> Alexandre,
>> >>> You can configure with --with-jemalloc or ./do_autogen -J to build
>> >>> ceph with jemalloc.
>> >>>
>> >>> Thanks & Regards
>> >>> Somnath
>> >>>
>> >>> -----Original Message-----
>> >>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Alexandre DERUMIER
>> >>> Sent: Thursday, April 23, 2015 4:56 AM
>> >>> To: Mark Nelson
>> >>> Cc: ceph-users; ceph-devel; Milosz Tanski
>> >>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >>>
>> >>>>> If you have the means to compile the same version of ceph with
>> >>>>> jemalloc, I would be very interested to see how it does.
>> >>>
>> >>> Yes, sure. (I have around 3-4 weeks to do all the benchs.)
>> >>>
>> >>> But I don't know how to do it.
>> >>> I'm running the cluster on centos7.1; maybe it is easy to patch
>> >>> the srpms to rebuild the package with jemalloc.
>> >>>
>> >>> ----- Original message -----
>> >>> From: "Mark Nelson" <mnelson@redhat.com>
>> >>> To: "aderumier" <aderumier@odiso.com>, "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>> >>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> >>> Sent: Thursday, 23 April 2015 13:33:00
>> >>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >>>
>> >>> Thanks for the testing Alexandre!
>> >>>
>> >>> If you have the means to compile the same version of ceph with
>> >>> jemalloc, I would be very interested to see how it does.
>> >>>
>> >>> In some ways I'm glad it turned out not to be NUMA. I still suspect we
>> >>> will have to deal with it at some point, but perhaps not today. ;)
>> >>>
>> >>> Mark
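To spell out Somnath's build note above (a sketch against a ceph source
checkout of that era; script names and flags can differ between branches):

    # plain autotools path
    ./autogen.sh
    ./configure --with-jemalloc
    make -j"$(nproc)"

    # or via ceph's wrapper script (named do_autogen.sh in the tree)
    ./do_autogen.sh -J
    make -j"$(nproc)"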
>> >>> On 04/23/2015 05:58 AM, Alexandre DERUMIER wrote:
>> >>>> Maybe it's tcmalloc related.
>> >>>> I thought I had patched it correctly, but perf shows a lot of
>> >>>> tcmalloc::ThreadCache::ReleaseToCentralCache.
>> >>>>
>> >>>> before osd restart (100k)
>> >>>> ------------------
>> >>>> 11.66% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>> >>>>  8.51% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>> >>>>  3.04% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
>> >>>>  2.04% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>> >>>>  1.63% swapper  [kernel.kallsyms]    [k] intel_idle
>> >>>>  1.35% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseListToSpans
>> >>>>  1.33% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>> >>>>  1.07% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>> >>>>  0.91% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>> >>>>  0.88% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>> >>>>  0.81% ceph-osd ceph-osd             [.] Mutex::Lock
>> >>>>  0.79% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>> >>>>  0.74% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>> >>>>  0.67% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>> >>>>  0.63% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>> >>>>  0.62% ceph-osd [kernel.kallsyms]    [k] avc_has_perm_noaudit
>> >>>>  0.58% ceph-osd ceph-osd             [.] operator<
>> >>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __schedule
>> >>>>  0.57% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>> >>>>  0.54% swapper  [kernel.kallsyms]    [k] __schedule
>> >>>>
>> >>>> after osd restart (300k iops)
>> >>>> ------------------------------
>> >>>>  3.47% ceph-osd libtcmalloc.so.4.1.2 [.] operator new
>> >>>>  1.92% ceph-osd libtcmalloc.so.4.1.2 [.] operator delete
>> >>>>  1.86% swapper  [kernel.kallsyms]    [k] intel_idle
>> >>>>  1.52% ceph-osd libstdc++.so.6.0.19  [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string
>> >>>>  1.34% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>> >>>>  1.24% ceph-osd libc-2.17.so         [.] __memcpy_ssse3_back
>> >>>>  1.23% ceph-osd ceph-osd             [.] Mutex::Lock
>> >>>>  1.21% ceph-osd libpthread-2.17.so   [.] pthread_mutex_trylock
>> >>>>  1.11% ceph-osd [kernel.kallsyms]    [k] copy_user_enhanced_fast_string
>> >>>>  0.95% ceph-osd libpthread-2.17.so   [.] pthread_mutex_unlock
>> >>>>  0.94% ceph-osd [kernel.kallsyms]    [k] _raw_spin_lock
>> >>>>  0.78% ceph-osd [kernel.kallsyms]    [k] __d_lookup_rcu
>> >>>>  0.70% ceph-osd [kernel.kallsyms]    [k] tcp_sendmsg
>> >>>>  0.70% ceph-osd ceph-osd             [.] Message::Message
>> >>>>  0.68% ceph-osd [kernel.kallsyms]    [k] __schedule
>> >>>>  0.66% ceph-osd [kernel.kallsyms]    [k] idle_cpu
>> >>>>  0.65% ceph-osd libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
>> >>>>  0.64% swapper  [kernel.kallsyms]    [k] native_write_msr_safe
>> >>>>  0.61% ceph-osd ceph-osd             [.] std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
>> >>>>  0.60% swapper  [kernel.kallsyms]    [k] __schedule
>> >>>>  0.60% ceph-osd libstdc++.so.6.0.19  [.] 0x00000000000bdd2b
>> >>>>  0.57% ceph-osd ceph-osd             [.] operator<
>> >>>>  0.57% ceph-osd ceph-osd             [.] crc32_iscsi_00
>> >>>>  0.56% ceph-osd libstdc++.so.6.0.19  [.] std::string::_Rep::_M_dispose
>> >>>>  0.55% ceph-osd [kernel.kallsyms]    [k] __switch_to
>> >>>>  0.54% ceph-osd libc-2.17.so         [.] vfprintf
>> >>>>  0.52% ceph-osd [kernel.kallsyms]    [k] fget_light
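For anyone who wants to collect a comparable trace, something along these
lines should do (stock perf from the distro; the 30 s duration is arbitrary):

    # live view of the hottest symbols across all running osds
    perf top -p "$(pidof ceph-osd | tr ' ' ',')"

    # or record one osd for 30 s and inspect offline
    perf record -g -p "$(pgrep -o ceph-osd)" -- sleep 30
    perf report --sort symbol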
>> >>>>
>> >>>> ----- Original message -----
>> >>>> From: "aderumier" <aderumier@odiso.com>
>> >>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>> >>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> >>>> Sent: Thursday, 23 April 2015 10:00:34
>> >>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >>>>
>> >>>> Hi,
>> >>>> I'm hitting this bug again today.
>> >>>>
>> >>>> So it doesn't seem to be numa related (I flushed the linux buffers
>> >>>> to be sure).
>> >>>>
>> >>>> And tcmalloc is patched (though I don't know how to verify that it's ok).
>> >>>>
>> >>>> I haven't restarted the osd yet.
>> >>>>
>> >>>> Maybe some perf traces could be useful?
>> >>>>
>> >>>> ----- Original message -----
>> >>>> From: "aderumier" <aderumier@odiso.com>
>> >>>> To: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>> >>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Milosz Tanski" <milosz@adfin.com>
>> >>>> Sent: Wednesday, 22 April 2015 18:30:26
>> >>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>>>> I feel it is due to tcmalloc issue
>> >>>>
>> >>>> Indeed, I had patched one of my nodes but not the other, so maybe I
>> >>>> hit this bug. (But I can't confirm; I don't have traces.)
>> >>>>
>> >>>> But numa interleaving does seem to help in my case (maybe not from
>> >>>> 100k->300k, but 250k->300k).
>> >>>>
>> >>>> I need to run longer tests to confirm that.
>> >>>>
>> >>>> ----- Original message -----
>> >>>> From: "Srinivasula Maram" <Srinivasula.Maram@sandisk.com>
>> >>>> To: "Mark Nelson" <mnelson@redhat.com>, "aderumier" <aderumier@odiso.com>, "Milosz Tanski" <milosz@adfin.com>
>> >>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
>> >>>> Sent: Wednesday, 22 April 2015 16:34:33
>> >>>> Subject: RE: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >>>>
>> >>>> I feel it is due to the tcmalloc issue.
>> >>>>
>> >>>> I have seen a similar issue in my setup after 20 days.
>> >>>>
>> >>>> Thanks,
>> >>>> Srinivas
>> >>>>
>> >>>> -----Original Message-----
>> >>>> From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Mark Nelson
>> >>>> Sent: Wednesday, April 22, 2015 7:31 PM
>> >>>> To: Alexandre DERUMIER; Milosz Tanski
>> >>>> Cc: ceph-devel; ceph-users
>> >>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >>>>
>> >>>> Hi Alexandre,
>> >>>>
>> >>>> We should discuss this at the perf meeting today. We knew NUMA node
>> >>>> affinity issues were going to crop up sooner or later (and indeed
>> >>>> already have in some cases), but this is pretty major. It's probably
>> >>>> time to really dig in and figure out how to deal with this.
>> >>>>
>> >>>> Note: this is one of the reasons I like small nodes with single
>> >>>> sockets and fewer OSDs.
>> >>>>
>> >>>> Mark
>> >>>>
>> >>>> On 04/22/2015 08:56 AM, Alexandre DERUMIER wrote:
>> >>>>> Hi,
>> >>>>>
>> >>>>> I have done a lot of tests today, and it does seem to be numa related.
>> >>>>>
>> >>>>> My numastat was:
>> >>>>>
>> >>>>> # numastat
>> >>>>>                     node0       node1
>> >>>>> numa_hit         99075422   153976877
>> >>>>> numa_miss       167490965     1493663
>> >>>>> numa_foreign      1493663   167491417
>> >>>>> interleave_hit     157745      167015
>> >>>>> local_node       99049179   153830554
>> >>>>> other_node      167517697     1639986
>> >>>>>
>> >>>>> So, a lot of misses.
>> >>>>>
>> >>>>> In this case, I can reproduce ios going from 85k to 300k iops, up
>> >>>>> and down.
>> >>>>>
>> >>>>> Now, setting
>> >>>>>
>> >>>>> echo 0 > /proc/sys/kernel/numa_balancing
>> >>>>>
>> >>>>> and starting the osd daemons with
>> >>>>>
>> >>>>> numactl --interleave=all /usr/bin/ceph-osd
>> >>>>>
>> >>>>> I get a constant 300k iops!
>> >>>>>
>> >>>>> I wonder if it could be improved further by binding osd daemons to
>> >>>>> a specific numa node. I have 2 numa nodes of 10 cores with 6 osds,
>> >>>>> but I think that would also require tuning the osd threads in
>> >>>>> ceph.conf.
>> >>>>>
>> >>>>> ----- Original message -----
>> >>>>> From: "Milosz Tanski" <milosz@adfin.com>
>> >>>>> To: "aderumier" <aderumier@odiso.com>
>> >>>>> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
>> >>>>> Sent: Wednesday, 22 April 2015 12:54:23
>> >>>>> Subject: Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >>>>>
>> >>>>> On Wed, Apr 22, 2015 at 5:01 AM, Alexandre DERUMIER <aderumier@odiso.com> wrote:
>> >>>>>
>> >>>>> I wonder if it could be numa related.
>> >>>>>
>> >>>>> I'm using centos 7.1, and auto numa balancing is enabled:
>> >>>>>
>> >>>>> cat /proc/sys/kernel/numa_balancing = 1
>> >>>>>
>> >>>>> Maybe the osd daemon accesses buffers on the wrong numa node.
>> >>>>>
>> >>>>> I'll try to reproduce the problem.
>> >>>>>
>> >>>>> Can you force the degenerate case using numactl? To either affirm
>> >>>>> or deny your suspicion.
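Putting Alexandre's commands above in one place, together with the per-node
pinning he is wondering about (osd id and node numbers are just examples):

    # stop the kernel from auto-migrating pages during the test
    echo 0 > /proc/sys/kernel/numa_balancing

    # spread an osd's allocations evenly across all numa nodes
    numactl --interleave=all /usr/bin/ceph-osd --cluster ceph -i 0 -f

    # or bind one osd's cpus and memory to a single node
    numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd --cluster ceph -i 1 -f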
>> >>>>>
>> >>>>> ----- Original message -----
>> >>>>> From: "aderumier" <aderumier@odiso.com>
>> >>>>> To: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-users@lists.ceph.com>
>> >>>>> Sent: Wednesday, 22 April 2015 10:40:05
>> >>>>> Subject: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I was doing some benchmarks and found a strange behaviour.
>> >>>>>
>> >>>>> Using fio with the rbd engine, I was able to reach around 100k iops.
>> >>>>> (osd data in linux buffers, iostat shows 0% disk access)
>> >>>>>
>> >>>>> Then, after restarting all osd daemons, the same fio benchmark
>> >>>>> shows around 300k iops.
>> >>>>> (osd data in linux buffers, iostat shows 0% disk access)
>> >>>>>
>> >>>>> Any ideas?
>> >>>>>
>> >>>>> before restarting osd
>> >>>>> ---------------------
>> >>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>> >>>>> ...
>> >>>>> fio-2.2.7-10-g51e9
>> >>>>> Starting 10 processes
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> ^Cbs: 10 (f=10): [r(10)] [2.9% done] [376.1MB/0KB/0KB /s] [96.6K/0/0 iops] [eta 14m:45s]
>> >>>>> fio: terminating on signal 2
>> >>>>>
>> >>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=17075: Wed Apr 22 10:00:04 2015
>> >>>>>   read : io=11558MB, bw=451487KB/s, iops=112871, runt= 26215msec
>> >>>>>   slat (usec): min=5, max=3685, avg=16.89, stdev=17.38
>> >>>>>   clat (usec): min=5, max=62584, avg=2695.80, stdev=5351.23
>> >>>>>   lat (usec): min=109, max=62598, avg=2712.68, stdev=5350.42
>> >>>>>   clat percentiles (usec):
>> >>>>>    |  1.00th=[  155],  5.00th=[  183], 10.00th=[  205], 20.00th=[  247],
>> >>>>>    | 30.00th=[  294], 40.00th=[  354], 50.00th=[  446], 60.00th=[  660],
>> >>>>>    | 70.00th=[ 1176], 80.00th=[ 3152], 90.00th=[ 9024], 95.00th=[14656],
>> >>>>>    | 99.00th=[25984], 99.50th=[30336], 99.90th=[38656], 99.95th=[41728],
>> >>>>>    | 99.99th=[47360]
>> >>>>>   bw (KB /s): min=23928, max=154416, per=10.07%, avg=45462.82, stdev=28809.95
>> >>>>>   lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=20.79%
>> >>>>>   lat (usec) : 500=32.74%, 750=8.99%, 1000=5.03%
>> >>>>>   lat (msec) : 2=8.37%, 4=6.21%, 10=8.90%, 20=6.60%, 50=2.37%
>> >>>>>   lat (msec) : 100=0.01%
>> >>>>>   cpu : usr=15.90%, sys=3.01%, ctx=765446, majf=0, minf=8710
>> >>>>>   IO depths : 1=0.4%, 2=0.9%, 4=2.3%, 8=7.4%, 16=75.5%, 32=13.6%, >=64=0.0%
>> >>>>>   submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>> >>>>>   complete : 0=0.0%, 4=93.6%, 8=2.8%, 16=2.4%, 32=1.2%, 64=0.0%, >=64=0.0%
>> >>>>>   issued : total=r=2958935/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>> >>>>>   latency : target=0, window=0, percentile=100.00%, depth=32
>> >>>>>
>> >>>>> Run status group 0 (all jobs):
>> >>>>>   READ: io=11558MB, aggrb=451487KB/s, minb=451487KB/s, maxb=451487KB/s, mint=26215msec, maxt=26215msec
>> >>>>>
>> >>>>> Disk stats (read/write):
>> >>>>>   sdg: ios=0/29, merge=0/16, ticks=0/3, in_queue=3, util=0.01%
>> >>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>> >>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>> >>>>>
>> >>>>> AFTER RESTARTING OSDS
>> >>>>> ----------------------
>> >>>>> [root@ceph1-3 fiorbd]# ./fio fiorbd
>> >>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>> >>>>> ...
>> >>>>> fio-2.2.7-10-g51e9
>> >>>>> Starting 10 processes
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> rbd engine: RBD version: 0.1.9
>> >>>>> ^Cbs: 10 (f=10): [r(10)] [0.2% done] [1155MB/0KB/0KB /s] [296K/0/0 iops] [eta 01h:09m:27s]
>> >>>>> fio: terminating on signal 2
>> >>>>>
>> >>>>> rbd_iodepth32-test: (groupid=0, jobs=10): err= 0: pid=18252: Wed Apr 22 10:02:28 2015
>> >>>>>   read : io=7655.7MB, bw=1036.8MB/s, iops=265218, runt= 7389msec
>> >>>>>   slat (usec): min=5, max=3406, avg=26.59, stdev=40.35
>> >>>>>   clat (usec): min=8, max=684328, avg=930.43, stdev=6419.12
>> >>>>>   lat (usec): min=154, max=684342, avg=957.02, stdev=6419.28
>> >>>>>   clat percentiles (usec):
>> >>>>>    |  1.00th=[  243],  5.00th=[  314], 10.00th=[  366], 20.00th=[  450],
>> >>>>>    | 30.00th=[  524], 40.00th=[  604], 50.00th=[  692], 60.00th=[  796],
>> >>>>>    | 70.00th=[  924], 80.00th=[ 1096], 90.00th=[ 1400], 95.00th=[ 1720],
>> >>>>>    | 99.00th=[ 2672], 99.50th=[ 3248], 99.90th=[ 5920], 99.95th=[ 9792],
>> >>>>>    | 99.99th=[436224]
>> >>>>>   bw (KB /s): min=32614, max=143160, per=10.19%, avg=108076.46, stdev=28263.82
>> >>>>>   lat (usec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%, 250=1.23%
>> >>>>>   lat (usec) : 500=25.64%, 750=29.15%, 1000=18.84%
>> >>>>>   lat (msec) : 2=22.19%, 4=2.69%, 10=0.21%, 20=0.02%, 50=0.01%
>> >>>>>   lat (msec) : 250=0.01%, 500=0.02%, 750=0.01%
>> >>>>>   cpu : usr=44.06%, sys=11.26%, ctx=642620, majf=0, minf=6832
>> >>>>>   IO depths : 1=0.1%, 2=0.5%, 4=2.0%, 8=11.5%, 16=77.8%, 32=8.1%, >=64=0.0%
>> >>>>>   submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>> >>>>>   complete : 0=0.0%, 4=94.1%, 8=1.3%, 16=2.3%, 32=2.3%, 64=0.0%, >=64=0.0%
>> >>>>>   issued : total=r=1959697/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>> >>>>>   latency : target=0, window=0, percentile=100.00%, depth=32
>> >>>>>
>> >>>>> Run status group 0 (all jobs):
>> >>>>>   READ: io=7655.7MB, aggrb=1036.8MB/s, minb=1036.8MB/s, maxb=1036.8MB/s, mint=7389msec, maxt=7389msec
>> >>>>>
>> >>>>> Disk stats (read/write):
>> >>>>>   sdg: ios=0/21, merge=0/10, ticks=0/2, in_queue=2, util=0.03%
>> >>>>>
>> >>>>> CEPH LOG
>> >>>>> --------
>> >>>>>
>> >>>>> before restarting osd
>> >>>>> ----------------------
>> >>>>>
>> >>>>> 2015-04-22 09:53:17.568095 mon.0 10.7.0.152:6789/0 2144 : cluster [INF] pgmap v11330: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 298 MB/s rd, 76465 op/s
>> >>>>> 2015-04-22 09:53:18.574524 mon.0 10.7.0.152:6789/0 2145 : cluster [INF] pgmap v11331: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 333 MB/s rd, 85355 op/s
>> >>>>> 2015-04-22 09:53:19.579351 mon.0 10.7.0.152:6789/0 2146 : cluster [INF] pgmap v11332: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 343 MB/s rd, 87932 op/s
>> >>>>> 2015-04-22 09:53:20.591586 mon.0 10.7.0.152:6789/0 2147 : cluster [INF] pgmap v11333: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 328 MB/s rd, 84151 op/s
>> >>>>> 2015-04-22 09:53:21.600650 mon.0 10.7.0.152:6789/0 2148 : cluster [INF] pgmap v11334: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 237 MB/s rd, 60855 op/s
>> >>>>> 2015-04-22 09:53:22.607966 mon.0 10.7.0.152:6789/0 2149 : cluster [INF] pgmap v11335: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 144 MB/s rd, 36935 op/s
>> >>>>> 2015-04-22 09:53:23.617780 mon.0 10.7.0.152:6789/0 2150 : cluster [INF] pgmap v11336: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 321 MB/s rd, 82334 op/s
>> >>>>> 2015-04-22 09:53:24.622341 mon.0 10.7.0.152:6789/0 2151 : cluster [INF] pgmap v11337: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 368 MB/s rd, 94211 op/s
>> >>>>> 2015-04-22 09:53:25.628432 mon.0 10.7.0.152:6789/0 2152 : cluster [INF] pgmap v11338: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 244 MB/s rd, 62644 op/s
>> >>>>> 2015-04-22 09:53:26.632855 mon.0 10.7.0.152:6789/0 2153 : cluster [INF] pgmap v11339: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 175 MB/s rd, 44997 op/s
>> >>>>> 2015-04-22 09:53:27.636573 mon.0 10.7.0.152:6789/0 2154 : cluster [INF] pgmap v11340: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 122 MB/s rd, 31259 op/s
>> >>>>> 2015-04-22 09:53:28.645784 mon.0 10.7.0.152:6789/0 2155 : cluster [INF] pgmap v11341: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 229 MB/s rd, 58674 op/s
>> >>>>> 2015-04-22 09:53:29.657128 mon.0 10.7.0.152:6789/0 2156 : cluster [INF] pgmap v11342: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 271 MB/s rd, 69501 op/s
>> >>>>> 2015-04-22 09:53:30.662796 mon.0 10.7.0.152:6789/0 2157 : cluster [INF] pgmap v11343: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 211 MB/s rd, 54020 op/s
>> >>>>> 2015-04-22 09:53:31.666421 mon.0 10.7.0.152:6789/0 2158 : cluster [INF] pgmap v11344: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 164 MB/s rd, 42001 op/s
>> >>>>> 2015-04-22 09:53:32.670842 mon.0 10.7.0.152:6789/0 2159 : cluster [INF] pgmap v11345: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 134 MB/s rd, 34380 op/s
>> >>>>> 2015-04-22 09:53:33.681357 mon.0 10.7.0.152:6789/0 2160 : cluster [INF] pgmap v11346: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 293 MB/s rd, 75213 op/s
>> >>>>> 2015-04-22 09:53:34.692177 mon.0 10.7.0.152:6789/0 2161 : cluster [INF] pgmap v11347: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 337 MB/s rd, 86353 op/s
>> >>>>> 2015-04-22 09:53:35.697401 mon.0 10.7.0.152:6789/0 2162 : cluster [INF] pgmap v11348: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 229 MB/s rd, 58839 op/s
>> >>>>> 2015-04-22 09:53:36.699309 mon.0 10.7.0.152:6789/0 2163 : cluster [INF] pgmap v11349: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 390 GB data, 391 GB used, 904 GB / 1295 GB avail; 152 MB/s rd, 39117 op/s
>> >>>>>
>> >>>>> restarting osd
>> >>>>> ---------------
>> >>>>>
>> >>>>> 2015-04-22 10:00:09.766906 mon.0 10.7.0.152:6789/0 2255 : cluster [INF] osd.0 marked itself down
>> >>>>> 2015-04-22 10:00:09.790212 mon.0 10.7.0.152:6789/0 2256 : cluster [INF] osdmap e849: 9 osds: 8 up, 9 in
>> >>>>> 2015-04-22 10:00:09.793050 mon.0 10.7.0.152:6789/0 2257 : cluster [INF] pgmap v11439: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail; 516 kB/s rd, 130 op/s
>> >>>>> 2015-04-22 10:00:10.795966 mon.0 10.7.0.152:6789/0 2258 : cluster [INF] osdmap e850: 9 osds: 8 up, 9 in
>> >>>>> 2015-04-22 10:00:10.796675 mon.0 10.7.0.152:6789/0 2259 : cluster [INF] pgmap v11440: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:11.798257 mon.0 10.7.0.152:6789/0 2260 : cluster [INF] pgmap v11441: 964 pgs: 2 active+undersized+degraded, 8 stale+active+remapped, 106 stale+active+clean, 54 active+remapped, 794 active+clean; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:12.339696 mon.0 10.7.0.152:6789/0 2262 : cluster [INF] osd.1 marked itself down
>> >>>>> 2015-04-22 10:00:12.800168 mon.0 10.7.0.152:6789/0 2263 : cluster [INF] osdmap e851: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:12.806498 mon.0 10.7.0.152:6789/0 2264 : cluster [INF] pgmap v11443: 964 pgs: 1 active+undersized+degraded, 13 stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:13.804186 mon.0 10.7.0.152:6789/0 2265 : cluster [INF] osdmap e852: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:13.805216 mon.0 10.7.0.152:6789/0 2266 : cluster [INF] pgmap v11444: 964 pgs: 1 active+undersized+degraded, 13 stale+active+remapped, 216 stale+active+clean, 49 active+remapped, 684 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:14.781785 mon.0 10.7.0.152:6789/0 2268 : cluster [INF] osd.2 marked itself down
>> >>>>> 2015-04-22 10:00:14.810571 mon.0 10.7.0.152:6789/0 2269 : cluster [INF] osdmap e853: 9 osds: 6 up, 9 in
>> >>>>> 2015-04-22 10:00:14.813871 mon.0 10.7.0.152:6789/0 2270 : cluster [INF] pgmap v11445: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:15.810333 mon.0 10.7.0.152:6789/0 2271 : cluster [INF] osdmap e854: 9 osds: 6 up, 9 in
>> >>>>> 2015-04-22 10:00:15.811425 mon.0 10.7.0.152:6789/0 2272 : cluster [INF] pgmap v11446: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:16.395105 mon.0 10.7.0.152:6789/0 2273 : cluster [INF] HEALTH_WARN; 2 pgs degraded; 323 pgs stale; 2 pgs stuck degraded; 64 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; 3/9 in osds are down; clock skew detected on mon.ceph1-2
>> >>>>> 2015-04-22 10:00:16.814432 mon.0 10.7.0.152:6789/0 2274 : cluster [INF] osd.1 10.7.0.152:6800/14848 boot
>> >>>>> 2015-04-22 10:00:16.814938 mon.0 10.7.0.152:6789/0 2275 : cluster [INF] osdmap e855: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:16.815942 mon.0 10.7.0.152:6789/0 2276 : cluster [INF] pgmap v11447: 964 pgs: 1 active+undersized+degraded, 22 stale+active+remapped, 300 stale+active+clean, 40 active+remapped, 600 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:17.222281 mon.0 10.7.0.152:6789/0 2278 : cluster [INF] osd.3 marked itself down
>> >>>>> 2015-04-22 10:00:17.819371 mon.0 10.7.0.152:6789/0 2279 : cluster [INF] osdmap e856: 9 osds: 6 up, 9 in
>> >>>>> 2015-04-22 10:00:17.822041 mon.0 10.7.0.152:6789/0 2280 : cluster [INF] pgmap v11448: 964 pgs: 1 active+undersized+degraded, 25 stale+active+remapped, 394 stale+active+clean, 37 active+remapped, 506 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:18.551068 mon.0 10.7.0.152:6789/0 2282 : cluster [INF] osd.6 marked itself down
>> >>>>> 2015-04-22 10:00:18.819387 mon.0 10.7.0.152:6789/0 2283 : cluster [INF] osd.2 10.7.0.152:6812/15410 boot
>> >>>>> 2015-04-22 10:00:18.821134 mon.0 10.7.0.152:6789/0 2284 : cluster [INF] osdmap e857: 9 osds: 6 up, 9 in
>> >>>>> 2015-04-22 10:00:18.824440 mon.0 10.7.0.152:6789/0 2285 : cluster [INF] pgmap v11449: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:19.820947 mon.0 10.7.0.152:6789/0 2287 : cluster [INF] osdmap e858: 9 osds: 6 up, 9 in
>> >>>>> 2015-04-22 10:00:19.821853 mon.0 10.7.0.152:6789/0 2288 : cluster [INF] pgmap v11450: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:20.828047 mon.0 10.7.0.152:6789/0 2290 : cluster [INF] osd.3 10.7.0.152:6816/15971 boot
>> >>>>> 2015-04-22 10:00:20.828431 mon.0 10.7.0.152:6789/0 2291 : cluster [INF] osdmap e859: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:20.829126 mon.0 10.7.0.152:6789/0 2292 : cluster [INF] pgmap v11451: 964 pgs: 1 active+undersized+degraded, 30 stale+active+remapped, 502 stale+active+clean, 32 active+remapped, 398 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:20.991343 mon.0 10.7.0.152:6789/0 2294 : cluster [INF] osd.7 marked itself down
>> >>>>> 2015-04-22 10:00:21.830389 mon.0 10.7.0.152:6789/0 2295 : cluster [INF] osd.0 10.7.0.152:6804/14481 boot
>> >>>>> 2015-04-22 10:00:21.832518 mon.0 10.7.0.152:6789/0 2296 : cluster [INF] osdmap e860: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:21.836129 mon.0 10.7.0.152:6789/0 2297 : cluster [INF] pgmap v11452: 964 pgs: 1 active+undersized+degraded, 35 stale+active+remapped, 608 stale+active+clean, 27 active+remapped, 292 active+clean, 1 stale+active+undersized+degraded; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:22.830456 mon.0 10.7.0.152:6789/0 2298 : cluster [INF] osd.6 10.7.0.153:6808/21955 boot
>> >>>>> 2015-04-22 10:00:22.832171 mon.0 10.7.0.152:6789/0 2299 : cluster [INF] osdmap e861: 9 osds: 8 up, 9 in
>> >>>>> 2015-04-22 10:00:22.836272 mon.0 10.7.0.152:6789/0 2300 : cluster [INF] pgmap v11453: 964 pgs: 3 active+undersized+degraded, 27 stale+active+remapped, 498 stale+active+clean, 2 peering, 28 active+remapped, 402 active+clean, 4 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:23.420309 mon.0 10.7.0.152:6789/0 2302 : cluster [INF] osd.8 marked itself down
>> >>>>> 2015-04-22 10:00:23.833708 mon.0 10.7.0.152:6789/0 2303 : cluster [INF] osdmap e862: 9 osds: 7 up, 9 in
>> >>>>> 2015-04-22 10:00:23.836459 mon.0 10.7.0.152:6789/0 2304 : cluster [INF] pgmap v11454: 964 pgs: 3 active+undersized+degraded, 44 stale+active+remapped, 587 stale+active+clean, 2 peering, 11 active+remapped, 313 active+clean, 4 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:24.832905 mon.0 10.7.0.152:6789/0 2305 : cluster [INF] osd.7 10.7.0.153:6804/22536 boot
>> >>>>> 2015-04-22 10:00:24.834381 mon.0 10.7.0.152:6789/0 2306 : cluster [INF] osdmap e863: 9 osds: 8 up, 9 in
>> >>>>> 2015-04-22 10:00:24.836977 mon.0 10.7.0.152:6789/0 2307 : cluster [INF] pgmap v11455: 964 pgs: 3 active+undersized+degraded, 31 stale+active+remapped, 503 stale+active+clean, 4 active+undersized+degraded+remapped, 5 peering, 13 active+remapped, 397 active+clean, 8 remapped+peering; 419 GB data, 420 GB used, 874 GB / 1295 GB avail
>> >>>>> 2015-04-22 10:00:25.834459 mon.0 10.7.0.152:6789/0 2309 : cluster [INF] osdmap e864: 9 osds: 8 up, 9 in
>> >>>>> 2015-04-22 10:00:25.835727 mon.0 10.7.0.152:6789/0 2310 : cluster [INF] pgmap v11456: 964 pgs: 3 active+undersized+degraded, 31 stale+active+remapped, 503 stale+active+clean, 4 active+undersized+degraded+remapped, 5 peering, 13 active
>> >>>>>
>> >>>>> AFTER OSD RESTART
>> >>>>> ------------------
>> >>>>>
>> >>>>> 2015-04-22 10:09:27.609052 mon.0 10.7.0.152:6789/0 2339 : cluster [INF] pgmap v11478: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 786 MB/s rd, 196 kop/s
>> >>>>> 2015-04-22 10:09:28.618082 mon.0 10.7.0.152:6789/0 2340 : cluster [INF] pgmap v11479: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1578 MB/s rd, 394 kop/s
>> >>>>> 2015-04-22 10:09:30.629067 mon.0 10.7.0.152:6789/0 2341 : cluster [INF] pgmap v11480: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 932 MB/s rd, 233 kop/s
>> >>>>> 2015-04-22 10:09:32.645890 mon.0 10.7.0.152:6789/0 2342 : cluster [INF] pgmap v11481: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 627 MB/s rd, 156 kop/s
>> >>>>> 2015-04-22 10:09:33.652634 mon.0 10.7.0.152:6789/0 2343 : cluster [INF] pgmap v11482: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1034 MB/s rd, 258 kop/s
>> >>>>> 2015-04-22 10:09:35.655657 mon.0 10.7.0.152:6789/0 2344 : cluster [INF] pgmap v11483: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 529 MB/s rd, 132 kop/s
>> >>>>> 2015-04-22 10:09:37.674332 mon.0 10.7.0.152:6789/0 2345 : cluster [INF] pgmap v11484: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 770 MB/s rd, 192 kop/s
>> >>>>> 2015-04-22 10:09:38.679445 mon.0 10.7.0.152:6789/0 2346 : cluster [INF] pgmap v11485: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1358 MB/s rd, 339 kop/s
>> >>>>> 2015-04-22 10:09:40.690037 mon.0 10.7.0.152:6789/0 2347 : cluster [INF] pgmap v11486: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 649 MB/s rd, 162 kop/s
>> >>>>> 2015-04-22 10:09:42.707164 mon.0 10.7.0.152:6789/0 2348 : cluster [INF] pgmap v11487: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 580 MB/s rd, 145 kop/s
>> >>>>> 2015-04-22 10:09:43.713736 mon.0 10.7.0.152:6789/0 2349 : cluster [INF] pgmap v11488: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 962 MB/s rd, 240 kop/s
>> >>>>> 2015-04-22 10:09:45.718658 mon.0 10.7.0.152:6789/0 2350 : cluster [INF] pgmap v11489: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 506 MB/s rd, 126 kop/s
>> >>>>> 2015-04-22 10:09:47.737358 mon.0 10.7.0.152:6789/0 2351 : cluster [INF] pgmap v11490: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 774 MB/s rd, 193 kop/s
>> >>>>> 2015-04-22 10:09:48.743338 mon.0 10.7.0.152:6789/0 2352 : cluster [INF] pgmap v11491: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 1363 MB/s rd, 340 kop/s
>> >>>>> 2015-04-22 10:09:50.746685 mon.0 10.7.0.152:6789/0 2353 : cluster [INF] pgmap v11492: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 662 MB/s rd, 165 kop/s
>> >>>>> 2015-04-22 10:09:52.762461 mon.0 10.7.0.152:6789/0 2354 : cluster [INF] pgmap v11493: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 593 MB/s rd, 148 kop/s
>> >>>>> 2015-04-22 10:09:53.767729 mon.0 10.7.0.152:6789/0 2355 : cluster [INF] pgmap v11494: 964 pgs: 2 active+undersized+degraded, 62 active+remapped, 900 active+clean; 419 GB data, 421 GB used, 874 GB / 1295 GB avail; 938 MB/s rd, 234 kop/s

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

--- /tmp/ceph	2015-04-27 17:43:56.726216645 +0200
+++ /etc/init.d/ceph	2015-04-27 17:21:58.567859403 +0200
@@ -306,7 +306,7 @@
 	    if [ -n "$SYSTEMD_RUN" ]; then
 		cmd="$SYSTEMD_RUN -r bash -c '$files $cmd --cluster $cluster -f'"
 	    else
-		cmd="$files $wrap $cmd --cluster $cluster $runmode"
+		cmd="export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728; $files $wrap $cmd --cluster $cluster $runmode"
 	    fi
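One caveat with the init script route (worth double-checking on your own
boxes): the variable only takes effect if it is in each daemon's environment
at startup. To confirm a running osd actually inherited it:

    # needs root; prints the tcmalloc variables each osd was started with
    for pid in $(pidof ceph-osd); do
        tr '\0' '\n' < /proc/$pid/environ | grep TCMALLOC
    done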