From patchwork Thu May 19 06:58:42 2011
X-Patchwork-Submitter: Simon Tian
X-Patchwork-Id: 796362
Date: Thu, 19 May 2011 14:58:42 +0800
Subject: Re: operate one file in multi clients with libceph
From: Simon Tian
To: Sage Weil
Cc: ceph-devel@vger.kernel.org

========================== back trace 1 ==================================
(gdb) bt
#0  0x000000367fa30265 in raise () from /lib64/libc.so.6
#1  0x000000367fa31d10 in abort () from /lib64/libc.so.6
#2  0x0000003682ebec44 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#3  0x0000003682ebcdb6 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x0000003682ebcde3 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5  0x0000003682ebceca in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6  0x00007ffff7c51a54 in ceph::__ceph_assert_fail (assertion=0x7ffff7ce31f1 "in->cap_refs[(32 << 8)] == 1", file=0x7ffff7ce1431 "client/Client.cc", line=2190, func=0x7ffff7ce5fc0 "void Client::_flush(Inode*, Context*)") at common/assert.cc:86
#7  0x00007ffff7b39bb0 in Client::_flush (this=0x629090, in=0x630f50, onfinish=0x7fffe8000b60) at client/Client.cc:2190
#8  0x00007ffff7b52426 in Client::handle_cap_grant (this=0x629090, in=0x630f50, mds=0, cap=0x62ee50, m=0x631fc0) at client/Client.cc:2930
#9  0x00007ffff7b52d2a in Client::handle_caps (this=0x629090, m=0x631fc0) at client/Client.cc:2711
#10 0x00007ffff7b560a0 in Client::ms_dispatch (this=0x629090, m=0x631fc0) at client/Client.cc:1444
#11 0x00007ffff7bcaff9 in Messenger::ms_deliver_dispatch (this=0x628350, m=0x631fc0) at msg/Messenger.h:98
#12 0x00007ffff7bb2607 in SimpleMessenger::dispatch_entry (this=0x628350) at msg/SimpleMessenger.cc:352
#13 0x00007ffff7b22641 in SimpleMessenger::DispatchThread::entry (this=0x6287d8) at msg/SimpleMessenger.h:533
#14 0x00007ffff7b5aa04 in Thread::_entry_func (arg=0x6287d8) at ./common/Thread.h:41
#15 0x00000036802064a7 in start_thread () from /lib64/libpthread.so.0
#16 0x000000367fad3c2d in clone () from /lib64/libc.so.6
============================================================

In addition, after writing data for a long time, the write thread hangs in pthread_cond_wait(), as back trace 2 shows:

========================== back trace 2 ==================================
Thread 10 (Thread 0x43f87940 (LWP 19986)):
#0  0x000000312be0ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f147a8e763b in Cond::Wait (this=0x43f86b00, mutex=@0x66d790) at ./common/Cond.h:46
#2  0x00007f147a86630d in Client::wait_on_list (this=0x66d430, ls=@0x674df0) at client/Client.cc:2140
#3  0x00007f147a8771ff in Client::get_caps (this=0x66d430, in=0x6749d0, need=4096, want=8192, got=0x43f86e8c, endoff=235470848) at client/Client.cc:1827
#4  0x00007f147a886003 in Client::_write (this=0x66d430, f=0x671510, offset=235470336, size=512, buf=0x6452b0 '?' , ".001", '?' ...) at client/Client.cc:5055
#5  0x00007f147a886d73 in Client::write (this=0x66d430, fd=10, buf=0x6452b0 '?' , ".001", '?' ..., size=512, offset=235470336) at client/Client.cc:5007
#6  0x00007f147a85822d in ceph_write (fd=10, buf=0x6452b0 '?' , ".001", '?' ..., size=512, offset=235470336) at libceph.cc:322
#7  0x000000000042fa27 in TdcCephImpl::AsyncWrite (this=0x647c30, offset=235470336, length=512, buf=0x6452b0 '?' , ".001", '?' ..., queue=@0x644b60, io=0x6456a0) at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:191
#8  0x000000000042ffc5 in TdcCephImpl::AsyncProcess (this=0x647c30, io=0x6456a0, queue=@0x644b60, netfd=5) at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:268
#9  0x00000000004192be in SubmitRequest_Intra (arg=0x644b00) at /opt/tsk/tdc-tapdisk/td-connector2.0/td_connector_server.cpp:390
#10 0x000000312be064a7 in start_thread () from /lib64/libpthread.so.0
#11 0x000000312b6d3c2d in clone () from /lib64/libc.so.6
============================================================

The patch does not seem very safe.
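The numbers in both traces are raw capability bitmasks: the failed assertion indexes cap_refs with (32 << 8), and get_caps() is called with need=4096, want=8192. A small sketch that decodes them, assuming the capability layout from Ceph's ceph_fs.h of that era (generic bits GWR = 16 and GBUFFER = 32, shifted into the file field by CEPH_CAP_SFILE = 8). The constants are re-declared locally for illustration, and describe_file_caps() is a hypothetical helper, not part of libceph:

```cpp
#include <cassert>  // assert(), for callers checking decoded masks
#include <string>

// Assumption: these mirror the cap bits in Ceph's ceph_fs.h; they are
// re-declared here for illustration rather than included from Ceph.
constexpr int CEPH_CAP_GWR = 16;      // generic bit: client can write
constexpr int CEPH_CAP_GBUFFER = 32;  // generic bit: client can buffer writes
constexpr int CEPH_CAP_SFILE = 8;     // shift placing generic bits in the file field

constexpr int CEPH_CAP_FILE_WR = CEPH_CAP_GWR << CEPH_CAP_SFILE;          // 4096
constexpr int CEPH_CAP_FILE_BUFFER = CEPH_CAP_GBUFFER << CEPH_CAP_SFILE;  // 8192

// The index used by the failed assertion in Client::_flush().
static_assert(CEPH_CAP_FILE_BUFFER == (32 << 8), "cap_refs index in back trace 1");

// Hypothetical helper: name the file write/buffer caps present in a mask.
std::string describe_file_caps(int mask) {
  std::string out;
  if (mask & CEPH_CAP_FILE_WR) out += "FILE_WR";
  if (mask & CEPH_CAP_FILE_BUFFER) out += out.empty() ? "FILE_BUFFER" : "|FILE_BUFFER";
  return out.empty() ? "(none)" : out;
}
```

Read this way, back trace 1 asserts that exactly one FILE_BUFFER reference is held when _flush() runs, and back trace 2 shows the writer blocked while it needs FILE_WR (4096) and wants FILE_BUFFER (8192).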
For reference, this is the patch I applied:

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 7f7fb08..bf0997a 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -4934,7 +4934,6 @@ public:
 void Client::sync_write_commit(Inode *in)
 {
-  client_lock.Lock();
+  int r = client_lock.Lock();
   assert(unsafe_sync_write > 0);
   unsafe_sync_write--;
@@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in)
   }
   put_inode(in);
+  if (r) client_lock.Unlock();
 }
 int Client::write(int fd, const char *buf, loff_t size, loff_t offset)
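The patch makes sync_write_commit() decide at run time whether it has to take client_lock itself. Such a check is only safe if it asks whether the calling thread already owns the lock, not merely whether the lock is held by someone (another thread may hold it). A minimal sketch of that idea with the standard library, assuming a hypothetical OwnedMutex wrapper (not Ceph's Mutex class) and a hypothetical sync_write_commit_sketch() standing in for the real callback:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical wrapper (not Ceph's Mutex): records the owning thread so code
// reachable both with and without the lock can ask "do I hold it?".
class OwnedMutex {
 public:
  void lock() {
    m_.lock();
    owner_.store(std::this_thread::get_id());
  }
  void unlock() {
    owner_.store(std::thread::id());  // clear ownership before releasing
    m_.unlock();
  }
  bool is_locked_by_me() const {
    return owner_.load() == std::this_thread::get_id();
  }

 private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Hypothetical stand-in for a commit callback that may run either on a thread
// that already holds the client lock or on one that does not.
void sync_write_commit_sketch(OwnedMutex& client_lock, int& unsafe_sync_write) {
  const bool took_lock = !client_lock.is_locked_by_me();
  if (took_lock) client_lock.lock();
  assert(unsafe_sync_write > 0);
  --unsafe_sync_write;
  if (took_lock) client_lock.unlock();  // release only what this call acquired
}
```

The unlock mirrors the decision made on entry, so the function never releases a lock its caller still relies on. A common alternative is a fixed convention (for example, documenting that every caller already holds client_lock), which removes the run-time check altogether.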