From patchwork Tue Jun 18 16:59:52 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Mason
X-Patchwork-Id: 2744231
To: Josef Bacik, Sage Weil
From: Chris Mason
In-Reply-To: <20130618163706.GC19183@localhost.localdomain>
CC: "linux-btrfs@vger.kernel.org"
References: <20130618163706.GC19183@localhost.localdomain>
Message-ID: <20130618165952.9494.8953@localhost.localdomain>
User-Agent: alot/0.3.4
Subject: Re: hang on 3.9, 3.10-rc5
Date: Tue, 18 Jun 2013 12:59:52 -0400
X-Mailing-List: linux-btrfs@vger.kernel.org

Quoting Josef Bacik (2013-06-18 12:37:06)
> On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
> > I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this
> > is a known problem?  In this case there is no powercycling; just a regular
> > ceph-osd workload.
> >
>
> Have you gotten sysrq+w?  Can you tell me where

He attached it last week.

>
> log_one_extent.isra.22+0x485
>
> is on your box?  Thanks,

It's very suspect that both of the times he logged this log_one_extent
popped up.  That should be:

	wait_event(ordered->wait, ordered->csum_bytes_left == 0);

But Sage it would definitely help if you could confirm.

If we follow log_one_extent all the way up to btrfs_log_inode:

	} else if (test_and_clear_bit(BTRFS_INODE_COPY_EVERYTHING,
				      &BTRFS_I(inode)->runtime_flags)) {
		if (inode_only == LOG_INODE_ALL)
			fast_search = true;
		max_key.type = BTRFS_XATTR_ITEM_KEY;
		ret = drop_objectid_items(trans, log, path, ino,
					  max_key.type);

Now fast_search is true, but we don't jump directly to logging the
extent.  The while loop runs, we hit the first break.  ins_nr is zero.

Then we:

	if (fast_search) {
		btrfs_release_path(dst_path);
		ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
		if (ret) {
			err = ret;
			goto out_unlock;
		}

Very long way of saying I think we're one release_path short.  Sage, I
haven't tested this at all yet, I was hoping to trigger it first.

---
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c276ac9..c1954b3 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3730,6 +3730,7 @@ next_slot:
 log_extents:
 	if (fast_search) {
 		btrfs_release_path(dst_path);
+		btrfs_release_path(path);
 		ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
 		if (ret) {
 			err = ret;
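
For anyone following along, here is a rough user-space sketch of the idea behind the one-liner. It is not kernel code and it is untested against the real hang. It assumes the theory above is right: the search path still pins locked tree nodes when we go on to a step that can block. Every name in it (struct node, struct path, release_path, log_changed_extents, log_inode_sketch) is a hypothetical stand-in, not the btrfs API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LEVEL 8

/* Hypothetical stand-in for a tree node protected by a lock. */
struct node {
	bool locked;
};

/* Hypothetical stand-in for a search path: the nodes a lookup left locked. */
struct path {
	struct node *nodes[MAX_LEVEL];
};

/* Record that the path holds (and has locked) a node at the given level. */
static void path_hold(struct path *p, int level, struct node *n)
{
	assert(level >= 0 && level < MAX_LEVEL);
	n->locked = true;
	p->nodes[level] = n;
}

/* Drop every lock the path holds, modelled on what a release helper does. */
static void release_path(struct path *p)
{
	for (int i = 0; i < MAX_LEVEL; i++) {
		if (p->nodes[i]) {
			p->nodes[i]->locked = false;
			p->nodes[i] = NULL;
		}
	}
}

static bool path_holds_locks(const struct path *p)
{
	for (int i = 0; i < MAX_LEVEL; i++)
		if (p->nodes[i])
			return true;
	return false;
}

/*
 * Stand-in for a step that may sleep waiting on other work.  In this model
 * it is only safe to enter with no tree locks held, so it reports -1
 * instead of actually blocking.
 */
static int log_changed_extents(const struct path *path,
			       const struct path *dst_path)
{
	if (path_holds_locks(path) || path_holds_locks(dst_path))
		return -1;	/* would risk a hang */
	return 0;
}

/* release_both == false models the old flow, true models the patched one. */
int log_inode_sketch(bool release_both)
{
	struct node leaf = { false }, dst_leaf = { false };
	struct path path = { { NULL } }, dst_path = { { NULL } };
	int ret;

	/* The earlier search loop leaves both paths pointing at locked leaves. */
	path_hold(&path, 0, &leaf);
	path_hold(&dst_path, 0, &dst_leaf);

	release_path(&dst_path);
	if (release_both)
		release_path(&path);	/* the extra release */

	ret = log_changed_extents(&path, &dst_path);

	/* Clean up whatever is still held before returning. */
	release_path(&path);
	release_path(&dst_path);
	return ret;
}
```

In this model log_inode_sketch(false) trips the check because path still holds its leaf, and log_inode_sketch(true) gets through. Whether that is really what the kernel is hitting still needs the sysrq+w confirmation.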