Date: Mon, 22 Jun 2015 12:26:48 +1000
From: Dave Chinner
To: Josef Bacik
Cc: linux-fsdevel@vger.kernel.org, kernel-team@fb.com, viro@ZenIV.linux.org.uk,
	hch@infradead.org, jack@suse.cz
Subject: [PATCH] sync: wait_sb_inodes() calls iput() with spinlock held (was
	Re: [PATCH 0/7] super block scalabilit patches V3)
Message-ID: <20150622022648.GO10224@dastard>
In-Reply-To: <20150615213429.GB10224@dastard>
References: <1434051673-13838-1-git-send-email-jbacik@fb.com>
	<20150615213429.GB10224@dastard>

On Tue, Jun 16, 2015 at 07:34:29AM +1000, Dave Chinner wrote:
> On Thu, Jun 11, 2015 at 03:41:05PM -0400, Josef Bacik wrote:
> > Here are the cleaned up versions of Dave Chinners super block scalability
> > patches. I've been testing them locally for a while and they are pretty
> > solid. They fix a few big issues, such as the global inode list and soft
> > lockups on boxes on unmount that have lots of inodes in cache. Al if you
> > would consider pulling these in that would be great, you can pull from here
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git superblock-scaling
>
> Passes all my smoke tests.
>
> Tested-by: Dave Chinner

FWIW, I just updated my trees to whatever is in the above branch and
v4.1-rc8, and now I'm seeing problems with wb.list_lock recursion and
"sleeping in atomic" scheduling issues.
generic/269 produced this:

BUG: spinlock cpu recursion on CPU#1, fsstress/3852
 lock: 0xffff88042a650c28, .magic: dead4ead, .owner: fsstress/3804, .owner_cpu: 1
CPU: 1 PID: 3852 Comm: fsstress Tainted: G W 4.1.0-rc8-dgc+ #263
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff88042a650c28 ffff88039898b8e8 ffffffff81e18ffd ffff88042f250fb0
 ffff880428f6b8e0 ffff88039898b908 ffffffff81e12f09 ffff88042a650c28
 ffffffff8221337b ffff88039898b928 ffffffff81e12f34 ffff88042a650c28
Call Trace:
 [] dump_stack+0x4c/0x6e
 [] spin_dump+0x90/0x95
 [] spin_bug+0x26/0x2b
 [] do_raw_spin_lock+0x10d/0x150
 [] _raw_spin_lock+0x15/0x20
 [] __mark_inode_dirty+0x2b0/0x450
 [] __set_page_dirty+0x78/0xd0
 [] mark_buffer_dirty+0x61/0xf0
 [] __block_commit_write.isra.24+0x81/0xb0
 [] block_write_end+0x36/0x70
 [] ? __xfs_get_blocks+0x8a0/0x8a0
 [] generic_write_end+0x34/0xb0
 [] ? wait_for_stable_page+0x1d/0x50
 [] xfs_vm_write_end+0x67/0xc0
 [] pagecache_write_end+0x1f/0x30
 [] xfs_iozero+0x10d/0x190
 [] xfs_zero_last_block+0xdb/0x110
 [] xfs_zero_eof+0x11a/0x290
 [] ? complete_walk+0x60/0x100
 [] ? path_lookupat+0x5f/0x660
 [] xfs_file_aio_write_checks+0x13e/0x160
 [] xfs_file_buffered_aio_write+0x75/0x250
 [] ? user_path_at_empty+0x5f/0xa0
 [] ? __might_sleep+0x4d/0x90
 [] xfs_file_write_iter+0x105/0x120
 [] __vfs_write+0xae/0xf0
 [] vfs_write+0xa1/0x190
 [] SyS_write+0x49/0xb0
 [] ? SyS_lseek+0x91/0xb0
 [] system_call_fastpath+0x12/0x71

And there are a few tests (including generic/269) producing
in_atomic/"scheduling while atomic" bugs in the evict() path such as:

in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: fsstress
CPU: 12 PID: 3852 Comm: fsstress Not tainted 4.1.0-rc8-dgc+ #263
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 000000000000015d ffff88039898b6d8 ffffffff81e18ffd 0000000000000000
 ffff880398865550 ffff88039898b6f8 ffffffff810c5f89 ffff8803f15c45c0
 ffffffff8227a3bf ffff88039898b728 ffffffff810c601d ffff88039898b758
Call Trace:
 [] dump_stack+0x4c/0x6e
 [] ___might_sleep+0xf9/0x140
 [] __might_sleep+0x4d/0x90
 [] block_invalidatepage+0xab/0x140
 [] xfs_vm_invalidatepage+0x39/0xb0
 [] truncate_inode_page+0x67/0xa0
 [] truncate_inode_pages_range+0x1a2/0x6f0
 [] ? find_get_pages_tag+0xf1/0x1b0
 [] ? __switch_to+0x1e3/0x5a0
 [] ? pagevec_lookup_tag+0x25/0x40
 [] ? __inode_wait_for_writeback+0x6d/0xc0
 [] truncate_inode_pages_final+0x4c/0x60
 [] xfs_fs_evict_inode+0x4f/0x100
 [] evict+0xc0/0x1a0
 [] iput+0x1bb/0x220
 [] sync_inodes_sb+0x353/0x3d0
 [] xfs_flush_inodes+0x28/0x40
 [] xfs_create+0x638/0x770
 [] ? xfs_dir2_sf_lookup+0x199/0x330
 [] xfs_generic_create+0xd1/0x300
 [] ? security_inode_permission+0x1c/0x30
 [] xfs_vn_create+0x16/0x20
 [] vfs_create+0xd5/0x140
 [] do_last+0xff3/0x1200
 [] ? path_init+0x186/0x450
 [] path_openat+0x80/0x610
 [] ? xfs_iunlock+0xc4/0x210
 [] do_filp_open+0x3a/0x90
 [] ? getname_flags+0x4f/0x200
 [] ? _raw_spin_unlock+0xe/0x30
 [] ? __alloc_fd+0xa7/0x130
 [] do_sys_open+0x128/0x220
 [] SyS_creat+0x1e/0x20
 [] system_call_fastpath+0x12/0x71

It looks to me like iput() is being called with the wb.list_lock held in
wait_sb_inodes(), and everything is going downhill from there. Patch
below fixes the problem for me.

Cheers,

Dave.
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1718702..a2cd363 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1436,6 +1436,7 @@ static void wait_sb_inodes(struct super_block *sb)
 {
 	struct backing_dev_info *bdi = sb->s_bdi;
 	LIST_HEAD(sync_list);
+	struct inode *iput_inode = NULL;
 
 	/*
 	 * We need to be protected against the filesystem going from
@@ -1497,6 +1498,9 @@ static void wait_sb_inodes(struct super_block *sb)
 		spin_unlock(&inode->i_lock);
 		spin_unlock(&bdi->wb.list_lock);
 
+		if (iput_inode)
+			iput(iput_inode);
+
 		filemap_fdatawait(mapping);
 		cond_resched();
 
@@ -1516,9 +1520,19 @@ static void wait_sb_inodes(struct super_block *sb)
 		} else
 			list_del_init(&inode->i_wb_list);
 		spin_unlock_irq(&mapping->tree_lock);
-		iput(inode);
+
+		/*
+		 * can't iput inode while holding the wb.list_lock. Save it for
+		 * the next time through the loop when we drop all our spin
+		 * locks.
+		 */
+		iput_inode = inode;
 	}
 	spin_unlock(&bdi->wb.list_lock);
+
+	if (iput_inode)
+		iput(iput_inode);
+
 	mutex_unlock(&sb->s_sync_lock);
 }
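
The fix above is an instance of a general pattern: dropping the last reference
to an object may sleep (here, iput() can end up in evict() and
truncate_inode_pages_final(), as the traces show), so the reference must not be
dropped while a spinlock is held; it is stashed instead and released at the
next point where all spinlocks have been dropped. Below is a minimal,
self-contained userspace sketch of that pattern, offered purely as an
illustration; the names (struct object, put_object(), walk_objects(),
list_lock) are stand-ins invented for the example, not kernel APIs and not
taken from the patch itself.

/*
 * Sketch of the deferred-put pattern: put_object() stands in for iput().
 * It may block when it drops the last reference, so it is only ever
 * called after list_lock has been released.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct object {
	int refcount;	/* single-threaded demo: no atomics or i_lock needed */
	int id;
};

static pthread_spinlock_t list_lock;

/* Stand-in for iput(): freeing may block, so never call this under list_lock. */
static void put_object(struct object *obj)
{
	if (--obj->refcount == 0) {
		printf("releasing object %d\n", obj->id);
		free(obj);
	}
}

static void walk_objects(struct object **objs, int nr)
{
	struct object *deferred = NULL;
	int i;

	pthread_spin_lock(&list_lock);
	for (i = 0; i < nr; i++) {
		struct object *obj = objs[i];

		obj->refcount++;	/* take our own reference under the lock */
		pthread_spin_unlock(&list_lock);

		/* Safe point: drop the reference deferred from the last pass. */
		if (deferred)
			put_object(deferred);

		/* ... potentially blocking work on obj would go here ... */

		pthread_spin_lock(&list_lock);
		/* Can't drop the reference while holding the lock; defer it. */
		deferred = obj;
	}
	pthread_spin_unlock(&list_lock);

	/* Mirror of the final iput() after wb.list_lock is dropped. */
	if (deferred)
		put_object(deferred);
}

int main(void)
{
	struct object *objs[3];
	int i;

	pthread_spin_init(&list_lock, PTHREAD_PROCESS_PRIVATE);
	for (i = 0; i < 3; i++) {
		objs[i] = calloc(1, sizeof(*objs[i]));
		objs[i]->refcount = 1;
		objs[i]->id = i;
	}

	walk_objects(objs, 3);

	/* Drop the references main() still holds; this frees the objects. */
	for (i = 0; i < 3; i++)
		put_object(objs[i]);

	pthread_spin_destroy(&list_lock);
	return 0;
}

Built with "cc -pthread", walk_objects() has the same shape as the patched
wait_sb_inodes() loop: the deferred reference is dropped immediately after the
lock is released, once per iteration and once more after the loop exits.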