From patchwork Wed Aug 24 23:42:37 2016
From: "Darrick J. Wong" <darrick.wong@oracle.com>
X-Patchwork-Id: 9298497
Date: Wed, 24 Aug 2016 16:42:37 -0700
To: Dave Chinner, Brian Foster
Cc: linux-xfs@vger.kernel.org, xfs@oss.sgi.com
Subject: "Bad page state" errors when calling BULKSTAT under memory pressure?
Message-ID: <20160824234237.GA22760@birch.djwong.org>
List-Id: XFS Filesystem from SGI
User-Agent: Mutt/1.5.24 (2015-08-30)

Hi everyone,

[cc'ing Brian because he was the last one to touch xfs_buf.c]

I've been stress-testing xfs_scrub against a 900GB filesystem with 2M
inodes, using a VM with 512M of RAM.  I've noticed that I get BUG messages
about pages with negative refcount, but only if the system is under
memory pressure.  No errors are seen if the VM memory is increased to,
say, 20GB.
: BUG: Bad page state in process xfs_scrub  pfn:00426
: page:ffffea0000010980 count:-1 mapcount:0 mapping: (null) index:0x0
: flags: 0x0()
: page dumped because: nonzero _count
: Modules linked in: xfs libcrc32c sch_fq_codel af_packet
: CPU: 1 PID: 2058 Comm: xfs_scrub Not tainted 4.8.0-rc3-mcsum #18
: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
:  0000000000000000 ffff906f9c4237b0 ffffffff98331bf3 ffffea0000010980
:  ffffffff989d923c ffff906f9c4237d8 ffffffff9819c5b4 0000000000000000
:  0000000000000000 0000000000000000 ffff906f9c4237e8 ffffffff9819c701
: Call Trace:
:  [] dump_stack+0x85/0xc2
:  [] bad_page+0xc4/0x130
:  [] check_new_page_bad+0x71/0x80
:  [] get_page_from_freelist+0x907/0xa60
:  [] __alloc_pages_nodemask+0x156/0xf70
:  [] ? is_module_address+0x2f/0x50
:  [] xfs_buf_allocate_memory+0x1f6/0x2f0 [xfs]
:  [] xfs_buf_get_map+0x1bd/0x400 [xfs]
:  [] xfs_buf_read_map+0x2c/0x360 [xfs]
:  [] ? queue_unplugged+0x57/0x190
:  [] xfs_buf_readahead_map+0x52/0x70 [xfs]
:  [] xfs_btree_reada_bufs+0x9d/0xb0 [xfs]
:  [] xfs_bulkstat_ichunk_ra+0xda/0x110 [xfs]
:  [] ? xfs_btree_check_block+0x14/0x20 [xfs]
:  [] xfs_bulkstat+0x2b8/0x6c0 [xfs]
:  [] ? xfs_bulkstat_one_int+0x430/0x430 [xfs]
:  [] xfs_ioc_bulkstat+0x116/0x210 [xfs]
:  [] xfs_file_ioctl+0x591/0xf10 [xfs]
:  [] ? kvm_clock_read+0x23/0x40
:  [] ? kvm_sched_clock_read+0x9/0x20
:  [] ? sched_clock+0x9/0x10
:  [] ? __lock_acquire+0x25d/0x14b0
:  [] ? __fget+0xf3/0x200
:  [] ? __fget+0xf3/0x200
:  [] ? debug_smp_processor_id+0x17/0x20
:  [] ? get_lock_stats+0x19/0x60
:  [] do_vfs_ioctl+0x93/0x700
:  [] ? __fget+0x110/0x200
:  [] ? __fget+0x5/0x200
:  [] SyS_ioctl+0x41/0x70
:  [] entry_SYSCALL_64_fastpath+0x1f/0xbd
: Disabling lock debugging due to kernel taint

Obviously, a page refcount of -1 is not a good sign.
I had a hunch that the page in question was (hopefully) a page backing
an xfs_buf, so I applied the debug patch appended at the end of this
message.  I then saw this:

: SGI XFS with ACLs, security attributes, realtime, debug enabled
: XFS (sda): Mounting V4 Filesystem
: XFS (sda): Ending clean mount
: XFS: xfs_buf_free: OHNO! daddr=113849120 page=ffffea0000010980 ref=0
: CPU: 2 PID: 2058 Comm: xfs_scrub Not tainted 4.8.0-rc3-mcsum #18
: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
:  0000000000000000 ffff906f9c4239f8 ffffffff98331bf3 ffff906f95e86b00
:  0000000000000000 ffff906f9c423a20 ffffffffc03ca53e ffff906f9c604900
:  0000000000000000 0000000000010015 ffff906f9c423a60 ffffffffc03cc640
: Call Trace:
:  [] dump_stack+0x85/0xc2
:  [] xfs_buf_free+0x27e/0x290 [xfs]
:  [] xfs_buf_get_map+0x360/0x400 [xfs]
:  [] xfs_buf_read_map+0x2c/0x360 [xfs]
:  [] ? xfs_buf_readahead_map+0x52/0x70 [xfs]
:  [] xfs_buf_readahead_map+0x52/0x70 [xfs]
:  [] xfs_btree_reada_bufs+0x9d/0xb0 [xfs]
:  [] xfs_bulkstat_ichunk_ra+0xda/0x110 [xfs]
:  [] ? xfs_btree_check_block+0x14/0x20 [xfs]
:  [] xfs_bulkstat+0x2b8/0x6c0 [xfs]
:  [] ? xfs_bulkstat_one_int+0x430/0x430 [xfs]
:  [] xfs_ioc_bulkstat+0x116/0x210 [xfs]
:  [] xfs_file_ioctl+0x591/0xf10 [xfs]
:  [] ? kvm_clock_read+0x23/0x40
:  [] ? kvm_sched_clock_read+0x9/0x20
:  [] ? sched_clock+0x9/0x10
:  [] ? __lock_acquire+0x25d/0x14b0
:  [] ? __fget+0xf3/0x200
:  [] ? __fget+0xf3/0x200
:  [] ? debug_smp_processor_id+0x17/0x20
:  [] ? get_lock_stats+0x19/0x60
:  [] do_vfs_ioctl+0x93/0x700
:  [] ? __fget+0x110/0x200
:  [] ? __fget+0x5/0x200
:  [] SyS_ioctl+0x41/0x70
:  [] entry_SYSCALL_64_fastpath+0x1f/0xbd

...followed several seconds later by the BUG message seen earlier.

It looks as though the bulkstat call is attempting to start readahead of
the inode chunk, but I'm not familiar enough with how RA works to readily
tell what's going on here, or why it's freeing an xfs_buf as part of
starting RA.  Sector 113849120 does indeed contain an inode.
# xfs_info /mnt
meta-data=/dev/sda               isize=512    agcount=4, agsize=56308224 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=225232896, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=109977, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# df -i /mnt ; df -k /mnt
Filesystem     Inodes IUsed IFree IUse% Mounted on
/dev/sda         430M  2.0M  428M    1% /mnt
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sda       900491676 326109388 574382288  37% /mnt

Kernel is 4.8-rc3 + Darrick's giant patchset, though I can reproduce it
with plain 4.8-rc3 if I beat on bulkstat somewhat harder.

Not sure what the fix is here, though disabling inode chunk readahead
for bulkstat does make the BUG go away. :$

--D

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 607cc29..144b976 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -317,7 +317,7 @@ xfs_buf_free(
 		for (i = 0; i < bp->b_page_count; i++) {
 			struct page	*page = bp->b_pages[i];
-
+if (page_ref_count(page) != 1) {xfs_err(NULL, "%s: OHNO! daddr=%llu page=%p ref=%d", __func__, bp->b_bn, page, page_ref_count(page)); dump_stack();}
 			__free_page(page);
 		}
 	} else if (bp->b_flags & _XBF_KMEM)