Subject: [ndctl PATCH] test, device-dax: Fix intermittent poison handling failures
From: Dan Williams
To: linux-nvdimm@lists.01.org
Date: Fri, 12 Oct 2018 21:29:32 -0700
Message-ID: <153940497244.1425803.2319137619591631976.stgit@dwillia2-desk3.amr.corp.intel.com>
X-Patchwork-Id: 10639889

The device-dax unit test sometimes fails with the following kernel
message signature:

    Memory failure: Unable to find user space address 204300 in lt-device-dax
    Memory failure: 0x204300: forcibly killing lt-device-dax:1334 because of failure to unmap

This happens when there is a 3rd party vma in the rmap that has an entry
at the same index as the currently failing page. While the test has
munmap()'d the previous mapping we still trip over the fact that the
kernel memory-failure code does not differentiate munmap vs mremap and
upgrades the failure to process fatal.

The add_to_kill() routine in the kernel has a comment that says:

    /*
     * In theory we don't have to kill when the page was
     * munmaped. But it could be also a mremap. Since that's
     * likely very rare kill anyways just out of paranoia, but use
     * a SIGKILL because the error is not contained anymore.
     */

...when it is determining what to do when it can't find the given pfn
mapped into the process at the given index.

Avoid this case by munmap()'ing *and* closing the file to trigger old /
stale vma's to be reaped. With that, the only vma that can be looked up
is the one where the error was injected, the lookup succeeds, and the
test passes.

Signed-off-by: Dan Williams
---
 test/device-dax.c |   49 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/test/device-dax.c b/test/device-dax.c
index 46580fcbaae3..b19c1ed0b535 100644
--- a/test/device-dax.c
+++ b/test/device-dax.c
@@ -244,23 +244,8 @@ static int __test_device_dax(unsigned long align, int loglevel,
 	if (rc)
 		goto out;
 
-	/* upgrade to a writable mapping */
 	close(fd);
 	munmap(buf, VERIFY_SIZE(align));
-	fd = open(path, O_RDWR);
-	if (fd < 0) {
-		fprintf(stderr, "%s: failed to open(O_RDWR) device-dax instance\n",
-				daxctl_dev_get_devname(dev));
-		rc = -ENXIO;
-		goto out;
-	}
-
-	buf = mmap(NULL, VERIFY_SIZE(align), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
-	if (buf == MAP_FAILED) {
-		fprintf(stderr, "%s: expected PROT_WRITE + MAP_SHARED success\n",
-				path);
-		return -ENXIO;
-	}
 
 	/*
 	 * Prior to 4.8-final these tests cause crashes, or are
@@ -270,21 +255,39 @@ static int __test_device_dax(unsigned long align, int loglevel,
 		static const bool devdax = false;
 		int fd2;
 
+		fd = open(path, O_RDWR);
+		if (fd < 0) {
+			fprintf(stderr, "%s: failed to open for direct-io test\n",
+					daxctl_dev_get_devname(dev));
+			rc = -ENXIO;
+			goto out;
+		}
 		rc = test_dax_directio(fd, align, NULL, 0);
 		if (rc) {
 			fprintf(stderr, "%s: failed dax direct-i/o\n",
 					ndctl_namespace_get_devname(ndns));
 			goto out;
 		}
+		close(fd);
 
 		fprintf(stderr, "%s: test dax poison\n",
 				ndctl_namespace_get_devname(ndns));
+
+		fd = open(path, O_RDWR);
+		if (fd < 0) {
+			fprintf(stderr, "%s: failed to open for poison test\n",
+					daxctl_dev_get_devname(dev));
+			rc = -ENXIO;
+			goto out;
+		}
+
 		rc = test_dax_poison(test, fd, align, NULL, 0, devdax);
 		if (rc) {
 			fprintf(stderr, "%s: failed dax poison\n",
 					ndctl_namespace_get_devname(ndns));
 			goto out;
 		}
+		close(fd);
 
 		fd2 = open("/proc/self/smaps", O_RDONLY);
 		if (fd2 < 0) {
@@ -306,6 +309,22 @@ static int __test_device_dax(unsigned long align, int loglevel,
 		}
 	}
 
+	/* establish a writable mapping */
+	fd = open(path, O_RDWR);
+	if (fd < 0) {
+		fprintf(stderr, "%s: failed to open(O_RDWR) device-dax instance\n",
+				daxctl_dev_get_devname(dev));
+		rc = -ENXIO;
+		goto out;
+	}
+
+	buf = mmap(NULL, VERIFY_SIZE(align), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+	if (buf == MAP_FAILED) {
+		fprintf(stderr, "%s: expected PROT_WRITE + MAP_SHARED success\n",
+				path);
+		return -ENXIO;
+	}
+
 	rc = reset_device_dax(ndns);
 	if (rc < 0) {
 		fprintf(stderr, "%s: failed to reset device-dax instance\n",
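
For readers who want to try the teardown pattern outside the test harness, here is a minimal sketch of "munmap() *and* close(), then open and map afresh". It is only an illustration under stated assumptions: remap_fresh() and remap_fresh_demo() are hypothetical names that do not exist in ndctl, and a temporary regular file stands in for the device-dax node, so the sketch does not reproduce the memory-failure path itself.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for mkstemp() under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of ndctl: drop every reference to the old
 * mapping (munmap *and* close), then open and map the file afresh.
 * Returns the new mapping or MAP_FAILED; *fd is updated in place.
 */
static void *remap_fresh(const char *path, int *fd, void *old, size_t len)
{
	void *buf;

	assert(len > 0);
	if (old != MAP_FAILED && munmap(old, len) < 0)
		return MAP_FAILED;
	if (*fd >= 0)
		close(*fd);

	*fd = open(path, O_RDWR);
	if (*fd < 0)
		return MAP_FAILED;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, *fd, 0);
	if (buf == MAP_FAILED) {
		/* do not leak the descriptor on failure */
		close(*fd);
		*fd = -1;
	}
	return buf;
}

/*
 * Exercise the helper on a temporary regular file, which stands in for the
 * device-dax node (an assumption so the sketch runs on any Linux system).
 * Returns 0 on success, -1 on failure.
 */
static int remap_fresh_demo(void)
{
	char path[] = "/tmp/remap-demo-XXXXXX";
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	int rc = -1;
	char *buf;

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto out;

	/* initial read-only mapping, as in the early part of the test */
	buf = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED)
		goto out;

	/* tear everything down, then establish a writable mapping */
	buf = remap_fresh(path, &fd, buf, len);
	if (buf == MAP_FAILED)
		goto out;

	memset(buf, 0xa5, len);
	rc = ((unsigned char)buf[0] == 0xa5) ? 0 : -1;
	munmap(buf, len);
out:
	if (fd >= 0)
		close(fd);
	unlink(path);
	return rc;
}
```

Calling remap_fresh_demo() from a small main() should return 0 when the remap succeeds. Note that the sketch closes the descriptor on every error path, whereas the mmap() failure branch in the patch returns directly without closing fd.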