From patchwork Thu Apr 6 21:29:44 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Zwisler X-Patchwork-Id: 9668489 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 58D6A602B8 for ; Thu, 6 Apr 2017 21:30:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 831F0285CB for ; Thu, 6 Apr 2017 21:30:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 778CB285D4; Thu, 6 Apr 2017 21:30:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2130D285CB for ; Thu, 6 Apr 2017 21:30:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755160AbdDFV37 (ORCPT ); Thu, 6 Apr 2017 17:29:59 -0400 Received: from mga02.intel.com ([134.134.136.20]:37649 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754749AbdDFV35 (ORCPT ); Thu, 6 Apr 2017 17:29:57 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=intel; t=1491514196; x=1523050196; h=from:to:cc:subject:date:message-id; bh=hrmhXvkluDTVhUAp8cRuvrStJk51Ig78AKgR1UbGI9E=; b=Ue0YBtHboSxTDb7aEtZY/smzCnt29D7bM1OA+hcM7lgarE02rBSHfMXP arJDDumLOrpHpzoCiVtt/0BdOz4V/g==; Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Apr 2017 14:29:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.37,161,1488873600"; d="scan'208";a="953025165" Received: from theros.lm.intel.com ([10.232.112.77]) by orsmga003.jf.intel.com with ESMTP; 06 Apr 2017 14:29:54 -0700 From: Ross Zwisler To: Andrew Morton , linux-kernel@vger.kernel.org Cc: Ross Zwisler , "Darrick J. Wong" , Alexander Viro , Christoph Hellwig , Dan Williams , Jan Kara , Matthew Wilcox , linux-fsdevel@vger.kernel.org, linux-nvdimm@lists.01.org Subject: [PATCH] dax: fix radix tree insertion race Date: Thu, 6 Apr 2017 15:29:44 -0600 Message-Id: <20170406212944.2866-1-ross.zwisler@linux.intel.com> X-Mailer: git-send-email 2.9.3 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP While running generic/340 in my test setup I hit the following race. It can happen with kernels that support FS DAX PMDs, so v4.10 thru v4.11-rc5. Thread 1 Thread 2 -------- -------- dax_iomap_pmd_fault() grab_mapping_entry() spin_lock_irq() get_unlocked_mapping_entry() 'entry' is NULL, can't call lock_slot() spin_unlock_irq() radix_tree_preload() dax_iomap_pmd_fault() grab_mapping_entry() spin_lock_irq() get_unlocked_mapping_entry() ... lock_slot() spin_unlock_irq() dax_pmd_insert_mapping() spin_lock_irq() __radix_tree_insert() fails with -EEXIST The issue is that we have to drop mapping->tree_lock while calling radix_tree_preload(), but since we didn't have a radix tree entry to lock (unlike in the pmd_downgrade case) we have no protection against Thread 2 coming along and inserting a PMD at the same index. For 4k entries we handled this with a special-case response to -EEXIST coming from the __radix_tree_insert(), but this doesn't save us for PMDs because the -EEXIST case can also mean that we collided with a 4k entry in the radix tree at a different index, but one that is covered by our PMD range. So, correctly handle both the 4k and 2M collision cases by explicitly re-checking the radix tree for an entry at our index once we reacquire mapping->tree_lock. This patch has made it through a clean xfstests run with the current v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a loop. It used to fail within the first 10 iterations. Signed-off-by: Ross Zwisler Cc: [4.10+] --- fs/dax.c | 35 ++++++++++++++++++++++------------- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index de622d4..85abd74 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -373,6 +373,22 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index, } spin_lock_irq(&mapping->tree_lock); + if (!entry) { + /* + * We needed to drop the page_tree lock while calling + * radix_tree_preload() and we didn't have an entry to + * lock. See if another thread inserted an entry at + * our index during this time. + */ + entry = __radix_tree_lookup(&mapping->page_tree, index, + NULL, &slot); + if (entry) { + radix_tree_preload_end(); + spin_unlock_irq(&mapping->tree_lock); + goto restart; + } + } + if (pmd_downgrade) { radix_tree_delete(&mapping->page_tree, index); mapping->nrexceptional--; @@ -388,19 +404,12 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index, if (err) { spin_unlock_irq(&mapping->tree_lock); /* - * Someone already created the entry? This is a - * normal failure when inserting PMDs in a range - * that already contains PTEs. In that case we want - * to return -EEXIST immediately. - */ - if (err == -EEXIST && !(size_flag & RADIX_DAX_PMD)) - goto restart; - /* - * Our insertion of a DAX PMD entry failed, most - * likely because it collided with a PTE sized entry - * at a different index in the PMD range. We haven't - * inserted anything into the radix tree and have no - * waiters to wake. + * Our insertion of a DAX entry failed, most likely + * because we were inserting a PMD entry and it + * collided with a PTE sized entry at a different + * index in the PMD range. We haven't inserted + * anything into the radix tree and have no waiters to + * wake. */ return ERR_PTR(err); }