From patchwork Sun Oct 7 23:38:48 2018
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 10629841
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Michal Hocko, Hugh Dickins, Naoya Horiguchi,
	Aneesh Kumar K.V, Andrea Arcangeli, Kirill A. Shutemov,
	Davidlohr Bueso, Mike Kravetz
Shutemov" , Davidlohr Bueso , Mike Kravetz Subject: [PATCH RFC 1/1] hugetlbfs: introduce truncation/fault mutex to avoid races Date: Sun, 7 Oct 2018 16:38:48 -0700 Message-Id: <20181007233848.13397-2-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181007233848.13397-1-mike.kravetz@oracle.com> References: <20181007233848.13397-1-mike.kravetz@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9039 signatures=668706 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=2 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=985 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1807170000 definitions=main-1810070241 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP The following hugetlbfs truncate/page fault race can be recreated with programs doing something like the following. A huegtlbfs file is mmap(MAP_SHARED) with a size of 4 pages. At mmap time, 4 huge pages are reserved for the file/mapping. So, the global reserve count is 4. In addition, since this is a shared mapping an entry for 4 pages is added to the file's reserve map. The first 3 of the 4 pages are faulted into the file. As a result, the global reserve count is now 1. Task A starts to fault in the last page (routines hugetlb_fault, hugetlb_no_page). It allocates a huge page (alloc_huge_page). The reserve map indicates there is a reserved page, so this is used and the global reserve count goes to 0. Now, task B truncates the file to size 0. It starts by setting inode size to 0(hugetlb_vmtruncate). It then unmaps all mapping of the file (hugetlb_vmdelete_list). Since task A's page table lock is not held at the time, truncation is not blocked. Truncation removes the 3 pages from the file (remove_inode_hugepages). When cleaning up the reserved pages (hugetlb_unreserve_pages), it notices the reserve map was for 4 pages. However, it has only freed 3 pages. So it assumes there is still (4 - 3) 1 reserved pages. It then decrements the global reserve count by 1 and it goes negative. Task A then continues the page fault process and adds it's newly acquired page to the page cache. Note that the index of this page is beyond the size of the truncated file (0). The page fault process then notices the file has been truncated and exits. However, the page is left in the cache associated with the file. Now, if the file is immediately deleted the truncate code runs again. It will find and free the one page associated with the file. When cleaning up reserves, it notices the reserve map is empty. Yet, one page freed. So, the global reserve count is decremented by (0 - 1) -1. This returns the global count to 0 as it should be. But, it is possible for someone else to mmap this file/range before it is deleted. If this happens, a reserve map entry for the allocated page is created and the reserved page is forever leaked. To avoid all these conditions, let's simply prevent faults to a file while it is being truncated. Add a new truncation specific rw mutex to hugetlbfs inode extensions. faults take the mutex in read mode, truncation takes in write mode. 
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c    | 24 ++++++++++++++++++++----
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c            | 25 +++++++++++++++++++------
 mm/userfaultfd.c        |  8 +++++++-
 4 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 40d4c66c7751..07b0ba049c37 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -427,10 +427,17 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			u32 hash;
 
 			index = page->index;
-			hash = hugetlb_fault_mutex_hash(h, current->mm,
+			/*
+			 * Only need to acquire fault mutex in hole punch case.
+			 * For truncation, we are synchronized via truncation
+			 * mutex.
+			 */
+			if (!truncate_op) {
+				hash = hugetlb_fault_mutex_hash(h, current->mm,
 						&pseudo_vma,
 						mapping, index, 0);
-			mutex_lock(&hugetlb_fault_mutex_table[hash]);
+				mutex_lock(&hugetlb_fault_mutex_table[hash]);
+			}
 
 			/*
 			 * If page is mapped, it was faulted in after being
@@ -471,7 +478,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			}
 
 			unlock_page(page);
-			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			if (!truncate_op)
+				mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		}
 		huge_pagevec_release(&pvec);
 		cond_resched();
@@ -498,16 +506,19 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	pgoff_t pgoff;
 	struct address_space *mapping = inode->i_mapping;
 	struct hstate *h = hstate_inode(inode);
+	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
 
 	BUG_ON(offset & ~huge_page_mask(h));
 	pgoff = offset >> PAGE_SHIFT;
 
+	down_write(&info->trunc_rwsem);
 	i_size_write(inode, offset);
 	i_mmap_lock_write(mapping);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
 	i_mmap_unlock_write(mapping);
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
+	up_write(&info->trunc_rwsem);
 	return 0;
 }
 
@@ -626,7 +637,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		/* addr is the offset within the file (zero based) */
 		addr = index * hpage_size;
 
-		/* mutex taken here, fault path and hole punch */
+		/*
+		 * mutex taken here, for fault path and hole punch.
+		 * No need to worry about truncation as we are synchronized
+		 * with inode mutex
+		 */
 		hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping,
 						index, addr);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
@@ -761,6 +776,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
 		inode->i_mapping->private_data = resv_map;
 		info->seals = F_SEAL_SEAL;
+		init_rwsem(&info->trunc_rwsem);
 		switch (mode & S_IFMT) {
 		default:
 			init_special_inode(inode, mode, dev);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 36fa6a2a82e3..73844107ee8a 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -277,6 +277,7 @@ struct hugetlbfs_inode_info {
 	struct shared_policy policy;
 	struct inode vfs_inode;
 	unsigned int seals;
+	struct rw_semaphore trunc_rwsem;
 };
 
 static inline struct hugetlbfs_inode_info *HUGETLBFS_I(struct inode *inode)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3103099f64fd..10142c922aab 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3696,6 +3696,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_t new_pte;
 	spinlock_t *ptl;
 	unsigned long haddr = address & huge_page_mask(h);
+	struct hugetlbfs_inode_info *hinode_info = HUGETLBFS_I(mapping->host);
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -3738,14 +3739,18 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			};
 
 			/*
-			 * hugetlb_fault_mutex must be dropped before
-			 * handling userfault. Reacquire after handling
-			 * fault to make calling code simpler.
+			 * hugetlb_fault_mutex and truncation mutex must be
+			 * dropped before handling userfault. Reacquire after
+			 * handling fault to make calling code simpler.
 			 */
 			hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping,
 							idx, haddr);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			up_read(&hinode_info->trunc_rwsem);
+
 			ret = handle_userfault(&vmf, VM_UFFD_MISSING);
+
+			down_read(&hinode_info->trunc_rwsem);
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 			goto out;
 		}
@@ -3894,6 +3899,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct address_space *mapping;
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
+	struct hugetlbfs_inode_info *hinode_info;
 
 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
@@ -3914,10 +3920,16 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	idx = vma_hugecache_offset(h, vma, haddr);
 
 	/*
-	 * Serialize hugepage allocation and instantiation, so that we don't
-	 * get spurious allocation failures if two CPUs race to instantiate
-	 * the same page in the page cache.
+	 * Use truncate mutex to serialize truncation and page faults. This
+	 * prevents ANY faults from happening on the file during truncation.
+	 * The fault mutex serializes hugepage allocation and instantiation
+	 * on the same page. This prevents spurious allocation failures if
+	 * two CPUs race to instantiate the same page in the page cache.
+	 *
+	 * Acquire truncate mutex BEFORE fault mutex.
 	 */
+	hinode_info = HUGETLBFS_I(mapping->host);
+	down_read(&hinode_info->trunc_rwsem);
 	hash = hugetlb_fault_mutex_hash(h, mm, vma, mapping, idx, haddr);
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
@@ -4005,6 +4017,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	}
 out_mutex:
 	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+	up_read(&hinode_info->trunc_rwsem);
 	/*
 	 * Generally it's safe to hold refcount during waiting page lock. But
 	 * here we just wait to defer the next page fault to avoid busy loop and
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 5029f241908f..554d1731028e 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -169,6 +169,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	pgoff_t idx;
 	u32 hash;
 	struct address_space *mapping;
+	struct hugetlbfs_inode_info *hinode_info;
 
 	/*
 	 * There is no default zero huge page for all huge page sizes as
@@ -244,10 +245,12 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		VM_BUG_ON(dst_addr & ~huge_page_mask(h));
 
 		/*
-		 * Serialize via hugetlb_fault_mutex
+		 * Serialize via truncation and hugetlb_fault_mutex
 		 */
 		idx = linear_page_index(dst_vma, dst_addr);
 		mapping = dst_vma->vm_file->f_mapping;
+		hinode_info = HUGETLBFS_I(mapping->host);
+		down_read(&hinode_info->trunc_rwsem);
 		hash = hugetlb_fault_mutex_hash(h, dst_mm, dst_vma, mapping,
 								idx, dst_addr);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
@@ -256,6 +259,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		dst_pte = huge_pte_alloc(dst_mm, dst_addr, huge_page_size(h));
 		if (!dst_pte) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			up_read(&hinode_info->trunc_rwsem);
 			goto out_unlock;
 		}
 
@@ -263,6 +267,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		dst_pteval = huge_ptep_get(dst_pte);
 		if (!huge_pte_none(dst_pteval)) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			up_read(&hinode_info->trunc_rwsem);
 			goto out_unlock;
 		}
 
@@ -270,6 +275,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 						dst_addr, src_addr, &page);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+		up_read(&hinode_info->trunc_rwsem);
 		vm_alloc_shared = vm_shared;
 
 		cond_resched();