diff mbox series

hugetlbfs: always use address space in inode for resv_map pointer

Message ID 20190419204435.16984-1-mike.kravetz@oracle.com (mailing list archive)
State New, archived
Headers show
Series hugetlbfs: always use address space in inode for resv_map pointer | expand

Commit Message

Mike Kravetz April 19, 2019, 8:44 p.m. UTC
Continuing discussion about commit 58b6e5e8f1ad ("hugetlbfs: fix memory
leak for resv_map") brought up the issue that inode->i_mapping may not
point to the address space embedded within the inode at inode eviction
time.  The hugetlbfs truncate routine handles this by explicitly using
inode->i_data.  However, code cleaning up the resv_map will still use
the address space pointed to by inode->i_mapping.  Luckily, private_data
is NULL for address spaces in all such cases today but, there is no
guarantee this will continue.

Change all hugetlbfs code getting a resv_map pointer to explicitly get
it from the address space embedded within the inode.  In addition, add
more comments in the code to indicate why this is being done.

Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c | 11 +++++++++--
 mm/hugetlb.c         | 19 ++++++++++++++++++-
 2 files changed, 27 insertions(+), 3 deletions(-)

Comments

Yufen Yu May 8, 2019, 7:10 a.m. UTC | #1
On 2019/4/20 4:44, Mike Kravetz wrote:
> Continuing discussion about commit 58b6e5e8f1ad ("hugetlbfs: fix memory
> leak for resv_map") brought up the issue that inode->i_mapping may not
> point to the address space embedded within the inode at inode eviction
> time.  The hugetlbfs truncate routine handles this by explicitly using
> inode->i_data.  However, code cleaning up the resv_map will still use
> the address space pointed to by inode->i_mapping.  Luckily, private_data
> is NULL for address spaces in all such cases today but, there is no
> guarantee this will continue.
>
> Change all hugetlbfs code getting a resv_map pointer to explicitly get
> it from the address space embedded within the inode.  In addition, add
> more comments in the code to indicate why this is being done.
>
> Reported-by: Yufen Yu <yuyufen@huawei.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>   fs/hugetlbfs/inode.c | 11 +++++++++--
>   mm/hugetlb.c         | 19 ++++++++++++++++++-
>   2 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 9285dd4f4b1c..cbc649cd1722 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -499,8 +499,15 @@ static void hugetlbfs_evict_inode(struct inode *inode)
>   	struct resv_map *resv_map;
>   
>   	remove_inode_hugepages(inode, 0, LLONG_MAX);
> -	resv_map = (struct resv_map *)inode->i_mapping->private_data;
> -	/* root inode doesn't have the resv_map, so we should check it */
> +
> +	/*
> +	 * Get the resv_map from the address space embedded in the inode.
> +	 * This is the address space which points to any resv_map allocated
> +	 * at inode creation time.  If this is a device special inode,
> +	 * i_mapping may not point to the original address space.
> +	 */
> +	resv_map = (struct resv_map *)(&inode->i_data)->private_data;
> +	/* Only regular and link inodes have associated reserve maps */
>   	if (resv_map)
>   		resv_map_release(&resv_map->refs);
>   	clear_inode(inode);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6cdc7b2d9100..b30e97b0ef37 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -740,7 +740,15 @@ void resv_map_release(struct kref *ref)
>   
>   static inline struct resv_map *inode_resv_map(struct inode *inode)
>   {
> -	return inode->i_mapping->private_data;
> +	/*
> +	 * At inode evict time, i_mapping may not point to the original
> +	 * address space within the inode.  This original address space
> +	 * contains the pointer to the resv_map.  So, always use the
> +	 * address space embedded within the inode.
> +	 * The VERY common case is inode->mapping == &inode->i_data but,
> +	 * this may not be true for device special inodes.
> +	 */
> +	return (struct resv_map *)(&inode->i_data)->private_data;
>   }
>   
>   static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
> @@ -4477,6 +4485,11 @@ int hugetlb_reserve_pages(struct inode *inode,
>   	 * called to make the mapping read-write. Assume !vma is a shm mapping
>   	 */
>   	if (!vma || vma->vm_flags & VM_MAYSHARE) {
> +		/*
> +		 * resv_map can not be NULL as hugetlb_reserve_pages is only
> +		 * called for inodes for which resv_maps were created (see
> +		 * hugetlbfs_get_inode).
> +		 */
>   		resv_map = inode_resv_map(inode);
>   
>   		chg = region_chg(resv_map, from, to);
> @@ -4568,6 +4581,10 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
>   	struct hugepage_subpool *spool = subpool_inode(inode);
>   	long gbl_reserve;
>   
> +	/*
> +	 * Since this routine can be called in the evict inode path for all
> +	 * hugetlbfs inodes, resv_map could be NULL.
> +	 */
>   	if (resv_map) {
>   		chg = region_del(resv_map, start, end);
>   		/*

Dose this patch have been applied?

I think it is better to add fixes label, like:
Fixes: 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")

Since the commit 58b6e5e8f1a has been merged to stable, this patch also 
be needed.
https://www.spinics.net/lists/stable/msg298740.html
Mike Kravetz May 8, 2019, 8:16 p.m. UTC | #2
On 5/8/19 12:10 AM, yuyufen wrote:
> On 2019/4/20 4:44, Mike Kravetz wrote:
>> Continuing discussion about commit 58b6e5e8f1ad ("hugetlbfs: fix memory
>> leak for resv_map") brought up the issue that inode->i_mapping may not
>> point to the address space embedded within the inode at inode eviction
>> time.  The hugetlbfs truncate routine handles this by explicitly using
>> inode->i_data.  However, code cleaning up the resv_map will still use
>> the address space pointed to by inode->i_mapping.  Luckily, private_data
>> is NULL for address spaces in all such cases today but, there is no
>> guarantee this will continue.
>>
>> Change all hugetlbfs code getting a resv_map pointer to explicitly get
>> it from the address space embedded within the inode.  In addition, add
>> more comments in the code to indicate why this is being done.
>>
>> Reported-by: Yufen Yu <yuyufen@huawei.com>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
...
> 
> Dose this patch have been applied?

Andrew has pulled it into his tree.  However, I do not believe there has
been an ACK or Review, so am not sure of the exact status.

> I think it is better to add fixes label, like:
> Fixes: 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
> 
> Since the commit 58b6e5e8f1a has been merged to stable, this patch also be needed.
> https://www.spinics.net/lists/stable/msg298740.html

It must have been the AI that decided 58b6e5e8f1a needed to go to stable.
Even though this technically does not fix 58b6e5e8f1a, I'm OK with adding
the Fixes: to force this to go to the same stable trees.
Andrew Morton May 9, 2019, 11:11 p.m. UTC | #3
On Wed, 8 May 2019 13:16:09 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> > I think it is better to add fixes label, like:
> > Fixes: 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
> > 
> > Since the commit 58b6e5e8f1a has been merged to stable, this patch also be needed.
> > https://www.spinics.net/lists/stable/msg298740.html
> 
> It must have been the AI that decided 58b6e5e8f1a needed to go to stable.

grr.

> Even though this technically does not fix 58b6e5e8f1a, I'm OK with adding
> the Fixes: to force this to go to the same stable trees.

Why are we bothering with any of this, given that

: Luckily, private_data is NULL for address spaces in all such cases
: today but, there is no guarantee this will continue.

?

Even though 58b6e5e8f1ad was inappropriately backported, the above
still holds, so what problem does a backport of "hugetlbfs: always use
address space in inode for resv_map pointer" actually solve?

And yes, some review of this would be nice
Mike Kravetz May 9, 2019, 11:32 p.m. UTC | #4
On 5/9/19 4:11 PM, Andrew Morton wrote:
> On Wed, 8 May 2019 13:16:09 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>>> I think it is better to add fixes label, like:
>>> Fixes: 58b6e5e8f1ad ("hugetlbfs: fix memory leak for resv_map")
>>>
>>> Since the commit 58b6e5e8f1a has been merged to stable, this patch also be needed.
>>> https://www.spinics.net/lists/stable/msg298740.html
>>
>> It must have been the AI that decided 58b6e5e8f1a needed to go to stable.
> 
> grr.
> 
>> Even though this technically does not fix 58b6e5e8f1a, I'm OK with adding
>> the Fixes: to force this to go to the same stable trees.
> 
> Why are we bothering with any of this, given that
> 
> : Luckily, private_data is NULL for address spaces in all such cases
> : today but, there is no guarantee this will continue.
> 
> ?

You are right.  For stable releases, I do not see any way for this to
be an issue.  We are lucky today (and in the past).  The patch is there
to guard against code changes which may cause this condition to change
in the future.

Yufen Yu, do you see this actually fixing a problem in stable releases?
I believe you originally said this is not a problem today, which would
also imply older releases.  Just want to make sure I am not missing something.
diff mbox series

Patch

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9285dd4f4b1c..cbc649cd1722 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -499,8 +499,15 @@  static void hugetlbfs_evict_inode(struct inode *inode)
 	struct resv_map *resv_map;
 
 	remove_inode_hugepages(inode, 0, LLONG_MAX);
-	resv_map = (struct resv_map *)inode->i_mapping->private_data;
-	/* root inode doesn't have the resv_map, so we should check it */
+
+	/*
+	 * Get the resv_map from the address space embedded in the inode.
+	 * This is the address space which points to any resv_map allocated
+	 * at inode creation time.  If this is a device special inode,
+	 * i_mapping may not point to the original address space.
+	 */
+	resv_map = (struct resv_map *)(&inode->i_data)->private_data;
+	/* Only regular and link inodes have associated reserve maps */
 	if (resv_map)
 		resv_map_release(&resv_map->refs);
 	clear_inode(inode);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6cdc7b2d9100..b30e97b0ef37 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -740,7 +740,15 @@  void resv_map_release(struct kref *ref)
 
 static inline struct resv_map *inode_resv_map(struct inode *inode)
 {
-	return inode->i_mapping->private_data;
+	/*
+	 * At inode evict time, i_mapping may not point to the original
+	 * address space within the inode.  This original address space
+	 * contains the pointer to the resv_map.  So, always use the
+	 * address space embedded within the inode.
+	 * The VERY common case is inode->mapping == &inode->i_data but,
+	 * this may not be true for device special inodes.
+	 */
+	return (struct resv_map *)(&inode->i_data)->private_data;
 }
 
 static struct resv_map *vma_resv_map(struct vm_area_struct *vma)
@@ -4477,6 +4485,11 @@  int hugetlb_reserve_pages(struct inode *inode,
 	 * called to make the mapping read-write. Assume !vma is a shm mapping
 	 */
 	if (!vma || vma->vm_flags & VM_MAYSHARE) {
+		/*
+		 * resv_map can not be NULL as hugetlb_reserve_pages is only
+		 * called for inodes for which resv_maps were created (see
+		 * hugetlbfs_get_inode).
+		 */
 		resv_map = inode_resv_map(inode);
 
 		chg = region_chg(resv_map, from, to);
@@ -4568,6 +4581,10 @@  long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 	struct hugepage_subpool *spool = subpool_inode(inode);
 	long gbl_reserve;
 
+	/*
+	 * Since this routine can be called in the evict inode path for all
+	 * hugetlbfs inodes, resv_map could be NULL.
+	 */
 	if (resv_map) {
 		chg = region_del(resv_map, start, end);
 		/*