hugetlbfs: fix memory leak for resv_map
diff mbox series

Message ID 20190302104713.31467-1-yuyufen@huawei.com
State New
Headers show
Series
  • hugetlbfs: fix memory leak for resv_map
Related show

Commit Message

Yufen Yu March 2, 2019, 10:47 a.m. UTC
When .mknod create a block device file in hugetlbfs, it will
allocate an inode, and kmalloc a 'struct resv_map' in resv_map_alloc().
For now, inode->i_mapping->private_data is used to point the resv_map.
However, when open the device, bd_acquire() will set i_mapping as
bd_inode->imapping, result in resv_map memory leak.

We fix the leak by adding a new entry resv_map in hugetlbfs_inode_info.
It can store resv_map pointer.

Programs to reproduce:
	mount -t hugetlbfs nodev hugetlbfs
	mknod hugetlbfs/dev b 0 0
	exec 30<> hugetlbfs/dev
	umount hugetlbfs/

Fixes: 9119a41e90 ("mm, hugetlb: unify region structure handling")
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
 fs/hugetlbfs/inode.c    | 12 +++++++++---
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c            |  2 +-
 3 files changed, 11 insertions(+), 4 deletions(-)

Comments

Mike Kravetz March 4, 2019, 6:29 p.m. UTC | #1
Thank you for finding this issue.

On 3/2/19 2:47 AM, Yufen Yu wrote:
> When .mknod create a block device file in hugetlbfs, it will
> allocate an inode, and kmalloc a 'struct resv_map' in resv_map_alloc().
> For now, inode->i_mapping->private_data is used to point the resv_map.
> However, when open the device, bd_acquire() will set i_mapping as
> bd_inode->imapping, result in resv_map memory leak.

We are certainly leaking the resv_map.

> We fix the leak by adding a new entry resv_map in hugetlbfs_inode_info.
> It can store resv_map pointer.

This approach preserves the way the existing code always allocates a
resv_map at inode allocation time.  However, it does add an extra word
to every hugetlbfs inode.  My first thought was, why not special case
the block/char inode creation to not allocate a resv_map?  After all,
it is not used in this case.  In fact, we only need/use the resv_map
when mmap'ing a regular file.  It is a waste to allocate the structure
in all other cases.

It seems like we should be able to wait until a call to hugetlb_reserve_pages()
to allocate the inode specific resv_map in much the same way we do for
private mappings.  We could then remove the resv_map allocation at inode
creation time.  Of course, we would still need the code to free the structure
when the inode is destroyed.

I have not looked too closely at this approach, and there may be some
unknown issues.  However, it would address the leak you discovered and
would result in less memory used for hugetlbfs inodes that are never
mmap'ed.

Any thoughts on this approach?

I know it is beyond the scope of your patch.  If you do not want to try this,
I can code up something in a couple days.
Yufen Yu March 5, 2019, 2:09 a.m. UTC | #2
Hi, Mike


On 2019/3/5 2:29, Mike Kravetz wrote:
> Thank you for finding this issue.
>
> On 3/2/19 2:47 AM, Yufen Yu wrote:
>> When .mknod create a block device file in hugetlbfs, it will
>> allocate an inode, and kmalloc a 'struct resv_map' in resv_map_alloc().
>> For now, inode->i_mapping->private_data is used to point the resv_map.
>> However, when open the device, bd_acquire() will set i_mapping as
>> bd_inode->imapping, result in resv_map memory leak.
> We are certainly leaking the resv_map.
>
>> We fix the leak by adding a new entry resv_map in hugetlbfs_inode_info.
>> It can store resv_map pointer.
> This approach preserves the way the existing code always allocates a
> resv_map at inode allocation time.  However, it does add an extra word
> to every hugetlbfs inode.  My first thought was, why not special case
> the block/char inode creation to not allocate a resv_map?  After all,
> it is not used in this case.  In fact, we only need/use the resv_map
> when mmap'ing a regular file.  It is a waste to allocate the structure
> in all other cases.
>
> It seems like we should be able to wait until a call to hugetlb_reserve_pages()
> to allocate the inode specific resv_map in much the same way we do for
> private mappings.  We could then remove the resv_map allocation at inode
> creation time.  Of course, we would still need the code to free the structure
> when the inode is destroyed.
>
> I have not looked too closely at this approach, and there may be some
> unknown issues.  However, it would address the leak you discovered and
> would result in less memory used for hugetlbfs inodes that are never
> mmap'ed.
>
> Any thoughts on this approach?
>
> I know it is beyond the scope of your patch.  If you do not want to try this,
> I can code up something in a couple days.

Thanks for your suggestion. I agree with you. It will be a better solution.
I don't understand hugetlbfs deeply, but I want to try my best to solve 
this problem.

Yufen
Thanks.

Patch
diff mbox series

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a7fa037b876b..7f332443c5da 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -484,11 +484,15 @@  static void hugetlbfs_evict_inode(struct inode *inode)
 {
 	struct resv_map *resv_map;
 
+	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
+
 	remove_inode_hugepages(inode, 0, LLONG_MAX);
-	resv_map = (struct resv_map *)inode->i_mapping->private_data;
+	resv_map = info->resv_map;
 	/* root inode doesn't have the resv_map, so we should check it */
-	if (resv_map)
+	if (resv_map) {
 		resv_map_release(&resv_map->refs);
+		info->resv_map = NULL;
+	}
 	clear_inode(inode);
 }
 
@@ -757,7 +761,7 @@  static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 				&hugetlbfs_i_mmap_rwsem_key);
 		inode->i_mapping->a_ops = &hugetlbfs_aops;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
-		inode->i_mapping->private_data = resv_map;
+		info->resv_map = resv_map;
 		info->seals = F_SEAL_SEAL;
 		switch (mode & S_IFMT) {
 		default:
@@ -1025,6 +1029,7 @@  static struct inode *hugetlbfs_alloc_inode(struct super_block *sb)
 	 * private inode.  This simplifies hugetlbfs_destroy_inode.
 	 */
 	mpol_shared_policy_init(&p->policy, NULL);
+	p->resv_map = NULL;
 
 	return &p->vfs_inode;
 }
@@ -1039,6 +1044,7 @@  static void hugetlbfs_destroy_inode(struct inode *inode)
 {
 	hugetlbfs_inc_free_inodes(HUGETLBFS_SB(inode->i_sb));
 	mpol_free_shared_policy(&HUGETLBFS_I(inode)->policy);
+	HUGETLBFS_I(inode)->resv_map = NULL;
 	call_rcu(&inode->i_rcu, hugetlbfs_i_callback);
 }
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 087fd5f48c91..eafcf14056bc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -291,6 +291,7 @@  struct hugetlbfs_inode_info {
 	struct shared_policy policy;
 	struct inode vfs_inode;
 	unsigned int seals;
+	struct resv_map *resv_map;
 };
 
 static inline struct hugetlbfs_inode_info *HUGETLBFS_I(struct inode *inode)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8dfdffc34a99..763164b15f8f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -739,7 +739,7 @@  void resv_map_release(struct kref *ref)
 
 static inline struct resv_map *inode_resv_map(struct inode *inode)
 {
-	return inode->i_mapping->private_data;
+	return HUGETLBFS_I(inode)->resv_map;
 }
 
 static struct resv_map *vma_resv_map(struct vm_area_struct *vma)