[v2] mm: hugetlb: support for shared memory policy


Message ID

20221019092928.44146-1-huangjie.albert@bytedance.com (mailing list archive)

State

New

Headers

From: Albert Huang <huangjie.albert@bytedance.com>
To: mike.kravetz@oracle.com
Cc: "huangjie.albert" <huangjie.albert@bytedance.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Muchun Song <songmuchun@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [PATCH v2] mm: hugetlb: support for shared memory policy
Date: Wed, 19 Oct 2022 17:29:25 +0800
Message-Id: <20221019092928.44146-1-huangjie.albert@bytedance.com>
In-Reply-To: <Y0mUt84TctGP3BtT@monkey>
References: <Y0mUt84TctGP3BtT@monkey>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

[v2] mm: hugetlb: support for shared memory policy

Commit Message

黄杰 Oct. 19, 2022, 9:29 a.m. UTC

From: "huangjie.albert" <huangjie.albert@bytedance.com>

Implement get_policy/set_policy for hugetlb_vm_ops to support shared policy.
This ensures that the mempolicy of all processes sharing a huge page file
is consistent.

Consider a scenario where huge pages are shared. To confine a VM's memory
to node0, I bind qemu's mempolicy to node0. If another process (such as
virtiofsd) shares memory with the VM and triggers the page fault, the
memory may be allocated on node1, because the allocation then follows
virtiofsd's policy. The memory prealloc provided by qemu avoids this
issue, but it significantly increases the creation time of the VM
(by a few seconds, depending on memory size).

With set_policy/get_policy hooked up in hugetlb_vm_ops, shared memory
segments created by shmget() with the SHM_HUGETLB flag and mappings
created by mmap(MAP_SHARED|MAP_HUGETLB) also support shared policy.

v1->v2:
1. hugetlb shares the memory policy only when the vma has the VM_SHARED flag.
2. Update the documentation.

Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
---
 .../admin-guide/mm/numa_memory_policy.rst     | 20 +++++++++------
 mm/hugetlb.c                                  | 25 +++++++++++++++++++
 2 files changed, 37 insertions(+), 8 deletions(-)
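
For illustration, the user-space side of the scenario described above might look like the following. This is a minimal sketch, not part of the patch: the helper names single_node_mask() and map_shared_hugetlb_bound() are hypothetical, the raw mbind syscall is used only to avoid a dependency on libnuma, and the single-word nodemask assumes node numbers 0..62. On a kernel with this patch applied, the policy set here would be expected to be stored with the hugetlbfs inode, so that faults from any process sharing the mapping follow it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>	/* MPOL_BIND */

/* Hypothetical helper: one-word nodemask with only `node` set (node 0..62). */
unsigned long single_node_mask(unsigned int node)
{
	return 1UL << node;
}

/*
 * Hypothetical helper: map `len` bytes of shared anonymous hugetlb memory
 * and bind the range to `node`. Returns the mapping, or NULL if mmap() or
 * mbind() fails (for example when no huge pages are reserved).
 * `len` should be a multiple of the huge page size.
 */
void *map_shared_hugetlb_bound(size_t len, unsigned int node)
{
	unsigned long mask = single_node_mask(node);
	void *addr;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;

	/* Raw syscall so the sketch does not need libnuma's <numaif.h>. */
	if (syscall(SYS_mbind, addr, len, MPOL_BIND, &mask,
		    8 * sizeof(mask), 0) != 0) {
		munmap(addr, len);
		return NULL;
	}
	return addr;
}
```

A caller would pass a length that is a multiple of the huge page size, for example map_shared_hugetlb_bound(2UL << 20, 0) on a system with 2 MiB huge pages reserved, and must handle a NULL return.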

Comments

Aneesh Kumar K.V Oct. 19, 2022, 11:49 a.m. UTC | #1

On 10/19/22 2:59 PM, Albert Huang wrote:
> From: "huangjie.albert" <huangjie.albert@bytedance.com>
> 
> Implement get_policy/set_policy for hugetlb_vm_ops to support shared policy.
> This ensures that the mempolicy of all processes sharing a huge page file
> is consistent.
> 
> Consider a scenario where huge pages are shared. To confine a VM's memory
> to node0, I bind qemu's mempolicy to node0. If another process (such as
> virtiofsd) shares memory with the VM and triggers the page fault, the
> memory may be allocated on node1, because the allocation then follows
> virtiofsd's policy. The memory prealloc provided by qemu avoids this
> issue, but it significantly increases the creation time of the VM
> (by a few seconds, depending on memory size).
> 
> With set_policy/get_policy hooked up in hugetlb_vm_ops, shared memory
> segments created by shmget() with the SHM_HUGETLB flag and mappings
> created by mmap(MAP_SHARED|MAP_HUGETLB) also support shared policy.
> 
> v1->v2:
> 1. hugetlb shares the memory policy only when the vma has the VM_SHARED flag.
> 2. Update the documentation.
> 
> Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
> ---
>  .../admin-guide/mm/numa_memory_policy.rst     | 20 +++++++++------
>  mm/hugetlb.c                                  | 25 +++++++++++++++++++
>  2 files changed, 37 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
> index 5a6afecbb0d0..5672a6c2d2ef 100644
> --- a/Documentation/admin-guide/mm/numa_memory_policy.rst
> +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
> @@ -133,14 +133,18 @@ Shared Policy
>  	the object share the policy, and all pages allocated for the
>  	shared object, by any task, will obey the shared policy.
>  
> -	As of 2.6.22, only shared memory segments, created by shmget() or
> -	mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy.  When shared
> -	policy support was added to Linux, the associated data structures were
> -	added to hugetlbfs shmem segments.  At the time, hugetlbfs did not
> -	support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
> -	shmem segments were never "hooked up" to the shared policy support.
> -	Although hugetlbfs segments now support lazy allocation, their support
> -	for shared policy has not been completed.
> +	As of 2.6.22, only shared memory segments, created by shmget() without
> +	SHM_HUGETLB flag or mmap(MAP_ANONYMOUS|MAP_SHARED) without MAP_HUGETLB
> +	flag, support shared policy. When shared policy support was added to Linux,
> +	the associated data structures were added to hugetlbfs shmem segments.
> +	At the time, hugetlbfs did not support allocation at fault time--a.k.a
> +	lazy allocation--so hugetlbfs shmem segments were never "hooked up" to
> +	the shared policy support. Although hugetlbfs segments now support lazy
> +	allocation, their support for shared policy has not been completed.
> +
> +	With set_policy/get_policy hooked up in hugetlb_vm_ops, shared memory
> +	segments created by shmget() with the SHM_HUGETLB flag and mappings
> +	created by mmap(MAP_SHARED|MAP_HUGETLB) also support shared policy.
>  
>  	As mentioned above in :ref:`VMA policies <vma_policy>` section,
>  	allocations of page cache pages for regular files mmap()ed
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 87d875e5e0a9..fc7038931832 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4632,6 +4632,27 @@ static vm_fault_t hugetlb_vm_op_fault(struct vm_fault *vmf)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_NUMA
> +static int hugetlb_vm_op_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +
> +	if (!(vma->vm_flags & VM_SHARED))
> +		return 0;
> +
> +	return mpol_set_shared_policy(&HUGETLBFS_I(inode)->policy, vma, mpol);
> +}
> +
> +static struct mempolicy *hugetlb_vm_op_get_policy(struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +	pgoff_t index;
> +
> +	index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> +	return mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy, index);
> +}
> +#endif
> +
>  /*
>   * When a new function is introduced to vm_operations_struct and added
>   * to hugetlb_vm_ops, please consider adding the function to shm_vm_ops.
> @@ -4645,6 +4666,10 @@ const struct vm_operations_struct hugetlb_vm_ops = {
>  	.close = hugetlb_vm_op_close,
>  	.may_split = hugetlb_vm_op_split,
>  	.pagesize = hugetlb_vm_op_pagesize,
> +#ifdef CONFIG_NUMA
> +	.set_policy = hugetlb_vm_op_set_policy,
> +	.get_policy = hugetlb_vm_op_get_policy,
> +#endif
>  };
>  
>  static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,


How does the existing call

/* Set numa allocation policy based on index */
hugetlb_set_vma_policy(&pseudo_vma, inode, index);

enforce the policy with the current code? Also, now that we have
get_policy(), can we remove that call from hugetlbfs_fallocate()
after this patch? With shared policy, shouldn't we be able to fetch
the policy via get_vma_policy()?

A related question: does shmem_pseudo_vma_init() still require the
mpol_shared_policy_lookup() call?

-aneesh

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 5a6afecbb0d0..5672a6c2d2ef 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -133,14 +133,18 @@  Shared Policy
 	the object share the policy, and all pages allocated for the
 	shared object, by any task, will obey the shared policy.
 
-	As of 2.6.22, only shared memory segments, created by shmget() or
-	mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy.  When shared
-	policy support was added to Linux, the associated data structures were
-	added to hugetlbfs shmem segments.  At the time, hugetlbfs did not
-	support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
-	shmem segments were never "hooked up" to the shared policy support.
-	Although hugetlbfs segments now support lazy allocation, their support
-	for shared policy has not been completed.
+	As of 2.6.22, only shared memory segments, created by shmget() without
+	SHM_HUGETLB flag or mmap(MAP_ANONYMOUS|MAP_SHARED) without MAP_HUGETLB
+	flag, support shared policy. When shared policy support was added to Linux,
+	the associated data structures were added to hugetlbfs shmem segments.
+	At the time, hugetlbfs did not support allocation at fault time--a.k.a
+	lazy allocation--so hugetlbfs shmem segments were never "hooked up" to
+	the shared policy support. Although hugetlbfs segments now support lazy
+	allocation, their support for shared policy has not been completed.
+
+	With set_policy/get_policy hooked up in hugetlb_vm_ops, shared memory
+	segments created by shmget() with the SHM_HUGETLB flag and mappings
+	created by mmap(MAP_SHARED|MAP_HUGETLB) also support shared policy.
 
 	As mentioned above in :ref:`VMA policies <vma_policy>` section,
 	allocations of page cache pages for regular files mmap()ed
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 87d875e5e0a9..fc7038931832 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4632,6 +4632,27 @@  static vm_fault_t hugetlb_vm_op_fault(struct vm_fault *vmf)
 	return 0;
 }
 
+#ifdef CONFIG_NUMA
+static int hugetlb_vm_op_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
+{
+	struct inode *inode = file_inode(vma->vm_file);
+
+	if (!(vma->vm_flags & VM_SHARED))
+		return 0;
+
+	return mpol_set_shared_policy(&HUGETLBFS_I(inode)->policy, vma, mpol);
+}
+
+static struct mempolicy *hugetlb_vm_op_get_policy(struct vm_area_struct *vma, unsigned long addr)
+{
+	struct inode *inode = file_inode(vma->vm_file);
+	pgoff_t index;
+
+	index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+	return mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy, index);
+}
+#endif
+
 /*
  * When a new function is introduced to vm_operations_struct and added
  * to hugetlb_vm_ops, please consider adding the function to shm_vm_ops.
@@ -4645,6 +4666,10 @@  const struct vm_operations_struct hugetlb_vm_ops = {
 	.close = hugetlb_vm_op_close,
 	.may_split = hugetlb_vm_op_split,
 	.pagesize = hugetlb_vm_op_pagesize,
+#ifdef CONFIG_NUMA
+	.set_policy = hugetlb_vm_op_set_policy,
+	.get_policy = hugetlb_vm_op_get_policy,
+#endif
 };
 
 static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
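
The lookup index computed in hugetlb_vm_op_get_policy() above is expressed in base-page (PAGE_SIZE) units, not huge-page units. The following user-space model of that arithmetic is only a sketch: sketch_policy_index() and SKETCH_PAGE_SHIFT are hypothetical names, and 4 KiB base pages are assumed. Under that assumption, an address 2 MiB into a mapping with a zero file offset yields index 512.

```c
/* Assumption for this sketch: 4 KiB base pages (PAGE_SHIFT == 12). */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical user-space model of the index calculation in the patch:
 * the offset of `addr` inside the VMA, in base pages, plus the VMA's
 * starting offset in the file (also in base pages).
 */
unsigned long sketch_policy_index(unsigned long addr,
				  unsigned long vm_start,
				  unsigned long vm_pgoff)
{
	return ((addr - vm_start) >> SKETCH_PAGE_SHIFT) + vm_pgoff;
}
```

Because the index advances by one per base page, consecutive 2 MiB huge pages are 512 indices apart in this model.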
