mm/mempolicy: support MPOL_MF_STRICT for huge page mapping
diff mbox series

Message ID 1580434395-9962-1-git-send-email-lixinhai.lxh@gmail.com
State New
Headers show
Series
  • mm/mempolicy: support MPOL_MF_STRICT for huge page mapping
Related show

Commit Message

Li Xinhai Jan. 31, 2020, 1:33 a.m. UTC
MPOL_MF_STRICT is used in mbind() for purposes:
(1) MPOL_MF_STRICT is set alone without MPOL_MF_MOVE or MPOL_MF_MOVE_ALL,
    to check if there is misplaced page and return -EIO;
(2) MPOL_MF_STRICT is set with MPOL_MF_MOVE or MPOL_MF_MOVE_ALL, to check
    if there is misplaced page which is failed to isolate, or page is
    success on isolate but failed to move, and return -EIO.

For non hugepage mapping, (1) and (2) are implemented as expectation.
For hugepage mapping, (1) is not implemented. And in (2), the part about
failed to isolate and report -EIO is not implemented.

This patch implements the missed parts for hugepage mapping. Benefits
with it applied:
- User space can apply same code logic to handle mbind() on hugepage and
  non hugepage mapping;
- Reliably using MPOL_MF_STRICT alone to check whether there is misplaced
  page or not when bind policy on address range, especially for address
  range which contains both hugepage and non hugepage mapping.

Analysis of potential impact on existing users:
- For users who using MPOL_MF_STRICT alone, since mbind() don't report
  reliable answer about misplaced page, their existing code have to
  utilize other facilities, e.g. numa_maps of proc, to check misplaced
  page. After this patch applied, that dedicated checking will still work
  without been impacted;
- For users who using MPOL_MF_STRICT with MPOL_MF_MOVE or
  MPOL_MF_MOVE_ALL, the semantic about some pages could not be moved will
  not be changed by this patch, because failed to isolate and failed to
  move have same effects to users, so their existing code will not be
  impacted.

In mbind man page, the note about 'MPOL_MF_STRICT is ignored on huge page
mappings' can be removed after this patch is applied.

Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-man <linux-man@vger.kernel.org>
---
Link to relevant discussion:
https://lore.kernel.org/linux-mm/1578993378-10860-2-git-send-email-lixinhai.lxh@gmail.com/


 mm/mempolicy.c | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

Comments

Mike Kravetz Feb. 1, 2020, 3:28 a.m. UTC | #1
On 1/30/20 5:33 PM, Li Xinhai wrote:
> MPOL_MF_STRICT is used in mbind() for purposes:
> (1) MPOL_MF_STRICT is set alone without MPOL_MF_MOVE or MPOL_MF_MOVE_ALL,
>     to check if there is misplaced page and return -EIO;
> (2) MPOL_MF_STRICT is set with MPOL_MF_MOVE or MPOL_MF_MOVE_ALL, to check
>     if there is misplaced page which is failed to isolate, or page is
>     success on isolate but failed to move, and return -EIO.
> 
> For non hugepage mapping, (1) and (2) are implemented as expectation.
> For hugepage mapping, (1) is not implemented. And in (2), the part about
> failed to isolate and report -EIO is not implemented.
> 
> This patch implements the missed parts for hugepage mapping. Benefits
> with it applied:
> - User space can apply same code logic to handle mbind() on hugepage and
>   non hugepage mapping;
> - Reliably using MPOL_MF_STRICT alone to check whether there is misplaced
>   page or not when bind policy on address range, especially for address
>   range which contains both hugepage and non hugepage mapping.
> 
> Analysis of potential impact on existing users:
> - For users who using MPOL_MF_STRICT alone, since mbind() don't report
>   reliable answer about misplaced page, their existing code have to
>   utilize other facilities, e.g. numa_maps of proc, to check misplaced
>   page. After this patch applied, that dedicated checking will still work
>   without been impacted;

I do not agree with this characterization of impact to existing users.  If
someone uses MPOL_MF_STRICT alone with hugetlb pages today, they will never
get EIO.  After this patch, they will get EIO if the hugetlb pages do not
follow the policy.

Yes, this is the desired behavior after the change with updated documentation.
However, it is a potential change for existing users.  I honestly do not
believe anyone will notice.  But, it is a change in behavior.

> - For users who using MPOL_MF_STRICT with MPOL_MF_MOVE or
>   MPOL_MF_MOVE_ALL, the semantic about some pages could not be moved will
>   not be changed by this patch, because failed to isolate and failed to
>   move have same effects to users, so their existing code will not be
>   impacted.
> 
> In mbind man page, the note about 'MPOL_MF_STRICT is ignored on huge page
> mappings' can be removed after this patch is applied.
> 
> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: linux-man <linux-man@vger.kernel.org>
> ---
> Link to relevant discussion:
> https://lore.kernel.org/linux-mm/1578993378-10860-2-git-send-email-lixinhai.lxh@gmail.com/
> 

Thanks for doing this and the commit message.  I do not see any issues with
the code.

I believe removing the special case for hugetlb pages in mbind is a good
thing.  It is unfortunate that it will cause a change in behavior.  My
belief is that this change will go unnoticed.  Providing consistent
behavior that matches the (updated) documentation is better that maintaining
the current functionality into the future.
Li Xinhai Feb. 1, 2020, 1:56 p.m. UTC | #2
On 2020-02-01 at 11:28 Mike Kravetz wrote:
>On 1/30/20 5:33 PM, Li Xinhai wrote:
>> MPOL_MF_STRICT is used in mbind() for purposes:
>> (1) MPOL_MF_STRICT is set alone without MPOL_MF_MOVE or MPOL_MF_MOVE_ALL,
>>     to check if there is misplaced page and return -EIO;
>> (2) MPOL_MF_STRICT is set with MPOL_MF_MOVE or MPOL_MF_MOVE_ALL, to check
>>     if there is misplaced page which is failed to isolate, or page is
>>     success on isolate but failed to move, and return -EIO.
>>
>> For non hugepage mapping, (1) and (2) are implemented as expectation.
>> For hugepage mapping, (1) is not implemented. And in (2), the part about
>> failed to isolate and report -EIO is not implemented.
>>
>> This patch implements the missed parts for hugepage mapping. Benefits
>> with it applied:
>> - User space can apply same code logic to handle mbind() on hugepage and
>>   non hugepage mapping;
>> - Reliably using MPOL_MF_STRICT alone to check whether there is misplaced
>>   page or not when bind policy on address range, especially for address
>>   range which contains both hugepage and non hugepage mapping.
>>
>> Analysis of potential impact on existing users:
>> - For users who using MPOL_MF_STRICT alone, since mbind() don't report
>>   reliable answer about misplaced page, their existing code have to
>>   utilize other facilities, e.g. numa_maps of proc, to check misplaced
>>   page. After this patch applied, that dedicated checking will still work
>>   without been impacted;
>
>I do not agree with this characterization of impact to existing users.  If
>someone uses MPOL_MF_STRICT alone with hugetlb pages today, they will never
>get EIO.  After this patch, they will get EIO if the hugetlb pages do not
>follow the policy.
> 

Thanks, your point is correct on this specific case.

Try to summarize cases that have been used as setting MPOL_MF_STRICT alone:
- If user has not noticed that STRICT been ignored, and implemented code
  branch for failure, then that branch will now be reachable when page is
  misplaced. This should not be a problem (maybe bug in that user's branch would
  be revealed now).
- If user has noticed that STRICT been ignored, and prepared to check misplaced
  pages by dedicated code (this case is that described in changelog, no impaction). 
- (*) If user has noticed that STRICT been ignored, and assumed mbind() would never
  return failure for misplaced pages (e.g., assert on it). Then, those users will
  be impacted, although their code logic seems weird.
- if users didn't define code logic to handle failure on misplaced pages, no matter
  they noticed that STRICT been ignored or not, they can't be impacted.
In my mind, (*) part is the only case impact to users.

>Yes, this is the desired behavior after the change with updated documentation.
>However, it is a potential change for existing users.  I honestly do not
>believe anyone will notice.  But, it is a change in behavior.
> 
>> - For users who using MPOL_MF_STRICT with MPOL_MF_MOVE or
>>   MPOL_MF_MOVE_ALL, the semantic about some pages could not be moved will
>>   not be changed by this patch, because failed to isolate and failed to
>>   move have same effects to users, so their existing code will not be
>>   impacted.
>>
>> In mbind man page, the note about 'MPOL_MF_STRICT is ignored on huge page
>> mappings' can be removed after this patch is applied.
>>
>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Cc: linux-man <linux-man@vger.kernel.org>
>> ---
>> Link to relevant discussion:
>> https://lore.kernel.org/linux-mm/1578993378-10860-2-git-send-email-lixinhai.lxh@gmail.com/
>>
>
>Thanks for doing this and the commit message.  I do not see any issues with
>the code. 

Thanks for inspecting the code.

>
>I believe removing the special case for hugetlb pages in mbind is a good
>thing.  It is unfortunate that it will cause a change in behavior.  My
>belief is that this change will go unnoticed. Providing consistent
>behavior that matches the (updated) documentation is better that maintaining
>the current functionality into the future.
>
>--
>Mike Kravetz
Mike Kravetz Feb. 10, 2020, 5:19 p.m. UTC | #3
On 1/30/20 5:33 PM, Li Xinhai wrote:
> MPOL_MF_STRICT is used in mbind() for purposes:
> (1) MPOL_MF_STRICT is set alone without MPOL_MF_MOVE or MPOL_MF_MOVE_ALL,
>     to check if there is misplaced page and return -EIO;
> (2) MPOL_MF_STRICT is set with MPOL_MF_MOVE or MPOL_MF_MOVE_ALL, to check
>     if there is misplaced page which is failed to isolate, or page is
>     success on isolate but failed to move, and return -EIO.
> 
> For non hugepage mapping, (1) and (2) are implemented as expectation.
> For hugepage mapping, (1) is not implemented. And in (2), the part about
> failed to isolate and report -EIO is not implemented.
> 
> This patch implements the missed parts for hugepage mapping. Benefits
> with it applied:
> - User space can apply same code logic to handle mbind() on hugepage and
>   non hugepage mapping;
> - Reliably using MPOL_MF_STRICT alone to check whether there is misplaced
>   page or not when bind policy on address range, especially for address
>   range which contains both hugepage and non hugepage mapping.
> 
> Analysis of potential impact on existing users:
> - For users who using MPOL_MF_STRICT alone, since mbind() don't report
>   reliable answer about misplaced page, their existing code have to
>   utilize other facilities, e.g. numa_maps of proc, to check misplaced
>   page. After this patch applied, that dedicated checking will still work
>   without been impacted;
> - For users who using MPOL_MF_STRICT with MPOL_MF_MOVE or
>   MPOL_MF_MOVE_ALL, the semantic about some pages could not be moved will
>   not be changed by this patch, because failed to isolate and failed to
>   move have same effects to users, so their existing code will not be
>   impacted.
> 
> In mbind man page, the note about 'MPOL_MF_STRICT is ignored on huge page
> mappings' can be removed after this patch is applied.
> 
> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: linux-man <linux-man@vger.kernel.org>

Hello Naoya,

Do you have any thoughts on this?  In previous discussions you suggested this
approach of removing the special casing for hugetlb pages.  The code looks
good to me and patch is fine with commmit message modification.  Just wanted
to get your opinion.
HORIGUCHI NAOYA(堀口 直也) Feb. 12, 2020, 3:21 a.m. UTC | #4
On Mon, Feb 10, 2020 at 09:19:48AM -0800, Mike Kravetz wrote:
> On 1/30/20 5:33 PM, Li Xinhai wrote:
...
> > 
> > Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> > Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: linux-man <linux-man@vger.kernel.org>
> 
> Hello Naoya,
> 
> Do you have any thoughts on this?  In previous discussions you suggested this
> approach of removing the special casing for hugetlb pages.  The code looks
> good to me and patch is fine with commmit message modification.  Just wanted
> to get your opinion.

Hi Mike, Xinhai,

The suggested change looks good to me, too.

Thanks,
Naoya Horiguchi
Li Xinhai Feb. 12, 2020, 5:25 a.m. UTC | #5
On 2020-02-12 at 11:21 HORIGUCHI NAOYA(堀口 直也) wrote:
>On Mon, Feb 10, 2020 at 09:19:48AM -0800, Mike Kravetz wrote:
>> On 1/30/20 5:33 PM, Li Xinhai wrote:
>...
>> >
>> > Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> > Cc: Michal Hocko <mhocko@suse.com>
>> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> > Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> > Cc: linux-man <linux-man@vger.kernel.org>
>>
>> Hello Naoya,
>>
>> Do you have any thoughts on this?  In previous discussions you suggested this
>> approach of removing the special casing for hugetlb pages.  The code looks
>> good to me and patch is fine with commmit message modification.  Just wanted
>> to get your opinion.
>
>Hi Mike, Xinhai,
>
>The suggested change looks good to me, too.
>
>Thanks,
>Naoya Horiguchi 

Naoya, thanks for inspecting the change!
Mike Kravetz Feb. 12, 2020, 11:50 p.m. UTC | #6
On 2/11/20 9:25 PM, Li Xinhai wrote:
> On 2020-02-12 at 11:21 HORIGUCHI NAOYA(堀口 直也) wrote:
>> On Mon, Feb 10, 2020 at 09:19:48AM -0800, Mike Kravetz wrote:
>>> On 1/30/20 5:33 PM, Li Xinhai wrote:
>> ...
>>>>
>>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>> Cc: linux-man <linux-man@vger.kernel.org>
>>>
>>> Hello Naoya,
>>>
>>> Do you have any thoughts on this?  In previous discussions you suggested this
>>> approach of removing the special casing for hugetlb pages.  The code looks
>>> good to me and patch is fine with commmit message modification.  Just wanted
>>> to get your opinion.
>>
>> Hi Mike, Xinhai,
>>
>> The suggested change looks good to me, too.
>>
>> Thanks,
>> Naoya Horiguchi 
> 
> Naoya, thanks for inspecting the change!

Can you please send V2 of patch with an updated commit message.

I would like the section that which lists the impact to "users using
MPOL_MF_STRICT alone" to say something like this:
If MPOL_MF_STRICT alone was previously used, hugetlb pages not following the
memory policy would not cause an EIO error.  After this change, hugetlb pages
are treated like all other pages.  If  MPOL_MF_STRICT alone is used and hugetlb
pages do not follow memory policy an EIO error will be returned.
Li Xinhai Feb. 13, 2020, 1:50 a.m. UTC | #7
On 2020-02-13 at 07:50 Mike Kravetz wrote:
>On 2/11/20 9:25 PM, Li Xinhai wrote:
>> On 2020-02-12 at 11:21 HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Mon, Feb 10, 2020 at 09:19:48AM -0800, Mike Kravetz wrote:
>>>> On 1/30/20 5:33 PM, Li Xinhai wrote:
>>> ...
>>>>>
>>>>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>>> Cc: linux-man <linux-man@vger.kernel.org>
>>>>
>>>> Hello Naoya,
>>>>
>>>> Do you have any thoughts on this?  In previous discussions you suggested this
>>>> approach of removing the special casing for hugetlb pages.  The code looks
>>>> good to me and patch is fine with commmit message modification.  Just wanted
>>>> to get your opinion.
>>>
>>> Hi Mike, Xinhai,
>>>
>>> The suggested change looks good to me, too.
>>>
>>> Thanks,
>>> Naoya Horiguchi
>>
>> Naoya, thanks for inspecting the change!
>
>Can you please send V2 of patch with an updated commit message.
>
>I would like the section that which lists the impact to "users using
>MPOL_MF_STRICT alone" to say something like this:
>If MPOL_MF_STRICT alone was previously used, hugetlb pages not following the
>memory policy would not cause an EIO error.  After this change, hugetlb pages
>are treated like all other pages.  If  MPOL_MF_STRICT alone is used and hugetlb
>pages do not follow memory policy an EIO error will be returned.
>--
>Mike Kravetz 

I will send a new patch, thanks.

Patch
diff mbox series

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b2920ae..ec897d1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -557,9 +557,10 @@  static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
 			       unsigned long addr, unsigned long end,
 			       struct mm_walk *walk)
 {
+	int ret = 0;
 #ifdef CONFIG_HUGETLB_PAGE
 	struct queue_pages *qp = walk->private;
-	unsigned long flags = qp->flags;
+	unsigned long flags = (qp->flags & MPOL_MF_VALID);
 	struct page *page;
 	spinlock_t *ptl;
 	pte_t entry;
@@ -571,16 +572,44 @@  static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
 	page = pte_page(entry);
 	if (!queue_pages_required(page, qp))
 		goto unlock;
+
+	if (flags == MPOL_MF_STRICT) {
+		/*
+		 * STRICT alone means only detecting misplaced page and no
+		 * need to further check other vma.
+		 */
+		ret = -EIO;
+		goto unlock;
+	}
+
+	if (!vma_migratable(walk->vma)) {
+		/*
+		 * Must be STRICT with MOVE*, otherwise .test_walk() have
+		 * stopped walking current vma.
+		 * Detecting misplaced page but allow migrating pages which
+		 * have been queued.
+		 */
+		ret = 1;
+		goto unlock;
+	}
+
 	/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
 	if (flags & (MPOL_MF_MOVE_ALL) ||
-	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1))
-		isolate_huge_page(page, qp->pagelist);
+	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
+		if (!isolate_huge_page(page, qp->pagelist) &&
+			(flags & MPOL_MF_STRICT))
+			/*
+			 * Failed to isolate page but allow migrating pages
+			 * which have been queued.
+			 */
+			ret = 1;
+	}
 unlock:
 	spin_unlock(ptl);
 #else
 	BUG();
 #endif
-	return 0;
+	return ret;
 }
 
 #ifdef CONFIG_NUMA_BALANCING