[2/2] mm: rmap: Move the cache flushing to the correct place for hugetlb PMD sharing

Message ID f5e3b77c5a4c646e000ffadbf6c3db0531a01795.1650810915.git.baolin.wang@linux.alibaba.com (mailing list archive)
State New
Series Fix cache flush issues considering PMD sharing

Commit Message

Baolin Wang April 24, 2022, 2:50 p.m. UTC
The cache level flush will always be first when changing an existing
virtual->physical mapping to a new value, since this allows us to
properly handle systems whose caches are strict and require a
virtual->physical translation to exist for a virtual address. So we
should move the cache flushing before huge_pmd_unshare().

As Muchun pointed out[1], the architectures that currently support
hugetlb PMD sharing have no cache flush issues in practice. But I think
we should still follow the cache/TLB flushing rules when changing a
valid virtual address mapping, in case of potential issues in the
future.

[1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/rmap.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)
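For reference, a minimal sketch of the ordering this patch enforces,
using the same calls as the hunks below (an illustrative kernel-style
fragment mirroring mm/rmap.c, not buildable on its own):

	/* 1. Flush the cache first, while the virtual->physical
	 *    translation still exists (needed on strict, virtually
	 *    tagged caches).  start/end were already adjusted to
	 *    cover the full shareable range. */
	flush_cache_range(vma, range.start, range.end);

	/* 2. Only then tear down the shared PMD page table page. */
	if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
		/* 3. Flush the TLB and notify secondary MMUs after
		 *    the page table change. */
		flush_tlb_range(vma, range.start, range.end);
		mmu_notifier_invalidate_range(mm, range.start,
					      range.end);
	}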

Comments

Mike Kravetz April 26, 2022, 12:20 a.m. UTC | #1
On 4/24/22 07:50, Baolin Wang wrote:
> The cache level flush will always be first when changing an existing
> virtual->physical mapping to a new value, since this allows us to
> properly handle systems whose caches are strict and require a
> virtual->physical translation to exist for a virtual address. So we
> should move the cache flushing before huge_pmd_unshare().
> 
> As Muchun pointed out[1], the architectures that currently support
> hugetlb PMD sharing have no cache flush issues in practice. But I think
> we should still follow the cache/TLB flushing rules when changing a
> valid virtual address mapping, in case of potential issues in the
> future.
> 
> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/rmap.c | 40 ++++++++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 18 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 61e63db..81872bb 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			 * do this outside rmap routines.
>  			 */
>  			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> +			/*
> +			 * huge_pmd_unshare unmapped an entire PMD page.

Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
an entire PMD page?

> +			 * There is no way of knowing exactly which PMDs may
> +			 * be cached for this mm, so we must flush them all.
> +			 * start/end were already adjusted above to cover this
> +			 * range.
> +			 */
> +			flush_cache_range(vma, range.start, range.end);
> +
>  			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> -				/*
> -				 * huge_pmd_unshare unmapped an entire PMD
> -				 * page.  There is no way of knowing exactly
> -				 * which PMDs may be cached for this mm, so
> -				 * we must flush them all.  start/end were
> -				 * already adjusted above to cover this range.
> -				 */
> -				flush_cache_range(vma, range.start, range.end);
>  				flush_tlb_range(vma, range.start, range.end);
>  				mmu_notifier_invalidate_range(mm, range.start,
>  							      range.end);
> @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  				page_vma_mapped_walk_done(&pvmw);
>  				break;
>  			}
> +		} else {
> +			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));

I know this call to flush_cache_page() existed before your change.  But, when
looking at this now I wonder how hugetlb pages are handled?  Are there any
versions of flush_cache_page() that take page size into account?
Baolin Wang April 26, 2022, 6:26 a.m. UTC | #2
On 4/26/2022 8:20 AM, Mike Kravetz wrote:
> On 4/24/22 07:50, Baolin Wang wrote:
>> The cache level flush will always be first when changing an existing
>> virtual->physical mapping to a new value, since this allows us to
>> properly handle systems whose caches are strict and require a
>> virtual->physical translation to exist for a virtual address. So we
>> should move the cache flushing before huge_pmd_unshare().
>>
>> As Muchun pointed out[1], the architectures that currently support
>> hugetlb PMD sharing have no cache flush issues in practice. But I think
>> we should still follow the cache/TLB flushing rules when changing a
>> valid virtual address mapping, in case of potential issues in the
>> future.
>>
>> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/rmap.c | 40 ++++++++++++++++++++++------------------
>>   1 file changed, 22 insertions(+), 18 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 61e63db..81872bb 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>   			 * do this outside rmap routines.
>>   			 */
>>   			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>> +			/*
>> +			 * huge_pmd_unshare unmapped an entire PMD page.
> 
> Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
> an entire PMD page?

Sure, will do.

> 
>> +			 * There is no way of knowing exactly which PMDs may
>> +			 * be cached for this mm, so we must flush them all.
>> +			 * start/end were already adjusted above to cover this
>> +			 * range.
>> +			 */
>> +			flush_cache_range(vma, range.start, range.end);
>> +
>>   			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
>> -				/*
>> -				 * huge_pmd_unshare unmapped an entire PMD
>> -				 * page.  There is no way of knowing exactly
>> -				 * which PMDs may be cached for this mm, so
>> -				 * we must flush them all.  start/end were
>> -				 * already adjusted above to cover this range.
>> -				 */
>> -				flush_cache_range(vma, range.start, range.end);
>>   				flush_tlb_range(vma, range.start, range.end);
>>   				mmu_notifier_invalidate_range(mm, range.start,
>>   							      range.end);
>> @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>   				page_vma_mapped_walk_done(&pvmw);
>>   				break;
>>   			}
>> +		} else {
>> +			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
> 
> I know this call to flush_cache_page() existed before your change.  But, when
> looking at this now I wonder how hugetlb pages are handled?  Are there any
> versions of flush_cache_page() that take page size into account?

Thanks for the reminder. I checked the flush_cache_page() implementation 
on some architectures (like arm32); they do not take hugetlb pages into 
account, so I think we may miss flushing the whole cache for hugetlb 
pages on some architectures.

With this patch we can mitigate the issue, since we now use 
flush_cache_range() to cover the possible range when flushing the cache 
for hugetlb pages. But for anon hugetlb pages we should also convert to 
flush_cache_range(). I think we can do that conversion in a separate 
patch set, after checking all the places that use flush_cache_page() to 
flush the cache for hugetlb pages. What do you think?
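
To make the proposed conversion concrete, here is a rough sketch of what
the per-page flush could look like once hugetlb pages are flushed by
range; huge_page_size()/hstate_vma() are existing hugetlb helpers, but
the exact shape of the follow-up series may well differ:

	if (folio_test_hugetlb(folio)) {
		/* Flush the whole huge page, not just one PAGE_SIZE
		 * slice at 'address'. */
		unsigned long hsz = huge_page_size(hstate_vma(vma));
		unsigned long haddr = address & ~(hsz - 1);

		flush_cache_range(vma, haddr, haddr + hsz);
	} else {
		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
	}
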
Mike Kravetz April 26, 2022, 4:28 p.m. UTC | #3
On 4/25/22 23:26, Baolin Wang wrote:
> 
> 
> On 4/26/2022 8:20 AM, Mike Kravetz wrote:
>> On 4/24/22 07:50, Baolin Wang wrote:
>>> The cache level flush will always be first when changing an existing
>>> virtual->physical mapping to a new value, since this allows us to
>>> properly handle systems whose caches are strict and require a
>>> virtual->physical translation to exist for a virtual address. So we
>>> should move the cache flushing before huge_pmd_unshare().
>>>
>>> As Muchun pointed out[1], the architectures that currently support
>>> hugetlb PMD sharing have no cache flush issues in practice. But I think
>>> we should still follow the cache/TLB flushing rules when changing a
>>> valid virtual address mapping, in case of potential issues in the
>>> future.
>>>
>>> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@FVFYT0MHHV2J.usts.net/
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>>   mm/rmap.c | 40 ++++++++++++++++++++++------------------
>>>   1 file changed, 22 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 61e63db..81872bb 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>                * do this outside rmap routines.
>>>                */
>>>               VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>>> +            /*
>>> +             * huge_pmd_unshare unmapped an entire PMD page.
>>
>> Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
>> an entire PMD page?
> 
> Sure, will do.
> 
>>
>>> +             * There is no way of knowing exactly which PMDs may
>>> +             * be cached for this mm, so we must flush them all.
>>> +             * start/end were already adjusted above to cover this
>>> +             * range.
>>> +             */
>>> +            flush_cache_range(vma, range.start, range.end);
>>> +
>>>               if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
>>> -                /*
>>> -                 * huge_pmd_unshare unmapped an entire PMD
>>> -                 * page.  There is no way of knowing exactly
>>> -                 * which PMDs may be cached for this mm, so
>>> -                 * we must flush them all.  start/end were
>>> -                 * already adjusted above to cover this range.
>>> -                 */
>>> -                flush_cache_range(vma, range.start, range.end);
>>>                   flush_tlb_range(vma, range.start, range.end);
>>>                   mmu_notifier_invalidate_range(mm, range.start,
>>>                                     range.end);
>>> @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>                   page_vma_mapped_walk_done(&pvmw);
>>>                   break;
>>>               }
>>> +        } else {
>>> +            flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>>
>> I know this call to flush_cache_page() existed before your change.  But, when
>> looking at this now I wonder how hugetlb pages are handled?  Are there any
>> versions of flush_cache_page() that take page size into account?
> 
> Thanks for the reminder. I checked the flush_cache_page() implementation on some architectures (like arm32); they do not take hugetlb pages into account, so I think we may miss flushing the whole cache for hugetlb pages on some architectures.
> 
> With this patch we can mitigate the issue, since we now use flush_cache_range() to cover the possible range when flushing the cache for hugetlb pages. But for anon hugetlb pages we should also convert to
> flush_cache_range(). I think we can do that conversion in a separate patch set, after checking all the places that use flush_cache_page() to flush the cache for hugetlb pages. What do you think?

Yes, I am OK with that approach.

Patch

diff --git a/mm/rmap.c b/mm/rmap.c
index 61e63db..81872bb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1535,15 +1535,16 @@  static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 * do this outside rmap routines.
 			 */
 			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+			/*
+			 * huge_pmd_unshare unmapped an entire PMD page.
+			 * There is no way of knowing exactly which PMDs may
+			 * be cached for this mm, so we must flush them all.
+			 * start/end were already adjusted above to cover this
+			 * range.
+			 */
+			flush_cache_range(vma, range.start, range.end);
+
 			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				/*
-				 * huge_pmd_unshare unmapped an entire PMD
-				 * page.  There is no way of knowing exactly
-				 * which PMDs may be cached for this mm, so
-				 * we must flush them all.  start/end were
-				 * already adjusted above to cover this range.
-				 */
-				flush_cache_range(vma, range.start, range.end);
 				flush_tlb_range(vma, range.start, range.end);
 				mmu_notifier_invalidate_range(mm, range.start,
 							      range.end);
@@ -1560,13 +1561,14 @@  static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				page_vma_mapped_walk_done(&pvmw);
 				break;
 			}
+		} else {
+			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		}
 
 		/*
 		 * Nuke the page table entry. When having to clear
 		 * PageAnonExclusive(), we always have to flush.
 		 */
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		if (should_defer_flush(mm, flags) && !anon_exclusive) {
 			/*
 			 * We clear the PTE but do not flush so potentially
@@ -1890,15 +1892,16 @@  static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 * do this outside rmap routines.
 			 */
 			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+			/*
+			 * huge_pmd_unshare unmapped an entire PMD page.
+			 * There is no way of knowing exactly which PMDs may
+			 * be cached for this mm, so we must flush them all.
+			 * start/end were already adjusted above to cover this
+			 * range.
+			 */
+			flush_cache_range(vma, range.start, range.end);
+
 			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
-				/*
-				 * huge_pmd_unshare unmapped an entire PMD
-				 * page.  There is no way of knowing exactly
-				 * which PMDs may be cached for this mm, so
-				 * we must flush them all.  start/end were
-				 * already adjusted above to cover this range.
-				 */
-				flush_cache_range(vma, range.start, range.end);
 				flush_tlb_range(vma, range.start, range.end);
 				mmu_notifier_invalidate_range(mm, range.start,
 							      range.end);
@@ -1915,10 +1918,11 @@  static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				page_vma_mapped_walk_done(&pvmw);
 				break;
 			}
+		} else {
+			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		}
 
 		/* Nuke the page table entry. */
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
 		pteval = ptep_clear_flush(vma, address, pvmw.pte);
 
 		/* Set the dirty flag on the folio now the pte is gone. */