
[v2,02/14] arm: adjust_pte() use pte_offset_map_rw_nolock()

Message ID 7915acf5887e7bf0c5cc71ff30ad2fe8447d005d.1724310149.git.zhengqi.arch@bytedance.com (mailing list archive)
State New, archived
Series introduce pte_offset_map_{ro|rw}_nolock()

Commit Message

Qi Zheng Aug. 22, 2024, 7:13 a.m. UTC
In do_adjust_pte(), we may modify the pte entry. At that point, the
write lock of mmap_lock is not held, and no pte_same() check is
performed after the PTL is taken, so the corresponding pmd entry may
have been modified concurrently. Therefore, to ensure the stability of
the pmd entry, use pte_offset_map_rw_nolock() instead of
pte_offset_map_nolock(), and perform a pmd_same() check after the PTL
is held.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 arch/arm/mm/fault-armv.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
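
For reference, the difference between the old and new helpers is that
the rw variant additionally hands back a snapshot of the pmd, which is
what makes the later pmd_same() recheck possible. A sketch of the two
signatures as used by this series (parameter names here are
illustrative):

/* Maps the PTE and returns its PTL without taking it; the caller gets
 * nothing to revalidate the pmd against once the lock is held. */
pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
			     unsigned long addr, spinlock_t **ptlp);

/* As above, but also snapshots *pmd into *pmdvalp so the caller can
 * do a pmd_same() check after taking the PTL, as this patch does. */
pte_t *pte_offset_map_rw_nolock(struct mm_struct *mm, pmd_t *pmd,
				unsigned long addr, pmd_t *pmdvalp,
				spinlock_t **ptlp);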

Comments

David Hildenbrand Aug. 26, 2024, 3:26 p.m. UTC | #1
On 22.08.24 09:13, Qi Zheng wrote:
> In do_adjust_pte(), we may modify the pte entry. At that point, the
> write lock of mmap_lock is not held, and no pte_same() check is
> performed after the PTL is taken, so the corresponding pmd entry may
> have been modified concurrently. Therefore, to ensure the stability of
> the pmd entry, use pte_offset_map_rw_nolock() instead of
> pte_offset_map_nolock(), and perform a pmd_same() check after the PTL
> is held.
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>   arch/arm/mm/fault-armv.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> index 831793cd6ff94..de6c7d8a2ddfc 100644
> --- a/arch/arm/mm/fault-armv.c
> +++ b/arch/arm/mm/fault-armv.c
> @@ -94,6 +94,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>   	pud_t *pud;
>   	pmd_t *pmd;
>   	pte_t *pte;
> +	pmd_t pmdval;
>   	int ret;
>   
>   	pgd = pgd_offset(vma->vm_mm, address);
> @@ -112,16 +113,22 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>   	if (pmd_none_or_clear_bad(pmd))
>   		return 0;
>   
> +again:
>   	/*
>   	 * This is called while another page table is mapped, so we
>   	 * must use the nested version.  This also means we need to
>   	 * open-code the spin-locking.
>   	 */
> -	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
> +	pte = pte_offset_map_rw_nolock(vma->vm_mm, pmd, address, &pmdval, &ptl);
>   	if (!pte)
>   		return 0;
>   
>   	do_pte_lock(ptl);
> +	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
> +		do_pte_unlock(ptl);
> +		pte_unmap(pte);
> +		goto again;
> +	}
>   
>   	ret = do_adjust_pte(vma, address, pfn, pte);
>   

Looks correct to me, but I wonder why the missing pmd_same check is not 
an issue so far ... any experts? THP on __LINUX_ARM_ARCH__ < 6 is not 
really used/possible?

Acked-by: David Hildenbrand <david@redhat.com>
Muchun Song Aug. 29, 2024, 3:39 a.m. UTC | #2
On 2024/8/26 23:26, David Hildenbrand wrote:
> On 22.08.24 09:13, Qi Zheng wrote:
>> In do_adjust_pte(), we may modify the pte entry. At that point, the
>> write lock of mmap_lock is not held, and no pte_same() check is
>> performed after the PTL is taken, so the corresponding pmd entry may
>> have been modified concurrently. Therefore, to ensure the stability of
>> the pmd entry, use pte_offset_map_rw_nolock() instead of
>> pte_offset_map_nolock(), and perform a pmd_same() check after the PTL
>> is held.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

>> ---
>>   arch/arm/mm/fault-armv.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
>> index 831793cd6ff94..de6c7d8a2ddfc 100644
>> --- a/arch/arm/mm/fault-armv.c
>> +++ b/arch/arm/mm/fault-armv.c
>> @@ -94,6 +94,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>>  	pud_t *pud;
>>  	pmd_t *pmd;
>>  	pte_t *pte;
>> +	pmd_t pmdval;
>>  	int ret;
>>  
>>  	pgd = pgd_offset(vma->vm_mm, address);
>> @@ -112,16 +113,22 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
>>  	if (pmd_none_or_clear_bad(pmd))
>>  		return 0;
>>  
>> +again:
>>  	/*
>>  	 * This is called while another page table is mapped, so we
>>  	 * must use the nested version.  This also means we need to
>>  	 * open-code the spin-locking.
>>  	 */
>> -	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
>> +	pte = pte_offset_map_rw_nolock(vma->vm_mm, pmd, address, &pmdval, &ptl);
>>  	if (!pte)
>>  		return 0;
>>  
>>  	do_pte_lock(ptl);
>> +	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
>> +		do_pte_unlock(ptl);
>> +		pte_unmap(pte);
>> +		goto again;
>> +	}
>>  
>>  	ret = do_adjust_pte(vma, address, pfn, pte);
>
> Looks correct to me, but I wonder why the missing pmd_same check is 
> not an issue so far ... any experts? THP on __LINUX_ARM_ARCH__ < 6 is 
> not really used/possible?

I think it is because THP is not supported on those processors.

TRANSPARENT_HUGEPAGE depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE, which
in turn depends on ARM_LPAE. However, the Kconfig says ARM_LPAE is only
supported on ARMv7 processors.

config ARM_LPAE
          bool "Support for the Large Physical Address Extension"
          depends on MMU && CPU_32v7 && !CPU_32v6 && !CPU_32v5 && \
                  !CPU_32v4 && !CPU_32v3
          select PHYS_ADDR_T_64BIT
          select SWIOTLB
          help
            Say Y if you have an ARMv7 processor supporting the LPAE page
            table format and you would like to access memory beyond the
            4GB limit. The resulting kernel image will not run on
            processors without the LPA extension.

            If unsure, say N.

Thanks.
>
> Acked-by: David Hildenbrand <david@redhat.com>
>

Patch

diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 831793cd6ff94..de6c7d8a2ddfc 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -94,6 +94,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
+	pmd_t pmdval;
 	int ret;
 
 	pgd = pgd_offset(vma->vm_mm, address);
@@ -112,16 +113,22 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	if (pmd_none_or_clear_bad(pmd))
 		return 0;
 
+again:
 	/*
 	 * This is called while another page table is mapped, so we
 	 * must use the nested version.  This also means we need to
 	 * open-code the spin-locking.
 	 */
-	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
+	pte = pte_offset_map_rw_nolock(vma->vm_mm, pmd, address, &pmdval, &ptl);
 	if (!pte)
 		return 0;
 
 	do_pte_lock(ptl);
+	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
+		do_pte_unlock(ptl);
+		pte_unmap(pte);
+		goto again;
+	}
 
 	ret = do_adjust_pte(vma, address, pfn, pte);
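
The window the pmd_same() recheck closes looks roughly like this (a
hypothetical interleaving for illustration; the concurrent clearing
could come from any path that withdraws and frees a PTE page table):

	CPU 0: adjust_pte()                      CPU 1
	-------------------                      -----
	pte = pte_offset_map_rw_nolock(...)
	  /* pmdval snapshotted; PTL not
	     yet held */
	                                         clears *pmd and frees
	                                         the PTE page table
	do_pte_lock(ptl)
	pmd_same(pmdval, pmdp_get_lockless(pmd))
	  fails -> do_pte_unlock(), pte_unmap(),
	  goto again

Without the recheck, do_adjust_pte() could write a PTE in a page table
that is no longer reachable from the pmd.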