Patchwork a racy access flag clearing warning when calling mmap system call

login
register
mail settings
Submitter JianKang Chen
Date Dec. 8, 2017, 3:19 a.m.
Message ID <bd446eae-cb6f-65f8-e5f7-cfb7e4424507@huawei.com>
Download mbox | patch
Permalink /patch/10101565/
State New
Headers show

Comments

JianKang Chen - Dec. 8, 2017, 3:19 a.m.
在 2017/12/7 21:23, Will Deacon 写道:
> On Thu, Dec 07, 2017 at 09:46:59AM +0800, Yisheng Xie wrote:
>> On 2017/12/1 21:18, Will Deacon wrote:
>>> On Fri, Dec 01, 2017 at 03:38:04PM +0800, chenjiankang wrote:
>>>> ------------[ cut here ]------------      
>>>> WARNING: at ../../../../../kernel/linux-4.1/arch/arm64/include/asm/pgtable.h:211
>>>
>>> Given that this is a fairly old 4.1 kernel, could you try to reproduce the
>>> failure with something more recent, please? We've fixed many bugs since
>>> then, some of them involving huge pages.
>>
>> Yeah, this is and old kernel, but I find a scene that will cause this warn_on:
>> When fork and dup_mmap, it will call copy_huge_pmd() and clear the Access Flag.
>>   dup_mmap
>>     -> copy_page_range
>>          -> copy_pud_range
>>               -> copy_pmd_range
>>                    -> copy_huge_pmd
>>                         -> pmd_mkold
>>
>> If we do not have any access after dup_mmap, and start to split this thp,
>> it will cause this call trace in the old kernel, right?
>>
>> It seems this is normal scene but will cause call trace for this old kernel,
>> therefore, for this old kernel, we should just remove this WARN_ON_ONCE, right?
> 
> Whilst racy clearing of the access flag should be safe in practice, I like
> having the warning around because it does indicate that we're setting
> something to old which could immediately be made young again by the CPU.
> 
> In this case, it looks like the mm isn't even live, so a better approach
> would probably be to predicate that conditional on mm == current->active_mm
> or something like that. That also avoids us getting false positive for
> the dirty bit case, which would be harmful if the table was installed.
> 
> diff below. It's still racy with concurrent fork, but I don't want this
> check to become a generic "does my caller hold all the locks to protect
> against a concurrent walk" predicate and it just means we won't catch all
> possible races.
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 149d05fb9421..8fe103b1e101 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -42,6 +42,8 @@
>  #include <asm/cmpxchg.h>
>  #include <asm/fixmap.h>
>  #include <linux/mmdebug.h>
> +#include <linux/mm_types.h>
> +#include <linux/sched.h>
>  
>  extern void __pte_error(const char *file, int line, unsigned long val);
>  extern void __pmd_error(const char *file, int line, unsigned long val);
> @@ -207,9 +209,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>  	}
>  }
>  
> -struct mm_struct;
> -struct vm_area_struct;
> -
>  extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
>  
>  /*
> @@ -238,7 +237,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>  	 * hardware updates of the pte (ptep_set_access_flags safely changes
>  	 * valid ptes without going through an invalid entry).
>  	 */
> -	if (pte_valid(*ptep) && pte_valid(pte)) {
> +	if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(*ptep) && pte_valid(pte) &&
> +	   (mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
>  		VM_WARN_ONCE(!pte_young(pte),
>  			     "%s: racy access flag clearing: 0x%016llx -> 0x%016llx",
>  			     __func__, pte_val(*ptep), pte_val(pte));
> 
> 
> .
> 

Hi will
  
    From the print information, the only difference between pte and ptep is the PTE_SPECIAL bit.
And the the PTE access bit is all zero.
    diff below. Whether the access bit of the new ptep should be judged to eliminate the 
false positive?

Thanks 

Jiankang
Catalin Marinas - Dec. 8, 2017, 11:01 a.m.
On Fri, Dec 08, 2017 at 11:19:52AM +0800, chenjiankang wrote:
> 在 2017/12/7 21:23, Will Deacon 写道:
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 149d05fb9421..8fe103b1e101 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -42,6 +42,8 @@
> >  #include <asm/cmpxchg.h>
> >  #include <asm/fixmap.h>
> >  #include <linux/mmdebug.h>
> > +#include <linux/mm_types.h>
> > +#include <linux/sched.h>
> >  
> >  extern void __pte_error(const char *file, int line, unsigned long val);
> >  extern void __pmd_error(const char *file, int line, unsigned long val);
> > @@ -207,9 +209,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
> >  	}
> >  }
> >  
> > -struct mm_struct;
> > -struct vm_area_struct;
> > -
> >  extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
> >  
> >  /*
> > @@ -238,7 +237,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> >  	 * hardware updates of the pte (ptep_set_access_flags safely changes
> >  	 * valid ptes without going through an invalid entry).
> >  	 */
> > -	if (pte_valid(*ptep) && pte_valid(pte)) {
> > +	if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(*ptep) && pte_valid(pte) &&
> > +	   (mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
> >  		VM_WARN_ONCE(!pte_young(pte),
> >  			     "%s: racy access flag clearing: 0x%016llx -> 0x%016llx",
> >  			     __func__, pte_val(*ptep), pte_val(pte));
[...]
>     From the print information, the only difference between pte and ptep is the PTE_SPECIAL bit.
> And the the PTE access bit is all zero.
>     diff below. Whether the access bit of the new ptep should be judged to eliminate the 
> false positive?
[...]
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 2987d5a..3c1b0c6 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -206,7 +206,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>          * valid ptes without going through an invalid entry).
>          */
>         if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
> -               VM_WARN_ONCE(!pte_young(pte),
> +               VM_WARN_ONCE(!pte_young(pte) && pte_young(*ptep),
>                              "%s: racy access flag clearing: %016llx -> %016llx",
>                              __func__, pte_val(*ptep), pte_val(pte));

It's actually the other way around: *ptep being "old" (AF = 0) could at
any point be made "young" by the hardware (AF = 1). This is racing with
the software update which keeps the AF bit 0.
JianKang Chen - Dec. 12, 2017, 3:13 a.m.
> On Fri, Dec 08, 2017 at 11:19:52AM +0800, chenjiankang wrote:
>> 在 2017/12/7 21:23, Will Deacon 写道:
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index 149d05fb9421..8fe103b1e101 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -42,6 +42,8 @@
>>>  #include <asm/cmpxchg.h>
>>>  #include <asm/fixmap.h>
>>>  #include <linux/mmdebug.h>
>>> +#include <linux/mm_types.h>
>>> +#include <linux/sched.h>
>>>  
>>>  extern void __pte_error(const char *file, int line, unsigned long val);
>>>  extern void __pmd_error(const char *file, int line, unsigned long val);
>>> @@ -207,9 +209,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>>>  	}
>>>  }
>>>  
>>> -struct mm_struct;
>>> -struct vm_area_struct;
>>> -
>>>  extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
>>>  
>>>  /*
>>> @@ -238,7 +237,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>>  	 * hardware updates of the pte (ptep_set_access_flags safely changes
>>>  	 * valid ptes without going through an invalid entry).
>>>  	 */
>>> -	if (pte_valid(*ptep) && pte_valid(pte)) {
>>> +	if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(*ptep) && pte_valid(pte) &&
>>> +	   (mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
>>>  		VM_WARN_ONCE(!pte_young(pte),
>>>  			     "%s: racy access flag clearing: 0x%016llx -> 0x%016llx",
>>>  			     __func__, pte_val(*ptep), pte_val(pte));
> [...]

Hi Will:
     I contruct a simple use case that can reproduce this fail;

like this: 
   #define LEN 1024*1024*100
                       
int main(void){        
        int* addr = NULL; 
        int pid = -1; 
        addr = (int*)mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); 
        madvise(addr, LEN, MADV_HUGEPAGE);
        memset(addr, 1, LEN);
        pid = fork();                                                                                                                
                       
        if(pid==0){ 
                printf("wow! I am a child!\n");
        }              
        else {         
                printf("I am a father!\n");
                mmap(addr, 1024*1024*10, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); 
        }              
                       
        return 0;   
} 

And then, I use the will's modification,which can solve this problem;
Will, this patch should send a upstream?

>>     From the print information, the only difference between pte and ptep is the PTE_SPECIAL bit.
>> And the the PTE access bit is all zero.
>>     diff below. Whether the access bit of the new ptep should be judged to eliminate the 
>> false positive?
> [...]
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 2987d5a..3c1b0c6 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -206,7 +206,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>          * valid ptes without going through an invalid entry).
>>          */
>>         if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
>> -               VM_WARN_ONCE(!pte_young(pte),
>> +               VM_WARN_ONCE(!pte_young(pte) && pte_young(*ptep),
>>                              "%s: racy access flag clearing: %016llx -> %016llx",
>>                              __func__, pte_val(*ptep), pte_val(pte));
> 
> It's actually the other way around: *ptep being "old" (AF = 0) could at
> any point be made "young" by the hardware (AF = 1). This is racing with
> the software update which keeps the AF bit 0.
>

Patch

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2987d5a..3c1b0c6 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -206,7 +206,7 @@  static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
         * valid ptes without going through an invalid entry).
         */
        if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
-               VM_WARN_ONCE(!pte_young(pte),
+               VM_WARN_ONCE(!pte_young(pte) && pte_young(*ptep),
                             "%s: racy access flag clearing: %016llx -> %016llx",
                             __func__, pte_val(*ptep), pte_val(pte));
                VM_WARN_ONCE(pte_write(*ptep) && !pte_dirty(pte),