| Message ID | 20240805-fix-vma-lock-type-confusion-v1-1-9f25443a9a71@google.com (mailing list archive) |
|---|---|
| State | New |
| Series | mm: fix (harmless) type confusion in lock_vma_under_rcu() |
On Mon, Aug 5, 2024 at 5:52 AM Jann Horn <jannh@google.com> wrote:
>
> There is a (harmless) type confusion in lock_vma_under_rcu():
> After vma_start_read(), we have taken the VMA lock but don't know yet
> whether the VMA has already been detached and scheduled for RCU freeing.
> At this point, ->vm_start and ->vm_end are accessed.
>
> vm_area_struct contains a union such that ->vm_rcu uses the same memory as
> ->vm_start and ->vm_end; so accessing ->vm_start and ->vm_end of a detached
> VMA is illegal and leads to type confusion between union members.
>
> Fix it by reordering the vma->detached check above the address checks, and
> document the rules for RCU readers accessing VMAs.
>
> This will probably change the number of observed VMA_LOCK_MISS events
> (since previously, trying to access a detached VMA whose ->vm_rcu has been
> scheduled would bail out when checking the fault address against the
> rcu_head members reinterpreted as VMA bounds).
>
> Fixes: 50ee32537206 ("mm: introduce lock_vma_under_rcu to be used from arch-specific code")
> Signed-off-by: Jann Horn <jannh@google.com>

Thanks for fixing this subtle issue and clearly documenting the rules!
Not sure if we should CC stable? It is harmless but it's still a bug...

Acked-by: Suren Baghdasaryan <surenb@google.com>

> ---
> sidenote: I'm not entirely sure why we handle detached VMAs and moved
> VMAs differently here (detached VMAs retry, moved VMAs bail out), but
> that's kinda out of scope of this patch.

Yeah, technically in both cases the address space is being modified and
we should bail out and retry with mmap_lock. I just think that if the VMA
got replaced while we were calling lock_vma_under_rcu(), it's reasonable
to retry the search and try finding the new VMA, if it's already
established and unlocked.

> ---
>  include/linux/mm_types.h | 15 +++++++++++++--
>  mm/memory.c              | 14 ++++++++++----
>  2 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 485424979254..498cdf3121d0 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -657,58 +657,69 @@ struct vma_numab_state {
>
>  /*
>   * This struct describes a virtual memory area. There is one of these
>   * per VM-area/task. A VM area is any part of the process virtual memory
>   * space that has a special rule for the page-fault handlers (ie a shared
>   * library, the executable area etc).
> + *
> + * Only explicitly marked struct members may be accessed by RCU readers before
> + * getting a stable reference.
>   */
>  struct vm_area_struct {
>         /* The first cache line has the info for VMA tree walking. */
>
>         union {
>                 struct {
>                         /* VMA covers [vm_start; vm_end) addresses within mm */
>                         unsigned long vm_start;
>                         unsigned long vm_end;
>                 };
>  #ifdef CONFIG_PER_VMA_LOCK
>                 struct rcu_head vm_rcu; /* Used for deferred freeing. */
>  #endif
>         };
>
> -       struct mm_struct *vm_mm;        /* The address space we belong to. */
> +       /*
> +        * The address space we belong to.
> +        * Unstable RCU readers are allowed to read this.
> +        */
> +       struct mm_struct *vm_mm;
>         pgprot_t vm_page_prot;          /* Access permissions of this VMA. */
>
>         /*
>          * Flags, see mm.h.
>          * To modify use vm_flags_{init|reset|set|clear|mod} functions.
>          */
>         union {
>                 const vm_flags_t vm_flags;
>                 vm_flags_t __private __vm_flags;
>         };
>
>  #ifdef CONFIG_PER_VMA_LOCK
> -       /* Flag to indicate areas detached from the mm->mm_mt tree */
> +       /*
> +        * Flag to indicate areas detached from the mm->mm_mt tree.
> +        * Unstable RCU readers are allowed to read this.
> +        */
>         bool detached;
>
>         /*
>          * Can only be written (using WRITE_ONCE()) while holding both:
>          *  - mmap_lock (in write mode)
>          *  - vm_lock->lock (in write mode)
>          * Can be read reliably while holding one of:
>          *  - mmap_lock (in read or write mode)
>          *  - vm_lock->lock (in read or write mode)
>          * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
>          * while holding nothing (except RCU to keep the VMA struct allocated).
>          *
>          * This sequence counter is explicitly allowed to overflow; sequence
>          * counter reuse can only lead to occasional unnecessary use of the
>          * slowpath.
>          */
>         int vm_lock_seq;
> +       /* Unstable RCU readers are allowed to read this. */
>         struct vma_lock *vm_lock;
>  #endif
>
>         /*
>          * For areas with an address space and backing store,
>          * linkage into the address_space->i_mmap interval tree.
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 34f8402d2046..3f4232b985a1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5996,23 +5996,29 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>         if (!vma)
>                 goto inval;
>
>         if (!vma_start_read(vma))
>                 goto inval;
>
> -       /* Check since vm_start/vm_end might change before we lock the VMA */
> -       if (unlikely(address < vma->vm_start || address >= vma->vm_end))
> -               goto inval_end_read;
> -
>         /* Check if the VMA got isolated after we found it */
>         if (vma->detached) {
>                 vma_end_read(vma);
>                 count_vm_vma_lock_event(VMA_LOCK_MISS);
>                 /* The area was replaced with another one */
>                 goto retry;
>         }
> +       /*
> +        * At this point, we have a stable reference to a VMA: The VMA is
> +        * locked and we know it hasn't already been isolated.
> +        * From here on, we can access the VMA without worrying about which
> +        * fields are accessible for RCU readers.
> +        */
> +
> +       /* Check since vm_start/vm_end might change before we lock the VMA */
> +       if (unlikely(address < vma->vm_start || address >= vma->vm_end))
> +               goto inval_end_read;
>
>         rcu_read_unlock();
>         return vma;
>
> inval_end_read:
>         vma_end_read(vma);
>
> ---
> base-commit: de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed
> change-id: 20240805-fix-vma-lock-type-confusion-0a956d9d31ae
> --
> Jann Horn <jannh@google.com>
>
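The union aliasing described in the commit message can be reproduced outside the kernel. The following is a minimal standalone userspace sketch (stand-in types, illustrative values; not kernel code) showing how writing the rcu_head members clobbers the bytes that an unstabilized reader would interpret as vm_start/vm_end:

#include <stdio.h>

/* Minimal stand-ins for the kernel types; sizes simplified. */
struct rcu_head {
        struct rcu_head *next;
        void (*func)(struct rcu_head *head);
};

/* Same union layout as the relevant part of vm_area_struct. */
struct vma_sketch {
        union {
                struct {
                        unsigned long vm_start;
                        unsigned long vm_end;
                };
                struct rcu_head vm_rcu;
        };
};

int main(void)
{
        struct vma_sketch vma = { .vm_start = 0x1000, .vm_end = 0x2000 };

        /* Simulate queuing the VMA for deferred freeing: the rcu_head
         * members are written over the bytes of vm_start/vm_end. */
        vma.vm_rcu.next = (struct rcu_head *)0x4242;
        vma.vm_rcu.func = 0;

        /* A reader that checks bounds before checking ->detached now
         * compares the fault address against garbage: */
        printf("vm_start now reads as 0x%lx\n", vma.vm_start);
        printf("vm_end   now reads as 0x%lx\n", vma.vm_end);
        return 0;
}

On a typical 64-bit build this prints 0x4242 for vm_start: the rcu_head's next pointer reinterpreted as a VMA bound, which is exactly what the old ordering compared the fault address against.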
On Tue, Aug 6, 2024 at 6:14 PM Suren Baghdasaryan <surenb@google.com> wrote:
> On Mon, Aug 5, 2024 at 5:52 AM Jann Horn <jannh@google.com> wrote:
> >
> > There is a (harmless) type confusion in lock_vma_under_rcu():
> > After vma_start_read(), we have taken the VMA lock but don't know yet
> > whether the VMA has already been detached and scheduled for RCU freeing.
> > At this point, ->vm_start and ->vm_end are accessed.
> >
> > vm_area_struct contains a union such that ->vm_rcu uses the same memory as
> > ->vm_start and ->vm_end; so accessing ->vm_start and ->vm_end of a detached
> > VMA is illegal and leads to type confusion between union members.
> >
> > Fix it by reordering the vma->detached check above the address checks, and
> > document the rules for RCU readers accessing VMAs.
> >
> > This will probably change the number of observed VMA_LOCK_MISS events
> > (since previously, trying to access a detached VMA whose ->vm_rcu has been
> > scheduled would bail out when checking the fault address against the
> > rcu_head members reinterpreted as VMA bounds).
> >
> > Fixes: 50ee32537206 ("mm: introduce lock_vma_under_rcu to be used from arch-specific code")
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> Thanks for fixing this subtle issue and clearly documenting the rules!
> Not sure if we should CC stable? It is harmless but it's still a bug...

Yeah, I'm not sure - I guess it kinda depends on how much we care about
VMA_LOCK_MISS being accurate?

> Acked-by: Suren Baghdasaryan <surenb@google.com>

Thanks!
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 485424979254..498cdf3121d0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -657,58 +657,69 @@ struct vma_numab_state {
 
 /*
  * This struct describes a virtual memory area. There is one of these
  * per VM-area/task. A VM area is any part of the process virtual memory
  * space that has a special rule for the page-fault handlers (ie a shared
  * library, the executable area etc).
+ *
+ * Only explicitly marked struct members may be accessed by RCU readers before
+ * getting a stable reference.
  */
 struct vm_area_struct {
        /* The first cache line has the info for VMA tree walking. */
 
        union {
                struct {
                        /* VMA covers [vm_start; vm_end) addresses within mm */
                        unsigned long vm_start;
                        unsigned long vm_end;
                };
 #ifdef CONFIG_PER_VMA_LOCK
                struct rcu_head vm_rcu; /* Used for deferred freeing. */
 #endif
        };
 
-       struct mm_struct *vm_mm;        /* The address space we belong to. */
+       /*
+        * The address space we belong to.
+        * Unstable RCU readers are allowed to read this.
+        */
+       struct mm_struct *vm_mm;
        pgprot_t vm_page_prot;          /* Access permissions of this VMA. */
 
        /*
         * Flags, see mm.h.
         * To modify use vm_flags_{init|reset|set|clear|mod} functions.
         */
        union {
                const vm_flags_t vm_flags;
                vm_flags_t __private __vm_flags;
        };
 
 #ifdef CONFIG_PER_VMA_LOCK
-       /* Flag to indicate areas detached from the mm->mm_mt tree */
+       /*
+        * Flag to indicate areas detached from the mm->mm_mt tree.
+        * Unstable RCU readers are allowed to read this.
+        */
        bool detached;
 
        /*
         * Can only be written (using WRITE_ONCE()) while holding both:
         *  - mmap_lock (in write mode)
         *  - vm_lock->lock (in write mode)
         * Can be read reliably while holding one of:
         *  - mmap_lock (in read or write mode)
         *  - vm_lock->lock (in read or write mode)
         * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
         * while holding nothing (except RCU to keep the VMA struct allocated).
         *
         * This sequence counter is explicitly allowed to overflow; sequence
         * counter reuse can only lead to occasional unnecessary use of the
         * slowpath.
         */
        int vm_lock_seq;
+       /* Unstable RCU readers are allowed to read this. */
        struct vma_lock *vm_lock;
 #endif
 
        /*
         * For areas with an address space and backing store,
         * linkage into the address_space->i_mmap interval tree.
diff --git a/mm/memory.c b/mm/memory.c
index 34f8402d2046..3f4232b985a1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5996,23 +5996,29 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
        if (!vma)
                goto inval;
 
        if (!vma_start_read(vma))
                goto inval;
 
-       /* Check since vm_start/vm_end might change before we lock the VMA */
-       if (unlikely(address < vma->vm_start || address >= vma->vm_end))
-               goto inval_end_read;
-
        /* Check if the VMA got isolated after we found it */
        if (vma->detached) {
                vma_end_read(vma);
                count_vm_vma_lock_event(VMA_LOCK_MISS);
                /* The area was replaced with another one */
                goto retry;
        }
+       /*
+        * At this point, we have a stable reference to a VMA: The VMA is
+        * locked and we know it hasn't already been isolated.
+        * From here on, we can access the VMA without worrying about which
+        * fields are accessible for RCU readers.
+        */
+
+       /* Check since vm_start/vm_end might change before we lock the VMA */
+       if (unlikely(address < vma->vm_start || address >= vma->vm_end))
+               goto inval_end_read;
 
        rcu_read_unlock();
        return vma;
 
 inval_end_read:
        vma_end_read(vma);
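For context, the vma_start_read() call that precedes the reordered checks implements the "unreliable read" rule documented in the vm_lock_seq comment above. A simplified sketch of its logic (modeled on the kernel code of this era, not a verbatim copy):

#include <linux/mm_types.h>
#include <linux/rwsem.h>

/*
 * Simplified sketch of vma_start_read(). The first vm_lock_seq read is
 * the documented unreliable one: no lock is held, so a stale value is
 * possible, but a wrong answer only sends the fault to the mmap_lock
 * slowpath.
 */
static bool vma_start_read_sketch(struct vm_area_struct *vma)
{
        /* Unlocked peek; a race may produce a false "write-locked" result. */
        if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq))
                return false;

        if (!down_read_trylock(&vma->vm_lock->lock))
                return false;

        /* Re-check under vm_lock->lock, where the read is reliable. */
        if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq)) {
                up_read(&vma->vm_lock->lock);
                return false;
        }
        return true;
}

Note that success here is not yet a stable reference: the VMA is locked, but until the ->detached check above passes, only the explicitly marked fields may be touched.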
There is a (harmless) type confusion in lock_vma_under_rcu(): After
vma_start_read(), we have taken the VMA lock but don't know yet whether
the VMA has already been detached and scheduled for RCU freeing. At this
point, ->vm_start and ->vm_end are accessed.

vm_area_struct contains a union such that ->vm_rcu uses the same memory
as ->vm_start and ->vm_end; so accessing ->vm_start and ->vm_end of a
detached VMA is illegal and leads to type confusion between union
members.

Fix it by reordering the vma->detached check above the address checks,
and document the rules for RCU readers accessing VMAs.

This will probably change the number of observed VMA_LOCK_MISS events
(since previously, trying to access a detached VMA whose ->vm_rcu has
been scheduled would bail out when checking the fault address against
the rcu_head members reinterpreted as VMA bounds).

Fixes: 50ee32537206 ("mm: introduce lock_vma_under_rcu to be used from arch-specific code")
Signed-off-by: Jann Horn <jannh@google.com>
---
sidenote: I'm not entirely sure why we handle detached VMAs and moved
VMAs differently here (detached VMAs retry, moved VMAs bail out), but
that's kinda out of scope of this patch.
---
 include/linux/mm_types.h | 15 +++++++++++++--
 mm/memory.c              | 14 ++++++++++----
 2 files changed, 23 insertions(+), 6 deletions(-)
---
base-commit: de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed
change-id: 20240805-fix-vma-lock-type-confusion-0a956d9d31ae
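The window this patch closes is opened by the deferred-free path. A hedged sketch of that sequence (function names here are illustrative stand-ins; the real logic is split across mm/ and kernel/fork.c):

#include <linux/mm_types.h>
#include <linux/rcupdate.h>

static void vm_area_free_rcu_cb_sketch(struct rcu_head *head)
{
        struct vm_area_struct *vma =
                container_of(head, struct vm_area_struct, vm_rcu);

        /* Actually freeing the vma_lock and the VMA is elided here. */
        (void)vma;
}

static void vma_detach_and_free_sketch(struct vm_area_struct *vma)
{
        /* Marked under mmap_lock; visible to unstable RCU readers. */
        vma->detached = true;

        /*
         * From this call on, the rcu_head occupies the same bytes as
         * vm_start/vm_end -- the aliasing the reordered checks avoid:
         */
        call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb_sketch);
}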