
mm: fix (harmless) type confusion in lock_vma_under_rcu()

Message ID 20240805-fix-vma-lock-type-confusion-v1-1-9f25443a9a71@google.com
State New
Series mm: fix (harmless) type confusion in lock_vma_under_rcu()

Commit Message

Jann Horn Aug. 5, 2024, 12:52 p.m. UTC
There is a (harmless) type confusion in lock_vma_under_rcu():
After vma_start_read(), we have taken the VMA lock but don't know yet
whether the VMA has already been detached and scheduled for RCU freeing.
At this point, ->vm_start and ->vm_end are accessed.

vm_area_struct contains a union such that ->vm_rcu uses the same memory as
->vm_start and ->vm_end; so accessing ->vm_start and ->vm_end of a detached
VMA is illegal and leads to type confusion between union members.

Fix it by reordering the vma->detached check above the address checks, and
document the rules for RCU readers accessing VMAs.

This will probably change the number of observed VMA_LOCK_MISS events
(since previously, trying to access a detached VMA whose ->vm_rcu has been
scheduled would bail out when checking the fault address against the
rcu_head members reinterpreted as VMA bounds).

Fixes: 50ee32537206 ("mm: introduce lock_vma_under_rcu to be used from arch-specific code")
Signed-off-by: Jann Horn <jannh@google.com>
---
sidenote: I'm not entirely sure why we handle detached VMAs and moved
VMAs differently here (detached VMAs retry, moved VMAs bail out), but
that's kinda out of scope of this patch.
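
To make the union overlap described above concrete, here is a minimal
userspace sketch (struct and member names are illustrative stand-ins,
not the kernel definitions) of how a bounds check against
->vm_start/->vm_end ends up reading rcu_head data once the rcu_head
member of the union is in use:

/* Illustrative userspace sketch only -- not kernel code. */
#include <stdio.h>

struct fake_rcu_head {
	void *next;
	void (*func)(void *head);
};

struct fake_vma {
	union {
		struct {
			/* VMA covers [vm_start; vm_end) */
			unsigned long vm_start;
			unsigned long vm_end;
		};
		struct fake_rcu_head vm_rcu;	/* deferred freeing */
	};
};

int main(void)
{
	struct fake_vma vma = { .vm_start = 0x1000, .vm_end = 0x2000 };

	/* Simulate the VMA being detached and queued for RCU freeing. */
	vma.vm_rcu.next = (void *)0xdeadbeefUL;
	vma.vm_rcu.func = NULL;

	/*
	 * A reader that checks the bounds first now compares the fault
	 * address against rcu_head bits reinterpreted as VMA bounds.
	 */
	printf("vm_start now reads as %#lx\n", vma.vm_start);
	printf("vm_end   now reads as %#lx\n", vma.vm_end);
	return 0;
}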
---
 include/linux/mm_types.h | 15 +++++++++++++--
 mm/memory.c              | 14 ++++++++++----
 2 files changed, 23 insertions(+), 6 deletions(-)


---
base-commit: de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed
change-id: 20240805-fix-vma-lock-type-confusion-0a956d9d31ae

Comments

Suren Baghdasaryan Aug. 6, 2024, 4:13 p.m. UTC | #1
On Mon, Aug 5, 2024 at 5:52 AM Jann Horn <jannh@google.com> wrote:
>
> There is a (harmless) type confusion in lock_vma_under_rcu():
> After vma_start_read(), we have taken the VMA lock but don't know yet
> whether the VMA has already been detached and scheduled for RCU freeing.
> At this point, ->vm_start and ->vm_end are accessed.
>
> vm_area_struct contains a union such that ->vm_rcu uses the same memory as
> ->vm_start and ->vm_end; so accessing ->vm_start and ->vm_end of a detached
> VMA is illegal and leads to type confusion between union members.
>
> Fix it by reordering the vma->detached check above the address checks, and
> document the rules for RCU readers accessing VMAs.
>
> This will probably change the number of observed VMA_LOCK_MISS events
> (since previously, trying to access a detached VMA whose ->vm_rcu has been
> scheduled would bail out when checking the fault address against the
> rcu_head members reinterpreted as VMA bounds).
>
> Fixes: 50ee32537206 ("mm: introduce lock_vma_under_rcu to be used from arch-specific code")
> Signed-off-by: Jann Horn <jannh@google.com>

Thanks for fixing this subtle issue and clearly documenting the rules!
Not sure if we should CC stable? It is harmless but it's still a bug...

Acked-by: Suren Baghdasaryan <surenb@google.com>

> ---
> sidenote: I'm not entirely sure why we handle detached VMAs and moved
> VMAs differently here (detached VMAs retry, moved VMAs bail out), but
> that's kinda out of scope of this patch.

Yeah, technically in both cases the address space is being modified
and we should bail out and retry with mmap_lock. I just think if the
VMA got replaced while we are calling lock_vma_under_rcu(), it's
reasonable to retry the search and try finding the new VMA if it's
already established and unlocked.
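
For reference, the two paths being discussed look roughly like this in
lock_vma_under_rcu() with this patch applied (condensed from mm/memory.c;
a sketch, not the exact code):

	rcu_read_lock();
retry:
	vma = mas_walk(&mas);		/* find the VMA covering 'address' */
	if (!vma)
		goto inval;
	if (!vma_start_read(vma))	/* take the per-VMA lock for reading */
		goto inval;
	if (vma->detached) {		/* detached: drop the lock ... */
		vma_end_read(vma);
		count_vm_vma_lock_event(VMA_LOCK_MISS);
		goto retry;		/* ... and walk the tree again */
	}
	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
		goto inval_end_read;	/* moved: bail out instead */
	rcu_read_unlock();
	return vma;

inval_end_read:
	vma_end_read(vma);
inval:
	rcu_read_unlock();
	count_vm_vma_lock_event(VMA_LOCK_ABORT);
	return NULL;			/* caller falls back to mmap_lock */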

> ---
>  include/linux/mm_types.h | 15 +++++++++++++--
>  mm/memory.c              | 14 ++++++++++----
>  2 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 485424979254..498cdf3121d0 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -657,58 +657,69 @@ struct vma_numab_state {
>
>  /*
>   * This struct describes a virtual memory area. There is one of these
>   * per VM-area/task. A VM area is any part of the process virtual memory
>   * space that has a special rule for the page-fault handlers (ie a shared
>   * library, the executable area etc).
> + *
> + * Only explicitly marked struct members may be accessed by RCU readers before
> + * getting a stable reference.
>   */
>  struct vm_area_struct {
>         /* The first cache line has the info for VMA tree walking. */
>
>         union {
>                 struct {
>                         /* VMA covers [vm_start; vm_end) addresses within mm */
>                         unsigned long vm_start;
>                         unsigned long vm_end;
>                 };
>  #ifdef CONFIG_PER_VMA_LOCK
>                 struct rcu_head vm_rcu; /* Used for deferred freeing. */
>  #endif
>         };
>
> -       struct mm_struct *vm_mm;        /* The address space we belong to. */
> +       /*
> +        * The address space we belong to.
> +        * Unstable RCU readers are allowed to read this.
> +        */
> +       struct mm_struct *vm_mm;
>         pgprot_t vm_page_prot;          /* Access permissions of this VMA. */
>
>         /*
>          * Flags, see mm.h.
>          * To modify use vm_flags_{init|reset|set|clear|mod} functions.
>          */
>         union {
>                 const vm_flags_t vm_flags;
>                 vm_flags_t __private __vm_flags;
>         };
>
>  #ifdef CONFIG_PER_VMA_LOCK
> -       /* Flag to indicate areas detached from the mm->mm_mt tree */
> +       /*
> +        * Flag to indicate areas detached from the mm->mm_mt tree.
> +        * Unstable RCU readers are allowed to read this.
> +        */
>         bool detached;
>
>         /*
>          * Can only be written (using WRITE_ONCE()) while holding both:
>          *  - mmap_lock (in write mode)
>          *  - vm_lock->lock (in write mode)
>          * Can be read reliably while holding one of:
>          *  - mmap_lock (in read or write mode)
>          *  - vm_lock->lock (in read or write mode)
>          * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
>          * while holding nothing (except RCU to keep the VMA struct allocated).
>          *
>          * This sequence counter is explicitly allowed to overflow; sequence
>          * counter reuse can only lead to occasional unnecessary use of the
>          * slowpath.
>          */
>         int vm_lock_seq;
> +       /* Unstable RCU readers are allowed to read this. */
>         struct vma_lock *vm_lock;
>  #endif
>
>         /*
>          * For areas with an address space and backing store,
>          * linkage into the address_space->i_mmap interval tree.
> diff --git a/mm/memory.c b/mm/memory.c
> index 34f8402d2046..3f4232b985a1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5996,23 +5996,29 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>         if (!vma)
>                 goto inval;
>
>         if (!vma_start_read(vma))
>                 goto inval;
>
> -       /* Check since vm_start/vm_end might change before we lock the VMA */
> -       if (unlikely(address < vma->vm_start || address >= vma->vm_end))
> -               goto inval_end_read;
> -
>         /* Check if the VMA got isolated after we found it */
>         if (vma->detached) {
>                 vma_end_read(vma);
>                 count_vm_vma_lock_event(VMA_LOCK_MISS);
>                 /* The area was replaced with another one */
>                 goto retry;
>         }
> +       /*
> +        * At this point, we have a stable reference to a VMA: The VMA is
> +        * locked and we know it hasn't already been isolated.
> +        * From here on, we can access the VMA without worrying about which
> +        * fields are accessible for RCU readers.
> +        */
> +
> +       /* Check since vm_start/vm_end might change before we lock the VMA */
> +       if (unlikely(address < vma->vm_start || address >= vma->vm_end))
> +               goto inval_end_read;
>
>         rcu_read_unlock();
>         return vma;
>
>  inval_end_read:
>         vma_end_read(vma);
>
> ---
> base-commit: de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed
> change-id: 20240805-fix-vma-lock-type-confusion-0a956d9d31ae
> --
> Jann Horn <jannh@google.com>
>
Jann Horn Aug. 6, 2024, 4:20 p.m. UTC | #2
On Tue, Aug 6, 2024 at 6:14 PM Suren Baghdasaryan <surenb@google.com> wrote:
> On Mon, Aug 5, 2024 at 5:52 AM Jann Horn <jannh@google.com> wrote:
> >
> > There is a (harmless) type confusion in lock_vma_under_rcu():
> > After vma_start_read(), we have taken the VMA lock but don't know yet
> > whether the VMA has already been detached and scheduled for RCU freeing.
> > At this point, ->vm_start and ->vm_end are accessed.
> >
> > vm_area_struct contains a union such that ->vm_rcu uses the same memory as
> > ->vm_start and ->vm_end; so accessing ->vm_start and ->vm_end of a detached
> > VMA is illegal and leads to type confusion between union members.
> >
> > Fix it by reordering the vma->detached check above the address checks, and
> > document the rules for RCU readers accessing VMAs.
> >
> > This will probably change the number of observed VMA_LOCK_MISS events
> > (since previously, trying to access a detached VMA whose ->vm_rcu has been
> > scheduled would bail out when checking the fault address against the
> > rcu_head members reinterpreted as VMA bounds).
> >
> > Fixes: 50ee32537206 ("mm: introduce lock_vma_under_rcu to be used from arch-specific code")
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> Thanks for fixing this subtle issue and clearly documenting the rules!
> Not sure if we should CC stable? It is harmless but it's still a bug...

Yeah, I'm not sure - I guess it kinda depends on how much we care
about VMA_LOCK_MISS being accurate?

> Acked-by: Suren Baghdasaryan <surenb@google.com>

Thanks!

Patch

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 485424979254..498cdf3121d0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -657,58 +657,69 @@  struct vma_numab_state {
 
 /*
  * This struct describes a virtual memory area. There is one of these
  * per VM-area/task. A VM area is any part of the process virtual memory
  * space that has a special rule for the page-fault handlers (ie a shared
  * library, the executable area etc).
+ *
+ * Only explicitly marked struct members may be accessed by RCU readers before
+ * getting a stable reference.
  */
 struct vm_area_struct {
 	/* The first cache line has the info for VMA tree walking. */
 
 	union {
 		struct {
 			/* VMA covers [vm_start; vm_end) addresses within mm */
 			unsigned long vm_start;
 			unsigned long vm_end;
 		};
 #ifdef CONFIG_PER_VMA_LOCK
 		struct rcu_head vm_rcu;	/* Used for deferred freeing. */
 #endif
 	};
 
-	struct mm_struct *vm_mm;	/* The address space we belong to. */
+	/*
+	 * The address space we belong to.
+	 * Unstable RCU readers are allowed to read this.
+	 */
+	struct mm_struct *vm_mm;
 	pgprot_t vm_page_prot;          /* Access permissions of this VMA. */
 
 	/*
 	 * Flags, see mm.h.
 	 * To modify use vm_flags_{init|reset|set|clear|mod} functions.
 	 */
 	union {
 		const vm_flags_t vm_flags;
 		vm_flags_t __private __vm_flags;
 	};
 
 #ifdef CONFIG_PER_VMA_LOCK
-	/* Flag to indicate areas detached from the mm->mm_mt tree */
+	/*
+	 * Flag to indicate areas detached from the mm->mm_mt tree.
+	 * Unstable RCU readers are allowed to read this.
+	 */
 	bool detached;
 
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 *  - mmap_lock (in write mode)
 	 *  - vm_lock->lock (in write mode)
 	 * Can be read reliably while holding one of:
 	 *  - mmap_lock (in read or write mode)
 	 *  - vm_lock->lock (in read or write mode)
 	 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
 	 * while holding nothing (except RCU to keep the VMA struct allocated).
 	 *
 	 * This sequence counter is explicitly allowed to overflow; sequence
 	 * counter reuse can only lead to occasional unnecessary use of the
 	 * slowpath.
 	 */
 	int vm_lock_seq;
+	/* Unstable RCU readers are allowed to read this. */
 	struct vma_lock *vm_lock;
 #endif
 
 	/*
 	 * For areas with an address space and backing store,
 	 * linkage into the address_space->i_mmap interval tree.
diff --git a/mm/memory.c b/mm/memory.c
index 34f8402d2046..3f4232b985a1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5996,23 +5996,29 @@  struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	if (!vma)
 		goto inval;
 
 	if (!vma_start_read(vma))
 		goto inval;
 
-	/* Check since vm_start/vm_end might change before we lock the VMA */
-	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
-		goto inval_end_read;
-
 	/* Check if the VMA got isolated after we found it */
 	if (vma->detached) {
 		vma_end_read(vma);
 		count_vm_vma_lock_event(VMA_LOCK_MISS);
 		/* The area was replaced with another one */
 		goto retry;
 	}
+	/*
+	 * At this point, we have a stable reference to a VMA: The VMA is
+	 * locked and we know it hasn't already been isolated.
+	 * From here on, we can access the VMA without worrying about which
+	 * fields are accessible for RCU readers.
+	 */
+
+	/* Check since vm_start/vm_end might change before we lock the VMA */
+	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
+		goto inval_end_read;
 
 	rcu_read_unlock();
 	return vma;
 
 inval_end_read:
 	vma_end_read(vma);