
mm, kmsan: fix infinite recursion due to RCU critical section

Message ID 20240118110022.2538350-1-elver@google.com (mailing list archive)
State New
Series mm, kmsan: fix infinite recursion due to RCU critical section

Commit Message

Marco Elver Jan. 18, 2024, 10:59 a.m. UTC
Alexander Potapenko writes in [1]: "For every memory access in the code
instrumented by KMSAN we call kmsan_get_metadata() to obtain the
metadata for the memory being accessed. For virtual memory the metadata
pointers are stored in the corresponding `struct page`, therefore we
need to call virt_to_page() to get them.

According to the comment in arch/x86/include/asm/page.h,
virt_to_page(kaddr) returns a valid pointer iff virt_addr_valid(kaddr)
is true, so KMSAN needs to call virt_addr_valid() as well.

To avoid recursion, kmsan_get_metadata() must not call instrumented
code, therefore ./arch/x86/include/asm/kmsan.h forks parts of
arch/x86/mm/physaddr.c to check whether a virtual address is valid or
not.

But the introduction of rcu_read_lock() to pfn_valid() added
instrumented RCU API calls to virt_to_page_or_null(), which is called by
kmsan_get_metadata(), so there is an infinite recursion now.  I do not
think it is correct to stop that recursion by doing
kmsan_enter_runtime()/kmsan_exit_runtime() in kmsan_get_metadata(): that
would prevent instrumented functions called from within the runtime from
tracking the shadow values, which might introduce false positives."
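
For illustration, a simplified sketch of the call chain that now recurses
(illustrative only; the real call graph has a few more intermediate
helpers):

  kmsan_get_metadata(addr, ...)       /* KMSAN runtime, must not recurse */
    -> virt_to_page_or_null(addr)     /* arch/x86/include/asm/kmsan.h    */
      -> kmsan_virt_addr_valid(addr)
        -> pfn_valid(pfn)             /* include/linux/mmzone.h          */
          -> rcu_read_lock()/rcu_read_unlock()  /* instrumented RCU API  */
            -> KMSAN instrumentation hooks
              -> kmsan_get_metadata(...)        /* and so on, forever    */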

Fix the issue by switching pfn_valid() to the _sched() variant of
rcu_read_lock/unlock(), which does not require calling into RCU. Given
that the critical section in pfn_valid() is very small, this is a
reasonable trade-off (with preemptible RCU).
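
For reference, the reason the _sched() flavor sidesteps the problem: with
CONFIG_PREEMPT_RCU, rcu_read_lock()/rcu_read_unlock() call into the RCU
core, which is instrumented, whereas the _sched() flavor essentially
reduces to preemption control. Roughly (paraphrased; lockdep and sparse
annotations omitted):

  /* preemptible RCU: calls into the (instrumented) RCU core */
  rcu_read_lock()         -> __rcu_read_lock()
  rcu_read_unlock()       -> __rcu_read_unlock()

  /* sched flavor: plain preemption control */
  rcu_read_lock_sched()   -> preempt_disable()
  rcu_read_unlock_sched() -> preempt_enable()  /* may call the scheduler */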

KMSAN further needs to be careful to suppress calls into the scheduler,
which would be another source of recursion. This can be done by wrapping
the call to pfn_valid() into preempt_disable/enable_no_resched(). The
downside is that this may occasionally break scheduling guarantees;
however, a kernel compiled with KMSAN has already given up any
performance guarantees due to being heavily instrumented.
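
Why preempt_enable_no_resched() rather than preempt_enable(): on the final
enable, preempt_enable() checks whether a reschedule is pending and, if
so, calls back into the (instrumented) scheduler. A rough sketch of the
difference (paraphrased from include/linux/preempt.h, CONFIG_PREEMPTION
case; not the verbatim definitions):

  #define preempt_enable() do { \
          barrier(); \
          if (unlikely(preempt_count_dec_and_test())) \
                  __preempt_schedule();   /* instrumented scheduler entry */ \
  } while (0)

  #define preempt_enable_no_resched() do { \
          barrier(); \
          preempt_count_dec();            /* no scheduler call */ \
  } while (0)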

Note that KMSAN code already disables tracing via its Makefile, and
since mmzone.h is included from that code, it is not necessary to use
the notrace variant, which is generally preferred in all other cases.

Link: https://lkml.kernel.org/r/20240115184430.2710652-1-glider@google.com [1]
Reported-by: Alexander Potapenko <glider@google.com>
Reported-by: syzbot+93a9e8a3dea8d6085e12@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Charan Teja Kalla <quic_charante@quicinc.com>
---
 arch/x86/include/asm/kmsan.h | 17 ++++++++++++++++-
 include/linux/mmzone.h       |  6 +++---
 2 files changed, 19 insertions(+), 4 deletions(-)

Comments

Marco Elver Jan. 18, 2024, 11:07 a.m. UTC | #1
On Thu, 18 Jan 2024 at 12:00, Marco Elver <elver@google.com> wrote:
> [...]
> Link: https://lkml.kernel.org/r/20240115184430.2710652-1-glider@google.com [1]
> Reported-by: Alexander Potapenko <glider@google.com>
> Reported-by: syzbot+93a9e8a3dea8d6085e12@syzkaller.appspotmail.com
> Signed-off-by: Marco Elver <elver@google.com>
> Cc: Charan Teja Kalla <quic_charante@quicinc.com>

This might want a:

Fixes: 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing
memory_section->usage")

For reference, it indicates which patch introduced the problem.

Charan Teja Kalla Jan. 18, 2024, 11:27 a.m. UTC | #2
May I ask whether KMSAN also instruments accesses to memory managed as
ZONE_DEVICE? This is not RAM, and these pages will never be onlined,
so they are also not available in the buddy allocator.

The reason for asking is that this patch was introduced because of a
race in which a pfn walker ends up on a pfn of ZONE_DEVICE memory.

If KMSAN never instruments such memory, would it be acceptable to have
a KMSAN-specific version of pfn_valid(), as suggested by Alexander in
the other mail?

Thanks,

On 1/18/2024 4:37 PM, Marco Elver wrote:
> [...]
Marco Elver Jan. 18, 2024, 12:22 p.m. UTC | #3
On Thu, 18 Jan 2024 at 12:28, Charan Teja Kalla
<quic_charante@quicinc.com> wrote:
>
> May I ask whether KMSAN also instruments accesses to memory managed as
> ZONE_DEVICE? This is not RAM, and these pages will never be onlined,
> so they are also not available in the buddy allocator.
>
> The reason for asking is that this patch was introduced because of a
> race in which a pfn walker ends up on a pfn of ZONE_DEVICE memory.
>
> If KMSAN never instruments such memory, would it be acceptable to have
> a KMSAN-specific version of pfn_valid(), as suggested by Alexander in
> the other mail?

It would be nice to avoid duplicating functions - both options have downsides:
1. Shared pfn_valid(): it might break for KMSAN again in the future if
new recursion is introduced.
2. A KMSAN-specific version of pfn_valid(): it might break if
pfn_valid() changes in the future.

I suspect #1 is less likely.

What is your main concern with switching to rcu_read_lock_sched()?
Alexander Potapenko Jan. 18, 2024, 12:46 p.m. UTC | #4
On Thu, Jan 18, 2024 at 12:28 PM Charan Teja Kalla
<quic_charante@quicinc.com> wrote:
>
> May I ask whether KMSAN also instruments accesses to memory managed as
> ZONE_DEVICE? This is not RAM, and these pages will never be onlined,
> so they are also not available in the buddy allocator.

Is there a way to tell whether a memory chunk belongs to ZONE_DEVICE
by its address?
Won't such a check involve calling pfn_valid() or a similar function
that would also require synchronization?

In general, if one can e.g. call memset() on ZONE_DEVICE memory, that
access is already going to be instrumented by KMSAN.
Charan Teja Kalla Jan. 19, 2024, 5 p.m. UTC | #5
On 1/18/2024 5:52 PM, Marco Elver wrote:
> It would be nice to avoid duplicating functions - both options have downsides:
> 1. Shared pfn_valid(): it might break for KMSAN again in the future if
> new recursion is introduced.
> 2. A KMSAN-specific version of pfn_valid(): it might break if
> pfn_valid() changes in the future.
>
> I suspect #1 is less likely.
>
> What is your main concern with switching to rcu_read_lock_sched()?

No concerns from my side. I just wanted to understand the reasoning
behind changing the shared pfn_valid() rather than adding a KMSAN
version, as is done for some other functions. Thanks for the
clarification.
Alexander Potapenko Jan. 22, 2024, 9:56 a.m. UTC | #6
On Thu, Jan 18, 2024 at 12:00 PM Marco Elver <elver@google.com> wrote:
> [...]
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>

Patch

diff --git a/arch/x86/include/asm/kmsan.h b/arch/x86/include/asm/kmsan.h
index 8fa6ac0e2d76..d91b37f5b4bb 100644
--- a/arch/x86/include/asm/kmsan.h
+++ b/arch/x86/include/asm/kmsan.h
@@ -64,6 +64,7 @@  static inline bool kmsan_virt_addr_valid(void *addr)
 {
 	unsigned long x = (unsigned long)addr;
 	unsigned long y = x - __START_KERNEL_map;
+	bool ret;
 
 	/* use the carry flag to determine if x was < __START_KERNEL_map */
 	if (unlikely(x > y)) {
@@ -79,7 +80,21 @@  static inline bool kmsan_virt_addr_valid(void *addr)
 			return false;
 	}
 
-	return pfn_valid(x >> PAGE_SHIFT);
+	/*
+	 * pfn_valid() relies on RCU, and may call into the scheduler on exiting
+	 * the critical section. However, this would result in recursion with
+	 * KMSAN. Therefore, disable preemption here, and re-enable preemption
+	 * below while suppressing reschedules to avoid recursion.
+	 *
 +	 * Note that this may occasionally break scheduling guarantees.
 +	 * However, a kernel compiled with KMSAN has already given up on any
+	 * performance guarantees due to being heavily instrumented.
+	 */
+	preempt_disable();
+	ret = pfn_valid(x >> PAGE_SHIFT);
+	preempt_enable_no_resched();
+
+	return ret;
 }
 
 #endif /* !MODULE */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4ed33b127821..a497f189d988 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2013,9 +2013,9 @@  static inline int pfn_valid(unsigned long pfn)
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
 	ms = __pfn_to_section(pfn);
-	rcu_read_lock();
+	rcu_read_lock_sched();
 	if (!valid_section(ms)) {
-		rcu_read_unlock();
+		rcu_read_unlock_sched();
 		return 0;
 	}
 	/*
@@ -2023,7 +2023,7 @@  static inline int pfn_valid(unsigned long pfn)
 	 * the entire section-sized span.
 	 */
 	ret = early_section(ms) || pfn_section_valid(ms, pfn);
-	rcu_read_unlock();
+	rcu_read_unlock_sched();
 
 	return ret;
 }