
[v2,1/9] kasan: sw_tags: Use arithmetic shift for shadow computation

Message ID 20241022015913.3524425-2-samuel.holland@sifive.com (mailing list archive)
State Changes Requested
Series [v2,1/9] kasan: sw_tags: Use arithmetic shift for shadow computation

Checks

Context Check Description
conchuod/vmtest-for-next-PR success PR summary
conchuod/patch-1-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh took 108.74s
conchuod/patch-1-test-2 success .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh took 1874.57s
conchuod/patch-1-test-3 success .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh took 2212.70s
conchuod/patch-1-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh took 16.40s
conchuod/patch-1-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh took 17.96s
conchuod/patch-1-test-6 success .github/scripts/patches/tests/checkpatch.sh took 1.38s
conchuod/patch-1-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh took 35.79s
conchuod/patch-1-test-8 success .github/scripts/patches/tests/header_inline.sh took 0.01s
conchuod/patch-1-test-9 success .github/scripts/patches/tests/kdoc.sh took 0.51s
conchuod/patch-1-test-10 success .github/scripts/patches/tests/module_param.sh took 0.02s
conchuod/patch-1-test-11 success .github/scripts/patches/tests/verify_fixes.sh took 0.00s
conchuod/patch-1-test-12 success .github/scripts/patches/tests/verify_signedoff.sh took 0.02s

Commit Message

Samuel Holland Oct. 22, 2024, 1:57 a.m. UTC
Currently, kasan_mem_to_shadow() uses a logical right shift, which turns
canonical kernel addresses into non-canonical addresses by clearing the
high KASAN_SHADOW_SCALE_SHIFT bits. The value of KASAN_SHADOW_OFFSET is
then chosen so that the addition results in a canonical address for the
shadow memory.
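As an illustration, the current mapping looks like this as a standalone sketch
(the helper name is made up, and the constants are the existing arm64 48-bit
VA, KASAN_SW_TAGS values, used here only as an example):

	/* Sketch of the current logical-shift mapping; example constants only. */
	#define EXAMPLE_SHADOW_SCALE_SHIFT	4			/* KASAN_SW_TAGS */
	#define EXAMPLE_SHADOW_OFFSET		0xefff800000000000UL	/* 48-bit VA today */

	static inline void *example_mem_to_shadow_logical(const void *addr)
	{
		/* The logical shift clears the top bits, turning a canonical
		 * kernel pointer into a non-canonical value; adding the offset
		 * then moves the result back into canonical shadow space. */
		return (void *)((unsigned long)addr >> EXAMPLE_SHADOW_SCALE_SHIFT)
			+ EXAMPLE_SHADOW_OFFSET;
	}

With these values, the highest kernel address 0xffffffffffffffff maps to
0xffff7fffffffffff, so KASAN_SHADOW_END works out to
(1UL << 60) + KASAN_SHADOW_OFFSET == 0xffff800000000000.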

For KASAN_GENERIC, this shift/add combination is ABI with the compiler,
because KASAN_SHADOW_OFFSET is used in compiler-generated inline tag
checks[1], which must only attempt to dereference canonical addresses.

However, for KASAN_SW_TAGS we have some freedom to change the algorithm
without breaking the ABI. Because TBI is enabled for kernel addresses,
the top bits of shadow memory addresses computed during tag checks are
irrelevant, and so likewise are the top bits of KASAN_SHADOW_OFFSET.
This is demonstrated by the fact that LLVM uses a logical right shift
in the tag check fast path[2] but a sbfx (signed bitfield extract)
instruction in the slow path[3] without causing any issues.

Using an arithmetic shift in kasan_mem_to_shadow() provides a number of
benefits:

1) The memory layout is easier to understand. KASAN_SHADOW_OFFSET
becomes a canonical memory address, and the shifted pointer becomes a
negative offset, so KASAN_SHADOW_OFFSET == KASAN_SHADOW_END regardless
of the shift amount or the size of the virtual address space.

2) KASAN_SHADOW_OFFSET becomes a simpler constant, requiring only one
instruction to load instead of two. Since it must be loaded in each
function with a tag check, this decreases kernel text size by 0.5%.

3) This shift and the sign extension from kasan_reset_tag() can be
combined into a single sbfx instruction. When this same algorithm change
is applied to the compiler, it removes an instruction from each inline
tag check, reducing kernel text size by an additional 4.6%.

These benefits extend to other architectures as well. On RISC-V, where
the baseline ISA does not have shifted addition or an equivalent to the
sbfx instruction, loading KASAN_SHADOW_OFFSET is reduced from 3 to 2
instructions, and kasan_mem_to_shadow(kasan_reset_tag(addr)) similarly
combines two consecutive right shifts.
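
As a worked sketch of the proposed algorithm (the body matches the
KASAN_SW_TAGS branch added to kasan_mem_to_shadow() below; the constants are
the new arm64 48-bit VA example values):

	/* Sketch of the proposed arithmetic-shift mapping; example constants only. */
	#define EXAMPLE_SHADOW_SCALE_SHIFT	4			/* KASAN_SW_TAGS */
	#define EXAMPLE_SHADOW_OFFSET		0xffff800000000000UL	/* new 48-bit VA value */

	static inline void *example_mem_to_shadow_arith(const void *addr)
	{
		/* The arithmetic shift sign-extends, so kernel (negative)
		 * pointers become a negative displacement and map below the
		 * offset; user (positive) pointers would map above it. */
		return (void *)(((long)addr >> EXAMPLE_SHADOW_SCALE_SHIFT)
				+ EXAMPLE_SHADOW_OFFSET);
	}

With these values, the kernel range [0xffff000000000000, 0xffffffffffffffff]
maps to [0xffff700000000000, 0xffff7fffffffffff], immediately below the
offset, which is why KASAN_SHADOW_END == KASAN_SHADOW_OFFSET.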

Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp#L1316 [1]
Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#L895 [2]
Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp#L669 [3]
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

Changes in v2:
 - Improve the explanation for how KASAN_SHADOW_END is derived
 - Update the range check in kasan_non_canonical_hook()

 arch/arm64/Kconfig              | 10 +++++-----
 arch/arm64/include/asm/memory.h | 17 +++++++++++++++--
 arch/arm64/mm/kasan_init.c      |  7 +++++--
 include/linux/kasan.h           | 10 ++++++++--
 mm/kasan/report.c               | 22 ++++++++++++++++++----
 scripts/gdb/linux/mm.py         |  5 +++--
 6 files changed, 54 insertions(+), 17 deletions(-)

Comments

Andrey Konovalov Oct. 23, 2024, 6:41 p.m. UTC | #1
On Tue, Oct 22, 2024 at 3:59 AM Samuel Holland
<samuel.holland@sifive.com> wrote:
>
> Currently, kasan_mem_to_shadow() uses a logical right shift, which turns
> canonical kernel addresses into non-canonical addresses by clearing the
> high KASAN_SHADOW_SCALE_SHIFT bits. The value of KASAN_SHADOW_OFFSET is
> then chosen so that the addition results in a canonical address for the
> shadow memory.
>
> For KASAN_GENERIC, this shift/add combination is ABI with the compiler,
> because KASAN_SHADOW_OFFSET is used in compiler-generated inline tag
> checks[1], which must only attempt to dereference canonical addresses.
>
> However, for KASAN_SW_TAGS we have some freedom to change the algorithm
> without breaking the ABI. Because TBI is enabled for kernel addresses,
> the top bits of shadow memory addresses computed during tag checks are
> irrelevant, and so likewise are the top bits of KASAN_SHADOW_OFFSET.
> This is demonstrated by the fact that LLVM uses a logical right shift
> in the tag check fast path[2] but a sbfx (signed bitfield extract)
> instruction in the slow path[3] without causing any issues.
>
> Using an arithmetic shift in kasan_mem_to_shadow() provides a number of
> benefits:
>
> 1) The memory layout is easier to understand. KASAN_SHADOW_OFFSET
> becomes a canonical memory address, and the shifted pointer becomes a
> negative offset, so KASAN_SHADOW_OFFSET == KASAN_SHADOW_END regardless
> of the shift amount or the size of the virtual address space.
>
> 2) KASAN_SHADOW_OFFSET becomes a simpler constant, requiring only one
> instruction to load instead of two. Since it must be loaded in each
> function with a tag check, this decreases kernel text size by 0.5%.
>
> 3) This shift and the sign extension from kasan_reset_tag() can be
> combined into a single sbfx instruction. When this same algorithm change
> is applied to the compiler, it removes an instruction from each inline
> tag check, further reducing kernel text size by an additional 4.6%.
>
> These benefits extend to other architectures as well. On RISC-V, where
> the baseline ISA does not have shifted addition or an equivalent to the
> sbfx instruction, loading KASAN_SHADOW_OFFSET is reduced from 3 to 2
> instructions, and kasan_mem_to_shadow(kasan_reset_tag(addr)) similarly
> combines two consecutive right shifts.
>
> Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp#L1316 [1]
> Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#L895 [2]
> Link: https://github.com/llvm/llvm-project/blob/llvmorg-20-init/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp#L669 [3]
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
> Changes in v2:
>  - Improve the explanation for how KASAN_SHADOW_END is derived
>  - Update the range check in kasan_non_canonical_hook()
>
>  arch/arm64/Kconfig              | 10 +++++-----
>  arch/arm64/include/asm/memory.h | 17 +++++++++++++++--
>  arch/arm64/mm/kasan_init.c      |  7 +++++--
>  include/linux/kasan.h           | 10 ++++++++--
>  mm/kasan/report.c               | 22 ++++++++++++++++++----
>  scripts/gdb/linux/mm.py         |  5 +++--
>  6 files changed, 54 insertions(+), 17 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index fd9df6dcc593..6a326908c941 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -418,11 +418,11 @@ config KASAN_SHADOW_OFFSET
>         default 0xdffffe0000000000 if ARM64_VA_BITS_42 && !KASAN_SW_TAGS
>         default 0xdfffffc000000000 if ARM64_VA_BITS_39 && !KASAN_SW_TAGS
>         default 0xdffffff800000000 if ARM64_VA_BITS_36 && !KASAN_SW_TAGS
> -       default 0xefff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && KASAN_SW_TAGS
> -       default 0xefffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && KASAN_SW_TAGS
> -       default 0xeffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
> -       default 0xefffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
> -       default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
> +       default 0xffff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && KASAN_SW_TAGS
> +       default 0xffffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && KASAN_SW_TAGS
> +       default 0xfffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
> +       default 0xffffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
> +       default 0xfffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
>         default 0xffffffffffffffff
>
>  config UNWIND_TABLES
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 0480c61dbb4f..a93fc9dc16f3 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -80,7 +80,8 @@
>   * where KASAN_SHADOW_SCALE_SHIFT is the order of the number of bits that map
>   * to a single shadow byte and KASAN_SHADOW_OFFSET is a constant that offsets
>   * the mapping. Note that KASAN_SHADOW_OFFSET does not point to the start of
> - * the shadow memory region.
> + * the shadow memory region, since not all possible addresses have shadow
> + * memory allocated for them.

I'm not sure this addition makes sense: the original statement was to
point out that KASAN_SHADOW_OFFSET and KASAN_SHADOW_START are
different values. Even if we were to map shadow for userspace,
KASAN_SHADOW_OFFSET would still be a weird offset value for Generic
KASAN.

>   *
>   * Based on this mapping, we define two constants:
>   *
> @@ -89,7 +90,15 @@
>   *
>   * KASAN_SHADOW_END is defined first as the shadow address that corresponds to
>   * the upper bound of possible virtual kernel memory addresses UL(1) << 64
> - * according to the mapping formula.
> + * according to the mapping formula. For Generic KASAN, the address in the
> + * mapping formula is treated as unsigned (part of the compiler's ABI), so the
> + * end of the shadow memory region is at a large positive offset from
> + * KASAN_SHADOW_OFFSET. For Software Tag-Based KASAN, the address in the
> + * formula is treated as signed. Since all kernel addresses are negative, they
> + * map to shadow memory below KASAN_SHADOW_OFFSET, making KASAN_SHADOW_OFFSET
> + * itself the end of the shadow memory region. (User pointers are positive and
> + * would map to shadow memory above KASAN_SHADOW_OFFSET, but shadow memory is
> + * not allocated for them.)

This looks good!

>   *
>   * KASAN_SHADOW_START is defined second based on KASAN_SHADOW_END. The shadow
>   * memory start must map to the lowest possible kernel virtual memory address
> @@ -100,7 +109,11 @@
>   */
>  #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
>  #define KASAN_SHADOW_OFFSET    _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
> +#ifdef CONFIG_KASAN_GENERIC
>  #define KASAN_SHADOW_END       ((UL(1) << (64 - KASAN_SHADOW_SCALE_SHIFT)) + KASAN_SHADOW_OFFSET)
> +#else
> +#define KASAN_SHADOW_END       KASAN_SHADOW_OFFSET
> +#endif
>  #define _KASAN_SHADOW_START(va)        (KASAN_SHADOW_END - (UL(1) << ((va) - KASAN_SHADOW_SCALE_SHIFT)))
>  #define KASAN_SHADOW_START     _KASAN_SHADOW_START(vabits_actual)
>  #define PAGE_END               KASAN_SHADOW_START
> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
> index b65a29440a0c..6836e571555c 100644
> --- a/arch/arm64/mm/kasan_init.c
> +++ b/arch/arm64/mm/kasan_init.c
> @@ -198,8 +198,11 @@ static bool __init root_level_aligned(u64 addr)
>  /* The early shadow maps everything to a single page of zeroes */
>  asmlinkage void __init kasan_early_init(void)
>  {
> -       BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
> -               KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
> +       if (IS_ENABLED(CONFIG_KASAN_GENERIC))
> +               BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
> +                       KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
> +       else
> +               BUILD_BUG_ON(KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
>         BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS), SHADOW_ALIGN));
>         BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS_MIN), SHADOW_ALIGN));
>         BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, SHADOW_ALIGN));
> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> index 00a3bf7c0d8f..03b440658817 100644
> --- a/include/linux/kasan.h
> +++ b/include/linux/kasan.h
> @@ -58,8 +58,14 @@ int kasan_populate_early_shadow(const void *shadow_start,
>  #ifndef kasan_mem_to_shadow
>  static inline void *kasan_mem_to_shadow(const void *addr)
>  {
> -       return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
> -               + KASAN_SHADOW_OFFSET;
> +       void *scaled;
> +
> +       if (IS_ENABLED(CONFIG_KASAN_GENERIC))
> +               scaled = (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT);
> +       else
> +               scaled = (void *)((long)addr >> KASAN_SHADOW_SCALE_SHIFT);
> +
> +       return KASAN_SHADOW_OFFSET + scaled;
>  }
>  #endif
>
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> index b48c768acc84..c08097715686 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -644,15 +644,29 @@ void kasan_report_async(void)
>   */
>  void kasan_non_canonical_hook(unsigned long addr)
>  {
> +       unsigned long max_shadow_size = BIT(BITS_PER_LONG - KASAN_SHADOW_SCALE_SHIFT);
>         unsigned long orig_addr;
>         const char *bug_type;
>
>         /*
> -        * All addresses that came as a result of the memory-to-shadow mapping
> -        * (even for bogus pointers) must be >= KASAN_SHADOW_OFFSET.
> +        * With the default kasan_mem_to_shadow() algorithm, all addresses
> +        * returned by the memory-to-shadow mapping (even for bogus pointers)
> +        * must be within a certain displacement from KASAN_SHADOW_OFFSET.
> +        *
> +        * For Generic KASAN, the displacement is unsigned, so
> +        * KASAN_SHADOW_OFFSET is the smallest possible shadow address. For

This part of the comment doesn't seem correct: KASAN_SHADOW_OFFSET is
still a weird offset value for Generic KASAN, not the smallest
possible shadow address.

> +        * Software Tag-Based KASAN, the displacement is signed, so
> +        * KASAN_SHADOW_OFFSET is the center of the range.
>          */
> -       if (addr < KASAN_SHADOW_OFFSET)
> -               return;
> +       if (IS_ENABLED(CONFIG_KASAN_GENERIC)) {
> +               if (addr < KASAN_SHADOW_OFFSET ||
> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size)
> +                       return;
> +       } else {
> +               if (addr < KASAN_SHADOW_OFFSET - max_shadow_size / 2 ||
> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size / 2)
> +                       return;

Hm, I might be wrong, but I think this check does not work.

Let's say we have non-canonical address 0x4242424242424242 and number
of VA bits is 48.

Then:

KASAN_SHADOW_OFFSET == 0xffff800000000000
kasan_mem_to_shadow(0x4242424242424242) == 0x0423a42424242424
max_shadow_size == 0x1000000000000000
KASAN_SHADOW_OFFSET - max_shadow_size / 2 == 0xf7ff800000000000
KASAN_SHADOW_OFFSET + max_shadow_size / 2 == 0x07ff800000000000 (overflows)

0x0423a42424242424 is less than 0xf7ff800000000000, so the function will
wrongly return.
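
The arithmetic above can be reproduced with a small userspace sketch (the
constants are the 48-bit VA, KASAN_SW_TAGS values from this example):

	#include <stdio.h>

	int main(void)
	{
		unsigned long offset = 0xffff800000000000UL;		/* KASAN_SHADOW_OFFSET */
		unsigned long max_shadow_size = 1UL << (64 - 4);
		/* kasan_mem_to_shadow(0x4242424242424242) with the signed shift */
		unsigned long shadow = ((long)0x4242424242424242L >> 4) + offset;
		unsigned long lo = offset - max_shadow_size / 2;
		unsigned long hi = offset + max_shadow_size / 2;	/* wraps around zero */

		/* Prints lo=f7ff800000000000 hi=7ff800000000000 shadow=423a42424242424;
		 * shadow < lo, so the proposed check returns early even though
		 * shadow really is the result of the memory-to-shadow mapping. */
		printf("lo=%lx hi=%lx shadow=%lx\n", lo, hi, shadow);
		return 0;
	}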

> +       }
>
>         orig_addr = (unsigned long)kasan_shadow_to_mem((void *)addr);
>

Just to double-check: kasan_shadow_to_mem() and addr_has_metadata()
don't need any changes, right?

> diff --git a/scripts/gdb/linux/mm.py b/scripts/gdb/linux/mm.py
> index 7571aebbe650..2e63f3dedd53 100644
> --- a/scripts/gdb/linux/mm.py
> +++ b/scripts/gdb/linux/mm.py
> @@ -110,12 +110,13 @@ class aarch64_page_ops():
>          self.KERNEL_END = gdb.parse_and_eval("_end")
>
>          if constants.LX_CONFIG_KASAN_GENERIC or constants.LX_CONFIG_KASAN_SW_TAGS:
> +            self.KASAN_SHADOW_OFFSET = constants.LX_CONFIG_KASAN_SHADOW_OFFSET
>              if constants.LX_CONFIG_KASAN_GENERIC:
>                  self.KASAN_SHADOW_SCALE_SHIFT = 3
> +                self.KASAN_SHADOW_END = (1 << (64 - self.KASAN_SHADOW_SCALE_SHIFT)) + self.KASAN_SHADOW_OFFSET
>              else:
>                  self.KASAN_SHADOW_SCALE_SHIFT = 4
> -            self.KASAN_SHADOW_OFFSET = constants.LX_CONFIG_KASAN_SHADOW_OFFSET
> -            self.KASAN_SHADOW_END = (1 << (64 - self.KASAN_SHADOW_SCALE_SHIFT)) + self.KASAN_SHADOW_OFFSET
> +                self.KASAN_SHADOW_END = self.KASAN_SHADOW_OFFSET
>              self.PAGE_END = self.KASAN_SHADOW_END - (1 << (self.vabits_actual - self.KASAN_SHADOW_SCALE_SHIFT))
>          else:
>              self.PAGE_END = self._PAGE_END(self.VA_BITS_MIN)
> --
> 2.45.1
>

Could you also check that everything works when CONFIG_KASAN_SW_TAGS +
CONFIG_KASAN_OUTLINE? I think it should, but it makes sense to confirm.

Thank you!
Maciej Wieczor-Retman Feb. 10, 2025, 3:22 p.m. UTC | #2
On 2024-10-23 at 20:41:57 +0200, Andrey Konovalov wrote:
>On Tue, Oct 22, 2024 at 3:59 AM Samuel Holland
><samuel.holland@sifive.com> wrote:
...
>> +        * Software Tag-Based KASAN, the displacement is signed, so
>> +        * KASAN_SHADOW_OFFSET is the center of the range.
>>          */
>> -       if (addr < KASAN_SHADOW_OFFSET)
>> -               return;
>> +       if (IS_ENABLED(CONFIG_KASAN_GENERIC)) {
>> +               if (addr < KASAN_SHADOW_OFFSET ||
>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size)
>> +                       return;
>> +       } else {
>> +               if (addr < KASAN_SHADOW_OFFSET - max_shadow_size / 2 ||
>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size / 2)
>> +                       return;
>
>Hm, I might be wrong, but I think this check does not work.
>
>Let's say we have non-canonical address 0x4242424242424242 and number
>of VA bits is 48.
>
>Then:
>
>KASAN_SHADOW_OFFSET == 0xffff800000000000
>kasan_mem_to_shadow(0x4242424242424242) == 0x0423a42424242424
>max_shadow_size == 0x1000000000000000
>KASAN_SHADOW_OFFSET - max_shadow_size / 2 == 0xf7ff800000000000
>KASAN_SHADOW_OFFSET + max_shadow_size / 2 == 0x07ff800000000000 (overflows)
>
>0x0423a42424242424 is < than 0xf7ff800000000000, so the function will
>wrongly return.

As I understand it, this check aims to figure out whether the address landed in
shadow space, and if it didn't, we can return.

Can't this above snippet be a simple:

	if (!addr_in_shadow(addr))
		return;

?
Maciej Wieczor-Retman Feb. 10, 2025, 3:52 p.m. UTC | #3
On 2025-02-10 at 16:22:41 +0100, Maciej Wieczor-Retman wrote:
>On 2024-10-23 at 20:41:57 +0200, Andrey Konovalov wrote:
>>On Tue, Oct 22, 2024 at 3:59 AM Samuel Holland
>><samuel.holland@sifive.com> wrote:
>...
>>> +        * Software Tag-Based KASAN, the displacement is signed, so
>>> +        * KASAN_SHADOW_OFFSET is the center of the range.
>>>          */
>>> -       if (addr < KASAN_SHADOW_OFFSET)
>>> -               return;
>>> +       if (IS_ENABLED(CONFIG_KASAN_GENERIC)) {
>>> +               if (addr < KASAN_SHADOW_OFFSET ||
>>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size)
>>> +                       return;
>>> +       } else {
>>> +               if (addr < KASAN_SHADOW_OFFSET - max_shadow_size / 2 ||
>>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size / 2)
>>> +                       return;
>>
>>Hm, I might be wrong, but I think this check does not work.
>>
>>Let's say we have non-canonical address 0x4242424242424242 and number
>>of VA bits is 48.
>>
>>Then:
>>
>>KASAN_SHADOW_OFFSET == 0xffff800000000000
>>kasan_mem_to_shadow(0x4242424242424242) == 0x0423a42424242424
>>max_shadow_size == 0x1000000000000000
>>KASAN_SHADOW_OFFSET - max_shadow_size / 2 == 0xf7ff800000000000
>>KASAN_SHADOW_OFFSET + max_shadow_size / 2 == 0x07ff800000000000 (overflows)
>>
>>0x0423a42424242424 is < than 0xf7ff800000000000, so the function will
>>wrongly return.
>
>As I understand this check aims to figure out if the address landed in shadow
>space and if it didn't we can return.
>
>Can't this above snippet be a simple:
>
>	if (!addr_in_shadow(addr))
>		return;
>
>?

Sorry, I think this wouldn't work. The tag also needs to be reset. Does this
perhaps work for this problem?

	if (!addr_in_shadow(kasan_reset_tag((void *)addr)))
		return;

>
>-- 
>Kind regards
>Maciej Wieczór-Retman
Andrey Konovalov Feb. 10, 2025, 10:57 p.m. UTC | #4
On Mon, Feb 10, 2025 at 4:53 PM Maciej Wieczor-Retman
<maciej.wieczor-retman@intel.com> wrote:
>
> On 2025-02-10 at 16:22:41 +0100, Maciej Wieczor-Retman wrote:
> >On 2024-10-23 at 20:41:57 +0200, Andrey Konovalov wrote:
> >>On Tue, Oct 22, 2024 at 3:59 AM Samuel Holland
> >><samuel.holland@sifive.com> wrote:
> >...
> >>> +        * Software Tag-Based KASAN, the displacement is signed, so
> >>> +        * KASAN_SHADOW_OFFSET is the center of the range.
> >>>          */
> >>> -       if (addr < KASAN_SHADOW_OFFSET)
> >>> -               return;
> >>> +       if (IS_ENABLED(CONFIG_KASAN_GENERIC)) {
> >>> +               if (addr < KASAN_SHADOW_OFFSET ||
> >>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size)
> >>> +                       return;
> >>> +       } else {
> >>> +               if (addr < KASAN_SHADOW_OFFSET - max_shadow_size / 2 ||
> >>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size / 2)
> >>> +                       return;
> >>
> >>Hm, I might be wrong, but I think this check does not work.
> >>
> >>Let's say we have non-canonical address 0x4242424242424242 and number
> >>of VA bits is 48.
> >>
> >>Then:
> >>
> >>KASAN_SHADOW_OFFSET == 0xffff800000000000
> >>kasan_mem_to_shadow(0x4242424242424242) == 0x0423a42424242424
> >>max_shadow_size == 0x1000000000000000
> >>KASAN_SHADOW_OFFSET - max_shadow_size / 2 == 0xf7ff800000000000
> >>KASAN_SHADOW_OFFSET + max_shadow_size / 2 == 0x07ff800000000000 (overflows)
> >>
> >>0x0423a42424242424 is < than 0xf7ff800000000000, so the function will
> >>wrongly return.
> >
> >As I understand this check aims to figure out if the address landed in shadow
> >space and if it didn't we can return.
> >
> >Can't this above snippet be a simple:
> >
> >       if (!addr_in_shadow(addr))
> >               return;
> >
> >?
>
> Sorry, I think this wouldn't work. The tag also needs to be reset. Does this
> perhaps work for this problem?
>
>         if (!addr_in_shadow(kasan_reset_tag((void *)addr)))
>                 return;

This wouldn't work either.

addr_in_shadow() checks whether an address belongs to the proper
shadow memory area. That area is the result of the memory-to-shadow
mapping applied to the range of proper kernel addresses.

However, what we want to check in this function is whether the given
address can be the result of the memory-to-shadow mapping for some
memory address, including userspace addresses, non-canonical
addresses, etc. So essentially we need to check whether the given
address belongs to the area that is the result of the memory-to-shadow
mapping applied to the whole address space, not only to proper kernel
addresses.
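
For what it's worth, one way to express that test without tripping over the
wraparound might be to check membership in the image of the mapping directly.
A rough sketch only (assuming the signed SW_TAGS formula and reusing the
existing KASAN_SHADOW_OFFSET / KASAN_SHADOW_SCALE_SHIFT / BITS_PER_LONG
macros), not a tested patch:

	/* Is 'addr' a possible result of the signed memory-to-shadow mapping
	 * applied to *any* 64-bit pointer? The image is one contiguous
	 * (possibly wrapping) range of size 2^(BITS_PER_LONG - SCALE_SHIFT)
	 * centered on KASAN_SHADOW_OFFSET, so wrap-safe unsigned arithmetic
	 * can test it with a single comparison. */
	static bool addr_in_shadow_image(unsigned long addr)
	{
		unsigned long size = 1UL << (BITS_PER_LONG - KASAN_SHADOW_SCALE_SHIFT);
		unsigned long start = KASAN_SHADOW_OFFSET - size / 2;	/* shadow of LONG_MIN */

		return addr - start < size;	/* unsigned wraparound intended */
	}

With the 48-bit VA example above, start is 0xf7ff800000000000 and
0x0423a42424242424 - start wraps to 0x0c24242424242424 < 2^60, so that shadow
address is (correctly) accepted.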
Maciej Wieczor-Retman Feb. 11, 2025, 8:58 a.m. UTC | #5
On 2025-02-10 at 23:57:10 +0100, Andrey Konovalov wrote:
>On Mon, Feb 10, 2025 at 4:53 PM Maciej Wieczor-Retman
><maciej.wieczor-retman@intel.com> wrote:
>>
>> On 2025-02-10 at 16:22:41 +0100, Maciej Wieczor-Retman wrote:
>> >On 2024-10-23 at 20:41:57 +0200, Andrey Konovalov wrote:
>> >>On Tue, Oct 22, 2024 at 3:59 AM Samuel Holland
>> >><samuel.holland@sifive.com> wrote:
>> >...
>> >>> +        * Software Tag-Based KASAN, the displacement is signed, so
>> >>> +        * KASAN_SHADOW_OFFSET is the center of the range.
>> >>>          */
>> >>> -       if (addr < KASAN_SHADOW_OFFSET)
>> >>> -               return;
>> >>> +       if (IS_ENABLED(CONFIG_KASAN_GENERIC)) {
>> >>> +               if (addr < KASAN_SHADOW_OFFSET ||
>> >>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size)
>> >>> +                       return;
>> >>> +       } else {
>> >>> +               if (addr < KASAN_SHADOW_OFFSET - max_shadow_size / 2 ||
>> >>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size / 2)
>> >>> +                       return;
>> >>
>> >>Hm, I might be wrong, but I think this check does not work.
>> >>
>> >>Let's say we have non-canonical address 0x4242424242424242 and number
>> >>of VA bits is 48.
>> >>
>> >>Then:
>> >>
>> >>KASAN_SHADOW_OFFSET == 0xffff800000000000
>> >>kasan_mem_to_shadow(0x4242424242424242) == 0x0423a42424242424
>> >>max_shadow_size == 0x1000000000000000
>> >>KASAN_SHADOW_OFFSET - max_shadow_size / 2 == 0xf7ff800000000000
>> >>KASAN_SHADOW_OFFSET + max_shadow_size / 2 == 0x07ff800000000000 (overflows)
>> >>
>> >>0x0423a42424242424 is < than 0xf7ff800000000000, so the function will
>> >>wrongly return.
>> >
>> >As I understand this check aims to figure out if the address landed in shadow
>> >space and if it didn't we can return.
>> >
>> >Can't this above snippet be a simple:
>> >
>> >       if (!addr_in_shadow(addr))
>> >               return;
>> >
>> >?
>>
>> Sorry, I think this wouldn't work. The tag also needs to be reset. Does this
>> perhaps work for this problem?
>>
>>         if (!addr_in_shadow(kasan_reset_tag((void *)addr)))
>>                 return;
>
>This wouldn't work as well.
>
>addr_in_shadow() checks whether an address belongs to the proper
>shadow memory area. That area is the result of the memory-to-shadow
>mapping applied to the range of proper kernel addresses.
>
>However, what we want to check in this function is whether the given
>address can be the result of the memory-to-shadow mapping for some
>memory address, including userspace addresses, non-canonical
>addresses, etc. So essentially we need to check whether the given
>address belongs to the area that is the result of the memory-to-shadow
>mapping applied to the whole address space, not only to proper kernel
>addresses.

Ah, okay, I get it. Would the old version

       if (addr < KASAN_SHADOW_OFFSET)
               return;

work if the *addr* had kasan_reset_tag() around it? That would sort of re-unsign
the address only for the purpose of the if().

Also I was thinking about it because x86 even with address masking enabled keeps
bit 63 set, so all kernel addresses will be negative in the signed
kasan_mem_to_shadow(). That's great for simplifying the KASAN_SHADOW_OFFSET but
it differs from the TBI and risc-v ideas where half of the addresses are
negative and half positive. So the temporary re-unsigning could maybe make it
simpler for x86 and avoid adding separate cases or an alternative
kasan_non_canonical_hook() implementation.
Maciej Wieczor-Retman Feb. 11, 2025, 1:42 p.m. UTC | #6
On 2025-02-11 at 09:58:22 +0100, Maciej Wieczor-Retman wrote:
>On 2025-02-10 at 23:57:10 +0100, Andrey Konovalov wrote:
>>On Mon, Feb 10, 2025 at 4:53 PM Maciej Wieczor-Retman
>><maciej.wieczor-retman@intel.com> wrote:
>>>
>>> On 2025-02-10 at 16:22:41 +0100, Maciej Wieczor-Retman wrote:
>>> >On 2024-10-23 at 20:41:57 +0200, Andrey Konovalov wrote:
>>> >>On Tue, Oct 22, 2024 at 3:59 AM Samuel Holland
>>> >><samuel.holland@sifive.com> wrote:
>>> >...
>>> >>> +        * Software Tag-Based KASAN, the displacement is signed, so
>>> >>> +        * KASAN_SHADOW_OFFSET is the center of the range.
>>> >>>          */
>>> >>> -       if (addr < KASAN_SHADOW_OFFSET)
>>> >>> -               return;
>>> >>> +       if (IS_ENABLED(CONFIG_KASAN_GENERIC)) {
>>> >>> +               if (addr < KASAN_SHADOW_OFFSET ||
>>> >>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size)
>>> >>> +                       return;
>>> >>> +       } else {
>>> >>> +               if (addr < KASAN_SHADOW_OFFSET - max_shadow_size / 2 ||
>>> >>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size / 2)
>>> >>> +                       return;
>>> >>
>>> >>Hm, I might be wrong, but I think this check does not work.
>>> >>
>>> >>Let's say we have non-canonical address 0x4242424242424242 and number
>>> >>of VA bits is 48.
>>> >>
>>> >>Then:
>>> >>
>>> >>KASAN_SHADOW_OFFSET == 0xffff800000000000
>>> >>kasan_mem_to_shadow(0x4242424242424242) == 0x0423a42424242424
>>> >>max_shadow_size == 0x1000000000000000
>>> >>KASAN_SHADOW_OFFSET - max_shadow_size / 2 == 0xf7ff800000000000
>>> >>KASAN_SHADOW_OFFSET + max_shadow_size / 2 == 0x07ff800000000000 (overflows)
>>> >>
>>> >>0x0423a42424242424 is < than 0xf7ff800000000000, so the function will
>>> >>wrongly return.
>>> >
>>> >As I understand this check aims to figure out if the address landed in shadow
>>> >space and if it didn't we can return.
>>> >
>>> >Can't this above snippet be a simple:
>>> >
>>> >       if (!addr_in_shadow(addr))
>>> >               return;
>>> >
>>> >?
>>>
>>> Sorry, I think this wouldn't work. The tag also needs to be reset. Does this
>>> perhaps work for this problem?
>>>
>>>         if (!addr_in_shadow(kasan_reset_tag((void *)addr)))
>>>                 return;
>>
>>This wouldn't work as well.
>>
>>addr_in_shadow() checks whether an address belongs to the proper
>>shadow memory area. That area is the result of the memory-to-shadow
>>mapping applied to the range of proper kernel addresses.
>>
>>However, what we want to check in this function is whether the given
>>address can be the result of the memory-to-shadow mapping for some
>>memory address, including userspace addresses, non-canonical
>>addresses, etc. So essentially we need to check whether the given
>>address belongs to the area that is the result of the memory-to-shadow
>>mapping applied to the whole address space, not only to proper kernel
>>addresses.
>
>Ah, okay, I get it. Would the old version
>
>       if (addr < KASAN_SHADOW_OFFSET)
>               return;
>
>work if the *addr* had kasan_reset_tag() around it? That would sort of re-unsign
>the address only for the purpose of the if().
>
>Also I was thinking about it because x86 even with address masking enabled keeps
>bit 63 set, so all kernel addresses will be negative in the signed
>kasan_mem_to_shadow(). That's great for simplifying the KASAN_SHADOW_OFFSET but
>it differs from the TBI and risc-v ideas where half of addresses are negative,
>half positive. So the temporary re-unsigning could maybe make it simpler for x86
>and avoid adding separate cases or alternative kasan_non_canonical_hook()
>implementation.

Oh, never mind, I see that this is more complicated than that. Sorry for the
spam; I'll do some better calculations of what is mapped where when doing
kasan_mem_to_shadow() and maybe then I'll figure this out.
Maciej Wieczor-Retman Feb. 11, 2025, 6:06 p.m. UTC | #7
On 2025-02-10 at 23:57:10 +0100, Andrey Konovalov wrote:
>On Mon, Feb 10, 2025 at 4:53 PM Maciej Wieczor-Retman
><maciej.wieczor-retman@intel.com> wrote:
>>
>> On 2025-02-10 at 16:22:41 +0100, Maciej Wieczor-Retman wrote:
>> >On 2024-10-23 at 20:41:57 +0200, Andrey Konovalov wrote:
>> >>On Tue, Oct 22, 2024 at 3:59 AM Samuel Holland
>> >><samuel.holland@sifive.com> wrote:
>> >...
>> >>> +        * Software Tag-Based KASAN, the displacement is signed, so
>> >>> +        * KASAN_SHADOW_OFFSET is the center of the range.
>> >>>          */
>> >>> -       if (addr < KASAN_SHADOW_OFFSET)
>> >>> -               return;
>> >>> +       if (IS_ENABLED(CONFIG_KASAN_GENERIC)) {
>> >>> +               if (addr < KASAN_SHADOW_OFFSET ||
>> >>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size)
>> >>> +                       return;
>> >>> +       } else {
>> >>> +               if (addr < KASAN_SHADOW_OFFSET - max_shadow_size / 2 ||
>> >>> +                   addr >= KASAN_SHADOW_OFFSET + max_shadow_size / 2)
>> >>> +                       return;
>> >>
>> >>Hm, I might be wrong, but I think this check does not work.
>> >>
>> >>Let's say we have non-canonical address 0x4242424242424242 and number
>> >>of VA bits is 48.
>> >>
>> >>Then:
>> >>
>> >>KASAN_SHADOW_OFFSET == 0xffff800000000000
>> >>kasan_mem_to_shadow(0x4242424242424242) == 0x0423a42424242424
>> >>max_shadow_size == 0x1000000000000000
>> >>KASAN_SHADOW_OFFSET - max_shadow_size / 2 == 0xf7ff800000000000
>> >>KASAN_SHADOW_OFFSET + max_shadow_size / 2 == 0x07ff800000000000 (overflows)
>> >>
>> >>0x0423a42424242424 is < than 0xf7ff800000000000, so the function will
>> >>wrongly return.
>> >
>> >As I understand this check aims to figure out if the address landed in shadow
>> >space and if it didn't we can return.
>> >
>> >Can't this above snippet be a simple:
>> >
>> >       if (!addr_in_shadow(addr))
>> >               return;
>> >
>> >?
>>
>> Sorry, I think this wouldn't work. The tag also needs to be reset. Does this
>> perhaps work for this problem?
>>
>>         if (!addr_in_shadow(kasan_reset_tag((void *)addr)))
>>                 return;
>
>This wouldn't work as well.
>
>addr_in_shadow() checks whether an address belongs to the proper
>shadow memory area. That area is the result of the memory-to-shadow
>mapping applied to the range of proper kernel addresses.
>
>However, what we want to check in this function is whether the given
>address can be the result of the memory-to-shadow mapping for some
>memory address, including userspace addresses, non-canonical
>addresses, etc. So essentially we need to check whether the given
>address belongs to the area that is the result of the memory-to-shadow
>mapping applied to the whole address space, not only to proper kernel
>addresses.

I did some experiments with multiple addresses passed through
kasan_mem_to_shadow(). And it seems like we can get almost any address out when
we consider any random bogus pointers.

I used the KASAN_SHADOW_OFFSET from your example above. Userspace addresses seem
to map to the range [KASAN_SHADOW_OFFSET, 0xffff8fffffffffff]. Then going
through non-canonical addresses until 0x0007ffffffffffff we reach the end of
kernel LA and we loop around. Then the addresses seem to go from 0 until we
again start reaching the kernel space and then it maps into the proper shadow
memory.

It gave me the same results when using the previous version of
kasan_mem_to_shadow() so I'm wondering whether I'm doing this experiment
incorrectly or if there aren't any addresses we can rule out here?
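
The enumeration can be reproduced with a throwaway userspace sketch (again
using the 48-bit VA, KASAN_SW_TAGS example offset, so the exact numbers are
illustrative):

	#include <stdio.h>

	#define OFFSET	0xffff800000000000UL	/* example KASAN_SHADOW_OFFSET */
	#define SHIFT	4			/* KASAN_SHADOW_SCALE_SHIFT for SW_TAGS */

	int main(void)
	{
		/* Sample pointers; the cast to long below relies on the usual
		 * two's-complement conversion, as the kernel code does. */
		unsigned long samples[] = {
			0x0000000000000000UL,	/* lowest user address    */
			0x0000ffffffffffffUL,	/* highest user address   */
			0x4242424242424242UL,	/* non-canonical          */
			0xffff000000000000UL,	/* lowest kernel address  */
			0xffffffffffffffffUL,	/* highest kernel address */
		};

		for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
			printf("%016lx -> %016lx\n", samples[i],
			       ((long)samples[i] >> SHIFT) + OFFSET);
		return 0;
	}

This prints 0xffff800000000000 and 0xffff8fffffffffff for the two user-space
endpoints, matching the range described above.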
Andrey Konovalov Feb. 13, 2025, 1:21 a.m. UTC | #8
On Tue, Feb 11, 2025 at 7:07 PM Maciej Wieczor-Retman
<maciej.wieczor-retman@intel.com> wrote:
>
> I did some experiments with multiple addresses passed through
> kasan_mem_to_shadow(). And it seems like we can get almost any address out when
> we consider any random bogus pointers.
>
> I used the KASAN_SHADOW_OFFSET from your example above. Userspace addresses seem
> to map to the range [KASAN_SHADOW_OFFSET - 0xffff8fffffffffff]. Then going
> through non-canonical addresses until 0x0007ffffffffffff we reach the end of
> kernel LA and we loop around. Then the addresses seem to go from 0 until we
> again start reaching the kernel space and then it maps into the proper shadow
> memory.
>
> It gave me the same results when using the previous version of
> kasan_mem_to_shadow() so I'm wondering whether I'm doing this experiment
> incorrectly or if there aren't any addresses we can rule out here?

By the definition of the shadow mapping, if we apply that mapping to
the whole 64-bit address space, the result will only contain 1/8th
(1/16th for SW/HW_TAGS) of that space.

For example, with the current upstream value of KASAN_SHADOW_OFFSET on
x86 and arm64, the value of the top 3 bits (4 for SW/HW_TAGS) of any
shadow address is always the same: KASAN_SHADOW_OFFSET's value is
such that the shadow address calculation never overflows. Addresses
that have a different value for those top 3 bits are the ones we can
rule out.

The KASAN_SHADOW_OFFSET value from my example does rely on the
overflow (arguably, this makes things more confusing [1]). But still,
the possible values of shadow addresses should only cover 1/16th of
the address space.

So whether the address belongs to that 1/8th (1/16th) of the address
space is what we want to check in kasan_non_canonical_hook().

The current upstream version of kasan_non_canonical_hook() actually
does a simplified check by only checking for the lower bound (e.g. for
x86, there's also an upper bound: KASAN_SHADOW_OFFSET +
(0xffffffffffffffff >> 3) == 0xfffffbffffffffff), so we could improve
it.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=218043
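
For the Generic-mode bounds, a quick sketch with the numbers spelled out (the
offset below is the current x86_64 KASAN_SHADOW_OFFSET implied by the figure
above):

	#include <stdio.h>

	int main(void)
	{
		/* Generic KASAN on x86_64: the unsigned mapping never wraps, so the
		 * image of the whole address space is [offset, offset + (~0ul >> 3)]. */
		unsigned long offset = 0xdffffc0000000000UL;
		unsigned long lower = offset;				/* shadow of address 0 */
		unsigned long upper = offset + (0xffffffffffffffffUL >> 3);

		printf("lower=%lx upper=%lx\n", lower, upper);		/* upper == fffffbffffffffff */
		return 0;
	}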
Andrey Konovalov Feb. 13, 2025, 1:28 a.m. UTC | #9
On Thu, Feb 13, 2025 at 2:21 AM Andrey Konovalov <andreyknvl@gmail.com> wrote:
>
> On Tue, Feb 11, 2025 at 7:07 PM Maciej Wieczor-Retman
> <maciej.wieczor-retman@intel.com> wrote:
> >
> > I did some experiments with multiple addresses passed through
> > kasan_mem_to_shadow(). And it seems like we can get almost any address out when
> > we consider any random bogus pointers.
> >
> > I used the KASAN_SHADOW_OFFSET from your example above. Userspace addresses seem
> > to map to the range [KASAN_SHADOW_OFFSET - 0xffff8fffffffffff]. Then going
> > through non-canonical addresses until 0x0007ffffffffffff we reach the end of
> > kernel LA and we loop around. Then the addresses seem to go from 0 until we
> > again start reaching the kernel space and then it maps into the proper shadow
> > memory.
> >
> > It gave me the same results when using the previous version of
> > kasan_mem_to_shadow() so I'm wondering whether I'm doing this experiment
> > incorrectly or if there aren't any addresses we can rule out here?
>
> By the definition of the shadow mapping, if we apply that mapping to
> the whole 64-bit address space, the result will only contain 1/8th
> (1/16th for SW/HW_TAGS) of that space.
>
> For example, with the current upstream value of KASAN_SHADOW_OFFSET on
> x86 and arm64, the value of the top 3 bits (4 for SW/HW_TAGS) of any
> shadow address are always the same: KASAN_SHADOW_OFFSET's value is
> such that the shadow address calculation never overflows. Addresses
> that have a different value for those top 3 bits are the ones we can
> rule out.

Eh, scratch that: the 3rd bit from the top changes, as
KASAN_SHADOW_OFFSET is not that well-aligned a value, but the overall
size of the mapping still holds.

> The KASAN_SHADOW_OFFSET value from my example does rely on the
> overflow (arguably, this makes things more confusing [1]). But still,
> the possible values of shadow addresses should only cover 1/16th of
> the address space.
>
> So whether the address belongs to that 1/8th (1/16th) of the address
> space is what we want to check in kasan_non_canonical_hook().
>
> The current upstream version of kasan_non_canonical_hook() actually
> does a simplified check by only checking for the lower bound (e.g. for
> x86, there's also an upper bound: KASAN_SHADOW_OFFSET +
> (0xffffffffffffffff >> 3) == 0xfffffbffffffffff), so we could improve
> it.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=218043
Maciej Wieczor-Retman Feb. 13, 2025, 4:20 p.m. UTC | #10
On 2025-02-13 at 02:28:08 +0100, Andrey Konovalov wrote:
>On Thu, Feb 13, 2025 at 2:21 AM Andrey Konovalov <andreyknvl@gmail.com> wrote:
>>
>> On Tue, Feb 11, 2025 at 7:07 PM Maciej Wieczor-Retman
>> <maciej.wieczor-retman@intel.com> wrote:
>> >
>> > I did some experiments with multiple addresses passed through
>> > kasan_mem_to_shadow(). And it seems like we can get almost any address out when
>> > we consider any random bogus pointers.
>> >
>> > I used the KASAN_SHADOW_OFFSET from your example above. Userspace addresses seem
>> > to map to the range [KASAN_SHADOW_OFFSET - 0xffff8fffffffffff]. Then going
>> > through non-canonical addresses until 0x0007ffffffffffff we reach the end of
>> > kernel LA and we loop around. Then the addresses seem to go from 0 until we
>> > again start reaching the kernel space and then it maps into the proper shadow
>> > memory.
>> >
>> > It gave me the same results when using the previous version of
>> > kasan_mem_to_shadow() so I'm wondering whether I'm doing this experiment
>> > incorrectly or if there aren't any addresses we can rule out here?
>>
>> By the definition of the shadow mapping, if we apply that mapping to
>> the whole 64-bit address space, the result will only contain 1/8th
>> (1/16th for SW/HW_TAGS) of that space.
>>
>> For example, with the current upstream value of KASAN_SHADOW_OFFSET on
>> x86 and arm64, the value of the top 3 bits (4 for SW/HW_TAGS) of any
>> shadow address are always the same: KASAN_SHADOW_OFFSET's value is
>> such that the shadow address calculation never overflows. Addresses
>>that have a different value for those top 3 bits are the ones we can
>> rule out.
>
>Eh, scratch that, the 3rd bit from the top changes, as
>KASAN_SHADOW_OFFSET is not a that-well-aligned value, the overall size
>of the mapping holds.
>
>> The KASAN_SHADOW_OFFSET value from my example does rely on the
>> overflow (arguably, this makes things more confusing [1]). But still,
>> the possible values of shadow addresses should only cover 1/16th of
>> the address space.
>>
>> So whether the address belongs to that 1/8th (1/16th) of the address
>> space is what we want to check in kasan_non_canonical_hook().
>>

Right, I somehow forgot that obviously the whole LA has to map to 1/16th of the
address space and it should stay contiguous.

After rethinking how the mapping worked before and how it will work after
making things signed, I thought this patch could make use of the overflow.

From what I noticed, all the Kconfig values for KASAN_SHADOW_OFFSET should make
it so there will be an overflow when inputting more and more positive addresses.

So maybe we should first find what the most negative and most positive (signed)
addresses map to in the shadow memory address space. Then, when looking for
invalid values that aren't the product of kasan_mem_to_shadow(), we should check

	if (addr > kasan_mem_to_shadow(biggest_positive_address) &&
	    addr < kasan_mem_to_shadow(smallest_negative_address))
		return;

Is this correct?

I think this works with TBI because valid kernel addresses, depending on the
top bit value, will map both above and below KASAN_SHADOW_OFFSET. And the same
will work for x86, where the (negative) kernel addresses are mapped below
KASAN_SHADOW_OFFSET and the positive user addresses are mapped above it and
also overflow (in tag-based mode).
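
Concretely, with the 48-bit VA, KASAN_SW_TAGS example values spelled out, the
check above could look like the sketch below (the helper name is made up, and
the constants would really come from kasan_mem_to_shadow(LONG_MAX) and
kasan_mem_to_shadow(LONG_MIN); illustration only, not a tested patch):

	/* The shadow of the biggest positive address (LONG_MAX) is
	 * 0x07ff7fffffffffff and the shadow of the smallest negative address
	 * (LONG_MIN) is 0xf7ff800000000000; nothing strictly between the two
	 * can be the result of the memory-to-shadow mapping. */
	static bool shadow_addr_impossible(unsigned long addr)
	{
		return addr > 0x07ff7fffffffffffUL && addr < 0xf7ff800000000000UL;
	}

kasan_non_canonical_hook() would then return early when
shadow_addr_impossible(addr) is true.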

>> The current upstream version of kasan_non_canonical_hook() actually
>> does a simplified check by only checking for the lower bound (e.g. for
>> x86, there's also an upper bound: KASAN_SHADOW_OFFSET +
>> (0xffffffffffffffff >> 3) == 0xfffffbffffffffff), so we could improve
>> it.

Right, Samuel's check for generic KASAN seems to cover that case.

>>
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=218043
>
Maciej Wieczor-Retman Feb. 14, 2025, 8:20 a.m. UTC | #11
On 2025-02-13 at 17:20:22 +0100, Maciej Wieczor-Retman wrote:
>On 2025-02-13 at 02:28:08 +0100, Andrey Konovalov wrote:
>>On Thu, Feb 13, 2025 at 2:21 AM Andrey Konovalov <andreyknvl@gmail.com> wrote:
>>>
>>> On Tue, Feb 11, 2025 at 7:07 PM Maciej Wieczor-Retman
>>> <maciej.wieczor-retman@intel.com> wrote:
>>> >
>>> > I did some experiments with multiple addresses passed through
>>> > kasan_mem_to_shadow(). And it seems like we can get almost any address out when
>>> > we consider any random bogus pointers.
>>> >
>>> > I used the KASAN_SHADOW_OFFSET from your example above. Userspace addresses seem
>>> > to map to the range [KASAN_SHADOW_OFFSET - 0xffff8fffffffffff]. Then going
>>> > through non-canonical addresses until 0x0007ffffffffffff we reach the end of
>>> > kernel LA and we loop around. Then the addresses seem to go from 0 until we
>>> > again start reaching the kernel space and then it maps into the proper shadow
>>> > memory.
>>> >
>>> > It gave me the same results when using the previous version of
>>> > kasan_mem_to_shadow() so I'm wondering whether I'm doing this experiment
>>> > incorrectly or if there aren't any addresses we can rule out here?
>>>
>>> By the definition of the shadow mapping, if we apply that mapping to
>>> the whole 64-bit address space, the result will only contain 1/8th
>>> (1/16th for SW/HW_TAGS) of that space.
>>>
>>> For example, with the current upstream value of KASAN_SHADOW_OFFSET on
>>> x86 and arm64, the value of the top 3 bits (4 for SW/HW_TAGS) of any
>>> shadow address are always the same: KASAN_SHADOW_OFFSET's value is
>>> such that the shadow address calculation never overflows. Addresses
>>> that have a different value for those top 3 bits are the ones we can
>>> rule out.
>>
>>Eh, scratch that, the 3rd bit from the top changes, as
>>KASAN_SHADOW_OFFSET is not a that-well-aligned value, the overall size
>>of the mapping holds.
>>
>>> The KASAN_SHADOW_OFFSET value from my example does rely on the
>>> overflow (arguably, this makes things more confusing [1]). But still,
>>> the possible values of shadow addresses should only cover 1/16th of
>>> the address space.
>>>
>>> So whether the address belongs to that 1/8th (1/16th) of the address
>>> space is what we want to check in kasan_non_canonical_hook().
>>>
>
>Right, I somehow forgot that obviously the whole LA has to map to 1/16th of the
>address space and it shold stay contiguous.
>
>After rethinking how the mapping worked before and will work after making stuff
>signed I thought this patch could make use of the overflow?
>
>From what I noticed, all the Kconfig values for KASAN_SHADOW_OFFSET should make
>it so there will be overflow when inputing more and more positive addresses.
>
>So maybe we should first find what the most negative and most positive (signed)
>addresses map to in shadow memory address space. And then when looking for
>invalid values that aren't the product of kasan_mem_to_shadow() we should check
>
>	if (addr > kasan_mem_to_shadow(biggest_positive_address) &&
>	    addr < kasan_mem_to_shadow(smallest_negative_address))
>		return;
>
>Is this correct?

I suppose the original code in the patch does the same thing when you change the
|| into &&:

	if (addr < KASAN_SHADOW_OFFSET - max_shadow_size / 2 &&
	    addr >= KASAN_SHADOW_OFFSET + max_shadow_size / 2)
		return;

kasan_mem_to_shadow(0x7FFFFFFFFFFFFFFF) -> 0x07ff7fffffffffff
kasan_mem_to_shadow(0x8000000000000000) -> 0xf7ff800000000000

Also, after thinking about this overflow and what maps where, I rechecked
kasan_shadow_to_mem() and addr_has_metadata(), and they seem to return the
values I'd expect without making any changes there. Just mentioning this because I
recall you asked about it at the start of this thread.

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd9df6dcc593..6a326908c941 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -418,11 +418,11 @@  config KASAN_SHADOW_OFFSET
 	default 0xdffffe0000000000 if ARM64_VA_BITS_42 && !KASAN_SW_TAGS
 	default 0xdfffffc000000000 if ARM64_VA_BITS_39 && !KASAN_SW_TAGS
 	default 0xdffffff800000000 if ARM64_VA_BITS_36 && !KASAN_SW_TAGS
-	default 0xefff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && KASAN_SW_TAGS
-	default 0xefffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && KASAN_SW_TAGS
-	default 0xeffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
-	default 0xefffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
-	default 0xeffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
+	default 0xffff800000000000 if (ARM64_VA_BITS_48 || (ARM64_VA_BITS_52 && !ARM64_16K_PAGES)) && KASAN_SW_TAGS
+	default 0xffffc00000000000 if (ARM64_VA_BITS_47 || ARM64_VA_BITS_52) && ARM64_16K_PAGES && KASAN_SW_TAGS
+	default 0xfffffe0000000000 if ARM64_VA_BITS_42 && KASAN_SW_TAGS
+	default 0xffffffc000000000 if ARM64_VA_BITS_39 && KASAN_SW_TAGS
+	default 0xfffffff800000000 if ARM64_VA_BITS_36 && KASAN_SW_TAGS
 	default 0xffffffffffffffff
 
 config UNWIND_TABLES
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0480c61dbb4f..a93fc9dc16f3 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -80,7 +80,8 @@ 
  * where KASAN_SHADOW_SCALE_SHIFT is the order of the number of bits that map
  * to a single shadow byte and KASAN_SHADOW_OFFSET is a constant that offsets
  * the mapping. Note that KASAN_SHADOW_OFFSET does not point to the start of
- * the shadow memory region.
+ * the shadow memory region, since not all possible addresses have shadow
+ * memory allocated for them.
  *
  * Based on this mapping, we define two constants:
  *
@@ -89,7 +90,15 @@ 
  *
  * KASAN_SHADOW_END is defined first as the shadow address that corresponds to
  * the upper bound of possible virtual kernel memory addresses UL(1) << 64
- * according to the mapping formula.
+ * according to the mapping formula. For Generic KASAN, the address in the
+ * mapping formula is treated as unsigned (part of the compiler's ABI), so the
+ * end of the shadow memory region is at a large positive offset from
+ * KASAN_SHADOW_OFFSET. For Software Tag-Based KASAN, the address in the
+ * formula is treated as signed. Since all kernel addresses are negative, they
+ * map to shadow memory below KASAN_SHADOW_OFFSET, making KASAN_SHADOW_OFFSET
+ * itself the end of the shadow memory region. (User pointers are positive and
+ * would map to shadow memory above KASAN_SHADOW_OFFSET, but shadow memory is
+ * not allocated for them.)
  *
  * KASAN_SHADOW_START is defined second based on KASAN_SHADOW_END. The shadow
  * memory start must map to the lowest possible kernel virtual memory address
@@ -100,7 +109,11 @@ 
  */
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 #define KASAN_SHADOW_OFFSET	_AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
+#ifdef CONFIG_KASAN_GENERIC
 #define KASAN_SHADOW_END	((UL(1) << (64 - KASAN_SHADOW_SCALE_SHIFT)) + KASAN_SHADOW_OFFSET)
+#else
+#define KASAN_SHADOW_END	KASAN_SHADOW_OFFSET
+#endif
 #define _KASAN_SHADOW_START(va)	(KASAN_SHADOW_END - (UL(1) << ((va) - KASAN_SHADOW_SCALE_SHIFT)))
 #define KASAN_SHADOW_START	_KASAN_SHADOW_START(vabits_actual)
 #define PAGE_END		KASAN_SHADOW_START
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index b65a29440a0c..6836e571555c 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -198,8 +198,11 @@  static bool __init root_level_aligned(u64 addr)
 /* The early shadow maps everything to a single page of zeroes */
 asmlinkage void __init kasan_early_init(void)
 {
-	BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
-		KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+		BUILD_BUG_ON(KASAN_SHADOW_OFFSET !=
+			KASAN_SHADOW_END - (1UL << (64 - KASAN_SHADOW_SCALE_SHIFT)));
+	else
+		BUILD_BUG_ON(KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
 	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS), SHADOW_ALIGN));
 	BUILD_BUG_ON(!IS_ALIGNED(_KASAN_SHADOW_START(VA_BITS_MIN), SHADOW_ALIGN));
 	BUILD_BUG_ON(!IS_ALIGNED(KASAN_SHADOW_END, SHADOW_ALIGN));
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 00a3bf7c0d8f..03b440658817 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -58,8 +58,14 @@  int kasan_populate_early_shadow(const void *shadow_start,
 #ifndef kasan_mem_to_shadow
 static inline void *kasan_mem_to_shadow(const void *addr)
 {
-	return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
-		+ KASAN_SHADOW_OFFSET;
+	void *scaled;
+
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+		scaled = (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT);
+	else
+		scaled = (void *)((long)addr >> KASAN_SHADOW_SCALE_SHIFT);
+
+	return KASAN_SHADOW_OFFSET + scaled;
 }
 #endif
 
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index b48c768acc84..c08097715686 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -644,15 +644,29 @@  void kasan_report_async(void)
  */
 void kasan_non_canonical_hook(unsigned long addr)
 {
+	unsigned long max_shadow_size = BIT(BITS_PER_LONG - KASAN_SHADOW_SCALE_SHIFT);
 	unsigned long orig_addr;
 	const char *bug_type;
 
 	/*
-	 * All addresses that came as a result of the memory-to-shadow mapping
-	 * (even for bogus pointers) must be >= KASAN_SHADOW_OFFSET.
+	 * With the default kasan_mem_to_shadow() algorithm, all addresses
+	 * returned by the memory-to-shadow mapping (even for bogus pointers)
+	 * must be within a certain displacement from KASAN_SHADOW_OFFSET.
+	 *
+	 * For Generic KASAN, the displacement is unsigned, so
+	 * KASAN_SHADOW_OFFSET is the smallest possible shadow address. For
+	 * Software Tag-Based KASAN, the displacement is signed, so
+	 * KASAN_SHADOW_OFFSET is the center of the range.
 	 */
-	if (addr < KASAN_SHADOW_OFFSET)
-		return;
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC)) {
+		if (addr < KASAN_SHADOW_OFFSET ||
+		    addr >= KASAN_SHADOW_OFFSET + max_shadow_size)
+			return;
+	} else {
+		if (addr < KASAN_SHADOW_OFFSET - max_shadow_size / 2 ||
+		    addr >= KASAN_SHADOW_OFFSET + max_shadow_size / 2)
+			return;
+	}
 
 	orig_addr = (unsigned long)kasan_shadow_to_mem((void *)addr);
 
diff --git a/scripts/gdb/linux/mm.py b/scripts/gdb/linux/mm.py
index 7571aebbe650..2e63f3dedd53 100644
--- a/scripts/gdb/linux/mm.py
+++ b/scripts/gdb/linux/mm.py
@@ -110,12 +110,13 @@  class aarch64_page_ops():
         self.KERNEL_END = gdb.parse_and_eval("_end")
 
         if constants.LX_CONFIG_KASAN_GENERIC or constants.LX_CONFIG_KASAN_SW_TAGS:
+            self.KASAN_SHADOW_OFFSET = constants.LX_CONFIG_KASAN_SHADOW_OFFSET
             if constants.LX_CONFIG_KASAN_GENERIC:
                 self.KASAN_SHADOW_SCALE_SHIFT = 3
+                self.KASAN_SHADOW_END = (1 << (64 - self.KASAN_SHADOW_SCALE_SHIFT)) + self.KASAN_SHADOW_OFFSET
             else:
                 self.KASAN_SHADOW_SCALE_SHIFT = 4
-            self.KASAN_SHADOW_OFFSET = constants.LX_CONFIG_KASAN_SHADOW_OFFSET
-            self.KASAN_SHADOW_END = (1 << (64 - self.KASAN_SHADOW_SCALE_SHIFT)) + self.KASAN_SHADOW_OFFSET
+                self.KASAN_SHADOW_END = self.KASAN_SHADOW_OFFSET
             self.PAGE_END = self.KASAN_SHADOW_END - (1 << (self.vabits_actual - self.KASAN_SHADOW_SCALE_SHIFT))
         else:
             self.PAGE_END = self._PAGE_END(self.VA_BITS_MIN)