
[v4,5/5] arm64: mte: Inline mte_assign_mem_tag_range()

Message ID 20210118183033.41764-6-vincenzo.frascino@arm.com (mailing list archive)
State New, archived
Series arm64: ARMv8.5-A: MTE: Add async mode support

Commit Message

Vincenzo Frascino Jan. 18, 2021, 6:30 p.m. UTC
mte_assign_mem_tag_range() is called on production KASAN HW hot
paths. It makes sense to inline it in an attempt to reduce the
overhead.

Inline mte_assign_mem_tag_range() based on the suggestions provided at
[1].

[1] https://lore.kernel.org/r/CAAeHK+wCO+J7D1_T89DG+jJrPLk3X9RsGFKxJGd0ZcUFjQT-9Q@mail.gmail.com/

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 arch/arm64/include/asm/mte.h | 26 +++++++++++++++++++++++++-
 arch/arm64/lib/mte.S         | 15 ---------------
 2 files changed, 25 insertions(+), 16 deletions(-)
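
For orientation, the "hot paths" in question run from the KASAN hardware
tag-based mode hooks down to this helper. Schematically (the intermediate
function names below follow the kernels of that period and are an
assumption of this note, not something this patch adds):

	/*
	 * Rough call chain on the KASAN HW-tags hot path:
	 *
	 *   kasan_unpoison()                      mm/kasan
	 *     -> hw_set_mem_tag_range()           mm/kasan/kasan.h wrapper
	 *       -> mte_set_mem_tag_range()        arch/arm64, out of line
	 *         -> mte_assign_mem_tag_range()   the function inlined here
	 */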

Comments

Catalin Marinas Jan. 19, 2021, 2:45 p.m. UTC | #1
On Mon, Jan 18, 2021 at 06:30:33PM +0000, Vincenzo Frascino wrote:
> mte_assign_mem_tag_range() is called on production KASAN HW hot
> paths. It makes sense to inline it in an attempt to reduce the
> overhead.
> 
> Inline mte_assign_mem_tag_range() based on the suggestions provided at
> [1].
> 
> [1] https://lore.kernel.org/r/CAAeHK+wCO+J7D1_T89DG+jJrPLk3X9RsGFKxJGd0ZcUFjQT-9Q@mail.gmail.com/
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> ---
>  arch/arm64/include/asm/mte.h | 26 +++++++++++++++++++++++++-
>  arch/arm64/lib/mte.S         | 15 ---------------
>  2 files changed, 25 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> index 237bb2f7309d..1a6fd53f82c3 100644
> --- a/arch/arm64/include/asm/mte.h
> +++ b/arch/arm64/include/asm/mte.h
> @@ -49,7 +49,31 @@ long get_mte_ctrl(struct task_struct *task);
>  int mte_ptrace_copy_tags(struct task_struct *child, long request,
>  			 unsigned long addr, unsigned long data);
>  
> -void mte_assign_mem_tag_range(void *addr, size_t size);
> +static inline void mte_assign_mem_tag_range(void *addr, size_t size)
> +{
> +	u64 _addr = (u64)addr;
> +	u64 _end = _addr + size;
> +
> +	/*
> +	 * This function must be invoked from an MTE enabled context.
> +	 *
> +	 * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
> +	 * size must be non-zero and MTE_GRANULE_SIZE aligned.
> +	 */
> +	do {
> +		/*
> +		 * 'asm volatile' is required to prevent the compiler from
> +		 * moving the statement outside of the loop.
> +		 */
> +		asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
> +			     :
> +			     : "r" (_addr)
> +			     : "memory");
> +
> +		_addr += MTE_GRANULE_SIZE;
> +	} while (_addr != _end);
> +}

While I'm ok with moving this function to C, I don't think it solves the
inlining in the kasan code. The only interface we have to kasan is via
mte_{set,get}_mem_tag_range(), so the above function doesn't need to
live in a header.

If you do want inlining all the way to the kasan code, we should
probably move the mte_{set,get}_mem_tag_range() functions to the header
as well (and ideally backed by some numbers to show that it matters).

Moving it to mte.c also gives us more control over how it's called (we
have the WARN_ONs in place in the callers).
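
For context, moving mte_{set,get}_mem_tag_range() into the header as
suggested above would essentially turn the mte.c definitions into static
inlines. A minimal sketch for the "set" side, assuming the v5.11-era
implementation (system_supports_mte(), __tag_set() and the WARN_ONs as in
that code):

	/* Sketch only, not what was merged. */
	static inline void *mte_set_mem_tag_range(void *addr, size_t size,
						  u8 tag)
	{
		void *ptr = addr;

		if (!system_supports_mte() || size == 0)
			return addr;

		/* The WARN_ONs mentioned above: catch unaligned callers. */
		WARN_ON(size & (MTE_GRANULE_SIZE - 1));
		WARN_ON((u64)addr & (MTE_GRANULE_SIZE - 1));

		/* Merge the tag into the top byte and tag the memory. */
		tag = 0xF0 | tag;
		ptr = (void *)__tag_set(ptr, tag);

		mte_assign_mem_tag_range(ptr, size);

		return ptr;
	}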
Vincenzo Frascino Jan. 19, 2021, 3:48 p.m. UTC | #2
Hi Catalin,

On 1/19/21 2:45 PM, Catalin Marinas wrote:
> On Mon, Jan 18, 2021 at 06:30:33PM +0000, Vincenzo Frascino wrote:
>> mte_assign_mem_tag_range() is called on production KASAN HW hot
>> paths. It makes sense to inline it in an attempt to reduce the
>> overhead.
>>
>> Inline mte_assign_mem_tag_range() based on the suggestions provided at
>> [1].
>>
>> [1] https://lore.kernel.org/r/CAAeHK+wCO+J7D1_T89DG+jJrPLk3X9RsGFKxJGd0ZcUFjQT-9Q@mail.gmail.com/
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
>> ---
>>  arch/arm64/include/asm/mte.h | 26 +++++++++++++++++++++++++-
>>  arch/arm64/lib/mte.S         | 15 ---------------
>>  2 files changed, 25 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
>> index 237bb2f7309d..1a6fd53f82c3 100644
>> --- a/arch/arm64/include/asm/mte.h
>> +++ b/arch/arm64/include/asm/mte.h
>> @@ -49,7 +49,31 @@ long get_mte_ctrl(struct task_struct *task);
>>  int mte_ptrace_copy_tags(struct task_struct *child, long request,
>>  			 unsigned long addr, unsigned long data);
>>  
>> -void mte_assign_mem_tag_range(void *addr, size_t size);
>> +static inline void mte_assign_mem_tag_range(void *addr, size_t size)
>> +{
>> +	u64 _addr = (u64)addr;
>> +	u64 _end = _addr + size;
>> +
>> +	/*
>> +	 * This function must be invoked from an MTE enabled context.
>> +	 *
>> +	 * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
>> +	 * size must be non-zero and MTE_GRANULE_SIZE aligned.
>> +	 */
>> +	do {
>> +		/*
>> +		 * 'asm volatile' is required to prevent the compiler from
>> +		 * moving the statement outside of the loop.
>> +		 */
>> +		asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
>> +			     :
>> +			     : "r" (_addr)
>> +			     : "memory");
>> +
>> +		_addr += MTE_GRANULE_SIZE;
>> +	} while (_addr != _end);
>> +}
> 
> While I'm ok with moving this function to C, I don't think it solves the
> inlining in the kasan code. The only interface we have to kasan is via
> mte_{set,get}_mem_tag_range(), so the above function doesn't need to
> live in a header.
> 
> If you do want inlining all the way to the kasan code, we should
> probably move the mte_{set,get}_mem_tag_range() functions to the header
> as well (and ideally backed by some numbers to show that it matters).
> 
> Moving it to mte.c also gives us more control over how it's called (we
> have the WARN_ONs in place in the callers).
> 

Based on the thread [1], this patch is only an intermediate step to allow
KASAN to call mte_assign_mem_tag_range() directly in the future. At that
point I think mte_set_mem_tag_range() can be removed.

If you agree, I would leave things as they are to give Andrey a chance to
execute on the original plan in a separate series.

I agree, though, that this change alone does not bring huge benefits, but
it does not introduce regressions either.

If you want, I can add something to the commit message in the next version
to make this more explicit.

Let me know how you want me to proceed.
Andrey Konovalov Jan. 19, 2021, 6:12 p.m. UTC | #3
On Tue, Jan 19, 2021 at 4:45 PM Vincenzo Frascino
<vincenzo.frascino@arm.com> wrote:
>
> Hi Catalin,
>
> On 1/19/21 2:45 PM, Catalin Marinas wrote:
> > On Mon, Jan 18, 2021 at 06:30:33PM +0000, Vincenzo Frascino wrote:
> >> mte_assign_mem_tag_range() is called on production KASAN HW hot
> >> paths. It makes sense to inline it in an attempt to reduce the
> >> overhead.
> >>
> >> Inline mte_assign_mem_tag_range() based on the suggestions provided at
> >> [1].
> >>
> >> [1] https://lore.kernel.org/r/CAAeHK+wCO+J7D1_T89DG+jJrPLk3X9RsGFKxJGd0ZcUFjQT-9Q@mail.gmail.com/
> >>
> >> Cc: Catalin Marinas <catalin.marinas@arm.com>
> >> Cc: Will Deacon <will@kernel.org>
> >> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> >> ---
> >>  arch/arm64/include/asm/mte.h | 26 +++++++++++++++++++++++++-
> >>  arch/arm64/lib/mte.S         | 15 ---------------
> >>  2 files changed, 25 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> >> index 237bb2f7309d..1a6fd53f82c3 100644
> >> --- a/arch/arm64/include/asm/mte.h
> >> +++ b/arch/arm64/include/asm/mte.h
> >> @@ -49,7 +49,31 @@ long get_mte_ctrl(struct task_struct *task);
> >>  int mte_ptrace_copy_tags(struct task_struct *child, long request,
> >>                       unsigned long addr, unsigned long data);
> >>
> >> -void mte_assign_mem_tag_range(void *addr, size_t size);
> >> +static inline void mte_assign_mem_tag_range(void *addr, size_t size)
> >> +{
> >> +    u64 _addr = (u64)addr;
> >> +    u64 _end = _addr + size;
> >> +
> >> +    /*
> >> +     * This function must be invoked from an MTE enabled context.
> >> +     *
> >> +     * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
> >> +     * size must be non-zero and MTE_GRANULE_SIZE aligned.
> >> +     */
> >> +    do {
> >> +            /*
> >> +             * 'asm volatile' is required to prevent the compiler from
> >> +             * moving the statement outside of the loop.
> >> +             */
> >> +            asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
> >> +                         :
> >> +                         : "r" (_addr)
> >> +                         : "memory");
> >> +
> >> +            _addr += MTE_GRANULE_SIZE;
> >> +    } while (_addr != _end);
> >> +}
> >
> > While I'm ok with moving this function to C, I don't think it solves the
> > inlining in the kasan code. The only interface we have to kasan is via
> > mte_{set,get}_mem_tag_range(), so the above function doesn't need to
> > live in a header.
> >
> > If you do want inlining all the way to the kasan code, we should
> > probably move the mte_{set,get}_mem_tag_range() functions to the header
> > as well (and ideally backed by some numbers to show that it matters).
> >
> > Moving it to mte.c also gives us more control over how it's called (we
> > have the WARN_ONs in place in the callers).
> >
>
> Based on the thread [1], this patch is only an intermediate step to allow
> KASAN to call mte_assign_mem_tag_range() directly in the future. At that
> point I think mte_set_mem_tag_range() can be removed.
>
> If you agree, I would leave things as they are to give Andrey a chance to
> execute on the original plan in a separate series.

I think we should drop this patch from this series as it's unrelated.

I will pick it up in my future optimization series. Then it will be
easier to discuss it in context. The important part that I needed
is an inlinable C implementation of mte_assign_mem_tag_range(), which
I now have with this patch.

Thanks, Vincenzo!
Catalin Marinas Jan. 19, 2021, 7 p.m. UTC | #4
On Tue, Jan 19, 2021 at 07:12:40PM +0100, Andrey Konovalov wrote:
> On Tue, Jan 19, 2021 at 4:45 PM Vincenzo Frascino
> <vincenzo.frascino@arm.com> wrote:
> > On 1/19/21 2:45 PM, Catalin Marinas wrote:
> > > On Mon, Jan 18, 2021 at 06:30:33PM +0000, Vincenzo Frascino wrote:
> > >> mte_assign_mem_tag_range() is called on production KASAN HW hot
> > >> paths. It makes sense to inline it in an attempt to reduce the
> > >> overhead.
> > >>
> > >> Inline mte_assign_mem_tag_range() based on the suggestions provided at
> > >> [1].
> > >>
> > >> [1] https://lore.kernel.org/r/CAAeHK+wCO+J7D1_T89DG+jJrPLk3X9RsGFKxJGd0ZcUFjQT-9Q@mail.gmail.com/
> > >>
> > >> Cc: Catalin Marinas <catalin.marinas@arm.com>
> > >> Cc: Will Deacon <will@kernel.org>
> > >> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > >> ---
> > >>  arch/arm64/include/asm/mte.h | 26 +++++++++++++++++++++++++-
> > >>  arch/arm64/lib/mte.S         | 15 ---------------
> > >>  2 files changed, 25 insertions(+), 16 deletions(-)
> > >>
> > >> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> > >> index 237bb2f7309d..1a6fd53f82c3 100644
> > >> --- a/arch/arm64/include/asm/mte.h
> > >> +++ b/arch/arm64/include/asm/mte.h
> > >> @@ -49,7 +49,31 @@ long get_mte_ctrl(struct task_struct *task);
> > >>  int mte_ptrace_copy_tags(struct task_struct *child, long request,
> > >>                       unsigned long addr, unsigned long data);
> > >>
> > >> -void mte_assign_mem_tag_range(void *addr, size_t size);
> > >> +static inline void mte_assign_mem_tag_range(void *addr, size_t size)
> > >> +{
> > >> +    u64 _addr = (u64)addr;
> > >> +    u64 _end = _addr + size;
> > >> +
> > >> +    /*
> > >> +     * This function must be invoked from an MTE enabled context.
> > >> +     *
> > >> +     * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
> > >> +     * size must be non-zero and MTE_GRANULE_SIZE aligned.
> > >> +     */
> > >> +    do {
> > >> +            /*
> > >> +             * 'asm volatile' is required to prevent the compiler from
> > >> +             * moving the statement outside of the loop.
> > >> +             */
> > >> +            asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
> > >> +                         :
> > >> +                         : "r" (_addr)
> > >> +                         : "memory");
> > >> +
> > >> +            _addr += MTE_GRANULE_SIZE;
> > >> +    } while (_addr != _end);
> > >> +}
> > >
> > > While I'm ok with moving this function to C, I don't think it solves the
> > > inlining in the kasan code. The only interface we have to kasan is via
> > > mte_{set,get}_mem_tag_range(), so the above function doesn't need to
> > > live in a header.
> > >
> > > If you do want inlining all the way to the kasan code, we should
> > > probably move the mte_{set,get}_mem_tag_range() functions to the header
> > > as well (and ideally backed by some numbers to show that it matters).
> > >
> > > Moving it to mte.c also gives us more control over how it's called (we
> > > have the WARN_ONs in place in the callers).
> > >
> >
> > Based on the thread [1], this patch is only an intermediate step to allow
> > KASAN to call mte_assign_mem_tag_range() directly in the future. At that
> > point I think mte_set_mem_tag_range() can be removed.
> >
> > If you agree, I would leave things as they are to give Andrey a chance to
> > execute on the original plan in a separate series.
> 
> I think we should drop this patch from this series as it's unrelated.
> 
> I will pick it up in my future optimization series. Then it will be
> easier to discuss it in context. The important part that I needed
> is an inlinable C implementation of mte_assign_mem_tag_range(), which
> I now have with this patch.

That's fine by me, but we may want to add some forced alignment of the
addr and size, as the loop here depends on them being aligned; otherwise
it gets stuck. mte_set_mem_tag_range() at least had a WARN_ON in
place. Here we could do:

	addr &= MTE_GRANULE_MASK;
	size = ALIGN(size, MTE_GRANULE_SIZE);

(or maybe trim "size" with MTE_GRANULE_MASK)

That's unless the call sites are well known and guarantee this
alignment (I only had a very brief look).
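
Concretely, folding that forced alignment into the inline helper would
look roughly like this (a sketch of the suggestion above, not what was
merged; size is still assumed non-zero, as in the original):

	static inline void mte_assign_mem_tag_range(void *addr, size_t size)
	{
		/* Round addr down and size up so the '!=' exit condition
		 * is always reached, even for unaligned callers. */
		u64 _addr = (u64)addr & MTE_GRANULE_MASK;
		u64 _end = _addr + ALIGN(size, MTE_GRANULE_SIZE);

		do {
			asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
				     :
				     : "r" (_addr)
				     : "memory");

			_addr += MTE_GRANULE_SIZE;
		} while (_addr != _end);
	}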
Andrey Konovalov Jan. 19, 2021, 7:34 p.m. UTC | #5
On Tue, Jan 19, 2021 at 8:00 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Tue, Jan 19, 2021 at 07:12:40PM +0100, Andrey Konovalov wrote:
> > On Tue, Jan 19, 2021 at 4:45 PM Vincenzo Frascino
> > <vincenzo.frascino@arm.com> wrote:
> > > On 1/19/21 2:45 PM, Catalin Marinas wrote:
> > > > On Mon, Jan 18, 2021 at 06:30:33PM +0000, Vincenzo Frascino wrote:
> > > >> mte_assign_mem_tag_range() is called on production KASAN HW hot
> > > >> paths. It makes sense to inline it in an attempt to reduce the
> > > >> overhead.
> > > >>
> > > >> Inline mte_assign_mem_tag_range() based on the indications provided at
> > > >> [1].
> > > >>
> > > >> [1] https://lore.kernel.org/r/CAAeHK+wCO+J7D1_T89DG+jJrPLk3X9RsGFKxJGd0ZcUFjQT-9Q@mail.gmail.com/
> > > >>
> > > >> Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > >> Cc: Will Deacon <will@kernel.org>
> > > >> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > >> ---
> > > >>  arch/arm64/include/asm/mte.h | 26 +++++++++++++++++++++++++-
> > > >>  arch/arm64/lib/mte.S         | 15 ---------------
> > > >>  2 files changed, 25 insertions(+), 16 deletions(-)
> > > >>
> > > >> diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
> > > >> index 237bb2f7309d..1a6fd53f82c3 100644
> > > >> --- a/arch/arm64/include/asm/mte.h
> > > >> +++ b/arch/arm64/include/asm/mte.h
> > > >> @@ -49,7 +49,31 @@ long get_mte_ctrl(struct task_struct *task);
> > > >>  int mte_ptrace_copy_tags(struct task_struct *child, long request,
> > > >>                       unsigned long addr, unsigned long data);
> > > >>
> > > >> -void mte_assign_mem_tag_range(void *addr, size_t size);
> > > >> +static inline void mte_assign_mem_tag_range(void *addr, size_t size)
> > > >> +{
> > > >> +    u64 _addr = (u64)addr;
> > > >> +    u64 _end = _addr + size;
> > > >> +
> > > >> +    /*
> > > >> +     * This function must be invoked from an MTE enabled context.
> > > >> +     *
> > > >> +     * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
> > > >> +     * size must be non-zero and MTE_GRANULE_SIZE aligned.
> > > >> +     */
> > > >> +    do {
> > > >> +            /*
> > > >> +             * 'asm volatile' is required to prevent the compiler from
> > > >> +             * moving the statement outside of the loop.
> > > >> +             */
> > > >> +            asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
> > > >> +                         :
> > > >> +                         : "r" (_addr)
> > > >> +                         : "memory");
> > > >> +
> > > >> +            _addr += MTE_GRANULE_SIZE;
> > > >> +    } while (_addr != _end);
> > > >> +}
> > > >
> > > > While I'm ok with moving this function to C, I don't think it solves the
> > > > inlining in the kasan code. The only interface we have to kasan is via
> > > > mte_{set,get}_mem_tag_range(), so the above function doesn't need to
> > > > live in a header.
> > > >
> > > > If you do want inlining all the way to the kasan code, we should
> > > > probably move the mte_{set,get}_mem_tag_range() functions to the header
> > > > as well (and ideally backed by some numbers to show that it matters).
> > > >
> > > > Moving it to mte.c also gives us more control over how it's called (we
> > > > have the WARN_ONs in place in the callers).
> > > >
> > >
> > > Based on the thread [1], this patch is only an intermediate step to allow
> > > KASAN to call mte_assign_mem_tag_range() directly in the future. At that
> > > point I think mte_set_mem_tag_range() can be removed.
> > >
> > > If you agree, I would leave things as they are to give Andrey a chance to
> > > execute on the original plan in a separate series.
> >
> > I think we should drop this patch from this series as it's unrelated.
> >
> > I will pick it up in my future optimization series. Then it will be
> > easier to discuss it in context. The important part that I needed
> > is an inlinable C implementation of mte_assign_mem_tag_range(), which
> > I now have with this patch.
>
> That's fine by me, but we may want to add some forced alignment of the
> addr and size, as the loop here depends on them being aligned; otherwise
> it gets stuck. mte_set_mem_tag_range() at least had a WARN_ON in
> place. Here we could do:
>
>         addr &= MTE_GRANULE_MASK;
>         size = ALIGN(size, MTE_GRANULE_SIZE);
>
> (or maybe trim "size" with MTE_GRANULE_MASK)
>
> That's unless the call sites are well known and guarantee this
> alignment (I only had a very brief look).

No problem. I'll either add the ALIGN or change the call site to
ensure alignment.
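
The call-site option mentioned above would roughly mean rounding up in the
KASAN code before the arch helper is reached, along these lines (a sketch;
kasan_unpoison(), get_tag(), kasan_reset_tag() and hw_set_mem_tag_range()
follow the mm/kasan naming of that period and are assumptions here):

	static inline void kasan_unpoison(const void *addr, size_t size)
	{
		u8 tag = get_tag(addr);

		/* KASAN_GRANULE_SIZE equals MTE_GRANULE_SIZE in HW-tags
		 * mode, so rounding up here guarantees the tagging loop
		 * terminates. */
		size = round_up(size, KASAN_GRANULE_SIZE);

		hw_set_mem_tag_range(kasan_reset_tag(addr), size, tag);
	}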

Patch

diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 237bb2f7309d..1a6fd53f82c3 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -49,7 +49,31 @@  long get_mte_ctrl(struct task_struct *task);
 int mte_ptrace_copy_tags(struct task_struct *child, long request,
 			 unsigned long addr, unsigned long data);
 
-void mte_assign_mem_tag_range(void *addr, size_t size);
+static inline void mte_assign_mem_tag_range(void *addr, size_t size)
+{
+	u64 _addr = (u64)addr;
+	u64 _end = _addr + size;
+
+	/*
+	 * This function must be invoked from an MTE enabled context.
+	 *
+	 * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
+	 * size must be non-zero and MTE_GRANULE_SIZE aligned.
+	 */
+	do {
+		/*
+		 * 'asm volatile' is required to prevent the compiler from
+		 * moving the statement outside of the loop.
+		 */
+		asm volatile(__MTE_PREAMBLE "stg %0, [%0]"
+			     :
+			     : "r" (_addr)
+			     : "memory");
+
+		_addr += MTE_GRANULE_SIZE;
+	} while (_addr != _end);
+}
+
 
 #else /* CONFIG_ARM64_MTE */
 
diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
index 9e1a12e10053..a0a650451510 100644
--- a/arch/arm64/lib/mte.S
+++ b/arch/arm64/lib/mte.S
@@ -150,18 +150,3 @@  SYM_FUNC_START(mte_restore_page_tags)
 	ret
 SYM_FUNC_END(mte_restore_page_tags)
 
-/*
- * Assign allocation tags for a region of memory based on the pointer tag
- *   x0 - source pointer
- *   x1 - size
- *
- * Note: The address must be non-NULL and MTE_GRANULE_SIZE aligned and
- * size must be non-zero and MTE_GRANULE_SIZE aligned.
- */
-SYM_FUNC_START(mte_assign_mem_tag_range)
-1:	stg	x0, [x0]
-	add	x0, x0, #MTE_GRANULE_SIZE
-	subs	x1, x1, #MTE_GRANULE_SIZE
-	b.gt	1b
-	ret
-SYM_FUNC_END(mte_assign_mem_tag_range)