
[v14,8/8] kselftest/arm64: Verify that TCO is enabled in load_unaligned_zeropad()

Message ID: 20210308161434.33424-9-vincenzo.frascino@arm.com
State New, archived
Series arm64: ARMv8.5-A: MTE: Add async mode support

Commit Message

Vincenzo Frascino March 8, 2021, 4:14 p.m. UTC
The load_unaligned_zeropad() and __get/put_kernel_nofault() functions can
read past the end of a buffer, possibly into an adjacent MTE granule with
a different tag.

When MTE async mode is enabled and such a load crosses into a granule
with a different tag, the PE sets the TFSR_EL1.TF1 bit as if an
asynchronous tag fault had occurred:

 ==================================================================
 BUG: KASAN: invalid-access
 Asynchronous mode enabled: no access details available

 CPU: 0 PID: 1 Comm: init Not tainted 5.12.0-rc1-ge1045c86620d-dirty #8
 Hardware name: FVP Base RevC (DT)
 Call trace:
   dump_backtrace+0x0/0x1c0
   show_stack+0x18/0x24
   dump_stack+0xcc/0x14c
   kasan_report_async+0x54/0x70
   mte_check_tfsr_el1+0x48/0x4c
   exit_to_user_mode+0x18/0x38
   finish_ret_to_user+0x4/0x15c
 ==================================================================

Verify that Tag Check Override (TCO) is enabled in these functions before
the load and disabled afterwards, to prevent this from happening.

Note: The issue has been observed only with an MTE enabled userspace.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reported-by: Branislav Rankov <Branislav.Rankov@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 .../arm64/mte/check_read_beyond_buffer.c      | 78 +++++++++++++++++++
 1 file changed, 78 insertions(+)
 create mode 100644 tools/testing/selftests/arm64/mte/check_read_beyond_buffer.c

Comments

Catalin Marinas March 11, 2021, 1:25 p.m. UTC | #1
On Mon, Mar 08, 2021 at 04:14:34PM +0000, Vincenzo Frascino wrote:
> Note: The issue has been observed only with an MTE enabled userspace.

The above bug is all about kernel buffers. While userspace can trigger
the relevant code paths, it should not matter whether the user has MTE
enabled or not. Can you please confirm that you can still trigger the
fault with kernel-mode MTE but non-MTE user-space? If not, we may have a
bug somewhere, as the two are unrelated: load_unaligned_zeropad() only
acts on kernel buffers, which are subject to the kernel MTE tag check
fault mode.

I don't think we should have a user-space selftest for this. The bug is
not about a user-kernel interface, so an in-kernel test is more
appropriate. Could we instead add this to the kasan tests and call
load_unaligned_zeropad() and other functions directly?

Vincenzo Frascino March 11, 2021, 3 p.m. UTC | #2
On 3/11/21 1:25 PM, Catalin Marinas wrote:
> On Mon, Mar 08, 2021 at 04:14:34PM +0000, Vincenzo Frascino wrote:
>> Note: The issue has been observed only with an MTE enabled userspace.
> 
> The above bug is all about kernel buffers. While userspace can trigger
> the relevant code paths, it should not matter whether the user has MTE
> enabled or not. Can you please confirm that you can still trigger the
> fault with kernel-mode MTE but non-MTE user-space? If not, we may have a
> bug somewhere, as the two are unrelated: load_unaligned_zeropad() only
> acts on kernel buffers, which are subject to the kernel MTE tag check
> fault mode.
>

I retried and you are right: it does not matter whether it is an MTE or a
non-MTE user-space. The problem seems to be that this test does not trigger
the issue every time, which probably led me to the wrong conclusion.

> I don't think we should have a user-space selftest for this. The bug is
> not about a user-kernel interface, so an in-kernel test is more
> appropriate. Could we instead add this to the kasan tests and call
> load_unaligned_zeropad() and other functions directly?
> 

I agree with you that, given my comment above, we should abandon this way
of triggering the issue. I will investigate the option of a kasan test and
try to come up with one that calls the relevant functions directly. Since
the rest of the series is almost ready, though, I would prefer to post it
in a future series. What do you think?

Catalin Marinas March 11, 2021, 4:28 p.m. UTC | #3
On Thu, Mar 11, 2021 at 03:00:26PM +0000, Vincenzo Frascino wrote:
> On 3/11/21 1:25 PM, Catalin Marinas wrote:
> > On Mon, Mar 08, 2021 at 04:14:34PM +0000, Vincenzo Frascino wrote:
> >> Note: The issue has been observed only with an MTE enabled userspace.
> > 
> > The above bug is all about kernel buffers. While userspace can trigger
> > the relevant code paths, it should not matter whether the user has MTE
> > enabled or not. Can you please confirm that you can still trigger the
> > fault with kernel-mode MTE but non-MTE user-space? If not, we may have a
> > bug somewhere, as the two are unrelated: load_unaligned_zeropad() only
> > acts on kernel buffers, which are subject to the kernel MTE tag check
> > fault mode.
> 
> I retried and you are right: it does not matter whether it is an MTE or a
> non-MTE user-space. The problem seems to be that this test does not trigger
> the issue every time, which probably led me to the wrong conclusion.

Keep the test around for some quick checks before you get the kasan
test support.

> > I don't think we should have a user-space selftest for this. The bug is
> > not about a user-kernel interface, so an in-kernel test is more
> > appropriate. Could we instead add this to the kasan tests and call
> > load_unaligned_zeropad() and other functions directly?
> 
> I agree with you that, given my comment above, we should abandon this way
> of triggering the issue. I will investigate the option of a kasan test and
> try to come up with one that calls the relevant functions directly. Since
> the rest of the series is almost ready, though, I would prefer to post it
> in a future series. What do you think?

That's fine by me.

Vincenzo Frascino March 11, 2021, 4:34 p.m. UTC | #4
On 3/11/21 4:28 PM, Catalin Marinas wrote:
> On Thu, Mar 11, 2021 at 03:00:26PM +0000, Vincenzo Frascino wrote:
>> On 3/11/21 1:25 PM, Catalin Marinas wrote:
>>> On Mon, Mar 08, 2021 at 04:14:34PM +0000, Vincenzo Frascino wrote:
>>>> Note: The issue has been observed only with an MTE enabled userspace.
>>>
>>> The above bug is all about kernel buffers. While userspace can trigger
>>> the relevant code paths, it should not matter whether the user has MTE
>>> enabled or not. Can you please confirm that you can still trigger the
>>> fault with kernel-mode MTE but non-MTE user-space? If not, we may have a
>>> bug somewhere, as the two are unrelated: load_unaligned_zeropad() only
>>> acts on kernel buffers, which are subject to the kernel MTE tag check
>>> fault mode.
>>
>> I retried and you are right: it does not matter whether it is an MTE or a
>> non-MTE user-space. The problem seems to be that this test does not trigger
>> the issue every time, which probably led me to the wrong conclusion.
> 
> Keep the test around for some quick checks before you get the kasan
> test support.
> 

Of course, I never throw away my code.

>>> I don't think we should have a user-space selftest for this. The bug is
>>> not about a user-kernel interface, so an in-kernel test is more
>>> appropriate. Could we instead add this to the kasan tests and call
>>> load_unaligned_zeropad() and other functions directly?
>>
>> I agree with you that, given my comment above, we should abandon this way
>> of triggering the issue. I will investigate the option of a kasan test and
>> try to come up with one that calls the relevant functions directly. Since
>> the rest of the series is almost ready, though, I would prefer to post it
>> in a future series. What do you think?
> 
> That's fine by me.
>

Patch

diff --git a/tools/testing/selftests/arm64/mte/check_read_beyond_buffer.c b/tools/testing/selftests/arm64/mte/check_read_beyond_buffer.c
new file mode 100644
index 000000000000..eb03cd52a58e
--- /dev/null
+++ b/tools/testing/selftests/arm64/mte/check_read_beyond_buffer.c
@@ -0,0 +1,78 @@ 
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2020 ARM Limited
+
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <time.h>
+#include <unistd.h>
+#include <sys/auxv.h>
+#include <sys/mman.h>
+#include <sys/prctl.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "kselftest.h"
+#include "mte_common_util.h"
+#include "mte_def.h"
+
+#define NUM_DEVICES		8
+
+static char *dev[NUM_DEVICES] = {
+	"/proc/cmdline",
+	"/fstab.fvp",
+	"/dev/null",
+	"/proc/mounts",
+	"/proc/filesystems",
+	"/proc/cmdline",
+	"/proc/device-tre", /* incorrect path */
+	"",
+};
+
+#define FAKE_PERMISSION		0x88000
+#define MAX_DESCRIPTOR		0xffffffff
+
+int mte_read_beyond_buffer_test(void)
+{
+	int fd[NUM_DEVICES];
+	unsigned int _desc = 0, _dev;
+
+	do {
+		for (_dev = 0; _dev < NUM_DEVICES; _dev++) {
+#ifdef _TEST_DEBUG
+			printf("[TEST]: openat(0x%x, %s, 0x%x)\n", _desc, dev[_dev], FAKE_PERMISSION);
+#endif
+			/* Some entries are expected to fail; errors are not checked here. */
+			fd[_dev] = openat((int)_desc, dev[_dev], FAKE_PERMISSION);
+		}
+
+		for (_dev = 0; _dev < NUM_DEVICES; _dev++)
+			close(fd[_dev]);
+	} while (_desc++ != MAX_DESCRIPTOR);	/* MAX_DESCRIPTOR is the last value tried */
+
+	return KSFT_PASS;
+}
+
+int main(int argc, char *argv[])
+{
+	int err;
+
+	err = mte_default_setup();
+	if (err)
+		return err;
+
+	ksft_set_plan(1);
+
+	evaluate_test(mte_read_beyond_buffer_test(),
+		"Verify that TCO is enabled correctly if a read beyond buffer occurs\n");
+
+	mte_restore_setup();
+	ksft_print_cnts();
+
+	return ksft_get_fail_cnt() == 0 ? KSFT_PASS : KSFT_FAIL;
+}