[RFC6,v6,00/21] ILP32 for ARM64

Message ID	20160512002000.GA30997@yury-N73SV (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> Date: Thu, 12 May 2016 03:20:00 +0300 From: Yury Norov <ynorov@caviumnetworks.com> To: <arnd@arndb.de>, <catalin.marinas@arm.com>, <linux-arm-kernel@lists.infradead.org>, <linux-kernel@vger.kernel.org> Subject: Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64 Message-ID: <20160512002000.GA30997@yury-N73SV> References: <1459894127-17698-1-git-send-email-ynorov@caviumnetworks.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1459894127-17698-1-git-send-email-ynorov@caviumnetworks.com> User-Agent: Mutt/1.5.23 (2014-03-12) SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM Precedence: list Cc: linux-arch@vger.kernel.org, linux-s390@vger.kernel.org, pinskia@gmail.com, Prasun.Kapoor@caviumnetworks.com, heiko.carstens@de.ibm.com, linux-doc@vger.kernel.org, Nathan_Lynch@mentor.com, agraf@suse.de, klimov.linux@gmail.com, broonie@kernel.org, bamvor.zhangjian@huawei.com, schwab@suse.de, schwidefsky@de.ibm.com, joseph@codesourcery.com, christoph.muellner@theobroma-systems.com Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Yury Norov May 12, 2016, 12:20 a.m. UTC

Hi,

I debugged preadv02 and pwritev02 failures and found very weird bug.
Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
of vector, and kernel reports successful read/write.

There are 2 problems:
1. How kernel allows such address to be passed to fs subsystem;
2. How fs successes to read/write at non-mapped, and in fact non-user
address.

I don't know the answer on 2'nd question, and it might be something
generic. But I investigated first problem.

The problem is that compat_rw_copy_check_uvector() uses access_ok() to
validate user address, and on arm64 it ends up with checking buffer
end against current_thread_info()->addr_limit.

current_thread_info()->addr_limit for ilp32, and most probably for
aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
It happens because on thread creation we call flush_old_exec() to set 
addr_limit, and completely ignore compat mode there.

This patch fixes it. It also fixes USER_DS macro to return different
values depending on compat.

This patch is enough to handle preadv02 and pwritev02, but problem #2
is still there.

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 arch/arm64/include/asm/elf.h     | 1 +
 arch/arm64/include/asm/uaccess.h | 2 +-
 arch/arm64/kernel/binfmt_elf32.c | 1 +
 arch/arm64/kernel/binfmt_ilp32.c | 1 +
 4 files changed, 4 insertions(+), 1 deletion(-)

Arnd Bergmann May 12, 2016, 9:19 a.m. UTC | #1

On Thursday 12 May 2016 03:20:00 Yury Norov wrote:
> 
> I debugged preadv02 and pwritev02 failures and found very weird bug.
> Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> of vector, and kernel reports successful read/write.
> 
> There are 2 problems:
> 1. How kernel allows such address to be passed to fs subsystem;
> 2. How fs successes to read/write at non-mapped, and in fact non-user
> address.
> 
> I don't know the answer on 2'nd question, and it might be something
> generic. But I investigated first problem.
> 
> The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> validate user address, and on arm64 it ends up with checking buffer
> end against current_thread_info()->addr_limit.
> 
> current_thread_info()->addr_limit for ilp32, and most probably for
> aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> It happens because on thread creation we call flush_old_exec() to set 
> addr_limit, and completely ignore compat mode there.
> 
> This patch fixes it. It also fixes USER_DS macro to return different
> values depending on compat.
> 
> This patch is enough to handle preadv02 and pwritev02, but problem #2
> is still there.
> 
> Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
> 

Good catch!

Can you do a version of this patch that works on the current
mainline kernel and can be backported to fix aarch32 emulation?

For ilp32 mode, I think we can better fix arch/arm64/kernel/binfmt_ilp32.c
as it is introduced.

	Arnd

Yury Norov May 12, 2016, 10:30 a.m. UTC | #2

On Thu, May 12, 2016 at 11:19:21AM +0200, Arnd Bergmann wrote:
> On Thursday 12 May 2016 03:20:00 Yury Norov wrote:
> > 
> > I debugged preadv02 and pwritev02 failures and found very weird bug.
> > Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> > of vector, and kernel reports successful read/write.
> > 
> > There are 2 problems:
> > 1. How kernel allows such address to be passed to fs subsystem;
> > 2. How fs successes to read/write at non-mapped, and in fact non-user
> > address.
> > 
> > I don't know the answer on 2'nd question, and it might be something
> > generic. But I investigated first problem.
> > 
> > The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> > validate user address, and on arm64 it ends up with checking buffer
> > end against current_thread_info()->addr_limit.
> > 
> > current_thread_info()->addr_limit for ilp32, and most probably for
> > aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> > It happens because on thread creation we call flush_old_exec() to set 
> > addr_limit, and completely ignore compat mode there.
> > 
> > This patch fixes it. It also fixes USER_DS macro to return different
> > values depending on compat.
> > 
> > This patch is enough to handle preadv02 and pwritev02, but problem #2
> > is still there.
> > 
> > Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
> > 
> 
> Good catch!
> 
> Can you do a version of this patch that works on the current
> mainline kernel and can be backported to fix aarch32 emulation?
> 
> For ilp32 mode, I think we can better fix arch/arm64/kernel/binfmt_ilp32.c
> as it is introduced.
> 
> 	Arnd
> 

OK, will do

> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Catalin Marinas May 12, 2016, 1:35 p.m. UTC | #3

On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
> I debugged preadv02 and pwritev02 failures and found very weird bug.
> Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> of vector, and kernel reports successful read/write.
> 
> There are 2 problems:
> 1. How kernel allows such address to be passed to fs subsystem;
> 2. How fs successes to read/write at non-mapped, and in fact non-user
> address.
> 
> I don't know the answer on 2'nd question, and it might be something
> generic. But I investigated first problem.
> 
> The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> validate user address, and on arm64 it ends up with checking buffer
> end against current_thread_info()->addr_limit.
> 
> current_thread_info()->addr_limit for ilp32, and most probably for
> aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> It happens because on thread creation we call flush_old_exec() to set 
> addr_limit, and completely ignore compat mode there.

I assume accesses beyond this address would fault anyway but I haven't
checked the code paths.

> diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
> index 7a39683..6ba4952 100644
> --- a/arch/arm64/include/asm/elf.h
> +++ b/arch/arm64/include/asm/elf.h
> @@ -146,6 +146,7 @@ typedef struct user_fpsimd_state elf_fpregset_t;
>  do {						\
>  	clear_thread_flag(TIF_32BIT_AARCH64);	\
>  	clear_thread_flag(TIF_32BIT);		\
> +	set_fs(TASK_SIZE_64);			\
>  } while (0)

See below.

> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 19cfdc5..3b0dd8d 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -51,7 +51,7 @@
>  #define KERNEL_DS	(-1UL)
>  #define get_ds()	(KERNEL_DS)
>  
> -#define USER_DS		TASK_SIZE_64
> +#define USER_DS		TASK_SIZE

I agree with this.

>  #define get_fs()	(current_thread_info()->addr_limit)
>  
>  static inline void set_fs(mm_segment_t fs)
> diff --git a/arch/arm64/kernel/binfmt_elf32.c b/arch/arm64/kernel/binfmt_elf32.c
> index 5487872..2e8d9f3 100644
> --- a/arch/arm64/kernel/binfmt_elf32.c
> +++ b/arch/arm64/kernel/binfmt_elf32.c
> @@ -12,6 +12,7 @@
>  do {						\
>  	clear_thread_flag(TIF_32BIT_AARCH64);	\
>  	set_thread_flag(TIF_32BIT);		\
> +	set_fs(TASK_SIZE_32);			\
>  } while (0)
>  
>  #define COMPAT_ARCH_DLINFO
> diff --git a/arch/arm64/kernel/binfmt_ilp32.c b/arch/arm64/kernel/binfmt_ilp32.c
> index a934fd4..a8599c6 100644
> --- a/arch/arm64/kernel/binfmt_ilp32.c
> +++ b/arch/arm64/kernel/binfmt_ilp32.c
> @@ -59,6 +59,7 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
>  do {									\
>  	set_thread_flag(TIF_32BIT_AARCH64);				\
>  	clear_thread_flag(TIF_32BIT);					\
> +	set_fs(TASK_SIZE_32);						\
>  } while (0)

I don't think we need these two. AFAICT, flush_old_exec() takes care of
setting the USER_DS for the new thread.

Yury Norov May 12, 2016, 1:44 p.m. UTC | #4

On Thu, May 12, 2016 at 02:35:34PM +0100, Catalin Marinas wrote:
> On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
> > I debugged preadv02 and pwritev02 failures and found very weird bug.
> > Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> > of vector, and kernel reports successful read/write.
> > 
> > There are 2 problems:
> > 1. How kernel allows such address to be passed to fs subsystem;
> > 2. How fs successes to read/write at non-mapped, and in fact non-user
> > address.
> > 
> > I don't know the answer on 2'nd question, and it might be something
> > generic. But I investigated first problem.
> > 
> > The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> > validate user address, and on arm64 it ends up with checking buffer
> > end against current_thread_info()->addr_limit.
> > 
> > current_thread_info()->addr_limit for ilp32, and most probably for
> > aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> > It happens because on thread creation we call flush_old_exec() to set 
> > addr_limit, and completely ignore compat mode there.
> 
> I assume accesses beyond this address would fault anyway but I haven't
> checked the code paths.
> 
> > diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
> > index 7a39683..6ba4952 100644
> > --- a/arch/arm64/include/asm/elf.h
> > +++ b/arch/arm64/include/asm/elf.h
> > @@ -146,6 +146,7 @@ typedef struct user_fpsimd_state elf_fpregset_t;
> >  do {						\
> >  	clear_thread_flag(TIF_32BIT_AARCH64);	\
> >  	clear_thread_flag(TIF_32BIT);		\
> > +	set_fs(TASK_SIZE_64);			\
> >  } while (0)
> 
> See below.
> 
> > diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> > index 19cfdc5..3b0dd8d 100644
> > --- a/arch/arm64/include/asm/uaccess.h
> > +++ b/arch/arm64/include/asm/uaccess.h
> > @@ -51,7 +51,7 @@
> >  #define KERNEL_DS	(-1UL)
> >  #define get_ds()	(KERNEL_DS)
> >  
> > -#define USER_DS		TASK_SIZE_64
> > +#define USER_DS		TASK_SIZE
> 
> I agree with this.
> 
> >  #define get_fs()	(current_thread_info()->addr_limit)
> >  
> >  static inline void set_fs(mm_segment_t fs)
> > diff --git a/arch/arm64/kernel/binfmt_elf32.c b/arch/arm64/kernel/binfmt_elf32.c
> > index 5487872..2e8d9f3 100644
> > --- a/arch/arm64/kernel/binfmt_elf32.c
> > +++ b/arch/arm64/kernel/binfmt_elf32.c
> > @@ -12,6 +12,7 @@
> >  do {						\
> >  	clear_thread_flag(TIF_32BIT_AARCH64);	\
> >  	set_thread_flag(TIF_32BIT);		\
> > +	set_fs(TASK_SIZE_32);			\
> >  } while (0)
> >  
> >  #define COMPAT_ARCH_DLINFO
> > diff --git a/arch/arm64/kernel/binfmt_ilp32.c b/arch/arm64/kernel/binfmt_ilp32.c
> > index a934fd4..a8599c6 100644
> > --- a/arch/arm64/kernel/binfmt_ilp32.c
> > +++ b/arch/arm64/kernel/binfmt_ilp32.c
> > @@ -59,6 +59,7 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
> >  do {									\
> >  	set_thread_flag(TIF_32BIT_AARCH64);				\
> >  	clear_thread_flag(TIF_32BIT);					\
> > +	set_fs(TASK_SIZE_32);						\
> >  } while (0)
> 
> I don't think we need these two. AFAICT, flush_old_exec() takes care of
> setting the USER_DS for the new thread.

That's true, but USER_DS depends on personality which is not set yet
for new thread, as I wrote above. In fact, I tried correct USER_DS
only, and it doesn't work

> 
> -- 
> Catalin

Catalin Marinas May 12, 2016, 2:07 p.m. UTC | #5

On Thu, May 12, 2016 at 04:44:31PM +0300, Yury Norov wrote:
> On Thu, May 12, 2016 at 02:35:34PM +0100, Catalin Marinas wrote:
> > On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
> > > I debugged preadv02 and pwritev02 failures and found very weird bug.
> > > Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> > > of vector, and kernel reports successful read/write.
> > > 
> > > There are 2 problems:
> > > 1. How kernel allows such address to be passed to fs subsystem;
> > > 2. How fs successes to read/write at non-mapped, and in fact non-user
> > > address.
> > > 
> > > I don't know the answer on 2'nd question, and it might be something
> > > generic. But I investigated first problem.
> > > 
> > > The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> > > validate user address, and on arm64 it ends up with checking buffer
> > > end against current_thread_info()->addr_limit.
> > > 
> > > current_thread_info()->addr_limit for ilp32, and most probably for
> > > aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> > > It happens because on thread creation we call flush_old_exec() to set 
> > > addr_limit, and completely ignore compat mode there.

[...]

> > > --- a/arch/arm64/kernel/binfmt_elf32.c
> > > +++ b/arch/arm64/kernel/binfmt_elf32.c
> > > @@ -12,6 +12,7 @@
> > >  do {						\
> > >  	clear_thread_flag(TIF_32BIT_AARCH64);	\
> > >  	set_thread_flag(TIF_32BIT);		\
> > > +	set_fs(TASK_SIZE_32);			\
> > >  } while (0)
> > >  
> > >  #define COMPAT_ARCH_DLINFO
> > > diff --git a/arch/arm64/kernel/binfmt_ilp32.c b/arch/arm64/kernel/binfmt_ilp32.c
> > > index a934fd4..a8599c6 100644
> > > --- a/arch/arm64/kernel/binfmt_ilp32.c
> > > +++ b/arch/arm64/kernel/binfmt_ilp32.c
> > > @@ -59,6 +59,7 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
> > >  do {									\
> > >  	set_thread_flag(TIF_32BIT_AARCH64);				\
> > >  	clear_thread_flag(TIF_32BIT);					\
> > > +	set_fs(TASK_SIZE_32);						\
> > >  } while (0)
> > 
> > I don't think we need these two. AFAICT, flush_old_exec() takes care of
> > setting the USER_DS for the new thread.
> 
> That's true, but USER_DS depends on personality which is not set yet
> for new thread, as I wrote above. In fact, I tried correct USER_DS
> only, and it doesn't work

Ah, it looks like load_elf_binary() sets the personality after
flush_old_exec(). Looking at powerpc and x86, they set USER_DS to the
maximum 64-bit task value, so they should have a similar issue with
native 32-bit vs compat behaviour.

So what exactly is LTP complaining about? Is different error (like
EFAULT vs EINVAL) or not getting an error at all.

(I need to update my LTP, the preadv tests appeared in December last
year)

Catalin Marinas May 12, 2016, 2:20 p.m. UTC | #6

On Thu, May 12, 2016 at 03:07:35PM +0100, Catalin Marinas wrote:
> On Thu, May 12, 2016 at 04:44:31PM +0300, Yury Norov wrote:
> > On Thu, May 12, 2016 at 02:35:34PM +0100, Catalin Marinas wrote:
> > > On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
> > > > I debugged preadv02 and pwritev02 failures and found very weird bug.
> > > > Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> > > > of vector, and kernel reports successful read/write.
> > > > 
> > > > There are 2 problems:
> > > > 1. How kernel allows such address to be passed to fs subsystem;
> > > > 2. How fs successes to read/write at non-mapped, and in fact non-user
> > > > address.
> > > > 
> > > > I don't know the answer on 2'nd question, and it might be something
> > > > generic. But I investigated first problem.
> > > > 
> > > > The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> > > > validate user address, and on arm64 it ends up with checking buffer
> > > > end against current_thread_info()->addr_limit.
> > > > 
> > > > current_thread_info()->addr_limit for ilp32, and most probably for
> > > > aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> > > > It happens because on thread creation we call flush_old_exec() to set 
> > > > addr_limit, and completely ignore compat mode there.
> 
> [...]
> 
> > > > --- a/arch/arm64/kernel/binfmt_elf32.c
> > > > +++ b/arch/arm64/kernel/binfmt_elf32.c
> > > > @@ -12,6 +12,7 @@
> > > >  do {						\
> > > >  	clear_thread_flag(TIF_32BIT_AARCH64);	\
> > > >  	set_thread_flag(TIF_32BIT);		\
> > > > +	set_fs(TASK_SIZE_32);			\
> > > >  } while (0)
> > > >  
> > > >  #define COMPAT_ARCH_DLINFO
> > > > diff --git a/arch/arm64/kernel/binfmt_ilp32.c b/arch/arm64/kernel/binfmt_ilp32.c
> > > > index a934fd4..a8599c6 100644
> > > > --- a/arch/arm64/kernel/binfmt_ilp32.c
> > > > +++ b/arch/arm64/kernel/binfmt_ilp32.c
> > > > @@ -59,6 +59,7 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
> > > >  do {									\
> > > >  	set_thread_flag(TIF_32BIT_AARCH64);				\
> > > >  	clear_thread_flag(TIF_32BIT);					\
> > > > +	set_fs(TASK_SIZE_32);						\
> > > >  } while (0)
> > > 
> > > I don't think we need these two. AFAICT, flush_old_exec() takes care of
> > > setting the USER_DS for the new thread.
> > 
> > That's true, but USER_DS depends on personality which is not set yet
> > for new thread, as I wrote above. In fact, I tried correct USER_DS
> > only, and it doesn't work
> 
> Ah, it looks like load_elf_binary() sets the personality after
> flush_old_exec(). Looking at powerpc and x86, they set USER_DS to the
> maximum 64-bit task value, so they should have a similar issue with
> native 32-bit vs compat behaviour.

I think we have another problem. flush_old_exec() calls the arm64
flush_thread() where tls_thread_flush() checks for is_compat_task(). So
starting a 32-bit application from a 64-bit one not go on the correct
path.

Yury Norov May 12, 2016, 2:24 p.m. UTC | #7

On Thu, May 12, 2016 at 03:07:35PM +0100, Catalin Marinas wrote:
> On Thu, May 12, 2016 at 04:44:31PM +0300, Yury Norov wrote:
> > On Thu, May 12, 2016 at 02:35:34PM +0100, Catalin Marinas wrote:
> > > On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
> > > > I debugged preadv02 and pwritev02 failures and found very weird bug.
> > > > Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> > > > of vector, and kernel reports successful read/write.
> > > > 
> > > > There are 2 problems:
> > > > 1. How kernel allows such address to be passed to fs subsystem;
> > > > 2. How fs successes to read/write at non-mapped, and in fact non-user
> > > > address.
> > > > 
> > > > I don't know the answer on 2'nd question, and it might be something
> > > > generic. But I investigated first problem.
> > > > 
> > > > The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> > > > validate user address, and on arm64 it ends up with checking buffer
> > > > end against current_thread_info()->addr_limit.
> > > > 
> > > > current_thread_info()->addr_limit for ilp32, and most probably for
> > > > aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> > > > It happens because on thread creation we call flush_old_exec() to set 
> > > > addr_limit, and completely ignore compat mode there.
> 
> [...]
> 
> > > > --- a/arch/arm64/kernel/binfmt_elf32.c
> > > > +++ b/arch/arm64/kernel/binfmt_elf32.c
> > > > @@ -12,6 +12,7 @@
> > > >  do {						\
> > > >  	clear_thread_flag(TIF_32BIT_AARCH64);	\
> > > >  	set_thread_flag(TIF_32BIT);		\
> > > > +	set_fs(TASK_SIZE_32);			\
> > > >  } while (0)
> > > >  
> > > >  #define COMPAT_ARCH_DLINFO
> > > > diff --git a/arch/arm64/kernel/binfmt_ilp32.c b/arch/arm64/kernel/binfmt_ilp32.c
> > > > index a934fd4..a8599c6 100644
> > > > --- a/arch/arm64/kernel/binfmt_ilp32.c
> > > > +++ b/arch/arm64/kernel/binfmt_ilp32.c
> > > > @@ -59,6 +59,7 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
> > > >  do {									\
> > > >  	set_thread_flag(TIF_32BIT_AARCH64);				\
> > > >  	clear_thread_flag(TIF_32BIT);					\
> > > > +	set_fs(TASK_SIZE_32);						\
> > > >  } while (0)
> > > 
> > > I don't think we need these two. AFAICT, flush_old_exec() takes care of
> > > setting the USER_DS for the new thread.
> > 
> > That's true, but USER_DS depends on personality which is not set yet
> > for new thread, as I wrote above. In fact, I tried correct USER_DS
> > only, and it doesn't work
> 
> Ah, it looks like load_elf_binary() sets the personality after
> flush_old_exec(). Looking at powerpc and x86, they set USER_DS to the
> maximum 64-bit task value, so they should have a similar issue with
> native 32-bit vs compat behaviour.

Hmmm. If so, it means we'd introduce generic fix. It would be removing 
set_fs() from flush_old_exec() and appending it to load_elf_binary()
after SET_PERSONALITY(). But I think it should be agreed with other
arches developers. I've sent standalone patch for aarch64 (you in CC)
so let's move discussion there.

> So what exactly is LTP complaining about? Is different error (like
> EFAULT vs EINVAL) or not getting an error at all.
 
It should be EINVAL, but it succeed. The other problem is that
following fs routines does not complain on wrong address.

> (I need to update my LTP, the preadv tests appeared in December last
> year)
> 

preadv02 was extended with this testcase in April.

> -- 
> Catalin

Yury Norov May 12, 2016, 2:34 p.m. UTC | #8

On Thu, May 12, 2016 at 03:20:16PM +0100, Catalin Marinas wrote:
> On Thu, May 12, 2016 at 03:07:35PM +0100, Catalin Marinas wrote:
> > On Thu, May 12, 2016 at 04:44:31PM +0300, Yury Norov wrote:
> > > On Thu, May 12, 2016 at 02:35:34PM +0100, Catalin Marinas wrote:
> > > > On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
> > > > > I debugged preadv02 and pwritev02 failures and found very weird bug.
> > > > > Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> > > > > of vector, and kernel reports successful read/write.
> > > > > 
> > > > > There are 2 problems:
> > > > > 1. How kernel allows such address to be passed to fs subsystem;
> > > > > 2. How fs successes to read/write at non-mapped, and in fact non-user
> > > > > address.
> > > > > 
> > > > > I don't know the answer on 2'nd question, and it might be something
> > > > > generic. But I investigated first problem.
> > > > > 
> > > > > The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> > > > > validate user address, and on arm64 it ends up with checking buffer
> > > > > end against current_thread_info()->addr_limit.
> > > > > 
> > > > > current_thread_info()->addr_limit for ilp32, and most probably for
> > > > > aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> > > > > It happens because on thread creation we call flush_old_exec() to set 
> > > > > addr_limit, and completely ignore compat mode there.
> > 
> > [...]
> > 
> > > > > --- a/arch/arm64/kernel/binfmt_elf32.c
> > > > > +++ b/arch/arm64/kernel/binfmt_elf32.c
> > > > > @@ -12,6 +12,7 @@
> > > > >  do {						\
> > > > >  	clear_thread_flag(TIF_32BIT_AARCH64);	\
> > > > >  	set_thread_flag(TIF_32BIT);		\
> > > > > +	set_fs(TASK_SIZE_32);			\
> > > > >  } while (0)
> > > > >  
> > > > >  #define COMPAT_ARCH_DLINFO
> > > > > diff --git a/arch/arm64/kernel/binfmt_ilp32.c b/arch/arm64/kernel/binfmt_ilp32.c
> > > > > index a934fd4..a8599c6 100644
> > > > > --- a/arch/arm64/kernel/binfmt_ilp32.c
> > > > > +++ b/arch/arm64/kernel/binfmt_ilp32.c
> > > > > @@ -59,6 +59,7 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
> > > > >  do {									\
> > > > >  	set_thread_flag(TIF_32BIT_AARCH64);				\
> > > > >  	clear_thread_flag(TIF_32BIT);					\
> > > > > +	set_fs(TASK_SIZE_32);						\
> > > > >  } while (0)
> > > > 
> > > > I don't think we need these two. AFAICT, flush_old_exec() takes care of
> > > > setting the USER_DS for the new thread.
> > > 
> > > That's true, but USER_DS depends on personality which is not set yet
> > > for new thread, as I wrote above. In fact, I tried correct USER_DS
> > > only, and it doesn't work
> > 
> > Ah, it looks like load_elf_binary() sets the personality after
> > flush_old_exec(). Looking at powerpc and x86, they set USER_DS to the
> > maximum 64-bit task value, so they should have a similar issue with
> > native 32-bit vs compat behaviour.
> 
> I think we have another problem. flush_old_exec() calls the arm64
> flush_thread() where tls_thread_flush() checks for is_compat_task(). So
> starting a 32-bit application from a 64-bit one not go on the correct
> path.

As per now, all native, aarch32 and ilp32 tasks can exec() any
binaries they need. Are you think it's wrong? If so, how we coild run
first compat application (maybe shell), it there are only lp64 tasks
on the system?

> 
> -- 
> Catalin

Catalin Marinas May 12, 2016, 2:54 p.m. UTC | #9

On Thu, May 12, 2016 at 05:34:15PM +0300, Yury Norov wrote:
> On Thu, May 12, 2016 at 03:20:16PM +0100, Catalin Marinas wrote:
> > On Thu, May 12, 2016 at 03:07:35PM +0100, Catalin Marinas wrote:
> > > On Thu, May 12, 2016 at 04:44:31PM +0300, Yury Norov wrote:
> > > > On Thu, May 12, 2016 at 02:35:34PM +0100, Catalin Marinas wrote:
> > > > > On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
> > > > > > I debugged preadv02 and pwritev02 failures and found very weird bug.
> > > > > > Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> > > > > > of vector, and kernel reports successful read/write.
> > > > > > 
> > > > > > There are 2 problems:
> > > > > > 1. How kernel allows such address to be passed to fs subsystem;
> > > > > > 2. How fs successes to read/write at non-mapped, and in fact non-user
> > > > > > address.
> > > > > > 
> > > > > > I don't know the answer on 2'nd question, and it might be something
> > > > > > generic. But I investigated first problem.
> > > > > > 
> > > > > > The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> > > > > > validate user address, and on arm64 it ends up with checking buffer
> > > > > > end against current_thread_info()->addr_limit.
> > > > > > 
> > > > > > current_thread_info()->addr_limit for ilp32, and most probably for
> > > > > > aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> > > > > > It happens because on thread creation we call flush_old_exec() to set 
> > > > > > addr_limit, and completely ignore compat mode there.
> > > 
> > > [...]
> > > 
> > > > > > --- a/arch/arm64/kernel/binfmt_elf32.c
> > > > > > +++ b/arch/arm64/kernel/binfmt_elf32.c
> > > > > > @@ -12,6 +12,7 @@
> > > > > >  do {						\
> > > > > >  	clear_thread_flag(TIF_32BIT_AARCH64);	\
> > > > > >  	set_thread_flag(TIF_32BIT);		\
> > > > > > +	set_fs(TASK_SIZE_32);			\
> > > > > >  } while (0)
> > > > > >  
> > > > > >  #define COMPAT_ARCH_DLINFO
> > > > > > diff --git a/arch/arm64/kernel/binfmt_ilp32.c b/arch/arm64/kernel/binfmt_ilp32.c
> > > > > > index a934fd4..a8599c6 100644
> > > > > > --- a/arch/arm64/kernel/binfmt_ilp32.c
> > > > > > +++ b/arch/arm64/kernel/binfmt_ilp32.c
> > > > > > @@ -59,6 +59,7 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
> > > > > >  do {									\
> > > > > >  	set_thread_flag(TIF_32BIT_AARCH64);				\
> > > > > >  	clear_thread_flag(TIF_32BIT);					\
> > > > > > +	set_fs(TASK_SIZE_32);						\
> > > > > >  } while (0)
> > > > > 
> > > > > I don't think we need these two. AFAICT, flush_old_exec() takes care of
> > > > > setting the USER_DS for the new thread.
> > > > 
> > > > That's true, but USER_DS depends on personality which is not set yet
> > > > for new thread, as I wrote above. In fact, I tried correct USER_DS
> > > > only, and it doesn't work
> > > 
> > > Ah, it looks like load_elf_binary() sets the personality after
> > > flush_old_exec(). Looking at powerpc and x86, they set USER_DS to the
> > > maximum 64-bit task value, so they should have a similar issue with
> > > native 32-bit vs compat behaviour.
> > 
> > I think we have another problem. flush_old_exec() calls the arm64
> > flush_thread() where tls_thread_flush() checks for is_compat_task(). So
> > starting a 32-bit application from a 64-bit one not go on the correct
> > path.
> 
> As per now, all native, aarch32 and ilp32 tasks can exec() any
> binaries they need.

And that's correct.

> Are you think it's wrong? If so, how we coild run
> first compat application (maybe shell), it there are only lp64 tasks
> on the system?

What I meant is that we rely on flush_old_exec() to initialise the TLS
register for the compat task but it currently depends on what the parent
is. I think tls_thread_flush() should actually drop the is_compat_task()
thread and always initialise all the TLS registers.

Yury Norov May 12, 2016, 3:27 p.m. UTC | #10

On Thu, May 12, 2016 at 03:54:03PM +0100, Catalin Marinas wrote:
> On Thu, May 12, 2016 at 05:34:15PM +0300, Yury Norov wrote:
> > On Thu, May 12, 2016 at 03:20:16PM +0100, Catalin Marinas wrote:
> > > On Thu, May 12, 2016 at 03:07:35PM +0100, Catalin Marinas wrote:
> > > > On Thu, May 12, 2016 at 04:44:31PM +0300, Yury Norov wrote:
> > > > > On Thu, May 12, 2016 at 02:35:34PM +0100, Catalin Marinas wrote:
> > > > > > On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
> > > > > > > I debugged preadv02 and pwritev02 failures and found very weird bug.
> > > > > > > Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> > > > > > > of vector, and kernel reports successful read/write.
> > > > > > > 
> > > > > > > There are 2 problems:
> > > > > > > 1. How kernel allows such address to be passed to fs subsystem;
> > > > > > > 2. How fs successes to read/write at non-mapped, and in fact non-user
> > > > > > > address.
> > > > > > > 
> > > > > > > I don't know the answer on 2'nd question, and it might be something
> > > > > > > generic. But I investigated first problem.
> > > > > > > 
> > > > > > > The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> > > > > > > validate user address, and on arm64 it ends up with checking buffer
> > > > > > > end against current_thread_info()->addr_limit.
> > > > > > > 
> > > > > > > current_thread_info()->addr_limit for ilp32, and most probably for
> > > > > > > aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> > > > > > > It happens because on thread creation we call flush_old_exec() to set 
> > > > > > > addr_limit, and completely ignore compat mode there.
> > > > 
> > > > [...]
> > > > 
> > > > > > > --- a/arch/arm64/kernel/binfmt_elf32.c
> > > > > > > +++ b/arch/arm64/kernel/binfmt_elf32.c
> > > > > > > @@ -12,6 +12,7 @@
> > > > > > >  do {						\
> > > > > > >  	clear_thread_flag(TIF_32BIT_AARCH64);	\
> > > > > > >  	set_thread_flag(TIF_32BIT);		\
> > > > > > > +	set_fs(TASK_SIZE_32);			\
> > > > > > >  } while (0)
> > > > > > >  
> > > > > > >  #define COMPAT_ARCH_DLINFO
> > > > > > > diff --git a/arch/arm64/kernel/binfmt_ilp32.c b/arch/arm64/kernel/binfmt_ilp32.c
> > > > > > > index a934fd4..a8599c6 100644
> > > > > > > --- a/arch/arm64/kernel/binfmt_ilp32.c
> > > > > > > +++ b/arch/arm64/kernel/binfmt_ilp32.c
> > > > > > > @@ -59,6 +59,7 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
> > > > > > >  do {									\
> > > > > > >  	set_thread_flag(TIF_32BIT_AARCH64);				\
> > > > > > >  	clear_thread_flag(TIF_32BIT);					\
> > > > > > > +	set_fs(TASK_SIZE_32);						\
> > > > > > >  } while (0)
> > > > > > 
> > > > > > I don't think we need these two. AFAICT, flush_old_exec() takes care of
> > > > > > setting the USER_DS for the new thread.
> > > > > 
> > > > > That's true, but USER_DS depends on personality which is not set yet
> > > > > for new thread, as I wrote above. In fact, I tried correct USER_DS
> > > > > only, and it doesn't work
> > > > 
> > > > Ah, it looks like load_elf_binary() sets the personality after
> > > > flush_old_exec(). Looking at powerpc and x86, they set USER_DS to the
> > > > maximum 64-bit task value, so they should have a similar issue with
> > > > native 32-bit vs compat behaviour.
> > > 
> > > I think we have another problem. flush_old_exec() calls the arm64
> > > flush_thread() where tls_thread_flush() checks for is_compat_task(). So
> > > starting a 32-bit application from a 64-bit one not go on the correct
> > > path.
> > 
> > As per now, all native, aarch32 and ilp32 tasks can exec() any
> > binaries they need.
> 
> And that's correct.
> 
> > Are you think it's wrong? If so, how we coild run
> > first compat application (maybe shell), it there are only lp64 tasks
> > on the system?
> 
> What I meant is that we rely on flush_old_exec() to initialise the TLS
> register for the compat task but it currently depends on what the parent
> is. I think tls_thread_flush() should actually drop the is_compat_task()
> thread and always initialise all the TLS registers.
> 

OK. I'll do it in v2.

> -- 
> Catalin

Catalin Marinas May 12, 2016, 3:28 p.m. UTC | #11

On Thu, May 12, 2016 at 05:24:57PM +0300, Yury Norov wrote:
> On Thu, May 12, 2016 at 03:07:35PM +0100, Catalin Marinas wrote:
> > On Thu, May 12, 2016 at 04:44:31PM +0300, Yury Norov wrote:
> > > On Thu, May 12, 2016 at 02:35:34PM +0100, Catalin Marinas wrote:
> > > > On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
> > > > > I debugged preadv02 and pwritev02 failures and found very weird bug.
> > > > > Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> > > > > of vector, and kernel reports successful read/write.
> > > > > 
> > > > > There are 2 problems:
> > > > > 1. How kernel allows such address to be passed to fs subsystem;
> > > > > 2. How fs successes to read/write at non-mapped, and in fact non-user
> > > > > address.
> > > > > 
> > > > > I don't know the answer on 2'nd question, and it might be something
> > > > > generic. But I investigated first problem.
> > > > > 
> > > > > The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> > > > > validate user address, and on arm64 it ends up with checking buffer
> > > > > end against current_thread_info()->addr_limit.
> > > > > 
> > > > > current_thread_info()->addr_limit for ilp32, and most probably for
> > > > > aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> > > > > It happens because on thread creation we call flush_old_exec() to set 
> > > > > addr_limit, and completely ignore compat mode there.
> > 
> > [...]
> > 
> > > > > --- a/arch/arm64/kernel/binfmt_elf32.c
> > > > > +++ b/arch/arm64/kernel/binfmt_elf32.c
> > > > > @@ -12,6 +12,7 @@
> > > > >  do {						\
> > > > >  	clear_thread_flag(TIF_32BIT_AARCH64);	\
> > > > >  	set_thread_flag(TIF_32BIT);		\
> > > > > +	set_fs(TASK_SIZE_32);			\
> > > > >  } while (0)
> > > > >  
> > > > >  #define COMPAT_ARCH_DLINFO
> > > > > diff --git a/arch/arm64/kernel/binfmt_ilp32.c b/arch/arm64/kernel/binfmt_ilp32.c
> > > > > index a934fd4..a8599c6 100644
> > > > > --- a/arch/arm64/kernel/binfmt_ilp32.c
> > > > > +++ b/arch/arm64/kernel/binfmt_ilp32.c
> > > > > @@ -59,6 +59,7 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
> > > > >  do {									\
> > > > >  	set_thread_flag(TIF_32BIT_AARCH64);				\
> > > > >  	clear_thread_flag(TIF_32BIT);					\
> > > > > +	set_fs(TASK_SIZE_32);						\
> > > > >  } while (0)
> > > > 
> > > > I don't think we need these two. AFAICT, flush_old_exec() takes care of
> > > > setting the USER_DS for the new thread.
> > > 
> > > That's true, but USER_DS depends on personality which is not set yet
> > > for new thread, as I wrote above. In fact, I tried correct USER_DS
> > > only, and it doesn't work
> > 
> > Ah, it looks like load_elf_binary() sets the personality after
> > flush_old_exec(). Looking at powerpc and x86, they set USER_DS to the
> > maximum 64-bit task value, so they should have a similar issue with
> > native 32-bit vs compat behaviour.
> 
> Hmmm. If so, it means we'd introduce generic fix. It would be removing 
> set_fs() from flush_old_exec() and appending it to load_elf_binary()
> after SET_PERSONALITY(). But I think it should be agreed with other
> arches developers.

The set_fs() in flush_old_exec() is probably fine, it may be meant to
re-set the USER_DS for the old thread.

It appears that at least powerpc and x86 don't have different USER_DS
setting for native and compat, so moving the set_fs() call further down
would not make any difference for them, nor will it fix the preadv02 LTP
test (if it fails for them, I haven't checked).

> I've sent standalone patch for aarch64 (you in CC) so let's move
> discussion there.

I've seen the patch but we would lose some discussion history here. I
think we should continue this thread and just summarise the conclusion
in reply to the other patch. This thread is also available on
linux-arch, in case other architecture maintainers follow it.

> > So what exactly is LTP complaining about? Is different error (like
> > EFAULT vs EINVAL) or not getting an error at all.
>  
> It should be EINVAL, but it succeed. The other problem is that
> following fs routines does not complain on wrong address.

I see. The test asks the kernel to write a single byte (out of maximum
64) to the user address 0xffffffff. In the absence of the access_ok()
check, this operation succeeds. If the preadv syscall gets 2 bytes as
the count, then it would fail with EFAULT.

While it's not really a bug, I agree that for matching the native 32-bit
behavior (basically for other syscalls like those involving vfs_read()),
the simplest fix would be to have a dynamic USER_DS.

zhangjian May 13, 2016, 8:11 a.m. UTC | #12

Hi,

On 2016/5/12 23:28, Catalin Marinas wrote:
> On Thu, May 12, 2016 at 05:24:57PM +0300, Yury Norov wrote:
>> On Thu, May 12, 2016 at 03:07:35PM +0100, Catalin Marinas wrote:
>>> On Thu, May 12, 2016 at 04:44:31PM +0300, Yury Norov wrote:
>>>> On Thu, May 12, 2016 at 02:35:34PM +0100, Catalin Marinas wrote:
>>>>> On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
>>>>>> I debugged preadv02 and pwritev02 failures and found very weird bug.
>>>>>> Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
>>>>>> of vector, and kernel reports successful read/write.
>>>>>>
>>>>>> There are 2 problems:
>>>>>> 1. How kernel allows such address to be passed to fs subsystem;
>>>>>> 2. How fs successes to read/write at non-mapped, and in fact non-user
>>>>>> address.
>>>>>>
>>>>>> I don't know the answer on 2'nd question, and it might be something
>>>>>> generic. But I investigated first problem.
>>>>>>
>>>>>> The problem is that compat_rw_copy_check_uvector() uses access_ok() to
>>>>>> validate user address, and on arm64 it ends up with checking buffer
>>>>>> end against current_thread_info()->addr_limit.
>>>>>>
>>>>>> current_thread_info()->addr_limit for ilp32, and most probably for
>>>>>> aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
>>>>>> It happens because on thread creation we call flush_old_exec() to set
>>>>>> addr_limit, and completely ignore compat mode there.
>>>
>>> [...]
>>>
>>>>>> --- a/arch/arm64/kernel/binfmt_elf32.c
>>>>>> +++ b/arch/arm64/kernel/binfmt_elf32.c
>>>>>> @@ -12,6 +12,7 @@
>>>>>>   do {						\
>>>>>>   	clear_thread_flag(TIF_32BIT_AARCH64);	\
>>>>>>   	set_thread_flag(TIF_32BIT);		\
>>>>>> +	set_fs(TASK_SIZE_32);			\
>>>>>>   } while (0)
>>>>>>
>>>>>>   #define COMPAT_ARCH_DLINFO
>>>>>> diff --git a/arch/arm64/kernel/binfmt_ilp32.c b/arch/arm64/kernel/binfmt_ilp32.c
>>>>>> index a934fd4..a8599c6 100644
>>>>>> --- a/arch/arm64/kernel/binfmt_ilp32.c
>>>>>> +++ b/arch/arm64/kernel/binfmt_ilp32.c
>>>>>> @@ -59,6 +59,7 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
>>>>>>   do {									\
>>>>>>   	set_thread_flag(TIF_32BIT_AARCH64);				\
>>>>>>   	clear_thread_flag(TIF_32BIT);					\
>>>>>> +	set_fs(TASK_SIZE_32);						\
>>>>>>   } while (0)
>>>>>
>>>>> I don't think we need these two. AFAICT, flush_old_exec() takes care of
>>>>> setting the USER_DS for the new thread.
>>>>
>>>> That's true, but USER_DS depends on personality which is not set yet
>>>> for new thread, as I wrote above. In fact, I tried correct USER_DS
>>>> only, and it doesn't work
>>>
>>> Ah, it looks like load_elf_binary() sets the personality after
>>> flush_old_exec(). Looking at powerpc and x86, they set USER_DS to the
>>> maximum 64-bit task value, so they should have a similar issue with
>>> native 32-bit vs compat behaviour.
>>
>> Hmmm. If so, it means we'd introduce generic fix. It would be removing
>> set_fs() from flush_old_exec() and appending it to load_elf_binary()
>> after SET_PERSONALITY(). But I think it should be agreed with other
>> arches developers.
>
> The set_fs() in flush_old_exec() is probably fine, it may be meant to
> re-set the USER_DS for the old thread.
>
> It appears that at least powerpc and x86 don't have different USER_DS
> setting for native and compat, so moving the set_fs() call further down
> would not make any difference for them, nor will it fix the preadv02 LTP
> test (if it fails for them, I haven't checked).
>
>> I've sent standalone patch for aarch64 (you in CC) so let's move
>> discussion there.
>
> I've seen the patch but we would lose some discussion history here. I
> think we should continue this thread and just summarise the conclusion
> in reply to the other patch. This thread is also available on
> linux-arch, in case other architecture maintainers follow it.
>
>>> So what exactly is LTP complaining about? Is different error (like
>>> EFAULT vs EINVAL) or not getting an error at all.
>>
>> It should be EINVAL, but it succeed. The other problem is that
>> following fs routines does not complain on wrong address.
>
> I see. The test asks the kernel to write a single byte (out of maximum
> 64) to the user address 0xffffffff.
What address We should set for this limitation, TASK_SIZE or STACK_TOP?
It is same for 64bit application. But STACK_TOP(0xffff0000) is below
TASK_SIZE in 32bit application. The address above STACK_TOP is preserved
for 32bit application.

Regards

Bamvor

 > In the absence of the access_ok()
> check, this operation succeeds. If the preadv syscall gets 2 bytes as
> the count, then it would fail with EFAULT.
>
> While it's not really a bug, I agree that for matching the native 32-bit
> behavior (basically for other syscalls like those involving vfs_read()),
> the simplest fix would be to have a dynamic USER_DS.



>

Catalin Marinas May 13, 2016, 9:28 a.m. UTC | #13

On Fri, May 13, 2016 at 04:11:23PM +0800, Zhangjian (Bamvor) wrote:
> On 2016/5/12 23:28, Catalin Marinas wrote:
> >On Thu, May 12, 2016 at 05:24:57PM +0300, Yury Norov wrote:
> >>On Thu, May 12, 2016 at 03:07:35PM +0100, Catalin Marinas wrote:
> >>>On Thu, May 12, 2016 at 04:44:31PM +0300, Yury Norov wrote:
> >>>>On Thu, May 12, 2016 at 02:35:34PM +0100, Catalin Marinas wrote:
> >>>>>On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
> >>>>>>I debugged preadv02 and pwritev02 failures and found very weird bug.
> >>>>>>Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> >>>>>>of vector, and kernel reports successful read/write.
> >>>>>>
> >>>>>>There are 2 problems:
> >>>>>>1. How kernel allows such address to be passed to fs subsystem;
> >>>>>>2. How fs successes to read/write at non-mapped, and in fact non-user
> >>>>>>address.
> >>>>>>
> >>>>>>I don't know the answer on 2'nd question, and it might be something
> >>>>>>generic. But I investigated first problem.
> >>>>>>
> >>>>>>The problem is that compat_rw_copy_check_uvector() uses access_ok() to
> >>>>>>validate user address, and on arm64 it ends up with checking buffer
> >>>>>>end against current_thread_info()->addr_limit.
> >>>>>>
> >>>>>>current_thread_info()->addr_limit for ilp32, and most probably for
> >>>>>>aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
> >>>>>>It happens because on thread creation we call flush_old_exec() to set
> >>>>>>addr_limit, and completely ignore compat mode there.
[...]
> >>>>That's true, but USER_DS depends on personality which is not set yet
> >>>>for new thread, as I wrote above. In fact, I tried correct USER_DS
> >>>>only, and it doesn't work
> >>>
> >>>Ah, it looks like load_elf_binary() sets the personality after
> >>>flush_old_exec(). Looking at powerpc and x86, they set USER_DS to the
> >>>maximum 64-bit task value, so they should have a similar issue with
> >>>native 32-bit vs compat behaviour.
[...]
> >>>So what exactly is LTP complaining about? Is different error (like
> >>>EFAULT vs EINVAL) or not getting an error at all.
> >>
> >>It should be EINVAL, but it succeed. The other problem is that
> >>following fs routines does not complain on wrong address.
> >
> >I see. The test asks the kernel to write a single byte (out of maximum
> >64) to the user address 0xffffffff.
> 
> What address We should set for this limitation, TASK_SIZE or STACK_TOP?
> It is same for 64bit application. But STACK_TOP(0xffff0000) is below
> TASK_SIZE in 32bit application. The address above STACK_TOP is preserved
> for 32bit application.

The discussion is mainly around whether USER_DS for 32-bit compat apps
should be the same as USER_DS for native 32-bit apps. Even for native
32-bit kernels, we don't use STACK_TOP as addr_limit. A read/write from
0xffffffff would fail in both cases anyway. I think the LTP test doesn't
even try to access such memory but only to probe the range validity (I
haven't managed to build the latest LTP yet).

Catalin Marinas May 13, 2016, 1:32 p.m. UTC | #14

On Fri, May 13, 2016 at 09:28:03AM +0000, Catalin Marinas wrote:
> > >>>>>On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
> > >>>>>>I debugged preadv02 and pwritev02 failures and found very weird bug.
> > >>>>>>Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
> > >>>>>>of vector, and kernel reports successful read/write.
[...]
> The discussion is mainly around whether USER_DS for 32-bit compat apps
> should be the same as USER_DS for native 32-bit apps. Even for native
> 32-bit kernels, we don't use STACK_TOP as addr_limit. A read/write from
> 0xffffffff would fail in both cases anyway. I think the LTP test doesn't
> even try to access such memory but only to probe the range validity (I
> haven't managed to build the latest LTP yet).

OK, so I managed to get the latest LTP (tag 20160510) built for AArch32
but the preadv02 does not fail:

-----------------------------
# uname -m
aarch64

# file ./testcases/bin/preadv02
./testcases/bin/preadv02: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 2.6.32, BuildID[sha1]=98066291426e1d3ee49d8504ce2a5bd721ab7fe8, not stripped

# ./testcases/bin/preadv02
preadv02.c:97: PASS: preadv() failed as expected: EINVAL
preadv02.c:97: PASS: preadv() failed as expected: EINVAL
preadv02.c:97: PASS: preadv() failed as expected: EINVAL
preadv02.c:97: PASS: preadv() failed as expected: EFAULT
preadv02.c:97: PASS: preadv() failed as expected: EBADF
preadv02.c:97: PASS: preadv() failed as expected: EBADF
preadv02.c:97: PASS: preadv() failed as expected: EISDIR
preadv02.c:97: PASS: preadv() failed as expected: ESPIPE

Summary:
passed   8
failed   0
skipped  0
warnings 0
-----------------------------

It's the 4th test above which passes iovec_base as -1 and expects
EFAULT. However, it seems to get the expected error. I guess that even
if access_ok() passes, the access to that location by the kernel would
fail since there isn't anything mapped.

With ILP32, however, STACK_TOP is TASK_SIZE and you may have the address
mapped already. I still think the USER_DS fix is useful, though I need
another way to check that it's actually a fix.

[RFC6,v6,00/21] ILP32 for ARM64

Commit Message

Comments

Patch