Message ID | 20170719175900.124074-1-thgarnie@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jul 19, 2017 at 10:58 AM, Thomas Garnier <thgarnie@google.com> wrote: > The work pending loop can call set_fs after addr_limit_user_check > removed the _TIF_FSCHECK flag. To prevent the infinite loop, move > the addr_limit_user_check call at the beginning of the loop. > > Fixes: 73ac5d6a2b6a ("arm/syscalls: Check address limit on user-mode return") > Reported-by: Leonard Crestez <leonard.crestez@nxp.com> > Signed-off-by: Thomas Garnier <thgarnie@google.com> Any comments on this patch set? > --- > arch/arm/kernel/signal.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c > index 3a48b54c6405..f4574287d14b 100644 > --- a/arch/arm/kernel/signal.c > +++ b/arch/arm/kernel/signal.c > @@ -573,10 +573,10 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall) > */ > trace_hardirqs_off(); > > - /* Check valid user FS if needed */ > - addr_limit_user_check(); > - > do { > + /* Check valid user FS if needed */ > + addr_limit_user_check(); > + > if (likely(thread_flags & _TIF_NEED_RESCHED)) { > schedule(); > } else { > -- > 2.14.0.rc0.284.gd933b75aa4-goog >
On Mon, 2017-07-24 at 10:07 -0700, Thomas Garnier wrote: > On Wed, Jul 19, 2017 at 10:58 AM, Thomas Garnier <thgarnie@google.com > > wrote: > > > > The work pending loop can call set_fs after addr_limit_user_check > > removed the _TIF_FSCHECK flag. To prevent the infinite loop, move > > the addr_limit_user_check call at the beginning of the loop. > > > > Fixes: 73ac5d6a2b6a ("arm/syscalls: Check address limit on user- > > mode return") > > Reported-by: Leonard Crestez <leonard.crestez@nxp.com> > > Signed-off-by: Thomas Garnier <thgarnie@google.com> > Any comments on this patch set? Tested-by: Leonard Crestez <leonard.crestez@nxp.com> This appears to fix the original issue of failing to boot from NFS when there are lots of alignment faults. But this is a very basic test relative to the reach of this change. However the original patch has been in linux-next for a while and apparently nobody else noticed system calls randomly hanging on arm. I assume maintainers need to give their opinion.
On Tue, Jul 25, 2017 at 01:28:01PM +0300, Leonard Crestez wrote: > On Mon, 2017-07-24 at 10:07 -0700, Thomas Garnier wrote: > > On Wed, Jul 19, 2017 at 10:58 AM, Thomas Garnier <thgarnie@google.com > > > wrote: > > > > > > The work pending loop can call set_fs after addr_limit_user_check > > > removed the _TIF_FSCHECK flag. To prevent the infinite loop, move > > > the addr_limit_user_check call at the beginning of the loop. > > > > > > Fixes: 73ac5d6a2b6a ("arm/syscalls: Check address limit on user- > > > mode return") > > > Reported-by: Leonard Crestez <leonard.crestez@nxp.com> > > > Signed-off-by: Thomas Garnier <thgarnie@google.com> > > > Any comments on this patch set? > > Tested-by: Leonard Crestez <leonard.crestez@nxp.com> > > This appears to fix the original issue of failing to boot from NFS when > there are lots of alignment faults. But this is a very basic test > relative to the reach of this change. > > However the original patch has been in linux-next for a while and > apparently nobody else noticed system calls randomly hanging on arm. > > I assume maintainers need to give their opinion. I've already stated my opinion, which is different from what Linus has requested of Thomas. IMHO, the current approach is going to keep on causing problems along the lines that I've already pointed out.
On Tue, Jul 25, 2017 at 3:38 AM, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > On Tue, Jul 25, 2017 at 01:28:01PM +0300, Leonard Crestez wrote: >> On Mon, 2017-07-24 at 10:07 -0700, Thomas Garnier wrote: >> > On Wed, Jul 19, 2017 at 10:58 AM, Thomas Garnier <thgarnie@google.com >> > > wrote: >> > > >> > > The work pending loop can call set_fs after addr_limit_user_check >> > > removed the _TIF_FSCHECK flag. To prevent the infinite loop, move >> > > the addr_limit_user_check call at the beginning of the loop. >> > > >> > > Fixes: 73ac5d6a2b6a ("arm/syscalls: Check address limit on user- >> > > mode return") >> > > Reported-by: Leonard Crestez <leonard.crestez@nxp.com> >> > > Signed-off-by: Thomas Garnier <thgarnie@google.com> >> >> > Any comments on this patch set? >> >> Tested-by: Leonard Crestez <leonard.crestez@nxp.com> >> >> This appears to fix the original issue of failing to boot from NFS when >> there are lots of alignment faults. But this is a very basic test >> relative to the reach of this change. >> >> However the original patch has been in linux-next for a while and >> apparently nobody else noticed system calls randomly hanging on arm. >> >> I assume maintainers need to give their opinion. > > I've already stated my opinion, which is different from what Linus has > requested of Thomas. IMHO, the current approach is going to keep on > causing problems along the lines that I've already pointed out. I understand. Do you think this problem apply to arm64 as well? > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net.
On Tue, Jul 25, 2017 at 01:01:17PM -0700, Thomas Garnier wrote: > On Tue, Jul 25, 2017 at 3:38 AM, Russell King - ARM Linux > <linux@armlinux.org.uk> wrote: > > On Tue, Jul 25, 2017 at 01:28:01PM +0300, Leonard Crestez wrote: > >> On Mon, 2017-07-24 at 10:07 -0700, Thomas Garnier wrote: > >> > On Wed, Jul 19, 2017 at 10:58 AM, Thomas Garnier <thgarnie@google.com > >> > > wrote: > >> > > > >> > > The work pending loop can call set_fs after addr_limit_user_check > >> > > removed the _TIF_FSCHECK flag. To prevent the infinite loop, move > >> > > the addr_limit_user_check call at the beginning of the loop. > >> > > > >> > > Fixes: 73ac5d6a2b6a ("arm/syscalls: Check address limit on user- > >> > > mode return") > >> > > Reported-by: Leonard Crestez <leonard.crestez@nxp.com> > >> > > Signed-off-by: Thomas Garnier <thgarnie@google.com> > >> > >> > Any comments on this patch set? > >> > >> Tested-by: Leonard Crestez <leonard.crestez@nxp.com> > >> > >> This appears to fix the original issue of failing to boot from NFS when > >> there are lots of alignment faults. But this is a very basic test > >> relative to the reach of this change. > >> > >> However the original patch has been in linux-next for a while and > >> apparently nobody else noticed system calls randomly hanging on arm. > >> > >> I assume maintainers need to give their opinion. > > > > I've already stated my opinion, which is different from what Linus has > > requested of Thomas. IMHO, the current approach is going to keep on > > causing problems along the lines that I've already pointed out. > > I understand. Do you think this problem apply to arm64 as well? It's probably less of an issue for arm64 because we don't take alignment faults from the kernel and I think the perf case would resolve itself by throttling the event. However, I also don't see the advantage of doing this in the work loop as opposed to leaving it until we're actually doing the return to userspace. I looked to see what you've done for x86, but it looks like you check/clear the flag before the work pending loop (exit_to_usermode_loop), which subsequently re-enables interrupts and exits when EXIT_TO_USERMODE_LOOP_FLAGS are all clear. Since TIF_FSCHECK isn't included in those flags, what stops it being set again by an irq and remaining set for the return to userspace? Will
On Wed, Jul 26, 2017 at 5:02 AM, Will Deacon <will.deacon@arm.com> wrote: > On Tue, Jul 25, 2017 at 01:01:17PM -0700, Thomas Garnier wrote: >> On Tue, Jul 25, 2017 at 3:38 AM, Russell King - ARM Linux >> <linux@armlinux.org.uk> wrote: >> > On Tue, Jul 25, 2017 at 01:28:01PM +0300, Leonard Crestez wrote: >> >> On Mon, 2017-07-24 at 10:07 -0700, Thomas Garnier wrote: >> >> > On Wed, Jul 19, 2017 at 10:58 AM, Thomas Garnier <thgarnie@google.com >> >> > > wrote: >> >> > > >> >> > > The work pending loop can call set_fs after addr_limit_user_check >> >> > > removed the _TIF_FSCHECK flag. To prevent the infinite loop, move >> >> > > the addr_limit_user_check call at the beginning of the loop. >> >> > > >> >> > > Fixes: 73ac5d6a2b6a ("arm/syscalls: Check address limit on user- >> >> > > mode return") >> >> > > Reported-by: Leonard Crestez <leonard.crestez@nxp.com> >> >> > > Signed-off-by: Thomas Garnier <thgarnie@google.com> >> >> >> >> > Any comments on this patch set? >> >> >> >> Tested-by: Leonard Crestez <leonard.crestez@nxp.com> >> >> >> >> This appears to fix the original issue of failing to boot from NFS when >> >> there are lots of alignment faults. But this is a very basic test >> >> relative to the reach of this change. >> >> >> >> However the original patch has been in linux-next for a while and >> >> apparently nobody else noticed system calls randomly hanging on arm. >> >> >> >> I assume maintainers need to give their opinion. >> > >> > I've already stated my opinion, which is different from what Linus has >> > requested of Thomas. IMHO, the current approach is going to keep on >> > causing problems along the lines that I've already pointed out. >> >> I understand. Do you think this problem apply to arm64 as well? > > It's probably less of an issue for arm64 because we don't take alignment > faults from the kernel and I think the perf case would resolve itself by > throttling the event. However, I also don't see the advantage of doing > this in the work loop as opposed to leaving it until we're actually doing > the return to userspace. I think the idea is doing the check as early as possible to catch any error before any additional logic. > > I looked to see what you've done for x86, but it looks like you check/clear > the flag before the work pending loop (exit_to_usermode_loop), which > subsequently re-enables interrupts and exits when > EXIT_TO_USERMODE_LOOP_FLAGS are all clear. Since TIF_FSCHECK isn't included > in those flags, what stops it being set again by an irq and remaining set > for the return to userspace? Nothing, I plan to improve the x86 logic later. I focused on ARM/ARM64 right now based on Leonard report. For the next iteration, I plan on having an updated version of the previous implementation for ARM. I will send it soon. > > Will
On Wed, Jul 26, 2017 at 07:20:22AM -0700, Thomas Garnier wrote: > On Wed, Jul 26, 2017 at 5:02 AM, Will Deacon <will.deacon@arm.com> wrote: > > I looked to see what you've done for x86, but it looks like you check/clear > > the flag before the work pending loop (exit_to_usermode_loop), which > > subsequently re-enables interrupts and exits when > > EXIT_TO_USERMODE_LOOP_FLAGS are all clear. Since TIF_FSCHECK isn't included > > in those flags, what stops it being set again by an irq and remaining set > > for the return to userspace? > > Nothing, I plan to improve the x86 logic later. I focused on ARM/ARM64 > right now based on Leonard report. Hmm. In this case, I'd suggest concentrating on x86 and getting the implementation correct there before porting it to other architectures. If x86 were to check TIF_FSCHECK in the loop, and repeat until clear, would x86 also end up in these infinite loops that have been reported on ARM as well? I strongly suggest testing the behaviour with kprobes/tracing enabled for a function called from the work pending loop, and checking how that behaves.
On Wed, Jul 26, 2017 at 11:25 AM, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > On Wed, Jul 26, 2017 at 07:20:22AM -0700, Thomas Garnier wrote: >> On Wed, Jul 26, 2017 at 5:02 AM, Will Deacon <will.deacon@arm.com> wrote: >> > I looked to see what you've done for x86, but it looks like you check/clear >> > the flag before the work pending loop (exit_to_usermode_loop), which >> > subsequently re-enables interrupts and exits when >> > EXIT_TO_USERMODE_LOOP_FLAGS are all clear. Since TIF_FSCHECK isn't included >> > in those flags, what stops it being set again by an irq and remaining set >> > for the return to userspace? >> >> Nothing, I plan to improve the x86 logic later. I focused on ARM/ARM64 >> right now based on Leonard report. > > Hmm. In this case, I'd suggest concentrating on x86 and getting the > implementation correct there before porting it to other architectures. I think the ARM architecture is different. > > If x86 were to check TIF_FSCHECK in the loop, and repeat until clear, > would x86 also end up in these infinite loops that have been reported > on ARM as well? If for every loop set_fs was called. I think the idea is that at some point only the TIF_FSCHECK remain. I don't think x86 suffers from the same issue than ARM. > > I strongly suggest testing the behaviour with kprobes/tracing enabled > for a function called from the work pending loop, and checking how > that behaves. > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net.
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c index 3a48b54c6405..f4574287d14b 100644 --- a/arch/arm/kernel/signal.c +++ b/arch/arm/kernel/signal.c @@ -573,10 +573,10 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall) */ trace_hardirqs_off(); - /* Check valid user FS if needed */ - addr_limit_user_check(); - do { + /* Check valid user FS if needed */ + addr_limit_user_check(); + if (likely(thread_flags & _TIF_NEED_RESCHED)) { schedule(); } else {
The work pending loop can call set_fs after addr_limit_user_check removed the _TIF_FSCHECK flag. To prevent the infinite loop, move the addr_limit_user_check call at the beginning of the loop. Fixes: 73ac5d6a2b6a ("arm/syscalls: Check address limit on user-mode return") Reported-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Thomas Garnier <thgarnie@google.com> --- arch/arm/kernel/signal.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)