Message ID | 20211008111527.438276127@infradead.org
---|---
Series | wchan: Fix wchan support
On Fri, Oct 08, 2021 at 01:15:27PM +0200, Peter Zijlstra wrote:
> Hi,
>
> This fixes up wchan which is various degrees of broken across the
> architectures.
>
> Patch 4 fixes wchan for x86, which has been returning 0 for the past many
> releases.
>
> Patch 5 fixes the fundamental race against scheduling.
>
> Patch 6 deletes a lot and makes STACKTRACE unconditional
>
> patch 7 fixes up a few STACKTRACE arch oddities
>
> 0day says all builds are good, so it must be perfect :-) I'm planning on
> queueing up at least the first 5 patches, but I'm hoping the last two patches
> can be too.
>
> Also available here:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/wchan

These patches introduce a regression on ARM. Whereas before, I have
/proc/*/wchan populated with non-zero values, with these patches they
_all_ contain "0":

root@clearfog21:~# cat /proc/*/wchan
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000root@clearfog21:~#

I'll try to investigate what is going on later today.
On Thu, Oct 14, 2021 at 01:02:34PM +0100, Russell King (Oracle) wrote:
> On Fri, Oct 08, 2021 at 01:15:27PM +0200, Peter Zijlstra wrote:
> > Hi,
> >
> > This fixes up wchan which is various degrees of broken across the
> > architectures.
> >
> > Patch 4 fixes wchan for x86, which has been returning 0 for the past many
> > releases.
> >
> > Patch 5 fixes the fundamental race against scheduling.
> >
> > Patch 6 deletes a lot and makes STACKTRACE unconditional
> >
> > patch 7 fixes up a few STACKTRACE arch oddities
> >
> > 0day says all builds are good, so it must be perfect :-) I'm planning on
> > queueing up at least the first 5 patches, but I'm hoping the last two patches
> > can be too.
> >
> > Also available here:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/wchan
>
> These patches introduce a regression on ARM. Whereas before, I have
> /proc/*/wchan populated with non-zero values, with these patches they
> _all_ contain "0":
>
> root@clearfog21:~# cat /proc/*/wchan
> 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000root@clearfog21:~#
>
> I'll try to investigate what is going on later today.

What is going on here is that the ARM stacktrace code refuses to trace
non-current tasks in a SMP environment due to the racy nature of doing
so if the non-current tasks are running.

When walking the stack with frame pointers, we:

- validate that the frame pointer is between the stack pointer and the
  top of stack defined by that stack pointer.
- we then load the next stack pointer and next frame pointer from the
  stack.

The reason this is unsafe when the task is not blocked is the stack can
change at any moment, which can cause the value read as a stack pointer
to be wildly different. If the read frame pointer value is roughly in
agreement, we can end up reading any part of memory, which would be an
information leak.

The table-based unwinding is much more complex, being essentially a set
of instructions to the unwinder code about which values to read from
the stack into a set of pseudo-registers, corrections to the stack
pointer, or transfers from the pseudo-registers. I haven't analysed
this code enough to really know the implications of what could be
possible if the values on the stack change while this code is running
on another CPU (it's not my code!). There is an attempt to bounds-limit
the virtual stack pointer after each unwind instruction is processed to
catch the unwinder doing anything silly, so it may be safe in so far as
it will fail should it encounter anything "stupid".

However, get_wchan() is a different case; we know for certain that the
task is blocked, so it won't be running on another CPU, and with your
patch 4, we have this guarantee. But that is not true of all callers of
the stacktracing code, so I don't see how we can sanely switch to using
the stacktracing code for this.
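To make the race concrete, here is a minimal C sketch of the frame-pointer
walk and bounds check described above. It is illustrative only, not the
actual arch/arm/kernel/stacktrace.c code; the struct and function names
(fp_frame, walk_frames, report_pc) and the saved-register layout are
assumptions made for the example.

#include <stdbool.h>

/* Assumed layout of the saved registers that the frame pointer addresses. */
struct fp_frame {
	unsigned long fp;	/* caller's frame pointer */
	unsigned long sp;	/* caller's stack pointer */
	unsigned long pc;	/* return address */
};

/* Callback used to record one return address; returns false to stop. */
typedef bool (*report_pc_t)(unsigned long pc, void *data);

static void walk_frames(unsigned long fp, unsigned long sp,
			unsigned long stack_top,
			report_pc_t report_pc, void *data)
{
	for (;;) {
		const struct fp_frame *frame;

		/*
		 * Bounds check: the frame pointer must lie between the
		 * current stack pointer and the top of the stack, and
		 * leave room for a whole frame record; otherwise stop
		 * rather than dereference it.
		 */
		if (fp < sp || fp + sizeof(*frame) > stack_top)
			break;

		frame = (const struct fp_frame *)fp;

		if (!report_pc(frame->pc, data))
			break;

		/*
		 * Load the caller's sp/fp from the stack.  If the task
		 * is running on another CPU, these words can change at
		 * any moment, so the values read here can be garbage
		 * that still happens to pass the check above, letting
		 * the walk read unrelated memory.
		 */
		sp = frame->sp;
		fp = frame->fp;
	}
}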
On Thu, Oct 14, 2021 at 02:38:19PM +0100, Russell King (Oracle) wrote:
> What is going on here is that the ARM stacktrace code refuses to trace
> non-current tasks in a SMP environment due to the racy nature of doing
> so if the non-current tasks are running.
>
> When walking the stack with frame pointers, we:
>
> - validate that the frame pointer is between the stack pointer and the
>   top of stack defined by that stack pointer.
> - we then load the next stack pointer and next frame pointer from the
>   stack.
>
> The reason this is unsafe when the task is not blocked is the stack can
> change at any moment, which can cause the value read as a stack pointer
> to be wildly different. If the read frame pointer value is roughly in
> agreement, we can end up reading any part of memory, which would be an
> information leak.

It would be a good idea to add some guardrails to prevent that
regardless. If there's stack corruption for any reason, the unwinder
shouldn't make things worse.

On x86 the unwinder relies on the caller to ensure the task is blocked
(or current). If the caller doesn't do that, they might get garbage, and
they get to keep the pieces.

But an important part of that is that the unwinder has guardrails to
ensure it handles stack corruption gracefully by never accessing out of
bounds of the stack. When multiple stacks are involved in a kernel
execution path (task, irq, exception, etc), the stacks link to each
other (e.g., the last word on the irq stack might point to the task
stack). Also the irq/exception stack addresses are stored in percpu
variables, and the task stack is in the task struct. So the unwinder can
easily make sure it's in-bounds.

See get_stack_info() in arch/x86/kernel/dumpstack_64.c.
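As a rough illustration of that kind of guardrail, the sketch below
classifies a candidate pointer against a table of known stack regions
before it is ever dereferenced. The names (stack_region, find_stack,
frame_in_bounds) are made up for the example; this is not the actual
get_stack_info() interface from arch/x86/kernel/dumpstack_64.c.

#include <stdbool.h>
#include <stddef.h>

enum stack_kind { STACK_TASK, STACK_IRQ, STACK_EXCEPTION };

struct stack_region {
	enum stack_kind kind;
	unsigned long begin;	/* lowest valid address */
	unsigned long end;	/* one past the highest valid address */
};

/*
 * Classify an address against the known stack regions.  On x86 these
 * would be populated from the task struct (task stack) and per-CPU
 * variables (irq/exception stacks), as described above.  Returning
 * NULL stops the unwinder instead of letting it read arbitrary memory.
 */
static const struct stack_region *
find_stack(unsigned long addr, const struct stack_region *regions,
	   size_t nr_regions)
{
	size_t i;

	for (i = 0; i < nr_regions; i++) {
		/* Require room for at least one saved word. */
		if (addr >= regions[i].begin &&
		    addr + sizeof(unsigned long) <= regions[i].end)
			return &regions[i];
	}

	return NULL;
}

/* A frame or stack pointer is usable only if it lies on a known stack. */
static bool frame_in_bounds(unsigned long fp,
			    const struct stack_region *regions,
			    size_t nr_regions)
{
	return find_stack(fp, regions, nr_regions) != NULL;
}

With a check like this, following the link word at the end of one stack
is only allowed when it lands inside another known region, which is how
an unwinder can cross task/irq/exception stacks without ever stepping
outside them.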