diff mbox series

[v3] kprobe/ftrace: bail out if ftrace was killed

Message ID 20240501162956.229427-1-stephen.s.brennan@oracle.com (mailing list archive)
State New
Headers show
Series [v3] kprobe/ftrace: bail out if ftrace was killed | expand

Commit Message

Stephen Brennan May 1, 2024, 4:29 p.m. UTC
If an error happens in ftrace, ftrace_kill() will prevent disarming
kprobes. Eventually, the ftrace_ops associated with the kprobes will be
freed, yet the kprobes will still be active, and when triggered, they
will use the freed memory, likely resulting in a page fault and panic.

This behavior can be reproduced quite easily, by creating a kprobe and
then triggering a ftrace_kill(). For simplicity, we can simulate an
ftrace error with a kernel module like [1]:

[1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer

  sudo perf probe --add commit_creds
  sudo perf trace -e probe:commit_creds
  # In another terminal
  make
  sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
  # Back to perf terminal
  # ctrl-c
  sudo perf probe --del commit_creds

After a short period, a page fault and panic would occur as the kprobe
continues to execute and uses the freed ftrace_ops. While ftrace_kill()
is supposed to be used only in extreme circumstances, it is invoked in
FTRACE_WARN_ON() and so there are many places where an unexpected bug
could be triggered, yet the system may continue operating, possibly
without the administrator noticing. If ftrace_kill() does not panic the
system, then we should do everything we can to continue operating,
rather than leave a ticking time bomb.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
Changes in v3:
  Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
  variable and check it directly in the kprobe handlers.
Link to v1/v2 discussion:
  https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.brennan@oracle.com/

 arch/csky/kernel/probes/ftrace.c     | 3 +++
 arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
 arch/parisc/kernel/ftrace.c          | 3 +++
 arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
 arch/riscv/kernel/probes/ftrace.c    | 3 +++
 arch/s390/kernel/ftrace.c            | 3 +++
 arch/x86/kernel/kprobes/ftrace.c     | 3 +++
 include/linux/kprobes.h              | 7 +++++++
 kernel/kprobes.c                     | 6 ++++++
 kernel/trace/ftrace.c                | 1 +
 10 files changed, 35 insertions(+)

Comments

Steven Rostedt May 1, 2024, 5:24 p.m. UTC | #1
On Wed,  1 May 2024 09:29:56 -0700
Stephen Brennan <stephen.s.brennan@oracle.com> wrote:

> If an error happens in ftrace, ftrace_kill() will prevent disarming
> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> freed, yet the kprobes will still be active, and when triggered, they
> will use the freed memory, likely resulting in a page fault and panic.
> 
> This behavior can be reproduced quite easily, by creating a kprobe and
> then triggering a ftrace_kill(). For simplicity, we can simulate an
> ftrace error with a kernel module like [1]:
> 
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
> 
>   sudo perf probe --add commit_creds
>   sudo perf trace -e probe:commit_creds
>   # In another terminal
>   make
>   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>   # Back to perf terminal
>   # ctrl-c
>   sudo perf probe --del commit_creds
> 
> After a short period, a page fault and panic would occur as the kprobe
> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
> is supposed to be used only in extreme circumstances, it is invoked in
> FTRACE_WARN_ON() and so there are many places where an unexpected bug
> could be triggered, yet the system may continue operating, possibly
> without the administrator noticing. If ftrace_kill() does not panic the
> system, then we should do everything we can to continue operating,
> rather than leave a ticking time bomb.
> 
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---
> Changes in v3:
>   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>   variable and check it directly in the kprobe handlers.

Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Thanks,

-- Steve
Guo Ren May 1, 2024, 5:35 p.m. UTC | #2
On Thu, May 2, 2024 at 12:30 AM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> If an error happens in ftrace, ftrace_kill() will prevent disarming
> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> freed, yet the kprobes will still be active, and when triggered, they
> will use the freed memory, likely resulting in a page fault and panic.
>
> This behavior can be reproduced quite easily, by creating a kprobe and
> then triggering a ftrace_kill(). For simplicity, we can simulate an
> ftrace error with a kernel module like [1]:
>
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>
>   sudo perf probe --add commit_creds
>   sudo perf trace -e probe:commit_creds
>   # In another terminal
>   make
>   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>   # Back to perf terminal
>   # ctrl-c
>   sudo perf probe --del commit_creds
>
> After a short period, a page fault and panic would occur as the kprobe
> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
> is supposed to be used only in extreme circumstances, it is invoked in
> FTRACE_WARN_ON() and so there are many places where an unexpected bug
> could be triggered, yet the system may continue operating, possibly
> without the administrator noticing. If ftrace_kill() does not panic the
> system, then we should do everything we can to continue operating,
> rather than leave a ticking time bomb.
>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---
> Changes in v3:
>   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>   variable and check it directly in the kprobe handlers.
> Link to v1/v2 discussion:
>   https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.brennan@oracle.com/
>
>  arch/csky/kernel/probes/ftrace.c     | 3 +++
>  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
>  arch/parisc/kernel/ftrace.c          | 3 +++
>  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
>  arch/riscv/kernel/probes/ftrace.c    | 3 +++
>  arch/s390/kernel/ftrace.c            | 3 +++
>  arch/x86/kernel/kprobes/ftrace.c     | 3 +++
>  include/linux/kprobes.h              | 7 +++++++
>  kernel/kprobes.c                     | 6 ++++++
>  kernel/trace/ftrace.c                | 1 +
>  10 files changed, 35 insertions(+)
>
> diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
> index 834cffcfbce3..7ba4b98076de 100644
> --- a/arch/csky/kernel/probes/ftrace.c
> +++ b/arch/csky/kernel/probes/ftrace.c
> @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>         struct kprobe_ctlblk *kcb;
>         struct pt_regs *regs;
>
> +       if (unlikely(kprobe_ftrace_disabled))
> +               return;
> +
For csky part.
Acked-by: Guo Ren <guoren@kernel.org>

>         bit = ftrace_test_recursion_trylock(ip, parent_ip);
>         if (bit < 0)
>                 return;
> diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
> index 73858c9029cc..bff058317062 100644
> --- a/arch/loongarch/kernel/ftrace_dyn.c
> +++ b/arch/loongarch/kernel/ftrace_dyn.c
> @@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>         struct kprobe *p;
>         struct kprobe_ctlblk *kcb;
>
> +       if (unlikely(kprobe_ftrace_disabled))
> +               return;
> +
>         bit = ftrace_test_recursion_trylock(ip, parent_ip);
>         if (bit < 0)
>                 return;
> diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
> index 621a4b386ae4..c91f9c2e61ed 100644
> --- a/arch/parisc/kernel/ftrace.c
> +++ b/arch/parisc/kernel/ftrace.c
> @@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>         struct kprobe *p;
>         int bit;
>
> +       if (unlikely(kprobe_ftrace_disabled))
> +               return;
> +
>         bit = ftrace_test_recursion_trylock(ip, parent_ip);
>         if (bit < 0)
>                 return;
> diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
> index 072ebe7f290b..f8208c027148 100644
> --- a/arch/powerpc/kernel/kprobes-ftrace.c
> +++ b/arch/powerpc/kernel/kprobes-ftrace.c
> @@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
>         struct pt_regs *regs;
>         int bit;
>
> +       if (unlikely(kprobe_ftrace_disabled))
> +               return;
> +
>         bit = ftrace_test_recursion_trylock(nip, parent_nip);
>         if (bit < 0)
>                 return;
> diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
> index 7142ec42e889..a69dfa610aa8 100644
> --- a/arch/riscv/kernel/probes/ftrace.c
> +++ b/arch/riscv/kernel/probes/ftrace.c
> @@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>         struct kprobe_ctlblk *kcb;
>         int bit;
>
> +       if (unlikely(kprobe_ftrace_disabled))
> +               return;
> +
>         bit = ftrace_test_recursion_trylock(ip, parent_ip);
>         if (bit < 0)
>                 return;
> diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
> index c46381ea04ec..7f6f8c438c26 100644
> --- a/arch/s390/kernel/ftrace.c
> +++ b/arch/s390/kernel/ftrace.c
> @@ -296,6 +296,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>         struct kprobe *p;
>         int bit;
>
> +       if (unlikely(kprobe_ftrace_disabled))
> +               return;
> +
>         bit = ftrace_test_recursion_trylock(ip, parent_ip);
>         if (bit < 0)
>                 return;
> diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
> index dd2ec14adb77..15af7e98e161 100644
> --- a/arch/x86/kernel/kprobes/ftrace.c
> +++ b/arch/x86/kernel/kprobes/ftrace.c
> @@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>         struct kprobe_ctlblk *kcb;
>         int bit;
>
> +       if (unlikely(kprobe_ftrace_disabled))
> +               return;
> +
>         bit = ftrace_test_recursion_trylock(ip, parent_ip);
>         if (bit < 0)
>                 return;
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 0ff44d6633e3..5fcbc254d186 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -378,11 +378,15 @@ static inline void wait_for_kprobe_optimizer(void) { }
>  extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>                                   struct ftrace_ops *ops, struct ftrace_regs *fregs);
>  extern int arch_prepare_kprobe_ftrace(struct kprobe *p);
> +/* Set when ftrace has been killed: kprobes on ftrace must be disabled for safety */
> +extern bool kprobe_ftrace_disabled __read_mostly;
> +extern void kprobe_ftrace_kill(void);
>  #else
>  static inline int arch_prepare_kprobe_ftrace(struct kprobe *p)
>  {
>         return -EINVAL;
>  }
> +static inline void kprobe_ftrace_kill(void) {}
>  #endif /* CONFIG_KPROBES_ON_FTRACE */
>
>  /* Get the kprobe at this addr (if any) - called with preemption disabled */
> @@ -495,6 +499,9 @@ static inline void kprobe_flush_task(struct task_struct *tk)
>  static inline void kprobe_free_init_mem(void)
>  {
>  }
> +static inline void kprobe_ftrace_kill(void)
> +{
> +}
>  static inline int disable_kprobe(struct kprobe *kp)
>  {
>         return -EOPNOTSUPP;
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 65adc815fc6e..166ebf81dc45 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1068,6 +1068,7 @@ static struct ftrace_ops kprobe_ipmodify_ops __read_mostly = {
>
>  static int kprobe_ipmodify_enabled;
>  static int kprobe_ftrace_enabled;
> +bool kprobe_ftrace_disabled;
>
>  static int __arm_kprobe_ftrace(struct kprobe *p, struct ftrace_ops *ops,
>                                int *cnt)
> @@ -1136,6 +1137,11 @@ static int disarm_kprobe_ftrace(struct kprobe *p)
>                 ipmodify ? &kprobe_ipmodify_ops : &kprobe_ftrace_ops,
>                 ipmodify ? &kprobe_ipmodify_enabled : &kprobe_ftrace_enabled);
>  }
> +
> +void kprobe_ftrace_kill()
> +{
> +       kprobe_ftrace_disabled = true;
> +}
>  #else  /* !CONFIG_KPROBES_ON_FTRACE */
>  static inline int arm_kprobe_ftrace(struct kprobe *p)
>  {
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index da1710499698..96db99c347b3 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -7895,6 +7895,7 @@ void ftrace_kill(void)
>         ftrace_disabled = 1;
>         ftrace_enabled = 0;
>         ftrace_trace_function = ftrace_stub;
> +       kprobe_ftrace_kill();
>  }
>
>  /**
> --
> 2.39.3
>
Masami Hiramatsu (Google) May 2, 2024, 2:03 a.m. UTC | #3
On Thu, 2 May 2024 01:35:16 +0800
Guo Ren <guoren@kernel.org> wrote:

> On Thu, May 2, 2024 at 12:30 AM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
> >
> > If an error happens in ftrace, ftrace_kill() will prevent disarming
> > kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> > freed, yet the kprobes will still be active, and when triggered, they
> > will use the freed memory, likely resulting in a page fault and panic.
> >
> > This behavior can be reproduced quite easily, by creating a kprobe and
> > then triggering a ftrace_kill(). For simplicity, we can simulate an
> > ftrace error with a kernel module like [1]:
> >
> > [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
> >
> >   sudo perf probe --add commit_creds
> >   sudo perf trace -e probe:commit_creds
> >   # In another terminal
> >   make
> >   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
> >   # Back to perf terminal
> >   # ctrl-c
> >   sudo perf probe --del commit_creds
> >
> > After a short period, a page fault and panic would occur as the kprobe
> > continues to execute and uses the freed ftrace_ops. While ftrace_kill()
> > is supposed to be used only in extreme circumstances, it is invoked in
> > FTRACE_WARN_ON() and so there are many places where an unexpected bug
> > could be triggered, yet the system may continue operating, possibly
> > without the administrator noticing. If ftrace_kill() does not panic the
> > system, then we should do everything we can to continue operating,
> > rather than leave a ticking time bomb.
> >
> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> > ---
> > Changes in v3:
> >   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
> >   variable and check it directly in the kprobe handlers.
> > Link to v1/v2 discussion:
> >   https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.brennan@oracle.com/
> >
> >  arch/csky/kernel/probes/ftrace.c     | 3 +++
> >  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
> >  arch/parisc/kernel/ftrace.c          | 3 +++
> >  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
> >  arch/riscv/kernel/probes/ftrace.c    | 3 +++
> >  arch/s390/kernel/ftrace.c            | 3 +++
> >  arch/x86/kernel/kprobes/ftrace.c     | 3 +++
> >  include/linux/kprobes.h              | 7 +++++++
> >  kernel/kprobes.c                     | 6 ++++++
> >  kernel/trace/ftrace.c                | 1 +
> >  10 files changed, 35 insertions(+)
> >
> > diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
> > index 834cffcfbce3..7ba4b98076de 100644
> > --- a/arch/csky/kernel/probes/ftrace.c
> > +++ b/arch/csky/kernel/probes/ftrace.c
> > @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> >         struct kprobe_ctlblk *kcb;
> >         struct pt_regs *regs;
> >
> > +       if (unlikely(kprobe_ftrace_disabled))
> > +               return;
> > +
> For csky part.
> Acked-by: Guo Ren <guoren@kernel.org>

Thanks Stephen, Guo and Steve!

Let me pick this to probes/for-next!

Thank you,

> 
> >         bit = ftrace_test_recursion_trylock(ip, parent_ip);
> >         if (bit < 0)
> >                 return;
> > diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
> > index 73858c9029cc..bff058317062 100644
> > --- a/arch/loongarch/kernel/ftrace_dyn.c
> > +++ b/arch/loongarch/kernel/ftrace_dyn.c
> > @@ -287,6 +287,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> >         struct kprobe *p;
> >         struct kprobe_ctlblk *kcb;
> >
> > +       if (unlikely(kprobe_ftrace_disabled))
> > +               return;
> > +
> >         bit = ftrace_test_recursion_trylock(ip, parent_ip);
> >         if (bit < 0)
> >                 return;
> > diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
> > index 621a4b386ae4..c91f9c2e61ed 100644
> > --- a/arch/parisc/kernel/ftrace.c
> > +++ b/arch/parisc/kernel/ftrace.c
> > @@ -206,6 +206,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> >         struct kprobe *p;
> >         int bit;
> >
> > +       if (unlikely(kprobe_ftrace_disabled))
> > +               return;
> > +
> >         bit = ftrace_test_recursion_trylock(ip, parent_ip);
> >         if (bit < 0)
> >                 return;
> > diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
> > index 072ebe7f290b..f8208c027148 100644
> > --- a/arch/powerpc/kernel/kprobes-ftrace.c
> > +++ b/arch/powerpc/kernel/kprobes-ftrace.c
> > @@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
> >         struct pt_regs *regs;
> >         int bit;
> >
> > +       if (unlikely(kprobe_ftrace_disabled))
> > +               return;
> > +
> >         bit = ftrace_test_recursion_trylock(nip, parent_nip);
> >         if (bit < 0)
> >                 return;
> > diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
> > index 7142ec42e889..a69dfa610aa8 100644
> > --- a/arch/riscv/kernel/probes/ftrace.c
> > +++ b/arch/riscv/kernel/probes/ftrace.c
> > @@ -11,6 +11,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> >         struct kprobe_ctlblk *kcb;
> >         int bit;
> >
> > +       if (unlikely(kprobe_ftrace_disabled))
> > +               return;
> > +
> >         bit = ftrace_test_recursion_trylock(ip, parent_ip);
> >         if (bit < 0)
> >                 return;
> > diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
> > index c46381ea04ec..7f6f8c438c26 100644
> > --- a/arch/s390/kernel/ftrace.c
> > +++ b/arch/s390/kernel/ftrace.c
> > @@ -296,6 +296,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> >         struct kprobe *p;
> >         int bit;
> >
> > +       if (unlikely(kprobe_ftrace_disabled))
> > +               return;
> > +
> >         bit = ftrace_test_recursion_trylock(ip, parent_ip);
> >         if (bit < 0)
> >                 return;
> > diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
> > index dd2ec14adb77..15af7e98e161 100644
> > --- a/arch/x86/kernel/kprobes/ftrace.c
> > +++ b/arch/x86/kernel/kprobes/ftrace.c
> > @@ -21,6 +21,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> >         struct kprobe_ctlblk *kcb;
> >         int bit;
> >
> > +       if (unlikely(kprobe_ftrace_disabled))
> > +               return;
> > +
> >         bit = ftrace_test_recursion_trylock(ip, parent_ip);
> >         if (bit < 0)
> >                 return;
> > diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> > index 0ff44d6633e3..5fcbc254d186 100644
> > --- a/include/linux/kprobes.h
> > +++ b/include/linux/kprobes.h
> > @@ -378,11 +378,15 @@ static inline void wait_for_kprobe_optimizer(void) { }
> >  extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> >                                   struct ftrace_ops *ops, struct ftrace_regs *fregs);
> >  extern int arch_prepare_kprobe_ftrace(struct kprobe *p);
> > +/* Set when ftrace has been killed: kprobes on ftrace must be disabled for safety */
> > +extern bool kprobe_ftrace_disabled __read_mostly;
> > +extern void kprobe_ftrace_kill(void);
> >  #else
> >  static inline int arch_prepare_kprobe_ftrace(struct kprobe *p)
> >  {
> >         return -EINVAL;
> >  }
> > +static inline void kprobe_ftrace_kill(void) {}
> >  #endif /* CONFIG_KPROBES_ON_FTRACE */
> >
> >  /* Get the kprobe at this addr (if any) - called with preemption disabled */
> > @@ -495,6 +499,9 @@ static inline void kprobe_flush_task(struct task_struct *tk)
> >  static inline void kprobe_free_init_mem(void)
> >  {
> >  }
> > +static inline void kprobe_ftrace_kill(void)
> > +{
> > +}
> >  static inline int disable_kprobe(struct kprobe *kp)
> >  {
> >         return -EOPNOTSUPP;
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 65adc815fc6e..166ebf81dc45 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1068,6 +1068,7 @@ static struct ftrace_ops kprobe_ipmodify_ops __read_mostly = {
> >
> >  static int kprobe_ipmodify_enabled;
> >  static int kprobe_ftrace_enabled;
> > +bool kprobe_ftrace_disabled;
> >
> >  static int __arm_kprobe_ftrace(struct kprobe *p, struct ftrace_ops *ops,
> >                                int *cnt)
> > @@ -1136,6 +1137,11 @@ static int disarm_kprobe_ftrace(struct kprobe *p)
> >                 ipmodify ? &kprobe_ipmodify_ops : &kprobe_ftrace_ops,
> >                 ipmodify ? &kprobe_ipmodify_enabled : &kprobe_ftrace_enabled);
> >  }
> > +
> > +void kprobe_ftrace_kill()
> > +{
> > +       kprobe_ftrace_disabled = true;
> > +}
> >  #else  /* !CONFIG_KPROBES_ON_FTRACE */
> >  static inline int arm_kprobe_ftrace(struct kprobe *p)
> >  {
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index da1710499698..96db99c347b3 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -7895,6 +7895,7 @@ void ftrace_kill(void)
> >         ftrace_disabled = 1;
> >         ftrace_enabled = 0;
> >         ftrace_trace_function = ftrace_stub;
> > +       kprobe_ftrace_kill();
> >  }
> >
> >  /**
> > --
> > 2.39.3
> >
> 
> 
> -- 
> Best Regards
>  Guo Ren
Christophe Leroy May 6, 2024, 2:46 p.m. UTC | #4
Le 01/05/2024 à 18:29, Stephen Brennan a écrit :
> If an error happens in ftrace, ftrace_kill() will prevent disarming
> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> freed, yet the kprobes will still be active, and when triggered, they
> will use the freed memory, likely resulting in a page fault and panic.
> 
> This behavior can be reproduced quite easily, by creating a kprobe and
> then triggering a ftrace_kill(). For simplicity, we can simulate an
> ftrace error with a kernel module like [1]:
> 
> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
> 
>    sudo perf probe --add commit_creds
>    sudo perf trace -e probe:commit_creds
>    # In another terminal
>    make
>    sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>    # Back to perf terminal
>    # ctrl-c
>    sudo perf probe --del commit_creds
> 
> After a short period, a page fault and panic would occur as the kprobe
> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
> is supposed to be used only in extreme circumstances, it is invoked in
> FTRACE_WARN_ON() and so there are many places where an unexpected bug
> could be triggered, yet the system may continue operating, possibly
> without the administrator noticing. If ftrace_kill() does not panic the
> system, then we should do everything we can to continue operating,
> rather than leave a ticking time bomb.
> 
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---
> Changes in v3:
>    Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>    variable and check it directly in the kprobe handlers.

Isn't it safer to provide a fonction rather than a direct access to a 
variable ?

By the way, wouldn't it be more performant to use a static branch (jump 
label) ?

Christophe
Steven Rostedt May 6, 2024, 10:50 p.m. UTC | #5
On Mon, 6 May 2024 14:46:57 +0000
Christophe Leroy <christophe.leroy@csgroup.eu> wrote:

> Isn't it safer to provide a fonction rather than a direct access to a 
> variable ?
> 
> By the way, wouldn't it be more performant to use a static branch (jump 
> label) ?

A static branch could work, but the point of this is that if ftrace
failed, it was likely due to an issue with text modification. Do we want to
stop it via text modification?

-- Steve
Stephen Brennan May 7, 2024, 6:36 p.m. UTC | #6
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 01/05/2024 à 18:29, Stephen Brennan a écrit :
>> If an error happens in ftrace, ftrace_kill() will prevent disarming
>> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> freed, yet the kprobes will still be active, and when triggered, they
>> will use the freed memory, likely resulting in a page fault and panic.
>> 
>> This behavior can be reproduced quite easily, by creating a kprobe and
>> then triggering a ftrace_kill(). For simplicity, we can simulate an
>> ftrace error with a kernel module like [1]:
>> 
>> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>> 
>>    sudo perf probe --add commit_creds
>>    sudo perf trace -e probe:commit_creds
>>    # In another terminal
>>    make
>>    sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>>    # Back to perf terminal
>>    # ctrl-c
>>    sudo perf probe --del commit_creds
>> 
>> After a short period, a page fault and panic would occur as the kprobe
>> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
>> is supposed to be used only in extreme circumstances, it is invoked in
>> FTRACE_WARN_ON() and so there are many places where an unexpected bug
>> could be triggered, yet the system may continue operating, possibly
>> without the administrator noticing. If ftrace_kill() does not panic the
>> system, then we should do everything we can to continue operating,
>> rather than leave a ticking time bomb.
>> 
>> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>> ---
>> Changes in v3:
>>    Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>>    variable and check it directly in the kprobe handlers.
>
> Isn't it safer to provide a fonction rather than a direct access to a 
> variable ?

Is the concern that other code could modify this variable? If so, then I
suppose the function call is safer. But the variable is not exported and
I think built-in code can be trusted not to muck with it. Maybe I'm
missing your point about safety though?

> By the way, wouldn't it be more performant to use a static branch (jump 
> label) ?

I agree with Steven's concern that text modification would unfortunately
not be a good way to handle an error in text modification. Especially, I
believe there could be deadlock risks, as static key enablement requires
taking the text_mutex and the jump_label_mutex. I'd be concerned that
the text_mutex could already be held in some situations where
ftrace_kill() is called. But I'm not certain about that.

Thanks for taking a look!
Stephen
Stephen Brennan May 15, 2024, 10:18 p.m. UTC | #7
Masami Hiramatsu (Google) <mhiramat@kernel.org> writes:
> On Thu, 2 May 2024 01:35:16 +0800
> Guo Ren <guoren@kernel.org> wrote:
>
>> On Thu, May 2, 2024 at 12:30 AM Stephen Brennan
>> <stephen.s.brennan@oracle.com> wrote:
>> >
>> > If an error happens in ftrace, ftrace_kill() will prevent disarming
>> > kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> > freed, yet the kprobes will still be active, and when triggered, they
>> > will use the freed memory, likely resulting in a page fault and panic.
>> >
>> > This behavior can be reproduced quite easily, by creating a kprobe and
>> > then triggering a ftrace_kill(). For simplicity, we can simulate an
>> > ftrace error with a kernel module like [1]:
>> >
>> > [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>> >
>> >   sudo perf probe --add commit_creds
>> >   sudo perf trace -e probe:commit_creds
>> >   # In another terminal
>> >   make
>> >   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>> >   # Back to perf terminal
>> >   # ctrl-c
>> >   sudo perf probe --del commit_creds
>> >
>> > After a short period, a page fault and panic would occur as the kprobe
>> > continues to execute and uses the freed ftrace_ops. While ftrace_kill()
>> > is supposed to be used only in extreme circumstances, it is invoked in
>> > FTRACE_WARN_ON() and so there are many places where an unexpected bug
>> > could be triggered, yet the system may continue operating, possibly
>> > without the administrator noticing. If ftrace_kill() does not panic the
>> > system, then we should do everything we can to continue operating,
>> > rather than leave a ticking time bomb.
>> >
>> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
>> > ---
>> > Changes in v3:
>> >   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>> >   variable and check it directly in the kprobe handlers.
>> > Link to v1/v2 discussion:
>> >   https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.brennan@oracle.com/
>> >
>> >  arch/csky/kernel/probes/ftrace.c     | 3 +++
>> >  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
>> >  arch/parisc/kernel/ftrace.c          | 3 +++
>> >  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
>> >  arch/riscv/kernel/probes/ftrace.c    | 3 +++
>> >  arch/s390/kernel/ftrace.c            | 3 +++
>> >  arch/x86/kernel/kprobes/ftrace.c     | 3 +++
>> >  include/linux/kprobes.h              | 7 +++++++
>> >  kernel/kprobes.c                     | 6 ++++++
>> >  kernel/trace/ftrace.c                | 1 +
>> >  10 files changed, 35 insertions(+)
>> >
>> > diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
>> > index 834cffcfbce3..7ba4b98076de 100644
>> > --- a/arch/csky/kernel/probes/ftrace.c
>> > +++ b/arch/csky/kernel/probes/ftrace.c
>> > @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
>> >         struct kprobe_ctlblk *kcb;
>> >         struct pt_regs *regs;
>> >
>> > +       if (unlikely(kprobe_ftrace_disabled))
>> > +               return;
>> > +
>> For csky part.
>> Acked-by: Guo Ren <guoren@kernel.org>
>
> Thanks Stephen, Guo and Steve!
>
> Let me pick this to probes/for-next!

Thank you Masami!

I did want to check, is this the correct git tree to be watching?

https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git/log/?h=probes/for-next

( I'm not trying to pressure on timing, as I know the merge window is
  hectic. Just making sure I'm watching the correct place! )

Thanks,
Stephen
Masami Hiramatsu (Google) May 16, 2024, 12:23 a.m. UTC | #8
On Wed, 15 May 2024 15:18:08 -0700
Stephen Brennan <stephen.s.brennan@oracle.com> wrote:

> Masami Hiramatsu (Google) <mhiramat@kernel.org> writes:
> > On Thu, 2 May 2024 01:35:16 +0800
> > Guo Ren <guoren@kernel.org> wrote:
> >
> >> On Thu, May 2, 2024 at 12:30 AM Stephen Brennan
> >> <stephen.s.brennan@oracle.com> wrote:
> >> >
> >> > If an error happens in ftrace, ftrace_kill() will prevent disarming
> >> > kprobes. Eventually, the ftrace_ops associated with the kprobes will be
> >> > freed, yet the kprobes will still be active, and when triggered, they
> >> > will use the freed memory, likely resulting in a page fault and panic.
> >> >
> >> > This behavior can be reproduced quite easily, by creating a kprobe and
> >> > then triggering a ftrace_kill(). For simplicity, we can simulate an
> >> > ftrace error with a kernel module like [1]:
> >> >
> >> > [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
> >> >
> >> >   sudo perf probe --add commit_creds
> >> >   sudo perf trace -e probe:commit_creds
> >> >   # In another terminal
> >> >   make
> >> >   sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
> >> >   # Back to perf terminal
> >> >   # ctrl-c
> >> >   sudo perf probe --del commit_creds
> >> >
> >> > After a short period, a page fault and panic would occur as the kprobe
> >> > continues to execute and uses the freed ftrace_ops. While ftrace_kill()
> >> > is supposed to be used only in extreme circumstances, it is invoked in
> >> > FTRACE_WARN_ON() and so there are many places where an unexpected bug
> >> > could be triggered, yet the system may continue operating, possibly
> >> > without the administrator noticing. If ftrace_kill() does not panic the
> >> > system, then we should do everything we can to continue operating,
> >> > rather than leave a ticking time bomb.
> >> >
> >> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> >> > ---
> >> > Changes in v3:
> >> >   Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
> >> >   variable and check it directly in the kprobe handlers.
> >> > Link to v1/v2 discussion:
> >> >   https://lore.kernel.org/all/20240426225834.993353-1-stephen.s.brennan@oracle.com/
> >> >
> >> >  arch/csky/kernel/probes/ftrace.c     | 3 +++
> >> >  arch/loongarch/kernel/ftrace_dyn.c   | 3 +++
> >> >  arch/parisc/kernel/ftrace.c          | 3 +++
> >> >  arch/powerpc/kernel/kprobes-ftrace.c | 3 +++
> >> >  arch/riscv/kernel/probes/ftrace.c    | 3 +++
> >> >  arch/s390/kernel/ftrace.c            | 3 +++
> >> >  arch/x86/kernel/kprobes/ftrace.c     | 3 +++
> >> >  include/linux/kprobes.h              | 7 +++++++
> >> >  kernel/kprobes.c                     | 6 ++++++
> >> >  kernel/trace/ftrace.c                | 1 +
> >> >  10 files changed, 35 insertions(+)
> >> >
> >> > diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
> >> > index 834cffcfbce3..7ba4b98076de 100644
> >> > --- a/arch/csky/kernel/probes/ftrace.c
> >> > +++ b/arch/csky/kernel/probes/ftrace.c
> >> > @@ -12,6 +12,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> >> >         struct kprobe_ctlblk *kcb;
> >> >         struct pt_regs *regs;
> >> >
> >> > +       if (unlikely(kprobe_ftrace_disabled))
> >> > +               return;
> >> > +
> >> For csky part.
> >> Acked-by: Guo Ren <guoren@kernel.org>
> >
> > Thanks Stephen, Guo and Steve!
> >
> > Let me pick this to probes/for-next!
> 
> Thank you Masami!
> 
> I did want to check, is this the correct git tree to be watching?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git/log/?h=probes/for-next
> 
> ( I'm not trying to pressure on timing, as I know the merge window is
>   hectic. Just making sure I'm watching the correct place! )

Sorry, I forgot to push it from my local tree. Now it should be there.

Thanks,
diff mbox series

Patch

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index 834cffcfbce3..7ba4b98076de 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -12,6 +12,9 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	struct pt_regs *regs;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..bff058317062 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -287,6 +287,9 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	struct kprobe_ctlblk *kcb;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 621a4b386ae4..c91f9c2e61ed 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -206,6 +206,9 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	int bit;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 072ebe7f290b..f8208c027148 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -21,6 +21,9 @@  void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 	struct pt_regs *regs;
 	int bit;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(nip, parent_nip);
 	if (bit < 0)
 		return;
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index 7142ec42e889..a69dfa610aa8 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -11,6 +11,9 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	int bit;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..7f6f8c438c26 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -296,6 +296,9 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe *p;
 	int bit;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index dd2ec14adb77..15af7e98e161 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -21,6 +21,9 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	struct kprobe_ctlblk *kcb;
 	int bit;
 
+	if (unlikely(kprobe_ftrace_disabled))
+		return;
+
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (bit < 0)
 		return;
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 0ff44d6633e3..5fcbc254d186 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -378,11 +378,15 @@  static inline void wait_for_kprobe_optimizer(void) { }
 extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 				  struct ftrace_ops *ops, struct ftrace_regs *fregs);
 extern int arch_prepare_kprobe_ftrace(struct kprobe *p);
+/* Set when ftrace has been killed: kprobes on ftrace must be disabled for safety */
+extern bool kprobe_ftrace_disabled __read_mostly;
+extern void kprobe_ftrace_kill(void);
 #else
 static inline int arch_prepare_kprobe_ftrace(struct kprobe *p)
 {
 	return -EINVAL;
 }
+static inline void kprobe_ftrace_kill(void) {}
 #endif /* CONFIG_KPROBES_ON_FTRACE */
 
 /* Get the kprobe at this addr (if any) - called with preemption disabled */
@@ -495,6 +499,9 @@  static inline void kprobe_flush_task(struct task_struct *tk)
 static inline void kprobe_free_init_mem(void)
 {
 }
+static inline void kprobe_ftrace_kill(void)
+{
+}
 static inline int disable_kprobe(struct kprobe *kp)
 {
 	return -EOPNOTSUPP;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 65adc815fc6e..166ebf81dc45 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1068,6 +1068,7 @@  static struct ftrace_ops kprobe_ipmodify_ops __read_mostly = {
 
 static int kprobe_ipmodify_enabled;
 static int kprobe_ftrace_enabled;
+bool kprobe_ftrace_disabled;
 
 static int __arm_kprobe_ftrace(struct kprobe *p, struct ftrace_ops *ops,
 			       int *cnt)
@@ -1136,6 +1137,11 @@  static int disarm_kprobe_ftrace(struct kprobe *p)
 		ipmodify ? &kprobe_ipmodify_ops : &kprobe_ftrace_ops,
 		ipmodify ? &kprobe_ipmodify_enabled : &kprobe_ftrace_enabled);
 }
+
+void kprobe_ftrace_kill()
+{
+	kprobe_ftrace_disabled = true;
+}
 #else	/* !CONFIG_KPROBES_ON_FTRACE */
 static inline int arm_kprobe_ftrace(struct kprobe *p)
 {
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index da1710499698..96db99c347b3 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7895,6 +7895,7 @@  void ftrace_kill(void)
 	ftrace_disabled = 1;
 	ftrace_enabled = 0;
 	ftrace_trace_function = ftrace_stub;
+	kprobe_ftrace_kill();
 }
 
 /**