mbox series

[00/12] error handling and pciehp maintenance

Message ID 20180918235848.26694-1-keith.busch@intel.com (mailing list archive)
Headers show
Series error handling and pciehp maintenance | expand

Message

Keith Busch Sept. 18, 2018, 11:58 p.m. UTC
I ran into a lot of trouble testing error handling, and this series is
just trying to simplify some things. The first 4 fix up aer_inject, and
the rest are cleanup to make better use of kernel APIs.

Keith Busch (12):
  PCI: Set PCI bus accessors to noinline
  PCI/AER: Covertly inject errors
  PCI/AER: Reuse existing service device lookup
  PCI/AER: Abstract AER interrupt handling
  PCI/AER: Remove dead code
  PCI/AER: Remove error source from aer struct
  PCI/AER: Use kfifo for tracking events
  PCI/AER: Use kfifo helper inserting locked elements
  PCI/AER: Don't read upstream ports below fatal errors
  PCI/AER: Use threaded IRQ for bottom half
  PCI/AER: Use managed resource allocations
  PCI/pciehp: Use device managed allocations

 drivers/pci/access.c              |   4 +-
 drivers/pci/hotplug/pciehp_core.c |  14 +-
 drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
 drivers/pci/pcie/Kconfig          |   2 +-
 drivers/pci/pcie/aer.c            | 219 ++++++---------------------
 drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
 drivers/pci/pcie/portdrv.h        |   4 -
 drivers/pci/pcie/portdrv_core.c   |   1 +
 8 files changed, 227 insertions(+), 371 deletions(-)

Comments

Bjorn Helgaas Oct. 4, 2018, 9:40 p.m. UTC | #1
On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> I ran into a lot of trouble testing error handling, and this series is
> just trying to simplify some things. The first 4 fix up aer_inject, and
> the rest are cleanup to make better use of kernel APIs.
> 
> Keith Busch (12):
>   PCI: Set PCI bus accessors to noinline
>   PCI/AER: Covertly inject errors
>   PCI/AER: Reuse existing service device lookup
>   PCI/AER: Abstract AER interrupt handling
>   PCI/AER: Remove dead code
>   PCI/AER: Remove error source from aer struct
>   PCI/AER: Use kfifo for tracking events
>   PCI/AER: Use kfifo helper inserting locked elements
>   PCI/AER: Don't read upstream ports below fatal errors
>   PCI/AER: Use threaded IRQ for bottom half
>   PCI/AER: Use managed resource allocations
>   PCI/pciehp: Use device managed allocations
> 
>  drivers/pci/access.c              |   4 +-
>  drivers/pci/hotplug/pciehp_core.c |  14 +-
>  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
>  drivers/pci/pcie/Kconfig          |   2 +-
>  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
>  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
>  drivers/pci/pcie/portdrv.h        |   4 -
>  drivers/pci/pcie/portdrv_core.c   |   1 +
>  8 files changed, 227 insertions(+), 371 deletions(-)

Thanks a lot for doing this!  I applied these to pci/hotplug for
v4.20, except for "PCI/AER: Don't read upstream ports below fatal
errors", which seems to be already there via another posting, and
"PCI/pciehp: Use device managed allocations", which needs a few
tweaks.
Keith Busch Oct. 4, 2018, 10:11 p.m. UTC | #2
On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > I ran into a lot of trouble testing error handling, and this series is
> > just trying to simplify some things. The first 4 fix up aer_inject, and
> > the rest are cleanup to make better use of kernel APIs.
> > 
> > Keith Busch (12):
> >   PCI: Set PCI bus accessors to noinline
> >   PCI/AER: Covertly inject errors
> >   PCI/AER: Reuse existing service device lookup
> >   PCI/AER: Abstract AER interrupt handling
> >   PCI/AER: Remove dead code
> >   PCI/AER: Remove error source from aer struct
> >   PCI/AER: Use kfifo for tracking events
> >   PCI/AER: Use kfifo helper inserting locked elements
> >   PCI/AER: Don't read upstream ports below fatal errors
> >   PCI/AER: Use threaded IRQ for bottom half
> >   PCI/AER: Use managed resource allocations
> >   PCI/pciehp: Use device managed allocations
> > 
> >  drivers/pci/access.c              |   4 +-
> >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> >  drivers/pci/pcie/Kconfig          |   2 +-
> >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> >  drivers/pci/pcie/portdrv.h        |   4 -
> >  drivers/pci/pcie/portdrv_core.c   |   1 +
> >  8 files changed, 227 insertions(+), 371 deletions(-)
> 
> Thanks a lot for doing this!  I applied these to pci/hotplug for
> v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> errors", which seems to be already there via another posting, and
> "PCI/pciehp: Use device managed allocations", which needs a few
> tweaks.

Sounds good, and thanks for applying!

In case this went unnoticed, patch 2's aer_inject using ftrace hooks
to pci config accessors is really cool and fixes several kernel crashes
I encountered, but it may not work on every architecture. I'm not sure
how widely aer_inject is used, so maybe there are no concerns with the
DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
dependency in case there are valid objections.
Bjorn Helgaas Oct. 5, 2018, 5:31 p.m. UTC | #3
[+cc arm64 folks, LKML: This conversation is about this patch:

  https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com

which fixes some PCIe AER error injection bugs, but also makes the error
injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
support.  Note that this question is only about the error *injection*
module used for testing.  It doesn't affect AER support itself.]

On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > I ran into a lot of trouble testing error handling, and this series is
> > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > the rest are cleanup to make better use of kernel APIs.
> > > 
> > > Keith Busch (12):
> > >   PCI: Set PCI bus accessors to noinline
> > >   PCI/AER: Covertly inject errors
> > >   PCI/AER: Reuse existing service device lookup
> > >   PCI/AER: Abstract AER interrupt handling
> > >   PCI/AER: Remove dead code
> > >   PCI/AER: Remove error source from aer struct
> > >   PCI/AER: Use kfifo for tracking events
> > >   PCI/AER: Use kfifo helper inserting locked elements
> > >   PCI/AER: Don't read upstream ports below fatal errors
> > >   PCI/AER: Use threaded IRQ for bottom half
> > >   PCI/AER: Use managed resource allocations
> > >   PCI/pciehp: Use device managed allocations
> > > 
> > >  drivers/pci/access.c              |   4 +-
> > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > >  drivers/pci/pcie/Kconfig          |   2 +-
> > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > >  drivers/pci/pcie/portdrv.h        |   4 -
> > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > 
> > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > errors", which seems to be already there via another posting, and
> > "PCI/pciehp: Use device managed allocations", which needs a few
> > tweaks.
> 
> Sounds good, and thanks for applying!
> 
> In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> to pci config accessors is really cool and fixes several kernel crashes
> I encountered, but it may not work on every architecture. I'm not sure
> how widely aer_inject is used, so maybe there are no concerns with the
> DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> dependency in case there are valid objections.

Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
on these arches:

  arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
  powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
  riscv		# ARCH_RV64I only
  s390
  x86

Notably missing is arm64, which has DYNAMIC_FTRACE but not
DYNAMIC_FTRACE_WITH_REGS.

Bjorn
Keith Busch Oct. 8, 2018, 4:18 p.m. UTC | #4
On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> [+cc arm64 folks, LKML: This conversation is about this patch:
> 
>   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com
> 
> which fixes some PCIe AER error injection bugs, but also makes the error
> injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> support.  Note that this question is only about the error *injection*
> module used for testing.  It doesn't affect AER support itself.]
> 
> On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > > I ran into a lot of trouble testing error handling, and this series is
> > > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > > the rest are cleanup to make better use of kernel APIs.
> > > > 
> > > > Keith Busch (12):
> > > >   PCI: Set PCI bus accessors to noinline
> > > >   PCI/AER: Covertly inject errors
> > > >   PCI/AER: Reuse existing service device lookup
> > > >   PCI/AER: Abstract AER interrupt handling
> > > >   PCI/AER: Remove dead code
> > > >   PCI/AER: Remove error source from aer struct
> > > >   PCI/AER: Use kfifo for tracking events
> > > >   PCI/AER: Use kfifo helper inserting locked elements
> > > >   PCI/AER: Don't read upstream ports below fatal errors
> > > >   PCI/AER: Use threaded IRQ for bottom half
> > > >   PCI/AER: Use managed resource allocations
> > > >   PCI/pciehp: Use device managed allocations
> > > > 
> > > >  drivers/pci/access.c              |   4 +-
> > > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > > >  drivers/pci/pcie/Kconfig          |   2 +-
> > > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > > >  drivers/pci/pcie/portdrv.h        |   4 -
> > > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > > 
> > > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > > errors", which seems to be already there via another posting, and
> > > "PCI/pciehp: Use device managed allocations", which needs a few
> > > tweaks.
> > 
> > Sounds good, and thanks for applying!
> > 
> > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > to pci config accessors is really cool and fixes several kernel crashes
> > I encountered, but it may not work on every architecture. I'm not sure
> > how widely aer_inject is used, so maybe there are no concerns with the
> > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > dependency in case there are valid objections.
> 
> Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> on these arches:
> 
>   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
>   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
>   riscv		# ARCH_RV64I only
>   s390
>   x86
> 
> Notably missing is arm64, which has DYNAMIC_FTRACE but not
> DYNAMIC_FTRACE_WITH_REGS.
> 
> Bjorn

Looks like the kbuild bot found an ARM kernel config that has
DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile
there. I'll need to update this patch regardless, and I think the right
thing to do is maintain the "old" way with conditional compiling for
any arch specific features.
Bjorn Helgaas Oct. 8, 2018, 5:23 p.m. UTC | #5
On Mon, Oct 08, 2018 at 10:18:47AM -0600, Keith Busch wrote:
> On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> > [+cc arm64 folks, LKML: This conversation is about this patch:
> > 
> >   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com
> > 
> > which fixes some PCIe AER error injection bugs, but also makes the error
> > injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> > support.  Note that this question is only about the error *injection*
> > module used for testing.  It doesn't affect AER support itself.]
> > 
> > On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > > > I ran into a lot of trouble testing error handling, and this series is
> > > > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > > > the rest are cleanup to make better use of kernel APIs.
> > > > > 
> > > > > Keith Busch (12):
> > > > >   PCI: Set PCI bus accessors to noinline
> > > > >   PCI/AER: Covertly inject errors
> > > > >   PCI/AER: Reuse existing service device lookup
> > > > >   PCI/AER: Abstract AER interrupt handling
> > > > >   PCI/AER: Remove dead code
> > > > >   PCI/AER: Remove error source from aer struct
> > > > >   PCI/AER: Use kfifo for tracking events
> > > > >   PCI/AER: Use kfifo helper inserting locked elements
> > > > >   PCI/AER: Don't read upstream ports below fatal errors
> > > > >   PCI/AER: Use threaded IRQ for bottom half
> > > > >   PCI/AER: Use managed resource allocations
> > > > >   PCI/pciehp: Use device managed allocations
> > > > > 
> > > > >  drivers/pci/access.c              |   4 +-
> > > > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > > > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > > > >  drivers/pci/pcie/Kconfig          |   2 +-
> > > > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > > > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > > > >  drivers/pci/pcie/portdrv.h        |   4 -
> > > > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > > > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > > > 
> > > > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > > > errors", which seems to be already there via another posting, and
> > > > "PCI/pciehp: Use device managed allocations", which needs a few
> > > > tweaks.
> > > 
> > > Sounds good, and thanks for applying!
> > > 
> > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > > to pci config accessors is really cool and fixes several kernel crashes
> > > I encountered, but it may not work on every architecture. I'm not sure
> > > how widely aer_inject is used, so maybe there are no concerns with the
> > > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > > dependency in case there are valid objections.
> > 
> > Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> > DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> > on these arches:
> > 
> >   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
> >   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
> >   riscv		# ARCH_RV64I only
> >   s390
> >   x86
> > 
> > Notably missing is arm64, which has DYNAMIC_FTRACE but not
> > DYNAMIC_FTRACE_WITH_REGS.
> > 
> > Bjorn
> 
> Looks like the kbuild bot found an ARM kernel config that has
> DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile
> there. I'll need to update this patch regardless, and I think the right
> thing to do is maintain the "old" way with conditional compiling for
> any arch specific features.

Sounds messy, but probably the best route.

I dropped these patches for now:

  PCI/AER: Covertly inject errors with ftrace hooks
  PCI/AER: Reuse existing pcie_port_find_device() interface
  PCI/AER: Abstract AER interrupt handling
Will Deacon Oct. 9, 2018, 4:03 p.m. UTC | #6
Hi Bjorn,

On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> [+cc arm64 folks, LKML: This conversation is about this patch:
> 
>   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com
> 
> which fixes some PCIe AER error injection bugs, but also makes the error
> injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> support.  Note that this question is only about the error *injection*
> module used for testing.  It doesn't affect AER support itself.]
> 
> On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > to pci config accessors is really cool and fixes several kernel crashes
> > I encountered, but it may not work on every architecture. I'm not sure
> > how widely aer_inject is used, so maybe there are no concerns with the
> > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > dependency in case there are valid objections.
> 
> Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> on these arches:
> 
>   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
>   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
>   riscv		# ARCH_RV64I only
>   s390
>   x86
> 
> Notably missing is arm64, which has DYNAMIC_FTRACE but not
> DYNAMIC_FTRACE_WITH_REGS.

Thanks for the heads-up here. This feature is currently in development for
arm64:

http://lkml.kernel.org/r/20181001141648.1DBED68BDF@newverein.lst.de

but the latest review comments suggest that it's a fair way from ready yet,
so I wouldn't hold your breath.

Will
Lorenzo Pieralisi Nov. 6, 2018, 4:34 p.m. UTC | #7
On Mon, Oct 08, 2018 at 10:18:47AM -0600, Keith Busch wrote:
> On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> > [+cc arm64 folks, LKML: This conversation is about this patch:
> > 
> >   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com
> > 
> > which fixes some PCIe AER error injection bugs, but also makes the error
> > injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> > support.  Note that this question is only about the error *injection*
> > module used for testing.  It doesn't affect AER support itself.]
> > 
> > On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > > > I ran into a lot of trouble testing error handling, and this series is
> > > > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > > > the rest are cleanup to make better use of kernel APIs.
> > > > > 
> > > > > Keith Busch (12):
> > > > >   PCI: Set PCI bus accessors to noinline
> > > > >   PCI/AER: Covertly inject errors
> > > > >   PCI/AER: Reuse existing service device lookup
> > > > >   PCI/AER: Abstract AER interrupt handling
> > > > >   PCI/AER: Remove dead code
> > > > >   PCI/AER: Remove error source from aer struct
> > > > >   PCI/AER: Use kfifo for tracking events
> > > > >   PCI/AER: Use kfifo helper inserting locked elements
> > > > >   PCI/AER: Don't read upstream ports below fatal errors
> > > > >   PCI/AER: Use threaded IRQ for bottom half
> > > > >   PCI/AER: Use managed resource allocations
> > > > >   PCI/pciehp: Use device managed allocations
> > > > > 
> > > > >  drivers/pci/access.c              |   4 +-
> > > > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > > > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > > > >  drivers/pci/pcie/Kconfig          |   2 +-
> > > > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > > > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > > > >  drivers/pci/pcie/portdrv.h        |   4 -
> > > > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > > > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > > > 
> > > > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > > > errors", which seems to be already there via another posting, and
> > > > "PCI/pciehp: Use device managed allocations", which needs a few
> > > > tweaks.
> > > 
> > > Sounds good, and thanks for applying!
> > > 
> > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > > to pci config accessors is really cool and fixes several kernel crashes
> > > I encountered, but it may not work on every architecture. I'm not sure
> > > how widely aer_inject is used, so maybe there are no concerns with the
> > > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > > dependency in case there are valid objections.
> > 
> > Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> > DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> > on these arches:
> > 
> >   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
> >   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
> >   riscv		# ARCH_RV64I only
> >   s390
> >   x86
> > 
> > Notably missing is arm64, which has DYNAMIC_FTRACE but not
> > DYNAMIC_FTRACE_WITH_REGS.
> > 
> > Bjorn
> 
> Looks like the kbuild bot found an ARM kernel config that has
> DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile
> there. I'll need to update this patch regardless, and I think the right
> thing to do is maintain the "old" way with conditional compiling for
> any arch specific features.

The question is whether we really need to dynamically patch the kernel
with ftrace to achieve what that patch does.

Furthermore, it would also be good to report what bugs we are actually
fixing, from what you are writing falling back to the current method if
!DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with
fixing the current behaviour with something that does not depend on arch
features that may not even be implemented.

Anyway - please CC me in in the follow up series, I am happy to help you
test it.

Thanks,
Lorenzo
Keith Busch Nov. 6, 2018, 4:47 p.m. UTC | #8
On Tue, Nov 06, 2018 at 04:34:08PM +0000, Lorenzo Pieralisi wrote:
> The question is whether we really need to dynamically patch the kernel
> with ftrace to achieve what that patch does.
> 
> Furthermore, it would also be good to report what bugs we are actually
> fixing, from what you are writing falling back to the current method if
> !DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with
> fixing the current behaviour with something that does not depend on arch
> features that may not even be implemented.

There are two problems with the current method:

  1. It may dereference pci_dev after it was freed
  2. The pci_dev's children inherit its fake pci_bus's ops on
     enumeration

Both result in kernel panic.

The dynamic kernel patch just seemed like a cool way to inject errors
without messing with the driver's structures. But if there's a more
elegant way to do it, I'm all for it.
Lorenzo Pieralisi Nov. 6, 2018, 5:21 p.m. UTC | #9
On Tue, Nov 06, 2018 at 09:47:52AM -0700, Keith Busch wrote:
> On Tue, Nov 06, 2018 at 04:34:08PM +0000, Lorenzo Pieralisi wrote:
> > The question is whether we really need to dynamically patch the kernel
> > with ftrace to achieve what that patch does.
> > 
> > Furthermore, it would also be good to report what bugs we are actually
> > fixing, from what you are writing falling back to the current method if
> > !DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with
> > fixing the current behaviour with something that does not depend on arch
> > features that may not even be implemented.
> 
> There are two problems with the current method:
> 
>   1. It may dereference pci_dev after it was freed
>   2. The pci_dev's children inherit its fake pci_bus's ops on
>      enumeration
> 
> Both result in kernel panic.

That's my point, current test module is not robust, I wanted to ask if
there is a way to fix it that does not depend on arch features, because
if there is a dependency that is not met we are still not fixing the
current code, using it as a fallback can still create issues.

> The dynamic kernel patch just seemed like a cool way to inject errors
> without messing with the driver's structures. But if there's a more
> elegant way to do it, I'm all for it.

If you have a simple reproducer for the bugs I am happy to help you test
it (I can also apply arm64 DYNAMIC_FTRACE_WITH_REGS patches and test that
new code path if that's the final direction we are taking).

Thanks,
Lorenzo
Keith Busch Nov. 6, 2018, 5:26 p.m. UTC | #10
On Tue, Nov 06, 2018 at 05:21:00PM +0000, Lorenzo Pieralisi wrote:
> If you have a simple reproducer for the bugs I am happy to help you test
> it (I can also apply arm64 DYNAMIC_FTRACE_WITH_REGS patches and test that
> new code path if that's the final direction we are taking).

The easiest way to reproduce is load the aer_inject module, inject an
error into a bridge, then remove the bridge and rescan.