
[v12,00/25] Linux RISC-V AIA Support

Message ID 20240127161753.114685-1-apatel@ventanamicro.com (mailing list archive)

Message

Anup Patel Jan. 27, 2024, 4:17 p.m. UTC
The RISC-V AIA specification has been ratified as per the RISC-V International
process. The latest ratified AIA specification can be found at:
https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf

At a high level, the AIA specification adds three things:
1) AIA CSRs
   - Improved local interrupt support
2) Incoming Message Signaled Interrupt Controller (IMSIC)
   - Per-HART MSI controller
   - Supports MSI virtualization
   - Supports IPIs along with IPI virtualization
3) Advanced Platform-Level Interrupt Controller (APLIC)
   - Wired interrupt controller
   - In MSI-mode, converts wired interrupts into MSIs (i.e., acts as an MSI generator)
   - In Direct-mode, injects external interrupts directly into HARTs

For an overview of the AIA specification, refer to the AIA virtualization
talk at KVM Forum 2022:
https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
https://www.youtube.com/watch?v=r071dL8Z0yo

To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
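
For reference, a minimal QEMU invocation with AIA enabled on the "virt"
machine might look like the rough sketch below (firmware, kernel, and rootfs
paths are placeholders; adjust them to your setup):

  qemu-system-riscv64 \
      -machine virt,aia=aplic-imsic -smp 4 -m 1G -nographic \
      -bios /path/to/opensbi/fw_dynamic.bin \
      -kernel /path/to/linux/arch/riscv/boot/Image \
      -append "root=/dev/vda rw console=ttyS0 earlycon" \
      -drive file=rootfs.ext2,format=raw,id=hd0 \
      -device virtio-blk-device,drive=hd0

Once booted, wired interrupts routed through the APLIC in MSI-mode should
show up in /proc/interrupts under an APLIC-MSI-* chip, with IPIs provided
by the IMSIC.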

These patches can also be found in the riscv_aia_v12 branch at:
https://github.com/avpatel/linux.git
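
For example, one way to fetch that branch (the local directory name below
is arbitrary):

  git clone --branch riscv_aia_v12 --single-branch \
      https://github.com/avpatel/linux.git linux-riscv-aia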

Changes since v11:
 - Rebased on Linux-6.8-rc1
 - Included kernel/irq related patches from the "genirq, irqchip: Convert ARM
   MSI handling to per device MSI domains" series by Thomas.
   (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
    PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
    https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/)
 - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
 - Updated IMSIC driver to support per-device MSI domains for PCI and
   platform devices.

Changes since v10:
 - Rebased on Linux-6.6-rc7
 - Dropped PATCH3 of the v10 series since it was merged by Marc Z
   for Linux-6.6-rc7
 - Changed the IMSIC ID management strategy from a 1-n approach to
   an x86-style 1-1 approach

Changes since v9:
 - Rebased on Linux-6.6-rc4
 - Used builtin_platform_driver() in PATCH5, PATCH9, and PATCH12

Changes since v8:
 - Rebased on Linux-6.6-rc3
 - Dropped PATCH2 of the v8 series since we won't require
   riscv_get_intc_hartid(), based on Marc Z's comments on ACPI AIA support
 - Addressed Saravana's comments in PATCH3 of the v8 series
 - Updated PATCH9 and PATCH13 of the v8 series based on comments from Sunil

Changes since v7:
 - Rebased on Linux-6.6-rc1
 - Addressed comments on PATCH1 of the v7 series and split it into two patches
 - Used DEFINE_SIMPLE_PROP() in PATCH2 of the v7 series

Changes since v6:
 - Rebased on Linux-6.5-rc4
 - Updated PATCH2 to use IS_ENABLED(CONFIG_SPARC) instead of
   !IS_ENABLED(CONFIG_OF_IRQ)
 - Added new PATCH4 to fix syscore registration in the PLIC driver
 - Updated PATCH5 to convert the PLIC driver into a full-blown platform
   driver with a re-written probe function.

Changes since v5:
 - Rebased on Linux-6.5-rc2
 - Updated the overall series to ensure that only the IPI, timer, and
   INTC drivers are probed very early, whereas the rest of the interrupt
   controllers (such as PLIC, APLIC, and IMSIC) are probed as
   regular platform drivers.
 - Renamed riscv_fw_parent_hartid() to riscv_get_intc_hartid()
 - New PATCH1 to add fw_devlink support for msi-parent DT property
 - New PATCH2 to ensure all INTC suppliers are initialized, which in turn
   fixes the probing issue for the PLIC, APLIC, and IMSIC platform drivers
 - New PATCH3 to use platform driver probing for the PLIC
 - Re-structured the IMSIC driver into two separate drivers: early and
   platform. The IMSIC early driver (PATCH7) only initializes the IMSIC
   state and provides IPIs, whereas the IMSIC platform driver (PATCH8)
   provides the MSI domain for platform devices.
 - Re-structured the APLIC platform driver into three separate sources:
   main, direct-mode, and MSI-mode.

Changes since v4:
 - Rebased on Linux-6.5-rc1
 - Added "Dependencies" in the APLIC bindings (PATCH6 in v4)
 - Dropped PATCH6, which was changing the IOMMU DMA domain APIs
 - Dropped use of IOMMU DMA APIs in the IMSIC driver (PATCH4)

Changes since v3:
 - Rebased on Linux-6.4-rc6
 - Dropped PATCH2 of the v3 series; instead, we now set
   FWNODE_FLAG_BEST_EFFORT via IRQCHIP_DECLARE()
 - Extended riscv_fw_parent_hartid() to support both DT and ACPI in PATCH1
 - Extended iommu_dma_compose_msi_msg() instead of adding
   iommu_dma_select_msi() in PATCH6
 - Addressed Conor's comments in PATCH3
 - Addressed Conor's and Rob's comments in PATCH7

Changes since v2:
 - Rebased on Linux-6.4-rc1
 - Addressed Rob's comments on DT bindings patches 4 and 8.
 - Addressed Marc's comments on the IMSIC driver PATCH5
 - Replaced the use of OF APIs in the APLIC and IMSIC drivers with FWNODE
   APIs; this makes both drivers easily portable for ACPI support. This
   also removes unnecessary indirection from the APLIC and IMSIC drivers.
 - PATCH1 is a new patch for portability with ACPI support
 - PATCH2 is a new patch to fix probing in APLIC drivers for APLIC-only systems.
 - PATCH7 is a new patch which addresses the IOMMU DMA domain issues pointed
   out by SiFive

Changes since v1:
 - Rebased on Linux-6.2-rc2
 - Addressed comments on IMSIC DT bindings for PATCH4
 - Used raw_spin_lock_irqsave() on ids_lock for PATCH5
 - Improved MMIO alignment checks in PATCH5 to allow MMIO regions
   with holes.
 - Addressed comments on APLIC DT bindings for PATCH6
 - Fixed warning splat in aplic_msi_write_msg() caused by
   zeroed MSI message in PATCH7
 - Dropped the DT property riscv,slow-ipi; instead, a module parameter
   will be added in the future.

Anup Patel (11):
  irqchip/sifive-plic: Convert PLIC driver into a platform driver
  irqchip/riscv-intc: Add support for RISC-V AIA
  dt-bindings: interrupt-controller: Add RISC-V incoming MSI controller
  irqchip: Add RISC-V incoming MSI controller early driver
  irqchip/riscv-imsic: Add device MSI domain support for platform
    devices
  irqchip/riscv-imsic: Add device MSI domain support for PCI devices
  dt-bindings: interrupt-controller: Add RISC-V advanced PLIC
  irqchip: Add RISC-V advanced PLIC driver for direct-mode
  irqchip/riscv-aplic: Add support for MSI-mode
  RISC-V: Select APLIC and IMSIC drivers
  MAINTAINERS: Add entry for RISC-V AIA drivers

Björn Töpel (1):
  genirq/matrix: Dynamic bitmap allocation

Thomas Gleixner (13):
  irqchip/gic-v3: Make gic_irq_domain_select() robust for zero parameter
    count
  genirq/irqdomain: Remove the param count restriction from select()
  genirq/msi: Extend msi_parent_ops
  genirq/irqdomain: Add DOMAIN_BUS_DEVICE_IMS
  platform-msi: Prepare for real per device domains
  irqchip: Convert all platform MSI users to the new API
  genirq/msi: Provide optional translation op
  genirq/msi: Split msi_domain_alloc_irq_at()
  genirq/msi: Provide DOMAIN_BUS_WIRED_TO_MSI
  genirq/msi: Optionally use dev->fwnode for device domain
  genirq/msi: Provide allocation/free functions for "wired" MSI
    interrupts
  genirq/irqdomain: Reroute device MSI create_mapping
  genirq/msi: Provide MSI_FLAG_PARENT_PM_DEV

 .../interrupt-controller/riscv,aplic.yaml     | 172 ++++
 .../interrupt-controller/riscv,imsics.yaml    | 172 ++++
 MAINTAINERS                                   |  14 +
 arch/riscv/Kconfig                            |   2 +
 arch/x86/include/asm/hw_irq.h                 |   2 -
 drivers/base/platform-msi.c                   |  97 ++
 drivers/dma/mv_xor_v2.c                       |   8 +-
 drivers/dma/qcom/hidma.c                      |   6 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   5 +-
 drivers/irqchip/Kconfig                       |  25 +
 drivers/irqchip/Makefile                      |   3 +
 drivers/irqchip/irq-gic-v3.c                  |   6 +-
 drivers/irqchip/irq-riscv-aplic-direct.c      | 343 +++++++
 drivers/irqchip/irq-riscv-aplic-main.c        | 232 +++++
 drivers/irqchip/irq-riscv-aplic-main.h        |  53 ++
 drivers/irqchip/irq-riscv-aplic-msi.c         | 256 +++++
 drivers/irqchip/irq-riscv-imsic-early.c       | 241 +++++
 drivers/irqchip/irq-riscv-imsic-platform.c    | 403 ++++++++
 drivers/irqchip/irq-riscv-imsic-state.c       | 887 ++++++++++++++++++
 drivers/irqchip/irq-riscv-imsic-state.h       | 105 +++
 drivers/irqchip/irq-riscv-intc.c              |  34 +-
 drivers/irqchip/irq-sifive-plic.c             | 239 +++--
 drivers/mailbox/bcm-flexrm-mailbox.c          |   8 +-
 drivers/perf/arm_smmuv3_pmu.c                 |   4 +-
 drivers/ufs/host/ufs-qcom.c                   |   8 +-
 include/linux/irqchip/riscv-aplic.h           | 119 +++
 include/linux/irqchip/riscv-imsic.h           |  87 ++
 include/linux/irqdomain.h                     |  17 +
 include/linux/irqdomain_defs.h                |   2 +
 include/linux/msi.h                           |  21 +
 kernel/irq/irqdomain.c                        |  28 +-
 kernel/irq/matrix.c                           |  28 +-
 kernel/irq/msi.c                              | 184 +++-
 33 files changed, 3636 insertions(+), 175 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,aplic.yaml
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,imsics.yaml
 create mode 100644 drivers/irqchip/irq-riscv-aplic-direct.c
 create mode 100644 drivers/irqchip/irq-riscv-aplic-main.c
 create mode 100644 drivers/irqchip/irq-riscv-aplic-main.h
 create mode 100644 drivers/irqchip/irq-riscv-aplic-msi.c
 create mode 100644 drivers/irqchip/irq-riscv-imsic-early.c
 create mode 100644 drivers/irqchip/irq-riscv-imsic-platform.c
 create mode 100644 drivers/irqchip/irq-riscv-imsic-state.c
 create mode 100644 drivers/irqchip/irq-riscv-imsic-state.h
 create mode 100644 include/linux/irqchip/riscv-aplic.h
 create mode 100644 include/linux/irqchip/riscv-imsic.h

Comments

Anup Patel Jan. 27, 2024, 4:20 p.m. UTC | #1
Hi Thomas,

On Sat, Jan 27, 2024 at 9:48 PM Anup Patel <apatel@ventanamicro.com> wrote:
>
> Thomas Gleixner (13):
>   irqchip/gic-v3: Make gic_irq_domain_select() robust for zero parameter
>     count
>   genirq/irqdomain: Remove the param count restriction from select()
>   genirq/msi: Extend msi_parent_ops
>   genirq/irqdomain: Add DOMAIN_BUS_DEVICE_IMS
>   platform-msi: Prepare for real per device domains
>   irqchip: Convert all platform MSI users to the new API
>   genirq/msi: Provide optional translation op
>   genirq/msi: Split msi_domain_alloc_irq_at()
>   genirq/msi: Provide DOMAIN_BUS_WIRED_TO_MSI
>   genirq/msi: Optionally use dev->fwnode for device domain
>   genirq/msi: Provide allocation/free functions for "wired" MSI
>     interrupts
>   genirq/irqdomain: Reroute device MSI create_mapping
>   genirq/msi: Provide MSI_FLAG_PARENT_PM_DEV

I have rebased and included 13 patches (which add the per-device MSI domain
infrastructure) from your series [1]. In this series, the IMSIC driver
implements msi_parent_ops and the APLIC driver implements a wired-to-MSI
bridge using your new infrastructure.

The remaining 27 patches of your series [1] require testing on ARM
platforms, which I don't have. I suggest that these remaining patches
go in as a separate series.

I hope you are okay with this approach.

Best Regards,
Anup

[1] https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/


Björn Töpel Jan. 30, 2024, 7:16 a.m. UTC | #2
Anup Patel <apatel@ventanamicro.com> writes:


Thanks for working on this, Anup! I'm still reviewing the patches.

I'm hitting a boot hang in text patching, with this series applied on
6.8-rc2. IPI issues?

I'm booting with U-boot UEFI.

kernel config:
https://gist.github.com/bjoto/bac563e6dcaab68dba1a5eaf675d51aa

QEMU 8.2.0/OpenSBI 1.4:
  | qemu-system-riscv64 \
  | 	-machine virt,acpi=off,aia=aplic-imsic \
  | 	-cpu rv64,v=true,vlen=256,elen=64,h=true,zbkb=on,zbkc=on,zbkx=on,zkr=on,zkt=on,svinval=on,svnapot=on,svpbmt=on \
  | 	-smp 4 \
  | 	-object rng-random,filename=/dev/urandom,id=rng0 \
  | 	-device virtio-rng-device,rng=rng0 \
  | 	-append "root=/dev/vda2 rw earlycon console=tty0 console=ttyS0 panic=-1 oops=panic sysctl.vm.panic_on_oom=1" \
  |     -m 4G \
  |     ...

Last lines from the kernel:
  | ...
  | goldfish_rtc 101000.rtc: registered as rtc0             
  | goldfish_rtc 101000.rtc: setting system clock to 2024-01-30T06:39:28 UTC (1706596768)              

Same kernel boots w/ "-machine virt,acpi=off" (AIA is *not* enabled).

Related or not, I got this splat (once) in an ftrace kselftest:
  | # selftests: ftrace: ftracetest-ktap
  | Unable to handle kernel paging request at virtual address 5a5a5a5a5a5a5ac2
  | Oops [#1]
  | Modules linked in: drm fuse i2c_core drm_panel_orientation_quirks backlight dm_mod configfs ip_tables x_tables [last unloaded: trace_printk]
  | CPU: 2 PID: 19691 Comm: ls Tainted: G        W          6.8.0-rc2-kselftest_plain #1
  | Hardware name: riscv-virtio,qemu (DT)
  | epc : set_top_events_ownership+0x14/0x5c
  |  ra : eventfs_get_attr+0x2e/0x50
  | epc : ffffffff80533aa4 ra : ffffffff80533b1a sp : ff20000001cebc70
  |  gp : ffffffff8258b860 tp : ff6000008623e240 t0 : ffffffff80533a98
  |  t1 : ffffffff825b6b60 t2 : 0000000000000008 s0 : ff20000001cebc80
  |  s1 : ffffffff8233c000 a0 : ff6000009224e9b8 a1 : ff20000001cebd28
  |  a2 : ff20000001cebd98 a3 : 000000000000025e a4 : ffffffff80000000
  |  a5 : 5a5a5a5a5a5a5a5a a6 : 0000000000000000 a7 : 0000000000735049
  |  s2 : 000000000000025e s3 : ff20000001cebd98 s4 : ff6000009224e9b8
  |  s5 : ff20000001cebd28 s6 : ffffffffffffff9c s7 : ff6000008ac6a1c0
  |  s8 : 00007fff9f685d80 s9 : 0000000000000000 s10: 00007fffd4550ef0
  |  s11: 0000000000000000 t3 : 0000000000000001 t4 : 0000000000000016
  |  t5 : ffffffff818145be t6 : ff6000009233d77e
  | status: 0000000200000120 badaddr: 5a5a5a5a5a5a5ac2 cause: 000000000000000d
  | [<ffffffff80533aa4>] set_top_events_ownership+0x14/0x5c
  | Code: b297 ffad 82e7 d302 1141 e422 0800 3783 ff85 cb89 (57b8) 8b09 
  | ---[ end trace 0000000000000000 ]---

This might be unrelated, but the hang above happens on every boot.


Björn
Björn Töpel Jan. 30, 2024, 7:52 a.m. UTC | #3
Björn Töpel <bjorn@kernel.org> writes:

> Anup Patel <apatel@ventanamicro.com> writes:
>
>
> Thanks for working on this, Anup! I'm still reviewing the patches.
>
> I'm hitting a boot hang in text patching, with this series applied on
> 6.8-rc2. IPI issues?

Not text patching! One cpu spinning in smp_call_function_many_cond() and
the others are in cpu_relax(). Smells like IPI...
Anup Patel Jan. 30, 2024, 10:02 a.m. UTC | #4
On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Björn Töpel <bjorn@kernel.org> writes:
>
> > Anup Patel <apatel@ventanamicro.com> writes:
> >
> >
> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >
> > I'm hitting a boot hang in text patching, with this series applied on
> > 6.8-rc2. IPI issues?
>
> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> the others are in cpu_relax(). Smells like IPI...

Can you share the complete boot log?

Regards,
Anup
Anup Patel Jan. 30, 2024, 10:23 a.m. UTC | #5
On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Björn Töpel <bjorn@kernel.org> writes:
>
> > Anup Patel <apatel@ventanamicro.com> writes:
> >
> >
> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >
> > I'm hitting a boot hang in text patching, with this series applied on
> > 6.8-rc2. IPI issues?
>
> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> the others are in cpu_relax(). Smells like IPI...

I tried bootefi from U-Boot multiple times but can't reproduce the
issue you are seeing.

Here's my boot log ...

$ qemu-system-riscv64 -M virt,aia=aplic-imsic -m 256M -display none
-serial stdio -bios
opensbi/build/platform/generic/firmware/fw_jump.bin -kernel
./u-boot/u-boot.bin -device
loader,file=./build-riscv64/arch/riscv/boot/Image,addr=0x84000000
-drive file=./rootfs_riscv64.ext2,format=raw,id=hd0 -device
virtio-blk-device,drive=hd0 -device virtio-net-device,netdev=eth0
-netdev user,id=eth0 -object rng-random,filename=/dev/urandom,id=rng0
-device virtio-rng-device,rng=rng0 -append "root=/dev/vda rootwait rw
console=ttyS0 earlycon" -smp 8

OpenSBI v1.4-8-gbb90a9e
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name             : riscv-virtio,qemu
Platform Features         : medeleg
Platform HART Count       : 8
Platform IPI Device       : aia-imsic
Platform Timer Device     : aclint-mtimer @ 10000000Hz
Platform Console Device   : uart8250
Platform HSM Device       : ---
Platform PMU Device       : ---
Platform Reboot Device    : syscon-reboot
Platform Shutdown Device  : syscon-poweroff
Platform Suspend Device   : ---
Platform CPPC Device      : ---
Firmware Base             : 0x80000000
Firmware Size             : 395 KB
Firmware RW Offset        : 0x40000
Firmware RW Size          : 139 KB
Firmware Heap Offset      : 0x56000
Firmware Heap Size        : 51 KB (total), 3 KB (reserved), 12 KB
(used), 36 KB (free)
Firmware Scratch Size     : 4096 B (total), 328 B (used), 3768 B (free)
Runtime SBI Version       : 2.0

Domain0 Name              : root
Domain0 Boot HART         : 7
Domain0 HARTs             : 0*,1*,2*,3*,4*,5*,6*,7*
Domain0 Region00          : 0x0000000000100000-0x0000000000100fff M:
(I,R,W) S/U: (R,W)
Domain0 Region01          : 0x0000000010000000-0x0000000010000fff M:
(I,R,W) S/U: (R,W)
Domain0 Region02          : 0x000000000c000000-0x000000000c007fff M:
(I,R,W) S/U: ()
Domain0 Region03          : 0x0000000024000000-0x0000000024007fff M:
(I,R,W) S/U: ()
Domain0 Region04          : 0x0000000002000000-0x000000000200ffff M:
(I,R,W) S/U: ()
Domain0 Region05          : 0x0000000080000000-0x000000008003ffff M:
(R,X) S/U: ()
Domain0 Region06          : 0x0000000080040000-0x000000008007ffff M:
(R,W) S/U: ()
Domain0 Region07          : 0x0000000000000000-0xffffffffffffffff M:
() S/U: (R,W,X)
Domain0 Next Address      : 0x0000000080200000
Domain0 Next Arg1         : 0x0000000082200000
Domain0 Next Mode         : S-mode
Domain0 SysReset          : yes
Domain0 SysSuspend        : yes

Boot HART ID              : 7
Boot HART Domain          : root
Boot HART Priv Version    : v1.12
Boot HART Base ISA        : rv64imafdch
Boot HART ISA Extensions  : smaia,sstc,zicntr,zihpm,zicboz,zicbom,sdtrig
Boot HART PMP Count       : 16
Boot HART PMP Granularity : 2 bits
Boot HART PMP Address Bits: 54
Boot HART MHPM Info       : 16 (0x0007fff8)
Boot HART Debug Triggers  : 2 triggers
Boot HART MIDELEG         : 0x0000000000001666
Boot HART MEDELEG         : 0x0000000000f0b509


U-Boot 2023.10 (Nov 07 2023 - 18:28:29 +0530)

CPU:   rv64imafdch_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_smaia_ssaia_sstc_svadu
Model: riscv-virtio,qemu
DRAM:  256 MiB
Core:  37 devices, 16 uclasses, devicetree: board
Flash: 32 MiB
Loading Environment from nowhere... OK
In:    serial,usbkbd
Out:   serial,vidconsole
Err:   serial,vidconsole
No working controllers found
Net:   eth0: virtio-net#1
Working FDT set to 8ef1f870
Hit any key to stop autoboot:  0
=> bootefi ${kernel_addr_r}:0x1600000 ${fdtcontroladdr}
No EFI system partition
No EFI system partition
Failed to persist EFI variables
Booting /MemoryMapped(0x0,0x84000000,0x1600000)
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
[    0.000000] Linux version 6.8.0-rc1-00039-gd9b9d6eb987f
(anup@anup-ubuntu-vm) (riscv64-unknown-linux-gnu-gcc (g2ee5e430018)
12.2.0, GNU ld (GNU Binutils) 2.40.0.20230214) #67 SMP Sat Jan 27
17:20:09 IST 2024
[    0.000000] random: crng init done
[    0.000000] Machine model: riscv-virtio,qemu
[    0.000000] SBI specification v2.0 detected
[    0.000000] SBI implementation ID=0x1 Version=0x10004
[    0.000000] SBI TIME extension detected
[    0.000000] SBI IPI extension detected
[    0.000000] SBI RFENCE extension detected
[    0.000000] SBI SRST extension detected
[    0.000000] SBI DBCN extension detected
[    0.000000] earlycon: ns16550a0 at MMIO 0x0000000010000000 (options '')
[    0.000000] printk: legacy bootconsole [ns16550a0] enabled
[    0.000000] efi: EFI v2.10 by Das U-Boot
[    0.000000] efi: RTPROP=0x8df05040 SMBIOS=0x8df01000 RNG=0x8c972040
MEMRESERVE=0x8c971040
[    0.000000] OF: reserved mem:
0x0000000080000000..0x000000008003ffff (256 KiB) nomap non-reusable
mmode_resv0@80000000
[    0.000000] OF: reserved mem:
0x0000000080040000..0x000000008007ffff (256 KiB) nomap non-reusable
mmode_resv1@80040000
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000080000000-0x000000008fffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080000000-0x000000008007ffff]
[    0.000000]   node   0: [mem 0x0000000080080000-0x000000008df00fff]
[    0.000000]   node   0: [mem 0x000000008df01000-0x000000008df01fff]
[    0.000000]   node   0: [mem 0x000000008df02000-0x000000008df04fff]
[    0.000000]   node   0: [mem 0x000000008df05000-0x000000008df05fff]
[    0.000000]   node   0: [mem 0x000000008df06000-0x000000008df06fff]
[    0.000000]   node   0: [mem 0x000000008df07000-0x000000008df08fff]
[    0.000000]   node   0: [mem 0x000000008df09000-0x000000008df09fff]
[    0.000000]   node   0: [mem 0x000000008df0a000-0x000000008df19fff]
[    0.000000]   node   0: [mem 0x000000008df1a000-0x000000008f741fff]
[    0.000000]   node   0: [mem 0x000000008f742000-0x000000008f742fff]
[    0.000000]   node   0: [mem 0x000000008f743000-0x000000008fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000008fffffff]
[    0.000000] SBI HSM extension detected
[    0.000000] Falling back to deprecated "riscv,isa"
[    0.000000] riscv: base ISA extensions acdfhim
[    0.000000] riscv: ELF capabilities acdfim
[    0.000000] percpu: Embedded 20 pages/cpu s41464 r8192 d32264 u81920
[    0.000000] Kernel command line: root=/dev/vda rootwait rw
console=ttyS0 earlycon
[    0.000000] Dentry cache hash table entries: 32768 (order: 6,
262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072
bytes, linear)
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 64512
[    0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
[    0.000000] Virtual kernel memory layout:
[    0.000000]       fixmap : 0xff1bfffffea00000 - 0xff1bffffff000000
 (6144 kB)
[    0.000000]       pci io : 0xff1bffffff000000 - 0xff1c000000000000
 (  16 MB)
[    0.000000]      vmemmap : 0xff1c000000000000 - 0xff20000000000000
 (1024 TB)
[    0.000000]      vmalloc : 0xff20000000000000 - 0xff60000000000000
 (16384 TB)
[    0.000000]      modules : 0xffffffff01582000 - 0xffffffff80000000
 (2026 MB)
[    0.000000]       lowmem : 0xff60000000000000 - 0xff60000010000000
 ( 256 MB)
[    0.000000]       kernel : 0xffffffff80000000 - 0xffffffffffffffff
 (2047 MB)
[    0.000000] Memory: 217364K/262144K available (9190K kernel code,
4939K rwdata, 4096K rodata, 2252K init, 489K bss, 44780K reserved, 0K
cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=8.
[    0.000000] rcu:     RCU debug extended QS entry/exit.
[    0.000000]     Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 64 local interrupts mapped using AIA
[    0.000000] riscv-imsic: imsics@28000000: providing IPIs using interrupt 1
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] clocksource: riscv_clocksource: mask:
0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120
ns
[    0.000087] sched_clock: 64 bits at 10MHz, resolution 100ns, wraps
every 4398046511100ns
[    0.001406] riscv-timer: Timer interrupt in S-mode is available via
sstc extension
[    0.007310] Console: colour dummy device 80x25
[    0.014343] Calibrating delay loop (skipped), value calculated
using timer frequency.. 20.00 BogoMIPS (lpj=40000)
[    0.018315] pid_max: default: 32768 minimum: 301
[    0.020982] LSM: initializing lsm=capability,integrity
[    0.023969] Mount-cache hash table entries: 512 (order: 0, 4096
bytes, linear)
[    0.025231] Mountpoint-cache hash table entries: 512 (order: 0,
4096 bytes, linear)
[    0.066845] RCU Tasks Trace: Setting shift to 3 and lim to 1
rcu_task_cb_adjust=1.
[    0.068829] riscv: ELF compat mode supported
[    0.069115] ASID allocator using 16 bits (65536 entries)
[    0.080952] rcu: Hierarchical SRCU implementation.
[    0.081712] rcu:     Max phase no-delay instances is 1000.
[    0.086381] Remapping and enabling EFI services.
[    0.093736] smp: Bringing up secondary CPUs ...
[    0.162264] smp: Brought up 1 node, 8 CPUs
[    0.186107] devtmpfs: initialized
[    0.199725] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.200634] futex hash table entries: 2048 (order: 5, 131072 bytes, linear)
[    0.203482] pinctrl core: initialized pinctrl subsystem
[    0.213664] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.218255] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
[    0.221185] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for
atomic allocations
[    0.222099] audit: initializing netlink subsys (disabled)
[    0.228028] audit: type=2000 audit(0.168:1): state=initialized
audit_enabled=0 res=1
[    0.230906] cpuidle: using governor menu
[    0.289647] cpu2: Ratio of byte access time to unaligned word
access is 7.20, unaligned accesses are fast
[    0.289661] cpu3: Ratio of byte access time to unaligned word
access is 5.94, unaligned accesses are fast
[    0.289652] cpu4: Ratio of byte access time to unaligned word
access is 7.13, unaligned accesses are fast
[    0.289625] cpu6: Ratio of byte access time to unaligned word
access is 10.28, unaligned accesses are fast
[    0.289615] cpu1: Ratio of byte access time to unaligned word
access is 8.04, unaligned accesses are fast
[    0.290252] cpu5: Ratio of byte access time to unaligned word
access is 7.13, unaligned accesses are fast
[    0.299499] cpu7: Ratio of byte access time to unaligned word
access is 6.58, unaligned accesses are fast
[    0.326695] cpu0: Ratio of byte access time to unaligned word
access is 7.78, unaligned accesses are fast
[    0.354371] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[    0.354767] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
[    0.361699] ACPI: Interpreter disabled.
[    0.363441] iommu: Default domain type: Translated
[    0.364215] iommu: DMA domain TLB invalidation policy: strict mode
[    0.368128] SCSI subsystem initialized
[    0.371067] usbcore: registered new interface driver usbfs
[    0.371912] usbcore: registered new interface driver hub
[    0.372389] usbcore: registered new device driver usb
[    0.375075] efivars: Registered efivars operations
[    0.389652] vgaarb: loaded
[    0.443368] clocksource: Switched to clocksource riscv_clocksource
[    0.449125] pnp: PnP ACPI: disabled
[    0.499449] NET: Registered PF_INET protocol family
[    0.500979] IP idents hash table entries: 4096 (order: 3, 32768
bytes, linear)
[    0.507062] tcp_listen_portaddr_hash hash table entries: 128
(order: 0, 4096 bytes, linear)
[    0.507775] Table-perturb hash table entries: 65536 (order: 6,
262144 bytes, linear)
[    0.508351] TCP established hash table entries: 2048 (order: 2,
16384 bytes, linear)
[    0.508930] TCP bind hash table entries: 2048 (order: 5, 131072
bytes, linear)
[    0.509942] TCP: Hash tables configured (established 2048 bind 2048)
[    0.511459] UDP hash table entries: 256 (order: 2, 24576 bytes, linear)
[    0.512262] UDP-Lite hash table entries: 256 (order: 2, 24576 bytes, linear)
[    0.514937] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.521225] RPC: Registered named UNIX socket transport module.
[    0.521913] RPC: Registered udp transport module.
[    0.522324] RPC: Registered tcp transport module.
[    0.522656] RPC: Registered tcp-with-tls transport module.
[    0.523178] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.523787] PCI: CLS 0 bytes, default 64
[    0.529358] workingset: timestamp_bits=46 max_order=16 bucket_order=0
[    0.537946] NFS: Registering the id_resolver key type
[    0.539478] Key type id_resolver registered
[    0.539918] Key type id_legacy registered
[    0.540656] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    0.542911] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver
Registering...
[    0.544894] 9p: Installing v9fs 9p2000 file system support
[    0.548517] NET: Registered PF_ALG protocol family
[    0.549459] Block layer SCSI generic (bsg) driver version 0.4
loaded (major 245)
[    0.550658] io scheduler mq-deadline registered
[    0.552112] io scheduler kyber registered
[    0.552442] io scheduler bfq registered
[    0.556517] riscv-imsic: imsics@28000000:  hart-index-bits: 3,
guest-index-bits: 0
[    0.556955] riscv-imsic: imsics@28000000: group-index-bits: 0,
group-index-shift: 24
[    0.557403] riscv-imsic: imsics@28000000: per-CPU IDs 255 at base
PPN 0x0000000028000000
[    0.557699] riscv-imsic: imsics@28000000: total 2032 interrupts available
[    0.561962] pci-host-generic 30000000.pci: host bridge
/soc/pci@30000000 ranges:
[    0.563422] pci-host-generic 30000000.pci:       IO
0x0003000000..0x000300ffff -> 0x0000000000
[    0.564475] pci-host-generic 30000000.pci:      MEM
0x0040000000..0x007fffffff -> 0x0040000000
[    0.565013] pci-host-generic 30000000.pci:      MEM
0x0400000000..0x07ffffffff -> 0x0400000000
[    0.566349] pci-host-generic 30000000.pci: Memory resource size
exceeds max for 32 bits
[    0.567633] pci-host-generic 30000000.pci: ECAM at [mem
0x30000000-0x3fffffff] for [bus 00-ff]
[    0.569300] pci-host-generic 30000000.pci: PCI host bridge to bus 0000:00
[    0.570172] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.570559] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
[    0.570969] pci_bus 0000:00: root bus resource [mem 0x40000000-0x7fffffff]
[    0.571595] pci_bus 0000:00: root bus resource [mem 0x400000000-0x7ffffffff]
[    0.573646] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
conventional PCI endpoint
[    0.654069] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.659475] SuperH (H)SCI(F) driver initialized
[    0.675004] loop: module loaded
[    0.680024] e1000e: Intel(R) PRO/1000 Network Driver
[    0.680162] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    0.684590] usbcore: registered new interface driver uas
[    0.685245] usbcore: registered new interface driver usb-storage
[    0.686530] mousedev: PS/2 mouse device common for all mice
[    0.693125] syscon-poweroff poweroff: pm_power_off already claimed
for sbi_srst_power_off
[    0.694774] syscon-poweroff: probe of poweroff failed with error -16
[    0.698092] sdhci: Secure Digital Host Controller Interface driver
[    0.698484] sdhci: Copyright(c) Pierre Ossman
[    0.699333] Synopsys Designware Multimedia Card Interface Driver
[    0.699869] sdhci-pltfm: SDHCI platform and OF driver helper
[    0.700673] usbcore: registered new interface driver usbhid
[    0.700920] usbhid: USB HID core driver
[    0.701501] riscv-pmu-sbi: SBI PMU extension is available
[    0.702263] riscv-pmu-sbi: 16 firmware and 18 hardware counters
[    0.702618] riscv-pmu-sbi: Perf sampling/filtering is not supported
as sscof extension is not available
[    0.709934] NET: Registered PF_INET6 protocol family
[    0.723647] Segment Routing with IPv6
[    0.724210] In-situ OAM (IOAM) with IPv6
[    0.724882] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    0.727936] NET: Registered PF_PACKET protocol family
[    0.729289] 9pnet: Installing 9P2000 support
[    0.729796] Key type dns_resolver registered
[    0.763394] debug_vm_pgtable: [debug_vm_pgtable         ]:
Validating architecture page table helpers
[    0.772054] riscv-aplic d000000.aplic: 96 interrupts forwared to
MSI base 0x0000000028000000
[    0.775578] virtio_blk virtio0: 1/0/0 default/read/poll queues
[    0.782792] virtio_blk virtio0: [vda] 65536 512-byte logical blocks
(33.6 MB/32.0 MiB)
[    0.827635] printk: legacy console [ttyS0] disabled
[    0.830784] 10000000.serial: ttyS0 at MMIO 0x10000000 (irq = 14,
base_baud = 230400) is a 16550A
[    0.833076] printk: legacy console [ttyS0] enabled
[    0.833076] printk: legacy console [ttyS0] enabled
[    0.833856] printk: legacy bootconsole [ns16550a0] disabled
[    0.833856] printk: legacy bootconsole [ns16550a0] disabled
[    0.843499] goldfish_rtc 101000.rtc: registered as rtc0
[    0.844980] goldfish_rtc 101000.rtc: setting system clock to
2024-01-30T10:19:33 UTC (1706609973)
[    0.848495] clk: Disabling unused clocks
[    0.884046] EXT4-fs (vda): warning: mounting unchecked fs, running
e2fsck is recommended
[    0.891369] EXT4-fs (vda): mounted filesystem
00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode:
disabled.
[    0.892199] ext4 filesystem being mounted at /root supports
timestamps until 2038-01-19 (0x7fffffff)
[    0.892644] VFS: Mounted root (ext4 filesystem) on device 254:0.
[    0.895564] devtmpfs: mounted
[    0.986847] Freeing unused kernel image (initmem) memory: 2252K
[    0.988406] Run /sbin/init as init process
mount: mounting devtmpfs on /dev failed: Device or resource busy
           _  _
          | ||_|
          | | _ ____  _   _  _  _
          | || |  _ \| | | |\ \/ /
          | || | | | | |_| |/    \
          |_||_|_| |_|\____|\_/\_/

               Busybox Rootfs

Please press Enter to activate this console.
/ #
/ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
      CPU6       CPU7
 10:        103        116         58        214         96         47
        78         52  RISC-V INTC   5 Edge      riscv-timer
 11:          0         44          0          0          0          0
         0          0  APLIC-MSI-d000000.aplic   8 Level     virtio0
 12:          0          0          0          0          0          0
         0          0  APLIC-MSI-d000000.aplic   7 Level     virtio1
 13:          0          0          0          6          0          0
         0          0  APLIC-MSI-d000000.aplic   6 Level     virtio2
 14:          0          0          0          0         64          0
         0          0  APLIC-MSI-d000000.aplic  10 Level     ttyS0
 15:          0          0          0          0          0          0
         0          0  APLIC-MSI-d000000.aplic  11 Level
101000.rtc
IPI0:         4          9         12          6          5         10
        13          7  Rescheduling interrupts
IPI1:       605        477        442        315        392        434
       405        417  Function call interrupts
IPI2:         0          0          0          0          0          0
         0          0  CPU stop interrupts
IPI3:         0          0          0          0          0          0
         0          0  CPU stop (for crash dump) interrupts
IPI4:         0          0          0          0          0          0
         0          0  IRQ work interrupts
IPI5:         0          0          0          0          0          0
         0          0  Timer broadcast interrupts
/ #
/ #
/ # poweroff
/ # umount: devtmpfs busy - remounted read-only
[   24.504316] EXT4-fs (vda): re-mounted
00000000-0000-0000-0000-000000000000 ro. Quota mode: disabled.
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to[   26.543142] reboot: Power down


Regards,
Anup
Björn Töpel Jan. 30, 2024, 11:05 a.m. UTC | #6
Anup Patel <apatel@ventanamicro.com> writes:

> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> Björn Töpel <bjorn@kernel.org> writes:
>>
>> > Anup Patel <apatel@ventanamicro.com> writes:
>> >
>> >
>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>> >
>> > I'm hitting a boot hang in text patching, with this series applied on
>> > 6.8-rc2. IPI issues?
>>
>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>> the others are in cpu_relax(). Smells like IPI...
>
> Can you share the complete boot log?

Here: https://gist.github.com/bjoto/04a580568378f3b5483af07cd9d22501
Björn Töpel Jan. 30, 2024, 11:46 a.m. UTC | #7
Anup Patel <apatel@ventanamicro.com> writes:

> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> Björn Töpel <bjorn@kernel.org> writes:
>>
>> > Anup Patel <apatel@ventanamicro.com> writes:
>> >
>> >
>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>> >
>> > I'm hitting a boot hang in text patching, with this series applied on
>> > 6.8-rc2. IPI issues?
>>
>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>> the others are in cpu_relax(). Smells like IPI...
>
> I tried bootefi from U-Boot multiple times but can't reproduce the
> issue you are seeing.

Thanks! I can reproduce without EFI, and simpler command-line:

qemu-system-riscv64 \
  -bios /path/to/fw_dynamic.bin \
  -kernel /path/to/Image \
  -append 'earlycon console=tty0 console=ttyS0' \
  -machine virt,aia=aplic-imsic \
  -no-reboot -nodefaults -nographic \
  -smp 4 \
  -object rng-random,filename=/dev/urandom,id=rng0 \
  -device virtio-rng-device,rng=rng0 \
  -m 4G -chardev stdio,id=char0 -serial chardev:char0

I can reproduce with your upstream riscv_aia_v12 plus the config in the
gist [1], and all latest QEMU/OpenSBI:

QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")

Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
panicking about missing root mount ;-))


Björn

[1] https://gist.githubusercontent.com/bjoto/bac563e6dcaab68dba1a5eaf675d51aa/raw/ff6208fb17f27819dbe97ace7d034f385d2db657/gistfile1.txt
Björn Töpel Jan. 30, 2024, 2:48 p.m. UTC | #8
Björn Töpel <bjorn@kernel.org> writes:

> Anup Patel <apatel@ventanamicro.com> writes:
>
>> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
>>>
>>> Björn Töpel <bjorn@kernel.org> writes:
>>>
>>> > Anup Patel <apatel@ventanamicro.com> writes:
>>> >
>>> >> [...]
>>> >
>>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>>> >
>>> > I'm hitting a boot hang in text patching, with this series applied on
>>> > 6.8-rc2. IPI issues?
>>>
>>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>>> the others are in cpu_relax(). Smells like IPI...
>>
>> I tried bootefi from U-Boot multiple times but can't reproduce the
>> issue you are seeing.
>
> Thanks! I can reproduce without EFI, and simpler command-line:
>
> qemu-system-riscv64 \
>   -bios /path/to/fw_dynamic.bin \
>   -kernel /path/to/Image \
>   -append 'earlycon console=tty0 console=ttyS0' \
>   -machine virt,aia=aplic-imsic \
>   -no-reboot -nodefaults -nographic \
>   -smp 4 \
>   -object rng-random,filename=/dev/urandom,id=rng0 \
>   -device virtio-rng-device,rng=rng0 \
>   -m 4G -chardev stdio,id=char0 -serial chardev:char0
>
> I can reproduce with your upstream riscv_aia_v12 plus the config in the
> gist [1], and all latest QEMU/OpenSBI:
>
> QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
> OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
> Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
>
> Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
> panicking about missing root mount ;-))

More context; The hang is during a late initcall, where an ftrace direct
(register_ftrace_direct()) modification is done.

Stop machine is used to call into __ftrace_modify_call(). Then into the
arch specific patch_text_nosync(), where flush_icache_range() hangs in
flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
right before the end of the function.

Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
patching fixes the boot hang, but it does seem related to IPI...
Anup Patel Jan. 30, 2024, 3:19 p.m. UTC | #9
On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Björn Töpel <bjorn@kernel.org> writes:
>
> > Anup Patel <apatel@ventanamicro.com> writes:
> >
> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
> >>>
> >>> Björn Töpel <bjorn@kernel.org> writes:
> >>>
> >>> > Anup Patel <apatel@ventanamicro.com> writes:
> >>> >
> >>> >> [...]
> >>> >
> >>> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >>> >
> >>> > I'm hitting a boot hang in text patching, with this series applied on
> >>> > 6.8-rc2. IPI issues?
> >>>
> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> >>> the others are in cpu_relax(). Smells like IPI...
> >>
> >> I tried bootefi from U-Boot multiple times but can't reproduce the
> >> issue you are seeing.
> >
> > Thanks! I can reproduce without EFI, and simpler command-line:
> >
> > qemu-system-riscv64 \
> >   -bios /path/to/fw_dynamic.bin \
> >   -kernel /path/to/Image \
> >   -append 'earlycon console=tty0 console=ttyS0' \
> >   -machine virt,aia=aplic-imsic \
> >   -no-reboot -nodefaults -nographic \
> >   -smp 4 \
> >   -object rng-random,filename=/dev/urandom,id=rng0 \
> >   -device virtio-rng-device,rng=rng0 \
> >   -m 4G -chardev stdio,id=char0 -serial chardev:char0
> >
> > I can reproduce with your upstream riscv_aia_v12 plus the config in the
> > gist [1], and all latest QEMU/OpenSBI:
> >
> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
> >
> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
> > panicking about missing root mount ;-))
>
> More context; The hang is during a late initcall, where an ftrace direct
> (register_ftrace_direct()) modification is done.
>
> Stop machine is used to call into __ftrace_modify_call(). Then into the
> arch specific patch_text_nosync(), where flush_icache_range() hangs in
> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
> right before the end of the function.
>
> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
> patching fixes the boot hang, but it does seem related to IPI...

Thanks for the details, I will debug more at my end.

Regards,
Anup
Anup Patel Jan. 30, 2024, 3:48 p.m. UTC | #10
On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Björn Töpel <bjorn@kernel.org> writes:
>
> > Anup Patel <apatel@ventanamicro.com> writes:
> >
> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
> >>>
> >>> Björn Töpel <bjorn@kernel.org> writes:
> >>>
> >>> > Anup Patel <apatel@ventanamicro.com> writes:
> >>> >
> >>> >> [...]
> >>> >
> >>> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >>> >
> >>> > I'm hitting a boot hang in text patching, with this series applied on
> >>> > 6.8-rc2. IPI issues?
> >>>
> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> >>> the others are in cpu_relax(). Smells like IPI...
> >>
> >> I tried bootefi from U-Boot multiple times but can't reproduce the
> >> issue you are seeing.
> >
> > Thanks! I can reproduce without EFI, and simpler command-line:
> >
> > qemu-system-riscv64 \
> >   -bios /path/to/fw_dynamic.bin \
> >   -kernel /path/to/Image \
> >   -append 'earlycon console=tty0 console=ttyS0' \
> >   -machine virt,aia=aplic-imsic \
> >   -no-reboot -nodefaults -nographic \
> >   -smp 4 \
> >   -object rng-random,filename=/dev/urandom,id=rng0 \
> >   -device virtio-rng-device,rng=rng0 \
> >   -m 4G -chardev stdio,id=char0 -serial chardev:char0
> >
> > I can reproduce with your upstream riscv_aia_v12 plus the config in the
> > gist [1], and all latest QEMU/OpenSBI:
> >
> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
> >
> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
> > panicking about missing root mount ;-))
>
> More context; The hang is during a late initcall, where an ftrace direct
> (register_ftrace_direct()) modification is done.
>
> Stop machine is used to call into __ftrace_modify_call(). Then into the
> arch specific patch_text_nosync(), where flush_icache_range() hangs in
> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
> right before the end of the function.
>
> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
> patching fixes the boot hang, but it does seem related to IPI...
>
Looks like flush_icache_all() does not use the IPIs (on_each_cpu()
and friends) correctly.

On the other hand, flush_icache_mm() does the right thing by doing a
local flush on the current CPU and an IPI-based flush on the other
CPUs.

Can you try the following patch?

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 55a34f2020a8..a3dfbe4de832 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info)

 void flush_icache_all(void)
 {
+    cpumask_t others;
+
     local_flush_icache_all();

+    cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id()));
+    if (cpumask_empty(&others))
+        return;
+
     if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
-        sbi_remote_fence_i(NULL);
+        sbi_remote_fence_i(&others);
     else
-        on_each_cpu(ipi_remote_fence_i, NULL, 1);
+        on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
 }
 EXPORT_SYMBOL(flush_icache_all);


Regards,
Anup
Björn Töpel Jan. 30, 2024, 5:49 p.m. UTC | #11
Anup Patel <apatel@ventanamicro.com> writes:

> On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> Björn Töpel <bjorn@kernel.org> writes:
>>
>> > Anup Patel <apatel@ventanamicro.com> writes:
>> >
>> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
>> >>>
>> >>> Björn Töpel <bjorn@kernel.org> writes:
>> >>>
>> >>> > Anup Patel <apatel@ventanamicro.com> writes:
>> >>> >
>> >>> >> [...]
>> >>> >
>> >>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>> >>> >
>> >>> > I'm hitting a boot hang in text patching, with this series applied on
>> >>> > 6.8-rc2. IPI issues?
>> >>>
>> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>> >>> the others are in cpu_relax(). Smells like IPI...
>> >>
>> >> I tried bootefi from U-Boot multiple times but can't reproduce the
>> >> issue you are seeing.
>> >
>> > Thanks! I can reproduce without EFI, and simpler command-line:
>> >
>> > qemu-system-riscv64 \
>> >   -bios /path/to/fw_dynamic.bin \
>> >   -kernel /path/to/Image \
>> >   -append 'earlycon console=tty0 console=ttyS0' \
>> >   -machine virt,aia=aplic-imsic \
>> >   -no-reboot -nodefaults -nographic \
>> >   -smp 4 \
>> >   -object rng-random,filename=/dev/urandom,id=rng0 \
>> >   -device virtio-rng-device,rng=rng0 \
>> >   -m 4G -chardev stdio,id=char0 -serial chardev:char0
>> >
>> > I can reproduce with your upstream riscv_aia_v12 plus the config in the
>> > gist [1], and all latest QEMU/OpenSBI:
>> >
>> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
>> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
>> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
>> >
>> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
>> > panicking about missing root mount ;-))
>>
>> More context; The hang is during a late initcall, where an ftrace direct
>> (register_ftrace_direct()) modification is done.
>>
>> Stop machine is used to call into __ftrace_modify_call(). Then into the
>> arch specific patch_text_nosync(), where flush_icache_range() hangs in
>> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
>> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
>> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
>> right before the end of the function.
>>
>> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
>> patching fixes the boot hang, but it does seem related to IPI...
>>
> Looks like flush_icache_all() does not use the IPIs (on_each_cpu()
> and friends) correctly.
>
> On other hand, the flush_icache_mm() does the right thing by
> doing local flush on the current CPU and IPI based flush on other
> CPUs.
>
> Can you try the following patch ?
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 55a34f2020a8..a3dfbe4de832 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info)
>
>  void flush_icache_all(void)
>  {
> +    cpumask_t others;
> +
>      local_flush_icache_all();
>
> +    cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id()));
> +    if (cpumask_empty(&others))
> +        return;
> +
>      if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
> -        sbi_remote_fence_i(NULL);
> +        sbi_remote_fence_i(&others);
>      else
> -        on_each_cpu(ipi_remote_fence_i, NULL, 1);
> +        on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
>  }
>  EXPORT_SYMBOL(flush_icache_all);

Unfortunately, I see the same hang. LMK if you'd like me to try anything
else.


Björn
Anup Patel Feb. 1, 2024, 3:07 p.m. UTC | #12
On Tue, Jan 30, 2024 at 11:19 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Anup Patel <apatel@ventanamicro.com> writes:
>
> > On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote:
> >>
> >> Björn Töpel <bjorn@kernel.org> writes:
> >>
> >> > Anup Patel <apatel@ventanamicro.com> writes:
> >> >
> >> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
> >> >>>
> >> >>> Björn Töpel <bjorn@kernel.org> writes:
> >> >>>
> >> >>> > Anup Patel <apatel@ventanamicro.com> writes:
> >> >>> >
> >> >>> >> [...]
> >> >>> >
> >> >>> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >> >>> >
> >> >>> > I'm hitting a boot hang in text patching, with this series applied on
> >> >>> > 6.8-rc2. IPI issues?
> >> >>>
> >> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> >> >>> the others are in cpu_relax(). Smells like IPI...
> >> >>
> >> >> I tried bootefi from U-Boot multiple times but can't reproduce the
> >> >> issue you are seeing.
> >> >
> >> > Thanks! I can reproduce without EFI, and simpler command-line:
> >> >
> >> > qemu-system-riscv64 \
> >> >   -bios /path/to/fw_dynamic.bin \
> >> >   -kernel /path/to/Image \
> >> >   -append 'earlycon console=tty0 console=ttyS0' \
> >> >   -machine virt,aia=aplic-imsic \
> >> >   -no-reboot -nodefaults -nographic \
> >> >   -smp 4 \
> >> >   -object rng-random,filename=/dev/urandom,id=rng0 \
> >> >   -device virtio-rng-device,rng=rng0 \
> >> >   -m 4G -chardev stdio,id=char0 -serial chardev:char0
> >> >
> >> > I can reproduce with your upstream riscv_aia_v12 plus the config in the
> >> > gist [1], and all latest QEMU/OpenSBI:
> >> >
> >> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
> >> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
> >> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
> >> >
> >> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
> >> > panicking about missing root mount ;-))
> >>
> >> More context; The hang is during a late initcall, where an ftrace direct
> >> (register_ftrace_direct()) modification is done.
> >>
> >> Stop machine is used to call into __ftrace_modify_call(). Then into the
> >> arch specific patch_text_nosync(), where flush_icache_range() hangs in
> >> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
> >> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
> >> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
> >> right before the end of the function.
> >>
> >> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
> >> patching fixes the boot hang, but it does seem related to IPI...
> >>
> > Looks like flush_icache_all() does not use the IPIs (on_each_cpu()
> > and friends) correctly.
> >
> > On other hand, the flush_icache_mm() does the right thing by
> > doing local flush on the current CPU and IPI based flush on other
> > CPUs.
> >
> > Can you try the following patch ?
> >
> > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> > index 55a34f2020a8..a3dfbe4de832 100644
> > --- a/arch/riscv/mm/cacheflush.c
> > +++ b/arch/riscv/mm/cacheflush.c
> > @@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info)
> >
> >  void flush_icache_all(void)
> >  {
> > +    cpumask_t others;
> > +
> >      local_flush_icache_all();
> >
> > +    cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id()));
> > +    if (cpumask_empty(&others))
> > +        return;
> > +
> >      if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
> > -        sbi_remote_fence_i(NULL);
> > +        sbi_remote_fence_i(&others);
> >      else
> > -        on_each_cpu(ipi_remote_fence_i, NULL, 1);
> > +        on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
> >  }
> >  EXPORT_SYMBOL(flush_icache_all);
>
> Unfortunately, I see the same hang. LMK if you'd like me to try anything
> else.

I was able to reproduce this at my end but I had to use your config.

Digging further, it seems the issue is observed only when we use
in-kernel IPIs for cache flushing (instead of SBI calls) along with
some of the tracers (or debugging features) enabled. With the tracers
(or debug features) disabled we don't see any issue. In fact, the
upstream defconfig works perfectly fine with AIA drivers and
in-kernel IPIs.

It seems the AIA-based in-kernel IPIs are exposing some other issue
in the RISC-V kernel. I will debug more to find the root cause.

Regards,
Anup
Björn Töpel Feb. 1, 2024, 6:45 p.m. UTC | #13
Anup Patel <apatel@ventanamicro.com> writes:

> On Tue, Jan 30, 2024 at 11:19 PM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> Anup Patel <apatel@ventanamicro.com> writes:
>>
>> > On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote:
>> >>
>> >> Björn Töpel <bjorn@kernel.org> writes:
>> >>
>> >> > Anup Patel <apatel@ventanamicro.com> writes:
>> >> >
>> >> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
>> >> >>>
>> >> >>> Björn Töpel <bjorn@kernel.org> writes:
>> >> >>>
>> >> >>> > Anup Patel <apatel@ventanamicro.com> writes:
>> >> >>> >
>> >> >>> >> [...]
>> >> >>> >
>> >> >>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>> >> >>> >
>> >> >>> > I'm hitting a boot hang in text patching, with this series applied on
>> >> >>> > 6.8-rc2. IPI issues?
>> >> >>>
>> >> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>> >> >>> the others are in cpu_relax(). Smells like IPI...
>> >> >>
>> >> >> I tried bootefi from U-Boot multiple times but can't reproduce the
>> >> >> issue you are seeing.
>> >> >
>> >> > Thanks! I can reproduce without EFI, and simpler command-line:
>> >> >
>> >> > qemu-system-riscv64 \
>> >> >   -bios /path/to/fw_dynamic.bin \
>> >> >   -kernel /path/to/Image \
>> >> >   -append 'earlycon console=tty0 console=ttyS0' \
>> >> >   -machine virt,aia=aplic-imsic \
>> >> >   -no-reboot -nodefaults -nographic \
>> >> >   -smp 4 \
>> >> >   -object rng-random,filename=/dev/urandom,id=rng0 \
>> >> >   -device virtio-rng-device,rng=rng0 \
>> >> >   -m 4G -chardev stdio,id=char0 -serial chardev:char0
>> >> >
>> >> > I can reproduce with your upstream riscv_aia_v12 plus the config in the
>> >> > gist [1], and all latest QEMU/OpenSBI:
>> >> >
>> >> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
>> >> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
>> >> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")
>> >> >
>> >> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
>> >> > panicking about missing root mount ;-))
>> >>
>> >> More context; The hang is during a late initcall, where an ftrace direct
>> >> (register_ftrace_direct()) modification is done.
>> >>
>> >> Stop machine is used to call into __ftrace_modify_call(). Then into the
>> >> arch specific patch_text_nosync(), where flush_icache_range() hangs in
>> >> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
>> >> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
>> >> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
>> >> right before the end of the function.
>> >>
>> >> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
>> >> patching fixes the boot hang, but it does seem related to IPI...
>> >>
>> > Looks like flush_icache_all() does not use the IPIs (on_each_cpu()
>> > and friends) correctly.
>> >
>> > On other hand, the flush_icache_mm() does the right thing by
>> > doing local flush on the current CPU and IPI based flush on other
>> > CPUs.
>> >
>> > Can you try the following patch ?
>> >
>> > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
>> > index 55a34f2020a8..a3dfbe4de832 100644
>> > --- a/arch/riscv/mm/cacheflush.c
>> > +++ b/arch/riscv/mm/cacheflush.c
>> > @@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info)
>> >
>> >  void flush_icache_all(void)
>> >  {
>> > +    cpumask_t others;
>> > +
>> >      local_flush_icache_all();
>> >
>> > +    cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id()));
>> > +    if (cpumask_empty(&others))
>> > +        return;
>> > +
>> >      if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
>> > -        sbi_remote_fence_i(NULL);
>> > +        sbi_remote_fence_i(&others);
>> >      else
>> > -        on_each_cpu(ipi_remote_fence_i, NULL, 1);
>> > +        on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
>> >  }
>> >  EXPORT_SYMBOL(flush_icache_all);
>>
>> Unfortunately, I see the same hang. LMK if you'd like me to try anything
>> else.
>
> I was able to reproduce this at my end but I had to use your config.
>
> Digging further, it seems the issue is observed only when we use
> in-kernel IPIs for cache flushing (instead of SBI calls) along with
> some of the tracers (or debugging features) enabled. With the tracers
> (or debug features) disabled we don't see any issue. In fact, the
> upstream defconfig works perfectly fine with AIA drivers and
> in-kernel IPIs.

Same here. I only see the issue for *one* scenario. Other than that
scenario, AIA is working fine! We're doing ftrace text patching, and I
wonder if this is the issue. RISC-V (unfortunately) still relies on
stop_machine() text patching (which will change!).

Again, the hang is in stop_machine() context, where interrupts should
very much be disabled, right? So, triggering an IPI will be impossible.

Dumping mstatus in QEMU:
  | mstatus  0000000a000000a0
  | mstatus  0000000a000000a0
  | mstatus  0000000a000000a0
  | mstatus  0000000a000000a0

Indeed sstatus.SIE is 0.
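
For completeness, decoding that value with the standard privileged-spec
mstatus bit positions (a standalone check, nothing QEMU-specific
assumed):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint64_t mstatus = 0x0000000a000000a0ULL; /* value dumped above */

        /* bit 1 = SIE, bit 3 = MIE, bit 5 = SPIE, bit 7 = MPIE */
        printf("SIE=%llu MIE=%llu SPIE=%llu MPIE=%llu\n",
               (unsigned long long)((mstatus >> 1) & 1),
               (unsigned long long)((mstatus >> 3) & 1),
               (unsigned long long)((mstatus >> 5) & 1),
               (unsigned long long)((mstatus >> 7) & 1));
        /* prints: SIE=0 MIE=0 SPIE=1 MPIE=1 -> interrupts are off */
        return 0;
}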

Seems like the bug is that text patching is trying to issue an IPI:
  | [<ffffffff801145d4>] smp_call_function_many_cond+0x81e/0x8ba
  | [<ffffffff80114716>] on_each_cpu_cond_mask+0x3e/0xde
  | [<ffffffff80013968>] flush_icache_all+0x98/0xc4
  | [<ffffffff80009c26>] patch_text_nosync+0x7c/0x146
  | [<ffffffff80ef9116>] __ftrace_modify_call.constprop.0+0xca/0x120
  | [<ffffffff80ef918c>] ftrace_update_ftrace_func+0x20/0x40
  | [<ffffffff80efb8ac>] ftrace_modify_all_code+0x5a/0x1d8
  | [<ffffffff80efba50>] __ftrace_modify_code+0x26/0x42
  | [<ffffffff80131734>] multi_cpu_stop+0x14e/0x1d8
  | [<ffffffff8013107a>] cpu_stopper_thread+0x9e/0x182
  | [<ffffffff80077a04>] smpboot_thread_fn+0xf8/0x1d2
  | [<ffffffff800718fc>] kthread+0xe8/0x108
  | [<ffffffff80f1cde6>] ret_from_fork+0xe/0x20


Björn
Björn Töpel Feb. 6, 2024, 3:39 p.m. UTC | #14
Hi Anup,

Anup Patel <apatel@ventanamicro.com> writes:

> [...]

Thank you for continuing to work on this series! I like this
direction of the series!

TL;DR: I think we can get rid of most of the id/housekeeping data
structures, except for the irq matrix.

Most of my comments are more of a design/overview nature, so I'll
comment here in the cover letter.

I took the series for a spin, and with Alex's ftrace fix it passes
all my tests nicely!

Now some thoughts/comments (I'm coming from the x86 side of things!):

id/enable-tracking: There is a lot of different id/enabled-state
tracking with corresponding locks, where there's IMO overlap with what
the matrix provides.

Let's start with struct imsic_priv:

   | /* Dummy HW interrupt numbers */
   | unsigned int nr_hwirqs;
   | raw_spinlock_t hwirqs_lock;
   | unsigned long *hwirqs_used_bitmap;

These are used for the domain routing (hwirq -> desc/virq), and are
not needed. Just use the same id as the virq (at allocation time), and
get rid of these data structures and their corresponding functions.
The lookup in the interrupt handler via imsic_local_priv.vectors
doesn't care about
hwirq. This is what x86 does... The imsic_vector roughly corresponds
to apic_chip_data (nit: imsic_vector could have the chip_data suffix
as well, at least it would have helped me!)

Moving/affinity changes. The moving of a vector to another CPU
currently involves:

1. Allocate a new vector from the matrix
2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested
   spinlocks)
3. Trigger two IPIs to apply the bitmap
4. On each CPU target (imsic_local_sync()) loop the bitmap and flip
   all bits, and potentially rearm

This seems a bit heavy-weight: Why are you explicitly setting/clearing
all the bits in a loop at the local sync?

x86 does it a bit differently (more lazily): The chip_data has
prev_{cpu,vector}/move_in_progress fields, and keep both vectors
enabled until there's an interrupt on the new vector, and then the old
one is cleaned (irq_complete_move()).

Further: when it's time to remove the old vector, x86 doesn't trigger
an IPI on the disabling side, but queues a cleanup job on a per-cpu
list and triggers a timeout. So, the per-cpu chip_data (per-cpu
"vectors" in your series) can reside in two places during the transit.

I wonder if this cleanup is less intrusive, and you just need to
perform what's on the per-cpu list instead of dealing with the
ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The
chip_data/desc has that information. This would mean that
imsic_local_priv() would only have the local vectors (chip_data), and
a cleanup list/timer.

My general comment is that instead of having these global id-tracking
structures, use the matrix together with some desc/chip_data local
data, which should be sufficient.

Random thought: Do we need to explicitly disable (csr) the vector,
when we're changing the affinity? What if we just leave it enabled,
and only when mask/unmask is performed it's actually explicitly masked
(writes to the csr)?

Missing features (which can be added later):
* Reservation mode/activate support (allocate many MSI, but only
  request/activate a subset)
* Handle managed interrupts
* There might be some irqd flags missing, which mostly cpuhp cares
  about (e.g. irqd_*_single_target())...

Finally: given that the APLIC requires a lot more patches, depending
on how the review process moves on -- maybe the IMSIC side could go as
a separate series?


Cheers,
Björn
Anup Patel Feb. 6, 2024, 5:39 p.m. UTC | #15
On Tue, Feb 6, 2024 at 9:09 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Hi Anup,
>
> Anup Patel <apatel@ventanamicro.com> writes:
>
> > [...]
>
> Thank you for continuing to work on this series! I like this
> direction of the series!
>
> TL;DR: I think we can get rid of most of the id/householding data
> structures, except for the irq matrix.
>
> Most of my comments are more of a design/overview nature, so I'll
> comment here in the cover letter.
>
>> I took the series for a spin, and with Alex's ftrace fix it passes
>> all my tests nicely!
>
> Now some thoughts/comments (I'm coming from the x86 side of things!):
>
> id/enable-tracking: There are a lot of different id/enabled tracking
> with corresponding locks, where there's IMO overlap with what the
> matrix provides.

The matrix allocator does not track the enabled/disabled state of
the per-CPU IDs. This is why we have a separate per-CPU
ids_enabled_bitmap which is also used for remote synchronization
across CPUs.

>
> Let's start with struct imsic_priv:
>
>    | /* Dummy HW interrupt numbers */
>    | unsigned int nr_hwirqs;
>    | raw_spinlock_t hwirqs_lock;
>    | unsigned long *hwirqs_used_bitmap;

The matrix allocator manages actual IDs for each CPU whereas
the Linux irq_data expects a fixed hwirq which does not change.

Due to this, we have a dummy hwirq space which is always
fixed. The only thing that is changed under-the-hood by the
IMSIC driver is the dummy hwirq to actual HW vector (cpu, id)
mapping.
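
In other words, the driver keeps a stable index in front of the
(cpu, id) pair, conceptually along these lines (illustrative toy code,
not the actual driver structures):

struct toy_vector {
        unsigned int cpu;
        unsigned int local_id;
};

/* hwirq is the stable number Linux irq_data sees; the (cpu, id) pair
 * behind it can be re-pointed on affinity changes. */
struct toy_vector toy_hwirq_map[64];

void toy_move_hwirq(unsigned int hwirq, unsigned int new_cpu,
                    unsigned int new_id)
{
        /* the hwirq itself never changes, only what it points at */
        toy_hwirq_map[hwirq].cpu = new_cpu;
        toy_hwirq_map[hwirq].local_id = new_id;
}

int main(void)
{
        toy_move_hwirq(5, 2, 17); /* re-point hwirq 5 at (cpu 2, id 17) */
        return 0;
}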

>
> These are used to for the domain routing (hwirq -> desc/virq), and not
> needed. Just use the same id as virq (at allocation time), and get rid
> of these data structures/corresponding functions. The lookup in the
> interrupt handler via imsic_local_priv.vectors doesn't care about
> hwirq. This is what x86 does... The imsic_vector roughly corresponds
> to apic_chip_data (nit: imsic_vector could have the chip_data suffix
> as well, at least it would have helped me!)

Yes, imsic_vector corresponds to apic_chip_data in the x86 world.

>
> Moving/affinity changes. The moving of a vector to another CPU
> currently involves:
>
> 1. Allocate a new vector from the matrix
> 2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested
>    spinlocks)
> 3. Trigger two IPIs to apply the bitmap
> 4. On each CPU target (imsic_local_sync()) loop the bitmap and flip
>    all bits, and potentially rearm
>
> This seems a bit heavy-weight: Why are you explicitly setting/clearing
> all the bits in a loop at the local sync?

This can certainly be optimized by introducing another
ids_dirty_bitmap. I will add this in the next revision.

>
> x86 does it a bit differently (more lazily): The chip_data has
> prev_{cpu,vector}/move_in_progress fields, and keep both vectors
> enabled until there's an interrupt on the new vector, and then the old
> one is cleaned (irq_complete_move()).
>
> Further; When it's time to remove the old vector, x86 doesn't trigger
> an IPI on the disabling side, but queues a cleanup job on a per-cpu
> list and triggers a timeout. So, the per-cpu chip_data (per-cpu
> "vectors" in your series) can reside in two places during the transit.

We can't avoid IPIs when moving vectors from one CPU to another CPU
because IMSIC id enable/disable is only possible through CSRs. Also,
keep in mind that an irq affinity change might be initiated on CPU X
for some interrupt targeting CPU Y which is then changed to target
CPU Z.

In the case of x86, they have memory-mapped registers which allow one
CPU to enable/disable the ID of another CPU.

>
> I wonder if this clean up is less intrusive, and you just need to
> perform what's in the per-list instead of dealing with the
> ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The
> chip_data/desc has that information. This would mean that
> imsic_local_priv() would only have the local vectors (chip_data), and
> a cleanup list/timer.
>
> My general comment is that instead of having these global id-tracking
> structures, use the matrix together with some desc/chip_data local
> data, which should be sufficient.

The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors
are required since the matrix allocator only manages allocation of
per-CPU IDs.

>
> Random thought: Do we need to explicitly disable (csr) the vector,
> when we're changing the affinity? What if we just leave it enabled,
> and only when mask/unmask is performed it's actually explicitly masked
> (writes to the csr)?

We should not leave it enabled because a rogue/buggy device can
inject spurious interrupts using MSI writes to unused but still
enabled interrupts.

>
> Missing features (which can be added later):
> * Reservation mode/activate support (allocate many MSI, but only
>   request/activate a subset)

I have not seen any PCIe or platform device requiring this kind of
reservation. Any examples?

> * Handle managed interrupts

Any examples of managed interrupts in the RISC-V world?

> * There might be some irqd flags are missing, which mostly cpuhp care
>   about (e.g. irqd_*_single_target())...

Okay, let me check and update.

>
> Finally; Given that the APLIC requires a lot more patches, depending
> on how the review process moves on -- maybe the IMSIC side could go as
> a separate series?
>

The most popular implementation choice across RISC-V platforms will
be IMSIC + APLIC, so both drivers should go together. In fact, we need
both drivers for the QEMU virt machine as well because the UART
interrupt (and other wired interrupts) on the QEMU virt machine goes
through the APLIC.

Regards,
Anup
Björn Töpel Feb. 7, 2024, 7:27 a.m. UTC | #16
Hi!

Anup Patel <anup@brainfault.org> writes:

> On Tue, Feb 6, 2024 at 9:09 PM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> Hi Anup,
>>
>> Anup Patel <apatel@ventanamicro.com> writes:
>>
>> > [...]
>>
>> Thank you for continuing to work on this series! I like this
>> direction of the series!
>>
>> TL;DR: I think we can get rid of most of the id/householding data
>> structures, except for the irq matrix.
>>
>> Most of my comments are more of a design/overview nature, so I'll
>> comment here in the cover letter.
>>
>> I took the series for a spin, and with Alex's ftrace fix it passes
>> all my tests nicely!
>>
>> Now some thoughts/comments (I'm coming from the x86 side of things!):
>>
>> id/enable-tracking: There are a lot of different id/enabled tracking
>> with corresponding locks, where there's IMO overlap with what the
>> matrix provides.
>
> The matrix allocator does not track the enabled/disabled state of
> the per-CPU IDs. This is why we have a separate per-CPU
> ids_enabled_bitmap which is also used for remote synchronization
> across CPUs.

Exactly, but what I'm asking is if that structure is really needed. More
below.

>> Let's start with struct imsic_priv:
>>
>>    | /* Dummy HW interrupt numbers */
>>    | unsigned int nr_hwirqs;
>>    | raw_spinlock_t hwirqs_lock;
>>    | unsigned long *hwirqs_used_bitmap;
>
> The matrix allocator manages actual IDs for each CPU whereas
> the Linux irq_data expects a fixed hwirq which does not change.
>
> Due to this, we have a dummy hwirq space which is always
> fixed. The only thing that is changed under-the-hood by the
> IMSIC driver is the dummy hwirq to actual HW vector (cpu, id)
> mapping.

Read below. I'm not talking about local_id from the irq_matrix, I'm
saying use virq, which has the properties you're asking for, and doesn't
require an additional structure. When an irq/desc is allocated, you have
a nice unique number with the virq for the lifetime of the interrupt.
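
Concretely, what I mean is something along these lines in the domain's
.alloc path (a rough sketch only, not the actual driver code; the
"toy_" names are placeholders):

#include <linux/irq.h>
#include <linux/irqdomain.h>

static struct irq_chip toy_imsic_chip = {
        .name = "toy-IMSIC",
};

/* Reuse the Linux virq as the stable hwirq: it is unique and lives as
 * long as the interrupt, so no separate dummy-hwirq bitmap is needed. */
static int toy_imsic_domain_alloc(struct irq_domain *domain,
                                  unsigned int virq, unsigned int nr_irqs,
                                  void *arg)
{
        unsigned int i;

        for (i = 0; i < nr_irqs; i++)
                irq_domain_set_info(domain, virq + i, virq + i,
                                    &toy_imsic_chip, NULL,
                                    handle_simple_irq, NULL, NULL);

        return 0;
}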

>> These are used to for the domain routing (hwirq -> desc/virq), and not
>> needed. Just use the same id as virq (at allocation time), and get rid
>> of these data structures/corresponding functions. The lookup in the
>> interrupt handler via imsic_local_priv.vectors doesn't care about
>> hwirq. This is what x86 does... The imsic_vector roughly corresponds
>> to apic_chip_data (nit: imsic_vector could have the chip_data suffix
>> as well, at least it would have helped me!)
>
> Yes, imsic_vector corresponds to apic_chip_data in the x86 world.

...and I'm trying to ask the following: given that the IMSIC is pretty
much x86 vector (arch/x86/kernel/apic/vector.c), I'm trying to figure
out the rationale for why the IMSIC has all the extra housekeeping data
that x86 does not need. x86 has been battle-proven and has had to deal
with all kinds of quirks (e.g. lost interrupts on affinity changes).

>> Moving/affinity changes. The moving of a vector to another CPU
>> currently involves:
>>
>> 1. Allocate a new vector from the matrix
>> 2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested
>>    spinlocks)
>> 3. Trigger two IPIs to apply the bitmap
>> 4. On each CPU target (imsic_local_sync()) loop the bitmap and flip
>>    all bits, and potentially rearm
>>
>> This seems a bit heavy-weight: Why are you explicitly setting/clearing
>> all the bits in a loop at the local sync?
>
> This can be certainly optimized by introducing another
> ids_dirty_bitmap. I will add this in the next revision.

I'd rather have fewer maps and fewer locks! ;-)

>> x86 does it a bit differently (more lazily): The chip_data has
>> prev_{cpu,vector}/move_in_progress fields, and keep both vectors
>> enabled until there's an interrupt on the new vector, and then the old
>> one is cleaned (irq_complete_move()).
>>
>> Further; When it's time to remove the old vector, x86 doesn't trigger
>> an IPI on the disabling side, but queues a cleanup job on a per-cpu
>> list and triggers a timeout. So, the per-cpu chip_data (per-cpu
>> "vectors" in your series) can reside in two places during the transit.
>
> We can't avoid IPIs when moving vectors from one CPU to another
> CPU because IMSIC id enable/disable is only possible through
> CSRs. Also, keep in-mind that irq affinity change might be initiated
> on CPU X for some interrupt targeting CPU Y which is then changed
> to target CPU Z.
>
> In the case of x86, they have memory mapped registers which
> allows one CPU to enable/disable the ID of another CPU.

Nope. Same mechanics on x86 -- the cleanup has to be done on the
originating core. What I asked was "what about using a timer instead of
an IPI". I think this was up in the last rev as well?

Check out commit bdc1dad299bb ("x86/vector: Replace
IRQ_MOVE_CLEANUP_VECTOR with a timer callback"). Specifically, the
comment about lost interrupts, and the rationale for keeping the
original target active until there's a new interrupt on the new CPU.

>> I wonder if this clean up is less intrusive, and you just need to
>> perform what's in the per-list instead of dealing with the
>> ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The
>> chip_data/desc has that information. This would mean that
>> imsic_local_priv() would only have the local vectors (chip_data), and
>> a cleanup list/timer.
>>
>> My general comment is that instead of having these global id-tracking
>> structures, use the matrix together with some desc/chip_data local
>> data, which should be sufficient.
>
> The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors
> are required since the matrix allocator only manages allocation of
> per-CPU IDs.

The information in ids_enabled_bitmap is/could be inherent in
imsic_local_priv.vectors (guess what x86 does... ;-)).

Dummy hwirqs could be replaced with the virq.

Hmm, seems like we're talking past each other, or at least I get the
feeling I can't get my opinions out right. I'll try to do a quick PoC
to show you what I mean. That's probably easier than just talking about
it. ...and maybe I'll end up realizing I'm all wrong!

My reaction is -- you're doing a lot of housekeeping with a lot of
locks, and my worry is that we'll just end up with the same issues/bloat
that x86 once had (has? ;-)).

>> Random thought: Do we need to explicitly disable (csr) the vector,
>> when we're changing the affinity? What if we just leave it enabled,
>> and only when mask/unmask is performed it's actually explicitly masked
>> (writes to the csr)?
>
> We should not leave it enabled because some rough/buggy device
> can inject spurious interrupts using MSI writes to unused enabled
> interrupts.

OK!

>>
>> Missing features (which can be added later):
>> * Reservation mode/activate support (allocate many MSI, but only
>>   request/activate a subset)
>
> I did not see any PCIe or platform device requiring this kind of
> reservation. Any examples ?

It's not a requirement. Some devices allocate a gazillion interrupts
(e.g. NICs with many QoS queues), but only activate a subset (via
request_irq()). A system using these kinds of devices might run out of
interrupts.

Problems you run into once you leave the embedded world, pretty much.

>> * Handle managed interrupts
>
> Any examples of managed interrupts in the RISC-V world ?

E.g. all nvme drives: nvme_setup_irqs(), and I'd assume contemporary
netdev drivers would use it. Typically devices with per-cpu queues.

>> * There might be some irqd flags missing, which mostly cpuhp cares
>>   about (e.g. irqd_*_single_target())...
>
> Okay, let me check and update.

I haven't dug much into cpuhp, so I'm out on a limb here...

>> Finally; Given that the APLIC requires a lot more patches, depending
>> on how the review process moves on -- maybe the IMSIC side could go as
>> a separate series?
>>
>
> The most popular implementation choice across RISC-V platforms
> will be IMSIC + APLIC so both drivers should go together. In fact,
> we need both drivers for the QEMU virt machine as well because
> UART interrupt (and other wired interrupts) on the QEMU virt
> machine goes through APLIC.

Thanks for clearing that up! Hmm, an IMSIC-only QEMU would be awesome.


Cheers,
Björn
Anup Patel Feb. 7, 2024, 9:18 a.m. UTC | #17
On Wed, Feb 7, 2024 at 12:57 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Hi!
>
> Anup Patel <anup@brainfault.org> writes:
>
> > On Tue, Feb 6, 2024 at 9:09 PM Björn Töpel <bjorn@kernel.org> wrote:
> >>
> >> Hi Anup,
> >>
> >> Anup Patel <apatel@ventanamicro.com> writes:
> >>
> >> > The RISC-V AIA specification is ratified as-per the RISC-V international
> >> > process. The latest ratified AIA specifcation can be found at:
> >> > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >> >
> >> > At a high-level, the AIA specification adds three things:
> >> > 1) AIA CSRs
> >> >    - Improved local interrupt support
> >> > 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> >> >    - Per-HART MSI controller
> >> >    - Support MSI virtualization
> >> >    - Support IPI along with virtualization
> >> > 3) Advanced Platform-Level Interrupt Controller (APLIC)
> >> >    - Wired interrupt controller
> >> >    - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> >> >    - In Direct-mode, injects external interrupts directly into HARTs
> >> >
> >> > For an overview of the AIA specification, refer the AIA virtualization
> >> > talk at KVM Forum 2022:
> >> > https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> >> > https://www.youtube.com/watch?v=r071dL8Z0yo
> >>
> >> Thank you for continuing to work on this series! I like this
> >> direction of the series!
> >>
> >> TL;DR: I think we can get rid of most of the id/householding data
> >> structures, except for the irq matrix.
> >>
> >> Most of my comments are more of a design/overview nature, so I'll
> >> comment here in the cover letter.
> >>
> >> I took the series for a spin, and with Alex's ftrace fix it
> >> passes all my tests nicely!
> >>
> >> Now some thoughts/comments (I'm coming from the x86 side of things!):
> >>
> >> id/enable-tracking: There are a lot of different id/enabled tracking
> >> with corresponding locks, where there's IMO overlap with what the
> >> matrix provides.
> >
> > The matrix allocator does not track the enabled/disabled state of
> > the per-CPU IDs. This is why we have a separate per-CPU
> > ids_enabled_bitmap which is also used for remote synchronization
> > across CPUs.
>
> Exactly, but what I'm asking is if that structure is really needed. More
> below.
>
> >> Let's start with struct imsic_priv:
> >>
> >>    | /* Dummy HW interrupt numbers */
> >>    | unsigned int nr_hwirqs;
> >>    | raw_spinlock_t hwirqs_lock;
> >>    | unsigned long *hwirqs_used_bitmap;
> >
> > The matrix allocator manages actual IDs for each CPU whereas
> > the Linux irq_data expects a fixed hwirq which does not change.
> >
> > Due to this, we have a dummy hwirq space which is always
> > fixed. The only thing that is changed under-the-hood by the
> > IMSIC driver is the dummy hwirq to actual HW vector (cpu, id)
> > mapping.
>
> Read below. I'm not talking about local_id from the irq_matrix, I'm
> saying use virq, which has the properties you're asking for, and doesn't
> require an additional structure. When an irq/desc is allocated, you have
> a nice unique number with the virq for the lifetime of the interrupt.

Sure, let me explore using virq in place of hwirq.

>
> >> These are used for the domain routing (hwirq -> desc/virq), and not
> >> needed. Just use the same id as virq (at allocation time), and get rid
> >> of these data structures/corresponding functions. The lookup in the
> >> interrupt handler via imsic_local_priv.vectors doesn't care about
> >> hwirq. This is what x86 does... The imsic_vector roughly corresponds
> >> to apic_chip_data (nit: imsic_vector could have the chip_data suffix
> >> as well, at least it would have helped me!)
> >
> > Yes, imsic_vector corresponds to apic_chip_data in the x86 world.
>
> ...and I'm trying to ask the following; Given the IMSIC is pretty much
> x86 vector (arch/x86/kernel/apic/vector.c), I'm trying to figure out the
> rationale why IMSIC has all the extra housekeeping data, not needed by
> x86. The x86 code has been battle-proven, having had to deal with all
> kinds of quirks (e.g. lost interrupts on affinity changes).

Understood.

>
> >> Moving/affinity changes. The moving of a vector to another CPU
> >> currently involves:
> >>
> >> 1. Allocate a new vector from the matrix
> >> 2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested
> >>    spinlocks)
> >> 3. Trigger two IPIs to apply the bitmap
> >> 4. On each CPU target (imsic_local_sync()) loop the bitmap and flip
> >>    all bits, and potentially rearm
> >>
> >> This seems a bit heavy-weight: Why are you explicitly setting/clearing
> >> all the bits in a loop at the local sync?
> >
> > This can be certainly optimized by introducing another
> > ids_dirty_bitmap. I will add this in the next revision.
>
> I rather have fewer maps, and less locks! ;-)
>
> >> x86 does it a bit differently (more lazily): The chip_data has
> >> prev_{cpu,vector}/move_in_progress fields, and keep both vectors
> >> enabled until there's an interrupt on the new vector, and then the old
> >> one is cleaned (irq_complete_move()).
> >>
> >> Further; When it's time to remove the old vector, x86 doesn't trigger
> >> an IPI on the disabling side, but queues a cleanup job on a per-cpu
> >> list and triggers a timeout. So, the per-cpu chip_data (per-cpu
> >> "vectors" in your series) can reside in two places during the transit.
> >
> > We can't avoid IPIs when moving vectors from one CPU to another
> > CPU because IMSIC id enable/disable is only possible through
> > CSRs. Also, keep in mind that an irq affinity change might be initiated
> > on CPU X for some interrupt targeting CPU Y which is then changed
> > to target CPU Z.
> >
> > In the case of x86, they have memory-mapped registers which
> > allow one CPU to enable/disable the ID of another CPU.
>
> Nope. Same mechanics on x86 -- the cleanup has to be done on the
> originating core. What I asked was "what about using a timer instead of
> an IPI". I think this was up in the last rev as well?
>
> Check out commit bdc1dad299bb ("x86/vector: Replace
> IRQ_MOVE_CLEANUP_VECTOR with a timer callback"). Specifically, the
> comment about lost interrupts, and the rationale for keeping the original
> target active until there's a new interrupt on the new cpu.

Trying a timer interrupt is still TBD on my side because with v12
my goal was to implement per-device MSI domains. Let me
explore timer interrupts for v13.

>
> >> I wonder if this cleanup is less intrusive, and you just need to
> >> perform what's in the per-cpu list instead of dealing with the
> >> ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The
> >> chip_data/desc has that information. This would mean that
> >> imsic_local_priv() would only have the local vectors (chip_data), and
> >> a cleanup list/timer.
> >>
> >> My general comment is that instead of having these global id-tracking
> >> structures, use the matrix together with some desc/chip_data local
> >> data, which should be sufficient.
> >
> > The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors
> > are required since the matrix allocator only manages allocation of
> > per-CPU IDs.
>
> The information in ids_enabled_bitmap is/could be inherent in
> imsic_local_priv.vectors (guess what x86 does... ;-)).
>
> Dummy hwirqs could be replaced with the virq.
>
> Hmm, seems like we're talking past each other, or at least I get the
> feeling I can't get my opinions out right. I'll try to do a quick PoC,
> to show you what I mean. That's probably easier than just talking about
> it. ...and maybe I'll come realizing I'm all wrong!

I suggest waiting for my v13 and trying something on top of that,
otherwise we might duplicate efforts.

>
> My reaction is -- you're doing a lot of housekeeping with a lot of
> locks, and my worry is that we'll just end up with the same issues/bloat
> that x86 once had (has? ;-)).
>
> >> Random thought: Do we need to explicitly disable (csr) the vector,
> >> when we're changing the affinity? What if we just leave it enabled,
> >> and only when mask/unmask is performed it's actually explicitly masked
> >> (writes to the csr)?
> >
> > We should not leave it enabled because some rogue/buggy device
> > can inject spurious interrupts using MSI writes to unused enabled
> > interrupts.
>
> OK!
>
> >>
> >> Missing features (which can be added later):
> >> * Reservation mode/activate support (allocate many MSI, but only
> >>   request/activate a subset)
> >
> > I did not see any PCIe or platform device requiring this kind of
> > reservation. Any examples ?
>
> It's not a requirement. Some devices allocate a gazillion interrupts
> (NICs with many QoS queues, e.g.), but only activate a subset (via
> request_irq()). A system using these kinds of devices might run out of
> interrupts.

I don't see why this would not be possible currently.

>
> Problems you run into once you leave the embedded world, pretty much.
>
> >> * Handle managed interrupts
> >
> > Any examples of managed interrupts in the RISC-V world ?
>
> E.g. all nvme drives: nvme_setup_irqs(), and I'd assume contemporary
> netdev drivers would use it. Typically devices with per-cpu queues.

We have tested with NVMe devices, e1000e, VirtIO-net, etc. and I did
not see any issues.

We can always add new features as separate incremental series as long
as there is a clear use-case backed by real-world devices.

>
> >> * There might be some irqd flags missing, which mostly cpuhp cares
> >>   about (e.g. irqd_*_single_target())...
> >
> > Okay, let me check and update.
>
> I haven't dug much into cpuhp, so I'm out on a limb here...
>
> >> Finally; Given that the APLIC requires a lot more patches, depending
> >> on how the review process moves on -- maybe the IMSIC side could go as
> >> a separate series?
> >>
> >
> > The most popular implementation choice across RISC-V platforms
> > will be IMSIC + APLIC so both drivers should go together. In fact,
> > we need both drivers for the QEMU virt machine as well because
> > UART interrupt (and other wired interrupts) on the QEMU virt
> > machine goes through APLIC.
>
> Thanks for clearing that up! Hmm, an IMSIC-only QEMU would be awesome.
>
>
> Cheers,
> Björn

Regards,
Anup
Björn Töpel Feb. 7, 2024, 9:37 a.m. UTC | #18
Anup Patel <apatel@ventanamicro.com> writes:

>> Nope. Same mechanics on x86 -- the cleanup has to be done on the
>> originating core. What I asked was "what about using a timer instead of
>> an IPI". I think this was up in the last rev as well?
>>
>> Check out commit bdc1dad299bb ("x86/vector: Replace
>> IRQ_MOVE_CLEANUP_VECTOR with a timer callback"). Specifically, the
>> comment about lost interrupts, and the rationale for keeping the original
>> target active until there's a new interrupt on the new cpu.
>
> Trying a timer interrupt is still TBD on my side because with v12
> my goal was to implement per-device MSI domains. Let me
> explore timer interrupts for v13.

OK!

>> >> I wonder if this cleanup is less intrusive, and you just need to
>> >> perform what's in the per-cpu list instead of dealing with the
>> >> ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The
>> >> chip_data/desc has that information. This would mean that
>> >> imsic_local_priv() would only have the local vectors (chip_data), and
>> >> a cleanup list/timer.
>> >>
>> >> My general comment is that instead of having these global id-tracking
>> >> structures, use the matrix together with some desc/chip_data local
>> >> data, which should be sufficient.
>> >
>> > The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors
>> > are required since the matrix allocator only manages allocation of
>> > per-CPU IDs.
>>
>> The information in ids_enabled_bitmap is/could be inherent in
>> imsic_local_priv.vectors (guess what x86 does... ;-)).
>>
>> Dummy hwirqs could be replaced with the virq.
>>
>> Hmm, seems like we're talking past each other, or at least I get the
>> feeling I can't get my opinions out right. I'll try to do a quick PoC,
>> to show you what I mean. That's probably easier than just talking about
>> it. ...and maybe I'll come realizing I'm all wrong!
>
> I suggest waiting for my v13 and trying something on top of that,
> otherwise we might duplicate efforts.

OK!

>> > I did not see any PCIe or platform device requiring this kind of
>> > reservation. Any examples ?
>>
>> It's not a requirement. Some devices allocate a gazillion interrupts
>> (NICs with many QoS queues, e.g.), but only activate a subset (via
>> request_irq()). A system using these kinds of devices might run out of
>> interrupts.
>
> I don't see why this would not be possible currently.

Again, this is something we can improve on later. But, this
implementation activates the interrupt at allocation time, no?

>> Problems you run into once you leave the embedded world, pretty much.
>>
>> >> * Handle managed interrupts
>> >
>> > Any examples of managed interrupts in the RISC-V world ?
>>
>> E.g. all nvme drives: nvme_setup_irqs(), and I'd assume contemporary
>> netdev drivers would use it. Typically devices with per-cpu queues.
>
> We have tested with NVMe devices, e1000e, VirtIO-net, etc. and I did
> not see any issues.
>
> We can always add new features as separate incremental series as long
> as there is a clear use-case backed by real-world devices.

Agreed. Let's not feature creep.


Björn
Björn Töpel Feb. 7, 2024, 12:55 p.m. UTC | #19
Björn Töpel <bjorn@kernel.org> writes:

>>> Hmm, seems like we're talking past each other, or at least I get the
>>> feeling I can't get my opinions out right. I'll try to do a quick PoC,
>>> to show you what I mean. That's probably easier than just talking about
>>> it. ...and maybe I'll come realizing I'm all wrong!
>>
>> I suggest waiting for my v13 and trying something on top of that,
>> otherwise we might duplicate efforts.
>
> OK!

I did some really rough code sketching, and I'm confident that you can
get rid of all ids_enabled_bitmap, hwirqs_used_bitmap, and the
corresponding functions/locks. I'd say one lock is enough, and the key
is having the per-cpu imsic_local_priv.vectors change from struct
imsic_vector * to struct imsic_vector **.
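
Something along these lines (names invented, not a diff against your
code):

struct imsic_lpriv_sketch {
        /*
         * One slot per local interrupt id. During a move, the slot on
         * the old CPU and the slot on the new CPU can point at the same
         * struct imsic_vector, and "enabled" is simply "slot != NULL",
         * so no separate ids_enabled_bitmap is needed.
         */
        struct imsic_vector **vectors;
};

static void imsic_local_install_sketch(struct imsic_lpriv_sketch *lpriv,
                                       unsigned int local_id,
                                       struct imsic_vector *vec)
{
        WRITE_ONCE(lpriv->vectors[local_id], vec);
        imsic_local_id_enable_sketch(local_id); /* CSR write; placeholder */
}

The handler side then only needs a READ_ONCE() of the slot for the
claimed id to find the right irq.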

Using smp_call_function_single() to IPI enable (and disable if you don't
want to use the lazy timer disable mechanism) seems feasible as well!

(Let me know if you don't have the spare cycles, and I can help out.)


Björn
Anup Patel Feb. 7, 2024, 1:08 p.m. UTC | #20
On Wed, Feb 7, 2024 at 6:25 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Björn Töpel <bjorn@kernel.org> writes:
>
> >>> Hmm, seems like we're talking past each other, or at least I get the
> >>> feeling I can't get my opinions out right. I'll try to do a quick PoC,
> >>> to show you what I mean. That's probably easier than just talking about
> >>> it. ...and maybe I'll come realizing I'm all wrong!
> >>
> >> I suggest waiting for my v13 and trying something on top of that,
> >> otherwise we might duplicate efforts.
> >
> > OK!
>
> I did some really rough code sketching, and I'm confident that you can
> get rid of all ids_enabled_bitmap, hwirqs_used_bitmap, and the
> corresponding functions/locks. I'd say one lock is enough, and the key
> is having the per-cpu imsic_local_priv.vectors change from struct
> imsic_vector * to struct imsic_vector **.

I have managed to remove hwirqs_bitmap (and related function).

Now, I am trying another approach to simplify locking using atomics.
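
To give an idea of the direction (purely a sketch, not the final code;
it basically turns the earlier ids_dirty_bitmap idea into plain atomic
bitops, and both helper names below are hypothetical):

static void imsic_mark_dirty_sketch(unsigned long *dirty_bitmap,
                                    unsigned int local_id,
                                    unsigned int cpu)
{
        set_bit(local_id, dirty_bitmap);        /* atomic, no sender-side lock */
        imsic_send_sync_ipi_sketch(cpu);        /* hypothetical helper */
}

/* Runs on the target CPU from the sync IPI handler. */
static void imsic_local_sync_sketch(unsigned long *dirty_bitmap,
                                    unsigned int nr_ids)
{
        unsigned int id;

        for_each_set_bit(id, dirty_bitmap, nr_ids) {
                if (test_and_clear_bit(id, dirty_bitmap))
                        imsic_local_update_sketch(id);  /* CSR enable/disable */
        }
}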

>
> Using smp_call_function_single() to IPI enable (and disable if you don't
> want to use the lazy timer disable mechanism) seems feasible as well!

We have intentionally kept a separate virq for synchronization because
this allows us to gather stats for debugging. Calling smp_call_function_single()
will not allow us to separately gather stats for sync IPIs.

The smp_call_function_single() eventually leads to __ipi_send_mask()
via send_ipi_mask() in arch/riscv, so directly calling __ipi_send_mask()
for the sync IPI is faster.
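
Roughly what I mean, in simplified form (imsic_ipi_virq below stands
for the dedicated sync IPI virq; it is not the exact name used in the
driver):

static int imsic_ipi_virq;      /* assumed: allocated once at boot */

static void imsic_remote_sync_sketch(unsigned int cpu)
{
        /*
         * Send the dedicated sync IPI directly instead of going through
         * smp_call_function_single(), so the sync IPIs get their own
         * line in /proc/interrupts and we skip the csd handling.
         */
        __ipi_send_mask(irq_to_desc(imsic_ipi_virq), cpumask_of(cpu));
}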

>
> (Let me know if you don't have the spare cycles, and I can help out.)
>
>
> Björn

Regards,
Anup
Anup Patel Feb. 7, 2024, 1:10 p.m. UTC | #21
On Wed, Feb 7, 2024 at 6:25 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Björn Töpel <bjorn@kernel.org> writes:
>
> >>> Hmm, seems like we're talking past each other, or at least I get the
> >>> feeling I can't get my opinions out right. I'll try to do a quick PoC,
> >>> to show you what I mean. That's probably easier than just talking about
> >>> it. ...and maybe I'll come realizing I'm all wrong!
> >>
> >> I suggest waiting for my v13 and trying something on top of that,
> >> otherwise we might duplicate efforts.
> >
> > OK!
>
> I did some really rough code sketching, and I'm confident that you can
> get rid of all ids_enabled_bitmap, hwirqs_used_bitmap, and the
> corresponding functions/locks. I'd say one lock is enough, and the key
> is having the per-cpu imsic_local_priv.vectors change from struct
> imsic_vector * to struct imsic_vector **.
>
> Using smp_call_function_single() to IPI enable (and disable if you don't
> want to use the lazy timer disable mechanism) seems feasible as well!
>
> (Let me know if you don't have the spare cycles, and I can help out.)

If you can help upstream the IOMMU driver then that would be awesome.

Regards,
Anup
Andrea Parri Feb. 8, 2024, 10:10 a.m. UTC | #22
Hi Anup,

I understand that some refactoring is in progress, but I don't see the
report below; adding it here hoping that it can be useful towards v13.
(Unfortunately, I didn't have enough time to debug this yet...)


>   irqchip/sifive-plic: Convert PLIC driver into a platform driver

I'm seeing the following LOCKDEP warning with this series, bisected to
the commit above.  This is a defconfig + PROVE_LOCKING=y build, booted
using -machine virt,aia=none.

[    0.953473] ========================================================
[    0.953704] WARNING: possible irq lock inversion dependency detected
[    0.953955] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
[    0.954224] --------------------------------------------------------
[    0.954444] swapper/0/0 just changed the state of lock:
[    0.954664] ffffaf808109d0c8 (&irq_desc_lock_class){-...}-{2:2}, at: handle_fasteoi_irq+0x24/0x1da
[    0.955699] but this lock took another, HARDIRQ-unsafe lock in the past:
[    0.955942]  (&handler->enable_lock){+.+.}-{2:2}
[    0.955974] 
               
               and interrupts could create inverse lock ordering between them.

[    0.956507] 
               other info that might help us debug this:
[    0.956775]  Possible interrupt unsafe locking scenario:

[    0.956998]        CPU0                    CPU1
[    0.957247]        ----                    ----
[    0.957439]   lock(&handler->enable_lock);
[    0.957607]                                local_irq_disable();
[    0.957793]                                lock(&irq_desc_lock_class);
[    0.958021]                                lock(&handler->enable_lock);
[    0.958246]   <Interrupt>
[    0.958342]     lock(&irq_desc_lock_class);
[    0.958501] 
                *** DEADLOCK ***

[    0.958715] no locks held by swapper/0/0.
[    0.958870] 
               the shortest dependencies between 2nd lock and 1st lock:
[    0.959152]  -> (&handler->enable_lock){+.+.}-{2:2} {
[    0.959372]     HARDIRQ-ON-W at:
[    0.959522]                       __lock_acquire+0x884/0x1f5c
[    0.959745]                       lock_acquire+0xf0/0x292
[    0.959913]                       _raw_spin_lock+0x2c/0x40
[    0.960090]                       plic_probe+0x322/0x65c
[    0.960257]                       platform_probe+0x4e/0x92
[    0.960432]                       really_probe+0x82/0x210
[    0.960598]                       __driver_probe_device+0x5c/0xd0
[    0.960784]                       driver_probe_device+0x2c/0xb0
[    0.960964]                       __driver_attach+0x72/0x10a
[    0.961151]                       bus_for_each_dev+0x60/0xac
[    0.961330]                       driver_attach+0x1a/0x22
[    0.961496]                       bus_add_driver+0xd4/0x19e
[    0.961666]                       driver_register+0x3e/0xd8
[    0.961835]                       __platform_driver_register+0x1c/0x24
[    0.962030]                       plic_driver_init+0x1a/0x22
[    0.962201]                       do_one_initcall+0x80/0x268
[    0.962371]                       kernel_init_freeable+0x296/0x300
[    0.962554]                       kernel_init+0x1e/0x10a
[    0.962713]                       ret_from_fork+0xe/0x1c
[    0.962884]     SOFTIRQ-ON-W at:
[    0.962994]                       __lock_acquire+0x89e/0x1f5c
[    0.963169]                       lock_acquire+0xf0/0x292
[    0.963336]                       _raw_spin_lock+0x2c/0x40
[    0.963497]                       plic_probe+0x322/0x65c
[    0.963664]                       platform_probe+0x4e/0x92
[    0.963849]                       really_probe+0x82/0x210
[    0.964054]                       __driver_probe_device+0x5c/0xd0
[    0.964255]                       driver_probe_device+0x2c/0xb0
[    0.964428]                       __driver_attach+0x72/0x10a
[    0.964603]                       bus_for_each_dev+0x60/0xac
[    0.964777]                       driver_attach+0x1a/0x22
[    0.964943]                       bus_add_driver+0xd4/0x19e
[    0.965343]                       driver_register+0x3e/0xd8
[    0.965527]                       __platform_driver_register+0x1c/0x24
[    0.965732]                       plic_driver_init+0x1a/0x22
[    0.965908]                       do_one_initcall+0x80/0x268
[    0.966078]                       kernel_init_freeable+0x296/0x300
[    0.966268]                       kernel_init+0x1e/0x10a
[    0.966436]                       ret_from_fork+0xe/0x1c
[    0.966599]     INITIAL USE at:
[    0.966716]                      __lock_acquire+0x3fc/0x1f5c
[    0.966891]                      lock_acquire+0xf0/0x292
[    0.967048]                      _raw_spin_lock+0x2c/0x40
[    0.967206]                      plic_probe+0x322/0x65c
[    0.967360]                      platform_probe+0x4e/0x92
[    0.967522]                      really_probe+0x82/0x210
[    0.967678]                      __driver_probe_device+0x5c/0xd0
[    0.967853]                      driver_probe_device+0x2c/0xb0
[    0.968025]                      __driver_attach+0x72/0x10a
[    0.968185]                      bus_for_each_dev+0x60/0xac
[    0.968348]                      driver_attach+0x1a/0x22
[    0.968513]                      bus_add_driver+0xd4/0x19e
[    0.968678]                      driver_register+0x3e/0xd8
[    0.968839]                      __platform_driver_register+0x1c/0x24
[    0.969035]                      plic_driver_init+0x1a/0x22
[    0.969239]                      do_one_initcall+0x80/0x268
[    0.969431]                      kernel_init_freeable+0x296/0x300
[    0.969610]                      kernel_init+0x1e/0x10a
[    0.969766]                      ret_from_fork+0xe/0x1c
[    0.969936]   }
[    0.970010]   ... key      at: [<ffffffff824f4138>] __key.2+0x0/0x10
[    0.970224]   ... acquired at:
[    0.970353]    lock_acquire+0xf0/0x292
[    0.970482]    _raw_spin_lock+0x2c/0x40
[    0.970609]    plic_irq_enable+0x7e/0x140
[    0.970739]    irq_enable+0x2c/0x60
[    0.970882]    __irq_startup+0x58/0x60
[    0.971008]    irq_startup+0x5e/0x13c
[    0.971126]    __setup_irq+0x4de/0x5da
[    0.971248]    request_threaded_irq+0xcc/0x12e
[    0.971394]    vm_find_vqs+0x62/0x50a
[    0.971518]    probe_common+0xfe/0x1d2
[    0.971635]    virtrng_probe+0xc/0x14
[    0.971751]    virtio_dev_probe+0x154/0x1fc
[    0.971878]    really_probe+0x82/0x210
[    0.972008]    __driver_probe_device+0x5c/0xd0
[    0.972147]    driver_probe_device+0x2c/0xb0
[    0.972280]    __driver_attach+0x72/0x10a
[    0.972407]    bus_for_each_dev+0x60/0xac
[    0.972540]    driver_attach+0x1a/0x22
[    0.972656]    bus_add_driver+0xd4/0x19e
[    0.972777]    driver_register+0x3e/0xd8
[    0.972896]    register_virtio_driver+0x1c/0x2a
[    0.973049]    virtio_rng_driver_init+0x18/0x20
[    0.973236]    do_one_initcall+0x80/0x268
[    0.973399]    kernel_init_freeable+0x296/0x300
[    0.973540]    kernel_init+0x1e/0x10a
[    0.973658]    ret_from_fork+0xe/0x1c

[    0.973858] -> (&irq_desc_lock_class){-...}-{2:2} {
[    0.974036]    IN-HARDIRQ-W at:
[    0.974142]                     __lock_acquire+0xa82/0x1f5c
[    0.974309]                     lock_acquire+0xf0/0x292
[    0.974467]                     _raw_spin_lock+0x2c/0x40
[    0.974625]                     handle_fasteoi_irq+0x24/0x1da
[    0.974794]                     generic_handle_domain_irq+0x1c/0x2a
[    0.974982]                     plic_handle_irq+0x7e/0xf0
[    0.975143]                     generic_handle_domain_irq+0x1c/0x2a
[    0.975329]                     riscv_intc_irq+0x2e/0x46
[    0.975488]                     handle_riscv_irq+0x4a/0x74
[    0.975652]                     call_on_irq_stack+0x32/0x40
[    0.975817]    INITIAL USE at:
[    0.975923]                    __lock_acquire+0x3fc/0x1f5c
[    0.976087]                    lock_acquire+0xf0/0x292
[    0.976244]                    _raw_spin_lock_irqsave+0x3a/0x64
[    0.976423]                    __irq_get_desc_lock+0x5a/0x84
[    0.976594]                    irq_modify_status+0x2a/0x106
[    0.976764]                    irq_set_percpu_devid+0x62/0x78
[    0.976939]                    riscv_intc_domain_map+0x1e/0x54
[    0.977133]                    irq_domain_associate_locked+0x42/0xe4
[    0.977363]                    irq_create_mapping_affinity+0x98/0xd4
[    0.977570]                    sbi_ipi_init+0x70/0x142
[    0.977744]                    init_IRQ+0xfe/0x11a
[    0.977906]                    start_kernel+0x4ae/0x790
[    0.978082]  }
[    0.978151]  ... key      at: [<ffffffff824bbee0>] irq_desc_lock_class+0x0/0x10
[    0.978389]  ... acquired at:
[    0.978494]    mark_lock+0x3fe/0x8c2
[    0.978624]    __lock_acquire+0xa82/0x1f5c
[    0.978766]    lock_acquire+0xf0/0x292
[    0.978897]    _raw_spin_lock+0x2c/0x40
[    0.979029]    handle_fasteoi_irq+0x24/0x1da
[    0.979171]    generic_handle_domain_irq+0x1c/0x2a
[    0.979326]    plic_handle_irq+0x7e/0xf0
[    0.979460]    generic_handle_domain_irq+0x1c/0x2a
[    0.979618]    riscv_intc_irq+0x2e/0x46
[    0.979751]    handle_riscv_irq+0x4a/0x74
[    0.979888]    call_on_irq_stack+0x32/0x40

[    0.980110] 
               stack backtrace:
[    0.980358] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
[    0.980662] Hardware name: riscv-virtio,qemu (DT)
[    0.980913] Call Trace:
[    0.981042] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
[    0.981246] [<ffffffff80ae020a>] show_stack+0x2c/0x38
[    0.981456] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
[    0.981648] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
[    0.981813] [<ffffffff80ae17a4>] print_irq_inversion_bug.part.0+0x162/0x176
[    0.982031] [<ffffffff8007c6e6>] mark_lock+0x3fe/0x8c2
[    0.982198] [<ffffffff8007d888>] __lock_acquire+0xa82/0x1f5c
[    0.982377] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
[    0.982549] [<ffffffff80af9962>] _raw_spin_lock+0x2c/0x40
[    0.982721] [<ffffffff8008f3fe>] handle_fasteoi_irq+0x24/0x1da
[    0.982904] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
[    0.983112] [<ffffffff80581dc0>] plic_handle_irq+0x7e/0xf0
[    0.983293] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
[    0.983495] [<ffffffff8057fb1a>] riscv_intc_irq+0x2e/0x46
[    0.983671] [<ffffffff80aedb4c>] handle_riscv_irq+0x4a/0x74
[    0.983856] [<ffffffff80afa756>] call_on_irq_stack+0x32/0x40


When I switch to -machine virt,aia=aplic-imsic (same config as above), I
get:

[    0.971406] ============================================
[    0.971439] WARNING: possible recursive locking detected
[    0.971497] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
[    0.971583] --------------------------------------------
[    0.971612] swapper/0/1 is trying to acquire lock:
[    0.971662] ffffaf83fefa8e78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x92/0x146
[    0.971927] 
               but task is already holding lock:
[    0.971975] ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
[    0.972045] 
               other info that might help us debug this:
[    0.972085]  Possible unsafe locking scenario:

[    0.972114]        CPU0
[    0.972133]        ----
[    0.972153]   lock(&lpriv->ids_lock);
[    0.972191]   lock(&lpriv->ids_lock);
[    0.972228] 
                *** DEADLOCK ***

[    0.972258]  May be due to missing lock nesting notation

[    0.972306] 6 locks held by swapper/0/1:
[    0.972338]  #0: ffffaf8081f65970 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x6a/0x10a
[    0.972413]  #1: ffffaf808217c240 (&desc->request_mutex){+.+.}-{3:3}, at: __setup_irq+0xa2/0x5da
[    0.972492]  #2: ffffaf808217c0c8 (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq+0xbe/0x5da
[    0.972555]  #3: ffffffff81892ac0 (mask_lock){....}-{2:2}, at: irq_setup_affinity+0x38/0xc6
[    0.972617]  #4: ffffffff81892a80 (tmp_mask_lock){....}-{2:2}, at: irq_do_set_affinity+0x3a/0x164
[    0.972681]  #5: ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
[    0.972753] 
               stack backtrace:
[    0.972852] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
[    0.972900] Hardware name: riscv-virtio,qemu (DT)
[    0.972987] Call Trace:
[    0.973019] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
[    0.973054] [<ffffffff80ae020a>] show_stack+0x2c/0x38
[    0.973083] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
[    0.973112] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
[    0.973139] [<ffffffff8007ad5e>] print_deadlock_bug+0x282/0x328
[    0.973168] [<ffffffff8007e15c>] __lock_acquire+0x1356/0x1f5c
[    0.973198] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
[    0.973225] [<ffffffff80af9adc>] _raw_spin_lock_irqsave+0x3a/0x64
[    0.973255] [<ffffffff80581210>] imsic_vector_move+0x92/0x146
[    0.973285] [<ffffffff80581a04>] imsic_irq_set_affinity+0x8e/0xc6
[    0.973315] [<ffffffff8008c86a>] irq_do_set_affinity+0x142/0x164
[    0.973345] [<ffffffff8008cc22>] irq_setup_affinity+0x68/0xc6
[    0.973374] [<ffffffff8008fa82>] irq_startup+0x72/0x13c
[    0.973401] [<ffffffff8008d40c>] __setup_irq+0x4de/0x5da
[    0.973430] [<ffffffff8008d5d4>] request_threaded_irq+0xcc/0x12e
[    0.973460] [<ffffffff806346d8>] vp_find_vqs_msix+0x114/0x376
[    0.973491] [<ffffffff80634970>] vp_find_vqs+0x36/0x136
[    0.973518] [<ffffffff80633280>] vp_modern_find_vqs+0x16/0x4e
[    0.973547] [<ffffffff80ab31f8>] p9_virtio_probe+0x8e/0x31c
[    0.973576] [<ffffffff8062d982>] virtio_dev_probe+0x154/0x1fc
[    0.973605] [<ffffffff80693738>] really_probe+0x82/0x210
[    0.973632] [<ffffffff80693922>] __driver_probe_device+0x5c/0xd0
[    0.973661] [<ffffffff806939c2>] driver_probe_device+0x2c/0xb0
[    0.973690] [<ffffffff80693b46>] __driver_attach+0x72/0x10a
[    0.973718] [<ffffffff8069191a>] bus_for_each_dev+0x60/0xac
[    0.973746] [<ffffffff80693164>] driver_attach+0x1a/0x22
[    0.973773] [<ffffffff80692ade>] bus_add_driver+0xd4/0x19e
[    0.973801] [<ffffffff8069487e>] driver_register+0x3e/0xd8
[    0.973829] [<ffffffff8062d1ce>] register_virtio_driver+0x1c/0x2a
[    0.973858] [<ffffffff80c3da52>] p9_virtio_init+0x36/0x56
[    0.973887] [<ffffffff800028fe>] do_one_initcall+0x80/0x268
[    0.973915] [<ffffffff80c01144>] kernel_init_freeable+0x296/0x300
[    0.973944] [<ffffffff80af05dc>] kernel_init+0x1e/0x10a
[    0.973972] [<ffffffff80afa716>] ret_from_fork+0xe/0x1c


FWIW, the full Qemu command I used was as follows:

	sudo /home/andrea/Downloads/qemu/build/qemu-system-riscv64 \
		-append "root=/dev/root rw rootfstype=9p rootflags=version=9p2000.L,trans=virtio,cache=mmap,access=any raid=noautodetect security=none loglevel=7" \
		-cpu rv64,sv57=off,svadu=off,svnapot=off \
		-device virtio-net-device,netdev=net0 \
		-device virtio-rng-device,rng=rng0 \
		-device virtio-9p-pci,fsdev=root,mount_tag=/dev/root \
		-fsdev local,id=root,path=/home/andrea/Downloads/jammy/,security_model=none \
		-kernel /home/andrea/linux/arch/riscv/boot/Image \
		-m 16G \
		-machine virt,aia=<either "none" or "aplic-imsic"> \
		-monitor telnet:127.0.0.1:55555,server,nowait \
		-netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 \
		-nographic \
		-object rng-random,filename=/dev/urandom,id=rng0 \
		-serial mon:stdio \
		-smp 5


  Andrea
Thomas Gleixner Feb. 14, 2024, 7:54 p.m. UTC | #23
Anup!

On Sat, Jan 27 2024 at 21:50, Anup Patel wrote:
>> Changes since v11:
>>  - Rebased on Linux-6.8-rc1
>>  - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
>>    MSI handling to per device MSI domains" series by Thomas.
>>    (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
>>     PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
>>     https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/)
>>  - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
>>  - Updated IMSIC driver to support per-device MSI domains for PCI and
>>    platform devices.
>
> I have rebased and included 13 patches (which add per-device MSI domain
> infrastructure) from your series [1]. In this series, the IMSIC driver
> implements the msi_parent_ops and the APLIC driver implements a
> wired-to-msi bridge using your new infrastructure.
>
> The remaining 27 patches of your series [1] require testing on ARM
> platforms which I don't have. I suggest these remaining patches
> go as a separate series.

Of course. Darwi (in Cc) is going to work on the ARM parts when he
returns from vacation. I'm going to apply the infrastructure patches
(1-13) in the next days so they are out of the way for you and Darwi,
unless someone has any objections.

Thanks for picking this up and driving it forward!

        tglx
Anup Patel Feb. 15, 2024, 5:48 a.m. UTC | #24
On Thu, Feb 15, 2024 at 1:24 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Anup!
>
> On Sat, Jan 27 2024 at 21:50, Anup Patel wrote:
> >> Changes since v11:
> >>  - Rebased on Linux-6.8-rc1
> >>  - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> >>    MSI handling to per device MSI domains" series by Thomas.
> >>    (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> >>     PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> >>     https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/)
> >>  - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> >>  - Updated IMSIC driver to support per-device MSI domains for PCI and
> >>    platform devices.
> >
> > I have rebased and included 13 patches (which add per-device MSI domain
> > infrastructure) from your series [1]. In this series, the IMSIC driver
> > implements the msi_parent_ops and the APLIC driver implements a
> > wired-to-msi bridge using your new infrastructure.
> >
> > The remaining 27 patches of your series [1] require testing on ARM
> > platforms which I don't have. I suggest these remaining patches
> > go as a separate series.
>
> Of course. Darwi (in Cc) is going to work on the ARM parts when he
> returns from vacation. I'm going to apply the infrastructure patches
> (1-13) in the next days so they are out of the way for you and Darwi,
> unless someone has any objections.
>
> Thanks for picking this up and driving it forward!

Thanks Thomas, I will be sending v13 of this series next week.

For the time being, I will carry the 13 infrastructure patches in
this series until they land in upstream Linux so that it is easier
for people to try this series.

Regards,
Anup
Marc Zyngier Feb. 15, 2024, 11:57 a.m. UTC | #25
On Wed, 14 Feb 2024 19:54:49 +0000,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> Anup!
> 
> On Sat, Jan 27 2024 at 21:50, Anup Patel wrote:
> >> Changes since v11:
> >>  - Rebased on Linux-6.8-rc1
> >>  - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> >>    MSI handling to per device MSI domains" series by Thomas.
> >>    (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> >>     PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> >>     https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/)
> >>  - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> >>  - Updated IMSIC driver to support per-device MSI domains for PCI and
> >>    platform devices.
> >
> > I have rebased and included 13 patches (which add per-device MSI domain
> > infrastructure) from your series [1]. In this series, the IMSIC driver
> > implements the msi_parent_ops and the APLIC driver implements a
> > wired-to-msi bridge using your new infrastructure.
> >
> > The remaining 27 patches of your series [1] require testing on ARM
> > platforms which I don't have. I suggest these remaining patches
> > go as a separate series.
> 
> Of course. Darwi (in Cc) is going to work on the ARM parts when he
> returns from vacation. I'm going to apply the infrastructure patches
> (1-13) in the next days so they are out of the way for you and Darwi,
> unless someone has any objections.

FWIW, I've given the first 13 patches a go on two of the most
problematic platforms (Huawei's D05, and Marvell's McBin). Nothing
immediately broke, so it's obviously perfect.

Thanks,

	M.
Thomas Gleixner Feb. 15, 2024, 7:59 p.m. UTC | #26
Anup!

On Thu, Feb 15 2024 at 11:18, Anup Patel wrote:
> On Thu, Feb 15, 2024 at 1:24 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> Thanks for picking this up and driving it forward!
>
> Thanks Thomas, I will be sending v13 of this series next week.
>
> For the time being, I will carry the 13 infrastructure patches in
> this series until they land in upstream Linux so that it is easier
> for people to try this series.

I pushed out the lot on top of 6.8-rc4 (no other changes) to

     git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/msi

with some minimal changes (DEVICE_IMS -> DEVICE_MSI, removal of an
unused interface).

I'm going over the rest of the series after I've dealt with my other patch
backlog.

Thanks,

        tglx
Anup Patel Feb. 16, 2024, 11:33 a.m. UTC | #27
Hi Andrea,

On Thu, Feb 8, 2024 at 3:40 PM Andrea Parri <parri.andrea@gmail.com> wrote:
>
> Hi Anup,
>
> I understand that some refactoring is in progress, but I don't see the
> report below; adding it here hoping that it can be useful towards v13.
> (Unfortunately, I didn't have enough time to debug this yet...)
>
>
> >   irqchip/sifive-plic: Convert PLIC driver into a platform driver
>
> I'm seeing the following LOCKDEP warning with this series, bisected to
> the commit above.  This is a defconfig + PROVE_LOCKING=y build, booted
> using -machine virt,aia=none.
>
> [    0.953473] ========================================================
> [    0.953704] WARNING: possible irq lock inversion dependency detected
> [    0.953955] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
> [    0.954224] --------------------------------------------------------
> [    0.954444] swapper/0/0 just changed the state of lock:
> [    0.954664] ffffaf808109d0c8 (&irq_desc_lock_class){-...}-{2:2}, at: handle_fasteoi_irq+0x24/0x1da
> [    0.955699] but this lock took another, HARDIRQ-unsafe lock in the past:
> [    0.955942]  (&handler->enable_lock){+.+.}-{2:2}
> [    0.955974]
>
>                and interrupts could create inverse lock ordering between them.
>
> [    0.956507]
>                other info that might help us debug this:
> [    0.956775]  Possible interrupt unsafe locking scenario:
>
> [    0.956998]        CPU0                    CPU1
> [    0.957247]        ----                    ----
> [    0.957439]   lock(&handler->enable_lock);
> [    0.957607]                                local_irq_disable();
> [    0.957793]                                lock(&irq_desc_lock_class);
> [    0.958021]                                lock(&handler->enable_lock);
> [    0.958246]   <Interrupt>
> [    0.958342]     lock(&irq_desc_lock_class);
> [    0.958501]
>                 *** DEADLOCK ***

I was able to reproduce this warning.

After further digging, it turns out the locking safety issue existed
before the PLIC was converted into a platform driver but was never
caught by lockdep because previously the driver was
probed very early using IRQCHIP_DECLARE().
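
The inversion is between handler->enable_lock (taken with interrupts
enabled from the probe and enable paths) and the irq_desc lock taken
in hard-irq context. The fix I have in mind is roughly along these
lines -- a sketch only, the final patch may differ:

static void plic_toggle(struct plic_handler *handler, int hwirq, int enable)
{
        unsigned long flags;

        /*
         * Take enable_lock with interrupts disabled so it can no longer
         * invert with the irq_desc lock acquired in hard-irq context.
         */
        raw_spin_lock_irqsave(&handler->enable_lock, flags);
        __plic_toggle(handler->enable_base, hwirq, enable);
        raw_spin_unlock_irqrestore(&handler->enable_lock, flags);
}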

I will include a separate patch in v13 to fix this warning.

Thanks,
Anup



>
> [    0.958715] no locks held by swapper/0/0.
> [    0.958870]
>                the shortest dependencies between 2nd lock and 1st lock:
> [    0.959152]  -> (&handler->enable_lock){+.+.}-{2:2} {
> [    0.959372]     HARDIRQ-ON-W at:
> [    0.959522]                       __lock_acquire+0x884/0x1f5c
> [    0.959745]                       lock_acquire+0xf0/0x292
> [    0.959913]                       _raw_spin_lock+0x2c/0x40
> [    0.960090]                       plic_probe+0x322/0x65c
> [    0.960257]                       platform_probe+0x4e/0x92
> [    0.960432]                       really_probe+0x82/0x210
> [    0.960598]                       __driver_probe_device+0x5c/0xd0
> [    0.960784]                       driver_probe_device+0x2c/0xb0
> [    0.960964]                       __driver_attach+0x72/0x10a
> [    0.961151]                       bus_for_each_dev+0x60/0xac
> [    0.961330]                       driver_attach+0x1a/0x22
> [    0.961496]                       bus_add_driver+0xd4/0x19e
> [    0.961666]                       driver_register+0x3e/0xd8
> [    0.961835]                       __platform_driver_register+0x1c/0x24
> [    0.962030]                       plic_driver_init+0x1a/0x22
> [    0.962201]                       do_one_initcall+0x80/0x268
> [    0.962371]                       kernel_init_freeable+0x296/0x300
> [    0.962554]                       kernel_init+0x1e/0x10a
> [    0.962713]                       ret_from_fork+0xe/0x1c
> [    0.962884]     SOFTIRQ-ON-W at:
> [    0.962994]                       __lock_acquire+0x89e/0x1f5c
> [    0.963169]                       lock_acquire+0xf0/0x292
> [    0.963336]                       _raw_spin_lock+0x2c/0x40
> [    0.963497]                       plic_probe+0x322/0x65c
> [    0.963664]                       platform_probe+0x4e/0x92
> [    0.963849]                       really_probe+0x82/0x210
> [    0.964054]                       __driver_probe_device+0x5c/0xd0
> [    0.964255]                       driver_probe_device+0x2c/0xb0
> [    0.964428]                       __driver_attach+0x72/0x10a
> [    0.964603]                       bus_for_each_dev+0x60/0xac
> [    0.964777]                       driver_attach+0x1a/0x22
> [    0.964943]                       bus_add_driver+0xd4/0x19e
> [    0.965343]                       driver_register+0x3e/0xd8
> [    0.965527]                       __platform_driver_register+0x1c/0x24
> [    0.965732]                       plic_driver_init+0x1a/0x22
> [    0.965908]                       do_one_initcall+0x80/0x268
> [    0.966078]                       kernel_init_freeable+0x296/0x300
> [    0.966268]                       kernel_init+0x1e/0x10a
> [    0.966436]                       ret_from_fork+0xe/0x1c
> [    0.966599]     INITIAL USE at:
> [    0.966716]                      __lock_acquire+0x3fc/0x1f5c
> [    0.966891]                      lock_acquire+0xf0/0x292
> [    0.967048]                      _raw_spin_lock+0x2c/0x40
> [    0.967206]                      plic_probe+0x322/0x65c
> [    0.967360]                      platform_probe+0x4e/0x92
> [    0.967522]                      really_probe+0x82/0x210
> [    0.967678]                      __driver_probe_device+0x5c/0xd0
> [    0.967853]                      driver_probe_device+0x2c/0xb0
> [    0.968025]                      __driver_attach+0x72/0x10a
> [    0.968185]                      bus_for_each_dev+0x60/0xac
> [    0.968348]                      driver_attach+0x1a/0x22
> [    0.968513]                      bus_add_driver+0xd4/0x19e
> [    0.968678]                      driver_register+0x3e/0xd8
> [    0.968839]                      __platform_driver_register+0x1c/0x24
> [    0.969035]                      plic_driver_init+0x1a/0x22
> [    0.969239]                      do_one_initcall+0x80/0x268
> [    0.969431]                      kernel_init_freeable+0x296/0x300
> [    0.969610]                      kernel_init+0x1e/0x10a
> [    0.969766]                      ret_from_fork+0xe/0x1c
> [    0.969936]   }
> [    0.970010]   ... key      at: [<ffffffff824f4138>] __key.2+0x0/0x10
> [    0.970224]   ... acquired at:
> [    0.970353]    lock_acquire+0xf0/0x292
> [    0.970482]    _raw_spin_lock+0x2c/0x40
> [    0.970609]    plic_irq_enable+0x7e/0x140
> [    0.970739]    irq_enable+0x2c/0x60
> [    0.970882]    __irq_startup+0x58/0x60
> [    0.971008]    irq_startup+0x5e/0x13c
> [    0.971126]    __setup_irq+0x4de/0x5da
> [    0.971248]    request_threaded_irq+0xcc/0x12e
> [    0.971394]    vm_find_vqs+0x62/0x50a
> [    0.971518]    probe_common+0xfe/0x1d2
> [    0.971635]    virtrng_probe+0xc/0x14
> [    0.971751]    virtio_dev_probe+0x154/0x1fc
> [    0.971878]    really_probe+0x82/0x210
> [    0.972008]    __driver_probe_device+0x5c/0xd0
> [    0.972147]    driver_probe_device+0x2c/0xb0
> [    0.972280]    __driver_attach+0x72/0x10a
> [    0.972407]    bus_for_each_dev+0x60/0xac
> [    0.972540]    driver_attach+0x1a/0x22
> [    0.972656]    bus_add_driver+0xd4/0x19e
> [    0.972777]    driver_register+0x3e/0xd8
> [    0.972896]    register_virtio_driver+0x1c/0x2a
> [    0.973049]    virtio_rng_driver_init+0x18/0x20
> [    0.973236]    do_one_initcall+0x80/0x268
> [    0.973399]    kernel_init_freeable+0x296/0x300
> [    0.973540]    kernel_init+0x1e/0x10a
> [    0.973658]    ret_from_fork+0xe/0x1c
>
> [    0.973858] -> (&irq_desc_lock_class){-...}-{2:2} {
> [    0.974036]    IN-HARDIRQ-W at:
> [    0.974142]                     __lock_acquire+0xa82/0x1f5c
> [    0.974309]                     lock_acquire+0xf0/0x292
> [    0.974467]                     _raw_spin_lock+0x2c/0x40
> [    0.974625]                     handle_fasteoi_irq+0x24/0x1da
> [    0.974794]                     generic_handle_domain_irq+0x1c/0x2a
> [    0.974982]                     plic_handle_irq+0x7e/0xf0
> [    0.975143]                     generic_handle_domain_irq+0x1c/0x2a
> [    0.975329]                     riscv_intc_irq+0x2e/0x46
> [    0.975488]                     handle_riscv_irq+0x4a/0x74
> [    0.975652]                     call_on_irq_stack+0x32/0x40
> [    0.975817]    INITIAL USE at:
> [    0.975923]                    __lock_acquire+0x3fc/0x1f5c
> [    0.976087]                    lock_acquire+0xf0/0x292
> [    0.976244]                    _raw_spin_lock_irqsave+0x3a/0x64
> [    0.976423]                    __irq_get_desc_lock+0x5a/0x84
> [    0.976594]                    irq_modify_status+0x2a/0x106
> [    0.976764]                    irq_set_percpu_devid+0x62/0x78
> [    0.976939]                    riscv_intc_domain_map+0x1e/0x54
> [    0.977133]                    irq_domain_associate_locked+0x42/0xe4
> [    0.977363]                    irq_create_mapping_affinity+0x98/0xd4
> [    0.977570]                    sbi_ipi_init+0x70/0x142
> [    0.977744]                    init_IRQ+0xfe/0x11a
> [    0.977906]                    start_kernel+0x4ae/0x790
> [    0.978082]  }
> [    0.978151]  ... key      at: [<ffffffff824bbee0>] irq_desc_lock_class+0x0/0x10
> [    0.978389]  ... acquired at:
> [    0.978494]    mark_lock+0x3fe/0x8c2
> [    0.978624]    __lock_acquire+0xa82/0x1f5c
> [    0.978766]    lock_acquire+0xf0/0x292
> [    0.978897]    _raw_spin_lock+0x2c/0x40
> [    0.979029]    handle_fasteoi_irq+0x24/0x1da
> [    0.979171]    generic_handle_domain_irq+0x1c/0x2a
> [    0.979326]    plic_handle_irq+0x7e/0xf0
> [    0.979460]    generic_handle_domain_irq+0x1c/0x2a
> [    0.979618]    riscv_intc_irq+0x2e/0x46
> [    0.979751]    handle_riscv_irq+0x4a/0x74
> [    0.979888]    call_on_irq_stack+0x32/0x40
>
> [    0.980110]
>                stack backtrace:
> [    0.980358] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
> [    0.980662] Hardware name: riscv-virtio,qemu (DT)
> [    0.980913] Call Trace:
> [    0.981042] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
> [    0.981246] [<ffffffff80ae020a>] show_stack+0x2c/0x38
> [    0.981456] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
> [    0.981648] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
> [    0.981813] [<ffffffff80ae17a4>] print_irq_inversion_bug.part.0+0x162/0x176
> [    0.982031] [<ffffffff8007c6e6>] mark_lock+0x3fe/0x8c2
> [    0.982198] [<ffffffff8007d888>] __lock_acquire+0xa82/0x1f5c
> [    0.982377] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
> [    0.982549] [<ffffffff80af9962>] _raw_spin_lock+0x2c/0x40
> [    0.982721] [<ffffffff8008f3fe>] handle_fasteoi_irq+0x24/0x1da
> [    0.982904] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
> [    0.983112] [<ffffffff80581dc0>] plic_handle_irq+0x7e/0xf0
> [    0.983293] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
> [    0.983495] [<ffffffff8057fb1a>] riscv_intc_irq+0x2e/0x46
> [    0.983671] [<ffffffff80aedb4c>] handle_riscv_irq+0x4a/0x74
> [    0.983856] [<ffffffff80afa756>] call_on_irq_stack+0x32/0x40
>
>
> When I switch to -machine virt,aia=aplic-imsic (same config as above), I
> get:
>
> [    0.971406] ============================================
> [    0.971439] WARNING: possible recursive locking detected
> [    0.971497] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
> [    0.971583] --------------------------------------------
> [    0.971612] swapper/0/1 is trying to acquire lock:
> [    0.971662] ffffaf83fefa8e78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x92/0x146
> [    0.971927]
>                but task is already holding lock:
> [    0.971975] ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
> [    0.972045]
>                other info that might help us debug this:
> [    0.972085]  Possible unsafe locking scenario:
>
> [    0.972114]        CPU0
> [    0.972133]        ----
> [    0.972153]   lock(&lpriv->ids_lock);
> [    0.972191]   lock(&lpriv->ids_lock);
> [    0.972228]
>                 *** DEADLOCK ***
>
> [    0.972258]  May be due to missing lock nesting notation
>
> [    0.972306] 6 locks held by swapper/0/1:
> [    0.972338]  #0: ffffaf8081f65970 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x6a/0x10a
> [    0.972413]  #1: ffffaf808217c240 (&desc->request_mutex){+.+.}-{3:3}, at: __setup_irq+0xa2/0x5da
> [    0.972492]  #2: ffffaf808217c0c8 (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq+0xbe/0x5da
> [    0.972555]  #3: ffffffff81892ac0 (mask_lock){....}-{2:2}, at: irq_setup_affinity+0x38/0xc6
> [    0.972617]  #4: ffffffff81892a80 (tmp_mask_lock){....}-{2:2}, at: irq_do_set_affinity+0x3a/0x164
> [    0.972681]  #5: ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
> [    0.972753]
>                stack backtrace:
> [    0.972852] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
> [    0.972900] Hardware name: riscv-virtio,qemu (DT)
> [    0.972987] Call Trace:
> [    0.973019] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
> [    0.973054] [<ffffffff80ae020a>] show_stack+0x2c/0x38
> [    0.973083] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
> [    0.973112] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
> [    0.973139] [<ffffffff8007ad5e>] print_deadlock_bug+0x282/0x328
> [    0.973168] [<ffffffff8007e15c>] __lock_acquire+0x1356/0x1f5c
> [    0.973198] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
> [    0.973225] [<ffffffff80af9adc>] _raw_spin_lock_irqsave+0x3a/0x64
> [    0.973255] [<ffffffff80581210>] imsic_vector_move+0x92/0x146
> [    0.973285] [<ffffffff80581a04>] imsic_irq_set_affinity+0x8e/0xc6
> [    0.973315] [<ffffffff8008c86a>] irq_do_set_affinity+0x142/0x164
> [    0.973345] [<ffffffff8008cc22>] irq_setup_affinity+0x68/0xc6
> [    0.973374] [<ffffffff8008fa82>] irq_startup+0x72/0x13c
> [    0.973401] [<ffffffff8008d40c>] __setup_irq+0x4de/0x5da
> [    0.973430] [<ffffffff8008d5d4>] request_threaded_irq+0xcc/0x12e
> [    0.973460] [<ffffffff806346d8>] vp_find_vqs_msix+0x114/0x376
> [    0.973491] [<ffffffff80634970>] vp_find_vqs+0x36/0x136
> [    0.973518] [<ffffffff80633280>] vp_modern_find_vqs+0x16/0x4e
> [    0.973547] [<ffffffff80ab31f8>] p9_virtio_probe+0x8e/0x31c
> [    0.973576] [<ffffffff8062d982>] virtio_dev_probe+0x154/0x1fc
> [    0.973605] [<ffffffff80693738>] really_probe+0x82/0x210
> [    0.973632] [<ffffffff80693922>] __driver_probe_device+0x5c/0xd0
> [    0.973661] [<ffffffff806939c2>] driver_probe_device+0x2c/0xb0
> [    0.973690] [<ffffffff80693b46>] __driver_attach+0x72/0x10a
> [    0.973718] [<ffffffff8069191a>] bus_for_each_dev+0x60/0xac
> [    0.973746] [<ffffffff80693164>] driver_attach+0x1a/0x22
> [    0.973773] [<ffffffff80692ade>] bus_add_driver+0xd4/0x19e
> [    0.973801] [<ffffffff8069487e>] driver_register+0x3e/0xd8
> [    0.973829] [<ffffffff8062d1ce>] register_virtio_driver+0x1c/0x2a
> [    0.973858] [<ffffffff80c3da52>] p9_virtio_init+0x36/0x56
> [    0.973887] [<ffffffff800028fe>] do_one_initcall+0x80/0x268
> [    0.973915] [<ffffffff80c01144>] kernel_init_freeable+0x296/0x300
> [    0.973944] [<ffffffff80af05dc>] kernel_init+0x1e/0x10a
> [    0.973972] [<ffffffff80afa716>] ret_from_fork+0xe/0x1c
>
>
> FWIW, the full Qemu command I used was as follows:
>
>         sudo /home/andrea/Downloads/qemu/build/qemu-system-riscv64 \
>                 -append "root=/dev/root rw rootfstype=9p rootflags=version=9p2000.L,trans=virtio,cache=mmap,access=any raid=noautodetect security=none loglevel=7" \
>                 -cpu rv64,sv57=off,svadu=off,svnapot=off \
>                 -device virtio-net-device,netdev=net0 \
>                 -device virtio-rng-device,rng=rng0 \
>                 -device virtio-9p-pci,fsdev=root,mount_tag=/dev/root \
>                 -fsdev local,id=root,path=/home/andrea/Downloads/jammy/,security_model=none \
>                 -kernel /home/andrea/linux/arch/riscv/boot/Image \
>                 -m 16G \
>                 -machine virt,aia=<either "none" or "aplic-imsic"> \
>                 -monitor telnet:127.0.0.1:55555,server,nowait \
>                 -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 \
>                 -nographic \
>                 -object rng-random,filename=/dev/urandom,id=rng0 \
>                 -serial mon:stdio \
>                 -smp 5
>
>
>   Andrea
Thomas Gleixner Feb. 16, 2024, 9:05 p.m. UTC | #28
Anup!

On Thu, Feb 15 2024 at 20:59, Thomas Gleixner wrote:
> I'm going over the rest of the series after I've dealt with my other patch
> backlog.

Aside from the nitpicks I had, this looks pretty reasonable.

Thanks,

        tglx
Anup Patel Feb. 20, 2024, 6:12 a.m. UTC | #29
On Sat, Feb 17, 2024 at 2:35 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Anup!
>
> On Thu, Feb 15 2024 at 20:59, Thomas Gleixner wrote:
> > I'm going over the rest of the series after I've dealt with my other patch
> > backlog.
>
> > Aside from the nitpicks I had, this looks pretty reasonable.

Thanks for your review.

I have sent the v13 series. Please have a look.

Regards,
Anup

>
> Thanks,
>
>         tglx