Message ID: 20240127161753.114685-1-apatel@ventanamicro.com (mailing list archive)
Series: Linux RISC-V AIA Support
Hi Thomas,

On Sat, Jan 27, 2024 at 9:48 PM Anup Patel <apatel@ventanamicro.com> wrote:
>
> The RISC-V AIA specification is ratified as-per the RISC-V international
> process. The latest ratified AIA specification can be found at:
> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
>
> At a high level, the AIA specification adds three things:
> 1) AIA CSRs
>    - Improved local interrupt support
> 2) Incoming Message Signaled Interrupt Controller (IMSIC)
>    - Per-HART MSI controller
>    - Supports MSI virtualization
>    - Supports IPIs along with virtualization
> 3) Advanced Platform-Level Interrupt Controller (APLIC)
>    - Wired interrupt controller
>    - In MSI-mode, converts wired interrupts into MSIs (i.e. an MSI generator)
>    - In Direct-mode, injects external interrupts directly into HARTs
>
> For an overview of the AIA specification, refer to the AIA virtualization
> talk at KVM Forum 2022:
> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> https://www.youtube.com/watch?v=r071dL8Z0yo
>
> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher).
>
> These patches can also be found in the riscv_aia_v12 branch at:
> https://github.com/avpatel/linux.git
>
> Changes since v11:
>  - Rebased on Linux-6.8-rc1
>  - Included kernel/irq related patches from the "genirq, irqchip: Convert ARM
>    MSI handling to per device MSI domains" series by Thomas.
>    (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
>    PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
>    https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/)
>  - Updated the APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
>  - Updated the IMSIC driver to support per-device MSI domains for PCI and
>    platform devices.
>
> Changes since v10:
>  - Rebased on Linux-6.6-rc7
>  - Dropped PATCH3 of the v10 series since it has been merged by Marc Z
>    for Linux-6.6-rc7
>  - Changed the IMSIC ID management strategy from a 1-n approach to an
>    x86-style 1-1 approach
>
> Changes since v9:
>  - Rebased on Linux-6.6-rc4
>  - Use builtin_platform_driver() in PATCH5, PATCH9, and PATCH12
>
> Changes since v8:
>  - Rebased on Linux-6.6-rc3
>  - Dropped PATCH2 of the v8 series since we won't be requiring
>    riscv_get_intc_hartid() based on Marc Z's comments on ACPI AIA support.
>  - Addressed Saravana's comments in PATCH3 of the v8 series
>  - Updated PATCH9 and PATCH13 of the v8 series based on comments from Sunil
>
> Changes since v7:
>  - Rebased on Linux-6.6-rc1
>  - Addressed comments on PATCH1 of the v7 series and split it into two PATCHes
>  - Use DEFINE_SIMPLE_PROP() in PATCH2 of the v7 series
>
> Changes since v6:
>  - Rebased on Linux-6.5-rc4
>  - Updated PATCH2 to use IS_ENABLED(CONFIG_SPARC) instead of
>    !IS_ENABLED(CONFIG_OF_IRQ)
>  - Added new PATCH4 to fix syscore registration in the PLIC driver
>  - Updated PATCH5 to convert the PLIC driver into a full-blown platform
>    driver with a re-written probe function.
>
> Changes since v5:
>  - Rebased on Linux-6.5-rc2
>  - Updated the overall series to ensure that only the IPI, timer, and
>    INTC drivers are probed very early, whereas the rest of the interrupt
>    controllers (such as PLIC, APLIC, and IMSIC) are probed as
>    regular platform drivers.
>  - Renamed riscv_fw_parent_hartid() to riscv_get_intc_hartid()
>  - New PATCH1 to add fw_devlink support for the msi-parent DT property
>  - New PATCH2 to ensure all INTC suppliers are initialized, which in turn
>    fixes the probing issue for PLIC, APLIC, and IMSIC as platform drivers
>  - New PATCH3 to use platform driver probing for PLIC
>  - Re-structured the IMSIC driver into two separate drivers: early and
>    platform. The IMSIC early driver (PATCH7) only initializes IMSIC state
>    and provides IPIs, whereas the IMSIC platform driver (PATCH8) is probed
>    later and provides the MSI domain for platform devices.
>  - Re-structured the APLIC platform driver into three separate sources:
>    main, direct mode, and MSI mode.
>
> Changes since v4:
>  - Rebased on Linux-6.5-rc1
>  - Added "Dependencies" in the APLIC bindings (PATCH6 in v4)
>  - Dropped PATCH6, which was changing the IOMMU DMA domain APIs
>  - Dropped use of the IOMMU DMA APIs in the IMSIC driver (PATCH4)
>
> Changes since v3:
>  - Rebased on Linux-6.4-rc6
>  - Dropped PATCH2 of the v3 series; instead we now set
>    FWNODE_FLAG_BEST_EFFORT via IRQCHIP_DECLARE()
>  - Extended riscv_fw_parent_hartid() to support both DT and ACPI in PATCH1
>  - Extended iommu_dma_compose_msi_msg() instead of adding
>    iommu_dma_select_msi() in PATCH6
>  - Addressed Conor's comments in PATCH3
>  - Addressed Conor's and Rob's comments in PATCH7
>
> Changes since v2:
>  - Rebased on Linux-6.4-rc1
>  - Addressed Rob's comments on DT bindings patches 4 and 8.
>  - Addressed Marc's comments on IMSIC driver PATCH5
>  - Replaced use of OF APIs in the APLIC and IMSIC drivers with FWNODE APIs;
>    this makes both drivers easily portable for ACPI support. It also
>    removes unnecessary indirection from the APLIC and IMSIC drivers.
>  - PATCH1 is a new patch for portability with ACPI support
>  - PATCH2 is a new patch to fix probing in the APLIC driver for APLIC-only
>    systems.
>  - PATCH7 is a new patch which addresses the IOMMU DMA domain issues pointed
>    out by SiFive
>
> Changes since v1:
>  - Rebased on Linux-6.2-rc2
>  - Addressed comments on the IMSIC DT bindings for PATCH4
>  - Use raw_spin_lock_irqsave() on ids_lock for PATCH5
>  - Improved MMIO alignment checks in PATCH5 to allow MMIO regions
>    with holes.
>  - Addressed comments on the APLIC DT bindings for PATCH6
>  - Fixed a warning splat in aplic_msi_write_msg() caused by a
>    zeroed MSI message in PATCH7
>  - Dropped the DT property riscv,slow-ipi; instead there will be a module
>    parameter in the future.
>
> Anup Patel (11):
>   irqchip/sifive-plic: Convert PLIC driver into a platform driver
>   irqchip/riscv-intc: Add support for RISC-V AIA
>   dt-bindings: interrupt-controller: Add RISC-V incoming MSI controller
>   irqchip: Add RISC-V incoming MSI controller early driver
>   irqchip/riscv-imsic: Add device MSI domain support for platform
>     devices
>   irqchip/riscv-imsic: Add device MSI domain support for PCI devices
>   dt-bindings: interrupt-controller: Add RISC-V advanced PLIC
>   irqchip: Add RISC-V advanced PLIC driver for direct-mode
>   irqchip/riscv-aplic: Add support for MSI-mode
>   RISC-V: Select APLIC and IMSIC drivers
>   MAINTAINERS: Add entry for RISC-V AIA drivers
>
> Björn Töpel (1):
>   genirq/matrix: Dynamic bitmap allocation
>
> Thomas Gleixner (13):
>   irqchip/gic-v3: Make gic_irq_domain_select() robust for zero parameter
>     count
>   genirq/irqdomain: Remove the param count restriction from select()
>   genirq/msi: Extend msi_parent_ops
>   genirq/irqdomain: Add DOMAIN_BUS_DEVICE_IMS
>   platform-msi: Prepare for real per device domains
>   irqchip: Convert all platform MSI users to the new API
>   genirq/msi: Provide optional translation op
>   genirq/msi: Split msi_domain_alloc_irq_at()
>   genirq/msi: Provide DOMAIN_BUS_WIRED_TO_MSI
>   genirq/msi: Optionally use dev->fwnode for device domain
>   genirq/msi: Provide allocation/free functions for "wired" MSI
>     interrupts
>   genirq/irqdomain: Reroute device MSI create_mapping
>   genirq/msi: Provide MSI_FLAG_PARENT_PM_DEV

I have rebased and included 13 patches (which add per-device MSI domain
infrastructure) from your series [1]. In this series, the IMSIC driver
implements msi_parent_ops and the APLIC driver implements the wired-to-MSI
bridge using your new infrastructure.
The remaining 27 patches of your series [1] require testing on ARM platforms,
which I don't have, so I suggest that those patches go as a separate series.
I hope you are okay with this approach.

Best Regards,
Anup

[1] https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/

>  .../interrupt-controller/riscv,aplic.yaml     | 172 ++++
>  .../interrupt-controller/riscv,imsics.yaml    | 172 ++++
>  MAINTAINERS                                   |  14 +
>  arch/riscv/Kconfig                            |   2 +
>  arch/x86/include/asm/hw_irq.h                 |   2 -
>  drivers/base/platform-msi.c                   |  97 ++
>  drivers/dma/mv_xor_v2.c                       |   8 +-
>  drivers/dma/qcom/hidma.c                      |   6 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   5 +-
>  drivers/irqchip/Kconfig                       |  25 +
>  drivers/irqchip/Makefile                      |   3 +
>  drivers/irqchip/irq-gic-v3.c                  |   6 +-
>  drivers/irqchip/irq-riscv-aplic-direct.c      | 343 +++++++
>  drivers/irqchip/irq-riscv-aplic-main.c        | 232 +++++
>  drivers/irqchip/irq-riscv-aplic-main.h        |  53 ++
>  drivers/irqchip/irq-riscv-aplic-msi.c         | 256 +++++
>  drivers/irqchip/irq-riscv-imsic-early.c       | 241 +++++
>  drivers/irqchip/irq-riscv-imsic-platform.c    | 403 ++++++++
>  drivers/irqchip/irq-riscv-imsic-state.c       | 887 ++++++++++++++++++
>  drivers/irqchip/irq-riscv-imsic-state.h       | 105 +++
>  drivers/irqchip/irq-riscv-intc.c              |  34 +-
>  drivers/irqchip/irq-sifive-plic.c             | 239 +++--
>  drivers/mailbox/bcm-flexrm-mailbox.c          |   8 +-
>  drivers/perf/arm_smmuv3_pmu.c                 |   4 +-
>  drivers/ufs/host/ufs-qcom.c                   |   8 +-
>  include/linux/irqchip/riscv-aplic.h           | 119 +++
>  include/linux/irqchip/riscv-imsic.h           |  87 ++
>  include/linux/irqdomain.h                     |  17 +
>  include/linux/irqdomain_defs.h                |   2 +
>  include/linux/msi.h                           |  21 +
>  kernel/irq/irqdomain.c                        |  28 +-
>  kernel/irq/matrix.c                           |  28 +-
>  kernel/irq/msi.c                              | 184 +++-
>  33 files changed, 3636 insertions(+), 175 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,aplic.yaml
>  create mode 100644 Documentation/devicetree/bindings/interrupt-controller/riscv,imsics.yaml
>  create mode 100644 drivers/irqchip/irq-riscv-aplic-direct.c
>  create mode 100644 drivers/irqchip/irq-riscv-aplic-main.c
>  create mode 100644 drivers/irqchip/irq-riscv-aplic-main.h
>  create mode 100644 drivers/irqchip/irq-riscv-aplic-msi.c
>  create mode 100644 drivers/irqchip/irq-riscv-imsic-early.c
>  create mode 100644 drivers/irqchip/irq-riscv-imsic-platform.c
>  create mode 100644 drivers/irqchip/irq-riscv-imsic-state.c
>  create mode 100644 drivers/irqchip/irq-riscv-imsic-state.h
>  create mode 100644 include/linux/irqchip/riscv-aplic.h
>  create mode 100644 include/linux/irqchip/riscv-imsic.h
>
> --
> 2.34.1
>
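[Editor's note: the cover letter above asks for QEMU v7.2+ to test; on QEMU's
"virt" machine, AIA emulation is selected with the aia= machine option. A
minimal invocation sketch follows; the kernel image path, memory size, and
append line are placeholders, not the exact setup used by the posters (their
full command lines appear later in this thread):]

```shell
# Boot the QEMU virt machine with an APLIC+IMSIC interrupt hierarchy.
# "aia=aplic-imsic" enables APLIC for wired interrupts plus per-HART IMSICs
# for MSIs; "aia=aplic" would use APLIC direct-mode only, and "aia=none"
# (the default) falls back to the legacy PLIC.
qemu-system-riscv64 \
    -M virt,aia=aplic-imsic \
    -smp 4 -m 1G -nographic \
    -kernel Image \
    -append "root=/dev/vda rw console=ttyS0 earlycon"
```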
Anup Patel <apatel@ventanamicro.com> writes:

[...]

> - Updated IMSIC driver to support per-device MSI domains for PCI and
>   platform devices.

Thanks for working on this, Anup! I'm still reviewing the patches.

I'm hitting a boot hang in text patching, with this series applied on
6.8-rc2. IPI issues?

I'm booting with U-Boot UEFI.
kernel config: https://gist.github.com/bjoto/bac563e6dcaab68dba1a5eaf675d51aa

QEMU 8.2.0/OpenSBI 1.4:
| qemu-system-riscv64 \
|    -machine virt,acpi=off,aia=aplic-imsic \
|    -cpu rv64,v=true,vlen=256,elen=64,h=true,zbkb=on,zbkc=on,zbkx=on,zkr=on,zkt=on,svinval=on,svnapot=on,svpbmt=on \
|    -smp 4 \
|    -object rng-random,filename=/dev/urandom,id=rng0 \
|    -device virtio-rng-device,rng=rng0 \
|    -append "root=/dev/vda2 rw earlycon console=tty0 console=ttyS0 panic=-1 oops=panic sysctl.vm.panic_on_oom=1" \
|    -m 4G \
|    ...

Last lines from the kernel:
| ...
| goldfish_rtc 101000.rtc: registered as rtc0
| goldfish_rtc 101000.rtc: setting system clock to 2024-01-30T06:39:28 UTC (1706596768)

The same kernel boots w/ "-machine virt,acpi=off" (AIA is *not* enabled).

Related or not, I got this splat (once) in an ftrace kselftest:
| # selftests: ftrace: ftracetest-ktap
| Unable to handle kernel paging request at virtual address 5a5a5a5a5a5a5ac2
| Oops [#1]
| Modules linked in: drm fuse i2c_core drm_panel_orientation_quirks backlight dm_mod configfs ip_tables x_tables [last unloaded: trace_printk]
| CPU: 2 PID: 19691 Comm: ls Tainted: G        W          6.8.0-rc2-kselftest_plain #1
| Hardware name: riscv-virtio,qemu (DT)
| epc : set_top_events_ownership+0x14/0x5c
| ra : eventfs_get_attr+0x2e/0x50
| epc : ffffffff80533aa4 ra : ffffffff80533b1a sp : ff20000001cebc70
| gp : ffffffff8258b860 tp : ff6000008623e240 t0 : ffffffff80533a98
| t1 : ffffffff825b6b60 t2 : 0000000000000008 s0 : ff20000001cebc80
| s1 : ffffffff8233c000 a0 : ff6000009224e9b8 a1 : ff20000001cebd28
| a2 : ff20000001cebd98 a3 : 000000000000025e a4 : ffffffff80000000
| a5 : 5a5a5a5a5a5a5a5a a6 : 0000000000000000 a7 : 0000000000735049
| s2 : 000000000000025e s3 : ff20000001cebd98 s4 : ff6000009224e9b8
| s5 : ff20000001cebd28 s6 : ffffffffffffff9c s7 : ff6000008ac6a1c0
| s8 : 00007fff9f685d80 s9 : 0000000000000000 s10: 00007fffd4550ef0
| s11: 0000000000000000 t3 : 0000000000000001 t4 : 0000000000000016
| t5 : ffffffff818145be t6 : ff6000009233d77e
| status: 0000000200000120 badaddr: 5a5a5a5a5a5a5ac2 cause: 000000000000000d
| [<ffffffff80533aa4>] set_top_events_ownership+0x14/0x5c
| Code: b297 ffad 82e7 d302 1141 e422 0800 3783 ff85 cb89 (57b8) 8b09
| ---[ end trace 0000000000000000 ]---

This might be unrelated, but the hang above is on every boot.

Björn
Björn Töpel <bjorn@kernel.org> writes:

> Anup Patel <apatel@ventanamicro.com> writes:
>
[...]
>
> Thanks for working on this, Anup! I'm still reviewing the patches.
>
> I'm hitting a boot hang in text patching, with this series applied on
> 6.8-rc2. IPI issues?

Not text patching! One cpu spinning in smp_call_function_many_cond() and
the others are in cpu_relax(). Smells like IPI...
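[Editor's note: one way to confirm a diagnosis like the one above from outside
the guest is QEMU's built-in gdbstub. A sketch, assuming a vmlinux that matches
the running kernel and a host cross-gdb; `-s` is QEMU shorthand for
`-gdb tcp::1234`:]

```shell
# Add "-s" to the QEMU command line, then attach from the host and dump
# every vCPU's backtrace; each gdb "thread" corresponds to one hart, so a
# CPU stuck in smp_call_function_many_cond() waiting for an IPI ack is
# visible directly.
riscv64-linux-gnu-gdb vmlinux \
    -ex 'target remote :1234' \
    -ex 'thread apply all bt' \
    -ex 'detach' -ex 'quit'
```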
On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Björn Töpel <bjorn@kernel.org> writes:
>
> > Anup Patel <apatel@ventanamicro.com> writes:
> >
[...]
> >
> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >
> > I'm hitting a boot hang in text patching, with this series applied on
> > 6.8-rc2. IPI issues?
>
> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> the others are in cpu_relax(). Smells like IPI...

Can you share the complete boot log?

Regards,
Anup
On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Björn Töpel <bjorn@kernel.org> writes:
>
> > Anup Patel <apatel@ventanamicro.com> writes:
> >
[...]
> >
> > Thanks for working on this, Anup! I'm still reviewing the patches.
> >
> > I'm hitting a boot hang in text patching, with this series applied on
> > 6.8-rc2. IPI issues?
>
> Not text patching! One cpu spinning in smp_call_function_many_cond() and
> the others are in cpu_relax(). Smells like IPI...

I tried bootefi from U-Boot multiple times but can't reproduce the issue
you are seeing. Here's my boot log ...

$ qemu-system-riscv64 -M virt,aia=aplic-imsic -m 256M -display none -serial stdio -bios opensbi/build/platform/generic/firmware/fw_jump.bin -kernel ./u-boot/u-boot.bin -device loader,file=./build-riscv64/arch/riscv/boot/Image,addr=0x84000000 -drive file=./rootfs_riscv64.ext2,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -device virtio-net-device,netdev=eth0 -netdev user,id=eth0 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-device,rng=rng0 -append "root=/dev/vda rootwait rw console=ttyS0 earlycon" -smp 8

OpenSBI v1.4-8-gbb90a9e
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name             : riscv-virtio,qemu
Platform Features         : medeleg
Platform HART Count       : 8
Platform IPI Device       : aia-imsic
Platform Timer Device     : aclint-mtimer @ 10000000Hz
Platform Console Device   : uart8250
Platform HSM Device       : ---
Platform PMU Device       : ---
Platform Reboot Device    : syscon-reboot
Platform Shutdown Device  : syscon-poweroff
Platform Suspend Device   : ---
Platform CPPC Device      : ---
Firmware Base             : 0x80000000
Firmware Size             : 395 KB
Firmware RW Offset        : 0x40000
Firmware RW Size          : 139 KB
Firmware Heap Offset      : 0x56000
Firmware Heap Size        : 51 KB (total), 3 KB (reserved), 12 KB (used), 36 KB (free)
Firmware Scratch Size     : 4096 B (total), 328 B (used), 3768 B (free)
Runtime SBI Version       : 2.0

Domain0 Name              : root
Domain0 Boot HART         : 7
Domain0 HARTs             : 0*,1*,2*,3*,4*,5*,6*,7*
Domain0 Region00          : 0x0000000000100000-0x0000000000100fff M: (I,R,W) S/U: (R,W)
Domain0 Region01          : 0x0000000010000000-0x0000000010000fff M: (I,R,W) S/U: (R,W)
Domain0 Region02          : 0x000000000c000000-0x000000000c007fff M: (I,R,W) S/U: ()
Domain0 Region03          : 0x0000000024000000-0x0000000024007fff M: (I,R,W) S/U: ()
Domain0 Region04          : 0x0000000002000000-0x000000000200ffff M: (I,R,W) S/U: ()
Domain0 Region05          : 0x0000000080000000-0x000000008003ffff M: (R,X) S/U: ()
Domain0 Region06          : 0x0000000080040000-0x000000008007ffff M: (R,W) S/U: ()
Domain0 Region07          : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
Domain0 Next Address      : 0x0000000080200000
Domain0 Next Arg1         : 0x0000000082200000
Domain0 Next Mode         : S-mode
Domain0 SysReset          : yes
Domain0 SysSuspend        : yes

Boot HART ID              : 7
Boot HART Domain          : root
Boot HART Priv Version    : v1.12
Boot HART Base ISA        : rv64imafdch
Boot HART ISA Extensions  : smaia,sstc,zicntr,zihpm,zicboz,zicbom,sdtrig
Boot HART PMP Count       : 16
Boot HART PMP Granularity : 2 bits
Boot HART PMP Address Bits: 54
Boot HART MHPM Info       : 16 (0x0007fff8)
Boot HART Debug Triggers  : 2 triggers
Boot HART MIDELEG         : 0x0000000000001666
Boot HART MEDELEG         : 0x0000000000f0b509

U-Boot 2023.10 (Nov 07 2023 - 18:28:29 +0530)

CPU:   rv64imafdch_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_smaia_ssaia_sstc_svadu
Model: riscv-virtio,qemu
DRAM:  256 MiB
Core:  37 devices, 16 uclasses, devicetree: board
Flash: 32 MiB
Loading Environment from nowhere... OK
In:    serial,usbkbd
Out:   serial,vidconsole
Err:   serial,vidconsole
No working controllers found
Net:   eth0: virtio-net#1
Working FDT set to 8ef1f870
Hit any key to stop autoboot:  0
=> bootefi ${kernel_addr_r}:0x1600000 ${fdtcontroladdr}
No EFI system partition
No EFI system partition
Failed to persist EFI variables
Booting /MemoryMapped(0x0,0x84000000,0x1600000)
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
[    0.000000] Linux version 6.8.0-rc1-00039-gd9b9d6eb987f (anup@anup-ubuntu-vm) (riscv64-unknown-linux-gnu-gcc (g2ee5e430018) 12.2.0, GNU ld (GNU Binutils) 2.40.0.20230214) #67 SMP Sat Jan 27 17:20:09 IST 2024
[    0.000000] random: crng init done
[    0.000000] Machine model: riscv-virtio,qemu
[    0.000000] SBI specification v2.0 detected
[    0.000000] SBI implementation ID=0x1 Version=0x10004
[    0.000000] SBI TIME extension detected
[    0.000000] SBI IPI extension detected
[    0.000000] SBI RFENCE extension detected
[    0.000000] SBI SRST extension detected
[    0.000000] SBI DBCN extension detected
[    0.000000] earlycon: ns16550a0 at MMIO 0x0000000010000000 (options '')
[    0.000000] printk: legacy bootconsole [ns16550a0] enabled
[    0.000000] efi: EFI v2.10 by Das U-Boot
[    0.000000] efi: RTPROP=0x8df05040 SMBIOS=0x8df01000 RNG=0x8c972040 MEMRESERVE=0x8c971040
[    0.000000] OF: reserved mem: 0x0000000080000000..0x000000008003ffff (256 KiB) nomap non-reusable mmode_resv0@80000000
[    0.000000] OF: reserved mem: 0x0000000080040000..0x000000008007ffff (256 KiB) nomap non-reusable mmode_resv1@80040000
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000080000000-0x000000008fffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080000000-0x000000008007ffff]
[    0.000000]   node   0: [mem 0x0000000080080000-0x000000008df00fff]
[    0.000000]   node   0: [mem 0x000000008df01000-0x000000008df01fff]
[    0.000000]   node   0: [mem 0x000000008df02000-0x000000008df04fff]
[    0.000000]   node   0: [mem 0x000000008df05000-0x000000008df05fff]
[    0.000000]   node   0: [mem 0x000000008df06000-0x000000008df06fff]
[    0.000000]   node   0: [mem 0x000000008df07000-0x000000008df08fff]
[    0.000000]   node   0: [mem 0x000000008df09000-0x000000008df09fff]
[    0.000000]   node   0: [mem 0x000000008df0a000-0x000000008df19fff]
[    0.000000]   node   0: [mem 0x000000008df1a000-0x000000008f741fff]
[    0.000000]   node   0: [mem 0x000000008f742000-0x000000008f742fff]
[    0.000000]   node   0: [mem 0x000000008f743000-0x000000008fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000008fffffff]
[    0.000000] SBI HSM extension detected
[    0.000000] Falling back to deprecated "riscv,isa"
[    0.000000] riscv: base ISA extensions acdfhim
[    0.000000] riscv: ELF capabilities acdfim
[    0.000000] percpu: Embedded 20 pages/cpu s41464 r8192 d32264 u81920
[    0.000000] Kernel command line: root=/dev/vda rootwait rw console=ttyS0 earlycon
[    0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[    0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64512
[    0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
[    0.000000] Virtual kernel memory layout:
[    0.000000]   fixmap  : 0xff1bfffffea00000 - 0xff1bffffff000000 (6144 kB)
[    0.000000]   pci io  : 0xff1bffffff000000 - 0xff1c000000000000 (  16 MB)
[    0.000000]   vmemmap : 0xff1c000000000000 - 0xff20000000000000 (1024 TB)
[    0.000000]   vmalloc : 0xff20000000000000 - 0xff60000000000000 (16384 TB)
[    0.000000]   modules : 0xffffffff01582000 - 0xffffffff80000000 (2026 MB)
[    0.000000]   lowmem  : 0xff60000000000000 - 0xff60000010000000 ( 256 MB)
[    0.000000]   kernel  : 0xffffffff80000000 - 0xffffffffffffffff (2047 MB)
[    0.000000] Memory: 217364K/262144K available (9190K kernel code, 4939K rwdata, 4096K rodata, 2252K init, 489K bss, 44780K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=8.
[    0.000000] rcu: RCU debug extended QS entry/exit.
[    0.000000] Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 64 local interrupts mapped using AIA
[    0.000000] riscv-imsic: imsics@28000000: providing IPIs using interrupt 1
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
[    0.000087] sched_clock: 64 bits at 10MHz, resolution 100ns, wraps every 4398046511100ns
[    0.001406] riscv-timer: Timer interrupt in S-mode is available via sstc extension
[    0.007310] Console: colour dummy device 80x25
[    0.014343] Calibrating delay loop (skipped), value calculated using timer frequency.. 20.00 BogoMIPS (lpj=40000)
[    0.018315] pid_max: default: 32768 minimum: 301
[    0.020982] LSM: initializing lsm=capability,integrity
[    0.023969] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.025231] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.066845] RCU Tasks Trace: Setting shift to 3 and lim to 1 rcu_task_cb_adjust=1.
[    0.068829] riscv: ELF compat mode supported
[    0.069115] ASID allocator using 16 bits (65536 entries)
[    0.080952] rcu: Hierarchical SRCU implementation.
[    0.081712] rcu: Max phase no-delay instances is 1000.
[    0.086381] Remapping and enabling EFI services.
[    0.093736] smp: Bringing up secondary CPUs ...
[    0.162264] smp: Brought up 1 node, 8 CPUs
[    0.186107] devtmpfs: initialized
[    0.199725] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.200634] futex hash table entries: 2048 (order: 5, 131072 bytes, linear)
[    0.203482] pinctrl core: initialized pinctrl subsystem
[    0.213664] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.218255] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
[    0.221185] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.222099] audit: initializing netlink subsys (disabled)
[    0.228028] audit: type=2000 audit(0.168:1): state=initialized audit_enabled=0 res=1
[    0.230906] cpuidle: using governor menu
[    0.289647] cpu2: Ratio of byte access time to unaligned word access is 7.20, unaligned accesses are fast
[    0.289661] cpu3: Ratio of byte access time to unaligned word access is 5.94, unaligned accesses are fast
[    0.289652] cpu4: Ratio of byte access time to unaligned word access is 7.13, unaligned accesses are fast
[    0.289625] cpu6: Ratio of byte access time to unaligned word access is 10.28, unaligned accesses are fast
[    0.289615] cpu1: Ratio of byte access time to unaligned word access is 8.04, unaligned accesses are fast
[    0.290252] cpu5: Ratio of byte access time to unaligned word access is 7.13, unaligned accesses are fast
[    0.299499] cpu7: Ratio of byte access time to unaligned word access is 6.58, unaligned accesses are fast
[    0.326695] cpu0: Ratio of byte access time to unaligned word access is 7.78, unaligned accesses are fast
[    0.354371] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[    0.354767] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
[    0.361699] ACPI: Interpreter disabled.
[    0.363441] iommu: Default domain type: Translated
[    0.364215] iommu: DMA domain TLB invalidation policy: strict mode
[    0.368128] SCSI subsystem initialized
[    0.371067] usbcore: registered new interface driver usbfs
[    0.371912] usbcore: registered new interface driver hub
[    0.372389] usbcore: registered new device driver usb
[    0.375075] efivars: Registered efivars operations
[    0.389652] vgaarb: loaded
[    0.443368] clocksource: Switched to clocksource riscv_clocksource
[    0.449125] pnp: PnP ACPI: disabled
[    0.499449] NET: Registered PF_INET protocol family
[    0.500979] IP idents hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.507062] tcp_listen_portaddr_hash hash table entries: 128 (order: 0, 4096 bytes, linear)
[    0.507775] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.508351] TCP established hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.508930] TCP bind hash table entries: 2048 (order: 5, 131072 bytes, linear)
[    0.509942] TCP: Hash tables configured (established 2048 bind 2048)
[    0.511459] UDP hash table entries: 256 (order: 2, 24576 bytes, linear)
[    0.512262] UDP-Lite hash table entries: 256 (order: 2, 24576 bytes, linear)
[    0.514937] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.521225] RPC: Registered named UNIX socket transport module.
[    0.521913] RPC: Registered udp transport module.
[    0.522324] RPC: Registered tcp transport module.
[    0.522656] RPC: Registered tcp-with-tls transport module.
[    0.523178] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.523787] PCI: CLS 0 bytes, default 64
[    0.529358] workingset: timestamp_bits=46 max_order=16 bucket_order=0
[    0.537946] NFS: Registering the id_resolver key type
[    0.539478] Key type id_resolver registered
[    0.539918] Key type id_legacy registered
[    0.540656] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    0.542911] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver Registering...
[    0.544894] 9p: Installing v9fs 9p2000 file system support
[    0.548517] NET: Registered PF_ALG protocol family
[    0.549459] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245)
[    0.550658] io scheduler mq-deadline registered
[    0.552112] io scheduler kyber registered
[    0.552442] io scheduler bfq registered
[    0.556517] riscv-imsic: imsics@28000000: hart-index-bits: 3, guest-index-bits: 0
[    0.556955] riscv-imsic: imsics@28000000: group-index-bits: 0, group-index-shift: 24
[    0.557403] riscv-imsic: imsics@28000000: per-CPU IDs 255 at base PPN 0x0000000028000000
[    0.557699] riscv-imsic: imsics@28000000: total 2032 interrupts available
[    0.561962] pci-host-generic 30000000.pci: host bridge /soc/pci@30000000 ranges:
[    0.563422] pci-host-generic 30000000.pci:       IO 0x0003000000..0x000300ffff -> 0x0000000000
[    0.564475] pci-host-generic 30000000.pci:      MEM 0x0040000000..0x007fffffff -> 0x0040000000
[    0.565013] pci-host-generic 30000000.pci:      MEM 0x0400000000..0x07ffffffff -> 0x0400000000
[    0.566349] pci-host-generic 30000000.pci: Memory resource size exceeds max for 32 bits
[    0.567633] pci-host-generic 30000000.pci: ECAM at [mem 0x30000000-0x3fffffff] for [bus 00-ff]
[    0.569300] pci-host-generic 30000000.pci: PCI host bridge to bus 0000:00
[    0.570172] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.570559] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
[    0.570969] pci_bus 0000:00: root bus resource [mem 0x40000000-0x7fffffff]
[    0.571595] pci_bus 0000:00: root bus resource [mem 0x400000000-0x7ffffffff]
[    0.573646] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000 conventional PCI endpoint
[    0.654069] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.659475] SuperH (H)SCI(F) driver initialized
[    0.675004] loop: module loaded
[    0.680024] e1000e: Intel(R) PRO/1000 Network Driver
[    0.680162] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    0.684590] usbcore: registered new interface driver uas
[    0.685245] usbcore: registered new interface driver usb-storage
[    0.686530] mousedev: PS/2 mouse device common for all mice
[    0.693125] syscon-poweroff poweroff: pm_power_off already claimed for sbi_srst_power_off
[    0.694774] syscon-poweroff: probe of poweroff failed with error -16
[    0.698092] sdhci: Secure Digital Host Controller Interface driver
[    0.698484] sdhci: Copyright(c) Pierre Ossman
[    0.699333] Synopsys Designware Multimedia Card Interface Driver
[    0.699869] sdhci-pltfm: SDHCI platform and OF driver helper
[    0.700673] usbcore: registered new interface driver usbhid
[    0.700920] usbhid: USB HID core driver
[    0.701501] riscv-pmu-sbi: SBI PMU extension is available
[    0.702263] riscv-pmu-sbi: 16 firmware and 18 hardware counters
[    0.702618] riscv-pmu-sbi: Perf sampling/filtering is not supported as sscof extension is not available
[    0.709934] NET: Registered PF_INET6 protocol family
[    0.723647] Segment Routing with IPv6
[    0.724210] In-situ OAM (IOAM) with IPv6
[    0.724882] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    0.727936] NET: Registered PF_PACKET protocol family
[    0.729289] 9pnet: Installing 9P2000 support
[    0.729796] Key type dns_resolver registered
[    0.763394] debug_vm_pgtable: [debug_vm_pgtable    ]: Validating architecture page table helpers
[    0.772054] riscv-aplic d000000.aplic: 96 interrupts forwared to MSI base 0x0000000028000000
[    0.775578] virtio_blk virtio0: 1/0/0 default/read/poll queues
[    0.782792] virtio_blk virtio0: [vda] 65536 512-byte logical blocks (33.6 MB/32.0 MiB)
[    0.827635] printk: legacy console [ttyS0] disabled
[    0.830784] 10000000.serial: ttyS0 at MMIO 0x10000000 (irq = 14, base_baud = 230400) is a 16550A
[    0.833076] printk: legacy console [ttyS0] enabled
[    0.833076] printk: legacy console [ttyS0] enabled
[    0.833856] printk: legacy bootconsole [ns16550a0] disabled
[    0.833856] printk: legacy bootconsole [ns16550a0] disabled
[    0.843499] goldfish_rtc 101000.rtc: registered as rtc0
[    0.844980] goldfish_rtc 101000.rtc: setting system clock to 2024-01-30T10:19:33 UTC (1706609973)
[    0.848495] clk: Disabling unused clocks
[    0.884046] EXT4-fs (vda): warning: mounting unchecked fs, running e2fsck is recommended
[    0.891369] EXT4-fs (vda): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: disabled.
[    0.892199] ext4 filesystem being mounted at /root supports timestamps until 2038-01-19 (0x7fffffff)
[    0.892644] VFS: Mounted root (ext4 filesystem) on device 254:0.
[    0.895564] devtmpfs: mounted
[    0.986847] Freeing unused kernel image (initmem) memory: 2252K
[    0.988406] Run /sbin/init as init process
mount: mounting devtmpfs on /dev failed: Device or resource busy

[BusyBox ASCII-art banner]

Busybox Rootfs

Please press Enter to activate this console.
/ #
/ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
 10:        103        116         58        214         96         47         78         52  RISC-V INTC   5 Edge      riscv-timer
 11:          0         44          0          0          0          0          0          0  APLIC-MSI-d000000.aplic   8 Level     virtio0
 12:          0          0          0          0          0          0          0          0  APLIC-MSI-d000000.aplic   7 Level     virtio1
 13:          0          0          0          6          0          0          0          0  APLIC-MSI-d000000.aplic   6 Level     virtio2
 14:          0          0          0          0         64          0          0          0  APLIC-MSI-d000000.aplic  10 Level     ttyS0
 15:          0          0          0          0          0          0          0          0  APLIC-MSI-d000000.aplic  11 Level     101000.rtc
IPI0:         4          9         12          6          5         10         13          7  Rescheduling interrupts
IPI1:       605        477        442        315        392        434        405        417  Function call interrupts
IPI2:         0          0          0          0          0          0          0          0  CPU stop interrupts
IPI3:         0          0          0          0          0          0          0          0  CPU stop (for crash dump) interrupts
IPI4:         0          0          0          0          0          0          0          0  IRQ work interrupts
IPI5:         0          0          0          0          0          0          0          0  Timer broadcast interrupts
/ #
/ #
/ # poweroff
/ # umount: devtmpfs busy - remounted read-only
[   24.504316] EXT4-fs (vda): re-mounted 00000000-0000-0000-0000-000000000000 ro. Quota mode: disabled.
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to[   26.543142] reboot: Power down

Regards,
Anup
Anup Patel <apatel@ventanamicro.com> writes: > On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote: >> >> Björn Töpel <bjorn@kernel.org> writes: >> >> > Anup Patel <apatel@ventanamicro.com> writes: >> > >> >> The RISC-V AIA specification is ratified as-per the RISC-V international >> >> process. The latest ratified AIA specifcation can be found at: >> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf >> >> >> >> At a high-level, the AIA specification adds three things: >> >> 1) AIA CSRs >> >> - Improved local interrupt support >> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC) >> >> - Per-HART MSI controller >> >> - Support MSI virtualization >> >> - Support IPI along with virtualization >> >> 3) Advanced Platform-Level Interrupt Controller (APLIC) >> >> - Wired interrupt controller >> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator) >> >> - In Direct-mode, injects external interrupts directly into HARTs >> >> >> >> For an overview of the AIA specification, refer the AIA virtualization >> >> talk at KVM Forum 2022: >> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf >> >> https://www.youtube.com/watch?v=r071dL8Z0yo >> >> >> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher). >> >> >> >> These patches can also be found in the riscv_aia_v12 branch at: >> >> https://github.com/avpatel/linux.git >> >> >> >> Changes since v11: >> >> - Rebased on Linux-6.8-rc1 >> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM >> >> MSI handling to per device MSI domains" series by Thomas. >> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19, >> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of >> >> https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/) >> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism. 
>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
>> >>   platform devices.
>> >
>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>> >
>> > I'm hitting a boot hang in text patching, with this series applied on
>> > 6.8-rc2. IPI issues?
>>
>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>> the others are in cpu_relax(). Smells like IPI...
>
> Can you share the complete bootlog ?

Here: https://gist.github.com/bjoto/04a580568378f3b5483af07cd9d22501
Anup Patel <apatel@ventanamicro.com> writes: > On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote: >> >> Björn Töpel <bjorn@kernel.org> writes: >> >> > Anup Patel <apatel@ventanamicro.com> writes: >> > >> >> The RISC-V AIA specification is ratified as-per the RISC-V international >> >> process. The latest ratified AIA specifcation can be found at: >> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf >> >> >> >> At a high-level, the AIA specification adds three things: >> >> 1) AIA CSRs >> >> - Improved local interrupt support >> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC) >> >> - Per-HART MSI controller >> >> - Support MSI virtualization >> >> - Support IPI along with virtualization >> >> 3) Advanced Platform-Level Interrupt Controller (APLIC) >> >> - Wired interrupt controller >> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator) >> >> - In Direct-mode, injects external interrupts directly into HARTs >> >> >> >> For an overview of the AIA specification, refer the AIA virtualization >> >> talk at KVM Forum 2022: >> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf >> >> https://www.youtube.com/watch?v=r071dL8Z0yo >> >> >> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher). >> >> >> >> These patches can also be found in the riscv_aia_v12 branch at: >> >> https://github.com/avpatel/linux.git >> >> >> >> Changes since v11: >> >> - Rebased on Linux-6.8-rc1 >> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM >> >> MSI handling to per device MSI domains" series by Thomas. >> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19, >> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of >> >> https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/) >> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism. 
>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and
>> >>   platform devices.
>> >
>> > Thanks for working on this, Anup! I'm still reviewing the patches.
>> >
>> > I'm hitting a boot hang in text patching, with this series applied on
>> > 6.8-rc2. IPI issues?
>>
>> Not text patching! One cpu spinning in smp_call_function_many_cond() and
>> the others are in cpu_relax(). Smells like IPI...
>
> I tried bootefi from U-Boot multiple times but can't reproduce the
> issue you are seeing.

Thanks! I can reproduce without EFI, and simpler command-line:

  qemu-system-riscv64 \
    -bios /path/to/fw_dynamic.bin \
    -kernel /path/to/Image \
    -append 'earlycon console=tty0 console=ttyS0' \
    -machine virt,aia=aplic-imsic \
    -no-reboot -nodefaults -nographic \
    -smp 4 \
    -object rng-random,filename=/dev/urandom,id=rng0 \
    -device virtio-rng-device,rng=rng0 \
    -m 4G -chardev stdio,id=char0 -serial chardev:char0

I can reproduce with your upstream riscv_aia_v12 plus the config in the
gist [1], and all latest QEMU/OpenSBI:

  QEMU:    11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging")
  OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found")
  Linux:   d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers")

Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e.
panicking about missing root mount ;-))


Björn

[1] https://gist.githubusercontent.com/bjoto/bac563e6dcaab68dba1a5eaf675d51aa/raw/ff6208fb17f27819dbe97ace7d034f385d2db657/gistfile1.txt
Björn Töpel <bjorn@kernel.org> writes: > Anup Patel <apatel@ventanamicro.com> writes: > >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote: >>> >>> Björn Töpel <bjorn@kernel.org> writes: >>> >>> > Anup Patel <apatel@ventanamicro.com> writes: >>> > >>> >> The RISC-V AIA specification is ratified as-per the RISC-V international >>> >> process. The latest ratified AIA specifcation can be found at: >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf >>> >> >>> >> At a high-level, the AIA specification adds three things: >>> >> 1) AIA CSRs >>> >> - Improved local interrupt support >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC) >>> >> - Per-HART MSI controller >>> >> - Support MSI virtualization >>> >> - Support IPI along with virtualization >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC) >>> >> - Wired interrupt controller >>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator) >>> >> - In Direct-mode, injects external interrupts directly into HARTs >>> >> >>> >> For an overview of the AIA specification, refer the AIA virtualization >>> >> talk at KVM Forum 2022: >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo >>> >> >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher). >>> >> >>> >> These patches can also be found in the riscv_aia_v12 branch at: >>> >> https://github.com/avpatel/linux.git >>> >> >>> >> Changes since v11: >>> >> - Rebased on Linux-6.8-rc1 >>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM >>> >> MSI handling to per device MSI domains" series by Thomas. 
>>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19, >>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of >>> >> https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/) >>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism. >>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and >>> >> platform devices. >>> > >>> > Thanks for working on this, Anup! I'm still reviewing the patches. >>> > >>> > I'm hitting a boot hang in text patching, with this series applied on >>> > 6.8-rc2. IPI issues? >>> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and >>> the others are in cpu_relax(). Smells like IPI... >> >> I tried bootefi from U-Boot multiple times but can't reproduce the >> issue you are seeing. > > Thanks! I can reproduce without EFI, and simpler command-line: > > qemu-system-riscv64 \ > -bios /path/to/fw_dynamic.bin \ > -kernel /path/to/Image \ > -append 'earlycon console=tty0 console=ttyS0' \ > -machine virt,aia=aplic-imsic \ > -no-reboot -nodefaults -nographic \ > -smp 4 \ > -object rng-random,filename=/dev/urandom,id=rng0 \ > -device virtio-rng-device,rng=rng0 \ > -m 4G -chardev stdio,id=char0 -serial chardev:char0 > > I can reproduce with your upstream riscv_aia_v12 plus the config in the > gist [1], and all latest QEMU/OpenSBI: > > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging") > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found") > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers") > > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e. > panicking about missing root mount ;-)) More context; The hang is during a late initcall, where an ftrace direct (register_ftrace_direct()) modification is done. Stop machine is used to call into __ftrace_modify_call(). 
Then into the arch-specific patch_text_nosync(), where flush_icache_range()
hangs in flush_icache_all(): from "on_each_cpu(ipi_remote_fence_i, NULL, 1);"
into on_each_cpu_cond_mask() and "smp_call_function_many_cond(mask, func,
info, scf_flags, cond_func);", which never returns from "csd_lock_wait(csd)"
right before the end of the function.

Any ideas? Disabling CONFIG_HID_BPF, which does the early ftrace code
patching, fixes the boot hang, but it does seem related to IPI...
On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote: > > Björn Töpel <bjorn@kernel.org> writes: > > > Anup Patel <apatel@ventanamicro.com> writes: > > > >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote: > >>> > >>> Björn Töpel <bjorn@kernel.org> writes: > >>> > >>> > Anup Patel <apatel@ventanamicro.com> writes: > >>> > > >>> >> The RISC-V AIA specification is ratified as-per the RISC-V international > >>> >> process. The latest ratified AIA specifcation can be found at: > >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf > >>> >> > >>> >> At a high-level, the AIA specification adds three things: > >>> >> 1) AIA CSRs > >>> >> - Improved local interrupt support > >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC) > >>> >> - Per-HART MSI controller > >>> >> - Support MSI virtualization > >>> >> - Support IPI along with virtualization > >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC) > >>> >> - Wired interrupt controller > >>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator) > >>> >> - In Direct-mode, injects external interrupts directly into HARTs > >>> >> > >>> >> For an overview of the AIA specification, refer the AIA virtualization > >>> >> talk at KVM Forum 2022: > >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf > >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo > >>> >> > >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher). > >>> >> > >>> >> These patches can also be found in the riscv_aia_v12 branch at: > >>> >> https://github.com/avpatel/linux.git > >>> >> > >>> >> Changes since v11: > >>> >> - Rebased on Linux-6.8-rc1 > >>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM > >>> >> MSI handling to per device MSI domains" series by Thomas. 
> >>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19, > >>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of > >>> >> https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/) > >>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism. > >>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and > >>> >> platform devices. > >>> > > >>> > Thanks for working on this, Anup! I'm still reviewing the patches. > >>> > > >>> > I'm hitting a boot hang in text patching, with this series applied on > >>> > 6.8-rc2. IPI issues? > >>> > >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and > >>> the others are in cpu_relax(). Smells like IPI... > >> > >> I tried bootefi from U-Boot multiple times but can't reproduce the > >> issue you are seeing. > > > > Thanks! I can reproduce without EFI, and simpler command-line: > > > > qemu-system-riscv64 \ > > -bios /path/to/fw_dynamic.bin \ > > -kernel /path/to/Image \ > > -append 'earlycon console=tty0 console=ttyS0' \ > > -machine virt,aia=aplic-imsic \ > > -no-reboot -nodefaults -nographic \ > > -smp 4 \ > > -object rng-random,filename=/dev/urandom,id=rng0 \ > > -device virtio-rng-device,rng=rng0 \ > > -m 4G -chardev stdio,id=char0 -serial chardev:char0 > > > > I can reproduce with your upstream riscv_aia_v12 plus the config in the > > gist [1], and all latest QEMU/OpenSBI: > > > > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging") > > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found") > > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers") > > > > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e. > > panicking about missing root mount ;-)) > > More context; The hang is during a late initcall, where an ftrace direct > (register_ftrace_direct()) modification is done. 
> >
> > Stop machine is used to call into __ftrace_modify_call(). Then into the
> > arch specific patch_text_nosync(), where flush_icache_range() hangs in
> > flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
> > on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
> > scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
> > right before the end of the function.
> >
> > Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
> > patching fixes the boot hang, but it does seem related to IPI...

Thanks for the details, I will debug more at my end.

Regards,
Anup
On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote: > > Björn Töpel <bjorn@kernel.org> writes: > > > Anup Patel <apatel@ventanamicro.com> writes: > > > >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote: > >>> > >>> Björn Töpel <bjorn@kernel.org> writes: > >>> > >>> > Anup Patel <apatel@ventanamicro.com> writes: > >>> > > >>> >> The RISC-V AIA specification is ratified as-per the RISC-V international > >>> >> process. The latest ratified AIA specifcation can be found at: > >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf > >>> >> > >>> >> At a high-level, the AIA specification adds three things: > >>> >> 1) AIA CSRs > >>> >> - Improved local interrupt support > >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC) > >>> >> - Per-HART MSI controller > >>> >> - Support MSI virtualization > >>> >> - Support IPI along with virtualization > >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC) > >>> >> - Wired interrupt controller > >>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator) > >>> >> - In Direct-mode, injects external interrupts directly into HARTs > >>> >> > >>> >> For an overview of the AIA specification, refer the AIA virtualization > >>> >> talk at KVM Forum 2022: > >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf > >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo > >>> >> > >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher). > >>> >> > >>> >> These patches can also be found in the riscv_aia_v12 branch at: > >>> >> https://github.com/avpatel/linux.git > >>> >> > >>> >> Changes since v11: > >>> >> - Rebased on Linux-6.8-rc1 > >>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM > >>> >> MSI handling to per device MSI domains" series by Thomas. 
> >>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19, > >>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of > >>> >> https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/) > >>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism. > >>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and > >>> >> platform devices. > >>> > > >>> > Thanks for working on this, Anup! I'm still reviewing the patches. > >>> > > >>> > I'm hitting a boot hang in text patching, with this series applied on > >>> > 6.8-rc2. IPI issues? > >>> > >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and > >>> the others are in cpu_relax(). Smells like IPI... > >> > >> I tried bootefi from U-Boot multiple times but can't reproduce the > >> issue you are seeing. > > > > Thanks! I can reproduce without EFI, and simpler command-line: > > > > qemu-system-riscv64 \ > > -bios /path/to/fw_dynamic.bin \ > > -kernel /path/to/Image \ > > -append 'earlycon console=tty0 console=ttyS0' \ > > -machine virt,aia=aplic-imsic \ > > -no-reboot -nodefaults -nographic \ > > -smp 4 \ > > -object rng-random,filename=/dev/urandom,id=rng0 \ > > -device virtio-rng-device,rng=rng0 \ > > -m 4G -chardev stdio,id=char0 -serial chardev:char0 > > > > I can reproduce with your upstream riscv_aia_v12 plus the config in the > > gist [1], and all latest QEMU/OpenSBI: > > > > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging") > > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found") > > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers") > > > > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e. > > panicking about missing root mount ;-)) > > More context; The hang is during a late initcall, where an ftrace direct > (register_ftrace_direct()) modification is done. 
>
> Stop machine is used to call into __ftrace_modify_call(). Then into the
> arch specific patch_text_nosync(), where flush_icache_range() hangs in
> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to
> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info,
> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)"
> right before the end of the function.
>
> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code
> patching fixes the boot hang, but it does seem related to IPI...

Looks like flush_icache_all() does not use the IPIs (on_each_cpu()
and friends) correctly.

On the other hand, flush_icache_mm() does the right thing by doing a
local flush on the current CPU and an IPI based flush on other CPUs.

Can you try the following patch ?

diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 55a34f2020a8..a3dfbe4de832 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info)
 
 void flush_icache_all(void)
 {
+	cpumask_t others;
+
 	local_flush_icache_all();
 
+	cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id()));
+	if (cpumask_empty(&others))
+		return;
+
 	if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
-		sbi_remote_fence_i(NULL);
+		sbi_remote_fence_i(&others);
 	else
-		on_each_cpu(ipi_remote_fence_i, NULL, 1);
+		on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
 }
 EXPORT_SYMBOL(flush_icache_all);

Regards,
Anup
Anup Patel <apatel@ventanamicro.com> writes: > On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote: >> >> Björn Töpel <bjorn@kernel.org> writes: >> >> > Anup Patel <apatel@ventanamicro.com> writes: >> > >> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote: >> >>> >> >>> Björn Töpel <bjorn@kernel.org> writes: >> >>> >> >>> > Anup Patel <apatel@ventanamicro.com> writes: >> >>> > >> >>> >> The RISC-V AIA specification is ratified as-per the RISC-V international >> >>> >> process. The latest ratified AIA specifcation can be found at: >> >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf >> >>> >> >> >>> >> At a high-level, the AIA specification adds three things: >> >>> >> 1) AIA CSRs >> >>> >> - Improved local interrupt support >> >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC) >> >>> >> - Per-HART MSI controller >> >>> >> - Support MSI virtualization >> >>> >> - Support IPI along with virtualization >> >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC) >> >>> >> - Wired interrupt controller >> >>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator) >> >>> >> - In Direct-mode, injects external interrupts directly into HARTs >> >>> >> >> >>> >> For an overview of the AIA specification, refer the AIA virtualization >> >>> >> talk at KVM Forum 2022: >> >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf >> >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo >> >>> >> >> >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher). 
>> >>> >> >> >>> >> These patches can also be found in the riscv_aia_v12 branch at: >> >>> >> https://github.com/avpatel/linux.git >> >>> >> >> >>> >> Changes since v11: >> >>> >> - Rebased on Linux-6.8-rc1 >> >>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM >> >>> >> MSI handling to per device MSI domains" series by Thomas. >> >>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19, >> >>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of >> >>> >> https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/) >> >>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism. >> >>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and >> >>> >> platform devices. >> >>> > >> >>> > Thanks for working on this, Anup! I'm still reviewing the patches. >> >>> > >> >>> > I'm hitting a boot hang in text patching, with this series applied on >> >>> > 6.8-rc2. IPI issues? >> >>> >> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and >> >>> the others are in cpu_relax(). Smells like IPI... >> >> >> >> I tried bootefi from U-Boot multiple times but can't reproduce the >> >> issue you are seeing. >> > >> > Thanks! 
I can reproduce without EFI, and simpler command-line: >> > >> > qemu-system-riscv64 \ >> > -bios /path/to/fw_dynamic.bin \ >> > -kernel /path/to/Image \ >> > -append 'earlycon console=tty0 console=ttyS0' \ >> > -machine virt,aia=aplic-imsic \ >> > -no-reboot -nodefaults -nographic \ >> > -smp 4 \ >> > -object rng-random,filename=/dev/urandom,id=rng0 \ >> > -device virtio-rng-device,rng=rng0 \ >> > -m 4G -chardev stdio,id=char0 -serial chardev:char0 >> > >> > I can reproduce with your upstream riscv_aia_v12 plus the config in the >> > gist [1], and all latest QEMU/OpenSBI: >> > >> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging") >> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found") >> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers") >> > >> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e. >> > panicking about missing root mount ;-)) >> >> More context; The hang is during a late initcall, where an ftrace direct >> (register_ftrace_direct()) modification is done. >> >> Stop machine is used to call into __ftrace_modify_call(). Then into the >> arch specific patch_text_nosync(), where flush_icache_range() hangs in >> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to >> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info, >> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)" >> right before the end of the function. >> >> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code >> patching fixes the boot hang, but it does seem related to IPI... >> > Looks like flush_icache_all() does not use the IPIs (on_each_cpu() > and friends) correctly. > > On other hand, the flush_icache_mm() does the right thing by > doing local flush on the current CPU and IPI based flush on other > CPUs. > > Can you try the following patch ? 
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index 55a34f2020a8..a3dfbe4de832 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info)
>
>  void flush_icache_all(void)
>  {
> +	cpumask_t others;
> +
>  	local_flush_icache_all();
>
> +	cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id()));
> +	if (cpumask_empty(&others))
> +		return;
> +
>  	if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence())
> -		sbi_remote_fence_i(NULL);
> +		sbi_remote_fence_i(&others);
>  	else
> -		on_each_cpu(ipi_remote_fence_i, NULL, 1);
> +		on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
>  }
>  EXPORT_SYMBOL(flush_icache_all);

Unfortunately, I see the same hang. LMK if you'd like me to try anything
else.


Björn
On Tue, Jan 30, 2024 at 11:19 PM Björn Töpel <bjorn@kernel.org> wrote: > > Anup Patel <apatel@ventanamicro.com> writes: > > > On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote: > >> > >> Björn Töpel <bjorn@kernel.org> writes: > >> > >> > Anup Patel <apatel@ventanamicro.com> writes: > >> > > >> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote: > >> >>> > >> >>> Björn Töpel <bjorn@kernel.org> writes: > >> >>> > >> >>> > Anup Patel <apatel@ventanamicro.com> writes: > >> >>> > > >> >>> >> The RISC-V AIA specification is ratified as-per the RISC-V international > >> >>> >> process. The latest ratified AIA specifcation can be found at: > >> >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf > >> >>> >> > >> >>> >> At a high-level, the AIA specification adds three things: > >> >>> >> 1) AIA CSRs > >> >>> >> - Improved local interrupt support > >> >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC) > >> >>> >> - Per-HART MSI controller > >> >>> >> - Support MSI virtualization > >> >>> >> - Support IPI along with virtualization > >> >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC) > >> >>> >> - Wired interrupt controller > >> >>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator) > >> >>> >> - In Direct-mode, injects external interrupts directly into HARTs > >> >>> >> > >> >>> >> For an overview of the AIA specification, refer the AIA virtualization > >> >>> >> talk at KVM Forum 2022: > >> >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf > >> >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo > >> >>> >> > >> >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher). 
> >> >>> >> > >> >>> >> These patches can also be found in the riscv_aia_v12 branch at: > >> >>> >> https://github.com/avpatel/linux.git > >> >>> >> > >> >>> >> Changes since v11: > >> >>> >> - Rebased on Linux-6.8-rc1 > >> >>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM > >> >>> >> MSI handling to per device MSI domains" series by Thomas. > >> >>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19, > >> >>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of > >> >>> >> https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/) > >> >>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism. > >> >>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and > >> >>> >> platform devices. > >> >>> > > >> >>> > Thanks for working on this, Anup! I'm still reviewing the patches. > >> >>> > > >> >>> > I'm hitting a boot hang in text patching, with this series applied on > >> >>> > 6.8-rc2. IPI issues? > >> >>> > >> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and > >> >>> the others are in cpu_relax(). Smells like IPI... > >> >> > >> >> I tried bootefi from U-Boot multiple times but can't reproduce the > >> >> issue you are seeing. > >> > > >> > Thanks! 
I can reproduce without EFI, and simpler command-line: > >> > > >> > qemu-system-riscv64 \ > >> > -bios /path/to/fw_dynamic.bin \ > >> > -kernel /path/to/Image \ > >> > -append 'earlycon console=tty0 console=ttyS0' \ > >> > -machine virt,aia=aplic-imsic \ > >> > -no-reboot -nodefaults -nographic \ > >> > -smp 4 \ > >> > -object rng-random,filename=/dev/urandom,id=rng0 \ > >> > -device virtio-rng-device,rng=rng0 \ > >> > -m 4G -chardev stdio,id=char0 -serial chardev:char0 > >> > > >> > I can reproduce with your upstream riscv_aia_v12 plus the config in the > >> > gist [1], and all latest QEMU/OpenSBI: > >> > > >> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging") > >> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found") > >> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers") > >> > > >> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e. > >> > panicking about missing root mount ;-)) > >> > >> More context; The hang is during a late initcall, where an ftrace direct > >> (register_ftrace_direct()) modification is done. > >> > >> Stop machine is used to call into __ftrace_modify_call(). Then into the > >> arch specific patch_text_nosync(), where flush_icache_range() hangs in > >> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to > >> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info, > >> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)" > >> right before the end of the function. > >> > >> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code > >> patching fixes the boot hang, but it does seem related to IPI... > >> > > Looks like flush_icache_all() does not use the IPIs (on_each_cpu() > > and friends) correctly. > > > > On other hand, the flush_icache_mm() does the right thing by > > doing local flush on the current CPU and IPI based flush on other > > CPUs. 
> > > > Can you try the following patch ? > > > > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c > > index 55a34f2020a8..a3dfbe4de832 100644 > > --- a/arch/riscv/mm/cacheflush.c > > +++ b/arch/riscv/mm/cacheflush.c > > @@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info) > > > > void flush_icache_all(void) > > { > > + cpumask_t others; > > + > > local_flush_icache_all(); > > > > + cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id())); > > + if (cpumask_empty(&others)) > > + return; > > + > > if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence()) > > - sbi_remote_fence_i(NULL); > > + sbi_remote_fence_i(&others); > > else > > - on_each_cpu(ipi_remote_fence_i, NULL, 1); > > + on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1); > > } > > EXPORT_SYMBOL(flush_icache_all); > > Unfortunately, I see the same hang. LMK if you'd like me to try anything > else. I was able to reproduce this at my end but I had to use your config. Digging further, it seems the issue is observed only when we use in-kernel IPIs for cache flushing (instead of SBI calls) along with some of the tracers (or debugging features) enabled. With the tracers (or debug features) disabled we don't see any issue. In fact, the upstream defconfig works perfectly fine with AIA drivers and in-kernel IPIs. It seems AIA based in-kernel IPIs are exposing some other issue with RISC-V kernel. I will debug more to find the root cause. Regards, Anup
Anup Patel <apatel@ventanamicro.com> writes: > On Tue, Jan 30, 2024 at 11:19 PM Björn Töpel <bjorn@kernel.org> wrote: >> >> Anup Patel <apatel@ventanamicro.com> writes: >> >> > On Tue, Jan 30, 2024 at 8:18 PM Björn Töpel <bjorn@kernel.org> wrote: >> >> >> >> Björn Töpel <bjorn@kernel.org> writes: >> >> >> >> > Anup Patel <apatel@ventanamicro.com> writes: >> >> > >> >> >> On Tue, Jan 30, 2024 at 1:22 PM Björn Töpel <bjorn@kernel.org> wrote: >> >> >>> >> >> >>> Björn Töpel <bjorn@kernel.org> writes: >> >> >>> >> >> >>> > Anup Patel <apatel@ventanamicro.com> writes: >> >> >>> > >> >> >>> >> The RISC-V AIA specification is ratified as-per the RISC-V international >> >> >>> >> process. The latest ratified AIA specifcation can be found at: >> >> >>> >> https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf >> >> >>> >> >> >> >>> >> At a high-level, the AIA specification adds three things: >> >> >>> >> 1) AIA CSRs >> >> >>> >> - Improved local interrupt support >> >> >>> >> 2) Incoming Message Signaled Interrupt Controller (IMSIC) >> >> >>> >> - Per-HART MSI controller >> >> >>> >> - Support MSI virtualization >> >> >>> >> - Support IPI along with virtualization >> >> >>> >> 3) Advanced Platform-Level Interrupt Controller (APLIC) >> >> >>> >> - Wired interrupt controller >> >> >>> >> - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator) >> >> >>> >> - In Direct-mode, injects external interrupts directly into HARTs >> >> >>> >> >> >> >>> >> For an overview of the AIA specification, refer the AIA virtualization >> >> >>> >> talk at KVM Forum 2022: >> >> >>> >> https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf >> >> >>> >> https://www.youtube.com/watch?v=r071dL8Z0yo >> >> >>> >> >> >> >>> >> To test this series, use QEMU v7.2 (or higher) and OpenSBI v1.2 (or higher). 
>> >> >>> >> >> >> >>> >> These patches can also be found in the riscv_aia_v12 branch at: >> >> >>> >> https://github.com/avpatel/linux.git >> >> >>> >> >> >> >>> >> Changes since v11: >> >> >>> >> - Rebased on Linux-6.8-rc1 >> >> >>> >> - Included kernel/irq related patches from "genirq, irqchip: Convert ARM >> >> >>> >> MSI handling to per device MSI domains" series by Thomas. >> >> >>> >> (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19, >> >> >>> >> PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of >> >> >>> >> https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/) >> >> >>> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism. >> >> >>> >> - Updated IMSIC driver to support per-device MSI domains for PCI and >> >> >>> >> platform devices. >> >> >>> > >> >> >>> > Thanks for working on this, Anup! I'm still reviewing the patches. >> >> >>> > >> >> >>> > I'm hitting a boot hang in text patching, with this series applied on >> >> >>> > 6.8-rc2. IPI issues? >> >> >>> >> >> >>> Not text patching! One cpu spinning in smp_call_function_many_cond() and >> >> >>> the others are in cpu_relax(). Smells like IPI... >> >> >> >> >> >> I tried bootefi from U-Boot multiple times but can't reproduce the >> >> >> issue you are seeing. >> >> > >> >> > Thanks! 
I can reproduce without EFI, and simpler command-line: >> >> > >> >> > qemu-system-riscv64 \ >> >> > -bios /path/to/fw_dynamic.bin \ >> >> > -kernel /path/to/Image \ >> >> > -append 'earlycon console=tty0 console=ttyS0' \ >> >> > -machine virt,aia=aplic-imsic \ >> >> > -no-reboot -nodefaults -nographic \ >> >> > -smp 4 \ >> >> > -object rng-random,filename=/dev/urandom,id=rng0 \ >> >> > -device virtio-rng-device,rng=rng0 \ >> >> > -m 4G -chardev stdio,id=char0 -serial chardev:char0 >> >> > >> >> > I can reproduce with your upstream riscv_aia_v12 plus the config in the >> >> > gist [1], and all latest QEMU/OpenSBI: >> >> > >> >> > QEMU: 11be70677c70 ("Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into staging") >> >> > OpenSBI: bb90a9ebf6d9 ("lib: sbi: Print number of debug triggers found") >> >> > Linux: d9b9d6eb987f ("MAINTAINERS: Add entry for RISC-V AIA drivers") >> >> > >> >> > Removing ",aia=aplic-imsic" from the CLI above completes the boot (i.e. >> >> > panicking about missing root mount ;-)) >> >> >> >> More context; The hang is during a late initcall, where an ftrace direct >> >> (register_ftrace_direct()) modification is done. >> >> >> >> Stop machine is used to call into __ftrace_modify_call(). Then into the >> >> arch specific patch_text_nosync(), where flush_icache_range() hangs in >> >> flush_icache_all(). From "on_each_cpu(ipi_remote_fence_i, NULL, 1);" to >> >> on_each_cpu_cond_mask() "smp_call_function_many_cond(mask, func, info, >> >> scf_flags, cond_func);" which never returns from "csd_lock_wait(csd)" >> >> right before the end of the function. >> >> >> >> Any ideas? Disabling CONFIG_HID_BPF, that does the early ftrace code >> >> patching fixes the boot hang, but it does seem related to IPI... >> >> >> > Looks like flush_icache_all() does not use the IPIs (on_each_cpu() >> > and friends) correctly. 
>> > >> > On other hand, the flush_icache_mm() does the right thing by >> > doing local flush on the current CPU and IPI based flush on other >> > CPUs. >> > >> > Can you try the following patch ? >> > >> > diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c >> > index 55a34f2020a8..a3dfbe4de832 100644 >> > --- a/arch/riscv/mm/cacheflush.c >> > +++ b/arch/riscv/mm/cacheflush.c >> > @@ -19,12 +19,18 @@ static void ipi_remote_fence_i(void *info) >> > >> > void flush_icache_all(void) >> > { >> > + cpumask_t others; >> > + >> > local_flush_icache_all(); >> > >> > + cpumask_andnot(&others, cpu_online_mask, cpumask_of(smp_processor_id())); >> > + if (cpumask_empty(&others)) >> > + return; >> > + >> > if (IS_ENABLED(CONFIG_RISCV_SBI) && !riscv_use_ipi_for_rfence()) >> > - sbi_remote_fence_i(NULL); >> > + sbi_remote_fence_i(&others); >> > else >> > - on_each_cpu(ipi_remote_fence_i, NULL, 1); >> > + on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1); >> > } >> > EXPORT_SYMBOL(flush_icache_all); >> >> Unfortunately, I see the same hang. LMK if you'd like me to try anything >> else. > > I was able to reproduce this at my end but I had to use your config. > > Digging further, it seems the issue is observed only when we use > in-kernel IPIs for cache flushing (instead of SBI calls) along with > some of the tracers (or debugging features) enabled. With the tracers > (or debug features) disabled we don't see any issue. In fact, the > upstream defconfig works perfectly fine with AIA drivers and > in-kernel IPIs. Same here. I only see the issue for *one* scenario. Other than that scenario, AIA is working fine! We're doing ftrace text patching, and I wonder if this is the issue. RISC-V (unfortunately) still relies on stop_machine() text patching (which will change!). Again, the hang is in stop_machine() context, where interrupts should very much be disabled, right? So, triggering an IPI will be impossible.
Dumping mstatus in QEMU: | mstatus 0000000a000000a0 | mstatus 0000000a000000a0 | mstatus 0000000a000000a0 | mstatus 0000000a000000a0 Indeed sstatus.SIE is 0. Seems like the bug is that text patching is trying to issue an IPI: | [<ffffffff801145d4>] smp_call_function_many_cond+0x81e/0x8ba | [<ffffffff80114716>] on_each_cpu_cond_mask+0x3e/0xde | [<ffffffff80013968>] flush_icache_all+0x98/0xc4 | [<ffffffff80009c26>] patch_text_nosync+0x7c/0x146 | [<ffffffff80ef9116>] __ftrace_modify_call.constprop.0+0xca/0x120 | [<ffffffff80ef918c>] ftrace_update_ftrace_func+0x20/0x40 | [<ffffffff80efb8ac>] ftrace_modify_all_code+0x5a/0x1d8 | [<ffffffff80efba50>] __ftrace_modify_code+0x26/0x42 | [<ffffffff80131734>] multi_cpu_stop+0x14e/0x1d8 | [<ffffffff8013107a>] cpu_stopper_thread+0x9e/0x182 | [<ffffffff80077a04>] smpboot_thread_fn+0xf8/0x1d2 | [<ffffffff800718fc>] kthread+0xe8/0x108 | [<ffffffff80f1cde6>] ret_from_fork+0xe/0x20 Björn
Hi Anup, Anup Patel <apatel@ventanamicro.com> writes: > The RISC-V AIA specification is ratified as-per the RISC-V international > process. The latest ratified AIA specifcation can be found at: > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf > > At a high-level, the AIA specification adds three things: > 1) AIA CSRs > - Improved local interrupt support > 2) Incoming Message Signaled Interrupt Controller (IMSIC) > - Per-HART MSI controller > - Support MSI virtualization > - Support IPI along with virtualization > 3) Advanced Platform-Level Interrupt Controller (APLIC) > - Wired interrupt controller > - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator) > - In Direct-mode, injects external interrupts directly into HARTs > > For an overview of the AIA specification, refer the AIA virtualization > talk at KVM Forum 2022: > https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf > https://www.youtube.com/watch?v=r071dL8Z0yo Thank you for continuing to work on this series! I like this direction of the series! TL;DR: I think we can get rid of most of the id/housekeeping data structures, except for the irq matrix. Most of my comments are more of a design/overview nature, so I'll comment here in the cover letter. I took the series for a spin, and with Alex's ftrace fix it passes all my tests nicely! Now some thoughts/comments (I'm coming from the x86 side of things!): id/enable-tracking: There is a lot of different id/enabled tracking with corresponding locks, where there's IMO overlap with what the matrix provides. Let's start with struct imsic_priv: | /* Dummy HW interrupt numbers */ | unsigned int nr_hwirqs; | raw_spinlock_t hwirqs_lock; | unsigned long *hwirqs_used_bitmap; These are used for the domain routing (hwirq -> desc/virq), and not needed. Just use the same id as virq (at allocation time), and get rid of these data structures/corresponding functions. 
The lookup in the interrupt handler via imsic_local_priv.vectors doesn't care about hwirq. This is what x86 does... The imsic_vector roughly corresponds to apic_chip_data (nit: imsic_vector could have the chip_data suffix as well, at least it would have helped me!) Moving/affinity changes. The moving of a vector to another CPU currently involves: 1. Allocate a new vector from the matrix 2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested spinlocks) 3. Trigger two IPIs to apply the bitmap 4. On each CPU target (imsic_local_sync()) loop the bitmap and flip all bits, and potentially rearm This seems a bit heavy-weight: Why are you explicitly setting/clearing all the bits in a loop at the local sync? x86 does it a bit differently (more lazily): The chip_data has prev_{cpu,vector}/move_in_progress fields, and keeps both vectors enabled until there's an interrupt on the new vector, and then the old one is cleaned (irq_complete_move()). Further; When it's time to remove the old vector, x86 doesn't trigger an IPI on the disabling side, but queues a cleanup job on a per-cpu list and triggers a timeout. So, the per-cpu chip_data (per-cpu "vectors" in your series) can reside in two places during the transit. I wonder if this cleanup is less intrusive, and you just need to perform what's in the per-cpu list instead of dealing with the ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The chip_data/desc has that information. This would mean that imsic_local_priv() would only have the local vectors (chip_data), and a cleanup list/timer. My general comment is that instead of having these global id-tracking structures, use the matrix together with some desc/chip_data local data, which should be sufficient. Random thought: Do we need to explicitly disable (csr) the vector, when we're changing the affinity? What if we just leave it enabled, and only when mask/unmask is performed it's actually explicitly masked (writes to the csr)? 
Missing features (which can be added later): * Reservation mode/activate support (allocate many MSI, but only request/activate a subset) * Handle managed interrupts * There might be some irqd flags missing, which mostly cpuhp cares about (e.g. irqd_*_single_target())... Finally: Given that the APLIC requires a lot more patches, depending on how the review process moves on -- maybe the IMSIC side could go as a separate series? Cheers, Björn
On Tue, Feb 6, 2024 at 9:09 PM Björn Töpel <bjorn@kernel.org> wrote: > > Hi Anup, > > Anup Patel <apatel@ventanamicro.com> writes: > > > The RISC-V AIA specification is ratified as-per the RISC-V international > > process. The latest ratified AIA specifcation can be found at: > > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf > > > > At a high-level, the AIA specification adds three things: > > 1) AIA CSRs > > - Improved local interrupt support > > 2) Incoming Message Signaled Interrupt Controller (IMSIC) > > - Per-HART MSI controller > > - Support MSI virtualization > > - Support IPI along with virtualization > > 3) Advanced Platform-Level Interrupt Controller (APLIC) > > - Wired interrupt controller > > - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator) > > - In Direct-mode, injects external interrupts directly into HARTs > > > > For an overview of the AIA specification, refer the AIA virtualization > > talk at KVM Forum 2022: > > https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf > > https://www.youtube.com/watch?v=r071dL8Z0yo > > Thank you for continuing to work on this series! I like this > direction of the series! > > TL;DR: I think we can get rid of most of the id/householding data > structures, except for the irq matrix. > > Most of my comments are more of a design/overview nature, so I'll > comment here in the cover letter. > > I took the series for a spin with and it with Alex' ftrace fix it, > passes all my tests nicely! > > Now some thoughts/comments (I'm coming from the x86 side of things!): > > id/enable-tracking: There are a lot of different id/enabled tracking > with corresponding locks, where there's IMO overlap with what the > matrix provides. The matrix allocator does not track the enabled/disabled state of the per-CPU IDs. This is why we have a separate per-CPU ids_enabled_bitmap which is also used for remote synchronization across CPUs. 
> > Let's start with struct imsic_priv: > > | /* Dummy HW interrupt numbers */ > | unsigned int nr_hwirqs; > | raw_spinlock_t hwirqs_lock; > | unsigned long *hwirqs_used_bitmap; The matrix allocator manages actual IDs for each CPU whereas the Linux irq_data expects a fixed hwirq which does not change. Due to this, we have a dummy hwirq space which is always fixed. The only thing that is changed under-the-hood by the IMSIC driver is the dummy hwirq to actual HW vector (cpu, id) mapping. > > These are used to for the domain routing (hwirq -> desc/virq), and not > needed. Just use the same id as virq (at allocation time), and get rid > of these data structures/corresponding functions. The lookup in the > interrupt handler via imsic_local_priv.vectors doesn't care about > hwirq. This is what x86 does... The imsic_vector roughly corresponds > to apic_chip_data (nit: imsic_vector could have the chip_data suffix > as well, at least it would have helped me!) Yes, imsic_vector corresponds to apic_chip_data in the x86 world. > > Moving/affinity changes. The moving of a vector to another CPU > currently involves: > > 1. Allocate a new vector from the matrix > 2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested > spinlocks) > 3. Trigger two IPIs to apply the bitmap > 4. On each CPU target (imsic_local_sync()) loop the bitmap and flip > all bits, and potentially rearm > > This seems a bit heavy-weight: Why are you explicitly setting/clearing > all the bits in a loop at the local sync? This can be certainly optimized by introducing another ids_dirty_bitmap. I will add this in the next revision. > > x86 does it a bit differently (more lazily): The chip_data has > prev_{cpu,vector}/move_in_progress fields, and keep both vectors > enabled until there's an interrupt on the new vector, and then the old > one is cleaned (irq_complete_move()). 
> > Further; When it's time to remove the old vector, x86 doesn't trigger > an IPI on the disabling side, but queues a cleanup job on a per-cpu > list and triggers a timeout. So, the per-cpu chip_data (per-cpu > "vectors" in your series) can reside in two places during the transit. We can't avoid IPIs when moving vectors from one CPU to another CPU because IMSIC id enable/disable is only possible through CSRs. Also, keep in mind that irq affinity change might be initiated on CPU X for some interrupt targeting CPU Y which is then changed to target CPU Z. In the case of x86, they have memory mapped registers which allows one CPU to enable/disable the ID of another CPU. > > I wonder if this cleanup is less intrusive, and you just need to > perform what's in the per-cpu list instead of dealing with the > ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The > chip_data/desc has that information. This would mean that > imsic_local_priv() would only have the local vectors (chip_data), and > a cleanup list/timer. > > My general comment is that instead of having these global id-tracking > structures, use the matrix together with some desc/chip_data local > data, which should be sufficient. The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors are required since the matrix allocator only manages allocation of per-CPU IDs. > > Random thought: Do we need to explicitly disable (csr) the vector, > when we're changing the affinity? What if we just leave it enabled, > and only when mask/unmask is performed it's actually explicitly masked > (writes to the csr)? We should not leave it enabled because some rogue/buggy device can inject spurious interrupts using MSI writes to unused enabled interrupts. > > Missing features (which can be added later): > * Reservation mode/activate support (allocate many MSI, but only > request/activate a subset) I did not see any PCIe or platform device requiring this kind of reservation. Any examples? 
> * Handle managed interrupts Any examples of managed interrupts in the RISC-V world? > * There might be some irqd flags missing, which mostly cpuhp cares > about (e.g. irqd_*_single_target())... Okay, let me check and update. > > Finally: Given that the APLIC requires a lot more patches, depending > on how the review process moves on -- maybe the IMSIC side could go as > a separate series? > The most popular implementation choice across RISC-V platforms will be IMSIC + APLIC, so both drivers should go together. In fact, we need both drivers for the QEMU virt machine as well because the UART interrupt (and other wired interrupts) on the QEMU virt machine goes through APLIC. Regards, Anup
Hi! Anup Patel <anup@brainfault.org> writes: > On Tue, Feb 6, 2024 at 9:09 PM Björn Töpel <bjorn@kernel.org> wrote: >> >> Hi Anup, >> >> Anup Patel <apatel@ventanamicro.com> writes: >> >> > The RISC-V AIA specification is ratified as-per the RISC-V international >> > process. The latest ratified AIA specifcation can be found at: >> > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf >> > >> > At a high-level, the AIA specification adds three things: >> > 1) AIA CSRs >> > - Improved local interrupt support >> > 2) Incoming Message Signaled Interrupt Controller (IMSIC) >> > - Per-HART MSI controller >> > - Support MSI virtualization >> > - Support IPI along with virtualization >> > 3) Advanced Platform-Level Interrupt Controller (APLIC) >> > - Wired interrupt controller >> > - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator) >> > - In Direct-mode, injects external interrupts directly into HARTs >> > >> > For an overview of the AIA specification, refer the AIA virtualization >> > talk at KVM Forum 2022: >> > https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf >> > https://www.youtube.com/watch?v=r071dL8Z0yo >> >> Thank you for continuing to work on this series! I like this >> direction of the series! >> >> TL;DR: I think we can get rid of most of the id/householding data >> structures, except for the irq matrix. >> >> Most of my comments are more of a design/overview nature, so I'll >> comment here in the cover letter. >> >> I took the series for a spin with and it with Alex' ftrace fix it, >> passes all my tests nicely! >> >> Now some thoughts/comments (I'm coming from the x86 side of things!): >> >> id/enable-tracking: There are a lot of different id/enabled tracking >> with corresponding locks, where there's IMO overlap with what the >> matrix provides. > > The matrix allocator does not track the enabled/disabled state of > the per-CPU IDs. 
This is why we have a separate per-CPU > ids_enabled_bitmap which is also used for remote synchronization > across CPUs. Exactly, but what I'm asking is if that structure is really needed. More below. >> Let's start with struct imsic_priv: >> >> | /* Dummy HW interrupt numbers */ >> | unsigned int nr_hwirqs; >> | raw_spinlock_t hwirqs_lock; >> | unsigned long *hwirqs_used_bitmap; > > The matrix allocator manages actual IDs for each CPU whereas > the Linux irq_data expects a fixed hwirq which does not change. > > Due to this, we have a dummy hwirq space which is always > fixed. The only thing that is changed under-the-hood by the > IMSIC driver is the dummy hwirq to actual HW vector (cpu, id) > mapping. Read below. I'm not talking about local_id from the irq_matrix, I'm saying use virq, which has the properties you're asking for, and doesn't require an additional structure. When an irq/desc is allocated, you have a nice unique number with the virq for the lifetime of the interrupt. >> These are used for the domain routing (hwirq -> desc/virq), and not >> needed. Just use the same id as virq (at allocation time), and get rid >> of these data structures/corresponding functions. The lookup in the >> interrupt handler via imsic_local_priv.vectors doesn't care about >> hwirq. This is what x86 does... The imsic_vector roughly corresponds >> to apic_chip_data (nit: imsic_vector could have the chip_data suffix >> as well, at least it would have helped me!) > > Yes, imsic_vector corresponds to apic_chip_data in the x86 world. ...and I'm trying to ask the following: Given the IMSIC is pretty much x86 vector (arch/x86/kernel/apic/vector.c), I'm trying to figure out the rationale why IMSIC has all the extra housekeeping data, not needed by x86. The x86 code has been battle proven, and has had to deal with all kinds of quirks (e.g. lost interrupts on affinity changes). >> Moving/affinity changes. The moving of a vector to another CPU >> currently involves: >> >> 1. 
Allocate a new vector from the matrix >> 2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested >> spinlocks) >> 3. Trigger two IPIs to apply the bitmap >> 4. On each CPU target (imsic_local_sync()) loop the bitmap and flip >> all bits, and potentially rearm >> >> This seems a bit heavy-weight: Why are you explicitly setting/clearing >> all the bits in a loop at the local sync? > > This can be certainly optimized by introducing another > ids_dirty_bitmap. I will add this in the next revision. I'd rather have fewer maps, and fewer locks! ;-) >> x86 does it a bit differently (more lazily): The chip_data has >> prev_{cpu,vector}/move_in_progress fields, and keeps both vectors >> enabled until there's an interrupt on the new vector, and then the old >> one is cleaned (irq_complete_move()). >> >> Further; When it's time to remove the old vector, x86 doesn't trigger >> an IPI on the disabling side, but queues a cleanup job on a per-cpu >> list and triggers a timeout. So, the per-cpu chip_data (per-cpu >> "vectors" in your series) can reside in two places during the transit. > > We can't avoid IPIs when moving vectors from one CPU to another > CPU because IMSIC id enable/disable is only possible through > CSRs. Also, keep in mind that irq affinity change might be initiated > on CPU X for some interrupt targeting CPU Y which is then changed > to target CPU Z. > > In the case of x86, they have memory mapped registers which > allows one CPU to enable/disable the ID of another CPU. Nope. Same mechanics on x86 -- the cleanup has to be done on the originating core. What I asked was "what about using a timer instead of an IPI". I think this was up in the last rev as well? Check out commit bdc1dad299bb ("x86/vector: Replace IRQ_MOVE_CLEANUP_VECTOR with a timer callback"). Specifically, the comment about lost interrupts, and the rationale for keeping the original target active until there's a new interrupt on the new cpu. 
>> I wonder if this cleanup is less intrusive, and you just need to >> perform what's in the per-cpu list instead of dealing with the >> ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The >> chip_data/desc has that information. This would mean that >> imsic_local_priv() would only have the local vectors (chip_data), and >> a cleanup list/timer. >> >> My general comment is that instead of having these global id-tracking >> structures, use the matrix together with some desc/chip_data local >> data, which should be sufficient. > > The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors > are required since the matrix allocator only manages allocation of > per-CPU IDs. The information in ids_enabled_bitmap is/could be inherent in imsic_local_priv.vectors (guess what x86 does... ;-)). Dummy hwirqs could be replaced with the virq. Hmm, seems like we're talking past each other, or at least I get the feeling I can't get my opinions out right. I'll try to do a quick PoC, to show you what I mean. That's probably easier than just talking about it. ...and maybe I'll come away realizing I'm all wrong! My reaction is -- you're doing a lot of housekeeping with a lot of locks, and my worry is that we'll just end up with the same issues/bloat that x86 once had (has? ;-)). >> Random thought: Do we need to explicitly disable (csr) the vector, >> when we're changing the affinity? What if we just leave it enabled, >> and only when mask/unmask is performed it's actually explicitly masked >> (writes to the csr)? > > We should not leave it enabled because some rogue/buggy device > can inject spurious interrupts using MSI writes to unused enabled > interrupts. OK! >> >> Missing features (which can be added later): >> * Reservation mode/activate support (allocate many MSI, but only >> request/activate a subset) > > I did not see any PCIe or platform device requiring this kind of > reservation. Any examples? It's not a requirement. 
Some devices allocate a gazillion interrupts (NICs with many QoS queues, e.g.), but only activate a subset (via request_irq()). A system using these kinds of devices might run out of interrupts. Problems you run into once you leave the embedded world, pretty much. >> * Handle managed interrupts > > Any examples of managed interrupts in the RISC-V world? E.g. all nvme drives: nvme_setup_irqs(), and I'd assume contemporary netdev drivers would use it. Typically devices with per-cpu queues. >> * There might be some irqd flags missing, which mostly cpuhp cares >> about (e.g. irqd_*_single_target())... > > Okay, let me check and update. I haven't dug much into cpuhp, so I'm out on a limb here... >> Finally: Given that the APLIC requires a lot more patches, depending >> on how the review process moves on -- maybe the IMSIC side could go as >> a separate series? >> > > The most popular implementation choice across RISC-V platforms > will be IMSIC + APLIC, so both drivers should go together. In fact, > we need both drivers for the QEMU virt machine as well because > UART interrupt (and other wired interrupts) on the QEMU virt > machine goes through APLIC. Thanks for clearing that up! Hmm, an IMSIC-only QEMU would be awesome. Cheers, Björn
On Wed, Feb 7, 2024 at 12:57 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Hi!
>
> Anup Patel <anup@brainfault.org> writes:
>
> > On Tue, Feb 6, 2024 at 9:09 PM Björn Töpel <bjorn@kernel.org> wrote:
> >>
> >> Hi Anup,
> >>
> >> Anup Patel <apatel@ventanamicro.com> writes:
> >>
> >> > The RISC-V AIA specification is ratified as per the RISC-V international process. The latest ratified AIA specification can be found at:
> >> > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >> >
> >> > At a high level, the AIA specification adds three things:
> >> > 1) AIA CSRs
> >> >    - Improved local interrupt support
> >> > 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> >> >    - Per-HART MSI controller
> >> >    - Supports MSI virtualization
> >> >    - Supports IPIs along with virtualization
> >> > 3) Advanced Platform-Level Interrupt Controller (APLIC)
> >> >    - Wired interrupt controller
> >> >    - In MSI mode, converts wired interrupts into MSIs (i.e. an MSI generator)
> >> >    - In Direct mode, injects external interrupts directly into HARTs
> >> >
> >> > For an overview of the AIA specification, refer to the AIA virtualization talk at KVM Forum 2022:
> >> > https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> >> > https://www.youtube.com/watch?v=r071dL8Z0yo
> >>
> >> Thank you for continuing to work on this series! I like this direction of the series!
> >>
> >> TL;DR: I think we can get rid of most of the id/housekeeping data structures, except for the irq matrix.
> >>
> >> Most of my comments are more of a design/overview nature, so I'll comment here in the cover letter.
> >>
> >> I took the series for a spin, and with Alex's ftrace fix it passes all my tests nicely!
> >>
> >> Now some thoughts/comments (I'm coming from the x86 side of things!):
> >>
> >> id/enable-tracking: There is a lot of different id/enabled tracking with corresponding locks, where there's IMO overlap with what the matrix provides.
> >
> > The matrix allocator does not track the enabled/disabled state of the per-CPU IDs. This is why we have a separate per-CPU ids_enabled_bitmap which is also used for remote synchronization across CPUs.
>
> Exactly, but what I'm asking is if that structure is really needed. More below.
>
> >> Let's start with struct imsic_priv:
> >>
> >> | /* Dummy HW interrupt numbers */
> >> | unsigned int nr_hwirqs;
> >> | raw_spinlock_t hwirqs_lock;
> >> | unsigned long *hwirqs_used_bitmap;
> >
> > The matrix allocator manages actual IDs for each CPU whereas the Linux irq_data expects a fixed hwirq which does not change.
> >
> > Due to this, we have a dummy hwirq space which is always fixed. The only thing that is changed under the hood by the IMSIC driver is the dummy hwirq to actual HW vector (cpu, id) mapping.
>
> Read below. I'm not talking about local_id from the irq_matrix; I'm saying use virq, which has the properties you're asking for and doesn't require an additional structure. When an irq/desc is allocated, you have a nice unique number with the virq for the lifetime of the interrupt.

Sure, let me explore using virq in place of hwirq.

> >> These are used for the domain routing (hwirq -> desc/virq), and not needed. Just use the same id as virq (at allocation time), and get rid of these data structures/corresponding functions. The lookup in the interrupt handler via imsic_local_priv.vectors doesn't care about hwirq. This is what x86 does... The imsic_vector roughly corresponds to apic_chip_data (nit: imsic_vector could have the chip_data suffix as well, at least it would have helped me!)
> >
> > Yes, imsic_vector corresponds to apic_chip_data in the x86 world.
>
> ...and I'm trying to ask the following: given that the IMSIC is pretty much x86 vector (arch/x86/kernel/apic/vector.c), I'm trying to figure out the rationale for why the IMSIC has all the extra housekeeping data not needed by x86. x86 has been battle-proven, and has had to deal with all kinds of quirks (e.g. lost interrupts on affinity changes).

Understood.

> >> Moving/affinity changes. The moving of a vector to another CPU currently involves:
> >>
> >> 1. Allocate a new vector from the matrix
> >> 2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested spinlocks)
> >> 3. Trigger two IPIs to apply the bitmap
> >> 4. On each CPU target (imsic_local_sync()) loop the bitmap and flip all bits, and potentially rearm
> >>
> >> This seems a bit heavy-weight: Why are you explicitly setting/clearing all the bits in a loop at the local sync?
> >
> > This can certainly be optimized by introducing another ids_dirty_bitmap. I will add this in the next revision.
>
> I'd rather have fewer maps, and fewer locks! ;-)
>
> >> x86 does it a bit differently (more lazily): The chip_data has prev_{cpu,vector}/move_in_progress fields, and keeps both vectors enabled until there's an interrupt on the new vector, and then the old one is cleaned up (irq_complete_move()).
> >>
> >> Further; when it's time to remove the old vector, x86 doesn't trigger an IPI on the disabling side, but queues a cleanup job on a per-cpu list and triggers a timeout. So, the per-cpu chip_data (per-cpu "vectors" in your series) can reside in two places during the transit.
> >
> > We can't avoid IPIs when moving vectors from one CPU to another CPU because IMSIC id enable/disable is only possible through CSRs. Also, keep in mind that an irq affinity change might be initiated on CPU X for some interrupt targeting CPU Y which is then changed to target CPU Z.
> >
> > In the case of x86, they have memory-mapped registers which allow one CPU to enable/disable the ID of another CPU.
>
> Nope. Same mechanics on x86 -- the cleanup has to be done on the originating core. What I asked was "what about using a timer instead of an IPI". I think this was brought up in the last rev as well?
>
> Check out commit bdc1dad299bb ("x86/vector: Replace IRQ_MOVE_CLEANUP_VECTOR with a timer callback"). Specifically, the comment about lost interrupts, and the rationale for keeping the original target active until there's a new interrupt on the new cpu.

Trying a timer interrupt is still TBD on my side because with v12 my goal was to implement per-device MSI domains. Let me explore timer interrupts for v13.

> >> I wonder if this clean up is less intrusive, and you just need to perform what's in the per-cpu list instead of dealing with the ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The chip_data/desc has that information. This would mean that imsic_local_priv() would only have the local vectors (chip_data), and a cleanup list/timer.
> >>
> >> My general comment is that instead of having these global id-tracking structures, use the matrix together with some desc/chip_data local data, which should be sufficient.
> >
> > The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors are required since the matrix allocator only manages allocation of per-CPU IDs.
>
> The information in ids_enabled_bitmap is/could be inherent in imsic_local_priv.vectors (guess what x86 does... ;-)).
>
> Dummy hwirqs could be replaced with the virq.
>
> Hmm, seems like we're talking past each other, or at least I get the feeling I can't get my opinions out right. I'll try to do a quick PoC, to show you what I mean. That's probably easier than just talking about it. ...and maybe I'll come away realizing I'm all wrong!
I suggest waiting for my v13 and trying something on top of that, otherwise we might duplicate efforts.

> My reaction is -- you're doing a lot of housekeeping with a lot of locks, and my worry is that we'll just end up with the same issues/bloat that x86 once had (has? ;-)).
>
> >> Random thought: Do we need to explicitly disable (csr) the vector, when we're changing the affinity? What if we just leave it enabled, and only when mask/unmask is performed it's actually explicitly masked (writes to the csr)?
> >
> > We should not leave it enabled because some rogue/buggy device can inject spurious interrupts using MSI writes to unused enabled interrupts.
>
> OK!
>
> >> Missing features (which can be added later):
> >> * Reservation mode/activate support (allocate many MSIs, but only request/activate a subset)
> >
> > I did not see any PCIe or platform device requiring this kind of reservation. Any examples?
>
> It's not a requirement. Some devices allocate a gazillion interrupts (NICs with many QoS queues, e.g.), but only activate a subset (via request_irq()). A system using these kinds of devices might run out of interrupts.

I don't see how this is not possible currently.

> Problems you run into once you leave the embedded world, pretty much.
>
> >> * Handle managed interrupts
> >
> > Any examples of managed interrupts in the RISC-V world?
>
> E.g. all NVMe drives: nvme_setup_irqs(), and I'd assume contemporary netdev drivers would use it. Typically devices with per-cpu queues.

We have tested with NVMe devices, e1000e, VirtIO-net, etc. and I did not see any issue.

We can always add new features as separate incremental series as long as there is a clear use-case backed by real-world devices.

> >> * There might be some irqd flags missing, which mostly cpuhp cares about (e.g. irqd_*_single_target())...
> >
> > Okay, let me check and update.
>
> I haven't dug much into cpuhp, so I'm out on a limb here...
> >> Finally; given that the APLIC requires a lot more patches, depending on how the review process moves on -- maybe the IMSIC side could go as a separate series?
> >
> > The most popular implementation choice across RISC-V platforms will be IMSIC + APLIC, so both drivers should go together. In fact, we need both drivers for the QEMU virt machine as well, because the UART interrupt (and other wired interrupts) on the QEMU virt machine goes through the APLIC.
>
> Thanks for clearing that up! Hmm, an IMSIC-only QEMU would be awesome.
>
>
> Cheers,
> Björn

Regards,
Anup
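[Editor's note: the x86-style lazy move discussed in this exchange keeps both the old and the new vector live until the first interrupt arrives on the new CPU. A stripped-down userspace sketch of that bookkeeping follows; all names are hypothetical, and the real logic lives in arch/x86/kernel/apic/vector.c (irq_complete_move()).]

```c
#include <assert.h>
#include <stdbool.h>

struct toy_chip_data {
	unsigned int cpu, vector;           /* current target          */
	unsigned int prev_cpu, prev_vector; /* old target, still armed */
	bool move_in_progress;
};

/* Affinity change: program the new target but leave the old vector
 * enabled, so an MSI write already in flight toward the old CPU is
 * not lost. */
static void toy_set_affinity(struct toy_chip_data *cd,
			     unsigned int new_cpu, unsigned int new_vec)
{
	cd->prev_cpu = cd->cpu;
	cd->prev_vector = cd->vector;
	cd->cpu = new_cpu;
	cd->vector = new_vec;
	cd->move_in_progress = true;
}

/* Called on the first interrupt taken on the new CPU: only now is it
 * safe to retire the old vector (x86 defers the actual release to the
 * old CPU via a per-cpu cleanup list and a timer). */
static void toy_complete_move(struct toy_chip_data *cd)
{
	if (!cd->move_in_progress)
		return;
	/* freeing (prev_cpu, prev_vector) would happen here */
	cd->move_in_progress = false;
}
```

The point of the pattern is that no IPI is needed on the disabling side at move time; the cleanup rides on an interrupt that was going to happen anyway, with a timer as a backstop.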
Anup Patel <apatel@ventanamicro.com> writes:

>> Nope. Same mechanics on x86 -- the cleanup has to be done on the originating core. What I asked was "what about using a timer instead of an IPI". I think this was brought up in the last rev as well?
>>
>> Check out commit bdc1dad299bb ("x86/vector: Replace IRQ_MOVE_CLEANUP_VECTOR with a timer callback"). Specifically, the comment about lost interrupts, and the rationale for keeping the original target active until there's a new interrupt on the new cpu.
>
> Trying a timer interrupt is still TBD on my side because with v12 my goal was to implement per-device MSI domains. Let me explore timer interrupts for v13.

OK!

>> >> I wonder if this clean up is less intrusive, and you just need to perform what's in the per-cpu list instead of dealing with the ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The chip_data/desc has that information. This would mean that imsic_local_priv() would only have the local vectors (chip_data), and a cleanup list/timer.
>> >>
>> >> My general comment is that instead of having these global id-tracking structures, use the matrix together with some desc/chip_data local data, which should be sufficient.
>> >
>> > The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors are required since the matrix allocator only manages allocation of per-CPU IDs.
>>
>> The information in ids_enabled_bitmap is/could be inherent in imsic_local_priv.vectors (guess what x86 does... ;-)).
>>
>> Dummy hwirqs could be replaced with the virq.
>>
>> Hmm, seems like we're talking past each other, or at least I get the feeling I can't get my opinions out right. I'll try to do a quick PoC, to show you what I mean. That's probably easier than just talking about it. ...and maybe I'll come away realizing I'm all wrong!
>
> I suggest waiting for my v13 and trying something on top of that, otherwise we might duplicate efforts.

OK!
>> > I did not see any PCIe or platform device requiring this kind of reservation. Any examples?
>>
>> It's not a requirement. Some devices allocate a gazillion interrupts (NICs with many QoS queues, e.g.), but only activate a subset (via request_irq()). A system using these kinds of devices might run out of interrupts.
>
> I don't see how this is not possible currently.

Again, this is something we can improve on later. But, this implementation activates the interrupt at allocation time, no?

>> Problems you run into once you leave the embedded world, pretty much.
>>
>> >> * Handle managed interrupts
>> >
>> > Any examples of managed interrupts in the RISC-V world?
>>
>> E.g. all NVMe drives: nvme_setup_irqs(), and I'd assume contemporary netdev drivers would use it. Typically devices with per-cpu queues.
>
> We have tested with NVMe devices, e1000e, VirtIO-net, etc. and I did not see any issue.
>
> We can always add new features as separate incremental series as long as there is a clear use-case backed by real-world devices.

Agreed. Let's not feature creep.


Björn
Björn Töpel <bjorn@kernel.org> writes:

>>> Hmm, seems like we're talking past each other, or at least I get the feeling I can't get my opinions out right. I'll try to do a quick PoC, to show you what I mean. That's probably easier than just talking about it. ...and maybe I'll come away realizing I'm all wrong!
>>
>> I suggest waiting for my v13 and trying something on top of that, otherwise we might duplicate efforts.
>
> OK!

I did some really rough code sketching, and I'm confident that you can get rid of all of the ids_enabled_bitmap, hwirqs_used_bitmap, and the corresponding functions/locks. I'd say one lock is enough, and the key is having the per-cpu imsic_local_priv.vectors change from struct imsic_vector * to struct imsic_vector **.

Using smp_call_function_single() to IPI enable (and disable, if you don't want to use the lazy timer disable mechanism) seems feasible as well!

(Let me know if you don't have the spare cycles, and I can help out.)


Björn
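[Editor's note: a toy userspace sketch of the `struct imsic_vector **` idea suggested above: the per-CPU table holds pointers, the enable state lives in the vector itself (making a separate ids_enabled_bitmap redundant), and a vector can transiently sit in two CPUs' tables while a move is in flight. All names are hypothetical; this is not the actual driver code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4
#define NR_IDS  16

struct toy_vector {
	unsigned int cpu;
	unsigned int local_id;
	bool enabled; /* enable state lives here, not in a global bitmap */
};

/* Per-CPU table of *pointers* to vectors: the "struct imsic_vector **"
 * shape. A NULL slot simply means "no interrupt mapped here". */
static struct toy_vector *vectors[NR_CPUS][NR_IDS];

/* Interrupt-handler side: local id -> vector lookup, no hwirq needed. */
static struct toy_vector *toy_lookup(unsigned int cpu, unsigned int id)
{
	return vectors[cpu][id];
}

/* Affinity move: install on the new CPU first, then retire the old
 * slot (which could equally be deferred to a lazy cleanup). */
static void toy_move(struct toy_vector *vec,
		     unsigned int new_cpu, unsigned int new_id)
{
	unsigned int old_cpu = vec->cpu, old_id = vec->local_id;

	vectors[new_cpu][new_id] = vec;
	vec->cpu = new_cpu;
	vec->local_id = new_id;
	vectors[old_cpu][old_id] = NULL;
}
```

Because the tables hold pointers rather than embedded structs, the vector's own state (enabled, target cpu/id) is the single source of truth, which is what makes the extra global bitmaps removable.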
On Wed, Feb 7, 2024 at 6:25 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Björn Töpel <bjorn@kernel.org> writes:
>
> >>> Hmm, seems like we're talking past each other, or at least I get the feeling I can't get my opinions out right. I'll try to do a quick PoC, to show you what I mean. That's probably easier than just talking about it. ...and maybe I'll come away realizing I'm all wrong!
> >>
> >> I suggest waiting for my v13 and trying something on top of that, otherwise we might duplicate efforts.
> >
> > OK!
>
> I did some really rough code sketching, and I'm confident that you can get rid of all of the ids_enabled_bitmap, hwirqs_used_bitmap, and the corresponding functions/locks. I'd say one lock is enough, and the key is having the per-cpu imsic_local_priv.vectors change from struct imsic_vector * to struct imsic_vector **.

I have managed to remove hwirqs_bitmap (and related functions). Now, I am trying another approach to simplify locking using atomics.

> Using smp_call_function_single() to IPI enable (and disable, if you don't want to use the lazy timer disable mechanism) seems feasible as well!

We have intentionally kept a separate virq for synchronization because this allows us to gather stats for debugging. Calling smp_call_function_single() will not allow us to separately gather stats for sync IPIs.

The smp_call_function_single() eventually leads to __ipi_send_mask() via send_ipi_mask() in arch/riscv, so directly calling __ipi_send_mask() for sync IPIs is faster.

> (Let me know if you don't have the spare cycles, and I can help out.)
>
>
> Björn

Regards,
Anup
On Wed, Feb 7, 2024 at 6:25 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Björn Töpel <bjorn@kernel.org> writes:
>
> >>> Hmm, seems like we're talking past each other, or at least I get the feeling I can't get my opinions out right. I'll try to do a quick PoC, to show you what I mean. That's probably easier than just talking about it. ...and maybe I'll come away realizing I'm all wrong!
> >>
> >> I suggest waiting for my v13 and trying something on top of that, otherwise we might duplicate efforts.
> >
> > OK!
>
> I did some really rough code sketching, and I'm confident that you can get rid of all of the ids_enabled_bitmap, hwirqs_used_bitmap, and the corresponding functions/locks. I'd say one lock is enough, and the key is having the per-cpu imsic_local_priv.vectors change from struct imsic_vector * to struct imsic_vector **.
>
> Using smp_call_function_single() to IPI enable (and disable, if you don't want to use the lazy timer disable mechanism) seems feasible as well!
>
> (Let me know if you don't have the spare cycles, and I can help out.)

If you can help upstream the IOMMU driver, then that would be awesome.

Regards,
Anup
Hi Anup,
I understand that some refactoring is in progress, but I haven't seen the
issue below reported yet; I'm adding it here hoping that it can be useful towards v13.
(Unfortunately, I didn't have enough time to debug this yet...)
> irqchip/sifive-plic: Convert PLIC driver into a platform driver
I'm seeing the following LOCKDEP warning with this series, bisected to
the commit above. This is a defconfig + PROVE_LOCKING=y build, booted
using -machine virt,aia=none.
[ 0.953473] ========================================================
[ 0.953704] WARNING: possible irq lock inversion dependency detected
[ 0.953955] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
[ 0.954224] --------------------------------------------------------
[ 0.954444] swapper/0/0 just changed the state of lock:
[ 0.954664] ffffaf808109d0c8 (&irq_desc_lock_class){-...}-{2:2}, at: handle_fasteoi_irq+0x24/0x1da
[ 0.955699] but this lock took another, HARDIRQ-unsafe lock in the past:
[ 0.955942] (&handler->enable_lock){+.+.}-{2:2}
[ 0.955974]
and interrupts could create inverse lock ordering between them.
[ 0.956507]
other info that might help us debug this:
[ 0.956775] Possible interrupt unsafe locking scenario:
[ 0.956998] CPU0 CPU1
[ 0.957247] ---- ----
[ 0.957439] lock(&handler->enable_lock);
[ 0.957607] local_irq_disable();
[ 0.957793] lock(&irq_desc_lock_class);
[ 0.958021] lock(&handler->enable_lock);
[ 0.958246] <Interrupt>
[ 0.958342] lock(&irq_desc_lock_class);
[ 0.958501]
*** DEADLOCK ***
[ 0.958715] no locks held by swapper/0/0.
[ 0.958870]
the shortest dependencies between 2nd lock and 1st lock:
[ 0.959152] -> (&handler->enable_lock){+.+.}-{2:2} {
[ 0.959372] HARDIRQ-ON-W at:
[ 0.959522] __lock_acquire+0x884/0x1f5c
[ 0.959745] lock_acquire+0xf0/0x292
[ 0.959913] _raw_spin_lock+0x2c/0x40
[ 0.960090] plic_probe+0x322/0x65c
[ 0.960257] platform_probe+0x4e/0x92
[ 0.960432] really_probe+0x82/0x210
[ 0.960598] __driver_probe_device+0x5c/0xd0
[ 0.960784] driver_probe_device+0x2c/0xb0
[ 0.960964] __driver_attach+0x72/0x10a
[ 0.961151] bus_for_each_dev+0x60/0xac
[ 0.961330] driver_attach+0x1a/0x22
[ 0.961496] bus_add_driver+0xd4/0x19e
[ 0.961666] driver_register+0x3e/0xd8
[ 0.961835] __platform_driver_register+0x1c/0x24
[ 0.962030] plic_driver_init+0x1a/0x22
[ 0.962201] do_one_initcall+0x80/0x268
[ 0.962371] kernel_init_freeable+0x296/0x300
[ 0.962554] kernel_init+0x1e/0x10a
[ 0.962713] ret_from_fork+0xe/0x1c
[ 0.962884] SOFTIRQ-ON-W at:
[ 0.962994] __lock_acquire+0x89e/0x1f5c
[ 0.963169] lock_acquire+0xf0/0x292
[ 0.963336] _raw_spin_lock+0x2c/0x40
[ 0.963497] plic_probe+0x322/0x65c
[ 0.963664] platform_probe+0x4e/0x92
[ 0.963849] really_probe+0x82/0x210
[ 0.964054] __driver_probe_device+0x5c/0xd0
[ 0.964255] driver_probe_device+0x2c/0xb0
[ 0.964428] __driver_attach+0x72/0x10a
[ 0.964603] bus_for_each_dev+0x60/0xac
[ 0.964777] driver_attach+0x1a/0x22
[ 0.964943] bus_add_driver+0xd4/0x19e
[ 0.965343] driver_register+0x3e/0xd8
[ 0.965527] __platform_driver_register+0x1c/0x24
[ 0.965732] plic_driver_init+0x1a/0x22
[ 0.965908] do_one_initcall+0x80/0x268
[ 0.966078] kernel_init_freeable+0x296/0x300
[ 0.966268] kernel_init+0x1e/0x10a
[ 0.966436] ret_from_fork+0xe/0x1c
[ 0.966599] INITIAL USE at:
[ 0.966716] __lock_acquire+0x3fc/0x1f5c
[ 0.966891] lock_acquire+0xf0/0x292
[ 0.967048] _raw_spin_lock+0x2c/0x40
[ 0.967206] plic_probe+0x322/0x65c
[ 0.967360] platform_probe+0x4e/0x92
[ 0.967522] really_probe+0x82/0x210
[ 0.967678] __driver_probe_device+0x5c/0xd0
[ 0.967853] driver_probe_device+0x2c/0xb0
[ 0.968025] __driver_attach+0x72/0x10a
[ 0.968185] bus_for_each_dev+0x60/0xac
[ 0.968348] driver_attach+0x1a/0x22
[ 0.968513] bus_add_driver+0xd4/0x19e
[ 0.968678] driver_register+0x3e/0xd8
[ 0.968839] __platform_driver_register+0x1c/0x24
[ 0.969035] plic_driver_init+0x1a/0x22
[ 0.969239] do_one_initcall+0x80/0x268
[ 0.969431] kernel_init_freeable+0x296/0x300
[ 0.969610] kernel_init+0x1e/0x10a
[ 0.969766] ret_from_fork+0xe/0x1c
[ 0.969936] }
[ 0.970010] ... key at: [<ffffffff824f4138>] __key.2+0x0/0x10
[ 0.970224] ... acquired at:
[ 0.970353] lock_acquire+0xf0/0x292
[ 0.970482] _raw_spin_lock+0x2c/0x40
[ 0.970609] plic_irq_enable+0x7e/0x140
[ 0.970739] irq_enable+0x2c/0x60
[ 0.970882] __irq_startup+0x58/0x60
[ 0.971008] irq_startup+0x5e/0x13c
[ 0.971126] __setup_irq+0x4de/0x5da
[ 0.971248] request_threaded_irq+0xcc/0x12e
[ 0.971394] vm_find_vqs+0x62/0x50a
[ 0.971518] probe_common+0xfe/0x1d2
[ 0.971635] virtrng_probe+0xc/0x14
[ 0.971751] virtio_dev_probe+0x154/0x1fc
[ 0.971878] really_probe+0x82/0x210
[ 0.972008] __driver_probe_device+0x5c/0xd0
[ 0.972147] driver_probe_device+0x2c/0xb0
[ 0.972280] __driver_attach+0x72/0x10a
[ 0.972407] bus_for_each_dev+0x60/0xac
[ 0.972540] driver_attach+0x1a/0x22
[ 0.972656] bus_add_driver+0xd4/0x19e
[ 0.972777] driver_register+0x3e/0xd8
[ 0.972896] register_virtio_driver+0x1c/0x2a
[ 0.973049] virtio_rng_driver_init+0x18/0x20
[ 0.973236] do_one_initcall+0x80/0x268
[ 0.973399] kernel_init_freeable+0x296/0x300
[ 0.973540] kernel_init+0x1e/0x10a
[ 0.973658] ret_from_fork+0xe/0x1c
[ 0.973858] -> (&irq_desc_lock_class){-...}-{2:2} {
[ 0.974036] IN-HARDIRQ-W at:
[ 0.974142] __lock_acquire+0xa82/0x1f5c
[ 0.974309] lock_acquire+0xf0/0x292
[ 0.974467] _raw_spin_lock+0x2c/0x40
[ 0.974625] handle_fasteoi_irq+0x24/0x1da
[ 0.974794] generic_handle_domain_irq+0x1c/0x2a
[ 0.974982] plic_handle_irq+0x7e/0xf0
[ 0.975143] generic_handle_domain_irq+0x1c/0x2a
[ 0.975329] riscv_intc_irq+0x2e/0x46
[ 0.975488] handle_riscv_irq+0x4a/0x74
[ 0.975652] call_on_irq_stack+0x32/0x40
[ 0.975817] INITIAL USE at:
[ 0.975923] __lock_acquire+0x3fc/0x1f5c
[ 0.976087] lock_acquire+0xf0/0x292
[ 0.976244] _raw_spin_lock_irqsave+0x3a/0x64
[ 0.976423] __irq_get_desc_lock+0x5a/0x84
[ 0.976594] irq_modify_status+0x2a/0x106
[ 0.976764] irq_set_percpu_devid+0x62/0x78
[ 0.976939] riscv_intc_domain_map+0x1e/0x54
[ 0.977133] irq_domain_associate_locked+0x42/0xe4
[ 0.977363] irq_create_mapping_affinity+0x98/0xd4
[ 0.977570] sbi_ipi_init+0x70/0x142
[ 0.977744] init_IRQ+0xfe/0x11a
[ 0.977906] start_kernel+0x4ae/0x790
[ 0.978082] }
[ 0.978151] ... key at: [<ffffffff824bbee0>] irq_desc_lock_class+0x0/0x10
[ 0.978389] ... acquired at:
[ 0.978494] mark_lock+0x3fe/0x8c2
[ 0.978624] __lock_acquire+0xa82/0x1f5c
[ 0.978766] lock_acquire+0xf0/0x292
[ 0.978897] _raw_spin_lock+0x2c/0x40
[ 0.979029] handle_fasteoi_irq+0x24/0x1da
[ 0.979171] generic_handle_domain_irq+0x1c/0x2a
[ 0.979326] plic_handle_irq+0x7e/0xf0
[ 0.979460] generic_handle_domain_irq+0x1c/0x2a
[ 0.979618] riscv_intc_irq+0x2e/0x46
[ 0.979751] handle_riscv_irq+0x4a/0x74
[ 0.979888] call_on_irq_stack+0x32/0x40
[ 0.980110]
stack backtrace:
[ 0.980358] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
[ 0.980662] Hardware name: riscv-virtio,qemu (DT)
[ 0.980913] Call Trace:
[ 0.981042] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
[ 0.981246] [<ffffffff80ae020a>] show_stack+0x2c/0x38
[ 0.981456] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
[ 0.981648] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
[ 0.981813] [<ffffffff80ae17a4>] print_irq_inversion_bug.part.0+0x162/0x176
[ 0.982031] [<ffffffff8007c6e6>] mark_lock+0x3fe/0x8c2
[ 0.982198] [<ffffffff8007d888>] __lock_acquire+0xa82/0x1f5c
[ 0.982377] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
[ 0.982549] [<ffffffff80af9962>] _raw_spin_lock+0x2c/0x40
[ 0.982721] [<ffffffff8008f3fe>] handle_fasteoi_irq+0x24/0x1da
[ 0.982904] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
[ 0.983112] [<ffffffff80581dc0>] plic_handle_irq+0x7e/0xf0
[ 0.983293] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
[ 0.983495] [<ffffffff8057fb1a>] riscv_intc_irq+0x2e/0x46
[ 0.983671] [<ffffffff80aedb4c>] handle_riscv_irq+0x4a/0x74
[ 0.983856] [<ffffffff80afa756>] call_on_irq_stack+0x32/0x40
When I switch to -machine virt,aia=aplic-imsic (same config as above), I
get:
[ 0.971406] ============================================
[ 0.971439] WARNING: possible recursive locking detected
[ 0.971497] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
[ 0.971583] --------------------------------------------
[ 0.971612] swapper/0/1 is trying to acquire lock:
[ 0.971662] ffffaf83fefa8e78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x92/0x146
[ 0.971927]
but task is already holding lock:
[ 0.971975] ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
[ 0.972045]
other info that might help us debug this:
[ 0.972085] Possible unsafe locking scenario:
[ 0.972114] CPU0
[ 0.972133] ----
[ 0.972153] lock(&lpriv->ids_lock);
[ 0.972191] lock(&lpriv->ids_lock);
[ 0.972228]
*** DEADLOCK ***
[ 0.972258] May be due to missing lock nesting notation
[ 0.972306] 6 locks held by swapper/0/1:
[ 0.972338] #0: ffffaf8081f65970 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x6a/0x10a
[ 0.972413] #1: ffffaf808217c240 (&desc->request_mutex){+.+.}-{3:3}, at: __setup_irq+0xa2/0x5da
[ 0.972492] #2: ffffaf808217c0c8 (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq+0xbe/0x5da
[ 0.972555] #3: ffffffff81892ac0 (mask_lock){....}-{2:2}, at: irq_setup_affinity+0x38/0xc6
[ 0.972617] #4: ffffffff81892a80 (tmp_mask_lock){....}-{2:2}, at: irq_do_set_affinity+0x3a/0x164
[ 0.972681] #5: ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
[ 0.972753]
stack backtrace:
[ 0.972852] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
[ 0.972900] Hardware name: riscv-virtio,qemu (DT)
[ 0.972987] Call Trace:
[ 0.973019] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
[ 0.973054] [<ffffffff80ae020a>] show_stack+0x2c/0x38
[ 0.973083] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
[ 0.973112] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
[ 0.973139] [<ffffffff8007ad5e>] print_deadlock_bug+0x282/0x328
[ 0.973168] [<ffffffff8007e15c>] __lock_acquire+0x1356/0x1f5c
[ 0.973198] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
[ 0.973225] [<ffffffff80af9adc>] _raw_spin_lock_irqsave+0x3a/0x64
[ 0.973255] [<ffffffff80581210>] imsic_vector_move+0x92/0x146
[ 0.973285] [<ffffffff80581a04>] imsic_irq_set_affinity+0x8e/0xc6
[ 0.973315] [<ffffffff8008c86a>] irq_do_set_affinity+0x142/0x164
[ 0.973345] [<ffffffff8008cc22>] irq_setup_affinity+0x68/0xc6
[ 0.973374] [<ffffffff8008fa82>] irq_startup+0x72/0x13c
[ 0.973401] [<ffffffff8008d40c>] __setup_irq+0x4de/0x5da
[ 0.973430] [<ffffffff8008d5d4>] request_threaded_irq+0xcc/0x12e
[ 0.973460] [<ffffffff806346d8>] vp_find_vqs_msix+0x114/0x376
[ 0.973491] [<ffffffff80634970>] vp_find_vqs+0x36/0x136
[ 0.973518] [<ffffffff80633280>] vp_modern_find_vqs+0x16/0x4e
[ 0.973547] [<ffffffff80ab31f8>] p9_virtio_probe+0x8e/0x31c
[ 0.973576] [<ffffffff8062d982>] virtio_dev_probe+0x154/0x1fc
[ 0.973605] [<ffffffff80693738>] really_probe+0x82/0x210
[ 0.973632] [<ffffffff80693922>] __driver_probe_device+0x5c/0xd0
[ 0.973661] [<ffffffff806939c2>] driver_probe_device+0x2c/0xb0
[ 0.973690] [<ffffffff80693b46>] __driver_attach+0x72/0x10a
[ 0.973718] [<ffffffff8069191a>] bus_for_each_dev+0x60/0xac
[ 0.973746] [<ffffffff80693164>] driver_attach+0x1a/0x22
[ 0.973773] [<ffffffff80692ade>] bus_add_driver+0xd4/0x19e
[ 0.973801] [<ffffffff8069487e>] driver_register+0x3e/0xd8
[ 0.973829] [<ffffffff8062d1ce>] register_virtio_driver+0x1c/0x2a
[ 0.973858] [<ffffffff80c3da52>] p9_virtio_init+0x36/0x56
[ 0.973887] [<ffffffff800028fe>] do_one_initcall+0x80/0x268
[ 0.973915] [<ffffffff80c01144>] kernel_init_freeable+0x296/0x300
[ 0.973944] [<ffffffff80af05dc>] kernel_init+0x1e/0x10a
[ 0.973972] [<ffffffff80afa716>] ret_from_fork+0xe/0x1c
FWIW, the full Qemu command I used was as follows:
sudo /home/andrea/Downloads/qemu/build/qemu-system-riscv64 \
-append "root=/dev/root rw rootfstype=9p rootflags=version=9p2000.L,trans=virtio,cache=mmap,access=any raid=noautodetect security=none loglevel=7" \
-cpu rv64,sv57=off,svadu=off,svnapot=off \
-device virtio-net-device,netdev=net0 \
-device virtio-rng-device,rng=rng0 \
-device virtio-9p-pci,fsdev=root,mount_tag=/dev/root \
-fsdev local,id=root,path=/home/andrea/Downloads/jammy/,security_model=none \
-kernel /home/andrea/linux/arch/riscv/boot/Image \
-m 16G \
-machine virt,aia=<either "none" or "aplic-imsic"> \
-monitor telnet:127.0.0.1:55555,server,nowait \
-netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 \
-nographic \
-object rng-random,filename=/dev/urandom,id=rng0 \
-serial mon:stdio \
-smp 5
Andrea
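[Editor's note: the first splat above is the classic irq lock-inversion pattern: `handler->enable_lock` is taken with interrupts enabled in `plic_probe()`, but is also nested under the hardirq-safe `irq_desc` lock in the `plic_irq_enable()` path. A simplified toy model of the lockdep rule being violated follows; this is not kernel code, and the actual PLIC fix may differ.]

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of lockdep's irq-inversion rule: a lock that has
 * ever been held with interrupts enabled (HARDIRQ-unsafe usage) must
 * never also be taken from -- or nested inside a lock taken from --
 * hardirq context, or an interrupt arriving while it is held can
 * deadlock on it. */
struct toy_lock_class {
	bool ever_taken_irqs_on;    /* e.g. the plic_probe() path        */
	bool ever_taken_in_hardirq; /* e.g. under desc->lock in irq path */
};

static void toy_acquire(struct toy_lock_class *l, bool irqs_on,
			bool hardirq_context)
{
	if (irqs_on)
		l->ever_taken_irqs_on = true;
	if (hardirq_context)
		l->ever_taken_in_hardirq = true;
}

static bool toy_inversion_possible(const struct toy_lock_class *l)
{
	return l->ever_taken_irqs_on && l->ever_taken_in_hardirq;
}
```

The usual fix for this pattern is to take such a lock with raw_spin_lock_irqsave() everywhere, so the "taken with interrupts enabled" condition never arises in the first place.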
Anup!

On Sat, Jan 27 2024 at 21:50, Anup Patel wrote:
>> Changes since v11:
>> - Rebased on Linux-6.8-rc1
>> - Included kernel/irq related patches from the "genirq, irqchip: Convert ARM MSI handling to per device MSI domains" series by Thomas. (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19, PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/)
>> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
>> - Updated IMSIC driver to support per-device MSI domains for PCI and platform devices.
>
> I have rebased and included 13 patches (which add the per-device MSI domain infrastructure) from your series [1]. In this series, the IMSIC driver implements the msi_parent_ops and the APLIC driver implements a wired-to-MSI bridge using your new infrastructure.
>
> The remaining 27 patches of your series [1] require testing on ARM platforms which I don't have. I suggest these remaining patches go as a separate series.

Of course. Darwi (in Cc) is going to work on the ARM parts when he returns from vacation. I'm going to apply the infrastructure patches (1-13) in the next few days so they are out of the way for you and Darwi, unless someone has any objections.

Thanks for picking this up and driving it forward!

tglx
On Thu, Feb 15, 2024 at 1:24 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Anup!
>
> On Sat, Jan 27 2024 at 21:50, Anup Patel wrote:
> >> Changes since v11:
> >> - Rebased on Linux-6.8-rc1
> >> - Included kernel/irq related patches from the "genirq, irqchip: Convert ARM MSI handling to per device MSI domains" series by Thomas. (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19, PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/)
> >> - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> >> - Updated IMSIC driver to support per-device MSI domains for PCI and platform devices.
> >
> > I have rebased and included 13 patches (which add the per-device MSI domain infrastructure) from your series [1]. In this series, the IMSIC driver implements the msi_parent_ops and the APLIC driver implements a wired-to-MSI bridge using your new infrastructure.
> >
> > The remaining 27 patches of your series [1] require testing on ARM platforms which I don't have. I suggest these remaining patches go as a separate series.
>
> Of course. Darwi (in Cc) is going to work on the ARM parts when he returns from vacation. I'm going to apply the infrastructure patches (1-13) in the next few days so they are out of the way for you and Darwi, unless someone has any objections.
>
> Thanks for picking this up and driving it forward!

Thanks Thomas, I will be sending v13 of this series next week.

For the time being, I will carry the 13 infrastructure patches in this series until they land in upstream Linux so that it is easier for people to try this series.

Regards,
Anup
On Wed, 14 Feb 2024 19:54:49 +0000, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Anup!
>
> On Sat, Jan 27 2024 at 21:50, Anup Patel wrote:
> >> Changes since v11:
> >>  - Rebased on Linux-6.8-rc1
> >>  - Included kernel/irq related patches from "genirq, irqchip: Convert ARM
> >>    MSI handling to per device MSI domains" series by Thomas.
> >>    (PATCH7, PATCH8, PATCH9, PATCH14, PATCH16, PATCH17, PATCH18, PATCH19,
> >>    PATCH20, PATCH21, PATCH22, PATCH23, and PATCH32 of
> >>    https://lore.kernel.org/linux-arm-kernel/20221121135653.208611233@linutronix.de/)
> >>  - Updated APLIC MSI-mode driver to use the new WIRED_TO_MSI mechanism.
> >>  - Updated IMSIC driver to support per-device MSI domains for PCI and
> >>    platform devices.
> >
> > I have rebased and included 13 patches (which add per-device MSI domain
> > infrastructure) from your series [1]. In this series, the IMSIC driver
> > implements the msi_parent_ops and the APLIC driver implements a
> > wired-to-MSI bridge using your new infrastructure.
> >
> > The remaining 27 patches of your series [1] require testing on ARM
> > platforms which I don't have. I suggest these remaining patches go
> > as a separate series.
>
> Of course. Darwi (in Cc) is going to work on the ARM parts when he
> returns from vacation. I'm going to apply the infrastructure patches
> (1-13) in the next days so they are out of the way for you and Darwi,
> unless someone has any objections.

FWIW, I've given the first 13 patches a go on two of the most
problematic platforms (Huawei's D05, and Marvell's McBin). Nothing
immediately broke, so it's obviously perfect.

Thanks,

        M.
Anup!

On Thu, Feb 15 2024 at 11:18, Anup Patel wrote:
> On Thu, Feb 15, 2024 at 1:24 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> Thanks for picking this up and driving it forward!
>
> Thanks Thomas, I will be sending v13 of this series next week.
>
> For the time being, I will carry the 13 infrastructure patches in
> this series until they land in upstream Linux so that it is easier
> for people to try this series.

I pushed out the lot on top of 6.8-rc4 (no other changes) to

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/msi

with some minimal changes (DEVICE_IMS -> DEVICE_MSI, removal of an
unused interface).

I'm going over the rest of the series after I dealt with my other patch
backlog.

Thanks,

        tglx
Hi Andrea,

On Thu, Feb 8, 2024 at 3:40 PM Andrea Parri <parri.andrea@gmail.com> wrote:
>
> Hi Anup,
>
> I understand that some refactoring is in progress, but I don't see the
> report below; adding it here hoping that it can be useful towards v13.
> (Unfortunately, I didn't have enough time to debug this yet...)
>
> > irqchip/sifive-plic: Convert PLIC driver into a platform driver
>
> I'm seeing the following LOCKDEP warning with this series, bisected to
> the commit above. This is a defconfig + PROVE_LOCKING=y build, booted
> using -machine virt,aia=none.
>
> [ 0.953473] ========================================================
> [ 0.953704] WARNING: possible irq lock inversion dependency detected
> [ 0.953955] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
> [ 0.954224] --------------------------------------------------------
> [ 0.954444] swapper/0/0 just changed the state of lock:
> [ 0.954664] ffffaf808109d0c8 (&irq_desc_lock_class){-...}-{2:2}, at: handle_fasteoi_irq+0x24/0x1da
> [ 0.955699] but this lock took another, HARDIRQ-unsafe lock in the past:
> [ 0.955942]  (&handler->enable_lock){+.+.}-{2:2}
> [ 0.955974]
> and interrupts could create inverse lock ordering between them.
> [ 0.956507]
> other info that might help us debug this:
> [ 0.956775]  Possible interrupt unsafe locking scenario:
>
> [ 0.956998]        CPU0                    CPU1
> [ 0.957247]        ----                    ----
> [ 0.957439]   lock(&handler->enable_lock);
> [ 0.957607]                                local_irq_disable();
> [ 0.957793]                                lock(&irq_desc_lock_class);
> [ 0.958021]                                lock(&handler->enable_lock);
> [ 0.958246]   <Interrupt>
> [ 0.958342]     lock(&irq_desc_lock_class);
> [ 0.958501]
>  *** DEADLOCK ***

I was able to reproduce this warning. Upon further digging, it turns out
that the locking safety issue existed before the PLIC driver was
converted into a platform driver, but it was never caught by lockdep
because previously the driver was probed very early using
IRQCHIP_DECLARE().

I will include a separate patch in v13 to fix this warning.

Thanks,
Anup

>
> [ 0.958715] no locks held by swapper/0/0.
> [ 0.958870]
> the shortest dependencies between 2nd lock and 1st lock:
> [ 0.959152] -> (&handler->enable_lock){+.+.}-{2:2} {
> [ 0.959372]    HARDIRQ-ON-W at:
> [ 0.959522]      __lock_acquire+0x884/0x1f5c
> [ 0.959745]      lock_acquire+0xf0/0x292
> [ 0.959913]      _raw_spin_lock+0x2c/0x40
> [ 0.960090]      plic_probe+0x322/0x65c
> [ 0.960257]      platform_probe+0x4e/0x92
> [ 0.960432]      really_probe+0x82/0x210
> [ 0.960598]      __driver_probe_device+0x5c/0xd0
> [ 0.960784]      driver_probe_device+0x2c/0xb0
> [ 0.960964]      __driver_attach+0x72/0x10a
> [ 0.961151]      bus_for_each_dev+0x60/0xac
> [ 0.961330]      driver_attach+0x1a/0x22
> [ 0.961496]      bus_add_driver+0xd4/0x19e
> [ 0.961666]      driver_register+0x3e/0xd8
> [ 0.961835]      __platform_driver_register+0x1c/0x24
> [ 0.962030]      plic_driver_init+0x1a/0x22
> [ 0.962201]      do_one_initcall+0x80/0x268
> [ 0.962371]      kernel_init_freeable+0x296/0x300
> [ 0.962554]      kernel_init+0x1e/0x10a
> [ 0.962713]      ret_from_fork+0xe/0x1c
> [ 0.962884]    SOFTIRQ-ON-W at:
> [ 0.962994]      __lock_acquire+0x89e/0x1f5c
> [ 0.963169]      lock_acquire+0xf0/0x292
> [ 0.963336]      _raw_spin_lock+0x2c/0x40
> [ 0.963497]      plic_probe+0x322/0x65c
> [ 0.963664]      platform_probe+0x4e/0x92
> [ 0.963849]      really_probe+0x82/0x210
> [ 0.964054]      __driver_probe_device+0x5c/0xd0
> [ 0.964255]      driver_probe_device+0x2c/0xb0
> [ 0.964428]      __driver_attach+0x72/0x10a
> [ 0.964603]      bus_for_each_dev+0x60/0xac
> [ 0.964777]      driver_attach+0x1a/0x22
> [ 0.964943]      bus_add_driver+0xd4/0x19e
> [ 0.965343]      driver_register+0x3e/0xd8
> [ 0.965527]      __platform_driver_register+0x1c/0x24
> [ 0.965732]      plic_driver_init+0x1a/0x22
> [ 0.965908]      do_one_initcall+0x80/0x268
> [ 0.966078]      kernel_init_freeable+0x296/0x300
> [ 0.966268]      kernel_init+0x1e/0x10a
> [ 0.966436]      ret_from_fork+0xe/0x1c
> [ 0.966599]    INITIAL USE at:
> [ 0.966716]      __lock_acquire+0x3fc/0x1f5c
> [ 0.966891]      lock_acquire+0xf0/0x292
> [ 0.967048]      _raw_spin_lock+0x2c/0x40
> [ 0.967206]      plic_probe+0x322/0x65c
> [ 0.967360]      platform_probe+0x4e/0x92
> [ 0.967522]      really_probe+0x82/0x210
> [ 0.967678]      __driver_probe_device+0x5c/0xd0
> [ 0.967853]      driver_probe_device+0x2c/0xb0
> [ 0.968025]      __driver_attach+0x72/0x10a
> [ 0.968185]      bus_for_each_dev+0x60/0xac
> [ 0.968348]      driver_attach+0x1a/0x22
> [ 0.968513]      bus_add_driver+0xd4/0x19e
> [ 0.968678]      driver_register+0x3e/0xd8
> [ 0.968839]      __platform_driver_register+0x1c/0x24
> [ 0.969035]      plic_driver_init+0x1a/0x22
> [ 0.969239]      do_one_initcall+0x80/0x268
> [ 0.969431]      kernel_init_freeable+0x296/0x300
> [ 0.969610]      kernel_init+0x1e/0x10a
> [ 0.969766]      ret_from_fork+0xe/0x1c
> [ 0.969936]  }
> [ 0.970010]  ... key at: [<ffffffff824f4138>] __key.2+0x0/0x10
> [ 0.970224]  ... acquired at:
> [ 0.970353]    lock_acquire+0xf0/0x292
> [ 0.970482]    _raw_spin_lock+0x2c/0x40
> [ 0.970609]    plic_irq_enable+0x7e/0x140
> [ 0.970739]    irq_enable+0x2c/0x60
> [ 0.970882]    __irq_startup+0x58/0x60
> [ 0.971008]    irq_startup+0x5e/0x13c
> [ 0.971126]    __setup_irq+0x4de/0x5da
> [ 0.971248]    request_threaded_irq+0xcc/0x12e
> [ 0.971394]    vm_find_vqs+0x62/0x50a
> [ 0.971518]    probe_common+0xfe/0x1d2
> [ 0.971635]    virtrng_probe+0xc/0x14
> [ 0.971751]    virtio_dev_probe+0x154/0x1fc
> [ 0.971878]    really_probe+0x82/0x210
> [ 0.972008]    __driver_probe_device+0x5c/0xd0
> [ 0.972147]    driver_probe_device+0x2c/0xb0
> [ 0.972280]    __driver_attach+0x72/0x10a
> [ 0.972407]    bus_for_each_dev+0x60/0xac
> [ 0.972540]    driver_attach+0x1a/0x22
> [ 0.972656]    bus_add_driver+0xd4/0x19e
> [ 0.972777]    driver_register+0x3e/0xd8
> [ 0.972896]    register_virtio_driver+0x1c/0x2a
> [ 0.973049]    virtio_rng_driver_init+0x18/0x20
> [ 0.973236]    do_one_initcall+0x80/0x268
> [ 0.973399]    kernel_init_freeable+0x296/0x300
> [ 0.973540]    kernel_init+0x1e/0x10a
> [ 0.973658]    ret_from_fork+0xe/0x1c
>
> [ 0.973858] -> (&irq_desc_lock_class){-...}-{2:2} {
> [ 0.974036]    IN-HARDIRQ-W at:
> [ 0.974142]      __lock_acquire+0xa82/0x1f5c
> [ 0.974309]      lock_acquire+0xf0/0x292
> [ 0.974467]      _raw_spin_lock+0x2c/0x40
> [ 0.974625]      handle_fasteoi_irq+0x24/0x1da
> [ 0.974794]      generic_handle_domain_irq+0x1c/0x2a
> [ 0.974982]      plic_handle_irq+0x7e/0xf0
> [ 0.975143]      generic_handle_domain_irq+0x1c/0x2a
> [ 0.975329]      riscv_intc_irq+0x2e/0x46
> [ 0.975488]      handle_riscv_irq+0x4a/0x74
> [ 0.975652]      call_on_irq_stack+0x32/0x40
> [ 0.975817]    INITIAL USE at:
> [ 0.975923]      __lock_acquire+0x3fc/0x1f5c
> [ 0.976087]      lock_acquire+0xf0/0x292
> [ 0.976244]      _raw_spin_lock_irqsave+0x3a/0x64
> [ 0.976423]      __irq_get_desc_lock+0x5a/0x84
> [ 0.976594]      irq_modify_status+0x2a/0x106
> [ 0.976764]      irq_set_percpu_devid+0x62/0x78
> [ 0.976939]      riscv_intc_domain_map+0x1e/0x54
> [ 0.977133]      irq_domain_associate_locked+0x42/0xe4
> [ 0.977363]      irq_create_mapping_affinity+0x98/0xd4
> [ 0.977570]      sbi_ipi_init+0x70/0x142
> [ 0.977744]      init_IRQ+0xfe/0x11a
> [ 0.977906]      start_kernel+0x4ae/0x790
> [ 0.978082]  }
> [ 0.978151]  ... key at: [<ffffffff824bbee0>] irq_desc_lock_class+0x0/0x10
> [ 0.978389]  ... acquired at:
> [ 0.978494]    mark_lock+0x3fe/0x8c2
> [ 0.978624]    __lock_acquire+0xa82/0x1f5c
> [ 0.978766]    lock_acquire+0xf0/0x292
> [ 0.978897]    _raw_spin_lock+0x2c/0x40
> [ 0.979029]    handle_fasteoi_irq+0x24/0x1da
> [ 0.979171]    generic_handle_domain_irq+0x1c/0x2a
> [ 0.979326]    plic_handle_irq+0x7e/0xf0
> [ 0.979460]    generic_handle_domain_irq+0x1c/0x2a
> [ 0.979618]    riscv_intc_irq+0x2e/0x46
> [ 0.979751]    handle_riscv_irq+0x4a/0x74
> [ 0.979888]    call_on_irq_stack+0x32/0x40
>
> [ 0.980110]
> stack backtrace:
> [ 0.980358] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
> [ 0.980662] Hardware name: riscv-virtio,qemu (DT)
> [ 0.980913] Call Trace:
> [ 0.981042] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
> [ 0.981246] [<ffffffff80ae020a>] show_stack+0x2c/0x38
> [ 0.981456] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
> [ 0.981648] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
> [ 0.981813] [<ffffffff80ae17a4>] print_irq_inversion_bug.part.0+0x162/0x176
> [ 0.982031] [<ffffffff8007c6e6>] mark_lock+0x3fe/0x8c2
> [ 0.982198] [<ffffffff8007d888>] __lock_acquire+0xa82/0x1f5c
> [ 0.982377] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
> [ 0.982549] [<ffffffff80af9962>] _raw_spin_lock+0x2c/0x40
> [ 0.982721] [<ffffffff8008f3fe>] handle_fasteoi_irq+0x24/0x1da
> [ 0.982904] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
> [ 0.983112] [<ffffffff80581dc0>] plic_handle_irq+0x7e/0xf0
> [ 0.983293] [<ffffffff8008a4a4>] generic_handle_domain_irq+0x1c/0x2a
> [ 0.983495] [<ffffffff8057fb1a>] riscv_intc_irq+0x2e/0x46
> [ 0.983671] [<ffffffff80aedb4c>] handle_riscv_irq+0x4a/0x74
> [ 0.983856] [<ffffffff80afa756>] call_on_irq_stack+0x32/0x40
>
> When I switch to -machine virt,aia=aplic-imsic (same config as above), I
> get:
>
> [ 0.971406] ============================================
> [ 0.971439] WARNING: possible recursive locking detected
> [ 0.971497] 6.8.0-rc1-00039-gd9b9d6eb987f #1122 Not tainted
> [ 0.971583] --------------------------------------------
> [ 0.971612] swapper/0/1 is trying to acquire lock:
> [ 0.971662] ffffaf83fefa8e78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x92/0x146
> [ 0.971927]
> but task is already holding lock:
> [ 0.971975] ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
> [ 0.972045]
> other info that might help us debug this:
> [ 0.972085]  Possible unsafe locking scenario:
>
> [ 0.972114]        CPU0
> [ 0.972133]        ----
> [ 0.972153]   lock(&lpriv->ids_lock);
> [ 0.972191]   lock(&lpriv->ids_lock);
> [ 0.972228]
> *** DEADLOCK ***
>
> [ 0.972258]  May be due to missing lock nesting notation
>
> [ 0.972306] 6 locks held by swapper/0/1:
> [ 0.972338]  #0: ffffaf8081f65970 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x6a/0x10a
> [ 0.972413]  #1: ffffaf808217c240 (&desc->request_mutex){+.+.}-{3:3}, at: __setup_irq+0xa2/0x5da
> [ 0.972492]  #2: ffffaf808217c0c8 (&irq_desc_lock_class){....}-{2:2}, at: __setup_irq+0xbe/0x5da
> [ 0.972555]  #3: ffffffff81892ac0 (mask_lock){....}-{2:2}, at: irq_setup_affinity+0x38/0xc6
> [ 0.972617]  #4: ffffffff81892a80 (tmp_mask_lock){....}-{2:2}, at: irq_do_set_affinity+0x3a/0x164
> [ 0.972681]  #5: ffffaf83fef6ee78 (&lpriv->ids_lock){-...}-{2:2}, at: imsic_vector_move+0x86/0x146
> [ 0.972753]
> stack backtrace:
> [ 0.972852] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc1-00039-gd9b9d6eb987f #1122
> [ 0.972900] Hardware name: riscv-virtio,qemu (DT)
> [ 0.972987] Call Trace:
> [ 0.973019] [<ffffffff80007198>] dump_backtrace+0x1c/0x24
> [ 0.973054] [<ffffffff80ae020a>] show_stack+0x2c/0x38
> [ 0.973083] [<ffffffff80aedac4>] dump_stack_lvl+0x5a/0x7c
> [ 0.973112] [<ffffffff80aedafa>] dump_stack+0x14/0x1c
> [ 0.973139] [<ffffffff8007ad5e>] print_deadlock_bug+0x282/0x328
> [ 0.973168] [<ffffffff8007e15c>] __lock_acquire+0x1356/0x1f5c
> [ 0.973198] [<ffffffff8007f59e>] lock_acquire+0xf0/0x292
> [ 0.973225] [<ffffffff80af9adc>] _raw_spin_lock_irqsave+0x3a/0x64
> [ 0.973255] [<ffffffff80581210>] imsic_vector_move+0x92/0x146
> [ 0.973285] [<ffffffff80581a04>] imsic_irq_set_affinity+0x8e/0xc6
> [ 0.973315] [<ffffffff8008c86a>] irq_do_set_affinity+0x142/0x164
> [ 0.973345] [<ffffffff8008cc22>] irq_setup_affinity+0x68/0xc6
> [ 0.973374] [<ffffffff8008fa82>] irq_startup+0x72/0x13c
> [ 0.973401] [<ffffffff8008d40c>] __setup_irq+0x4de/0x5da
> [ 0.973430] [<ffffffff8008d5d4>] request_threaded_irq+0xcc/0x12e
> [ 0.973460] [<ffffffff806346d8>] vp_find_vqs_msix+0x114/0x376
> [ 0.973491] [<ffffffff80634970>] vp_find_vqs+0x36/0x136
> [ 0.973518] [<ffffffff80633280>] vp_modern_find_vqs+0x16/0x4e
> [ 0.973547] [<ffffffff80ab31f8>] p9_virtio_probe+0x8e/0x31c
> [ 0.973576] [<ffffffff8062d982>] virtio_dev_probe+0x154/0x1fc
> [ 0.973605] [<ffffffff80693738>] really_probe+0x82/0x210
> [ 0.973632] [<ffffffff80693922>] __driver_probe_device+0x5c/0xd0
> [ 0.973661] [<ffffffff806939c2>] driver_probe_device+0x2c/0xb0
> [ 0.973690] [<ffffffff80693b46>] __driver_attach+0x72/0x10a
> [ 0.973718] [<ffffffff8069191a>] bus_for_each_dev+0x60/0xac
> [ 0.973746] [<ffffffff80693164>] driver_attach+0x1a/0x22
> [ 0.973773] [<ffffffff80692ade>] bus_add_driver+0xd4/0x19e
> [ 0.973801] [<ffffffff8069487e>] driver_register+0x3e/0xd8
> [ 0.973829] [<ffffffff8062d1ce>] register_virtio_driver+0x1c/0x2a
> [ 0.973858] [<ffffffff80c3da52>] p9_virtio_init+0x36/0x56
> [ 0.973887] [<ffffffff800028fe>] do_one_initcall+0x80/0x268
> [ 0.973915] [<ffffffff80c01144>] kernel_init_freeable+0x296/0x300
> [ 0.973944] [<ffffffff80af05dc>] kernel_init+0x1e/0x10a
> [ 0.973972] [<ffffffff80afa716>] ret_from_fork+0xe/0x1c
>
> FWIW, the full Qemu command I used was as follows:
>
> sudo /home/andrea/Downloads/qemu/build/qemu-system-riscv64 \
>     -append "root=/dev/root rw rootfstype=9p rootflags=version=9p2000.L,trans=virtio,cache=mmap,access=any raid=noautodetect security=none loglevel=7" \
>     -cpu rv64,sv57=off,svadu=off,svnapot=off \
>     -device virtio-net-device,netdev=net0 \
>     -device virtio-rng-device,rng=rng0 \
>     -device virtio-9p-pci,fsdev=root,mount_tag=/dev/root \
>     -fsdev local,id=root,path=/home/andrea/Downloads/jammy/,security_model=none \
>     -kernel /home/andrea/linux/arch/riscv/boot/Image \
>     -m 16G \
>     -machine virt,aia=<either "none" or "aplic-imsic"> \
>     -monitor telnet:127.0.0.1:55555,server,nowait \
>     -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 \
>     -nographic \
>     -object rng-random,filename=/dev/urandom,id=rng0 \
>     -serial mon:stdio \
>     -smp 5
>
> Andrea
Anup!

On Thu, Feb 15 2024 at 20:59, Thomas Gleixner wrote:
> I'm going over the rest of the series after I dealt with my other patch
> backlog.

Aside of the nitpicks I had, this looks pretty reasonable.

Thanks,

        tglx
On Sat, Feb 17, 2024 at 2:35 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Anup!
>
> On Thu, Feb 15 2024 at 20:59, Thomas Gleixner wrote:
> > I'm going over the rest of the series after I dealt with my other patch
> > backlog.
>
> Aside of the nitpicks I had, this looks pretty reasonable.

Thanks for your review. I have sent the v13 series. Please have a look.

Regards,
Anup

>
> Thanks,
>
>         tglx