Message ID | 20240904014426.3404397-1-jjang@nvidia.com (mailing list archive) |
---|---|
State | New |
Series | selftest: drivers: Add support to check duplicate hwirq |
On 2024/9/4 9:44 AM, Joseph Jang wrote:
> Validate there are no duplicate hwirq from the irq debug
> file system /sys/kernel/debug/irq/irqs/* per chip name.
>
> One example log show 2 duplicated hwirq in the irq debug
> file system.
>
> $ sudo cat /sys/kernel/debug/irq/irqs/163
> handler: handle_fasteoi_irq
> device: 0019:00:00.0
> <SNIP>
> node: 1
> affinity: 72-143
> effectiv: 76
> domain: irqchip@0x0000100022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
>
> $ sudo cat /sys/kernel/debug/irq/irqs/174
> handler: handle_fasteoi_irq
> device: 0039:00:00.0
> <SNIP>
> node: 3
> affinity: 216-287
> effectiv: 221
> domain: irqchip@0x0000300022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
>
> The irq-check.sh can help to collect hwirq and chip name from
> /sys/kernel/debug/irq/irqs/* and print error log when find duplicate
> hwirq per chip name.
>
> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
>
> Signed-off-by: Joseph Jang <jjang@nvidia.com>
> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
> ---
>  tools/testing/selftests/drivers/irq/Makefile  |  5 +++
>  tools/testing/selftests/drivers/irq/config    |  2 +
>  .../selftests/drivers/irq/irq-check.sh        | 39 +++++++++++++++++++
>  3 files changed, 46 insertions(+)
>  create mode 100644 tools/testing/selftests/drivers/irq/Makefile
>  create mode 100644 tools/testing/selftests/drivers/irq/config
>  create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh
>
> diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
> new file mode 100644
> index 000000000000..d6998017c861
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +TEST_PROGS := irq-check.sh
> +
> +include ../../lib.mk
> diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
> new file mode 100644
> index 000000000000..a53d3b713728
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/config
> @@ -0,0 +1,2 @@
> +CONFIG_GENERIC_IRQ_DEBUGFS=y
> +CONFIG_GENERIC_IRQ_INJECTION=y
> diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
> new file mode 100755
> index 000000000000..e784777043a1
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/irq-check.sh
> @@ -0,0 +1,39 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +# This script need root permission
> +uid=$(id -u)
> +if [ $uid -ne 0 ]; then
> +	echo "SKIP: Must be run as root"
> +	exit 4
> +fi
> +
> +# Ensure debugfs is mounted
> +mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
> +if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
> +	echo "SKIP: irq debugfs not found"
> +	exit 4
> +fi
> +
> +# Traverse the irq debug file system directory to collect chip_name and hwirq
> +hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
> +	# Read chip name and hwirq from the irq_file
> +	chip_name=$(cat "$irq_file" | grep -m 1 'chip:' | awk '{print $2}')
> +	hwirq=$(cat "$irq_file" | grep -m 1 'hwirq:' | awk '{print $2}' )
> +
> +	if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
> +		continue
> +	fi
> +
> +	echo "$chip_name $hwirq"
> +done)
> +
> +dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
> +
> +if [ -n "$dup_hwirq_list" ]; then
> +	echo "ERROR: Found duplicate hwirq"
> +	echo "$dup_hwirq_list"
> +	exit 1
> +fi
> +
> +exit 0

Hi Tglx,

I followed your suggestions https://www.mail-archive.com/linux-kselftest@vger.kernel.org/msg16952.html to enable IRQ DEBUG_FS and create a new script to scan duplicated hwirq. If you have available time, would you please help to take a look at the new patch again?


https://lore.kernel.org/all/20240904014426.3404397-1-jjang@nvidia.com/T/


Hi Shuah,

If you have time, could you help to take a look at the new patch?

Thank you,
Joseph.
On 10/17/24 22:29, Joseph Jang wrote:
>
>
> On 2024/9/4 9:44 AM, Joseph Jang wrote:
>> Validate there are no duplicate hwirq from the irq debug
>> file system /sys/kernel/debug/irq/irqs/* per chip name.
>>
>> One example log show 2 duplicated hwirq in the irq debug
>> file system.
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/163
>> handler: handle_fasteoi_irq
>> device: 0019:00:00.0
>> <SNIP>
>> node: 1
>> affinity: 72-143
>> effectiv: 76
>> domain: irqchip@0x0000100022040000-3
>> hwirq: 0xc8000000
>> chip: ITS-MSI
>> flags: 0x20
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/174
>> handler: handle_fasteoi_irq
>> device: 0039:00:00.0
>> <SNIP>
>> node: 3
>> affinity: 216-287
>> effectiv: 221
>> domain: irqchip@0x0000300022040000-3
>> hwirq: 0xc8000000
>> chip: ITS-MSI
>> flags: 0x20
>>
>> The irq-check.sh can help to collect hwirq and chip name from
>> /sys/kernel/debug/irq/irqs/* and print error log when find duplicate
>> hwirq per chip name.
>>
>> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
>> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
>>
>> Signed-off-by: Joseph Jang <jjang@nvidia.com>
>> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
>> ---
>>  tools/testing/selftests/drivers/irq/Makefile  |  5 +++
>>  tools/testing/selftests/drivers/irq/config    |  2 +
>>  .../selftests/drivers/irq/irq-check.sh        | 39 +++++++++++++++++++
>>  3 files changed, 46 insertions(+)
>>  create mode 100644 tools/testing/selftests/drivers/irq/Makefile
>>  create mode 100644 tools/testing/selftests/drivers/irq/config
>>  create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh
>>
>> diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
>> new file mode 100644
>> index 000000000000..d6998017c861
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/Makefile
>> @@ -0,0 +1,5 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +TEST_PROGS := irq-check.sh
>> +
>> +include ../../lib.mk
>> diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
>> new file mode 100644
>> index 000000000000..a53d3b713728
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/config
>> @@ -0,0 +1,2 @@
>> +CONFIG_GENERIC_IRQ_DEBUGFS=y
>> +CONFIG_GENERIC_IRQ_INJECTION=y
>> diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
>> new file mode 100755
>> index 000000000000..e784777043a1
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/irq-check.sh
>> @@ -0,0 +1,39 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +# This script need root permission
>> +uid=$(id -u)
>> +if [ $uid -ne 0 ]; then
>> +	echo "SKIP: Must be run as root"
>> +	exit 4
>> +fi
>> +
>> +# Ensure debugfs is mounted
>> +mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
>> +if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
>> +	echo "SKIP: irq debugfs not found"
>> +	exit 4
>> +fi
>> +
>> +# Traverse the irq debug file system directory to collect chip_name and hwirq
>> +hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
>> +	# Read chip name and hwirq from the irq_file
>> +	chip_name=$(cat "$irq_file" | grep -m 1 'chip:' | awk '{print $2}')
>> +	hwirq=$(cat "$irq_file" | grep -m 1 'hwirq:' | awk '{print $2}' )
>> +
>> +	if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
>> +		continue
>> +	fi
>> +
>> +	echo "$chip_name $hwirq"
>> +done)
>> +
>> +dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
>> +
>> +if [ -n "$dup_hwirq_list" ]; then
>> +	echo "ERROR: Found duplicate hwirq"
>> +	echo "$dup_hwirq_list"
>> +	exit 1
>> +fi
>> +
>> +exit 0
>
> Hi Tglx,
>
> I followed your suggestions https://www.mail-archive.com/linux-kselftest@vger.kernel.org/msg16952.html to enable IRQ DEBUG_FS and create a new script to scan duplicated hwirq. If you have available time, would you please help to take a look at the new patch again?
>
>
> https://lore.kernel.org/all/20240904014426.3404397-1-jjang@nvidia.com/T/
>
>
> Hi Shuah,
>
> If you have time, could you help to take a look at the new patch?
>

Once Thomas reviews this and gives me okay - I will accept the patch.

thanks,
-- Shuah
On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
> Validate there are no duplicate hwirq from the irq debug
> file system /sys/kernel/debug/irq/irqs/* per chip name.
>
> One example log show 2 duplicated hwirq in the irq debug
> file system.
>
> $ sudo cat /sys/kernel/debug/irq/irqs/163
> handler: handle_fasteoi_irq
> device: 0019:00:00.0
> <SNIP>
> node: 1
> affinity: 72-143
> effectiv: 76
> domain: irqchip@0x0000100022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
>
> $ sudo cat /sys/kernel/debug/irq/irqs/174
> handler: handle_fasteoi_irq
> device: 0039:00:00.0
> <SNIP>
> node: 3
> affinity: 216-287
> effectiv: 221
> domain: irqchip@0x0000300022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
>
> The irq-check.sh can help to collect hwirq and chip name from
> /sys/kernel/debug/irq/irqs/* and print error log when find duplicate
> hwirq per chip name.
>
> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/

I don't know enough about this issue to understand the details. It
seems like you look for duplicate hwirqs in chips with the same name,
e.g., "ITS-MSI" in this case? That name seems too generic to me
(might there be several instances of "ITS-MSI" in a system?)

Also, the name may come from chip->irq_print_chip(), so it apparently
relies on irqchip drivers to make the names unique if there are
multiple instances?

I would have expected looking for duplicates inside something more
specific, like "irqchip@0x0000300022040000-3". But again, I don't
know enough about the problem to speak confidently here.

Cosmetic nits:

  - Tweak subject to match history (use "git log --oneline
    tools/testing/selftests/drivers/" to see it), e.g.,

      selftests: irq: Add check for duplicate hwirq

  - Rewrap commit log to fill 75 columns. No point in using shorter
    lines.

  - Indent the "$ sudo cat ..." block by a couple spaces since it's
    effectively a quotation, not part of the main text body.

  - Possibly include sample output of irq-check.sh (also indented as a
    quote) when run on the system where you manually found the
    duplicate via "sudo cat /sys/kernel/debug/irq/irqs/..."

  - Reword "The irq-check.sh can help ..." to something like this:

      Add an irq-check.sh test to report errors when there are
      duplicate hwirqs per chip name.

  - Since the kernel patch has already been merged, cite it like this
    instead of using the https://lore URL:

      db744ddd59be ("PCI/MSI: Prevent MSI hardware interrupt number truncation")

> Signed-off-by: Joseph Jang <jjang@nvidia.com>
> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
> ---
>  tools/testing/selftests/drivers/irq/Makefile  |  5 +++
>  tools/testing/selftests/drivers/irq/config    |  2 +
>  .../selftests/drivers/irq/irq-check.sh        | 39 +++++++++++++++++++
>  3 files changed, 46 insertions(+)
>  create mode 100644 tools/testing/selftests/drivers/irq/Makefile
>  create mode 100644 tools/testing/selftests/drivers/irq/config
>  create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh
>
> diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
> new file mode 100644
> index 000000000000..d6998017c861
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +TEST_PROGS := irq-check.sh
> +
> +include ../../lib.mk
> diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
> new file mode 100644
> index 000000000000..a53d3b713728
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/config
> @@ -0,0 +1,2 @@
> +CONFIG_GENERIC_IRQ_DEBUGFS=y
> +CONFIG_GENERIC_IRQ_INJECTION=y
> diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
> new file mode 100755
> index 000000000000..e784777043a1
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/irq-check.sh
> @@ -0,0 +1,39 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +# This script need root permission
> +uid=$(id -u)
> +if [ $uid -ne 0 ]; then
> +	echo "SKIP: Must be run as root"
> +	exit 4
> +fi
> +
> +# Ensure debugfs is mounted
> +mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
> +if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
> +	echo "SKIP: irq debugfs not found"
> +	exit 4
> +fi
> +
> +# Traverse the irq debug file system directory to collect chip_name and hwirq
> +hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
> +	# Read chip name and hwirq from the irq_file
> +	chip_name=$(cat "$irq_file" | grep -m 1 'chip:' | awk '{print $2}')
> +	hwirq=$(cat "$irq_file" | grep -m 1 'hwirq:' | awk '{print $2}' )
> +
> +	if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
> +		continue
> +	fi
> +
> +	echo "$chip_name $hwirq"
> +done)
> +
> +dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
> +
> +if [ -n "$dup_hwirq_list" ]; then
> +	echo "ERROR: Found duplicate hwirq"
> +	echo "$dup_hwirq_list"
> +	exit 1
> +fi
> +
> +exit 0
> --
> 2.34.1
>
On 2024/10/19 3:34 AM, Bjorn Helgaas wrote:
> On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
>> Validate there are no duplicate hwirq from the irq debug
>> file system /sys/kernel/debug/irq/irqs/* per chip name.
>>
>> One example log show 2 duplicated hwirq in the irq debug
>> file system.
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/163
>> handler: handle_fasteoi_irq
>> device: 0019:00:00.0
>> <SNIP>
>> node: 1
>> affinity: 72-143
>> effectiv: 76
>> domain: irqchip@0x0000100022040000-3
>> hwirq: 0xc8000000
>> chip: ITS-MSI
>> flags: 0x20
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/174
>> handler: handle_fasteoi_irq
>> device: 0039:00:00.0
>> <SNIP>
>> node: 3
>> affinity: 216-287
>> effectiv: 221
>> domain: irqchip@0x0000300022040000-3
>> hwirq: 0xc8000000
>> chip: ITS-MSI
>> flags: 0x20
>>
>> The irq-check.sh can help to collect hwirq and chip name from
>> /sys/kernel/debug/irq/irqs/* and print error log when find duplicate
>> hwirq per chip name.
>>
>> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
>> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
>
> I don't know enough about this issue to understand the details. It
> seems like you look for duplicate hwirqs in chips with the same name,
> e.g., "ITS-MSI" in this case? That name seems too generic to me
> (might there be several instances of "ITS-MSI" in a system?)
>

As far as I know, each PCIe device typically has only one ITS-MSI controller.
Having multiple ITS-MSI instances for the same device would lead to
confusion and potential conflicts in interrupt routing.

> Also, the name may come from chip->irq_print_chip(), so it apparently
> relies on irqchip drivers to make the names unique if there are
> multiple instances?
>
> I would have expected looking for duplicates inside something more
> specific, like "irqchip@0x0000300022040000-3". But again, I don't
> know enough about the problem to speak confidently here.
>

In our case, if we look for duplicates by different irq domains like
"irqchip@0x0000100022040000-3" and "irqchip@0x0000300022040000-3" as
in the following example:

$ sudo cat /sys/kernel/debug/irq/irqs/163
handler: handle_fasteoi_irq
device: 0019:00:00.0
<SNIP>
node: 1
affinity: 72-143
effectiv: 76
domain: irqchip@0x0000100022040000-3
hwirq: 0xc8000000
chip: ITS-MSI
flags: 0x20
$ sudo cat /sys/kernel/debug/irq/irqs/174
handler: handle_fasteoi_irq
device: 0039:00:00.0
<SNIP>
node: 3
affinity: 216-287
effectiv: 221
domain: irqchip@0x0000300022040000-3
hwirq: 0xc8000000
chip: ITS-MSI
flags: 0x20

We could not detect the duplicated hwirq number (0xc8000000) in this
case.

> Cosmetic nits:
>
>   - Tweak subject to match history (use "git log --oneline
>     tools/testing/selftests/drivers/" to see it), e.g.,
>
>       selftests: irq: Add check for duplicate hwirq
>
>   - Rewrap commit log to fill 75 columns. No point in using shorter
>     lines.
>
>   - Indent the "$ sudo cat ..." block by a couple spaces since it's
>     effectively a quotation, not part of the main text body.
>
>   - Possibly include sample output of irq-check.sh (also indented as a
>     quote) when run on the system where you manually found the
>     duplicate via "sudo cat /sys/kernel/debug/irq/irqs/..."
>
>   - Reword "The irq-check.sh can help ..." to something like this:
>
>       Add an irq-check.sh test to report errors when there are
>       duplicate hwirqs per chip name.
>
>   - Since the kernel patch has already been merged, cite it like this
>     instead of using the https://lore URL:
>
>       db744ddd59be ("PCI/MSI: Prevent MSI hardware interrupt number truncation")
>

If you agree to use the irq chip name ("ITS-MSI") to scan for duplicate
hwirq, I could send a version 2 patch to address the above suggestions.

Thank you,
Joseph.

>> Signed-off-by: Joseph Jang <jjang@nvidia.com>
>> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
>> ---
>>  tools/testing/selftests/drivers/irq/Makefile  |  5 +++
>>  tools/testing/selftests/drivers/irq/config    |  2 +
>>  .../selftests/drivers/irq/irq-check.sh        | 39 +++++++++++++++++++
>>  3 files changed, 46 insertions(+)
>>  create mode 100644 tools/testing/selftests/drivers/irq/Makefile
>>  create mode 100644 tools/testing/selftests/drivers/irq/config
>>  create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh
>>
>> diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
>> new file mode 100644
>> index 000000000000..d6998017c861
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/Makefile
>> @@ -0,0 +1,5 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +TEST_PROGS := irq-check.sh
>> +
>> +include ../../lib.mk
>> diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
>> new file mode 100644
>> index 000000000000..a53d3b713728
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/config
>> @@ -0,0 +1,2 @@
>> +CONFIG_GENERIC_IRQ_DEBUGFS=y
>> +CONFIG_GENERIC_IRQ_INJECTION=y
>> diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
>> new file mode 100755
>> index 000000000000..e784777043a1
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/irq-check.sh
>> @@ -0,0 +1,39 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +# This script need root permission
>> +uid=$(id -u)
>> +if [ $uid -ne 0 ]; then
>> +	echo "SKIP: Must be run as root"
>> +	exit 4
>> +fi
>> +
>> +# Ensure debugfs is mounted
>> +mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
>> +if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
>> +	echo "SKIP: irq debugfs not found"
>> +	exit 4
>> +fi
>> +
>> +# Traverse the irq debug file system directory to collect chip_name and hwirq
>> +hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
>> +	# Read chip name and hwirq from the irq_file
>> +	chip_name=$(cat "$irq_file" | grep -m 1 'chip:' | awk '{print $2}')
>> +	hwirq=$(cat "$irq_file" | grep -m 1 'hwirq:' | awk '{print $2}' )
>> +
>> +	if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
>> +		continue
>> +	fi
>> +
>> +	echo "$chip_name $hwirq"
>> +done)
>> +
>> +dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
>> +
>> +if [ -n "$dup_hwirq_list" ]; then
>> +	echo "ERROR: Found duplicate hwirq"
>> +	echo "$dup_hwirq_list"
>> +	exit 1
>> +fi
>> +
>> +exit 0
>> --
>> 2.34.1
>>
>
On Mon, Nov 11, 2024 at 03:21:36PM +0800, Joseph Jang wrote:
> On 2024/10/19 3:34 AM, Bjorn Helgaas wrote:
> > On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
> > > Validate there are no duplicate hwirq from the irq debug
> > > file system /sys/kernel/debug/irq/irqs/* per chip name.
> > >
> > > One example log show 2 duplicated hwirq in the irq debug
> > > file system.
> > >
> > > $ sudo cat /sys/kernel/debug/irq/irqs/163
> > > handler: handle_fasteoi_irq
> > > device: 0019:00:00.0
> > > <SNIP>
> > > node: 1
> > > affinity: 72-143
> > > effectiv: 76
> > > domain: irqchip@0x0000100022040000-3
> > > hwirq: 0xc8000000
> > > chip: ITS-MSI
> > > flags: 0x20
> > >
> > > $ sudo cat /sys/kernel/debug/irq/irqs/174
> > > handler: handle_fasteoi_irq
> > > device: 0039:00:00.0
> > > <SNIP>
> > > node: 3
> > > affinity: 216-287
> > > effectiv: 221
> > > domain: irqchip@0x0000300022040000-3
> > > hwirq: 0xc8000000
> > > chip: ITS-MSI
> > > flags: 0x20
> > >
> > > The irq-check.sh can help to collect hwirq and chip name from
> > > /sys/kernel/debug/irq/irqs/* and print error log when find duplicate
> > > hwirq per chip name.
> > >
> > > Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
> > > [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
> >
> > I don't know enough about this issue to understand the details. It
> > seems like you look for duplicate hwirqs in chips with the same name,
> > e.g., "ITS-MSI" in this case? That name seems too generic to me
> > (might there be several instances of "ITS-MSI" in a system?)
>
> As far as I know, each PCIe device typically has only one ITS-MSI controller.
> Having multiple ITS-MSI instances for the same device would lead to
> confusion and potential conflicts in interrupt routing.
>
> > Also, the name may come from chip->irq_print_chip(), so it apparently
> > relies on irqchip drivers to make the names unique if there are
> > multiple instances?
> >
> > I would have expected looking for duplicates inside something more
> > specific, like "irqchip@0x0000300022040000-3". But again, I don't
> > know enough about the problem to speak confidently here.
>
> In our case, if we look for duplicates by different irq domains like
> "irqchip@0x0000100022040000-3" and "irqchip@0x0000300022040000-3" as
> in the following example:
>
> $ sudo cat /sys/kernel/debug/irq/irqs/163
> handler: handle_fasteoi_irq
> device: 0019:00:00.0
> <SNIP>
> node: 1
> affinity: 72-143
> effectiv: 76
> domain: irqchip@0x0000100022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
> $ sudo cat /sys/kernel/debug/irq/irqs/174
> handler: handle_fasteoi_irq
> device: 0039:00:00.0
> <SNIP>
> node: 3
> affinity: 216-287
> effectiv: 221
> domain: irqchip@0x0000300022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
>
> We could not detect the duplicated hwirq number (0xc8000000) in this
> case.

Again, this is really out of my area, but based on
Documentation/core-api/irq/irq-domain.rst, I assumed the point of
hwirq was that hwirq numbers were local to an interrupt controller,
i.e., to an irq_domain. If that's the case, it should not be a
problem if hwirq number 0xc8000000 is used in two separate
irq_domains.

Bjorn
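The two scoping choices debated in this thread (per chip name versus per irq domain) can be compared directly on the two records quoted above. The following is only a minimal sketch, not part of the posted patch: the function name list_dup_hwirqs, its two arguments, and the temporary fixture directory are hypothetical, and the fixture files contain just the three fields that matter here. Like the posted script's `grep -m 1`, it takes the first matching line of each field in a file.

```shell
#!/bin/bash
# Sketch only: list_dup_hwirqs is a hypothetical helper, not the posted script.
#
# Usage: list_dup_hwirqs <dir> <key>
#   <dir>  directory of per-irq files in the debugfs text format
#   <key>  field that scopes the comparison: "chip" or "domain"
list_dup_hwirqs() {
	local dir=$1 key=$2 f

	for f in "$dir"/*; do
		[ -f "$f" ] || continue
		# One awk pass per file: keep the first scope value and first hwirq.
		awk -v key="$key:" '
			$1 == key && scope == "" { scope = $2 }
			$1 == "hwirq:" && hwirq == "" { hwirq = $2 }
			END { if (scope != "" && hwirq != "") print scope, hwirq }
		' "$f"
	done | sort | uniq -d
}

# Fixture: the domain/hwirq/chip values of irqs 163 and 174 from this thread.
tmp=$(mktemp -d)
printf 'domain: irqchip@0x0000100022040000-3\nhwirq: 0xc8000000\nchip: ITS-MSI\n' > "$tmp/163"
printf 'domain: irqchip@0x0000300022040000-3\nhwirq: 0xc8000000\nchip: ITS-MSI\n' > "$tmp/174"

by_chip=$(list_dup_hwirqs "$tmp" chip)
by_domain=$(list_dup_hwirqs "$tmp" domain)
rm -rf "$tmp"

echo "per chip:   ${by_chip:-none}"
echo "per domain: ${by_domain:-none}"
```

On this fixture the per-chip scope reports "ITS-MSI 0xc8000000", while the per-domain scope reports nothing, because the two records sit in different irq domains. That is the disagreement in the thread: whether the same hwirq value in two separate irq_domains should count as a duplicate at all.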
diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
new file mode 100644
index 000000000000..d6998017c861
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_PROGS := irq-check.sh
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
new file mode 100644
index 000000000000..a53d3b713728
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/config
@@ -0,0 +1,2 @@
+CONFIG_GENERIC_IRQ_DEBUGFS=y
+CONFIG_GENERIC_IRQ_INJECTION=y
diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
new file mode 100755
index 000000000000..e784777043a1
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/irq-check.sh
@@ -0,0 +1,39 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This script need root permission
+uid=$(id -u)
+if [ $uid -ne 0 ]; then
+	echo "SKIP: Must be run as root"
+	exit 4
+fi
+
+# Ensure debugfs is mounted
+mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
+if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
+	echo "SKIP: irq debugfs not found"
+	exit 4
+fi
+
+# Traverse the irq debug file system directory to collect chip_name and hwirq
+hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
+	# Read chip name and hwirq from the irq_file
+	chip_name=$(cat "$irq_file" | grep -m 1 'chip:' | awk '{print $2}')
+	hwirq=$(cat "$irq_file" | grep -m 1 'hwirq:' | awk '{print $2}' )
+
+	if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
+		continue
+	fi
+
+	echo "$chip_name $hwirq"
+done)
+
+dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
+
+if [ -n "$dup_hwirq_list" ]; then
+	echo "ERROR: Found duplicate hwirq"
+	echo "$dup_hwirq_list"
+	exit 1
+fi
+
+exit 0