
[2/2] PCI: fix system hang issue of Marvell SATA host controller

Message ID 1362666556-10036-1-git-send-email-yxlraid@gmail.com
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

yxlraid@gmail.com March 7, 2013, 2:29 p.m. UTC
From: Xiangliang Yu <yuxiangl@marvell.com>

Fix system hang issue: if first accessed resource file of BAR0 ~
BAR4, system will hang after executing lspci command
---
 drivers/pci/quirks.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

Comments

Bjorn Helgaas March 7, 2013, 4:28 p.m. UTC | #1
On Thu, Mar 7, 2013 at 7:29 AM,  <yxlraid@gmail.com> wrote:
> From: Xiangliang Yu <yuxiangl@marvell.com>
>
> Fix system hang issue: if first accessed resource file of BAR0 ~
> BAR4, system will hang after executing lspci command

This needs more explanation.  We've already read the BARs by the time
header quirks are run, so apparently it's not just the mere act of
accessing a BAR that causes a hang.

We need to know exactly what's going on here.  For example, do BARs
0-4 exist?  Does the device decode accesses to the regions described
by the BARs?  The PCI core has to know what resources the device uses,
so if the device decodes accesses, we can't just throw away the
start/end information.
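One way to answer the first two questions from userspace is to read the device's sysfs "resource" file, which the PCI core fills with one "start end flags" line per resource, in hex. The sketch below is a minimal parser under that assumption. The helper names are hypothetical, and the flag values are IORESOURCE_IO and IORESOURCE_MEM from the kernel's include/linux/ioport.h. It reports whether a BAR is implemented, whether it is an I/O BAR, and its size. It only parses text and never touches the ports the BAR describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Flag bits from the kernel's include/linux/ioport.h. */
#define IORESOURCE_IO  0x00000100ULL
#define IORESOURCE_MEM 0x00000200ULL

/* One parsed line of a sysfs "resource" file (hypothetical type). */
struct bar_info {
	unsigned long long start;
	unsigned long long end;
	unsigned long long flags;
};

/* Parse "start end flags" (hex, 0x prefixes allowed); false if malformed. */
static bool parse_resource_line(const char *line, struct bar_info *out)
{
	return sscanf(line, "%llx %llx %llx",
		      &out->start, &out->end, &out->flags) == 3;
}

/* An unimplemented BAR is reported as a line of zeros. */
static bool bar_present(const struct bar_info *b)
{
	return b->flags != 0;
}

static bool bar_is_io(const struct bar_info *b)
{
	return (b->flags & IORESOURCE_IO) != 0;
}

static bool bar_is_mem(const struct bar_info *b)
{
	return (b->flags & IORESOURCE_MEM) != 0;
}

/* The range is inclusive, so the size is end - start + 1. */
static unsigned long long bar_size(const struct bar_info *b)
{
	assert(bar_present(b) && b->end >= b->start);
	return b->end - b->start + 1;
}
```

For example, an illustrative line such as "0x0000000000001860 0x0000000000001867 0x0000000000040101" (not taken from the 9125) parses as an 8-byte I/O BAR.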

> ---
>  drivers/pci/quirks.c |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 0369fb6..d49f8dc 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -44,6 +44,21 @@ static void quirk_mmio_always_on(struct pci_dev *dev)
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_ANY_ID, PCI_ANY_ID,
>                                 PCI_CLASS_BRIDGE_HOST, 8, quirk_mmio_always_on);
>
> +/* The BAR0 ~ BAR4 of Marvell 9125 device can't be accessed
> +*  by IO resource file, and need to skip the files
> +*/
> +static void quirk_marvell_mask_bar(struct pci_dev *dev)
> +{
> +       int i;
> +
> +       for (i = 0; i < 5; i++)
> +               if (dev->resource[i].start)
> +                       dev->resource[i].start =
> +                               dev->resource[i].end = 0;
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9125,
> +                               quirk_marvell_mask_bar);
> +
>  /* The Mellanox Tavor device gives false positive parity errors
>   * Mark this device with a broken_parity_status, to allow
>   * PCI scanning code to "skip" this now blacklisted device.
> --
> 1.7.5.4
>
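The following userspace model makes the effect of the proposed quirk concrete. It copies the quirk's loop onto a simplified, hypothetical stand-in for struct resource. BARs 0-4 lose both their start and end, and BAR5 (index 5) is left alone. Like the real quirk, the model only edits the kernel-side bookkeeping. The quirk itself does not write the device's BARs or its Command register. That is why the device can keep decoding those ports after the core has lost track of them. The sample addresses are borrowed from the illustrative lspci listing later in this thread.

```c
#include <assert.h>

#define MODEL_NUM_BARS 6 /* BARs 0-5 of a type 0 header */

/* Hypothetical, simplified stand-in for the kernel's struct resource. */
struct res {
	unsigned long start;
	unsigned long end;
};

/*
 * Same loop as quirk_marvell_mask_bar(): for BARs 0-4, a resource
 * with a nonzero start gets both start and end cleared.  Nothing
 * here (or in the quirk) writes to the device itself.
 */
static void mask_bars_0_to_4(struct res r[MODEL_NUM_BARS])
{
	for (int i = 0; i < 5; i++)
		if (r[i].start)
			r[i].start = r[i].end = 0;
}
```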
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiangliang Yu March 8, 2013, 3:07 a.m. UTC | #2
Hi, Bjorn

> > Fix system hang issue: if first accessed resource file of BAR0 ~
> > BAR4, system will hang after executing lspci command
> 
> This needs more explanation.  We've already read the BARs by the time
> header quirks are run, so apparently it's not just the mere act of
> accessing a BAR that causes a hang.
> 
> We need to know exactly what's going on here.  For example, do BARs
> 0-4 exist?  Does the device decode accesses to the regions described
> by the BARs?  The PCI core has to know what resources the device uses,
> so if the device decodes accesses, we can't just throw away the
> start/end information.
BARs 0-4 do exist and the device has I/O space enabled, but when a user accesses the BAR resource files with the udevadm info command, the system hangs.
For example: udevadm info --attribute-walk --path=/sys/devices/pci-device/000:*.
Because the device is just an AHCI host controller, it doesn't need the BAR0-BAR4 resource files.
Is my explanation OK for the patch?


Bjorn Helgaas March 8, 2013, 4:19 a.m. UTC | #3
On Thu, Mar 7, 2013 at 8:07 PM, Xiangliang Yu <yuxiangl@marvell.com> wrote:
> Hi, Bjorn
>
>> > Fix system hang issue: if first accessed resource file of BAR0 ~
>> > BAR4, system will hang after executing lspci command
>>
>> This needs more explanation.  We've already read the BARs by the time
>> header quirks are run, so apparently it's not just the mere act of
>> accessing a BAR that causes a hang.
>>
>> We need to know exactly what's going on here.  For example, do BARs
>> 0-4 exist?  Does the device decode accesses to the regions described
>> by the BARs?  The PCI core has to know what resources the device uses,
>> so if the device decodes accesses, we can't just throw away the
>> start/end information.
> BARs 0-4 do exist and the device has I/O space enabled, but when a user accesses the BAR resource files with the udevadm info command, the system hangs.
> For example: udevadm info --attribute-walk --path=/sys/devices/pci-device/000:*.
> Because the device is just an AHCI host controller, it doesn't need the BAR0-BAR4 resource files.
> Is my explanation OK for the patch?

No, I still don't know what causes the hang; I only know that udevadm
can trigger it.  I don't want to just paper over the problem until we
know what the root cause is.

Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s<dev>
BASE_ADDRESS_0"?  "setpci -H1 -s<dev> BASE_ADDRESS_0"?
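Some context on these commands. Per the pciutils documentation, -H1 makes lspci and setpci access configuration space directly through Intel configuration mechanism #1. BASE_ADDRESS_0 names config offset 0x10. The sketch below uses a hypothetical helper and does arithmetic only, with no port I/O. It computes the CONFIG_ADDRESS dword that would be written to port 0xCF8 before the register is read through port 0xCFC. What it illustrates is that these commands read the BAR register itself, not the I/O range the BAR points at.

```c
#include <assert.h>
#include <stdint.h>

#define CONF1_ADDR_PORT    0xCF8 /* CONFIG_ADDRESS */
#define CONF1_DATA_PORT    0xCFC /* CONFIG_DATA */
#define PCI_BASE_ADDRESS_0 0x10  /* setpci's BASE_ADDRESS_0 */

/*
 * Dword written to CONFIG_ADDRESS under configuration mechanism #1;
 * the register is then read from CONFIG_DATA.  Arithmetic only:
 * no port I/O is performed here.
 */
static uint32_t conf1_address(unsigned bus, unsigned dev,
			      unsigned fn, unsigned reg)
{
	assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
	return 0x80000000u            /* enable bit */
	     | ((uint32_t)bus << 16)
	     | ((uint32_t)dev << 11)
	     | ((uint32_t)fn << 8)
	     | (reg & 0xFCu);         /* dword-aligned offset */
}
```

For device 00:1f.2, for instance, the BASE_ADDRESS_0 read would use CONFIG_ADDRESS 0x8000FA10.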

Xiangliang Yu March 8, 2013, 6:51 a.m. UTC | #4
Hi, Bjorn

> >> > Fix system hang issue: if first accessed resource file of BAR0 ~
> >> > BAR4, system will hang after executing lspci command
> >>
> >> This needs more explanation.  We've already read the BARs by the time
> >> header quirks are run, so apparently it's not just the mere act of
> >> accessing a BAR that causes a hang.
> >>
> >> We need to know exactly what's going on here.  For example, do BARs
> >> 0-4 exist?  Does the device decode accesses to the regions described
> >> by the BARs?  The PCI core has to know what resources the device uses,
> >> so if the device decodes accesses, we can't just throw away the
> >> start/end information.
> > BARs 0-4 do exist and the device has I/O space enabled, but when a user accesses the BAR resource files with the udevadm info command, the system hangs.
> > For example: udevadm info --attribute-walk --path=/sys/devices/pci-device/000:*.
> > Because the device is just an AHCI host controller, it doesn't need the BAR0-BAR4 resource files.
> > Is my explanation OK for the patch?
> 
> No, I still don't know what causes the hang; I only know that udevadm
> can trigger it.  I don't want to just paper over the problem until we
> know what the root cause is.
> 
> Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s<dev>
> BASE_ADDRESS_0"?  "setpci -H1 -s<dev> BASE_ADDRESS_0"?
Those commands are OK, because they can no longer find the device after its I/O ports have been accessed.
The root cause is that accessing the I/O ports puts the chip into a bad state. So the point of the patch is to avoid exposing the ability to access those I/O ports.

Bjorn Helgaas March 8, 2013, 5:01 p.m. UTC | #5
On Thu, Mar 7, 2013 at 11:51 PM, Xiangliang Yu <yuxiangl@marvell.com> wrote:
> Hi, Bjorn
>
>> >> > Fix system hang issue: if first accessed resource file of BAR0 ~
>> >> > BAR4, system will hang after executing lspci command
>> >>
>> >> This needs more explanation.  We've already read the BARs by the time
>> >> header quirks are run, so apparently it's not just the mere act of
>> >> accessing a BAR that causes a hang.
>> >>
>> >> We need to know exactly what's going on here.  For example, do BARs
>> >> 0-4 exist?  Does the device decode accesses to the regions described
>> >> by the BARs?  The PCI core has to know what resources the device uses,
>> >> so if the device decodes accesses, we can't just throw away the
>> >> start/end information.
>> > BARs 0-4 do exist and the device has I/O space enabled, but when a user accesses the BAR resource files with the udevadm info command, the system hangs.
>> > For example: udevadm info --attribute-walk --path=/sys/devices/pci-device/000:*.
>> > Because the device is just an AHCI host controller, it doesn't need the BAR0-BAR4 resource files.
>> > Is my explanation OK for the patch?
>>
>> No, I still don't know what causes the hang; I only know that udevadm
>> can trigger it.  I don't want to just paper over the problem until we
>> know what the root cause is.
>>
>> Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s<dev>
>> BASE_ADDRESS_0"?  "setpci -H1 -s<dev> BASE_ADDRESS_0"?
> Those commands are OK, because they can no longer find the device after its I/O ports have been accessed.
> The root cause is that accessing the I/O ports puts the chip into a bad state. So the point of the patch is to avoid exposing the ability to access those I/O ports.

Ah, so the problem is not with accessing the BAR in config space.  The
problem is with accessing the I/O port space mapped by the BAR.  Is
that right?
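The BAR encoding itself shows the distinction Bjorn draws here. Reading a BAR from config space only yields a 32-bit value. Bit 0 says whether it describes I/O space (1) or memory space (0), and the upper bits give the base address. The hang reportedly comes from the next step: issuing in/out cycles to the port range that value describes. Below is a minimal decoder with hypothetical names. It ignores the upper half of 64-bit memory BARs, and its sample values are chosen to match the illustrative lspci listing later in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a BAR: 1 = I/O space, 0 = memory space. */
static bool bar_is_io_space(uint32_t bar)
{
	return (bar & 0x1u) != 0;
}

/*
 * Base address held in a 32-bit BAR value.  I/O BARs keep flags in
 * bits 1:0, memory BARs in bits 3:0.  (The upper dword of a 64-bit
 * memory BAR is ignored in this sketch.)
 */
static uint32_t bar_base(uint32_t bar)
{
	return bar_is_io_space(bar) ? (bar & ~0x3u) : (bar & ~0xFu);
}
```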

Does "udevadm info --attribute-walk" really access the device address
space mapped by the BARs?  That seems surprising to me, and I don't
see any indication of it when I try it on an AHCI device on my system:

# udevadm info --attribute-walk --path=/sys/devices/pci0000:00/0000:00:1f.2

Udevadm info starts with the device specified by the devpath and then
walks up the chain of parent devices. It prints for every device
found, all possible attributes in the udev rules key format.
A rule to match, can be composed by the attributes of the device
and the attributes from one single parent device.

  looking at device '/devices/pci0000:00/0000:00:1f.2':
    KERNEL=="0000:00:1f.2"
    SUBSYSTEM=="pci"
    DRIVER=="ahci"
    ATTR{irq}=="40"
    ATTR{subsystem_vendor}=="0x17aa"
    ATTR{broken_parity_status}=="0"
    ATTR{class}=="0x010601"
    ATTR{consistent_dma_mask_bits}=="64"
    ATTR{dma_mask_bits}=="64"
    ATTR{local_cpus}=="00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f"
    ATTR{device}=="0x3b2f"
    ATTR{enable}=="1"
    ATTR{msi_bus}==""
    ATTR{local_cpulist}=="0-3"
    ATTR{vendor}=="0x8086"
    ATTR{subsystem_device}=="0x2168"
    ATTR{numa_node}=="-1"

  looking at parent device '/devices/pci0000:00':
    KERNELS=="pci0000:00"
    SUBSYSTEMS==""
    DRIVERS==""

Myron Stowe March 9, 2013, 3:18 a.m. UTC | #6
On Thu, Mar 7, 2013 at 11:51 PM, Xiangliang Yu <yuxiangl@marvell.com> wrote:
> Hi, Bjorn
>
>> >> > Fix system hang issue: if first accessed resource file of BAR0 ~
>> >> > BAR4, system will hang after executing lspci command
>> >>
>> >> This needs more explanation.  We've already read the BARs by the time
>> >> header quirks are run, so apparently it's not just the mere act of
>> >> accessing a BAR that causes a hang.
>> >>
>> >> We need to know exactly what's going on here.  For example, do BARs
>> >> 0-4 exist?  Does the device decode accesses to the regions described
>> >> by the BARs?  The PCI core has to know what resources the device uses,
>> >> so if the device decodes accesses, we can't just throw away the
>> >> start/end information.
>> > BARs 0-4 do exist and the device has I/O space enabled, but when a user accesses the BAR resource files with the udevadm info command, the system hangs.
>> > For example: udevadm info --attribute-walk --path=/sys/devices/pci-device/000:*.
>> > Because the device is just an AHCI host controller, it doesn't need the BAR0-BAR4 resource files.
>> > Is my explanation OK for the patch?
>>
>> No, I still don't know what causes the hang; I only know that udevadm
>> can trigger it.  I don't want to just paper over the problem until we
>> know what the root cause is.
>>
>> Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s<dev>
>> BASE_ADDRESS_0"?  "setpci -H1 -s<dev> BASE_ADDRESS_0"?
> Those commands are OK, because they can no longer find the device after its I/O ports have been accessed.

Xiangliang:

Sorry but I didn't understand your response above, could you elaborate
a little more?


Are the first five BARs of the suspect device all mapping to I/O port
space - i.e. similar to something like this (a capture and inclusion
of an 'lspci' of the suspect device would be nice to see):
  00:1f.2 SATA controller:
    Region 0: I/O ports at 1860 [size=8]
    Region 1: I/O ports at 1814 [size=4]
    Region 2: I/O ports at 1818 [size=8]
    Region 3: I/O ports at 1810 [size=4]
    Region 4: I/O ports at 1840 [size=32]
    Region 5: Memory at f2827000 (32-bit, non-prefetchable) [size=2K]

You have done a good job isolating the issue so far.  As Bjorn noted;
it's looking as if the problem is with accessing the I/O port space
mapped by the suspect device's BAR(s), not with accessing the BAR(s)
in the device's configuration space.

As you confirmed earlier, with the patch as proposed the suspect device
will still be actively decoding accesses to the regions described by
the BARs.  Because the device is actively decoding, the PCI core can't
just throw away the BARs' corresponding resource regions, as the patch
currently does, since another device could be added at a later time.

If a device were added later, the core might need to allocate
resources for it and, in the worst case, could end up allocating
resources that conflict with this suspect device, because the suspect
device's original resource allocations had been silently thrown away.
The result would be both devices believing they exclusively own the
same set (or subset) of I/O port addresses, and both actively decoding
accesses to them, which would obviously be disastrous.
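The scenario above comes down to an overlap check against whatever the core still has on record. The sketch below is a deliberately simplified linear model with hypothetical names, not the kernel's resource-tree code. It treats an entry whose start and end are both 0 as unassigned, which is the state the proposed quirk leaves BARs 0-4 in. While the real range is recorded, a later request for overlapping ports is rejected. Once the range is zeroed, the same request passes, even though the device may still be decoding those ports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of an inclusive I/O port range [start, end]. */
struct io_range {
	unsigned long start;
	unsigned long end;
};

/* Two inclusive ranges overlap unless one ends before the other begins. */
static bool ranges_overlap(struct io_range a, struct io_range b)
{
	return a.start <= b.end && b.start <= a.end;
}

/*
 * True if req collides with any range still on record.  start == end
 * == 0 is treated as "unassigned", the state the proposed quirk leaves
 * BARs 0-4 in, so such entries are skipped.
 */
static bool conflicts_with_known(struct io_range req,
				 const struct io_range *known, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (known[i].start == 0 && known[i].end == 0)
			continue;
		if (ranges_overlap(req, known[i]))
			return true;
	}
	return false;
}
```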

There is still something going on here that we do not understand.
Could you please capture the following information to help further
isolate the issue:
  A 'dmesg' log from a boot with both the "debug" and
"ignore_loglevel" boot parameters, a 'lspci -xxx -s<dev>' capture,
and a 'lspci -vv' capture.

Thanks,
 Myron

> The root cause is that accessing the I/O ports puts the chip into a bad state. So the point of the patch is to avoid exposing the ability to access those I/O ports.
Xiangliang Yu March 9, 2013, 2:49 p.m. UTC | #7
Hi, Bjorn

>> >> > Fix system hang issue: if first accessed resource file of BAR0 ~ 
>> >> > BAR4, system will hang after executing lspci command
>> >>
>> >> This needs more explanation.  We've already read the BARs by the 
>> >> time header quirks are run, so apparently it's not just the mere 
>> >> act of accessing a BAR that causes a hang.
>> >>
>> >> We need to know exactly what's going on here.  For example, do BARs
>> >> 0-4 exist?  Does the device decode accesses to the regions
>> >> described by the BARs?  The PCI core has to know what resources 
>> >> the device uses, so if the device decodes accesses, we can't just 
>> >> throw away the start/end information.
>> > BARs 0-4 do exist and the device has I/O space enabled, but when a user accesses the BAR resource files with the udevadm info command, the system hangs.
>> > For example: udevadm info --attribute-walk --path=/sys/devices/pci-device/000:*.
>> > Because the device is just an AHCI host controller, it doesn't need the BAR0-BAR4 resource files.
>> > Is my explanation OK for the patch?
>>
>> No, I still don't know what causes the hang; I only know that udevadm 
>> can trigger it.  I don't want to just paper over the problem until we 
>> know what the root cause is.
>>
>> Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s<dev> 
>> BASE_ADDRESS_0"?  "setpci -H1 -s<dev> BASE_ADDRESS_0"?
> Those commands are OK, because they can no longer find the device after its I/O ports have been accessed.
> The root cause is that accessing the I/O ports puts the chip into a bad state. So the point of the patch is to avoid exposing the ability to access those I/O ports.

> Ah, so the problem is not with accessing the BAR in config space.  The problem is with accessing the I/O port space mapped by the BAR.  Is that right?

Yes...

> Does "udevadm info --attribute-walk" really access the device address space mapped by the BARs?

Older versions may access that space; I just got this information from HP. I reduced the problem to running the following command:
cat /sys/devices/pci-device/**/resourceX

I want to set the resources of BAR0-BAR4 to 0 so that users can't access the I/O ports.

Any questions? Thanks!


Myron Stowe March 9, 2013, 11:24 p.m. UTC | #8
On Sat, Mar 9, 2013 at 7:49 AM, Xiangliang Yu <yuxiangl@marvell.com> wrote:
> Hi, Bjorn
>
>>> >> > Fix system hang issue: if first accessed resource file of BAR0 ~
>>> >> > BAR4, system will hang after executing lspci command
>>> >>
>>> >> This needs more explanation.  We've already read the BARs by the
>>> >> time header quirks are run, so apparently it's not just the mere
>>> >> act of accessing a BAR that causes a hang.
>>> >>
>>> >> We need to know exactly what's going on here.  For example, do BARs
>>> >> 0-4 exist?  Does the device decode accesses to the regions
>>> >> described by the BARs?  The PCI core has to know what resources
>>> >> the device uses, so if the device decodes accesses, we can't just
>>> >> throw away the start/end information.
>>> > BARs 0-4 do exist and the device has I/O space enabled, but when a user accesses the BAR resource files with the udevadm info command, the system hangs.
>>> > For example: udevadm info --attribute-walk --path=/sys/devices/pci-device/000:*.
>>> > Because the device is just an AHCI host controller, it doesn't need the BAR0-BAR4 resource files.
>>> > Is my explanation OK for the patch?
>>>
>>> No, I still don't know what causes the hang; I only know that udevadm
>>> can trigger it.  I don't want to just paper over the problem until we
>>> know what the root cause is.
>>>
>>> Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s<dev>
>>> BASE_ADDRESS_0"?  "setpci -H1 -s<dev> BASE_ADDRESS_0"?
>> Those commands are OK, because they can no longer find the device after its I/O ports have been accessed.
>> The root cause is that accessing the I/O ports puts the chip into a bad state. So the point of the patch is to avoid exposing the ability to access those I/O ports.
>
>> Ah, so the problem is not with accessing the BAR in config space.  The problem is with accessing the I/O port space mapped by the BAR.  Is that right?
>
> Yes...
>
>> Does "udevadm info --attribute-walk" really access the device address space mapped by the BARs?
>
> Older versions may access that space; I just got this information from HP. I reduced the problem to running the following command:
> cat /sys/devices/pci-device/**/resourceX
>
> I want to set the resources of BAR0-BAR4 to 0 so that users can't access the I/O ports.

I tried to explain earlier the possible issues with the approach that
is currently being put forth.  Please review that and if you have any
questions ask.

>
> Any questions? Thanks!

Googling and looking at the PCI IDs data base I see that the Marvell
9125 device has been around since sometime around 2010 and that there
even seem to be a number of follow-on iterations of the chip (i.e.
9128, 9120, ...).  It seems incredibly unlikely that Marvell made a
device that has been shipping for 2+ years with five I/O BARs that do
not work and we are only now finding out such.

Am I missing something relevant here?  Can you verify that this device
is indeed not new and has been successfully used in recent
platforms?


You just recently responded with  "... I just got the info from HP.
..." so I'm assuming this is an issue that has just been encountered
on some type of HP system - is this correct?  If so, do you have
access to the system to provide the logs I asked for earlier?  Also,
is there anything special or completely new about this platform that
would explain away the arguments for why this is probably not a
Marvell device issue?

At this point it seems more likely that there is an issue with the
BIOS of the HP system, perhaps a resource duplication/overlap issue
much like I talked about earlier.

To understand the root cause and not just band-aid over a symptom we
need to get the logs asked for from the system.  HP likely needs to
get involved and start participating and providing such at this point.

Again, the logs that would be helpful currently are: A 'dmesg' log
from the system which was booted using both the "debug" and
"ignore_loglevel" boot parameters, a 'lspci -xxx -s<dev>' capture
targeting the Marvell 9125 device, and a 'lspci -vv' capture of the
system's entire PCI hierarchy.

Myron Stowe March 11, 2013, 9:19 p.m. UTC | #9
On Mon, Mar 11, 2013 at 3:15 AM, Xiangliang Yu <yuxiangl@marvell.com> wrote:
> Hi, Myron
>
>> >>> >> > Fix system hang issue: if first accessed resource file of BAR0 ~
>> >>> >> > BAR4, system will hang after executing lspci command
>> >
>> > Any questions? Thanks!
>>
>> Googling and looking at the PCI IDs data base I see that the Marvell
>> 9125 device has been around since sometime around 2010 and that there
>> even seem to be a number of follow-on iterations of the chip (i.e.
>> 9128, 9120, ...).  It seems incredibly unlikely that Marvell made a
>> device that has been shipping for 2+ years with five I/O BARs that do
>> not work and we are only now finding out such.
> Only the 9125 has this issue.
>
>> Am I missing something relevant here?  Can you verify that this device
>> is indeed not new and has been successfully used in recent
>> platforms?
> The device can be used in recent platforms.

Could you please be a little more explicit (and I'll try to be more
specific in my questions) as I was not able to get much, if any,
understanding from the responses.

I would like to understand if the 9125 device has had issues
corresponding to accessing the I/O Port space mapped by its BARs from
the very beginning - i.e. there have been no platforms in the last 2+
years that have been able to successfully drive this device using its
I/O BAR accessing methods?

What seems more likely is that only now, due to some new and yet
unknown reason, are issues corresponding to accessing the I/O Port
space mapped by its BARs occurring - perhaps something to do with a
new processor or chipset.

Are you seeing any similar issues when booting Windows on the same platform?

This information could be helpful in tracking down the root cause.

>
>> You just recently responded with  "... I just got the info from HP.
>> ..." so I'm assuming this is an issue that has just been encountered
>> on some type of HP system - is this correct?  If so, do you have
>> access to the system to provide the logs I asked for earlier?  Also,
>> is there anything special or completely new about this platform that
>> would explain away the arguments for why this is probably not a
>> Marvell device issue?
> I can reproduce the issue with following platform:
> CPU: Intel i7-3770 3.40GHZ
> OS: centos 6.4

CentOS 6.4 uses a fairly old kernel by now (2.6.32).  Have you been able
to try an upstream kernel, and if so, what were the results?

>
> Now, the situation is like this:
> I captured a PCIe trace with an analyzer and found that the 1st BE is 0x1111 when accessing I/O port space. But the 9125 spec has a limitation: the BE must be 0x0100, to access the 2nd byte only. So the chip goes into a bad state.

Great, this is new, interesting, data.  Is the 9125 spec publicly
accessible and/or could you elaborate on the "some limitation"
comment?

I'm fairly sure that PCI Express supports byte-granular accesses to
I/O port space (I'll try to read up on this some more as I don't
usually work at this low of a level) and it seems unlikely that this
area would be broken in a chipset, especially an Intel one.

A byte enable (BE) of 0x1111 suggests the CPU did a 32-bit I/O port
read.  Does the 9125 device only support one-byte I/O port accesses
and when presented with larger request types it doesn't respond
properly?  I have to admit I don't know what the correct response
would be - perhaps a master abort.  Do you know what the PCI host
controller would return to the CPU so the CPU wouldn't hang in such a
case?

> Can you tell me what I can do to fix the issue? Thanks!

Once we understand the root cause I'm sure we'll be able to come up
with a solution.  Let's keep homing in on the problem for now until we
get to that understanding.
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas March 12, 2013, 4:21 p.m. UTC | #10
On Tue, Mar 12, 2013 at 3:22 AM, Xiangliang Yu <yuxiangl@marvell.com> wrote:
> Hi, Myron
>> > Now, the situation is like this:
>> > I captured the PCIe trace with an analyzer and found that the 1st BE is
>> > 0x1111 when accessing I/O port space. But the 9125 spec has a limitation:
>> > the BE must be 0x0100, to access the 2nd byte only. So the chip goes bad.
>>
>> Great, this is new, interesting, data.  Is the 9125 spec publicly
>> accessible and/or could you elaborate on the "some limitation"
>> comment?
> The 9125 spec is publicly accessible.

Please provide a URL for the spec.

>> A byte enable (BE) of 0x1111 suggests the CPU did a 32-bit I/O port
>> read.  Does the 9125 device only support one-byte I/O port accesses
>> and when presented with larger request types it doesn't respond
>> properly?
> Yes, the hardware engineer has confirmed the situation.

Please provide a URL for the erratum describing this 9125 issue.

Bjorn
Robert Hancock March 14, 2013, 4:16 a.m. UTC | #11
On 03/08/2013 09:18 PM, Myron Stowe wrote:
> On Thu, Mar 7, 2013 at 11:51 PM, Xiangliang Yu <yuxiangl@marvell.com> wrote:
>> Hi, Bjorn
>>
>>>>>> Fix system hang issue: if first accessed resource file of BAR0 ~
>>>>>> BAR4, system will hang after executing lspci command
>>>>>
>>>>> This needs more explanation.  We've already read the BARs by the time
>>>>> header quirks are run, so apparently it's not just the mere act of
>>>>> accessing a BAR that causes a hang.
>>>>>
>>>>> We need to know exactly what's going on here.  For example, do BARs
>>>>> 0-4 exist?  Does the device decode accesses to the regions described
>>>>> by the BARs?  The PCI core has to know what resources the device uses,
>>>>> so if the device decodes accesses, we can't just throw away the
>>>>> start/end information.
>>>> BARs 0-4 exist and the PCI device has I/O space enabled, but when a user
>>>> accesses the region files with the udevadm info command, the system hangs.
>>>> For example: udevadm info --attribute-walk
>>>> --path=/sys/device/pci-device/000:*.
>>>> Because the device is just an AHCI host controller, it doesn't need the
>>>> BAR0 ~ 4 region files.
>>>> Is my explanation ok for the patch?
>>>
>>> No, I still don't know what causes the hang; I only know that udevadm
>>> can trigger it.  I don't want to just paper over the problem until we
>>> know what the root cause is.
>>>
>>> Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s<dev>
>>> BASE_ADDRESS_0"?  "setpci -H1 -s<dev> BASE_ADDRESS_0"?
>> The commands are ok because the commands can't find the device after accessing IO port.
>
> Xiangliang:
>
> Sorry but I didn't understand your response above, could you elaborate
> a little more?
>
>
> Are the first five BARs of the suspect device all mapping to I/O port
> space - i.e. similar to something like this (a capture and inclusion
> of an 'lspci' of the suspect device would be nice to see):
>    00:1f.2 SATA controller:
>      Region 0: I/O ports at 1860 [size=8]
>      Region 1: I/O ports at 1814 [size=4]
>      Region 2: I/O ports at 1818 [size=8]
>      Region 3: I/O ports at 1810 [size=4]
>      Region 4: I/O ports at 1840 [size=32]
>      Region 5: Memory at f2827000 (32-bit, non-prefetchable) [size=2K]
>
> You have done a good job isolating the issue so far.  As Bjorn noted,
> it's looking as if the problem is with accessing the I/O port space
> mapped by the suspect device's BAR(s), not with accessing the BAR(s)
> in the device's configuration space.

It would seem so. My question is what is accessing the IO port space in 
the first place. BAR5 is the MMIO region used by the AHCI driver. BARs 
0-4 are the legacy SFF-compatible ATA ports. Nothing should be messing 
with those IO ports while AHCI is enabled. It's expected that doing that 
will break things.

If something in udev is randomly groveling around inside the resource 
files for those BARs in sysfs, that seems like a really bad thing.

>
> As you confirmed earlier, with the patch as proposed the suspect device
> will still actively decode accesses to the regions described by
> the BARs.  Because the device is actively decoding, the PCI core can't
> just throw away the BARs' corresponding resource regions, as the patch
> is currently doing, due to the possibility of another device being
> added at a later time.
>
> If a subsequent device were added later, the core may need to try and
> allocate resources for it and, in the worst case scenario, the core
> could end up allocating resources that conflict with this suspect
> device as a consequence of the suspect device's original resource
> allocations having been silently thrown away.  The result would be
> both devices believing they each exclusively own the same set (or
> subset) of I/O port mappings and thus both actively decoding accesses
> to them, a situation that would obviously be disastrous.
>
> There is still something going on here that we do not
> understand.  Could you please capture the following information to
> help further isolate the issue:
>    A 'dmesg' log from the system booted with both the
> "debug" and "ignore_loglevel" boot parameters, an 'lspci -xxx -s<dev>'
> capture, and an 'lspci -vv' capture.
>
> Thanks,
>   Myron
>
>> The root cause is that accessing the I/O ports makes the chip go bad. So the point of the patch is to not expose the I/O access capability.
>>
>>>
>>>>>
>>>>>> ---
>>>>>>   drivers/pci/quirks.c |   15 +++++++++++++++
>>>>>>   1 files changed, 15 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>>> index 0369fb6..d49f8dc 100644
>>>>>> --- a/drivers/pci/quirks.c
>>>>>> +++ b/drivers/pci/quirks.c
>>>>>> @@ -44,6 +44,21 @@ static void quirk_mmio_always_on(struct pci_dev *dev)
>>>>>>   DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_ANY_ID, PCI_ANY_ID,
>>>>>>                                  PCI_CLASS_BRIDGE_HOST, 8, quirk_mmio_always_on);
>>>>>>
>>>>>> +/* The BAR0 ~ BAR4 of Marvell 9125 device can't be accessed
>>>>>> +*  by IO resource file, and need to skip the files
>>>>>> +*/
>>>>>> +static void quirk_marvell_mask_bar(struct pci_dev *dev)
>>>>>> +{
>>>>>> +       int i;
>>>>>> +
>>>>>> +       for (i = 0; i < 5; i++)
>>>>>> +               if (dev->resource[i].start)
>>>>>> +                       dev->resource[i].start =
>>>>>> +                               dev->resource[i].end = 0;
>>>>>> +}
>>>>>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9125,
>>>>>> +                               quirk_marvell_mask_bar);
>>>>>> +
>>>>>>   /* The Mellanox Tavor device gives false positive parity errors
>>>>>>    * Mark this device with a broken_parity_status, to allow
>>>>>>    * PCI scanning code to "skip" this now blacklisted device.
>>>>>> --
>>>>>> 1.7.5.4
>>>>>>
Myron Stowe March 14, 2013, 3:02 p.m. UTC | #12
On Wed, Mar 13, 2013 at 10:16 PM, Robert Hancock <hancockrwd@gmail.com> wrote:
> On 03/08/2013 09:18 PM, Myron Stowe wrote:
>>
>> On Thu, Mar 7, 2013 at 11:51 PM, Xiangliang Yu <yuxiangl@marvell.com>
>> wrote:
>>>
>>> Hi, Bjorn
>>>
>>>>>>> Fix system hang issue: if first accessed resource file of BAR0 ~
>>>>>>> BAR4, system will hang after executing lspci command
>>>>>>
>>>>>>
>>>>>> This needs more explanation.  We've already read the BARs by the time
>>>>>> header quirks are run, so apparently it's not just the mere act of
>>>>>> accessing a BAR that causes a hang.
>>>>>>
>>>>>> We need to know exactly what's going on here.  For example, do BARs
>>>>>> 0-4 exist?  Does the device decode accesses to the regions described
>>>>>> by the BARs?  The PCI core has to know what resources the device uses,
>>>>>> so if the device decodes accesses, we can't just throw away the
>>>>>> start/end information.
>>>>>
>>>>> BARs 0-4 exist and the PCI device has I/O space enabled, but when a user
>>>>> accesses the region files with the udevadm info command, the system hangs.
>>>>> For example: udevadm info --attribute-walk
>>>>> --path=/sys/device/pci-device/000:*.
>>>>> Because the device is just an AHCI host controller, it doesn't need the
>>>>> BAR0 ~ 4 region files.
>>>>> Is my explanation ok for the patch?
>>>>
>>>>
>>>> No, I still don't know what causes the hang; I only know that udevadm
>>>> can trigger it.  I don't want to just paper over the problem until we
>>>> know what the root cause is.
>>>>
>>>> Does "lspci -H1 -vv" also cause a hang?  What about "setpci -s<dev>
>>>> BASE_ADDRESS_0"?  "setpci -H1 -s<dev> BASE_ADDRESS_0"?
>>>
>>> The commands are ok because the commands can't find the device after
>>> accessing IO port.
>>
>>
>> Xiangliang:
>>
>> Sorry but I didn't understand your response above, could you elaborate
>> a little more?
>>
>>
>> Are the first five BARs of the suspect device all mapping to I/O port
>> space - i.e. similar to something like this (a capture and inclusion
>> of an 'lspci' of the suspect device would be nice to see):
>>    00:1f.2 SATA controller:
>>      Region 0: I/O ports at 1860 [size=8]
>>      Region 1: I/O ports at 1814 [size=4]
>>      Region 2: I/O ports at 1818 [size=8]
>>      Region 3: I/O ports at 1810 [size=4]
>>      Region 4: I/O ports at 1840 [size=32]
>>      Region 5: Memory at f2827000 (32-bit, non-prefetchable) [size=2K]
>>
>> You have done a good job isolating the issue so far.  As Bjorn noted,
>> it's looking as if the problem is with accessing the I/O port space
>> mapped by the suspect device's BAR(s), not with accessing the BAR(s)
>> in the device's configuration space.
>
>
> It would seem so. My question is what is accessing the IO port space in the
> first place. BAR5 is the MMIO region used by the AHCI driver. BARs 0-4 are
> the legacy SFF-compatible ATA ports. Nothing should be messing with those IO
> ports while AHCI is enabled. It's expected that doing that will break
> things.
>
> If something in udev is randomly groveling around inside the resource files
> for those BARs in sysfs, that seems like a really bad thing.

Thanks, Robert.

I'll see if I can get someone knowledgeable about udev to look at this,
since it currently seems to be the suspect.

Hopefully this will lead somewhere, as the core isn't, and shouldn't
be, concerned with the contents or access limitations of those
regions.  If the 9125 doesn't respond correctly to 32-bit I/O port
reads and that ends up needing to be accounted for, it should be
handled by the driver and not by the PCI core.

>
>
>>
>> As you confirmed earlier, with the patch as proposed the suspect device
>> will still actively decode accesses to the regions described by
>> the BARs.  Because the device is actively decoding, the PCI core can't
>> just throw away the BARs' corresponding resource regions, as the patch
>> is currently doing, due to the possibility of another device being
>> added at a later time.
>>
>> If a subsequent device were added later, the core may need to try and
>> allocate resources for it and, in the worst case scenario, the core
>> could end up allocating resources that conflict with this suspect
>> device as a consequence of the suspect device's original resource
>> allocations having been silently thrown away.  The result would be
>> both devices believing they each exclusively own the same set (or
>> subset) of I/O port mappings and thus both actively decoding accesses
>> to them, a situation that would obviously be disastrous.
>>
>> There is still something going on here that we do not
>> understand.  Could you please capture the following information to
>> help further isolate the issue:
>>    A 'dmesg' log from the system booted with both the
>> "debug" and "ignore_loglevel" boot parameters, an 'lspci -xxx -s<dev>'
>> capture, and an 'lspci -vv' capture.
>>
>> Thanks,
>>   Myron
>>
>>> The root cause is that accessing the I/O ports makes the chip go bad.
>>> So the point of the patch is to not expose the I/O access capability.
>>>
>>>>
>>>>>>
>>>>>>> ---
>>>>>>>   drivers/pci/quirks.c |   15 +++++++++++++++
>>>>>>>   1 files changed, 15 insertions(+), 0 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>>>> index 0369fb6..d49f8dc 100644
>>>>>>> --- a/drivers/pci/quirks.c
>>>>>>> +++ b/drivers/pci/quirks.c
>>>>>>> @@ -44,6 +44,21 @@ static void quirk_mmio_always_on(struct pci_dev *dev)
>>>>>>>   DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_ANY_ID, PCI_ANY_ID,
>>>>>>>                                  PCI_CLASS_BRIDGE_HOST, 8, quirk_mmio_always_on);
>>>>>>>
>>>>>>> +/* The BAR0 ~ BAR4 of Marvell 9125 device can't be accessed
>>>>>>> +*  by IO resource file, and need to skip the files
>>>>>>> +*/
>>>>>>> +static void quirk_marvell_mask_bar(struct pci_dev *dev)
>>>>>>> +{
>>>>>>> +       int i;
>>>>>>> +
>>>>>>> +       for (i = 0; i < 5; i++)
>>>>>>> +               if (dev->resource[i].start)
>>>>>>> +                       dev->resource[i].start =
>>>>>>> +                               dev->resource[i].end = 0;
>>>>>>> +}
>>>>>>> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9125,
>>>>>>> +                               quirk_marvell_mask_bar);
>>>>>>> +
>>>>>>>   /* The Mellanox Tavor device gives false positive parity errors
>>>>>>>    * Mark this device with a broken_parity_status, to allow
>>>>>>>    * PCI scanning code to "skip" this now blacklisted device.
>>>>>>> --
>>>>>>> 1.7.5.4
>>>>>>>
Myron Stowe March 14, 2013, 3:03 p.m. UTC | #13
On Wed, Mar 13, 2013 at 3:40 AM, Xiangliang Yu <yuxiangl@marvell.com> wrote:
> Hi, Bjorn
>
>> >> > Now, the situation is like this:
>> >> > I captured the PCIe trace with an analyzer and found that the 1st BE is
>> >> > 0x1111 when accessing I/O port space. But the 9125 spec has a limitation:
>> >> > the BE must be 0x0100, to access the 2nd byte only. So the chip goes bad.
>> >>
>> >> Great, this is new, interesting, data.  Is the 9125 spec publicly
>> >> accessible and/or could you elaborate on the "some limitation"
>> >> comment?
>> > The 9125 spec is publicly accessible.
> If you can't see the pic, please open the attachment. Thanks!

Neither Bjorn nor I could see the picture (from the previous thread
or this thread's attachment).

Myron Stowe March 17, 2013, 12:13 a.m. UTC | #14
On Thu, Mar 14, 2013 at 9:03 AM, Myron Stowe <myron.stowe@gmail.com> wrote:
> On Wed, Mar 13, 2013 at 3:40 AM, Xiangliang Yu <yuxiangl@marvell.com> wrote:
>> Hi, Bjorn
>>
>>> >> > Now, the situation is like this:
>>> >> > I captured the PCIe trace with an analyzer and found that the 1st BE is
>>> >> > 0x1111 when accessing I/O port space. But the 9125 spec has a limitation:
>>> >> > the BE must be 0x0100, to access the 2nd byte only. So the chip goes bad.
>>> >>
>>> >> Great, this is new, interesting, data.  Is the 9125 spec publicly
>>> >> accessible and/or could you elaborate on the "some limitation"
>>> >> comment?
>>> > The 9125 spec is publicly accessible.
>> If you can't see the pic, please open the attachment. Thanks!
>
> Neither Bjorn nor I could see the picture (from the previous thread
> or this thread's attachment).
>
>>
>>

Just an FYI that I proposed a different tack at
https://lkml.org/lkml/2013/3/16/168.  For those following this thread,
you may want to start following that thread also.
Myron Stowe March 21, 2013, 4 p.m. UTC | #15
On Sat, Mar 16, 2013 at 6:13 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
> On Thu, Mar 14, 2013 at 9:03 AM, Myron Stowe <myron.stowe@gmail.com> wrote:
>> On Wed, Mar 13, 2013 at 3:40 AM, Xiangliang Yu <yuxiangl@marvell.com> wrote:
>>> Hi, Bjorn
>>>
>>>> >> > Now, the situation is like this:
>>>> >> > I captured the PCIe trace with an analyzer and found that the 1st BE is
>>>> >> > 0x1111 when accessing I/O port space. But the 9125 spec has a limitation:
>>>> >> > the BE must be 0x0100, to access the 2nd byte only. So the chip goes bad.
>>>> >>
>>>> >> Great, this is new, interesting, data.  Is the 9125 spec publicly
>>>> >> accessible and/or could you elaborate on the "some limitation"
>>>> >> comment?
>>>> > The 9125 spec is publicly accessible.
>>> If you can't see the pic, please open the attachment. Thanks!
>>
>> Neither Bjorn nor I could see the picture (from the previous thread
>> or this thread's attachment).
>>
>>>
>>>
>
> Just an FYI that I proposed a different tack at
> https://lkml.org/lkml/2013/3/16/168.  For those following this thread,
> you may want to start following that thread also.

I posted a third approach last night.  For those of you following this
thread, there are now three streams:
  This stream,
  https://lkml.org/lkml/2013/3/16/168 (which is also on the linux-pci list),
  and https://lkml.org/lkml/2013/3/21/12 (which is also on the linux-pci list).

Perhaps the third time will be a charm.

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 0369fb6..d49f8dc 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -44,6 +44,21 @@  static void quirk_mmio_always_on(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_ANY_ID, PCI_ANY_ID,
 				PCI_CLASS_BRIDGE_HOST, 8, quirk_mmio_always_on);
 
+/* The BAR0 ~ BAR4 of Marvell 9125 device can't be accessed
+*  by IO resource file, and need to skip the files
+*/
+static void quirk_marvell_mask_bar(struct pci_dev *dev)
+{
+	int i;
+
+	for (i = 0; i < 5; i++)
+		if (dev->resource[i].start)
+			dev->resource[i].start =
+				dev->resource[i].end = 0;
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MARVELL_EXT, 0x9125,
+				quirk_marvell_mask_bar);
+
 /* The Mellanox Tavor device gives false positive parity errors
  * Mark this device with a broken_parity_status, to allow
  * PCI scanning code to "skip" this now blacklisted device.