diff mbox series

[resend,net-next,v2,2/3] PCI/VPD: Change Chelsio T4 quirk to provide access to full virtual address space

Message ID 6bf6319f-acaa-c114-d10b-cb9b7d469968@gmail.com (mailing list archive)
State Not Applicable
Delegated to: Bjorn Helgaas
Headers show
Series cxgb4: improve PCI VPD handling | expand

Commit Message

Heiner Kallweit Feb. 5, 2021, 7:29 p.m. UTC
cxgb4 uses the full VPD address space for accessing its EEPROM (with some
mapping, see t4_eeprom_ptov()). In cudbg_collect_vpd_data() it sets the
VPD len to 32K (PCI_VPD_MAX_SIZE), and then back to 2K (CUDBG_VPD_PF_SIZE).
Having official (structured) and inofficial (unstructured) VPD data
violates the PCI spec, let's set VPD len according to all data that can be
accessed via PCI VPD access, no matter of its structure.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/pci/vpd.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Bjorn Helgaas Feb. 5, 2021, 9:46 p.m. UTC | #1
[+cc Casey, Rahul]

On Fri, Feb 05, 2021 at 08:29:45PM +0100, Heiner Kallweit wrote:
> cxgb4 uses the full VPD address space for accessing its EEPROM (with some
> mapping, see t4_eeprom_ptov()). In cudbg_collect_vpd_data() it sets the
> VPD len to 32K (PCI_VPD_MAX_SIZE), and then back to 2K (CUDBG_VPD_PF_SIZE).
> Having official (structured) and inofficial (unstructured) VPD data
> violates the PCI spec, let's set VPD len according to all data that can be
> accessed via PCI VPD access, no matter of its structure.

s/inofficial/unofficial/

> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/pci/vpd.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
> index 7915d10f9..06a7954d0 100644
> --- a/drivers/pci/vpd.c
> +++ b/drivers/pci/vpd.c
> @@ -633,9 +633,8 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>  	/*
>  	 * If this is a T3-based adapter, there's a 1KB VPD area at offset
>  	 * 0xc00 which contains the preferred VPD values.  If this is a T4 or
> -	 * later based adapter, the special VPD is at offset 0x400 for the
> -	 * Physical Functions (the SR-IOV Virtual Functions have no VPD
> -	 * Capabilities).  The PCI VPD Access core routines will normally
> +	 * later based adapter, provide access to the full virtual EEPROM
> +	 * address space. The PCI VPD Access core routines will normally
>  	 * compute the size of the VPD by parsing the VPD Data Structure at
>  	 * offset 0x000.  This will result in silent failures when attempting
>  	 * to accesses these other VPD areas which are beyond those computed
> @@ -644,7 +643,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>  	if (chip == 0x0 && prod >= 0x20)
>  		pci_set_vpd_size(dev, 8192);
>  	else if (chip >= 0x4 && func < 0x8)
> -		pci_set_vpd_size(dev, 2048);
> +		pci_set_vpd_size(dev, PCI_VPD_MAX_SIZE);

This code was added by 7dcf688d4c78 ("PCI/cxgb4: Extend T3 PCI quirk
to T4+ devices") [1].  Unfortunately that commit doesn't really have
the details about what it fixes, other than the silent failures it
mentions in the comment.

Some devices hang if we try to read at the wrong VPD address, and this
can be done via the sysfs "vpd" file.  Can you expand the commit log
with an argument for why it is always safe to set the size to
PCI_VPD_MAX_SIZE for these devices?

The fact that cudbg_collect_vpd_data() fiddles around with
pci_set_vpd_size() suggests to me that there is *some* problem with
reading parts of the VPD.  Otherwise, why would they bother?

940c9c458866 ("cxgb4: collect vpd info directly from hardware") [2]
added the pci_set_vpd_size() usage, but doesn't say why it's needed.
Maybe Rahul will remember?

Bjorn

[1] https://git.kernel.org/linus/7dcf688d4c78
[2] https://git.kernel.org/linus/940c9c458866

>  }
>  
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
> -- 
> 2.30.0
> 
> 
>
Heiner Kallweit Feb. 5, 2021, 10:31 p.m. UTC | #2
On 05.02.2021 22:46, Bjorn Helgaas wrote:
> [+cc Casey, Rahul]
> 
> On Fri, Feb 05, 2021 at 08:29:45PM +0100, Heiner Kallweit wrote:
>> cxgb4 uses the full VPD address space for accessing its EEPROM (with some
>> mapping, see t4_eeprom_ptov()). In cudbg_collect_vpd_data() it sets the
>> VPD len to 32K (PCI_VPD_MAX_SIZE), and then back to 2K (CUDBG_VPD_PF_SIZE).
>> Having official (structured) and inofficial (unstructured) VPD data
>> violates the PCI spec, let's set VPD len according to all data that can be
>> accessed via PCI VPD access, no matter of its structure.
> 
> s/inofficial/unofficial/
> 
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/pci/vpd.c | 7 +++----
>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
>> index 7915d10f9..06a7954d0 100644
>> --- a/drivers/pci/vpd.c
>> +++ b/drivers/pci/vpd.c
>> @@ -633,9 +633,8 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>>  	/*
>>  	 * If this is a T3-based adapter, there's a 1KB VPD area at offset
>>  	 * 0xc00 which contains the preferred VPD values.  If this is a T4 or
>> -	 * later based adapter, the special VPD is at offset 0x400 for the
>> -	 * Physical Functions (the SR-IOV Virtual Functions have no VPD
>> -	 * Capabilities).  The PCI VPD Access core routines will normally
>> +	 * later based adapter, provide access to the full virtual EEPROM
>> +	 * address space. The PCI VPD Access core routines will normally
>>  	 * compute the size of the VPD by parsing the VPD Data Structure at
>>  	 * offset 0x000.  This will result in silent failures when attempting
>>  	 * to accesses these other VPD areas which are beyond those computed
>> @@ -644,7 +643,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>>  	if (chip == 0x0 && prod >= 0x20)
>>  		pci_set_vpd_size(dev, 8192);
>>  	else if (chip >= 0x4 && func < 0x8)
>> -		pci_set_vpd_size(dev, 2048);
>> +		pci_set_vpd_size(dev, PCI_VPD_MAX_SIZE);
> 
> This code was added by 7dcf688d4c78 ("PCI/cxgb4: Extend T3 PCI quirk
> to T4+ devices") [1].  Unfortunately that commit doesn't really have
> the details about what it fixes, other than the silent failures it
> mentions in the comment.
> 
> Some devices hang if we try to read at the wrong VPD address, and this
> can be done via the sysfs "vpd" file.  Can you expand the commit log
> with an argument for why it is always safe to set the size to
> PCI_VPD_MAX_SIZE for these devices?
> 

Seeing t4_eeprom_ptov() there is data at the end of the VPD address
space, but there may be gaps in between. I don't have test hw,
therefore it would be good if Chelsio could confirm that accessing
any address in the VPD address space (32K) is ok. If a VPD address
isn't backed by EEPROM, it should return 0x00 or 0xff, and not hang
the device.

> The fact that cudbg_collect_vpd_data() fiddles around with
> pci_set_vpd_size() suggests to me that there is *some* problem with
> reading parts of the VPD.  Otherwise, why would they bother?
> 
> 940c9c458866 ("cxgb4: collect vpd info directly from hardware") [2]
> added the pci_set_vpd_size() usage, but doesn't say why it's needed.
> Maybe Rahul will remember?
> 

In addition we have cb92148b58a4 ("PCI: Add pci_set_vpd_size() to set
VPD size"). To me it seems the VPD size quirks and this commit
try to achieve the same: allow to override the autodetected VPD len

The quirk mechanism is well established, and if possible I'd like
to get rid of pci_set_vpd_size(). I don't like the idea that the
PCI core exposes API calls for accessing a proprietary VPD data
format of one specific vendor (cxgb4 is the only user of
pci_set_vpd_size()).

> Bjorn
> 
> [1] https://git.kernel.org/linus/7dcf688d4c78
> [2] https://git.kernel.org/linus/940c9c458866
> 
>>  }
>>  
>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
>> -- 
>> 2.30.0
>>
>>
>>
Rahul Lakkireddy Feb. 8, 2021, 7:47 p.m. UTC | #3
On Friday, February 02/05/21, 2021 at 23:31:24 +0100, Heiner Kallweit wrote:
> On 05.02.2021 22:46, Bjorn Helgaas wrote:
> > [+cc Casey, Rahul]
> > 
> > On Fri, Feb 05, 2021 at 08:29:45PM +0100, Heiner Kallweit wrote:
> >> cxgb4 uses the full VPD address space for accessing its EEPROM (with some
> >> mapping, see t4_eeprom_ptov()). In cudbg_collect_vpd_data() it sets the
> >> VPD len to 32K (PCI_VPD_MAX_SIZE), and then back to 2K (CUDBG_VPD_PF_SIZE).
> >> Having official (structured) and inofficial (unstructured) VPD data
> >> violates the PCI spec, let's set VPD len according to all data that can be
> >> accessed via PCI VPD access, no matter of its structure.
> > 
> > s/inofficial/unofficial/
> > 
> >> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> >> ---
> >>  drivers/pci/vpd.c | 7 +++----
> >>  1 file changed, 3 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
> >> index 7915d10f9..06a7954d0 100644
> >> --- a/drivers/pci/vpd.c
> >> +++ b/drivers/pci/vpd.c
> >> @@ -633,9 +633,8 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
> >>  	/*
> >>  	 * If this is a T3-based adapter, there's a 1KB VPD area at offset
> >>  	 * 0xc00 which contains the preferred VPD values.  If this is a T4 or
> >> -	 * later based adapter, the special VPD is at offset 0x400 for the
> >> -	 * Physical Functions (the SR-IOV Virtual Functions have no VPD
> >> -	 * Capabilities).  The PCI VPD Access core routines will normally
> >> +	 * later based adapter, provide access to the full virtual EEPROM
> >> +	 * address space. The PCI VPD Access core routines will normally
> >>  	 * compute the size of the VPD by parsing the VPD Data Structure at
> >>  	 * offset 0x000.  This will result in silent failures when attempting
> >>  	 * to accesses these other VPD areas which are beyond those computed
> >> @@ -644,7 +643,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
> >>  	if (chip == 0x0 && prod >= 0x20)
> >>  		pci_set_vpd_size(dev, 8192);
> >>  	else if (chip >= 0x4 && func < 0x8)
> >> -		pci_set_vpd_size(dev, 2048);
> >> +		pci_set_vpd_size(dev, PCI_VPD_MAX_SIZE);
> > 
> > This code was added by 7dcf688d4c78 ("PCI/cxgb4: Extend T3 PCI quirk
> > to T4+ devices") [1].  Unfortunately that commit doesn't really have
> > the details about what it fixes, other than the silent failures it
> > mentions in the comment.
> > 
> > Some devices hang if we try to read at the wrong VPD address, and this
> > can be done via the sysfs "vpd" file.  Can you expand the commit log
> > with an argument for why it is always safe to set the size to
> > PCI_VPD_MAX_SIZE for these devices?
> > 
> 
> Seeing t4_eeprom_ptov() there is data at the end of the VPD address
> space, but there may be gaps in between. I don't have test hw,
> therefore it would be good if Chelsio could confirm that accessing
> any address in the VPD address space (32K) is ok. If a VPD address
> isn't backed by EEPROM, it should return 0x00 or 0xff, and not hang
> the device.
> 

We've tested the patches on T5 adapter. Although there are no crashes
seen, the 32K VPD read from sysfs at certain chunks are getting wrapped
around and overwritten. We're still analyzing this.

> > The fact that cudbg_collect_vpd_data() fiddles around with
> > pci_set_vpd_size() suggests to me that there is *some* problem with
> > reading parts of the VPD.  Otherwise, why would they bother?
> > 
> > 940c9c458866 ("cxgb4: collect vpd info directly from hardware") [2]
> > added the pci_set_vpd_size() usage, but doesn't say why it's needed.
> > Maybe Rahul will remember?
> > 

If firmware has crashed, then it's not possible to collect the VPD info
from firmware. So, the VPD info is fetched from EEPROM instead, which
is unfortunately outside the VPD size of the PF.

> 
> In addition we have cb92148b58a4 ("PCI: Add pci_set_vpd_size() to set
> VPD size"). To me it seems the VPD size quirks and this commit
> try to achieve the same: allow to override the autodetected VPD len
> 
> The quirk mechanism is well established, and if possible I'd like
> to get rid of pci_set_vpd_size(). I don't like the idea that the
> PCI core exposes API calls for accessing a proprietary VPD data
> format of one specific vendor (cxgb4 is the only user of
> pci_set_vpd_size()).
> 

There seems to be a way to get the Serial Configuration Version from
some internal registers. I will send the patch soon. It should remove
the call to pci_set_vpd_size() from cudbg_lib.c.

Thanks,
Rahul

> > Bjorn
> > 
> > [1] https://git.kernel.org/linus/7dcf688d4c78
> > [2] https://git.kernel.org/linus/940c9c458866
> > 
> >>  }
> >>  
> >>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
> >> -- 
> >> 2.30.0
> >>
> >>
> >>
>
Heiner Kallweit Feb. 8, 2021, 8:21 p.m. UTC | #4
On 08.02.2021 20:47, Rahul Lakkireddy wrote:
> On Friday, February 02/05/21, 2021 at 23:31:24 +0100, Heiner Kallweit wrote:
>> On 05.02.2021 22:46, Bjorn Helgaas wrote:
>>> [+cc Casey, Rahul]
>>>
>>> On Fri, Feb 05, 2021 at 08:29:45PM +0100, Heiner Kallweit wrote:
>>>> cxgb4 uses the full VPD address space for accessing its EEPROM (with some
>>>> mapping, see t4_eeprom_ptov()). In cudbg_collect_vpd_data() it sets the
>>>> VPD len to 32K (PCI_VPD_MAX_SIZE), and then back to 2K (CUDBG_VPD_PF_SIZE).
>>>> Having official (structured) and inofficial (unstructured) VPD data
>>>> violates the PCI spec, let's set VPD len according to all data that can be
>>>> accessed via PCI VPD access, no matter of its structure.
>>>
>>> s/inofficial/unofficial/
>>>
>>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>>> ---
>>>>  drivers/pci/vpd.c | 7 +++----
>>>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
>>>> index 7915d10f9..06a7954d0 100644
>>>> --- a/drivers/pci/vpd.c
>>>> +++ b/drivers/pci/vpd.c
>>>> @@ -633,9 +633,8 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>>>>  	/*
>>>>  	 * If this is a T3-based adapter, there's a 1KB VPD area at offset
>>>>  	 * 0xc00 which contains the preferred VPD values.  If this is a T4 or
>>>> -	 * later based adapter, the special VPD is at offset 0x400 for the
>>>> -	 * Physical Functions (the SR-IOV Virtual Functions have no VPD
>>>> -	 * Capabilities).  The PCI VPD Access core routines will normally
>>>> +	 * later based adapter, provide access to the full virtual EEPROM
>>>> +	 * address space. The PCI VPD Access core routines will normally
>>>>  	 * compute the size of the VPD by parsing the VPD Data Structure at
>>>>  	 * offset 0x000.  This will result in silent failures when attempting
>>>>  	 * to accesses these other VPD areas which are beyond those computed
>>>> @@ -644,7 +643,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>>>>  	if (chip == 0x0 && prod >= 0x20)
>>>>  		pci_set_vpd_size(dev, 8192);
>>>>  	else if (chip >= 0x4 && func < 0x8)
>>>> -		pci_set_vpd_size(dev, 2048);
>>>> +		pci_set_vpd_size(dev, PCI_VPD_MAX_SIZE);
>>>
>>> This code was added by 7dcf688d4c78 ("PCI/cxgb4: Extend T3 PCI quirk
>>> to T4+ devices") [1].  Unfortunately that commit doesn't really have
>>> the details about what it fixes, other than the silent failures it
>>> mentions in the comment.
>>>
>>> Some devices hang if we try to read at the wrong VPD address, and this
>>> can be done via the sysfs "vpd" file.  Can you expand the commit log
>>> with an argument for why it is always safe to set the size to
>>> PCI_VPD_MAX_SIZE for these devices?
>>>
>>
>> Seeing t4_eeprom_ptov() there is data at the end of the VPD address
>> space, but there may be gaps in between. I don't have test hw,
>> therefore it would be good if Chelsio could confirm that accessing
>> any address in the VPD address space (32K) is ok. If a VPD address
>> isn't backed by EEPROM, it should return 0x00 or 0xff, and not hang
>> the device.
>>
> 
> We've tested the patches on T5 adapter. Although there are no crashes
> seen, the 32K VPD read from sysfs at certain chunks are getting wrapped
> around and overwritten. We're still analyzing this.
> 
>>> The fact that cudbg_collect_vpd_data() fiddles around with
>>> pci_set_vpd_size() suggests to me that there is *some* problem with
>>> reading parts of the VPD.  Otherwise, why would they bother?
>>>
>>> 940c9c458866 ("cxgb4: collect vpd info directly from hardware") [2]
>>> added the pci_set_vpd_size() usage, but doesn't say why it's needed.
>>> Maybe Rahul will remember?
>>>
> 
> If firmware has crashed, then it's not possible to collect the VPD info
> from firmware. So, the VPD info is fetched from EEPROM instead, which
> is unfortunately outside the VPD size of the PF.
> 
>>
>> In addition we have cb92148b58a4 ("PCI: Add pci_set_vpd_size() to set
>> VPD size"). To me it seems the VPD size quirks and this commit
>> try to achieve the same: allow to override the autodetected VPD len
>>
>> The quirk mechanism is well established, and if possible I'd like
>> to get rid of pci_set_vpd_size(). I don't like the idea that the
>> PCI core exposes API calls for accessing a proprietary VPD data
>> format of one specific vendor (cxgb4 is the only user of
>> pci_set_vpd_size()).
>>
> 
> There seems to be a way to get the Serial Configuration Version from
> some internal registers. I will send the patch soon. It should remove
> the call to pci_set_vpd_size() from cudbg_lib.c.
> 
Great, this would be the best solution. Then we can get rid of
pci_set_vpd_size() w/o impact on what is exposed via sysfs.

So for now I'll drop patches 2 and 3 and will resend patch 1
as a single patch. 

> Thanks,
> Rahul
> 
Heiner

>>> Bjorn
>>>
>>> [1] https://git.kernel.org/linus/7dcf688d4c78
>>> [2] https://git.kernel.org/linus/940c9c458866
>>>
>>>>  }
>>>>  
>>>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
>>>> -- 
>>>> 2.30.0
>>>>
>>>>
>>>>
>>
diff mbox series

Patch

diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index 7915d10f9..06a7954d0 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -633,9 +633,8 @@  static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
 	/*
 	 * If this is a T3-based adapter, there's a 1KB VPD area at offset
 	 * 0xc00 which contains the preferred VPD values.  If this is a T4 or
-	 * later based adapter, the special VPD is at offset 0x400 for the
-	 * Physical Functions (the SR-IOV Virtual Functions have no VPD
-	 * Capabilities).  The PCI VPD Access core routines will normally
+	 * later based adapter, provide access to the full virtual EEPROM
+	 * address space. The PCI VPD Access core routines will normally
 	 * compute the size of the VPD by parsing the VPD Data Structure at
 	 * offset 0x000.  This will result in silent failures when attempting
 	 * to accesses these other VPD areas which are beyond those computed
@@ -644,7 +643,7 @@  static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
 	if (chip == 0x0 && prod >= 0x20)
 		pci_set_vpd_size(dev, 8192);
 	else if (chip >= 0x4 && func < 0x8)
-		pci_set_vpd_size(dev, 2048);
+		pci_set_vpd_size(dev, PCI_VPD_MAX_SIZE);
 }
 
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,