mbox series

[RFC,v1,0/3] Fix using wrong GPF DVSEC location issue

Message ID 20250319035516.222054-1-ming.li@zohomail.com
Headers show
Series Fix using wrong GPF DVSEC location issue | expand

Message

Li Ming March 19, 2025, 3:55 a.m. UTC
During review all new patches in branch cxl/next. I noticed there may be
a problem in below commit.

commit a52b6a2c1c99 ("cxl/pci: Support Global Persistent Flush (GPF)")

There is a new field gpf_dvsec in struct cxl_port to cache GPF DVSEC for
Port(DVSEC ID 04h) location. When the first EP attaching to a cxl port,
it will trigger locating GPF DVSEC on the cxl dport which the first EP
is under, then the location is cached in port->gpf_dvsec. So if another
EP under another dport is attaching, it will reuse the value of
port->gpf_dvsec as GPF DVSEC location for this another dport. The
problem is the cached location may be a wrong location for other dports
of the port.

Per Table 8-2 in CXL r3.2 section 8.1.1 and CXL r3.2 section 8.1.6, Each
CXL Downstream switch ports and CXL root ports have their own GPF DVSEC
for Port(DVSEC ID 04h). So my understanding is that CXL subsystem should
locate GPF DVSEC for Port for each dport rather than using the cached
location in CXL port.

But I am not sure if all dports under a same port will have same
configuration space layout, if yes, that will not be a problem. If I am
wrong, please let me know, thanks.

base-commit: 3b5d43245f0a56390baaa670e1b6d898772266b3 cxl/next

Li Ming (3):
  cxl/core: Fix caching dport GPF DVSEC issue
  cxl/pci: Update Port GPF timeout only when the first EP attaching
  cxl/pci: Drop the parameter is_port of cxl_gpf_get_dvsec()

 drivers/cxl/core/core.h |  2 +-
 drivers/cxl/core/pci.c  | 27 ++++++++++++++++-----------
 drivers/cxl/core/port.c |  2 +-
 drivers/cxl/cxl.h       |  6 +++---
 drivers/cxl/pmem.c      |  2 +-
 5 files changed, 22 insertions(+), 17 deletions(-)

Comments

Davidlohr Bueso March 21, 2025, 2:14 a.m. UTC | #1
On Wed, 19 Mar 2025, Li Ming wrote:

>But I am not sure if all dports under a same port will have same
>configuration space layout, if yes, that will not be a problem. If I am
>wrong, please let me know, thanks.

Yes, when caching the dvsec was suggested, it was my assumption that the
config space would be the same.

Thanks,
Davidlohr
Davidlohr Bueso March 21, 2025, 3:59 a.m. UTC | #2
On Thu, 20 Mar 2025, Davidlohr Bueso wrote:

>On Wed, 19 Mar 2025, Li Ming wrote:
>
>>But I am not sure if all dports under a same port will have same
>>configuration space layout, if yes, that will not be a problem. If I am
>>wrong, please let me know, thanks.
>
>Yes, when caching the dvsec was suggested, it was my assumption that the
>config space would be the same.

Ultimately I don't know what the expectation is here, but your updates
do allow more flexibility from vendors, I guess(?). It's a bit late
in the cycle, unfortunately, so if these are to go in for v6.15, they
would be considered a fix imo, otherwise perhaps they are wanted for
v6.16 or not at all (patch 3 does look useful regardless)?

Based on some of the topologies listed in qemu, I did some testing (and
this was also why the same dvsec config layout) and see things working as
expected.

https://www.qemu.org/docs/master/system/devices/cxl.html#example-command-lines

(i) 2 direct-attached Type3 devices:
-next
[    3.938238] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    3.939323] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs
[    4.003074] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:01.0: Port GPF phase 1 timeout: 20 secs
[    4.003676] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:01.0: Port GPF phase 2 timeout: 20 secs
-next+fix
[    3.969841] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    3.970957] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs
[    3.971622] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:01.0: Port GPF phase 1 timeout: 20 secs
[    3.972664] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:01.0: Port GPF phase 2 timeout: 20 secs

(ii) 2 CXL host bridges. Each host bridge has 2 CXL Root Ports, with the CXL Type3 device directly attached:
-next
[    6.182333] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:de:00.0: Port GPF phase 1 timeout: 20 secs
[    6.182352] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:de:00.0: Port GPF phase 2 timeout: 20 secs
[    6.183938] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:01.0: Port GPF phase 1 timeout: 20 secs
[    6.183955] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:01.0: Port GPF phase 2 timeout: 20 secs
[    6.204324] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    6.205407] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs
[    6.210006] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:de:01.0: Port GPF phase 1 timeout: 20 secs
[    6.210921] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:de:01.0: Port GPF phase 2 timeout: 20 secs
-next+fix
[    6.153093] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:01.0: Port GPF phase 1 timeout: 20 secs
[    6.153107] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:01.0: Port GPF phase 2 timeout: 20 secs
[    6.154170] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:de:01.0: Port GPF phase 1 timeout: 20 secs
[    6.154187] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:de:01.0: Port GPF phase 2 timeout: 20 secs
[    6.195279] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:de:00.0: Port GPF phase 1 timeout: 20 secs
[    6.195859] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:de:00.0: Port GPF phase 2 timeout: 20 secs
[    6.255782] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    6.257152] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs

(iii) 4 Type3 devices below a CXL Switch:
-next
[    3.940200] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:01.0: Port GPF phase 1 timeout: 20 secs
[    3.940218] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:01.0: Port GPF phase 2 timeout: 20 secs
[    3.940231] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:00.0: Port GPF phase 1 timeout: 20 secs
[    3.940245] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:00.0: Port GPF phase 2 timeout: 20 secs
[    3.940340] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    3.940350] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs
[    3.948114] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:02.0: Port GPF phase 1 timeout: 20 secs
[    3.949203] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:02.0: Port GPF phase 2 timeout: 20 secs
[    3.997620] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:03.0: Port GPF phase 1 timeout: 20 secs
[    3.997641] cxl_core:update_gpf_port_dvsec:1125: pcieport 0000:0e:03.0: Port GPF phase 2 timeout: 20 secs
-next+fix
[    3.983632] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:01.0: Port GPF phase 1 timeout: 20 secs
[    3.983649] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:01.0: Port GPF phase 2 timeout: 20 secs
[    3.984525] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 1 timeout: 20 secs
[    3.984539] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0c:00.0: Port GPF phase 2 timeout: 20 secs
[    3.984667] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:03.0: Port GPF phase 1 timeout: 20 secs
[    3.984692] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:03.0: Port GPF phase 2 timeout: 20 secs
[    3.985074] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:02.0: Port GPF phase 1 timeout: 20 secs
[    3.985090] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:02.0: Port GPF phase 2 timeout: 20 secs
[    3.988954] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:00.0: Port GPF phase 1 timeout: 20 secs
[    3.990464] cxl_core:update_gpf_port_dvsec:1131: pcieport 0000:0e:00.0: Port GPF phase 2 timeout: 20 secs

Thanks,
Davidlohr
Li Ming March 21, 2025, 6:55 a.m. UTC | #3
On 3/21/2025 11:59 AM, Davidlohr Bueso wrote:
> On Thu, 20 Mar 2025, Davidlohr Bueso wrote:
>
>> On Wed, 19 Mar 2025, Li Ming wrote:
>>
>>> But I am not sure if all dports under a same port will have same
>>> configuration space layout, if yes, that will not be a problem. If I am
>>> wrong, please let me know, thanks.
>>
>> Yes, when caching the dvsec was suggested, it was my assumption that the
>> config space would be the same.
>
> Ultimately I don't know what the expectation is here, but your updates
> do allow more flexibility from vendors, I guess(?). It's a bit late
> in the cycle, unfortunately, so if these are to go in for v6.15, they
> would be considered a fix imo, otherwise perhaps they are wanted for
> v6.16 or not at all (patch 3 does look useful regardless)?

My understanding is that the expectation of the patchset is to avoid using a wrong GPF DVSEC in case of dports under a same port have different config space layout. And I think the change is more closely to the description of CXL spec.

If the case(dports under a same port have different config space layout) would not happen, maybe add a comment in cxl_gpf_port_setup() is another option.

Yes, if patch 1 & 2 are considered to be merged, they are worth a fix tag. And patch 3 is an obvious cleanup change.

>
> Based on some of the topologies listed in qemu, I did some testing (and
> this was also why the same dvsec config layout) and see things working as
> expected.

Thanks for testing.


Ming

[snip]
Jonathan Cameron March 21, 2025, 12:09 p.m. UTC | #4
On Fri, 21 Mar 2025 14:55:42 +0800
Li Ming <ming.li@zohomail.com> wrote:

> On 3/21/2025 11:59 AM, Davidlohr Bueso wrote:
> > On Thu, 20 Mar 2025, Davidlohr Bueso wrote:
> >  
> >> On Wed, 19 Mar 2025, Li Ming wrote:
> >>  
> >>> But I am not sure if all dports under a same port will have same
> >>> configuration space layout, if yes, that will not be a problem. If I am
> >>> wrong, please let me know, thanks.  
> >>
> >> Yes, when caching the dvsec was suggested, it was my assumption that the
> >> config space would be the same.  
> >
> > Ultimately I don't know what the expectation is here, but your updates
> > do allow more flexibility from vendors, I guess(?). It's a bit late
> > in the cycle, unfortunately, so if these are to go in for v6.15, they
> > would be considered a fix imo, otherwise perhaps they are wanted for
> > v6.16 or not at all (patch 3 does look useful regardless)?  
> 
> My understanding is that the expectation of the patchset is to avoid using a wrong GPF DVSEC in case of dports under a same port have different config space layout. And I think the change is more closely to the description of CXL spec.
> 
> If the case(dports under a same port have different config space layout) would not happen, maybe add a comment in cxl_gpf_port_setup() is another option.
> 
> Yes, if patch 1 & 2 are considered to be merged, they are worth a fix tag. And patch 3 is an obvious cleanup change.

I think they can indeed have different layout (in theory).
Seems moderately unlikely to occur in real devices, but you never know.

So I think a fixes tag would be valid.

Jonathan

> 
> >
> > Based on some of the topologies listed in qemu, I did some testing (and
> > this was also why the same dvsec config layout) and see things working as
> > expected.  
> 
> Thanks for testing.
> 
> 
> Ming
> 
> [snip]
>
Dan Williams March 21, 2025, 5:02 p.m. UTC | #5
Li Ming wrote:
> During review all new patches in branch cxl/next. I noticed there may be
> a problem in below commit.
> 
> commit a52b6a2c1c99 ("cxl/pci: Support Global Persistent Flush (GPF)")
> 
> There is a new field gpf_dvsec in struct cxl_port to cache GPF DVSEC for
> Port(DVSEC ID 04h) location. When the first EP attaching to a cxl port,
> it will trigger locating GPF DVSEC on the cxl dport which the first EP
> is under, then the location is cached in port->gpf_dvsec. So if another
> EP under another dport is attaching, it will reuse the value of
> port->gpf_dvsec as GPF DVSEC location for this another dport. The
> problem is the cached location may be a wrong location for other dports
> of the port.
> 
> Per Table 8-2 in CXL r3.2 section 8.1.1 and CXL r3.2 section 8.1.6, Each
> CXL Downstream switch ports and CXL root ports have their own GPF DVSEC
> for Port(DVSEC ID 04h). So my understanding is that CXL subsystem should
> locate GPF DVSEC for Port for each dport rather than using the cached
> location in CXL port.
> 
> But I am not sure if all dports under a same port will have same
> configuration space layout, if yes, that will not be a problem. If I am
> wrong, please let me know, thanks.
> 
> base-commit: 3b5d43245f0a56390baaa670e1b6d898772266b3 cxl/next
> 
> Li Ming (3):
>   cxl/core: Fix caching dport GPF DVSEC issue
>   cxl/pci: Update Port GPF timeout only when the first EP attaching
>   cxl/pci: Drop the parameter is_port of cxl_gpf_get_dvsec()

These look good. I doubt a device would build different offsets
per-dport, but the real risk is more in confusing future code readers
that dports are uniform.

For the series,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

A Fixes tag seems reasonable.