Message ID | 20230622205523.85375-17-terry.bowman@amd.com |
---|---|
State | Superseded |
Headers | show |
Series | cxl/pci: Add support for RCH RAS error handling | expand |
Terry Bowman wrote: > From: Robert Richter <rrichter@amd.com> > > Same as for ports and dports, also store the endpoint's Component > Register mappings, use struct cxl_dev_state for that. > > The Component Register base address @component_reg_phys is no longer > used after the rework of the Component Register setup which now uses > struct member @comp_map instead. Remove the base address. > > Signed-off-by: Robert Richter <rrichter@amd.com> > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Starting with this patch my QEMU cxl configuration fails to probe the device: # cxl list -Miu { "memdev":"mem0", "pmem_size":"512.00 MiB (536.87 MB)", "serial":"0", "host":"0000:35:00.0", "state":"disabled" } ...the next patch changes where the failure occurs, but it still fails and applying the rest of the patches does not fix the issue either. I have not had the time to debug it. Please have a look and make sure that at every step of this patch set the driver is still operational. I.e. 'git bisect' should never find this conversion in a broken state. For now I will look at proceeding with patch 1-15. The cxl bits of my QEMU config are: -object memory-backend-file,id=cxl-mem1,share=on,mem-path=cxl-window1,size=$cxl_backend_size -object memory-backend-file,id=cxl-label1,share=on,mem-path=cxl-label1,size=$cxl_label_size -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 -device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0,port=0 -device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,lsa=cxl-label1 -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G
Hi Dan, On 6/25/23 12:38, Dan Williams wrote: > Terry Bowman wrote: >> From: Robert Richter <rrichter@amd.com> >> >> Same as for ports and dports, also store the endpoint's Component >> Register mappings, use struct cxl_dev_state for that. >> >> The Component Register base address @component_reg_phys is no longer >> used after the rework of the Component Register setup which now uses >> struct member @comp_map instead. Remove the base address. >> >> Signed-off-by: Robert Richter <rrichter@amd.com> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > Starting with this patch my QEMU cxl configuration fails to probe the > device: > > # cxl list -Miu > { > "memdev":"mem0", > "pmem_size":"512.00 MiB (536.87 MB)", > "serial":"0", > "host":"0000:35:00.0", > "state":"disabled" > } > > ...the next patch changes where the failure occurs, but it still fails > and applying the rest of the patches does not fix the issue either. I > have not had the time to debug it. Please have a look and make sure that > at every step of this patch set the driver is still operational. I.e. > 'git bisect' should never find this conversion in a broken state. > Yes, we will look into this issue now. We will verify each patch builds and runs without degrading. Can you share which Qemu release/branch to test against? > For now I will look at proceeding with patch 1-15. > Ok Regards, Terry > The cxl bits of my QEMU config are: > > -object memory-backend-file,id=cxl-mem1,share=on,mem-path=cxl-window1,size=$cxl_backend_size > -object memory-backend-file,id=cxl-label1,share=on,mem-path=cxl-label1,size=$cxl_label_size > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 > -device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0,port=0 > -device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,lsa=cxl-label1 > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G
(resending without logs attached to make vger.kernel.org happy) Dan, Jonathan, On 25.06.23 10:38:26, Dan Williams wrote: > Terry Bowman wrote: > > From: Robert Richter <rrichter@amd.com> > > > > Same as for ports and dports, also store the endpoint's Component > > Register mappings, use struct cxl_dev_state for that. > > > > The Component Register base address @component_reg_phys is no longer > > used after the rework of the Component Register setup which now uses > > struct member @comp_map instead. Remove the base address. > > > > Signed-off-by: Robert Richter <rrichter@amd.com> > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> I could fix the HDM initialization failure. The check in "cxl/hdm: Use stored Component Register mappings to map HDM decoder capability" was too strict and did not return -ENODEV as the previous implementation. We will send an updated version of the remaining patches. However, while analyzing this I found the following: > > Starting with this patch my QEMU cxl configuration fails to probe the > device: > > # cxl list -Miu > { > "memdev":"mem0", > "pmem_size":"512.00 MiB (536.87 MB)", > "serial":"0", > "host":"0000:35:00.0", > "state":"disabled" > } I have looked into this and it looks to me that the qemu model is wrong. I am referring esp. to those sections in the CXL 3.0 spec (Aug 1, 2022): Figure 9-14. CXL Link/Protocol Register Mapping in a CXL VH Figure 9-15. CXL Link/Protocol Registers in a CXL Switch Table 8-22. CXL_Capability_ID Assignment In short, the CXL capabilities are wrongly assigned to the various component registers (component abbr. taken from Table 8-22): ACPI CHBCR (RC): * includes RAS cap, misses HDM cap * expected: no RAS cap, includes HDM cap RAS cap does not make sense here and is unused. logs: > [ 468.239319] cxl port1: found RAS capability (0x80) > [ 468.240186] cxl port1: Unknown CM cap ID: 4 (0xd8) > [ 468.241083] cxl port1: Unknown CM cap ID: 0 (0x110) > [ 468.241974] cxl port1: Unknown CM cap ID: 6 (0x260) > [ 468.242876] cxl port1: Unknown CM cap ID: 8 (0xa84) CXL root port (R): * includes RAS and HDM cap * expected: includes RAS cap, misses HDM cap This causes the HDM for the root ports being not properly initialized, kernel switching to fallback mode. logs: > [ 468.250182] pcieport 0000:34:00.0: Mapped CXL Memory Device resource 0x00000000fea00000 > [ 468.250258] pcieport 0000:34:00.0: found RAS capability (0x80) > [ 468.250371] pcieport 0000:34:00.0: Unknown CM cap ID: 4 (0xd8) > [ 468.250413] pcieport 0000:34:00.0: found HDM decoder capability (0x110) > [ 468.256022] pcieport 0000:34:00.0: Unknown CM cap ID: 6 (0x260) > [ 468.256058] pcieport 0000:34:00.0: Unknown CM cap ID: 8 (0xa84) CXL 2.0 device (D2) * includes RAS and HDM cap * works as expected logs: > [ 1396.813184] cxl_pci 0000:35:00.0: found RAS capability (0x80) > [ 1396.814133] cxl_pci 0000:35:00.0: Unknown CM cap ID: 4 (0xd8) > [ 1396.815081] cxl_pci 0000:35:00.0: found HDM decoder capability (0x110) No issues here. Details below. Kernel: version: 0c0df63177e3 Merge branch 'for-6.5/cxl-rch-eh' into for-6.5/cxl cmdline: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.4.0-rc6-rric-00055-g0c0df63177e3 root=UUID=8b447531-798d-41cb-a3df-ae1d9bb1eaf9 ro debug ignore_loglevel "dyndbg=file drivers/cxl/* +p" systemd.log_target=journal Qemu: version: QEMU emulator version 8.0.50 (v6.2.0-12087-g62c0c95799) cmdline: /usr/local/bin/qemu-system-x86_64 -smp 6 -machine type=q35,accel=kvm,nvdimm=on,cxl=on -hda debian.img -m 4096 -enable-kvm -object memory-backend-file,id=cxl-mem1,share=on,mem-path=cxl-window1,size=512M -object memory-backend-file,id=cxl-label1,share=on,mem-path=cxl-label1,size=512M -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 -device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0,port=0 -device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,lsa=cxl-label1 -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G -vnc :0 -nic none Annotated qemu kernel boot log after loading the cxl_acpi module (including deps): > [ 468.229977] cxl_acpi ACPI0017:00: root0 added (root port) root0 added. > [ 468.231055] acpi ACPI0016:00: UID found: 52 > [ 468.231959] pci0000:34: dport added to root0 add_host_bridge_dport() found the CXL host bridge and added it as dport to root0.. > [ 468.232867] Unknown target node for memory at 0x190000000, assuming node 0 > [ 468.233901] cxl_acpi ACPI0017:00: add: decoder0.0 node: 0 range [0x190000000 - 0x28fffffff] > [ 468.235394] acpi ACPI0016:00: UID found: 52 > [ 468.236171] acpi ACPI0016:00: CHBCR found for UID 52: 0x0000000180000000 > [ 468.237193] pci0000:34: host-bridge: pci0000:34 > [ 468.238132] cxl port1: Mapped CXL Memory Device resource 0x0000000180000000 CHBCR for the host bridge was found at 0x0000000180000000. In this kernel version it is bound to port1 (not the dport), some comments on this later. Component registers and capabilities are extracted from the CHBCR: > [ 468.239319] cxl port1: found RAS capability (0x80) > [ 468.240186] cxl port1: Unknown CM cap ID: 4 (0xd8) > [ 468.241083] cxl port1: Unknown CM cap ID: 0 (0x110) > [ 468.241974] cxl port1: Unknown CM cap ID: 6 (0x260) > [ 468.242876] cxl port1: Unknown CM cap ID: 8 (0xa84) The CHBCR does not contain the HDM registers for the root ports. It (wrongly) contains the RAS caps. I would expect HDM here instead. > [ 468.243767] cxl port1: Set up component registers > [ 468.244959] pci0000:34: port1 added to root0 port1 added to root0 with the component regs from the CHBCR bound. > [ 468.245807] pci0000:34: host supports CXL > [ 468.246711] cxl_acpi ACPI0017:00: root0: add: nvdimm-bridge0 > [ 468.247792] CXL bus rescan result: 0 port 1 added to the cxl bus which triggers cxl_port_probe: Running cxl_switch_port_probe() for port1 here: > [ 468.250149] pcieport 0000:34:00.0: Component Registers found for dport: 0x00000000fea00000 The CXL root port was found by devm_cxl_port_enumerate_dports(). Its component regs are extracted: > [ 468.250182] pcieport 0000:34:00.0: Mapped CXL Memory Device resource 0x00000000fea00000 > [ 468.250258] pcieport 0000:34:00.0: found RAS capability (0x80) > [ 468.250371] pcieport 0000:34:00.0: Unknown CM cap ID: 4 (0xd8) > [ 468.250413] pcieport 0000:34:00.0: found HDM decoder capability (0x110) > [ 468.256022] pcieport 0000:34:00.0: Unknown CM cap ID: 6 (0x260) > [ 468.256058] pcieport 0000:34:00.0: Unknown CM cap ID: 8 (0xa84) The CXL root port's comp regs contain the RAS caps but also (wrongly) the HDM caps. > [ 468.256065] pcieport 0000:34:00.0: Set up component registers > [ 468.256089] pcieport 0000:34:00.0: dport added to port1 CXL root port added as dport to port1. devm_cxl_setup_hdm() is run for port1: > [ 468.256157] cxl_port port1: found RAS capability (0x80) > [ 468.260800] cxl_port port1: Unknown CM cap ID: 4 (0xd8) > [ 468.260833] cxl_port port1: Unknown CM cap ID: 0 (0x110) > [ 468.260865] cxl_port port1: Unknown CM cap ID: 6 (0x260) > [ 468.260898] cxl_port port1: Unknown CM cap ID: 8 (0xa84) port1's comp regs from above extracted again for HDM setup, but... > [ 468.260904] cxl_port port1: HDM decoder registers not implemented ... the HDM decoder could not be found. map_hdm_decoder_regs() returns a -ENODEV. > [ 468.260915] cxl_port port1: Fallback to passthrough decoder cxl_switch_port_probe() continues with devm_cxl_add_passthrough_decoder() as there was only one dport found. > [ 468.261024] cxl decoder1.0: Added to port port1 > [ 468.267186] cxl_port port1: probe: 0 Probe success. > [ 468.298485] nd_bus ndbus0: START: nd_bus.probe(ndbus0) > [ 468.299859] nd_bus ndbus0: END: nd_bus.probe(ndbus0) = 0 > [ 468.300840] cxl_nvdimm_bridge nvdimm-bridge0: probe: 0 End of log. Full log attached. Loading the cxl_pci driver later succeeds, cxl list shows: [ { "memdev":"mem0", "pmem_size":536870912, "serial":0, "host":"0000:35:00.0" } ] I am not sure if the issue is a qemu config or implementation issue. Please take a look. Am I missing something here? Thanks, -Robert
On Fri, 30 Jun 2023 21:56:44 +0200 Robert Richter <rrichter@amd.com> wrote: > (resending without logs attached to make vger.kernel.org happy) > > Dan, Jonathan, > > On 25.06.23 10:38:26, Dan Williams wrote: > > Terry Bowman wrote: > > > From: Robert Richter <rrichter@amd.com> > > > > > > Same as for ports and dports, also store the endpoint's Component > > > Register mappings, use struct cxl_dev_state for that. > > > > > > The Component Register base address @component_reg_phys is no longer > > > used after the rework of the Component Register setup which now uses > > > struct member @comp_map instead. Remove the base address. > > > > > > Signed-off-by: Robert Richter <rrichter@amd.com> > > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > I could fix the HDM initialization failure. The check in "cxl/hdm: Use > stored Component Register mappings to map HDM decoder capability" was > too strict and did not return -ENODEV as the previous implementation. > > We will send an updated version of the remaining patches. > > However, while analyzing this I found the following: > > > > > Starting with this patch my QEMU cxl configuration fails to probe the > > device: > > > > # cxl list -Miu > > { > > "memdev":"mem0", > > "pmem_size":"512.00 MiB (536.87 MB)", > > "serial":"0", > > "host":"0000:35:00.0", > > "state":"disabled" > > } > > I have looked into this and it looks to me that the qemu model is > wrong. I am referring esp. to those sections in the CXL 3.0 spec (Aug > 1, 2022): Indeed looks dodgy. Ira recently highlighted that this was a mess and proposed some fixes.. (going way back, I moaned about the way this was being done being inflexible to Ben but we never cleaned it up :( https://lore.kernel.org/linux-cxl/20230517-rfc-type2-dev-v1-2-6eb2e470981b@intel.com/ We should change that function to explicitly handle ever device type rather than trying to shortcut certain shared combinations. > > Figure 9-14. CXL Link/Protocol Register Mapping in a CXL VH > Figure 9-15. CXL Link/Protocol Registers in a CXL Switch > Table 8-22. CXL_Capability_ID Assignment > > In short, the CXL capabilities are wrongly assigned to the various > component registers (component abbr. taken from Table 8-22): > > ACPI CHBCR (RC): > * includes RAS cap, misses HDM cap > * expected: no RAS cap, includes HDM cap To check, do you have multiple RP under the RC? From info below I see you only have 1. You can override by providing pxb-cxl with hdm_for_passthrough=true If not default is a pass through decoder so no HDM cap should be present (they kernel used to assume that - so we changed default in QEMU to match - I think the kernel is now fine either way) QEMU code indeed looks dodgy as it's initializing the host bridge as if it were a root port. (oops) and as you've noticed then adds HDM decoders to the RP which also makes no sense (but are ignored). I've just hacked in use of null capability (which was only defined in CXL3.0 but is fine for old devices) so that we can screen out the things that shouldn't be there by skipping initializing them. Also added an enum entry for CXL2_RC so we can handle that one correctly rather than combining it with CXL2_ROOT_PORT (that predates me so no idea why it was like that). > > RAS cap does not make sense here and is unused. > > logs: > > > [ 468.239319] cxl port1: found RAS capability (0x80) > > [ 468.240186] cxl port1: Unknown CM cap ID: 4 (0xd8) > > [ 468.241083] cxl port1: Unknown CM cap ID: 0 (0x110) > > [ 468.241974] cxl port1: Unknown CM cap ID: 6 (0x260) > > [ 468.242876] cxl port1: Unknown CM cap ID: 8 (0xa84) > > CXL root port (R): > * includes RAS and HDM cap > * expected: includes RAS cap, misses HDM cap > > This causes the HDM for the root ports being not properly initialized, > kernel switching to fallback mode. This confuses me a bit. Why is it trying to initialize something that shouldn't be there in the root ports? (which whilst a bug 'might' occur on real hardware if that was also buggy ;) > > logs: > > > [ 468.250182] pcieport 0000:34:00.0: Mapped CXL Memory Device resource 0x00000000fea00000 > > [ 468.250258] pcieport 0000:34:00.0: found RAS capability (0x80) > > [ 468.250371] pcieport 0000:34:00.0: Unknown CM cap ID: 4 (0xd8) > > [ 468.250413] pcieport 0000:34:00.0: found HDM decoder capability (0x110) > > [ 468.256022] pcieport 0000:34:00.0: Unknown CM cap ID: 6 (0x260) > > [ 468.256058] pcieport 0000:34:00.0: Unknown CM cap ID: 8 (0xa84) > > CXL 2.0 device (D2) > * includes RAS and HDM cap > * works as expected > > logs: > > > [ 1396.813184] cxl_pci 0000:35:00.0: found RAS capability (0x80) > > [ 1396.814133] cxl_pci 0000:35:00.0: Unknown CM cap ID: 4 (0xd8) > > [ 1396.815081] cxl_pci 0000:35:00.0: found HDM decoder capability (0x110) > > No issues here. > > Details below. > > Kernel: > > version: 0c0df63177e3 Merge branch 'for-6.5/cxl-rch-eh' into for-6.5/cxl > cmdline: > [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.4.0-rc6-rric-00055-g0c0df63177e3 root=UUID=8b447531-798d-41cb-a3df-ae1d9bb1eaf9 ro debug ignore_loglevel "dyndbg=file drivers/cxl/* +p" systemd.log_target=journal > > Qemu: > > version: QEMU emulator version 8.0.50 (v6.2.0-12087-g62c0c95799) > cmdline: > /usr/local/bin/qemu-system-x86_64 -smp 6 -machine type=q35,accel=kvm,nvdimm=on,cxl=on -hda debian.img -m 4096 -enable-kvm -object memory-backend-file,id=cxl-mem1,share=on,mem-path=cxl-window1,size=512M -object memory-backend-file,id=cxl-label1,share=on,mem-path=cxl-label1,size=512M -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 -device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0,port=0 -device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,lsa=cxl-label1 -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G -vnc :0 -nic none > > Annotated qemu kernel boot log after loading the cxl_acpi module > (including deps): > > > [ 468.229977] cxl_acpi ACPI0017:00: root0 added (root port) > > root0 added. > > > [ 468.231055] acpi ACPI0016:00: UID found: 52 > > [ 468.231959] pci0000:34: dport added to root0 > > add_host_bridge_dport() found the CXL host bridge and added it as > dport to root0.. > > > [ 468.232867] Unknown target node for memory at 0x190000000, assuming node 0 > > [ 468.233901] cxl_acpi ACPI0017:00: add: decoder0.0 node: 0 range [0x190000000 - 0x28fffffff] > > [ 468.235394] acpi ACPI0016:00: UID found: 52 > > [ 468.236171] acpi ACPI0016:00: CHBCR found for UID 52: 0x0000000180000000 > > [ 468.237193] pci0000:34: host-bridge: pci0000:34 > > [ 468.238132] cxl port1: Mapped CXL Memory Device resource 0x0000000180000000 > > CHBCR for the host bridge was found at 0x0000000180000000. > > In this kernel version it is bound to port1 (not the dport), some > comments on this later. > > Component registers and capabilities are extracted from the CHBCR: > > > [ 468.239319] cxl port1: found RAS capability (0x80) > > [ 468.240186] cxl port1: Unknown CM cap ID: 4 (0xd8) > > [ 468.241083] cxl port1: Unknown CM cap ID: 0 (0x110) > > [ 468.241974] cxl port1: Unknown CM cap ID: 6 (0x260) > > [ 468.242876] cxl port1: Unknown CM cap ID: 8 (0xa84) > > The CHBCR does not contain the HDM registers for the root ports. > > It (wrongly) contains the RAS caps. > > I would expect HDM here instead. As above. No on HDM but agreed RAS is 'novel' and should go away. In a root bridge with just one RP the HDM is optional. > > > [ 468.243767] cxl port1: Set up component registers > > [ 468.244959] pci0000:34: port1 added to root0 > > port1 added to root0 with the component regs from the CHBCR bound. I think that's reasonable as the port1 is the Host bridge in this case. > > > [ 468.245807] pci0000:34: host supports CXL > > [ 468.246711] cxl_acpi ACPI0017:00: root0: add: nvdimm-bridge0 > > [ 468.247792] CXL bus rescan result: 0 > > port 1 added to the cxl bus which triggers cxl_port_probe: > > Running cxl_switch_port_probe() for port1 here: > > > [ 468.250149] pcieport 0000:34:00.0: Component Registers found for dport: 0x00000000fea00000 > > The CXL root port was found by devm_cxl_port_enumerate_dports(). > > Its component regs are extracted: > > > [ 468.250182] pcieport 0000:34:00.0: Mapped CXL Memory Device resource 0x00000000fea00000 > > [ 468.250258] pcieport 0000:34:00.0: found RAS capability (0x80) > > [ 468.250371] pcieport 0000:34:00.0: Unknown CM cap ID: 4 (0xd8) > > [ 468.250413] pcieport 0000:34:00.0: found HDM decoder capability (0x110) > > [ 468.256022] pcieport 0000:34:00.0: Unknown CM cap ID: 6 (0x260) > > [ 468.256058] pcieport 0000:34:00.0: Unknown CM cap ID: 8 (0xa84) > > The CXL root port's comp regs contain the RAS caps but also (wrongly) > the HDM caps. Agreed - that's garbage though kernel should probably just moan about it. > > > [ 468.256065] pcieport 0000:34:00.0: Set up component registers > > [ 468.256089] pcieport 0000:34:00.0: dport added to port1 > > CXL root port added as dport to port1. > > devm_cxl_setup_hdm() is run for port1: > > > [ 468.256157] cxl_port port1: found RAS capability (0x80) > > [ 468.260800] cxl_port port1: Unknown CM cap ID: 4 (0xd8) > > [ 468.260833] cxl_port port1: Unknown CM cap ID: 0 (0x110) > > [ 468.260865] cxl_port port1: Unknown CM cap ID: 6 (0x260) > > [ 468.260898] cxl_port port1: Unknown CM cap ID: 8 (0xa84) > > port1's comp regs from above extracted again for HDM setup, but... > > > [ 468.260904] cxl_port port1: HDM decoder registers not implemented > > ... the HDM decoder could not be found. map_hdm_decoder_regs() returns > a -ENODEV. > > > [ 468.260915] cxl_port port1: Fallback to passthrough decoder > > cxl_switch_port_probe() continues with > devm_cxl_add_passthrough_decoder() as there was only one dport found. That's what I'd expect for the RC. > > > [ 468.261024] cxl decoder1.0: Added to port port1 > > [ 468.267186] cxl_port port1: probe: 0 > > Probe success. > > > [ 468.298485] nd_bus ndbus0: START: nd_bus.probe(ndbus0) > > [ 468.299859] nd_bus ndbus0: END: nd_bus.probe(ndbus0) = 0 > > [ 468.300840] cxl_nvdimm_bridge nvdimm-bridge0: probe: 0 > > End of log. Full log attached. > > Loading the cxl_pci driver later succeeds, cxl list shows: > > [ > { > "memdev":"mem0", > "pmem_size":536870912, > "serial":0, > "host":"0000:35:00.0" > } > ] > > I am not sure if the issue is a qemu config or implementation issue. > > Please take a look. Am I missing something here? The RAS registers case is definitely a bug (and I expect there are similar cases) so we should fix that in QEMU, but impact should be small (some warnings at most) if we are providing registers the kernel doesn't recognize. In theory the spec could change in future to make them defined for additional CXL components (unlikely though!) Jonathan > > Thanks, > > -Robert
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index 76743016b64c..8aee1a42d9af 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -263,6 +263,7 @@ struct cxl_poison_state { * * @dev: The device associated with this CXL state * @cxlmd: The device representing the CXL.mem capabilities of @dev + * @comp_map: component register capability mappings * @regs: Parsed register blocks * @cxl_dvsec: Offset to the PCIe device DVSEC * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH) @@ -286,7 +287,6 @@ struct cxl_poison_state { * @active_persistent_bytes: sum of hard + soft persistent * @next_volatile_bytes: volatile capacity change pending device reset * @next_persistent_bytes: persistent capacity change pending device reset - * @component_reg_phys: register base of component registers * @info: Cached DVSEC information about the device. * @serial: PCIe Device Serial Number * @event: event log driver state @@ -299,7 +299,7 @@ struct cxl_poison_state { struct cxl_dev_state { struct device *dev; struct cxl_memdev *cxlmd; - + struct cxl_register_map comp_map; struct cxl_regs regs; int cxl_dvsec; @@ -325,7 +325,6 @@ struct cxl_dev_state { u64 next_volatile_bytes; u64 next_persistent_bytes; - resource_size_t component_reg_phys; u64 serial; struct cxl_event_state event; diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c index 205e2e280aed..92c6151b7206 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -49,7 +49,6 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd, struct cxl_dport *parent_dport) { struct cxl_port *parent_port = parent_dport->port; - struct cxl_dev_state *cxlds = cxlmd->cxlds; struct cxl_port *endpoint, *iter, *down; int rc; @@ -65,8 +64,9 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd, ep->next = down; } + /* The Endpoint's component regs are located in cxlds. */ endpoint = devm_cxl_add_port(host, &cxlmd->dev, - cxlds->component_reg_phys, + CXL_RESOURCE_NONE, parent_dport); if (IS_ERR(endpoint)) return PTR_ERR(endpoint); diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c index 99a75c54ee39..ad4cfcd95e17 100644 --- a/drivers/cxl/pci.c +++ b/drivers/cxl/pci.c @@ -665,16 +665,14 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) * If the component registers can't be found, the cxl_pci driver may * still be useful for management functions so don't return an error. */ - cxlds->component_reg_phys = CXL_RESOURCE_NONE; - rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_COMPONENT, &map); + rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_COMPONENT, + &cxlds->comp_map); if (rc) dev_warn(&pdev->dev, "No component registers (%d)\n", rc); - else if (!map.component_map.ras.valid) + else if (!cxlds->comp_map.component_map.ras.valid) dev_dbg(&pdev->dev, "RAS registers not found\n"); - cxlds->component_reg_phys = map.resource; - - rc = cxl_map_component_regs(&map, &cxlds->regs.component, + rc = cxl_map_component_regs(&cxlds->comp_map, &cxlds->regs.component, BIT(CXL_CM_CAP_CAP_ID_RAS)); if (rc) dev_dbg(&pdev->dev, "Failed to map RAS capability.\n"); diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c index 34b48027b3de..fd562a5fa06f 100644 --- a/tools/testing/cxl/test/mem.c +++ b/tools/testing/cxl/test/mem.c @@ -1241,7 +1241,6 @@ static int cxl_mock_mem_probe(struct platform_device *pdev) cxlds->event.buf = (struct cxl_get_event_payload *) mdata->event_buf; if (is_rcd(pdev)) { cxlds->rcd = true; - cxlds->component_reg_phys = CXL_RESOURCE_NONE; } rc = cxl_enumerate_cmds(cxlds);