Question about CXL region initialization

Message ID 20240112092201epcms2p577b3c979bdc694a370e5952edc091f68@epcms2p5
State New, archived

Commit Message

Wonjae Lee Jan. 12, 2024, 9:22 a.m. UTC
Hello,

To test that regions are initialized correctly for different combinations of
capacities, I connected two CXL devices with different capacities to a CXL v2.0
compliant system and enabled CXL interleaving in the BIOS settings. When the
system booted with Linux 6.7, I noticed something strange: region1 was
initialized successfully, but region0 failed to initialize with an "HPA order
violation" message.

Does anyone have any advice on this? Below is the log for your reference: 

1) iomem
480000000-AAAffffffff : CXL Window 0
  480000000-XXXfffffff : region0
    480000000-XXXfffffff : Soft Reserved
28400000000-BBBBfffffff : CXL Window 1
  28400000000-YYYffffffff : region1
    28400000000-YYYffffffff : Soft Reserved
      28400000000-YYYffffffff : dax1.0
        28400000000-YYYffffffff : System RAM (kmem)

2) dmesg - some relevant logs with CXL DEBUG enabled
...
[] cxl_port port1: decoder1.0: range: 0x480000000-0xXXXfffffff iw: 1 ig: 512
[] cxl decoder1.0: Added to port port1
[] cxl_port port2: decoder2.0: range: 0x480000000-0xXXXfffffff iw: 1 ig: 512
[] cxl decoder2.0: Added to port port2
[] cxl_port port2: decoder2.1: range: 0x28400000000-0xYYYffffffff iw: 1 ig: 256
[] cxl decoder2.1: Added to port port2
...
[] cxl_port endpoint5: decoder5.0: range: 0x480000000-0xXXXfffffff iw: 2 ig: 256
[] cxl_port endpoint5: decoder5.1: range: 0x28400000000-0xYYYffffffff iw: 1 ig: 256
[] cxl_pci 0000:64:00.0: mem1:decoder5.0: construct_region region0 res: [mem 0x480000000-0xXXXfffffff flags 0x200] iw: 2 ig: 256
[] cxl_pci 0000:64:00.0: mem1:decoder5.1: construct_region region1 res: [mem 0x28400000000-0xYYYffffffff flags 0x200] iw: 1 ig: 256
...
[] cxl region1: mem1:endpoint5 decoder5.1 add: mem1:decoder5.1 @ 0 next: none nr_eps: 1 nr_targets: 1
[] cxl region1: pci0000:63:port2 decoder2.1 add: mem1:decoder5.1 @ 0 next: mem1 nr_eps: 1 nr_targets: 1
[] cxl region1: pci0000:63:port2 iw: 1 ig: 256
[] cxl region1: pci0000:63:port2 target[0] = 0000:63:02.0 for mem1:decoder5.1 @ 0
[] cxl_region region1: region1: register dax_region1
...
[] cxl_port endpoint6: decoder6.0: range: 0x480000000-0xXXXfffffff iw: 2 ig: 256
...
[] cxl region0: mem0:endpoint6 decoder6.0 add: mem0:decoder6.0 @ 0 next: none nr_eps: 1 nr_targets: 1
[] cxl region0: pci0000:3d:port1 decoder1.0 add: mem0:decoder6.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
[] cxl region0: endpoint5: HPA order violation region1:[mem 0x28400000000-0xYYYffffffff flags 0x200] vs [mem 0x480000000-0xXXXfffffff flags 0x200]
[] cxl region0: endpoint5: failed to allocate region reference


Looking at the old history, there was an issue with "HPA order violation" and a
patch was applied. Could this be related?
: https://lore.kernel.org/linux-cxl/20230905211007.256385-1-alison.schofield@intel.com/


FYI, as an experiment, I tried deleting the error handling code below in
cxl/core/region.c, and both region0 and region1 were initialized successfully.


Any help would be appreciated.

Thank you,
Wonjae

Comments

Alison Schofield Jan. 12, 2024, 4:24 p.m. UTC | #1
On Fri, Jan 12, 2024 at 06:22:01PM +0900, Wonjae Lee wrote:
> Hello,
> 
> To test that regions are initialized correctly for different combinations of
> capacities, I connected two CXL devices with different capacities to a CXL v2.0
> compliant system and enabled CXL interleaving in the BIOS settings. When the
> system booted with Linux 6.7, I noticed something strange: region1 was
> initialized successfully, but region0 failed to initialize with an "HPA order
> violation" message.
> 
> Does anyone have any advice on this? Below is the log for your reference: 
> 
> 1) iomem
> 480000000-AAAffffffff : CXL Window 0
>   480000000-XXXfffffff : region0
>     480000000-XXXfffffff : Soft Reserved
> 28400000000-BBBBfffffff : CXL Window 1
>   28400000000-YYYffffffff : region1
>     28400000000-YYYffffffff : Soft Reserved
>       28400000000-YYYffffffff : dax1.0
>         28400000000-YYYffffffff : System RAM (kmem)
> 
> 2) dmesg - some relevant logs with CXL DEBUG enabled
> ...
> [] cxl_port port1: decoder1.0: range: 0x480000000-0xXXXfffffff iw: 1 ig: 512
> [] cxl decoder1.0: Added to port port1
> [] cxl_port port2: decoder2.0: range: 0x480000000-0xXXXfffffff iw: 1 ig: 512
> [] cxl decoder2.0: Added to port port2
> [] cxl_port port2: decoder2.1: range: 0x28400000000-0xYYYffffffff iw: 1 ig: 256
> [] cxl decoder2.1: Added to port port2
> ...
> [] cxl_port endpoint5: decoder5.0: range: 0x480000000-0xXXXfffffff iw: 2 ig: 256
> [] cxl_port endpoint5: decoder5.1: range: 0x28400000000-0xYYYffffffff iw: 1 ig: 256
> [] cxl_pci 0000:64:00.0: mem1:decoder5.0: construct_region region0 res: [mem 0x480000000-0xXXXfffffff flags 0x200] iw: 2 ig: 256
> [] cxl_pci 0000:64:00.0: mem1:decoder5.1: construct_region region1 res: [mem 0x28400000000-0xYYYffffffff flags 0x200] iw: 1 ig: 256
> ...
> [] cxl region1: mem1:endpoint5 decoder5.1 add: mem1:decoder5.1 @ 0 next: none nr_eps: 1 nr_targets: 1
> [] cxl region1: pci0000:63:port2 decoder2.1 add: mem1:decoder5.1 @ 0 next: mem1 nr_eps: 1 nr_targets: 1
> [] cxl region1: pci0000:63:port2 iw: 1 ig: 256
> [] cxl region1: pci0000:63:port2 target[0] = 0000:63:02.0 for mem1:decoder5.1 @ 0
> [] cxl_region region1: region1: register dax_region1
> ...
> [] cxl_port endpoint6: decoder6.0: range: 0x480000000-0xXXXfffffff iw: 2 ig: 256
> ...
> [] cxl region0: mem0:endpoint6 decoder6.0 add: mem0:decoder6.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> [] cxl region0: pci0000:3d:port1 decoder1.0 add: mem0:decoder6.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
> [] cxl region0: endpoint5: HPA order violation region1:[mem 0x28400000000-0xYYYffffffff flags 0x200] vs [mem 0x480000000-0xXXXfffffff flags 0x200]
> [] cxl region0: endpoint5: failed to allocate region reference
> 
> 
> Looking at the old history, there was an issue with "HPA order violation" and a
> patch was applied. Could this be related?
> : https://lore.kernel.org/linux-cxl/20230905211007.256385-1-alison.schofield@intel.com/
>

Hi Wonjae,

I recently came across this issue too and have a patch in test.

That HPA violation check made sense for user-created regions, where
the CXL driver is programming the decoders. For auto regions,
it's an issue. There is no guarantee of the order in which endpoints
are discovered during probe, and since regions are currently created
once all their member endpoints arrive, this out-of-order violation
occurs.

Your diff makes sense as a workaround. The patch checks that the
region's decoders are not misordered, and then ignores the order
violation for auto-created regions only.
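
As a rough illustration only (this is not the kernel code and not the
patch itself), the difference between the two checks could be sketched
as below. The struct, its fields, and both function names are
hypothetical, and the relaxed rule assumes that "not misordered" means
decoder ids on the port still follow HPA order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a region already attached to a port. */
struct sketch_region {
	uint64_t hpa_start;  /* first host physical address of the region */
	int decoder_id;      /* id of the port decoder that maps the region */
	bool auto_created;   /* assembled from decoders programmed by firmware */
};

/* Strict rule: no attached region may start above the incoming one. */
bool strict_order_ok(const struct sketch_region *attached, size_t n,
		     const struct sketch_region *incoming)
{
	assert(incoming != NULL);
	for (size_t i = 0; i < n; i++)
		if (attached[i].hpa_start > incoming->hpa_start)
			return false;
	return true;
}

/*
 * Relaxed rule (assumption): an auto-created region may arrive late,
 * as long as the decoder with the lower HPA also has the lower id.
 */
bool relaxed_order_ok(const struct sketch_region *attached, size_t n,
		      const struct sketch_region *incoming)
{
	assert(incoming != NULL);
	for (size_t i = 0; i < n; i++) {
		if (attached[i].hpa_start <= incoming->hpa_start)
			continue;	/* already in HPA order */
		if (!incoming->auto_created)
			return false;	/* user-created: keep the strict rule */
		if (attached[i].decoder_id <= incoming->decoder_id)
			return false;	/* decoders really are misordered */
	}
	return true;
}
```

With the values from the log (region1 at 0x28400000000 on decoder5.1
already attached to endpoint5, region0 at 0x480000000 on decoder5.0
arriving later), strict_order_ok() rejects region0 while
relaxed_order_ok() accepts it.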

I'll cc you directly on the patch, hoping you can test it out.

Thanks,
Alison

> 
> FYI, as an experiment, I tried deleting the error handling code below in
> cxl/core/region.c, and both region0 and region1 were initialized successfully.
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 3e817a6f94c6..ed08ce7840df 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -771,7 +771,6 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
>                                 "%s: HPA order violation %s:%pr vs %pr\n",
>                                 dev_name(&port->dev),
>                                 dev_name(&iter->region->dev), ip->res, p->res);
> -                       return ERR_PTR(-EBUSY);
>                 }
>         }
> 
> 
> Any help would be appreciated.
> 
> Thank you,
> Wonjae
Patch

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 3e817a6f94c6..ed08ce7840df 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -771,7 +771,6 @@  static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
                                "%s: HPA order violation %s:%pr vs %pr\n",
                                dev_name(&port->dev),
                                dev_name(&iter->region->dev), ip->res, p->res);
-                       return ERR_PTR(-EBUSY);
                }
        }