Message ID | 20250402232552.999634-1-gourry@gourry.net |
---|---|
State | New |
Headers | show |
Series | [v2] cxl: core/region - ignore interleave granularity when ways=1 | expand |
Gregory Price wrote: > When validating decoder IW/IG when setting up regions, the granularity > is irrelevant when iw=1 - all accesses will always route to the only > target anyway - so all ig values are "correct". Loosen the requirement > that `ig = (parent_iw * parent_ig)` when iw=1. > > On some Zen5 platforms, the platform BIOS specifies a 256-byte > interleave granularity window for host bridges when there is only > one target downstream. This leads to Linux rejecting the configuration > of a region with a x2 root with two x1 hostbridges. > > Decoder Programming: > root - iw:2 ig:256 > hb1 - iw:1 ig:256 (Linux expects 512) > hb2 - iw:1 ig:256 (Linux expects 512) > ep1 - iw:2 ig:256 > ep2 - iw:2 ig:256 > > This change allows all decoders downstream of a passthrough decoder to > also be configured as passthrough (iw:1 ig:X), but still disallows > downstream decoders from applying subsequent interleaves. > > e.g. in the above example if there was another decoder south of hb1 > attempting to interleave 2 endpoints - Linux would enforce hb1.ig=512 > because the southern decoder would have iw:2 and require ig=pig*piw. > > Signed-off-by: Gregory Price <gourry@gourry.net> > Reviewed-by: Dave Jiang <dave.jiang@intel.com> Thanks for the respin! Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Hi Gregory and CXL community Cc Goto-san Wow, our platform has encountered a similar issue, and I am intending to consult the community regarding this matter. I drafted similar patch locally, but I wonder if we should "ignore" the IG or "program" the IG to the decoder. Let me post the mail(question) from my drafts in your thread(I hope you I hope you won't mind). ====================================== [Question] granularity is a don't care if not interleaving? I saw this sentence " granularity is a don't care if not interleaving" in this patch "[ndctl PATCH 2/6] cxl/list: Add interleave parameters to decoder listings" [1] This reminds me that our platform programed an unmatched interleave_granularity in HDM decoders between endpoint and the host-bridge, see below: CXL Root CFMW0 / \ CFMW1 decoder0.0 decoder0.1 128 GiB IW:1 IG:256 IW:1 IG:256 128 GiB \ / Host Bridge / \ decoder5.0 decoder5.1 IW:1 IG:256 IW:1 IG:256 / \ Endpoint9 Endpoint10 | | decoder9.0 decoder10.0 IW:1 IG:1024 IW:1 IG:1024 With this setup, the Linux kernel attempts to create regions for Endpoint9 and Endpoint10 but fails because the endpoint decoders’ interleave granularity (IG=1024) does not match their parent decoders’ IG (256). Ideally, the endpoint decoders are expected to be configured for IG=256. Currently, we learned that we have only special handling for the root decoders [2][3]. My question are: Q1: whether "granularity is a don't care if not interleaving" is applied to all HDM decoders(including root decoder and HDM decoder) In current cxl cli , it will not show any interleave_granularity at all when ways==1(no-interleaving) $ cxl list -PDE | grep granularity # show nothing when ways==1 Per the CXL Spec r3.1 IG: "The number of consecutive bytes that are assigned to each target in the Target List." Q2: Does this imply a configuration where the number of ways>1? Q3: Does the IG also represent the device's capabilities? When programming, should one also consider whether the device supports it? If "granularity is a don't care if not interleaving" is true, how about below changes diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 75cd5dbb41e4..647fe2ce18ca 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1435,6 +1435,11 @@ static int cxl_port_setup_targets(struct cxl_port *port, + if (cxld->interleave_ways == 1 && cxld->interleave_granularity != ig) { + cxld->interleave_granularity = ig; + /* program HDM decoder */ + ... + } if (cxld->interleave_ways != iw || cxld->interleave_granularity != ig || cxld->hpa_range.start != p->res->start || [1] https://lore.kernel.org/all/165973188300.1528532.222988685552982872.stgit@dwillia2-xfh.jf.intel.com/ [2] https://lore.kernel.org/all/165853776917.2430596.16823264262010844458.stgit@dwillia2-xfh.jf.intel.com/ [3] https://lore.kernel.org/all/169824893473.1403938.16110924262989774582.stgit@bgt-140510-bm03.eng.stellus.in/ Thanks Zhijian On 03/04/2025 07:25, Gregory Price wrote: > When validating decoder IW/IG when setting up regions, the granularity > is irrelevant when iw=1 - all accesses will always route to the only > target anyway - so all ig values are "correct". Loosen the requirement > that `ig = (parent_iw * parent_ig)` when iw=1. > > On some Zen5 platforms, the platform BIOS specifies a 256-byte > interleave granularity window for host bridges when there is only > one target downstream. This leads to Linux rejecting the configuration > of a region with a x2 root with two x1 hostbridges. > > Decoder Programming: > root - iw:2 ig:256 > hb1 - iw:1 ig:256 (Linux expects 512) > hb2 - iw:1 ig:256 (Linux expects 512) > ep1 - iw:2 ig:256 > ep2 - iw:2 ig:256 > > This change allows all decoders downstream of a passthrough decoder to > also be configured as passthrough (iw:1 ig:X), but still disallows > downstream decoders from applying subsequent interleaves. > > e.g. in the above example if there was another decoder south of hb1 > attempting to interleave 2 endpoints - Linux would enforce hb1.ig=512 > because the southern decoder would have iw:2 and require ig=pig*piw. > > Signed-off-by: Gregory Price <gourry@gourry.net> > Reviewed-by: Dave Jiang <dave.jiang@intel.com> > --- > drivers/cxl/core/region.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > index 04bc6cad092c..dec262eadf9a 100644 > --- a/drivers/cxl/core/region.c > +++ b/drivers/cxl/core/region.c > @@ -1553,7 +1553,7 @@ static int cxl_port_setup_targets(struct cxl_port *port, > > if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) { > if (cxld->interleave_ways != iw || > - cxld->interleave_granularity != ig || > + (iw > 1 && cxld->interleave_granularity != ig) || > cxled->spa_range.start != p->res->start || > cxled->spa_range.end != p->res->end || > ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
Hello, Gregory-san, and everyone > Hi Gregory and CXL community > Cc Goto-san > > Wow, our platform has encountered a similar issue, Yes, I actually encountered this issue. The endpoint decoders show the granularity value as 1024 bytes, but other decoders show 256 bytes, which is the default value, even when the interleave setting is one. As a result, the error message that this patch avoids appeared, and initialization failed, as described in the following email: https://lore.kernel.org/linux-cxl/OSYPR01MB53525FD64A9AFBAEEC1E19A390112@OSYPR01MB5352.jpnprd01.prod.outlook.com/T/#m811bdd93bca3887b4c14a2a20b8f21a77dcf2eae So, Gregory's patch is welcome for me. > and I am intending to consult the community regarding this matter. > I drafted similar patch locally, but I wonder if we should "ignore" the IG or > "program" the IG to the decoder. I'll try to summarize Li-san's question. Does anyone know what will happen if the CXL driver does not program the IG value to the decoder? Will it work correctly without any problems? My initial approach to avoid this problem was to "program" the decoder, as shown below. This patch is a very early trial version to avoid the error we encountered. However, Li-san's concern is that this patch writes the IG value to the decoders. Is this "programming," as shown below, unnecessary? --- diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 0f6661297152..46b933cc8b9f 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -937,6 +937,14 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, } (snip) + + if (cxld->interleave_granularity != 256) { + pr_info("y-goto: force set ig=256\n"); + cxld->interleave_granularity = 256; + u32p_replace_bits(&ctrl, 0, CXL_HDM_DECODER0_CTRL_IG_MASK); <------- make the value from ig + writel(ctrl, hdm + CXL_HDM_DECODER0_CTRL_OFFSET(which)); <--- "program" the decoder of ig value + } + ---- Thanks, ---- Yasunori Goto > > Let me post the mail(question) from my drafts in your thread(I hope you I hope > you won't mind). > ====================================== > [Question] granularity is a don't care if not interleaving? > I saw this sentence " granularity is a don't care if not interleaving" in this patch > "[ndctl PATCH 2/6] cxl/list: Add interleave parameters to decoder listings" [1] > > This reminds me that our platform programed an unmatched > interleave_granularity in HDM decoders between endpoint and the host-bridge, > see below: > > CXL Root > CFMW0 / \ CFMW1 > decoder0.0 decoder0.1 > 128 GiB IW:1 IG:256 IW:1 IG:256 128 GiB > \ / > Host Bridge > / \ > decoder5.0 decoder5.1 > IW:1 IG:256 IW:1 IG:256 > / \ > Endpoint9 Endpoint10 > | | > decoder9.0 decoder10.0 > IW:1 IG:1024 IW:1 IG:1024 > > With this setup, the Linux kernel attempts to create regions for Endpoint9 and > Endpoint10 but fails because the endpoint decoders’ interleave granularity > (IG=1024) does not match their parent decoders’ IG (256). Ideally, the > endpoint decoders are expected to be configured for IG=256. > > Currently, we learned that we have only special handling for the root decoders > [2][3]. > > My question are: > Q1: whether "granularity is a don't care if not interleaving" is applied to all HDM > decoders(including root decoder and HDM decoder) > > In current cxl cli , it will not show any interleave_granularity at all when > ways==1(no-interleaving) $ cxl list -PDE | grep granularity # show nothing > when ways==1 > > Per the CXL Spec r3.1 > IG: "The number of consecutive bytes that are assigned to each target in the > Target List." > Q2: Does this imply a configuration where the number of ways>1? > > Q3: Does the IG also represent the device's capabilities? When programming, > should one also consider whether the device supports it? > > > If "granularity is a don't care if not interleaving" is true, how about below > changes > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index > 75cd5dbb41e4..647fe2ce18ca 100644 > --- a/drivers/cxl/core/region.c > +++ b/drivers/cxl/core/region.c > @@ -1435,6 +1435,11 @@ static int cxl_port_setup_targets(struct cxl_port > *port, > + if (cxld->interleave_ways == 1 && > cxld->interleave_granularity != ig) { > + cxld->interleave_granularity = ig; > + /* program HDM decoder */ > + ... > + } > if (cxld->interleave_ways != iw || > cxld->interleave_granularity != ig || > cxld->hpa_range.start != p->res->start || > > > > [1] > https://lore.kernel.org/all/165973188300.1528532.222988685552982872.stgit@ > dwillia2-xfh.jf.intel.com/ > [2] > https://lore.kernel.org/all/165853776917.2430596.16823264262010844458.stgi > t@dwillia2-xfh.jf.intel.com/ > [3] > https://lore.kernel.org/all/169824893473.1403938.16110924262989774582.stgi > t@bgt-140510-bm03.eng.stellus.in/ > > Thanks > Zhijian > > On 03/04/2025 07:25, Gregory Price wrote: > > When validating decoder IW/IG when setting up regions, the granularity > > is irrelevant when iw=1 - all accesses will always route to the only > > target anyway - so all ig values are "correct". Loosen the requirement > > that `ig = (parent_iw * parent_ig)` when iw=1. > > > > On some Zen5 platforms, the platform BIOS specifies a 256-byte > > interleave granularity window for host bridges when there is only one > > target downstream. This leads to Linux rejecting the configuration of > > a region with a x2 root with two x1 hostbridges. > > > > Decoder Programming: > > root - iw:2 ig:256 > > hb1 - iw:1 ig:256 (Linux expects 512) > > hb2 - iw:1 ig:256 (Linux expects 512) > > ep1 - iw:2 ig:256 > > ep2 - iw:2 ig:256 > > > > This change allows all decoders downstream of a passthrough decoder to > > also be configured as passthrough (iw:1 ig:X), but still disallows > > downstream decoders from applying subsequent interleaves. > > > > e.g. in the above example if there was another decoder south of hb1 > > attempting to interleave 2 endpoints - Linux would enforce hb1.ig=512 > > because the southern decoder would have iw:2 and require ig=pig*piw. > > > > Signed-off-by: Gregory Price <gourry@gourry.net> > > Reviewed-by: Dave Jiang <dave.jiang@intel.com> > > --- > > drivers/cxl/core/region.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > > index 04bc6cad092c..dec262eadf9a 100644 > > --- a/drivers/cxl/core/region.c > > +++ b/drivers/cxl/core/region.c > > @@ -1553,7 +1553,7 @@ static int cxl_port_setup_targets(struct > > cxl_port *port, > > > > if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) { > > if (cxld->interleave_ways != iw || > > - cxld->interleave_granularity != ig || > > + (iw > 1 && cxld->interleave_granularity != ig) || > > cxled->spa_range.start != p->res->start || > > cxled->spa_range.end != p->res->end || > > ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
Zhijian Li (Fujitsu) wrote: > Hi Gregory and CXL community > Cc Goto-san > > Wow, our platform has encountered a similar issue, and I am intending to consult > the community regarding this matter. > > I drafted similar patch locally, but I wonder if we should "ignore" the IG > or "program" the IG to the decoder. > > Let me post the mail(question) from my drafts in your thread(I hope you I hope you won't mind). > ====================================== > [Question] granularity is a don't care if not interleaving? > I saw this sentence " granularity is a don't care if not interleaving" in this > patch "[ndctl PATCH 2/6] cxl/list: Add interleave parameters to decoder listings" [1] > > This reminds me that our platform programed an unmatched interleave_granularity in HDM decoders > between endpoint and the host-bridge, see below: > > CXL Root > CFMW0 / \ CFMW1 > decoder0.0 decoder0.1 > 128 GiB IW:1 IG:256 IW:1 IG:256 128 GiB > \ / > Host Bridge > / \ > decoder5.0 decoder5.1 > IW:1 IG:256 IW:1 IG:256 > / \ > Endpoint9 Endpoint10 > | | > decoder9.0 decoder10.0 > IW:1 IG:1024 IW:1 IG:1024 Why 1024? Yes, the value does not matter, but attempting 1024 feels more like a unit test than a production use case. > > With this setup, the Linux kernel attempts to create regions for Endpoint9 and Endpoint10 > but fails because the endpoint decoders’ interleave granularity (IG=1024) does not > match their parent decoders’ IG (256). Ideally, the endpoint decoders are expected to be > configured for IG=256. > > Currently, we learned that we have only special handling for the root decoders [2][3]. > > My question are: > Q1: whether "granularity is a don't care if not interleaving" is applied to > all HDM decoders(including root decoder and HDM decoder) All decoders. > In current cxl cli , it will not show any interleave_granularity at all when ways==1(no-interleaving) > $ cxl list -PDE | grep granularity # show nothing when ways==1 Right, because the value theoretically has no functional impact in the ways==1 case. However, it errantly ends up having practical impact in these corners cases where code performs granularity comparisons without considering that ways may be 1. > Per the CXL Spec r3.1 > IG: "The number of consecutive bytes that are assigned to each target in the Target List." > Q2: Does this imply a configuration where the number of ways>1? Right, the granularity is the boundary at which the decoder switches to the next target in the target list. When ways=1 granularity can be infinity or zero by that definition. > Q3: Does the IG also represent the device's capabilities? When programming, should one also > consider whether the device supports it? Yes, see bits [9:8] in the CXL HDM Decoder Capability Register (CXL 3.2 8.2.4.20.1). So even though the math should not matter, I would still expect the driver to try to be careful to make sure that IG+8 is less than the address-bit max. > If "granularity is a don't care if not interleaving" is true, how about below changes Part of me says, "yes, that should be ok", another part of me says "what is the practical benefit of allowing any granularity to be specified?". So the fix from Gregory is limited to the case of "whoops, the platform BIOS thought this was a good idea even though it does not matter in practice, teach Linux to be lenient in this case.". The proposal to accept that in all case allows user-created regions to have odd large granularity sizes in the iw=1 case, and I am skeptical it is worth supporting that now.
Yasunori Gotou (Fujitsu) wrote: > Hello, Gregory-san, and everyone > > > Hi Gregory and CXL community > > Cc Goto-san > > > > Wow, our platform has encountered a similar issue, > > Yes, I actually encountered this issue. The endpoint decoders show the granularity value as 1024 bytes, > but other decoders show 256 bytes, which is the default value, even when the interleave setting is one. > > As a result, the error message that this patch avoids appeared, and initialization failed, as described in the following email: > https://lore.kernel.org/linux-cxl/OSYPR01MB53525FD64A9AFBAEEC1E19A390112@OSYPR01MB5352.jpnprd01.prod.outlook.com/T/#m811bdd93bca3887b4c14a2a20b8f21a77dcf2eae > > So, Gregory's patch is welcome for me. > > > and I am intending to consult the community regarding this matter. > > I drafted similar patch locally, but I wonder if we should "ignore" the IG or > > "program" the IG to the decoder. > > I'll try to summarize Li-san's question. > Does anyone know what will happen if the CXL driver does not program the IG value to the decoder? > Will it work correctly without any problems? The IG is always valid, either it is the default 0 which is 256, or a stale value from a previous configuration. > My initial approach to avoid this problem was to "program" the decoder, as shown below. > This patch is a very early trial version to avoid the error we encountered. > However, Li-san's concern is that this patch writes the IG value to the decoders. > Is this "programming," as shown below, unnecessary? If the decoder already has iw=1 and ig=1024 when the driver first enumerates the decoder then that is a platform BIOS compatibility concern where Linux should try to accommodate without touching the decoder. The concern about writing to the HDM control register is that there is a lag awaiting the "committed" bit being set. See cxld_await_commit(). The spec is silent on what what happens to inflight transactions between setting "commit" and awaiting "committed". I interpret that as undefined behavior and is why the driver is careful to make sure HDM is unmapped before changing decoders.
Hi Dan I am grateful for your prompt response. On 03/04/2025 12:42, Dan Williams wrote: > Zhijian Li (Fujitsu) wrote: >> Hi Gregory and CXL community >> Cc Goto-san >> >> | | >> decoder9.0 decoder10.0 >> IW:1 IG:1024 IW:1 IG:1024 > > Why 1024? Yes, the value does not matter, but attempting 1024 feels more > like a unit test than a production use case. I am uncertain, it appears to be this way when we get the device. I presume it should not be a side effect in no-interleaving case. Thank you for your answers to these questions. Your reply has truly cleared up my confusion. Once again, thank you! Thanks Zhijian >> >> My question are: >> Q1: whether "granularity is a don't care if not interleaving" is applied to >> all HDM decoders(including root decoder and HDM decoder) > > All decoders.> >> In current cxl cli , it will not show any interleave_granularity at all when ways==1(no-interleaving) >> $ cxl list -PDE | grep granularity # show nothing when ways==1 > > Right, because the value theoretically has no functional impact in the > ways==1 case. However, it errantly ends up having practical impact in > these corners cases where code performs granularity comparisons without > considering that ways may be 1. > >> Per the CXL Spec r3.1 >> IG: "The number of consecutive bytes that are assigned to each target in the Target List." >> Q2: Does this imply a configuration where the number of ways>1? > > Right, the granularity is the boundary at which the decoder switches to > the next target in the target list. When ways=1 granularity can be > infinity or zero by that definition. > >> Q3: Does the IG also represent the device's capabilities? When programming, should one also >> consider whether the device supports it? > > Yes, see bits [9:8] in the CXL HDM Decoder Capability Register (CXL 3.2 > 8.2.4.20.1). So even though the math should not matter, I would still > expect the driver to try to be careful to make sure that IG+8 is less > than the address-bit max. > >> If "granularity is a don't care if not interleaving" is true, how about below changes > > Part of me says, "yes, that should be ok", another part of me says "what > is the practical benefit of allowing any granularity to be specified?". > > So the fix from Gregory is limited to the case of "whoops, the platform > BIOS thought this was a good idea even though it does not matter in > practice, teach Linux to be lenient in this case.". > > The proposal to accept that in all case allows user-created regions to > have odd large granularity sizes in the iw=1 case, and I am skeptical > it is worth supporting that now.
On Wed, 2 Apr 2025 19:25:52 -0400 Gregory Price <gourry@gourry.net> wrote: > When validating decoder IW/IG when setting up regions, the granularity > is irrelevant when iw=1 - all accesses will always route to the only > target anyway - so all ig values are "correct". Loosen the requirement > that `ig = (parent_iw * parent_ig)` when iw=1. > > On some Zen5 platforms, the platform BIOS specifies a 256-byte > interleave granularity window for host bridges when there is only > one target downstream. This leads to Linux rejecting the configuration > of a region with a x2 root with two x1 hostbridges. > > Decoder Programming: > root - iw:2 ig:256 > hb1 - iw:1 ig:256 (Linux expects 512) > hb2 - iw:1 ig:256 (Linux expects 512) > ep1 - iw:2 ig:256 > ep2 - iw:2 ig:256 > > This change allows all decoders downstream of a passthrough decoder to > also be configured as passthrough (iw:1 ig:X), but still disallows > downstream decoders from applying subsequent interleaves. > > e.g. in the above example if there was another decoder south of hb1 > attempting to interleave 2 endpoints - Linux would enforce hb1.ig=512 > because the southern decoder would have iw:2 and require ig=pig*piw. > > Signed-off-by: Gregory Price <gourry@gourry.net> > Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reasonable work around and good explanation. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > drivers/cxl/core/region.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > index 04bc6cad092c..dec262eadf9a 100644 > --- a/drivers/cxl/core/region.c > +++ b/drivers/cxl/core/region.c > @@ -1553,7 +1553,7 @@ static int cxl_port_setup_targets(struct cxl_port *port, > > if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) { > if (cxld->interleave_ways != iw || > - cxld->interleave_granularity != ig || > + (iw > 1 && cxld->interleave_granularity != ig) || > cxled->spa_range.start != p->res->start || > cxled->spa_range.end != p->res->end || > ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 04bc6cad092c..dec262eadf9a 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1553,7 +1553,7 @@ static int cxl_port_setup_targets(struct cxl_port *port, if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) { if (cxld->interleave_ways != iw || - cxld->interleave_granularity != ig || + (iw > 1 && cxld->interleave_granularity != ig) || cxled->spa_range.start != p->res->start || cxled->spa_range.end != p->res->end || ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {