Message ID | 20230113002727.11411-1-fan.ni@samsung.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] cxl-host: Fix committed check for passthrough decoder | expand |
On Fri, 13 Jan 2023 00:27:55 +0000 Fan Ni <fan.ni@samsung.com> wrote: > For passthrough decoder (a decoder hosted by a cxl component with only > one downstream port), its cache_mem_registers field COMMITTED > (see spec 2.0 8.2.5.12 - CXL HDM Decoder Capability Structure) will not > be set by the current Linux CXL driver. Without the fix, for a cxl > topology setup with a single HB and single root port, the memdev read/write > requests cannot be passed to the device successfully as the function > cxl_hdm_find_target will fail the decoder COMMITTED check and return > directly, which causes read/write not being directed to cxl type3 device. > > Before the fix, a segfault is observed when trying using cxl memory for > htop command through 'numactl --membind' after converting cxl memory > into normal RAM. We also need to fix that segfault. > > Detailed steps to reproduce the issue with the cxl setup where there is > only one HB and a memdev is directly attached to the only root port of > the HB are listed as below, > 1. cxl create-region region0 > 2. ndctl create-namespace -m dax -r region0 > 3. daxctl reconfigure-device --mode=system-ram --no-online dax0.0 > 4. daxctl online-memory dax0.0 > 5. numactl --membind=1 htop > > Signed-off-by: Fan Ni <fan.ni@samsung.com> Ah. This mess is still going on. I've not been testing with this particular combination because the kernel didn't support it. The kernel code assumes that the implementation made the choice (which is an option in the spec) to not have any HDM decoders for the pass through case. As such it never programmed them (if you dig back a long way in the region bring patch sets in the kernel you'll find some discussion of this). Now I knew that meant the configuration didn't 'work' but nothing should be crashing - unless you mean that something in linux userspace is trying to access the memory and crashing because that fails (which is fine as far as I'm concerned ;) The work around for QEMU testing so far has been to add another root port and put nothing below that. The HDM decoders then have to be implemented so the kernel does what we expect. I'm not against a more comprehensive fix. Two options come to mind. 1) Add an option to the host bridge device to tell it not to implement hdm decoders at all. I'm not keen to just automatically drop them because having decoders on a pass through HB is a valid configuration. 2) Cheat and cleanly detect a pass through situation and let the accesses through. I'm not particularly keen on this option though as it will fail to test the code once it's 'fixed' in Linux. IIRC the spec doesn't say that programming such an HDM decoder is optional. I guess we could be a bit naughty with option 1 and flip the logic even though it would break backwards compatibility. So default to no HDM decoder. I doubt anyone will notice given that's the configuration that would have worked. However I would want to keep the option to enable these decoders around. I can spin up a patch or do you want to do it? My suggestion is option 1 with default being no HDM decoder. Jonathan > --- > hw/cxl/cxl-host.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c > index 1adf61231a..5ca0d6fd8f 100644 > --- a/hw/cxl/cxl-host.c > +++ b/hw/cxl/cxl-host.c > @@ -107,8 +107,11 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr, > uint32_t target_idx; > > ctrl = cache_mem[R_CXL_HDM_DECODER0_CTRL]; > - if (!FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { > - return false; > + > + /* skip the check for passthrough decoder */ You have a mix of spaces and tabs for indentation. Should all be 4 spaces for QEMU code. > + if (FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMIT) > + && !FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { > + return false; Why is this code specific to a pass through decoder? All it's telling us (I think) is no one tried to commit the decoder yet. > } > > ig_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG);
On Fri, Jan 13, 2023 at 09:47:25AM +0000, Jonathan Cameron wrote: > On Fri, 13 Jan 2023 00:27:55 +0000 > Fan Ni <fan.ni@samsung.com> wrote: > > > For passthrough decoder (a decoder hosted by a cxl component with only > > one downstream port), its cache_mem_registers field COMMITTED > > (see spec 2.0 8.2.5.12 - CXL HDM Decoder Capability Structure) will not > > be set by the current Linux CXL driver. Without the fix, for a cxl > > topology setup with a single HB and single root port, the memdev read/write > > requests cannot be passed to the device successfully as the function > > cxl_hdm_find_target will fail the decoder COMMITTED check and return > > directly, which causes read/write not being directed to cxl type3 device. > > > > Before the fix, a segfault is observed when trying using cxl memory for > > htop command through 'numactl --membind' after converting cxl memory > > into normal RAM. > > We also need to fix that segfault. With the patch, we do not see the segfault anymore. The segfault was there before the patch because for a passthrough decoder, we cannot find a target as the committed field check cannot pass, the read request will return 0 (in cxl_read_cfmws) which can be used for futher addressing. With the patch, we skip the committed check for passthrough decoder and the requests can be passed to the device so the segfault is fixed. Our concern is that the fix may also let the requests pass for unprogrammed decoder, which is not allowed in current code. > > > > > > Detailed steps to reproduce the issue with the cxl setup where there is > > only one HB and a memdev is directly attached to the only root port of > > the HB are listed as below, > > 1. cxl create-region region0 > > 2. ndctl create-namespace -m dax -r region0 > > 3. daxctl reconfigure-device --mode=system-ram --no-online dax0.0 > > 4. daxctl online-memory dax0.0 > > 5. numactl --membind=1 htop > > > > Signed-off-by: Fan Ni <fan.ni@samsung.com> > > Ah. This mess is still going on. I've not been testing with this > particular combination because the kernel didn't support it. > The kernel code assumes that the implementation made the choice > (which is an option in the spec) to not have any HDM decoders > for the pass through case. As such it never programmed them > (if you dig back a long way in the region bring patch sets in the > kernel you'll find some discussion of this). Now I knew that meant > the configuration didn't 'work' but nothing should be crashing - > unless you mean that something in linux userspace is trying to > access the memory and crashing because that fails > (which is fine as far as I'm concerned ;) > > The work around for QEMU testing so far has been to add another root > port and put nothing below that. The HDM decoders then have to be > implemented so the kernel does what we expect. Do you mean we already have the workaround somewhere or it is what we have planned? currently the kernel will create a passthrough decoder if the number of downstream is 1. If we have the workaround, there should never be a passthrough decoder being created and we should not see the issue. > > I'm not against a more comprehensive fix. Two options come to mind. > 1) Add an option to the host bridge device to tell it not to implement > hdm decoders at all. I'm not keen to just automatically drop them > because having decoders on a pass through HB is a valid configuration. > 2) Cheat and cleanly detect a pass through situation and let the accesses > through. I'm not particularly keen on this option though as it > will fail to test the code once it's 'fixed' in Linux. IIRC the spec > doesn't say that programming such an HDM decoder is optional. > > I guess we could be a bit naughty with option 1 and flip the logic even > though it would break backwards compatibility. So default to no HDM decoder. > I doubt anyone will notice given that's the configuration that would have > worked. However I would want to keep the option to enable these decoders > around. I can spin up a patch or do you want to do it? My suggestion is option > 1 with default being no HDM decoder. > > Jonathan Please feel free to spin up a patch. > > > > > --- > > hw/cxl/cxl-host.c | 7 +++++-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c > > index 1adf61231a..5ca0d6fd8f 100644 > > --- a/hw/cxl/cxl-host.c > > +++ b/hw/cxl/cxl-host.c > > @@ -107,8 +107,11 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr, > > uint32_t target_idx; > > > > ctrl = cache_mem[R_CXL_HDM_DECODER0_CTRL]; > > - if (!FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { > > - return false; > > + > > + /* skip the check for passthrough decoder */ > > You have a mix of spaces and tabs for indentation. Should all be 4 spaces > for QEMU code. > > > + if (FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMIT) > > + && !FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { > > + return false; > > Why is this code specific to a pass through decoder? > All it's telling us (I think) is no one tried to commit the decoder yet. > > > } > > > > ig_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG); >
On Fri, 13 Jan 2023 17:10:51 +0000 Fan Ni <fan.ni@samsung.com> wrote: > On Fri, Jan 13, 2023 at 09:47:25AM +0000, Jonathan Cameron wrote: > > > On Fri, 13 Jan 2023 00:27:55 +0000 > > Fan Ni <fan.ni@samsung.com> wrote: > > > > > For passthrough decoder (a decoder hosted by a cxl component with only > > > one downstream port), its cache_mem_registers field COMMITTED > > > (see spec 2.0 8.2.5.12 - CXL HDM Decoder Capability Structure) will not > > > be set by the current Linux CXL driver. Without the fix, for a cxl > > > topology setup with a single HB and single root port, the memdev read/write > > > requests cannot be passed to the device successfully as the function > > > cxl_hdm_find_target will fail the decoder COMMITTED check and return > > > directly, which causes read/write not being directed to cxl type3 device. > > > > > > Before the fix, a segfault is observed when trying using cxl memory for > > > htop command through 'numactl --membind' after converting cxl memory > > > into normal RAM. > > > > We also need to fix that segfault. > With the patch, we do not see the segfault anymore. The segfault was > there before the patch because for a passthrough decoder, we cannot find a > target as the committed field check cannot pass, the read request will > return 0 (in cxl_read_cfmws) which can be used for futher addressing. > With the patch, we skip the committed check for passthrough decoder and > the requests can be passed to the device so the segfault is fixed. Our > concern is that the fix may also let the requests pass for unprogrammed > decoder, which is not allowed in current code. Agreed on the concern. That is one reason we need a more comprehensive solution. > > > > > > > > > > Detailed steps to reproduce the issue with the cxl setup where there is > > > only one HB and a memdev is directly attached to the only root port of > > > the HB are listed as below, > > > 1. cxl create-region region0 > > > 2. ndctl create-namespace -m dax -r region0 > > > 3. daxctl reconfigure-device --mode=system-ram --no-online dax0.0 > > > 4. daxctl online-memory dax0.0 > > > 5. numactl --membind=1 htop > > > > > > Signed-off-by: Fan Ni <fan.ni@samsung.com> > > > > Ah. This mess is still going on. I've not been testing with this > > particular combination because the kernel didn't support it. > > The kernel code assumes that the implementation made the choice > > (which is an option in the spec) to not have any HDM decoders > > for the pass through case. As such it never programmed them > > (if you dig back a long way in the region bring patch sets in the > > kernel you'll find some discussion of this). Now I knew that meant > > the configuration didn't 'work' but nothing should be crashing - > > unless you mean that something in linux userspace is trying to > > access the memory and crashing because that fails > > (which is fine as far as I'm concerned ;) > > > > The work around for QEMU testing so far has been to add another root > > port and put nothing below that. The HDM decoders then have to be > > implemented so the kernel does what we expect. > Do you mean we already have the workaround somewhere or it is what we > have planned? currently the kernel will create a passthrough decoder if > the number of downstream is 1. If we have the workaround, there > should never be a passthrough decoder being created and we should not > see the issue. I have code as describe (now, didn't until few minutes ago). I'll send it out later this week after a little more testing / internal review. > > > > I'm not against a more comprehensive fix. Two options come to mind. > > 1) Add an option to the host bridge device to tell it not to implement > > hdm decoders at all. I'm not keen to just automatically drop them > > because having decoders on a pass through HB is a valid configuration. > > 2) Cheat and cleanly detect a pass through situation and let the accesses > > through. I'm not particularly keen on this option though as it > > will fail to test the code once it's 'fixed' in Linux. IIRC the spec > > doesn't say that programming such an HDM decoder is optional. > > > > I guess we could be a bit naughty with option 1 and flip the logic even > > though it would break backwards compatibility. So default to no HDM decoder. > > I doubt anyone will notice given that's the configuration that would have > > worked. However I would want to keep the option to enable these decoders > > around. I can spin up a patch or do you want to do it? My suggestion is option > > 1 with default being no HDM decoder. I went with this option. Implementation uses the reset callback in QEMU for the pxb-cxl to edit the capability header table to 'hide' the HDM decoder if it finds just one downstream port (on second reset, as on first one the ports have not yet been created). In cxl-host.c we direct everything to that downstream port. > > > > Jonathan > Please feel free to spin up a patch. Will do, Jonathan > > > > > > > > > --- > > > hw/cxl/cxl-host.c | 7 +++++-- > > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > > > diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c > > > index 1adf61231a..5ca0d6fd8f 100644 > > > --- a/hw/cxl/cxl-host.c > > > +++ b/hw/cxl/cxl-host.c > > > @@ -107,8 +107,11 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr, > > > uint32_t target_idx; > > > > > > ctrl = cache_mem[R_CXL_HDM_DECODER0_CTRL]; > > > - if (!FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { > > > - return false; > > > + > > > + /* skip the check for passthrough decoder */ > > > > You have a mix of spaces and tabs for indentation. Should all be 4 spaces > > for QEMU code. > > > > > + if (FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMIT) > > > + && !FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { > > > + return false; > > > > Why is this code specific to a pass through decoder? > > All it's telling us (I think) is no one tried to commit the decoder yet. > > > > > } > > > > > > ig_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG); > >
On Mon, 16 Jan 2023 14:37:23 +0000 Jonathan Cameron via <qemu-devel@nongnu.org> wrote: > On Fri, 13 Jan 2023 17:10:51 +0000 > Fan Ni <fan.ni@samsung.com> wrote: > > > On Fri, Jan 13, 2023 at 09:47:25AM +0000, Jonathan Cameron wrote: > > > > > On Fri, 13 Jan 2023 00:27:55 +0000 > > > Fan Ni <fan.ni@samsung.com> wrote: > > > > > > > For passthrough decoder (a decoder hosted by a cxl component with only > > > > one downstream port), its cache_mem_registers field COMMITTED > > > > (see spec 2.0 8.2.5.12 - CXL HDM Decoder Capability Structure) will not > > > > be set by the current Linux CXL driver. Without the fix, for a cxl > > > > topology setup with a single HB and single root port, the memdev read/write > > > > requests cannot be passed to the device successfully as the function > > > > cxl_hdm_find_target will fail the decoder COMMITTED check and return > > > > directly, which causes read/write not being directed to cxl type3 device. > > > > > > > > Before the fix, a segfault is observed when trying using cxl memory for > > > > htop command through 'numactl --membind' after converting cxl memory > > > > into normal RAM. > > > > > > We also need to fix that segfault. > > With the patch, we do not see the segfault anymore. The segfault was > > there before the patch because for a passthrough decoder, we cannot find a > > target as the committed field check cannot pass, the read request will > > return 0 (in cxl_read_cfmws) which can be used for futher addressing. > > With the patch, we skip the committed check for passthrough decoder and > > the requests can be passed to the device so the segfault is fixed. Our > > concern is that the fix may also let the requests pass for unprogrammed > > decoder, which is not allowed in current code. > > Agreed on the concern. That is one reason we need a more comprehensive solution. > > > > > > > > > > > > > > > Detailed steps to reproduce the issue with the cxl setup where there is > > > > only one HB and a memdev is directly attached to the only root port of > > > > the HB are listed as below, > > > > 1. cxl create-region region0 > > > > 2. ndctl create-namespace -m dax -r region0 > > > > 3. daxctl reconfigure-device --mode=system-ram --no-online dax0.0 > > > > 4. daxctl online-memory dax0.0 > > > > 5. numactl --membind=1 htop > > > > > > > > Signed-off-by: Fan Ni <fan.ni@samsung.com> > > > > > > Ah. This mess is still going on. I've not been testing with this > > > particular combination because the kernel didn't support it. > > > The kernel code assumes that the implementation made the choice > > > (which is an option in the spec) to not have any HDM decoders > > > for the pass through case. As such it never programmed them > > > (if you dig back a long way in the region bring patch sets in the > > > kernel you'll find some discussion of this). Now I knew that meant > > > the configuration didn't 'work' but nothing should be crashing - > > > unless you mean that something in linux userspace is trying to > > > access the memory and crashing because that fails > > > (which is fine as far as I'm concerned ;) > > > > > > The work around for QEMU testing so far has been to add another root > > > port and put nothing below that. The HDM decoders then have to be > > > implemented so the kernel does what we expect. > > Do you mean we already have the workaround somewhere or it is what we > > have planned? currently the kernel will create a passthrough decoder if > > the number of downstream is 1. If we have the workaround, there > > should never be a passthrough decoder being created and we should not > > see the issue. > > I have code as describe (now, didn't until few minutes ago). > I'll send it out later this week after a little more testing / internal review. > > > > > > > I'm not against a more comprehensive fix. Two options come to mind. > > > 1) Add an option to the host bridge device to tell it not to implement > > > hdm decoders at all. I'm not keen to just automatically drop them > > > because having decoders on a pass through HB is a valid configuration. > > > 2) Cheat and cleanly detect a pass through situation and let the accesses > > > through. I'm not particularly keen on this option though as it > > > will fail to test the code once it's 'fixed' in Linux. IIRC the spec > > > doesn't say that programming such an HDM decoder is optional. > > > > > > I guess we could be a bit naughty with option 1 and flip the logic even > > > though it would break backwards compatibility. So default to no HDM decoder. > > > I doubt anyone will notice given that's the configuration that would have > > > worked. However I would want to keep the option to enable these decoders > > > around. I can spin up a patch or do you want to do it? My suggestion is option > > > 1 with default being no HDM decoder. > > I went with this option. Implementation uses the reset callback in QEMU > for the pxb-cxl to edit the capability header table to 'hide' the HDM decoder > if it finds just one downstream port (on second reset, as on first one the > ports have not yet been created). In cxl-host.c we direct everything to that > downstream port. > > > > > > > Jonathan > > Please feel free to spin up a patch. > > Will do, I've run out of time today to write a proper cover letter etc for the patch set and want to run a few tests on non pass through cases to make sure there are no side effects, but if you want to try it in the meantime I've pushed an updated tree to gitlab. https://gitlab.com/jic23/qemu/-/tree/cxl-2023-01-20 Top two patches enable a pass through decoder for the host bridge if there is only one root port below it. Thanks, Jonathan > > Jonathan > > > > > > > > > > > > > > --- > > > > hw/cxl/cxl-host.c | 7 +++++-- > > > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c > > > > index 1adf61231a..5ca0d6fd8f 100644 > > > > --- a/hw/cxl/cxl-host.c > > > > +++ b/hw/cxl/cxl-host.c > > > > @@ -107,8 +107,11 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr, > > > > uint32_t target_idx; > > > > > > > > ctrl = cache_mem[R_CXL_HDM_DECODER0_CTRL]; > > > > - if (!FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { > > > > - return false; > > > > + > > > > + /* skip the check for passthrough decoder */ > > > > > > You have a mix of spaces and tabs for indentation. Should all be 4 spaces > > > for QEMU code. > > > > > > > + if (FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMIT) > > > > + && !FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { > > > > + return false; > > > > > > Why is this code specific to a pass through decoder? > > > All it's telling us (I think) is no one tried to commit the decoder yet. > > > > > > > } > > > > > > > > ig_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG); > > > > >
diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c index 1adf61231a..5ca0d6fd8f 100644 --- a/hw/cxl/cxl-host.c +++ b/hw/cxl/cxl-host.c @@ -107,8 +107,11 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr, uint32_t target_idx; ctrl = cache_mem[R_CXL_HDM_DECODER0_CTRL]; - if (!FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { - return false; + + /* skip the check for passthrough decoder */ + if (FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMIT) + && !FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) { + return false; } ig_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG);
For passthrough decoder (a decoder hosted by a cxl component with only one downstream port), its cache_mem_registers field COMMITTED (see spec 2.0 8.2.5.12 - CXL HDM Decoder Capability Structure) will not be set by the current Linux CXL driver. Without the fix, for a cxl topology setup with a single HB and single root port, the memdev read/write requests cannot be passed to the device successfully as the function cxl_hdm_find_target will fail the decoder COMMITTED check and return directly, which causes read/write not being directed to cxl type3 device. Before the fix, a segfault is observed when trying using cxl memory for htop command through 'numactl --membind' after converting cxl memory into normal RAM. Detailed steps to reproduce the issue with the cxl setup where there is only one HB and a memdev is directly attached to the only root port of the HB are listed as below, 1. cxl create-region region0 2. ndctl create-namespace -m dax -r region0 3. daxctl reconfigure-device --mode=system-ram --no-online dax0.0 4. daxctl online-memory dax0.0 5. numactl --membind=1 htop Signed-off-by: Fan Ni <fan.ni@samsung.com> --- hw/cxl/cxl-host.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)