Message ID | 20240905-const_dfc_prepare-v4-1-4180e1d5a244@quicinc.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | driver core: Prevent device_find_child() from modifying caller's match data | expand |
On Thu, Sep 05, 2024 at 08:36:09AM +0800, Zijun Hu wrote: > From: Zijun Hu <quic_zijuhu@quicinc.com> > > To prepare for constifying the following old driver core API: > > struct device *device_find_child(struct device *dev, void *data, > int (*match)(struct device *dev, void *data)); > to new: > struct device *device_find_child(struct device *dev, const void *data, > int (*match)(struct device *dev, const void *data)); > > The new API does not allow its match function (*match)() to modify > caller's match data @*data, but match_free_decoder() as the old API's > match function indeed modifies relevant match data, so it is not suitable > for the new API any more, solved by using device_for_each_child() to > implement relevant finding free cxl decoder function. > > By the way, this commit does not change any existing logic. > > Suggested-by: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> > --- > drivers/cxl/core/region.c | 30 ++++++++++++++++++++++++------ > 1 file changed, 24 insertions(+), 6 deletions(-) > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > index 21ad5f242875..c2068e90bf2f 100644 > --- a/drivers/cxl/core/region.c > +++ b/drivers/cxl/core/region.c > @@ -794,10 +794,15 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) > return rc; > } > > +struct cxld_match_data { > + int id; > + struct device *target_device; > +}; > + > static int match_free_decoder(struct device *dev, void *data) > { > + struct cxld_match_data *match_data = data; > struct cxl_decoder *cxld; > - int *id = data; > > if (!is_switch_decoder(dev)) > return 0; > @@ -805,17 +810,31 @@ static int match_free_decoder(struct device *dev, void *data) > cxld = to_cxl_decoder(dev); > > /* enforce ordered allocation */ > - if (cxld->id != *id) > + if (cxld->id != match_data->id) > return 0; > > - if (!cxld->region) > + if (!cxld->region) { > + match_data->target_device = get_device(dev); Where is put_device() called? Ah, it's on the drop later on after find_free_decoder(), right? > return 1; > + } > > - (*id)++; > + match_data->id++; > > return 0; > } > > +/* NOTE: need to drop the reference with put_device() after use. */ > +static struct device *find_free_decoder(struct device *parent) > +{ > + struct cxld_match_data match_data = { > + .id = 0, > + .target_device = NULL, > + }; > + > + device_for_each_child(parent, &match_data, match_free_decoder); > + return match_data.target_device; > +} > + > static int match_auto_decoder(struct device *dev, void *data) > { > struct cxl_region_params *p = data; > @@ -840,7 +859,6 @@ cxl_region_find_decoder(struct cxl_port *port, > struct cxl_region *cxlr) > { > struct device *dev; > - int id = 0; > > if (port == cxled_to_port(cxled)) > return &cxled->cxld; > @@ -849,7 +867,7 @@ cxl_region_find_decoder(struct cxl_port *port, > dev = device_find_child(&port->dev, &cxlr->params, > match_auto_decoder); > else > - dev = device_find_child(&port->dev, &id, match_free_decoder); > + dev = find_free_decoder(&port->dev); This still feels more complex that I think it should be. Why not just modify the needed device information after the device is found? What exactly is being changed in the match_free_decoder that needs to keep "state"? This feels odd. thanks, greg k-h
On 9/5/2024 1:32 PM, Greg Kroah-Hartman wrote: > On Thu, Sep 05, 2024 at 08:36:09AM +0800, Zijun Hu wrote: >> From: Zijun Hu <quic_zijuhu@quicinc.com> >> >> To prepare for constifying the following old driver core API: >> >> struct device *device_find_child(struct device *dev, void *data, >> int (*match)(struct device *dev, void *data)); >> to new: >> struct device *device_find_child(struct device *dev, const void *data, >> int (*match)(struct device *dev, const void *data)); >> >> The new API does not allow its match function (*match)() to modify >> caller's match data @*data, but match_free_decoder() as the old API's >> match function indeed modifies relevant match data, so it is not suitable >> for the new API any more, solved by using device_for_each_child() to >> implement relevant finding free cxl decoder function. >> >> By the way, this commit does not change any existing logic. >> >> Suggested-by: Ira Weiny <ira.weiny@intel.com> >> Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> >> --- >> drivers/cxl/core/region.c | 30 ++++++++++++++++++++++++------ >> 1 file changed, 24 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c >> index 21ad5f242875..c2068e90bf2f 100644 >> --- a/drivers/cxl/core/region.c >> +++ b/drivers/cxl/core/region.c >> @@ -794,10 +794,15 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) >> return rc; >> } >> >> +struct cxld_match_data { >> + int id; >> + struct device *target_device; >> +}; >> + >> static int match_free_decoder(struct device *dev, void *data) >> { >> + struct cxld_match_data *match_data = data; >> struct cxl_decoder *cxld; >> - int *id = data; >> >> if (!is_switch_decoder(dev)) >> return 0; >> @@ -805,17 +810,31 @@ static int match_free_decoder(struct device *dev, void *data) >> cxld = to_cxl_decoder(dev); >> >> /* enforce ordered allocation */ >> - if (cxld->id != *id) >> + if (cxld->id != match_data->id) >> return 0; >> >> - if (!cxld->region) >> + if (!cxld->region) { >> + match_data->target_device = get_device(dev); > > Where is put_device() called? > it is called within cxl_region_find_decoder() > Ah, it's on the drop later on after find_free_decoder(), right? yes, it shares the same put_device() which is used for original device_find_child(). > >> return 1; >> + } >> >> - (*id)++; >> + match_data->id++; >> >> return 0; >> } >> >> +/* NOTE: need to drop the reference with put_device() after use. */ >> +static struct device *find_free_decoder(struct device *parent) >> +{ >> + struct cxld_match_data match_data = { >> + .id = 0, >> + .target_device = NULL, >> + }; >> + >> + device_for_each_child(parent, &match_data, match_free_decoder); >> + return match_data.target_device; >> +} >> + >> static int match_auto_decoder(struct device *dev, void *data) >> { >> struct cxl_region_params *p = data; >> @@ -840,7 +859,6 @@ cxl_region_find_decoder(struct cxl_port *port, >> struct cxl_region *cxlr) >> { >> struct device *dev; >> - int id = 0; >> >> if (port == cxled_to_port(cxled)) >> return &cxled->cxld; >> @@ -849,7 +867,7 @@ cxl_region_find_decoder(struct cxl_port *port, >> dev = device_find_child(&port->dev, &cxlr->params, >> match_auto_decoder); >> else >> - dev = device_find_child(&port->dev, &id, match_free_decoder); >> + dev = find_free_decoder(&port->dev); > > This still feels more complex that I think it should be. Why not just > modify the needed device information after the device is found? What > exactly is being changed in the match_free_decoder that needs to keep > "state"? This feels odd. > for match_auto_decoder() original logic, nothing of aim device is modified, it just need to modifies state or @id to find the aim device. > thanks, > > greg k-h
On 2024/9/5 13:32, Greg Kroah-Hartman wrote: > On Thu, Sep 05, 2024 at 08:36:09AM +0800, Zijun Hu wrote: >> From: Zijun Hu <quic_zijuhu@quicinc.com> >> >> To prepare for constifying the following old driver core API: >> >> struct device *device_find_child(struct device *dev, void *data, >> int (*match)(struct device *dev, void *data)); >> to new: >> struct device *device_find_child(struct device *dev, const void *data, >> int (*match)(struct device *dev, const void *data)); >> >> The new API does not allow its match function (*match)() to modify >> caller's match data @*data, but match_free_decoder() as the old API's >> match function indeed modifies relevant match data, so it is not suitable >> for the new API any more, solved by using device_for_each_child() to >> implement relevant finding free cxl decoder function. >> >> By the way, this commit does not change any existing logic. >> >> Suggested-by: Ira Weiny <ira.weiny@intel.com> >> Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> >> --- >> drivers/cxl/core/region.c | 30 ++++++++++++++++++++++++------ >> 1 file changed, 24 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c >> index 21ad5f242875..c2068e90bf2f 100644 >> --- a/drivers/cxl/core/region.c >> +++ b/drivers/cxl/core/region.c >> @@ -794,10 +794,15 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) >> return rc; >> } >> >> +struct cxld_match_data { >> + int id; >> + struct device *target_device; >> +}; >> + >> static int match_free_decoder(struct device *dev, void *data) >> { >> + struct cxld_match_data *match_data = data; >> struct cxl_decoder *cxld; >> - int *id = data; >> >> if (!is_switch_decoder(dev)) >> return 0; >> @@ -805,17 +810,31 @@ static int match_free_decoder(struct device *dev, void *data) >> cxld = to_cxl_decoder(dev); >> >> /* enforce ordered allocation */ >> - if (cxld->id != *id) >> + if (cxld->id != match_data->id) >> return 0; >> >> - if (!cxld->region) >> + if (!cxld->region) { >> + match_data->target_device = get_device(dev); > > Where is put_device() called? > > Ah, it's on the drop later on after find_free_decoder(), right? > >> return 1; >> + } >> >> - (*id)++; >> + match_data->id++; >> >> return 0; >> } >> >> +/* NOTE: need to drop the reference with put_device() after use. */ >> +static struct device *find_free_decoder(struct device *parent) >> +{ >> + struct cxld_match_data match_data = { >> + .id = 0, >> + .target_device = NULL, >> + }; >> + >> + device_for_each_child(parent, &match_data, match_free_decoder); >> + return match_data.target_device; >> +} >> + >> static int match_auto_decoder(struct device *dev, void *data) >> { >> struct cxl_region_params *p = data; >> @@ -840,7 +859,6 @@ cxl_region_find_decoder(struct cxl_port *port, >> struct cxl_region *cxlr) >> { >> struct device *dev; >> - int id = 0; >> >> if (port == cxled_to_port(cxled)) >> return &cxled->cxld; >> @@ -849,7 +867,7 @@ cxl_region_find_decoder(struct cxl_port *port, >> dev = device_find_child(&port->dev, &cxlr->params, >> match_auto_decoder); >> else >> - dev = device_find_child(&port->dev, &id, match_free_decoder); >> + dev = find_free_decoder(&port->dev); > > This still feels more complex that I think it should be. Why not just > modify the needed device information after the device is found? What > exactly is being changed in the match_free_decoder that needs to keep > "state"? This feels odd. > i noticed that state of below exclusive fix have changed to "Handled Elsewhere", that fix can solve our concern as well, let us wait for 2-3 days to see if there are some progress. https://patchwork.kernel.org/project/cxl/patch/20240903-fix_cxld-v1-1-61acba7198ae@quicinc.com/ thanks > thanks, > > greg k-h
Greg Kroah-Hartman wrote: > On Thu, Sep 05, 2024 at 08:36:09AM +0800, Zijun Hu wrote: > > From: Zijun Hu <quic_zijuhu@quicinc.com> > > > > To prepare for constifying the following old driver core API: > > > > struct device *device_find_child(struct device *dev, void *data, > > int (*match)(struct device *dev, void *data)); > > to new: > > struct device *device_find_child(struct device *dev, const void *data, > > int (*match)(struct device *dev, const void *data)); > > > > The new API does not allow its match function (*match)() to modify > > caller's match data @*data, but match_free_decoder() as the old API's > > match function indeed modifies relevant match data, so it is not suitable > > for the new API any more, solved by using device_for_each_child() to > > implement relevant finding free cxl decoder function. > > > > By the way, this commit does not change any existing logic. > > > > Suggested-by: Ira Weiny <ira.weiny@intel.com> > > Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> > > --- > > drivers/cxl/core/region.c | 30 ++++++++++++++++++++++++------ > > 1 file changed, 24 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > > index 21ad5f242875..c2068e90bf2f 100644 > > --- a/drivers/cxl/core/region.c > > +++ b/drivers/cxl/core/region.c > > @@ -794,10 +794,15 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) > > return rc; > > } > > > > +struct cxld_match_data { > > + int id; > > + struct device *target_device; > > +}; > > + > > static int match_free_decoder(struct device *dev, void *data) > > { > > + struct cxld_match_data *match_data = data; > > struct cxl_decoder *cxld; > > - int *id = data; > > > > if (!is_switch_decoder(dev)) > > return 0; > > @@ -805,17 +810,31 @@ static int match_free_decoder(struct device *dev, void *data) > > cxld = to_cxl_decoder(dev); > > > > /* enforce ordered allocation */ > > - if (cxld->id != *id) > > + if (cxld->id != match_data->id) > > return 0; > > > > - if (!cxld->region) > > + if (!cxld->region) { > > + match_data->target_device = get_device(dev); > > Where is put_device() called? > > Ah, it's on the drop later on after find_free_decoder(), right? > > > return 1; > > + } > > > > - (*id)++; > > + match_data->id++; > > > > return 0; > > } > > > > +/* NOTE: need to drop the reference with put_device() after use. */ > > +static struct device *find_free_decoder(struct device *parent) > > +{ > > + struct cxld_match_data match_data = { > > + .id = 0, > > + .target_device = NULL, > > + }; > > + > > + device_for_each_child(parent, &match_data, match_free_decoder); > > + return match_data.target_device; > > +} > > + > > static int match_auto_decoder(struct device *dev, void *data) > > { > > struct cxl_region_params *p = data; > > @@ -840,7 +859,6 @@ cxl_region_find_decoder(struct cxl_port *port, > > struct cxl_region *cxlr) > > { > > struct device *dev; > > - int id = 0; > > > > if (port == cxled_to_port(cxled)) > > return &cxled->cxld; > > @@ -849,7 +867,7 @@ cxl_region_find_decoder(struct cxl_port *port, > > dev = device_find_child(&port->dev, &cxlr->params, > > match_auto_decoder); > > else > > - dev = device_find_child(&port->dev, &id, match_free_decoder); > > + dev = find_free_decoder(&port->dev); > > This still feels more complex that I think it should be. Why not just > modify the needed device information after the device is found? What > exactly is being changed in the match_free_decoder that needs to keep > "state"? This feels odd. Agreed it is odd. How about adding? diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index c2068e90bf2f..5d9017e6f16e 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -814,7 +814,7 @@ static int match_free_decoder(struct device *dev, void *data) return 0; if (!cxld->region) { - match_data->target_device = get_device(dev); + match_data->target_device = dev; return 1; } @@ -853,6 +853,22 @@ static int match_auto_decoder(struct device *dev, void *data) return 0; } +static struct device *find_auto_decoder(struct device *parent, + struct cxl_region *cxlr) +{ + struct device *dev; + + dev = device_find_child(parent, &cxlr->params, match_auto_decoder); + /* + * This decoder is pinned registered as long as the endpoint decoder is + * registered, and endpoint decoder unregistration holds the + * cxl_region_rwsem over unregister events, so no need to hold on to + * this extra reference. + */ + put_device(dev); + return dev; +} + static struct cxl_decoder * cxl_region_find_decoder(struct cxl_port *port, struct cxl_endpoint_decoder *cxled, @@ -864,19 +880,11 @@ cxl_region_find_decoder(struct cxl_port *port, return &cxled->cxld; if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) - dev = device_find_child(&port->dev, &cxlr->params, - match_auto_decoder); + dev = find_auto_decoder(&port->dev, cxlr); else dev = find_free_decoder(&port->dev); if (!dev) return NULL; - /* - * This decoder is pinned registered as long as the endpoint decoder is - * registered, and endpoint decoder unregistration holds the - * cxl_region_rwsem over unregister events, so no need to hold on to - * this extra reference. - */ - put_device(dev); return to_cxl_decoder(dev); }
Ira Weiny wrote: [..] > > This still feels more complex that I think it should be. Why not just > > modify the needed device information after the device is found? What > > exactly is being changed in the match_free_decoder that needs to keep > > "state"? This feels odd. > > Agreed it is odd. > > How about adding? I would prefer just dropping usage of device_find_ or device_for_each_ with storing an array decoders in the port directly. The port already has arrays for dports , endpoints, and regions. Using the "device" APIs to iterate children was a bit lazy, and if the id is used as the array key then a direct lookup makes some cases simpler.
On 9/10/2024 8:45 AM, Dan Williams wrote: > Ira Weiny wrote: > [..] >>> This still feels more complex that I think it should be. Why not just >>> modify the needed device information after the device is found? What >>> exactly is being changed in the match_free_decoder that needs to keep >>> "state"? This feels odd. >> >> Agreed it is odd. >> >> How about adding? > > I would prefer just dropping usage of device_find_ or device_for_each_ > with storing an array decoders in the port directly. The port already > has arrays for dports , endpoints, and regions. Using the "device" APIs > to iterate children was a bit lazy, and if the id is used as the array > key then a direct lookup makes some cases simpler. it seems Ira and Dan have corrected original logic to ensure that all child decoders are sorted by ID in ascending order as shown by below link. https://lore.kernel.org/all/66df666ded3f7_3c80f229439@iweiny-mobl.notmuch/ based on above correction, as shown by my another exclusive fix https://lore.kernel.org/all/20240905-fix_cxld-v2-1-51a520a709e4@quicinc.com/ there are a very simple change to solve the remaining original concern that device_find_child() modifies caller's match data. here is the simple change. --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -797,23 +797,13 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) static int match_free_decoder(struct device *dev, void *data) { struct cxl_decoder *cxld; - int *id = data; if (!is_switch_decoder(dev)) return 0; cxld = to_cxl_decoder(dev); - /* enforce ordered allocation */ - if (cxld->id != *id) - return 0; - - if (!cxld->region) - return 1; - - (*id)++; - - return 0; + return cxld->region ? 0 : 1; } static int match_auto_decoder(struct device *dev, void *data) @@ -840,7 +830,6 @@ cxl_region_find_decoder(struct cxl_port *port, struct cxl_region *cxlr) { struct device *dev; - int id = 0; if (port == cxled_to_port(cxled)) return &cxled->cxld; @@ -849,7 +838,7 @@ cxl_region_find_decoder(struct cxl_port *port, dev = device_find_child(&port->dev, &cxlr->params, match_auto_decoder); else - dev = device_find_child(&port->dev, &id, match_free_decoder); + dev = device_find_child(&port->dev, NULL, match_free_decoder); if (!dev) return NULL;
quic_zijuhu wrote: > On 9/10/2024 8:45 AM, Dan Williams wrote: > > Ira Weiny wrote: > > [..] > >>> This still feels more complex that I think it should be. Why not just > >>> modify the needed device information after the device is found? What > >>> exactly is being changed in the match_free_decoder that needs to keep > >>> "state"? This feels odd. > >> > >> Agreed it is odd. > >> > >> How about adding? > > > > I would prefer just dropping usage of device_find_ or device_for_each_ > > with storing an array decoders in the port directly. The port already > > has arrays for dports , endpoints, and regions. Using the "device" APIs > > to iterate children was a bit lazy, and if the id is used as the array > > key then a direct lookup makes some cases simpler. > > it seems Ira and Dan have corrected original logic to ensure > that all child decoders are sorted by ID in ascending order as shown > by below link. > > https://lore.kernel.org/all/66df666ded3f7_3c80f229439@iweiny-mobl.notmuch/ > > based on above correction, as shown by my another exclusive fix > https://lore.kernel.org/all/20240905-fix_cxld-v2-1-51a520a709e4@quicinc.com/ > there are a very simple change to solve the remaining original concern > that device_find_child() modifies caller's match data. > > here is the simple change. > > --- a/drivers/cxl/core/region.c > +++ b/drivers/cxl/core/region.c > @@ -797,23 +797,13 @@ static size_t show_targetN(struct cxl_region > *cxlr, char *buf, int pos) > static int match_free_decoder(struct device *dev, void *data) > { > struct cxl_decoder *cxld; > - int *id = data; > > if (!is_switch_decoder(dev)) > return 0; > > cxld = to_cxl_decoder(dev); > > - /* enforce ordered allocation */ > - if (cxld->id != *id) > - return 0; > - > - if (!cxld->region) > - return 1; > - > - (*id)++; > - > - return 0; > + return cxld->region ? 0 : 1; So I wanted to write a comment here to stop the next person from tripping over this dependency on decoder 'add' order, but there is a problem. For this simple version to work it needs 3 things: 1/ decoders are added in hardware id order: done, devm_cxl_enumerate_decoders() handles that 2/ search for decoders in their added order: done, device_find_child() guarantees this, although it is not obvious without reading the internals of device_add(). 3/ regions are de-allocated from decoders in reverse decoder id order. This is not enforced, in fact it is impossible to enforce. Consider that any memory device can be removed at any time and may not be removed in the order in which the device allocated switch decoders in the topology. So, that existing comment of needing to enforce ordered allocation is still relevant even though the implementation fails to handle the out-of-order region deallocation problem. I alluded to the need for a "tear down the world" implementation back in 2022 [1], but never got around to finishing that. Now, the cxl_port.hdm_end attribute tracks the "last" decoder to be allocated for endpoint ports. That same tracking needs to be added for switch ports, then this routine could check for ordering constraints by: /* enforce hardware ordered allocation */ if (!cxld->region && port->hdm_end + 1 == cxld->id) return 1; return 0; As it stands now @hdm_end is never updated for switch ports. [1]: 176baefb2eb5 cxl/hdm: Commit decoder state to hardware Yes, that looks simple enough for now, although lets not use a ternary condition and lets leave a comment for the next person: /* decoders are added in hardware id order * (devm_cxl_enumerate_decoders), allocated to regions in id order * (device_find_child() walks children in 'add' order) */ > } > > static int match_auto_decoder(struct device *dev, void *data) > @@ -840,7 +830,6 @@ cxl_region_find_decoder(struct cxl_port *port, > struct cxl_region *cxlr) > { > struct device *dev; > - int id = 0; > > if (port == cxled_to_port(cxled)) > return &cxled->cxld; > @@ -849,7 +838,7 @@ cxl_region_find_decoder(struct cxl_port *port, > dev = device_find_child(&port->dev, &cxlr->params, > match_auto_decoder); > else > - dev = device_find_child(&port->dev, &id, > match_free_decoder); > + dev = device_find_child(&port->dev, NULL, > match_free_decoder); > if (!dev) > return NULL; > >
Dan Williams wrote: [..] > So I wanted to write a comment here to stop the next person from > tripping over this dependency on decoder 'add' order, but there is a > problem. For this simple version to work it needs 3 things: > > 1/ decoders are added in hardware id order: done, > devm_cxl_enumerate_decoders() handles that > > 2/ search for decoders in their added order: done, device_find_child() > guarantees this, although it is not obvious without reading the internals > of device_add(). > > 3/ regions are de-allocated from decoders in reverse decoder id order. > This is not enforced, in fact it is impossible to enforce. Consider that > any memory device can be removed at any time and may not be removed in > the order in which the device allocated switch decoders in the topology. > > So, that existing comment of needing to enforce ordered allocation is > still relevant even though the implementation fails to handle the > out-of-order region deallocation problem. > > I alluded to the need for a "tear down the world" implementation back in > 2022 [1], but never got around to finishing that. > > Now, the cxl_port.hdm_end attribute tracks the "last" decoder to be > allocated for endpoint ports. That same tracking needs to be added for > switch ports, then this routine could check for ordering constraints by: > > /* enforce hardware ordered allocation */ > if (!cxld->region && port->hdm_end + 1 == cxld->id) > return 1; > return 0; > > As it stands now @hdm_end is never updated for switch ports. > > [1]: 176baefb2eb5 cxl/hdm: Commit decoder state to hardware --- cut the reply here --- > Yes, that looks simple enough for now, although lets not use a ternary > condition and lets leave a comment for the next person: > > /* decoders are added in hardware id order > * (devm_cxl_enumerate_decoders), allocated to regions in id order > * (device_find_child() walks children in 'add' order) > */ This is garbage I forgot to delete after realizing there was missing logic to make this simple proposal work in practice.
On 2024/9/10 12:15, Dan Williams wrote: > quic_zijuhu wrote: >> On 9/10/2024 8:45 AM, Dan Williams wrote: >>> Ira Weiny wrote: >>> [..] >>>>> This still feels more complex that I think it should be. Why not just >>>>> modify the needed device information after the device is found? What >>>>> exactly is being changed in the match_free_decoder that needs to keep >>>>> "state"? This feels odd. >>>> >>>> Agreed it is odd. >>>> >>>> How about adding? >>> >>> I would prefer just dropping usage of device_find_ or device_for_each_ >>> with storing an array decoders in the port directly. The port already >>> has arrays for dports , endpoints, and regions. Using the "device" APIs >>> to iterate children was a bit lazy, and if the id is used as the array >>> key then a direct lookup makes some cases simpler. >> >> it seems Ira and Dan have corrected original logic to ensure >> that all child decoders are sorted by ID in ascending order as shown >> by below link. >> >> https://lore.kernel.org/all/66df666ded3f7_3c80f229439@iweiny-mobl.notmuch/ >> >> based on above correction, as shown by my another exclusive fix >> https://lore.kernel.org/all/20240905-fix_cxld-v2-1-51a520a709e4@quicinc.com/ >> there are a very simple change to solve the remaining original concern >> that device_find_child() modifies caller's match data. >> >> here is the simple change. >> >> --- a/drivers/cxl/core/region.c >> +++ b/drivers/cxl/core/region.c >> @@ -797,23 +797,13 @@ static size_t show_targetN(struct cxl_region >> *cxlr, char *buf, int pos) >> static int match_free_decoder(struct device *dev, void *data) >> { >> struct cxl_decoder *cxld; >> - int *id = data; >> >> if (!is_switch_decoder(dev)) >> return 0; >> >> cxld = to_cxl_decoder(dev); >> >> - /* enforce ordered allocation */ >> - if (cxld->id != *id) >> - return 0; >> - >> - if (!cxld->region) >> - return 1; >> - >> - (*id)++; >> - >> - return 0; >> + return cxld->region ? 0 : 1; > > So I wanted to write a comment here to stop the next person from > tripping over this dependency on decoder 'add' order, but there is a > problem. For this simple version to work it needs 3 things: > > 1/ decoders are added in hardware id order: done, > devm_cxl_enumerate_decoders() handles that > do not known how you achieve it, perhaps, it is not simpler than my below solution: finding a free switch cxl decoder with minimal ID https://lore.kernel.org/all/20240905-fix_cxld-v2-1-51a520a709e4@quicinc.com/ which has simple logic and also does not have any limitation related to add/allocate/de-allocate a decoder. i am curious why not to consider this solution ? > 2/ search for decoders in their added order: done, device_find_child() > guarantees this, although it is not obvious without reading the internals > of device_add(). > > 3/ regions are de-allocated from decoders in reverse decoder id order. > This is not enforced, in fact it is impossible to enforce. Consider that > any memory device can be removed at any time and may not be removed in > the order in which the device allocated switch decoders in the topology. > sorry, don't understand, could you take a example ? IMO, the simple change in question will always get a free decoder with the minimal ID once 1/ is ensured regardless of de-allocation approach. > So, that existing comment of needing to enforce ordered allocation is > still relevant even though the implementation fails to handle the > out-of-order region deallocation problem. > > I alluded to the need for a "tear down the world" implementation back in > 2022 [1], but never got around to finishing that. > > Now, the cxl_port.hdm_end attribute tracks the "last" decoder to be > allocated for endpoint ports. That same tracking needs to be added for > switch ports, then this routine could check for ordering constraints by: > > /* enforce hardware ordered allocation */ > if (!cxld->region && port->hdm_end + 1 == cxld->id) > return 1; > return 0; > > As it stands now @hdm_end is never updated for switch ports. > > [1]: 176baefb2eb5 cxl/hdm: Commit decoder state to hardware > > > > > > > > Yes, that looks simple enough for now, although lets not use a ternary > condition and lets leave a comment for the next person: > > /* decoders are added in hardware id order > * (devm_cxl_enumerate_decoders), allocated to regions in id order > * (device_find_child() walks children in 'add' order) > */ >> } >> >> static int match_auto_decoder(struct device *dev, void *data) >> @@ -840,7 +830,6 @@ cxl_region_find_decoder(struct cxl_port *port, >> struct cxl_region *cxlr) >> { >> struct device *dev; >> - int id = 0; >> >> if (port == cxled_to_port(cxled)) >> return &cxled->cxld; >> @@ -849,7 +838,7 @@ cxl_region_find_decoder(struct cxl_port *port, >> dev = device_find_child(&port->dev, &cxlr->params, >> match_auto_decoder); >> else >> - dev = device_find_child(&port->dev, &id, >> match_free_decoder); >> + dev = device_find_child(&port->dev, NULL, >> match_free_decoder); >> if (!dev) >> return NULL; >> >> > >
Zijun Hu wrote: [..] > > So I wanted to write a comment here to stop the next person from > > tripping over this dependency on decoder 'add' order, but there is a > > problem. For this simple version to work it needs 3 things: > > > > 1/ decoders are added in hardware id order: done, > > devm_cxl_enumerate_decoders() handles that > > > > do not known how you achieve it, perhaps, it is not simpler than > my below solution: > > finding a free switch cxl decoder with minimal ID > https://lore.kernel.org/all/20240905-fix_cxld-v2-1-51a520a709e4@quicinc.com/ > > which has simple logic and also does not have any limitation related > to add/allocate/de-allocate a decoder. > > i am curious why not to consider this solution ? Because it leaves region shutdown ordering bug in place. > > 2/ search for decoders in their added order: done, device_find_child() > > guarantees this, although it is not obvious without reading the internals > > of device_add(). > > > > 3/ regions are de-allocated from decoders in reverse decoder id order. > > This is not enforced, in fact it is impossible to enforce. Consider that > > any memory device can be removed at any time and may not be removed in > > the order in which the device allocated switch decoders in the topology. > > > > sorry, don't understand, could you take a example ? > > IMO, the simple change in question will always get a free decoder with > the minimal ID once 1/ is ensured regardless of de-allocation approach. No, you are missing the fact that CXL hardware requires that decoders cannot be sparsely allocated. They must be allocated consecutively and in increasing address order. Imagine a scenario with a switch port with three decoders, decoder{A,B,C} allocated to 3 respective regions region{A,B,C}. If regionB is destroyed due to device removal that does not make decoderB free to be reallocated in hardware. The destruction of regionB requires regionC to be torn down first. As it stands the driver does not force regionC to shutdown and it falsely clears @decoderB->region making it appear free prematurely. So, while regionB would be the next decoder to allocate after regionC is torn down, it is not a free decoder while decoderC and regionC have not been reclaimed.
Dan Williams wrote: [..] > So, while regionB would be the next decoder to allocate after regionC is > torn down, it is not a free decoder while decoderC and regionC have not been > reclaimed. The "simple" conversion is bug compatible with the current implementation, but here's a path to both constify the device_find_child() argument, *and* prevent unwanted allocations by allocating decoders precisely by id. Something like this, it passes a quick unit test run: diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 1d5007e3795a..749a281819b4 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1750,7 +1750,8 @@ static int cxl_decoder_init(struct cxl_port *port, struct cxl_decoder *cxld) struct device *dev; int rc; - rc = ida_alloc(&port->decoder_ida, GFP_KERNEL); + rc = ida_alloc_max(&port->decoder_ida, CXL_DECODER_NR_MAX - 1, + GFP_KERNEL); if (rc < 0) return rc; diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 21ad5f242875..1f7b3a9ebfa3 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -794,26 +794,16 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) return rc; } -static int match_free_decoder(struct device *dev, void *data) +static int match_decoder_id(struct device *dev, void *data) { struct cxl_decoder *cxld; - int *id = data; + int id = *(int *) data; if (!is_switch_decoder(dev)) return 0; cxld = to_cxl_decoder(dev); - - /* enforce ordered allocation */ - if (cxld->id != *id) - return 0; - - if (!cxld->region) - return 1; - - (*id)++; - - return 0; + return cxld->id == id; } static int match_auto_decoder(struct device *dev, void *data) @@ -840,16 +830,29 @@ cxl_region_find_decoder(struct cxl_port *port, struct cxl_region *cxlr) { struct device *dev; - int id = 0; - if (port == cxled_to_port(cxled)) return &cxled->cxld; if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) dev = device_find_child(&port->dev, &cxlr->params, match_auto_decoder); - else - dev = device_find_child(&port->dev, &id, match_free_decoder); + else { + int id, last; + + /* + * Find next available decoder, but fail new decoder + * allocations if out-of-order region destruction has + * occurred + */ + id = find_first_zero_bit(port->decoder_alloc, + CXL_DECODER_NR_MAX); + last = find_last_bit(port->decoder_alloc, CXL_DECODER_NR_MAX); + + if (id >= CXL_DECODER_NR_MAX || + (last < CXL_DECODER_NR_MAX && id != last + 1)) + return NULL; + dev = device_find_child(&port->dev, &id, match_decoder_id); + } if (!dev) return NULL; /* @@ -943,6 +946,9 @@ static void cxl_rr_free_decoder(struct cxl_region_ref *cxl_rr) dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region mismatch\n"); if (cxld->region == cxlr) { + struct cxl_port *port = to_cxl_port(cxld->dev.parent); + + clear_bit(cxld->id, port->decoder_alloc); cxld->region = NULL; put_device(&cxlr->dev); } @@ -977,6 +983,7 @@ static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr, cxl_rr->nr_eps++; if (!cxld->region) { + set_bit(cxld->id, port->decoder_alloc); cxld->region = cxlr; get_device(&cxlr->dev); } diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 9afb407d438f..750cd027d0b0 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -578,6 +578,9 @@ struct cxl_dax_region { struct range hpa_range; }; +/* Max as of CXL 3.1 (8.2.4.20.1 CXL HDM Decoder Capability Register) */ +#define CXL_DECODER_NR_MAX 32 + /** * struct cxl_port - logical collection of upstream port devices and * downstream port devices to construct a CXL memory @@ -591,6 +594,7 @@ struct cxl_dax_region { * @regions: cxl_region_ref instances, regions mapped by this port * @parent_dport: dport that points to this port in the parent * @decoder_ida: allocator for decoder ids + * @decoder_alloc: decoder busy/free (@cxld->region set) bitmap * @reg_map: component and ras register mapping parameters * @nr_dports: number of entries in @dports * @hdm_end: track last allocated HDM decoder instance for allocation ordering @@ -611,6 +615,7 @@ struct cxl_port { struct xarray regions; struct cxl_dport *parent_dport; struct ida decoder_ida; + DECLARE_BITMAP(decoder_alloc, CXL_DECODER_NR_MAX); struct cxl_register_map reg_map; int nr_dports; int hdm_end;
On 2024/9/11 00:01, Dan Williams wrote: > Zijun Hu wrote: > [..] >> >> do not known how you achieve it, perhaps, it is not simpler than >> my below solution: >> >> finding a free switch cxl decoder with minimal ID >> https://lore.kernel.org/all/20240905-fix_cxld-v2-1-51a520a709e4@quicinc.com/ >> >> which has simple logic and also does not have any limitation related >> to add/allocate/de-allocate a decoder. >> >> i am curious why not to consider this solution ? > > Because it leaves region shutdown ordering bug in place. > >>> 2/ search for decoders in their added order: done, device_find_child() >>> guarantees this, although it is not obvious without reading the internals >>> of device_add(). >>> >>> 3/ regions are de-allocated from decoders in reverse decoder id order. >>> This is not enforced, in fact it is impossible to enforce. Consider that >>> any memory device can be removed at any time and may not be removed in >>> the order in which the device allocated switch decoders in the topology. >>> >> >> sorry, don't understand, could you take a example ? >> >> IMO, the simple change in question will always get a free decoder with >> the minimal ID once 1/ is ensured regardless of de-allocation approach. > > No, you are missing the fact that CXL hardware requires that decoders > cannot be sparsely allocated. They must be allocated consecutively and > in increasing address order. > > Imagine a scenario with a switch port with three decoders, > decoder{A,B,C} allocated to 3 respective regions region{A,B,C}. > > If regionB is destroyed due to device removal that does not make > decoderB free to be reallocated in hardware. The destruction of regionB > requires regionC to be torn down first. As it stands the driver does not > force regionC to shutdown and it falsely clears @decoderB->region making > it appear free prematurely. > > So, while regionB would be the next decoder to allocate after regionC is > torn down, it is not a free decoder while decoderC and regionC have not been > reclaimed. understood it due to your detailed explanation. thank you Dan.
On 2024/9/11 02:27, Dan Williams wrote: > Dan Williams wrote: > [..] >> So, while regionB would be the next decoder to allocate after regionC is >> torn down, it is not a free decoder while decoderC and regionC have not been >> reclaimed. > > The "simple" conversion is bug compatible with the current > implementation, but here's a path to both constify the > device_find_child() argument, *and* prevent unwanted allocations by > allocating decoders precisely by id. Something like this, it passes a > quick unit test run: > sounds good. > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index 1d5007e3795a..749a281819b4 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -1750,7 +1750,8 @@ static int cxl_decoder_init(struct cxl_port *port, struct cxl_decoder *cxld) > struct device *dev; > int rc; > > - rc = ida_alloc(&port->decoder_ida, GFP_KERNEL); > + rc = ida_alloc_max(&port->decoder_ida, CXL_DECODER_NR_MAX - 1, > + GFP_KERNEL); > if (rc < 0) > return rc; > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > index 21ad5f242875..1f7b3a9ebfa3 100644 > --- a/drivers/cxl/core/region.c > +++ b/drivers/cxl/core/region.c > @@ -794,26 +794,16 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) > return rc; > } > > -static int match_free_decoder(struct device *dev, void *data) > +static int match_decoder_id(struct device *dev, void *data) > { > struct cxl_decoder *cxld; > - int *id = data; > + int id = *(int *) data; > > if (!is_switch_decoder(dev)) > return 0; > > cxld = to_cxl_decoder(dev); > - > - /* enforce ordered allocation */ > - if (cxld->id != *id) > - return 0; > - > - if (!cxld->region) > - return 1; > - > - (*id)++; > - > - return 0; > + return cxld->id == id; > } > > static int match_auto_decoder(struct device *dev, void *data) > @@ -840,16 +830,29 @@ cxl_region_find_decoder(struct cxl_port *port, > struct cxl_region *cxlr) > { > struct device *dev; > - int id = 0; > - > if (port == cxled_to_port(cxled)) > return &cxled->cxld; > > if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) > dev = device_find_child(&port->dev, &cxlr->params, > match_auto_decoder); > - else > - dev = device_find_child(&port->dev, &id, match_free_decoder); > + else { > + int id, last; > + > + /* > + * Find next available decoder, but fail new decoder > + * allocations if out-of-order region destruction has > + * occurred > + */ > + id = find_first_zero_bit(port->decoder_alloc, > + CXL_DECODER_NR_MAX); > + last = find_last_bit(port->decoder_alloc, CXL_DECODER_NR_MAX); > + > + if (id >= CXL_DECODER_NR_MAX || > + (last < CXL_DECODER_NR_MAX && id != last + 1)) > + return NULL; Above finding logic seems wrong. what about below one ? last = find_last_bit(port->decoder_alloc, CXL_DECODER_NR_MAX); if (last >= CXL_DECODER_NR_MAX) id = 0; else if (last + 1 < CXL_DECODER_NR_MAX) id = last + 1; else return NULL; > + dev = device_find_child(&port->dev, &id, match_decoder_id); > + } > if (!dev) > return NULL; > /* > @@ -943,6 +946,9 @@ static void cxl_rr_free_decoder(struct cxl_region_ref *cxl_rr) > > dev_WARN_ONCE(&cxlr->dev, cxld->region != cxlr, "region mismatch\n"); > if (cxld->region == cxlr) { > + struct cxl_port *port = to_cxl_port(cxld->dev.parent); > + > + clear_bit(cxld->id, port->decoder_alloc); > cxld->region = NULL; > put_device(&cxlr->dev); > } > @@ -977,6 +983,7 @@ static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr, > cxl_rr->nr_eps++; > > if (!cxld->region) { > + set_bit(cxld->id, port->decoder_alloc); > cxld->region = cxlr; > get_device(&cxlr->dev); > } > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > index 9afb407d438f..750cd027d0b0 100644 > --- a/drivers/cxl/cxl.h > +++ b/drivers/cxl/cxl.h > @@ -578,6 +578,9 @@ struct cxl_dax_region { > struct range hpa_range; > }; > > +/* Max as of CXL 3.1 (8.2.4.20.1 CXL HDM Decoder Capability Register) */ > +#define CXL_DECODER_NR_MAX 32 > + > /** > * struct cxl_port - logical collection of upstream port devices and > * downstream port devices to construct a CXL memory > @@ -591,6 +594,7 @@ struct cxl_dax_region { > * @regions: cxl_region_ref instances, regions mapped by this port > * @parent_dport: dport that points to this port in the parent > * @decoder_ida: allocator for decoder ids > + * @decoder_alloc: decoder busy/free (@cxld->region set) bitmap > * @reg_map: component and ras register mapping parameters > * @nr_dports: number of entries in @dports > * @hdm_end: track last allocated HDM decoder instance for allocation ordering > @@ -611,6 +615,7 @@ struct cxl_port { > struct xarray regions; > struct cxl_dport *parent_dport; > struct ida decoder_ida; > + DECLARE_BITMAP(decoder_alloc, CXL_DECODER_NR_MAX); > struct cxl_register_map reg_map; > int nr_dports; > int hdm_end;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 21ad5f242875..c2068e90bf2f 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -794,10 +794,15 @@ static size_t show_targetN(struct cxl_region *cxlr, char *buf, int pos) return rc; } +struct cxld_match_data { + int id; + struct device *target_device; +}; + static int match_free_decoder(struct device *dev, void *data) { + struct cxld_match_data *match_data = data; struct cxl_decoder *cxld; - int *id = data; if (!is_switch_decoder(dev)) return 0; @@ -805,17 +810,31 @@ static int match_free_decoder(struct device *dev, void *data) cxld = to_cxl_decoder(dev); /* enforce ordered allocation */ - if (cxld->id != *id) + if (cxld->id != match_data->id) return 0; - if (!cxld->region) + if (!cxld->region) { + match_data->target_device = get_device(dev); return 1; + } - (*id)++; + match_data->id++; return 0; } +/* NOTE: need to drop the reference with put_device() after use. */ +static struct device *find_free_decoder(struct device *parent) +{ + struct cxld_match_data match_data = { + .id = 0, + .target_device = NULL, + }; + + device_for_each_child(parent, &match_data, match_free_decoder); + return match_data.target_device; +} + static int match_auto_decoder(struct device *dev, void *data) { struct cxl_region_params *p = data; @@ -840,7 +859,6 @@ cxl_region_find_decoder(struct cxl_port *port, struct cxl_region *cxlr) { struct device *dev; - int id = 0; if (port == cxled_to_port(cxled)) return &cxled->cxld; @@ -849,7 +867,7 @@ cxl_region_find_decoder(struct cxl_port *port, dev = device_find_child(&port->dev, &cxlr->params, match_auto_decoder); else - dev = device_find_child(&port->dev, &id, match_free_decoder); + dev = find_free_decoder(&port->dev); if (!dev) return NULL; /*