Message ID | 170025232811.2147250.16376901801315194121.stgit@djiang5-mobl3 |
---|---|
State | Accepted |
Commit | 36a1c2ee50f573972aea3c3019555f47ee0094c0 |
Headers | show |
Series | [v3] cxl: Add cxl_region_rwsem around commit_end write during decoder enumeration | expand |
On Fri, 17 Nov 2023, Dave Jiang wrote: >init_hdm_decoder() modifies port->commit_end without taking the >cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). >However looking at the code, it looks like the write version of the rwsem >needs to be taken due to the modification of commit_end. Wrap the write >version of the rwsem around reading and writing of commit_end. > >Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") >Signed-off-by: Dave Jiang <dave.jiang@intel.com> Uhmm but do we expect concurrency during the switch/port probing phase? Thanks, Davidlohr
On 11/20/23 09:04, Davidlohr Bueso wrote: > On Fri, 17 Nov 2023, Dave Jiang wrote: > >> init_hdm_decoder() modifies port->commit_end without taking the >> cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). >> However looking at the code, it looks like the write version of the rwsem >> needs to be taken due to the modification of commit_end. Wrap the write >> version of the rwsem around reading and writing of commit_end. >> >> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") >> Signed-off-by: Dave Jiang <dave.jiang@intel.com> > > Uhmm but do we expect concurrency during the switch/port probing phase? I would assume so, which makes the protection necessary no? > > Thanks, > Davidlohr
Davidlohr Bueso wrote: > On Fri, 17 Nov 2023, Dave Jiang wrote: > > >init_hdm_decoder() modifies port->commit_end without taking the > >cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). > >However looking at the code, it looks like the write version of the rwsem > >needs to be taken due to the modification of commit_end. Wrap the write > >version of the rwsem around reading and writing of commit_end. > > > >Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") > >Signed-off-by: Dave Jiang <dave.jiang@intel.com> > > Uhmm but do we expect concurrency during the switch/port probing phase? I answered that that detail here in response to Fan: http://lore.kernel.org/r/654eb8ed72ced_46f0294c@dwillia2-mobl3.amr.corp.intel.com.notmuch The takeaway is: "I am in favor of adding the locking rather than depend on a subtle side-effect of how CXL operates, and to avoid adding an unlocked version of cxl_num_decoders_committed()" I will add a note to the changelog to that effect, and drop Fixes: since the lock is not strictly required for correctness in this path.
Dan Williams wrote: > Davidlohr Bueso wrote: > > On Fri, 17 Nov 2023, Dave Jiang wrote: > > > > >init_hdm_decoder() modifies port->commit_end without taking the > > >cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). > > >However looking at the code, it looks like the write version of the rwsem > > >needs to be taken due to the modification of commit_end. Wrap the write > > >version of the rwsem around reading and writing of commit_end. > > > > > >Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") > > >Signed-off-by: Dave Jiang <dave.jiang@intel.com> > > > > Uhmm but do we expect concurrency during the switch/port probing phase? > > I answered that that detail here in response to Fan: > > http://lore.kernel.org/r/654eb8ed72ced_46f0294c@dwillia2-mobl3.amr.corp.intel.com.notmuch > > The takeaway is: > > "I am in favor of adding the locking rather than depend on a subtle > side-effect of how CXL operates, and to avoid adding an unlocked version > of cxl_num_decoders_committed()" > > I will add a note to the changelog to that effect, and drop Fixes: since > the lock is not strictly required for correctness in this path. I rewrote the changelog to this: Author: Dave Jiang <dave.jiang@intel.com> Date: Fri Nov 17 13:18:48 2023 -0700 cxl/hdm: Fix a benign lockdep splat The new helper "cxl_num_decoders_committed()" added a lockdep assertion to validate that port->commit_end is protected against modification. That assertion fires in init_hdm_decoder() where it is initializing port->commit_end. Given that it is both accessing and writing that property it obstensibly needs the lock. In practice, CXL decoder commit rules (must commit in order) and the in-order discovery of device decoders makes the manipulation of ->commit_end in init_hdm_decoder() safe. However, rather than rely on the subtle rules of CXL hardware, just make the implementation obviously correct from a software perspective. The Fixes: tag is only for cleaning up a lockdep splat, there is no functional issue addressed by this fix. Fixes: 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
On 11/22/23 18:02, Dan Williams wrote: > Dan Williams wrote: >> Davidlohr Bueso wrote: >>> On Fri, 17 Nov 2023, Dave Jiang wrote: >>> >>>> init_hdm_decoder() modifies port->commit_end without taking the >>>> cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). >>>> However looking at the code, it looks like the write version of the rwsem >>>> needs to be taken due to the modification of commit_end. Wrap the write >>>> version of the rwsem around reading and writing of commit_end. >>>> >>>> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") >>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com> >>> >>> Uhmm but do we expect concurrency during the switch/port probing phase? >> >> I answered that that detail here in response to Fan: >> >> http://lore.kernel.org/r/654eb8ed72ced_46f0294c@dwillia2-mobl3.amr.corp.intel.com.notmuch >> >> The takeaway is: >> >> "I am in favor of adding the locking rather than depend on a subtle >> side-effect of how CXL operates, and to avoid adding an unlocked version >> of cxl_num_decoders_committed()" >> >> I will add a note to the changelog to that effect, and drop Fixes: since >> the lock is not strictly required for correctness in this path. > > I rewrote the changelog to this: > > Author: Dave Jiang <dave.jiang@intel.com> > Date: Fri Nov 17 13:18:48 2023 -0700 > > cxl/hdm: Fix a benign lockdep splat > > The new helper "cxl_num_decoders_committed()" added a lockdep assertion > to validate that port->commit_end is protected against modification. > That assertion fires in init_hdm_decoder() where it is initializing > port->commit_end. Given that it is both accessing and writing that > property it obstensibly needs the lock. > > In practice, CXL decoder commit rules (must commit in order) and the > in-order discovery of device decoders makes the manipulation of > ->commit_end in init_hdm_decoder() safe. However, rather than rely on > the subtle rules of CXL hardware, just make the implementation obviously > correct from a software perspective. > > The Fixes: tag is only for cleaning up a lockdep splat, there is no > functional issue addressed by this fix. > > Fixes: 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") Thanks. LGTM
On Wed, 22 Nov 2023, Dan Williams wrote: >Dan Williams wrote: >> Davidlohr Bueso wrote: >> > On Fri, 17 Nov 2023, Dave Jiang wrote: >> > >> > >init_hdm_decoder() modifies port->commit_end without taking the >> > >cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). >> > >However looking at the code, it looks like the write version of the rwsem >> > >needs to be taken due to the modification of commit_end. Wrap the write >> > >version of the rwsem around reading and writing of commit_end. >> > > >> > >Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") >> > >Signed-off-by: Dave Jiang <dave.jiang@intel.com> >> > >> > Uhmm but do we expect concurrency during the switch/port probing phase? >> >> I answered that that detail here in response to Fan: >> >> http://lore.kernel.org/r/654eb8ed72ced_46f0294c@dwillia2-mobl3.amr.corp.intel.com.notmuch >> >> The takeaway is: >> >> "I am in favor of adding the locking rather than depend on a subtle >> side-effect of how CXL operates, and to avoid adding an unlocked version >> of cxl_num_decoders_committed()" >> >> I will add a note to the changelog to that effect, and drop Fixes: since >> the lock is not strictly required for correctness in this path. > >I rewrote the changelog to this: Thanks, this clarified my doubt. > >Author: Dave Jiang <dave.jiang@intel.com> >Date: Fri Nov 17 13:18:48 2023 -0700 > > cxl/hdm: Fix a benign lockdep splat > > The new helper "cxl_num_decoders_committed()" added a lockdep assertion > to validate that port->commit_end is protected against modification. > That assertion fires in init_hdm_decoder() where it is initializing > port->commit_end. Given that it is both accessing and writing that > property it obstensibly needs the lock. > > In practice, CXL decoder commit rules (must commit in order) and the > in-order discovery of device decoders makes the manipulation of > ->commit_end in init_hdm_decoder() safe. However, rather than rely on > the subtle rules of CXL hardware, just make the implementation obviously > correct from a software perspective. > > The Fixes: tag is only for cleaning up a lockdep splat, there is no > functional issue addressed by this fix. > > Fixes: 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") >
Davidlohr Bueso wrote: > On Wed, 22 Nov 2023, Dan Williams wrote: > > >Dan Williams wrote: > >> Davidlohr Bueso wrote: > >> > On Fri, 17 Nov 2023, Dave Jiang wrote: > >> > > >> > >init_hdm_decoder() modifies port->commit_end without taking the > >> > >cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). > >> > >However looking at the code, it looks like the write version of the rwsem > >> > >needs to be taken due to the modification of commit_end. Wrap the write > >> > >version of the rwsem around reading and writing of commit_end. > >> > > > >> > >Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") > >> > >Signed-off-by: Dave Jiang <dave.jiang@intel.com> > >> > > >> > Uhmm but do we expect concurrency during the switch/port probing phase? > >> > >> I answered that that detail here in response to Fan: > >> > >> http://lore.kernel.org/r/654eb8ed72ced_46f0294c@dwillia2-mobl3.amr.corp.intel.com.notmuch > >> > >> The takeaway is: > >> > >> "I am in favor of adding the locking rather than depend on a subtle > >> side-effect of how CXL operates, and to avoid adding an unlocked version > >> of cxl_num_decoders_committed()" > >> > >> I will add a note to the changelog to that effect, and drop Fixes: since > >> the lock is not strictly required for correctness in this path. > > > >I rewrote the changelog to this: > > Thanks, this clarified my doubt. Converted this ^^^ into an "Acked-by: Davidlohr Bueso <dave@stgolabs.net>" if that's alright.
On Wed, 29 Nov 2023, Dan Williams wrote: >Converted this ^^^ into an "Acked-by: Davidlohr Bueso ><dave@stgolabs.net>" if that's alright. Sure.
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 1cc9be85ba4c..529baa8a1759 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -839,6 +839,8 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, cxld->target_type = CXL_DECODER_HOSTONLYMEM; else cxld->target_type = CXL_DECODER_DEVMEM; + + guard(rwsem_write)(&cxl_region_rwsem); if (cxld->id != cxl_num_decoders_committed(port)) { dev_warn(&port->dev, "decoder%d.%d: Committed out of order\n",
init_hdm_decoder() modifies port->commit_end without taking the cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). However looking at the code, it looks like the write version of the rwsem needs to be taken due to the modification of commit_end. Wrap the write version of the rwsem around reading and writing of commit_end. Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Signed-off-by: Dave Jiang <dave.jiang@intel.com> --- v3: - Just do the one line change. (Dan) v2: - Add changes from Ira and moving the lock block - Add suggestion from Dan on using guard() --- drivers/cxl/core/hdm.c | 2 ++ 1 file changed, 2 insertions(+)