Message ID | 169948110840.509375.13862681045079385425.stgit@djiang5-mobl3 |
---|---|
State | Superseded |
Headers | show |
Series | cxl: Add cxl_region_rwsem around commit_end write during decoder enumeration | expand |
On Wed, Nov 08, 2023 at 03:05:08PM -0700, Dave Jiang wrote: > init_hdm_decoder() modifies port->commit_end without taking the > cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). > However looking at the code, it looks like the write version of the rwsem > needs to be taken due to the modification of commit_end. Wrap the write > version of the rwsem around reading and writing of commit_end. > > Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > --- > drivers/cxl/core/hdm.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c > index bc8ad4a8afca..a8f960c496cb 100644 > --- a/drivers/cxl/core/hdm.c > +++ b/drivers/cxl/core/hdm.c > @@ -854,10 +854,12 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, > cxld->target_type = CXL_DECODER_HOSTONLYMEM; > else > cxld->target_type = CXL_DECODER_DEVMEM; > + down_write(&cxl_region_rwsem); > if (cxld->id != cxl_num_decoders_committed(port)) { > dev_warn(&port->dev, > "decoder%d.%d: Committed out of order\n", > port->id, cxld->id); > + up_write(&cxl_region_rwsem); > return -ENXIO; > } > > @@ -865,9 +867,11 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, > dev_warn(&port->dev, > "decoder%d.%d: Committed with zero size\n", > port->id, cxld->id); > + up_write(&cxl_region_rwsem); > return -ENXIO; > } > port->commit_end = cxld->id; > + up_write(&cxl_region_rwsem); > } else { > if (cxled) { > struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); > > I have two questions: 1. There are some other locations in the code where commit_end has been read or updated, and not guarded by a lock, for example in cxl_decoder_commit. Do they need to be protected by cxl_region_rwsem also? 2. Can we have a race condition here in init_hdm_decoder so commit_end need to be protected? Fan
Dave Jiang wrote: > init_hdm_decoder() modifies port->commit_end without taking the > cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). > However looking at the code, it looks like the write version of the rwsem > needs to be taken due to the modification of commit_end. Wrap the write > version of the rwsem around reading and writing of commit_end. > > Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > --- > drivers/cxl/core/hdm.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c > index bc8ad4a8afca..a8f960c496cb 100644 > --- a/drivers/cxl/core/hdm.c > +++ b/drivers/cxl/core/hdm.c > @@ -854,10 +854,12 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, > cxld->target_type = CXL_DECODER_HOSTONLYMEM; > else > cxld->target_type = CXL_DECODER_DEVMEM; > + down_write(&cxl_region_rwsem); > if (cxld->id != cxl_num_decoders_committed(port)) { > dev_warn(&port->dev, > "decoder%d.%d: Committed out of order\n", > port->id, cxld->id); > + up_write(&cxl_region_rwsem); > return -ENXIO; > } > > @@ -865,9 +867,11 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, > dev_warn(&port->dev, > "decoder%d.%d: Committed with zero size\n", > port->id, cxld->id); > + up_write(&cxl_region_rwsem); > return -ENXIO; > } > port->commit_end = cxld->id; > + up_write(&cxl_region_rwsem); > } else { > if (cxled) { > struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); > > This part of the patch seemed unbalanced to me. Looking at the check for size == 0 I feel like it would be better to do that check first outside of the lock. Something like below. Ira 14:36:11 > git di --patience diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 1cc9be85ba4c..370ba5e39363 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -839,12 +839,6 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, cxld->target_type = CXL_DECODER_HOSTONLYMEM; else cxld->target_type = CXL_DECODER_DEVMEM; - if (cxld->id != cxl_num_decoders_committed(port)) { - dev_warn(&port->dev, - "decoder%d.%d: Committed out of order\n", - port->id, cxld->id); - return -ENXIO; - } if (size == 0) { dev_warn(&port->dev, @@ -852,7 +846,17 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, port->id, cxld->id); return -ENXIO; } + + down_write(&cxl_region_rwsem); + if (cxld->id != cxl_num_decoders_committed(port)) { + dev_warn(&port->dev, + "decoder%d.%d: Committed out of order\n", + port->id, cxld->id); + up_write(&cxl_region_rwsem); + return -ENXIO; + } port->commit_end = cxld->id; + up_write(&cxl_region_rwsem); } else { if (cxled) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
[ add Peter ] Ira Weiny wrote: > Dave Jiang wrote: > > init_hdm_decoder() modifies port->commit_end without taking the > > cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). > > However looking at the code, it looks like the write version of the rwsem > > needs to be taken due to the modification of commit_end. Wrap the write > > version of the rwsem around reading and writing of commit_end. > > > > Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") > > Signed-off-by: Dave Jiang <dave.jiang@intel.com> [..] > > This part of the patch seemed unbalanced to me. Looking at the check for size > == 0 I feel like it would be better to do that check first outside of the lock. > > Something like below. > > Ira > > 14:36:11 > git di --patience > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c > index 1cc9be85ba4c..370ba5e39363 100644 > --- a/drivers/cxl/core/hdm.c > +++ b/drivers/cxl/core/hdm.c > @@ -839,12 +839,6 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, > cxld->target_type = CXL_DECODER_HOSTONLYMEM; > else > cxld->target_type = CXL_DECODER_DEVMEM; > - if (cxld->id != cxl_num_decoders_committed(port)) { > - dev_warn(&port->dev, > - "decoder%d.%d: Committed out of order\n", > - port->id, cxld->id); > - return -ENXIO; > - } > > if (size == 0) { > dev_warn(&port->dev, > @@ -852,7 +846,17 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, > port->id, cxld->id); > return -ENXIO; > } > + > + down_write(&cxl_region_rwsem); > + if (cxld->id != cxl_num_decoders_committed(port)) { > + dev_warn(&port->dev, > + "decoder%d.%d: Committed out of order\n", > + port->id, cxld->id); > + up_write(&cxl_region_rwsem); > + return -ENXIO; > + } > port->commit_end = cxld->id; > + up_write(&cxl_region_rwsem); > } else { > if (cxled) { > struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); That is an improvement, but I was hoping the era of remembering to unlock on error paths was over. I came up with this oneliner, but wondered if Peter acks this from a coding style perpective: diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 1cc9be85ba4c..6452b10b80c0 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -831,7 +831,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, }; /* decoders are enabled if committed */ - if (committed) { + if (committed) scoped_guard(rwsem_write, &cxl_region_rwsem) { cxld->flags |= CXL_DECODER_F_ENABLE; if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK) cxld->flags |= CXL_DECODER_F_LOCK; The alternative is a much more violent re-ident of the code to turn: if (...) scoped_guard(...) { ... } else { ... } ...into: if (...) { scoped_guard(...) { ... } } else { ... } I know the coding-style does not allow: if (...) for (...) { ...but there are some instances of that pattern in the tree.
fan wrote: > On Wed, Nov 08, 2023 at 03:05:08PM -0700, Dave Jiang wrote: > > init_hdm_decoder() modifies port->commit_end without taking the > > cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). > > However looking at the code, it looks like the write version of the rwsem > > needs to be taken due to the modification of commit_end. Wrap the write > > version of the rwsem around reading and writing of commit_end. > > > > Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") > > Signed-off-by: Dave Jiang <dave.jiang@intel.com> > > --- > > drivers/cxl/core/hdm.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c > > index bc8ad4a8afca..a8f960c496cb 100644 > > --- a/drivers/cxl/core/hdm.c > > +++ b/drivers/cxl/core/hdm.c > > @@ -854,10 +854,12 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, > > cxld->target_type = CXL_DECODER_HOSTONLYMEM; > > else > > cxld->target_type = CXL_DECODER_DEVMEM; > > + down_write(&cxl_region_rwsem); > > if (cxld->id != cxl_num_decoders_committed(port)) { > > dev_warn(&port->dev, > > "decoder%d.%d: Committed out of order\n", > > port->id, cxld->id); > > + up_write(&cxl_region_rwsem); > > return -ENXIO; > > } > > > > @@ -865,9 +867,11 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, > > dev_warn(&port->dev, > > "decoder%d.%d: Committed with zero size\n", > > port->id, cxld->id); > > + up_write(&cxl_region_rwsem); > > return -ENXIO; > > } > > port->commit_end = cxld->id; > > + up_write(&cxl_region_rwsem); > > } else { > > if (cxled) { > > struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); > > > > > > I have two questions: > > 1. There are some other locations in the code where commit_end has been > read or updated, and not guarded by a lock, for example in > cxl_decoder_commit. Do they need to be protected by cxl_region_rwsem > also? cxl_decoder_commit() is already called under the region rwsem, but yes at least the poison lookup code is currently missing the lock. > 2. Can we have a race condition here in init_hdm_decoder so commit_end > need to be protected? In this case it seems to be accidentally protected by the fact that decoders must be committed in order and the discovery happens in order, so it is difficult to see a path for a region commit / decommit operation racing this update of the committed decoders. That said, I am in favor of adding the locking rather than depend on a subtle side-effect of how CXL operates, and to avoid adding an unlocked version of cxl_num_decoders_committed().
On Fri, Nov 10, 2023 at 12:21:56PM -0800, Dan Williams wrote: > /* decoders are enabled if committed */ > - if (committed) { > + if (committed) scoped_guard(rwsem_write, &cxl_region_rwsem) { > cxld->flags |= CXL_DECODER_F_ENABLE; > if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK) > cxld->flags |= CXL_DECODER_F_LOCK; > > > The alternative is a much more violent re-ident of the code to turn: > > if (...) scoped_guard(...) { > ... > } else { > ... > } > > ...into: > > if (...) { > scoped_guard(...) { > ... > } > } else { > ... > } > What's wrong with: if (...) { guard(rwsem_write, &cxl_region_rwsem); ... } else { ... } ?
On Wed, Nov 15, 2023 at 02:01:14PM +0100, Peter Zijlstra wrote: > On Fri, Nov 10, 2023 at 12:21:56PM -0800, Dan Williams wrote: > > > /* decoders are enabled if committed */ > > - if (committed) { > > + if (committed) scoped_guard(rwsem_write, &cxl_region_rwsem) { > > cxld->flags |= CXL_DECODER_F_ENABLE; > > if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK) > > cxld->flags |= CXL_DECODER_F_LOCK; > > > > > > The alternative is a much more violent re-ident of the code to turn: > > > > if (...) scoped_guard(...) { > > ... > > } else { > > ... > > } > > > > ...into: > > > > if (...) { > > scoped_guard(...) { > > ... > > } > > } else { > > ... > > } > > > > What's wrong with: > > if (...) { > guard(rwsem_write, &cxl_region_rwsem); Uh, that is: guard(rwsem_write)(&ctl_region_rwsem); > ... > } else { > ... > } > > ?
Peter Zijlstra wrote: > On Wed, Nov 15, 2023 at 02:01:14PM +0100, Peter Zijlstra wrote: > > On Fri, Nov 10, 2023 at 12:21:56PM -0800, Dan Williams wrote: > > > > > /* decoders are enabled if committed */ > > > - if (committed) { > > > + if (committed) scoped_guard(rwsem_write, &cxl_region_rwsem) { > > > cxld->flags |= CXL_DECODER_F_ENABLE; > > > if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK) > > > cxld->flags |= CXL_DECODER_F_LOCK; > > > > > > > > > The alternative is a much more violent re-ident of the code to turn: > > > > > > if (...) scoped_guard(...) { > > > ... > > > } else { > > > ... > > > } > > > > > > ...into: > > > > > > if (...) { > > > scoped_guard(...) { > > > ... > > > } > > > } else { > > > ... > > > } > > > > > > > What's wrong with: > > > > if (...) { > > guard(rwsem_write, &cxl_region_rwsem); > > Uh, that is: > guard(rwsem_write)(&ctl_region_rwsem); Oh, likely it is me not understanding when to use scoped_guard()... /me re-reads the definition of guard(), and realizes that it indeed arranges for the lock to be released at the end of the local compound statement. ...and scoped_guard() is just a more convenient way to write: { guard(...); }
On Wed, Nov 15, 2023 at 08:01:01AM -0800, Dan Williams wrote: > ...and scoped_guard() is just a more convenient way to write: > > { > guard(...); > } Exactly so :-)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index bc8ad4a8afca..a8f960c496cb 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -854,10 +854,12 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, cxld->target_type = CXL_DECODER_HOSTONLYMEM; else cxld->target_type = CXL_DECODER_DEVMEM; + down_write(&cxl_region_rwsem); if (cxld->id != cxl_num_decoders_committed(port)) { dev_warn(&port->dev, "decoder%d.%d: Committed out of order\n", port->id, cxld->id); + up_write(&cxl_region_rwsem); return -ENXIO; } @@ -865,9 +867,11 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, dev_warn(&port->dev, "decoder%d.%d: Committed with zero size\n", port->id, cxld->id); + up_write(&cxl_region_rwsem); return -ENXIO; } port->commit_end = cxld->id; + up_write(&cxl_region_rwsem); } else { if (cxled) { struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
init_hdm_decoder() modifies port->commit_end without taking the cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed(). However looking at the code, it looks like the write version of the rwsem needs to be taken due to the modification of commit_end. Wrap the write version of the rwsem around reading and writing of commit_end. Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Signed-off-by: Dave Jiang <dave.jiang@intel.com> --- drivers/cxl/core/hdm.c | 4 ++++ 1 file changed, 4 insertions(+)