diff mbox series

[v3] cxl: Add cxl_region_rwsem around commit_end write during decoder enumeration

Message ID 170025232811.2147250.16376901801315194121.stgit@djiang5-mobl3
State Accepted
Commit 36a1c2ee50f573972aea3c3019555f47ee0094c0
Headers show
Series [v3] cxl: Add cxl_region_rwsem around commit_end write during decoder enumeration | expand

Commit Message

Dave Jiang Nov. 17, 2023, 8:18 p.m. UTC
init_hdm_decoder() modifies port->commit_end without taking the
cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed().
However looking at the code, it looks like the write version of the rwsem
needs to be taken due to the modification of commit_end. Wrap the write
version of the rwsem around reading and writing of commit_end.

Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v3:
- Just do the one line change. (Dan)
v2:
- Add changes from Ira and moving the lock block
- Add suggestion from Dan on using guard()
---
 drivers/cxl/core/hdm.c |    2 ++
 1 file changed, 2 insertions(+)

Comments

Davidlohr Bueso Nov. 20, 2023, 4:04 p.m. UTC | #1
On Fri, 17 Nov 2023, Dave Jiang wrote:

>init_hdm_decoder() modifies port->commit_end without taking the
>cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed().
>However looking at the code, it looks like the write version of the rwsem
>needs to be taken due to the modification of commit_end. Wrap the write
>version of the rwsem around reading and writing of commit_end.
>
>Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
>Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Uhmm but do we expect concurrency during the switch/port probing phase?

Thanks,
Davidlohr
Dave Jiang Nov. 20, 2023, 5:04 p.m. UTC | #2
On 11/20/23 09:04, Davidlohr Bueso wrote:
> On Fri, 17 Nov 2023, Dave Jiang wrote:
> 
>> init_hdm_decoder() modifies port->commit_end without taking the
>> cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed().
>> However looking at the code, it looks like the write version of the rwsem
>> needs to be taken due to the modification of commit_end. Wrap the write
>> version of the rwsem around reading and writing of commit_end.
>>
>> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> 
> Uhmm but do we expect concurrency during the switch/port probing phase?

I would assume so, which makes the protection necessary no?
> 
> Thanks,
> Davidlohr
Dan Williams Nov. 22, 2023, 9:43 p.m. UTC | #3
Davidlohr Bueso wrote:
> On Fri, 17 Nov 2023, Dave Jiang wrote:
> 
> >init_hdm_decoder() modifies port->commit_end without taking the
> >cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed().
> >However looking at the code, it looks like the write version of the rwsem
> >needs to be taken due to the modification of commit_end. Wrap the write
> >version of the rwsem around reading and writing of commit_end.
> >
> >Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> >Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> 
> Uhmm but do we expect concurrency during the switch/port probing phase?

I answered that that detail here in response to Fan:

http://lore.kernel.org/r/654eb8ed72ced_46f0294c@dwillia2-mobl3.amr.corp.intel.com.notmuch

The takeaway is:

"I am in favor of adding the locking rather than depend on a subtle
side-effect of how CXL operates, and to avoid adding an unlocked version
of cxl_num_decoders_committed()"

I will add a note to the changelog to that effect, and drop Fixes: since
the lock is not strictly required for correctness in this path.
Dan Williams Nov. 23, 2023, 1:02 a.m. UTC | #4
Dan Williams wrote:
> Davidlohr Bueso wrote:
> > On Fri, 17 Nov 2023, Dave Jiang wrote:
> > 
> > >init_hdm_decoder() modifies port->commit_end without taking the
> > >cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed().
> > >However looking at the code, it looks like the write version of the rwsem
> > >needs to be taken due to the modification of commit_end. Wrap the write
> > >version of the rwsem around reading and writing of commit_end.
> > >
> > >Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> > >Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> > 
> > Uhmm but do we expect concurrency during the switch/port probing phase?
> 
> I answered that that detail here in response to Fan:
> 
> http://lore.kernel.org/r/654eb8ed72ced_46f0294c@dwillia2-mobl3.amr.corp.intel.com.notmuch
> 
> The takeaway is:
> 
> "I am in favor of adding the locking rather than depend on a subtle
> side-effect of how CXL operates, and to avoid adding an unlocked version
> of cxl_num_decoders_committed()"
> 
> I will add a note to the changelog to that effect, and drop Fixes: since
> the lock is not strictly required for correctness in this path.

I rewrote the changelog to this:

Author: Dave Jiang <dave.jiang@intel.com>
Date:   Fri Nov 17 13:18:48 2023 -0700

    cxl/hdm: Fix a benign lockdep splat
    
    The new helper "cxl_num_decoders_committed()" added a lockdep assertion
    to validate that port->commit_end is protected against modification.
    That assertion fires in init_hdm_decoder() where it is initializing
    port->commit_end. Given that it is both accessing and writing that
    property it obstensibly needs the lock.
    
    In practice, CXL decoder commit rules (must commit in order) and the
    in-order discovery of device decoders makes the manipulation of
    ->commit_end in init_hdm_decoder() safe. However, rather than rely on
    the subtle rules of CXL hardware, just make the implementation obviously
    correct from a software perspective.
    
    The Fixes: tag is only for cleaning up a lockdep splat, there is no
    functional issue addressed by this fix.
    
    Fixes: 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
Dave Jiang Nov. 27, 2023, 5:09 p.m. UTC | #5
On 11/22/23 18:02, Dan Williams wrote:
> Dan Williams wrote:
>> Davidlohr Bueso wrote:
>>> On Fri, 17 Nov 2023, Dave Jiang wrote:
>>>
>>>> init_hdm_decoder() modifies port->commit_end without taking the
>>>> cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed().
>>>> However looking at the code, it looks like the write version of the rwsem
>>>> needs to be taken due to the modification of commit_end. Wrap the write
>>>> version of the rwsem around reading and writing of commit_end.
>>>>
>>>> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
>>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>>
>>> Uhmm but do we expect concurrency during the switch/port probing phase?
>>
>> I answered that that detail here in response to Fan:
>>
>> http://lore.kernel.org/r/654eb8ed72ced_46f0294c@dwillia2-mobl3.amr.corp.intel.com.notmuch
>>
>> The takeaway is:
>>
>> "I am in favor of adding the locking rather than depend on a subtle
>> side-effect of how CXL operates, and to avoid adding an unlocked version
>> of cxl_num_decoders_committed()"
>>
>> I will add a note to the changelog to that effect, and drop Fixes: since
>> the lock is not strictly required for correctness in this path.
> 
> I rewrote the changelog to this:
> 
> Author: Dave Jiang <dave.jiang@intel.com>
> Date:   Fri Nov 17 13:18:48 2023 -0700
> 
>     cxl/hdm: Fix a benign lockdep splat
>     
>     The new helper "cxl_num_decoders_committed()" added a lockdep assertion
>     to validate that port->commit_end is protected against modification.
>     That assertion fires in init_hdm_decoder() where it is initializing
>     port->commit_end. Given that it is both accessing and writing that
>     property it obstensibly needs the lock.
>     
>     In practice, CXL decoder commit rules (must commit in order) and the
>     in-order discovery of device decoders makes the manipulation of
>     ->commit_end in init_hdm_decoder() safe. However, rather than rely on
>     the subtle rules of CXL hardware, just make the implementation obviously
>     correct from a software perspective.
>     
>     The Fixes: tag is only for cleaning up a lockdep splat, there is no
>     functional issue addressed by this fix.
>     
>     Fixes: 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")

Thanks. LGTM
Davidlohr Bueso Nov. 27, 2023, 5:39 p.m. UTC | #6
On Wed, 22 Nov 2023, Dan Williams wrote:

>Dan Williams wrote:
>> Davidlohr Bueso wrote:
>> > On Fri, 17 Nov 2023, Dave Jiang wrote:
>> >
>> > >init_hdm_decoder() modifies port->commit_end without taking the
>> > >cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed().
>> > >However looking at the code, it looks like the write version of the rwsem
>> > >needs to be taken due to the modification of commit_end. Wrap the write
>> > >version of the rwsem around reading and writing of commit_end.
>> > >
>> > >Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
>> > >Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> >
>> > Uhmm but do we expect concurrency during the switch/port probing phase?
>>
>> I answered that that detail here in response to Fan:
>>
>> http://lore.kernel.org/r/654eb8ed72ced_46f0294c@dwillia2-mobl3.amr.corp.intel.com.notmuch
>>
>> The takeaway is:
>>
>> "I am in favor of adding the locking rather than depend on a subtle
>> side-effect of how CXL operates, and to avoid adding an unlocked version
>> of cxl_num_decoders_committed()"
>>
>> I will add a note to the changelog to that effect, and drop Fixes: since
>> the lock is not strictly required for correctness in this path.
>
>I rewrote the changelog to this:

Thanks, this clarified my doubt.

>
>Author: Dave Jiang <dave.jiang@intel.com>
>Date:   Fri Nov 17 13:18:48 2023 -0700
>
>    cxl/hdm: Fix a benign lockdep splat
>
>    The new helper "cxl_num_decoders_committed()" added a lockdep assertion
>    to validate that port->commit_end is protected against modification.
>    That assertion fires in init_hdm_decoder() where it is initializing
>    port->commit_end. Given that it is both accessing and writing that
>    property it obstensibly needs the lock.
>
>    In practice, CXL decoder commit rules (must commit in order) and the
>    in-order discovery of device decoders makes the manipulation of
>    ->commit_end in init_hdm_decoder() safe. However, rather than rely on
>    the subtle rules of CXL hardware, just make the implementation obviously
>    correct from a software perspective.
>
>    The Fixes: tag is only for cleaning up a lockdep splat, there is no
>    functional issue addressed by this fix.
>
>    Fixes: 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
>
Dan Williams Nov. 30, 2023, 1:01 a.m. UTC | #7
Davidlohr Bueso wrote:
> On Wed, 22 Nov 2023, Dan Williams wrote:
> 
> >Dan Williams wrote:
> >> Davidlohr Bueso wrote:
> >> > On Fri, 17 Nov 2023, Dave Jiang wrote:
> >> >
> >> > >init_hdm_decoder() modifies port->commit_end without taking the
> >> > >cxl_region_rwsem. An assert splat emitted by cxl_num_decoders_committed().
> >> > >However looking at the code, it looks like the write version of the rwsem
> >> > >needs to be taken due to the modification of commit_end. Wrap the write
> >> > >version of the rwsem around reading and writing of commit_end.
> >> > >
> >> > >Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> >> > >Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> >> >
> >> > Uhmm but do we expect concurrency during the switch/port probing phase?
> >>
> >> I answered that that detail here in response to Fan:
> >>
> >> http://lore.kernel.org/r/654eb8ed72ced_46f0294c@dwillia2-mobl3.amr.corp.intel.com.notmuch
> >>
> >> The takeaway is:
> >>
> >> "I am in favor of adding the locking rather than depend on a subtle
> >> side-effect of how CXL operates, and to avoid adding an unlocked version
> >> of cxl_num_decoders_committed()"
> >>
> >> I will add a note to the changelog to that effect, and drop Fixes: since
> >> the lock is not strictly required for correctness in this path.
> >
> >I rewrote the changelog to this:
> 
> Thanks, this clarified my doubt.

Converted this ^^^ into an "Acked-by: Davidlohr Bueso
<dave@stgolabs.net>" if that's alright.
Davidlohr Bueso Nov. 30, 2023, 4:15 p.m. UTC | #8
On Wed, 29 Nov 2023, Dan Williams wrote:

>Converted this ^^^ into an "Acked-by: Davidlohr Bueso
><dave@stgolabs.net>" if that's alright.

Sure.
diff mbox series

Patch

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 1cc9be85ba4c..529baa8a1759 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -839,6 +839,8 @@  static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
 			cxld->target_type = CXL_DECODER_HOSTONLYMEM;
 		else
 			cxld->target_type = CXL_DECODER_DEVMEM;
+
+		guard(rwsem_write)(&cxl_region_rwsem);
 		if (cxld->id != cxl_num_decoders_committed(port)) {
 			dev_warn(&port->dev,
 				 "decoder%d.%d: Committed out of order\n",