Message ID | 20240222090530.62530-3-roger.pau@citrix.com (mailing list archive)
---|---
State | New, archived
Series | xen/x86: cmpxchg cleanup
On 22.02.2024 10:05, Roger Pau Monne wrote:
> The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
> can be achieved with an atomic increment, which is both simpler to read, and
> avoid any need for a loop.
>
> Note there can be a small divergence in the channel returned if next_channel
> overflows, but returned channel will always be in the [0, num_hpets_used)
> range, and that's fine for the purpose of balancing HPET channels across CPUs.
> This is also theoretical, as there's no system currently with 2^32 CPUs (as
> long as next_channel is 32bit width).

The code change looks good to me, but I fail to see the connection to
2^32 CPUs. So it feels like I'm missing something, in which case I'd
rather not offer any R-b.

Jan

> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/hpet.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
> index d982b0f6b2c9..4777dc859d96 100644
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -458,11 +458,7 @@ static struct hpet_event_channel *hpet_get_channel(unsigned int cpu)
>      if ( num_hpets_used >= nr_cpu_ids )
>          return &hpet_events[cpu];
>
> -    do {
> -        next = next_channel;
> -        if ( (i = next + 1) == num_hpets_used )
> -            i = 0;
> -    } while ( cmpxchg(&next_channel, next, i) != next );
> +    next = arch_fetch_and_add(&next_channel, 1) % num_hpets_used;
>
>      /* try unused channel first */
>      for ( i = next; i < next + num_hpets_used; i++ )
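For readers following along outside the Xen tree, here is a minimal standalone sketch of the two approaches being compared. It uses GCC/Clang `__atomic` builtins as stand-ins for Xen's cmpxchg() and arch_fetch_and_add(); the helper names, the builtin choice, and the example value of num_hpets_used are illustrative assumptions, not Xen code:

```c
#include <stdint.h>

static uint32_t next_channel;
static uint32_t num_hpets_used = 4; /* example value */

/* Old scheme: CAS loop keeping the stored counter within [0, num_hpets_used). */
static uint32_t pick_cmpxchg(void)
{
    uint32_t next, i;

    do {
        next = __atomic_load_n(&next_channel, __ATOMIC_RELAXED);
        i = (next + 1 == num_hpets_used) ? 0 : next + 1;
        /* Retry if another CPU updated next_channel in the meantime. */
    } while ( !__atomic_compare_exchange_n(&next_channel, &next, i, false,
                                           __ATOMIC_RELAXED, __ATOMIC_RELAXED) );

    return next;
}

/*
 * New scheme: one unconditional atomic increment.  Only the *returned*
 * value is reduced modulo num_hpets_used; the stored counter keeps growing.
 */
static uint32_t pick_fetch_add(void)
{
    return __atomic_fetch_add(&next_channel, 1, __ATOMIC_RELAXED)
           % num_hpets_used;
}
```

The divergence the thread discusses below stems from the comment in pick_fetch_add(): the stored counter is no longer wrapped by hand, so its eventual overflow at 2^32 can shift the rotation.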
On Thu, Feb 22, 2024 at 11:10:54AM +0100, Jan Beulich wrote:
> On 22.02.2024 10:05, Roger Pau Monne wrote:
> > The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
> > can be achieved with an atomic increment, which is both simpler to read, and
> > avoid any need for a loop.
> >
> > Note there can be a small divergence in the channel returned if next_channel
> > overflows, but returned channel will always be in the [0, num_hpets_used)
> > range, and that's fine for the purpose of balancing HPET channels across CPUs.
> > This is also theoretical, as there's no system currently with 2^32 CPUs (as
> > long as next_channel is 32bit width).
>
> The code change looks good to me, but I fail to see the connection to
> 2^32 CPUs. So it feels like I'm missing something, in which case I'd
> rather not offer any R-b.

The purpose of hpet_get_channel() is to distribute HPET channels
across CPUs, so that each CPU gets assigned an HPET channel, balancing
the number of CPUs that use each channel.

This is done by (next_channel++ % num_hpets_used), so the value of
next_channel after this change can be > num_hpets_used, which
previously couldn't happen. This can lead to a different channel being
returned for the 2^32th call to the function, as the counter would
overflow. Note calls to the function are restricted to the number of
CPUs in the host, as per-CPU channel assignment is done only once (and
the channel is then stored in a per-cpu variable).

Feel free to adjust the commit message if you think all this is too
much data, or too confusing.

Thanks, Roger.
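A self-contained demonstration of the wrap-around divergence Roger describes (plain C, single-threaded; the counter start value and channel count are made up for illustration):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint32_t num_hpets_used = 3;      /* example channel count */
    uint32_t next_channel = UINT32_MAX - 2; /* start just before the wrap */

    /* Single-threaded stand-in for
     * arch_fetch_and_add(&next_channel, 1) % num_hpets_used. */
    for ( unsigned int call = 0; call < 6; call++ )
    {
        uint32_t next = next_channel++ % num_hpets_used; /* wraps at 2^32 */

        printf("call %u -> channel %u\n", call, (unsigned int)next);
    }
    /*
     * Prints channels 1 2 0 0 1 2: because 2^32 % 3 == 1, the rotation
     * slips by one step at the overflow and channel 0 is handed out twice
     * in a row -- the "small divergence" from the commit message.  The
     * old cmpxchg loop would have kept the strict 1 2 0 1 2 0 cycle.
     */
    return 0;
}
```

When num_hpets_used divides 2^32 exactly (e.g. 2 or 4), there is no slip at all, since the counter wraps onto the value the rotation would have produced anyway.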
On 22.02.2024 11:58, Roger Pau Monné wrote:
> On Thu, Feb 22, 2024 at 11:10:54AM +0100, Jan Beulich wrote:
>> On 22.02.2024 10:05, Roger Pau Monne wrote:
>>> The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
>>> can be achieved with an atomic increment, which is both simpler to read, and
>>> avoid any need for a loop.
>>>
>>> Note there can be a small divergence in the channel returned if next_channel
>>> overflows, but returned channel will always be in the [0, num_hpets_used)
>>> range, and that's fine for the purpose of balancing HPET channels across CPUs.
>>> This is also theoretical, as there's no system currently with 2^32 CPUs (as
>>> long as next_channel is 32bit width).
>>
>> The code change looks good to me, but I fail to see the connection to
>> 2^32 CPUs. So it feels like I'm missing something, in which case I'd
>> rather not offer any R-b.
>
> The purpose of hpet_get_channel() is to distribute HPET channels
> across CPUs, so that each CPU gets assigned an HPET channel, balancing
> the number of CPUs that use each channel.
>
> This is done by (next_channel++ % num_hpets_used), so the value of
> next_channel after this change can be > num_hpets_used, which
> previously couldn't happen. This can lead to a different channel being
> returned for the 2^32th call to the function, as the counter would
> overflow. Note calls to the function are restricted to the number of
> CPUs in the host, as per-CPU channel assignment is done only once (and
> the channel is then stored in a per-cpu variable).

That's once per CPU being brought up, not once per CPU in the system.

> Feel free to adjust the commit message if you think all this is too
> much data, or too confusing.

I'll simply drop that last sentence then, without which
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
Hi Roger,

On 22/02/2024 09:05, Roger Pau Monne wrote:
> The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
> can be achieved with an atomic increment, which is both simpler to read, and
> avoid any need for a loop.
>
> Note there can be a small divergence in the channel returned if next_channel
> overflows, but returned channel will always be in the [0, num_hpets_used)
> range, and that's fine for the purpose of balancing HPET channels across CPUs.
> This is also theoretical, as there's no system currently with 2^32 CPUs (as
> long as next_channel is 32bit width).

This would also happen if we decide to reduce the size of next_channel.

Reviewed-by: Julien Grall <jgrall@amazon.com>

Cheers,
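To make the width remark concrete: with a narrower counter the wrap, and thus the rotation slip, happens often enough to observe directly. A sketch (plain C, not Xen code; the hypothetical 8-bit width and the channel count are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const unsigned int num_hpets_used = 3; /* example channel count */
    uint8_t next_channel = 0;              /* hypothetical 8-bit counter */
    unsigned int prev = num_hpets_used, slips = 0;

    /* Count back-to-back repeats of the same channel over many calls. */
    for ( unsigned int call = 0; call < 1024; call++ )
    {
        unsigned int next = next_channel++ % num_hpets_used;

        if ( next == prev )
            slips++;
        prev = next;
    }
    /* 256 % 3 == 1, so the rotation slips once per 256-call wrap:
     * prints "3 slips in 1024 calls". */
    printf("%u slips in 1024 calls\n", slips);
    return 0;
}
```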
On 22.02.2024 10:05, Roger Pau Monne wrote:
> The usage of a cmpxchg loop in hpet_get_channel() is unnecessary, as the same
> can be achieved with an atomic increment, which is both simpler to read, and
> avoid any need for a loop.
>
> Note there can be a small divergence in the channel returned if next_channel
> overflows, but returned channel will always be in the [0, num_hpets_used)
> range, and that's fine for the purpose of balancing HPET channels across CPUs.
> This is also theoretical, as there's no system currently with 2^32 CPUs (as
> long as next_channel is 32bit width).
>
> Signed-of-by: Roger Pau Monné <roger.pau@citrix.com>

Hmm, I'm sorry - it's now me who is recorded as the author of the patch,
for my script not finding any Signed-off-by: here (note the typo).

Jan
diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index d982b0f6b2c9..4777dc859d96 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -458,11 +458,7 @@ static struct hpet_event_channel *hpet_get_channel(unsigned int cpu)
     if ( num_hpets_used >= nr_cpu_ids )
         return &hpet_events[cpu];
 
-    do {
-        next = next_channel;
-        if ( (i = next + 1) == num_hpets_used )
-            i = 0;
-    } while ( cmpxchg(&next_channel, next, i) != next );
+    next = arch_fetch_and_add(&next_channel, 1) % num_hpets_used;
 
     /* try unused channel first */
     for ( i = next; i < next + num_hpets_used; i++ )