Message ID | 20200220173116.88915-1-roger.pau@citrix.com (mailing list archive)
---|---
State | Superseded
Series | [v2] rwlock: allow recursive read locking when already locked in write mode
Hi,

On 20/02/2020 17:31, Roger Pau Monne wrote:
> Allow a CPU already holding the lock in write mode to also lock it in
> read mode. There's no harm in allowing read locking a rwlock that's
> already owned by the caller (ie: CPU) in write mode. Allowing such
> accesses is required at least for the CPU maps use-case.
>
> In order to do this reserve 14bits of the lock, this allows to support
> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: one to
> signal there are pending writers waiting on the lock and the other to
> signal the lock is owned in write mode. Note the write related data
> is using 16bits, this is done in order to be able to clear it (and
> thus release the lock) using a 16bit atomic write.
>
> This reduces the maximum number of concurrent readers from 16777216 to
> 65536, I think this should still be enough, or else the lock field
> can be expanded from 32 to 64bits if all architectures support atomic
> operations on 64bit integers.

FWIW, arm32 is able to support atomic operations on 64-bit integers.

> static inline void _write_unlock(rwlock_t *lock)
> {
> -    /*
> -     * If the writer field is atomic, it can be cleared directly.
> -     * Otherwise, an atomic subtraction will be used to clear it.
> -     */
> -    atomic_sub(_QW_LOCKED, &lock->cnts);
> +    /* Since the writer field is atomic, it can be cleared directly. */
> +    ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts)));
> +    BUILD_BUG_ON(_QR_SHIFT != 16);
> +    write_atomic((uint16_t *)&lock->cnts, 0);

I think this is an abuse to cast an atomic_t() directly into a uint16_t. You
would at least want to use &lock->cnts.counter here.

The rest of the code looks good to me.

Cheers,
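For reference, the lock-word layout the commit message describes can be sketched as follows. The macro and function names here are illustrative stand-ins, not necessarily those used by the patch itself.

```c
#include <stdint.h>

/*
 * Illustrative layout of the 32-bit lock word described above
 * (names are hypothetical, not the patch's):
 *
 *   bits  0-13: CPU id of the write-lock holder (up to 16384 CPUs)
 *   bit     14: a writer is waiting for the lock
 *   bit     15: the lock is held in write mode
 *   bits 16-31: reader count (up to 65536 concurrent readers)
 */
#define QW_CPUMASK   0x3fffu
#define QW_WAITING   (1u << 14)
#define QW_LOCKED    (1u << 15)
#define QR_SHIFT     16
#define QR_BIAS      (1u << QR_SHIFT)

/* Lock word for a write lock held by the given CPU, no readers yet. */
static uint32_t write_lock_word(unsigned int cpu)
{
    return QW_LOCKED | (cpu & QW_CPUMASK);
}

/* Does the given CPU hold the lock in write mode?  Reader increments
 * land above QR_SHIFT, so they cannot disturb this check. */
static int is_write_locked_by(uint32_t cnts, unsigned int cpu)
{
    return (cnts & (QW_LOCKED | QW_CPUMASK)) == (QW_LOCKED | cpu);
}
```

This is exactly what enables the recursion: `write_lock_word(42) + QR_BIAS` (the writer also taking a read reference) still satisfies `is_write_locked_by(..., 42)`, and clearing the low 16 bits alone releases the write lock while leaving the reader count intact.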
On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote:
> Hi,
>
> On 20/02/2020 17:31, Roger Pau Monne wrote:
> [...]
>> +    /* Since the writer field is atomic, it can be cleared directly. */
>> +    ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts)));
>> +    BUILD_BUG_ON(_QR_SHIFT != 16);
>> +    write_atomic((uint16_t *)&lock->cnts, 0);
>
> I think this is an abuse to cast an atomic_t() directly into a uint16_t. You
> would at least want to use &lock->cnts.counter here.

Sure, I was wondering about this myself.

Will wait for more comments, not sure whether this can be fixed upon
commit if there are no other issues.

Thanks, Roger.
Hi,

On 21/02/2020 09:10, Roger Pau Monné wrote:
> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote:
>> [...]
>> I think this is an abuse to cast an atomic_t() directly into a uint16_t. You
>> would at least want to use &lock->cnts.counter here.
>
> Sure, I was wondering about this myself.
>
> Will wait for more comments, not sure whether this can be fixed upon
> commit if there are no other issues.

It is trivial enough to be fixed while committing. Assuming this is
going to be fixed:

Reviewed-by: Julien Grall <julien@xen.org>

Cheers,
On 21.02.2020 10:10, Roger Pau Monné wrote:
> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote:
>> [...]
>>> +    write_atomic((uint16_t *)&lock->cnts, 0);
>>
>> I think this is an abuse to cast an atomic_t() directly into a uint16_t. You
>> would at least want to use &lock->cnts.counter here.
>
> Sure, I was wondering about this myself.
>
> Will wait for more comments, not sure whether this can be fixed upon
> commit if there are no other issues.

It's more than just adding another field specifier here. A cast like
this one is endianness-unsafe, and hence a trap waiting for a big
endian port attempt to fall into. At the very least this should cause
a build failure on big endian systems, even better would be if it was
endianness-safe.

Jan
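To make the trap concrete, here is a minimal sketch (plain C, nothing here is Xen code) of what a `(uint16_t *)&lock->cnts` store actually addresses on each byte order:

```c
#include <stdint.h>
#include <string.h>

/* The writer field lives in the LOW 16 bits of the 32-bit lock word,
 * but a (uint16_t *)&cnts cast always addresses the FIRST two bytes in
 * memory: the low half on little endian, but the HIGH half (the reader
 * count) on big endian. */
static int cast_hits_writer_field(void)
{
    uint32_t cnts = 0x00010002; /* readers = 1 (high half), writer bits = 2 (low half) */
    uint16_t first;

    memcpy(&first, &cnts, sizeof(first)); /* what the uint16_t cast would read */
    return first == 0x0002;               /* true only on little endian */
}

/* Independent endianness probe, to compare against. */
static int host_is_little_endian(void)
{
    uint32_t one = 1;
    uint8_t byte;

    memcpy(&byte, &one, 1);
    return byte == 1;
}
```

On a big-endian host `cast_hits_writer_field()` returns 0: the 16-bit store would silently clear the reader count instead of the writer field, which is the trap being described.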
On 21.02.20 14:36, Jan Beulich wrote:
> [...]
> It's more than just adding another field specifier here. A cast like
> this one is endianness-unsafe, and hence a trap waiting for a big
> endian port attempt to fall into. At the very least this should cause
> a build failure on big endian systems, even better would be if it was
> endianness-safe.

Wouldn't a union be the better choice?

Juergen
On 21/02/2020 13:46, Jürgen Groß wrote:
> On 21.02.20 14:36, Jan Beulich wrote:
>> [...]
>> It's more than just adding another field specifier here. A cast like
>> this one is endianness-unsafe, and hence a trap waiting for a big
>> endian port attempt to fall into.
>
> Wouldn't a union be the better choice?

You would not be able to use atomic_t in that case as you can't assume
the layout of the structure.

Cheers,
On 21.02.20 14:49, Julien Grall wrote:
> On 21/02/2020 13:46, Jürgen Groß wrote:
>> [...]
>> Wouldn't a union be the better choice?
>
> You would not be able to use atomic_t in that case as you can't assume
> the layout of the structure.

union rwlockword {
    atomic_t cnts;
    uint32_t val;
    struct {
        uint16_t write;
        uint16_t readers;
    };
};

static inline uint32_t _qr_bias(void)
{
    const union rwlockword x = {
        .write = 0,
        .readers = 1
    };

    return x.val;
}

...
cnts = atomic_add_return(_qr_bias(), &lock->cnts);
...

I guess this should do the trick, no?

Juergen
On 21.02.2020 14:46, Jürgen Groß wrote:
> On 21.02.20 14:36, Jan Beulich wrote:
>> [...]
>> It's more than just adding another field specifier here. A cast like
>> this one is endianness-unsafe, and hence a trap waiting for a big
>> endian port attempt to fall into. At the very least this should cause
>> a build failure on big endian systems, even better would be if it was
>> endianness-safe.
>
> Wouldn't a union be the better choice?

Well, a union (with the second half being an array of uint16_t) could
be the mechanism to implement this in an endian safe way, but the
union alone won't help - you'd still have to decide which of the array
elements you actually need to write to. And of course you'd then also
make assumptions on sizeof(int) which are more strict than our current
ones (which I'd like to avoid if possible).

Jan
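The array variant could look like the following sketch. The endianness selection via `__BYTE_ORDER__` is an assumption (a GCC/Clang predefine; Xen would use its own endianness machinery), and the names are hypothetical:

```c
#include <stdint.h>

/* The union alone does not help: the index of the half that holds the
 * writer bits depends on byte order, so it has to be selected
 * explicitly (here at compile time). */
union lockword {
    uint32_t val;     /* stands in for atomic_t's counter */
    uint16_t half[2];
};

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define WRITER_IDX 1  /* low 16 bits are stored last on big endian */
#else
#define WRITER_IDX 0  /* low 16 bits come first on little endian */
#endif

/* 16-bit store clearing the writer bits; readers are left untouched
 * regardless of host byte order. */
static void clear_writer(union lockword *l)
{
    l->half[WRITER_IDX] = 0;
}
```

With `val = 0x00018000` (one reader plus the write-locked bit), `clear_writer()` yields `0x00010000` on either byte order, which is the endianness-safe behaviour being asked for.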
Hi Juergen,

On 21/02/2020 14:06, Jürgen Groß wrote:
> [...]
> union rwlockword {
>     atomic_t cnts;
>     uint32_t val;
>     struct {
>         uint16_t write;
>         uint16_t readers;
>     };
> };
> [...]
> I guess this should do the trick, no?

You are assuming the atomic_t layout which I would rather not want to
happen.

Cheers,
On 21.02.2020 15:06, Jürgen Groß wrote:
> [...]
> union rwlockword {
>     atomic_t cnts;
>     uint32_t val;
>     struct {
>         uint16_t write;
>         uint16_t readers;
>     };
> };
>
> static inline uint32_t _qr_bias(void)
> {
>     const union rwlockword x = {
>         .write = 0,
>         .readers = 1
>     };
>
>     return x.val;
> }
> [...]
> I guess this should do the trick, no?

I'm afraid it won't, and not just because of the sizeof() aspect
already pointed out. Your x variable would end up like this in
memory:

little: 00 00 01 00
big:    00 00 00 01

which, read as a 32-bit value, then ends up being

little: 00010000
big:    00000001

The add therefore would be able to spill into the high 16 bits.

Jan
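The spill can be demonstrated directly: with the big-endian reading of the union, the per-reader bias is 1, so the reader count occupies the arithmetic low half of the word and a carry lands in the adjacent (writer) half after 65536 acquisitions. A plain `uint32_t` stands in for the atomic here:

```c
#include <stdint.h>

/* Simulate n read acquisitions at the bias the union would yield on a
 * big-endian host (1, i.e. readers counted in the low 16 bits). */
static uint32_t add_readers(uint32_t cnts, unsigned int n)
{
    while (n--)
        cnts += 1;
    return cnts;
}
```

`add_readers(0, 65536)` returns `0x00010000`: the reader field reads back as empty while a stray bit has appeared among the writer bits, which is the corruption Jan is pointing at.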
On 21.02.20 15:11, Julien Grall wrote:
> Hi Juergen,
>
> On 21/02/2020 14:06, Jürgen Groß wrote:
>> [...]
>> I guess this should do the trick, no?
>
> You are assuming the atomic_t layout which I would rather not want to
> happen.

That already happened. rwlock.h already assumes atomic_t to be 32 bits
wide.

I agree it would be better to have an atomic32_t type for this use
case.

Juergen
On 21.02.2020 15:16, Jürgen Groß wrote:
> On 21.02.20 15:11, Julien Grall wrote:
>> [...]
>> You are assuming the atomic_t layout which I would rather not want to
>> happen.
>
> That already happened. rwlock.h already assumes atomic_t to be 32 bits
> wide.

Exactly 32 bits, or at least as many?

Jan
On 21.02.20 15:16, Jan Beulich wrote: > On 21.02.2020 15:06, Jürgen Groß wrote: >> On 21.02.20 14:49, Julien Grall wrote: >>> >>> >>> On 21/02/2020 13:46, Jürgen Groß wrote: >>>> On 21.02.20 14:36, Jan Beulich wrote: >>>>> On 21.02.2020 10:10, Roger Pau Monné wrote: >>>>>> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On 20/02/2020 17:31, Roger Pau Monne wrote: >>>>>>>> Allow a CPU already holding the lock in write mode to also lock it in >>>>>>>> read mode. There's no harm in allowing read locking a rwlock that's >>>>>>>> already owned by the caller (ie: CPU) in write mode. Allowing such >>>>>>>> accesses is required at least for the CPU maps use-case. >>>>>>>> >>>>>>>> In order to do this reserve 14bits of the lock, this allows to >>>>>>>> support >>>>>>>> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: one to >>>>>>>> signal there are pending writers waiting on the lock and the other to >>>>>>>> signal the lock is owned in write mode. Note the write related data >>>>>>>> is using 16bits, this is done in order to be able to clear it (and >>>>>>>> thus release the lock) using a 16bit atomic write. >>>>>>>> >>>>>>>> This reduces the maximum number of concurrent readers from >>>>>>>> 16777216 to >>>>>>>> 65536, I think this should still be enough, or else the lock field >>>>>>>> can be expanded from 32 to 64bits if all architectures support atomic >>>>>>>> operations on 64bit integers. >>>>>>> >>>>>>> FWIW, arm32 is able to support atomic operations on 64-bit integers. >>>>>>> >>>>>>>> static inline void _write_unlock(rwlock_t *lock) >>>>>>>> { >>>>>>>> - /* >>>>>>>> - * If the writer field is atomic, it can be cleared directly. >>>>>>>> - * Otherwise, an atomic subtraction will be used to clear it. >>>>>>>> - */ >>>>>>>> - atomic_sub(_QW_LOCKED, &lock->cnts); >>>>>>>> + /* Since the writer field is atomic, it can be cleared >>>>>>>> directly. 
*/ >>>>>>>> + ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts))); >>>>>>>> + BUILD_BUG_ON(_QR_SHIFT != 16); >>>>>>>> + write_atomic((uint16_t *)&lock->cnts, 0); >>>>>>> >>>>>>> I think this is an abuse to cast an atomic_t() directly into a >>>>>>> uint16_t. You >>>>>>> would at least want to use &lock->cnts.counter here. >>>>>> >>>>>> Sure, I was wondering about this myself. >>>>>> >>>>>> Will wait for more comments, not sure whether this can be fixed upon >>>>>> commit if there are no other issues. >>>>> >>>>> It's more than just adding another field specifier here. A cast like >>>>> this one is endianness-unsafe, and hence a trap waiting for a big >>>>> endian port attempt to fall into. At the very least this should cause >>>>> a build failure on big endian systems, even better would be if it was >>>>> endianness-safe. >>>> >>>> Wouldn't a union be the better choice? >>> >>> You would not be able to use atomic_t in that case as you can't assume >>> the layout of the structure. >> >> union rwlockword { >> atomic_t cnts; >> uint32_t val; >> struct { >> uint16_t write; >> uint16_t readers; >> }; >> }; >> >> static inline const uint32_t _qr_bias( >> const union rwlockword { >> .write = 0, >> .readers = 1 >> } x; >> >> return x.val; >> } >> >> ... >> cnts = atomic_add_return(_qr_bias(), &lock->cnts); >> ... >> >> I guess this should do the trick, no? > > I'm afraid it won't, and not just because of the sizeof() aspect > already pointed out. Your x variable would end up like this in > memory: > > little: 00 00 01 00 > big: 00 00 00 01 => 00000001 > > which, read as 32-bit value, then ends up being > > little: 00010000 > big: 00000001 > > The add therefore would be able to spill into the high 16 bits. And why exactly is this worse than just dropping spilled bits? Both cases will explode rather soon. Both cases can be avoided by e.g. ASSERTing that the reader count isn't ~0 before incrementing it (or that it isn't zero afterwards). Juergen
On 21.02.20 15:17, Jan Beulich wrote: > On 21.02.2020 15:16, Jürgen Groß wrote: >> On 21.02.20 15:11, Julien Grall wrote: >>> Hi Juergen, >>> >>> On 21/02/2020 14:06, Jürgen Groß wrote: >>>> On 21.02.20 14:49, Julien Grall wrote: >>>>> >>>>> >>>>> On 21/02/2020 13:46, Jürgen Groß wrote: >>>>>> On 21.02.20 14:36, Jan Beulich wrote: >>>>>>> On 21.02.2020 10:10, Roger Pau Monné wrote: >>>>>>>> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> On 20/02/2020 17:31, Roger Pau Monne wrote: >>>>>>>>>> Allow a CPU already holding the lock in write mode to also lock >>>>>>>>>> it in >>>>>>>>>> read mode. There's no harm in allowing read locking a rwlock that's >>>>>>>>>> already owned by the caller (ie: CPU) in write mode. Allowing such >>>>>>>>>> accesses is required at least for the CPU maps use-case. >>>>>>>>>> >>>>>>>>>> In order to do this reserve 14bits of the lock, this allows to >>>>>>>>>> support >>>>>>>>>> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: one to >>>>>>>>>> signal there are pending writers waiting on the lock and the >>>>>>>>>> other to >>>>>>>>>> signal the lock is owned in write mode. Note the write related data >>>>>>>>>> is using 16bits, this is done in order to be able to clear it (and >>>>>>>>>> thus release the lock) using a 16bit atomic write. >>>>>>>>>> >>>>>>>>>> This reduces the maximum number of concurrent readers from >>>>>>>>>> 16777216 to >>>>>>>>>> 65536, I think this should still be enough, or else the lock field >>>>>>>>>> can be expanded from 32 to 64bits if all architectures support >>>>>>>>>> atomic >>>>>>>>>> operations on 64bit integers. >>>>>>>>> >>>>>>>>> FWIW, arm32 is able to support atomic operations on 64-bit integers. >>>>>>>>> >>>>>>>>>> static inline void _write_unlock(rwlock_t *lock) >>>>>>>>>> { >>>>>>>>>> - /* >>>>>>>>>> - * If the writer field is atomic, it can be cleared directly. >>>>>>>>>> - * Otherwise, an atomic subtraction will be used to clear it. 
>>>>>>>>>> - */ >>>>>>>>>> - atomic_sub(_QW_LOCKED, &lock->cnts); >>>>>>>>>> + /* Since the writer field is atomic, it can be cleared >>>>>>>>>> directly. */ >>>>>>>>>> + ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts))); >>>>>>>>>> + BUILD_BUG_ON(_QR_SHIFT != 16); >>>>>>>>>> + write_atomic((uint16_t *)&lock->cnts, 0); >>>>>>>>> >>>>>>>>> I think this is an abuse to cast an atomic_t() directly into a >>>>>>>>> uint16_t. You >>>>>>>>> would at least want to use &lock->cnts.counter here. >>>>>>>> >>>>>>>> Sure, I was wondering about this myself. >>>>>>>> >>>>>>>> Will wait for more comments, not sure whether this can be fixed upon >>>>>>>> commit if there are no other issues. >>>>>>> >>>>>>> It's more than just adding another field specifier here. A cast like >>>>>>> this one is endianness-unsafe, and hence a trap waiting for a big >>>>>>> endian port attempt to fall into. At the very least this should cause >>>>>>> a build failure on big endian systems, even better would be if it was >>>>>>> endianness-safe. >>>>>> >>>>>> Wouldn't a union be the better choice? >>>>> >>>>> You would not be able to use atomic_t in that case as you can't >>>>> assume the layout of the structure. >>>> >>>> union rwlockword { >>>> atomic_t cnts; >>>> uint32_t val; >>>> struct { >>>> uint16_t write; >>>> uint16_t readers; >>>> }; >>>> }; >>>> >>>> static inline const uint32_t _qr_bias( >>>> const union rwlockword { >>>> .write = 0, >>>> .readers = 1 >>>> } x; >>>> >>>> return x.val; >>>> } >>>> >>>> ... >>>> cnts = atomic_add_return(_qr_bias(), &lock->cnts); >>>> ... >>>> >>>> I guess this should do the trick, no? >>> >>> You are assuming the atomic_t layout which I would rather no want to >>> happen. >> >> That already happened. rwlock.h already assumes atomic_t to be 32 bits >> wide. > > Exactly 32 bits, or at least as many? Value is read into a u32 variable and then an update via cmpxchg() is done using that value for comparison. 
So any additional bits will be lost and the update will never succeed (see queue_write_lock_slowpath()). Juergen
On Fri, Feb 21, 2020 at 02:36:52PM +0100, Jan Beulich wrote: > On 21.02.2020 10:10, Roger Pau Monné wrote: > > On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote: > >> Hi, > >> > >> On 20/02/2020 17:31, Roger Pau Monne wrote: > >>> Allow a CPU already holding the lock in write mode to also lock it in > >>> read mode. There's no harm in allowing read locking a rwlock that's > >>> already owned by the caller (ie: CPU) in write mode. Allowing such > >>> accesses is required at least for the CPU maps use-case. > >>> > >>> In order to do this reserve 14bits of the lock, this allows to support > >>> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: one to > >>> signal there are pending writers waiting on the lock and the other to > >>> signal the lock is owned in write mode. Note the write related data > >>> is using 16bits, this is done in order to be able to clear it (and > >>> thus release the lock) using a 16bit atomic write. > >>> > >>> This reduces the maximum number of concurrent readers from 16777216 to > >>> 65536, I think this should still be enough, or else the lock field > >>> can be expanded from 32 to 64bits if all architectures support atomic > >>> operations on 64bit integers. > >> > >> FWIW, arm32 is able to support atomic operations on 64-bit integers. > >> > >>> static inline void _write_unlock(rwlock_t *lock) > >>> { > >>> - /* > >>> - * If the writer field is atomic, it can be cleared directly. > >>> - * Otherwise, an atomic subtraction will be used to clear it. > >>> - */ > >>> - atomic_sub(_QW_LOCKED, &lock->cnts); > >>> + /* Since the writer field is atomic, it can be cleared directly. */ > >>> + ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts))); > >>> + BUILD_BUG_ON(_QR_SHIFT != 16); > >>> + write_atomic((uint16_t *)&lock->cnts, 0); > >> > >> I think this is an abuse to cast an atomic_t() directly into a uint16_t. You > >> would at least want to use &lock->cnts.counter here. 
> > > > Sure, I was wondering about this myself. > > > > Will wait for more comments, not sure whether this can be fixed upon > > commit if there are no other issues. > > It's more than just adding another field specifier here. A cast like > this one is endianness-unsafe, and hence a trap waiting for a big > endian port attempt to fall into. At the very least this should cause > a build failure on big endian systems, even better would be if it was > endianness-safe. Why don't we leave the atomic_dec then? The ASSERT I've added can be turned into a BUG_ON, and that's likely even safer than what we currently have. Thanks, Roger.
On Fri, Feb 21, 2020 at 03:26:35PM +0100, Roger Pau Monné wrote: > On Fri, Feb 21, 2020 at 02:36:52PM +0100, Jan Beulich wrote: > > On 21.02.2020 10:10, Roger Pau Monné wrote: > > > On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote: > > >> Hi, > > >> > > >> On 20/02/2020 17:31, Roger Pau Monne wrote: > > >>> Allow a CPU already holding the lock in write mode to also lock it in > > >>> read mode. There's no harm in allowing read locking a rwlock that's > > >>> already owned by the caller (ie: CPU) in write mode. Allowing such > > >>> accesses is required at least for the CPU maps use-case. > > >>> > > >>> In order to do this reserve 14bits of the lock, this allows to support > > >>> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: one to > > >>> signal there are pending writers waiting on the lock and the other to > > >>> signal the lock is owned in write mode. Note the write related data > > >>> is using 16bits, this is done in order to be able to clear it (and > > >>> thus release the lock) using a 16bit atomic write. > > >>> > > >>> This reduces the maximum number of concurrent readers from 16777216 to > > >>> 65536, I think this should still be enough, or else the lock field > > >>> can be expanded from 32 to 64bits if all architectures support atomic > > >>> operations on 64bit integers. > > >> > > >> FWIW, arm32 is able to support atomic operations on 64-bit integers. > > >> > > >>> static inline void _write_unlock(rwlock_t *lock) > > >>> { > > >>> - /* > > >>> - * If the writer field is atomic, it can be cleared directly. > > >>> - * Otherwise, an atomic subtraction will be used to clear it. > > >>> - */ > > >>> - atomic_sub(_QW_LOCKED, &lock->cnts); > > >>> + /* Since the writer field is atomic, it can be cleared directly. 
*/ > > >>> + ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts))); > > >>> + BUILD_BUG_ON(_QR_SHIFT != 16); > > >>> + write_atomic((uint16_t *)&lock->cnts, 0); > > >> > > >> I think this is an abuse to cast an atomic_t() directly into a uint16_t. You > > >> would at least want to use &lock->cnts.counter here. > > > > > > Sure, I was wondering about this myself. > > > > > > Will wait for more comments, not sure whether this can be fixed upon > > > commit if there are no other issues. > > > > It's more than just adding another field specifier here. A cast like > > this one is endianness-unsafe, and hence a trap waiting for a big > > endian port attempt to fall into. At the very least this should cause > > a build failure on big endian systems, even better would be if it was > > endianness-safe. > > Why don't we leave the atomic_dec then? Sorry, that should be atomic_sub, not atomic_dec. Thanks, Roger.
On 21/02/2020 14:16, Jürgen Groß wrote: > On 21.02.20 15:11, Julien Grall wrote: >> Hi Juergen, >> >> On 21/02/2020 14:06, Jürgen Groß wrote: >>> On 21.02.20 14:49, Julien Grall wrote: >>>> >>>> >>>> On 21/02/2020 13:46, Jürgen Groß wrote: >>>>> On 21.02.20 14:36, Jan Beulich wrote: >>>>>> On 21.02.2020 10:10, Roger Pau Monné wrote: >>>>>>> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> On 20/02/2020 17:31, Roger Pau Monne wrote: >>>>>>>>> Allow a CPU already holding the lock in write mode to also lock >>>>>>>>> it in >>>>>>>>> read mode. There's no harm in allowing read locking a rwlock >>>>>>>>> that's >>>>>>>>> already owned by the caller (ie: CPU) in write mode. Allowing such >>>>>>>>> accesses is required at least for the CPU maps use-case. >>>>>>>>> >>>>>>>>> In order to do this reserve 14bits of the lock, this allows to >>>>>>>>> support >>>>>>>>> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: >>>>>>>>> one to >>>>>>>>> signal there are pending writers waiting on the lock and the >>>>>>>>> other to >>>>>>>>> signal the lock is owned in write mode. Note the write related >>>>>>>>> data >>>>>>>>> is using 16bits, this is done in order to be able to clear it (and >>>>>>>>> thus release the lock) using a 16bit atomic write. >>>>>>>>> >>>>>>>>> This reduces the maximum number of concurrent readers from >>>>>>>>> 16777216 to >>>>>>>>> 65536, I think this should still be enough, or else the lock field >>>>>>>>> can be expanded from 32 to 64bits if all architectures support >>>>>>>>> atomic >>>>>>>>> operations on 64bit integers. >>>>>>>> >>>>>>>> FWIW, arm32 is able to support atomic operations on 64-bit >>>>>>>> integers. >>>>>>>> >>>>>>>>> static inline void _write_unlock(rwlock_t *lock) >>>>>>>>> { >>>>>>>>> - /* >>>>>>>>> - * If the writer field is atomic, it can be cleared directly. >>>>>>>>> - * Otherwise, an atomic subtraction will be used to clear it. 
>>>>>>>>> - */ >>>>>>>>> - atomic_sub(_QW_LOCKED, &lock->cnts); >>>>>>>>> + /* Since the writer field is atomic, it can be cleared >>>>>>>>> directly. */ >>>>>>>>> + ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts))); >>>>>>>>> + BUILD_BUG_ON(_QR_SHIFT != 16); >>>>>>>>> + write_atomic((uint16_t *)&lock->cnts, 0); >>>>>>>> >>>>>>>> I think this is an abuse to cast an atomic_t() directly into a >>>>>>>> uint16_t. You >>>>>>>> would at least want to use &lock->cnts.counter here. >>>>>>> >>>>>>> Sure, I was wondering about this myself. >>>>>>> >>>>>>> Will wait for more comments, not sure whether this can be fixed upon >>>>>>> commit if there are no other issues. >>>>>> >>>>>> It's more than just adding another field specifier here. A cast like >>>>>> this one is endianness-unsafe, and hence a trap waiting for a big >>>>>> endian port attempt to fall into. At the very least this should cause >>>>>> a build failure on big endian systems, even better would be if it was >>>>>> endianness-safe. >>>>> >>>>> Wouldn't a union be the better choice? >>>> >>>> You would not be able to use atomic_t in that case as you can't >>>> assume the layout of the structure. >>> >>> union rwlockword { >>> atomic_t cnts; >>> uint32_t val; >>> struct { >>> uint16_t write; >>> uint16_t readers; >>> }; >>> }; >>> >>> static inline const uint32_t _qr_bias( >>> const union rwlockword { >>> .write = 0, >>> .readers = 1 >>> } x; >>> >>> return x.val; >>> } >>> >>> ... >>> cnts = atomic_add_return(_qr_bias(), &lock->cnts); >>> ... >>> >>> I guess this should do the trick, no? >> >> You are assuming the atomic_t layout which I would rather no want to >> happen. > > That already happened. rwlock.h already assumes atomic_t to be 32 bits > wide. I agree it would be better to have an atomic32_t type for this > use case. I don't think you understood my point here. An atomic32_t will not be better as you still assume the layout of the structure. I.e the structure has only one field. 
I don't know if this is going to bite us in the future, but I don't want to make an attempt to use such a union, at least not without a safeguard in Xen (i.e. BUILD_BUG_ON(...)). Cheers,
On 21.02.20 15:32, Julien Grall wrote: > > > On 21/02/2020 14:16, Jürgen Groß wrote: >> On 21.02.20 15:11, Julien Grall wrote: >>> Hi Juergen, >>> >>> On 21/02/2020 14:06, Jürgen Groß wrote: >>>> On 21.02.20 14:49, Julien Grall wrote: >>>>> >>>>> >>>>> On 21/02/2020 13:46, Jürgen Groß wrote: >>>>>> On 21.02.20 14:36, Jan Beulich wrote: >>>>>>> On 21.02.2020 10:10, Roger Pau Monné wrote: >>>>>>>> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> On 20/02/2020 17:31, Roger Pau Monne wrote: >>>>>>>>>> Allow a CPU already holding the lock in write mode to also >>>>>>>>>> lock it in >>>>>>>>>> read mode. There's no harm in allowing read locking a rwlock >>>>>>>>>> that's >>>>>>>>>> already owned by the caller (ie: CPU) in write mode. Allowing >>>>>>>>>> such >>>>>>>>>> accesses is required at least for the CPU maps use-case. >>>>>>>>>> >>>>>>>>>> In order to do this reserve 14bits of the lock, this allows to >>>>>>>>>> support >>>>>>>>>> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: >>>>>>>>>> one to >>>>>>>>>> signal there are pending writers waiting on the lock and the >>>>>>>>>> other to >>>>>>>>>> signal the lock is owned in write mode. Note the write related >>>>>>>>>> data >>>>>>>>>> is using 16bits, this is done in order to be able to clear it >>>>>>>>>> (and >>>>>>>>>> thus release the lock) using a 16bit atomic write. >>>>>>>>>> >>>>>>>>>> This reduces the maximum number of concurrent readers from >>>>>>>>>> 16777216 to >>>>>>>>>> 65536, I think this should still be enough, or else the lock >>>>>>>>>> field >>>>>>>>>> can be expanded from 32 to 64bits if all architectures support >>>>>>>>>> atomic >>>>>>>>>> operations on 64bit integers. >>>>>>>>> >>>>>>>>> FWIW, arm32 is able to support atomic operations on 64-bit >>>>>>>>> integers. 
>>>>>>>>> >>>>>>>>>> static inline void _write_unlock(rwlock_t *lock) >>>>>>>>>> { >>>>>>>>>> - /* >>>>>>>>>> - * If the writer field is atomic, it can be cleared >>>>>>>>>> directly. >>>>>>>>>> - * Otherwise, an atomic subtraction will be used to clear >>>>>>>>>> it. >>>>>>>>>> - */ >>>>>>>>>> - atomic_sub(_QW_LOCKED, &lock->cnts); >>>>>>>>>> + /* Since the writer field is atomic, it can be cleared >>>>>>>>>> directly. */ >>>>>>>>>> + ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts))); >>>>>>>>>> + BUILD_BUG_ON(_QR_SHIFT != 16); >>>>>>>>>> + write_atomic((uint16_t *)&lock->cnts, 0); >>>>>>>>> >>>>>>>>> I think this is an abuse to cast an atomic_t() directly into a >>>>>>>>> uint16_t. You >>>>>>>>> would at least want to use &lock->cnts.counter here. >>>>>>>> >>>>>>>> Sure, I was wondering about this myself. >>>>>>>> >>>>>>>> Will wait for more comments, not sure whether this can be fixed >>>>>>>> upon >>>>>>>> commit if there are no other issues. >>>>>>> >>>>>>> It's more than just adding another field specifier here. A cast like >>>>>>> this one is endianness-unsafe, and hence a trap waiting for a big >>>>>>> endian port attempt to fall into. At the very least this should >>>>>>> cause >>>>>>> a build failure on big endian systems, even better would be if it >>>>>>> was >>>>>>> endianness-safe. >>>>>> >>>>>> Wouldn't a union be the better choice? >>>>> >>>>> You would not be able to use atomic_t in that case as you can't >>>>> assume the layout of the structure. >>>> >>>> union rwlockword { >>>> atomic_t cnts; >>>> uint32_t val; >>>> struct { >>>> uint16_t write; >>>> uint16_t readers; >>>> }; >>>> }; >>>> >>>> static inline const uint32_t _qr_bias( >>>> const union rwlockword { >>>> .write = 0, >>>> .readers = 1 >>>> } x; >>>> >>>> return x.val; >>>> } >>>> >>>> ... >>>> cnts = atomic_add_return(_qr_bias(), &lock->cnts); >>>> ... >>>> >>>> I guess this should do the trick, no? 
>>> >>> You are assuming the atomic_t layout which I would rather no want to >>> happen. >> >> That already happened. rwlock.h already assumes atomic_t to be 32 bits >> wide. I agree it would be better to have an atomic32_t type for this >> use case. > > I don't think you understood my point here. An atomic32_t will not be > better as you still assume the layout of the structure. I.e the > structure has only one field. I don't understand your reasoning here, sorry. Are you saying that a structure of two uint16_t isn't known to be 32 bits long? Juergen
On 21.02.2020 15:24, Jürgen Groß wrote: > On 21.02.20 15:17, Jan Beulich wrote: >> On 21.02.2020 15:16, Jürgen Groß wrote: >>> On 21.02.20 15:11, Julien Grall wrote: >>>> Hi Juergen, >>>> >>>> On 21/02/2020 14:06, Jürgen Groß wrote: >>>>> On 21.02.20 14:49, Julien Grall wrote: >>>>>> >>>>>> >>>>>> On 21/02/2020 13:46, Jürgen Groß wrote: >>>>>>> On 21.02.20 14:36, Jan Beulich wrote: >>>>>>>> On 21.02.2020 10:10, Roger Pau Monné wrote: >>>>>>>>> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> On 20/02/2020 17:31, Roger Pau Monne wrote: >>>>>>>>>>> Allow a CPU already holding the lock in write mode to also lock >>>>>>>>>>> it in >>>>>>>>>>> read mode. There's no harm in allowing read locking a rwlock that's >>>>>>>>>>> already owned by the caller (ie: CPU) in write mode. Allowing such >>>>>>>>>>> accesses is required at least for the CPU maps use-case. >>>>>>>>>>> >>>>>>>>>>> In order to do this reserve 14bits of the lock, this allows to >>>>>>>>>>> support >>>>>>>>>>> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: one to >>>>>>>>>>> signal there are pending writers waiting on the lock and the >>>>>>>>>>> other to >>>>>>>>>>> signal the lock is owned in write mode. Note the write related data >>>>>>>>>>> is using 16bits, this is done in order to be able to clear it (and >>>>>>>>>>> thus release the lock) using a 16bit atomic write. >>>>>>>>>>> >>>>>>>>>>> This reduces the maximum number of concurrent readers from >>>>>>>>>>> 16777216 to >>>>>>>>>>> 65536, I think this should still be enough, or else the lock field >>>>>>>>>>> can be expanded from 32 to 64bits if all architectures support >>>>>>>>>>> atomic >>>>>>>>>>> operations on 64bit integers. >>>>>>>>>> >>>>>>>>>> FWIW, arm32 is able to support atomic operations on 64-bit integers. 
>>>>>>>>>> >>>>>>>>>>> static inline void _write_unlock(rwlock_t *lock) >>>>>>>>>>> { >>>>>>>>>>> - /* >>>>>>>>>>> - * If the writer field is atomic, it can be cleared directly. >>>>>>>>>>> - * Otherwise, an atomic subtraction will be used to clear it. >>>>>>>>>>> - */ >>>>>>>>>>> - atomic_sub(_QW_LOCKED, &lock->cnts); >>>>>>>>>>> + /* Since the writer field is atomic, it can be cleared >>>>>>>>>>> directly. */ >>>>>>>>>>> + ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts))); >>>>>>>>>>> + BUILD_BUG_ON(_QR_SHIFT != 16); >>>>>>>>>>> + write_atomic((uint16_t *)&lock->cnts, 0); >>>>>>>>>> >>>>>>>>>> I think this is an abuse to cast an atomic_t() directly into a >>>>>>>>>> uint16_t. You >>>>>>>>>> would at least want to use &lock->cnts.counter here. >>>>>>>>> >>>>>>>>> Sure, I was wondering about this myself. >>>>>>>>> >>>>>>>>> Will wait for more comments, not sure whether this can be fixed upon >>>>>>>>> commit if there are no other issues. >>>>>>>> >>>>>>>> It's more than just adding another field specifier here. A cast like >>>>>>>> this one is endianness-unsafe, and hence a trap waiting for a big >>>>>>>> endian port attempt to fall into. At the very least this should cause >>>>>>>> a build failure on big endian systems, even better would be if it was >>>>>>>> endianness-safe. >>>>>>> >>>>>>> Wouldn't a union be the better choice? >>>>>> >>>>>> You would not be able to use atomic_t in that case as you can't >>>>>> assume the layout of the structure. >>>>> >>>>> union rwlockword { >>>>> atomic_t cnts; >>>>> uint32_t val; >>>>> struct { >>>>> uint16_t write; >>>>> uint16_t readers; >>>>> }; >>>>> }; >>>>> >>>>> static inline const uint32_t _qr_bias( >>>>> const union rwlockword { >>>>> .write = 0, >>>>> .readers = 1 >>>>> } x; >>>>> >>>>> return x.val; >>>>> } >>>>> >>>>> ... >>>>> cnts = atomic_add_return(_qr_bias(), &lock->cnts); >>>>> ... >>>>> >>>>> I guess this should do the trick, no? 
>>>> >>>> You are assuming the atomic_t layout which I would rather no want to >>>> happen. >>> >>> That already happened. rwlock.h already assumes atomic_t to be 32 bits >>> wide. >> >> Exactly 32 bits, or at least as many? > > Value is read into a u32 variable and then an update via cmpxchg() is > done using that value for comparison. So any additional bits will be > lost and update will never succeed (see queue_write_lock_slowpath()). Well, the local variables should be of type int (or possibly unsigned int), but the logic there isn't tied to anything else being exactly 32 bits wide. These local variables are merely an example of misguided use of fixed-width types. Jan
On 21.02.2020 15:26, Roger Pau Monné wrote: > On Fri, Feb 21, 2020 at 02:36:52PM +0100, Jan Beulich wrote: >> On 21.02.2020 10:10, Roger Pau Monné wrote: >>> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote: >>>> Hi, >>>> >>>> On 20/02/2020 17:31, Roger Pau Monne wrote: >>>>> Allow a CPU already holding the lock in write mode to also lock it in >>>>> read mode. There's no harm in allowing read locking a rwlock that's >>>>> already owned by the caller (ie: CPU) in write mode. Allowing such >>>>> accesses is required at least for the CPU maps use-case. >>>>> >>>>> In order to do this reserve 14bits of the lock, this allows to support >>>>> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: one to >>>>> signal there are pending writers waiting on the lock and the other to >>>>> signal the lock is owned in write mode. Note the write related data >>>>> is using 16bits, this is done in order to be able to clear it (and >>>>> thus release the lock) using a 16bit atomic write. >>>>> >>>>> This reduces the maximum number of concurrent readers from 16777216 to >>>>> 65536, I think this should still be enough, or else the lock field >>>>> can be expanded from 32 to 64bits if all architectures support atomic >>>>> operations on 64bit integers. >>>> >>>> FWIW, arm32 is able to support atomic operations on 64-bit integers. >>>> >>>>> static inline void _write_unlock(rwlock_t *lock) >>>>> { >>>>> - /* >>>>> - * If the writer field is atomic, it can be cleared directly. >>>>> - * Otherwise, an atomic subtraction will be used to clear it. >>>>> - */ >>>>> - atomic_sub(_QW_LOCKED, &lock->cnts); >>>>> + /* Since the writer field is atomic, it can be cleared directly. */ >>>>> + ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts))); >>>>> + BUILD_BUG_ON(_QR_SHIFT != 16); >>>>> + write_atomic((uint16_t *)&lock->cnts, 0); >>>> >>>> I think this is an abuse to cast an atomic_t() directly into a uint16_t. 
You >>>> would at least want to use &lock->cnts.counter here. >>> >>> Sure, I was wondering about this myself. >>> >>> Will wait for more comments, not sure whether this can be fixed upon >>> commit if there are no other issues. >> >> It's more than just adding another field specifier here. A cast like >> this one is endianness-unsafe, and hence a trap waiting for a big >> endian port attempt to fall into. At the very least this should cause >> a build failure on big endian systems, even better would be if it was >> endianness-safe. > > Why don't we leave the atomic_dec then? Because you need to invoke smp_processor_id() to calculate the value to use in the subtraction. I'm not meaning to say I'm entirely opposed (seeing how much of a discussion we're having), but the "simple write of zero" approach is certainly appealing. Jan
On Fri, Feb 21, 2020 at 03:41:59PM +0100, Jan Beulich wrote: > On 21.02.2020 15:26, Roger Pau Monné wrote: > > On Fri, Feb 21, 2020 at 02:36:52PM +0100, Jan Beulich wrote: > >> On 21.02.2020 10:10, Roger Pau Monné wrote: > >>> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote: > >>>> Hi, > >>>> > >>>> On 20/02/2020 17:31, Roger Pau Monne wrote: > >>>>> Allow a CPU already holding the lock in write mode to also lock it in > >>>>> read mode. There's no harm in allowing read locking a rwlock that's > >>>>> already owned by the caller (ie: CPU) in write mode. Allowing such > >>>>> accesses is required at least for the CPU maps use-case. > >>>>> > >>>>> In order to do this reserve 14bits of the lock, this allows to support > >>>>> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: one to > >>>>> signal there are pending writers waiting on the lock and the other to > >>>>> signal the lock is owned in write mode. Note the write related data > >>>>> is using 16bits, this is done in order to be able to clear it (and > >>>>> thus release the lock) using a 16bit atomic write. > >>>>> > >>>>> This reduces the maximum number of concurrent readers from 16777216 to > >>>>> 65536, I think this should still be enough, or else the lock field > >>>>> can be expanded from 32 to 64bits if all architectures support atomic > >>>>> operations on 64bit integers. > >>>> > >>>> FWIW, arm32 is able to support atomic operations on 64-bit integers. > >>>> > >>>>> static inline void _write_unlock(rwlock_t *lock) > >>>>> { > >>>>> - /* > >>>>> - * If the writer field is atomic, it can be cleared directly. > >>>>> - * Otherwise, an atomic subtraction will be used to clear it. > >>>>> - */ > >>>>> - atomic_sub(_QW_LOCKED, &lock->cnts); > >>>>> + /* Since the writer field is atomic, it can be cleared directly. 
*/ > >>>>> + ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts))); > >>>>> + BUILD_BUG_ON(_QR_SHIFT != 16); > >>>>> + write_atomic((uint16_t *)&lock->cnts, 0); > >>>> > >>>> I think this is an abuse to cast an atomic_t() directly into a uint16_t. You > >>>> would at least want to use &lock->cnts.counter here. > >>> > >>> Sure, I was wondering about this myself. > >>> > >>> Will wait for more comments, not sure whether this can be fixed upon > >>> commit if there are no other issues. > >> > >> It's more than just adding another field specifier here. A cast like > >> this one is endianness-unsafe, and hence a trap waiting for a big > >> endian port attempt to fall into. At the very least this should cause > >> a build failure on big endian systems, even better would be if it was > >> endianness-safe. > > > > Why don't we leave the atomic_dec then? > > Because you need to invoke smp_processor_id() to calculate the value > to use in the subtraction. I'm not meaning to say I'm entirely > opposed (seeing how much of a discussion we're having), but the > "simple write of zero" approach is certainly appealing. Well, we could avoid the smp_processor_id() call and instead use: atomic_sub(atomic_read(&lock->cnts) & 0xffff, &lock->cnts); Note the mask now covers the low 16 bits, but if we go the atomic_sub route we could change the CPU ID field to be 12 bits again and thus have some more room for the readers count. Thanks, Roger.
On 21.02.2020 15:19, Jürgen Groß wrote:
> On 21.02.20 15:16, Jan Beulich wrote:
>> On 21.02.2020 15:06, Jürgen Groß wrote:
>>> On 21.02.20 14:49, Julien Grall wrote:
>>>> On 21/02/2020 13:46, Jürgen Groß wrote:
>>>>> On 21.02.20 14:36, Jan Beulich wrote:
>>>>>> [...]
>>>>>> It's more than just adding another field specifier here. A cast like
>>>>>> this one is endianness-unsafe, and hence a trap waiting for a big
>>>>>> endian port attempt to fall into. At the very least this should cause
>>>>>> a build failure on big endian systems, even better would be if it was
>>>>>> endianness-safe.
>>>>>
>>>>> Wouldn't a union be the better choice?
>>>>
>>>> You would not be able to use atomic_t in that case as you can't assume
>>>> the layout of the structure.
>>>
>>> union rwlockword {
>>>     atomic_t cnts;
>>>     uint32_t val;
>>>     struct {
>>>         uint16_t write;
>>>         uint16_t readers;
>>>     };
>>> };
>>>
>>> static inline uint32_t _qr_bias(void)
>>> {
>>>     const union rwlockword x = {
>>>         .write = 0,
>>>         .readers = 1
>>>     };
>>>
>>>     return x.val;
>>> }
>>>
>>> ...
>>> cnts = atomic_add_return(_qr_bias(), &lock->cnts);
>>> ...
>>>
>>> I guess this should do the trick, no?
>>
>> I'm afraid it won't, and not just because of the sizeof() aspect
>> already pointed out. Your x variable would end up like this in
>> memory:
>>
>> little: 00 00 01 00
>> big:    00 00 00 01
>>
>> which, read as 32-bit value, then ends up being
>>
>> little: 00010000
>> big:    00000001
>>
>> The add therefore would be able to spill into the high 16 bits.
>
> And why exactly is this worse than just dropping spilled bits?
> Both cases will explode rather soon. Both cases can be avoided by
> introduction of e.g. ASSERTing that reader isn't ~0 before
> incrementing it (or to be zero afterwards).

Hmm, yes. I can't reconstruct the thoughts of the people having laid
out the bit assignments the way they are right now. I can only assume
there was a reason to put the reader count in the high bits.

But it looks like my previous remark wasn't even pointing at the worst
effect. The resulting big endian 32-bit layout also is not in line
with the _QW_* constants, due to mis-ordering of the two halves.

Jan
On 21/02/2020 14:35, Jürgen Groß wrote:
> On 21.02.20 15:32, Julien Grall wrote:
>> On 21/02/2020 14:16, Jürgen Groß wrote:
>>> On 21.02.20 15:11, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 21/02/2020 14:06, Jürgen Groß wrote:
>>>>> [...]
>>>>> I guess this should do the trick, no?
>>>>
>>>> You are assuming the atomic_t layout, which I would rather not want
>>>> to happen.
>>>
>>> That already happened. rwlock.h already assumes atomic_t to be 32 bits
>>> wide. I agree it would be better to have an atomic32_t type for this
>>> use case.
>>
>> I don't think you understood my point here. An atomic32_t will not be
>> better as you still assume the layout of the structure, i.e. that the
>> structure has only one field.
>
> I don't understand your reasoning here, sorry.
>
> Are you saying that a structure of two uint16_t isn't known to be 32
> bits long?

You are assuming that atomic_t will always be:

struct
{
    int counter;
}

What if we decide to turn it into

struct
{
    bool a;
    int counter;
}

You would at least want a BUILD_BUG_ON() so that the compiler will
throw you an error rather than happily continuing.

Cheers,
On 21.02.2020 15:49, Roger Pau Monné wrote:
> On Fri, Feb 21, 2020 at 03:41:59PM +0100, Jan Beulich wrote:
>> On 21.02.2020 15:26, Roger Pau Monné wrote:
>>> [...]
>>> Why don't we leave the atomic_dec then?
>>
>> Because you need to invoke smp_processor_id() to calculate the value
>> to use in the subtraction. I'm not meaning to say I'm entirely
>> opposed (seeing how much of a discussion we're having), but the
>> "simple write of zero" approach is certainly appealing.
>
> Well, we could avoid the smp_processor_id() call and instead use:
>
> atomic_sub(atomic_read(&lock->cnts) & 0xffff, &lock->cnts);

Which would make me suggest atomic_and() again.

Jan
On 21/02/2020 14:49, Roger Pau Monné wrote:
> On Fri, Feb 21, 2020 at 03:41:59PM +0100, Jan Beulich wrote:
>> Because you need to invoke smp_processor_id() to calculate the value
>> to use in the subtraction. I'm not meaning to say I'm entirely
>> opposed (seeing how much of a discussion we're having), but the
>> "simple write of zero" approach is certainly appealing.
>
> Well, we could avoid the smp_processor_id() call and instead use:
>
> atomic_sub(atomic_read(&lock->cnts) & 0xffff, &lock->cnts);

AFAICT, this would not be safe because the top 16 bits may change behind
your back (via a read_lock).

Cheers,
On Fri, Feb 21, 2020 at 03:52:28PM +0100, Jan Beulich wrote:
> On 21.02.2020 15:49, Roger Pau Monné wrote:
> > On Fri, Feb 21, 2020 at 03:41:59PM +0100, Jan Beulich wrote:
> >> [...]
> >> Because you need to invoke smp_processor_id() to calculate the value
> >> to use in the subtraction. I'm not meaning to say I'm entirely
> >> opposed (seeing how much of a discussion we're having), but the
> >> "simple write of zero" approach is certainly appealing.
> >
> > Well, we could avoid the smp_processor_id() call and instead use:
> >
> > atomic_sub(atomic_read(&lock->cnts) & 0xffff, &lock->cnts);
>
> Which would make me suggest atomic_and() again.

I'm certainly not opposed to that, but in order to get this regression
fixed I would argue that such an atomic_sub is no worse than what's
currently done.

I can look into adding an atomic_and operation to all arches, but this
is likely to take time and I would like to get this sorted sooner
rather than later.

Thanks, Roger.
On Fri, Feb 21, 2020 at 02:56:20PM +0000, Julien Grall wrote:
> On 21/02/2020 14:49, Roger Pau Monné wrote:
> > On Fri, Feb 21, 2020 at 03:41:59PM +0100, Jan Beulich wrote:
> > > Because you need to invoke smp_processor_id() to calculate the value
> > > to use in the subtraction. I'm not meaning to say I'm entirely
> > > opposed (seeing how much of a discussion we're having), but the
> > > "simple write of zero" approach is certainly appealing.
> >
> > Well, we could avoid the smp_processor_id() call and instead use:
> >
> > atomic_sub(atomic_read(&lock->cnts) & 0xffff, &lock->cnts);
>
> AFAICT, this would not be safe because the top 16-bit may change behind
> your back (via a read_lock).

Why not? We don't touch the top 16 bits in any way; the subtraction
only affects the low 16 bits and is done in an atomic manner.

Thanks, Roger.
On 21/02/2020 14:59, Roger Pau Monné wrote:
> On Fri, Feb 21, 2020 at 02:56:20PM +0000, Julien Grall wrote:
>> [...]
>> AFAICT, this would not be safe because the top 16-bit may change behind
>> your back (via a read_lock).
>
> Why not? We don't touch the top 16bits in any way, the subtraction
> only affects the low 16bits and is done in an atomic manner.

Hmm, I did misread the code :(. Sorry for that.
On 21.02.2020 15:56, Julien Grall wrote:
> On 21/02/2020 14:49, Roger Pau Monné wrote:
>> On Fri, Feb 21, 2020 at 03:41:59PM +0100, Jan Beulich wrote:
>>> Because you need to invoke smp_processor_id() to calculate the value
>>> to use in the subtraction. I'm not meaning to say I'm entirely
>>> opposed (seeing how much of a discussion we're having), but the
>>> "simple write of zero" approach is certainly appealing.
>>
>> Well, we could avoid the smp_processor_id() call and instead use:
>>
>> atomic_sub(atomic_read(&lock->cnts) & 0xffff, &lock->cnts);
>
> AFAICT, this would not be safe because the top 16-bit may change behind
> your back (via a read_lock).

But them changing between the atomic_read() and the atomic_sub() is
fine. The sub then will still only affect the low bits, leaving the
high ones as they were (potentially as updated after the read).

Jan
On 21.02.20 15:51, Julien Grall wrote:
> On 21/02/2020 14:35, Jürgen Groß wrote:
>> On 21.02.20 15:32, Julien Grall wrote:
>>> [...]
>>> I don't think you understood my point here. An atomic32_t will not be
>>> better as you still assume the layout of the structure. I.e the
>>> structure has only one field.
>>
>> I don't understand your reasoning here, sorry.
>>
>> Are you saying that a structure of two uint16_t isn't known to be 32
>> bits long?
>
> You are assuming that atomic_t will always be:
>
> struct
> {
>     int counter;
> }
>
> What if we decide to turn it into
>
> struct
> {
>     bool a;
>     int counter;
> }

As said before: then queue_write_lock_slowpath() is already broken.

Juergen
On 21.02.2020 15:58, Roger Pau Monné wrote:
> On Fri, Feb 21, 2020 at 03:52:28PM +0100, Jan Beulich wrote:
>> On 21.02.2020 15:49, Roger Pau Monné wrote:
>>> [...]
>>> Well, we could avoid the smp_processor_id() call and instead use:
>>>
>>> atomic_sub(atomic_read(&lock->cnts) & 0xffff, &lock->cnts);
>>
>> Which would make me suggest atomic_and() again.
>
> I'm certainly not opposed to that, but in order to get this regression
> fixed I would argue that such an atomic_sub is no worse than what's
> currently done.

It's one more read of a potentially heavily contended memory location.

> I can look into adding an atomic_and operation to all arches, but this
> is likely to take time and I would like to get this sorted sooner
> rather than later.

It was because of it presumably taking time that I also brought up
reverting as an option.

Then again, the Arm atomic_and() can, afaict, be trivially cloned from
its "add" counterparts, by replacing the middle 'd' both in the
function names and in the insn mnemonics. It's just that there
shouldn't be an atomic_and_return() (for not being useful, as it
doesn't allow reconstructing the original value).

Jan
On 21.02.2020 16:13, Jürgen Groß wrote:
> On 21.02.20 15:51, Julien Grall wrote:
>> On 21/02/2020 14:35, Jürgen Groß wrote:
>>> On 21.02.20 15:32, Julien Grall wrote:
>>>> On 21/02/2020 14:16, Jürgen Groß wrote:
>>>>> On 21.02.20 15:11, Julien Grall wrote:
>>>>>> Hi Juergen,
>>>>>>
>>>>>> On 21/02/2020 14:06, Jürgen Groß wrote:
>>>>>>> On 21.02.20 14:49, Julien Grall wrote:
>>>>>>>> On 21/02/2020 13:46, Jürgen Groß wrote:
>>>>>>>>> On 21.02.20 14:36, Jan Beulich wrote:
>>>>>>>>>> On 21.02.2020 10:10, Roger Pau Monné wrote:
>>>>>>>>>>> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On 20/02/2020 17:31, Roger Pau Monne wrote:
>>>>>>>>>>>>> Allow a CPU already holding the lock in write mode to also lock
>>>>>>>>>>>>> it in read mode. There's no harm in allowing read locking a
>>>>>>>>>>>>> rwlock that's already owned by the caller (ie: CPU) in write
>>>>>>>>>>>>> mode. Allowing such accesses is required at least for the CPU
>>>>>>>>>>>>> maps use-case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order to do this reserve 14bits of the lock, this allows to
>>>>>>>>>>>>> support up to 16384 CPUs. Also reduce the write lock mask to 2
>>>>>>>>>>>>> bits: one to signal there are pending writers waiting on the
>>>>>>>>>>>>> lock and the other to signal the lock is owned in write mode.
>>>>>>>>>>>>> Note the write related data is using 16bits, this is done in
>>>>>>>>>>>>> order to be able to clear it (and thus release the lock) using
>>>>>>>>>>>>> a 16bit atomic write.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This reduces the maximum number of concurrent readers from
>>>>>>>>>>>>> 16777216 to 65536, I think this should still be enough, or else
>>>>>>>>>>>>> the lock field can be expanded from 32 to 64bits if all
>>>>>>>>>>>>> architectures support atomic operations on 64bit integers.
>>>>>>>>>>>>
>>>>>>>>>>>> FWIW, arm32 is able to support atomic operations on 64-bit
>>>>>>>>>>>> integers.
>>>>>>>>>>>>
>>>>>>>>>>>>>  static inline void _write_unlock(rwlock_t *lock)
>>>>>>>>>>>>>  {
>>>>>>>>>>>>> -    /*
>>>>>>>>>>>>> -     * If the writer field is atomic, it can be cleared directly.
>>>>>>>>>>>>> -     * Otherwise, an atomic subtraction will be used to clear it.
>>>>>>>>>>>>> -     */
>>>>>>>>>>>>> -    atomic_sub(_QW_LOCKED, &lock->cnts);
>>>>>>>>>>>>> +    /* Since the writer field is atomic, it can be cleared directly. */
>>>>>>>>>>>>> +    ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts)));
>>>>>>>>>>>>> +    BUILD_BUG_ON(_QR_SHIFT != 16);
>>>>>>>>>>>>> +    write_atomic((uint16_t *)&lock->cnts, 0);
>>>>>>>>>>>>
>>>>>>>>>>>> I think this is an abuse to cast an atomic_t() directly into a
>>>>>>>>>>>> uint16_t. You would at least want to use &lock->cnts.counter here.
>>>>>>>>>>>
>>>>>>>>>>> Sure, I was wondering about this myself.
>>>>>>>>>>>
>>>>>>>>>>> Will wait for more comments, not sure whether this can be fixed
>>>>>>>>>>> upon commit if there are no other issues.
>>>>>>>>>>
>>>>>>>>>> It's more than just adding another field specifier here. A cast
>>>>>>>>>> like this one is endianness-unsafe, and hence a trap waiting for a
>>>>>>>>>> big endian port attempt to fall into. At the very least this should
>>>>>>>>>> cause a build failure on big endian systems, even better would be
>>>>>>>>>> if it was endianness-safe.
>>>>>>>>>
>>>>>>>>> Wouldn't a union be the better choice?
>>>>>>>>
>>>>>>>> You would not be able to use atomic_t in that case as you can't
>>>>>>>> assume the layout of the structure.
>>>>>>>
>>>>>>> union rwlockword {
>>>>>>>     atomic_t cnts;
>>>>>>>     uint32_t val;
>>>>>>>     struct {
>>>>>>>         uint16_t write;
>>>>>>>         uint16_t readers;
>>>>>>>     };
>>>>>>> };
>>>>>>>
>>>>>>> static inline const uint32_t _qr_bias(
>>>>>>>     const union rwlockword {
>>>>>>>         .write = 0,
>>>>>>>         .readers = 1
>>>>>>>     } x;
>>>>>>>
>>>>>>>     return x.val;
>>>>>>> }
>>>>>>>
>>>>>>> ...
>>>>>>> cnts = atomic_add_return(_qr_bias(), &lock->cnts);
>>>>>>> ...
>>>>>>>
>>>>>>> I guess this should do the trick, no?
>>>>>>
>>>>>> You are assuming the atomic_t layout which I would rather no want to
>>>>>> happen.
>>>>>
>>>>> That already happened. rwlock.h already assumes atomic_t to be 32 bits
>>>>> wide. I agree it would be better to have an atomic32_t type for this
>>>>> use case.
>>>>
>>>> I don't think you understood my point here. An atomic32_t will not be
>>>> better as you still assume the layout of the structure. I.e the
>>>> structure has only one field.
>>>
>>> I don't understand your reasoning here, sorry.
>>>
>>> Are you saying that a structure of two uint16_t isn't known to be 32
>>> bits long?
>>
>> You are assuming that atomic_t will always be:
>>
>> struct
>> {
>>     int counter;
>> }
>>
>> What if we decide to turn into
>>
>> struct
>> {
>>     bool a;
>>     int counter;
>> }
>
> As said before: then queue_write_lock_slowpath() is already broken.

Why? The atomic_*() used would still only act on the counter field
(for their actual operation, i.e. what matters to callers; the
other field(s) could be statistics or whatever).

Jan
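To make the union idea above concrete, here is a standalone, compilable version of the sketch from the quoted mail. Note the stand-ins: `atomic_t` is reduced to a one-field struct (exactly the layout assumption this subthread is debating), and where the two 16-bit halves land inside `val` depends on byte order, which is the endianness trap being pointed out.

```c
#include <stdint.h>

/* One-field stand-in for Xen's atomic_t -- an assumption, per the thread. */
typedef struct { int counter; } atomic_t;

union rwlockword {
    atomic_t cnts;
    uint32_t val;
    struct {
        uint16_t write;    /* writer state; low half only on little endian */
        uint16_t readers;  /* reader count; high half only on little endian */
    };
};

/* Compilable version of the _qr_bias() sketch from the quoted mail. */
static inline uint32_t _qr_bias(void)
{
    const union rwlockword x = { .write = 0, .readers = 1 };

    return x.val;
}
```

On a little-endian machine `_qr_bias()` yields `1u << 16` (matching `_QR_BIAS`), but on a big-endian one it yields 1, which is why the field order inside the anonymous struct would have to be selected per endianness.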
On 21.02.20 16:17, Jan Beulich wrote:
> On 21.02.2020 16:13, Jürgen Groß wrote:
>> On 21.02.20 15:51, Julien Grall wrote:
>>> [...]
>>> You are assuming that atomic_t will always be:
>>>
>>> struct
>>> {
>>>     int counter;
>>> }
>>>
>>> What if we decide to turn into
>>>
>>> struct
>>> {
>>>     bool a;
>>>     int counter;
>>> }
>>
>> As said before: then queue_write_lock_slowpath() is already broken.
>
> Why? The atomic_*() used would still only act on the counter field
> (for their actual operation, i.e. what matters to callers; the
> other field(s) could be statistics or whatever).

No:

    u32 cnts;
    ...
    if ( !(cnts & _QW_WMASK) &&
         (atomic_cmpxchg(&lock->cnts, cnts,
                         cnts | _QW_WAITING) == cnts) )

Juergen
On 21.02.2020 16:23, Jürgen Groß wrote:
> On 21.02.20 16:17, Jan Beulich wrote:
>> On 21.02.2020 16:13, Jürgen Groß wrote:
>>> On 21.02.20 15:51, Julien Grall wrote:
>>>> You are assuming that atomic_t will always be:
>>>>
>>>> struct
>>>> {
>>>>     int counter;
>>>> }
>>>>
>>>> What if we decide to turn into
>>>>
>>>> struct
>>>> {
>>>>     bool a;
>>>>     int counter;
>>>> }
>>>
>>> As said before: then queue_write_lock_slowpath() is already broken.
>>
>> Why? The atomic_*() used would still only act on the counter field
>> (for their actual operation, i.e. what matters to callers; the
>> other field(s) could be statistics or whatever).
>
> No:
>
>     u32 cnts;
>     ...
>     if ( !(cnts & _QW_WMASK) &&
>          (atomic_cmpxchg(&lock->cnts, cnts,
>                          cnts | _QW_WAITING) == cnts) )

I must be blind then. As said, atomic_cmpxchg() would still (for the
purpose of consuming "cnts") act on only the "counter" field. It may
additionally e.g. increment a stats counter.

Jan
On 21.02.20 16:27, Jan Beulich wrote:
> On 21.02.2020 16:23, Jürgen Groß wrote:
>> On 21.02.20 16:17, Jan Beulich wrote:
>>> On 21.02.2020 16:13, Jürgen Groß wrote:
>>>> On 21.02.20 15:51, Julien Grall wrote:
>>>>> You are assuming that atomic_t will always be:
>>>>>
>>>>> struct
>>>>> {
>>>>>     int counter;
>>>>> }
>>>>>
>>>>> What if we decide to turn into
>>>>>
>>>>> struct
>>>>> {
>>>>>     bool a;
>>>>>     int counter;
>>>>> }
>>>>
>>>> As said before: then queue_write_lock_slowpath() is already broken.
>>>
>>> Why? The atomic_*() used would still only act on the counter field
>>> (for their actual operation, i.e. what matters to callers; the
>>> other field(s) could be statistics or whatever).
>>
>> No:
>>
>>     u32 cnts;
>>     ...
>>     if ( !(cnts & _QW_WMASK) &&
>>          (atomic_cmpxchg(&lock->cnts, cnts,
>>                          cnts | _QW_WAITING) == cnts) )
>
> I must be blind then. As said, atomic_cmpxchg() would still (for the
> purpose of consuming "cnts") act on only the "counter" field. It may
> additionally e.g. increment a stats counter.

Oh, sorry, I misunderstood you.

Juergen
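Jan's point in the exchange above can be illustrated with a purely hypothetical sketch: even if `atomic_t` grew extra fields someday, callers such as queue_write_lock_slowpath() would stay correct as long as the `atomic_*()` helpers keep comparing, reading and swapping `.counter` alone. Everything below (`fat_atomic_t`, the `stats` field, the helper names) is invented for illustration, and it is single-threaded pseudo-atomics, not a real implementation.

```c
/* Imaginary widened atomic type: helpers still act only on .counter. */
typedef struct {
    unsigned int stats;   /* hypothetical extra field (e.g. op counter) */
    int counter;          /* the value every atomic_*() helper operates on */
} fat_atomic_t;

static int fat_atomic_read(const fat_atomic_t *v)
{
    return v->counter;
}

/*
 * Compare-and-swap on .counter only; the extra field is mere bookkeeping
 * and never influences the comparison seen by callers.
 */
static int fat_atomic_cmpxchg(fat_atomic_t *v, int old, int new)
{
    int prev = v->counter;

    v->stats++;
    if ( prev == old )
        v->counter = new;
    return prev;
}
```

A caller doing `fat_atomic_cmpxchg(&lock->cnts, cnts, cnts | _QW_WAITING)` would thus behave exactly as today, regardless of what else lives in the structure.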
On Fri, Feb 21, 2020 at 04:15:39PM +0100, Jan Beulich wrote:
> On 21.02.2020 15:58, Roger Pau Monné wrote:
> > On Fri, Feb 21, 2020 at 03:52:28PM +0100, Jan Beulich wrote:
> >> On 21.02.2020 15:49, Roger Pau Monné wrote:
> >>> On Fri, Feb 21, 2020 at 03:41:59PM +0100, Jan Beulich wrote:
> >>>> On 21.02.2020 15:26, Roger Pau Monné wrote:
> >>>>> On Fri, Feb 21, 2020 at 02:36:52PM +0100, Jan Beulich wrote:
> >>>>>> On 21.02.2020 10:10, Roger Pau Monné wrote:
> >>>>>>> On Thu, Feb 20, 2020 at 07:20:06PM +0000, Julien Grall wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> On 20/02/2020 17:31, Roger Pau Monne wrote:
> >>>>>>>>> Allow a CPU already holding the lock in write mode to also lock it in
> >>>>>>>>> read mode. There's no harm in allowing read locking a rwlock that's
> >>>>>>>>> already owned by the caller (ie: CPU) in write mode. Allowing such
> >>>>>>>>> accesses is required at least for the CPU maps use-case.
> >>>>>>>>>
> >>>>>>>>> In order to do this reserve 14bits of the lock, this allows to support
> >>>>>>>>> up to 16384 CPUs. Also reduce the write lock mask to 2 bits: one to
> >>>>>>>>> signal there are pending writers waiting on the lock and the other to
> >>>>>>>>> signal the lock is owned in write mode. Note the write related data
> >>>>>>>>> is using 16bits, this is done in order to be able to clear it (and
> >>>>>>>>> thus release the lock) using a 16bit atomic write.
> >>>>>>>>>
> >>>>>>>>> This reduces the maximum number of concurrent readers from 16777216 to
> >>>>>>>>> 65536, I think this should still be enough, or else the lock field
> >>>>>>>>> can be expanded from 32 to 64bits if all architectures support atomic
> >>>>>>>>> operations on 64bit integers.
> >>>>>>>>
> >>>>>>>> FWIW, arm32 is able to support atomic operations on 64-bit integers.
> >>>>>>>>
> >>>>>>>>>  static inline void _write_unlock(rwlock_t *lock)
> >>>>>>>>>  {
> >>>>>>>>> -    /*
> >>>>>>>>> -     * If the writer field is atomic, it can be cleared directly.
> >>>>>>>>> -     * Otherwise, an atomic subtraction will be used to clear it.
> >>>>>>>>> -     */
> >>>>>>>>> -    atomic_sub(_QW_LOCKED, &lock->cnts);
> >>>>>>>>> +    /* Since the writer field is atomic, it can be cleared directly. */
> >>>>>>>>> +    ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts)));
> >>>>>>>>> +    BUILD_BUG_ON(_QR_SHIFT != 16);
> >>>>>>>>> +    write_atomic((uint16_t *)&lock->cnts, 0);
> >>>>>>>>
> >>>>>>>> I think this is an abuse to cast an atomic_t() directly into a uint16_t. You
> >>>>>>>> would at least want to use &lock->cnts.counter here.
> >>>>>>>
> >>>>>>> Sure, I was wondering about this myself.
> >>>>>>>
> >>>>>>> Will wait for more comments, not sure whether this can be fixed upon
> >>>>>>> commit if there are no other issues.
> >>>>>>
> >>>>>> It's more than just adding another field specifier here. A cast like
> >>>>>> this one is endianness-unsafe, and hence a trap waiting for a big
> >>>>>> endian port attempt to fall into. At the very least this should cause
> >>>>>> a build failure on big endian systems, even better would be if it was
> >>>>>> endianness-safe.
> >>>>>
> >>>>> Why don't we leave the atomic_dec then?
> >>>>
> >>>> Because you need to invoke smp_processor_id() to calculate the value
> >>>> to use in the subtraction. I'm not meaning to say I'm entirely
> >>>> opposed (seeing how much of a discussion we're having), but the
> >>>> "simple write of zero" approach is certainly appealing.
> >>>
> >>> Well, we could avoid the smp_processor_id() call and instead use:
> >>>
> >>> atomic_sub(atomic_read(&lock->cnts) & 0xffff, &lock->cnts);
> >>
> >> Which would make me suggest atomic_and() again.
> >
> > I'm certainly not opposed to that, but in order to get this regression
> > fixed I would argue that such atomic_sub is no worse than what's
> > currently done.
>
> It's one more read of a potentially heavily contended memory location.
>
> > I can look into adding an atomic_and operation to all arches, but this
> > is likely to take time and I would like to get this sorted sooner
> > rather than later.
>
> Because of it presumably taking time it was that I also brought up to
> consider reverting. Then again, the Arm atomic_and() can, afaict, be
> trivially cloned from their "add" counterparts, by replacing the
> middle 'd' both in the function names and in the insn mnemonics. It's
> just that there shouldn't be atomic_and_return() (for not being
> useful, as it doesn't allow reconstructing the original value).

I've added atomic_and in a pre-patch and I'm currently waiting for
gitlab to finish testing to assert that I'm not breaking the build.
Will post once that's finished if successful.

FWIW, the branch is at:

http://xenbits.xen.org/gitweb/?p=people/royger/xen.git;a=shortlog;h=refs/heads/rwlock3

Thanks, Roger.
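For reference, the atomic_and() variant of the unlock discussed above can be sketched in portable C11, which avoids both the 16-bit store and its endianness problem by clearing the whole writer half with one read-modify-write. C11 `atomic_fetch_and()` stands in for Xen's (then not yet existing) `atomic_and()`, and the mask constants are copied from the v2 patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Bit layout from the v2 patch. */
#define _QW_CPUMASK  0x3fffu             /* writer CPU id */
#define _QW_SHIFT    14                  /* writer flags shift */
#define _QW_LOCKED   (3u << _QW_SHIFT)   /* a writer holds the lock */
#define _QW_WMASK    (3u << _QW_SHIFT)   /* writer mask */
#define _QR_SHIFT    16                  /* reader count shift */

/*
 * Endianness-safe alternative to write_atomic((uint16_t *)..., 0):
 * atomically clear the writer flags and CPU id while leaving the
 * reader count (high 16 bits) untouched.
 */
static void write_unlock_via_and(_Atomic uint32_t *cnts)
{
    atomic_fetch_and(cnts, ~(uint32_t)(_QW_WMASK | _QW_CPUMASK));
}
```

Unlike the `atomic_sub(atomic_read(...) & 0xffff, ...)` workaround, this needs neither an extra read of the contended word nor an smp_processor_id() call.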
diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index d568bbf6de..dadab372b5 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -69,7 +69,7 @@ void queue_write_lock_slowpath(rwlock_t *lock)
 
     /* Try to acquire the lock directly if no reader is present. */
     if ( !atomic_read(&lock->cnts) &&
-         (atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0) )
+         (atomic_cmpxchg(&lock->cnts, 0, _write_lock_val()) == 0) )
         goto unlock;
 
     /*
@@ -93,7 +93,7 @@ void queue_write_lock_slowpath(rwlock_t *lock)
         cnts = atomic_read(&lock->cnts);
         if ( (cnts == _QW_WAITING) &&
              (atomic_cmpxchg(&lock->cnts, _QW_WAITING,
-                             _QW_LOCKED) == _QW_WAITING) )
+                             _write_lock_val()) == _QW_WAITING) )
             break;
 
         cpu_relax();
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index 3dfea1ac2a..59f0d21075 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -20,21 +20,30 @@ typedef struct {
 #define DEFINE_RWLOCK(l) rwlock_t l = RW_LOCK_UNLOCKED
 #define rwlock_init(l) (*(l) = (rwlock_t)RW_LOCK_UNLOCKED)
 
-/*
- * Writer states & reader shift and bias.
- *
- * Writer field is 8 bit to allow for potential optimisation, see
- * _write_unlock().
- */
-#define    _QW_WAITING  1                    /* A writer is waiting */
-#define    _QW_LOCKED   0xff                 /* A writer holds the lock */
-#define    _QW_WMASK    0xff                 /* Writer mask.*/
-#define    _QR_SHIFT    8                    /* Reader count shift */
+/* Writer states & reader shift and bias. */
+#define    _QW_CPUMASK  0x3fff               /* Writer CPU mask */
+#define    _QW_SHIFT    14                   /* Writer flags shift */
+#define    _QW_WAITING  (1u << _QW_SHIFT)    /* A writer is waiting */
+#define    _QW_LOCKED   (3u << _QW_SHIFT)    /* A writer holds the lock */
+#define    _QW_WMASK    (3u << _QW_SHIFT)    /* Writer mask */
+#define    _QR_SHIFT    16                   /* Reader count shift */
 #define    _QR_BIAS     (1U << _QR_SHIFT)
 
 void queue_read_lock_slowpath(rwlock_t *lock);
 void queue_write_lock_slowpath(rwlock_t *lock);
 
+static inline bool _is_write_locked_by_me(uint32_t cnts)
+{
+    BUILD_BUG_ON(_QW_CPUMASK < NR_CPUS);
+    return (cnts & _QW_WMASK) == _QW_LOCKED &&
+           (cnts & _QW_CPUMASK) == smp_processor_id();
+}
+
+static inline bool _can_read_lock(uint32_t cnts)
+{
+    return !(cnts & _QW_WMASK) || _is_write_locked_by_me(cnts);
+}
+
 /*
  * _read_trylock - try to acquire read lock of a queue rwlock.
  * @lock : Pointer to queue rwlock structure.
@@ -45,10 +54,10 @@ static inline int _read_trylock(rwlock_t *lock)
     u32 cnts;
 
     cnts = atomic_read(&lock->cnts);
-    if ( likely(!(cnts & _QW_WMASK)) )
+    if ( likely(_can_read_lock(cnts)) )
     {
         cnts = (u32)atomic_add_return(_QR_BIAS, &lock->cnts);
-        if ( likely(!(cnts & _QW_WMASK)) )
+        if ( likely(_can_read_lock(cnts)) )
             return 1;
         atomic_sub(_QR_BIAS, &lock->cnts);
     }
@@ -64,7 +73,7 @@ static inline void _read_lock(rwlock_t *lock)
     u32 cnts;
 
     cnts = atomic_add_return(_QR_BIAS, &lock->cnts);
-    if ( likely(!(cnts & _QW_WMASK)) )
+    if ( likely(_can_read_lock(cnts)) )
         return;
 
     /* The slowpath will decrement the reader count, if necessary. */
@@ -115,6 +124,11 @@ static inline int _rw_is_locked(rwlock_t *lock)
     return atomic_read(&lock->cnts);
 }
 
+static inline uint32_t _write_lock_val(void)
+{
+    return _QW_LOCKED | smp_processor_id();
+}
+
 /*
  * queue_write_lock - acquire write lock of a queue rwlock.
  * @lock : Pointer to queue rwlock structure.
@@ -122,7 +136,7 @@ static inline int _rw_is_locked(rwlock_t *lock)
 static inline void _write_lock(rwlock_t *lock)
 {
     /* Optimize for the unfair lock case where the fair flag is 0. */
-    if ( atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0 )
+    if ( atomic_cmpxchg(&lock->cnts, 0, _write_lock_val()) == 0 )
         return;
 
     queue_write_lock_slowpath(lock);
@@ -157,16 +171,15 @@ static inline int _write_trylock(rwlock_t *lock)
     if ( unlikely(cnts) )
         return 0;
 
-    return likely(atomic_cmpxchg(&lock->cnts, 0, _QW_LOCKED) == 0);
+    return likely(atomic_cmpxchg(&lock->cnts, 0, _write_lock_val()) == 0);
 }
 
 static inline void _write_unlock(rwlock_t *lock)
 {
-    /*
-     * If the writer field is atomic, it can be cleared directly.
-     * Otherwise, an atomic subtraction will be used to clear it.
-     */
-    atomic_sub(_QW_LOCKED, &lock->cnts);
+    /* Since the writer field is atomic, it can be cleared directly. */
+    ASSERT(_is_write_locked_by_me(atomic_read(&lock->cnts)));
+    BUILD_BUG_ON(_QR_SHIFT != 16);
+    write_atomic((uint16_t *)&lock->cnts, 0);
 }
 
 static inline void _write_unlock_irq(rwlock_t *lock)
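The core of the patch — a writer's CPU id stored in the low 14 bits so that the same CPU may take the read lock recursively — can be exercised in isolation with the sketch below. It replaces `smp_processor_id()` with an explicit `cpu` argument (and drops the lock word's atomicity) purely so the predicates can run outside Xen; the masks and logic mirror `_is_write_locked_by_me()`/`_can_read_lock()` from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout from the patch. */
#define _QW_CPUMASK  0x3fffu             /* writer CPU id */
#define _QW_SHIFT    14                  /* writer flags shift */
#define _QW_WAITING  (1u << _QW_SHIFT)   /* a writer is waiting */
#define _QW_LOCKED   (3u << _QW_SHIFT)   /* a writer holds the lock */
#define _QW_WMASK    (3u << _QW_SHIFT)   /* writer mask */
#define _QR_SHIFT    16                  /* reader count shift */

/* Mirror of _is_write_locked_by_me(), with cpu passed in explicitly. */
static bool is_write_locked_by(uint32_t cnts, unsigned int cpu)
{
    return (cnts & _QW_WMASK) == _QW_LOCKED &&
           (cnts & _QW_CPUMASK) == cpu;
}

/*
 * Mirror of _can_read_lock(): readers may enter when no writer holds
 * the lock, or when the writer is this very CPU (the recursive case).
 */
static bool can_read_lock(uint32_t cnts, unsigned int cpu)
{
    return !(cnts & _QW_WMASK) || is_write_locked_by(cnts, cpu);
}
```

Note how a merely *waiting* writer (`_QW_WAITING`) still blocks new readers, so writer fairness is preserved; only the CPU that already *holds* the lock in write mode is let through.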
Allow a CPU already holding the lock in write mode to also lock it in
read mode. There's no harm in allowing read locking a rwlock that's
already owned by the caller (ie: CPU) in write mode. Allowing such
accesses is required at least for the CPU maps use-case.

In order to do this reserve 14bits of the lock, this allows to support
up to 16384 CPUs. Also reduce the write lock mask to 2 bits: one to
signal there are pending writers waiting on the lock and the other to
signal the lock is owned in write mode. Note the write related data
is using 16bits, this is done in order to be able to clear it (and
thus release the lock) using a 16bit atomic write.

This reduces the maximum number of concurrent readers from 16777216 to
65536, I think this should still be enough, or else the lock field
can be expanded from 32 to 64bits if all architectures support atomic
operations on 64bit integers.

Fixes: 5872c83b42c608 ('smp: convert the cpu maps lock into a rw lock')
Reported-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Jürgen Groß <jgross@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Move the processor ID field to start at bit 0 and hence avoid the
   shift when checking it.
 - Enlarge write related structures to use 16bit, and hence allow them
   to be cleared using an atomic write.
---
 xen/common/rwlock.c      |  4 +--
 xen/include/xen/rwlock.h | 53 +++++++++++++++++++++++++---------------
 2 files changed, 35 insertions(+), 22 deletions(-)