Message ID | 20190812143625.42745-3-andrew.murray@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: avoid out-of-line ll/sc atomics | expand |
Hi Andrew, On Mon, Aug 12, 2019 at 03:36:22PM +0100, Andrew Murray wrote: > For many of the ll/sc atomic operations we use the 'I' machine constraint > regardless to the instruction used - this may not be optimal. > > Let's add an additional parameter to the ATOMIC_xx macros that allows the > caller to specify an appropriate machine constraint. > > Let's also improve __CMPXCHG_CASE by replacing the 'K' constraint with a > caller provided constraint. Please note that whilst we would like to use > the 'K' constraint on 32 bit operations, we choose not to provide any > constraint to avoid a GCC bug which results in a build error. > > Earlier versions of GCC (no later than 8.1.0) appear to incorrectly handle > the 'K' constraint for the value 4294967295. From reading the above, it's difficult to discern what's a fix and what's an improvement, and I think we need to be more explicit about that. It would also be helpful to have the necessary context up-front. How about: | The A64 ISA accepts distinct (but overlapping) ranges of immediates for: | | * add arithmetic instructions ('I' machine constraint) | * sub arithmetic instructions ('J' machine constraint) | * 32-bit logical instructions ('K' machine constraint) | * 64-bit logical instructions ('L' machine constraint) | | ... but we currently use the 'I' constraint for many atomic operations | using sub or logical instructions, which is not always valid. | | When CONFIG_ARM64_LSE_ATOMICS is not set, this allows invalid immediates | to be passed to instructions, potentially resulting in a build failure. | When CONFIG_ARM64_LSE_ATOMICS is selected the out-of-line ll/sc atomics | always use a register as they have no visibility of the value passed by | the caller. | | This patch adds a constraint parameter to the ATOMIC_xx and | __CMPXCHG_CASE macros so that we can pass appropriate constraints for | each case, with uses updated accordingly. | | Unfortunately prior to GCC 8.1.0 the 'K' constraint erroneously accepted | 0xffffffff, so we must instead force the use of a register. Given we haven't had any bug reports, I'm not sure whether this needs a Fixes tag or Cc stable. This has been a latent issue for a long time, but upstream code doesn't seem to have tickled it. [...] > -ATOMIC_OPS(and, and) > -ATOMIC_OPS(andnot, bic) > -ATOMIC_OPS(or, orr) > -ATOMIC_OPS(xor, eor) > +ATOMIC_OPS(and, and, K) > +ATOMIC_OPS(andnot, bic, ) > +ATOMIC_OPS(or, orr, K) > +ATOMIC_OPS(xor, eor, K) Surely it's not safe to use the K constraint here, either? AFAICT code like: atomic_xor(~0, &atom); ... would suffer from the same problem as described for cmpxchg. [...] > -ATOMIC64_OPS(and, and) > -ATOMIC64_OPS(andnot, bic) > -ATOMIC64_OPS(or, orr) > -ATOMIC64_OPS(xor, eor) > +ATOMIC64_OPS(and, and, K) > +ATOMIC64_OPS(andnot, bic, ) > +ATOMIC64_OPS(or, orr, K) > +ATOMIC64_OPS(xor, eor, K) Shouldn't these be 'L'? IIUC K is a subset of L, so if that's deliberate we should call that out explicitly... > +__CMPXCHG_CASE(w, b, , 8, , , , , ) > +__CMPXCHG_CASE(w, h, , 16, , , , , ) > +__CMPXCHG_CASE(w, , , 32, , , , , ) > +__CMPXCHG_CASE( , , , 64, , , , , L) > +__CMPXCHG_CASE(w, b, acq_, 8, , a, , "memory", ) > +__CMPXCHG_CASE(w, h, acq_, 16, , a, , "memory", ) > +__CMPXCHG_CASE(w, , acq_, 32, , a, , "memory", ) > +__CMPXCHG_CASE( , , acq_, 64, , a, , "memory", L) > +__CMPXCHG_CASE(w, b, rel_, 8, , , l, "memory", ) > +__CMPXCHG_CASE(w, h, rel_, 16, , , l, "memory", ) > +__CMPXCHG_CASE(w, , rel_, 32, , , l, "memory", ) > +__CMPXCHG_CASE( , , rel_, 64, , , l, "memory", L) > +__CMPXCHG_CASE(w, b, mb_, 8, dmb ish, , l, "memory", ) > +__CMPXCHG_CASE(w, h, mb_, 16, dmb ish, , l, "memory", ) > +__CMPXCHG_CASE(w, , mb_, 32, dmb ish, , l, "memory", ) > +__CMPXCHG_CASE( , , mb_, 64, dmb ish, , l, "memory", L) ... but these uses imply that's not the case. Otherwise this looks good to me. Thanks, Mark.
On Thu, Aug 22, 2019 at 04:32:23PM +0100, Mark Rutland wrote: > Hi Andrew, > > On Mon, Aug 12, 2019 at 03:36:22PM +0100, Andrew Murray wrote: > > For many of the ll/sc atomic operations we use the 'I' machine constraint > > regardless to the instruction used - this may not be optimal. > > > > Let's add an additional parameter to the ATOMIC_xx macros that allows the > > caller to specify an appropriate machine constraint. > > > > Let's also improve __CMPXCHG_CASE by replacing the 'K' constraint with a > > caller provided constraint. Please note that whilst we would like to use > > the 'K' constraint on 32 bit operations, we choose not to provide any > > constraint to avoid a GCC bug which results in a build error. > > > > Earlier versions of GCC (no later than 8.1.0) appear to incorrectly handle > > the 'K' constraint for the value 4294967295. > > From reading the above, it's difficult to discern what's a fix and > what's an improvement, and I think we need to be more explicit about > that. It would also be helpful to have the necessary context up-front. > > How about: > > | The A64 ISA accepts distinct (but overlapping) ranges of immediates for: > | > | * add arithmetic instructions ('I' machine constraint) > | * sub arithmetic instructions ('J' machine constraint) > | * 32-bit logical instructions ('K' machine constraint) > | * 64-bit logical instructions ('L' machine constraint) > | > | ... but we currently use the 'I' constraint for many atomic operations > | using sub or logical instructions, which is not always valid. > | > | When CONFIG_ARM64_LSE_ATOMICS is not set, this allows invalid immediates > | to be passed to instructions, potentially resulting in a build failure. > | When CONFIG_ARM64_LSE_ATOMICS is selected the out-of-line ll/sc atomics > | always use a register as they have no visibility of the value passed by > | the caller. > | > | This patch adds a constraint parameter to the ATOMIC_xx and > | __CMPXCHG_CASE macros so that we can pass appropriate constraints for > | each case, with uses updated accordingly. > | > | Unfortunately prior to GCC 8.1.0 the 'K' constraint erroneously accepted > | 0xffffffff, so we must instead force the use of a register. Looks great - I'll adopt this, thanks for writing it. > > Given we haven't had any bug reports, I'm not sure whether this needs a > Fixes tag or Cc stable. This has been a latent issue for a long time, > but upstream code doesn't seem to have tickled it. Yes I guess this is more a correctness issue rather than a reproducible bug in upstream code. I won't add a fixes tag or CC to stable. > > [...] > > > -ATOMIC_OPS(and, and) > > -ATOMIC_OPS(andnot, bic) > > -ATOMIC_OPS(or, orr) > > -ATOMIC_OPS(xor, eor) > > +ATOMIC_OPS(and, and, K) > > +ATOMIC_OPS(andnot, bic, ) > > +ATOMIC_OPS(or, orr, K) > > +ATOMIC_OPS(xor, eor, K) > > Surely it's not safe to use the K constraint here, either? AFAICT code > like: > > atomic_xor(~0, &atom); > > ... would suffer from the same problem as described for cmpxchg. Thanks for spotting this. Yes, I think the resolution here (and for any 32bit bitmask immediate) is to drop the constraint. Do you agree that we should drop the 'K' constraint for both orr and eor above? > > [...] > > > -ATOMIC64_OPS(and, and) > > -ATOMIC64_OPS(andnot, bic) > > -ATOMIC64_OPS(or, orr) > > -ATOMIC64_OPS(xor, eor) > > +ATOMIC64_OPS(and, and, K) > > +ATOMIC64_OPS(andnot, bic, ) > > +ATOMIC64_OPS(or, orr, K) > > +ATOMIC64_OPS(xor, eor, K) > > Shouldn't these be 'L'? > > IIUC K is a subset of L, so if that's deliberate we should call that out > explicitly... Oooh yes that's wrong. I guess the atomic64_[and,or,xor] are rarely called in the kernel which perhaps is why the compiler hasn't shouted at me. Do you agree that the and, orr and eor should all be 'L' instead of 'K'? > > > +__CMPXCHG_CASE(w, b, , 8, , , , , ) > > +__CMPXCHG_CASE(w, h, , 16, , , , , ) > > +__CMPXCHG_CASE(w, , , 32, , , , , ) > > +__CMPXCHG_CASE( , , , 64, , , , , L) > > +__CMPXCHG_CASE(w, b, acq_, 8, , a, , "memory", ) > > +__CMPXCHG_CASE(w, h, acq_, 16, , a, , "memory", ) > > +__CMPXCHG_CASE(w, , acq_, 32, , a, , "memory", ) > > +__CMPXCHG_CASE( , , acq_, 64, , a, , "memory", L) > > +__CMPXCHG_CASE(w, b, rel_, 8, , , l, "memory", ) > > +__CMPXCHG_CASE(w, h, rel_, 16, , , l, "memory", ) > > +__CMPXCHG_CASE(w, , rel_, 32, , , l, "memory", ) > > +__CMPXCHG_CASE( , , rel_, 64, , , l, "memory", L) > > +__CMPXCHG_CASE(w, b, mb_, 8, dmb ish, , l, "memory", ) > > +__CMPXCHG_CASE(w, h, mb_, 16, dmb ish, , l, "memory", ) > > +__CMPXCHG_CASE(w, , mb_, 32, dmb ish, , l, "memory", ) > > +__CMPXCHG_CASE( , , mb_, 64, dmb ish, , l, "memory", L) > > ... but these uses imply that's not the case. Yup, so I can leave these as they are. Phew - thanks for the review. Andrew Murray > > Otherwise this looks good to me. > > Thanks, > Mark.
On Wed, Aug 28, 2019 at 02:01:19PM +0100, Andrew Murray wrote: > On Thu, Aug 22, 2019 at 04:32:23PM +0100, Mark Rutland wrote: > > Hi Andrew, > > > > On Mon, Aug 12, 2019 at 03:36:22PM +0100, Andrew Murray wrote: > > > For many of the ll/sc atomic operations we use the 'I' machine constraint > > > regardless to the instruction used - this may not be optimal. > > > > > > Let's add an additional parameter to the ATOMIC_xx macros that allows the > > > caller to specify an appropriate machine constraint. > > > > > > Let's also improve __CMPXCHG_CASE by replacing the 'K' constraint with a > > > caller provided constraint. Please note that whilst we would like to use > > > the 'K' constraint on 32 bit operations, we choose not to provide any > > > constraint to avoid a GCC bug which results in a build error. > > > > > > Earlier versions of GCC (no later than 8.1.0) appear to incorrectly handle > > > the 'K' constraint for the value 4294967295. > > > > From reading the above, it's difficult to discern what's a fix and > > what's an improvement, and I think we need to be more explicit about > > that. It would also be helpful to have the necessary context up-front. > > > > How about: > > > > | The A64 ISA accepts distinct (but overlapping) ranges of immediates for: > > | > > | * add arithmetic instructions ('I' machine constraint) > > | * sub arithmetic instructions ('J' machine constraint) > > | * 32-bit logical instructions ('K' machine constraint) > > | * 64-bit logical instructions ('L' machine constraint) > > | > > | ... but we currently use the 'I' constraint for many atomic operations > > | using sub or logical instructions, which is not always valid. > > | > > | When CONFIG_ARM64_LSE_ATOMICS is not set, this allows invalid immediates > > | to be passed to instructions, potentially resulting in a build failure. > > | When CONFIG_ARM64_LSE_ATOMICS is selected the out-of-line ll/sc atomics > > | always use a register as they have no visibility of the value passed by > > | the caller. > > | > > | This patch adds a constraint parameter to the ATOMIC_xx and > > | __CMPXCHG_CASE macros so that we can pass appropriate constraints for > > | each case, with uses updated accordingly. > > | > > | Unfortunately prior to GCC 8.1.0 the 'K' constraint erroneously accepted > > | 0xffffffff, so we must instead force the use of a register. > > Looks great - I'll adopt this, thanks for writing it. Cool. :) > > Given we haven't had any bug reports, I'm not sure whether this needs a > > Fixes tag or Cc stable. This has been a latent issue for a long time, > > but upstream code doesn't seem to have tickled it. > > Yes I guess this is more a correctness issue rather than a reproducible bug > in upstream code. I won't add a fixes tag or CC to stable. Sounds good to me. > > [...] > > > > > -ATOMIC_OPS(and, and) > > > -ATOMIC_OPS(andnot, bic) > > > -ATOMIC_OPS(or, orr) > > > -ATOMIC_OPS(xor, eor) > > > +ATOMIC_OPS(and, and, K) > > > +ATOMIC_OPS(andnot, bic, ) > > > +ATOMIC_OPS(or, orr, K) > > > +ATOMIC_OPS(xor, eor, K) > > > > Surely it's not safe to use the K constraint here, either? AFAICT code > > like: > > > > atomic_xor(~0, &atom); > > > > ... would suffer from the same problem as described for cmpxchg. > > Thanks for spotting this. > > Yes, I think the resolution here (and for any 32bit bitmask immediate) is to > drop the constraint. > > Do you agree that we should drop the 'K' constraint for both orr and eor > above? Yes, I think we should drop the 'K' for all the 32-bit logical ops. > > [...] > > > > > -ATOMIC64_OPS(and, and) > > > -ATOMIC64_OPS(andnot, bic) > > > -ATOMIC64_OPS(or, orr) > > > -ATOMIC64_OPS(xor, eor) > > > +ATOMIC64_OPS(and, and, K) > > > +ATOMIC64_OPS(andnot, bic, ) > > > +ATOMIC64_OPS(or, orr, K) > > > +ATOMIC64_OPS(xor, eor, K) > > > > Shouldn't these be 'L'? > > > > IIUC K is a subset of L, so if that's deliberate we should call that out > > explicitly... > > Oooh yes that's wrong. I guess the atomic64_[and,or,xor] are rarely called > in the kernel which perhaps is why the compiler hasn't shouted at me. > > Do you agree that the and, orr and eor should all be 'L' instead of 'K'? Yes, I think all the 64-bit logical ops should all use 'L'. I did a quick local test, and the 'L' constraint seems to correctly reject ~0UL (i.e. it doesn't seem to have a similar bug to the 'K' constraint). > > > +__CMPXCHG_CASE(w, b, , 8, , , , , ) > > > +__CMPXCHG_CASE(w, h, , 16, , , , , ) > > > +__CMPXCHG_CASE(w, , , 32, , , , , ) > > > +__CMPXCHG_CASE( , , , 64, , , , , L) > > > +__CMPXCHG_CASE(w, b, acq_, 8, , a, , "memory", ) > > > +__CMPXCHG_CASE(w, h, acq_, 16, , a, , "memory", ) > > > +__CMPXCHG_CASE(w, , acq_, 32, , a, , "memory", ) > > > +__CMPXCHG_CASE( , , acq_, 64, , a, , "memory", L) > > > +__CMPXCHG_CASE(w, b, rel_, 8, , , l, "memory", ) > > > +__CMPXCHG_CASE(w, h, rel_, 16, , , l, "memory", ) > > > +__CMPXCHG_CASE(w, , rel_, 32, , , l, "memory", ) > > > +__CMPXCHG_CASE( , , rel_, 64, , , l, "memory", L) > > > +__CMPXCHG_CASE(w, b, mb_, 8, dmb ish, , l, "memory", ) > > > +__CMPXCHG_CASE(w, h, mb_, 16, dmb ish, , l, "memory", ) > > > +__CMPXCHG_CASE(w, , mb_, 32, dmb ish, , l, "memory", ) > > > +__CMPXCHG_CASE( , , mb_, 64, dmb ish, , l, "memory", L) > > > > ... but these uses imply that's not the case. > > Yup, so I can leave these as they are. Yup; sounds good. Thanks, Mark.
On Wed, Aug 28, 2019 at 04:25:40PM +0100, Mark Rutland wrote: > On Wed, Aug 28, 2019 at 02:01:19PM +0100, Andrew Murray wrote: > > On Thu, Aug 22, 2019 at 04:32:23PM +0100, Mark Rutland wrote: > > > Hi Andrew, > > > > > > On Mon, Aug 12, 2019 at 03:36:22PM +0100, Andrew Murray wrote: > > > > For many of the ll/sc atomic operations we use the 'I' machine constraint > > > > regardless to the instruction used - this may not be optimal. > > > > > > > > Let's add an additional parameter to the ATOMIC_xx macros that allows the > > > > caller to specify an appropriate machine constraint. > > > > > > > > Let's also improve __CMPXCHG_CASE by replacing the 'K' constraint with a > > > > caller provided constraint. Please note that whilst we would like to use > > > > the 'K' constraint on 32 bit operations, we choose not to provide any > > > > constraint to avoid a GCC bug which results in a build error. > > > > > > > > Earlier versions of GCC (no later than 8.1.0) appear to incorrectly handle > > > > the 'K' constraint for the value 4294967295. > > > > > > From reading the above, it's difficult to discern what's a fix and > > > what's an improvement, and I think we need to be more explicit about > > > that. It would also be helpful to have the necessary context up-front. > > > > > > How about: > > > > > > | The A64 ISA accepts distinct (but overlapping) ranges of immediates for: > > > | > > > | * add arithmetic instructions ('I' machine constraint) > > > | * sub arithmetic instructions ('J' machine constraint) > > > | * 32-bit logical instructions ('K' machine constraint) > > > | * 64-bit logical instructions ('L' machine constraint) > > > | > > > | ... but we currently use the 'I' constraint for many atomic operations > > > | using sub or logical instructions, which is not always valid. > > > | > > > | When CONFIG_ARM64_LSE_ATOMICS is not set, this allows invalid immediates > > > | to be passed to instructions, potentially resulting in a build failure. > > > | When CONFIG_ARM64_LSE_ATOMICS is selected the out-of-line ll/sc atomics > > > | always use a register as they have no visibility of the value passed by > > > | the caller. > > > | > > > | This patch adds a constraint parameter to the ATOMIC_xx and > > > | __CMPXCHG_CASE macros so that we can pass appropriate constraints for > > > | each case, with uses updated accordingly. > > > | > > > | Unfortunately prior to GCC 8.1.0 the 'K' constraint erroneously accepted > > > | 0xffffffff, so we must instead force the use of a register. > > > > Looks great - I'll adopt this, thanks for writing it. > > Cool. :) > > > > Given we haven't had any bug reports, I'm not sure whether this needs a > > > Fixes tag or Cc stable. This has been a latent issue for a long time, > > > but upstream code doesn't seem to have tickled it. > > > > Yes I guess this is more a correctness issue rather than a reproducible bug > > in upstream code. I won't add a fixes tag or CC to stable. > > Sounds good to me. > > > > [...] > > > > > > > -ATOMIC_OPS(and, and) > > > > -ATOMIC_OPS(andnot, bic) > > > > -ATOMIC_OPS(or, orr) > > > > -ATOMIC_OPS(xor, eor) > > > > +ATOMIC_OPS(and, and, K) > > > > +ATOMIC_OPS(andnot, bic, ) > > > > +ATOMIC_OPS(or, orr, K) > > > > +ATOMIC_OPS(xor, eor, K) > > > > > > Surely it's not safe to use the K constraint here, either? AFAICT code > > > like: > > > > > > atomic_xor(~0, &atom); > > > > > > ... would suffer from the same problem as described for cmpxchg. > > > > Thanks for spotting this. > > > > Yes, I think the resolution here (and for any 32bit bitmask immediate) is to > > drop the constraint. > > > > Do you agree that we should drop the 'K' constraint for both orr and eor > > above? > > Yes, I think we should drop the 'K' for all the 32-bit logical ops. Ah yes, I keep getting 'and' and 'add' mixed up. OK I'll drop 'K' from all the above. > > > > [...] > > > > > > > -ATOMIC64_OPS(and, and) > > > > -ATOMIC64_OPS(andnot, bic) > > > > -ATOMIC64_OPS(or, orr) > > > > -ATOMIC64_OPS(xor, eor) > > > > +ATOMIC64_OPS(and, and, K) > > > > +ATOMIC64_OPS(andnot, bic, ) > > > > +ATOMIC64_OPS(or, orr, K) > > > > +ATOMIC64_OPS(xor, eor, K) > > > > > > Shouldn't these be 'L'? > > > > > > IIUC K is a subset of L, so if that's deliberate we should call that out > > > explicitly... > > > > Oooh yes that's wrong. I guess the atomic64_[and,or,xor] are rarely called > > in the kernel which perhaps is why the compiler hasn't shouted at me. > > > > Do you agree that the and, orr and eor should all be 'L' instead of 'K'? > > Yes, I think all the 64-bit logical ops should all use 'L'. With the exception of bic? I don't think there is an appropriate constraint for this (it requires an 8 bit immediate). > > I did a quick local test, and the 'L' constraint seems to correctly > reject ~0UL (i.e. it doesn't seem to have a similar bug to the 'K' > constraint). Yes, if I recall correctly the issue was only with 'K'. > > > > > +__CMPXCHG_CASE(w, b, , 8, , , , , ) > > > > +__CMPXCHG_CASE(w, h, , 16, , , , , ) > > > > +__CMPXCHG_CASE(w, , , 32, , , , , ) > > > > +__CMPXCHG_CASE( , , , 64, , , , , L) > > > > +__CMPXCHG_CASE(w, b, acq_, 8, , a, , "memory", ) > > > > +__CMPXCHG_CASE(w, h, acq_, 16, , a, , "memory", ) > > > > +__CMPXCHG_CASE(w, , acq_, 32, , a, , "memory", ) > > > > +__CMPXCHG_CASE( , , acq_, 64, , a, , "memory", L) > > > > +__CMPXCHG_CASE(w, b, rel_, 8, , , l, "memory", ) > > > > +__CMPXCHG_CASE(w, h, rel_, 16, , , l, "memory", ) > > > > +__CMPXCHG_CASE(w, , rel_, 32, , , l, "memory", ) > > > > +__CMPXCHG_CASE( , , rel_, 64, , , l, "memory", L) > > > > +__CMPXCHG_CASE(w, b, mb_, 8, dmb ish, , l, "memory", ) > > > > +__CMPXCHG_CASE(w, h, mb_, 16, dmb ish, , l, "memory", ) > > > > +__CMPXCHG_CASE(w, , mb_, 32, dmb ish, , l, "memory", ) > > > > +__CMPXCHG_CASE( , , mb_, 64, dmb ish, , l, "memory", L) > > > > > > ... but these uses imply that's not the case. > > > > Yup, so I can leave these as they are. > > Yup; sounds good. Thanks, Andrew Murray > > Thanks, > Mark.
On Wed, Aug 28, 2019 at 04:44:22PM +0100, Andrew Murray wrote: > On Wed, Aug 28, 2019 at 04:25:40PM +0100, Mark Rutland wrote: > > On Wed, Aug 28, 2019 at 02:01:19PM +0100, Andrew Murray wrote: > > > On Thu, Aug 22, 2019 at 04:32:23PM +0100, Mark Rutland wrote: > > > > On Mon, Aug 12, 2019 at 03:36:22PM +0100, Andrew Murray wrote: > > > > [...] > > > > > > > > > -ATOMIC64_OPS(and, and) > > > > > -ATOMIC64_OPS(andnot, bic) > > > > > -ATOMIC64_OPS(or, orr) > > > > > -ATOMIC64_OPS(xor, eor) > > > > > +ATOMIC64_OPS(and, and, K) > > > > > +ATOMIC64_OPS(andnot, bic, ) > > > > > +ATOMIC64_OPS(or, orr, K) > > > > > +ATOMIC64_OPS(xor, eor, K) > > > > > > > > Shouldn't these be 'L'? > > > > > > > > IIUC K is a subset of L, so if that's deliberate we should call that out > > > > explicitly... > > > > > > Oooh yes that's wrong. I guess the atomic64_[and,or,xor] are rarely called > > > in the kernel which perhaps is why the compiler hasn't shouted at me. > > > > > > Do you agree that the and, orr and eor should all be 'L' instead of 'K'? > > > > Yes, I think all the 64-bit logical ops should all use 'L'. > > With the exception of bic? I don't think there is an appropriate constraint > for this (it requires an 8 bit immediate). The ARM ARM doesn't mention BIC (Immediate), and AFAICT that's an (undocumented?) alias for AND (Immediate) with a negated immediate. Where did you find a description with an 8-bit immediate? Regardless, yes, drop the 'L' there -- I can't find any suitable constraint either. Thanks, Mark.
On Wed, Aug 28, 2019 at 05:24:09PM +0100, Mark Rutland wrote: > On Wed, Aug 28, 2019 at 04:44:22PM +0100, Andrew Murray wrote: > > On Wed, Aug 28, 2019 at 04:25:40PM +0100, Mark Rutland wrote: > > > On Wed, Aug 28, 2019 at 02:01:19PM +0100, Andrew Murray wrote: > > > > On Thu, Aug 22, 2019 at 04:32:23PM +0100, Mark Rutland wrote: > > > > > On Mon, Aug 12, 2019 at 03:36:22PM +0100, Andrew Murray wrote: > > > > > [...] > > > > > > > > > > > -ATOMIC64_OPS(and, and) > > > > > > -ATOMIC64_OPS(andnot, bic) > > > > > > -ATOMIC64_OPS(or, orr) > > > > > > -ATOMIC64_OPS(xor, eor) > > > > > > +ATOMIC64_OPS(and, and, K) > > > > > > +ATOMIC64_OPS(andnot, bic, ) > > > > > > +ATOMIC64_OPS(or, orr, K) > > > > > > +ATOMIC64_OPS(xor, eor, K) > > > > > > > > > > Shouldn't these be 'L'? > > > > > > > > > > IIUC K is a subset of L, so if that's deliberate we should call that out > > > > > explicitly... > > > > > > > > Oooh yes that's wrong. I guess the atomic64_[and,or,xor] are rarely called > > > > in the kernel which perhaps is why the compiler hasn't shouted at me. > > > > > > > > Do you agree that the and, orr and eor should all be 'L' instead of 'K'? > > > > > > Yes, I think all the 64-bit logical ops should all use 'L'. > > > > With the exception of bic? I don't think there is an appropriate constraint > > for this (it requires an 8 bit immediate). > > The ARM ARM doesn't mention BIC (Immediate), and AFAICT that's an > (undocumented?) alias for AND (Immediate) with a negated immediate. > > Where did you find a description with an 8-bit immediate? > I think it's a SIMD instruction, see C7.2.13 of ARM DDI 0487D or http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/EOR_log_imm.html > Regardless, yes, drop the 'L' there -- I can't find any suitable > constraint either. OK I'll drop it, thanks for the feedback. Thanks, Andrew Murray > > Thanks, > Mark. > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h index c8c850bc3dfb..4ebff769f3ed 100644 --- a/arch/arm64/include/asm/atomic_ll_sc.h +++ b/arch/arm64/include/asm/atomic_ll_sc.h @@ -26,7 +26,7 @@ * (the optimize attribute silently ignores these options). */ -#define ATOMIC_OP(op, asm_op) \ +#define ATOMIC_OP(op, asm_op, constraint) \ __LL_SC_INLINE void \ __LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v)) \ { \ @@ -40,11 +40,11 @@ __LL_SC_PREFIX(arch_atomic_##op(int i, atomic_t *v)) \ " stxr %w1, %w0, %2\n" \ " cbnz %w1, 1b" \ : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \ - : "Ir" (i)); \ + : #constraint "r" (i)); \ } \ __LL_SC_EXPORT(arch_atomic_##op); -#define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op) \ +#define ATOMIC_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\ __LL_SC_INLINE int \ __LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v)) \ { \ @@ -59,14 +59,14 @@ __LL_SC_PREFIX(arch_atomic_##op##_return##name(int i, atomic_t *v)) \ " cbnz %w1, 1b\n" \ " " #mb \ : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \ - : "Ir" (i) \ + : #constraint "r" (i) \ : cl); \ \ return result; \ } \ __LL_SC_EXPORT(arch_atomic_##op##_return##name); -#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op) \ +#define ATOMIC_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint) \ __LL_SC_INLINE int \ __LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v)) \ { \ @@ -81,7 +81,7 @@ __LL_SC_PREFIX(arch_atomic_fetch_##op##name(int i, atomic_t *v)) \ " cbnz %w2, 1b\n" \ " " #mb \ : "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter) \ - : "Ir" (i) \ + : #constraint "r" (i) \ : cl); \ \ return result; \ @@ -99,8 +99,8 @@ __LL_SC_EXPORT(arch_atomic_fetch_##op##name); ATOMIC_FETCH_OP (_acquire, , a, , "memory", __VA_ARGS__)\ ATOMIC_FETCH_OP (_release, , , l, "memory", __VA_ARGS__) -ATOMIC_OPS(add, add) -ATOMIC_OPS(sub, sub) +ATOMIC_OPS(add, add, I) +ATOMIC_OPS(sub, sub, J) #undef ATOMIC_OPS #define ATOMIC_OPS(...) \ @@ -110,17 +110,17 @@ ATOMIC_OPS(sub, sub) ATOMIC_FETCH_OP (_acquire, , a, , "memory", __VA_ARGS__)\ ATOMIC_FETCH_OP (_release, , , l, "memory", __VA_ARGS__) -ATOMIC_OPS(and, and) -ATOMIC_OPS(andnot, bic) -ATOMIC_OPS(or, orr) -ATOMIC_OPS(xor, eor) +ATOMIC_OPS(and, and, K) +ATOMIC_OPS(andnot, bic, ) +ATOMIC_OPS(or, orr, K) +ATOMIC_OPS(xor, eor, K) #undef ATOMIC_OPS #undef ATOMIC_FETCH_OP #undef ATOMIC_OP_RETURN #undef ATOMIC_OP -#define ATOMIC64_OP(op, asm_op) \ +#define ATOMIC64_OP(op, asm_op, constraint) \ __LL_SC_INLINE void \ __LL_SC_PREFIX(arch_atomic64_##op(s64 i, atomic64_t *v)) \ { \ @@ -134,11 +134,11 @@ __LL_SC_PREFIX(arch_atomic64_##op(s64 i, atomic64_t *v)) \ " stxr %w1, %0, %2\n" \ " cbnz %w1, 1b" \ : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \ - : "Ir" (i)); \ + : #constraint "r" (i)); \ } \ __LL_SC_EXPORT(arch_atomic64_##op); -#define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op) \ +#define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op, constraint)\ __LL_SC_INLINE s64 \ __LL_SC_PREFIX(arch_atomic64_##op##_return##name(s64 i, atomic64_t *v))\ { \ @@ -153,14 +153,14 @@ __LL_SC_PREFIX(arch_atomic64_##op##_return##name(s64 i, atomic64_t *v))\ " cbnz %w1, 1b\n" \ " " #mb \ : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \ - : "Ir" (i) \ + : #constraint "r" (i) \ : cl); \ \ return result; \ } \ __LL_SC_EXPORT(arch_atomic64_##op##_return##name); -#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op) \ +#define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op, constraint)\ __LL_SC_INLINE s64 \ __LL_SC_PREFIX(arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v)) \ { \ @@ -175,7 +175,7 @@ __LL_SC_PREFIX(arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v)) \ " cbnz %w2, 1b\n" \ " " #mb \ : "=&r" (result), "=&r" (val), "=&r" (tmp), "+Q" (v->counter) \ - : "Ir" (i) \ + : #constraint "r" (i) \ : cl); \ \ return result; \ @@ -193,8 +193,8 @@ __LL_SC_EXPORT(arch_atomic64_fetch_##op##name); ATOMIC64_FETCH_OP (_acquire,, a, , "memory", __VA_ARGS__) \ ATOMIC64_FETCH_OP (_release,, , l, "memory", __VA_ARGS__) -ATOMIC64_OPS(add, add) -ATOMIC64_OPS(sub, sub) +ATOMIC64_OPS(add, add, I) +ATOMIC64_OPS(sub, sub, J) #undef ATOMIC64_OPS #define ATOMIC64_OPS(...) \ @@ -204,10 +204,10 @@ ATOMIC64_OPS(sub, sub) ATOMIC64_FETCH_OP (_acquire,, a, , "memory", __VA_ARGS__) \ ATOMIC64_FETCH_OP (_release,, , l, "memory", __VA_ARGS__) -ATOMIC64_OPS(and, and) -ATOMIC64_OPS(andnot, bic) -ATOMIC64_OPS(or, orr) -ATOMIC64_OPS(xor, eor) +ATOMIC64_OPS(and, and, K) +ATOMIC64_OPS(andnot, bic, ) +ATOMIC64_OPS(or, orr, K) +ATOMIC64_OPS(xor, eor, K) #undef ATOMIC64_OPS #undef ATOMIC64_FETCH_OP @@ -237,7 +237,7 @@ __LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v)) } __LL_SC_EXPORT(arch_atomic64_dec_if_positive); -#define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl) \ +#define __CMPXCHG_CASE(w, sfx, name, sz, mb, acq, rel, cl, constraint) \ __LL_SC_INLINE u##sz \ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr, \ unsigned long old, \ @@ -265,29 +265,34 @@ __LL_SC_PREFIX(__cmpxchg_case_##name##sz(volatile void *ptr, \ "2:" \ : [tmp] "=&r" (tmp), [oldval] "=&r" (oldval), \ [v] "+Q" (*(u##sz *)ptr) \ - : [old] "Kr" (old), [new] "r" (new) \ + : [old] #constraint "r" (old), [new] "r" (new) \ : cl); \ \ return oldval; \ } \ __LL_SC_EXPORT(__cmpxchg_case_##name##sz); -__CMPXCHG_CASE(w, b, , 8, , , , ) -__CMPXCHG_CASE(w, h, , 16, , , , ) -__CMPXCHG_CASE(w, , , 32, , , , ) -__CMPXCHG_CASE( , , , 64, , , , ) -__CMPXCHG_CASE(w, b, acq_, 8, , a, , "memory") -__CMPXCHG_CASE(w, h, acq_, 16, , a, , "memory") -__CMPXCHG_CASE(w, , acq_, 32, , a, , "memory") -__CMPXCHG_CASE( , , acq_, 64, , a, , "memory") -__CMPXCHG_CASE(w, b, rel_, 8, , , l, "memory") -__CMPXCHG_CASE(w, h, rel_, 16, , , l, "memory") -__CMPXCHG_CASE(w, , rel_, 32, , , l, "memory") -__CMPXCHG_CASE( , , rel_, 64, , , l, "memory") -__CMPXCHG_CASE(w, b, mb_, 8, dmb ish, , l, "memory") -__CMPXCHG_CASE(w, h, mb_, 16, dmb ish, , l, "memory") -__CMPXCHG_CASE(w, , mb_, 32, dmb ish, , l, "memory") -__CMPXCHG_CASE( , , mb_, 64, dmb ish, , l, "memory") +/* + * Earlier versions of GCC (no later than 8.1.0) appear to incorrectly + * handle the 'K' constraint for the value 4294967295 - thus we use no + * constraint for 32 bit operations. + */ +__CMPXCHG_CASE(w, b, , 8, , , , , ) +__CMPXCHG_CASE(w, h, , 16, , , , , ) +__CMPXCHG_CASE(w, , , 32, , , , , ) +__CMPXCHG_CASE( , , , 64, , , , , L) +__CMPXCHG_CASE(w, b, acq_, 8, , a, , "memory", ) +__CMPXCHG_CASE(w, h, acq_, 16, , a, , "memory", ) +__CMPXCHG_CASE(w, , acq_, 32, , a, , "memory", ) +__CMPXCHG_CASE( , , acq_, 64, , a, , "memory", L) +__CMPXCHG_CASE(w, b, rel_, 8, , , l, "memory", ) +__CMPXCHG_CASE(w, h, rel_, 16, , , l, "memory", ) +__CMPXCHG_CASE(w, , rel_, 32, , , l, "memory", ) +__CMPXCHG_CASE( , , rel_, 64, , , l, "memory", L) +__CMPXCHG_CASE(w, b, mb_, 8, dmb ish, , l, "memory", ) +__CMPXCHG_CASE(w, h, mb_, 16, dmb ish, , l, "memory", ) +__CMPXCHG_CASE(w, , mb_, 32, dmb ish, , l, "memory", ) +__CMPXCHG_CASE( , , mb_, 64, dmb ish, , l, "memory", L) #undef __CMPXCHG_CASE
For many of the ll/sc atomic operations we use the 'I' machine constraint regardless to the instruction used - this may not be optimal. Let's add an additional parameter to the ATOMIC_xx macros that allows the caller to specify an appropriate machine constraint. Let's also improve __CMPXCHG_CASE by replacing the 'K' constraint with a caller provided constraint. Please note that whilst we would like to use the 'K' constraint on 32 bit operations, we choose not to provide any constraint to avoid a GCC bug which results in a build error. Earlier versions of GCC (no later than 8.1.0) appear to incorrectly handle the 'K' constraint for the value 4294967295. Signed-off-by: Andrew Murray <andrew.murray@arm.com> --- arch/arm64/include/asm/atomic_ll_sc.h | 89 ++++++++++++++------------- 1 file changed, 47 insertions(+), 42 deletions(-)