Message ID | 20240625190719.788643-7-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New |
Series | xen: Rework for_each_set_bit() |
On 25.06.2024 21:07, Andrew Cooper wrote:
> In all 3 examples, we're iterating over a scalar. No caller can pass the
> COMPRESSED flag in, so the upper bound of 63, as opposed to 64, doesn't
> matter.

Not sure, maybe more a language question (for my education): Is "can"
really appropriate here? In recalculate_xstate() we calculate the
value ourselves, but in the two other cases the value is incoming to
the functions. Architecturally those values should not have bit 63 set,
but that's weaker than "can" according to my understanding. I'd be
fine with "may", for example.

> This alone produces:
>
>   add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-161 (-161)
>   Function                     old     new   delta
>   compress_xsave_states         66      58      -8
>   xstate_uncompressed_size     119      71     -48
>   xstate_compressed_size       124      76     -48
>   recalculate_xstate           347     290     -57
>
> where xstate_{un,}compressed_size() have practically halved in size despite
> being small before.
>
> The change in compress_xsave_states() is unexpected. The function is almost
> entirely dead code, and within what remains there's a smaller stack frame. I
> suspect it's leftovers that the optimiser couldn't fully discard.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Other than the above:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
On 26/06/2024 11:24 am, Jan Beulich wrote:
> On 25.06.2024 21:07, Andrew Cooper wrote:
>> In all 3 examples, we're iterating over a scalar. No caller can pass the
>> COMPRESSED flag in, so the upper bound of 63, as opposed to 64, doesn't
>> matter.
> Not sure, maybe more a language question (for my education): Is "can"
> really appropriate here?

It's not the greatest choice, but it's not objectively wrong either.

> In recalculate_xstate() we calculate the
> value ourselves, but in the two other cases the value is incoming to
> the functions. Architecturally those values should not have bit 63 set,
> but that's weaker than "can" according to my understanding. I'd be
> fine with "may", for example.

There's an ASSERT() in xstate_uncompressed_size() which covers the
property, but most of the justification comes from the fact that the
callers pass in values which are really loaded into hardware registers.

But it is certainly more accurate to say that callers don't pass the
flag in.

There isn't an ASSERT() in xstate_compressed_size(), but I suppose I
could fold this in:

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 88dbfbeafacd..f72f14626b7d 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -623,6 +623,8 @@ unsigned int xstate_compressed_size(uint64_t xstates)
 {
     unsigned int size = XSTATE_AREA_MIN_SIZE;

+    ASSERT((xstates & ~(X86_XCR0_STATES | X86_XSS_STATES)) == 0);
+
     if ( xstates == 0 )
         return 0;

which brings it more in line with xstate_uncompressed_size(), and has a
side effect of confirming the absence of the COMPRESSED bit.

Thoughts?

>> This alone produces:
>>
>>   add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-161 (-161)
>>   Function                     old     new   delta
>>   compress_xsave_states         66      58      -8
>>   xstate_uncompressed_size     119      71     -48
>>   xstate_compressed_size       124      76     -48
>>   recalculate_xstate           347     290     -57
>>
>> where xstate_{un,}compressed_size() have practically halved in size despite
>> being small before.
>>
>> The change in compress_xsave_states() is unexpected. The function is almost
>> entirely dead code, and within what remains there's a smaller stack frame. I
>> suspect it's leftovers that the optimiser couldn't fully discard.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Other than the above:
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

~Andrew
On 26.06.2024 20:09, Andrew Cooper wrote:
> On 26/06/2024 11:24 am, Jan Beulich wrote:
>> On 25.06.2024 21:07, Andrew Cooper wrote:
>>> In all 3 examples, we're iterating over a scalar. No caller can pass the
>>> COMPRESSED flag in, so the upper bound of 63, as opposed to 64, doesn't
>>> matter.
>> Not sure, maybe more a language question (for my education): Is "can"
>> really appropriate here?
>
> It's not the greatest choice, but it's not objectively wrong either.
>
>> In recalculate_xstate() we calculate the
>> value ourselves, but in the two other cases the value is incoming to
>> the functions. Architecturally those values should not have bit 63 set,
>> but that's weaker than "can" according to my understanding. I'd be
>> fine with "may", for example.
>
> There's an ASSERT() in xstate_uncompressed_size() which covers the
> property, but most of the justification comes from the fact that the
> callers pass in values which are really loaded into hardware registers.
>
> But it is certainly more accurate to say that callers don't pass the
> flag in.
>
> There isn't an ASSERT() in xstate_compressed_size(), but I suppose I
> could fold this in:
>
> diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
> index 88dbfbeafacd..f72f14626b7d 100644
> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -623,6 +623,8 @@ unsigned int xstate_compressed_size(uint64_t xstates)
>  {
>      unsigned int size = XSTATE_AREA_MIN_SIZE;
>
> +    ASSERT((xstates & ~(X86_XCR0_STATES | X86_XSS_STATES)) == 0);
> +
>      if ( xstates == 0 )
>          return 0;
>
>
> which brings it more in line with xstate_uncompressed_size(), and has a
> side effect of confirming the absence of the COMPRESSED bit.
>
> Thoughts?

Definitely fine with me.

Jan
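As an illustration of why the proposed ASSERT() also confirms the absence of the COMPRESSED bit, here is a minimal standalone sketch. The DEMO_* masks are placeholder values chosen for the example and are not Xen's X86_XCR0_STATES or X86_XSS_STATES definitions; the only property taken from the thread is that the COMPRESSED flag is bit 63 and is not part of either state mask. demo_states_are_valid() is a hypothetical helper standing in for the ASSERT() condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder masks for illustration only; NOT Xen's real definitions. */
#define DEMO_XCR0_STATES  UINT64_C(0x7)      /* pretend XCR0-managed states */
#define DEMO_XSS_STATES   UINT64_C(0x100)    /* pretend XSS-managed states  */
#define DEMO_COMPRESSED   (UINT64_C(1) << 63) /* bit 63, per the discussion  */

/*
 * Hypothetical stand-in for the ASSERT() condition: true when every set bit
 * in xstates belongs to one of the two known state masks.
 */
static bool demo_states_are_valid(uint64_t xstates)
{
    return (xstates & ~(DEMO_XCR0_STATES | DEMO_XSS_STATES)) == 0;
}
```

Because bit 63 lies outside both masks, any value carrying it fails the check, so a single mask test covers both "only known states" and "no COMPRESSED flag".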
diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index cd53bac777dc..fa55f6073089 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -193,7 +193,7 @@ static void sanitise_featureset(uint32_t *fs)
 static void recalculate_xstate(struct cpu_policy *p)
 {
     uint64_t xstates = XSTATE_FP_SSE;
-    unsigned int i, ecx_mask = 0, Da1 = p->xstate.Da1;
+    unsigned int ecx_mask = 0, Da1 = p->xstate.Da1;

     /*
      * The Da1 leaf is the only piece of information preserved in the common
@@ -237,7 +237,7 @@ static void recalculate_xstate(struct cpu_policy *p)
     /* Subleafs 2+ */
     xstates &= ~XSTATE_FP_SSE;
     BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63);
-    bitmap_for_each ( i, &xstates, 63 )
+    for_each_set_bit ( i, xstates )
     {
         /*
          * Pass through size (eax) and offset (ebx) directly. Visbility of
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index da9053c0a262..88dbfbeafacd 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -589,7 +589,7 @@ static bool valid_xcr0(uint64_t xcr0)

 unsigned int xstate_uncompressed_size(uint64_t xcr0)
 {
-    unsigned int size = XSTATE_AREA_MIN_SIZE, i;
+    unsigned int size = XSTATE_AREA_MIN_SIZE;

     /* Non-XCR0 states don't exist in an uncompressed image. */
     ASSERT((xcr0 & ~X86_XCR0_STATES) == 0);
@@ -606,7 +606,7 @@ unsigned int xstate_uncompressed_size(uint64_t xcr0)
      * with respect their index.
      */
     xcr0 &= ~(X86_XCR0_SSE | X86_XCR0_X87);
-    bitmap_for_each ( i, &xcr0, 63 )
+    for_each_set_bit ( i, xcr0 )
     {
         const struct xstate_component *c = &raw_cpu_policy.xstate.comp[i];
         unsigned int s = c->offset + c->size;
@@ -621,7 +621,7 @@ unsigned int xstate_uncompressed_size(uint64_t xcr0)

 unsigned int xstate_compressed_size(uint64_t xstates)
 {
-    unsigned int i, size = XSTATE_AREA_MIN_SIZE;
+    unsigned int size = XSTATE_AREA_MIN_SIZE;

     if ( xstates == 0 )
         return 0;
@@ -634,7 +634,7 @@ unsigned int xstate_compressed_size(uint64_t xstates)
      * componenets require aligning to 64 first.
      */
     xstates &= ~(X86_XCR0_SSE | X86_XCR0_X87);
-    bitmap_for_each ( i, &xstates, 63 )
+    for_each_set_bit ( i, xstates )
     {
         const struct xstate_component *c = &raw_cpu_policy.xstate.comp[i];
In all 3 examples, we're iterating over a scalar. No caller can pass the
COMPRESSED flag in, so the upper bound of 63, as opposed to 64, doesn't
matter.

This alone produces:

  add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-161 (-161)
  Function                     old     new   delta
  compress_xsave_states         66      58      -8
  xstate_uncompressed_size     119      71     -48
  xstate_compressed_size       124      76     -48
  recalculate_xstate           347     290     -57

where xstate_{un,}compressed_size() have practically halved in size despite
being small before.

The change in compress_xsave_states() is unexpected. The function is almost
entirely dead code, and within what remains there's a smaller stack frame. I
suspect it's leftovers that the optimiser couldn't fully discard.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
 xen/arch/x86/cpu-policy.c | 4 ++--
 xen/arch/x86/xstate.c     | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)
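To make the 63-versus-64 remark concrete, the following is a rough sketch and not Xen's for_each_set_bit() or bitmap_for_each() implementation. Both function names are hypothetical. The first walks the set bits of a 64-bit scalar held by value, using the GCC/Clang builtin __builtin_ctzll(); the second is a deliberately naive stand-in for a pointer-based walk with an explicit bit-count bound. They agree whenever bit 63 is clear, which is the property the commit message relies on.

```c
#include <stdint.h>

/*
 * Hypothetical helper: sum the indices of all set bits in a scalar.
 * Each step finds the lowest set bit, uses its index, then clears it.
 */
static unsigned int sum_set_bits_scalar(uint64_t val)
{
    unsigned int sum = 0;

    while ( val )
    {
        unsigned int i = __builtin_ctzll(val); /* index of lowest set bit */

        sum += i;
        val &= val - 1;                        /* clear that bit */
    }

    return sum;
}

/*
 * Hypothetical stand-in for a bounded, pointer-based walk: only bit
 * positions below nbits (at most 64) are examined.
 */
static unsigned int sum_set_bits_bounded(const uint64_t *map,
                                         unsigned int nbits)
{
    unsigned int sum = 0;

    for ( unsigned int i = 0; i < nbits && i < 64; i++ )
        if ( (*map >> i) & 1 )
            sum += i;

    return sum;
}
```

With a bound of 63 the second form never sees bit 63, so the two only diverge for inputs that have that bit set. Holding the value as a scalar also means the compiler does not need to keep it addressable in memory, which may be part of why the generated code shrinks, though the sketch makes no claim about Xen's actual code generation.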