diff mbox series

[2/4] x86/hvm: Use for_each_set_bit() in hvm_emulate_writeback()

Message ID 20240827135746.1908070-3-andrew.cooper3@citrix.com (mailing list archive)
State New
Headers show
Series xen/bitops: More for_each_bit() conversions | expand

Commit Message

Andrew Cooper Aug. 27, 2024, 1:57 p.m. UTC
... which is more consise than the opencoded form.

Also, for production VMs, ~100% of emulations are simple MOVs, so it is likely
that there are no segments to write back.

Furthermore, now that find_{first,next}_bit() are no longer in use, the
seg_reg_{accessed,dirty} fields aren't forced to be unsigned long, although
they do need to remain unsigned int because of __set_bit() elsewhere.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

Pulling current out into curr is good for code generation.  When using current
in the loop, GCC can't retain the calculation across the call to
hvm_set_segment_register() and is forced to re-read from the cpu_info block.

However, if curr is initialised, it's calculated even in the likely path...
---
 xen/arch/x86/hvm/emulate.c             | 20 ++++++++++----------
 xen/arch/x86/include/asm/hvm/emulate.h |  4 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

Comments

Jan Beulich Aug. 27, 2024, 4:07 p.m. UTC | #1
On 27.08.2024 15:57, Andrew Cooper wrote:
> ... which is more consise than the opencoded form.
> 
> Also, for production VMs, ~100% of emulations are simple MOVs, so it is likely
> that there are no segments to write back.
> 
> Furthermore, now that find_{first,next}_bit() are no longer in use, the
> seg_reg_{accessed,dirty} fields aren't forced to be unsigned long, although
> they do need to remain unsigned int because of __set_bit() elsewhere.
> 
> No practical change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> Pulling current out into curr is good for code generation.  When using current
> in the loop, GCC can't retain the calculation across the call to
> hvm_set_segment_register() and is forced to re-read from the cpu_info block.
> 
> However, if curr is initialised, it's calculated even in the likely path...

That's a little odd, as I don't think I can spot what would force the compiler
into doing so. As a wild guess, ...

> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -2908,18 +2908,18 @@ void hvm_emulate_init_per_insn(
>  void hvm_emulate_writeback(
>      struct hvm_emulate_ctxt *hvmemul_ctxt)
>  {
> -    enum x86_segment seg;
> +    struct vcpu *curr;
> +    unsigned int dirty = hvmemul_ctxt->seg_reg_dirty;

... is the order of these two possibly relevant? Yet of course it's not the
end of the world whichever way it's done.

> -    seg = find_first_bit(&hvmemul_ctxt->seg_reg_dirty,
> -                         ARRAY_SIZE(hvmemul_ctxt->seg_reg));
> +    if ( likely(!dirty) )
> +        return;
>  
> -    while ( seg < ARRAY_SIZE(hvmemul_ctxt->seg_reg) )
> -    {
> -        hvm_set_segment_register(current, seg, &hvmemul_ctxt->seg_reg[seg]);
> -        seg = find_next_bit(&hvmemul_ctxt->seg_reg_dirty,
> -                            ARRAY_SIZE(hvmemul_ctxt->seg_reg),
> -                            seg+1);
> -    }
> +    curr = current;
> +
> +    for_each_set_bit ( seg, dirty )
> +        hvm_set_segment_register(curr, seg, &hvmemul_ctxt->seg_reg[seg]);
> +
> +    hvmemul_ctxt->seg_reg_dirty = 0;

Why is this suddenly appearing here? You don't mention it in the description,
so it's not clear whether you found a (however minor) issue, or whether
that's purely cosmetic (yet then it's still an extra store we could do
without).

Jan
Andrew Cooper Aug. 28, 2024, 2:44 p.m. UTC | #2
On 27/08/2024 5:07 pm, Jan Beulich wrote:
> On 27.08.2024 15:57, Andrew Cooper wrote:
>> ... which is more consise than the opencoded form.
>>
>> Also, for production VMs, ~100% of emulations are simple MOVs, so it is likely
>> that there are no segments to write back.
>>
>> Furthermore, now that find_{first,next}_bit() are no longer in use, the
>> seg_reg_{accessed,dirty} fields aren't forced to be unsigned long, although
>> they do need to remain unsigned int because of __set_bit() elsewhere.
>>
>> No practical change.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>
>> Pulling current out into curr is good for code generation.  When using current
>> in the loop, GCC can't retain the calculation across the call to
>> hvm_set_segment_register() and is forced to re-read from the cpu_info block.
>>
>> However, if curr is initialised, it's calculated even in the likely path...
> That's a little odd, as I don't think I can spot what would force the compiler
> into doing so. As a wild guess, ...
>
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -2908,18 +2908,18 @@ void hvm_emulate_init_per_insn(
>>  void hvm_emulate_writeback(
>>      struct hvm_emulate_ctxt *hvmemul_ctxt)
>>  {
>> -    enum x86_segment seg;
>> +    struct vcpu *curr;
>> +    unsigned int dirty = hvmemul_ctxt->seg_reg_dirty;
> ... is the order of these two possibly relevant? Yet of course it's not the
> end of the world whichever way it's done.

My general conclusion is that GCC doesn't try very hard to optimise
likely() early exits.

I suspect there's some reasonably low hanging fruit to be found there.

>
>> -    seg = find_first_bit(&hvmemul_ctxt->seg_reg_dirty,
>> -                         ARRAY_SIZE(hvmemul_ctxt->seg_reg));
>> +    if ( likely(!dirty) )
>> +        return;
>>  
>> -    while ( seg < ARRAY_SIZE(hvmemul_ctxt->seg_reg) )
>> -    {
>> -        hvm_set_segment_register(current, seg, &hvmemul_ctxt->seg_reg[seg]);
>> -        seg = find_next_bit(&hvmemul_ctxt->seg_reg_dirty,
>> -                            ARRAY_SIZE(hvmemul_ctxt->seg_reg),
>> -                            seg+1);
>> -    }
>> +    curr = current;
>> +
>> +    for_each_set_bit ( seg, dirty )
>> +        hvm_set_segment_register(curr, seg, &hvmemul_ctxt->seg_reg[seg]);
>> +
>> +    hvmemul_ctxt->seg_reg_dirty = 0;
> Why is this suddenly appearing here? You don't mention it in the description,
> so it's not clear whether you found a (however minor) issue, or whether
> that's purely cosmetic (yet then it's still an extra store we could do
> without).

Oh, yes.  Nothing anywhere in Xen ever clears these segment dirty bits.

I suspect the worst that will go wrong is that we'll waste time
re-{VMWRITE,memcpy}-ing the segment registers into the VMCS/VMCB, but
the logic in Xen is definitely not right.

~Andrew
Jan Beulich Aug. 28, 2024, 2:56 p.m. UTC | #3
On 28.08.2024 16:44, Andrew Cooper wrote:
> On 27/08/2024 5:07 pm, Jan Beulich wrote:
>> On 27.08.2024 15:57, Andrew Cooper wrote:
>>> +    for_each_set_bit ( seg, dirty )
>>> +        hvm_set_segment_register(curr, seg, &hvmemul_ctxt->seg_reg[seg]);
>>> +
>>> +    hvmemul_ctxt->seg_reg_dirty = 0;
>> Why is this suddenly appearing here? You don't mention it in the description,
>> so it's not clear whether you found a (however minor) issue, or whether
>> that's purely cosmetic (yet then it's still an extra store we could do
>> without).
> 
> Oh, yes.  Nothing anywhere in Xen ever clears these segment dirty bits.

hvm_emulate_init_once()?

> I suspect the worst that will go wrong is that we'll waste time
> re-{VMWRITE,memcpy}-ing the segment registers into the VMCS/VMCB, but
> the logic in Xen is definitely not right.

I'm on the edge of asking to do such clearing before emulation, not after
processing the dirty bits. That would then be hvm_emulate_init_per_insn(),
well centralized.

Jan
Andrew Cooper Aug. 28, 2024, 6:56 p.m. UTC | #4
On 28/08/2024 3:56 pm, Jan Beulich wrote:
> On 28.08.2024 16:44, Andrew Cooper wrote:
>> On 27/08/2024 5:07 pm, Jan Beulich wrote:
>>> On 27.08.2024 15:57, Andrew Cooper wrote:
>>>> +    for_each_set_bit ( seg, dirty )
>>>> +        hvm_set_segment_register(curr, seg, &hvmemul_ctxt->seg_reg[seg]);
>>>> +
>>>> +    hvmemul_ctxt->seg_reg_dirty = 0;
>>> Why is this suddenly appearing here? You don't mention it in the description,
>>> so it's not clear whether you found a (however minor) issue, or whether
>>> that's purely cosmetic (yet then it's still an extra store we could do
>>> without).
>> Oh, yes.  Nothing anywhere in Xen ever clears these segment dirty bits.
> hvm_emulate_init_once()?

I meant after emulation.  The value is initialised to 0 at the start of day.

>
>> I suspect the worst that will go wrong is that we'll waste time
>> re-{VMWRITE,memcpy}-ing the segment registers into the VMCS/VMCB, but
>> the logic in Xen is definitely not right.
> I'm on the edge of asking to do such clearing before emulation, not after
> processing the dirty bits. That would then be hvm_emulate_init_per_insn(),
> well centralized.

Specifically, hvmemul_ctxt should not believe itself to be dirty after a
call to hvm_emulate_writeback(), because that's the logic to make the
context no-longer-dirty.

That said, the more I look at this, the less convinced I am by it.  For
a function named writeback(), it's doing a very narrow thing that is not
the usual meaning of the term when it comes to pipelines or insn
emulation...

~Andrew
Jan Beulich Aug. 29, 2024, 6:13 a.m. UTC | #5
On 28.08.2024 20:56, Andrew Cooper wrote:
> On 28/08/2024 3:56 pm, Jan Beulich wrote:
>> On 28.08.2024 16:44, Andrew Cooper wrote:
>>> On 27/08/2024 5:07 pm, Jan Beulich wrote:
>>>> On 27.08.2024 15:57, Andrew Cooper wrote:
>>>>> +    for_each_set_bit ( seg, dirty )
>>>>> +        hvm_set_segment_register(curr, seg, &hvmemul_ctxt->seg_reg[seg]);
>>>>> +
>>>>> +    hvmemul_ctxt->seg_reg_dirty = 0;
>>>> Why is this suddenly appearing here? You don't mention it in the description,
>>>> so it's not clear whether you found a (however minor) issue, or whether
>>>> that's purely cosmetic (yet then it's still an extra store we could do
>>>> without).
>>> Oh, yes.  Nothing anywhere in Xen ever clears these segment dirty bits.
>> hvm_emulate_init_once()?
> 
> I meant after emulation.  The value is initialised to 0 at the start of day.
> 
>>
>>> I suspect the worst that will go wrong is that we'll waste time
>>> re-{VMWRITE,memcpy}-ing the segment registers into the VMCS/VMCB, but
>>> the logic in Xen is definitely not right.
>> I'm on the edge of asking to do such clearing before emulation, not after
>> processing the dirty bits. That would then be hvm_emulate_init_per_insn(),
>> well centralized.
> 
> Specifically, hvmemul_ctxt should not believe itself to be dirty after a
> call to hvm_emulate_writeback(), because that's the logic to make the
> context no-longer-dirty.

That's one aspect, yes. Debuggability is another. For that retaining state
until it strictly needs clearing out may be helpful. Plus ...

> That said, the more I look at this, the less convinced I am by it.  For
> a function named writeback(), it's doing a very narrow thing that is not
> the usual meaning of the term when it comes to pipelines or insn
> emulation...

... as you say here.

Anyway - I'll leave where to put the clearing to you, just as long as it's
at least mentioned in the description.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index feb4792cc567..732bdbab25b0 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2908,18 +2908,18 @@  void hvm_emulate_init_per_insn(
 void hvm_emulate_writeback(
     struct hvm_emulate_ctxt *hvmemul_ctxt)
 {
-    enum x86_segment seg;
+    struct vcpu *curr;
+    unsigned int dirty = hvmemul_ctxt->seg_reg_dirty;
 
-    seg = find_first_bit(&hvmemul_ctxt->seg_reg_dirty,
-                         ARRAY_SIZE(hvmemul_ctxt->seg_reg));
+    if ( likely(!dirty) )
+        return;
 
-    while ( seg < ARRAY_SIZE(hvmemul_ctxt->seg_reg) )
-    {
-        hvm_set_segment_register(current, seg, &hvmemul_ctxt->seg_reg[seg]);
-        seg = find_next_bit(&hvmemul_ctxt->seg_reg_dirty,
-                            ARRAY_SIZE(hvmemul_ctxt->seg_reg),
-                            seg+1);
-    }
+    curr = current;
+
+    for_each_set_bit ( seg, dirty )
+        hvm_set_segment_register(curr, seg, &hvmemul_ctxt->seg_reg[seg]);
+
+    hvmemul_ctxt->seg_reg_dirty = 0;
 }
 
 /*
diff --git a/xen/arch/x86/include/asm/hvm/emulate.h b/xen/arch/x86/include/asm/hvm/emulate.h
index 29d679442e10..972cdf1fa0cf 100644
--- a/xen/arch/x86/include/asm/hvm/emulate.h
+++ b/xen/arch/x86/include/asm/hvm/emulate.h
@@ -36,8 +36,8 @@  struct hvm_emulate_ctxt {
     unsigned int insn_buf_bytes;
 
     struct segment_register seg_reg[10];
-    unsigned long seg_reg_accessed;
-    unsigned long seg_reg_dirty;
+    unsigned int seg_reg_accessed;
+    unsigned int seg_reg_dirty;
 
     /*
      * MFNs behind temporary mappings in the write callback.  The length is