[v2,8/9] x86/HVM: drop stdvga's "lock" struct member

Message ID 716868cb-6a94-4470-a1a5-a4b5994e8195@suse.com (mailing list archive)
State New
Series x86/HVM: drop stdvga caching mode

Commit Message

Jan Beulich Sept. 11, 2024, 12:29 p.m. UTC
No state is left to protect. As the lock was the struct's last field, drop
the struct itself as well. Similarly, the .complete handler then ends up
empty, so drop it too.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.

Comments

Andrew Cooper Sept. 11, 2024, 12:42 p.m. UTC | #1
On 11/09/2024 1:29 pm, Jan Beulich wrote:
> No state is left to protect. As the lock was the struct's last field, drop
> the struct itself as well. Similarly, the .complete handler then ends up
> empty, so drop it too.
>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> with one change.

> ---
> v2: New.
>
> --- a/xen/arch/x86/hvm/stdvga.c
> +++ b/xen/arch/x86/hvm/stdvga.c
> @@ -69,8 +69,6 @@ static int cf_check stdvga_mem_write(
>  static bool cf_check stdvga_mem_accept(
>      const struct hvm_io_handler *handler, const ioreq_t *p)
>  {
> -    struct hvm_hw_stdvga *s = &current->domain->arch.hvm.stdvga;
> -
>      /*
>       * The range check must be done without taking the lock, to avoid
>       * deadlock when hvm_mmio_internal() is called from
> @@ -80,50 +78,31 @@ static bool cf_check stdvga_mem_accept(
>           (ioreq_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
>          return 0;

This wants adjusting too.  At a minimum the comment about deadlock needs
dropping, and a straight delete is fine.

However for performance, we also want to do the dir/ptr/count exclusions
before the address range exclusions, meaning that ...

>  
> -    spin_lock(&s->lock);
> -
>      if ( p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 )
>      {
>          /*
>           * Only accept single direct writes, as that's the only thing we can
>           * accelerate using buffered ioreq handling.
>           */

... it wants merging with this into a single expression.
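
A minimal sketch of what the suggested merged check might look like (an
illustration of the review comment only, not code from the posted series;
the exact form would be for a later revision to settle):

    static bool cf_check stdvga_mem_accept(
        const struct hvm_io_handler *handler, const ioreq_t *p)
    {
        /*
         * Only accept single direct writes to the VGA range, as that's the
         * only thing we can accelerate using buffered ioreq handling.  The
         * cheap dir/ptr/count exclusions come before the range calculations.
         */
        return p->dir == IOREQ_WRITE && !p->data_is_ptr && p->count == 1 &&
               ioreq_mmio_first_byte(p) >= VGA_MEM_BASE &&
               ioreq_mmio_last_byte(p) < VGA_MEM_BASE + VGA_MEM_SIZE;
    }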

~Andrew
Jan Beulich Sept. 11, 2024, 12:58 p.m. UTC | #2
On 11.09.2024 14:42, Andrew Cooper wrote:
> On 11/09/2024 1:29 pm, Jan Beulich wrote:
>> No state is left to protect. As the lock was the struct's last field, drop
>> the struct itself as well. Similarly, the .complete handler then ends up
>> empty, so drop it too.
>>
>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> with one change.

Thanks.

>> --- a/xen/arch/x86/hvm/stdvga.c
>> +++ b/xen/arch/x86/hvm/stdvga.c
>> @@ -69,8 +69,6 @@ static int cf_check stdvga_mem_write(
>>  static bool cf_check stdvga_mem_accept(
>>      const struct hvm_io_handler *handler, const ioreq_t *p)
>>  {
>> -    struct hvm_hw_stdvga *s = &current->domain->arch.hvm.stdvga;
>> -
>>      /*
>>       * The range check must be done without taking the lock, to avoid
>>       * deadlock when hvm_mmio_internal() is called from
>> @@ -80,50 +78,31 @@ static bool cf_check stdvga_mem_accept(
>>           (ioreq_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
>>          return 0;
> 
> This wants adjusting too.  At a minimum the comment about deadlock needs
> dropping, and a straight delete is fine.

Oh, of course. I meant to but then forgot.

> However for performance, we also want to do the dir/ptr/count exclusions
> before the address range exclusions, meaning that ...
> 
>>  
>> -    spin_lock(&s->lock);
>> -
>>      if ( p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 )
>>      {
>>          /*
>>           * Only accept single direct writes, as that's the only thing we can
>>           * accelerate using buffered ioreq handling.
>>           */
> 
> ... it wants merging with this into a single expression.

I'm not convinced, and hence would at least want to keep this separate.
Which exact order checks want doing in would require more detailed
analysis imo. Or do you have blindingly obvious reasons to believe that
the re-ordering you suggest is always going to be an improvement?

Jan
Andrew Cooper Sept. 11, 2024, 1:07 p.m. UTC | #3
On 11/09/2024 1:58 pm, Jan Beulich wrote:
> On 11.09.2024 14:42, Andrew Cooper wrote:
>> On 11/09/2024 1:29 pm, Jan Beulich wrote:
>> However for performance, we also want to do the dir/ptr/count exclusions
>> before the address range exclusions, meaning that ...
>>
>>>  
>>> -    spin_lock(&s->lock);
>>> -
>>>      if ( p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 )
>>>      {
>>>          /*
>>>           * Only accept single direct writes, as that's the only thing we can
>>>           * accelerate using buffered ioreq handling.
>>>           */
>> ... it wants merging with this into a single expression.
> I'm not convinced, and hence would at least want to keep this separate.
> Which exact order checks want doing in would require more detailed
> analysis imo. Or do you have blindingly obvious reasons to believe that
> the re-ordering you suggest is always going to be an improvement?

I'm not overly fussed if this is delayed to a later patch.  My review
stands as long as the comment is gone.

But, right now, accept() is called linearly over all handlers (there's
no range-based registration), so *every* IO comes through this logic path.

The likely path is the excluded path.  ioreq_mmio_{first,last}_byte()
are non-trivial logic because they account for DF, so being able to
exclude based on direction/size before the DF calculations is a definite
improvement.

~Andrew
Jan Beulich Sept. 11, 2024, 2:05 p.m. UTC | #4
On 11.09.2024 15:07, Andrew Cooper wrote:
> On 11/09/2024 1:58 pm, Jan Beulich wrote:
>> On 11.09.2024 14:42, Andrew Cooper wrote:
>>> On 11/09/2024 1:29 pm, Jan Beulich wrote:
>>> However for performance, we also want to do the dir/ptr/count exclusions
>>> before the address range exclusions, meaning that ...
>>>
>>>>  
>>>> -    spin_lock(&s->lock);
>>>> -
>>>>      if ( p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 )
>>>>      {
>>>>          /*
>>>>           * Only accept single direct writes, as that's the only thing we can
>>>>           * accelerate using buffered ioreq handling.
>>>>           */
>>> ... it wants merging with this into a single expression.
>> I'm not convinced, and hence would at least want to keep this separate.
>> Which exact order checks want doing in would require more detailed
>> analysis imo. Or do you have blindingly obvious reasons to believe that
>> the re-ordering you suggest is always going to be an improvement?
> 
> I'm not overly fussed if this is delayed to a later patch.  My review
> stands as long as the comment is gone.
> 
> But, right now, accept() is called linearly over all handlers (there's
> no range-based registration), so *every* IO comes through this logic path.

Not exactly every, only ones not claimed earlier. But yes.

> The likely path is the excluded path.  ioreq_mmio_{first,last}_byte()
> are non-trivial logic because they account for DF, so being able to
> exclude based on direction/size before the DF calculations is a definite
> improvement.

Perhaps. Yet if we were to re-order, calling ioreq_mmio_{first,last}_byte()
would become questionable in the first place. I wouldn't expect the compiler
to spot that it can reduce those expressions as a result of knowing ->count
to be 1 (and hence ->df playing no role at all). Maybe I'm overly
pessimistic ...
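
For illustration, given the shape of the helpers sketched earlier, once
p->count is known to be 1 both results are independent of ->df:

    /*
     * With p->count == 1 the (p->count - 1) * p->size term vanishes and
     * p->count * p->size is just p->size, so for either value of ->df:
     */
    ASSERT(ioreq_mmio_first_byte(p) == p->addr);
    ASSERT(ioreq_mmio_last_byte(p) == p->addr + p->size - 1);

so after the dir/ptr/count exclusions the range check could in principle be
written against p->addr directly, whether by hand or (if it manages the
reduction) by the compiler.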

Jan

Patch

--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -69,8 +69,6 @@  static int cf_check stdvga_mem_write(
 static bool cf_check stdvga_mem_accept(
     const struct hvm_io_handler *handler, const ioreq_t *p)
 {
-    struct hvm_hw_stdvga *s = &current->domain->arch.hvm.stdvga;
-
     /*
      * The range check must be done without taking the lock, to avoid
      * deadlock when hvm_mmio_internal() is called from
@@ -80,50 +78,31 @@  static bool cf_check stdvga_mem_accept(
          (ioreq_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
         return 0;
 
-    spin_lock(&s->lock);
-
     if ( p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 )
     {
         /*
          * Only accept single direct writes, as that's the only thing we can
          * accelerate using buffered ioreq handling.
          */
-        goto reject;
+        return false;
     }
 
-    /* s->lock intentionally held */
-    return 1;
-
- reject:
-    spin_unlock(&s->lock);
-    return 0;
-}
-
-static void cf_check stdvga_mem_complete(const struct hvm_io_handler *handler)
-{
-    struct hvm_hw_stdvga *s = &current->domain->arch.hvm.stdvga;
-
-    spin_unlock(&s->lock);
+    return true;
 }
 
 static const struct hvm_io_ops stdvga_mem_ops = {
     .accept = stdvga_mem_accept,
     .read = stdvga_mem_read,
     .write = stdvga_mem_write,
-    .complete = stdvga_mem_complete
 };
 
 void stdvga_init(struct domain *d)
 {
-    struct hvm_hw_stdvga *s = &d->arch.hvm.stdvga;
     struct hvm_io_handler *handler;
 
     if ( !has_vvga(d) )
         return;
 
-    memset(s, 0, sizeof(*s));
-    spin_lock_init(&s->lock);
-    
     /* VGA memory */
     handler = hvm_next_io_handler(d);
     if ( handler )
--- a/xen/arch/x86/include/asm/hvm/domain.h
+++ b/xen/arch/x86/include/asm/hvm/domain.h
@@ -72,7 +72,6 @@  struct hvm_domain {
     struct hvm_hw_vpic     vpic[2]; /* 0=master; 1=slave */
     struct hvm_vioapic    **vioapic;
     unsigned int           nr_vioapics;
-    struct hvm_hw_stdvga   stdvga;
 
     /*
      * hvm_hw_pmtimer is a publicly-visible name. We will defer renaming
--- a/xen/arch/x86/include/asm/hvm/io.h
+++ b/xen/arch/x86/include/asm/hvm/io.h
@@ -110,10 +110,6 @@  struct vpci_arch_msix_entry {
     int pirq;
 };
 
-struct hvm_hw_stdvga {
-    spinlock_t lock;
-};
-
 void stdvga_init(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);