Message ID | 716868cb-6a94-4470-a1a5-a4b5994e8195@suse.com (mailing list archive)
---|---
State | New
Series | x86/HVM: drop stdvga caching mode
On 11/09/2024 1:29 pm, Jan Beulich wrote:
> No state is left to protect. It being the last field, drop the struct
> itself as well. Similarly for then ending up empty, drop the .complete
> handler.
>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> with one change.

> ---
> v2: New.
>
> --- a/xen/arch/x86/hvm/stdvga.c
> +++ b/xen/arch/x86/hvm/stdvga.c
> @@ -69,8 +69,6 @@ static int cf_check stdvga_mem_write(
>  static bool cf_check stdvga_mem_accept(
>      const struct hvm_io_handler *handler, const ioreq_t *p)
>  {
> -    struct hvm_hw_stdvga *s = &current->domain->arch.hvm.stdvga;
> -
>      /*
>       * The range check must be done without taking the lock, to avoid
>       * deadlock when hvm_mmio_internal() is called from
> @@ -80,50 +78,31 @@ static bool cf_check stdvga_mem_accept(
>           (ioreq_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
>          return 0;

This wants adjusting too. At a minimum the comment about deadlock needs
dropping, and a straight delete is fine.

However for performance, we also want to do the dir/ptr/count exclusions
before the address range exclusions, meaning that ...

>
> -    spin_lock(&s->lock);
> -
>      if ( p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 )
>      {
>          /*
>           * Only accept single direct writes, as that's the only thing we can
>           * accelerate using buffered ioreq handling.
>           */

... it wants merging with this into a single expression.

~Andrew
On 11.09.2024 14:42, Andrew Cooper wrote:
> On 11/09/2024 1:29 pm, Jan Beulich wrote:
>> No state is left to protect. It being the last field, drop the struct
>> itself as well. Similarly for then ending up empty, drop the .complete
>> handler.
>>
>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> with one change.

Thanks.

>> --- a/xen/arch/x86/hvm/stdvga.c
>> +++ b/xen/arch/x86/hvm/stdvga.c
>> @@ -69,8 +69,6 @@ static int cf_check stdvga_mem_write(
>>  static bool cf_check stdvga_mem_accept(
>>      const struct hvm_io_handler *handler, const ioreq_t *p)
>>  {
>> -    struct hvm_hw_stdvga *s = &current->domain->arch.hvm.stdvga;
>> -
>>      /*
>>       * The range check must be done without taking the lock, to avoid
>>       * deadlock when hvm_mmio_internal() is called from
>> @@ -80,50 +78,31 @@ static bool cf_check stdvga_mem_accept(
>>           (ioreq_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
>>          return 0;
>
> This wants adjusting too. At a minimum the comment about deadlock needs
> dropping, and a straight delete is fine.

Oh, of course. I meant to but then forgot.

> However for performance, we also want to do the dir/ptr/count exclusions
> before the address range exclusions, meaning that ...
>
>>
>> -    spin_lock(&s->lock);
>> -
>>      if ( p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 )
>>      {
>>          /*
>>           * Only accept single direct writes, as that's the only thing we can
>>           * accelerate using buffered ioreq handling.
>>           */
>
> ... it wants merging with this into a single expression.

I'm not convinced, and hence would at least want to keep this separate.
Which exact order checks want doing in would require more detailed
analysis imo. Or do you have blindingly obvious reasons to believe that
the re-ordering you suggest is always going to be an improvement?

Jan
On 11/09/2024 1:58 pm, Jan Beulich wrote:
> On 11.09.2024 14:42, Andrew Cooper wrote:
>> On 11/09/2024 1:29 pm, Jan Beulich wrote:
>> However for performance, we also want to do the dir/ptr/count exclusions
>> before the address range exclusions, meaning that ...
>>
>>>
>>> -    spin_lock(&s->lock);
>>> -
>>>      if ( p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 )
>>>      {
>>>          /*
>>>           * Only accept single direct writes, as that's the only thing we can
>>>           * accelerate using buffered ioreq handling.
>>>           */
>> ... it wants merging with this into a single expression.
> I'm not convinced, and hence would at least want to keep this separate.
> Which exact order checks want doing in would require more detailed
> analysis imo. Or do you have blindingly obvious reasons to believe that
> the re-ordering you suggest is always going to be an improvement?

I'm not overly fussed if this is delayed to a later patch. My review
stands as long as the comment is gone.

But, right now, accept() is called linearly over all handlers (there's
no range based registration) so *every* IO comes through this logic path.

The likely path is the excluded path. ioreq_mmio_{first,last}_byte()
are non-trivial logic because they account for DF, so being able to
exclude based on direction/size before the DF calculations is a definite
improvement.

~Andrew
On 11.09.2024 15:07, Andrew Cooper wrote:
> On 11/09/2024 1:58 pm, Jan Beulich wrote:
>> On 11.09.2024 14:42, Andrew Cooper wrote:
>>> On 11/09/2024 1:29 pm, Jan Beulich wrote:
>>> However for performance, we also want to do the dir/ptr/count exclusions
>>> before the address range exclusions, meaning that ...
>>>
>>>>
>>>> -    spin_lock(&s->lock);
>>>> -
>>>>      if ( p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 )
>>>>      {
>>>>          /*
>>>>           * Only accept single direct writes, as that's the only thing we can
>>>>           * accelerate using buffered ioreq handling.
>>>>           */
>>> ... it wants merging with this into a single expression.
>> I'm not convinced, and hence would at least want to keep this separate.
>> Which exact order checks want doing in would require more detailed
>> analysis imo. Or do you have blindingly obvious reasons to believe that
>> the re-ordering you suggest is always going to be an improvement?
>
> I'm not overly fussed if this is delayed to a later patch. My review
> stands as long as the comment is gone.
>
> But, right now, accept() is called linearly over all handlers (there's
> no range based registration) so *every* IO comes through this logic path.

Not exactly every, only ones not claimed earlier. But yes.

> The likely path is the excluded path. ioreq_mmio_{first,last}_byte()
> are non-trivial logic because they account for DF, so being able to
> exclude based on direction/size before the DF calculations is a definite
> improvement.

Perhaps. Yet if we were to re-order, calling ioreq_mmio_{first,last}_byte()
becomes questionable in the first place. I wouldn't expect the compiler to
spot that it can reduce those expressions as a result of knowing ->count
being 1 (and hence ->df playing no role at all). Maybe I'm overly
pessimistic ...

Jan
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -69,8 +69,6 @@ static int cf_check stdvga_mem_write(
 static bool cf_check stdvga_mem_accept(
     const struct hvm_io_handler *handler, const ioreq_t *p)
 {
-    struct hvm_hw_stdvga *s = &current->domain->arch.hvm.stdvga;
-
     /*
      * The range check must be done without taking the lock, to avoid
      * deadlock when hvm_mmio_internal() is called from
@@ -80,50 +78,31 @@ static bool cf_check stdvga_mem_accept(
          (ioreq_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
         return 0;
 
-    spin_lock(&s->lock);
-
     if ( p->dir != IOREQ_WRITE || p->data_is_ptr || p->count != 1 )
     {
         /*
          * Only accept single direct writes, as that's the only thing we can
          * accelerate using buffered ioreq handling.
          */
-        goto reject;
+        return false;
     }
 
-    /* s->lock intentionally held */
-    return 1;
-
- reject:
-    spin_unlock(&s->lock);
-    return 0;
-}
-
-static void cf_check stdvga_mem_complete(const struct hvm_io_handler *handler)
-{
-    struct hvm_hw_stdvga *s = &current->domain->arch.hvm.stdvga;
-
-    spin_unlock(&s->lock);
+    return true;
 }
 
 static const struct hvm_io_ops stdvga_mem_ops = {
     .accept = stdvga_mem_accept,
     .read = stdvga_mem_read,
     .write = stdvga_mem_write,
-    .complete = stdvga_mem_complete
 };
 
 void stdvga_init(struct domain *d)
 {
-    struct hvm_hw_stdvga *s = &d->arch.hvm.stdvga;
     struct hvm_io_handler *handler;
 
     if ( !has_vvga(d) )
         return;
 
-    memset(s, 0, sizeof(*s));
-    spin_lock_init(&s->lock);
-
     /* VGA memory */
     handler = hvm_next_io_handler(d);
     if ( handler )
--- a/xen/arch/x86/include/asm/hvm/domain.h
+++ b/xen/arch/x86/include/asm/hvm/domain.h
@@ -72,7 +72,6 @@ struct hvm_domain {
     struct hvm_hw_vpic     vpic[2]; /* 0=master; 1=slave */
     struct hvm_vioapic   **vioapic;
     unsigned int           nr_vioapics;
-    struct hvm_hw_stdvga   stdvga;
 
     /*
      * hvm_hw_pmtimer is a publicly-visible name. We will defer renaming
--- a/xen/arch/x86/include/asm/hvm/io.h
+++ b/xen/arch/x86/include/asm/hvm/io.h
@@ -110,10 +110,6 @@ struct vpci_arch_msix_entry {
     int pirq;
 };
 
-struct hvm_hw_stdvga {
-    spinlock_t lock;
-};
-
 void stdvga_init(struct domain *d);
 
 extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
No state is left to protect. It being the last field, drop the struct
itself as well. Similarly for then ending up empty, drop the .complete
handler.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.