Message ID | 1606732298-22107-18-git-send-email-olekstysh@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [V3,01/23] x86/ioreq: Prepare IOREQ feature for making it common | expand |
On 30.11.2020 11:31, Oleksandr Tyshchenko wrote: > --- a/xen/include/xen/ioreq.h > +++ b/xen/include/xen/ioreq.h > @@ -55,6 +55,20 @@ struct ioreq_server { > uint8_t bufioreq_handling; > }; > > +/* > + * This should only be used when d == current->domain and it's not paused, Is the "not paused" part really relevant here? Besides it being rare that the current domain would be paused (if so, it's in the process of having all its vCPU-s scheduled out), does this matter at all? Apart from this the patch looks okay to me, but I'm not sure it addresses Paul's concerns. Iirc he had suggested to switch back to a list if doing a swipe over the entire array is too expensive in this specific case. Jan
On 08.12.20 17:11, Jan Beulich wrote: Hi Jan > On 30.11.2020 11:31, Oleksandr Tyshchenko wrote: >> --- a/xen/include/xen/ioreq.h >> +++ b/xen/include/xen/ioreq.h >> @@ -55,6 +55,20 @@ struct ioreq_server { >> uint8_t bufioreq_handling; >> }; >> >> +/* >> + * This should only be used when d == current->domain and it's not paused, > Is the "not paused" part really relevant here? Besides it being rare > that the current domain would be paused (if so, it's in the process > of having all its vCPU-s scheduled out), does this matter at all?do any extra actionsdo any extra actions No, it isn't relevant, I will drop it. > > Apart from this the patch looks okay to me, but I'm not sure it > addresses Paul's concerns. Iirc he had suggested to switch back to > a list if doing a swipe over the entire array is too expensive in > this specific case. We would like to avoid to do any extra actions in leave_hypervisor_to_guest() if possible. But not only there, the logic whether we check/set mapcache_invalidation variable could be avoided if a domain doesn't use IOREQ server...
Hi Paul. On 08.12.20 17:33, Oleksandr wrote: > > On 08.12.20 17:11, Jan Beulich wrote: > > Hi Jan > >> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote: >>> --- a/xen/include/xen/ioreq.h >>> +++ b/xen/include/xen/ioreq.h >>> @@ -55,6 +55,20 @@ struct ioreq_server { >>> uint8_t bufioreq_handling; >>> }; >>> +/* >>> + * This should only be used when d == current->domain and it's not >>> paused, >> Is the "not paused" part really relevant here? Besides it being rare >> that the current domain would be paused (if so, it's in the process >> of having all its vCPU-s scheduled out), does this matter at all?do >> any extra actionsdo any extra actions > > No, it isn't relevant, I will drop it. > > >> >> Apart from this the patch looks okay to me, but I'm not sure it >> addresses Paul's concerns. Iirc he had suggested to switch back to >> a list if doing a swipe over the entire array is too expensive in >> this specific case. > We would like to avoid to do any extra actions in > leave_hypervisor_to_guest() if possible. > But not only there, the logic whether we check/set > mapcache_invalidation variable could be avoided if a domain doesn't > use IOREQ server... Are you OK with this patch (common part of it)?
> -----Original Message----- > From: Oleksandr <olekstysh@gmail.com> > Sent: 08 December 2020 16:57 > To: Paul Durrant <paul@xen.org> > Cc: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano > Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk > <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap > <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Julien Grall > <julien.grall@arm.com>; xen-devel@lists.xenproject.org > Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() > > > Hi Paul. > > > On 08.12.20 17:33, Oleksandr wrote: > > > > On 08.12.20 17:11, Jan Beulich wrote: > > > > Hi Jan > > > >> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote: > >>> --- a/xen/include/xen/ioreq.h > >>> +++ b/xen/include/xen/ioreq.h > >>> @@ -55,6 +55,20 @@ struct ioreq_server { > >>> uint8_t bufioreq_handling; > >>> }; > >>> +/* > >>> + * This should only be used when d == current->domain and it's not > >>> paused, > >> Is the "not paused" part really relevant here? Besides it being rare > >> that the current domain would be paused (if so, it's in the process > >> of having all its vCPU-s scheduled out), does this matter at all?do > >> any extra actionsdo any extra actions > > > > No, it isn't relevant, I will drop it. > > > > > >> > >> Apart from this the patch looks okay to me, but I'm not sure it > >> addresses Paul's concerns. Iirc he had suggested to switch back to > >> a list if doing a swipe over the entire array is too expensive in > >> this specific case. > > We would like to avoid to do any extra actions in > > leave_hypervisor_to_guest() if possible. > > But not only there, the logic whether we check/set > > mapcache_invalidation variable could be avoided if a domain doesn't > > use IOREQ server... > > > Are you OK with this patch (common part of it)? How much of a performance benefit is this? The array is small to simply counting the non-NULL entries should be pretty quick. Paul > > > -- > Regards, > > Oleksandr Tyshchenko
On 08.12.20 21:43, Paul Durrant wrote: Hi Paul >> -----Original Message----- >> From: Oleksandr <olekstysh@gmail.com> >> Sent: 08 December 2020 16:57 >> To: Paul Durrant <paul@xen.org> >> Cc: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano >> Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk >> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap >> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Julien Grall >> <julien.grall@arm.com>; xen-devel@lists.xenproject.org >> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() >> >> >> Hi Paul. >> >> >> On 08.12.20 17:33, Oleksandr wrote: >>> On 08.12.20 17:11, Jan Beulich wrote: >>> >>> Hi Jan >>> >>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote: >>>>> --- a/xen/include/xen/ioreq.h >>>>> +++ b/xen/include/xen/ioreq.h >>>>> @@ -55,6 +55,20 @@ struct ioreq_server { >>>>> uint8_t bufioreq_handling; >>>>> }; >>>>> +/* >>>>> + * This should only be used when d == current->domain and it's not >>>>> paused, >>>> Is the "not paused" part really relevant here? Besides it being rare >>>> that the current domain would be paused (if so, it's in the process >>>> of having all its vCPU-s scheduled out), does this matter at all?do >>>> any extra actionsdo any extra actions >>> No, it isn't relevant, I will drop it. >>> >>> >>>> Apart from this the patch looks okay to me, but I'm not sure it >>>> addresses Paul's concerns. Iirc he had suggested to switch back to >>>> a list if doing a swipe over the entire array is too expensive in >>>> this specific case. >>> We would like to avoid to do any extra actions in >>> leave_hypervisor_to_guest() if possible. >>> But not only there, the logic whether we check/set >>> mapcache_invalidation variable could be avoided if a domain doesn't >>> use IOREQ server... >> >> Are you OK with this patch (common part of it)? > How much of a performance benefit is this? The array is small to simply counting the non-NULL entries should be pretty quick. I didn't perform performance measurements on how much this call consumes. In our system we run three domains. The emulator is in DomD only, so I would like to avoid to call vcpu_ioreq_handle_completion() for every Dom0/DomU's vCPUs if there is no real need to do it. On Arm vcpu_ioreq_handle_completion() is called with IRQ enabled, so the call is accompanied with corresponding irq_enable/irq_disable. These unneeded actions could be avoided by using this simple one-line helper...
> -----Original Message----- > From: Oleksandr <olekstysh@gmail.com> > Sent: 08 December 2020 20:17 > To: paul@xen.org > Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>; > 'Stefano Stabellini' <sstabellini@kernel.org>; 'Julien Grall' <julien@xen.org>; 'Volodymyr Babchuk' > <Volodymyr_Babchuk@epam.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap' > <george.dunlap@citrix.com>; 'Ian Jackson' <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Julien Grall' > <julien.grall@arm.com>; xen-devel@lists.xenproject.org > Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() > > > On 08.12.20 21:43, Paul Durrant wrote: > > Hi Paul > > >> -----Original Message----- > >> From: Oleksandr <olekstysh@gmail.com> > >> Sent: 08 December 2020 16:57 > >> To: Paul Durrant <paul@xen.org> > >> Cc: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano > >> Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk > >> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap > >> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Julien Grall > >> <julien.grall@arm.com>; xen-devel@lists.xenproject.org > >> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() > >> > >> > >> Hi Paul. > >> > >> > >> On 08.12.20 17:33, Oleksandr wrote: > >>> On 08.12.20 17:11, Jan Beulich wrote: > >>> > >>> Hi Jan > >>> > >>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote: > >>>>> --- a/xen/include/xen/ioreq.h > >>>>> +++ b/xen/include/xen/ioreq.h > >>>>> @@ -55,6 +55,20 @@ struct ioreq_server { > >>>>> uint8_t bufioreq_handling; > >>>>> }; > >>>>> +/* > >>>>> + * This should only be used when d == current->domain and it's not > >>>>> paused, > >>>> Is the "not paused" part really relevant here? Besides it being rare > >>>> that the current domain would be paused (if so, it's in the process > >>>> of having all its vCPU-s scheduled out), does this matter at all?do > >>>> any extra actionsdo any extra actions > >>> No, it isn't relevant, I will drop it. > >>> > >>> > >>>> Apart from this the patch looks okay to me, but I'm not sure it > >>>> addresses Paul's concerns. Iirc he had suggested to switch back to > >>>> a list if doing a swipe over the entire array is too expensive in > >>>> this specific case. > >>> We would like to avoid to do any extra actions in > >>> leave_hypervisor_to_guest() if possible. > >>> But not only there, the logic whether we check/set > >>> mapcache_invalidation variable could be avoided if a domain doesn't > >>> use IOREQ server... > >> > >> Are you OK with this patch (common part of it)? > > How much of a performance benefit is this? The array is small to simply counting the non-NULL > entries should be pretty quick. > I didn't perform performance measurements on how much this call consumes. > In our system we run three domains. The emulator is in DomD only, so I > would like to avoid to call vcpu_ioreq_handle_completion() for every > Dom0/DomU's vCPUs > if there is no real need to do it. This is not relevant to the domain that the emulator is running in; it's concerning the domains which the emulator is servicing. How many of those are there? > On Arm vcpu_ioreq_handle_completion() > is called with IRQ enabled, so the call is accompanied with > corresponding irq_enable/irq_disable. > These unneeded actions could be avoided by using this simple one-line > helper... > The helper may be one line but there is more to the patch than that. I still think you could just walk the array in the helper rather than keeping a running occupancy count. Paul
Hi Oleksandr and Paul, Sorry for jumping late in the conversation. On 09/12/2020 09:01, Paul Durrant wrote: >> -----Original Message----- >> From: Oleksandr <olekstysh@gmail.com> >> Sent: 08 December 2020 20:17 >> To: paul@xen.org >> Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>; >> 'Stefano Stabellini' <sstabellini@kernel.org>; 'Julien Grall' <julien@xen.org>; 'Volodymyr Babchuk' >> <Volodymyr_Babchuk@epam.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap' >> <george.dunlap@citrix.com>; 'Ian Jackson' <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Julien Grall' >> <julien.grall@arm.com>; xen-devel@lists.xenproject.org >> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() >> >> >> On 08.12.20 21:43, Paul Durrant wrote: >> >> Hi Paul >> >>>> -----Original Message----- >>>> From: Oleksandr <olekstysh@gmail.com> >>>> Sent: 08 December 2020 16:57 >>>> To: Paul Durrant <paul@xen.org> >>>> Cc: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano >>>> Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk >>>> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap >>>> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Julien Grall >>>> <julien.grall@arm.com>; xen-devel@lists.xenproject.org >>>> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() >>>> >>>> >>>> Hi Paul. >>>> >>>> >>>> On 08.12.20 17:33, Oleksandr wrote: >>>>> On 08.12.20 17:11, Jan Beulich wrote: >>>>> >>>>> Hi Jan >>>>> >>>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote: >>>>>>> --- a/xen/include/xen/ioreq.h >>>>>>> +++ b/xen/include/xen/ioreq.h >>>>>>> @@ -55,6 +55,20 @@ struct ioreq_server { >>>>>>> uint8_t bufioreq_handling; >>>>>>> }; >>>>>>> +/* >>>>>>> + * This should only be used when d == current->domain and it's not >>>>>>> paused, >>>>>> Is the "not paused" part really relevant here? Besides it being rare >>>>>> that the current domain would be paused (if so, it's in the process >>>>>> of having all its vCPU-s scheduled out), does this matter at all?do >>>>>> any extra actionsdo any extra actions >>>>> No, it isn't relevant, I will drop it. >>>>> >>>>> >>>>>> Apart from this the patch looks okay to me, but I'm not sure it >>>>>> addresses Paul's concerns. Iirc he had suggested to switch back to >>>>>> a list if doing a swipe over the entire array is too expensive in >>>>>> this specific case. >>>>> We would like to avoid to do any extra actions in >>>>> leave_hypervisor_to_guest() if possible. >>>>> But not only there, the logic whether we check/set >>>>> mapcache_invalidation variable could be avoided if a domain doesn't >>>>> use IOREQ server... >>>> >>>> Are you OK with this patch (common part of it)? >>> How much of a performance benefit is this? The array is small to simply counting the non-NULL >> entries should be pretty quick. >> I didn't perform performance measurements on how much this call consumes. >> In our system we run three domains. The emulator is in DomD only, so I >> would like to avoid to call vcpu_ioreq_handle_completion() for every >> Dom0/DomU's vCPUs >> if there is no real need to do it. > > This is not relevant to the domain that the emulator is running in; it's concerning the domains which the emulator is servicing. How many of those are there? AFAICT, the maximum number of IOREQ servers is 8 today. > >> On Arm vcpu_ioreq_handle_completion() >> is called with IRQ enabled, so the call is accompanied with >> corresponding irq_enable/irq_disable. >> These unneeded actions could be avoided by using this simple one-line >> helper... >> > > The helper may be one line but there is more to the patch than that. I still think you could just walk the array in the helper rather than keeping a running occupancy count. Right, the concern here is this function will be called in an hotpath (everytime we are re-entering to the guest). At the difference of x86, the entry/exit code is really small, so any additional code will have an impact on the overall performance. That said, the IOREQ code is a tech preview for Arm. So I would be fine going with Paul's approach until we have a better understanding on the performance of virtio/IOREQ. I am going to throw some more thoughts about the optimization here. The patch is focusing on performance impact when IOREQ is built-in and not used. I think we can do further optimization (which may superseed this one). get_pending_vcpu() (called from handle_hvm_io_completion()) is overly expensive in particular if you have no I/O forwarded to an IOREQ server. Entry to the hypervisor can happen for many reasons (interrupts, system registers emulation, I/O emulation...) and the I/O forwarded should be a small subset. Ideally, handle_hvm_io_completion() should be a NOP (at max a few instructions) if there are nothing to do. Maybe we want to introduce a per-vCPU flag indicating if an I/O has been forwarded to an IOREQ server. This would also us to bypass most of the function if there is nothing to do. Any thoughts? In any case this is more a forward looking rather than a request for the current series. What matters to me is we have a functional (not necessarily optimized) version of IOREQ in Xen 4.15. This would be a great step towards using Virto on Xen. Cheers,
Hi Paul. >>>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote: >>>>>>> --- a/xen/include/xen/ioreq.h >>>>>>> +++ b/xen/include/xen/ioreq.h >>>>>>> @@ -55,6 +55,20 @@ struct ioreq_server { >>>>>>> uint8_t bufioreq_handling; >>>>>>> }; >>>>>>> +/* >>>>>>> + * This should only be used when d == current->domain and it's not >>>>>>> paused, >>>>>> Is the "not paused" part really relevant here? Besides it being rare >>>>>> that the current domain would be paused (if so, it's in the process >>>>>> of having all its vCPU-s scheduled out), does this matter at all?do >>>>>> any extra actionsdo any extra actions >>>>> No, it isn't relevant, I will drop it. >>>>> >>>>> >>>>>> Apart from this the patch looks okay to me, but I'm not sure it >>>>>> addresses Paul's concerns. Iirc he had suggested to switch back to >>>>>> a list if doing a swipe over the entire array is too expensive in >>>>>> this specific case. >>>>> We would like to avoid to do any extra actions in >>>>> leave_hypervisor_to_guest() if possible. >>>>> But not only there, the logic whether we check/set >>>>> mapcache_invalidation variable could be avoided if a domain doesn't >>>>> use IOREQ server... >>>> Are you OK with this patch (common part of it)? >>> How much of a performance benefit is this? The array is small to simply counting the non-NULL >> entries should be pretty quick. >> I didn't perform performance measurements on how much this call consumes. >> In our system we run three domains. The emulator is in DomD only, so I >> would like to avoid to call vcpu_ioreq_handle_completion() for every >> Dom0/DomU's vCPUs >> if there is no real need to do it. > This is not relevant to the domain that the emulator is running in; it's concerning the domains which the emulator is servicing. How many of those are there? Err, yes, I wasn't precise when providing an example. Single emulator is running in DomD and servicing DomU. So with the helper in place the vcpu_ioreq_handle_completion() gets only called for DomU vCPUs (as expected). Without an optimization the vcpu_ioreq_handle_completion() gets called for _all_ vCPUs, and I see it as an extra action for Dom0, DomD vCPUs. > >> On Arm vcpu_ioreq_handle_completion() >> is called with IRQ enabled, so the call is accompanied with >> corresponding irq_enable/irq_disable. >> These unneeded actions could be avoided by using this simple one-line >> helper... >> > The helper may be one line but there is more to the patch than that. I still think you could just walk the array in the helper rather than keeping a running occupancy count. OK, is the implementation below close to what you propose? If yes, I will update a helper and drop nr_servers variable. bool domain_has_ioreq_server(const struct domain *d) { const struct ioreq_server *s; unsigned int id; FOR_EACH_IOREQ_SERVER(d, id, s) return true; return false; }
On 09.12.20 20:58, Julien Grall wrote: > Hi Oleksandr and Paul, Hi Julien, Paul. > > Sorry for jumping late in the conversation. > > On 09/12/2020 09:01, Paul Durrant wrote: >>> -----Original Message----- >>> From: Oleksandr <olekstysh@gmail.com> >>> Sent: 08 December 2020 20:17 >>> To: paul@xen.org >>> Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' >>> <oleksandr_tyshchenko@epam.com>; >>> 'Stefano Stabellini' <sstabellini@kernel.org>; 'Julien Grall' >>> <julien@xen.org>; 'Volodymyr Babchuk' >>> <Volodymyr_Babchuk@epam.com>; 'Andrew Cooper' >>> <andrew.cooper3@citrix.com>; 'George Dunlap' >>> <george.dunlap@citrix.com>; 'Ian Jackson' <iwj@xenproject.org>; 'Wei >>> Liu' <wl@xen.org>; 'Julien Grall' >>> <julien.grall@arm.com>; xen-devel@lists.xenproject.org >>> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce >>> domain_has_ioreq_server() >>> >>> >>> On 08.12.20 21:43, Paul Durrant wrote: >>> >>> Hi Paul >>> >>>>> -----Original Message----- >>>>> From: Oleksandr <olekstysh@gmail.com> >>>>> Sent: 08 December 2020 16:57 >>>>> To: Paul Durrant <paul@xen.org> >>>>> Cc: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko >>>>> <oleksandr_tyshchenko@epam.com>; Stefano >>>>> Stabellini <sstabellini@kernel.org>; Julien Grall >>>>> <julien@xen.org>; Volodymyr Babchuk >>>>> <Volodymyr_Babchuk@epam.com>; Andrew Cooper >>>>> <andrew.cooper3@citrix.com>; George Dunlap >>>>> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Wei >>>>> Liu <wl@xen.org>; Julien Grall >>>>> <julien.grall@arm.com>; xen-devel@lists.xenproject.org >>>>> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce >>>>> domain_has_ioreq_server() >>>>> >>>>> >>>>> Hi Paul. >>>>> >>>>> >>>>> On 08.12.20 17:33, Oleksandr wrote: >>>>>> On 08.12.20 17:11, Jan Beulich wrote: >>>>>> >>>>>> Hi Jan >>>>>> >>>>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote: >>>>>>>> --- a/xen/include/xen/ioreq.h >>>>>>>> +++ b/xen/include/xen/ioreq.h >>>>>>>> @@ -55,6 +55,20 @@ struct ioreq_server { >>>>>>>> uint8_t bufioreq_handling; >>>>>>>> }; >>>>>>>> +/* >>>>>>>> + * This should only be used when d == current->domain and it's >>>>>>>> not >>>>>>>> paused, >>>>>>> Is the "not paused" part really relevant here? Besides it being >>>>>>> rare >>>>>>> that the current domain would be paused (if so, it's in the process >>>>>>> of having all its vCPU-s scheduled out), does this matter at all?do >>>>>>> any extra actionsdo any extra actions >>>>>> No, it isn't relevant, I will drop it. >>>>>> >>>>>> >>>>>>> Apart from this the patch looks okay to me, but I'm not sure it >>>>>>> addresses Paul's concerns. Iirc he had suggested to switch back to >>>>>>> a list if doing a swipe over the entire array is too expensive in >>>>>>> this specific case. >>>>>> We would like to avoid to do any extra actions in >>>>>> leave_hypervisor_to_guest() if possible. >>>>>> But not only there, the logic whether we check/set >>>>>> mapcache_invalidation variable could be avoided if a domain doesn't >>>>>> use IOREQ server... >>>>> >>>>> Are you OK with this patch (common part of it)? >>>> How much of a performance benefit is this? The array is small to >>>> simply counting the non-NULL >>> entries should be pretty quick. >>> I didn't perform performance measurements on how much this call >>> consumes. >>> In our system we run three domains. The emulator is in DomD only, so I >>> would like to avoid to call vcpu_ioreq_handle_completion() for every >>> Dom0/DomU's vCPUs >>> if there is no real need to do it. >> >> This is not relevant to the domain that the emulator is running in; >> it's concerning the domains which the emulator is servicing. How many >> of those are there? > > AFAICT, the maximum number of IOREQ servers is 8 today. > >> >>> On Arm vcpu_ioreq_handle_completion() >>> is called with IRQ enabled, so the call is accompanied with >>> corresponding irq_enable/irq_disable. >>> These unneeded actions could be avoided by using this simple one-line >>> helper... >>> >> >> The helper may be one line but there is more to the patch than that. >> I still think you could just walk the array in the helper rather than >> keeping a running occupancy count. > > Right, the concern here is this function will be called in an hotpath > (everytime we are re-entering to the guest). At the difference of x86, > the entry/exit code is really small, so any additional code will have > an impact on the overall performance. +1 > > > That said, the IOREQ code is a tech preview for Arm. So I would be > fine going with Paul's approach until we have a better understanding > on the performance of virtio/IOREQ. I am fine with Paul's approach for now (I only need a confirmation that I got it correctly). > > > I am going to throw some more thoughts about the optimization here. > The patch is focusing on performance impact when IOREQ is built-in and > not used. It is true, what I would to add here is the helper also avoids unnecessary vcpu_ioreq_handle_completion() calls as well another unnecessary action (mapcache handling logic, although it is not a hotpath) in subsequent patch when IOREQ is used. > I think we can do further optimization (which may superseed this one). > > get_pending_vcpu() (called from handle_hvm_io_completion()) is overly > expensive in particular if you have no I/O forwarded to an IOREQ > server. Entry to the hypervisor can happen for many reasons > (interrupts, system registers emulation, I/O emulation...) and the I/O > forwarded should be a small subset. > > Ideally, handle_hvm_io_completion() should be a NOP (at max a few > instructions) if there are nothing to do. Maybe we want to introduce a > per-vCPU flag indicating if an I/O has been forwarded to an IOREQ server. > > This would also us to bypass most of the function if there is nothing > to do. > > Any thoughts? > > In any case this is more a forward looking rather than a request for > the current series. What matters to me is we have a functional (not > necessarily optimized) version of IOREQ in Xen 4.15. This would be a > great step towards using Virto on Xen. Completely agree, current series is quite big) and if we will try to make it perfect I am afraid, we won't have it even in Xen 4.16). As for proposed optimization - I think it worth considering, I will mention about it in the cover letter for the series among other possible things such as buffered request, etc. > > > Cheers, >
> -----Original Message----- > From: Oleksandr <olekstysh@gmail.com> > Sent: 09 December 2020 20:36 > To: paul@xen.org > Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>; > 'Stefano Stabellini' <sstabellini@kernel.org>; 'Julien Grall' <julien@xen.org>; 'Volodymyr Babchuk' > <Volodymyr_Babchuk@epam.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap' > <george.dunlap@citrix.com>; 'Ian Jackson' <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Julien Grall' > <julien.grall@arm.com>; xen-devel@lists.xenproject.org > Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() > > > Hi Paul. > > > >>>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote: > >>>>>>> --- a/xen/include/xen/ioreq.h > >>>>>>> +++ b/xen/include/xen/ioreq.h > >>>>>>> @@ -55,6 +55,20 @@ struct ioreq_server { > >>>>>>> uint8_t bufioreq_handling; > >>>>>>> }; > >>>>>>> +/* > >>>>>>> + * This should only be used when d == current->domain and it's not > >>>>>>> paused, > >>>>>> Is the "not paused" part really relevant here? Besides it being rare > >>>>>> that the current domain would be paused (if so, it's in the process > >>>>>> of having all its vCPU-s scheduled out), does this matter at all?do > >>>>>> any extra actionsdo any extra actions > >>>>> No, it isn't relevant, I will drop it. > >>>>> > >>>>> > >>>>>> Apart from this the patch looks okay to me, but I'm not sure it > >>>>>> addresses Paul's concerns. Iirc he had suggested to switch back to > >>>>>> a list if doing a swipe over the entire array is too expensive in > >>>>>> this specific case. > >>>>> We would like to avoid to do any extra actions in > >>>>> leave_hypervisor_to_guest() if possible. > >>>>> But not only there, the logic whether we check/set > >>>>> mapcache_invalidation variable could be avoided if a domain doesn't > >>>>> use IOREQ server... > >>>> Are you OK with this patch (common part of it)? > >>> How much of a performance benefit is this? The array is small to simply counting the non-NULL > >> entries should be pretty quick. > >> I didn't perform performance measurements on how much this call consumes. > >> In our system we run three domains. The emulator is in DomD only, so I > >> would like to avoid to call vcpu_ioreq_handle_completion() for every > >> Dom0/DomU's vCPUs > >> if there is no real need to do it. > > This is not relevant to the domain that the emulator is running in; it's concerning the domains > which the emulator is servicing. How many of those are there? > Err, yes, I wasn't precise when providing an example. > Single emulator is running in DomD and servicing DomU. So with the > helper in place the vcpu_ioreq_handle_completion() gets only called for > DomU vCPUs (as expected). > Without an optimization the vcpu_ioreq_handle_completion() gets called > for _all_ vCPUs, and I see it as an extra action for Dom0, DomD vCPUs. > > > > > >> On Arm vcpu_ioreq_handle_completion() > >> is called with IRQ enabled, so the call is accompanied with > >> corresponding irq_enable/irq_disable. > >> These unneeded actions could be avoided by using this simple one-line > >> helper... > >> > > The helper may be one line but there is more to the patch than that. I still think you could just > walk the array in the helper rather than keeping a running occupancy count. > > OK, is the implementation below close to what you propose? If yes, I > will update a helper and drop nr_servers variable. > > bool domain_has_ioreq_server(const struct domain *d) > { > const struct ioreq_server *s; > unsigned int id; > > FOR_EACH_IOREQ_SERVER(d, id, s) > return true; > > return false; > } Yes, that's what I had in mind. Paul > > -- > Regards, > > Oleksandr Tyshchenko
On 10.12.20 10:38, Paul Durrant wrote: Hi Paul. >> -----Original Message----- >> From: Oleksandr <olekstysh@gmail.com> >> Sent: 09 December 2020 20:36 >> To: paul@xen.org >> Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>; >> 'Stefano Stabellini' <sstabellini@kernel.org>; 'Julien Grall' <julien@xen.org>; 'Volodymyr Babchuk' >> <Volodymyr_Babchuk@epam.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap' >> <george.dunlap@citrix.com>; 'Ian Jackson' <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Julien Grall' >> <julien.grall@arm.com>; xen-devel@lists.xenproject.org >> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() >> >> >> Hi Paul. >> >> >>>>>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote: >>>>>>>>> --- a/xen/include/xen/ioreq.h >>>>>>>>> +++ b/xen/include/xen/ioreq.h >>>>>>>>> @@ -55,6 +55,20 @@ struct ioreq_server { >>>>>>>>> uint8_t bufioreq_handling; >>>>>>>>> }; >>>>>>>>> +/* >>>>>>>>> + * This should only be used when d == current->domain and it's not >>>>>>>>> paused, >>>>>>>> Is the "not paused" part really relevant here? Besides it being rare >>>>>>>> that the current domain would be paused (if so, it's in the process >>>>>>>> of having all its vCPU-s scheduled out), does this matter at all?do >>>>>>>> any extra actionsdo any extra actions >>>>>>> No, it isn't relevant, I will drop it. >>>>>>> >>>>>>> >>>>>>>> Apart from this the patch looks okay to me, but I'm not sure it >>>>>>>> addresses Paul's concerns. Iirc he had suggested to switch back to >>>>>>>> a list if doing a swipe over the entire array is too expensive in >>>>>>>> this specific case. >>>>>>> We would like to avoid to do any extra actions in >>>>>>> leave_hypervisor_to_guest() if possible. >>>>>>> But not only there, the logic whether we check/set >>>>>>> mapcache_invalidation variable could be avoided if a domain doesn't >>>>>>> use IOREQ server... >>>>>> Are you OK with this patch (common part of it)? >>>>> How much of a performance benefit is this? The array is small to simply counting the non-NULL >>>> entries should be pretty quick. >>>> I didn't perform performance measurements on how much this call consumes. >>>> In our system we run three domains. The emulator is in DomD only, so I >>>> would like to avoid to call vcpu_ioreq_handle_completion() for every >>>> Dom0/DomU's vCPUs >>>> if there is no real need to do it. >>> This is not relevant to the domain that the emulator is running in; it's concerning the domains >> which the emulator is servicing. How many of those are there? >> Err, yes, I wasn't precise when providing an example. >> Single emulator is running in DomD and servicing DomU. So with the >> helper in place the vcpu_ioreq_handle_completion() gets only called for >> DomU vCPUs (as expected). >> Without an optimization the vcpu_ioreq_handle_completion() gets called >> for _all_ vCPUs, and I see it as an extra action for Dom0, DomD vCPUs. >> >> >>>> On Arm vcpu_ioreq_handle_completion() >>>> is called with IRQ enabled, so the call is accompanied with >>>> corresponding irq_enable/irq_disable. >>>> These unneeded actions could be avoided by using this simple one-line >>>> helper... >>>> >>> The helper may be one line but there is more to the patch than that. I still think you could just >> walk the array in the helper rather than keeping a running occupancy count. >> >> OK, is the implementation below close to what you propose? If yes, I >> will update a helper and drop nr_servers variable. >> >> bool domain_has_ioreq_server(const struct domain *d) >> { >> const struct ioreq_server *s; >> unsigned int id; >> >> FOR_EACH_IOREQ_SERVER(d, id, s) >> return true; >> >> return false; >> } > Yes, that's what I had in mind. > > Paul Thank you for the clarification.
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 4cef43e..b6077d2 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -2262,14 +2262,17 @@ static bool check_for_vcpu_work(void) struct vcpu *v = current; #ifdef CONFIG_IOREQ_SERVER - bool handled; + if ( domain_has_ioreq_server(v->domain) ) + { + bool handled; - local_irq_enable(); - handled = vcpu_ioreq_handle_completion(v); - local_irq_disable(); + local_irq_enable(); + handled = vcpu_ioreq_handle_completion(v); + local_irq_disable(); - if ( !handled ) - return true; + if ( !handled ) + return true; + } #endif if ( likely(!v->arch.need_flush_to_ram) ) diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c index 4855dd8..f35dcf9 100644 --- a/xen/common/ioreq.c +++ b/xen/common/ioreq.c @@ -39,9 +39,14 @@ static void set_ioreq_server(struct domain *d, unsigned int id, struct ioreq_server *s) { ASSERT(id < MAX_NR_IOREQ_SERVERS); - ASSERT(!s || !d->ioreq_server.server[id]); + ASSERT(!s ^ !d->ioreq_server.server[id]); d->ioreq_server.server[id] = s; + + if ( s ) + d->ioreq_server.nr_servers++; + else + d->ioreq_server.nr_servers--; } #define GET_IOREQ_SERVER(d, id) \ diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index 02ff998..2289e79 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -55,6 +55,20 @@ struct ioreq_server { uint8_t bufioreq_handling; }; +/* + * This should only be used when d == current->domain and it's not paused, + * or when they're distinct and d is paused. Otherwise the result is + * stale before the caller can inspect it. + */ +static inline bool domain_has_ioreq_server(const struct domain *d) +{ +#ifdef CONFIG_IOREQ_SERVER + return d->ioreq_server.nr_servers; +#else + return false; +#endif +} + static inline paddr_t ioreq_mmio_first_byte(const ioreq_t *p) { return unlikely(p->df) ? diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 8269f84..2277995 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -550,6 +550,7 @@ struct domain struct { spinlock_t lock; struct ioreq_server *server[MAX_NR_IOREQ_SERVERS]; + unsigned int nr_servers; } ioreq_server; #endif };