| Message ID | 1606732298-22107-16-git-send-email-olekstysh@gmail.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [V3,01/23] x86/ioreq: Prepare IOREQ feature for making it common |
Hello Oleksandr,

Oleksandr Tyshchenko writes:

> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> This patch adds proper handling of return value of
> vcpu_ioreq_handle_completion() which involves using a loop
> in leave_hypervisor_to_guest().
>
> The reason to use an unbounded loop here is the fact that vCPU
> shouldn't continue until an I/O has completed. In Xen case, if an I/O
> never completes then it most likely means that something went horribly
> wrong with the Device Emulator. And it is most likely not safe to
> continue. So letting the vCPU to spin forever if I/O never completes
> is a safer action than letting it continue and leaving the guest in
> unclear state and is the best what we can do for now.
>
> This wouldn't be an issue for Xen as do_softirq() would be called at
> every loop. In case of failure, the guest will crash and the vCPU
> will be unscheduled.

Why don't you block the vCPU there and unblock it when the response is
ready? If I got it right, the "client" vCPU will spin in the loop,
eating its own scheduling budget with no useful work done. In the worst
case, it will prevent the "server" vCPU from running.

> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> CC: Julien Grall <julien.grall@arm.com>
>
> [...]
Hi Volodymyr,

On 30/11/2020 20:51, Volodymyr Babchuk wrote:
> [...]
>
> Why don't you block the vCPU there and unblock it when the response is
> ready?

The vCPU will already block while waiting for the event channel. See the
call to wait_for_event_channel() in the ioreq code.

However, you can still receive a spurious unblock (e.g. the event
channel notification is received before the I/O is handled). So you have
to loop around and check whether there is more work to do.

> If I got it right, the "client" vCPU will spin in the loop, eating its
> own scheduling budget with no useful work done.

You can't really do much about that if you have a rogue/buggy device
model.

> In the worst case, it will prevent the "server" vCPU from running.

How so? Xen will raise the schedule softirq if the I/O is handled. You
would have a pretty buggy system if your "client" vCPU is considered to
be a much higher priority than your "server" vCPU.

Cheers,
On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> [...]
>
> The reason to use an unbounded loop here is the fact that vCPU
> shouldn't continue until an I/O has completed. [...]
>
> This wouldn't be an issue for Xen as do_softirq() would be called at
> every loop. In case of failure, the guest will crash and the vCPU
> will be unscheduled.

Imagine that we have two guests: one that requires an ioreq server and
one that doesn't. If I am not mistaken, this loop could potentially spin
forever on a pCPU, thus preventing any other guest from being scheduled,
even if the other guest doesn't need any ioreq servers.

My other concern is that we are busy-looping. Could we call something
like wfi() or do_idle() instead? The ioreq server event notification of
completion should wake us up?

Following this line of thinking, I am wondering if, instead of the
busy-loop, we should call vcpu_block_unless_event_pending(current) in
try_handle_mmio on IO_RETRY. Then, when the emulation is done, QEMU (or
equivalent) calls xenevtchn_notify, which ends up waking up the domU
vCPU. Would that work?

> [...]
On Wed, 9 Dec 2020, Stefano Stabellini wrote:
> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> > [...]
>
> Following this line of thinking, I am wondering if, instead of the
> busy-loop, we should call vcpu_block_unless_event_pending(current) in
> try_handle_mmio on IO_RETRY. [...] Would that work?

I read now Julien's reply: we are already doing something similar to
what I suggested, with the following call chain:

check_for_vcpu_work -> vcpu_ioreq_handle_completion -> wait_for_io ->
wait_on_xen_event_channel

So the busy-loop here is only a safety belt in case of a spurious
wake-up, in which case we are going to call check_for_vcpu_work again,
potentially causing a guest reschedule.

Then this is fine and addresses both my concerns. Maybe let's add a note
in the commit message about it.

I am also wondering if there is any benefit in calling wait_for_io()
earlier, maybe from try_handle_mmio on IO_RETRY?
leave_hypervisor_to_guest is very late for that. In any case, it is not
an important optimization (if it is even an optimization at all), so it
is fine to leave it as is.
Hi Stefano,

On 09/12/2020 23:18, Stefano Stabellini wrote:
> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> > [...]
>
> Imagine that we have two guests: one that requires an ioreq server and
> one that doesn't. If I am not mistaken, this loop could potentially
> spin forever on a pCPU, thus preventing any other guest from being
> scheduled, even if the other guest doesn't need any ioreq servers.

That's not correct. At every loop we will call check_for_pcpu_work(),
which will process pending softirqs. If rescheduling is necessary (it
might be requested by a timer or by a caller in check_for_vcpu_work()),
then the vCPU will be descheduled to give place to someone else.

> My other concern is that we are busy-looping. Could we call something
> like wfi() or do_idle() instead? The ioreq server event notification
> of completion should wake us up?

There is no busy loop here. If the IOREQ server has not yet handled the
I/O, we will block the vCPU until an event notification is received (see
the call to wait_on_xen_event_channel()).

This loop makes sure that all the vCPU work is done before we return to
the guest. The worst that can happen here is that the vCPU will never
run again, if the IOREQ server has been naughty.

> Following this line of thinking, I am wondering if, instead of the
> busy-loop, we should call vcpu_block_unless_event_pending(current) in
> try_handle_mmio on IO_RETRY. Then, when the emulation is done, QEMU
> (or equivalent) calls xenevtchn_notify, which ends up waking up the
> domU vCPU. Would that work?

vcpu_block_unless_event_pending() will not block if there are interrupts
pending. However, here we really want to block until the I/O has been
completed. So vcpu_block_unless_event_pending() is not the right
approach.

The IOREQ code is using wait_on_xen_event_channel(). Yet, this can still
"exit" early if an event has been received. But this doesn't mean the
I/O has completed. So we need to check if the I/O has completed and wait
again if it hasn't.

I seem to keep having to explain how the code works. So maybe we want to
update the commit message with more details?

Cheers,
On 09/12/2020 23:35, Stefano Stabellini wrote:
> On Wed, 9 Dec 2020, Stefano Stabellini wrote:
> > [...]
>
> I read now Julien's reply: we are already doing something similar to
> what I suggested, with the following call chain:
>
> check_for_vcpu_work -> vcpu_ioreq_handle_completion -> wait_for_io ->
> wait_on_xen_event_channel
>
> So the busy-loop here is only a safety belt in case of a spurious
> wake-up, in which case we are going to call check_for_vcpu_work again,
> potentially causing a guest reschedule.
>
> Then this is fine and addresses both my concerns. Maybe let's add a
> note in the commit message about it.

Damn, I hit the "send" button just a second before seeing your reply. :/
Oh well. I suggested the same because I have seen the same question
multiple times.

> I am also wondering if there is any benefit in calling wait_for_io()
> earlier, maybe from try_handle_mmio on IO_RETRY?

wait_for_io() may end up descheduling the vCPU. I would like to avoid
that happening in the middle of the I/O emulation, because it needs to
happen with no locks held at all.

I don't think there are locks involved today, but the deeper in the call
stack the scheduling happens, the more chance we may screw up in the
future.

However...

> leave_hypervisor_to_guest is very late for that.

... I am not sure what the problem with that is. The IOREQ server will
be notified of the pending I/O as soon as try_handle_mmio() puts the I/O
in the shared page.

If the IOREQ server is running on a different pCPU, then it might be
possible that the I/O has completed before we reach
leave_hypervisor_to_guest(). In this case, we would not have to wait for
the I/O.

Cheers,
On Wed, 9 Dec 2020, Julien Grall wrote:
> On 09/12/2020 23:35, Stefano Stabellini wrote:
> > [...]
>
> Damn, I hit the "send" button just a second before seeing your reply.
> :/ Oh well. I suggested the same because I have seen the same question
> multiple times.

:-)

> > I am also wondering if there is any benefit in calling wait_for_io()
> > earlier, maybe from try_handle_mmio on IO_RETRY?
>
> wait_for_io() may end up descheduling the vCPU. I would like to avoid
> that happening in the middle of the I/O emulation, because it needs to
> happen with no locks held at all.
>
> I don't think there are locks involved today, but the deeper in the
> call stack the scheduling happens, the more chance we may screw up in
> the future.
>
> However...
>
> > leave_hypervisor_to_guest is very late for that.
>
> ... I am not sure what the problem with that is. The IOREQ server will
> be notified of the pending I/O as soon as try_handle_mmio() puts the
> I/O in the shared page.
>
> If the IOREQ server is running on a different pCPU, then it might be
> possible that the I/O has completed before we reach
> leave_hypervisor_to_guest(). In this case, we would not have to wait
> for the I/O.

Yeah, I was thinking about that too. Actually, it could be faster this
way if we end up being lucky.

The reason for moving it earlier would be that, by the time
leave_hypervisor_to_guest is called, "Xen" has already decided to
continue running this particular vCPU. If we called wait_for_io()
earlier, we would give important information to the scheduler before any
decision is made.

This is more "philosophical" than practical though. Let's leave it as
is.
Hi Stefano,

On 10/12/2020 02:30, Stefano Stabellini wrote:
> [...]
>
> The reason for moving it earlier would be that, by the time
> leave_hypervisor_to_guest is called, "Xen" has already decided to
> continue running this particular vCPU. If we called wait_for_io()
> earlier, we would give important information to the scheduler before
> any decision is made.

I don't understand this. Xen preemption is voluntary, so the scheduler
is not going to run unless requested. wait_for_io() is a preemption
point, so if you call it, the vCPU may get descheduled at that point.

Why would we want to do this? What's our benefit here?

Cheers,
On 10.12.20 04:30, Stefano Stabellini wrote:

Hi Julien, Stefano

> On Wed, 9 Dec 2020, Julien Grall wrote:
> > [...]
>
> > > So the busy-loop here is only a safety belt in case of a spurious
> > > wake-up, in which case we are going to call check_for_vcpu_work
> > > again, potentially causing a guest reschedule.
> > >
> > > Then this is fine and addresses both my concerns. Maybe let's add
> > > a note in the commit message about it.
> >
> > Damn, I hit the "send" button just a second before seeing your
> > reply. :/ Oh well. I suggested the same because I have seen the
> > same question multiple times.

I will update the commit description and probably the comment in the
code.
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 036b13f..4cef43e 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2257,18 +2257,23 @@ static void check_for_pcpu_work(void)
  * Process pending work for the vCPU. Any call should be fast or
  * implement preemption.
  */
-static void check_for_vcpu_work(void)
+static bool check_for_vcpu_work(void)
 {
     struct vcpu *v = current;
 
 #ifdef CONFIG_IOREQ_SERVER
+    bool handled;
+
     local_irq_enable();
-    vcpu_ioreq_handle_completion(v);
+    handled = vcpu_ioreq_handle_completion(v);
     local_irq_disable();
+
+    if ( !handled )
+        return true;
 #endif
 
     if ( likely(!v->arch.need_flush_to_ram) )
-        return;
+        return false;
 
     /*
      * Give a chance for the pCPU to process work before handling the vCPU
@@ -2279,6 +2284,8 @@ static void check_for_vcpu_work(void)
     local_irq_enable();
     p2m_flush_vm(v);
     local_irq_disable();
+
+    return false;
 }
 
 /*
@@ -2291,8 +2298,22 @@ void leave_hypervisor_to_guest(void)
 {
     local_irq_disable();
 
-    check_for_vcpu_work();
-    check_for_pcpu_work();
+    /*
+     * The reason to use an unbounded loop here is the fact that vCPU
+     * shouldn't continue until an I/O has completed. In Xen case, if an I/O
+     * never completes then it most likely means that something went horribly
+     * wrong with the Device Emulator. And it is most likely not safe to
+     * continue. So letting the vCPU to spin forever if I/O never completes
+     * is a safer action than letting it continue and leaving the guest in
+     * unclear state and is the best what we can do for now.
+     *
+     * This wouldn't be an issue for Xen as do_softirq() would be called at
+     * every loop. In case of failure, the guest will crash and the vCPU
+     * will be unscheduled.
+     */
+    do {
+        check_for_pcpu_work();
+    } while ( check_for_vcpu_work() );
 
     vgic_sync_to_lrs();