diff mbox series

[2/2] xen: sched: we never get into context_switch() with prev==next

Message ID 155577388740.25746.3780283868034526234.stgit@wayrath (mailing list archive)
State New, archived
Headers show
Series xen: sched/Credit2: small and trivial improvements | expand

Commit Message

Dario Faggioli April 20, 2019, 3:24 p.m. UTC
In schedule(), if we pick, as the next vcpu to run (next) the same one
that is running already (prev), we never get to call context_switch().

We can, therefore, get rid of all the `if`-s testing prev and next being
different, trading them with an ASSERT() (on ARM, the ASSERT() was even
already there!)

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Suggested-by: Juergen Gross <jgross@suse.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
 xen/arch/arm/domain.c |    3 +--
 xen/arch/x86/domain.c |   22 ++++++++--------------
 2 files changed, 9 insertions(+), 16 deletions(-)

Comments

Julien Grall April 21, 2019, 5:08 p.m. UTC | #1
Hi Dario,

On 4/20/19 4:24 PM, Dario Faggioli wrote:
> In schedule(), if we pick, as the next vcpu to run (next) the same one
> that is running already (prev), we never get to call context_switch().
> 
> We can, therefore, get rid of all the `if`-s testing prev and next being
> different, trading them with an ASSERT() (on ARM, the ASSERT() was even
> already there!)
> 
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> Suggested-by: Juergen Gross <jgross@suse.com>

For Arm:

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,

> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
>   xen/arch/arm/domain.c |    3 +--
>   xen/arch/x86/domain.c |   22 ++++++++--------------
>   2 files changed, 9 insertions(+), 16 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 6dc633ed50..915ae0b4c6 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -343,8 +343,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>       ASSERT(prev != next);
>       ASSERT(!vcpu_cpu_dirty(next));
>   
> -    if ( prev != next )
> -        update_runstate_area(prev);
> +    update_runstate_area(prev);
>   
>       local_irq_disable();
>   
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 9eaa978ce5..d2d9f2fc3c 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1721,6 +1721,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>       const struct domain *prevd = prev->domain, *nextd = next->domain;
>       unsigned int dirty_cpu = next->dirty_cpu;
>   
> +    ASSERT(prev != next);
>       ASSERT(local_irq_is_enabled());
>   
>       get_cpu_info()->use_pv_cr3 = false;
> @@ -1732,12 +1733,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>           flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
>       }
>   
> -    if ( prev != next )
> -    {
> -        _update_runstate_area(prev);
> -        vpmu_switch_from(prev);
> -        np2m_schedule(NP2M_SCHEDLE_OUT);
> -    }
> +    _update_runstate_area(prev);
> +    vpmu_switch_from(prev);
> +    np2m_schedule(NP2M_SCHEDLE_OUT);
>   
>       if ( is_hvm_domain(prevd) && !list_empty(&prev->arch.hvm.tm_list) )
>           pt_save_timer(prev);
> @@ -1794,14 +1792,10 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>   
>       context_saved(prev);
>   
> -    if ( prev != next )
> -    {
> -        _update_runstate_area(next);
> -
> -        /* Must be done with interrupts enabled */
> -        vpmu_switch_to(next);
> -        np2m_schedule(NP2M_SCHEDLE_IN);
> -    }
> +    _update_runstate_area(next);
> +    /* Must be done with interrupts enabled */
> +    vpmu_switch_to(next);
> +    np2m_schedule(NP2M_SCHEDLE_IN);
>   
>       /* Ensure that the vcpu has an up-to-date time base. */
>       update_vcpu_system_time(next);
>
Andrii Anisov April 23, 2019, 9:13 a.m. UTC | #2
Hello Dario,

On 20.04.19 18:24, Dario Faggioli wrote:
> In schedule(), if we pick, as the next vcpu to run (next) the same one
> that is running already (prev), we never get to call context_switch().

And what about `if ( prev != current )` in arm/domain.c:schedule_tail() ?

> 
> We can, therefore, get rid of all the `if`-s testing prev and next being
> different, trading them with an ASSERT() (on ARM, the ASSERT() was even
> already there!)> 
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> Suggested-by: Juergen Gross <jgross@suse.com>

For all below:

Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>


> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
>   xen/arch/arm/domain.c |    3 +--
>   xen/arch/x86/domain.c |   22 ++++++++--------------
>   2 files changed, 9 insertions(+), 16 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 6dc633ed50..915ae0b4c6 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -343,8 +343,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>       ASSERT(prev != next);
>       ASSERT(!vcpu_cpu_dirty(next));
>   
> -    if ( prev != next )
> -        update_runstate_area(prev);
> +    update_runstate_area(prev);
>   
>       local_irq_disable();
>   
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 9eaa978ce5..d2d9f2fc3c 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1721,6 +1721,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>       const struct domain *prevd = prev->domain, *nextd = next->domain;
>       unsigned int dirty_cpu = next->dirty_cpu;
>   
> +    ASSERT(prev != next);
>       ASSERT(local_irq_is_enabled());
>   
>       get_cpu_info()->use_pv_cr3 = false;
> @@ -1732,12 +1733,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>           flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
>       }
>   
> -    if ( prev != next )
> -    {
> -        _update_runstate_area(prev);
> -        vpmu_switch_from(prev);
> -        np2m_schedule(NP2M_SCHEDLE_OUT);
> -    }
> +    _update_runstate_area(prev);
> +    vpmu_switch_from(prev);
> +    np2m_schedule(NP2M_SCHEDLE_OUT);
>   
>       if ( is_hvm_domain(prevd) && !list_empty(&prev->arch.hvm.tm_list) )
>           pt_save_timer(prev);
> @@ -1794,14 +1792,10 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>   
>       context_saved(prev);
>   
> -    if ( prev != next )
> -    {
> -        _update_runstate_area(next);
> -
> -        /* Must be done with interrupts enabled */
> -        vpmu_switch_to(next);
> -        np2m_schedule(NP2M_SCHEDLE_IN);
> -    }
> +    _update_runstate_area(next);
> +    /* Must be done with interrupts enabled */
> +    vpmu_switch_to(next);
> +    np2m_schedule(NP2M_SCHEDLE_IN);
>   
>       /* Ensure that the vcpu has an up-to-date time base. */
>       update_vcpu_system_time(next);
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
>
Andrew Cooper April 23, 2019, 9:47 a.m. UTC | #3
On 23/04/2019 10:13, Andrii Anisov wrote:
> Hello Dario,
>
> On 20.04.19 18:24, Dario Faggioli wrote:
>> In schedule(), if we pick, as the next vcpu to run (next) the same one
>> that is running already (prev), we never get to call context_switch().
>
> And what about `if ( prev != current )` in arm/domain.c:schedule_tail() ?

schedule_tail() is called with current from the continue_running() path.

~Andrew
George Dunlap April 23, 2019, 9:50 a.m. UTC | #4
> On Apr 20, 2019, at 4:24 PM, Dario Faggioli <dfaggioli@suse.com> wrote:
> 
> In schedule(), if we pick, as the next vcpu to run (next) the same one
> that is running already (prev), we never get to call context_switch().
> 
> We can, therefore, get rid of all the `if`-s testing prev and next being
> different, trading them with an ASSERT() (on ARM, the ASSERT() was even
> already there!)

Keeping in mind that ASSERT() is merely a debugging aid: suppose that testing didn’t discover this, and a bug that violated this assumption slipped into production.  Would the patched code DTRT?

It sort of looks like the prev/next checking is an optimization, and that duplicate checking shouldn’t cause any problems.  Is that the case?

 -George
Jürgen Groß April 23, 2019, 9:56 a.m. UTC | #5
On 23/04/2019 11:50, George Dunlap wrote:
> 
> 
>> On Apr 20, 2019, at 4:24 PM, Dario Faggioli <dfaggioli@suse.com> wrote:
>>
>> In schedule(), if we pick, as the next vcpu to run (next) the same one
>> that is running already (prev), we never get to call context_switch().
>>
>> We can, therefore, get rid of all the `if`-s testing prev and next being
>> different, trading them with an ASSERT() (on ARM, the ASSERT() was even
>> already there!)
> 
> Keeping in mind that ASSERT() is merely a debugging aid: suppose that testing didn’t discover this, and a bug that violated this assumption slipped into production.  Would the patched code DTRT?

The unpatched code would already do something wrong: it would clear
v->is_running (through context_saved()) for the running vcpu,
potentially leading to all sorts of wrong decisions in the scheduling
code later.


Juergen
Andrew Cooper April 23, 2019, 10 a.m. UTC | #6
On 20/04/2019 16:24, Dario Faggioli wrote:
> In schedule(), if we pick, as the next vcpu to run (next) the same one
> that is running already (prev), we never get to call context_switch().
>
> We can, therefore, get rid of all the `if`-s testing prev and next being
> different, trading them with an ASSERT() (on ARM, the ASSERT() was even
> already there!)
>
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> Suggested-by: Juergen Gross <jgross@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

I'll pull this into x86-next given a complete set of R/A-by's

~Andrew
Andrii Anisov April 23, 2019, 10:33 a.m. UTC | #7
Hello Andrew,

On 23.04.19 12:47, Andrew Cooper wrote:
> schedule_tail() is called with current from the continue_running() path.
Not on ARM. On ARM continue_running() is empty.
For the quick check I've put a BUG() into the `else` branch, and did not hit it during system up, domain create/destroy, etc.
Dario Faggioli April 26, 2019, 12:58 p.m. UTC | #8
On Tue, 2019-04-23 at 11:56 +0200, Juergen Gross wrote:
> On 23/04/2019 11:50, George Dunlap wrote:
> > 
> > > On Apr 20, 2019, at 4:24 PM, Dario Faggioli <dfaggioli@suse.com>
> > > wrote:
> > > 
> > > We can, therefore, get rid of all the `if`-s testing prev and
> > > next being
> > > different, trading them with an ASSERT() (on ARM, the ASSERT()
> > > was even
> > > already there!)
> > 
> > Keeping in mind that ASSERT() is merely a debugging aid: suppose
> > that testing didn’t discover this, and a bug that violated this
> > assumption slipped into production.  Would the patched code DTRT?
> 
> The unpatched code would already do something wrong: it would clear
> v->is_running (through context_saved()) for the running vcpu,
> potentially leading to all sorts of wrong decisions in the scheduling
> code later.
> 
Exactly! Which basically means it's quite likely that, if we ever get
past the call to:

  return continue_running(prev)  (in schedule.c:schedule())

with next==prev, things will go very wrong, without the `if`-s that
this patch is killing being of much help.

And, yes, this is (micro) optimizing, but it's also an attempt to
improve the code, making it clear, for people paying with
context_switch(), that they don't have to deal with the special case of
prev and next being equal, while right now they may think they do.

Even if we decide to keep the `if`-s, I'd still like to have the ASSERT
() (and a comment explaining why we have both the assertion and the
checks).

I haven't done archaeology to find out when it was (if ever) that it
was ok to call context_switch() with next==prev. I can, and see about
adding what I find in the changelog, if you (George) think it could be
interesting.

Regards
Dario Faggioli April 26, 2019, 3:13 p.m. UTC | #9
On Tue, 2019-04-23 at 12:13 +0300, Andrii Anisov wrote:
> Hello Dario,
> 
> On 20.04.19 18:24, Dario Faggioli wrote:
> > In schedule(), if we pick, as the next vcpu to run (next) the same
> > one
> > that is running already (prev), we never get to call
> > context_switch().
> 
> And what about `if ( prev != current )` in
> arm/domain.c:schedule_tail() ?
> 
You're suggesting that's redundant too, aren't you?

From a quick look, it seems to me as well that it would be. I'm ok
taking care of it, not sure if in this patch, or in another one.

I'm open to suggestions from ARM maintainers...

Thanks and Regards
Julien Grall April 26, 2019, 4:22 p.m. UTC | #10
Hi Dario,

On 26/04/2019 16:13, Dario Faggioli wrote:
> On Tue, 2019-04-23 at 12:13 +0300, Andrii Anisov wrote:
>> Hello Dario,
>>
>> On 20.04.19 18:24, Dario Faggioli wrote:
>>> In schedule(), if we pick, as the next vcpu to run (next) the same
>>> one
>>> that is running already (prev), we never get to call
>>> context_switch().
>>
>> And what about `if ( prev != current )` in
>> arm/domain.c:schedule_tail() ?
>>
> You're suggesting that's redundant too, aren't you?
> 
>  From a quick look, it seems to me as well that it would be. I'm ok
> taking care of it, not sure if in this patch, or in another one.

I thought Andrew already queued this patch?

Anyway, I think you are right. We can drop the check in schedule_tail() as well.

Cheers,
Andrew Cooper April 26, 2019, 4:26 p.m. UTC | #11
On 26/04/2019 17:22, Julien Grall wrote:
> Hi Dario,
>
> On 26/04/2019 16:13, Dario Faggioli wrote:
>> On Tue, 2019-04-23 at 12:13 +0300, Andrii Anisov wrote:
>>> Hello Dario,
>>>
>>> On 20.04.19 18:24, Dario Faggioli wrote:
>>>> In schedule(), if we pick, as the next vcpu to run (next) the same
>>>> one
>>>> that is running already (prev), we never get to call
>>>> context_switch().
>>>
>>> And what about `if ( prev != current )` in
>>> arm/domain.c:schedule_tail() ?
>>>
>> You're suggesting that's redundant too, aren't you?
>>
>>  From a quick look, it seems to me as well that it would be. I'm ok
>> taking care of it, not sure if in this patch, or in another one.
>
> I thought Andrew already queued this patch?

I did.

>
> Anyway, I think you are right. We can drop the check in
> schedule_tail() as well.

That should really be a separate patch, because it is ARM-specific
scheduling behaviour.  The same is not true on x86.

~Andrew
Andrii Anisov May 8, 2019, 10:03 a.m. UTC | #12
Hello Dario,

Sorry for a delayed answer,

On 26.04.19 18:13, Dario Faggioli wrote:
> You're suggesting that's redundant too, aren't you?

Yes, I do.
And have pushed the patch [1].

[1] https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg00597.html
diff mbox series

Patch

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 6dc633ed50..915ae0b4c6 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -343,8 +343,7 @@  void context_switch(struct vcpu *prev, struct vcpu *next)
     ASSERT(prev != next);
     ASSERT(!vcpu_cpu_dirty(next));
 
-    if ( prev != next )
-        update_runstate_area(prev);
+    update_runstate_area(prev);
 
     local_irq_disable();
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9eaa978ce5..d2d9f2fc3c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1721,6 +1721,7 @@  void context_switch(struct vcpu *prev, struct vcpu *next)
     const struct domain *prevd = prev->domain, *nextd = next->domain;
     unsigned int dirty_cpu = next->dirty_cpu;
 
+    ASSERT(prev != next);
     ASSERT(local_irq_is_enabled());
 
     get_cpu_info()->use_pv_cr3 = false;
@@ -1732,12 +1733,9 @@  void context_switch(struct vcpu *prev, struct vcpu *next)
         flush_mask(cpumask_of(dirty_cpu), FLUSH_VCPU_STATE);
     }
 
-    if ( prev != next )
-    {
-        _update_runstate_area(prev);
-        vpmu_switch_from(prev);
-        np2m_schedule(NP2M_SCHEDLE_OUT);
-    }
+    _update_runstate_area(prev);
+    vpmu_switch_from(prev);
+    np2m_schedule(NP2M_SCHEDLE_OUT);
 
     if ( is_hvm_domain(prevd) && !list_empty(&prev->arch.hvm.tm_list) )
         pt_save_timer(prev);
@@ -1794,14 +1792,10 @@  void context_switch(struct vcpu *prev, struct vcpu *next)
 
     context_saved(prev);
 
-    if ( prev != next )
-    {
-        _update_runstate_area(next);
-
-        /* Must be done with interrupts enabled */
-        vpmu_switch_to(next);
-        np2m_schedule(NP2M_SCHEDLE_IN);
-    }
+    _update_runstate_area(next);
+    /* Must be done with interrupts enabled */
+    vpmu_switch_to(next);
+    np2m_schedule(NP2M_SCHEDLE_IN);
 
     /* Ensure that the vcpu has an up-to-date time base. */
     update_vcpu_system_time(next);