
sched/core: Fix bug when moving a domain between cpupools

Message ID 20200327193023.506-1-jeff.kubascik@dornerworks.com
State New, archived
Series sched/core: Fix bug when moving a domain between cpupools

Commit Message

Jeff Kubascik March 27, 2020, 7:30 p.m. UTC
For each UNIT, sched_set_affinity is called before unit->priv is updated
to the new cpupool private UNIT data structure. The issue is that
sched_set_affinity calls the adjust_affinity method of the new cpupool's
scheduler. If that method is defined, it may use unit->priv (as credit
does), which at this point still references the old cpupool's private
UNIT data structure.

This change fixes the bug by moving the switch of unit->priv earlier in
the function.

Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
---
Hello,

I've been working on updating the arinc653 scheduler to support
multicore for a few months now. In the process of testing, I came across
this obscure bug in the core scheduler code that took me a few weeks to
track down. This bug resulted in the credit scheduler writing past the
end of the arinc653 private UNIT data structure into the TLSF allocator
bhdr structure of the adjacent region. This required some deep diving
into the TLSF allocator code to trace the bug back to this point.

Sincerely,
Jeff Kubascik
---
 xen/common/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jeff Kubascik April 14, 2020, 8:52 p.m. UTC | #1
Hello,

I wanted to follow up on this patch, as I have not seen any responses to it.

In my work on the ARINC653 scheduler, I have observed this bug writing to memory
past the end of a private UNIT structure (in my case, stomping on a TLSF
allocator header) when migrating a domain from an ARINC cpupool to a credit
cpupool. This occurs because (a) the private UNIT structure is smaller for the
ARINC cpupool and (b) the credit scheduler method csched_aff_cntl sets and
clears bits in its private UNIT structure while the private UNIT pointer still
incorrectly points to the ARINC cpupool's structure.

Sincerely,
Jeff Kubascik
Jürgen Groß April 15, 2020, 9:08 a.m. UTC | #2
On 27.03.20 20:30, Jeff Kubascik wrote:
> For each UNIT, sched_set_affinity is called before unit->priv is updated
> to the new cpupool private UNIT data structure. The issue is
> sched_set_affinity will call the adjust_affinity method of the cpupool.
> If defined, the new cpupool may use unit->priv (e.g. credit), which at
> this point still references the old cpupool private UNIT data structure.
> 
> This change fixes the bug by moving the switch of unit->priv earlier in
> the function.
> 
> Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen
Dario Faggioli April 16, 2020, 4:09 p.m. UTC | #3
On Wed, 2020-04-15 at 11:08 +0200, Jürgen Groß wrote:
> Reviewed-by: Juergen Gross <jgross@suse.com>
>
Acked-by: Dario Faggioli <dfaggioli@suse.com>

Sorry it took a while. And thanks Juergen for also having a look.

Regards

Jeff Kubascik April 20, 2020, 1:42 p.m. UTC | #4
Thank you Juergen and Dario!

On 4/16/2020 12:09 PM, Dario Faggioli wrote:
> On Wed, 2020-04-15 at 11:08 +0200, Jürgen Groß wrote:
>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>
> Acked-by: Dario Faggioli <dfaggioli@suse.com>
> 
> Sorry it took a while. And thanks Juergen for also having a look.
> 
> Regards
> 

Sincerely,
Jeff Kubascik

Patch

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 7e8e7d2c39..ea572a345a 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -686,6 +686,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         unsigned int unit_p = new_p;
 
         unitdata = unit->priv;
+        unit->priv = unit_priv[unit_idx];
 
         for_each_sched_unit_vcpu ( unit, v )
         {
@@ -707,7 +708,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
          */
         spin_unlock_irq(lock);
 
-        unit->priv = unit_priv[unit_idx];
         if ( !d->is_dying )
             sched_move_irqs(unit);