diff mbox series

sched: correct sched_move_domain()'s cleanup path

Message ID 2b59a3a2-d2f3-4d29-ab40-3f630fd497fe@suse.com (mailing list archive)
State New, archived
Headers show
Series sched: correct sched_move_domain()'s cleanup path | expand

Commit Message

Jan Beulich Dec. 4, 2023, 10:57 a.m. UTC
It is only in the error case that we want to clean up the new pool's
scheduler data; in the success case it's rather the old scheduler's
data which needs cleaning up.

Reported-by: René Winther Højgaard <renewin@proton.me>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

Comments

George Dunlap Dec. 4, 2023, 1 p.m. UTC | #1
On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> It is only in the error case that we want to clean up the new pool's
> scheduler data; in the success case it's rather the old scheduler's
> data which needs cleaning up.
>
> Reported-by: René Winther Højgaard <renewin@proton.me>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
>
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
>      for ( unit = old_units; unit; )
>      {
>          if ( unit->priv )
> -            sched_free_udata(c->sched, unit->priv);
> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
>          old_unit = unit;
>          unit = unit->next_in_list;
>          xfree(old_unit);

This code is unfortunately written in a "clever" way which seems to
have introduced some confusion.  The one place which calls "goto
out_free" goes through and replaces *most* of the "old_*" variables
with the "new" equivalents.  That's why we're iterating over
`old_units` even on the failure path.

The result is that this change doesn't catch another bug on the
following line, in the error case:

sched_free_domdata(old_ops, old_domdata);

At this point, old_ops is still the old ops, but old_domdata is the
*new* domdata.

A patch like the following (compile tested only) would fix it along
the lines of the original intent:
8<-------
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index eba0cea4bb..78f21839d3 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         {
             old_units = new_units;
             old_domdata = domdata;
+            old_ops = c->sched;
             ret = -ENOMEM;
             goto out_free;
         }
@@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     domain_unpause(d);

  out_free:
+    /*
+     * NB if we've jumped here, "old_units", "old_ops", and so on will
+     * actually be pointing to the new ops, since when aborting it's
+     * the new ops we want to free.
+     */
     for ( unit = old_units; unit; )
     {
         if ( unit->priv )
-            sched_free_udata(c->sched, unit->priv);
+            sched_free_udata(old_ops, unit->priv);
         old_unit = unit;
         unit = unit->next_in_list;
         xfree(old_unit);
---->8

But given that this kind of cleverness has already fooled two of our
most senior developers, I'd suggest making the whole thing more
explicit; something like the attached (again compile-tested only)?

 -George
Jürgen Groß Dec. 4, 2023, 1:47 p.m. UTC | #2
On 04.12.23 14:00, George Dunlap wrote:
> On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> It is only in the error case that we want to clean up the new pool's
>> scheduler data; in the success case it's rather the old scheduler's
>> data which needs cleaning up.
>>
>> Reported-by: René Winther Højgaard <renewin@proton.me>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
>>       for ( unit = old_units; unit; )
>>       {
>>           if ( unit->priv )
>> -            sched_free_udata(c->sched, unit->priv);
>> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
>>           old_unit = unit;
>>           unit = unit->next_in_list;
>>           xfree(old_unit);
> 
> This code is unfortunately written in a "clever" way which seems to
> have introduced some confusion.  The one place which calls "goto
> out_free" goes through and replaces *most* of the "old_*" variables
> with the "new" equivalents.  That's why we're iterating over
> `old_units` even on the failure path.
> 
> The result is that this change doesn't catch another bug on the
> following line, in the error case:
> 
> sched_free_domdata(old_ops, old_domdata);
> 
> At this point, old_ops is still the old ops, but old_domdata is the
> *new* domdata.
> 
> A patch like the following (compile tested only) would fix it along
> the lines of the original intent:
> 8<-------
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index eba0cea4bb..78f21839d3 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>           {
>               old_units = new_units;
>               old_domdata = domdata;
> +            old_ops = c->sched;
>               ret = -ENOMEM;
>               goto out_free;
>           }
> @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>       domain_unpause(d);
> 
>    out_free:
> +    /*
> +     * NB if we've jumped here, "old_units", "old_ops", and so on will
> +     * actually be pointing to the new ops, since when aborting it's
> +     * the new ops we want to free.
> +     */
>       for ( unit = old_units; unit; )
>       {
>           if ( unit->priv )
> -            sched_free_udata(c->sched, unit->priv);
> +            sched_free_udata(old_ops, unit->priv);
>           old_unit = unit;
>           unit = unit->next_in_list;
>           xfree(old_unit);
> ---->8
> 
> But given that this kind of cleverness has already fooled two of our
> most senior developers, I'd suggest making the whole thing more
> explicit; something like the attached (again compile-tested only)?

And I have again a third approach, making it crystal clear what is happening
with which data. No need to explain what is freed via which variables. See
attached patch.

Thoughts?


Juergen
George Dunlap Dec. 4, 2023, 1:55 p.m. UTC | #3
On Mon, Dec 4, 2023 at 1:47 PM Juergen Gross <jgross@suse.com> wrote:
>
> On 04.12.23 14:00, George Dunlap wrote:
> > On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> It is only in the error case that we want to clean up the new pool's
> >> scheduler data; in the success case it's rather the old scheduler's
> >> data which needs cleaning up.
> >>
> >> Reported-by: René Winther Højgaard <renewin@proton.me>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> Reviewed-by: Juergen Gross <jgross@suse.com>
> >>
> >> --- a/xen/common/sched/core.c
> >> +++ b/xen/common/sched/core.c
> >> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
> >>       for ( unit = old_units; unit; )
> >>       {
> >>           if ( unit->priv )
> >> -            sched_free_udata(c->sched, unit->priv);
> >> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
> >>           old_unit = unit;
> >>           unit = unit->next_in_list;
> >>           xfree(old_unit);
> >
> > This code is unfortunately written in a "clever" way which seems to
> > have introduced some confusion.  The one place which calls "goto
> > out_free" goes through and replaces *most* of the "old_*" variables
> > with the "new" equivalents.  That's why we're iterating over
> > `old_units` even on the failure path.
> >
> > The result is that this change doesn't catch another bug on the
> > following line, in the error case:
> >
> > sched_free_domdata(old_ops, old_domdata);
> >
> > At this point, old_ops is still the old ops, but old_domdata is the
> > *new* domdata.
> >
> > A patch like the following (compile tested only) would fix it along
> > the lines of the original intent:
> > 8<-------
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index eba0cea4bb..78f21839d3 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
> >           {
> >               old_units = new_units;
> >               old_domdata = domdata;
> > +            old_ops = c->sched;
> >               ret = -ENOMEM;
> >               goto out_free;
> >           }
> > @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
> >       domain_unpause(d);
> >
> >    out_free:
> > +    /*
> > +     * NB if we've jumped here, "old_units", "old_ops", and so on will
> > +     * actually be pointing to the new ops, since when aborting it's
> > +     * the new ops we want to free.
> > +     */
> >       for ( unit = old_units; unit; )
> >       {
> >           if ( unit->priv )
> > -            sched_free_udata(c->sched, unit->priv);
> > +            sched_free_udata(old_ops, unit->priv);
> >           old_unit = unit;
> >           unit = unit->next_in_list;
> >           xfree(old_unit);
> > ---->8
> >
> > But given that this kind of cleverness has already fooled two of our
> > most senior developers, I'd suggest making the whole thing more
> > explicit; something like the attached (again compile-tested only)?
>
> And I have again a third approach, making it crystal clear what is happening
> with which data. No need to explain what is freed via which variables. See
> attached patch.
>
> Thoughts?

I only see a PGP key and signature.  Did you forget to attach the patch?

 -George
Jürgen Groß Dec. 4, 2023, 2:10 p.m. UTC | #4
On 04.12.23 14:00, George Dunlap wrote:
> On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> It is only in the error case that we want to clean up the new pool's
>> scheduler data; in the success case it's rather the old scheduler's
>> data which needs cleaning up.
>>
>> Reported-by: René Winther Højgaard <renewin@proton.me>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
>>       for ( unit = old_units; unit; )
>>       {
>>           if ( unit->priv )
>> -            sched_free_udata(c->sched, unit->priv);
>> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
>>           old_unit = unit;
>>           unit = unit->next_in_list;
>>           xfree(old_unit);
> 
> This code is unfortunately written in a "clever" way which seems to
> have introduced some confusion.  The one place which calls "goto
> out_free" goes through and replaces *most* of the "old_*" variables
> with the "new" equivalents.  That's why we're iterating over
> `old_units` even on the failure path.
> 
> The result is that this change doesn't catch another bug on the
> following line, in the error case:
> 
> sched_free_domdata(old_ops, old_domdata);
> 
> At this point, old_ops is still the old ops, but old_domdata is the
> *new* domdata.
> 
> A patch like the following (compile tested only) would fix it along
> the lines of the original intent:
> 8<-------
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index eba0cea4bb..78f21839d3 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>           {
>               old_units = new_units;
>               old_domdata = domdata;
> +            old_ops = c->sched;
>               ret = -ENOMEM;
>               goto out_free;
>           }
> @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>       domain_unpause(d);
> 
>    out_free:
> +    /*
> +     * NB if we've jumped here, "old_units", "old_ops", and so on will
> +     * actually be pointing to the new ops, since when aborting it's
> +     * the new ops we want to free.
> +     */
>       for ( unit = old_units; unit; )
>       {
>           if ( unit->priv )
> -            sched_free_udata(c->sched, unit->priv);
> +            sched_free_udata(old_ops, unit->priv);
>           old_unit = unit;
>           unit = unit->next_in_list;
>           xfree(old_unit);
> ---->8
> 
> But given that this kind of cleverness has already fooled two of our
> most senior developers, I'd suggest making the whole thing more
> explicit; something like the attached (again compile-tested only)?

And I have again a third approach, making it crystal clear what is happening
with which data. No need to explain what is freed via which variables. See
attached patch (this time it should be really there).

Thoughts?


Juergen
George Dunlap Dec. 4, 2023, 2:18 p.m. UTC | #5
On Mon, Dec 4, 2023 at 2:10 PM Juergen Gross <jgross@suse.com> wrote:
>
> On 04.12.23 14:00, George Dunlap wrote:
> > On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> It is only in the error case that we want to clean up the new pool's
> >> scheduler data; in the success case it's rather the old scheduler's
> >> data which needs cleaning up.
> >>
> >> Reported-by: René Winther Højgaard <renewin@proton.me>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> Reviewed-by: Juergen Gross <jgross@suse.com>
> >>
> >> --- a/xen/common/sched/core.c
> >> +++ b/xen/common/sched/core.c
> >> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
> >>       for ( unit = old_units; unit; )
> >>       {
> >>           if ( unit->priv )
> >> -            sched_free_udata(c->sched, unit->priv);
> >> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
> >>           old_unit = unit;
> >>           unit = unit->next_in_list;
> >>           xfree(old_unit);
> >
> > This code is unfortunately written in a "clever" way which seems to
> > have introduced some confusion.  The one place which calls "goto
> > out_free" goes through and replaces *most* of the "old_*" variables
> > with the "new" equivalents.  That's why we're iterating over
> > `old_units` even on the failure path.
> >
> > The result is that this change doesn't catch another bug on the
> > following line, in the error case:
> >
> > sched_free_domdata(old_ops, old_domdata);
> >
> > At this point, old_ops is still the old ops, but old_domdata is the
> > *new* domdata.
> >
> > A patch like the following (compile tested only) would fix it along
> > the lines of the original intent:
> > 8<-------
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index eba0cea4bb..78f21839d3 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
> >           {
> >               old_units = new_units;
> >               old_domdata = domdata;
> > +            old_ops = c->sched;
> >               ret = -ENOMEM;
> >               goto out_free;
> >           }
> > @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
> >       domain_unpause(d);
> >
> >    out_free:
> > +    /*
> > +     * NB if we've jumped here, "old_units", "old_ops", and so on will
> > +     * actually be pointing to the new ops, since when aborting it's
> > +     * the new ops we want to free.
> > +     */
> >       for ( unit = old_units; unit; )
> >       {
> >           if ( unit->priv )
> > -            sched_free_udata(c->sched, unit->priv);
> > +            sched_free_udata(old_ops, unit->priv);
> >           old_unit = unit;
> >           unit = unit->next_in_list;
> >           xfree(old_unit);
> > ---->8
> >
> > But given that this kind of cleverness has already fooled two of our
> > most senior developers, I'd suggest making the whole thing more
> > explicit; something like the attached (again compile-tested only)?
>
> And I have again a third approach, making it crystal clear what is happening
> with which data. No need to explain what is freed via which variables. See
> attached patch (this time it should be really there).

Yes, I thought about making a function as well -- that works for me too.

Personally I prefer to keep the "goto out", rather than duplicating
the rcu_read_unlock().  I'd yield if Jan said he preferred
duplication, however.

 -George
Jan Beulich Dec. 4, 2023, 2:38 p.m. UTC | #6
On 04.12.2023 15:10, Juergen Gross wrote:
> And I have again a third approach, making it crystal clear what is happening
> with which data. No need to explain what is freed via which variables. See
> attached patch (this time it should be really there).

Looks more neat to me than George's. Just one minor thing: Please can the
first parameter of sched_move_domain_cleanup() be constified?

Jan
Jan Beulich Dec. 4, 2023, 2:39 p.m. UTC | #7
On 04.12.2023 15:18, George Dunlap wrote:
> On Mon, Dec 4, 2023 at 2:10 PM Juergen Gross <jgross@suse.com> wrote:
>>
>> On 04.12.23 14:00, George Dunlap wrote:
>>> On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> It is only in the error case that we want to clean up the new pool's
>>>> scheduler data; in the success case it's rather the old scheduler's
>>>> data which needs cleaning up.
>>>>
>>>> Reported-by: René Winther Højgaard <renewin@proton.me>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>>>
>>>> --- a/xen/common/sched/core.c
>>>> +++ b/xen/common/sched/core.c
>>>> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
>>>>       for ( unit = old_units; unit; )
>>>>       {
>>>>           if ( unit->priv )
>>>> -            sched_free_udata(c->sched, unit->priv);
>>>> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
>>>>           old_unit = unit;
>>>>           unit = unit->next_in_list;
>>>>           xfree(old_unit);
>>>
>>> This code is unfortunately written in a "clever" way which seems to
>>> have introduced some confusion.  The one place which calls "goto
>>> out_free" goes through and replaces *most* of the "old_*" variables
>>> with the "new" equivalents.  That's why we're iterating over
>>> `old_units` even on the failure path.
>>>
>>> The result is that this change doesn't catch another bug on the
>>> following line, in the error case:
>>>
>>> sched_free_domdata(old_ops, old_domdata);
>>>
>>> At this point, old_ops is still the old ops, but old_domdata is the
>>> *new* domdata.
>>>
>>> A patch like the following (compile tested only) would fix it along
>>> the lines of the original intent:
>>> 8<-------
>>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>>> index eba0cea4bb..78f21839d3 100644
>>> --- a/xen/common/sched/core.c
>>> +++ b/xen/common/sched/core.c
>>> @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>>           {
>>>               old_units = new_units;
>>>               old_domdata = domdata;
>>> +            old_ops = c->sched;
>>>               ret = -ENOMEM;
>>>               goto out_free;
>>>           }
>>> @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>>       domain_unpause(d);
>>>
>>>    out_free:
>>> +    /*
>>> +     * NB if we've jumped here, "old_units", "old_ops", and so on will
>>> +     * actually be pointing to the new ops, since when aborting it's
>>> +     * the new ops we want to free.
>>> +     */
>>>       for ( unit = old_units; unit; )
>>>       {
>>>           if ( unit->priv )
>>> -            sched_free_udata(c->sched, unit->priv);
>>> +            sched_free_udata(old_ops, unit->priv);
>>>           old_unit = unit;
>>>           unit = unit->next_in_list;
>>>           xfree(old_unit);
>>> ---->8
>>>
>>> But given that this kind of cleverness has already fooled two of our
>>> most senior developers, I'd suggest making the whole thing more
>>> explicit; something like the attached (again compile-tested only)?
>>
>> And I have again a third approach, making it crystal clear what is happening
>> with which data. No need to explain what is freed via which variables. See
>> attached patch (this time it should be really there).
> 
> Yes, I thought about making a function as well -- that works for me too.
> 
> Personally I prefer to keep the "goto out", rather than duplicating
> the rcu_read_unlock().  I'd yield if Jan said he preferred
> duplication, however.

I'm on the edge there actually.

Jan
Jürgen Groß Dec. 4, 2023, 2:40 p.m. UTC | #8
On 04.12.23 15:38, Jan Beulich wrote:
> On 04.12.2023 15:10, Juergen Gross wrote:
>> And I have again a third approach, making it crystal clear what is happening
>> with which data. No need to explain what is freed via which variables. See
>> attached patch (this time it should be really there).
> 
> Looks more neat to me than George's. Just one minor thing: Please can the
> first parameter of sched_move_domain_cleanup() be constified?

Yes, will do that.

I'll send out V2 soon together with the other fix (this probably wants
an update of the commit message) and a small cleanup patch I have.


Juergen
Jürgen Groß Dec. 4, 2023, 2:44 p.m. UTC | #9
On 04.12.23 15:39, Jan Beulich wrote:
> On 04.12.2023 15:18, George Dunlap wrote:
>> On Mon, Dec 4, 2023 at 2:10 PM Juergen Gross <jgross@suse.com> wrote:
>>>
>>> On 04.12.23 14:00, George Dunlap wrote:
>>>> On Mon, Dec 4, 2023 at 10:57 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>
>>>>> It is only in the error case that we want to clean up the new pool's
>>>>> scheduler data; in the success case it's rather the old scheduler's
>>>>> data which needs cleaning up.
>>>>>
>>>>> Reported-by: René Winther Højgaard <renewin@proton.me>
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>>>>
>>>>> --- a/xen/common/sched/core.c
>>>>> +++ b/xen/common/sched/core.c
>>>>> @@ -810,7 +810,7 @@ int sched_move_domain(struct domain *d,
>>>>>        for ( unit = old_units; unit; )
>>>>>        {
>>>>>            if ( unit->priv )
>>>>> -            sched_free_udata(c->sched, unit->priv);
>>>>> +            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
>>>>>            old_unit = unit;
>>>>>            unit = unit->next_in_list;
>>>>>            xfree(old_unit);
>>>>
>>>> This code is unfortunately written in a "clever" way which seems to
>>>> have introduced some confusion.  The one place which calls "goto
>>>> out_free" goes through and replaces *most* of the "old_*" variables
>>>> with the "new" equivalents.  That's why we're iterating over
>>>> `old_units` even on the failure path.
>>>>
>>>> The result is that this change doesn't catch another bug on the
>>>> following line, in the error case:
>>>>
>>>> sched_free_domdata(old_ops, old_domdata);
>>>>
>>>> At this point, old_ops is still the old ops, but old_domdata is the
>>>> *new* domdata.
>>>>
>>>> A patch like the following (compile tested only) would fix it along
>>>> the lines of the original intent:
>>>> 8<-------
>>>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>>>> index eba0cea4bb..78f21839d3 100644
>>>> --- a/xen/common/sched/core.c
>>>> +++ b/xen/common/sched/core.c
>>>> @@ -720,6 +720,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>>>            {
>>>>                old_units = new_units;
>>>>                old_domdata = domdata;
>>>> +            old_ops = c->sched;
>>>>                ret = -ENOMEM;
>>>>                goto out_free;
>>>>            }
>>>> @@ -809,10 +810,15 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>>>        domain_unpause(d);
>>>>
>>>>     out_free:
>>>> +    /*
>>>> +     * NB if we've jumped here, "old_units", "old_ops", and so on will
>>>> +     * actually be pointing to the new ops, since when aborting it's
>>>> +     * the new ops we want to free.
>>>> +     */
>>>>        for ( unit = old_units; unit; )
>>>>        {
>>>>            if ( unit->priv )
>>>> -            sched_free_udata(c->sched, unit->priv);
>>>> +            sched_free_udata(old_ops, unit->priv);
>>>>            old_unit = unit;
>>>>            unit = unit->next_in_list;
>>>>            xfree(old_unit);
>>>> ---->8
>>>>
>>>> But given that this kind of cleverness has already fooled two of our
>>>> most senior developers, I'd suggest making the whole thing more
>>>> explicit; something like the attached (again compile-tested only)?
>>>
>>> And I have again a third approach, making it crystal clear what is happening
>>> with which data. No need to explain what is freed via which variables. See
>>> attached patch (this time it should be really there).
>>
>> Yes, I thought about making a function as well -- that works for me too.
>>
>> Personally I prefer to keep the "goto out", rather than duplicating
>> the rcu_read_unlock().  I'd yield if Jan said he preferred
>> duplication, however.
> 
> I'm on the edge there actually.

In this case I'd prefer it my way, as it avoids having to scroll down to the
out: label to see what is happening there. Additionally it enables to get rid
of the ret variable.

In the end the main part of the patch is the new function, so I'm not really
feeling strong regarding the dropping of "goto out".


Juergen
George Dunlap Dec. 4, 2023, 7:33 p.m. UTC | #10
On Mon, Dec 4, 2023 at 2:44 PM Juergen Gross <jgross@suse.com> wrote:
> >> Personally I prefer to keep the "goto out", rather than duplicating
> >> the rcu_read_unlock().  I'd yield if Jan said he preferred
> >> duplication, however.
> >
> > I'm on the edge there actually.
>
> In this case I'd prefer it my way, as it avoids having to scroll down to the
> out: label to see what is happening there. Additionally it enables to get rid
> of the ret variable.

The issue is, suppose we change something else, like needing to grab
(and release) another lock?  Sharing the exit path makes it easier to
avoid those kinds of mistakes.

 -George
Jürgen Groß Dec. 5, 2023, 7:26 a.m. UTC | #11
On 04.12.23 20:33, George Dunlap wrote:
> On Mon, Dec 4, 2023 at 2:44 PM Juergen Gross <jgross@suse.com> wrote:
>>>> Personally I prefer to keep the "goto out", rather than duplicating
>>>> the rcu_read_unlock().  I'd yield if Jan said he preferred
>>>> duplication, however.
>>>
>>> I'm on the edge there actually.
>>
>> In this case I'd prefer it my way, as it avoids having to scroll down to the
>> out: label to see what is happening there. Additionally it enables to get rid
>> of the ret variable.
> 
> The issue is, suppose we change something else, like needing to grab
> (and release) another lock?  Sharing the exit path makes it easier to
> avoid those kinds of mistakes.

Yes, this could happen. OTOH it could happen that an action is added
on the exit path which should _not_ be executed in the error case.

I agree that in case of more than one exit action needed the goto approach
is superior. For zero exit actions it is inferior, while for one it will
depend on the specific case IMHO.


Juergen
diff mbox series

Patch

--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -810,7 +810,7 @@  int sched_move_domain(struct domain *d,
     for ( unit = old_units; unit; )
     {
         if ( unit->priv )
-            sched_free_udata(c->sched, unit->priv);
+            sched_free_udata(ret ? c->sched : old_ops, unit->priv);
         old_unit = unit;
         unit = unit->next_in_list;
         xfree(old_unit);