mbox series

[0/4,v4] bfq: Avoid use-after-free when moving processes between cgroups

Message ID 20220114164215.28972-1-jack@suse.cz (mailing list archive)
Headers show
Series bfq: Avoid use-after-free when moving processes between cgroups | expand

Message

Jan Kara Jan. 14, 2022, 5:01 p.m. UTC
Hello,

here is the third version of my patches to fix use-after-free issues in BFQ
when processes with merged queues get moved to different cgroups. The patches
have survived some beating in my test VM, but so far I fail to reproduce the
original KASAN reports so testing from people who can reproduce them is most
welcome. Kuai, can you please give these patches a run in your setup? Thanks
a lot for your help with fixing this!

Changes since v3:
* Changed handling of bfq group move to handle the case when target of the
  merge has moved.

Changes since v2:
* Improved handling of bfq queue splitting on move between cgroups
* Removed broken change to bfq_put_cooperator()

Changes since v1:
* Added fix for bfq_put_cooperator()
* Added fix to handle move between cgroups in bfq_merge_bio()

								Honza
Previous versions:
Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1
Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2
Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3

Comments

Yu Kuai Jan. 17, 2022, 8:13 a.m. UTC | #1
在 2022/01/15 1:01, Jan Kara 写道:
> Hello,
> 
> here is the third version of my patches to fix use-after-free issues in BFQ
> when processes with merged queues get moved to different cgroups. The patches
> have survived some beating in my test VM, but so far I fail to reproduce the
> original KASAN reports so testing from people who can reproduce them is most
> welcome. Kuai, can you please give these patches a run in your setup? Thanks
> a lot for your help with fixing this!
> 
> Changes since v3:
> * Changed handling of bfq group move to handle the case when target of the
>    merge has moved.
Hi, Jan

The problem can still be reporduced...

Do you implement this in patch 4? If so, I don't understand how you
chieve this.

For example: 3 bfqqs were merged:
q1->q2->q3

If __bfq_bic_change_cgroup() is called with q2, the problem can be
fixed. However, if __bfq_bic_change_cgroup() is called with q3, can
the problem be fixed?

Thanks,
Kuai
> 
> Changes since v2:
> * Improved handling of bfq queue splitting on move between cgroups
> * Removed broken change to bfq_put_cooperator()
> 
> Changes since v1:
> * Added fix for bfq_put_cooperator()
> * Added fix to handle move between cgroups in bfq_merge_bio()
> 
> 								Honza
> Previous versions:
> Link: http://lore.kernel.org/r/20211223171425.3551-1-jack@suse.cz # v1
> Link: http://lore.kernel.org/r/20220105143037.20542-1-jack@suse.cz # v2
> Link: http://lore.kernel.org/r/20220112113529.6355-1-jack@suse.cz # v3
> .
>
Jan Kara Jan. 17, 2022, 11:45 a.m. UTC | #2
On Mon 17-01-22 16:13:09, yukuai (C) wrote:
> 在 2022/01/15 1:01, Jan Kara 写道:
> > Hello,
> > 
> > here is the third version of my patches to fix use-after-free issues in BFQ
> > when processes with merged queues get moved to different cgroups. The patches
> > have survived some beating in my test VM, but so far I fail to reproduce the
> > original KASAN reports so testing from people who can reproduce them is most
> > welcome. Kuai, can you please give these patches a run in your setup? Thanks
> > a lot for your help with fixing this!
> > 
> > Changes since v3:
> > * Changed handling of bfq group move to handle the case when target of the
> >    merge has moved.
> Hi, Jan
> 
> The problem can still be reporduced...

Drat. Thanks for testing.

> Do you implement this in patch 4? If so, I don't understand how you
> chieve this.

Yes, patch 4 should be handling this.

> For example: 3 bfqqs were merged:
> q1->q2->q3
> 
> If __bfq_bic_change_cgroup() is called with q2, the problem can be
> fixed. However, if __bfq_bic_change_cgroup() is called with q3, can
> the problem be fixed?

Maybe I'm missing something so let's walk through your example where the
bio is submitted for a task attached to q3. Bio is submitted,
__bfq_bic_change_cgroup() is called with sync_bfqq == q3, the loop

               while (sync_bfqq->new_bfqq)
                       sync_bfqq = sync_bfqq->new_bfqq;

won't do anything. Then we check:

               if (sync_bfqq->entity.sched_data != &bfqg->sched_data) {

This should be true because the task got moved and the bio is thus now
submitted for a different cgroup. Then we get to the condition:

                       if (orig_bfqq != sync_bfqq || bfq_bfqq_coop(sync_bfqq))

orig_bfqq == sync_bfqq so that won't help but the idea was that
bfq_bfqq_coop() would trigger in this case so we detach the process from q3
(through bic_set_bfqq(bic, NULL, 1)) and create new queue in the right
cgroup. Eventually, all the processes would detach from q3 and it would get
destroyed. So why do you think this scheme will not work?

And even if q3 is not marked as coop (which should not happen when there
are other queues merged to it), we would move q3 to a cgroup for which IO
is submitted - i.e., a one which is alive.

Hum, but maybe with the above merge setup (q1->q2->q3) if
__bfq_bic_change_cgroup() gets called for q1, q3 is already moved to the root
cgroup by bfq_pd_offline() and bio is for the root cgroup, then we decide
to keep everything as is. But bfq_setup_cooperator() will return q2 and
we queue IO there and q2 may still be pointing to a dead cgroup (but q2
didn't get reparented because it was not in any of the service trees). So
perhaps attached fixup will help?

								Honza
Jan Kara Jan. 17, 2022, 3:46 p.m. UTC | #3
On Mon 17-01-22 22:16:23, yukuai (C) wrote:
> 在 2022/01/17 19:45, Jan Kara 写道:
> > On Mon 17-01-22 16:13:09, yukuai (C) wrote:
> > > 在 2022/01/15 1:01, Jan Kara 写道:
> > > > Hello,
> > > > 
> > > > here is the third version of my patches to fix use-after-free issues in BFQ
> > > > when processes with merged queues get moved to different cgroups. The patches
> > > > have survived some beating in my test VM, but so far I fail to reproduce the
> > > > original KASAN reports so testing from people who can reproduce them is most
> > > > welcome. Kuai, can you please give these patches a run in your setup? Thanks
> > > > a lot for your help with fixing this!
> > > > 
> > > > Changes since v3:
> > > > * Changed handling of bfq group move to handle the case when target of the
> > > >     merge has moved.
> > > Hi, Jan
> > > 
> > > The problem can still be reporduced...
> > 
> > Drat. Thanks for testing.
> > 
> > > Do you implement this in patch 4? If so, I don't understand how you
> > > chieve this.
> > 
> > Yes, patch 4 should be handling this.
> > 
> > > For example: 3 bfqqs were merged:
> > > q1->q2->q3
> > > 
> > > If __bfq_bic_change_cgroup() is called with q2, the problem can be
> > > fixed. However, if __bfq_bic_change_cgroup() is called with q3, can
> > > the problem be fixed?
> > 
> > Maybe I'm missing something so let's walk through your example where the
> > bio is submitted for a task attached to q3. Bio is submitted,
> > __bfq_bic_change_cgroup() is called with sync_bfqq == q3, the loop
> > 
> >                 while (sync_bfqq->new_bfqq)
> >                         sync_bfqq = sync_bfqq->new_bfqq;
> > 
> > won't do anything. Then we check:
> > 
> >                 if (sync_bfqq->entity.sched_data != &bfqg->sched_data) {
> > 
> > This should be true because the task got moved and the bio is thus now
> > submitted for a different cgroup. Then we get to the condition:
> > 
> >                         if (orig_bfqq != sync_bfqq || bfq_bfqq_coop(sync_bfqq))
> > 
> > orig_bfqq == sync_bfqq so that won't help but the idea was that
> > bfq_bfqq_coop() would trigger in this case so we detach the process from q3
> > (through bic_set_bfqq(bic, NULL, 1)) and create new queue in the right
> > cgroup. Eventually, all the processes would detach from q3 and it would get
> > destroyed. So why do you think this scheme will not work?
> 
> Hi, Jan
> 
> I repoduced again with some debug info:

Thanks for your help with debugging!

> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index f6f5f156b9f2..f62ebc4bbe56 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -732,8 +732,11 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct
> bfq_data *bfqd,
>                 struct bfq_queue *orig_bfqq = sync_bfqq;
> 
>                 /* Traverse the merge chain to bfqq we will be using */
> -               while (sync_bfqq->new_bfqq)
> +               while (sync_bfqq->new_bfqq) {
> +                       printk("%s: bfqq %px, new_bfqq %px\n", __func__,
> +                                       sync_bfqq, sync_bfqq->new_bfqq);
>                         sync_bfqq = sync_bfqq->new_bfqq;
> +               }
>                 /*
>                  * Target bfqq got moved to a different cgroup or this
> process
>                  * started submitting bios for different cgroup?
> @@ -756,6 +759,8 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct
> bfq_data *bfqd,
>                                 bic_set_bfqq(bic, NULL, 1);

Maybe it would be nice to add a debug message here, saying we are detaching
the process from orig_bfqq. Just that we know that this branch executed.

>                                 return bfqg;
>                         }
> +                       printk("%s: bfqq %px move from %px to %px\n",
> __func__,
> +                               sync_bfqq, bfqq_group(sync_bfqq), bfqg);
>                         /* We are the only user of this bfqq, just move it
> */
>                         bfq_bfqq_move(bfqd, sync_bfqq, bfqg);

...

> -       if (bfqq->new_bfqq)
> +       if (bfqq->new_bfqq) {
> +               if (bfqq->entity.parent != bfqq->new_bfqq->entity.parent) {
> +                       printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n",
> __func__,
> +                               bfqq, bfqq_group(bfqq), bfqq->new_bfqq,
> +                               bfqq_group(bfqq->new_bfqq));
> +                       BUG_ON(1);

OK, so I can see why this triggered. We have:

Set new_bfqq for bfqq *eec0:
[   50.207263] bfq_setup_merge: set bfqq ffff88816b16eec0 new_bfqq ffff8881133e1340 parent ffff88814380b000/ffff88814380b000
Move new_bfqq to a root cgroup:
[   50.485933] bfq_reparent_leaf_entity: bfqq ffff8881133e1340 move from ffff88814380b000 to ffff88810b1f1000
Submit bio for root cgroup through *eec0:
[   50.519910] __bfq_bic_change_cgroup: bfqq ffff88816b16eec0, new_bfqq ffff8881133e1340
__bfq_bic_change_cgroup() does nothing as the bio is for the cgroup to which
target queue belongs.
[   50.520640] bfq_setup_cooperator: bfqq ffff88816b16eec0(ffff88814380b000) new_bfqq ffff8881133e1340(ffff88810b1f1000)
The BUG_ON trips. 

So at this moment the BUG_ON was just too eager to trigger as we would be
submitting IO to a bfq queue belonging to an appropriate cgroup. But I
guess it does not make sence to keep unfinished merges over cgroup
migration and __bfq_bic_change_cgroup() should have detached task from its
bfqq anyway. Can you please try running with a new version of patch 4 which
is attached? And perhaps with your debug patch as well... Thanks!

								Honza
Jan Kara Jan. 21, 2022, 10:58 a.m. UTC | #4
On Tue 18-01-22 11:00:43, yukuai (C) wrote:
> 在 2022/01/17 23:46, Jan Kara 写道:
> > On Mon 17-01-22 22:16:23, yukuai (C) wrote:
> > > 在 2022/01/17 19:45, Jan Kara 写道:
> > > > On Mon 17-01-22 16:13:09, yukuai (C) wrote:
> > > > > 在 2022/01/15 1:01, Jan Kara 写道:
> > > > > > Hello,
> > > > > > 
> > > > > > here is the third version of my patches to fix use-after-free issues in BFQ
> > > > > > when processes with merged queues get moved to different cgroups. The patches
> > > > > > have survived some beating in my test VM, but so far I fail to reproduce the
> > > > > > original KASAN reports so testing from people who can reproduce them is most
> > > > > > welcome. Kuai, can you please give these patches a run in your setup? Thanks
> > > > > > a lot for your help with fixing this!
> > > > > > 
> > > > > > Changes since v3:
> > > > > > * Changed handling of bfq group move to handle the case when target of the
> > > > > >      merge has moved.
> > > > > Hi, Jan
> > > > > 
> > > > > The problem can still be reporduced...
> > > > 
> > > > Drat. Thanks for testing.
> > > > 
> > > > > Do you implement this in patch 4? If so, I don't understand how you
> > > > > chieve this.
> > > > 
> > > > Yes, patch 4 should be handling this.
> > > > 
> > > > > For example: 3 bfqqs were merged:
> > > > > q1->q2->q3
> > > > > 
> > > > > If __bfq_bic_change_cgroup() is called with q2, the problem can be
> > > > > fixed. However, if __bfq_bic_change_cgroup() is called with q3, can
> > > > > the problem be fixed?
> > > > 
> > > > Maybe I'm missing something so let's walk through your example where the
> > > > bio is submitted for a task attached to q3. Bio is submitted,
> > > > __bfq_bic_change_cgroup() is called with sync_bfqq == q3, the loop
> > > > 
> > > >                  while (sync_bfqq->new_bfqq)
> > > >                          sync_bfqq = sync_bfqq->new_bfqq;
> > > > 
> > > > won't do anything. Then we check:
> > > > 
> > > >                  if (sync_bfqq->entity.sched_data != &bfqg->sched_data) {
> > > > 
> > > > This should be true because the task got moved and the bio is thus now
> > > > submitted for a different cgroup. Then we get to the condition:
> > > > 
> > > >                          if (orig_bfqq != sync_bfqq || bfq_bfqq_coop(sync_bfqq))
> > > > 
> > > > orig_bfqq == sync_bfqq so that won't help but the idea was that
> > > > bfq_bfqq_coop() would trigger in this case so we detach the process from q3
> > > > (through bic_set_bfqq(bic, NULL, 1)) and create new queue in the right
> > > > cgroup. Eventually, all the processes would detach from q3 and it would get
> > > > destroyed. So why do you think this scheme will not work?
> > > 
> > > Hi, Jan
> > > 
> > > I repoduced again with some debug info:
> > 
> > Thanks for your help with debugging!
> > 
> > > diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> > > index f6f5f156b9f2..f62ebc4bbe56 100644
> > > --- a/block/bfq-cgroup.c
> > > +++ b/block/bfq-cgroup.c
> > > @@ -732,8 +732,11 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct
> > > bfq_data *bfqd,
> > >                  struct bfq_queue *orig_bfqq = sync_bfqq;
> > > 
> > >                  /* Traverse the merge chain to bfqq we will be using */
> > > -               while (sync_bfqq->new_bfqq)
> > > +               while (sync_bfqq->new_bfqq) {
> > > +                       printk("%s: bfqq %px, new_bfqq %px\n", __func__,
> > > +                                       sync_bfqq, sync_bfqq->new_bfqq);
> > >                          sync_bfqq = sync_bfqq->new_bfqq;
> > > +               }
> > >                  /*
> > >                   * Target bfqq got moved to a different cgroup or this
> > > process
> > >                   * started submitting bios for different cgroup?
> > > @@ -756,6 +759,8 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct
> > > bfq_data *bfqd,
> > >                                  bic_set_bfqq(bic, NULL, 1);
> > 
> > Maybe it would be nice to add a debug message here, saying we are detaching
> > the process from orig_bfqq. Just that we know that this branch executed.
> > 
> > >                                  return bfqg;
> > >                          }
> > > +                       printk("%s: bfqq %px move from %px to %px\n",
> > > __func__,
> > > +                               sync_bfqq, bfqq_group(sync_bfqq), bfqg);
> > >                          /* We are the only user of this bfqq, just move it
> > > */
> > >                          bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
> > 
> > ...
> > 
> > > -       if (bfqq->new_bfqq)
> > > +       if (bfqq->new_bfqq) {
> > > +               if (bfqq->entity.parent != bfqq->new_bfqq->entity.parent) {
> > > +                       printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n",
> > > __func__,
> > > +                               bfqq, bfqq_group(bfqq), bfqq->new_bfqq,
> > > +                               bfqq_group(bfqq->new_bfqq));
> > > +                       BUG_ON(1);
> > 
> > OK, so I can see why this triggered. We have:
> > 
> > Set new_bfqq for bfqq *eec0:
> > [   50.207263] bfq_setup_merge: set bfqq ffff88816b16eec0 new_bfqq ffff8881133e1340 parent ffff88814380b000/ffff88814380b000
> > Move new_bfqq to a root cgroup:
> > [   50.485933] bfq_reparent_leaf_entity: bfqq ffff8881133e1340 move from ffff88814380b000 to ffff88810b1f1000
> > Submit bio for root cgroup through *eec0:
> > [   50.519910] __bfq_bic_change_cgroup: bfqq ffff88816b16eec0, new_bfqq ffff8881133e1340
> > __bfq_bic_change_cgroup() does nothing as the bio is for the cgroup to which
> > target queue belongs.
> > [   50.520640] bfq_setup_cooperator: bfqq ffff88816b16eec0(ffff88814380b000) new_bfqq ffff8881133e1340(ffff88810b1f1000)
> > The BUG_ON trips.
> > 
> > So at this moment the BUG_ON was just too eager to trigger as we would be
> > submitting IO to a bfq queue belonging to an appropriate cgroup. But I
> > guess it does not make sence to keep unfinished merges over cgroup
> > migration and __bfq_bic_change_cgroup() should have detached task from its
> > bfqq anyway. Can you please try running with a new version of patch 4 which
> > is attached? And perhaps with your debug patch as well... Thanks!
> 
> Hi
> 
> I reporduced again with the following:
> 
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index f6f5f156b9f2..1afb9127ca84 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -732,8 +732,11 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct
> bfq_data *bfqd,
>                 struct bfq_queue *orig_bfqq = sync_bfqq;
> 
>                 /* Traverse the merge chain to bfqq we will be using */
> -               while (sync_bfqq->new_bfqq)
> +               if (sync_bfqq->new_bfqq) {
> +                       printk("%s: bfqq %px, new_bfqq %px\n", __func__,
> +                                       sync_bfqq, sync_bfqq->new_bfqq);
>                         sync_bfqq = sync_bfqq->new_bfqq;
> +               }
>                 /*
>                  * Target bfqq got moved to a different cgroup or this
> process
>                  * started submitting bios for different cgroup?
> @@ -754,8 +757,11 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct
> bfq_data *bfqd,
>                                 bfq_put_cooperator(orig_bfqq);
>                                 bfq_release_process_ref(bfqd, orig_bfqq);
>                                 bic_set_bfqq(bic, NULL, 1);
> +                               printk("%s: bfqq %px detached\n", __func__,
> orig_bfqq);
>                                 return bfqg;
>                         }
> +                       printk("%s: bfqq %px move from %px to %px\n",
> __func__,
> +                               sync_bfqq, bfqq_group(sync_bfqq), bfqg);
>                         /* We are the only user of this bfqq, just move it
> */
>                         bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
>                 }
> @@ -875,7 +881,10 @@ static void bfq_reparent_leaf_entity(struct bfq_data
> *bfqd,
>         }
> 
>         bfqq = bfq_entity_to_bfqq(child_entity);
> +       printk("%s: bfqq %px move from %px to %px\n", __func__,
> +                       bfqq, bfqq_group(bfqq), bfqd->root_group);
>         bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
> +
>  }
> 
>  /**
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 07be51bc229b..c6b439df28ad 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -2797,6 +2797,8 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct
> bfq_queue *new_bfqq)
>          * are likely to increase the throughput.
>          */
>         bfqq->new_bfqq = new_bfqq;
> +       printk("%s: set bfqq %px new_bfqq %px parent %px/%px \n", __func__,
> bfqq,
> +                       new_bfqq, bfqq_group(bfqq), bfqq_group(new_bfqq));
>         new_bfqq->ref += process_refs;
>         return new_bfqq;
>  }
> @@ -2963,8 +2965,16 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct
> bfq_queue *bfqq,
>         if (bfq_too_late_for_merging(bfqq))
>                 return NULL;
> 
> -       if (bfqq->new_bfqq)
> +       if (bfqq->new_bfqq) {
> +               if (bfqq->entity.parent != bfqq->new_bfqq->entity.parent &&
> +                   bfqq_group(bfqq->new_bfqq) != bfqd->root_group) {
> +                       printk("%s: bfqq %px(%px) new_bfqq %px(%px)\n",
> __func__,
> +                               bfqq, bfqq_group(bfqq), bfqq->new_bfqq,
> +                               bfqq_group(bfqq->new_bfqq));
> +                       BUG_ON(1);
> +               }
>                 return bfqq->new_bfqq;
> +       }
> 
>         if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
>                 return NULL;
> @@ -6928,6 +6938,8 @@ static void __bfq_put_async_bfqq(struct bfq_data
> *bfqd,
> 
>         bfq_log(bfqd, "put_async_bfqq: %p", bfqq);
>         if (bfqq) {
> +               printk("%s: bfqq %px move from %px to %px\n", __func__,
> +                               bfqq, bfqq_group(bfqq), bfqd->root_group);
>                 bfq_bfqq_move(bfqd, bfqq, bfqd->root_group);
> 
>                 bfq_log_bfqq(bfqd, bfqq, "put_async_bfqq: putting %p, %d",

This was not exactly what I wanted but now I've realized I've sent you a
wrong patch. My fault and thanks for testing!

> The result is attached, one thing that is weird: the uaf is triggered
> before the BUG_ON.

Yes, that is interesting. The BUG_ON at the end indeed shows a case where
there was a longer chain of merges bfq_pd_offline() moved some of the
queues from the merge chain to the root cgroup but not all. The state was

ffff888169d57440 -> ffff88810b593180 -> ffff88812cfc9340
(root)              (root)              (orig cgroup)

Then something strange happened:

[   92.692545] __bfq_bic_change_cgroup: bfqq ffff888169d57440, new_bfqq ffff88810b593180
[   92.693374] bfq_setup_cooperator: bfqq ffff88810b593180(ffff88816d167000) new_bfqq ffff88812cfc9340(ffff88817ab36000)

We can see __bfq_bic_change_cgroup() got called for queue ffff888169d57440
but then not for ffff88810b593180 while bfq_setup_cooperator() got called
for ffff88810b593180. Not sure what happened inside the BFQ. Anyway, the
presence of these merge chains and the fact that bfq_pd_offline() can move
arbitrary subset to different cgroup shows that we should be more careful.
I've changed __bfq_bic_change_cgroup() to check the whole merge chain and
detach from bfqq if anything does not match. I'll send it as a new
revision. Can you please test it? Thanks!

								Honza