diff mbox series

[v4,08/25] block: introduce assert_bdrv_graph_writable

Message ID 20211025101735.2060852-9-eesposit@redhat.com (mailing list archive)
State New, archived
Headers show
Series block layer: split block APIs in global state and I/O | expand

Commit Message

Emanuele Giuseppe Esposito Oct. 25, 2021, 10:17 a.m. UTC
We want to be sure that the functions that write the child and
parent list of a bs are under BQL and drain.

BQL prevents from concurrent writings from the GS API, while
drains protect from I/O.

TODO: drains are missing in some functions using this assert.
Therefore a proper assertion will fail. Because adding drains
requires additional discussions, they will be added in future
series.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c                                |  5 +++++
 block/io.c                             | 11 +++++++++++
 include/block/block_int-global-state.h | 10 +++++++++-
 3 files changed, 25 insertions(+), 1 deletion(-)

Comments

Hanna Czenczek Nov. 12, 2021, 2:40 p.m. UTC | #1
On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:
> We want to be sure that the functions that write the child and
> parent list of a bs are under BQL and drain.
>
> BQL prevents from concurrent writings from the GS API, while
> drains protect from I/O.
>
> TODO: drains are missing in some functions using this assert.
> Therefore a proper assertion will fail. Because adding drains
> requires additional discussions, they will be added in future
> series.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block.c                                |  5 +++++
>   block/io.c                             | 11 +++++++++++
>   include/block/block_int-global-state.h | 10 +++++++++-
>   3 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/block.c b/block.c
> index 41c5883c5c..94bff5c757 100644
> --- a/block.c
> +++ b/block.c
> @@ -2734,12 +2734,14 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
>           if (child->klass->detach) {
>               child->klass->detach(child);
>           }
> +        assert_bdrv_graph_writable(old_bs);
>           QLIST_REMOVE(child, next_parent);

I think this belongs above the .detach() call (and the QLIST_REMOVE() 
belongs into the .detach() implementation, as done in 
https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00240.html, 
which has been merged to Kevin’s block branch).

>       }
>   
>       child->bs = new_bs;
>   
>       if (new_bs) {
> +        assert_bdrv_graph_writable(new_bs);
>           QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);

In both these places it’s a bit strange that the assertion is done on 
the child nodes.  The subgraph starting from them isn’t modified after 
all, so their subgraph technically doesn’t need to be writable.  I think 
a single assertion on the parent node would be preferable.

I presume the problem with that is that we don’t have the parent node 
here?  Do we need a new BdrvChildClass method that performs this 
assertion on the parent node?

>   
>           /*
> @@ -2940,6 +2942,7 @@ static int bdrv_attach_child_noperm(BlockDriverState *parent_bs,
>           return ret;
>       }
>   
> +    assert_bdrv_graph_writable(parent_bs);
>       QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
>       /*
>        * child is removed in bdrv_attach_child_common_abort(), so don't care to
> @@ -3140,6 +3143,7 @@ static void bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
>   void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
>   {
>       assert(qemu_in_main_thread());
> +    assert_bdrv_graph_writable(parent);

It looks to me like we have this assertion mainly because 
bdrv_replace_child_noperm() doesn’t have a pointer to this parent node.  
It’s a workaround, but we should have this in every path that eventually 
ends up at bdrv_replace_child_noperm(), and that seems rather difficult 
for the bdrv_replace_node() family of functions. That to me sounds like 
it’d be good to have this as a BdrvChildClass function.

>       if (child == NULL) {
>           return;
>       }
> @@ -4903,6 +4907,7 @@ static void bdrv_remove_filter_or_cow_child_abort(void *opaque)
>       BdrvRemoveFilterOrCowChild *s = opaque;
>       BlockDriverState *parent_bs = s->child->opaque;
>   
> +    assert_bdrv_graph_writable(parent_bs);
>       QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
>       if (s->is_backing) {
>           parent_bs->backing = s->child;
> diff --git a/block/io.c b/block/io.c
> index f271ab3684..1c71e354d6 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -740,6 +740,17 @@ void bdrv_drain_all(void)
>       bdrv_drain_all_end();
>   }
>   
> +void assert_bdrv_graph_writable(BlockDriverState *bs)
> +{
> +    /*
> +     * TODO: this function is incomplete. Because the users of this
> +     * assert lack the necessary drains, check only for BQL.
> +     * Once the necessary drains are added,
> +     * assert also for qatomic_read(&bs->quiesce_counter) > 0
> +     */
> +    assert(qemu_in_main_thread());
> +}
> +
>   /**
>    * Remove an active request from the tracked requests list
>    *
> diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
> index d08e80222c..6bd7746409 100644
> --- a/include/block/block_int-global-state.h
> +++ b/include/block/block_int-global-state.h
> @@ -316,4 +316,12 @@ void bdrv_remove_aio_context_notifier(BlockDriverState *bs,
>    */
>   void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
>   
> -#endif /* BLOCK_INT_GLOBAL_STATE*/
> +/**
> + * Make sure that the function is either running under
> + * drain and BQL. The latter protects from concurrent writings

“either ... and” sounds wrong to me.  I’d drop the “either” or say 
“running under both drain and BQL”.

Hanna

> + * from the GS API, while the former prevents concurrent reads
> + * from I/O.
> + */
> +void assert_bdrv_graph_writable(BlockDriverState *bs);
> +
> +#endif /* BLOCK_INT_GLOBAL_STATE */
Emanuele Giuseppe Esposito Nov. 18, 2021, 9:55 a.m. UTC | #2
On 12/11/2021 15:40, Hanna Reitz wrote:
> On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:
>> We want to be sure that the functions that write the child and
>> parent list of a bs are under BQL and drain.
>>
>> BQL prevents from concurrent writings from the GS API, while
>> drains protect from I/O.
>>
>> TODO: drains are missing in some functions using this assert.
>> Therefore a proper assertion will fail. Because adding drains
>> requires additional discussions, they will be added in future
>> series.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>   block.c                                |  5 +++++
>>   block/io.c                             | 11 +++++++++++
>>   include/block/block_int-global-state.h | 10 +++++++++-
>>   3 files changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/block.c b/block.c
>> index 41c5883c5c..94bff5c757 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2734,12 +2734,14 @@ static void 
>> bdrv_replace_child_noperm(BdrvChild *child,
>>           if (child->klass->detach) {
>>               child->klass->detach(child);
>>           }
>> +        assert_bdrv_graph_writable(old_bs);
>>           QLIST_REMOVE(child, next_parent);
> 
> I think this belongs above the .detach() call (and the QLIST_REMOVE() 
> belongs into the .detach() implementation, as done in 
> https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00240.html, 
> which has been merged to Kevin’s block branch).

Yes, I rebased on kwolf/block branch. Thank you for pointing that out.
> 
>>       }
>>       child->bs = new_bs;
>>       if (new_bs) {
>> +        assert_bdrv_graph_writable(new_bs);
>>           QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
> 
> In both these places it’s a bit strange that the assertion is done on 
> the child nodes.  The subgraph starting from them isn’t modified after 
> all, so their subgraph technically doesn’t need to be writable.  I think 
> a single assertion on the parent node would be preferable.
> 
> I presume the problem with that is that we don’t have the parent node 
> here?  Do we need a new BdrvChildClass method that performs this 
> assertion on the parent node?
> 

Uhm I am not sure what you mean here.

Just to recap on how I see this: the assertion 
assert_bdrv_graph_writable(bs) is basically used to make sure we are 
protecting the write on some fields (childrens and parents lists in this 
patch) of a given @bs. It should work like a rwlock: reading is allowed 
to be concurrent, but a write should stop all readers to prevent 
concurrency issues. This is achieved by draining.

Let's use the first case that you point out, old_bs (it's specular for 
new_bs):

 >> +        assert_bdrv_graph_writable(old_bs);
 >>           QLIST_REMOVE(child, next_parent);

So old_bs should be the child "son" (child->bs), meaning old_bs->parents 
contains the child. Therefore when a child is removed by old_bs, we need 
to be sure we are doing it safely.

So we should check that if old_bs exists, old_bs should be drained, to 
prevent any other iothread from reading the ->parents list that is being 
updated.

The only thing to keep in mind in this case is that just wrapping a 
drain around that won't be enough, because then the child won't be 
included in the drain_end(old_bs). Therefore the right way to cover this 
drain-wise once the assertion also checks for drains is:

drain_begin(old_bs)
assert_bdrv_graph_writable(old_bs)
QLIST_REMOVE(child, next_parent)
/* old_bs will be under drain_end, but not the child */
bdrv_parent_drained_end_single(child);
bdrv_drained_end(old_bs);

I think you agree on this so far.

Now I think your concern is related to the child "parent", namely 
child->opaque. The problem is that in the .detach and .attach callbacks 
we are firstly adding/removing the child from the list, and then calling 
drain on the subtree. We would ideally need to do the opposite:

assert_bdrv_graph_writable(bs);
QLIST_REMOVE(child, next);
bdrv_unapply_subtree_drain(child, bs);

In this case I think this would actually work, because removing/adding 
the child from the ->children list beforehand just prevents an 
additional recursion call (I think, and the fact that tests are passing 
seems to confirm my theory).

Of course you know this stuff better than me, so let me know if 
something here is wrong.

>>           /*
>> @@ -2940,6 +2942,7 @@ static int 
>> bdrv_attach_child_noperm(BlockDriverState *parent_bs,
>>           return ret;
>>       }
>> +    assert_bdrv_graph_writable(parent_bs);
>>       QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
>>       /*
>>        * child is removed in bdrv_attach_child_common_abort(), so 
>> don't care to
>> @@ -3140,6 +3143,7 @@ static void 
>> bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
>>   void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
>>   {
>>       assert(qemu_in_main_thread());
>> +    assert_bdrv_graph_writable(parent);
> 
> It looks to me like we have this assertion mainly because 
> bdrv_replace_child_noperm() doesn’t have a pointer to this parent node. 
> It’s a workaround, but we should have this in every path that eventually 
> ends up at bdrv_replace_child_noperm(), and that seems rather difficult 
> for the bdrv_replace_node() family of functions. That to me sounds like 
> it’d be good to have this as a BdrvChildClass function.

I think this assertion is wrong. There is no ->childrens or ->parents 
manipulation here, it used to be in one of the function that it calls 
internally, but now as you pointed out is moved to .attach and .detach. 
So I will remove this.

Not sure about the BdrvChildClass function, feel free to elaborate more 
if what I wrote above is wrong/does not make sense to you.

Thank you,
Emanuele
> 
>>       if (child == NULL) {
>>           return;
>>       }
>> @@ -4903,6 +4907,7 @@ static void 
>> bdrv_remove_filter_or_cow_child_abort(void *opaque)
>>       BdrvRemoveFilterOrCowChild *s = opaque;
>>       BlockDriverState *parent_bs = s->child->opaque;
>> +    assert_bdrv_graph_writable(parent_bs);
>>       QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
>>       if (s->is_backing) {
>>           parent_bs->backing = s->child;
>> diff --git a/block/io.c b/block/io.c
>> index f271ab3684..1c71e354d6 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -740,6 +740,17 @@ void bdrv_drain_all(void)
>>       bdrv_drain_all_end();
>>   }
>> +void assert_bdrv_graph_writable(BlockDriverState *bs)
>> +{
>> +    /*
>> +     * TODO: this function is incomplete. Because the users of this
>> +     * assert lack the necessary drains, check only for BQL.
>> +     * Once the necessary drains are added,
>> +     * assert also for qatomic_read(&bs->quiesce_counter) > 0
>> +     */
>> +    assert(qemu_in_main_thread());
>> +}
>> +
>>   /**
>>    * Remove an active request from the tracked requests list
>>    *
>> diff --git a/include/block/block_int-global-state.h 
>> b/include/block/block_int-global-state.h
>> index d08e80222c..6bd7746409 100644
>> --- a/include/block/block_int-global-state.h
>> +++ b/include/block/block_int-global-state.h
>> @@ -316,4 +316,12 @@ void 
>> bdrv_remove_aio_context_notifier(BlockDriverState *bs,
>>    */
>>   void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
>> -#endif /* BLOCK_INT_GLOBAL_STATE*/
>> +/**
>> + * Make sure that the function is either running under
>> + * drain and BQL. The latter protects from concurrent writings
> 
> “either ... and” sounds wrong to me.  I’d drop the “either” or say 
> “running under both drain and BQL”.
> 
> Hanna
> 
>> + * from the GS API, while the former prevents concurrent reads
>> + * from I/O.
>> + */
>> +void assert_bdrv_graph_writable(BlockDriverState *bs);
>> +
>> +#endif /* BLOCK_INT_GLOBAL_STATE */
>
Emanuele Giuseppe Esposito Nov. 18, 2021, 10:24 a.m. UTC | #3
On 18/11/2021 10:55, Emanuele Giuseppe Esposito wrote:
> 
> On 12/11/2021 15:40, Hanna Reitz wrote:
>> On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:
>>> We want to be sure that the functions that write the child and
>>> parent list of a bs are under BQL and drain.
>>>
>>> BQL prevents from concurrent writings from the GS API, while
>>> drains protect from I/O.
>>>
>>> TODO: drains are missing in some functions using this assert.
>>> Therefore a proper assertion will fail. Because adding drains
>>> requires additional discussions, they will be added in future
>>> series.
>>>
>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>   block.c                                |  5 +++++
>>>   block/io.c                             | 11 +++++++++++
>>>   include/block/block_int-global-state.h | 10 +++++++++-
>>>   3 files changed, 25 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index 41c5883c5c..94bff5c757 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -2734,12 +2734,14 @@ static void 
>>> bdrv_replace_child_noperm(BdrvChild *child,
>>>           if (child->klass->detach) {
>>>               child->klass->detach(child);
>>>           }
>>> +        assert_bdrv_graph_writable(old_bs);
>>>           QLIST_REMOVE(child, next_parent);
>>
>> I think this belongs above the .detach() call (and the QLIST_REMOVE() 
>> belongs into the .detach() implementation, as done in 
>> https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00240.html, which 
>> has been merged to Kevin’s block branch).
> 
> Yes, I rebased on kwolf/block branch. Thank you for pointing that out.
>>
>>>       }
>>>       child->bs = new_bs;
>>>       if (new_bs) {
>>> +        assert_bdrv_graph_writable(new_bs);
>>>           QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
>>
>> In both these places it’s a bit strange that the assertion is done on 
>> the child nodes.  The subgraph starting from them isn’t modified after 
>> all, so their subgraph technically doesn’t need to be writable.  I 
>> think a single assertion on the parent node would be preferable.
>>
>> I presume the problem with that is that we don’t have the parent node 
>> here?  Do we need a new BdrvChildClass method that performs this 
>> assertion on the parent node?
>>
> 
> Uhm I am not sure what you mean here.
> 
> Just to recap on how I see this: the assertion 
> assert_bdrv_graph_writable(bs) is basically used to make sure we are 
> protecting the write on some fields (childrens and parents lists in this 
> patch) of a given @bs. It should work like a rwlock: reading is allowed 
> to be concurrent, but a write should stop all readers to prevent 
> concurrency issues. This is achieved by draining.

I am thinking to add an additional explanation to 
assert_bdrv_graph_writable header comment by saying
"Drains act as a rwlock: while reading is allowed to be concurrent from 
all iothreads, when a write needs to be performed we need to stop 
(drain) all involved iothreads from reading the graph, to avoid race 
conditions."

Somethink like that.

Emanuele
> 
> Let's use the first case that you point out, old_bs (it's specular for 
> new_bs):
> 
>  >> +        assert_bdrv_graph_writable(old_bs);
>  >>           QLIST_REMOVE(child, next_parent);
> 
> So old_bs should be the child "son" (child->bs), meaning old_bs->parents 
> contains the child. Therefore when a child is removed by old_bs, we need 
> to be sure we are doing it safely.
> 
> So we should check that if old_bs exists, old_bs should be drained, to 
> prevent any other iothread from reading the ->parents list that is being 
> updated.
> 
> The only thing to keep in mind in this case is that just wrapping a 
> drain around that won't be enough, because then the child won't be 
> included in the drain_end(old_bs). Therefore the right way to cover this 
> drain-wise once the assertion also checks for drains is:
> 
> drain_begin(old_bs)
> assert_bdrv_graph_writable(old_bs)
> QLIST_REMOVE(child, next_parent)
> /* old_bs will be under drain_end, but not the child */
> bdrv_parent_drained_end_single(child);
> bdrv_drained_end(old_bs);
> 
> I think you agree on this so far.
> 
> Now I think your concern is related to the child "parent", namely 
> child->opaque. The problem is that in the .detach and .attach callbacks 
> we are firstly adding/removing the child from the list, and then calling 
> drain on the subtree. We would ideally need to do the opposite:
> 
> assert_bdrv_graph_writable(bs);
> QLIST_REMOVE(child, next);
> bdrv_unapply_subtree_drain(child, bs);
> 
> In this case I think this would actually work, because removing/adding 
> the child from the ->children list beforehand just prevents an 
> additional recursion call (I think, and the fact that tests are passing 
> seems to confirm my theory).
> 
> Of course you know this stuff better than me, so let me know if 
> something here is wrong.
> 
>>>           /*
>>> @@ -2940,6 +2942,7 @@ static int 
>>> bdrv_attach_child_noperm(BlockDriverState *parent_bs,
>>>           return ret;
>>>       }
>>> +    assert_bdrv_graph_writable(parent_bs);
>>>       QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
>>>       /*
>>>        * child is removed in bdrv_attach_child_common_abort(), so 
>>> don't care to
>>> @@ -3140,6 +3143,7 @@ static void 
>>> bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
>>>   void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
>>>   {
>>>       assert(qemu_in_main_thread());
>>> +    assert_bdrv_graph_writable(parent);
>>
>> It looks to me like we have this assertion mainly because 
>> bdrv_replace_child_noperm() doesn’t have a pointer to this parent 
>> node. It’s a workaround, but we should have this in every path that 
>> eventually ends up at bdrv_replace_child_noperm(), and that seems 
>> rather difficult for the bdrv_replace_node() family of functions. That 
>> to me sounds like it’d be good to have this as a BdrvChildClass function.
> 
> I think this assertion is wrong. There is no ->childrens or ->parents 
> manipulation here, it used to be in one of the function that it calls 
> internally, but now as you pointed out is moved to .attach and .detach. 
> So I will remove this.
> 
> Not sure about the BdrvChildClass function, feel free to elaborate more 
> if what I wrote above is wrong/does not make sense to you.
> 
> Thank you,
> Emanuele
>>
>>>       if (child == NULL) {
>>>           return;
>>>       }
>>> @@ -4903,6 +4907,7 @@ static void 
>>> bdrv_remove_filter_or_cow_child_abort(void *opaque)
>>>       BdrvRemoveFilterOrCowChild *s = opaque;
>>>       BlockDriverState *parent_bs = s->child->opaque;
>>> +    assert_bdrv_graph_writable(parent_bs);
>>>       QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
>>>       if (s->is_backing) {
>>>           parent_bs->backing = s->child;
>>> diff --git a/block/io.c b/block/io.c
>>> index f271ab3684..1c71e354d6 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -740,6 +740,17 @@ void bdrv_drain_all(void)
>>>       bdrv_drain_all_end();
>>>   }
>>> +void assert_bdrv_graph_writable(BlockDriverState *bs)
>>> +{
>>> +    /*
>>> +     * TODO: this function is incomplete. Because the users of this
>>> +     * assert lack the necessary drains, check only for BQL.
>>> +     * Once the necessary drains are added,
>>> +     * assert also for qatomic_read(&bs->quiesce_counter) > 0
>>> +     */
>>> +    assert(qemu_in_main_thread());
>>> +}
>>> +
>>>   /**
>>>    * Remove an active request from the tracked requests list
>>>    *
>>> diff --git a/include/block/block_int-global-state.h 
>>> b/include/block/block_int-global-state.h
>>> index d08e80222c..6bd7746409 100644
>>> --- a/include/block/block_int-global-state.h
>>> +++ b/include/block/block_int-global-state.h
>>> @@ -316,4 +316,12 @@ void 
>>> bdrv_remove_aio_context_notifier(BlockDriverState *bs,
>>>    */
>>>   void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
>>> -#endif /* BLOCK_INT_GLOBAL_STATE*/
>>> +/**
>>> + * Make sure that the function is either running under
>>> + * drain and BQL. The latter protects from concurrent writings
>>
>> “either ... and” sounds wrong to me.  I’d drop the “either” or say 
>> “running under both drain and BQL”.
>>
>> Hanna
>>
>>> + * from the GS API, while the former prevents concurrent reads
>>> + * from I/O.
>>> + */
>>> +void assert_bdrv_graph_writable(BlockDriverState *bs);
>>> +
>>> +#endif /* BLOCK_INT_GLOBAL_STATE */
>>
Hanna Czenczek Nov. 18, 2021, 3:17 p.m. UTC | #4
On 18.11.21 10:55, Emanuele Giuseppe Esposito wrote:
>
> On 12/11/2021 15:40, Hanna Reitz wrote:
>> On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:
>>> We want to be sure that the functions that write the child and
>>> parent list of a bs are under BQL and drain.
>>>
>>> BQL prevents from concurrent writings from the GS API, while
>>> drains protect from I/O.
>>>
>>> TODO: drains are missing in some functions using this assert.
>>> Therefore a proper assertion will fail. Because adding drains
>>> requires additional discussions, they will be added in future
>>> series.
>>>
>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>   block.c                                |  5 +++++
>>>   block/io.c                             | 11 +++++++++++
>>>   include/block/block_int-global-state.h | 10 +++++++++-
>>>   3 files changed, 25 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index 41c5883c5c..94bff5c757 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -2734,12 +2734,14 @@ static void 
>>> bdrv_replace_child_noperm(BdrvChild *child,
>>>           if (child->klass->detach) {
>>>               child->klass->detach(child);
>>>           }
>>> +        assert_bdrv_graph_writable(old_bs);
>>>           QLIST_REMOVE(child, next_parent);
>>
>> I think this belongs above the .detach() call (and the QLIST_REMOVE() 
>> belongs into the .detach() implementation, as done in 
>> https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00240.html, 
>> which has been merged to Kevin’s block branch).
>
> Yes, I rebased on kwolf/block branch. Thank you for pointing that out.
>>
>>>       }
>>>       child->bs = new_bs;
>>>       if (new_bs) {
>>> +        assert_bdrv_graph_writable(new_bs);
>>>           QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
>>
>> In both these places it’s a bit strange that the assertion is done on 
>> the child nodes.  The subgraph starting from them isn’t modified 
>> after all, so their subgraph technically doesn’t need to be 
>> writable.  I think a single assertion on the parent node would be 
>> preferable.
>>
>> I presume the problem with that is that we don’t have the parent node 
>> here?  Do we need a new BdrvChildClass method that performs this 
>> assertion on the parent node?
>>
>
> Uhm I am not sure what you mean here.
>
> Just to recap on how I see this: the assertion 
> assert_bdrv_graph_writable(bs) is basically used to make sure we are 
> protecting the write on some fields (childrens and parents lists in 
> this patch) of a given @bs.

Oh, OK.  I understood it to mean that the subgraph starting at `bs` is 
mutable, i.e. including all of its recursive children.

And yes, you’re right, the child BDSs are indeed modified, too, so we 
should check them, too.

> It should work like a rwlock: reading is allowed to be concurrent, but 
> a write should stop all readers to prevent concurrency issues. This is 
> achieved by draining.

Draining works on a subgraph, so I suppose it does mean that the whole 
subgraph will be mutable.  But no matter, at least new_bs is not in the 
parent’s subgraph, so it wouldn’t be included in the check if we only 
checked the parent.

> Let's use the first case that you point out, old_bs (it's specular for 
> new_bs):
>
> >> +        assert_bdrv_graph_writable(old_bs);
> >>           QLIST_REMOVE(child, next_parent);
>
> So old_bs should be the child "son" (child->bs), meaning 
> old_bs->parents contains the child. Therefore when a child is removed 
> by old_bs, we need to be sure we are doing it safely.
>
> So we should check that if old_bs exists, old_bs should be drained, to 
> prevent any other iothread from reading the ->parents list that is 
> being updated.
>
> The only thing to keep in mind in this case is that just wrapping a 
> drain around that won't be enough, because then the child won't be 
> included in the drain_end(old_bs). Therefore the right way to cover 
> this drain-wise once the assertion also checks for drains is:
>
> drain_begin(old_bs)
> assert_bdrv_graph_writable(old_bs)
> QLIST_REMOVE(child, next_parent)
> /* old_bs will be under drain_end, but not the child */
> bdrv_parent_drained_end_single(child);
> bdrv_drained_end(old_bs);
>
> I think you agree on this so far.
>
> Now I think your concern is related to the child "parent", namely 
> child->opaque. The problem is that in the .detach and .attach 
> callbacks we are firstly adding/removing the child from the list, and 
> then calling drain on the subtree.

It was my impression that you’d want bdrv_replace_child_noperm() to 
always be called in a section where the subgraph starting from the 
parent BDS is drained, not just the child BDSs that are swapped (old_bs 
and new_bs).

My abstract concern is that bdrv_replace_child_noperm() does not modify 
only old_bs and new_bs, but actually modifies the whole subgraph 
starting at the parent node.  A concrete example for this is that we 
modify not only the children’s parent lists, but also the parent’s 
children list.

That’s why I’m asking why we only check that the graph is writable for 
the children, when actually I feel like we’re modifying the parent, too.

> We would ideally need to do the opposite:
>
> assert_bdrv_graph_writable(bs);
> QLIST_REMOVE(child, next);
> bdrv_unapply_subtree_drain(child, bs);
>
> In this case I think this would actually work, because removing/adding 
> the child from the ->children list beforehand just prevents an 
> additional recursion call (I think, and the fact that tests are 
> passing seems to confirm my theory).
>
> Of course you know this stuff better than me, so let me know if 
> something here is wrong.

Well.  I’m mostly wondering why you’re discussing how to do the drain 
right when I was mostly curious about why we check the children and not 
the parent for whether the graph is mutable at their respective 
position. O:)

It was my impression that so far we mostly wrapped graph change 
operations in drained sections (starting at the parent) and not leave it 
to bdrv_replace_child_noperm() to do so.  That function only deals with 
drain stuff because it balances the drain counters on old_bs and new_bs 
to match the counter on the parent’s subgraph.

Hanna
Emanuele Giuseppe Esposito Nov. 19, 2021, 8:55 a.m. UTC | #5
On 18/11/2021 16:17, Hanna Reitz wrote:
> On 18.11.21 10:55, Emanuele Giuseppe Esposito wrote:
>>
>> On 12/11/2021 15:40, Hanna Reitz wrote:
>>> On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:
>>>> We want to be sure that the functions that write the child and
>>>> parent list of a bs are under BQL and drain.
>>>>
>>>> BQL prevents from concurrent writings from the GS API, while
>>>> drains protect from I/O.
>>>>
>>>> TODO: drains are missing in some functions using this assert.
>>>> Therefore a proper assertion will fail. Because adding drains
>>>> requires additional discussions, they will be added in future
>>>> series.
>>>>
>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>> ---
>>>>   block.c                                |  5 +++++
>>>>   block/io.c                             | 11 +++++++++++
>>>>   include/block/block_int-global-state.h | 10 +++++++++-
>>>>   3 files changed, 25 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index 41c5883c5c..94bff5c757 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -2734,12 +2734,14 @@ static void 
>>>> bdrv_replace_child_noperm(BdrvChild *child,
>>>>           if (child->klass->detach) {
>>>>               child->klass->detach(child);
>>>>           }
>>>> +        assert_bdrv_graph_writable(old_bs);
>>>>           QLIST_REMOVE(child, next_parent);
>>>
>>> I think this belongs above the .detach() call (and the QLIST_REMOVE() 
>>> belongs into the .detach() implementation, as done in 
>>> https://lists.nongnu.org/archive/html/qemu-block/2021-11/msg00240.html, 
>>> which has been merged to Kevin’s block branch).
>>
>> Yes, I rebased on kwolf/block branch. Thank you for pointing that out.
>>>
>>>>       }
>>>>       child->bs = new_bs;
>>>>       if (new_bs) {
>>>> +        assert_bdrv_graph_writable(new_bs);
>>>>           QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
>>>
>>> In both these places it’s a bit strange that the assertion is done on 
>>> the child nodes.  The subgraph starting from them isn’t modified 
>>> after all, so their subgraph technically doesn’t need to be 
>>> writable.  I think a single assertion on the parent node would be 
>>> preferable.
>>>
>>> I presume the problem with that is that we don’t have the parent node 
>>> here?  Do we need a new BdrvChildClass method that performs this 
>>> assertion on the parent node?
>>>
>>
>> Uhm I am not sure what you mean here.
>>
>> Just to recap on how I see this: the assertion 
>> assert_bdrv_graph_writable(bs) is basically used to make sure we are 
>> protecting the write on some fields (childrens and parents lists in 
>> this patch) of a given @bs.
> 
> Oh, OK.  I understood it to mean that the subgraph starting at `bs` is 
> mutable, i.e. including all of its recursive children.

Yes, sorry for the confusion. We want to drain the subgraph starting at 
`bs`, also because we modify both parent's .children list and children's 
.parent list.

> 
> And yes, you’re right, the child BDSs are indeed modified, too, so we 
> should check them, too.
> 
>> It should work like a rwlock: reading is allowed to be concurrent, but 
>> a write should stop all readers to prevent concurrency issues. This is 
>> achieved by draining.
> 
> Draining works on a subgraph, so I suppose it does mean that the whole 
> subgraph will be mutable.  But no matter, at least new_bs is not in the 
> parent’s subgraph, so it wouldn’t be included in the check if we only 
> checked the parent.
> 
>> Let's use the first case that you point out, old_bs (it's specular for 
>> new_bs):
>>
>> >> +        assert_bdrv_graph_writable(old_bs);
>> >>           QLIST_REMOVE(child, next_parent);
>>
>> So old_bs should be the child "son" (child->bs), meaning 
>> old_bs->parents contains the child. Therefore when a child is removed 
>> by old_bs, we need to be sure we are doing it safely.
>>
>> So we should check that if old_bs exists, old_bs should be drained, to 
>> prevent any other iothread from reading the ->parents list that is 
>> being updated.
>>
>> The only thing to keep in mind in this case is that just wrapping a 
>> drain around that won't be enough, because then the child won't be 
>> included in the drain_end(old_bs). Therefore the right way to cover 
>> this drain-wise once the assertion also checks for drains is:
>>
>> drain_begin(old_bs)
>> assert_bdrv_graph_writable(old_bs)
>> QLIST_REMOVE(child, next_parent)
>> /* old_bs will be under drain_end, but not the child */
>> bdrv_parent_drained_end_single(child);
>> bdrv_drained_end(old_bs);
>>
>> I think you agree on this so far.
>>
>> Now I think your concern is related to the child "parent", namely 
>> child->opaque. The problem is that in the .detach and .attach 
>> callbacks we are firstly adding/removing the child from the list, and 
>> then calling drain on the subtree.
> 
> It was my impression that you’d want bdrv_replace_child_noperm() to 
> always be called in a section where the subgraph starting from the 
> parent BDS is drained, not just the child BDSs that are swapped (old_bs 
> and new_bs).
> 
> My abstract concern is that bdrv_replace_child_noperm() does not modify 
> only old_bs and new_bs, but actually modifies the whole subgraph 
> starting at the parent node.  A concrete example for this is that we 
> modify not only the children’s parent lists, but also the parent’s 
> children list.
> 
> That’s why I’m asking why we only check that the graph is writable for 
> the children, when actually I feel like we’re modifying the parent, too.

Ok I think I understood what you mean, and I think I addressed this in 
the new series version but was not addressed here.

Maybe what I do is redundant, but I:
1) drain the childrens when we swap them
2) modify .attach and .detach to drain child->opaque (parent) too.
More precisely, I think it should be enough to change 
bdrv_child_cb_attach/detach in the way I showed below, since we are 
touching the parent's .children list only there.
blk_root_attach/detach seem not to deal with its children, so a drain 
there is not necessary.

This should cover everything we need.

> 
>> We would ideally need to do the opposite:
>>
>> assert_bdrv_graph_writable(bs);
>> QLIST_REMOVE(child, next);
>> bdrv_unapply_subtree_drain(child, bs);
>>
>> In this case I think this would actually work, because removing/adding 
>> the child from the ->children list beforehand just prevents an 
>> additional recursion call (I think, and the fact that tests are 
>> passing seems to confirm my theory).
>>
>> Of course you know this stuff better than me, so let me know if 
>> something here is wrong.
> 
> Well.  I’m mostly wondering why you’re discussing how to do the drain 
> right when I was mostly curious about why we check the children and not 
> the parent for whether the graph is mutable at their respective 
> position. O:)
> 
> It was my impression that so far we mostly wrapped graph change 
> operations in drained sections (starting at the parent) and not leave it 
> to bdrv_replace_child_noperm() to do so.  That function only deals with 
> drain stuff because it balances the drain counters on old_bs and new_bs 
> to match the counter on the parent’s subgraph.

"mostly" is the problem, because if you try to use the full assertion 
introduced in this patch (ie assert also for drains), it will fail.

So I figured I would (will) put the drains as near as possible to the 
section where they are needed, since bdrv_replace_child_noperm is being 
called in many different ways. But this is a discussion for a future series.

Emanuele
diff mbox series

Patch

diff --git a/block.c b/block.c
index 41c5883c5c..94bff5c757 100644
--- a/block.c
+++ b/block.c
@@ -2734,12 +2734,14 @@  static void bdrv_replace_child_noperm(BdrvChild *child,
         if (child->klass->detach) {
             child->klass->detach(child);
         }
+        assert_bdrv_graph_writable(old_bs);
         QLIST_REMOVE(child, next_parent);
     }
 
     child->bs = new_bs;
 
     if (new_bs) {
+        assert_bdrv_graph_writable(new_bs);
         QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
 
         /*
@@ -2940,6 +2942,7 @@  static int bdrv_attach_child_noperm(BlockDriverState *parent_bs,
         return ret;
     }
 
+    assert_bdrv_graph_writable(parent_bs);
     QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
     /*
      * child is removed in bdrv_attach_child_common_abort(), so don't care to
@@ -3140,6 +3143,7 @@  static void bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
 void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
 {
     assert(qemu_in_main_thread());
+    assert_bdrv_graph_writable(parent);
     if (child == NULL) {
         return;
     }
@@ -4903,6 +4907,7 @@  static void bdrv_remove_filter_or_cow_child_abort(void *opaque)
     BdrvRemoveFilterOrCowChild *s = opaque;
     BlockDriverState *parent_bs = s->child->opaque;
 
+    assert_bdrv_graph_writable(parent_bs);
     QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
     if (s->is_backing) {
         parent_bs->backing = s->child;
diff --git a/block/io.c b/block/io.c
index f271ab3684..1c71e354d6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -740,6 +740,17 @@  void bdrv_drain_all(void)
     bdrv_drain_all_end();
 }
 
+void assert_bdrv_graph_writable(BlockDriverState *bs)
+{
+    /*
+     * TODO: this function is incomplete. Because the users of this
+     * assert lack the necessary drains, check only for BQL.
+     * Once the necessary drains are added,
+     * assert also for qatomic_read(&bs->quiesce_counter) > 0
+     */
+    assert(qemu_in_main_thread());
+}
+
 /**
  * Remove an active request from the tracked requests list
  *
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index d08e80222c..6bd7746409 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -316,4 +316,12 @@  void bdrv_remove_aio_context_notifier(BlockDriverState *bs,
  */
 void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
 
-#endif /* BLOCK_INT_GLOBAL_STATE*/
+/**
+ * Make sure that the function is either running under
+ * drain and BQL. The latter protects from concurrent writings
+ * from the GS API, while the former prevents concurrent reads
+ * from I/O.
+ */
+void assert_bdrv_graph_writable(BlockDriverState *bs);
+
+#endif /* BLOCK_INT_GLOBAL_STATE */