diff mbox series

[v8,2/7] copy-on-read: add filter append/drop functions

Message ID 1598633579-221780-3-git-send-email-andrey.shinkevich@virtuozzo.com (mailing list archive)
State New, archived
Headers show
Series Apply COR-filter to the block-stream permanently | expand

Commit Message

Zhijian Li (Fujitsu)" via Aug. 28, 2020, 4:52 p.m. UTC
Provide API for the COR-filter insertion/removal.
Also, drop the filter child permissions for an inactive state when the
filter node is being removed.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/copy-on-read.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/copy-on-read.h |  35 +++++++++++++++++
 2 files changed, 139 insertions(+)
 create mode 100644 block/copy-on-read.h

Comments

Max Reitz Sept. 4, 2020, 11:22 a.m. UTC | #1
On 28.08.20 18:52, Andrey Shinkevich wrote:
> Provide API for the COR-filter insertion/removal.

Hm.  Why?

I see the implementation is just bdrv_open() + bdrv_replace_node(),
which is what I would have expected.  Why can’t the caller just do that?

Or maybe it would even make sense to just make it a generic block driver
function, like

BlockDriverState *
bdrv_insert_node(BlockDriverState *bs, const char *node_driver,
                 const char *node_name, QDict *node_options,
                 Error **errp);

?

(Similarly for dropping a node from a chain.)

> Also, drop the filter child permissions for an inactive state when the
> filter node is being removed.

Do we need .active for that?  Shouldn’t it be sufficient to just not
require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
node (i.e. perm == 0 in cor_child_perm())?

Of course, using an .active field works, too.  But Vladimir wanted a
similar field in the preallocate filter, and there already is in
backup-top.  I feel like there shouldn’t be a need for this.

.bdrv_child_perm() should generally be able to decide what permissions
it needs based on the information it gets.  If every driver needs to
keep track of whether it has any parents and feed that information into
.bdrv_child_perm() as external state, then something feels wrong.

If perm == 0 is not sufficient to rule out any parents, we should just
explicitly add a boolean that tells .bdrv_child_perm() whether there are
any parents or not – shouldn’t be too difficult.

> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/copy-on-read.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/copy-on-read.h |  35 +++++++++++++++++
>  2 files changed, 139 insertions(+)
>  create mode 100644 block/copy-on-read.h

[...]

>  block_init(bdrv_copy_on_read_init);
> diff --git a/block/copy-on-read.h b/block/copy-on-read.h
> new file mode 100644
> index 0000000..1686e4e
> --- /dev/null
> +++ b/block/copy-on-read.h
> @@ -0,0 +1,35 @@
> +/*
> + * Copy-on-read filter block driver
> + *
> + * The filter driver performs Copy-On-Read (COR) operations
> + *
> + * Copyright (c) 2018-2020 Virtuozzo International GmbH.
> + *
> + * Author:
> + *   Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef COPY_ON_READ_FILTER
> +#define COPY_ON_READ_FILTER

Bit of a weird include guard, seeing that most header files in qemu
(certainly under block/) base their name on their filename.  So this
would be BLOCK_COPY_ON_READ_H (or COPY_ON_READ_H).

(It’s just that COPY_ON_READ_FILTER kind of sounds like a configured
option, i.e. whether the filter is present or not.)

Max

> +
> +#include "block/block_int.h"
> +
> +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
> +                                         const char *filter_node_name,
> +                                         Error **errp);
> +void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs);
> +
> +#endif /* COPY_ON_READ_FILTER */
>
Andrey Shinkevich Sept. 17, 2020, 4:09 p.m. UTC | #2
On 04.09.2020 14:22, Max Reitz wrote:
> On 28.08.20 18:52, Andrey Shinkevich wrote:
>> Provide API for the COR-filter insertion/removal.
...
>
>> Also, drop the filter child permissions for an inactive state when the
>> filter node is being removed.
> Do we need .active for that?  Shouldn’t it be sufficient to just not
> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
> node (i.e. perm == 0 in cor_child_perm())?
>
> Of course, using an .active field works, too.  But Vladimir wanted a
> similar field in the preallocate filter, and there already is in
> backup-top.  I feel like there shouldn’t be a need for this.
>
> .bdrv_child_perm() should generally be able to decide what permissions
> it needs based on the information it gets.  If every driver needs to
> keep track of whether it has any parents and feed that information into
> .bdrv_child_perm() as external state, then something feels wrong.
>
> If perm == 0 is not sufficient to rule out any parents, we should just
> explicitly add a boolean that tells .bdrv_child_perm() whether there are
> any parents or not – shouldn’t be too difficult.


The issue is that we fail in the bdrv_check_update_perm() with 
""Conflicts with use by..." if the *nperm = 0 and the *nshared = 
BLK_PERM_ALL are not set before the call to the bdrv_replace_node(). The 
filter is still in the backing chain by that moment and has a parent 
with child->perm != 0.

The implementation of  the .bdrv_child_perm()in the given patch is 
similar to one in the block/mirror.c.

How could we resolve the issue at the generic layer?


Andrey


>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/copy-on-read.h |  35 +++++++++++++++++
>>   2 files changed, 139 insertions(+)
>>   create mode 100644 block/copy-on-read.h
> [...]
Vladimir Sementsov-Ogievskiy Sept. 23, 2020, 2:38 p.m. UTC | #3
17.09.2020 19:09, Andrey Shinkevich wrote:
> On 04.09.2020 14:22, Max Reitz wrote:
>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>> Provide API for the COR-filter insertion/removal.
> ...
>>> Also, drop the filter child permissions for an inactive state when the
>>> filter node is being removed.
>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>> node (i.e. perm == 0 in cor_child_perm())?
>>
>> Of course, using an .active field works, too.  But Vladimir wanted a
>> similar field in the preallocate filter, and there already is in
>> backup-top.  I feel like there shouldn’t be a need for this.
>>
>> .bdrv_child_perm() should generally be able to decide what permissions
>> it needs based on the information it gets.  If every driver needs to
>> keep track of whether it has any parents and feed that information into
>> .bdrv_child_perm() as external state, then something feels wrong.
>>
>> If perm == 0 is not sufficient to rule out any parents, we should just
>> explicitly add a boolean that tells .bdrv_child_perm() whether there are
>> any parents or not – shouldn’t be too difficult.
> 
> 
> The issue is that we fail in the bdrv_check_update_perm() with ""Conflicts with use by..." if the *nperm = 0 and the *nshared = BLK_PERM_ALL are not set before the call to the bdrv_replace_node(). The filter is still in the backing chain by that moment and has a parent with child->perm != 0.
> 
> The implementation of  the .bdrv_child_perm()in the given patch is similar to one in the block/mirror.c.
> 
> How could we resolve the issue at the generic layer?
> 
> 

The problem is that when we update permissions in bdrv_replace_node, we consider new placement for "to" node, but old placement for "from" node. So, during update, we may consider stricter permission requirements for "from" than needed and they conflict with new "to" permissions.

We should consider all graph changes for permission update simultaneously, in same transaction. For this, we need refactor permission update system to work on queue of nodes instead of simple DFS recursion. And in the queue all the nodes to update should  be toplogically sorted. In this way, when we update node N, all its parents are already updated. And of course, we should make no-perm graph update before permission update, and rollback no-perm movement if permission update failed.

And we need topological sort anyway. Consider the following example:

A -
|  \
|  v
|  B
|  |
v  /
C<-

A is parent for B and C, B is parent for C.

Obviously, to update permissions, we should go in order A B C, so, when we update C, all it's parents permission already updated. But with current approach (simple recursion) we can update in sequence A C B C (C is updated twice). On first update of C, we consider old B permissions, so doing wrong thing. If it succeed, all is OK, on second C update we will finish with correct graph. But if the wrong thing failed, we break the whole process for no reason (it's possible that updated B permission will be less strict, but we will never check it).

I'll work on a patch for it.
Max Reitz Sept. 24, 2020, 1:25 p.m. UTC | #4
On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
> 17.09.2020 19:09, Andrey Shinkevich wrote:
>> On 04.09.2020 14:22, Max Reitz wrote:
>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>> Provide API for the COR-filter insertion/removal.
>> ...
>>>> Also, drop the filter child permissions for an inactive state when the
>>>> filter node is being removed.
>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>> node (i.e. perm == 0 in cor_child_perm())?
>>>
>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>> similar field in the preallocate filter, and there already is in
>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>
>>> .bdrv_child_perm() should generally be able to decide what permissions
>>> it needs based on the information it gets.  If every driver needs to
>>> keep track of whether it has any parents and feed that information into
>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>
>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>> explicitly add a boolean that tells .bdrv_child_perm() whether there are
>>> any parents or not – shouldn’t be too difficult.
>>
>>
>> The issue is that we fail in the bdrv_check_update_perm() with
>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>> The filter is still in the backing chain by that moment and has a
>> parent with child->perm != 0.
>>
>> The implementation of  the .bdrv_child_perm()in the given patch is
>> similar to one in the block/mirror.c.
>>
>> How could we resolve the issue at the generic layer?
>>
>>
> 
> The problem is that when we update permissions in bdrv_replace_node, we
> consider new placement for "to" node, but old placement for "from" node.
> So, during update, we may consider stricter permission requirements for
> "from" than needed and they conflict with new "to" permissions.
> 
> We should consider all graph changes for permission update
> simultaneously, in same transaction. For this, we need refactor
> permission update system to work on queue of nodes instead of simple DFS
> recursion. And in the queue all the nodes to update should  be
> toplogically sorted. In this way, when we update node N, all its parents
> are already updated. And of course, we should make no-perm graph update
> before permission update, and rollback no-perm movement if permission
> update failed.

OK, you’ve convinced me, .active is good enough for now. O:)

> And we need topological sort anyway. Consider the following example:
> 
> A -
> |  \
> |  v
> |  B
> |  |
> v  /
> C<-
> 
> A is parent for B and C, B is parent for C.
> 
> Obviously, to update permissions, we should go in order A B C, so, when
> we update C, all it's parents permission already updated. But with
> current approach (simple recursion) we can update in sequence A C B C (C
> is updated twice). On first update of C, we consider old B permissions,
> so doing wrong thing. If it succeed, all is OK, on second C update we
> will finish with correct graph. But if the wrong thing failed, we break
> the whole process for no reason (it's possible that updated B permission
> will be less strict, but we will never check it).

True.

> I'll work on a patch for it.

Sounds great, though I fear for the complexity.  I’ll just wait and see
for now.

Max
Vladimir Sementsov-Ogievskiy Sept. 24, 2020, 2:51 p.m. UTC | #5
24.09.2020 16:25, Max Reitz wrote:
> On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
>> 17.09.2020 19:09, Andrey Shinkevich wrote:
>>> On 04.09.2020 14:22, Max Reitz wrote:
>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>> Provide API for the COR-filter insertion/removal.
>>> ...
>>>>> Also, drop the filter child permissions for an inactive state when the
>>>>> filter node is being removed.
>>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>>> node (i.e. perm == 0 in cor_child_perm())?
>>>>
>>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>>> similar field in the preallocate filter, and there already is in
>>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>>
>>>> .bdrv_child_perm() should generally be able to decide what permissions
>>>> it needs based on the information it gets.  If every driver needs to
>>>> keep track of whether it has any parents and feed that information into
>>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>>
>>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>>> explicitly add a boolean that tells .bdrv_child_perm() whether there are
>>>> any parents or not – shouldn’t be too difficult.
>>>
>>>
>>> The issue is that we fail in the bdrv_check_update_perm() with
>>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>>> The filter is still in the backing chain by that moment and has a
>>> parent with child->perm != 0.
>>>
>>> The implementation of  the .bdrv_child_perm()in the given patch is
>>> similar to one in the block/mirror.c.
>>>
>>> How could we resolve the issue at the generic layer?
>>>
>>>
>>
>> The problem is that when we update permissions in bdrv_replace_node, we
>> consider new placement for "to" node, but old placement for "from" node.
>> So, during update, we may consider stricter permission requirements for
>> "from" than needed and they conflict with new "to" permissions.
>>
>> We should consider all graph changes for permission update
>> simultaneously, in same transaction. For this, we need refactor
>> permission update system to work on queue of nodes instead of simple DFS
>> recursion. And in the queue all the nodes to update should  be
>> toplogically sorted. In this way, when we update node N, all its parents
>> are already updated. And of course, we should make no-perm graph update
>> before permission update, and rollback no-perm movement if permission
>> update failed.
> 
> OK, you’ve convinced me, .active is good enough for now. O:)
> 
>> And we need topological sort anyway. Consider the following example:
>>
>> A -
>> |  \
>> |  v
>> |  B
>> |  |
>> v  /
>> C<-
>>
>> A is parent for B and C, B is parent for C.
>>
>> Obviously, to update permissions, we should go in order A B C, so, when
>> we update C, all it's parents permission already updated. But with
>> current approach (simple recursion) we can update in sequence A C B C (C
>> is updated twice). On first update of C, we consider old B permissions,
>> so doing wrong thing. If it succeed, all is OK, on second C update we
>> will finish with correct graph. But if the wrong thing failed, we break
>> the whole process for no reason (it's possible that updated B permission
>> will be less strict, but we will never check it).
> 
> True.
> 
>> I'll work on a patch for it.
> 
> Sounds great, though I fear for the complexity.  I’ll just wait and see
> for now.
> 

If you are OK with .active for now, then I think, Andrey can resend with
.active and I'll dig into permissions in parallel. If Andrey's series
go first, I'll just drop .active later if my idea works.
Max Reitz Sept. 24, 2020, 3 p.m. UTC | #6
On 24.09.20 16:51, Vladimir Sementsov-Ogievskiy wrote:
> 24.09.2020 16:25, Max Reitz wrote:
>> On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
>>> 17.09.2020 19:09, Andrey Shinkevich wrote:
>>>> On 04.09.2020 14:22, Max Reitz wrote:
>>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>>> Provide API for the COR-filter insertion/removal.
>>>> ...
>>>>>> Also, drop the filter child permissions for an inactive state when
>>>>>> the
>>>>>> filter node is being removed.
>>>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>>>> node (i.e. perm == 0 in cor_child_perm())?
>>>>>
>>>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>>>> similar field in the preallocate filter, and there already is in
>>>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>>>
>>>>> .bdrv_child_perm() should generally be able to decide what permissions
>>>>> it needs based on the information it gets.  If every driver needs to
>>>>> keep track of whether it has any parents and feed that information
>>>>> into
>>>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>>>
>>>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>>>> explicitly add a boolean that tells .bdrv_child_perm() whether
>>>>> there are
>>>>> any parents or not – shouldn’t be too difficult.
>>>>
>>>>
>>>> The issue is that we fail in the bdrv_check_update_perm() with
>>>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>>>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>>>> The filter is still in the backing chain by that moment and has a
>>>> parent with child->perm != 0.
>>>>
>>>> The implementation of  the .bdrv_child_perm()in the given patch is
>>>> similar to one in the block/mirror.c.
>>>>
>>>> How could we resolve the issue at the generic layer?
>>>>
>>>>
>>>
>>> The problem is that when we update permissions in bdrv_replace_node, we
>>> consider new placement for "to" node, but old placement for "from" node.
>>> So, during update, we may consider stricter permission requirements for
>>> "from" than needed and they conflict with new "to" permissions.
>>>
>>> We should consider all graph changes for permission update
>>> simultaneously, in same transaction. For this, we need refactor
>>> permission update system to work on queue of nodes instead of simple DFS
>>> recursion. And in the queue all the nodes to update should  be
>>> toplogically sorted. In this way, when we update node N, all its parents
>>> are already updated. And of course, we should make no-perm graph update
>>> before permission update, and rollback no-perm movement if permission
>>> update failed.
>>
>> OK, you’ve convinced me, .active is good enough for now. O:)
>>
>>> And we need topological sort anyway. Consider the following example:
>>>
>>> A -
>>> |  \
>>> |  v
>>> |  B
>>> |  |
>>> v  /
>>> C<-
>>>
>>> A is parent for B and C, B is parent for C.
>>>
>>> Obviously, to update permissions, we should go in order A B C, so, when
>>> we update C, all it's parents permission already updated. But with
>>> current approach (simple recursion) we can update in sequence A C B C (C
>>> is updated twice). On first update of C, we consider old B permissions,
>>> so doing wrong thing. If it succeed, all is OK, on second C update we
>>> will finish with correct graph. But if the wrong thing failed, we break
>>> the whole process for no reason (it's possible that updated B permission
>>> will be less strict, but we will never check it).
>>
>> True.
>>
>>> I'll work on a patch for it.
>>
>> Sounds great, though I fear for the complexity.  I’ll just wait and see
>> for now.
>>
> 
> If you are OK with .active for now, then I think, Andrey can resend with
> .active and I'll dig into permissions in parallel. If Andrey's series
> go first, I'll just drop .active later if my idea works.

Sure, that works for me.

Max
Andrey Shinkevich Sept. 24, 2020, 5:29 p.m. UTC | #7
On 24.09.2020 18:00, Max Reitz wrote:
> On 24.09.20 16:51, Vladimir Sementsov-Ogievskiy wrote:
>> 24.09.2020 16:25, Max Reitz wrote:
>>> On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
>>>> 17.09.2020 19:09, Andrey Shinkevich wrote:
>>>>> On 04.09.2020 14:22, Max Reitz wrote:
>>>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>>>> Provide API for the COR-filter insertion/removal.
>>>>> ...
>>>>>>> Also, drop the filter child permissions for an inactive state when
>>>>>>> the
>>>>>>> filter node is being removed.
>>>>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>>>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>>>>> node (i.e. perm == 0 in cor_child_perm())?
>>>>>>
>>>>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>>>>> similar field in the preallocate filter, and there already is in
>>>>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>>>>
>>>>>> .bdrv_child_perm() should generally be able to decide what permissions
>>>>>> it needs based on the information it gets.  If every driver needs to
>>>>>> keep track of whether it has any parents and feed that information
>>>>>> into
>>>>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>>>>
>>>>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>>>>> explicitly add a boolean that tells .bdrv_child_perm() whether
>>>>>> there are
>>>>>> any parents or not – shouldn’t be too difficult.
>>>>>
>>>>> The issue is that we fail in the bdrv_check_update_perm() with
>>>>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>>>>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>>>>> The filter is still in the backing chain by that moment and has a
>>>>> parent with child->perm != 0.
>>>>>
>>>>> The implementation of  the .bdrv_child_perm()in the given patch is
>>>>> similar to one in the block/mirror.c.
>>>>>
>>>>> How could we resolve the issue at the generic layer?
>>>>>
>>>>>
>>>> The problem is that when we update permissions in bdrv_replace_node, we
>>>> consider new placement for "to" node, but old placement for "from" node.
>>>> So, during update, we may consider stricter permission requirements for
>>>> "from" than needed and they conflict with new "to" permissions.
>>>>
>>>> We should consider all graph changes for permission update
>>>> simultaneously, in same transaction. For this, we need refactor
>>>> permission update system to work on queue of nodes instead of simple DFS
>>>> recursion. And in the queue all the nodes to update should  be
>>>> toplogically sorted. In this way, when we update node N, all its parents
>>>> are already updated. And of course, we should make no-perm graph update
>>>> before permission update, and rollback no-perm movement if permission
>>>> update failed.
>>> OK, you’ve convinced me, .active is good enough for now. O:)
>>>
>>>> And we need topological sort anyway. Consider the following example:
>>>>
>>>> A -
>>>> |  \
>>>> |  v
>>>> |  B
>>>> |  |
>>>> v  /
>>>> C<-
>>>>
>>>> A is parent for B and C, B is parent for C.
>>>>
>>>> Obviously, to update permissions, we should go in order A B C, so, when
>>>> we update C, all it's parents permission already updated. But with
>>>> current approach (simple recursion) we can update in sequence A C B C (C
>>>> is updated twice). On first update of C, we consider old B permissions,
>>>> so doing wrong thing. If it succeed, all is OK, on second C update we
>>>> will finish with correct graph. But if the wrong thing failed, we break
>>>> the whole process for no reason (it's possible that updated B permission
>>>> will be less strict, but we will never check it).
>>> True.
>>>
>>>> I'll work on a patch for it.
>>> Sounds great, though I fear for the complexity.  I’ll just wait and see
>>> for now.
>>>
>> If you are OK with .active for now, then I think, Andrey can resend with
>> .active and I'll dig into permissions in parallel. If Andrey's series
>> go first, I'll just drop .active later if my idea works.
> Sure, that works for me.
>
> Max


So, I am keeping the filter insert/remove functions in the COR-driver 
code for now rather than moving them to the block generic layer, aren't I?

Andrey
Andrey Shinkevich Sept. 24, 2020, 5:40 p.m. UTC | #8
On 24.09.2020 20:29, Andrey Shinkevich wrote:
> On 24.09.2020 18:00, Max Reitz wrote:
>> On 24.09.20 16:51, Vladimir Sementsov-Ogievskiy wrote:
>>> 24.09.2020 16:25, Max Reitz wrote:
>>>> On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 17.09.2020 19:09, Andrey Shinkevich wrote:
>>>>>> On 04.09.2020 14:22, Max Reitz wrote:
>>>>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>>>>> Provide API for the COR-filter insertion/removal.
>>>>>> ...
>>>>>>>> Also, drop the filter child permissions for an inactive state when
>>>>>>>> the
>>>>>>>> filter node is being removed.
>>>>>>> Do we need .active for that?  Shouldn’t it be sufficient to just 
>>>>>>> not
>>>>>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken 
>>>>>>> on the
>>>>>>> node (i.e. perm == 0 in cor_child_perm())?
>>>>>>>
>>>>>>> Of course, using an .active field works, too.  But Vladimir 
>>>>>>> wanted a
>>>>>>> similar field in the preallocate filter, and there already is in
>>>>>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>>>>>
>>>>>>> .bdrv_child_perm() should generally be able to decide what 
>>>>>>> permissions
>>>>>>> it needs based on the information it gets.  If every driver 
>>>>>>> needs to
>>>>>>> keep track of whether it has any parents and feed that information
>>>>>>> into
>>>>>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>>>>>
>>>>>>> If perm == 0 is not sufficient to rule out any parents, we 
>>>>>>> should just
>>>>>>> explicitly add a boolean that tells .bdrv_child_perm() whether
>>>>>>> there are
>>>>>>> any parents or not – shouldn’t be too difficult.
>>>>>>
>>>>>> The issue is that we fail in the bdrv_check_update_perm() with
>>>>>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>>>>>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>>>>>> The filter is still in the backing chain by that moment and has a
>>>>>> parent with child->perm != 0.
>>>>>>
>>>>>> The implementation of  the .bdrv_child_perm()in the given patch is
>>>>>> similar to one in the block/mirror.c.
>>>>>>
>>>>>> How could we resolve the issue at the generic layer?
>>>>>>
>>>>>>
>>>>> The problem is that when we update permissions in 
>>>>> bdrv_replace_node, we
>>>>> consider new placement for "to" node, but old placement for "from" 
>>>>> node.
>>>>> So, during update, we may consider stricter permission 
>>>>> requirements for
>>>>> "from" than needed and they conflict with new "to" permissions.
>>>>>
>>>>> We should consider all graph changes for permission update
>>>>> simultaneously, in same transaction. For this, we need refactor
>>>>> permission update system to work on queue of nodes instead of 
>>>>> simple DFS
>>>>> recursion. And in the queue all the nodes to update should  be
>>>>> toplogically sorted. In this way, when we update node N, all its 
>>>>> parents
>>>>> are already updated. And of course, we should make no-perm graph 
>>>>> update
>>>>> before permission update, and rollback no-perm movement if permission
>>>>> update failed.
>>>> OK, you’ve convinced me, .active is good enough for now. O:)
>>>>
>>>>> And we need topological sort anyway. Consider the following example:
>>>>>
>>>>> A -
>>>>> |  \
>>>>> |  v
>>>>> |  B
>>>>> |  |
>>>>> v  /
>>>>> C<-
>>>>>
>>>>> A is parent for B and C, B is parent for C.
>>>>>
>>>>> Obviously, to update permissions, we should go in order A B C, so, 
>>>>> when
>>>>> we update C, all it's parents permission already updated. But with
>>>>> current approach (simple recursion) we can update in sequence A C 
>>>>> B C (C
>>>>> is updated twice). On first update of C, we consider old B 
>>>>> permissions,
>>>>> so doing wrong thing. If it succeed, all is OK, on second C update we
>>>>> will finish with correct graph. But if the wrong thing failed, we 
>>>>> break
>>>>> the whole process for no reason (it's possible that updated B 
>>>>> permission
>>>>> will be less strict, but we will never check it).
>>>> True.
>>>>
>>>>> I'll work on a patch for it.
>>>> Sounds great, though I fear for the complexity.  I’ll just wait and 
>>>> see
>>>> for now.
>>>>
>>> If you are OK with .active for now, then I think, Andrey can resend 
>>> with
>>> .active and I'll dig into permissions in parallel. If Andrey's series
>>> go first, I'll just drop .active later if my idea works.
>> Sure, that works for me.
>>
>> Max
>
>
> So, I am keeping the filter insert/remove functions in the COR-driver 
> code for now rather than moving them to the block generic layer, 
> aren't I?
>
> Andrey
>

As a concession, we can invoke .bdrv_insert/remove driver functions 
within the generic API ones.

Andrey
Vladimir Sementsov-Ogievskiy Sept. 24, 2020, 5:49 p.m. UTC | #9
24.09.2020 20:40, Andrey Shinkevich wrote:
> On 24.09.2020 20:29, Andrey Shinkevich wrote:
>> On 24.09.2020 18:00, Max Reitz wrote:
>>> On 24.09.20 16:51, Vladimir Sementsov-Ogievskiy wrote:
>>>> 24.09.2020 16:25, Max Reitz wrote:
>>>>> On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 17.09.2020 19:09, Andrey Shinkevich wrote:
>>>>>>> On 04.09.2020 14:22, Max Reitz wrote:
>>>>>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>>>>>> Provide API for the COR-filter insertion/removal.
>>>>>>> ...
>>>>>>>>> Also, drop the filter child permissions for an inactive state when
>>>>>>>>> the
>>>>>>>>> filter node is being removed.
>>>>>>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>>>>>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>>>>>>> node (i.e. perm == 0 in cor_child_perm())?
>>>>>>>>
>>>>>>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>>>>>>> similar field in the preallocate filter, and there already is in
>>>>>>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>>>>>>
>>>>>>>> .bdrv_child_perm() should generally be able to decide what permissions
>>>>>>>> it needs based on the information it gets.  If every driver needs to
>>>>>>>> keep track of whether it has any parents and feed that information
>>>>>>>> into
>>>>>>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>>>>>>
>>>>>>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>>>>>>> explicitly add a boolean that tells .bdrv_child_perm() whether
>>>>>>>> there are
>>>>>>>> any parents or not – shouldn’t be too difficult.
>>>>>>>
>>>>>>> The issue is that we fail in the bdrv_check_update_perm() with
>>>>>>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>>>>>>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>>>>>>> The filter is still in the backing chain by that moment and has a
>>>>>>> parent with child->perm != 0.
>>>>>>>
>>>>>>> The implementation of  the .bdrv_child_perm()in the given patch is
>>>>>>> similar to one in the block/mirror.c.
>>>>>>>
>>>>>>> How could we resolve the issue at the generic layer?
>>>>>>>
>>>>>>>
>>>>>> The problem is that when we update permissions in bdrv_replace_node, we
>>>>>> consider new placement for "to" node, but old placement for "from" node.
>>>>>> So, during update, we may consider stricter permission requirements for
>>>>>> "from" than needed and they conflict with new "to" permissions.
>>>>>>
>>>>>> We should consider all graph changes for permission update
>>>>>> simultaneously, in same transaction. For this, we need refactor
>>>>>> permission update system to work on queue of nodes instead of simple DFS
>>>>>> recursion. And in the queue all the nodes to update should  be
>>>>>> toplogically sorted. In this way, when we update node N, all its parents
>>>>>> are already updated. And of course, we should make no-perm graph update
>>>>>> before permission update, and rollback no-perm movement if permission
>>>>>> update failed.
>>>>> OK, you’ve convinced me, .active is good enough for now. O:)
>>>>>
>>>>>> And we need topological sort anyway. Consider the following example:
>>>>>>
>>>>>> A -
>>>>>> |  \
>>>>>> |  v
>>>>>> |  B
>>>>>> |  |
>>>>>> v  /
>>>>>> C<-
>>>>>>
>>>>>> A is parent for B and C, B is parent for C.
>>>>>>
>>>>>> Obviously, to update permissions, we should go in order A B C, so, when
>>>>>> we update C, all it's parents permission already updated. But with
>>>>>> current approach (simple recursion) we can update in sequence A C B C (C
>>>>>> is updated twice). On first update of C, we consider old B permissions,
>>>>>> so doing wrong thing. If it succeed, all is OK, on second C update we
>>>>>> will finish with correct graph. But if the wrong thing failed, we break
>>>>>> the whole process for no reason (it's possible that updated B permission
>>>>>> will be less strict, but we will never check it).
>>>>> True.
>>>>>
>>>>>> I'll work on a patch for it.
>>>>> Sounds great, though I fear for the complexity.  I’ll just wait and see
>>>>> for now.
>>>>>
>>>> If you are OK with .active for now, then I think, Andrey can resend with
>>>> .active and I'll dig into permissions in parallel. If Andrey's series
>>>> go first, I'll just drop .active later if my idea works.
>>> Sure, that works for me.
>>>
>>> Max
>>
>>
>> So, I am keeping the filter insert/remove functions in the COR-driver code for now rather than moving them to the block generic layer, aren't I?
>>
>> Andrey
>>
> 
> As a concession, we can invoke .bdrv_insert/remove driver functions within the generic API ones.
> 
> Andrey
> 

No, such handlers will not help I think. Until we improve permission-update system we can't implement good generic insertion code. So, I'd keep the patch as is for now.
diff mbox series

Patch

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index cb03e0f..0ede7aa 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -23,11 +23,21 @@ 
 #include "qemu/osdep.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "block/copy-on-read.h"
+
+
+typedef struct BDRVStateCOR {
+    bool active;
+} BDRVStateCOR;
 
 
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
+    BDRVStateCOR *state = bs->opaque;
+
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
                                false, errp);
@@ -42,6 +52,13 @@  static int cor_open(BlockDriverState *bs, QDict *options, int flags,
         ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
             bs->file->bs->supported_zero_flags);
 
+    state->active = true;
+
+    /*
+     * We don't need to call bdrv_child_refresh_perms() now as the permissions
+     * will be updated later when the filter node gets its parent.
+     */
+
     return 0;
 }
 
@@ -57,6 +74,17 @@  static void cor_child_perm(BlockDriverState *bs, BdrvChild *c,
                            uint64_t perm, uint64_t shared,
                            uint64_t *nperm, uint64_t *nshared)
 {
+    BDRVStateCOR *s = bs->opaque;
+
+    if (!s->active) {
+        /*
+         * While the filter is being removed
+         */
+        *nperm = 0;
+        *nshared = BLK_PERM_ALL;
+        return;
+    }
+
     *nperm = perm & PERM_PASSTHROUGH;
     *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
 
@@ -135,6 +163,7 @@  static void cor_lock_medium(BlockDriverState *bs, bool locked)
 
 static BlockDriver bdrv_copy_on_read = {
     .format_name                        = "copy-on-read",
+    .instance_size                      = sizeof(BDRVStateCOR),
 
     .bdrv_open                          = cor_open,
     .bdrv_child_perm                    = cor_child_perm,
@@ -159,4 +188,79 @@  static void bdrv_copy_on_read_init(void)
     bdrv_register(&bdrv_copy_on_read);
 }
 
+
+static BlockDriverState *create_filter_node(BlockDriverState *bs,
+                                            const char *filter_node_name,
+                                            Error **errp)
+{
+    QDict *opts = qdict_new();
+
+    qdict_put_str(opts, "driver", "copy-on-read");
+    qdict_put_str(opts, "file", bdrv_get_node_name(bs));
+    if (filter_node_name) {
+        qdict_put_str(opts, "node-name", filter_node_name);
+    }
+
+    return bdrv_open(NULL, NULL, opts, BDRV_O_RDWR, errp);
+}
+
+
+BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+                                         const char *filter_node_name,
+                                         Error **errp)
+{
+    BlockDriverState *cor_filter_bs;
+    Error *local_err = NULL;
+
+    cor_filter_bs = create_filter_node(bs, filter_node_name, errp);
+    if (cor_filter_bs == NULL) {
+        error_prepend(errp, "Could not create filter node: ");
+        return NULL;
+    }
+
+    if (!filter_node_name) {
+        cor_filter_bs->implicit = true;
+    }
+
+    bdrv_drained_begin(bs);
+    bdrv_replace_node(bs, cor_filter_bs, &local_err);
+    bdrv_drained_end(bs);
+
+    if (local_err) {
+        bdrv_unref(cor_filter_bs);
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    return cor_filter_bs;
+}
+
+
+void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs)
+{
+    BdrvChild *child;
+    BlockDriverState *bs;
+    BDRVStateCOR *s = cor_filter_bs->opaque;
+
+    child = bdrv_filter_child(cor_filter_bs);
+    if (!child) {
+        return;
+    }
+    bs = child->bs;
+
+    /* Retain the BDS until we complete the graph change. */
+    bdrv_ref(bs);
+    /* Hold a guest back from writing while permissions are being reset. */
+    bdrv_drained_begin(bs);
+    /* Drop permissions before the graph change. */
+    s->active = false;
+    bdrv_child_refresh_perms(cor_filter_bs, child, &error_abort);
+    bdrv_replace_node(cor_filter_bs, bs, &error_abort);
+
+    bdrv_drained_end(bs);
+    bdrv_unref(bs);
+    bdrv_unref(cor_filter_bs);
+}
+
+
 block_init(bdrv_copy_on_read_init);
diff --git a/block/copy-on-read.h b/block/copy-on-read.h
new file mode 100644
index 0000000..1686e4e
--- /dev/null
+++ b/block/copy-on-read.h
@@ -0,0 +1,35 @@ 
+/*
+ * Copy-on-read filter block driver
+ *
+ * The filter driver performs Copy-On-Read (COR) operations
+ *
+ * Copyright (c) 2018-2020 Virtuozzo International GmbH.
+ *
+ * Author:
+ *   Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef COPY_ON_READ_FILTER
+#define COPY_ON_READ_FILTER
+
+#include "block/block_int.h"
+
+BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+                                         const char *filter_node_name,
+                                         Error **errp);
+void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs);
+
+#endif /* COPY_ON_READ_FILTER */