diff mbox series

[10/22] quorum: Implement .bdrv_recurse_can_replace()

Message ID 20190920152804.12875-11-mreitz@redhat.com (mailing list archive)
State New, archived
Headers show
Series block: Fix check_to_replace_node() | expand

Commit Message

Max Reitz Sept. 20, 2019, 3:27 p.m. UTC
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

Comments

Vladimir Sementsov-Ogievskiy Sept. 25, 2019, 1:39 p.m. UTC | #1
20.09.2019 18:27, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 62 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 207054a64e..81b57dbae2 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>       return false;
>   }
>   
> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> +                                       BlockDriverState *to_replace)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        /*
> +         * We have no idea whether our children show the same data as
> +         * this node (@bs).  It is actually highly likely that
> +         * @to_replace does not, because replacing a broken child is
> +         * one of the main use cases here.
> +         *
> +         * We do know that the new BDS will match @bs, so replacing
> +         * any of our children by it will be safe.  It cannot change
> +         * the data this quorum node presents to its parents.
> +         *
> +         * However, replacing @to_replace by @bs in any of our

I'm a bit misleaded: by @bs, or by node equal to @bs (accordingly to
bdrv_recurse_can_replace spec)?

> +         * children's chains may change visible data somewhere in
> +         * there.  We therefore cannot recurse down those chains with
> +         * bdrv_recurse_can_replace().
> +         * (More formally, bdrv_recurse_can_replace() requires that
> +         * @to_replace will be replaced by something matching the @bs
> +         * passed to it.  We cannot guarantee that.)

Aha, and you answered already :) OK.

> +         *
> +         * Thus, we can only check whether any of our immediate
> +         * children matches @to_replace.
> +         *
> +         * (In the future, we might add a function to recurse down a
> +         * chain that checks that nothing there cares about a change
> +         * in data from the respective child in question.  For
> +         * example, most filters do not care when their child's data
> +         * suddenly changes, as long as their parents do not care.)
> +         */
> +        if (s->children[i].child->bs == to_replace) {

Hmm, still, what is wrong if we just put bdrv_recurse_can_replace(s->children[i].child->bs, to_replace) into this if condition?
(or may be more correct to call it after taking permissions)

> +            Error *local_err = NULL;
> +
> +            /*
> +             * We now have to ensure that there is no other parent
> +             * that cares about replacing this child by a node with
> +             * potentially different data.
> +             */
> +            s->children[i].to_be_replaced = true;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> +
> +            /* Revert permissions */
> +            s->children[i].to_be_replaced = false;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> +
> +            if (local_err) {
> +                error_free(local_err);
> +                return false;
> +            }
> +
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
>   {
>   
> @@ -1195,6 +1256,7 @@ static BlockDriver bdrv_quorum = {
>   
>       .is_filter                          = true,
>       .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
> +    .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
>   
>       .strong_runtime_opts                = quorum_strong_runtime_opts,
>   };
>
Vladimir Sementsov-Ogievskiy Sept. 25, 2019, 2:12 p.m. UTC | #2
20.09.2019 18:27, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 62 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 207054a64e..81b57dbae2 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>       return false;
>   }
>   
> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> +                                       BlockDriverState *to_replace)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        /*
> +         * We have no idea whether our children show the same data as
> +         * this node (@bs).  It is actually highly likely that
> +         * @to_replace does not, because replacing a broken child is
> +         * one of the main use cases here.
> +         *
> +         * We do know that the new BDS will match @bs, so replacing
> +         * any of our children by it will be safe.  It cannot change
> +         * the data this quorum node presents to its parents.
> +         *
> +         * However, replacing @to_replace by @bs in any of our
> +         * children's chains may change visible data somewhere in
> +         * there.  We therefore cannot recurse down those chains with
> +         * bdrv_recurse_can_replace().
> +         * (More formally, bdrv_recurse_can_replace() requires that
> +         * @to_replace will be replaced by something matching the @bs
> +         * passed to it.  We cannot guarantee that.)
> +         *
> +         * Thus, we can only check whether any of our immediate
> +         * children matches @to_replace.
> +         *
> +         * (In the future, we might add a function to recurse down a
> +         * chain that checks that nothing there cares about a change
> +         * in data from the respective child in question.  For
> +         * example, most filters do not care when their child's data
> +         * suddenly changes, as long as their parents do not care.)
> +         */
> +        if (s->children[i].child->bs == to_replace) {
> +            Error *local_err = NULL;
> +
> +            /*
> +             * We now have to ensure that there is no other parent
> +             * that cares about replacing this child by a node with
> +             * potentially different data.
> +             */
> +            s->children[i].to_be_replaced = true;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> +

So we are trying to answer on a question "is it ok to replace" it, by cheating on
permission system... Possibly, it's a problem of general design, and instead of
  examining one subtree, we should ask all parents of to_replace node, are they
OK with such replacement..

Another idea is that it's strange to check permissions somewhere else than in generic
permission check functions. But I've no idea how to handle it in permission system.

Or in other words: I don't like it all, but I at least follow how it works. And this
looks better than it was and fixes some bugs.

> +            /* Revert permissions */
> +            s->children[i].to_be_replaced = false;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> +
> +            if (local_err) {
> +                error_free(local_err);
> +                return false;
> +            }
> +
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
>   {
>   
> @@ -1195,6 +1256,7 @@ static BlockDriver bdrv_quorum = {
>   
>       .is_filter                          = true,
>       .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
> +    .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
>   
>       .strong_runtime_opts                = quorum_strong_runtime_opts,
>   };
>
Vladimir Sementsov-Ogievskiy Sept. 25, 2019, 2:14 p.m. UTC | #3
20.09.2019 18:27, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 62 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 207054a64e..81b57dbae2 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>       return false;
>   }
>   
> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> +                                       BlockDriverState *to_replace)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        /*
> +         * We have no idea whether our children show the same data as
> +         * this node (@bs).  It is actually highly likely that
> +         * @to_replace does not, because replacing a broken child is
> +         * one of the main use cases here.
> +         *
> +         * We do know that the new BDS will match @bs, so replacing
> +         * any of our children by it will be safe.  It cannot change
> +         * the data this quorum node presents to its parents.
> +         *
> +         * However, replacing @to_replace by @bs in any of our
> +         * children's chains may change visible data somewhere in
> +         * there.  We therefore cannot recurse down those chains with
> +         * bdrv_recurse_can_replace().
> +         * (More formally, bdrv_recurse_can_replace() requires that
> +         * @to_replace will be replaced by something matching the @bs
> +         * passed to it.  We cannot guarantee that.)
> +         *
> +         * Thus, we can only check whether any of our immediate
> +         * children matches @to_replace.
> +         *
> +         * (In the future, we might add a function to recurse down a
> +         * chain that checks that nothing there cares about a change
> +         * in data from the respective child in question.  For
> +         * example, most filters do not care when their child's data
> +         * suddenly changes, as long as their parents do not care.)
> +         */
> +        if (s->children[i].child->bs == to_replace) {
> +            Error *local_err = NULL;
> +
> +            /*
> +             * We now have to ensure that there is no other parent
> +             * that cares about replacing this child by a node with
> +             * potentially different data.
> +             */
> +            s->children[i].to_be_replaced = true;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> +
> +            /* Revert permissions */
> +            s->children[i].to_be_replaced = false;
> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> +
> +            if (local_err) {
> +                error_free(local_err);
> +                return false;
> +            }
> +
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
>   {
>   
> @@ -1195,6 +1256,7 @@ static BlockDriver bdrv_quorum = {
>   
>       .is_filter                          = true,
>       .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
> +    .bdrv_recurse_can_replace           = quorum_recurse_can_replace,

and here, it's useless until we set is_filter to false.

>   
>       .strong_runtime_opts                = quorum_strong_runtime_opts,
>   };
>
Max Reitz Sept. 26, 2019, 11:12 a.m. UTC | #4
On 25.09.19 15:39, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 62 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index 207054a64e..81b57dbae2 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>>       return false;
>>   }
>>   
>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
>> +                                       BlockDriverState *to_replace)
>> +{
>> +    BDRVQuorumState *s = bs->opaque;
>> +    int i;
>> +
>> +    for (i = 0; i < s->num_children; i++) {
>> +        /*
>> +         * We have no idea whether our children show the same data as
>> +         * this node (@bs).  It is actually highly likely that
>> +         * @to_replace does not, because replacing a broken child is
>> +         * one of the main use cases here.
>> +         *
>> +         * We do know that the new BDS will match @bs, so replacing
>> +         * any of our children by it will be safe.  It cannot change
>> +         * the data this quorum node presents to its parents.
>> +         *
>> +         * However, replacing @to_replace by @bs in any of our
> 
> I'm a bit misleaded: by @bs, or by node equal to @bs (accordingly to
> bdrv_recurse_can_replace spec)?
> 
>> +         * children's chains may change visible data somewhere in
>> +         * there.  We therefore cannot recurse down those chains with
>> +         * bdrv_recurse_can_replace().
>> +         * (More formally, bdrv_recurse_can_replace() requires that
>> +         * @to_replace will be replaced by something matching the @bs
>> +         * passed to it.  We cannot guarantee that.)
> 
> Aha, and you answered already :) OK.
> 
>> +         *
>> +         * Thus, we can only check whether any of our immediate
>> +         * children matches @to_replace.
>> +         *
>> +         * (In the future, we might add a function to recurse down a
>> +         * chain that checks that nothing there cares about a change
>> +         * in data from the respective child in question.  For
>> +         * example, most filters do not care when their child's data
>> +         * suddenly changes, as long as their parents do not care.)
>> +         */
>> +        if (s->children[i].child->bs == to_replace) {
> 
> Hmm, still, what is wrong if we just put bdrv_recurse_can_replace(s->children[i].child->bs, to_replace) into this if condition?
> (or may be more correct to call it after taking permissions)

It’s wrong because:

bs=quorum -> child=filter -> to_replace <- other_parent

Quorum now takes WRITE on the child and unshares CONSISTENT_READ.  That
doesn’t concern the @other_parent at all, however, it only concerns
parents immediately on its child (the filter node).

bdrv_recurse_can_replace() will happily acknowledge you replacing
@to_replace by something equal to @filter, because that shouldn’t
concern @other_parent.

The problem of course is that we don’t want to replace @to_replace by
something equal to @filter, but by something equal to @quorum.  And that
may be different from @to_replace, so @other_parent should have a say in it.

What we’d need to do is have a function that recurses down and checks
whether there are any parents that care; which is exactly what I’ve
described in the last paragraph of the long comment above.


For now, I think it’s fine to just ignore such cases and forbid them.

Max
Max Reitz Sept. 26, 2019, 11:17 a.m. UTC | #5
On 25.09.19 16:12, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 62 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index 207054a64e..81b57dbae2 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>>       return false;
>>   }
>>   
>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
>> +                                       BlockDriverState *to_replace)
>> +{
>> +    BDRVQuorumState *s = bs->opaque;
>> +    int i;
>> +
>> +    for (i = 0; i < s->num_children; i++) {
>> +        /*
>> +         * We have no idea whether our children show the same data as
>> +         * this node (@bs).  It is actually highly likely that
>> +         * @to_replace does not, because replacing a broken child is
>> +         * one of the main use cases here.
>> +         *
>> +         * We do know that the new BDS will match @bs, so replacing
>> +         * any of our children by it will be safe.  It cannot change
>> +         * the data this quorum node presents to its parents.
>> +         *
>> +         * However, replacing @to_replace by @bs in any of our
>> +         * children's chains may change visible data somewhere in
>> +         * there.  We therefore cannot recurse down those chains with
>> +         * bdrv_recurse_can_replace().
>> +         * (More formally, bdrv_recurse_can_replace() requires that
>> +         * @to_replace will be replaced by something matching the @bs
>> +         * passed to it.  We cannot guarantee that.)
>> +         *
>> +         * Thus, we can only check whether any of our immediate
>> +         * children matches @to_replace.
>> +         *
>> +         * (In the future, we might add a function to recurse down a
>> +         * chain that checks that nothing there cares about a change
>> +         * in data from the respective child in question.  For
>> +         * example, most filters do not care when their child's data
>> +         * suddenly changes, as long as their parents do not care.)
>> +         */
>> +        if (s->children[i].child->bs == to_replace) {
>> +            Error *local_err = NULL;
>> +
>> +            /*
>> +             * We now have to ensure that there is no other parent
>> +             * that cares about replacing this child by a node with
>> +             * potentially different data.
>> +             */
>> +            s->children[i].to_be_replaced = true;
>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
>> +
> 
> So we are trying to answer on a question "is it ok to replace" it, by cheating on
> permission system... Possibly, it's a problem of general design, and instead of
>   examining one subtree, we should ask all parents of to_replace node, are they
> OK with such replacement..

I’m not sure whether it’s cheating.

We want to replace some node.  A parent should be A-OK with that as long
as it hasn’t frozen its child link, and as long as it doesn’t care about
data changes (it should not have taken CONSISTENT_READ, and it must have
shared WRITE).

The only actual problem we have is that currently basically everything
takes CONSISTENT_READ (which is completely fine), but the only thing
that doesn’t is the mirror_top_bs, and that has exactly the problem of
“I can only get away without CONSISTENT_READ if it was me who unshared it”.

But that’s a different problem.  I don’t think this is cheating.

> Another idea is that it's strange to check permissions somewhere else than in generic
> permission check functions. But I've no idea how to handle it in permission system.

I don’t check the permissions, though.  I let quorum take what it needs
to allow changing one of its children.

What is a problem is that I should keep the permissions tightened until
the node is actually replaced and only then release them.  But that
turned out to be a huge mess so I resorted to just double-checking
before mirror actually completes.

Max
diff mbox series

Patch

diff --git a/block/quorum.c b/block/quorum.c
index 207054a64e..81b57dbae2 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -825,6 +825,67 @@  static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
+static bool quorum_recurse_can_replace(BlockDriverState *bs,
+                                       BlockDriverState *to_replace)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        /*
+         * We have no idea whether our children show the same data as
+         * this node (@bs).  It is actually highly likely that
+         * @to_replace does not, because replacing a broken child is
+         * one of the main use cases here.
+         *
+         * We do know that the new BDS will match @bs, so replacing
+         * any of our children by it will be safe.  It cannot change
+         * the data this quorum node presents to its parents.
+         *
+         * However, replacing @to_replace by @bs in any of our
+         * children's chains may change visible data somewhere in
+         * there.  We therefore cannot recurse down those chains with
+         * bdrv_recurse_can_replace().
+         * (More formally, bdrv_recurse_can_replace() requires that
+         * @to_replace will be replaced by something matching the @bs
+         * passed to it.  We cannot guarantee that.)
+         *
+         * Thus, we can only check whether any of our immediate
+         * children matches @to_replace.
+         *
+         * (In the future, we might add a function to recurse down a
+         * chain that checks that nothing there cares about a change
+         * in data from the respective child in question.  For
+         * example, most filters do not care when their child's data
+         * suddenly changes, as long as their parents do not care.)
+         */
+        if (s->children[i].child->bs == to_replace) {
+            Error *local_err = NULL;
+
+            /*
+             * We now have to ensure that there is no other parent
+             * that cares about replacing this child by a node with
+             * potentially different data.
+             */
+            s->children[i].to_be_replaced = true;
+            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
+
+            /* Revert permissions */
+            s->children[i].to_be_replaced = false;
+            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
+
+            if (local_err) {
+                error_free(local_err);
+                return false;
+            }
+
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
 {
 
@@ -1195,6 +1256,7 @@  static BlockDriver bdrv_quorum = {
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+    .bdrv_recurse_can_replace           = quorum_recurse_can_replace,
 
     .strong_runtime_opts                = quorum_strong_runtime_opts,
 };