diff mbox series

[v3,03/33] block: Add BdrvChildRole and BdrvChildRoleBits

Message ID 20200218124242.584644-4-mreitz@redhat.com (mailing list archive)
State New, archived
Headers show
Series block: Introduce real BdrvChildRole | expand

Commit Message

Max Reitz Feb. 18, 2020, 12:42 p.m. UTC
This mask will supplement BdrvChildClass when it comes to what role (or
combination of roles) a child takes for its parent.  It consists of
BdrvChildRoleBits values (which is an enum).

Because empty enums are not allowed, let us just start with it filled.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

Comments

Eric Blake Feb. 18, 2020, 1:05 p.m. UTC | #1
On 2/18/20 6:42 AM, Max Reitz wrote:
> This mask will supplement BdrvChildClass when it comes to what role (or
> combination of roles) a child takes for its parent.  It consists of
> BdrvChildRoleBits values (which is an enum).
> 
> Because empty enums are not allowed, let us just start with it filled.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block.h | 38 ++++++++++++++++++++++++++++++++++++++
>   1 file changed, 38 insertions(+)

> +
> +/* Mask of BdrvChildRoleBits values */
> +typedef unsigned int BdrvChildRole;
> +

I like how it turned out (both the additional comments earlier on, and 
the typedef for documentation purposes).

Reviewed-by: Eric Blake <eblake@redhat.com>
Kevin Wolf May 5, 2020, 11:19 a.m. UTC | #2
Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:
> This mask will supplement BdrvChildClass when it comes to what role (or
> combination of roles) a child takes for its parent.  It consists of
> BdrvChildRoleBits values (which is an enum).
> 
> Because empty enums are not allowed, let us just start with it filled.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  include/block/block.h | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index fd89eb6c75..8c23948d08 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -268,6 +268,44 @@ enum {
>      DEFAULT_PERM_UNCHANGED      = BLK_PERM_ALL & ~DEFAULT_PERM_PASSTHROUGH,
>  };
>  
> +enum BdrvChildRoleBits {
> +    /* Child stores data */
> +    BDRV_CHILD_DATA         = (1 << 0),
> +
> +    /* Child stores metadata */
> +    BDRV_CHILD_METADATA     = (1 << 1),
> +
> +    /*
> +     * A child to which the parent forwards all reads and writes.  It

Is "_all_ reads and writes" really required? Imagine a caching block
driver, should it not be considered a filter because it may just
complete the requests from its cache rather than asking the child?

> +     * must present exactly the same visible data as the parent.
> +     * Any node may have at most one filtered child at a time.
> +     */
> +    BDRV_CHILD_FILTERED     = (1 << 2),
> +
> +    /*
> +     * Child from which to read all data that isn’t allocated in the
> +     * parent (i.e., the backing child); such data is copied to the
> +     * parent through COW (and optionally COR).
> +     */
> +    BDRV_CHILD_COW          = (1 << 3),
> +
> +    /*
> +     * The primary child.  For most drivers, this is the child whose
> +     * filename applies best to the parent node.
> +     * Each parent must give this flag to no more than one child at a
> +     * time.
> +     */
> +    BDRV_CHILD_PRIMARY      = (1 << 4),

And I assume some drivers like quorum don't set it on any child.

> +    /* Useful combination of flags */
> +    BDRV_CHILD_IMAGE        = BDRV_CHILD_DATA
> +                              | BDRV_CHILD_METADATA
> +                              | BDRV_CHILD_PRIMARY,
> +};
> +
> +/* Mask of BdrvChildRoleBits values */
> +typedef unsigned int BdrvChildRole;
> +
>  char *bdrv_perm_names(uint64_t perm);
>  uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm);

The list intuitively makes sense to me. Let me try to think of some
interesting cases to see whether the documentation is complete or
whether it could be improved.


qcow2 is what everyone has in mind, so it should be obvious:

* Without a data file:
  * file: BDRV_CHILD_IMAGE
  * backing: BDRV_CHILD_COW

* With a data file:
  * file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA
  * data-file: BDRV_CHILD_DATA
  * backing: BDRV_CHILD_COW


We can use VMDK to make things a bit more interesting:

* file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA
* extents.*: BDRV_CHILD_METADATA | BDRV_CHILD_DATA
* backing: BDRV_CHILD_COW

In other words, we can have multiple data and metadata children. Is this
correct or should extents not be marked as metadata? (Checked the final
code: yes we do have multiple of them in vmdk.) Should this be mentioned
in the documentation?

Do we then also want to allow multiple BDRV_CHILD_COW children? We don't
currently have a driver that needs it, but maybe it would be consistent
with DATA and METADATA then. However, it would contradict the
documentation that it's the "Child from which to read all data".


blkverify:

* x-image: BDRV_CHILD_PRIMARY | BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
* x-raw: BDRV_CHILD_DATA | BDRV_CHILD_FILTERED

Hm, according to the documentation, this doesn't work, FILTERED can be
set only for one node. But the condition ("the parent forwards all reads
and writes") applies to both children. I think the documentation should
mention what needs to be done in such cases. For blkverify, both
children are not equal in intention, so I guess the "real" filtered
child is x-image. But for quorum, you can't make any such distinction. I
assume the recommendation should be not to set FILTERED for any child
then.

Looking at the final code... Hm, your choice looks quite different: You
don't have DATA for x-raw, but you make it the PRIMARY and FILTERED
child. I think PRIMARY/FILTERED is just a bug (e.g. getlength and flush
being forwarded only to x-image show that it's primary). I do wonder
whether I have a different interpretation of DATA than you, though.

Also, the comparison makes me wonder whether FILTERED always implies
PRIMARY? Would there ever be a scenario where a child is FILTERED, but
not PRIMARY?


So I'm not completely sure if I understand the roles correctly, but I
guess my understanding is good enough to continue with the rest of the
series.

Kevin
Max Reitz May 5, 2020, 11:59 a.m. UTC | #3
On 05.05.20 13:19, Kevin Wolf wrote:
> Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:
>> This mask will supplement BdrvChildClass when it comes to what role (or
>> combination of roles) a child takes for its parent.  It consists of
>> BdrvChildRoleBits values (which is an enum).
>>
>> Because empty enums are not allowed, let us just start with it filled.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  include/block/block.h | 38 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 38 insertions(+)
>>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index fd89eb6c75..8c23948d08 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -268,6 +268,44 @@ enum {
>>      DEFAULT_PERM_UNCHANGED      = BLK_PERM_ALL & ~DEFAULT_PERM_PASSTHROUGH,
>>  };
>>  
>> +enum BdrvChildRoleBits {
>> +    /* Child stores data */
>> +    BDRV_CHILD_DATA         = (1 << 0),
>> +
>> +    /* Child stores metadata */
>> +    BDRV_CHILD_METADATA     = (1 << 1),
>> +
>> +    /*
>> +     * A child to which the parent forwards all reads and writes.  It
> 
> Is "_all_ reads and writes" really required? Imagine a caching block
> driver, should it not be considered a filter because it may just
> complete the requests from its cache rather than asking the child?

Hm.  The important thing is that parent and child always show exactly
the same data, which is the second part of the definition given here.
So maybe we should flip both sentences, e.g.:

“A child which always presents exactly the same visibile data as the
parent, e.g. by virtue of the parent forwarding all reads and writes.”

?

>> +     * must present exactly the same visible data as the parent.
>> +     * Any node may have at most one filtered child at a time.
>> +     */
>> +    BDRV_CHILD_FILTERED     = (1 << 2),
>> +
>> +    /*
>> +     * Child from which to read all data that isn’t allocated in the
>> +     * parent (i.e., the backing child); such data is copied to the
>> +     * parent through COW (and optionally COR).
>> +     */
>> +    BDRV_CHILD_COW          = (1 << 3),
>> +
>> +    /*
>> +     * The primary child.  For most drivers, this is the child whose
>> +     * filename applies best to the parent node.
>> +     * Each parent must give this flag to no more than one child at a
>> +     * time.
>> +     */
>> +    BDRV_CHILD_PRIMARY      = (1 << 4),
> 
> And I assume some drivers like quorum don't set it on any child.

I thought “no more than one” implies that.

>> +    /* Useful combination of flags */
>> +    BDRV_CHILD_IMAGE        = BDRV_CHILD_DATA
>> +                              | BDRV_CHILD_METADATA
>> +                              | BDRV_CHILD_PRIMARY,
>> +};
>> +
>> +/* Mask of BdrvChildRoleBits values */
>> +typedef unsigned int BdrvChildRole;
>> +
>>  char *bdrv_perm_names(uint64_t perm);
>>  uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm);
> 
> The list intuitively makes sense to me. Let me try to think of some
> interesting cases to see whether the documentation is complete or
> whether it could be improved.
> 
> 
> qcow2 is what everyone has in mind, so it should be obvious:
> 
> * Without a data file:
>   * file: BDRV_CHILD_IMAGE
>   * backing: BDRV_CHILD_COW
> 
> * With a data file:
>   * file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA
>   * data-file: BDRV_CHILD_DATA
>   * backing: BDRV_CHILD_COW
> 
> 
> We can use VMDK to make things a bit more interesting:
> 
> * file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA
> * extents.*: BDRV_CHILD_METADATA | BDRV_CHILD_DATA
> * backing: BDRV_CHILD_COW
> 
> In other words, we can have multiple data and metadata children. Is this
> correct or should extents not be marked as metadata? (Checked the final
> code: yes we do have multiple of them in vmdk.) Should this be mentioned
> in the documentation?

If the extents contain metadata (I thought not, but I think I was just
wrong; sparse extents must contain their respective mapping structures),
then yes, they should all be marked as metadata children.

I’m not sure whether that needs to be mentioned explicitly in the doc,
because “Child stores metadata” seems sufficient to me.

> Do we then also want to allow multiple BDRV_CHILD_COW children? We don't
> currently have a driver that needs it, but maybe it would be consistent
> with DATA and METADATA then. However, it would contradict the
> documentation that it's the "Child from which to read all data".

Yes.  I would revisit that problem when the need arises.

It seems to me like this would open a whole can of worms, just like
allowing multiple filtered children does.

> blkverify:
> 
> * x-image: BDRV_CHILD_PRIMARY | BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
> * x-raw: BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
> 
> Hm, according to the documentation, this doesn't work, FILTERED can be
> set only for one node. But the condition ("the parent forwards all reads
> and writes") applies to both children. I think the documentation should
> mention what needs to be done in such cases.

I don’t know.  blkverify is a rare exception by design, because it can
abort when both children don’t match.  (I suppose we could theoretically
have a quorum mode where a child gets ejected once a mismatch is
detected, but that isn’t the case now.)

Furthermore, I would argue that blkverify actually expects the formatted
image to sometimes differ from the raw image, if anything, because the
format driver is to be tested.  This is the reason why I chose x-raw to
be the filtered child.

So there is no general instruction on what to do in such cases that I
followed here, I specifically chose one child based on what blkverify is
and what it’s supposed to do.  Therefore, I can’t really give a general
instruction on “what needs to be done in such cases”.

> For blkverify, both
> children are not equal in intention, so I guess the "real" filtered
> child is x-image. But for quorum, you can't make any such distinction. I
> assume the recommendation should be not to set FILTERED for any child
> then.

Quorum just isn’t a filter driver.

> Looking at the final code... Hm, your choice looks quite different: You
> don't have DATA for x-raw, but you make it the PRIMARY and FILTERED
> child. I think PRIMARY/FILTERED is just a bug (e.g. getlength and flush
> being forwarded only to x-image show that it's primary).

I rather consider getlength() a special case.  Ideally, we’d forward
getlength() to both and compare the results; however, image formats
might have different size resolution than raw files, so there could be a
difference, but it’d be irrelevant.

It makes then sense to forward it to the formatted image, because
generally formats have byte resolution for the disk size, whereas for
raw files it depends on caching and the filesystem, I think.

As for flush, yes, why do we forward it only to x-image?  Why is “the
raw file not important”?

> I do wonder
> whether I have a different interpretation of DATA than you, though.

I never set DATA for FILTERED, because I consider FILTERED to be
stronger than DATA, so once FILTERED is set, it doesn’t matter whether
DATA is set or not.  I suppose that should either be mentioned in the
documentation, or we decide that we should always set DATA regardless.

> Also, the comparison makes me wonder whether FILTERED always implies
> PRIMARY? Would there ever be a scenario where a child is FILTERED, but
> not PRIMARY?

I don’t know.  I suppose it does.  But what’s the implication?

Max
Kevin Wolf May 5, 2020, 12:54 p.m. UTC | #4
Am 05.05.2020 um 13:59 hat Max Reitz geschrieben:
> On 05.05.20 13:19, Kevin Wolf wrote:
> > Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:
> >> This mask will supplement BdrvChildClass when it comes to what role (or
> >> combination of roles) a child takes for its parent.  It consists of
> >> BdrvChildRoleBits values (which is an enum).
> >>
> >> Because empty enums are not allowed, let us just start with it filled.
> >>
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >> ---
> >>  include/block/block.h | 38 ++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 38 insertions(+)
> >>
> >> diff --git a/include/block/block.h b/include/block/block.h
> >> index fd89eb6c75..8c23948d08 100644
> >> --- a/include/block/block.h
> >> +++ b/include/block/block.h
> >> @@ -268,6 +268,44 @@ enum {
> >>      DEFAULT_PERM_UNCHANGED      = BLK_PERM_ALL & ~DEFAULT_PERM_PASSTHROUGH,
> >>  };
> >>  
> >> +enum BdrvChildRoleBits {
> >> +    /* Child stores data */
> >> +    BDRV_CHILD_DATA         = (1 << 0),
> >> +
> >> +    /* Child stores metadata */
> >> +    BDRV_CHILD_METADATA     = (1 << 1),
> >> +
> >> +    /*
> >> +     * A child to which the parent forwards all reads and writes.  It
> > 
> > Is "_all_ reads and writes" really required? Imagine a caching block
> > driver, should it not be considered a filter because it may just
> > complete the requests from its cache rather than asking the child?
> 
> Hm.  The important thing is that parent and child always show exactly
> the same data, which is the second part of the definition given here.
> So maybe we should flip both sentences, e.g.:
> 
> “A child which always presents exactly the same visibile data as the
> parent, e.g. by virtue of the parent forwarding all reads and writes.”
>
> ?

Turns it into an example, which is a good way of explaining things that
are commonly, but not universally true. I like it.

> >> +     * must present exactly the same visible data as the parent.
> >> +     * Any node may have at most one filtered child at a time.
> >> +     */
> >> +    BDRV_CHILD_FILTERED     = (1 << 2),
> >> +
> >> +    /*
> >> +     * Child from which to read all data that isn’t allocated in the
> >> +     * parent (i.e., the backing child); such data is copied to the
> >> +     * parent through COW (and optionally COR).
> >> +     */
> >> +    BDRV_CHILD_COW          = (1 << 3),
> >> +
> >> +    /*
> >> +     * The primary child.  For most drivers, this is the child whose
> >> +     * filename applies best to the parent node.
> >> +     * Each parent must give this flag to no more than one child at a
> >> +     * time.
> >> +     */
> >> +    BDRV_CHILD_PRIMARY      = (1 << 4),
> > 
> > And I assume some drivers like quorum don't set it on any child.
> 
> I thought “no more than one” implies that.

Technically no, though it's probably the first assumption of most
people. It's just a suggestion for clarification, feel free to ignore
it.

> >> +    /* Useful combination of flags */
> >> +    BDRV_CHILD_IMAGE        = BDRV_CHILD_DATA
> >> +                              | BDRV_CHILD_METADATA
> >> +                              | BDRV_CHILD_PRIMARY,
> >> +};
> >> +
> >> +/* Mask of BdrvChildRoleBits values */
> >> +typedef unsigned int BdrvChildRole;
> >> +
> >>  char *bdrv_perm_names(uint64_t perm);
> >>  uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm);
> > 
> > The list intuitively makes sense to me. Let me try to think of some
> > interesting cases to see whether the documentation is complete or
> > whether it could be improved.
> > 
> > 
> > qcow2 is what everyone has in mind, so it should be obvious:
> > 
> > * Without a data file:
> >   * file: BDRV_CHILD_IMAGE
> >   * backing: BDRV_CHILD_COW
> > 
> > * With a data file:
> >   * file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA
> >   * data-file: BDRV_CHILD_DATA
> >   * backing: BDRV_CHILD_COW
> > 
> > 
> > We can use VMDK to make things a bit more interesting:
> > 
> > * file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA
> > * extents.*: BDRV_CHILD_METADATA | BDRV_CHILD_DATA
> > * backing: BDRV_CHILD_COW
> > 
> > In other words, we can have multiple data and metadata children. Is this
> > correct or should extents not be marked as metadata? (Checked the final
> > code: yes we do have multiple of them in vmdk.) Should this be mentioned
> > in the documentation?
> 
> If the extents contain metadata (I thought not, but I think I was just
> wrong; sparse extents must contain their respective mapping structures),
> then yes, they should all be marked as metadata children.
> 
> I’m not sure whether that needs to be mentioned explicitly in the doc,
> because “Child stores metadata” seems sufficient to me.

When you're the author, the meaning of everything is clear to you. :-)

In case of doubt, I would be more explicit so that the comment gives a
clear guideline for which role to use in which scenario.

> > Do we then also want to allow multiple BDRV_CHILD_COW children? We don't
> > currently have a driver that needs it, but maybe it would be consistent
> > with DATA and METADATA then. However, it would contradict the
> > documentation that it's the "Child from which to read all data".
> 
> Yes.  I would revisit that problem when the need arises.
> 
> It seems to me like this would open a whole can of worms, just like
> allowing multiple filtered children does.

Okay. Shall we document it explicitly like we do for the filter role?

> > blkverify:
> > 
> > * x-image: BDRV_CHILD_PRIMARY | BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
> > * x-raw: BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
> > 
> > Hm, according to the documentation, this doesn't work, FILTERED can be
> > set only for one node. But the condition ("the parent forwards all reads
> > and writes") applies to both children. I think the documentation should
> > mention what needs to be done in such cases.
> 
> I don’t know.  blkverify is a rare exception by design, because it can
> abort when both children don’t match.  (I suppose we could theoretically
> have a quorum mode where a child gets ejected once a mismatch is
> detected, but that isn’t the case now.)

Well, yes, this is exceptional. I would ignore that property for
assigning roles because when it comes to play, roles don't matter any
more because the whole process is gone. So...

> Furthermore, I would argue that blkverify actually expects the formatted
> image to sometimes differ from the raw image, if anything, because the
> format driver is to be tested.  This is the reason why I chose x-raw to
> be the filtered child.

...I don't think this case is relevant. If blkverify returns something,
both children have the same data.

> So there is no general instruction on what to do in such cases that I
> followed here, I specifically chose one child based on what blkverify is
> and what it’s supposed to do.  Therefore, I can’t really give a general
> instruction on “what needs to be done in such cases”.

Maybe the missing part for me is what FILTERED is even used for. I
assume it's for skipping over filters in certain functions in the
generic block layer?

In this case, maybe the right answer is that...

> > For blkverify, both
> > children are not equal in intention, so I guess the "real" filtered
> > child is x-image. But for quorum, you can't make any such distinction. I
> > assume the recommendation should be not to set FILTERED for any child
> > then.
> 
> Quorum just isn’t a filter driver.

...blkverify isn't one either because performing an operation on only
one child usually won't be correct.

> > Looking at the final code... Hm, your choice looks quite different: You
> > don't have DATA for x-raw, but you make it the PRIMARY and FILTERED
> > child. I think PRIMARY/FILTERED is just a bug (e.g. getlength and flush
> > being forwarded only to x-image show that it's primary).
> 
> I rather consider getlength() a special case.  Ideally, we’d forward
> getlength() to both and compare the results; however, image formats
> might have different size resolution than raw files, so there could be a
> difference, but it’d be irrelevant.
> 
> It makes then sense to forward it to the formatted image, because
> generally formats have byte resolution for the disk size, whereas for
> raw files it depends on caching and the filesystem, I think.
> 
> As for flush, yes, why do we forward it only to x-image?  Why is “the
> raw file not important”?

Because it's the copy that is used to check whether the main image is
correct. If things break, you just create a new copy. At least that's
how blkverify was supposed to be used.

In fact, I guess in the typical use cases for blkverify, cache=unsafe is
enough anyway because you would start over from scratch, so... not a
strong argument.

> > I do wonder
> > whether I have a different interpretation of DATA than you, though.
> 
> I never set DATA for FILTERED, because I consider FILTERED to be
> stronger than DATA, so once FILTERED is set, it doesn’t matter whether
> DATA is set or not.  I suppose that should either be mentioned in the
> documentation, or we decide that we should always set DATA regardless.

Either option should be fine. I guess documenting it is less work.

> > Also, the comparison makes me wonder whether FILTERED always implies
> > PRIMARY? Would there ever be a scenario where a child is FILTERED, but
> > not PRIMARY?
> 
> I don’t know.  I suppose it does.  But what’s the implication?

*shrug* I was just asking to see if I understand things right. We could
document it, but I don't have a good reason why we must do that.

Maybe the more relevant question would be if a FILTERED child must be
the only child to avoid the problems we're discussing for blkverify. But
I think I already answered that question for myself with "no", so
probably not much use asking it.

Kevin
Max Reitz May 5, 2020, 1:20 p.m. UTC | #5
On 05.05.20 14:54, Kevin Wolf wrote:
> Am 05.05.2020 um 13:59 hat Max Reitz geschrieben:
>> On 05.05.20 13:19, Kevin Wolf wrote:
>>> Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:

[...]

>>>> +    /* Useful combination of flags */
>>>> +    BDRV_CHILD_IMAGE        = BDRV_CHILD_DATA
>>>> +                              | BDRV_CHILD_METADATA
>>>> +                              | BDRV_CHILD_PRIMARY,
>>>> +};
>>>> +
>>>> +/* Mask of BdrvChildRoleBits values */
>>>> +typedef unsigned int BdrvChildRole;
>>>> +
>>>>  char *bdrv_perm_names(uint64_t perm);
>>>>  uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm);
>>>
>>> The list intuitively makes sense to me. Let me try to think of some
>>> interesting cases to see whether the documentation is complete or
>>> whether it could be improved.
>>>
>>>
>>> qcow2 is what everyone has in mind, so it should be obvious:
>>>
>>> * Without a data file:
>>>   * file: BDRV_CHILD_IMAGE
>>>   * backing: BDRV_CHILD_COW
>>>
>>> * With a data file:
>>>   * file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA
>>>   * data-file: BDRV_CHILD_DATA
>>>   * backing: BDRV_CHILD_COW
>>>
>>>
>>> We can use VMDK to make things a bit more interesting:
>>>
>>> * file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA
>>> * extents.*: BDRV_CHILD_METADATA | BDRV_CHILD_DATA
>>> * backing: BDRV_CHILD_COW
>>>
>>> In other words, we can have multiple data and metadata children. Is this
>>> correct or should extents not be marked as metadata? (Checked the final
>>> code: yes we do have multiple of them in vmdk.) Should this be mentioned
>>> in the documentation?
>>
>> If the extents contain metadata (I thought not, but I think I was just
>> wrong; sparse extents must contain their respective mapping structures),
>> then yes, they should all be marked as metadata children.
>>
>> I’m not sure whether that needs to be mentioned explicitly in the doc,
>> because “Child stores metadata” seems sufficient to me.
> 
> When you're the author, the meaning of everything is clear to you. :-)
> 
> In case of doubt, I would be more explicit so that the comment gives a
> clear guideline for which role to use in which scenario.

OK, so you mean just noting everywhere explicitly how many children can
get a specific flag, and not just in some cases?  That is, make a note
for DATA and METADATA that they can be given to an arbitrary number of
children, and COW only to at most one.

>>> Do we then also want to allow multiple BDRV_CHILD_COW children? We don't
>>> currently have a driver that needs it, but maybe it would be consistent
>>> with DATA and METADATA then. However, it would contradict the
>>> documentation that it's the "Child from which to read all data".
>>
>> Yes.  I would revisit that problem when the need arises.
>>
>> It seems to me like this would open a whole can of worms, just like
>> allowing multiple filtered children does.
> 
> Okay. Shall we document it explicitly like we do for the filter role?

Yep.

>>> blkverify:
>>>
>>> * x-image: BDRV_CHILD_PRIMARY | BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
>>> * x-raw: BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
>>>
>>> Hm, according to the documentation, this doesn't work, FILTERED can be
>>> set only for one node. But the condition ("the parent forwards all reads
>>> and writes") applies to both children. I think the documentation should
>>> mention what needs to be done in such cases.
>>
>> I don’t know.  blkverify is a rare exception by design, because it can
>> abort when both children don’t match.  (I suppose we could theoretically
>> have a quorum mode where a child gets ejected once a mismatch is
>> detected, but that isn’t the case now.)
> 
> Well, yes, this is exceptional. I would ignore that property for
> assigning roles because when it comes to play, roles don't matter any
> more because the whole process is gone. So...
> 
>> Furthermore, I would argue that blkverify actually expects the formatted
>> image to sometimes differ from the raw image, if anything, because the
>> format driver is to be tested.  This is the reason why I chose x-raw to
>> be the filtered child.
> 
> ...I don't think this case is relevant. If blkverify returns something,
> both children have the same data.

Another argument is that right now, bs->file points to x-raw, and
.is_filter is set.  So x-raw is already treated as the filtered child.

>> So there is no general instruction on what to do in such cases that I
>> followed here, I specifically chose one child based on what blkverify is
>> and what it’s supposed to do.  Therefore, I can’t really give a general
>> instruction on “what needs to be done in such cases”.
> 
> Maybe the missing part for me is what FILTERED is even used for. I
> assume it's for skipping over filters in certain functions in the
> generic block layer?

Yes.

> In this case, maybe the right answer is that...
> 
>>> For blkverify, both
>>> children are not equal in intention, so I guess the "real" filtered
>>> child is x-image. But for quorum, you can't make any such distinction. I
>>> assume the recommendation should be not to set FILTERED for any child
>>> then.
>>
>> Quorum just isn’t a filter driver.
> 
> ...blkverify isn't one either because performing an operation on only
> one child usually won't be correct.

Good point.  It would work if filters are just skipped for functions
that read/query stuff, which I think is the case.  I don’t think we ever
skip filters when it comes to modifying data.

In any case, I wouldn’t lose too much sleep over blkverify whatever we
do.  It’s a driver used purely for debugging purposes.

>>> Looking at the final code... Hm, your choice looks quite different: You
>>> don't have DATA for x-raw, but you make it the PRIMARY and FILTERED
>>> child. I think PRIMARY/FILTERED is just a bug (e.g. getlength and flush
>>> being forwarded only to x-image show that it's primary).
>>
>> I rather consider getlength() a special case.  Ideally, we’d forward
>> getlength() to both and compare the results; however, image formats
>> might have different size resolution than raw files, so there could be a
>> difference, but it’d be irrelevant.
>>
>> It makes then sense to forward it to the formatted image, because
>> generally formats have byte resolution for the disk size, whereas for
>> raw files it depends on caching and the filesystem, I think.
>>
>> As for flush, yes, why do we forward it only to x-image?  Why is “the
>> raw file not important”?
> 
> Because it's the copy that is used to check whether the main image is
> correct. If things break, you just create a new copy. At least that's
> how blkverify was supposed to be used.

I wonder why blkverify decides that.  This should be up to the user, and
if the user wants to keep the verification image around, blkverify
shouldn’t prevent that.

> In fact, I guess in the typical use cases for blkverify, cache=unsafe is
> enough anyway because you would start over from scratch, so... not a
> strong argument.

That too.

>>> I do wonder
>>> whether I have a different interpretation of DATA than you, though.
>>
>> I never set DATA for FILTERED, because I consider FILTERED to be
>> stronger than DATA, so once FILTERED is set, it doesn’t matter whether
>> DATA is set or not.  I suppose that should either be mentioned in the
>> documentation, or we decide that we should always set DATA regardless.
> 
> Either option should be fine. I guess documenting it is less work.

OK.

>>> Also, the comparison makes me wonder whether FILTERED always implies
>>> PRIMARY? Would there ever be a scenario where a child is FILTERED, but
>>> not PRIMARY?
>>
>> I don’t know.  I suppose it does.  But what’s the implication?
> 
> *shrug* I was just asking to see if I understand things right. We could
> document it, but I don't have a good reason why we must do that.

OK.

> Maybe the more relevant question would be if a FILTERED child must be
> the only child to avoid the problems we're discussing for blkverify. But
> I think I already answered that question for myself with "no", so
> probably not much use asking it.

blkverify is just a bit weird, and I personally don’t mind just treating
it as something “special”, considering it’s just a debugging aid.

Regardless of blkverify, I don’t think FILTERED children must be the
only children, though, because I can well imagine filter drivers having
metadata children on the side, e.g. config data or bitmaps (not just
dirty bitmaps, but also e.g. what to cache for a hypothetical cache driver).

Max
Kevin Wolf May 5, 2020, 1:38 p.m. UTC | #6
Am 05.05.2020 um 15:20 hat Max Reitz geschrieben:
> On 05.05.20 14:54, Kevin Wolf wrote:
> > When you're the author, the meaning of everything is clear to you. :-)
> > 
> > In case of doubt, I would be more explicit so that the comment gives a
> > clear guideline for which role to use in which scenario.
> 
> OK, so you mean just noting everywhere explicitly how many children can
> get a specific flag, and not just in some cases?  That is, make a note
> for DATA and METADATA that they can be given to an arbitrary number of
> children, and COW only to at most one.

Sounds good to me.

> >>> blkverify:
> >>>
> >>> * x-image: BDRV_CHILD_PRIMARY | BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
> >>> * x-raw: BDRV_CHILD_DATA | BDRV_CHILD_FILTERED
> >>>
> >>> Hm, according to the documentation, this doesn't work, FILTERED can be
> >>> set only for one node. But the condition ("the parent forwards all reads
> >>> and writes") applies to both children. I think the documentation should
> >>> mention what needs to be done in such cases.
> >>
> >> I don’t know.  blkverify is a rare exception by design, because it can
> >> abort when both children don’t match.  (I suppose we could theoretically
> >> have a quorum mode where a child gets ejected once a mismatch is
> >> detected, but that isn’t the case now.)
> > 
> > Well, yes, this is exceptional. I would ignore that property for
> > assigning roles because when it comes to play, roles don't matter any
> > more because the whole process is gone. So...
> > 
> >> Furthermore, I would argue that blkverify actually expects the formatted
> >> image to sometimes differ from the raw image, if anything, because the
> >> format driver is to be tested.  This is the reason why I chose x-raw to
> >> be the filtered child.
> > 
> > ...I don't think this case is relevant. If blkverify returns something,
> > both children have the same data.
> 
> Another argument is that right now, bs->file points to x-raw, and
> .is_filter is set.  So x-raw is already treated as the filtered child.

I admit defeat. :-)

> >> So there is no general instruction on what to do in such cases that I
> >> followed here, I specifically chose one child based on what blkverify is
> >> and what it’s supposed to do.  Therefore, I can’t really give a general
> >> instruction on “what needs to be done in such cases”.
> > 
> > Maybe the missing part for me is what FILTERED is even used for. I
> > assume it's for skipping over filters in certain functions in the
> > generic block layer?
> 
> Yes.
> 
> > In this case, maybe the right answer is that...
> > 
> >>> For blkverify, both
> >>> children are not equal in intention, so I guess the "real" filtered
> >>> child is x-image. But for quorum, you can't make any such distinction. I
> >>> assume the recommendation should be not to set FILTERED for any child
> >>> then.
> >>
> >> Quorum just isn’t a filter driver.
> > 
> > ...blkverify isn't one either because performing an operation on only
> > one child usually won't be correct.
> 
> Good point.  It would work if filters are just skipped for functions
> that read/query stuff, which I think is the case.  I don’t think we ever
> skip filters when it comes to modifying data.
> 
> In any case, I wouldn’t lose too much sleep over blkverify whatever we
> do.  It’s a driver used purely for debugging purposes.

Yeah, I'm not really worried about blkverify per se. It just seems like
an interesting corner case to discuss to make sure that the design of
the role system is right and doesn't miss anything important.

> > Maybe the more relevant question would be if a FILTERED child must be
> > the only child to avoid the problems we're discussing for blkverify. But
> > I think I already answered that question for myself with "no", so
> > probably not much use asking it.
> 
> blkverify is just a bit weird, and I personally don’t mind just treating
> it as something “special”, considering it’s just a debugging aid.
> 
> Regardless of blkverify, I don’t think FILTERED children must be the
> only children, though, because I can well imagine filter drivers having
> metadata children on the side, e.g. config data or bitmaps (not just
> dirty bitmaps, but also e.g. what to cache for a hypothetical cache driver).

The example of a caching driver that uses a child node for the cached
data (probably on a fast, but small disk) was what made me answer the
question with "no".

But as you write, having a pure metadata child could make sense, too.

Kevin
diff mbox series

Patch

diff --git a/include/block/block.h b/include/block/block.h
index fd89eb6c75..8c23948d08 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -268,6 +268,44 @@  enum {
     DEFAULT_PERM_UNCHANGED      = BLK_PERM_ALL & ~DEFAULT_PERM_PASSTHROUGH,
 };
 
+enum BdrvChildRoleBits {
+    /* Child stores data */
+    BDRV_CHILD_DATA         = (1 << 0),
+
+    /* Child stores metadata */
+    BDRV_CHILD_METADATA     = (1 << 1),
+
+    /*
+     * A child to which the parent forwards all reads and writes.  It
+     * must present exactly the same visible data as the parent.
+     * Any node may have at most one filtered child at a time.
+     */
+    BDRV_CHILD_FILTERED     = (1 << 2),
+
+    /*
+     * Child from which to read all data that isn’t allocated in the
+     * parent (i.e., the backing child); such data is copied to the
+     * parent through COW (and optionally COR).
+     */
+    BDRV_CHILD_COW          = (1 << 3),
+
+    /*
+     * The primary child.  For most drivers, this is the child whose
+     * filename applies best to the parent node.
+     * Each parent must give this flag to no more than one child at a
+     * time.
+     */
+    BDRV_CHILD_PRIMARY      = (1 << 4),
+
+    /* Useful combination of flags */
+    BDRV_CHILD_IMAGE        = BDRV_CHILD_DATA
+                              | BDRV_CHILD_METADATA
+                              | BDRV_CHILD_PRIMARY,
+};
+
+/* Mask of BdrvChildRoleBits values */
+typedef unsigned int BdrvChildRole;
+
 char *bdrv_perm_names(uint64_t perm);
 uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm);