[v6,04/42] block: Add child access functions

Message ID	20190809161407.11920-5-mreitz@redhat.com (mailing list archive)

Max Reitz Aug. 9, 2019, 4:13 p.m. UTC

There are BDS children that the general block layer code can access,
namely bs->file and bs->backing.  Since the introduction of filters and
external data files, their meaning is not quite clear.  bs->backing can
be a COW source, or it can be an R/W-filtered child; bs->file can be an
R/W-filtered child, it can be data and metadata storage, or it can be
just metadata storage.

This overloading really is not helpful.  This patch adds functions that
retrieve the correct child for each exact purpose.  Later patches in
this series will make use of them.  Doing so will allow us to handle
filter nodes and external data files in a meaningful way.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/block_int.h | 57 ++++++++++++++++++++--
 block.c                   | 99 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 153 insertions(+), 3 deletions(-)

Eric Blake Aug. 9, 2019, 4:56 p.m. UTC | #1

On 8/9/19 11:13 AM, Max Reitz wrote:
> There are BDS children that the general block layer code can access,
> namely bs->file and bs->backing.  Since the introduction of filters and
> external data files, their meaning is not quite clear.  bs->backing can
> be a COW source, or it can be an R/W-filtered child; bs->file can be an
> R/W-filtered child, it can be data and metadata storage, or it can be
> just metadata storage.
> 
> This overloading really is not helpful.  This patch adds functions that
> retrieve the correct child for each exact purpose.  Later patches in
> this series will make use of them.  Doing so will allow us to handle
> filter nodes and external data files in a meaningful way.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/block_int.h | 57 ++++++++++++++++++++--
>  block.c                   | 99 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 153 insertions(+), 3 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

Kevin Wolf Sept. 4, 2019, 4:16 p.m. UTC | #2

On 09.08.2019 at 18:13, Max Reitz wrote:
> There are BDS children that the general block layer code can access,
> namely bs->file and bs->backing.  Since the introduction of filters and
> external data files, their meaning is not quite clear.  bs->backing can
> be a COW source, or it can be an R/W-filtered child; bs->file can be an
> R/W-filtered child, it can be data and metadata storage, or it can be
> just metadata storage.
> 
> This overloading really is not helpful.  This patch adds functions that
> retrieve the correct child for each exact purpose.  Later patches in
> this series will make use of them.  Doing so will allow us to handle
> filter nodes and external data files in a meaningful way.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Each time I look at this patch, I'm confused by the function names.
Maybe I should just ask what the idea there was, or more specifically:
What does the "filtered" in "filtered child" really mean?

Apparently any child of a filter node is "filtered" (which makes sense),
but also bs->backing of a qcow2 image, while bs->file of qcow2 isn't.
raw doesn't have any "filtered" child. What's the system behind this?

It looks like bdrv_filtered_child() is the right function to iterate
along a backing file chain, but I just still fail to connect that and
the name of the function in a meaningful way.

> +/*
> + * Return the child that @bs acts as an overlay for, and from which data may be
> + * copied in COW or COR operations.  Usually this is the backing file.
> + */

Or NULL, if no such child exists.

It's relatively obvious here, but for some of the functions further down
it would be really good to describe in which cases NULL is expected (or
that NULL is even a possible return value).

Kevin

Max Reitz Sept. 9, 2019, 7:56 a.m. UTC | #3

On 04.09.19 18:16, Kevin Wolf wrote:
> On 09.08.2019 at 18:13, Max Reitz wrote:
>> There are BDS children that the general block layer code can access,
>> namely bs->file and bs->backing.  Since the introduction of filters and
>> external data files, their meaning is not quite clear.  bs->backing can
>> be a COW source, or it can be an R/W-filtered child; bs->file can be an
>> R/W-filtered child, it can be data and metadata storage, or it can be
>> just metadata storage.
>>
>> This overloading really is not helpful.  This patch adds functions that
>> retrieve the correct child for each exact purpose.  Later patches in
>> this series will make use of them.  Doing so will allow us to handle
>> filter nodes and external data files in a meaningful way.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> Each time I look at this patch, I'm confused by the function names.
> Maybe I should just ask what the idea there was, or more specifically:
> What does the "filtered" in "filtered child" really mean?
> 
> Apparently any child of a filter node is "filtered" (which makes sense),

It isn’t, filters can have non-filter children.  For example, backup-top
could have the source as a filtered child and the target as a non-filter
child.

> but also bs->backing of a qcow2 image, while bs->file of qcow2 isn't.
> raw doesn't have any "filtered" child. What's the system behind this?

“filtered” means: If the parent node returns data from this child, it
won’t modify it, neither its content nor its position.  COW and R/W
filters differ in how they handle writes; R/W filters pass them through
to the filtered child, COW filters copy them off to some other child
node (and then the filtered child’s data will no longer be visible at
that location).
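
For illustration only, the read and write behaviour described above can be
sketched on a toy model.  ToyNode, toy_read() and toy_write() are
hypothetical names and are not part of QEMU or of this patch; the sketch
assumes a byte-granular device and tracks allocation per byte.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SIZE 8

/* Hypothetical toy node; this is NOT QEMU's BlockDriverState. */
typedef enum { TOY_PLAIN, TOY_RW_FILTER, TOY_COW_OVERLAY } ToyKind;

typedef struct ToyNode {
    ToyKind kind;
    struct ToyNode *filtered;          /* the filtered child, or NULL */
    char data[TOY_SIZE];               /* local storage (plain and COW nodes) */
    unsigned char allocated[TOY_SIZE]; /* COW only: 1 if the byte is local */
} ToyNode;

/* Data coming from the filtered child is returned unmodified and in place. */
char toy_read(const ToyNode *n, size_t off)
{
    assert(n && off < TOY_SIZE);
    switch (n->kind) {
    case TOY_RW_FILTER:
        return toy_read(n->filtered, off);
    case TOY_COW_OVERLAY:
        if (!n->allocated[off] && n->filtered) {
            return toy_read(n->filtered, off);
        }
        return n->data[off];
    default:
        return n->data[off];
    }
}

/* R/W filters pass writes through; COW nodes keep them away from the child. */
void toy_write(ToyNode *n, size_t off, char c)
{
    assert(n && off < TOY_SIZE);
    switch (n->kind) {
    case TOY_RW_FILTER:
        toy_write(n->filtered, off, c);
        break;
    case TOY_COW_OVERLAY:
        n->data[off] = c;
        n->allocated[off] = 1;  /* the child's data is now hidden here */
        break;
    default:
        n->data[off] = c;
        break;
    }
}
```

In this model a write through a TOY_RW_FILTER node changes the child, while
a write to a TOY_COW_OVERLAY node leaves the child untouched and merely
shadows it at that offset.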

The main reason behind the common “filtered” name is for the generic
functions that work on both COW and true filter (R/W filters) chains.
We need such functionality sometimes.  I personally felt like the
concept of true (R/W) filters and COW children was similar enough to
share a common name base.

qcow2 has a COW child.  As such, it acts as a COW filter in the sense of
the function names.

raw has neither a COW child nor acts as an R/W filter.  As such, it has
no filtered child.  My opinion on this hasn’t changed.

(To reiterate, in practice I see no way anyone would ever use raw as an
R/W filter.
Either you use it without offset/size, in which case you simply use it
in lieu of a format node, so you precisely don’t want it to act as a
filter when it comes to allocation information and so on (even though it
can be classified a filter here).
Or you use it as kind of a filter with offset/size, but then it no
longer is a filter.

Filters are defined by “Every filter must fulfill these conditions: ...”
– not by “Everything that fulfills these conditions is a filter”.
Marking a driver as a filter has consequences, and I don’t see why we
would want those consequences for raw.)

> It looks like bdrv_filtered_child() is the right function to iterate
> along a backing file chain, but I just still fail to connect that and
> the name of the function in a meaningful way.

It’s the right function to iterate along a filter chain.  This includes
COW backing children and R/W filtered children.
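
As a rough sketch of such an iteration, assuming a toy node with at most one
filtered child (either a COW child or an R/W-filtered child, never both):
ChainNode and the chain_*() helpers below are hypothetical and only mirror
the idea, not the actual code in block.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy node; this is NOT QEMU's BlockDriverState. */
typedef struct ChainNode {
    const char *name;
    struct ChainNode *cow_child;  /* COW backing child, or NULL */
    struct ChainNode *rw_child;   /* R/W-filtered child, or NULL */
} ChainNode;

/* Return the single filtered child, whichever kind it is, or NULL. */
ChainNode *chain_filtered_child(ChainNode *n)
{
    assert(n);
    /* Assumption of this sketch: a node never has both kinds at once. */
    assert(!(n->cow_child && n->rw_child));
    return n->cow_child ? n->cow_child : n->rw_child;
}

/* Follow filtered children until a node without one is reached. */
ChainNode *chain_bottom(ChainNode *n)
{
    ChainNode *next;

    while ((next = chain_filtered_child(n)) != NULL) {
        n = next;
    }
    return n;
}

/* Count the nodes on the chain, including @n itself. */
size_t chain_length(ChainNode *n)
{
    size_t len = 0;

    for (; n != NULL; n = chain_filtered_child(n)) {
        len++;
    }
    return len;
}
```

With an overlay whose COW child is a throttle-like filter on top of a base
image, chain_bottom() ends at the base image regardless of which kind of
link each hop uses.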

>> +/*
>> + * Return the child that @bs acts as an overlay for, and from which data may be
>> + * copied in COW or COR operations.  Usually this is the backing file.
>> + */
> 
> Or NULL, if no such child exists.
> 
> It's relatively obvious here, but for some of the functions further down
> it would be really good to describe in which cases NULL is expected (or
> that NULL is even a possible return value).

I’ll look into it.

Max

Kevin Wolf Sept. 9, 2019, 9:36 a.m. UTC | #4

On 09.09.2019 at 09:56, Max Reitz wrote:
> On 04.09.19 18:16, Kevin Wolf wrote:
> > On 09.08.2019 at 18:13, Max Reitz wrote:
> >> There are BDS children that the general block layer code can access,
> >> namely bs->file and bs->backing.  Since the introduction of filters and
> >> external data files, their meaning is not quite clear.  bs->backing can
> >> be a COW source, or it can be an R/W-filtered child; bs->file can be an
> >> R/W-filtered child, it can be data and metadata storage, or it can be
> >> just metadata storage.
> >>
> >> This overloading really is not helpful.  This patch adds functions that
> >> retrieve the correct child for each exact purpose.  Later patches in
> >> this series will make use of them.  Doing so will allow us to handle
> >> filter nodes and external data files in a meaningful way.
> >>
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > 
> > Each time I look at this patch, I'm confused by the function names.
> > Maybe I should just ask what the idea there was, or more specifically:
> > What does the "filtered" in "filtered child" really mean?
> > 
> > Apparently any child of a filter node is "filtered" (which makes sense),
> 
> It isn’t, filters can have non-filter children.  For example, backup-top
> could have the source as a filtered child and the target as a non-filter
> child.

Hm, okay, makes sense. I had a definition in mind that says that filter
nodes only have a single child node. Is it that a filter may have only a
single _filtered_ child node?

> > but also bs->backing of a qcow2 image, while bs->file of qcow2 isn't.
> > raw doesn't have any "filtered" child. What's the system behind this?
> 
> “filtered” means: If the parent node returns data from this child, it
> won’t modify it, neither its content nor its position.  COW and R/W
> filters differ in how they handle writes; R/W filters pass them through
> to the filtered child, COW filters copy them off to some other child
> node (and then the filtered child’s data will no longer be visible at
> that location).

But there is no reason why a node couldn't fulfill this condition for
more than one child node. bdrv_filtered_child() isn't well-defined then.
Technically, the description "Return any filtered child" is correct
because "any" can be interpreted as "an arbitrary", but obviously that
makes the function useless.

Specifically, according to your definition, qcow2 filters both the
backing file (COW filter) and the external data file (R/W filter).

> The main reason behind the common “filtered” name is for the generic
> functions that work on both COW and true filter (R/W filters) chains.
> We need such functionality sometimes.  I personally felt like the
> concept of true (R/W) filters and COW children was similar enough to
> share a common name base.

We generally call this concept a "backing chain".

> qcow2 has a COW child.  As such, it acts as a COW filter in the sense of
> the function names.
> 
> raw has neither a COW child nor acts as an R/W filter.  As such, it has
> no filtered child.  My opinion on this hasn’t changed.
> 
> (To reiterate, in practice I see no way anyone would ever use raw as an
> R/W filter.
> Either you use it without offset/size, in which case you simply use it
> in lieu of a format node, so you precisely don’t want it to act as a
> filter when it comes to allocation information and so on (even though it
> can be classified a filter here).
> Or you use it as kind of a filter with offset/size, but then it no
> longer is a filter.

Agreed with offset, but with only size, it matches your definition of a
filter.

> Filters are defined by “Every filter must fulfill these conditions: ...”
> – not by “Everything that fulfills these conditions is a filter”.
> Marking a driver as a filter has consequences, and I don’t see why we
> would want those consequences for raw.)
> 
> > It looks like bdrv_filtered_child() is the right function to iterate
> > along a backing file chain, but I just still fail to connect that and
> > the name of the function in a meaningful way.
> 
> It’s the right function to iterate along a filter chain.  This includes
> COW backing children and R/W filtered children.

qcow2 doesn't fulfill the conditions for being a filter driver. Two of
its possible children fulfill the conditions for being a filtered child.
You can pick either approach, talking about a "filter chain" just
doesn't make sense there. Either the chain is broken by a non-filter
driver like qcow2, or it must become a filter tree.

What we're really interested in is iterating the backing chain even
across filter nodes, so your implementation achieves the right result.
It just feels completely arbitrary, counterintuitive and confusing to
call this a (or actually "the") "filter chain" and to pretend that the
name tells anyone what it really is.

Kevin

Max Reitz Sept. 9, 2019, 2:04 p.m. UTC | #5

On 09.09.19 11:36, Kevin Wolf wrote:
> On 09.09.2019 at 09:56, Max Reitz wrote:
>> On 04.09.19 18:16, Kevin Wolf wrote:
>>> On 09.08.2019 at 18:13, Max Reitz wrote:
>>>> There are BDS children that the general block layer code can access,
>>>> namely bs->file and bs->backing.  Since the introduction of filters and
>>>> external data files, their meaning is not quite clear.  bs->backing can
>>>> be a COW source, or it can be an R/W-filtered child; bs->file can be an
>>>> R/W-filtered child, it can be data and metadata storage, or it can be
>>>> just metadata storage.
>>>>
>>>> This overloading really is not helpful.  This patch adds functions that
>>>> retrieve the correct child for each exact purpose.  Later patches in
>>>> this series will make use of them.  Doing so will allow us to handle
>>>> filter nodes and external data files in a meaningful way.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>
>>> Each time I look at this patch, I'm confused by the function names.
>>> Maybe I should just ask what the idea there was, or more specifically:
>>> What does the "filtered" in "filtered child" really mean?
>>>
>>> Apparently any child of a filter node is "filtered" (which makes sense),
>>
>> It isn’t, filters can have non-filter children.  For example, backup-top
>> could have the source as a filtered child and the target as a non-filter
>> child.
> 
> Hm, okay, makes sense. I had a definition in mind that says that filter
> nodes only have a single child node. Is it that a filter may have only a
> single _filtered_ child node?

Well, there’s Quorum...

>>> but also bs->backing of a qcow2 image, while bs->file of qcow2 isn't.
>>> raw doesn't have any "filtered" child. What's the system behind this?
>>
>> “filtered” means: If the parent node returns data from this child, it
>> won’t modify it, neither its content nor its position.  COW and R/W
>> filters differ in how they handle writes; R/W filters pass them through
>> to the filtered child, COW filters copy them off to some other child
>> node (and then the filtered child’s data will no longer be visible at
>> that location).
> 
> But there is no reason why a node couldn't fulfill this condition for
> more than one child node. bdrv_filtered_child() isn't well-defined then.
> Technically, the description "Return any filtered child" is correct
> because "any" can be interpreted as "an arbitrary", but obviously that
> makes the function useless.

Which is why it currently returns NULL for Quorum.

> Specifically, according to your definition, qcow2 filters both the
> backing file (COW filter) and the external data file (R/W filter).

Not wrong.  But the same question as for raw arises: Is there any use to
declaring qcow2 an R/W filter driver just because it fits the definition?

>> The main reason behind the common “filtered” name is for the generic
>> functions that work on both COW and true filter (R/W filters) chains.
>> We need such functionality sometimes.  I personally felt like the
>> concept of true (R/W) filters and COW children was similar enough to
>> share a common name base.
> 
> We generally call this concept a "backing chain".

I suppose that’s an exclusive “we”?  Because I use “backing chain” to
refer to COW chains exclusively.

Such a chain may or may not include filters, but they are not really
load-bearing nodes of the chain.  As such, I generally want to skip them
when looking at a backing chain (hence e.g. bdrv_backing_chain_next()).
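
To make the "skip filters" idea concrete, here is a minimal sketch on a toy
graph.  BcNode, bc_skip_filters() and bc_backing_chain_next() are
hypothetical stand-ins; the real helper is introduced later in this series
and may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy node; this is NOT QEMU's BlockDriverState. */
typedef struct BcNode {
    const char *name;
    int is_filter;             /* 1 for an R/W filter node */
    struct BcNode *rw_child;   /* R/W-filtered child (filters only) */
    struct BcNode *cow_child;  /* COW backing child (non-filters only) */
} BcNode;

/* Step over R/W filters until a non-filter node (or NULL) is reached. */
BcNode *bc_skip_filters(BcNode *n)
{
    while (n && n->is_filter) {
        /* Assumption of this sketch: every filter has a filtered child. */
        assert(n->rw_child);
        n = n->rw_child;
    }
    return n;
}

/*
 * Return the next COW layer below @n, ignoring filters both above and
 * below the COW link.  Returns NULL if there is no further layer.
 */
BcNode *bc_backing_chain_next(BcNode *n)
{
    n = bc_skip_filters(n);
    if (!n) {
        return NULL;
    }
    return bc_skip_filters(n->cow_child);
}
```

Here the filters are transparent: starting from a filter above the top
overlay, or from the overlay itself, yields the same next layer.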

From what I can tell nobody has ever formalized any terms regarding COW
backing chains or R/W filter chains.  This series is an attempt.

>> qcow2 has a COW child.  As such, it acts as a COW filter in the sense of
>> the function names.
>>
>> raw has neither a COW child nor acts as an R/W filter.  As such, it has
>> no filtered child.  My opinion on this hasn’t changed.
>>
>> (To reiterate, in practice I see no way anyone would ever use raw as an
>> R/W filter.
>> Either you use it without offset/size, in which case you simply use it
>> in lieu of a format node, so you precisely don’t want it to act as a
>> filter when it comes to allocation information and so on (even though it
>> can be classified a filter here).
>> Or you use it as kind of a filter with offset/size, but then it no
>> longer is a filter.
> 
> Agreed with offset, but with only size, it matches your definition of a
> filter.

So?

Should we treat it as a filter when @offset is 0 but otherwise not?
That totally wouldn’t be confusing to users.

>> Filters are defined by “Every filter must fulfill these conditions: ...”
>> – not by “Everything that fulfills these conditions is a filter”.
>> Marking a driver as a filter has consequences, and I don’t see why we
>> would want those consequences for raw.)
>>
>>> It looks like bdrv_filtered_child() is the right function to iterate
>>> along a backing file chain, but I just still fail to connect that and
>>> the name of the function in a meaningful way.
>>
>> It’s the right function to iterate along a filter chain.  This includes
>> COW backing children and R/W filtered children.
> 
> qcow2 doesn't fulfill the conditions for being a filter driver. Two of
> its possible children fulfill the conditions for being a filtered child.
> You can pick either approach, talking about a "filter chain" just
> doesn't make sense there. Either the chain is broken by a non-filter
> driver like qcow2, or it must become a filter tree.

I have no idea what your point is.  There is no point in making it a
filter tree at this point, just as we never had a backing tree.

And the good example is Quorum.  qcow2 is a bad example because there is
no benefit in marking it an R/W filter for its external data file,
exactly as is the case for raw.

> What we're really interested in is iterating the backing chain even
> across filter nodes, so your implementation achieves the right result.
> It just feels completely arbitrary, counterintuitive and confusing to
> call this a (or actually "the") "filter chain" and to pretend that the
> name tells anyone what it really is.

So exactly the same as “bs->backing” or “backing chain” for me.

You disagreeing with me on these terms to me shows that there is a need
to formalize.  This is precisely what I want to do in this series.

The fact that we don’t use the term “filter chain” so far is the reason
why I introduce it.  Because it comes as a clean slate.  “backing chain”
already means something to me, and it means something different.

Max

Kevin Wolf Sept. 9, 2019, 4:13 p.m. UTC | #6

On 09.09.2019 at 16:04, Max Reitz wrote:
> On 09.09.19 11:36, Kevin Wolf wrote:
> > On 09.09.2019 at 09:56, Max Reitz wrote:
> >> On 04.09.19 18:16, Kevin Wolf wrote:
> >>> On 09.08.2019 at 18:13, Max Reitz wrote:
> >>>> There are BDS children that the general block layer code can access,
> >>>> namely bs->file and bs->backing.  Since the introduction of filters and
> >>>> external data files, their meaning is not quite clear.  bs->backing can
> >>>> be a COW source, or it can be an R/W-filtered child; bs->file can be an
> >>>> R/W-filtered child, it can be data and metadata storage, or it can be
> >>>> just metadata storage.
> >>>>
> >>>> This overloading really is not helpful.  This patch adds functions that
> >>>> retrieve the correct child for each exact purpose.  Later patches in
> >>>> this series will make use of them.  Doing so will allow us to handle
> >>>> filter nodes and external data files in a meaningful way.
> >>>>
> >>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>
> >>> Each time I look at this patch, I'm confused by the function names.
> >>> Maybe I should just ask what the idea there was, or more specifically:
> >>> What does the "filtered" in "filtered child" really mean?
> >>>
> >>> Apparently any child of a filter node is "filtered" (which makes sense),
> >>
> >> It isn’t, filters can have non-filter children.  For example, backup-top
> >> could have the source as a filtered child and the target as a non-filter
> >> child.
> > 
> > Hm, okay, makes sense. I had a definition in mind that says that filter
> > nodes only have a single child node. Is it that a filter may have only a
> > single _filtered_ child node?
> 
> Well, there’s Quorum...

Ah, nice, quorum sets is_filter = true even though it neither fulfills
the conditions for it before this series, nor the changed conditions
after this series.

So either quorum lies and isn't actually a filter driver, or our
definition in the documentation of is_filter is wrong.

> >>> but also bs->backing of a qcow2 image, while bs->file of qcow2 isn't.
> >>> raw doesn't have any "filtered" child. What's the system behind this?
> >>
> >> “filtered” means: If the parent node returns data from this child, it
> >> won’t modify it, neither its content nor its position.  COW and R/W
> >> filters differ in how they handle writes; R/W filters pass them through
> >> to the filtered child, COW filters copy them off to some other child
> >> node (and then the filtered child’s data will no longer be visible at
> >> that location).
> > 
> > But there is no reason why a node couldn't fulfill this condition for
> > more than one child node. bdrv_filtered_child() isn't well-defined then.
> > Technically, the description "Return any filtered child" is correct
> > because "any" can be interpreted as "an arbitrary", but obviously that
> > makes the function useless.
> 
> Which is why it currently returns NULL for Quorum.

Which is about the only possible choice that breaks the contract...

 * Return any filtered child, independently of how it reacts to write
 * accesses and whether data is copied onto this BDS through COR.

Maybe the documentation of bdrv_filtered_child() needs to be rephrased?

Going back to qcow2, it's really not much different as it has multiple
(two) filtered children, too. So if quorum returns NULL to mean "no
unambiguous result", why does it return bs->backing instead of NULL for
a qcow2 node?

(Yes, I know, because it's useful. But I'm trying to get some basic
consistency into these interfaces.)

> > Specifically, according to your definition, qcow2 filters both the
> > backing file (COW filter) and the external data file (R/W filter).
> 
> Not wrong.  But the same question as for raw arises: Is there any use to
> declaring qcow2 an R/W filter driver just because it fits the definition?

Wait, where is there even a place where this could be declared?

The one thing I see that a driver even can declare is drv->is_filter,
which is about the whole driver and not about nodes. It is false for
qcow2.

Then you made some criteria above that tell us whether a specific child
of a node is a filtered child or not. As it happens, qcow2 (which is not
a filter driver) can have two children that match the criteria for being
filtered children.

I already think this is a bit inconsistent, because why should a driver
that declares itself a non-filter be considered to filter children?
Okay, you say a broader definition of a filtered child is useful because
you can then include all BdrvChild links in a backing/filter chain. Fair
enough, it's not intuitive, but use a broader definition then.

But the point where you say that even though two of the children
are filtered children under your broader definition, for the purpose of
the API only one of them should be considered because the other one
isn't that useful, that's really one inconsistency too much for me. You
can't use a broad definition and then arbitrarily restrict the
definition again so that it matches the special case you're currently
interested in.

Either use a narrow definition, or use a broad one. But use only one and
use it consistently.

> >> The main reason behind the common “filtered” name is for the generic
> >> functions that work on both COW and true filter (R/W filters) chains.
> >> We need such functionality sometimes.  I personally felt like the
> >> concept of true (R/W) filters and COW children was similar enough to
> >> share a common name base.
> > 
> > We generally call this concept a "backing chain".
> 
> I suppose that’s an exclusive “we”?  Because I use “backing chain” to
> refer to COW chains exclusively.
> 
> Such a chain may or may not include filters, but they are not really
> load-bearing nodes of the chain.  As such, I generally want to skip them
> when looking at a backing chain (hence e.g. bdrv_backing_chain_next()).
> 
> From what I can tell nobody has ever formalized any terms regarding COW
> backing chains or R/W filter chains.  This series is an attempt.

Well, as you can see, this attempt feels confusing to me.

I agree with your naming of bdrv_backing_chain_next(), it's clear enough
what path it will follow down the graph. I just disagree that "filter
chain" is a good term for something that prefers backing file links when
it has a choice.

> >> qcow2 has a COW child.  As such, it acts as a COW filter in the sense of
> >> the function names.
> >>
> >> raw has neither a COW child nor acts as an R/W filter.  As such, it has
> >> no filtered child.  My opinion on this hasn’t changed.
> >>
> >> (To reiterate, in practice I see no way anyone would ever use raw as an
> >> R/W filter.
> >> Either you use it without offset/size, in which case you simply use it
> >> in lieu of a format node, so you precisely don’t want it to act as a
> >> filter when it comes to allocation information and so on (even though it
> >> can be classified a filter here).
> >> Or you use it as kind of a filter with offset/size, but then it no
> >> longer is a filter.
> > 
> > Agreed with offset, but with only size, it matches your definition of a
> > filter.
> 
> So?
> 
> Should we treat it as a filter when @offset is 0 but otherwise not?
> That totally wouldn’t be confusing to users.

No, I'm just applying your definitions to see if the contradictions
between them and your explanations are of any importance. *shrug*

> >> Filters are defined by “Every filter must fulfill these conditions: ...”
> >> – not by “Everything that fulfills these conditions is a filter”.
> >> Marking a driver as a filter has consequences, and I don’t see why we
> >> would want those consequences for raw.)
> >>
> >>> It looks like bdrv_filtered_child() is the right function to iterate
> >>> along a backing file chain, but I just still fail to connect that and
> >>> the name of the function in a meaningful way.
> >>
> >> It’s the right function to iterate along a filter chain.  This includes
> >> COW backing children and R/W filtered children.
> > 
> > qcow2 doesn't fulfill the conditions for being a filter driver. Two of
> > its possible children fulfill the conditions for being a filtered child.
> > You can pick either approach, talking about a "filter chain" just
> > doesn't make sense there. Either the chain is broken by a non-filter
> > driver like qcow2, or it must become a filter tree.
> 
> I have no idea what your point is.  There is no point in making it a
> filter tree at this point, just as we never had a backing tree.
> 
> And the good example is Quorum.  qcow2 is a bad example because there is
> no benefit in marking it an R/W filter for its external data file,
> exactly as is the case for raw.

My point is not about changing the logic in the code, but about using
names that actually describe accurately what the logic does.

And as I said above, neither is "not useful" a convincing argument for
ignoring filtered children (as I think we're trying to build something
rather generic, not something that works only for what we consider
useful today) nor do I see how qcow2 could be marked or not marked an
R/W filter (as mentioned above).

> > What we're really interested in is iterating the backing chain even
> > across filter nodes, so your implementation achieves the right result.
> > It just feels completely arbitrary, counterintuitive and confusing to
> > call this a (or actually "the") "filter chain" and to pretend that the
> > name tells anyone what it really is.
> 
> So exactly the same as “bs->backing” or “backing chain” for me.
> 
> You disagreeing with me on these terms to me shows that there is a need
> to formalize.  This is precisely what I want to do in this series.
> 
> The fact that we don’t use the term “filter chain” so far is the reason
> why I introduce it.  Because it comes as a clean slate.  “backing chain”
> already means something to me, and it means something different.

Well, if "backing chain" is too narrow, "filter chain" is both too
unspecific and inconsistent with the various definitions of "filter" and
"filtered" we're using, and we can't think of anything more concise, we
might have to use names that just mention both.

bdrv_cow_child() // don't call COW a filter, because .is_filter = false
bdrv_filter_child() // your R/W filter, only for .is_filter = true nodes
bdrv_filter_or_cow_child()

Or something like that. This would bring some more consistency into the
way we use the words filter/filtered at least.
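
A minimal sketch of how these three accessors could relate to each other.
The Node type is a hypothetical, simplified stand-in (not the real
BlockDriverState/BdrvChild structures), the function names drop the bdrv_
prefix on purpose, and the rule that a filter uses either ->backing or
->file but never both is an assumption of the sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a block node; not QEMU's real types. */
typedef struct Node Node;
struct Node {
    bool is_filter; /* stand-in for drv->is_filter */
    Node *file;     /* stand-in for bs->file */
    Node *backing;  /* stand-in for bs->backing */
};

/* COW source of a non-filter node, or NULL if there is none. */
static Node *cow_child(const Node *n)
{
    return n->is_filter ? NULL : n->backing;
}

/* Child that an R/W filter passes requests through to, or NULL. */
static Node *filter_child(const Node *n)
{
    if (!n->is_filter) {
        return NULL;
    }
    /* Assumption: a filter uses either ->backing or ->file, never both. */
    assert(!(n->backing && n->file));
    return n->backing ? n->backing : n->file;
}

/* Next node in the chain when the caller does not care which kind it is. */
static Node *filter_or_cow_child(const Node *n)
{
    Node *cow = cow_child(n);
    Node *filter = filter_child(n);

    /* is_filter selects exactly one branch, so both cannot be set. */
    assert(!(cow && filter));
    return cow ? cow : filter;
}
```

With this split, only the combined accessor would be used for walking a
chain that mixes COW overlays and filter nodes.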

Kevin

Max Reitz Sept. 10, 2019, 9:14 a.m. UTC | #7

On 09.09.19 18:13, Kevin Wolf wrote:
> Am 09.09.2019 um 16:04 hat Max Reitz geschrieben:
>> On 09.09.19 11:36, Kevin Wolf wrote:
>>> Am 09.09.2019 um 09:56 hat Max Reitz geschrieben:
>>>> On 04.09.19 18:16, Kevin Wolf wrote:
>>>>> Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
>>>>>> There are BDS children that the general block layer code can access,
>>>>>> namely bs->file and bs->backing.  Since the introduction of filters and
>>>>>> external data files, their meaning is not quite clear.  bs->backing can
>>>>>> be a COW source, or it can be an R/W-filtered child; bs->file can be an
>>>>>> R/W-filtered child, it can be data and metadata storage, or it can be
>>>>>> just metadata storage.
>>>>>>
>>>>>> This overloading really is not helpful.  This patch adds functions that
>>>>>> retrieve the correct child for each exact purpose.  Later patches in
>>>>>> this series will make use of them.  Doing so will allow us to handle
>>>>>> filter nodes and external data files in a meaningful way.
>>>>>>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>
>>>>> Each time I look at this patch, I'm confused by the function names.
>>>>> Maybe I should just ask what the idea there was, or more specifically:
>>>>> What does the "filtered" in "filtered child" really mean?
>>>>>
>>>>> Apparently any child of a filter node is "filtered" (which makes sense),
>>>>
>>>> It isn’t, filters can have non-filter children.  For example, backup-top
>>>> could have the source as a filtered child and the target as a non-filter
>>>> child.
>>>
>>> Hm, okay, makes sense. I had a definition in mind that says that filter
>>> nodes only have a single child node. Is it that a filter may have only a
>>> single _filtered_ child node?
>>
>> Well, there’s Quorum...
> 
> Ah, nice, quorum sets is_filter = true even though it neither fulfills
> the conditions for it before this series, nor the changed conditions
> after this series.
> 
> So either quorum lies and isn't actually a filter driver, or our
> definition in the documentation of is_filter is wrong.

You could say it lies because in FIFO mode it clearly isn’t a filter for
all of its children.

There is a reason for lying, though, which is
bdrv_recurse_is_first_non_filter(), which is necessary to use the whole
to_replace mirror stuff.

(You mirror from a quorum with a failed child and then replace the
failed child.  mirror needs to ensure that there are only R/W filters
between the child and the mirror source so that replacing it will not
suddenly change any visible data.  Which is actually a lie for quorum,
because the child is clearly broken and thus precisely doesn’t show the
same data...)

Maybe we should stop declaring Quorum a filter and then rename the
bdrv_recurse_is_first_non_filter() to, I don’t know,
bdrv_recurse_can_be_replaced_by_mirror()?

>>>>> but also bs->backing of a qcow2 image, while bs->file of qcow2 isn't.
>>>>> raw doesn't have any "filtered" child. What's the system behind this?
>>>>
>>>> “filtered” means: If the parent node returns data from this child, it
>>>> won’t modify it, neither its content nor its position.  COW and R/W
>>>> filters differ in how they handle writes; R/W filters pass them through
>>>> to the filtered child, COW filters copy them off to some other child
>>>> node (and then the filtered child’s data will no longer be visible at
>>>> that location).
>>>
>>> But there is no reason why a node couldn't fulfill this condition for
>>> more than one child node. bdrv_filtered_child() isn't well-defined then.
>>> Technically, the description "Return any filtered child" is correct
>>> because "any" can be interpreted as "an arbitrary", but obviously that
>>> makes the function useless.
>>
>> Which is why it currently returns NULL for Quorum.
> 
> Which is about the only possible choice that breaks the contract...
> 
>  * Return any filtered child, independently of how it reacts to write

I don’t know if you’re serious about this proposition, because I don’t
know whether that could be useful in any way. :-?

>  * accesses and whether data is copied onto this BDS through COR.

I meant the contract as “Return the single filtered child there is, or NULL”

> Maybe the documentation of bdrv_filtered_child() needs to be rephrased?
> 
> Going back to qcow2, it's really not much different as it has multiple
> (two) filtered children, too.

Well, it doesn’t.  It isn’t an R/W filter.

Maybe what we actually need to rephrase is the definition of .is_filter.
 (Namely something along the lines of “Fulfills these guarantees (same
data, etc. pp.), *and* should be skipped for allocation information
queries etc.”.)

> So if quorum returns NULL to mean "no
> unambiguous result", why does it return bs->backing instead of NULL for
> a qcow2 node?
> 
> (Yes, I know, because it's useful. But I'm trying to get some basic
> consistency into these interfaces.)

Not precisely because it’s useful, but because qcow2 does not have
.is_filter set.  :-)
(And it doesn’t have it set because that wouldn’t be useful.)

>>> Specifically, according to your definition, qcow2 filters both the
>>> backing file (COW filter) and the external data file (R/W filter).
>>
>> Not wrong.  But the same question as for raw arises: Is there any use to
>> declaring qcow2 an R/W filter driver just because it fits the definition?
> 
> Wait, where is there even a place where this could be declared?
> 
> The one thing I see that a driver even can declare is drv->is_filter,
> which is about the whole driver and not about nodes. It is false for
> qcow2.

That’s correct.  But that’s not a fundamental problem, of course, we
could make it a per-BDS attribute if that made sense.

> Then you made some criteria above that tell us whether a specific child
> of a node is a filtered child or not. As it happens, qcow2 (which is not
> a filter driver) can have two children that match the criteria for being
> filtered children.

But just arguing that I’m incapable of giving a good definition won’t
bring us along.

> I already think this is a bit inconsistent, because why should a driver
> that declares itself a non-filter be considered to filter children?

.is_filter is for R/W filters.  COW filters have .supports_backing for that.

> Okay, you say a broader definition of a filtered child is useful because
> you can then include all BdrvChild links in a backing/filter chain. Fair
> enough, it's not intuitive, but use a broader definition then.
> 
> But the point where you say that even though two of the children
> are filtered children under your broader definition, for the purpose of
> the API only one of them should be considered because the other one
> isn't that useful, that's really one inconsistency too much for me. You
> can't use a broad definition and then arbitrarily restrict the
> definition again so that it matches the special case you're currently
> interested in.
> 
> Either use a narrow definition, or use a broad one. But use only one and
> use it consistently.

I think the problem appears because you restrict the process to a single
step where there’s actually two.

Drivers can be either
(1) R/W filters (e.g. throttle)
(2) COW filters (e.g. qcow2)
(3) None of the above (e.g. vhdx, curl)

This choice is made on the driver level, not on the node level (for good
reason, see below*).

And then we derive the node’s filtered children from what the driver is.
 If it’s an R/W filter, bdrv_filtered_child() will return the R/W
filtered child.  If it’s a COW filter, bdrv_filtered_child() will return
the potentially existing COW backing child (or NULL, if there is no
backing child).
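
The two steps can be sketched roughly as follows.  This is only an
illustration under stated assumptions: Driver, Node, DriverKind and both
functions are hypothetical stand-ins rather than the series' actual code,
and Quorum (a filter with several children) is deliberately left out:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not QEMU's real types. */
typedef struct Driver {
    const char *name;
    bool is_filter;        /* set by R/W filter drivers (e.g. throttle) */
    bool supports_backing; /* set by COW-capable drivers (e.g. qcow2) */
} Driver;

typedef struct Node Node;
struct Node {
    const Driver *drv;
    Node *rw_filtered; /* child an R/W filter passes requests through to */
    Node *backing;     /* COW backing child, may be NULL */
};

typedef enum { KIND_RW_FILTER, KIND_COW_FILTER, KIND_NONE } DriverKind;

/* Step 1: the kind is decided per driver, never per node. */
static DriverKind driver_kind(const Driver *drv)
{
    if (drv->is_filter) {
        return KIND_RW_FILTER;
    }
    if (drv->supports_backing) {
        return KIND_COW_FILTER;
    }
    return KIND_NONE;
}

/* Step 2: derive the node's filtered child from what its driver is. */
static Node *filtered_child(const Node *n)
{
    switch (driver_kind(n->drv)) {
    case KIND_RW_FILTER:
        return n->rw_filtered;
    case KIND_COW_FILTER:
        return n->backing; /* NULL if there is no backing child */
    default:
        return NULL;
    }
}
```

In this sketch, a node of a COW-capable driver without a backing file
yields NULL, while the driver's kind stays the same.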

*
What is clear to me is that it isn’t useful to treat nodes of a specific
driver sometimes as R/W filter nodes and sometimes not.  R/W filter
nodes are handled differently from other nodes, and it would be
confusing if a certain driver sometimes behaves this and sometimes that
way.  (For example, if you put a raw node on top of a qcow2 node,
sometimes it would stop the backing chain, sometimes it wouldn’t.  That
makes no sense to me.)

OTOH, for COW filters, we do exactly that.  Sometimes they have a
backing file, sometimes they don’t.  That’s completely fine because
their overall behavior doesn’t change.

That makes me agree that there is indeed too much of a difference
between R/W filters and COW filters to lump them together under the
“filter” label.

[...]

> My point is not about changing the logic in the code, but about using
> names that actually describe accurately what the logic does.

Again, naming things is hard.

[...]

>> You disagreeing with me on these terms to me shows that there is a need
>> to formalize.  This is precisely what I want to do in this series.
>>
>> The fact that we don’t use the term “filter chain” so far is the reason
>> why I introduce it.  Because it comes as a clean slate.  “backing chain”
>> already means something to me, and it means something different.
> 
> Well, if "backing chain" is too narrow, "filter chain" is both too
> unspecific and inconsistent with the various definitions of "filter" and
> "filtered" we're using, and we can't think of anything more concise, we
> might have to use names that just mention both.
> 
> bdrv_cow_child() // don't call COW a filter, because .is_filter = false
> bdrv_filter_child() // your R/W filter, only for .is_filter = true nodes
> bdrv_filter_or_cow_child()
> 
> Or something like that. This would bring some more consistency into the
> way we use the words filter/filtered at least.

I’ll see how that looks overall, but why not.  Sounds good to me.

Max

Kevin Wolf Sept. 10, 2019, 10:47 a.m. UTC | #8

Am 10.09.2019 um 11:14 hat Max Reitz geschrieben:
> On 09.09.19 18:13, Kevin Wolf wrote:
> > Am 09.09.2019 um 16:04 hat Max Reitz geschrieben:
> >> On 09.09.19 11:36, Kevin Wolf wrote:
> >>> Am 09.09.2019 um 09:56 hat Max Reitz geschrieben:
> >>>> On 04.09.19 18:16, Kevin Wolf wrote:
> >>>>> Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
> >>>>>> There are BDS children that the general block layer code can access,
> >>>>>> namely bs->file and bs->backing.  Since the introduction of filters and
> >>>>>> external data files, their meaning is not quite clear.  bs->backing can
> >>>>>> be a COW source, or it can be an R/W-filtered child; bs->file can be an
> >>>>>> R/W-filtered child, it can be data and metadata storage, or it can be
> >>>>>> just metadata storage.
> >>>>>>
> >>>>>> This overloading really is not helpful.  This patch adds functions that
> >>>>>> retrieve the correct child for each exact purpose.  Later patches in
> >>>>>> this series will make use of them.  Doing so will allow us to handle
> >>>>>> filter nodes and external data files in a meaningful way.
> >>>>>>
> >>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>>>
> >>>>> Each time I look at this patch, I'm confused by the function names.
> >>>>> Maybe I should just ask what the idea there was, or more specifically:
> >>>>> What does the "filtered" in "filtered child" really mean?
> >>>>>
> >>>>> Apparently any child of a filter node is "filtered" (which makes sense),
> >>>>
> >>>> It isn’t, filters can have non-filter children.  For example, backup-top
> >>>> could have the source as a filtered child and the target as a non-filter
> >>>> child.
> >>>
> >>> Hm, okay, makes sense. I had a definition in mind that says that filter
> >>> nodes only have a single child node. Is it that a filter may have only a
> >>> single _filtered_ child node?
> >>
> >> Well, there’s Quorum...
> > 
> > Ah, nice, quorum sets is_filter = true even though it neither fulfills
> > the conditions for it before this series, nor the changed conditions
> > after this series.
> > 
> > So either quorum lies and isn't actually a filter driver, or our
> > definition in the documentation of is_filter is wrong.
> 
> You could say it lies because in FIFO mode it clearly isn’t a filter for
> all of its children.
> 
> There is a reason for lying, though, which is
> bdrv_recurse_is_first_non_filter(), which is necessary to use the whole
> to_replace mirror stuff.

Hm, actually, now that you mention bdrv_recurse_is_first_non_filter(),
quorum was the first driver to declare itself a filter, so strictly
speaking, if there is an inconsistency, it's the other uses that are
abusing the field...

> (You mirror from a quorum with a failed child and then replace the
> failed child.  mirror needs to ensure that there are only R/W filters
> between the child and the mirror source so that replacing it will not
> suddenly change any visible data.  Which is actually a lie for quorum,
> because the child is clearly broken and thus precisely doesn’t show the
> same data...)
> 
> Maybe we should stop declaring Quorum a filter and then rename the
> bdrv_recurse_is_first_non_filter() to, I don’t know,
> bdrv_recurse_can_be_replaced_by_mirror()?

Why not.

> >>>>> but also bs->backing of a qcow2 image, while bs->file of qcow2 isn't.
> >>>>> raw doesn't have any "filtered" child. What's the system behind this?
> >>>>
> >>>> “filtered” means: If the parent node returns data from this child, it
> >>>> won’t modify it, neither its content nor its position.  COW and R/W
> >>>> filters differ in how they handle writes; R/W filters pass them through
> >>>> to the filtered child, COW filters copy them off to some other child
> >>>> node (and then the filtered child’s data will no longer be visible at
> >>>> that location).
> >>>
> >>> But there is no reason why a node couldn't fulfill this condition for
> >>> more than one child node. bdrv_filtered_child() isn't well-defined then.
> >>> Technically, the description "Return any filtered child" is correct
> >>> because "any" can be interpreted as "an arbitrary", but obviously that
> >>> makes the function useless.
> >>
> >> Which is why it currently returns NULL for Quorum.
> > 
> > Which is about the only possible choice that breaks the contract...
> > 
> >  * Return any filtered child, independently of how it reacts to write
> 
> I don’t know if you’re serious about this proposition, because I don’t
> know whether that could be useful in any way. :-?

Huh? This is just quoting the contract from your code?

> >  * accesses and whether data is copied onto this BDS through COR.
> 
> I meant the contract as “Return the single filtered child there is, or NULL”

Then that should probably be spelt out in the contract. Probably even
explicitly "NULL if there is either no filtered child or multiple
filtered children".
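
A minimal sketch of what that wording would mean in code.  The per-child
"filtered" flag and the Child/Node types are hypothetical and exist only
to make the contract concrete; nothing like them is in the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not QEMU's real types. */
typedef struct Child {
    const char *name;
    bool filtered; /* assumed per-child flag, for illustration only */
} Child;

typedef struct Node {
    Child *children;
    size_t n_children;
} Node;

/*
 * Return the single filtered child of @n.
 * Return NULL if there is either no filtered child or more than one.
 */
static Child *single_filtered_child(const Node *n)
{
    Child *found = NULL;

    for (size_t i = 0; i < n->n_children; i++) {
        if (!n->children[i].filtered) {
            continue;
        }
        if (found) {
            return NULL; /* ambiguous: a second filtered child exists */
        }
        found = &n->children[i];
    }
    return found;
}
```

A quorum-like node with several filtered children then gets NULL for the
same documented reason as a node with none.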

> > Maybe the documentation of bdrv_filtered_child() needs to be rephrased?
> > 
> > Going back to qcow2, it's really not much different as it has multiple
> > (two) filtered children, too.
> 
> Well, it doesn’t.  It isn’t an R/W filter.

What do I have to look at to see whether something is an R/W filter or
not? qcow2 matches your criteria for an R/W filter. You say that it's
not useful, so it's not an R/W filter anyway. But where in the code
could I get this information?

This just doesn't make sense to me. If a driver matches the criteria for
an R/W filter, then it should be one. If qcow2 should not be considered
an R/W filter, then the criteria must be changed so that it isn't.

> Maybe what we actually need to rephrase is the definition of .is_filter.
>  (Namely something along the lines of “Fulfills these guarantees (same
> data, etc. pp.), *and* should be skipped for allocation information
> queries etc.”.)

Hm - does this imply that .is_filter == this is an R/W filter? Because
this was never spelt out, neither in code comments nor in commit
messages.

If we called R/W filters just "filters" (which makes it obvious how it
relates to .is_filter) and COW nodes something that doesn't include the
word "filter", things might become a lot clearer.

> > So if quorum returns NULL to mean "no
> > unambiguous result", why does it return bs->backing instead of NULL for
> > a qcow2 node?
> > 
> > (Yes, I know, because it's useful. But I'm trying to get some basic
> > consistency into these interfaces.)
> 
> Not precisely because it’s useful, but because qcow2 does not have
> .is_filter set.  :-)
> (And it doesn’t have it set because that wouldn’t be useful.)
> 
> >>> Specifically, according to your definition, qcow2 filters both the
> >>> backing file (COW filter) and the external data file (R/W filter).
> >>
> >> Not wrong.  But the same question as for raw arises: Is there any use to
> >> declaring qcow2 an R/W filter driver just because it fits the definition?
> > 
> > Wait, where is there even a place where this could be declared?
> > 
> > The one thing I see that a driver even can declare is drv->is_filter,
> > which is about the whole driver and not about nodes. It is false for
> > qcow2.
> 
> That’s correct.  But that’s not a fundamental problem, of course, we
> could make it a per-BDS attribute if that made sense.

I was thinking per-child, actually, because you declare one BdrvChild
filtered and another not filtered.

But by now I think most of the confusion is really just a result of COW
being considered a filter in some respects (mainly just the names of the
child access functions), but not in others (like .is_filter).

> > Then you made some criteria above that tell us whether a specific child
> > of a node is a filtered child or not. As it happens, qcow2 (which is not
> > a filter driver) can have two children that match the criteria for being
> > filtered children.
> 
> But just arguing that I’m incapable of giving a good definition won’t
> bring us along.
> 
> > I already think this is a bit inconsistent, because why should a driver
> > that declares itself a non-filter be considered to filter children?
> 
> .is_filter is for R/W filters.  COW filters have .supports_backing for that.

Okay, so you confirm what I concluded above.

> > Okay, you say a broader definition of a filtered child is useful because
> > you can then include all BdrvChild links in a backing/filter chain. Fair
> > enough, it's not intuitive, but use a broader definition then.
> > 
> > But the point where you say that even though two of the children
> > are filtered children under your broader definition, for the purpose of
> > the API only one of them should be considered because the other one
> > isn't that useful, that's really one inconsistency too much for me. You
> > can't use a broad definition and then arbitrarily restrict the
> > definition again so that it matches the special case you're currently
> > interested in.
> > 
> > Either use a narrow definition, or use a broad one. But use only one and
> > use it consistently.
> 
> I think the problem appears because you restrict the process to a single
> step where there’s actually two.
> 
> Drivers can be either
> (1) R/W filters (e.g. throttle)
> (2) COW filters (e.g. qcow2)
> (3) None of the above (e.g. vhdx, curl)
> 
> This choice is made on the driver level, not on the node level (for good
> reason, see below*).

What prevents a driver from being
(4) COW filter and R/W filter (e.g. qcow2 if it were useful)?

I mean, conceptually, not in the implementation.

> And then we derive the node’s filtered children from what the driver is.
>  If it’s an R/W filter, bdrv_filtered_child() will return the R/W
> filtered child.  If it’s a COW filter, bdrv_filtered_child() will return
> the potentially existing COW backing child (or NULL, if there is no
> backing child).

I guess it boils down to me just not being able to get behind the
concept that COW is some sort of filter (especially when other things
like an external data file aren't).

> *
> What is clear to me is that it isn’t useful to treat nodes of a specific
> driver sometimes as R/W filter nodes and sometimes not.  R/W filter
> nodes are handled differently from other nodes, and it would be
> confusing if a certain driver sometimes behaves this and sometimes that
> way.  (For example, if you put a raw node on top of a qcow2 node,
> sometimes it would stop the backing chain, sometimes it wouldn’t.  That
> makes no sense to me.)
> 
> OTOH, for COW filters, we do exactly that.  Sometimes they have a
> backing file, sometimes they don’t.  That’s completely fine because
> their overall behavior doesn’t change.
> 
> 
> That makes me agree that there is indeed too much of a difference
> between R/W filters and COW filters to lump them together under the
> “filter” label.
> 
> [...]
> 
> > My point is not about changing the logic in the code, but about using
> > names that actually describe accurately what the logic does.
> 
> Again, naming things is hard.
> 
> [...]
> 
> >> You disagreeing with me on these terms to me shows that there is a need
> >> to formalize.  This is precisely what I want to do in this series.
> >>
> >> The fact that we don’t use the term “filter chain” so far is the reason
> >> why I introduce it.  Because it comes as a clean slate.  “backing chain”
> >> already means something to me, and it means something different.
> > 
> > Well, if "backing chain" is too narrow, "filter chain" is both too
> > unspecific and inconsistent with the various definitions of "filter" and
> > "filtered" we're using, and we can't think of anything more concise, we
> > might have to use names that just mention both.
> > 
> > bdrv_cow_child() // don't call COW a filter, because .is_filter = false
> > bdrv_filter_child() // your R/W filter, only for .is_filter = true nodes
> > bdrv_filter_or_cow_child()
> > 
> > Or something like that. This would bring some more consistency into the
> > way we use the words filter/filtered at least.
> 
> I’ll see how that looks overall, but why not.  Sounds good to me.

Good. Or, well, good enough at least. ;-)

bdrv_filter_or_cow_child() is not a pretty name, but as long as we can't
think of anything that accurately covers both in a single word, it will
do the job...

Kevin

Max Reitz Sept. 10, 2019, 11:36 a.m. UTC | #9

On 10.09.19 12:47, Kevin Wolf wrote:
> Am 10.09.2019 um 11:14 hat Max Reitz geschrieben:
>> On 09.09.19 18:13, Kevin Wolf wrote:
>>> Am 09.09.2019 um 16:04 hat Max Reitz geschrieben:
>>>> On 09.09.19 11:36, Kevin Wolf wrote:
>>>>> Am 09.09.2019 um 09:56 hat Max Reitz geschrieben:
>>>>>> On 04.09.19 18:16, Kevin Wolf wrote:
>>>>>>> Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
>>>>>>>> There are BDS children that the general block layer code can access,
>>>>>>>> namely bs->file and bs->backing.  Since the introduction of filters and
>>>>>>>> external data files, their meaning is not quite clear.  bs->backing can
>>>>>>>> be a COW source, or it can be an R/W-filtered child; bs->file can be an
>>>>>>>> R/W-filtered child, it can be data and metadata storage, or it can be
>>>>>>>> just metadata storage.
>>>>>>>>
>>>>>>>> This overloading really is not helpful.  This patch adds functions that
>>>>>>>> retrieve the correct child for each exact purpose.  Later patches in
>>>>>>>> this series will make use of them.  Doing so will allow us to handle
>>>>>>>> filter nodes and external data files in a meaningful way.
>>>>>>>>
>>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>
>>>>>>> Each time I look at this patch, I'm confused by the function names.
>>>>>>> Maybe I should just ask what the idea there was, or more specifically:
>>>>>>> What does the "filtered" in "filtered child" really mean?
>>>>>>>
>>>>>>> Apparently any child of a filter node is "filtered" (which makes sense),
>>>>>>
>>>>>> It isn’t, filters can have non-filter children.  For example, backup-top
>>>>>> could have the source as a filtered child and the target as a non-filter
>>>>>> child.
>>>>>
>>>>> Hm, okay, makes sense. I had a definition in mind that says that filter
>>>>> nodes only have a single child node. Is it that a filter may have only a
>>>>> single _filtered_ child node?
>>>>
>>>> Well, there’s Quorum...
>>>
>>> Ah, nice, quorum sets is_filter = true even though it neither fulfills
>>> the conditions for it before this series, nor the changed conditions
>>> after this series.
>>>
>>> So either quorum lies and isn't actually a filter driver, or our
>>> definition in the documentation of is_filter is wrong.
>>
>> You could say it lies because in FIFO mode it clearly isn’t a filter for
>> all of its children.
>>
>> There is a reason for lying, though, which is
>> bdrv_recurse_is_first_non_filter(), which is necessary to use the whole
>> to_replace mirror stuff.
> 
> Hm, actually, now that you mention bdrv_recurse_is_first_non_filter(),
> quorum was the first driver to declare itself a filter, so strictly
> speaking, if there is an inconsistency, it's the other uses that are
> abusing the field...
> 
>> (You mirror from a quorum with a failed child and then replace the
>> failed child.  mirror needs to ensure that there are only R/W filters
>> between the child and the mirror source so that replacing it will not
>> suddenly change any visible data.  Which is actually a lie for quorum,
>> because the child is clearly broken and thus precisely doesn’t show the
>> same data...)
>>
>> Maybe we should stop declaring Quorum a filter and then rename the
>> bdrv_recurse_is_first_non_filter() to, I don’t know,
>> bdrv_recurse_can_be_replaced_by_mirror()?
> 
> Why not.

It feels difficult to do in this series because this is a whole new can
of worms.

In patch 35, I actually replace the mirror use case by
is_filtered_child().  So it looks to me as if that should not be done,
because I should instead fix bdrv_recurse_is_first_non_filter() (and
rename it), because quorum does allow replacing its children by mirror,
even if it does not act as a filter for them.

OTOH, there are other users of bdrv_is_first_non_filter().  Those are
qmp_block_resize() and external_snapshot_prepare(), which throw an error
if that returns false.

I think that’s just wrong.  First of all, I don’t even know why we have
that restriction anymore (I can imagine why it used to make sense before
the permission system).  qmp_block_resize() should always work as long
as it can get BLK_PERM_RESIZE; and I don’t know why the parents of some
node would care if you take a snapshot of their child.

>>>>>>> but also bs->backing of a qcow2 image, while bs->file of qcow2 isn't.
>>>>>>> raw doesn't have any "filtered" child. What's the system behind this?
>>>>>>
>>>>>> “filtered” means: If the parent node returns data from this child, it
>>>>>> won’t modify it, neither its content nor its position.  COW and R/W
>>>>>> filters differ in how they handle writes; R/W filters pass them through
>>>>>> to the filtered child, COW filters copy them off to some other child
>>>>>> node (and then the filtered child’s data will no longer be visible at
>>>>>> that location).
>>>>>
>>>>> But there is no reason why a node couldn't fulfill this condition for
>>>>> more than one child node. bdrv_filtered_child() isn't well-defined then.
>>>>> Technically, the description "Return any filtered child" is correct
>>>>> because "any" can be interpreted as "an arbitrary", but obviously that
>>>>> makes the function useless.
>>>>
>>>> Which is why it currently returns NULL for Quorum.
>>>
>>> Which is about the only possible choice that breaks the contract...
>>>
>>>  * Return any filtered child, independently of how it reacts to write
>>
>> I don’t know if you’re serious about this proposition, because I don’t
>> know whether that could be useful in any way. :-?
> 
> Huh? This is just quoting the contract from your code?

I see.  I was thinking about “any of COW/RW, of which only one exists”.
 There is an assertion for that (that only one filtered child exists at
a time) in the code.  (And I consider assertions part of the contract.)

>>>  * accesses and whether data is copied onto this BDS through COR.
>>
>> I meant the contract as “Return the single filtered child there is, or NULL”
> 
> Then that should probably be spelt out in the contract. Probably even
> explicitly "NULL if there is either no filtered child or multiple
> filtered children".

Well, it’s spelled out through the assertion, but not in the
documentation, yes.

>>> Maybe the documentation of bdrv_filtered_child() needs to be rephrased?
>>>
>>> Going back to qcow2, it's really not much different as it has multiple
>>> (two) filtered children, too.
>>
>> Well, it doesn’t.  It isn’t an R/W filter.
> 
> What do I have to look at to see whether something is an R/W filter or
> not? qcow2 matches your criteria for an R/W filter.

No.  Some qcow2 nodes match the criteria.  But not all, which makes the
qcow2 driver not a filter driver.

> You say that it's
> not useful, so it's not an R/W filter anyway. But where in the code
> could I get this information?

“Where in the code”?  Do you want to add a comment to every BlockDriver
structure on why it does or doesn’t set .is_filter?

> This just doesn't make sense to me. If a driver matches the criteria for
> an R/W filter, then it should be one. If qcow2 should not be considered
> an R/W filter, then the criteria must be changed so that it isn't.

See below.

>> Maybe what we actually need to rephrase is the definition of .is_filter.
>>  (Namely something along the lines of “Fulfills these guarantees (same
>> data, etc. pp.), *and* should be skipped for allocation information
>> queries etc.”.)
> 
> Hm - does this imply that .is_filter == this is an R/W filter? Because
> this was never spelt out, neither in code comments nor in commit
> messages.

While I’m not a fan of comment-less code, I do think that it’s possible
to read code.  Which clearly stated this.

> If we called R/W filters just "filters" (which makes it obvious how it
> relates to .is_filter) and COW nodes something that doesn't include the
> word "filter", things might become a lot clearer.

Because you apparently wrote this before reading that I agreed to your
renaming proposal, I now feel free to argue that I could just as well
rename .is_filter to .is_rw_filter.

Obviously I won’t because I prefer your proposal.

[...]

>>>>> Specifically, according to your definition, qcow2 filters both the
>>>>> backing file (COW filter) and the external data file (R/W filter).
>>>>
>>>> Not wrong.  But the same question as for raw arises: Is there any use to
>>>> declaring qcow2 an R/W filter driver just because it fits the definition?
>>>
>>> Wait, where is there even a place where this could be declared?
>>>
>>> The one thing I see that a driver even can declare is drv->is_filter,
>>> which is about the whole driver and not about nodes. It is false for
>>> qcow2.
>>
>> That’s correct.  But that’s not a fundamental problem, of course, we
>> could make it a per-BDS attribute if that made sense.
> 
> I was thinking per-child, actually, because you declare one BdrvChild
> filtered and another not filtered.

Why don’t you say so from the start then?

(Sorry, but honestly about 30 % of this discussion to me feels like
you’re playing games with me.  Please don’t take this the wrong way, I
mean it very neutrally.  It’s just that I feel like I’m explaining
things to you that you very much know, but you just want me to say them.
And that feels unproductive and sometimes indeed frustrating.)

One thing is that this wouldn’t make the quorum case any easier because
it actually doesn’t know for which children it acts as a filter and for
which it doesn’t.

> But by now I think most of the confusion is really just a result of COW
> being considered a filter in some respects (mainly just the names of the
> child access functions), but not in others (like .is_filter).

I don’t quite see how it’s “by now” when in your first mail you already
basically wrote that functionally, everything works (leaving out
quorum), but that you’re confused (or claim to be confused, I have no
idea what’s real and what’s pretended anymore) by the names.

We have come to two results, as far as I can see:

First, naming COW backing nodes “COW filtered children” clashes with our
existing use of “filter”.  There is no point in forcing the “filter”
label on everything.  We can just keep calling (R/W) filters filters and
COW backing children COW children.  The names are succinct enough.

In some cases, we don’t care whether something is a COW or filtered
child, in such a case a caller can be bothered to use the slightly
longer bdrv_cow_or_filtered_child().

Second, most of the time we want a filter node to have a clear and
unique path to go down.  This is the important property of filters: That
you can skip them and go to the node that actually has the data.

Quorum breaks this by having multiple children, and nobody knows which
of them has the data we will see on the next read operation.

All “filters” that could have multiple children would have this problem.
Hence a filter must always have a single unique data child.  I think.
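
As a rough illustration of the two results above, here is a toy model
(not QEMU code; ToyNode and every toy_* name are made up for this sketch,
and only loosely mirror the child access functions under discussion).
It assumes a node is either a filter with exactly one filtered child, or
a non-filter with at most one COW child, so that skipping filters always
follows a unique path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a block node; NOT QEMU's BlockDriverState. */
typedef struct ToyNode {
    const char *name;
    bool is_filter;            /* R/W filter: shows the same data as its child */
    struct ToyNode *filtered;  /* set only when is_filter is true */
    struct ToyNode *cow;       /* COW backing child, only when !is_filter */
} ToyNode;

/* The child an R/W filter passes everything through to, or NULL. */
ToyNode *toy_filtered_child(ToyNode *n)
{
    return n->is_filter ? n->filtered : NULL;
}

/* The COW backing child of a non-filter node, or NULL. */
ToyNode *toy_cow_child(ToyNode *n)
{
    return n->is_filter ? NULL : n->cow;
}

/* For callers that do not care which of the two kinds of child it is. */
ToyNode *toy_cow_or_filtered_child(ToyNode *n)
{
    /* In this model a node never has both kinds of child at once. */
    assert(!(n->filtered && n->cow));
    return n->is_filter ? n->filtered : n->cow;
}

/* Walk down through filters to the first node that actually has data. */
ToyNode *toy_skip_filters(ToyNode *n)
{
    while (n && n->is_filter) {
        n = n->filtered;
    }
    return n;
}
```

A node with several candidate data children (as quorum has) cannot be
expressed in this struct at all, which is the point of the single-child
condition.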

[...]

>>> Either use a narrow definition, or use a broad one. But use only one and
>>> use it consistently.
>>
>> I think the problem appears because you restrict the process to a single
>> step where there’s actually two.
>>
>> Drivers can be either
>> (1) R/W filters (e.g. throttle)
>> (2) COW filters (e.g. qcow2)
>> (3) None of the above (e.g. vhdx, curl)
>>
>> This choice is made on the driver level, not on the node level (for good
>> reason, see below*).
> 
> What prevents a driver from being
> (4) COW filter and R/W filter (e.g. qcow2 if it were useful)?
> 
> I mean, conceptually, not in the implementation.

An R/W filter always shows the same data as the filtered child.  So the
COW child’s data can never be visible, and as such you couldn’t have a
COW child at the same time.

Max

Kevin Wolf Sept. 10, 2019, 12:48 p.m. UTC | #10

Am 10.09.2019 um 13:36 hat Max Reitz geschrieben:
> On 10.09.19 12:47, Kevin Wolf wrote:
> > Am 10.09.2019 um 11:14 hat Max Reitz geschrieben:
> >> Maybe we should stop declaring Quorum a filter and then rename the
> >> bdrv_recurse_is_first_non_filter() to, I don’t know,
> >> bdrv_recurse_can_be_replaced_by_mirror()?
> > 
> > Why not.
> 
> It feels difficult to do in this series because this is a whole new can
> of worms.
> 
> In patch 35, I actually replace the mirror use case by
> is_filtered_child().  So it looks to me as if that should not be done,
> because I should instead fix bdrv_recurse_is_first_non_filter() (and
> rename it), because quorum does allow replacing its children by mirror,
> even if it does not act as a filter for them.
> 
> OTOH, there are other users of bdrv_is_first_non_filter().  Those are
> qmp_block_resize() and external_snapshot_prepare(), who throw an error
> if that returns false.
> 
> I think that’s just wrong.  First of all, I don’t even know why we have
> that restriction anymore (I can imagine why it used to make sense before
> the permission system).  qmp_block_resize() should always work as long
> as it can get BLK_PERM_RESIZE; and I don’t know why the parents of some
> node would care if you take a snapshot of their child.

Hm, doesn't it make sense in a way for qmp_block_resize() at least? It
means that you can't resize just a filter, but you need to resize the
image that actually provides the data for the filter.

Of course, there is no reason for it to be the _first_ non-filter as
long as BLK_PERM_RESIZE is shared, but just some non-filter node.

Two more random observations:

* quorum uses bdrv_filter_default_perms(), which allows BLK_PERM_RESIZE.
  I think this is wrong and quorum should make sure that all children are
  always the same size because otherwise it can't tell what its own size
  is. (Or vote on size...? :-/) Probably not a problem in practice as
  long as we check bdrv_is_first_non_filter().

* child_file and child_backing don't implement .resize. So if you resize
  a non-top-level image, parents (in particular filters) don't get their
  size adjusted. This is probably a bug, too, but one that isn't
  prevented by bdrv_is_first_non_filter() and should be visible today.

> >>> Maybe the documentation of bdrv_filtered_child() needs to be rephrased?
> >>>
> >>> Going back to qcow2, it's really not much different as it has multiple
> >>> (two) filtered children, too.
> >>
> >> Well, it doesn’t.  It isn’t an R/W filter.
> > 
> > What do I have to look at to see whether something is an R/W filter or
> > not? qcow2 matches your criteria for an R/W filter.
> 
> No.  Some qcow2 nodes match the criteria.  But not all, which makes the
> qcow2 driver not a filter driver.
> 
> > You say that it's not useful, so it's not an R/W filter anyway. But
> > where in the code could I get this information?
> 
> “Where in the code”?  Do you want to add a comment to every BlockDriver
> structure on why it does or doesn’t set .is_filter?

Never mind, I just didn't understand that .is_filter is the thing that
defines a R/W filter. In fact, I didn't really understand what
.is_filter was supposed to mean at all because I was so confused. For
some reason I was sure it had to mean any kind of filter, but that
assumption just didn't match up with its use at all.

> >>>>> Specifically, according to your definition, qcow2 filters both the
> >>>>> backing file (COW filter) and the external data file (R/W filter).
> >>>>
> >>>> Not wrong.  But the same question as for raw arises: Is there any use to
> >>>> declaring qcow2 an R/W filter driver just because it fits the definition?
> >>>
> >>> Wait, where is there even a place where this could be declared?
> >>>
> >>> The one thing I see that a driver even can declare is drv->is_filter,
> >>> which is about the whole driver and not about nodes. It is false for
> >>> qcow2.
> >>
> >> That’s correct.  But that’s not a fundamental problem, of course, we
> >> could make it a per-BDS attribute if that made sense.
> > 
> > I was thinking per-child, actually, because you declare one BdrvChild
> > filtered and another not filtered.
> 
> Why don’t you say so from the start then?

Yes, I wrote "nodes", thought "child nodes" and should have said
"children" because edges are not nodes. My bad, sorry.

> (Sorry, but honestly about 30 % of this discussion to me feels like
> you’re playing games with me.  Please don’t take this the wrong way, I
> mean it very neutrally.  It’s just that I feel like I’m explaining
> things to you that you very much know, but you just want me to say them.
> And that feels unproductive and sometimes indeed frustrating.)

No, certainly not. If my mails seemed confusing or pointless, it just
shows how thoroughly confused I was.

> One thing is that this wouldn’t make the quorum case any easier because
> it actually doesn’t know for which children it acts as a filter and for
> which it doesn’t.
> 
> > But by now I think most of the confusion is really just a result of COW
> > being considered a filter in some respects (mainly just the names of the
> > child access functions), but not in others (like .is_filter).
> 
> I don’t quite see how it’s “by now” when in your first mail you already
> basically wrote that functionally, everything works (leaving out
> quorum), but that you’re confused (or claim to be confused, I have no
> idea what’s real and what’s pretended anymore) by the names.

Well, I saw that the special cases in the patches that I had reviewed so
far seemed to be converted correctly, but I just didn't understand the
whole concept behind it. It's possible to both understand that a
transformation is correct and to fail to grasp the concept behind it.

And your first answer only confused me more because you gave definitions
for R/W and COW filters that honestly ended up a bit misleading,
possibly as a result of your endeavour to make R/W filters and COW
sound like the same thing. (Which made me lose sight of basic facts like
that R/W filters must forward _every_ request without exception to their
filtered child even though COW doesn't.)
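
The difference can be sketched with a toy read path (not QEMU code;
ToyImg, toy_read and the cluster layout are assumptions made up for this
illustration). A filter forwards every read to its child, while a COW
node answers from its own clusters where they are allocated and only
falls back to its backing child otherwise:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CLUSTERS 4

/* Toy image node; NOT a QEMU structure. */
typedef struct ToyImg {
    bool is_filter;
    struct ToyImg *child;          /* filtered child, or COW backing child */
    bool allocated[TOY_CLUSTERS];  /* ignored for filters */
    int data[TOY_CLUSTERS];        /* ignored for filters */
} ToyImg;

/* Return the value a guest would see when reading one cluster. */
int toy_read(const ToyImg *n, size_t cluster)
{
    if (n == NULL) {
        return 0;                  /* past the end of the chain: zeroes */
    }
    assert(cluster < TOY_CLUSTERS);
    if (n->is_filter) {
        /* R/W filter: every request goes to the filtered child. */
        return toy_read(n->child, cluster);
    }
    if (n->allocated[cluster]) {
        /* COW node: allocated clusters hide the backing child. */
        return n->data[cluster];
    }
    return toy_read(n->child, cluster);
}
```

In this model a filter on top of an overlay always reads exactly what
the overlay reads, whereas the overlay and its backing image can differ
on any allocated cluster.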

> We have come to two results, as far as I can see:
> 
> First, naming COW backing nodes “COW filtered children” clashes with our
> existing use of “filter”.  There is no point in forcing the “filter”
> label on everything.  We can just keep calling (R/W) filters filters and
> COW backing children COW children.  The names are succinct enough.
> 
> In some cases, we don’t care whether something is a COW or filtered
> child, in such a case a caller can be bothered to use the slightly
> longer bdrv_cow_or_filtered_child().

Aye.

> Second, most of the time we want a filter node to have a clear and
> unique path to go down.  This is the important property of filters: That
> you can skip them and go to the node that actually has the data.
> 
> Quorum breaks this by having multiple children, and nobody knows which
> of them has the data we will see on the next read operation.
> 
> All “filters” that could have multiple children would have this problem.
> Hence a filter must always have a single unique data child.  I think.

I agree, and this is the condition that I mentioned somewhere above, but
failed to actually find guaranteed somewhere. We should probably make
this explicit.

Of course, quorum and similar things intend all their children to
provide the same data, but the whole point of the driver is that this is
not always guaranteed, so they aren't actually filters.
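
A toy vote makes the problem concrete (a sketch only, not the quorum
driver's code; toy_quorum_read and its threshold parameter are
hypothetical). The value returned by a read is whatever enough children
agree on, so no single child can be singled out in advance as "the" node
that has the data:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy majority vote over the values read from n children.
 * Returns true and stores the winning value in *out if at least
 * `threshold` children agree; returns false otherwise.
 */
bool toy_quorum_read(const int *child_values, size_t n,
                     size_t threshold, int *out)
{
    assert(threshold > 0);
    for (size_t i = 0; i < n; i++) {
        size_t votes = 0;
        for (size_t j = 0; j < n; j++) {
            if (child_values[j] == child_values[i]) {
                votes++;
            }
        }
        if (votes >= threshold) {
            *out = child_values[i];
            return true;
        }
    }
    return false;
}
```

With children reading 7, 9 and 7 and a threshold of 2, the result is 7
even though the second child holds 9; skipping "down" to that child
would show different data than the parent does.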

Kevin

Max Reitz Sept. 10, 2019, 12:59 p.m. UTC | #11

On 10.09.19 14:48, Kevin Wolf wrote:
> Am 10.09.2019 um 13:36 hat Max Reitz geschrieben:
>> On 10.09.19 12:47, Kevin Wolf wrote:
>>> Am 10.09.2019 um 11:14 hat Max Reitz geschrieben:
>>>> Maybe we should stop declaring Quorum a filter and then rename the
>>>> bdrv_recurse_is_first_non_filter() to, I don’t know,
>>>> bdrv_recurse_can_be_replaced_by_mirror()?
>>>
>>> Why not.
>>
>> It feels difficult to do in this series because this is a whole new can
>> of worms.
>>
>> In patch 35, I actually replace the mirror use case by
>> is_filtered_child().  So it looks to me as if that should not be done,
>> because I should instead fix bdrv_recurse_is_first_non_filter() (and
>> rename it), because quorum does allow replacing its children by mirror,
>> even if it does not act as a filter for them.
>>
>> OTOH, there are other users of bdrv_is_first_non_filter().  Those are
>> qmp_block_resize() and external_snapshot_prepare(), who throw an error
>> if that returns false.
>>
>> I think that’s just wrong.  First of all, I don’t even know why we have
>> that restriction anymore (I can imagine why it used to make sense before
>> the permission system).  qmp_block_resize() should always work as long
>> as it can get BLK_PERM_RESIZE; and I don’t know why the parents of some
>> node would care if you take a snapshot of their child.
> 
> Hm, doesn't it make sense in a way for qmp_block_resize() at least? It
> means that you can't resize just a filter, but you need to resize the
> image that actually provides the data for the filter.

Filters generally implement .bdrv_truncate() by passing it through, so
it should be fine.

> Of course, there is no reason for it to be the _first_ non-filter as
> long as BLK_PERM_RESIZE is shared, but just some non-filter node.
> 
> Two more random observations:
> 
> * quorum uses bdrv_filter_default_perms(), which allows BLK_PERM_RESIZE.
>   I think this is wrong and quorum should make sure that all children are
>   always the same size because otherwise it can't tell what its own size
>   is. (Or vote on size...? :-/) Probably not a problem in practice as
>   long as we check bdrv_is_first_non_filter().

(“Quorum is broken” seems to be a recurring observation.)

I agree, it shouldn’t share that permission.

> * child_file and child_backing don't implement .resize. So if you resize
>   a non-top-level image, parents (in particular filters) don't get their
>   size adjusted. This is probably a bug, too, but one that isn't
>   prevented by bdrv_is_first_non_filter() and should be visible today.

Hm. :-/

The good news is that I can try to fix this independently of this series.

[...]

>> We have come to two results, as far as I can see:
>>
>> First, naming COW backing nodes “COW filtered children” clashes with our
>> existing use of “filter”.  There is no point in forcing the “filter”
>> label on everything.  We can just keep calling (R/W) filters filters and
>> COW backing children COW children.  The names are succinct enough.
>>
>> In some cases, we don’t care whether something is a COW or filtered
>> child, in such a case a caller can be bothered to use the slightly
>> longer bdrv_cow_or_filtered_child().
> 
> Aye.
> 
>> Second, most of the time we want a filter node to have a clear and
>> unique path to go down.  This is the important property of filters: That
>> you can skip them and go to the node that actually has the data.
>>
>> Quorum breaks this by having multiple children, and nobody knows which
>> of them has the data we will see on the next read operation.
>>
>> All “filters” that could have multiple children would have this problem.
>> Hence a filter must always have a single unique data child.  I think.
> 
> I agree, and this is the condition that I mentioned somewhere above, but
> failed to actually find guaranteed somewhere. We should probably make
> this explicit.
> 
> Of course, quorum and similar things intend all their children to
> provide the same data, but the whole point of the driver is that this is
> not always guaranteed, so they aren't actually filters.

OK, great, I’ll get cracking then.

Max

Kevin Wolf Sept. 10, 2019, 1:10 p.m. UTC | #12

Am 10.09.2019 um 14:59 hat Max Reitz geschrieben:
> On 10.09.19 14:48, Kevin Wolf wrote:
> > Am 10.09.2019 um 13:36 hat Max Reitz geschrieben:
> >> On 10.09.19 12:47, Kevin Wolf wrote:
> >>> Am 10.09.2019 um 11:14 hat Max Reitz geschrieben:
> >>>> Maybe we should stop declaring Quorum a filter and then rename the
> >>>> bdrv_recurse_is_first_non_filter() to, I don’t know,
> >>>> bdrv_recurse_can_be_replaced_by_mirror()?
> >>>
> >>> Why not.
> >>
> >> It feels difficult to do in this series because this is a whole new can
> >> of worms.
> >>
> >> In patch 35, I actually replace the mirror use case by
> >> is_filtered_child().  So it looks to me as if that should not be done,
> >> because I should instead fix bdrv_recurse_is_first_non_filter() (and
> >> rename it), because quorum does allow replacing its children by mirror,
> >> even if it does not act as a filter for them.
> >>
> >> OTOH, there are other users of bdrv_is_first_non_filter().  Those are
> >> qmp_block_resize() and external_snapshot_prepare(), who throw an error
> >> if that returns false.
> >>
> >> I think that’s just wrong.  First of all, I don’t even know why we have
> >> that restriction anymore (I can imagine why it used to make sense before
> >> the permission system).  qmp_block_resize() should always work as long
> >> as it can get BLK_PERM_RESIZE; and I don’t know why the parents of some
> >> node would care if you take a snapshot of their child.
> > 
> > Hm, doesn't it make sense in a way for qmp_block_resize() at least? It
> > means that you can't resize just a filter, but you need to resize the
> > image that actually provides the data for the filter.
> 
> Filters generally implement .bdrv_truncate() by passing it through, so
> it should be fine.

Good point.

Then checking bdrv_is_first_non_filter() probably just forbids the only
command that would actually work correctly (resizing the top-level
filter).
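
In toy form (not QEMU code; ToySized and the toy_* functions are made up
for this sketch, and toy_resize_checked() is only a crude stand-in for a
"must be a non-filter" check), the situation looks roughly like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node with a size; NOT a QEMU structure. */
typedef struct ToySized {
    bool is_filter;
    struct ToySized *child;  /* filtered child, set only for filters */
    long size;               /* only meaningful for non-filter nodes */
} ToySized;

/* A filter has no size of its own; it reports its child's size. */
long toy_getlength(const ToySized *n)
{
    while (n->is_filter) {
        assert(n->child != NULL);
        n = n->child;
    }
    return n->size;
}

/* Truncate is passed through filters down to the node with the data. */
int toy_truncate(ToySized *n, long new_size)
{
    if (new_size < 0) {
        return -1;
    }
    while (n->is_filter) {
        assert(n->child != NULL);
        n = n->child;
    }
    n->size = new_size;
    return 0;
}

/* Resize entry point that refuses filter nodes outright. */
int toy_resize_checked(ToySized *n, long new_size)
{
    if (n->is_filter) {
        return -1;
    }
    return toy_truncate(n, new_size);
}
```

Here toy_truncate() on the top-level filter does the right thing for the
whole chain, and it is only the extra check in toy_resize_checked() that
rejects it.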

Kevin
