
[v5,7/9] block: don't make snapshots for filters

Message ID 20160926080838.6992.95614.stgit@PASHA-ISP (mailing list archive)
State New, archived

Commit Message

Pavel Dovgalyuk Sept. 26, 2016, 8:08 a.m. UTC
This patch disables snapshotting for block driver filters.
It is needed because snapshots should be created
in the underlying disk images, not in the filters themselves.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 block/snapshot.c |    3 +++
 1 file changed, 3 insertions(+)

Comments

Kevin Wolf Sept. 26, 2016, 9:23 a.m. UTC | #1
Am 26.09.2016 um 10:08 hat Pavel Dovgalyuk geschrieben:
> This patch disables snapshotting for block driver filters.
> It is needed because snapshots should be created
> in the underlying disk images, not in the filters themselves.
> 
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>

But that's exactly what the existing code implements? If a driver
doesn't provide .bdrv_snapshot_goto, the request is redirected to
bs->file.
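
For reference, that fallback path looked roughly like this -- a sketch of
block/snapshot.c from around that time, details may differ between versions:

    int bdrv_snapshot_goto(BlockDriverState *bs,
                           const char *snapshot_id)
    {
        BlockDriver *drv = bs->drv;
        int ret, open_ret;

        if (!drv) {
            return -ENOMEDIUM;
        }
        if (drv->bdrv_snapshot_goto) {
            return drv->bdrv_snapshot_goto(bs, snapshot_id);
        }

        if (bs->file) {
            /* driver has no snapshot support itself: close the node,
             * redirect the request to bs->file and reopen on top of it */
            drv->bdrv_close(bs);
            ret = bdrv_snapshot_goto(bs->file->bs, snapshot_id);
            open_ret = drv->bdrv_open(bs, NULL, bs->open_flags, NULL);
            if (open_ret < 0) {
                bdrv_unref(bs->file->bs);
                bs->drv = NULL;
                return open_ret;
            }
            return ret;
        }

        return -ENOTSUP;
    }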

>  block/snapshot.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block/snapshot.c b/block/snapshot.c
> index bf5c2ca..8998b8b 100644
> --- a/block/snapshot.c
> +++ b/block/snapshot.c
> @@ -184,6 +184,9 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
>      if (!drv) {
>          return -ENOMEDIUM;
>      }
> +    if (drv->is_filter) {
> +        return 0;
> +    }

This, on the other hand, doesn't redirect the request, but silently
ignores it. That is, loading the snapshot will apparently succeed, but
it wouldn't actually load anything and the disk would stay in its
current state.

>      if (drv->bdrv_snapshot_goto) {
>          return drv->bdrv_snapshot_goto(bs, snapshot_id);
>      }
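
A variant that redirects instead of silently succeeding might look like the
following -- purely hypothetical, not what the patch does:

    /* hypothetical alternative: forward the request to the filtered node */
    if (drv->is_filter && bs->file) {
        return bdrv_snapshot_goto(bs->file->bs, snapshot_id);
    }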

Kevin
Pavel Dovgalyuk Sept. 26, 2016, 9:51 a.m. UTC | #2
> From: Kevin Wolf [mailto:kwolf@redhat.com]
> Am 26.09.2016 um 10:08 hat Pavel Dovgalyuk geschrieben:
> > This patch disables snapshotting for block driver filters.
> > It is needed because snapshots should be created
> > in the underlying disk images, not in the filters themselves.
> >
> > Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> 
> But that's exactly what the existing code implements? If a driver
> doesn't provide .bdrv_snapshot_goto, the request is redirected to
> bs->file.
> 
> >  block/snapshot.c |    3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/block/snapshot.c b/block/snapshot.c
> > index bf5c2ca..8998b8b 100644
> > --- a/block/snapshot.c
> > +++ b/block/snapshot.c
> > @@ -184,6 +184,9 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
> >      if (!drv) {
> >          return -ENOMEDIUM;
> >      }
> > +    if (drv->is_filter) {
> > +        return 0;
> > +    }
> 
> This, on the other hand, doesn't redirect the request, but silently
> ignores it. That is, loading the snapshot will apparently succeed, but
> it wouldn't actually load anything and the disk would stay in its
> current state.

In my use case bdrv_all_goto_snapshot iterates over all block drivers, including
filters and disk images. Therefore skipping goto for filters is ok.
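
For context, bdrv_all_goto_snapshot() looked roughly like this at the time
(a sketch, details may differ):

    int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs)
    {
        int err = 0;
        BlockDriverState *bs;
        BdrvNextIterator it;

        /* walks every BDS known to the monitor, filter nodes included */
        for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
            AioContext *ctx = bdrv_get_aio_context(bs);

            aio_context_acquire(ctx);
            if (bdrv_can_snapshot(bs)) {
                err = bdrv_snapshot_goto(bs, name);
            }
            aio_context_release(ctx);
            if (err < 0) {
                *first_bad_bs = bs;
                break;
            }
        }

        return err;
    }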

Maybe I should move this check to bdrv_can_snapshot?
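
That would look something like this -- a hypothetical sketch, adding the
proposed check to the existing function:

    int bdrv_can_snapshot(BlockDriverState *bs)
    {
        BlockDriver *drv = bs->drv;

        if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
            return 0;
        }
        /* proposed: filter nodes never snapshot themselves */
        if (drv->is_filter) {
            return 0;
        }
        if (!drv->bdrv_snapshot_create) {
            if (bs->file != NULL) {
                return bdrv_can_snapshot(bs->file->bs);
            }
            return 0;
        }

        return 1;
    }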

> 
> >      if (drv->bdrv_snapshot_goto) {
> >          return drv->bdrv_snapshot_goto(bs, snapshot_id);
> >      }


Pavel Dovgalyuk
Kevin Wolf Sept. 26, 2016, 1:17 p.m. UTC | #3
Am 26.09.2016 um 11:51 hat Pavel Dovgalyuk geschrieben:
> 
> > In my use case bdrv_all_goto_snapshot iterates over all block drivers, including
> > filters and disk images. Therefore skipping goto for filters is ok.

Hm, this can happen today indeed.

Originally, we only called bdrv_snapshot_goto() for all _top level_
BDSes, and this is still what you normally get. However, if you
explicitly create a BDS (e.g. with its own -drive option), it is
considered a top level BDS without actually being top level for the
guest, and therefore the snapshotting function is called for it.

Of course, this is highly inefficient because the goto_snapshot request
is passed by the filter driver and then called another time for the
lower node, effectively loading the snapshot a second time.
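
For illustration, such a setup would be something like the following, where
the qcow2 node gets its own -drive option and blkreplay references it by id
(a sketch, the ids are made up):

    -drive if=none,driver=qcow2,file=test.img,id=img
    -drive if=none,driver=blkreplay,image=img,id=img-blkreplay
    -device ide-hd,drive=img-blkreplay

Both the qcow2 node and the blkreplay node count as top level here, so the
snapshot is loaded once through the filter and once more directly.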

On the other hand if you use a single -drive option to create both the
qcow2 BDS and the blkreplay filter, we do need to pass down the
goto_snapshot request because it won't be called for the qcow2 layer
otherwise.

I'm not completely sure yet what the right behaviour would be here.

Kevin
Pavel Dovgalyuk Sept. 27, 2016, 2:06 p.m. UTC | #4
> From: Kevin Wolf [mailto:kwolf@redhat.com]
> Am 26.09.2016 um 11:51 hat Pavel Dovgalyuk geschrieben:
> >
> > In my use case bdrv_all_goto_snapshot iterates over all block drivers, including
> > filters and disk images. Therefore skipping goto for filters is ok.
> 
> Hm, this can happen today indeed.
> 
> Originally, we only called bdrv_snapshot_goto() for all _top level_
> BDSes, and this is still what you normally get. However, if you
> explicitly create a BDS (e.g. with its own -drive option), it is
> considered a top level BDS without actually being top level for the
> guest, and therefore the snapshotting function is called for it.
> 
> Of course, this is highly inefficient because the goto_snapshot request
> is passed by the filter driver and then called another time for the
> lower node, effectively loading the snapshot a second time.
> 
> On the other hand if you use a single -drive option to create both the
> qcow2 BDS and the blkreplay filter, we do need to pass down the
> goto_snapshot request because it won't be called for the qcow2 layer
> otherwise.

How can this be specified on the command line?
I believed that a separate -drive option is required.

> 
> I'm not completely sure yet what the right behaviour would be here.

Pavel Dovgalyuk
Kevin Wolf Sept. 28, 2016, 8:36 a.m. UTC | #5
Am 27.09.2016 um 16:06 hat Pavel Dovgalyuk geschrieben:
> > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > On the other hand if you use a single -drive option to create both the
> > qcow2 BDS and the blkreplay filter, we do need to pass down the
> > goto_snapshot request because it won't be called for the qcow2 layer
> > otherwise.
> 
> How can this be specified on the command line?
> I believed that a separate -drive option is required.

Something like this:

    -drive driver=blkreplay,image.driver=file,image.filename=test.img

Kevin
Pavel Dovgalyuk Sept. 28, 2016, 9:32 a.m. UTC | #6
> From: Kevin Wolf [mailto:kwolf@redhat.com]
> Am 27.09.2016 um 16:06 hat Pavel Dovgalyuk geschrieben:
> > > From: Kevin Wolf [mailto:kwolf@redhat.com]
> > > Originally, we only called bdrv_snapshot_goto() for all _top level_
> > > BDSes, and this is still what you normally get. However, if you
> > > explicitly create a BDS (e.g. with its own -drive option), it is
> > > considered a top level BDS without actually being top level for the
> > > guest, and therefore the snapshotting function is called for it.
> > >
> > > Of course, this is highly inefficient because the goto_snapshot request
> > > is passed by the filter driver and then called another time for the
> > > lower node, effectively loading the snapshot a second time.

Maybe double saving/loading causes the least damage then?
And should we just document how to use blkreplay effectively?

> > >
> > > On the other hand if you use a single -drive option to create both the
> > > qcow2 BDS and the blkreplay filter, we do need to pass down the
> > > goto_snapshot request because it won't be called for the qcow2 layer
> > > otherwise.
> >
> > How can this be specified on the command line?
> > I believed that a separate -drive option is required.
> 
> Something like this:
> 
>     -drive driver=blkreplay,image.driver=file,image.filename=test.img
> 

I tried the following command line, but the VM does not detect the hard drive
and cannot boot.

-drive driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay 
-device ide-hd,drive=img-blkreplay


Pavel Dovgalyuk
Kevin Wolf Sept. 28, 2016, 9:43 a.m. UTC | #7
Am 28.09.2016 um 11:32 hat Pavel Dovgalyuk geschrieben:
> I tried the following command line, but the VM does not detect the hard drive
> and cannot boot.
> 
> -drive driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay 
> -device ide-hd,drive=img-blkreplay

My command line was assuming a raw image. It looks like you're using a
qcow (hopefully qcow2?) image. If so, then you need to include the qcow2
driver:

-drive driver=blkreplay,if=none,image.driver=qcow2,\
image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay

Kevin
Pavel Dovgalyuk Sept. 28, 2016, 12:49 p.m. UTC | #8
> From: Kevin Wolf [mailto:kwolf@redhat.com]
> 
> My command line was assuming a raw image. It looks like you're using a
> qcow (hopefully qcow2?) image. If so, then you need to include the qcow2
> driver:
> 
> -drive driver=blkreplay,if=none,image.driver=qcow2,\
> image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay

This doesn't work for some reason. Replay just hangs at some point.

Maybe there is some internal difference between command lines with one or two -drive options?

Pavel Dovgalyuk
Kevin Wolf Sept. 29, 2016, 10:32 a.m. UTC | #9
Am 28.09.2016 um 14:49 hat Pavel Dovgalyuk geschrieben:
> This doesn't work for some reason. Replay just hangs at some point.
> 
> Maybe there is some internal difference between command lines with one or two -drive options?

There are some subtle differences that affect the management of the
block driver nodes, but they shouldn't make a difference for the I/O
path. But I don't know the replay code well, so maybe it makes a
difference there (which would be a bug).

If you have two separate -drive options, a BlockBackend is created for
each of them. The lower layer one wouldn't be used normally, but if you
iterate over all BlockBackends somewhere, you would see it. Both BBs are
owned by the monitor, and if you remove the top layer, the monitor
reference will keep the lower layer alive.

In contrast, with a single -drive option, only the top layer (blkreplay)
has a BlockBackend attached, but the qcow2 layer doesn't. If the
blkreplay BlockBackend is deleted, the qcow2 layer automatically goes
away as well because the monitor doesn't hold a reference to it.

I think these are the most important (and possibly the only)
differences.
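
Roughly, as a sketch (the node names are informal):

    # two -drive options: each -drive gets a monitor-owned BlockBackend
    BB "img-blkreplay" -> blkreplay -> qcow2 -> file
    BB "img"           -> qcow2   (same qcow2 node as above)

    # single -drive option: only blkreplay has a BlockBackend
    BB "img-blkreplay" -> blkreplay -> qcow2 -> file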

Kevin
Pavel Dovgalyuk Nov. 16, 2016, 9:49 a.m. UTC | #10
Kevin,

> From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru]
> This doesn't work for some reason. Replay just hangs at some point.
> 
> Maybe there is some internal difference between command lines with one or two -drive
> options?

I've investigated this issue.
This command line works ok:
 -drive driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay 
 -device ide-hd,drive=img-blkreplay

And this does not:
 -drive driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay
 -device ide-hd,drive=img-blkreplay

QEMU hangs at some point during replay.

I found that some DMA requests do not pass through the blkreplay driver
due to the following line in block-backend.c:
    return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);

This line passes the read request directly to the qcow driver and blkreplay
cannot process it to make it deterministic.
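
For context, the surrounding function is roughly the following -- a
simplified sketch of block-backend.c, not the exact code:

    static int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
                                          unsigned int bytes, QEMUIOVector *qiov,
                                          BdrvRequestFlags flags)
    {
        int ret = blk_check_byte_request(blk, offset, bytes);
        if (ret < 0) {
            return ret;
        }

        /* blk->root is the BdrvChild of the BB's root node; requests only
         * pass through blkreplay if that root node is the blkreplay filter */
        return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);
    }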

Pavel Dovgalyuk
Paolo Bonzini Nov. 16, 2016, 12:15 p.m. UTC | #11
> I've investigated this issue.
> This command line works ok:
>  -drive driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay
>  -device ide-hd,drive=img-blkreplay
> 
> And this does not:
>  -drive driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay
>  -device ide-hd,drive=img-blkreplay
> 
> QEMU hangs at some point during replay.
> 
> I found that some DMA requests do not pass through the blkreplay driver
> due to the following line in block-backend.c:
>     return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);
> 
> This line passes the read request directly to the qcow driver and blkreplay
> cannot process it to make it deterministic.

I don't understand, blk->root should be the blkreplay here.

Paolo
Kevin Wolf Nov. 16, 2016, 12:20 p.m. UTC | #12
Am 16.11.2016 um 10:49 hat Pavel Dovgalyuk geschrieben:
> I've investigated this issue.
> This command line works ok:
>  -drive driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay 
>  -device ide-hd,drive=img-blkreplay
> 
> And this does not:
>  -drive driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay
>  -device ide-hd,drive=img-blkreplay
> 
> QEMU hangs at some point during replay.
> 
> I found that some DMA requests do not pass through the blkreplay driver
> due to the following line in block-backend.c:
>     return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);
> 
> This line passes the read request directly to the qcow driver and blkreplay
> cannot process it to make it deterministic.

How does that bypass blkreplay? blk->root is supposed to be the blkreplay
node; do you see something different? If it were the qcow2 node, then I
would expect that no requests at all go through the blkreplay layer.

Kevin
Pavel Dovgalyuk Nov. 16, 2016, 1:50 p.m. UTC | #13
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> I don't understand, blk->root should be the blkreplay here.

I've got some more logs. I used a disk image that references a backing file.
It seems that some weird things happen with both command lines.

== For the first command line (blkreplay separated from image):
blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp_overlay1)
 -> bdrv_co_preadv(blkreplay, temp_overlay)
 -> bdrv_co_preadv(qcow2, temp_overlay2)
 -> bdrv_co_preadv(qcow2, image_overlay)
 -> bdrv_co_preadv(qcow2, image_backing)
 -> bdrv_co_preadv(file, image_backing)

But sometimes it changes to:
blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp_overlay1)
 -> bdrv_co_preadv(file, temp_overlay1)

== For the second command line (blkreplay combined with image):

In most cases we have the following call stack:
blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp_overlay)
 -> bdrv_co_preadv(blkreplay, image_overlay)
 -> bdrv_co_preadv(qcow2, image_overlay)
 -> bdrv_co_preadv(qcow2, image_backing)
 -> bdrv_co_preadv(file, image_backing)

But sometimes it changes to:
blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp_overlay)
 -> bdrv_co_preadv(file, temp_overlay)

========

It seems that the temporary overlay is created over blkreplay, which
is intended to work as a simple filter. Is that correct?
Pavel Dovgalyuk
Pavel Dovgalyuk Nov. 21, 2016, 12:22 p.m. UTC | #14
> From: Kevin Wolf [mailto:kwolf@redhat.com]
> 
> How does that bypass blkreplay? blk->root is supposed to be the blkreplay
> node; do you see something different? If it were the qcow2 node, then I
> would expect that no requests at all go through the blkreplay layer.

It seems that the problem is in the -snapshot option.
We have one of the following block layers depending on the command line:
 tmp_overlay1 -> blkreplay -> tmp_overlay2 -> disk_image
 tmp_overlay1 -> blkreplay -> disk_image

But the correct scheme is intended to be the following:
 blkreplay -> tmp_overlay1 -> disk_image

How can we fix it?
Maybe we should add blkreplay automatically to all block devices instead of
specifying it in the command line?

Pavel Dovgalyuk
Kevin Wolf Nov. 21, 2016, 3:54 p.m. UTC | #15
Am 21.11.2016 um 13:22 hat Pavel Dovgalyuk geschrieben:
> It seems that the problem is in the -snapshot option.
> We have one of the following block layers depending on the command line:
>  tmp_overlay1 -> blkreplay -> tmp_overlay2 -> disk_image
>  tmp_overlay1 -> blkreplay -> disk_image
> 
> But the correct scheme is intended to be the following:
>  blkreplay -> tmp_overlay1 -> disk_image
> 
> How can we fix it?
> Maybe we should add blkreplay automatically to all block devices instead of
> specifying it in the command line?

I think you found a pretty fundamental design problem with "global"
drive options that add a filter node such as -snapshot and replay mode
(replay mode isn't one of them today, but your suggestion to make it
automatic would turn it into one).

At the core of the problem I think we have two questions:

1. Which block nodes should be affected and get a filter node added, and
   which nodes shouldn't get one? In your case, disk_image is defined
   with a -drive option, but shouldn't get the snapshot.

2. In which order should filter nodes be added?

Both of these questions feel hard. As long as we haven't thought through
the concept as such (rather than discussing one-off hacks) and we're not
completely certain what the right answer to the questions is, we
shouldn't add more automatic filter nodes, because chances are that we
get it wrong and would regret it.

The obvious answer for a workaround would be: Make everything manual,
i.e. don't use -snapshot, but create a qcow2 overlay manually.
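
For example, roughly -- a sketch reusing the image name from earlier in the
thread:

    qemu-img create -f qcow2 -b testdisk.qcow overlay.qcow2
    qemu-system-x86_64 ... \
        -drive driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=overlay.qcow2,id=img-blkreplay \
        -device ide-hd,drive=img-blkreplay

The overlay can then simply be deleted and recreated between runs to get back
to the same disk state.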

For getting the automatic thing right, please give me some time to think
about the design. I'll also meet Markus (CCed) in person end of next
week and I think this is a design question that would be useful to
discuss with him then.

Kevin
Pavel Dovgalyuk Dec. 5, 2016, 7:43 a.m. UTC | #16
Paolo,

> From: Kevin Wolf [mailto:kwolf@redhat.com]
> 
> I think you found a pretty fundamental design problem with "global"
> drive options that add a filter node such as -snapshot and replay mode
> (replay mode isn't one of them today, but your suggestion to make it
> automatic would turn it into one).
> 
> At the core of the problem I think we have two questions:
> 
> 1. Which block nodes should be affected and get a filter node added, and
>    which nodes shouldn't get one? In your case, disk_image is defined
>    with a -drive option, but shouldn't get the snapshot.
> 
> 2. In which order should filter nodes be added?
> 
> Both of these questions feel hard. As long as we haven't thought through
> the concept as such (rather than discussing one-off hacks) and we're not
> completely certain what the right answer to the questions is, we
> shouldn't add more automatic filter nodes, because chances are that we
> get it wrong and would regret it.
> 
> The obvious answer for a workaround would be: Make everything manual,
> i.e. don't use -snapshot, but create a qcow2 overlay manually.

What about switching to manual overlay creation by default?
We can make the rrsnapshot option mandatory.
Then the user will have to create a snapshot in the image or an overlay, and
the disk image will not be corrupted.

It is not very convenient, but we could make rrsnapshot optional again when
a solution for -snapshot is found.

> For getting the automatic thing right, please give me some time to think
> about the design. I'll also meet Markus (CCed) in person end of next
> week and I think this is a design question that would be useful to
> discuss with him then.

Pavel Dovgalyuk
Kevin Wolf Dec. 5, 2016, 10:34 a.m. UTC | #17
Am 05.12.2016 um 08:43 hat Pavel Dovgalyuk geschrieben:
> What about switching to manual overlay creation by default?
> We can make the rrsnapshot option mandatory.
> Then the user will have to create a snapshot in the image or an overlay, and
> the disk image will not be corrupted.
> 
> It is not very convenient, but we could make rrsnapshot optional again when
> a solution for -snapshot is found.

Hm, what is this rrsnapshot option? git grep can't find it.

Anyway, it seems that doing things manually is the safe way as long as
we don't know the final solution, so I think I agree.

For a slightly more convenient way, one of the problems to solve seems
to be that snapshot=on always affects the top level node and you can't
create a temporary snapshot in the middle of the chain. Perhaps we
should introduce a 'temporary-overlay' driver or something like that, so
that you could specify things like this:

    -drive if=none,driver=file,filename=test.img,id=orig
    -drive if=none,driver=temporary-overlay,file=orig,id=snap
    -drive if=none,driver=blkreplay,image=snap

Which makes me wonder... Is blkreplay usable without the temporary
snapshot or is this pretty much a requirement? Because if it has to be
there, the next step could be that blkreplay creates temporary-overlay
internally in its .bdrv_open().
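
For reference, blkreplay's open function is roughly the following (a sketch
of block/blkreplay.c from that time), so the temporary overlay would have to
be created around the bdrv_open_child() call:

    static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
                              Error **errp)
    {
        Error *local_err = NULL;
        int ret;

        /* open the 'image' child; a temporary overlay would be created
         * and inserted here */
        bs->file = bdrv_open_child(NULL, options, "image",
                                   bs, &child_file, false, &local_err);
        if (local_err) {
            ret = -EINVAL;
            error_propagate(errp, local_err);
            goto fail;
        }

        ret = 0;
    fail:
        return ret;
    }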

Kevin
Pavel Dovgalyuk Dec. 5, 2016, 11:49 a.m. UTC | #18
> From: Kevin Wolf [mailto:kwolf@redhat.com]
> 
> Hm, what is this rrsnapshot option? git grep can't find it.

It is from a patch that has not been included yet.
This option creates/loads a VM snapshot in record/replay mode,
leaving the original disk image unchanged.
Record/replay without this option uses '-snapshot' to preserve
the state of the disk images.
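
A hypothetical usage sketch, assuming it ends up as an -icount suboption (the
exact spelling comes from the unmerged patch and may differ):

    -icount shift=auto,rr=record,rrfile=replay.bin,rrsnapshot=init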

> Anyway, it seems that doing things manually is the safe way as long as
> we don't know the final solution, so I think I agree.
> 
> For a slightly more convenient way, one of the problems to solve seems
> to be that snapshot=on always affects the top level node and you can't
> create a temporary snapshot in the middle of the chain. Perhaps we
> should introduce a 'temporary-overlay' driver or something like that, so
> that you could specify things like this:
> 
>     -drive if=none,driver=file,filename=test.img,id=orig
>     -drive if=none,driver=temporary-overlay,file=orig,id=snap
>     -drive if=none,driver=blkreplay,image=snap

This seems reasonable for the manual way.

> Which makes me wonder... Is blkreplay usable without the temporary
> snapshot or is this pretty much a requirement? 

It's not a requirement. But to make replay deterministic we have to
start with the same image every time. As far as I know, this may be achieved by:
1. Restoring the original disk image manually
2. Using a VM snapshot to start execution from
3. Using the -snapshot option
4. Not using disks at all

> Because if it has to be
> there, the next step could be that blkreplay creates temporary-overlay
> internally in its .bdrv_open().

Here is your answer about such an approach :)
https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg04687.html

Pavel Dovgalyuk
Kevin Wolf Dec. 5, 2016, 12:44 p.m. UTC | #19
Am 05.12.2016 um 12:49 hat Pavel Dovgalyuk geschrieben:
> > Hm, what is this rrsnapshot option? git grep can't find it.
> 
> It is from a patch that has not been included yet.
> This option creates/loads a VM snapshot in record/replay mode,
> leaving the original disk image unchanged.

You mean internal qcow2 snapshots? Ok.

> Record/replay without this option uses '-snapshot' to preserve
> the state of the disk images.
> 
> > Anyway, it seems that doing things manually is the safe way as long as
> > we don't know the final solution, so I think I agree.
> > 
> > For a slightly more convenient way, one of the problems to solve seems
> > to be that snapshot=on always affects the top level node and you can't
> > create a temporary snapshot in the middle of the chain. Perhaps we
> > should introduce a 'temporary-overlay' driver or something like that, so
> > that you could specify things like this:
> > 
> >     -drive if=none,driver=file,filename=test.img,id=orig
> >     -drive if=none,driver=temporary-overlay,file=orig,id=snap
> >     -drive if=none,driver=blkreplay,image=snap
> 
> This seems reasonable for the manual way.

Maybe another, easier to implement option could be something like this:

    -drive if=none,driver=file,filename=test.img,snapshot=on,overlay.node-name=snap
    -drive if=none,driver=blkreplay,image=snap

It would require that we implement support for overlay.* options like we
already support backing.* options. Allowing to specify options for the
overlay node is probably nice to have anyway.

However, there could be a few tricky parts there. For example, what
happens if someone uses overlay.backing=something-else? Perhaps
completely disallowing backing and backing.* for overlays would already
solve this.

> > Which makes me wonder... Is blkreplay usable without the temporary
> > snapshot or is this pretty much a requirement? 
> 
> It's not a requirement. But to make replay deterministic we have to
> start with the same image every time. As far as I know, this may be achieved by:
> 1. Restoring the original disk image manually
> 2. Using a VM snapshot to start execution from
> 3. Using the -snapshot option
> 4. Not using disks at all
> 
> > Because if it has to be
> > there, the next step could be that blkreplay creates temporary-overlay
> > internally in its .bdrv_open().
> 
> Here is your answer about such an approach :)
> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg04687.html

Right, and unfortunately these are still good points.

Especially the part where you allowed giving the overlay filename
really needs to work the way it does now with the 'image' option. We
might not need to be that strict with temporary overlays; restricting to
qcow2 with default options could be acceptable there - but whatever I
think of to support both cases results in something that isn't really
easier than the manual way that we figured out above.

Kevin
Pavel Dovgalyuk Dec. 6, 2016, 9:37 a.m. UTC | #20
> From: Kevin Wolf [mailto:kwolf@redhat.com]
> Am 05.12.2016 um 12:49 hat Pavel Dovgalyuk geschrieben:
> > > Because if it has to be
> > > there, the next step could be that blkreplay creates temporary-overlay
> > > internally in its .bdrv_open().
> >
> > Here is your answer about such an approach :)
> > https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg04687.html
> 
> Right, and unfortunately these are still good points.
> 
> Especially the part where you allowed giving the overlay filename
> really needs to work the way it does now with the 'image' option. We
> might not need to be that strict with temporary overlays; restricting to
> qcow2 with default options could be acceptable there - but whatever I
> think of to support both cases results in something that isn't really
> easier than the manual way that we figured out above.

Can we settle on the following?
1. Don't create any overlays automatically when the user wants to save/restore the VM state.
2. Otherwise create snapshots, but do not use the -snapshot option.
   Snapshots will be created by blkreplay as in the link specified above.

Pavel Dovgalyuk
Pavel Dovgalyuk Dec. 22, 2016, 6:58 a.m. UTC | #21
Kevin,

> From: Kevin Wolf [mailto:kwolf@redhat.com]
> 
> Right, and unfortunately these are still good points.
> 
> Especially the part where you allowed giving the overlay filename
> really needs to work the way it does now with the 'image' option. We
> might not need to be that strict with temporary overlays; restricting to
> qcow2 with default options could be acceptable there - but whatever I
> think of to support both cases results in something that isn't really
> easier than the manual way that we figured out above.

What about the new version?

Pavel Dovgalyuk

Patch

diff --git a/block/snapshot.c b/block/snapshot.c
index bf5c2ca..8998b8b 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -184,6 +184,9 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
     if (!drv) {
         return -ENOMEDIUM;
     }
+    if (drv->is_filter) {
+        return 0;
+    }
     if (drv->bdrv_snapshot_goto) {
         return drv->bdrv_snapshot_goto(bs, snapshot_id);
     }