mbox series

[v5,0/4] virtio-fs: shared file system for virtual machines

Message ID 20190910151206.4671-1-mszeredi@redhat.com (mailing list archive)
Headers show
Series virtio-fs: shared file system for virtual machines | expand

Message

Miklos Szeredi Sept. 10, 2019, 3:12 p.m. UTC
Git tree for this version is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git#virtiofs-v5

Only post patches that actually add virtiofs (virtiofs-v5-base..virtiofs-v5).

I've folded the series from Vivek and fixed a couple of TODO comments
myself.  AFAICS two issues remain that need to be resolved in the short
term, one way or the other: freeze/restore and full virtqueue.

Thanks,
Miklos
---

Dr. David Alan Gilbert (1):
  fuse: reserve values for mapping protocol

Michael S. Tsirkin (1):
  fuse: reserve byteswapped init opcodes

Stefan Hajnoczi (2):
  virtio-fs: add Documentation/filesystems/virtiofs.rst
  virtio-fs: add virtiofs filesystem

 Documentation/filesystems/index.rst    |   10 +
 Documentation/filesystems/virtiofs.rst |   60 ++
 MAINTAINERS                            |   11 +
 fs/fuse/Kconfig                        |   11 +
 fs/fuse/Makefile                       |    1 +
 fs/fuse/fuse_i.h                       |    9 +
 fs/fuse/inode.c                        |    4 +
 fs/fuse/virtio_fs.c                    | 1191 ++++++++++++++++++++++++
 include/uapi/linux/fuse.h              |   12 +-
 include/uapi/linux/virtio_fs.h         |   19 +
 include/uapi/linux/virtio_ids.h        |    1 +
 11 files changed, 1328 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/filesystems/virtiofs.rst
 create mode 100644 fs/fuse/virtio_fs.c
 create mode 100644 include/uapi/linux/virtio_fs.h

Comments

Stefan Hajnoczi Sept. 11, 2019, 12:24 p.m. UTC | #1
On Tue, Sep 10, 2019 at 05:12:02PM +0200, Miklos Szeredi wrote:
> Git tree for this version is available here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git#virtiofs-v5
> 
> Only post patches that actually add virtiofs (virtiofs-v5-base..virtiofs-v5).
> 
> I've folded the series from Vivek and fixed a couple of TODO comments
> myself.  AFAICS two issues remain that need to be resolved in the short
> term, one way or the other: freeze/restore and full virtqueue.

Thank you!  I am investigating freeze/restore.

Stefan
Vivek Goyal Sept. 11, 2019, 2:52 p.m. UTC | #2
On Tue, Sep 10, 2019 at 05:12:02PM +0200, Miklos Szeredi wrote:
> Git tree for this version is available here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git#virtiofs-v5
> 
> Only post patches that actually add virtiofs (virtiofs-v5-base..virtiofs-v5).
> 
> I've folded the series from Vivek and fixed a couple of TODO comments
> myself.  AFAICS two issues remain that need to be resolved in the short
> term, one way or the other: freeze/restore and full virtqueue.

Hi Miklos,

We are already handling full virtqueue by waiting a bit and retrying.
I think TODO in virtio_fs_enqueue_req() is stale. Caller already
handles virtqueue full situation by retrying.

Havind said that, this could be improved by using some sort of wait
queue or completion privimitve.

Thanks
Vivek
Stefan Hajnoczi Sept. 11, 2019, 3:52 p.m. UTC | #3
On Tue, Sep 10, 2019 at 05:12:02PM +0200, Miklos Szeredi wrote:
> I've folded the series from Vivek and fixed a couple of TODO comments
> myself.  AFAICS two issues remain that need to be resolved in the short
> term, one way or the other: freeze/restore and full virtqueue.

I have researched freeze/restore and come to the conclusion that it
needs to be a future feature.  It will probably come together with live
migration support for reasons mentioned below.

Most virtio devices have fairly simply power management freeze/restore
functions that shut down the device and bring it back to the state held
in memory, respectively.  virtio-fs, as well as virtio-9p and
virtio-gpu, are different because they contain session state.  It is not
easily possible to bring back the state held in memory after the device
has been reset.

The following areas of the FUSE protocol are stateful and need special
attention:

 * FUSE_INIT - this is pretty easy, we must re-negotiate the same
   settings as before.

 * FUSE_LOOKUP -> fuse_inode (inode_map)

   The session contains a set of inode numbers that have been looked up
   using FUSE_LOOKUP.  They are ephemeral in the current virtiofsd
   implementation and vary across device reset.  Therefore we are unable
   to restore the same inode numbers upon restore.

   The solution is persistent inode numbers in virtiofsd.  This is also
   needed to make open_by_handle_at(2) work and probably for live
   migration.

 * FUSE_OPEN -> fh (fd_map)

   The session contains FUSE file handles for open files.  There is
   currently no way of re-opening a file so that a specific fh is
   returned.  A mechanism to do so probably isn't necessary if the
   driver can update the fh to the new one produced by the device for
   all open files instead.

 * FUSE_OPENDIR -> fh (dirp_map)

   Same story as for FUSE_OPEN but for open directories.

 * FUSE_GETLK/SETLK/SETLKW -> (inode->posix_locks and fcntl(F_OFD_GET/SETLK))

   The session contains file locks.  The driver must reacquire them upon
   restore.  It's unclear what to do when locking fails.

Live migration has the same problem since the FUSE session will be moved
to a new virtio-fs device instance.  It makes sense to tackle both
features together.  This is something that can be implemented in the
next year, but it's not a quick fix.

Stefan
Miklos Szeredi Sept. 12, 2019, 7:34 a.m. UTC | #4
On Wed, Sep 11, 2019 at 4:53 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Sep 10, 2019 at 05:12:02PM +0200, Miklos Szeredi wrote:
> > Git tree for this version is available here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git#virtiofs-v5
> >
> > Only post patches that actually add virtiofs (virtiofs-v5-base..virtiofs-v5).
> >
> > I've folded the series from Vivek and fixed a couple of TODO comments
> > myself.  AFAICS two issues remain that need to be resolved in the short
> > term, one way or the other: freeze/restore and full virtqueue.
>
> Hi Miklos,
>
> We are already handling full virtqueue by waiting a bit and retrying.
> I think TODO in virtio_fs_enqueue_req() is stale. Caller already
> handles virtqueue full situation by retrying.

Ah.

>
> Havind said that, this could be improved by using some sort of wait
> queue or completion privimitve.

Yeah, the request queuing can be cleaned up in several ways.  But I
think that we might postpone that till after merging with mainline.

Thanks,
Miklos
Miklos Szeredi Sept. 12, 2019, 8:14 a.m. UTC | #5
On Wed, Sep 11, 2019 at 5:54 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Sep 10, 2019 at 05:12:02PM +0200, Miklos Szeredi wrote:
> > I've folded the series from Vivek and fixed a couple of TODO comments
> > myself.  AFAICS two issues remain that need to be resolved in the short
> > term, one way or the other: freeze/restore and full virtqueue.
>
> I have researched freeze/restore and come to the conclusion that it
> needs to be a future feature.  It will probably come together with live
> migration support for reasons mentioned below.
>
> Most virtio devices have fairly simply power management freeze/restore
> functions that shut down the device and bring it back to the state held
> in memory, respectively.  virtio-fs, as well as virtio-9p and
> virtio-gpu, are different because they contain session state.  It is not
> easily possible to bring back the state held in memory after the device
> has been reset.
>
> The following areas of the FUSE protocol are stateful and need special
> attention:
>
>  * FUSE_INIT - this is pretty easy, we must re-negotiate the same
>    settings as before.
>
>  * FUSE_LOOKUP -> fuse_inode (inode_map)
>
>    The session contains a set of inode numbers that have been looked up
>    using FUSE_LOOKUP.  They are ephemeral in the current virtiofsd
>    implementation and vary across device reset.  Therefore we are unable
>    to restore the same inode numbers upon restore.
>
>    The solution is persistent inode numbers in virtiofsd.  This is also
>    needed to make open_by_handle_at(2) work and probably for live
>    migration.
>
>  * FUSE_OPEN -> fh (fd_map)
>
>    The session contains FUSE file handles for open files.  There is
>    currently no way of re-opening a file so that a specific fh is
>    returned.  A mechanism to do so probably isn't necessary if the
>    driver can update the fh to the new one produced by the device for
>    all open files instead.
>
>  * FUSE_OPENDIR -> fh (dirp_map)
>
>    Same story as for FUSE_OPEN but for open directories.
>
>  * FUSE_GETLK/SETLK/SETLKW -> (inode->posix_locks and fcntl(F_OFD_GET/SETLK))
>
>    The session contains file locks.  The driver must reacquire them upon
>    restore.  It's unclear what to do when locking fails.
>
> Live migration has the same problem since the FUSE session will be moved
> to a new virtio-fs device instance.  It makes sense to tackle both
> features together.  This is something that can be implemented in the
> next year, but it's not a quick fix.

Right.   The question for now is: should the freeze silently succeed
(as it seems to do now) or should it fail instead?

I guess normally freezing should be okay, as long as the virtiofsd
remains connected while the system is frozen.

I tried to test this with "echo -n mem > /sys/power/state", which
indeed resulted in the virtio_fs_freeze() callback being called.
However, I couldn't find a way to wake up the system...

Thanks,
Miklos
Stefan Hajnoczi Sept. 12, 2019, 12:54 p.m. UTC | #6
On Thu, Sep 12, 2019 at 10:14:11AM +0200, Miklos Szeredi wrote:
> On Wed, Sep 11, 2019 at 5:54 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Tue, Sep 10, 2019 at 05:12:02PM +0200, Miklos Szeredi wrote:
> > > I've folded the series from Vivek and fixed a couple of TODO comments
> > > myself.  AFAICS two issues remain that need to be resolved in the short
> > > term, one way or the other: freeze/restore and full virtqueue.
> >
> > I have researched freeze/restore and come to the conclusion that it
> > needs to be a future feature.  It will probably come together with live
> > migration support for reasons mentioned below.
> >
> > Most virtio devices have fairly simply power management freeze/restore
> > functions that shut down the device and bring it back to the state held
> > in memory, respectively.  virtio-fs, as well as virtio-9p and
> > virtio-gpu, are different because they contain session state.  It is not
> > easily possible to bring back the state held in memory after the device
> > has been reset.
> >
> > The following areas of the FUSE protocol are stateful and need special
> > attention:
> >
> >  * FUSE_INIT - this is pretty easy, we must re-negotiate the same
> >    settings as before.
> >
> >  * FUSE_LOOKUP -> fuse_inode (inode_map)
> >
> >    The session contains a set of inode numbers that have been looked up
> >    using FUSE_LOOKUP.  They are ephemeral in the current virtiofsd
> >    implementation and vary across device reset.  Therefore we are unable
> >    to restore the same inode numbers upon restore.
> >
> >    The solution is persistent inode numbers in virtiofsd.  This is also
> >    needed to make open_by_handle_at(2) work and probably for live
> >    migration.
> >
> >  * FUSE_OPEN -> fh (fd_map)
> >
> >    The session contains FUSE file handles for open files.  There is
> >    currently no way of re-opening a file so that a specific fh is
> >    returned.  A mechanism to do so probably isn't necessary if the
> >    driver can update the fh to the new one produced by the device for
> >    all open files instead.
> >
> >  * FUSE_OPENDIR -> fh (dirp_map)
> >
> >    Same story as for FUSE_OPEN but for open directories.
> >
> >  * FUSE_GETLK/SETLK/SETLKW -> (inode->posix_locks and fcntl(F_OFD_GET/SETLK))
> >
> >    The session contains file locks.  The driver must reacquire them upon
> >    restore.  It's unclear what to do when locking fails.
> >
> > Live migration has the same problem since the FUSE session will be moved
> > to a new virtio-fs device instance.  It makes sense to tackle both
> > features together.  This is something that can be implemented in the
> > next year, but it's not a quick fix.
> 
> Right.   The question for now is: should the freeze silently succeed
> (as it seems to do now) or should it fail instead?
> 
> I guess normally freezing should be okay, as long as the virtiofsd
> remains connected while the system is frozen.
> 
> I tried to test this with "echo -n mem > /sys/power/state", which
> indeed resulted in the virtio_fs_freeze() callback being called.
> However, I couldn't find a way to wake up the system...

The issue occurs only on restore.  The core virtio driver code resets
the device so we lose state and cannot resume.

virtio-9p and virtio-gpu do not implement the .freeze() callback but
this is problematic since the system will think freeze succeeded.  It's
safer for virtio-fs to implement .freeze() and return -EOPNOTSUPP.

Can you squash in a trivial return -EOPNOTSUPP .freeze() function?

Thanks,
Stefan
Miklos Szeredi Sept. 12, 2019, 1:06 p.m. UTC | #7
On Thu, Sep 12, 2019 at 2:54 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Thu, Sep 12, 2019 at 10:14:11AM +0200, Miklos Szeredi wrote:
> > On Wed, Sep 11, 2019 at 5:54 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Tue, Sep 10, 2019 at 05:12:02PM +0200, Miklos Szeredi wrote:
> > > > I've folded the series from Vivek and fixed a couple of TODO comments
> > > > myself.  AFAICS two issues remain that need to be resolved in the short
> > > > term, one way or the other: freeze/restore and full virtqueue.
> > >
> > > I have researched freeze/restore and come to the conclusion that it
> > > needs to be a future feature.  It will probably come together with live
> > > migration support for reasons mentioned below.
> > >
> > > Most virtio devices have fairly simply power management freeze/restore
> > > functions that shut down the device and bring it back to the state held
> > > in memory, respectively.  virtio-fs, as well as virtio-9p and
> > > virtio-gpu, are different because they contain session state.  It is not
> > > easily possible to bring back the state held in memory after the device
> > > has been reset.
> > >
> > > The following areas of the FUSE protocol are stateful and need special
> > > attention:
> > >
> > >  * FUSE_INIT - this is pretty easy, we must re-negotiate the same
> > >    settings as before.
> > >
> > >  * FUSE_LOOKUP -> fuse_inode (inode_map)
> > >
> > >    The session contains a set of inode numbers that have been looked up
> > >    using FUSE_LOOKUP.  They are ephemeral in the current virtiofsd
> > >    implementation and vary across device reset.  Therefore we are unable
> > >    to restore the same inode numbers upon restore.
> > >
> > >    The solution is persistent inode numbers in virtiofsd.  This is also
> > >    needed to make open_by_handle_at(2) work and probably for live
> > >    migration.
> > >
> > >  * FUSE_OPEN -> fh (fd_map)
> > >
> > >    The session contains FUSE file handles for open files.  There is
> > >    currently no way of re-opening a file so that a specific fh is
> > >    returned.  A mechanism to do so probably isn't necessary if the
> > >    driver can update the fh to the new one produced by the device for
> > >    all open files instead.
> > >
> > >  * FUSE_OPENDIR -> fh (dirp_map)
> > >
> > >    Same story as for FUSE_OPEN but for open directories.
> > >
> > >  * FUSE_GETLK/SETLK/SETLKW -> (inode->posix_locks and fcntl(F_OFD_GET/SETLK))
> > >
> > >    The session contains file locks.  The driver must reacquire them upon
> > >    restore.  It's unclear what to do when locking fails.
> > >
> > > Live migration has the same problem since the FUSE session will be moved
> > > to a new virtio-fs device instance.  It makes sense to tackle both
> > > features together.  This is something that can be implemented in the
> > > next year, but it's not a quick fix.
> >
> > Right.   The question for now is: should the freeze silently succeed
> > (as it seems to do now) or should it fail instead?
> >
> > I guess normally freezing should be okay, as long as the virtiofsd
> > remains connected while the system is frozen.
> >
> > I tried to test this with "echo -n mem > /sys/power/state", which
> > indeed resulted in the virtio_fs_freeze() callback being called.
> > However, I couldn't find a way to wake up the system...
>
> The issue occurs only on restore.  The core virtio driver code resets
> the device so we lose state and cannot resume.
>
> virtio-9p and virtio-gpu do not implement the .freeze() callback but
> this is problematic since the system will think freeze succeeded.  It's
> safer for virtio-fs to implement .freeze() and return -EOPNOTSUPP.
>
> Can you squash in a trivial return -EOPNOTSUPP .freeze() function?

Sure.

Is this a regression from 9p?  How easy would it be to restore state
in virtques and reconnect to existing virtiofsd (no saving of FUSE
state should be required in this case)?

Thanks,
Miklos
Miklos Szeredi Sept. 12, 2019, 2:12 p.m. UTC | #8
On Thu, Sep 12, 2019 at 3:07 PM Miklos Szeredi <mszeredi@redhat.com> wrote:

> Is this a regression from 9p?

Let me answer myself: 9p seems to behave similarly: after
suspend/resume it hangs.

So added -EOPNOTSUPP + pr_warn() to the freeze function and verified
that this fixes the bad behavior.

Thanks,
Miklos