diff mbox series

virtiofsd: Show submounts

Message ID 20200424133516.73077-1-mreitz@redhat.com (mailing list archive)
State New, archived
Headers show
Series virtiofsd: Show submounts | expand

Commit Message

Max Reitz April 24, 2020, 1:35 p.m. UTC
Currently, setup_mounts() bind-mounts the shared directory without
MS_REC.  This makes all submounts disappear.

Pass MS_REC so that the guest can see submounts again.

Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tools/virtiofsd/passthrough_ll.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dr. David Alan Gilbert April 27, 2020, 5:59 p.m. UTC | #1
* Max Reitz (mreitz@redhat.com) wrote:
> Currently, setup_mounts() bind-mounts the shared directory without
> MS_REC.  This makes all submounts disappear.
> 
> Pass MS_REC so that the guest can see submounts again.

Thanks!

> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756

Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?

> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 4c35c95b25..9d7f863e66 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
>      int oldroot;
>      int newroot;
>  
> -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
>          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
>          exit(1);
>      }

Do we want MS_SLAVE to pick up future mounts that might happenf rom the
host?
What's the interaction between this and the MS_REC|MS_SLAVE that we have
a few lines above for / ?

Dave

> -- 
> 2.25.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Max Reitz April 28, 2020, 6:06 a.m. UTC | #2
On 27.04.20 19:59, Dr. David Alan Gilbert wrote:
> * Max Reitz (mreitz@redhat.com) wrote:
>> Currently, setup_mounts() bind-mounts the shared directory without
>> MS_REC.  This makes all submounts disappear.
>>
>> Pass MS_REC so that the guest can see submounts again.
> 
> Thanks!
> 
>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> 
> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?

Well, I bisected it and landed at 3ca8a2b1.  So while the problematic
line may have been introduced by 5baa3b8e, it wasn’t used until 3ca8a2b1.

>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  tools/virtiofsd/passthrough_ll.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 4c35c95b25..9d7f863e66 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
>>      int oldroot;
>>      int newroot;
>>  
>> -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
>> +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
>>          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
>>          exit(1);
>>      }
> 
> Do we want MS_SLAVE to pick up future mounts that might happenf rom the
> host?

Hm.  So first it looks to me from the man page like one shouldn’t give
MS_SLAVE on the first mount() call but kind of only use it for remounts
(in the list at the start, “Create a bind mount” is separate from
“Change the propagation type of an existing mount”, and the man page
later says “The only other flags that can be specified while changing
the propagation type are MS_REC (described below) and MS_SILENT (which
is ignored).”).

Second, even if I do change the propagation type to MS_SLAVE in a second
call, mounts done after qemu has been started don’t show up in the guest
(for me).

So while it sounds correct, I can’t see it having an effect, actually.

> What's the interaction between this and the MS_REC|MS_SLAVE that we have
> a few lines above for / ?

Good question.  It would seem to me that there isn’t any.  That previous
mount call just sets MS_REC | MS_SLAVE for the whole mount namespace,
and then we do a new mount here (by default from / to /) that needs its
own flags.

(More interesting is perhaps why we have that other mount() call below,
which again sets MS_REC | MS_SLAVE for the old (not-yet-bind-mounted) /.
 I can’t imagine that to have any effect.)

Max
Daniel P. Berrangé April 28, 2020, 8:46 a.m. UTC | #3
On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote:
> * Max Reitz (mreitz@redhat.com) wrote:
> > Currently, setup_mounts() bind-mounts the shared directory without
> > MS_REC.  This makes all submounts disappear.
> > 
> > Pass MS_REC so that the guest can see submounts again.
> 
> Thanks!
> 
> > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> 
> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> 
> > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 4c35c95b25..9d7f863e66 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> >      int oldroot;
> >      int newroot;
> >  
> > -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> > +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> >          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
> >          exit(1);
> >      }
> 
> Do we want MS_SLAVE to pick up future mounts that might happenf rom the
> host?

I think that probably makes sense to have MS_SLAVE by default, as that
means the set of files exposed to the guest is consistent across a QEMU
restart. Without MS_SLAVE new mounts aren't visible until after QEMU
is restarted which is likely surprising/undesirable to admins.


Regards,
Daniel
Dr. David Alan Gilbert April 28, 2020, 9:59 a.m. UTC | #4
* Max Reitz (mreitz@redhat.com) wrote:
> On 27.04.20 19:59, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mreitz@redhat.com) wrote:
> >> Currently, setup_mounts() bind-mounts the shared directory without
> >> MS_REC.  This makes all submounts disappear.
> >>
> >> Pass MS_REC so that the guest can see submounts again.
> > 
> > Thanks!
> > 
> >> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> > 
> > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> 
> Well, I bisected it and landed at 3ca8a2b1.  So while the problematic
> line may have been introduced by 5baa3b8e, it wasn’t used until 3ca8a2b1.

OK, I'd rather stick with the Fixes: for the patch that was actually
wrong.

> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >> ---
> >>  tools/virtiofsd/passthrough_ll.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> >> index 4c35c95b25..9d7f863e66 100644
> >> --- a/tools/virtiofsd/passthrough_ll.c
> >> +++ b/tools/virtiofsd/passthrough_ll.c
> >> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> >>      int oldroot;
> >>      int newroot;
> >>  
> >> -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> >> +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> >>          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
> >>          exit(1);
> >>      }
> > 
> > Do we want MS_SLAVE to pick up future mounts that might happenf rom the
> > host?
> 
> Hm.  So first it looks to me from the man page like one shouldn’t give
> MS_SLAVE on the first mount() call but kind of only use it for remounts
> (in the list at the start, “Create a bind mount” is separate from
> “Change the propagation type of an existing mount”, and the man page
> later says “The only other flags that can be specified while changing
> the propagation type are MS_REC (described below) and MS_SILENT (which
> is ignored).”).
> 
> Second, even if I do change the propagation type to MS_SLAVE in a second
> call, mounts done after qemu has been started don’t show up in the guest
> (for me).
> 
> So while it sounds correct, I can’t see it having an effect, actually.

That's unfortunate; but I guess we can debug that separately

> > What's the interaction between this and the MS_REC|MS_SLAVE that we have
> > a few lines above for / ?
> 
> Good question.  It would seem to me that there isn’t any.  That previous
> mount call just sets MS_REC | MS_SLAVE for the whole mount namespace,
> and then we do a new mount here (by default from / to /) that needs its
> own flags.
> 
> (More interesting is perhaps why we have that other mount() call below,
> which again sets MS_REC | MS_SLAVE for the old (not-yet-bind-mounted) /.
>  I can’t imagine that to have any effect.)

Is that just trying to be careful before the umount2 so it doesn't try
to unmount something useful?

Dave

> Max
> 



--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Max Reitz April 28, 2020, 10:13 a.m. UTC | #5
On 28.04.20 11:59, Dr. David Alan Gilbert wrote:
> * Max Reitz (mreitz@redhat.com) wrote:
>> On 27.04.20 19:59, Dr. David Alan Gilbert wrote:
>>> * Max Reitz (mreitz@redhat.com) wrote:
>>>> Currently, setup_mounts() bind-mounts the shared directory without
>>>> MS_REC.  This makes all submounts disappear.
>>>>
>>>> Pass MS_REC so that the guest can see submounts again.
>>>
>>> Thanks!
>>>
>>>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
>>>
>>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
>>
>> Well, I bisected it and landed at 3ca8a2b1.  So while the problematic
>> line may have been introduced by 5baa3b8e, it wasn’t used until 3ca8a2b1.
> 
> OK, I'd rather stick with the Fixes: for the patch that was actually
> wrong.

Why not both? :)

>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>  tools/virtiofsd/passthrough_ll.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>>>> index 4c35c95b25..9d7f863e66 100644
>>>> --- a/tools/virtiofsd/passthrough_ll.c
>>>> +++ b/tools/virtiofsd/passthrough_ll.c
>>>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
>>>>      int oldroot;
>>>>      int newroot;
>>>>  
>>>> -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
>>>> +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
>>>>          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
>>>>          exit(1);
>>>>      }
>>>
>>> Do we want MS_SLAVE to pick up future mounts that might happenf rom the
>>> host?
>>
>> Hm.  So first it looks to me from the man page like one shouldn’t give
>> MS_SLAVE on the first mount() call but kind of only use it for remounts
>> (in the list at the start, “Create a bind mount” is separate from
>> “Change the propagation type of an existing mount”, and the man page
>> later says “The only other flags that can be specified while changing
>> the propagation type are MS_REC (described below) and MS_SILENT (which
>> is ignored).”).
>>
>> Second, even if I do change the propagation type to MS_SLAVE in a second
>> call, mounts done after qemu has been started don’t show up in the guest
>> (for me).
>>
>> So while it sounds correct, I can’t see it having an effect, actually.
> 
> That's unfortunate; but I guess we can debug that separately
> 
>>> What's the interaction between this and the MS_REC|MS_SLAVE that we have
>>> a few lines above for / ?
>>
>> Good question.  It would seem to me that there isn’t any.  That previous
>> mount call just sets MS_REC | MS_SLAVE for the whole mount namespace,
>> and then we do a new mount here (by default from / to /) that needs its
>> own flags.
>>
>> (More interesting is perhaps why we have that other mount() call below,
>> which again sets MS_REC | MS_SLAVE for the old (not-yet-bind-mounted) /.
>>  I can’t imagine that to have any effect.)
> 
> Is that just trying to be careful before the umount2 so it doesn't try
> to unmount something useful?

Perhaps, but still, it shouldn’t matter.  I rather suspect that
setup_namespaces() and setup_mounts() were developed (or taken from
elsewhere) independently, so they both have to work independently, and
thus they do overlapping stuff.

Max
Dr. David Alan Gilbert April 28, 2020, 10:19 a.m. UTC | #6
* Max Reitz (mreitz@redhat.com) wrote:
> On 28.04.20 11:59, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mreitz@redhat.com) wrote:
> >> On 27.04.20 19:59, Dr. David Alan Gilbert wrote:
> >>> * Max Reitz (mreitz@redhat.com) wrote:
> >>>> Currently, setup_mounts() bind-mounts the shared directory without
> >>>> MS_REC.  This makes all submounts disappear.
> >>>>
> >>>> Pass MS_REC so that the guest can see submounts again.
> >>>
> >>> Thanks!
> >>>
> >>>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> >>>
> >>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> >>
> >> Well, I bisected it and landed at 3ca8a2b1.  So while the problematic
> >> line may have been introduced by 5baa3b8e, it wasn’t used until 3ca8a2b1.
> > 
> > OK, I'd rather stick with the Fixes: for the patch that was actually
> > wrong.
> 
> Why not both? :)
> 
> >>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>> ---
> >>>>  tools/virtiofsd/passthrough_ll.c | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> >>>> index 4c35c95b25..9d7f863e66 100644
> >>>> --- a/tools/virtiofsd/passthrough_ll.c
> >>>> +++ b/tools/virtiofsd/passthrough_ll.c
> >>>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> >>>>      int oldroot;
> >>>>      int newroot;
> >>>>  
> >>>> -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> >>>> +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> >>>>          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
> >>>>          exit(1);
> >>>>      }
> >>>
> >>> Do we want MS_SLAVE to pick up future mounts that might happenf rom the
> >>> host?
> >>
> >> Hm.  So first it looks to me from the man page like one shouldn’t give
> >> MS_SLAVE on the first mount() call but kind of only use it for remounts
> >> (in the list at the start, “Create a bind mount” is separate from
> >> “Change the propagation type of an existing mount”, and the man page
> >> later says “The only other flags that can be specified while changing
> >> the propagation type are MS_REC (described below) and MS_SILENT (which
> >> is ignored).”).
> >>
> >> Second, even if I do change the propagation type to MS_SLAVE in a second
> >> call, mounts done after qemu has been started don’t show up in the guest
> >> (for me).
> >>
> >> So while it sounds correct, I can’t see it having an effect, actually.
> > 
> > That's unfortunate; but I guess we can debug that separately
> > 
> >>> What's the interaction between this and the MS_REC|MS_SLAVE that we have
> >>> a few lines above for / ?
> >>
> >> Good question.  It would seem to me that there isn’t any.  That previous
> >> mount call just sets MS_REC | MS_SLAVE for the whole mount namespace,
> >> and then we do a new mount here (by default from / to /) that needs its
> >> own flags.
> >>
> >> (More interesting is perhaps why we have that other mount() call below,
> >> which again sets MS_REC | MS_SLAVE for the old (not-yet-bind-mounted) /.
> >>  I can’t imagine that to have any effect.)
> > 
> > Is that just trying to be careful before the umount2 so it doesn't try
> > to unmount something useful?
> 
> Perhaps, but still, it shouldn’t matter.  I rather suspect that
> setup_namespaces() and setup_mounts() were developed (or taken from
> elsewhere) independently, so they both have to work independently, and
> thus they do overlapping stuff.

Yep, agreed.

Dave

> Max
> 



--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Stefan Hajnoczi April 28, 2020, 2:51 p.m. UTC | #7
On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote:
> * Max Reitz (mreitz@redhat.com) wrote:
> > Currently, setup_mounts() bind-mounts the shared directory without
> > MS_REC.  This makes all submounts disappear.
> > 
> > Pass MS_REC so that the guest can see submounts again.
> 
> Thanks!
> 
> > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> 
> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> 
> > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 4c35c95b25..9d7f863e66 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> >      int oldroot;
> >      int newroot;
> >  
> > -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> > +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> >          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
> >          exit(1);
> >      }
> 
> Do we want MS_SLAVE to pick up future mounts that might happenf rom the
> host?

There are two separate concepts:

1. Mount namespaces.  The virtiofsd process is sandboxed and lives in
   its own mount namespace.  Therefore it does not share the mounts that
   the rest of the host system sees.

2. Propagation type.  This is related to bind mounts so that mount
   operations that happen in one bind-mounted location can also appear
   in other bind-mounted locations.

Since virtiofsd is in a separate mount namespace, does the propagation
type even have any effect?

Stefan
Daniel P. Berrangé April 28, 2020, 2:58 p.m. UTC | #8
On Tue, Apr 28, 2020 at 03:51:43PM +0100, Stefan Hajnoczi wrote:
> On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mreitz@redhat.com) wrote:
> > > Currently, setup_mounts() bind-mounts the shared directory without
> > > MS_REC.  This makes all submounts disappear.
> > > 
> > > Pass MS_REC so that the guest can see submounts again.
> > 
> > Thanks!
> > 
> > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> > 
> > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> > 
> > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > ---
> > >  tools/virtiofsd/passthrough_ll.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index 4c35c95b25..9d7f863e66 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> > >      int oldroot;
> > >      int newroot;
> > >  
> > > -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> > > +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> > >          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
> > >          exit(1);
> > >      }
> > 
> > Do we want MS_SLAVE to pick up future mounts that might happenf rom the
> > host?
> 
> There are two separate concepts:
> 
> 1. Mount namespaces.  The virtiofsd process is sandboxed and lives in
>    its own mount namespace.  Therefore it does not share the mounts that
>    the rest of the host system sees.
> 
> 2. Propagation type.  This is related to bind mounts so that mount
>    operations that happen in one bind-mounted location can also appear
>    in other bind-mounted locations.
> 
> Since virtiofsd is in a separate mount namespace, does the propagation
> type even have any effect?

Yes, propagation should work across mount namespaces if you get the mount
flags right.  You can try it out using  unshare + mount commands
to debug different scenarios.

Regards,
Daniel
Miklos Szeredi April 28, 2020, 7:07 p.m. UTC | #9
On Tue, Apr 28, 2020 at 4:52 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mreitz@redhat.com) wrote:
> > > Currently, setup_mounts() bind-mounts the shared directory without
> > > MS_REC.  This makes all submounts disappear.
> > >
> > > Pass MS_REC so that the guest can see submounts again.
> >
> > Thanks!
> >
> > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> >
> > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> >
> > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > ---
> > >  tools/virtiofsd/passthrough_ll.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index 4c35c95b25..9d7f863e66 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> > >      int oldroot;
> > >      int newroot;
> > >
> > > -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> > > +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> > >          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
> > >          exit(1);
> > >      }
> >
> > Do we want MS_SLAVE to pick up future mounts that might happenf rom the
> > host?
>
> There are two separate concepts:
>
> 1. Mount namespaces.  The virtiofsd process is sandboxed and lives in
>    its own mount namespace.  Therefore it does not share the mounts that
>    the rest of the host system sees.
>
> 2. Propagation type.  This is related to bind mounts so that mount
>    operations that happen in one bind-mounted location can also appear
>    in other bind-mounted locations.
>
> Since virtiofsd is in a separate mount namespace, does the propagation
> type even have any effect?

It's a complicated thing.  Current setup results in propagation
happening to the cloned namespace, but not to the bind mounted root.

Why?  Because setting mounts "slave" after unshare, results in the
propagation being stopped at that point.  To make it propagate
further, change it back to "shared".  Note: the result changing to
"slave" and then to "shared" results in breaking the backward
propagation to the original namespace, but allowing propagation
further down the chain.

Thanks,
Miklos
Dr. David Alan Gilbert April 28, 2020, 7:15 p.m. UTC | #10
* Miklos Szeredi (mszeredi@redhat.com) wrote:
> On Tue, Apr 28, 2020 at 4:52 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote:
> > > * Max Reitz (mreitz@redhat.com) wrote:
> > > > Currently, setup_mounts() bind-mounts the shared directory without
> > > > MS_REC.  This makes all submounts disappear.
> > > >
> > > > Pass MS_REC so that the guest can see submounts again.
> > >
> > > Thanks!
> > >
> > > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> > >
> > > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> > >
> > > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > > ---
> > > >  tools/virtiofsd/passthrough_ll.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > > index 4c35c95b25..9d7f863e66 100644
> > > > --- a/tools/virtiofsd/passthrough_ll.c
> > > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> > > >      int oldroot;
> > > >      int newroot;
> > > >
> > > > -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> > > > +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> > > >          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
> > > >          exit(1);
> > > >      }
> > >
> > > Do we want MS_SLAVE to pick up future mounts that might happenf rom the
> > > host?
> >
> > There are two separate concepts:
> >
> > 1. Mount namespaces.  The virtiofsd process is sandboxed and lives in
> >    its own mount namespace.  Therefore it does not share the mounts that
> >    the rest of the host system sees.
> >
> > 2. Propagation type.  This is related to bind mounts so that mount
> >    operations that happen in one bind-mounted location can also appear
> >    in other bind-mounted locations.
> >
> > Since virtiofsd is in a separate mount namespace, does the propagation
> > type even have any effect?
> 
> It's a complicated thing.  Current setup results in propagation
> happening to the cloned namespace, but not to the bind mounted root.
> 
> Why?  Because setting mounts "slave" after unshare, results in the
> propagation being stopped at that point.  To make it propagate
> further, change it back to "shared".  Note: the result changing to
> "slave" and then to "shared" results in breaking the backward
> propagation to the original namespace, but allowing propagation
> further down the chain.

Do you mean on the "/" ?

So our current sequence is:

   (new namespace)
 1)    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
 2)   if (mount("proc", "/proc", "proc",
           ....
 3)   if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
 4)  (chdir newroot, pivot, chdir oldroot)
 5)   if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) {
 6)   if (umount2(".", MNT_DETACH) < 0) {

So are you saying we need a:
       if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) {

  and can this go straight after (1) ?

Dave

> Thanks,
> Miklos
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Miklos Szeredi April 29, 2020, 7:59 a.m. UTC | #11
On Tue, Apr 28, 2020 at 9:15 PM Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:

> So our current sequence is:
>
>    (new namespace)
>  1)    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
>  2)   if (mount("proc", "/proc", "proc",
>            ....
>  3)   if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
>  4)  (chdir newroot, pivot, chdir oldroot)
>  5)   if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) {
>  6)   if (umount2(".", MNT_DETACH) < 0) {
>
> So are you saying we need a:
>        if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) {
>
>   and can this go straight after (1) ?

Or right before (3).   Important thing is that that new mount will
only receive propagation if the type of the mount at source (before
(3) is performed) is shared.

Thanks,
Miklos
Max Reitz April 29, 2020, 8:31 a.m. UTC | #12
On 28.04.20 21:15, Dr. David Alan Gilbert wrote:
> * Miklos Szeredi (mszeredi@redhat.com) wrote:
>> On Tue, Apr 28, 2020 at 4:52 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>
>>> On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote:
>>>> * Max Reitz (mreitz@redhat.com) wrote:
>>>>> Currently, setup_mounts() bind-mounts the shared directory without
>>>>> MS_REC.  This makes all submounts disappear.
>>>>>
>>>>> Pass MS_REC so that the guest can see submounts again.
>>>>
>>>> Thanks!
>>>>
>>>>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
>>>>
>>>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
>>>>
>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>> ---
>>>>>  tools/virtiofsd/passthrough_ll.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>>>>> index 4c35c95b25..9d7f863e66 100644
>>>>> --- a/tools/virtiofsd/passthrough_ll.c
>>>>> +++ b/tools/virtiofsd/passthrough_ll.c
>>>>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
>>>>>      int oldroot;
>>>>>      int newroot;
>>>>>
>>>>> -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
>>>>> +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
>>>>>          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
>>>>>          exit(1);
>>>>>      }
>>>>
>>>> Do we want MS_SLAVE to pick up future mounts that might happenf rom the
>>>> host?
>>>
>>> There are two separate concepts:
>>>
>>> 1. Mount namespaces.  The virtiofsd process is sandboxed and lives in
>>>    its own mount namespace.  Therefore it does not share the mounts that
>>>    the rest of the host system sees.
>>>
>>> 2. Propagation type.  This is related to bind mounts so that mount
>>>    operations that happen in one bind-mounted location can also appear
>>>    in other bind-mounted locations.
>>>
>>> Since virtiofsd is in a separate mount namespace, does the propagation
>>> type even have any effect?
>>
>> It's a complicated thing.  Current setup results in propagation
>> happening to the cloned namespace, but not to the bind mounted root.
>>
>> Why?  Because setting mounts "slave" after unshare, results in the
>> propagation being stopped at that point.  To make it propagate
>> further, change it back to "shared".  Note: the result changing to
>> "slave" and then to "shared" results in breaking the backward
>> propagation to the original namespace, but allowing propagation
>> further down the chain.
> 
> Do you mean on the "/" ?
> 
> So our current sequence is:
> 
>    (new namespace)
>  1)    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
>  2)   if (mount("proc", "/proc", "proc",
>            ....
>  3)   if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
>  4)  (chdir newroot, pivot, chdir oldroot)
>  5)   if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) {
>  6)   if (umount2(".", MNT_DETACH) < 0) {
> 
> So are you saying we need a:
>        if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) {
> 
>   and can this go straight after (1) ?

Isn’t MS_SHARED and MS_SLAVE mutually exclusive, that is, both are just
different propagation types?  So shouldn’t putting this after (1) be
effectively the same as replacing (1)?

Max
Miklos Szeredi April 29, 2020, 8:52 a.m. UTC | #13
On Wed, Apr 29, 2020 at 10:31 AM Max Reitz <mreitz@redhat.com> wrote:
>
> On 28.04.20 21:15, Dr. David Alan Gilbert wrote:

> > So are you saying we need a:
> >        if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) {
> >
> >   and can this go straight after (1) ?
>
> Isn’t MS_SHARED and MS_SLAVE mutually exclusive, that is, both are just
> different propagation types?  So shouldn’t putting this after (1) be
> effectively the same as replacing (1)?

No, think of it more like a set of groups (mounts in a group have the
same "shared:$ID" tag in /proc/self/mountinfo) and these groups being
arranged into a tree, where child groups get propagation from the
parent group (mounts have a "master:$PARENT_ID" tag), but not the
other way round.

So if a mount is part of a shared group and this group is also the
child of another shared group, then it will have both a "shared:$ID"
and a "master:$PARENT_ID" tag in /proc/self/mountinfo.

For the gory details see Documentation/filesystems/sharedsubtree.txt

Thanks,
Miklos
Miklos Szeredi April 29, 2020, 9:26 a.m. UTC | #14
On Wed, Apr 29, 2020 at 9:59 AM Miklos Szeredi <mszeredi@redhat.com> wrote:
>
> On Tue, Apr 28, 2020 at 9:15 PM Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
>
> > So our current sequence is:
> >
> >    (new namespace)
> >  1)    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
> >  2)   if (mount("proc", "/proc", "proc",
> >            ....
> >  3)   if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> >  4)  (chdir newroot, pivot, chdir oldroot)
> >  5)   if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) {
> >  6)   if (umount2(".", MNT_DETACH) < 0) {
> >
> > So are you saying we need a:
> >        if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) {
> >
> >   and can this go straight after (1) ?
>
> Or right before (3).   Important thing is that that new mount will
> only receive propagation if the type of the mount at source (before
> (3) is performed) is shared.

And seems I was wrong.  Bind mounting clones the slave property, hence
no need to set MS_SHARED.  I.e. if the source was a slave, the bind
mount will be a slave to the same master as well; the two slaves won't
receive propagation between each other, but both will receive
propagation from the master.

The only reason to set MS_SHARED would be if the bind mount wanted to
receive propagation from within the cloned namespace.   Which is not
the case.

Didn't I tell ya it was complicated ;)

Thanks,
Miklos
Max Reitz April 29, 2020, 9:31 a.m. UTC | #15
On 29.04.20 10:52, Miklos Szeredi wrote:
> On Wed, Apr 29, 2020 at 10:31 AM Max Reitz <mreitz@redhat.com> wrote:
>>
>> On 28.04.20 21:15, Dr. David Alan Gilbert wrote:
> 
>>> So are you saying we need a:
>>>        if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) {
>>>
>>>   and can this go straight after (1) ?
>>
>> Isn’t MS_SHARED and MS_SLAVE mutually exclusive, that is, both are just
>> different propagation types?  So shouldn’t putting this after (1) be
>> effectively the same as replacing (1)?
> 
> No, think of it more like a set of groups (mounts in a group have the
> same "shared:$ID" tag in /proc/self/mountinfo) and these groups being
> arranged into a tree, where child groups get propagation from the
> parent group (mounts have a "master:$PARENT_ID" tag), but not the
> other way round.
> 
> So if a mount is part of a shared group and this group is also the
> child of another shared group, then it will have both a "shared:$ID"
> and a "master:$PARENT_ID" tag in /proc/self/mountinfo.

Ah, OK.

Thanks a lot for the explanation!

> For the gory details see Documentation/filesystems/sharedsubtree.txt

Not sure whether I want to *cough* O:)

Max
Vivek Goyal April 29, 2020, 12:34 p.m. UTC | #16
On Wed, Apr 29, 2020 at 11:26:49AM +0200, Miklos Szeredi wrote:
> On Wed, Apr 29, 2020 at 9:59 AM Miklos Szeredi <mszeredi@redhat.com> wrote:
> >
> > On Tue, Apr 28, 2020 at 9:15 PM Dr. David Alan Gilbert
> > <dgilbert@redhat.com> wrote:
> >
> > > So our current sequence is:
> > >
> > >    (new namespace)
> > >  1)    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
> > >  2)   if (mount("proc", "/proc", "proc",
> > >            ....
> > >  3)   if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> > >  4)  (chdir newroot, pivot, chdir oldroot)
> > >  5)   if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) {
> > >  6)   if (umount2(".", MNT_DETACH) < 0) {
> > >
> > > So are you saying we need a:
> > >        if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) {
> > >
> > >   and can this go straight after (1) ?
> >
> > Or right before (3).   Important thing is that that new mount will
> > only receive propagation if the type of the mount at source (before
> > (3) is performed) is shared.
> 
> And seems I was wrong.  Bind mounting clones the slave property, hence
> no need to set MS_SHARED.  I.e. if the source was a slave, the bind
> mount will be a slave to the same master as well; the two slaves won't
> receive propagation between each other, but both will receive
> propagation from the master.

Agreed. I was playing with it yesterday and noticed the same thing. Wanted
to test more before I said anything

Anyway, I did following.

$ mkdir /tmp/a /tmp/a/c /tmp/b
$ mount --bind /tmp/a /tmp/a

$ findmnt -o +PROPAGATION /tmp/a
TARGET SOURCE    FSTYPE OPTIONS                  PROPAGATION
/tmp/a tmpfs[/a] tmpfs  rw,nosuid,nodev,seclabel shared

$ cat /proc/self/mountifo | grep "/tmp/a"
613 49 0:45 /a /tmp/a rw,nosuid,nodev shared:30 - tmpfs tmpfs rw,seclabel

# Mountpoint /tmp/a is part of peer group "30"
# Create a new mount namespace with slave propagation

$ unshare -m --propagation slave bash

$ findmnt -o +PROPAGATION /tmp/a
TARGET SOURCE    FSTYPE OPTIONS                  PROPAGATION
/tmp/a tmpfs[/a] tmpfs  rw,nosuid,nodev,seclabel private,slave

$ cat /proc/self/mountinfo | grep /tmp/a
666 665 0:45 /a /tmp/a rw,nosuid,nodev master:30 - tmpfs tmpfs rw,seclabel

# /tmp/a in new mount namespace is slave of master "30"

# bind mount /tmp/a to /tmp/b and b should become slave of "30" too.
$ mount --bind /tmp/a /tmp/b

$findmnt -o +PROPAGATION /tmp/b
TARGET SOURCE    FSTYPE OPTIONS                  PROPAGATION
/tmp/b tmpfs[/a] tmpfs  rw,nosuid,nodev,seclabel private,slave

$ cat /proc/self/mountinfo | grep /tmp/b
671 665 0:45 /a /tmp/b rw,nosuid,nodev master:30 - tmpfs tmpfs rw,seclabel

# So /tmp/b is slave of "master:30" too. Say if host mounts something
# under /tmp/a (in init namespace), it should propagate to /tmp/a as
# well as /tmp/b in new mount namespace.

# Do following in init mount namespace
$ mount --bind /tmp/a/c /tmp/a/c

# Check in newly created mount namespace.
# findmnt
├─/tmp                                tmpfs       tmpfs   rw,nosuid,nodev,seclab
│ ├─/tmp/a                            tmpfs[/a]   tmpfs   rw,nosuid,nodev,seclab
│ │ └─/tmp/a/c                        tmpfs[/a/c] tmpfs   rw,nosuid,nodev,seclab
│ ├─/tmp/b                            tmpfs[/a]   tmpfs   rw,nosuid,nodev,seclab
│ │ └─/tmp/b/c                        tmpfs[/a/c] tmpfs   rw,nosuid,nodev,seclab
│ └─/tmp/a/c                          tmpfs[/a/c] tmpfs   rw,nosuid,nodev,seclab

Mount of c has propagated into /tmp/b/c as well.

And that's what we want.

Thanks
Vivek
Vivek Goyal April 29, 2020, 12:41 p.m. UTC | #17
On Wed, Apr 29, 2020 at 08:34:24AM -0400, Vivek Goyal wrote:
> On Wed, Apr 29, 2020 at 11:26:49AM +0200, Miklos Szeredi wrote:
> > On Wed, Apr 29, 2020 at 9:59 AM Miklos Szeredi <mszeredi@redhat.com> wrote:
> > >
> > > On Tue, Apr 28, 2020 at 9:15 PM Dr. David Alan Gilbert
> > > <dgilbert@redhat.com> wrote:
> > >
> > > > So our current sequence is:
> > > >
> > > >    (new namespace)
> > > >  1)    if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) {
> > > >  2)   if (mount("proc", "/proc", "proc",
> > > >            ....
> > > >  3)   if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> > > >  4)  (chdir newroot, pivot, chdir oldroot)
> > > >  5)   if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) {
> > > >  6)   if (umount2(".", MNT_DETACH) < 0) {
> > > >
> > > > So are you saying we need a:
> > > >        if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) {
> > > >
> > > >   and can this go straight after (1) ?
> > >
> > > Or right before (3).   Important thing is that that new mount will
> > > only receive propagation if the type of the mount at source (before
> > > (3) is performed) is shared.
> > 
> > And seems I was wrong.  Bind mounting clones the slave property, hence
> > no need to set MS_SHARED.  I.e. if the source was a slave, the bind
> > mount will be a slave to the same master as well; the two slaves won't
> > receive propagation between each other, but both will receive
> > propagation from the master.
> 
> Agreed. I was playing with it yesterday and noticed the same thing. Wanted
> to test more before I said anything
> 
> Anyway, I did following.
> 
> $ mkdir /tmp/a /tmp/a/c /tmp/b
> $ mount --bind /tmp/a /tmp/a
> 
> $ findmnt -o +PROPAGATION /tmp/a
> TARGET SOURCE    FSTYPE OPTIONS                  PROPAGATION
> /tmp/a tmpfs[/a] tmpfs  rw,nosuid,nodev,seclabel shared

A note, this is "shared" by default becase parent mount ("/") is "shared"
by default due to systemd settings. Some distributions make "/" private
instead and in that case, new mount namespace will not be "slave" and
will not receive propagation events.

In that case we will have to document to bind mount source directory
with "shared" propagation so that mounts done later on host can 
propagate into virtiofsd namespace.

Thanks
Vivek
Dr. David Alan Gilbert April 29, 2020, 2:57 p.m. UTC | #18
* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * Max Reitz (mreitz@redhat.com) wrote:
> > Currently, setup_mounts() bind-mounts the shared directory without
> > MS_REC.  This makes all submounts disappear.
> > 
> > Pass MS_REC so that the guest can see submounts again.
> 
> Thanks!
> 
> > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> 
> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> 
> > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > ---
> >  tools/virtiofsd/passthrough_ll.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 4c35c95b25..9d7f863e66 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> >      int oldroot;
> >      int newroot;
> >  
> > -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> > +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> >          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
> >          exit(1);
> >      }
> 
> Do we want MS_SLAVE to pick up future mounts that might happenf rom the
> host?
> What's the interaction between this and the MS_REC|MS_SLAVE that we have
> a few lines above for / ?

Just to confirm something from vgoyal, and what had confused me about
why we hadn't spotted this earlier.

Even without this patch, the SLAVE stuff worked so if you start the
daemon and *then* mount under the shared directory, the guest sees it
with or without this patch.

However, without this patch you don't see a mount that was already there
before you start the daemon.

Dave

> Dave
> 
> > -- 
> > 2.25.3
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Vivek Goyal April 29, 2020, 3:35 p.m. UTC | #19
On Wed, Apr 29, 2020 at 03:57:20PM +0100, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> > * Max Reitz (mreitz@redhat.com) wrote:
> > > Currently, setup_mounts() bind-mounts the shared directory without
> > > MS_REC.  This makes all submounts disappear.
> > > 
> > > Pass MS_REC so that the guest can see submounts again.
> > 
> > Thanks!
> > 
> > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> > 
> > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> > 
> > > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > > ---
> > >  tools/virtiofsd/passthrough_ll.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > > index 4c35c95b25..9d7f863e66 100644
> > > --- a/tools/virtiofsd/passthrough_ll.c
> > > +++ b/tools/virtiofsd/passthrough_ll.c
> > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> > >      int oldroot;
> > >      int newroot;
> > >  
> > > -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> > > +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> > >          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
> > >          exit(1);
> > >      }
> > 
> > Do we want MS_SLAVE to pick up future mounts that might happenf rom the
> > host?
> > What's the interaction between this and the MS_REC|MS_SLAVE that we have
> > a few lines above for / ?
> 
> Just to confirm something from vgoyal, and what had confused me about
> why we hadn't spotted this earlier.
> 
> Even without this patch, the SLAVE stuff worked so if you start the
> daemon and *then* mount under the shared directory, the guest sees it
> with or without this patch.
> 
> However, without this patch you don't see a mount that was already there
> before you start the daemon.

MS_REC will do recursive mount and make all existing submounts visible.
But it does not do anything about propagation of newly created mounts
and that's controlled by propagation properties of mount points.

Our propagation properties seem to be SLAVE already and things work
for new mounts. But existing submounts must have been masked during
bind mount due to absense of MS_REC.

Vivek
Max Reitz April 30, 2020, 8:06 a.m. UTC | #20
On 29.04.20 16:57, Dr. David Alan Gilbert wrote:
> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>> * Max Reitz (mreitz@redhat.com) wrote:
>>> Currently, setup_mounts() bind-mounts the shared directory without
>>> MS_REC.  This makes all submounts disappear.
>>>
>>> Pass MS_REC so that the guest can see submounts again.
>>
>> Thanks!
>>
>>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
>>
>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>  tools/virtiofsd/passthrough_ll.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>>> index 4c35c95b25..9d7f863e66 100644
>>> --- a/tools/virtiofsd/passthrough_ll.c
>>> +++ b/tools/virtiofsd/passthrough_ll.c
>>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
>>>      int oldroot;
>>>      int newroot;
>>>  
>>> -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
>>> +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
>>>          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
>>>          exit(1);
>>>      }
>>
>> Do we want MS_SLAVE to pick up future mounts that might happenf rom the
>> host?
>> What's the interaction between this and the MS_REC|MS_SLAVE that we have
>> a few lines above for / ?
> 
> Just to confirm something from vgoyal, and what had confused me about
> why we hadn't spotted this earlier.
> 
> Even without this patch, the SLAVE stuff worked so if you start the
> daemon and *then* mount under the shared directory, the guest sees it
> with or without this patch.

Hm, I don’t.  Do you really?

Max
Dr. David Alan Gilbert April 30, 2020, 8:58 a.m. UTC | #21
* Max Reitz (mreitz@redhat.com) wrote:
> On 29.04.20 16:57, Dr. David Alan Gilbert wrote:
> > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> >> * Max Reitz (mreitz@redhat.com) wrote:
> >>> Currently, setup_mounts() bind-mounts the shared directory without
> >>> MS_REC.  This makes all submounts disappear.
> >>>
> >>> Pass MS_REC so that the guest can see submounts again.
> >>
> >> Thanks!
> >>
> >>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> >>
> >> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
> >>
> >>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>> ---
> >>>  tools/virtiofsd/passthrough_ll.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> >>> index 4c35c95b25..9d7f863e66 100644
> >>> --- a/tools/virtiofsd/passthrough_ll.c
> >>> +++ b/tools/virtiofsd/passthrough_ll.c
> >>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
> >>>      int oldroot;
> >>>      int newroot;
> >>>  
> >>> -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> >>> +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
> >>>          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
> >>>          exit(1);
> >>>      }
> >>
> >> Do we want MS_SLAVE to pick up future mounts that might happenf rom the
> >> host?
> >> What's the interaction between this and the MS_REC|MS_SLAVE that we have
> >> a few lines above for / ?
> > 
> > Just to confirm something from vgoyal, and what had confused me about
> > why we hadn't spotted this earlier.
> > 
> > Even without this patch, the SLAVE stuff worked so if you start the
> > daemon and *then* mount under the shared directory, the guest sees it
> > with or without this patch.
> 
> Hm, I don’t.  Do you really?

Yes! With your patch reverted:

Start virtiofsd, mount in the guest:

host:
# ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs  -o log_level=warn -o no_writeback

guest:
# mount -t virtiofs myfs /sysroot

host:
# findmnt -o +PROPAGATION -N 6100
TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
/      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
# mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp
# findmnt -o +PROPAGATION -N 6100
TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
/      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
└─/tmp /dev/null                                                           tmpfs  rw,relatime,seclabel                                         private,slave
# touch /home/dgilbert/virtio-fs/fs/tmp/hello

guest:
# ls -l /sysroot/tmp
total 0
-rw-r--r-- 1 root root 0 Apr 30 08:50 hello

Maybe this is related to what Vivek said about default behaviours on
systemd's, what does:

# findmnt -o +PROPAGATION
TARGET    SOURCE         FSTYPE     OPTIONS                                                                                  PROPAGATION
/         /dev/mapper/fedora_dgilbert--t580-root
│                        xfs        rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota                        shared

say for your source= directory?

Dave

> Max
> 



--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Max Reitz April 30, 2020, 9:21 a.m. UTC | #22
On 30.04.20 10:58, Dr. David Alan Gilbert wrote:
> * Max Reitz (mreitz@redhat.com) wrote:
>> On 29.04.20 16:57, Dr. David Alan Gilbert wrote:
>>> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>>>> * Max Reitz (mreitz@redhat.com) wrote:
>>>>> Currently, setup_mounts() bind-mounts the shared directory without
>>>>> MS_REC.  This makes all submounts disappear.
>>>>>
>>>>> Pass MS_REC so that the guest can see submounts again.
>>>>
>>>> Thanks!
>>>>
>>>>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
>>>>
>>>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
>>>>
>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>> ---
>>>>>  tools/virtiofsd/passthrough_ll.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>>>>> index 4c35c95b25..9d7f863e66 100644
>>>>> --- a/tools/virtiofsd/passthrough_ll.c
>>>>> +++ b/tools/virtiofsd/passthrough_ll.c
>>>>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
>>>>>      int oldroot;
>>>>>      int newroot;
>>>>>  
>>>>> -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
>>>>> +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
>>>>>          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
>>>>>          exit(1);
>>>>>      }
>>>>
>>>> Do we want MS_SLAVE to pick up future mounts that might happenf rom the
>>>> host?
>>>> What's the interaction between this and the MS_REC|MS_SLAVE that we have
>>>> a few lines above for / ?
>>>
>>> Just to confirm something from vgoyal, and what had confused me about
>>> why we hadn't spotted this earlier.
>>>
>>> Even without this patch, the SLAVE stuff worked so if you start the
>>> daemon and *then* mount under the shared directory, the guest sees it
>>> with or without this patch.
>>
>> Hm, I don’t.  Do you really?
> 
> Yes! With your patch reverted:
> 
> Start virtiofsd, mount in the guest:
> 
> host:
> # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs  -o log_level=warn -o no_writeback
> 
> guest:
> # mount -t virtiofs myfs /sysroot

OK, for some reason I didn’t try to mount in the guest first, but did
the host mount after starting virtiofsd.

My mount test uses multiple mounts, and two of them I see mounted, but
three I still don’t see mounted.

Let me see whether I can come up with something reproducible that isn’t
a script.

[...]

> Maybe this is related to what Vivek said about default behaviours on
> systemd's, what does:
> 
> # findmnt -o +PROPAGATION
> TARGET    SOURCE         FSTYPE     OPTIONS                                                                                  PROPAGATION
> /         /dev/mapper/fedora_dgilbert--t580-root
> │                        xfs        rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota                        shared
> 
> say for your source= directory?

I have Fedora, too, so pretty much the same.

Max
Max Reitz April 30, 2020, 11:38 a.m. UTC | #23
On 30.04.20 11:21, Max Reitz wrote:
> On 30.04.20 10:58, Dr. David Alan Gilbert wrote:
>> * Max Reitz (mreitz@redhat.com) wrote:
>>> On 29.04.20 16:57, Dr. David Alan Gilbert wrote:
>>>> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>>>>> * Max Reitz (mreitz@redhat.com) wrote:
>>>>>> Currently, setup_mounts() bind-mounts the shared directory without
>>>>>> MS_REC.  This makes all submounts disappear.
>>>>>>
>>>>>> Pass MS_REC so that the guest can see submounts again.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
>>>>>
>>>>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ?
>>>>>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> ---
>>>>>>  tools/virtiofsd/passthrough_ll.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>>>>>> index 4c35c95b25..9d7f863e66 100644
>>>>>> --- a/tools/virtiofsd/passthrough_ll.c
>>>>>> +++ b/tools/virtiofsd/passthrough_ll.c
>>>>>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
>>>>>>      int oldroot;
>>>>>>      int newroot;
>>>>>>  
>>>>>> -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
>>>>>> +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
>>>>>>          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
>>>>>>          exit(1);
>>>>>>      }
>>>>>
>>>>> Do we want MS_SLAVE to pick up future mounts that might happenf rom the
>>>>> host?
>>>>> What's the interaction between this and the MS_REC|MS_SLAVE that we have
>>>>> a few lines above for / ?
>>>>
>>>> Just to confirm something from vgoyal, and what had confused me about
>>>> why we hadn't spotted this earlier.
>>>>
>>>> Even without this patch, the SLAVE stuff worked so if you start the
>>>> daemon and *then* mount under the shared directory, the guest sees it
>>>> with or without this patch.
>>>
>>> Hm, I don’t.  Do you really?
>>
>> Yes! With your patch reverted:
>>
>> Start virtiofsd, mount in the guest:
>>
>> host:
>> # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs  -o log_level=warn -o no_writeback
>>
>> guest:
>> # mount -t virtiofs myfs /sysroot
> 
> OK, for some reason I didn’t try to mount in the guest first, but did
> the host mount after starting virtiofsd.
> 
> My mount test uses multiple mounts, and two of them I see mounted, but
> three I still don’t see mounted.

And if I do a find /sysroot (or find /mnt in my case) in the guest
before mounting anything on the host, the mounts don’t appear in the
guest at all, but that’s kind of a virtiofsd bug (or feature?).  That
is, if it has opened some directory before it became a mountpoint, then
it stays not-a-mountpoint.

> Let me see whether I can come up with something reproducible that isn’t
> a script.

Unfortunately, I can’t.  I’ve attached the script.

It launches qemu, then in the guest you have to mount “host” somewhere
(before the script emits “mounting”, ideally).

I can see that in the guest, mnt0-0 is mounted, and mnt0-0/sub is, too,
but mnt0-1 and mnt1 aren’t.

With this patch here, everything is mounted.

Max
Vivek Goyal April 30, 2020, 1:56 p.m. UTC | #24
On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote:
[..]
> > > Even without this patch, the SLAVE stuff worked so if you start the
> > > daemon and *then* mount under the shared directory, the guest sees it
> > > with or without this patch.
> > 
> > Hm, I don’t.  Do you really?
> 
> Yes! With your patch reverted:
> 
> Start virtiofsd, mount in the guest:
> 
> host:
> # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs  -o log_level=warn -o no_writeback
> 
> guest:
> # mount -t virtiofs myfs /sysroot
> 
> host:
> # findmnt -o +PROPAGATION -N 6100
> TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp
> # findmnt -o +PROPAGATION -N 6100
> TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> └─/tmp /dev/null                                                           tmpfs  rw,relatime,seclabel                                         private,slave

Why is it showing a mount point at "/tmp". If mount point propagated, then
inside guest we should see a mount point at /sysroot/tmp?

So there are two things.

A. Propagation of mount from host to virtiofsd.
B. Visibility of that mount inside guest over fuse protocol (submount
  functionality).

I think A works for me without any patches. But don't think B is working
for me. I don't see the submount inside guest. 

> # touch /home/dgilbert/virtio-fs/fs/tmp/hello
> 
> guest:
> # ls -l /sysroot/tmp
> total 0
> -rw-r--r-- 1 root root 0 Apr 30 08:50 hello

Do a "findmnt /sysroot/tmp" inside guest and see what do you see.

You will be able to see "hello" as long as virtiofsd sees the new
mount point, I think. And guest does not have to see that mount point
for this simple test to work.

Vivek
Dr. David Alan Gilbert April 30, 2020, 2:20 p.m. UTC | #25
* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote:
> [..]
> > > > Even without this patch, the SLAVE stuff worked so if you start the
> > > > daemon and *then* mount under the shared directory, the guest sees it
> > > > with or without this patch.
> > > 
> > > Hm, I don’t.  Do you really?
> > 
> > Yes! With your patch reverted:
> > 
> > Start virtiofsd, mount in the guest:
> > 
> > host:
> > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs  -o log_level=warn -o no_writeback
> > 
> > guest:
> > # mount -t virtiofs myfs /sysroot
> > 
> > host:
> > # findmnt -o +PROPAGATION -N 6100
> > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp
> > # findmnt -o +PROPAGATION -N 6100
> > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > └─/tmp /dev/null                                                           tmpfs  rw,relatime,seclabel                                         private,slave
> 
> Why is it showing a mount point at "/tmp". If mount point propagated, then
> inside guest we should see a mount point at /sysroot/tmp?

That findmnt is on the host.

> So there are two things.
> 
> A. Propagation of mount from host to virtiofsd.
> B. Visibility of that mount inside guest over fuse protocol (submount
>   functionality).
> 
> I think A works for me without any patches. But don't think B is working
> for me. I don't see the submount inside guest. 
> 
> > # touch /home/dgilbert/virtio-fs/fs/tmp/hello
> > 
> > guest:
> > # ls -l /sysroot/tmp
> > total 0
> > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello
> 
> Do a "findmnt /sysroot/tmp" inside guest and see what do you see.
> 
> You will be able to see "hello" as long as virtiofsd sees the new
> mount point, I think. And guest does not have to see that mount point
> for this simple test to work.

Right, the guest just sees:

`-/sysroot                            myfs       virtiof rw,relatime

Dave

> Vivek
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Vivek Goyal April 30, 2020, 2:24 p.m. UTC | #26
On Thu, Apr 30, 2020 at 03:20:13PM +0100, Dr. David Alan Gilbert wrote:
> * Vivek Goyal (vgoyal@redhat.com) wrote:
> > On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote:
> > [..]
> > > > > Even without this patch, the SLAVE stuff worked so if you start the
> > > > > daemon and *then* mount under the shared directory, the guest sees it
> > > > > with or without this patch.
> > > > 
> > > > Hm, I don’t.  Do you really?
> > > 
> > > Yes! With your patch reverted:
> > > 
> > > Start virtiofsd, mount in the guest:
> > > 
> > > host:
> > > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs  -o log_level=warn -o no_writeback
> > > 
> > > guest:
> > > # mount -t virtiofs myfs /sysroot
> > > 
> > > host:
> > > # findmnt -o +PROPAGATION -N 6100
> > > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp
> > > # findmnt -o +PROPAGATION -N 6100
> > > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > > └─/tmp /dev/null                                                           tmpfs  rw,relatime,seclabel                                         private,slave
> > 
> > Why is it showing a mount point at "/tmp". If mount point propagated, then
> > inside guest we should see a mount point at /sysroot/tmp?
> 
> That findmnt is on the host.
> 
> > So there are two things.
> > 
> > A. Propagation of mount from host to virtiofsd.
> > B. Visibility of that mount inside guest over fuse protocol (submount
> >   functionality).
> > 
> > I think A works for me without any patches. But don't think B is working
> > for me. I don't see the submount inside guest. 
> > 
> > > # touch /home/dgilbert/virtio-fs/fs/tmp/hello
> > > 
> > > guest:
> > > # ls -l /sysroot/tmp
> > > total 0
> > > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello
> > 
> > Do a "findmnt /sysroot/tmp" inside guest and see what do you see.
> > 
> > You will be able to see "hello" as long as virtiofsd sees the new
> > mount point, I think. And guest does not have to see that mount point
> > for this simple test to work.
> 
> Right, the guest just sees:
> 
> `-/sysroot                            myfs       virtiof rw,relatime

Aha.. we are on same page now.

So we need to figure out what's needed to propagate mounts inside guest.
And that's what Max is working on, I think. 

Thanks
Vivek
Daniel P. Berrangé April 30, 2020, 2:34 p.m. UTC | #27
On Thu, Apr 30, 2020 at 03:20:13PM +0100, Dr. David Alan Gilbert wrote:
> * Vivek Goyal (vgoyal@redhat.com) wrote:
> > On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote:
> > [..]
> > > > > Even without this patch, the SLAVE stuff worked so if you start the
> > > > > daemon and *then* mount under the shared directory, the guest sees it
> > > > > with or without this patch.
> > > > 
> > > > Hm, I don’t.  Do you really?
> > > 
> > > Yes! With your patch reverted:
> > > 
> > > Start virtiofsd, mount in the guest:
> > > 
> > > host:
> > > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs  -o log_level=warn -o no_writeback
> > > 
> > > guest:
> > > # mount -t virtiofs myfs /sysroot
> > > 
> > > host:
> > > # findmnt -o +PROPAGATION -N 6100
> > > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp
> > > # findmnt -o +PROPAGATION -N 6100
> > > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > > └─/tmp /dev/null                                                           tmpfs  rw,relatime,seclabel                                         private,slave
> > 
> > Why is it showing a mount point at "/tmp". If mount point propagated, then
> > inside guest we should see a mount point at /sysroot/tmp?
> 
> That findmnt is on the host.
> 
> > So there are two things.
> > 
> > A. Propagation of mount from host to virtiofsd.
> > B. Visibility of that mount inside guest over fuse protocol (submount
> >   functionality).
> > 
> > I think A works for me without any patches. But don't think B is working
> > for me. I don't see the submount inside guest. 
> > 
> > > # touch /home/dgilbert/virtio-fs/fs/tmp/hello
> > > 
> > > guest:
> > > # ls -l /sysroot/tmp
> > > total 0
> > > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello
> > 
> > Do a "findmnt /sysroot/tmp" inside guest and see what do you see.
> > 
> > You will be able to see "hello" as long as virtiofsd sees the new
> > mount point, I think. And guest does not have to see that mount point
> > for this simple test to work.
> 
> Right, the guest just sees:
> 
> `-/sysroot                            myfs       virtiof rw,relatime

That is a good thing surely ? If I'm exporting "/sysroot" from the host,
I want the content in "/sysroot/some/sub/mount" to be visible to the
guest, but I don't want the guest to see "/sysroot/some/sub/mount"
as an actual mount point. That would be leaking information about the
host storage setup into the guest. The host admin should be free to
re-arrange submounts in the host OS, to bring more storage space online,
and have this be transparent to the guest OS.

Regards,
Daniel
Vivek Goyal April 30, 2020, 2:41 p.m. UTC | #28
On Thu, Apr 30, 2020 at 03:34:25PM +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 30, 2020 at 03:20:13PM +0100, Dr. David Alan Gilbert wrote:
> > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote:
> > > [..]
> > > > > > Even without this patch, the SLAVE stuff worked so if you start the
> > > > > > daemon and *then* mount under the shared directory, the guest sees it
> > > > > > with or without this patch.
> > > > > 
> > > > > Hm, I don’t.  Do you really?
> > > > 
> > > > Yes! With your patch reverted:
> > > > 
> > > > Start virtiofsd, mount in the guest:
> > > > 
> > > > host:
> > > > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs  -o log_level=warn -o no_writeback
> > > > 
> > > > guest:
> > > > # mount -t virtiofs myfs /sysroot
> > > > 
> > > > host:
> > > > # findmnt -o +PROPAGATION -N 6100
> > > > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > > > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > > > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp
> > > > # findmnt -o +PROPAGATION -N 6100
> > > > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > > > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > > > └─/tmp /dev/null                                                           tmpfs  rw,relatime,seclabel                                         private,slave
> > > 
> > > Why is it showing a mount point at "/tmp". If mount point propagated, then
> > > inside guest we should see a mount point at /sysroot/tmp?
> > 
> > That findmnt is on the host.
> > 
> > > So there are two things.
> > > 
> > > A. Propagation of mount from host to virtiofsd.
> > > B. Visibility of that mount inside guest over fuse protocol (submount
> > >   functionality).
> > > 
> > > I think A works for me without any patches. But don't think B is working
> > > for me. I don't see the submount inside guest. 
> > > 
> > > > # touch /home/dgilbert/virtio-fs/fs/tmp/hello
> > > > 
> > > > guest:
> > > > # ls -l /sysroot/tmp
> > > > total 0
> > > > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello
> > > 
> > > Do a "findmnt /sysroot/tmp" inside guest and see what do you see.
> > > 
> > > You will be able to see "hello" as long as virtiofsd sees the new
> > > mount point, I think. And guest does not have to see that mount point
> > > for this simple test to work.
> > 
> > Right, the guest just sees:
> > 
> > `-/sysroot                            myfs       virtiof rw,relatime
> 
> That is a good thing surely ? If I'm exporting "/sysroot" from the host,
> I want the content in "/sysroot/some/sub/mount" to be visible to the
> guest, but I don't want the guest to see "/sysroot/some/sub/mount"
> as an actual mount point. That would be leaking information about the
> host storage setup into the guest. The host admin should be free to
> re-arrange submounts in the host OS, to bring more storage space online,
> and have this be transparent to the guest OS.

If we don't see mount inside guest, we run into the possibility of inode
number collision. On host two files in shared dir can have same inode
number (if they are on two different filesystem with different device
numbers). But inside guest, we will show device number of virtiofs,
and it will look as if two files in this filesystem have same inode
number, breaking some workloads.

By propagating mounts (submounts), we can assign a unique device number
to these submounts and hence <dev,inode> number pair will become unique.

W.r.t information lea, may be we can mask some of the information in
submounts inside guest.

Thanks
Vivek
Daniel P. Berrangé April 30, 2020, 2:47 p.m. UTC | #29
On Thu, Apr 30, 2020 at 10:41:16AM -0400, Vivek Goyal wrote:
> On Thu, Apr 30, 2020 at 03:34:25PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Apr 30, 2020 at 03:20:13PM +0100, Dr. David Alan Gilbert wrote:
> > > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > > On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote:
> > > > [..]
> > > > > > > Even without this patch, the SLAVE stuff worked so if you start the
> > > > > > > daemon and *then* mount under the shared directory, the guest sees it
> > > > > > > with or without this patch.
> > > > > > 
> > > > > > Hm, I don’t.  Do you really?
> > > > > 
> > > > > Yes! With your patch reverted:
> > > > > 
> > > > > Start virtiofsd, mount in the guest:
> > > > > 
> > > > > host:
> > > > > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs  -o log_level=warn -o no_writeback
> > > > > 
> > > > > guest:
> > > > > # mount -t virtiofs myfs /sysroot
> > > > > 
> > > > > host:
> > > > > # findmnt -o +PROPAGATION -N 6100
> > > > > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > > > > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > > > > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp
> > > > > # findmnt -o +PROPAGATION -N 6100
> > > > > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > > > > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > > > > └─/tmp /dev/null                                                           tmpfs  rw,relatime,seclabel                                         private,slave
> > > > 
> > > > Why is it showing a mount point at "/tmp". If mount point propagated, then
> > > > inside guest we should see a mount point at /sysroot/tmp?
> > > 
> > > That findmnt is on the host.
> > > 
> > > > So there are two things.
> > > > 
> > > > A. Propagation of mount from host to virtiofsd.
> > > > B. Visibility of that mount inside guest over fuse protocol (submount
> > > >   functionality).
> > > > 
> > > > I think A works for me without any patches. But don't think B is working
> > > > for me. I don't see the submount inside guest. 
> > > > 
> > > > > # touch /home/dgilbert/virtio-fs/fs/tmp/hello
> > > > > 
> > > > > guest:
> > > > > # ls -l /sysroot/tmp
> > > > > total 0
> > > > > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello
> > > > 
> > > > Do a "findmnt /sysroot/tmp" inside guest and see what do you see.
> > > > 
> > > > You will be able to see "hello" as long as virtiofsd sees the new
> > > > mount point, I think. And guest does not have to see that mount point
> > > > for this simple test to work.
> > > 
> > > Right, the guest just sees:
> > > 
> > > `-/sysroot                            myfs       virtiof rw,relatime
> > 
> > That is a good thing surely ? If I'm exporting "/sysroot" from the host,
> > I want the content in "/sysroot/some/sub/mount" to be visible to the
> > guest, but I don't want the guest to see "/sysroot/some/sub/mount"
> > as an actual mount point. That would be leaking information about the
> > host storage setup into the guest. The host admin should be free to
> > re-arrange submounts in the host OS, to bring more storage space online,
> > and have this be transparent to the guest OS.
> 
> If we don't see mount inside guest, we run into the possibility of inode
> number collision. On host two files in shared dir can have same inode
> number (if they are on two different filesystem with different device
> numbers). But inside guest, we will show device number of virtiofs,
> and it will look as if two files in this filesystem have same inode
> number, breaking some workloads.
> 
> By propagating mounts (submounts), we can assign a unique device number
> to these submounts and hence <dev,inode> number pair will become unique.

Ah, yes, that's true.  In 9pfs there was recent changes precisely
because of this clash possibility:

commit 1a6ed33cc56997479bbe5b48337ff8da44585bd4
Author: Antonios Motakis <antonios.motakis@huawei.com>
Date:   Thu Oct 10 11:36:05 2019 +0200

    9p: Added virtfs option 'multidevs=remap|forbid|warn'
    
    'warn' (default): Only log an error message (once) on host if more than one
    device is shared by same export, except of that just ignore this config
    error though. This is the default behaviour for not breaking existing
    installations implying that they really know what they are doing.
    
    'forbid': Like 'warn', but except of just logging an error this
    also denies access of guest to additional devices.
    
    'remap': Allows to share more than one device per export by remapping
    inodes from host to guest appropriately. To support multiple devices on the
    9p share, and avoid qid path collisions we take the device id as input to
    generate a unique QID path. The lowest 48 bits of the path will be set
    equal to the file inode, and the top bits will be uniquely assigned based
    on the top 16 bits of the inode and the device id.


Perhaps we should try to support the same options in virtio-fs. At least
the "forbid" and "remap" options make sense I think. "warn" was only
really there for backcompat.  If we can expose it to the guest, then a
further "expose" option would be viable.

Regards,
Daniel
Vivek Goyal April 30, 2020, 3:41 p.m. UTC | #30
On Thu, Apr 30, 2020 at 03:47:04PM +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 30, 2020 at 10:41:16AM -0400, Vivek Goyal wrote:
> > On Thu, Apr 30, 2020 at 03:34:25PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Apr 30, 2020 at 03:20:13PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > > > On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote:
> > > > > [..]
> > > > > > > > Even without this patch, the SLAVE stuff worked so if you start the
> > > > > > > > daemon and *then* mount under the shared directory, the guest sees it
> > > > > > > > with or without this patch.
> > > > > > > 
> > > > > > > Hm, I don’t.  Do you really?
> > > > > > 
> > > > > > Yes! With your patch reverted:
> > > > > > 
> > > > > > Start virtiofsd, mount in the guest:
> > > > > > 
> > > > > > host:
> > > > > > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs  -o log_level=warn -o no_writeback
> > > > > > 
> > > > > > guest:
> > > > > > # mount -t virtiofs myfs /sysroot
> > > > > > 
> > > > > > host:
> > > > > > # findmnt -o +PROPAGATION -N 6100
> > > > > > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > > > > > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > > > > > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp
> > > > > > # findmnt -o +PROPAGATION -N 6100
> > > > > > TARGET SOURCE                                                              FSTYPE OPTIONS                                                      PROPAGATION
> > > > > > /      /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs    rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave
> > > > > > └─/tmp /dev/null                                                           tmpfs  rw,relatime,seclabel                                         private,slave
> > > > > 
> > > > > Why is it showing a mount point at "/tmp". If mount point propagated, then
> > > > > inside guest we should see a mount point at /sysroot/tmp?
> > > > 
> > > > That findmnt is on the host.
> > > > 
> > > > > So there are two things.
> > > > > 
> > > > > A. Propagation of mount from host to virtiofsd.
> > > > > B. Visibility of that mount inside guest over fuse protocol (submount
> > > > >   functionality).
> > > > > 
> > > > > I think A works for me without any patches. But don't think B is working
> > > > > for me. I don't see the submount inside guest. 
> > > > > 
> > > > > > # touch /home/dgilbert/virtio-fs/fs/tmp/hello
> > > > > > 
> > > > > > guest:
> > > > > > # ls -l /sysroot/tmp
> > > > > > total 0
> > > > > > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello
> > > > > 
> > > > > Do a "findmnt /sysroot/tmp" inside guest and see what do you see.
> > > > > 
> > > > > You will be able to see "hello" as long as virtiofsd sees the new
> > > > > mount point, I think. And guest does not have to see that mount point
> > > > > for this simple test to work.
> > > > 
> > > > Right, the guest just sees:
> > > > 
> > > > `-/sysroot                            myfs       virtiof rw,relatime
> > > 
> > > That is a good thing surely ? If I'm exporting "/sysroot" from the host,
> > > I want the content in "/sysroot/some/sub/mount" to be visible to the
> > > guest, but I don't want the guest to see "/sysroot/some/sub/mount"
> > > as an actual mount point. That would be leaking information about the
> > > host storage setup into the guest. The host admin should be free to
> > > re-arrange submounts in the host OS, to bring more storage space online,
> > > and have this be transparent to the guest OS.
> > 
> > If we don't see mount inside guest, we run into the possibility of inode
> > number collision. On host two files in shared dir can have same inode
> > number (if they are on two different filesystem with different device
> > numbers). But inside guest, we will show device number of virtiofs,
> > and it will look as if two files in this filesystem have same inode
> > number, breaking some workloads.
> > 
> > By propagating mounts (submounts), we can assign a unique device number
> > to these submounts and hence <dev,inode> number pair will become unique.
> 
> Ah, yes, that's true.  In 9pfs there was recent changes precisely
> because of this clash possibility:
> 
> commit 1a6ed33cc56997479bbe5b48337ff8da44585bd4
> Author: Antonios Motakis <antonios.motakis@huawei.com>
> Date:   Thu Oct 10 11:36:05 2019 +0200
> 
>     9p: Added virtfs option 'multidevs=remap|forbid|warn'
>     
>     'warn' (default): Only log an error message (once) on host if more than one
>     device is shared by same export, except of that just ignore this config
>     error though. This is the default behaviour for not breaking existing
>     installations implying that they really know what they are doing.
>     
>     'forbid': Like 'warn', but except of just logging an error this
>     also denies access of guest to additional devices.
>     
>     'remap': Allows to share more than one device per export by remapping
>     inodes from host to guest appropriately. To support multiple devices on the
>     9p share, and avoid qid path collisions we take the device id as input to
>     generate a unique QID path. The lowest 48 bits of the path will be set
>     equal to the file inode, and the top bits will be uniquely assigned based
>     on the top 16 bits of the inode and the device id.
> 
> 
> Perhaps we should try to support the same options in virtio-fs. At least
> the "forbid" and "remap" options make sense I think. "warn" was only
> really there for backcompat.  If we can expose it to the guest, then a
> further "expose" option would be viable.

Providing options in virtiofsd daemon so that use can tweak these settings
makes sense.

Given we are already allowing accessing multiple device inside guest, we
also probably need a mode to continue to allow that so that not to break
existing setups.

"forbid", "remap" make sense. And I like the option "expose" too.

"remap" will require little thought that how do we do that. We briefly
discussed encoding device number in higher inode bits and Miklos was
not too keen on it. What if filesystem is using all 64bits of inode. We
also need to do encoding in such a way so that inode number is persistent.

Thanks
Vivek
Dr. David Alan Gilbert May 1, 2020, 5:53 p.m. UTC | #31
* Max Reitz (mreitz@redhat.com) wrote:
> Currently, setup_mounts() bind-mounts the shared directory without
> MS_REC.  This makes all submounts disappear.
> 
> Pass MS_REC so that the guest can see submounts again.
> 
> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Queued.

> ---
>  tools/virtiofsd/passthrough_ll.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 4c35c95b25..9d7f863e66 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source)
>      int oldroot;
>      int newroot;
>  
> -    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
> +    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
>          fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
>          exit(1);
>      }
> -- 
> 2.25.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff mbox series

Patch

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 4c35c95b25..9d7f863e66 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2643,7 +2643,7 @@  static void setup_mounts(const char *source)
     int oldroot;
     int newroot;
 
-    if (mount(source, source, NULL, MS_BIND, NULL) < 0) {
+    if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) {
         fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source);
         exit(1);
     }