Message ID | 20200424133516.73077-1-mreitz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | virtiofsd: Show submounts | expand |
* Max Reitz (mreitz@redhat.com) wrote: > Currently, setup_mounts() bind-mounts the shared directory without > MS_REC. This makes all submounts disappear. > > Pass MS_REC so that the guest can see submounts again. Thanks! > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > tools/virtiofsd/passthrough_ll.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > index 4c35c95b25..9d7f863e66 100644 > --- a/tools/virtiofsd/passthrough_ll.c > +++ b/tools/virtiofsd/passthrough_ll.c > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > int oldroot; > int newroot; > > - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > exit(1); > } Do we want MS_SLAVE to pick up future mounts that might happenf rom the host? What's the interaction between this and the MS_REC|MS_SLAVE that we have a few lines above for / ? Dave > -- > 2.25.3 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 27.04.20 19:59, Dr. David Alan Gilbert wrote: > * Max Reitz (mreitz@redhat.com) wrote: >> Currently, setup_mounts() bind-mounts the shared directory without >> MS_REC. This makes all submounts disappear. >> >> Pass MS_REC so that the guest can see submounts again. > > Thanks! > >> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? Well, I bisected it and landed at 3ca8a2b1. So while the problematic line may have been introduced by 5baa3b8e, it wasn’t used until 3ca8a2b1. >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> tools/virtiofsd/passthrough_ll.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c >> index 4c35c95b25..9d7f863e66 100644 >> --- a/tools/virtiofsd/passthrough_ll.c >> +++ b/tools/virtiofsd/passthrough_ll.c >> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) >> int oldroot; >> int newroot; >> >> - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { >> + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { >> fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); >> exit(1); >> } > > Do we want MS_SLAVE to pick up future mounts that might happenf rom the > host? Hm. So first it looks to me from the man page like one shouldn’t give MS_SLAVE on the first mount() call but kind of only use it for remounts (in the list at the start, “Create a bind mount” is separate from “Change the propagation type of an existing mount”, and the man page later says “The only other flags that can be specified while changing the propagation type are MS_REC (described below) and MS_SILENT (which is ignored).”). Second, even if I do change the propagation type to MS_SLAVE in a second call, mounts done after qemu has been started don’t show up in the guest (for me). So while it sounds correct, I can’t see it having an effect, actually. > What's the interaction between this and the MS_REC|MS_SLAVE that we have > a few lines above for / ? Good question. It would seem to me that there isn’t any. That previous mount call just sets MS_REC | MS_SLAVE for the whole mount namespace, and then we do a new mount here (by default from / to /) that needs its own flags. (More interesting is perhaps why we have that other mount() call below, which again sets MS_REC | MS_SLAVE for the old (not-yet-bind-mounted) /. I can’t imagine that to have any effect.) Max
On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote: > * Max Reitz (mreitz@redhat.com) wrote: > > Currently, setup_mounts() bind-mounts the shared directory without > > MS_REC. This makes all submounts disappear. > > > > Pass MS_REC so that the guest can see submounts again. > > Thanks! > > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? > > > Signed-off-by: Max Reitz <mreitz@redhat.com> > > --- > > tools/virtiofsd/passthrough_ll.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > > index 4c35c95b25..9d7f863e66 100644 > > --- a/tools/virtiofsd/passthrough_ll.c > > +++ b/tools/virtiofsd/passthrough_ll.c > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > > int oldroot; > > int newroot; > > > > - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > > + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > > fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > > exit(1); > > } > > Do we want MS_SLAVE to pick up future mounts that might happenf rom the > host? I think that probably makes sense to have MS_SLAVE by default, as that means the set of files exposed to the guest is consistent across a QEMU restart. Without MS_SLAVE new mounts aren't visible until after QEMU is restarted which is likely surprising/undesirable to admins. Regards, Daniel
* Max Reitz (mreitz@redhat.com) wrote: > On 27.04.20 19:59, Dr. David Alan Gilbert wrote: > > * Max Reitz (mreitz@redhat.com) wrote: > >> Currently, setup_mounts() bind-mounts the shared directory without > >> MS_REC. This makes all submounts disappear. > >> > >> Pass MS_REC so that the guest can see submounts again. > > > > Thanks! > > > >> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > > > > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? > > Well, I bisected it and landed at 3ca8a2b1. So while the problematic > line may have been introduced by 5baa3b8e, it wasn’t used until 3ca8a2b1. OK, I'd rather stick with the Fixes: for the patch that was actually wrong. > >> Signed-off-by: Max Reitz <mreitz@redhat.com> > >> --- > >> tools/virtiofsd/passthrough_ll.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > >> index 4c35c95b25..9d7f863e66 100644 > >> --- a/tools/virtiofsd/passthrough_ll.c > >> +++ b/tools/virtiofsd/passthrough_ll.c > >> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > >> int oldroot; > >> int newroot; > >> > >> - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > >> + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > >> fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > >> exit(1); > >> } > > > > Do we want MS_SLAVE to pick up future mounts that might happenf rom the > > host? > > Hm. So first it looks to me from the man page like one shouldn’t give > MS_SLAVE on the first mount() call but kind of only use it for remounts > (in the list at the start, “Create a bind mount” is separate from > “Change the propagation type of an existing mount”, and the man page > later says “The only other flags that can be specified while changing > the propagation type are MS_REC (described below) and MS_SILENT (which > is ignored).”). > > Second, even if I do change the propagation type to MS_SLAVE in a second > call, mounts done after qemu has been started don’t show up in the guest > (for me). > > So while it sounds correct, I can’t see it having an effect, actually. That's unfortunate; but I guess we can debug that separately > > What's the interaction between this and the MS_REC|MS_SLAVE that we have > > a few lines above for / ? > > Good question. It would seem to me that there isn’t any. That previous > mount call just sets MS_REC | MS_SLAVE for the whole mount namespace, > and then we do a new mount here (by default from / to /) that needs its > own flags. > > (More interesting is perhaps why we have that other mount() call below, > which again sets MS_REC | MS_SLAVE for the old (not-yet-bind-mounted) /. > I can’t imagine that to have any effect.) Is that just trying to be careful before the umount2 so it doesn't try to unmount something useful? Dave > Max > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 28.04.20 11:59, Dr. David Alan Gilbert wrote: > * Max Reitz (mreitz@redhat.com) wrote: >> On 27.04.20 19:59, Dr. David Alan Gilbert wrote: >>> * Max Reitz (mreitz@redhat.com) wrote: >>>> Currently, setup_mounts() bind-mounts the shared directory without >>>> MS_REC. This makes all submounts disappear. >>>> >>>> Pass MS_REC so that the guest can see submounts again. >>> >>> Thanks! >>> >>>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 >>> >>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? >> >> Well, I bisected it and landed at 3ca8a2b1. So while the problematic >> line may have been introduced by 5baa3b8e, it wasn’t used until 3ca8a2b1. > > OK, I'd rather stick with the Fixes: for the patch that was actually > wrong. Why not both? :) >>>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>>> --- >>>> tools/virtiofsd/passthrough_ll.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c >>>> index 4c35c95b25..9d7f863e66 100644 >>>> --- a/tools/virtiofsd/passthrough_ll.c >>>> +++ b/tools/virtiofsd/passthrough_ll.c >>>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) >>>> int oldroot; >>>> int newroot; >>>> >>>> - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { >>>> + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { >>>> fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); >>>> exit(1); >>>> } >>> >>> Do we want MS_SLAVE to pick up future mounts that might happenf rom the >>> host? >> >> Hm. So first it looks to me from the man page like one shouldn’t give >> MS_SLAVE on the first mount() call but kind of only use it for remounts >> (in the list at the start, “Create a bind mount” is separate from >> “Change the propagation type of an existing mount”, and the man page >> later says “The only other flags that can be specified while changing >> the propagation type are MS_REC (described below) and MS_SILENT (which >> is ignored).”). >> >> Second, even if I do change the propagation type to MS_SLAVE in a second >> call, mounts done after qemu has been started don’t show up in the guest >> (for me). >> >> So while it sounds correct, I can’t see it having an effect, actually. > > That's unfortunate; but I guess we can debug that separately > >>> What's the interaction between this and the MS_REC|MS_SLAVE that we have >>> a few lines above for / ? >> >> Good question. It would seem to me that there isn’t any. That previous >> mount call just sets MS_REC | MS_SLAVE for the whole mount namespace, >> and then we do a new mount here (by default from / to /) that needs its >> own flags. >> >> (More interesting is perhaps why we have that other mount() call below, >> which again sets MS_REC | MS_SLAVE for the old (not-yet-bind-mounted) /. >> I can’t imagine that to have any effect.) > > Is that just trying to be careful before the umount2 so it doesn't try > to unmount something useful? Perhaps, but still, it shouldn’t matter. I rather suspect that setup_namespaces() and setup_mounts() were developed (or taken from elsewhere) independently, so they both have to work independently, and thus they do overlapping stuff. Max
* Max Reitz (mreitz@redhat.com) wrote: > On 28.04.20 11:59, Dr. David Alan Gilbert wrote: > > * Max Reitz (mreitz@redhat.com) wrote: > >> On 27.04.20 19:59, Dr. David Alan Gilbert wrote: > >>> * Max Reitz (mreitz@redhat.com) wrote: > >>>> Currently, setup_mounts() bind-mounts the shared directory without > >>>> MS_REC. This makes all submounts disappear. > >>>> > >>>> Pass MS_REC so that the guest can see submounts again. > >>> > >>> Thanks! > >>> > >>>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > >>> > >>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? > >> > >> Well, I bisected it and landed at 3ca8a2b1. So while the problematic > >> line may have been introduced by 5baa3b8e, it wasn’t used until 3ca8a2b1. > > > > OK, I'd rather stick with the Fixes: for the patch that was actually > > wrong. > > Why not both? :) > > >>>> Signed-off-by: Max Reitz <mreitz@redhat.com> > >>>> --- > >>>> tools/virtiofsd/passthrough_ll.c | 2 +- > >>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>> > >>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > >>>> index 4c35c95b25..9d7f863e66 100644 > >>>> --- a/tools/virtiofsd/passthrough_ll.c > >>>> +++ b/tools/virtiofsd/passthrough_ll.c > >>>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > >>>> int oldroot; > >>>> int newroot; > >>>> > >>>> - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > >>>> + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > >>>> fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > >>>> exit(1); > >>>> } > >>> > >>> Do we want MS_SLAVE to pick up future mounts that might happenf rom the > >>> host? > >> > >> Hm. So first it looks to me from the man page like one shouldn’t give > >> MS_SLAVE on the first mount() call but kind of only use it for remounts > >> (in the list at the start, “Create a bind mount” is separate from > >> “Change the propagation type of an existing mount”, and the man page > >> later says “The only other flags that can be specified while changing > >> the propagation type are MS_REC (described below) and MS_SILENT (which > >> is ignored).”). > >> > >> Second, even if I do change the propagation type to MS_SLAVE in a second > >> call, mounts done after qemu has been started don’t show up in the guest > >> (for me). > >> > >> So while it sounds correct, I can’t see it having an effect, actually. > > > > That's unfortunate; but I guess we can debug that separately > > > >>> What's the interaction between this and the MS_REC|MS_SLAVE that we have > >>> a few lines above for / ? > >> > >> Good question. It would seem to me that there isn’t any. That previous > >> mount call just sets MS_REC | MS_SLAVE for the whole mount namespace, > >> and then we do a new mount here (by default from / to /) that needs its > >> own flags. > >> > >> (More interesting is perhaps why we have that other mount() call below, > >> which again sets MS_REC | MS_SLAVE for the old (not-yet-bind-mounted) /. > >> I can’t imagine that to have any effect.) > > > > Is that just trying to be careful before the umount2 so it doesn't try > > to unmount something useful? > > Perhaps, but still, it shouldn’t matter. I rather suspect that > setup_namespaces() and setup_mounts() were developed (or taken from > elsewhere) independently, so they both have to work independently, and > thus they do overlapping stuff. Yep, agreed. Dave > Max > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote: > * Max Reitz (mreitz@redhat.com) wrote: > > Currently, setup_mounts() bind-mounts the shared directory without > > MS_REC. This makes all submounts disappear. > > > > Pass MS_REC so that the guest can see submounts again. > > Thanks! > > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? > > > Signed-off-by: Max Reitz <mreitz@redhat.com> > > --- > > tools/virtiofsd/passthrough_ll.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > > index 4c35c95b25..9d7f863e66 100644 > > --- a/tools/virtiofsd/passthrough_ll.c > > +++ b/tools/virtiofsd/passthrough_ll.c > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > > int oldroot; > > int newroot; > > > > - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > > + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > > fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > > exit(1); > > } > > Do we want MS_SLAVE to pick up future mounts that might happenf rom the > host? There are two separate concepts: 1. Mount namespaces. The virtiofsd process is sandboxed and lives in its own mount namespace. Therefore it does not share the mounts that the rest of the host system sees. 2. Propagation type. This is related to bind mounts so that mount operations that happen in one bind-mounted location can also appear in other bind-mounted locations. Since virtiofsd is in a separate mount namespace, does the propagation type even have any effect? Stefan
On Tue, Apr 28, 2020 at 03:51:43PM +0100, Stefan Hajnoczi wrote: > On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote: > > * Max Reitz (mreitz@redhat.com) wrote: > > > Currently, setup_mounts() bind-mounts the shared directory without > > > MS_REC. This makes all submounts disappear. > > > > > > Pass MS_REC so that the guest can see submounts again. > > > > Thanks! > > > > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > > > > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? > > > > > Signed-off-by: Max Reitz <mreitz@redhat.com> > > > --- > > > tools/virtiofsd/passthrough_ll.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > > > index 4c35c95b25..9d7f863e66 100644 > > > --- a/tools/virtiofsd/passthrough_ll.c > > > +++ b/tools/virtiofsd/passthrough_ll.c > > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > > > int oldroot; > > > int newroot; > > > > > > - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > > > + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > > > fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > > > exit(1); > > > } > > > > Do we want MS_SLAVE to pick up future mounts that might happenf rom the > > host? > > There are two separate concepts: > > 1. Mount namespaces. The virtiofsd process is sandboxed and lives in > its own mount namespace. Therefore it does not share the mounts that > the rest of the host system sees. > > 2. Propagation type. This is related to bind mounts so that mount > operations that happen in one bind-mounted location can also appear > in other bind-mounted locations. > > Since virtiofsd is in a separate mount namespace, does the propagation > type even have any effect? Yes, propagation should work across mount namespaces if you get the mount flags right. You can try it out using unshare + mount commands to debug different scenarios. Regards, Daniel
On Tue, Apr 28, 2020 at 4:52 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote: > > * Max Reitz (mreitz@redhat.com) wrote: > > > Currently, setup_mounts() bind-mounts the shared directory without > > > MS_REC. This makes all submounts disappear. > > > > > > Pass MS_REC so that the guest can see submounts again. > > > > Thanks! > > > > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > > > > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? > > > > > Signed-off-by: Max Reitz <mreitz@redhat.com> > > > --- > > > tools/virtiofsd/passthrough_ll.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > > > index 4c35c95b25..9d7f863e66 100644 > > > --- a/tools/virtiofsd/passthrough_ll.c > > > +++ b/tools/virtiofsd/passthrough_ll.c > > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > > > int oldroot; > > > int newroot; > > > > > > - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > > > + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > > > fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > > > exit(1); > > > } > > > > Do we want MS_SLAVE to pick up future mounts that might happenf rom the > > host? > > There are two separate concepts: > > 1. Mount namespaces. The virtiofsd process is sandboxed and lives in > its own mount namespace. Therefore it does not share the mounts that > the rest of the host system sees. > > 2. Propagation type. This is related to bind mounts so that mount > operations that happen in one bind-mounted location can also appear > in other bind-mounted locations. > > Since virtiofsd is in a separate mount namespace, does the propagation > type even have any effect? It's a complicated thing. Current setup results in propagation happening to the cloned namespace, but not to the bind mounted root. Why? Because setting mounts "slave" after unshare, results in the propagation being stopped at that point. To make it propagate further, change it back to "shared". Note: the result changing to "slave" and then to "shared" results in breaking the backward propagation to the original namespace, but allowing propagation further down the chain. Thanks, Miklos
* Miklos Szeredi (mszeredi@redhat.com) wrote: > On Tue, Apr 28, 2020 at 4:52 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > > > On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote: > > > * Max Reitz (mreitz@redhat.com) wrote: > > > > Currently, setup_mounts() bind-mounts the shared directory without > > > > MS_REC. This makes all submounts disappear. > > > > > > > > Pass MS_REC so that the guest can see submounts again. > > > > > > Thanks! > > > > > > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > > > > > > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? > > > > > > > Signed-off-by: Max Reitz <mreitz@redhat.com> > > > > --- > > > > tools/virtiofsd/passthrough_ll.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > > > > index 4c35c95b25..9d7f863e66 100644 > > > > --- a/tools/virtiofsd/passthrough_ll.c > > > > +++ b/tools/virtiofsd/passthrough_ll.c > > > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > > > > int oldroot; > > > > int newroot; > > > > > > > > - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > > > > + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > > > > fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > > > > exit(1); > > > > } > > > > > > Do we want MS_SLAVE to pick up future mounts that might happenf rom the > > > host? > > > > There are two separate concepts: > > > > 1. Mount namespaces. The virtiofsd process is sandboxed and lives in > > its own mount namespace. Therefore it does not share the mounts that > > the rest of the host system sees. > > > > 2. Propagation type. This is related to bind mounts so that mount > > operations that happen in one bind-mounted location can also appear > > in other bind-mounted locations. > > > > Since virtiofsd is in a separate mount namespace, does the propagation > > type even have any effect? > > It's a complicated thing. Current setup results in propagation > happening to the cloned namespace, but not to the bind mounted root. > > Why? Because setting mounts "slave" after unshare, results in the > propagation being stopped at that point. To make it propagate > further, change it back to "shared". Note: the result changing to > "slave" and then to "shared" results in breaking the backward > propagation to the original namespace, but allowing propagation > further down the chain. Do you mean on the "/" ? So our current sequence is: (new namespace) 1) if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) { 2) if (mount("proc", "/proc", "proc", .... 3) if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { 4) (chdir newroot, pivot, chdir oldroot) 5) if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) { 6) if (umount2(".", MNT_DETACH) < 0) { So are you saying we need a: if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) { and can this go straight after (1) ? Dave > Thanks, > Miklos > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Tue, Apr 28, 2020 at 9:15 PM Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > So our current sequence is: > > (new namespace) > 1) if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) { > 2) if (mount("proc", "/proc", "proc", > .... > 3) if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > 4) (chdir newroot, pivot, chdir oldroot) > 5) if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) { > 6) if (umount2(".", MNT_DETACH) < 0) { > > So are you saying we need a: > if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) { > > and can this go straight after (1) ? Or right before (3). Important thing is that that new mount will only receive propagation if the type of the mount at source (before (3) is performed) is shared. Thanks, Miklos
On 28.04.20 21:15, Dr. David Alan Gilbert wrote: > * Miklos Szeredi (mszeredi@redhat.com) wrote: >> On Tue, Apr 28, 2020 at 4:52 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: >>> >>> On Mon, Apr 27, 2020 at 06:59:02PM +0100, Dr. David Alan Gilbert wrote: >>>> * Max Reitz (mreitz@redhat.com) wrote: >>>>> Currently, setup_mounts() bind-mounts the shared directory without >>>>> MS_REC. This makes all submounts disappear. >>>>> >>>>> Pass MS_REC so that the guest can see submounts again. >>>> >>>> Thanks! >>>> >>>>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 >>>> >>>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? >>>> >>>>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>>>> --- >>>>> tools/virtiofsd/passthrough_ll.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c >>>>> index 4c35c95b25..9d7f863e66 100644 >>>>> --- a/tools/virtiofsd/passthrough_ll.c >>>>> +++ b/tools/virtiofsd/passthrough_ll.c >>>>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) >>>>> int oldroot; >>>>> int newroot; >>>>> >>>>> - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { >>>>> + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { >>>>> fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); >>>>> exit(1); >>>>> } >>>> >>>> Do we want MS_SLAVE to pick up future mounts that might happenf rom the >>>> host? >>> >>> There are two separate concepts: >>> >>> 1. Mount namespaces. The virtiofsd process is sandboxed and lives in >>> its own mount namespace. Therefore it does not share the mounts that >>> the rest of the host system sees. >>> >>> 2. Propagation type. This is related to bind mounts so that mount >>> operations that happen in one bind-mounted location can also appear >>> in other bind-mounted locations. >>> >>> Since virtiofsd is in a separate mount namespace, does the propagation >>> type even have any effect? >> >> It's a complicated thing. Current setup results in propagation >> happening to the cloned namespace, but not to the bind mounted root. >> >> Why? Because setting mounts "slave" after unshare, results in the >> propagation being stopped at that point. To make it propagate >> further, change it back to "shared". Note: the result changing to >> "slave" and then to "shared" results in breaking the backward >> propagation to the original namespace, but allowing propagation >> further down the chain. > > Do you mean on the "/" ? > > So our current sequence is: > > (new namespace) > 1) if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) { > 2) if (mount("proc", "/proc", "proc", > .... > 3) if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > 4) (chdir newroot, pivot, chdir oldroot) > 5) if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) { > 6) if (umount2(".", MNT_DETACH) < 0) { > > So are you saying we need a: > if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) { > > and can this go straight after (1) ? Isn’t MS_SHARED and MS_SLAVE mutually exclusive, that is, both are just different propagation types? So shouldn’t putting this after (1) be effectively the same as replacing (1)? Max
On Wed, Apr 29, 2020 at 10:31 AM Max Reitz <mreitz@redhat.com> wrote: > > On 28.04.20 21:15, Dr. David Alan Gilbert wrote: > > So are you saying we need a: > > if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) { > > > > and can this go straight after (1) ? > > Isn’t MS_SHARED and MS_SLAVE mutually exclusive, that is, both are just > different propagation types? So shouldn’t putting this after (1) be > effectively the same as replacing (1)? No, think of it more like a set of groups (mounts in a group have the same "shared:$ID" tag in /proc/self/mountinfo) and these groups being arranged into a tree, where child groups get propagation from the parent group (mounts have a "master:$PARENT_ID" tag), but not the other way round. So if a mount is part of a shared group and this group is also the child of another shared group, then it will have both a "shared:$ID" and a "master:$PARENT_ID" tag in /proc/self/mountinfo. For the gory details see Documentation/filesystems/sharedsubtree.txt Thanks, Miklos
On Wed, Apr 29, 2020 at 9:59 AM Miklos Szeredi <mszeredi@redhat.com> wrote: > > On Tue, Apr 28, 2020 at 9:15 PM Dr. David Alan Gilbert > <dgilbert@redhat.com> wrote: > > > So our current sequence is: > > > > (new namespace) > > 1) if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) { > > 2) if (mount("proc", "/proc", "proc", > > .... > > 3) if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > > 4) (chdir newroot, pivot, chdir oldroot) > > 5) if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) { > > 6) if (umount2(".", MNT_DETACH) < 0) { > > > > So are you saying we need a: > > if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) { > > > > and can this go straight after (1) ? > > Or right before (3). Important thing is that that new mount will > only receive propagation if the type of the mount at source (before > (3) is performed) is shared. And seems I was wrong. Bind mounting clones the slave property, hence no need to set MS_SHARED. I.e. if the source was a slave, the bind mount will be a slave to the same master as well; the two slaves won't receive propagation between each other, but both will receive propagation from the master. The only reason to set MS_SHARED would be if the bind mount wanted to receive propagation from within the cloned namespace. Which is not the case. Didn't I tell ya it was complicated ;) Thanks, Miklos
On 29.04.20 10:52, Miklos Szeredi wrote: > On Wed, Apr 29, 2020 at 10:31 AM Max Reitz <mreitz@redhat.com> wrote: >> >> On 28.04.20 21:15, Dr. David Alan Gilbert wrote: > >>> So are you saying we need a: >>> if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) { >>> >>> and can this go straight after (1) ? >> >> Isn’t MS_SHARED and MS_SLAVE mutually exclusive, that is, both are just >> different propagation types? So shouldn’t putting this after (1) be >> effectively the same as replacing (1)? > > No, think of it more like a set of groups (mounts in a group have the > same "shared:$ID" tag in /proc/self/mountinfo) and these groups being > arranged into a tree, where child groups get propagation from the > parent group (mounts have a "master:$PARENT_ID" tag), but not the > other way round. > > So if a mount is part of a shared group and this group is also the > child of another shared group, then it will have both a "shared:$ID" > and a "master:$PARENT_ID" tag in /proc/self/mountinfo. Ah, OK. Thanks a lot for the explanation! > For the gory details see Documentation/filesystems/sharedsubtree.txt Not sure whether I want to *cough* O:) Max
On Wed, Apr 29, 2020 at 11:26:49AM +0200, Miklos Szeredi wrote: > On Wed, Apr 29, 2020 at 9:59 AM Miklos Szeredi <mszeredi@redhat.com> wrote: > > > > On Tue, Apr 28, 2020 at 9:15 PM Dr. David Alan Gilbert > > <dgilbert@redhat.com> wrote: > > > > > So our current sequence is: > > > > > > (new namespace) > > > 1) if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) { > > > 2) if (mount("proc", "/proc", "proc", > > > .... > > > 3) if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > > > 4) (chdir newroot, pivot, chdir oldroot) > > > 5) if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) { > > > 6) if (umount2(".", MNT_DETACH) < 0) { > > > > > > So are you saying we need a: > > > if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) { > > > > > > and can this go straight after (1) ? > > > > Or right before (3). Important thing is that that new mount will > > only receive propagation if the type of the mount at source (before > > (3) is performed) is shared. > > And seems I was wrong. Bind mounting clones the slave property, hence > no need to set MS_SHARED. I.e. if the source was a slave, the bind > mount will be a slave to the same master as well; the two slaves won't > receive propagation between each other, but both will receive > propagation from the master. Agreed. I was playing with it yesterday and noticed the same thing. Wanted to test more before I said anything Anyway, I did following. $ mkdir /tmp/a /tmp/a/c /tmp/b $ mount --bind /tmp/a /tmp/a $ findmnt -o +PROPAGATION /tmp/a TARGET SOURCE FSTYPE OPTIONS PROPAGATION /tmp/a tmpfs[/a] tmpfs rw,nosuid,nodev,seclabel shared $ cat /proc/self/mountifo | grep "/tmp/a" 613 49 0:45 /a /tmp/a rw,nosuid,nodev shared:30 - tmpfs tmpfs rw,seclabel # Mountpoint /tmp/a is part of peer group "30" # Create a new mount namespace with slave propagation $ unshare -m --propagation slave bash $ findmnt -o +PROPAGATION /tmp/a TARGET SOURCE FSTYPE OPTIONS PROPAGATION /tmp/a tmpfs[/a] tmpfs rw,nosuid,nodev,seclabel private,slave $ cat /proc/self/mountinfo | grep /tmp/a 666 665 0:45 /a /tmp/a rw,nosuid,nodev master:30 - tmpfs tmpfs rw,seclabel # /tmp/a in new mount namespace is slave of master "30" # bind mount /tmp/a to /tmp/b and b should become slave of "30" too. $ mount --bind /tmp/a /tmp/b $findmnt -o +PROPAGATION /tmp/b TARGET SOURCE FSTYPE OPTIONS PROPAGATION /tmp/b tmpfs[/a] tmpfs rw,nosuid,nodev,seclabel private,slave $ cat /proc/self/mountinfo | grep /tmp/b 671 665 0:45 /a /tmp/b rw,nosuid,nodev master:30 - tmpfs tmpfs rw,seclabel # So /tmp/b is slave of "master:30" too. Say if host mounts something # under /tmp/a (in init namespace), it should propagate to /tmp/a as # well as /tmp/b in new mount namespace. # Do following in init mount namespace $ mount --bind /tmp/a/c /tmp/a/c # Check in newly created mount namespace. # findmnt ├─/tmp tmpfs tmpfs rw,nosuid,nodev,seclab │ ├─/tmp/a tmpfs[/a] tmpfs rw,nosuid,nodev,seclab │ │ └─/tmp/a/c tmpfs[/a/c] tmpfs rw,nosuid,nodev,seclab │ ├─/tmp/b tmpfs[/a] tmpfs rw,nosuid,nodev,seclab │ │ └─/tmp/b/c tmpfs[/a/c] tmpfs rw,nosuid,nodev,seclab │ └─/tmp/a/c tmpfs[/a/c] tmpfs rw,nosuid,nodev,seclab Mount of c has propagated into /tmp/b/c as well. And that's what we want. Thanks Vivek
On Wed, Apr 29, 2020 at 08:34:24AM -0400, Vivek Goyal wrote: > On Wed, Apr 29, 2020 at 11:26:49AM +0200, Miklos Szeredi wrote: > > On Wed, Apr 29, 2020 at 9:59 AM Miklos Szeredi <mszeredi@redhat.com> wrote: > > > > > > On Tue, Apr 28, 2020 at 9:15 PM Dr. David Alan Gilbert > > > <dgilbert@redhat.com> wrote: > > > > > > > So our current sequence is: > > > > > > > > (new namespace) > > > > 1) if (mount(NULL, "/", NULL, MS_REC | MS_SLAVE, NULL) < 0) { > > > > 2) if (mount("proc", "/proc", "proc", > > > > .... > > > > 3) if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > > > > 4) (chdir newroot, pivot, chdir oldroot) > > > > 5) if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) { > > > > 6) if (umount2(".", MNT_DETACH) < 0) { > > > > > > > > So are you saying we need a: > > > > if (mount(NULL, "/", NULL, MS_REC | MS_SHARED, NULL) < 0) { > > > > > > > > and can this go straight after (1) ? > > > > > > Or right before (3). Important thing is that that new mount will > > > only receive propagation if the type of the mount at source (before > > > (3) is performed) is shared. > > > > And seems I was wrong. Bind mounting clones the slave property, hence > > no need to set MS_SHARED. I.e. if the source was a slave, the bind > > mount will be a slave to the same master as well; the two slaves won't > > receive propagation between each other, but both will receive > > propagation from the master. > > Agreed. I was playing with it yesterday and noticed the same thing. Wanted > to test more before I said anything > > Anyway, I did following. > > $ mkdir /tmp/a /tmp/a/c /tmp/b > $ mount --bind /tmp/a /tmp/a > > $ findmnt -o +PROPAGATION /tmp/a > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > /tmp/a tmpfs[/a] tmpfs rw,nosuid,nodev,seclabel shared A note, this is "shared" by default becase parent mount ("/") is "shared" by default due to systemd settings. Some distributions make "/" private instead and in that case, new mount namespace will not be "slave" and will not receive propagation events. In that case we will have to document to bind mount source directory with "shared" propagation so that mounts done later on host can propagate into virtiofsd namespace. Thanks Vivek
* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote: > * Max Reitz (mreitz@redhat.com) wrote: > > Currently, setup_mounts() bind-mounts the shared directory without > > MS_REC. This makes all submounts disappear. > > > > Pass MS_REC so that the guest can see submounts again. > > Thanks! > > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? > > > Signed-off-by: Max Reitz <mreitz@redhat.com> > > --- > > tools/virtiofsd/passthrough_ll.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > > index 4c35c95b25..9d7f863e66 100644 > > --- a/tools/virtiofsd/passthrough_ll.c > > +++ b/tools/virtiofsd/passthrough_ll.c > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > > int oldroot; > > int newroot; > > > > - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > > + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > > fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > > exit(1); > > } > > Do we want MS_SLAVE to pick up future mounts that might happenf rom the > host? > What's the interaction between this and the MS_REC|MS_SLAVE that we have > a few lines above for / ? Just to confirm something from vgoyal, and what had confused me about why we hadn't spotted this earlier. Even without this patch, the SLAVE stuff worked so if you start the daemon and *then* mount under the shared directory, the guest sees it with or without this patch. However, without this patch you don't see a mount that was already there before you start the daemon. Dave > Dave > > > -- > > 2.25.3 > > > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Wed, Apr 29, 2020 at 03:57:20PM +0100, Dr. David Alan Gilbert wrote: > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote: > > * Max Reitz (mreitz@redhat.com) wrote: > > > Currently, setup_mounts() bind-mounts the shared directory without > > > MS_REC. This makes all submounts disappear. > > > > > > Pass MS_REC so that the guest can see submounts again. > > > > Thanks! > > > > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > > > > Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? > > > > > Signed-off-by: Max Reitz <mreitz@redhat.com> > > > --- > > > tools/virtiofsd/passthrough_ll.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > > > index 4c35c95b25..9d7f863e66 100644 > > > --- a/tools/virtiofsd/passthrough_ll.c > > > +++ b/tools/virtiofsd/passthrough_ll.c > > > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > > > int oldroot; > > > int newroot; > > > > > > - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > > > + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > > > fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > > > exit(1); > > > } > > > > Do we want MS_SLAVE to pick up future mounts that might happenf rom the > > host? > > What's the interaction between this and the MS_REC|MS_SLAVE that we have > > a few lines above for / ? > > Just to confirm something from vgoyal, and what had confused me about > why we hadn't spotted this earlier. > > Even without this patch, the SLAVE stuff worked so if you start the > daemon and *then* mount under the shared directory, the guest sees it > with or without this patch. > > However, without this patch you don't see a mount that was already there > before you start the daemon. MS_REC will do recursive mount and make all existing submounts visible. But it does not do anything about propagation of newly created mounts and that's controlled by propagation properties of mount points. Our propagation properties seem to be SLAVE already and things work for new mounts. But existing submounts must have been masked during bind mount due to absense of MS_REC. Vivek
On 29.04.20 16:57, Dr. David Alan Gilbert wrote: > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote: >> * Max Reitz (mreitz@redhat.com) wrote: >>> Currently, setup_mounts() bind-mounts the shared directory without >>> MS_REC. This makes all submounts disappear. >>> >>> Pass MS_REC so that the guest can see submounts again. >> >> Thanks! >> >>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 >> >> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? >> >>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>> --- >>> tools/virtiofsd/passthrough_ll.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c >>> index 4c35c95b25..9d7f863e66 100644 >>> --- a/tools/virtiofsd/passthrough_ll.c >>> +++ b/tools/virtiofsd/passthrough_ll.c >>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) >>> int oldroot; >>> int newroot; >>> >>> - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { >>> + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { >>> fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); >>> exit(1); >>> } >> >> Do we want MS_SLAVE to pick up future mounts that might happenf rom the >> host? >> What's the interaction between this and the MS_REC|MS_SLAVE that we have >> a few lines above for / ? > > Just to confirm something from vgoyal, and what had confused me about > why we hadn't spotted this earlier. > > Even without this patch, the SLAVE stuff worked so if you start the > daemon and *then* mount under the shared directory, the guest sees it > with or without this patch. Hm, I don’t. Do you really? Max
* Max Reitz (mreitz@redhat.com) wrote: > On 29.04.20 16:57, Dr. David Alan Gilbert wrote: > > * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote: > >> * Max Reitz (mreitz@redhat.com) wrote: > >>> Currently, setup_mounts() bind-mounts the shared directory without > >>> MS_REC. This makes all submounts disappear. > >>> > >>> Pass MS_REC so that the guest can see submounts again. > >> > >> Thanks! > >> > >>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > >> > >> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? > >> > >>> Signed-off-by: Max Reitz <mreitz@redhat.com> > >>> --- > >>> tools/virtiofsd/passthrough_ll.c | 2 +- > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > >>> index 4c35c95b25..9d7f863e66 100644 > >>> --- a/tools/virtiofsd/passthrough_ll.c > >>> +++ b/tools/virtiofsd/passthrough_ll.c > >>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > >>> int oldroot; > >>> int newroot; > >>> > >>> - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > >>> + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > >>> fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > >>> exit(1); > >>> } > >> > >> Do we want MS_SLAVE to pick up future mounts that might happenf rom the > >> host? > >> What's the interaction between this and the MS_REC|MS_SLAVE that we have > >> a few lines above for / ? > > > > Just to confirm something from vgoyal, and what had confused me about > > why we hadn't spotted this earlier. > > > > Even without this patch, the SLAVE stuff worked so if you start the > > daemon and *then* mount under the shared directory, the guest sees it > > with or without this patch. > > Hm, I don’t. Do you really? Yes! With your patch reverted: Start virtiofsd, mount in the guest: host: # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs -o log_level=warn -o no_writeback guest: # mount -t virtiofs myfs /sysroot host: # findmnt -o +PROPAGATION -N 6100 TARGET SOURCE FSTYPE OPTIONS PROPAGATION / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp # findmnt -o +PROPAGATION -N 6100 TARGET SOURCE FSTYPE OPTIONS PROPAGATION / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave └─/tmp /dev/null tmpfs rw,relatime,seclabel private,slave # touch /home/dgilbert/virtio-fs/fs/tmp/hello guest: # ls -l /sysroot/tmp total 0 -rw-r--r-- 1 root root 0 Apr 30 08:50 hello Maybe this is related to what Vivek said about default behaviours on systemd's, what does: # findmnt -o +PROPAGATION TARGET SOURCE FSTYPE OPTIONS PROPAGATION / /dev/mapper/fedora_dgilbert--t580-root │ xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota shared say for your source= directory? Dave > Max > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 30.04.20 10:58, Dr. David Alan Gilbert wrote: > * Max Reitz (mreitz@redhat.com) wrote: >> On 29.04.20 16:57, Dr. David Alan Gilbert wrote: >>> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote: >>>> * Max Reitz (mreitz@redhat.com) wrote: >>>>> Currently, setup_mounts() bind-mounts the shared directory without >>>>> MS_REC. This makes all submounts disappear. >>>>> >>>>> Pass MS_REC so that the guest can see submounts again. >>>> >>>> Thanks! >>>> >>>>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 >>>> >>>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? >>>> >>>>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>>>> --- >>>>> tools/virtiofsd/passthrough_ll.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c >>>>> index 4c35c95b25..9d7f863e66 100644 >>>>> --- a/tools/virtiofsd/passthrough_ll.c >>>>> +++ b/tools/virtiofsd/passthrough_ll.c >>>>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) >>>>> int oldroot; >>>>> int newroot; >>>>> >>>>> - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { >>>>> + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { >>>>> fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); >>>>> exit(1); >>>>> } >>>> >>>> Do we want MS_SLAVE to pick up future mounts that might happenf rom the >>>> host? >>>> What's the interaction between this and the MS_REC|MS_SLAVE that we have >>>> a few lines above for / ? >>> >>> Just to confirm something from vgoyal, and what had confused me about >>> why we hadn't spotted this earlier. >>> >>> Even without this patch, the SLAVE stuff worked so if you start the >>> daemon and *then* mount under the shared directory, the guest sees it >>> with or without this patch. >> >> Hm, I don’t. Do you really? > > Yes! With your patch reverted: > > Start virtiofsd, mount in the guest: > > host: > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs -o log_level=warn -o no_writeback > > guest: > # mount -t virtiofs myfs /sysroot OK, for some reason I didn’t try to mount in the guest first, but did the host mount after starting virtiofsd. My mount test uses multiple mounts, and two of them I see mounted, but three I still don’t see mounted. Let me see whether I can come up with something reproducible that isn’t a script. [...] > Maybe this is related to what Vivek said about default behaviours on > systemd's, what does: > > # findmnt -o +PROPAGATION > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > / /dev/mapper/fedora_dgilbert--t580-root > │ xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota shared > > say for your source= directory? I have Fedora, too, so pretty much the same. Max
On 30.04.20 11:21, Max Reitz wrote: > On 30.04.20 10:58, Dr. David Alan Gilbert wrote: >> * Max Reitz (mreitz@redhat.com) wrote: >>> On 29.04.20 16:57, Dr. David Alan Gilbert wrote: >>>> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote: >>>>> * Max Reitz (mreitz@redhat.com) wrote: >>>>>> Currently, setup_mounts() bind-mounts the shared directory without >>>>>> MS_REC. This makes all submounts disappear. >>>>>> >>>>>> Pass MS_REC so that the guest can see submounts again. >>>>> >>>>> Thanks! >>>>> >>>>>> Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 >>>>> >>>>> Should this actually be 5baa3b8e95064c2434bd9e2f312edd5e9ae275dc ? >>>>> >>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>>>>> --- >>>>>> tools/virtiofsd/passthrough_ll.c | 2 +- >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c >>>>>> index 4c35c95b25..9d7f863e66 100644 >>>>>> --- a/tools/virtiofsd/passthrough_ll.c >>>>>> +++ b/tools/virtiofsd/passthrough_ll.c >>>>>> @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) >>>>>> int oldroot; >>>>>> int newroot; >>>>>> >>>>>> - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { >>>>>> + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { >>>>>> fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); >>>>>> exit(1); >>>>>> } >>>>> >>>>> Do we want MS_SLAVE to pick up future mounts that might happenf rom the >>>>> host? >>>>> What's the interaction between this and the MS_REC|MS_SLAVE that we have >>>>> a few lines above for / ? >>>> >>>> Just to confirm something from vgoyal, and what had confused me about >>>> why we hadn't spotted this earlier. >>>> >>>> Even without this patch, the SLAVE stuff worked so if you start the >>>> daemon and *then* mount under the shared directory, the guest sees it >>>> with or without this patch. >>> >>> Hm, I don’t. Do you really? >> >> Yes! With your patch reverted: >> >> Start virtiofsd, mount in the guest: >> >> host: >> # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs -o log_level=warn -o no_writeback >> >> guest: >> # mount -t virtiofs myfs /sysroot > > OK, for some reason I didn’t try to mount in the guest first, but did > the host mount after starting virtiofsd. > > My mount test uses multiple mounts, and two of them I see mounted, but > three I still don’t see mounted. And if I do a find /sysroot (or find /mnt in my case) in the guest before mounting anything on the host, the mounts don’t appear in the guest at all, but that’s kind of a virtiofsd bug (or feature?). That is, if it has opened some directory before it became a mountpoint, then it stays not-a-mountpoint. > Let me see whether I can come up with something reproducible that isn’t > a script. Unfortunately, I can’t. I’ve attached the script. It launches qemu, then in the guest you have to mount “host” somewhere (before the script emits “mounting”, ideally). I can see that in the guest, mnt0-0 is mounted, and mnt0-0/sub is, too, but mnt0-1 and mnt1 aren’t. With this patch here, everything is mounted. Max
On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote: [..] > > > Even without this patch, the SLAVE stuff worked so if you start the > > > daemon and *then* mount under the shared directory, the guest sees it > > > with or without this patch. > > > > Hm, I don’t. Do you really? > > Yes! With your patch reverted: > > Start virtiofsd, mount in the guest: > > host: > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs -o log_level=warn -o no_writeback > > guest: > # mount -t virtiofs myfs /sysroot > > host: > # findmnt -o +PROPAGATION -N 6100 > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp > # findmnt -o +PROPAGATION -N 6100 > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > └─/tmp /dev/null tmpfs rw,relatime,seclabel private,slave Why is it showing a mount point at "/tmp". If mount point propagated, then inside guest we should see a mount point at /sysroot/tmp? So there are two things. A. Propagation of mount from host to virtiofsd. B. Visibility of that mount inside guest over fuse protocol (submount functionality). I think A works for me without any patches. But don't think B is working for me. I don't see the submount inside guest. > # touch /home/dgilbert/virtio-fs/fs/tmp/hello > > guest: > # ls -l /sysroot/tmp > total 0 > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello Do a "findmnt /sysroot/tmp" inside guest and see what do you see. You will be able to see "hello" as long as virtiofsd sees the new mount point, I think. And guest does not have to see that mount point for this simple test to work. Vivek
* Vivek Goyal (vgoyal@redhat.com) wrote: > On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote: > [..] > > > > Even without this patch, the SLAVE stuff worked so if you start the > > > > daemon and *then* mount under the shared directory, the guest sees it > > > > with or without this patch. > > > > > > Hm, I don’t. Do you really? > > > > Yes! With your patch reverted: > > > > Start virtiofsd, mount in the guest: > > > > host: > > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs -o log_level=warn -o no_writeback > > > > guest: > > # mount -t virtiofs myfs /sysroot > > > > host: > > # findmnt -o +PROPAGATION -N 6100 > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp > > # findmnt -o +PROPAGATION -N 6100 > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > └─/tmp /dev/null tmpfs rw,relatime,seclabel private,slave > > Why is it showing a mount point at "/tmp". If mount point propagated, then > inside guest we should see a mount point at /sysroot/tmp? That findmnt is on the host. > So there are two things. > > A. Propagation of mount from host to virtiofsd. > B. Visibility of that mount inside guest over fuse protocol (submount > functionality). > > I think A works for me without any patches. But don't think B is working > for me. I don't see the submount inside guest. > > > # touch /home/dgilbert/virtio-fs/fs/tmp/hello > > > > guest: > > # ls -l /sysroot/tmp > > total 0 > > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello > > Do a "findmnt /sysroot/tmp" inside guest and see what do you see. > > You will be able to see "hello" as long as virtiofsd sees the new > mount point, I think. And guest does not have to see that mount point > for this simple test to work. Right, the guest just sees: `-/sysroot myfs virtiof rw,relatime Dave > Vivek -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Thu, Apr 30, 2020 at 03:20:13PM +0100, Dr. David Alan Gilbert wrote: > * Vivek Goyal (vgoyal@redhat.com) wrote: > > On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote: > > [..] > > > > > Even without this patch, the SLAVE stuff worked so if you start the > > > > > daemon and *then* mount under the shared directory, the guest sees it > > > > > with or without this patch. > > > > > > > > Hm, I don’t. Do you really? > > > > > > Yes! With your patch reverted: > > > > > > Start virtiofsd, mount in the guest: > > > > > > host: > > > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs -o log_level=warn -o no_writeback > > > > > > guest: > > > # mount -t virtiofs myfs /sysroot > > > > > > host: > > > # findmnt -o +PROPAGATION -N 6100 > > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp > > > # findmnt -o +PROPAGATION -N 6100 > > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > > └─/tmp /dev/null tmpfs rw,relatime,seclabel private,slave > > > > Why is it showing a mount point at "/tmp". If mount point propagated, then > > inside guest we should see a mount point at /sysroot/tmp? > > That findmnt is on the host. > > > So there are two things. > > > > A. Propagation of mount from host to virtiofsd. > > B. Visibility of that mount inside guest over fuse protocol (submount > > functionality). > > > > I think A works for me without any patches. But don't think B is working > > for me. I don't see the submount inside guest. > > > > > # touch /home/dgilbert/virtio-fs/fs/tmp/hello > > > > > > guest: > > > # ls -l /sysroot/tmp > > > total 0 > > > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello > > > > Do a "findmnt /sysroot/tmp" inside guest and see what do you see. > > > > You will be able to see "hello" as long as virtiofsd sees the new > > mount point, I think. And guest does not have to see that mount point > > for this simple test to work. > > Right, the guest just sees: > > `-/sysroot myfs virtiof rw,relatime Aha.. we are on same page now. So we need to figure out what's needed to propagate mounts inside guest. And that's what Max is working on, I think. Thanks Vivek
On Thu, Apr 30, 2020 at 03:20:13PM +0100, Dr. David Alan Gilbert wrote: > * Vivek Goyal (vgoyal@redhat.com) wrote: > > On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote: > > [..] > > > > > Even without this patch, the SLAVE stuff worked so if you start the > > > > > daemon and *then* mount under the shared directory, the guest sees it > > > > > with or without this patch. > > > > > > > > Hm, I don’t. Do you really? > > > > > > Yes! With your patch reverted: > > > > > > Start virtiofsd, mount in the guest: > > > > > > host: > > > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs -o log_level=warn -o no_writeback > > > > > > guest: > > > # mount -t virtiofs myfs /sysroot > > > > > > host: > > > # findmnt -o +PROPAGATION -N 6100 > > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp > > > # findmnt -o +PROPAGATION -N 6100 > > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > > └─/tmp /dev/null tmpfs rw,relatime,seclabel private,slave > > > > Why is it showing a mount point at "/tmp". If mount point propagated, then > > inside guest we should see a mount point at /sysroot/tmp? > > That findmnt is on the host. > > > So there are two things. > > > > A. Propagation of mount from host to virtiofsd. > > B. Visibility of that mount inside guest over fuse protocol (submount > > functionality). > > > > I think A works for me without any patches. But don't think B is working > > for me. I don't see the submount inside guest. > > > > > # touch /home/dgilbert/virtio-fs/fs/tmp/hello > > > > > > guest: > > > # ls -l /sysroot/tmp > > > total 0 > > > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello > > > > Do a "findmnt /sysroot/tmp" inside guest and see what do you see. > > > > You will be able to see "hello" as long as virtiofsd sees the new > > mount point, I think. And guest does not have to see that mount point > > for this simple test to work. > > Right, the guest just sees: > > `-/sysroot myfs virtiof rw,relatime That is a good thing surely ? If I'm exporting "/sysroot" from the host, I want the content in "/sysroot/some/sub/mount" to be visible to the guest, but I don't want the guest to see "/sysroot/some/sub/mount" as an actual mount point. That would be leaking information about the host storage setup into the guest. The host admin should be free to re-arrange submounts in the host OS, to bring more storage space online, and have this be transparent to the guest OS. Regards, Daniel
On Thu, Apr 30, 2020 at 03:34:25PM +0100, Daniel P. Berrangé wrote: > On Thu, Apr 30, 2020 at 03:20:13PM +0100, Dr. David Alan Gilbert wrote: > > * Vivek Goyal (vgoyal@redhat.com) wrote: > > > On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote: > > > [..] > > > > > > Even without this patch, the SLAVE stuff worked so if you start the > > > > > > daemon and *then* mount under the shared directory, the guest sees it > > > > > > with or without this patch. > > > > > > > > > > Hm, I don’t. Do you really? > > > > > > > > Yes! With your patch reverted: > > > > > > > > Start virtiofsd, mount in the guest: > > > > > > > > host: > > > > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs -o log_level=warn -o no_writeback > > > > > > > > guest: > > > > # mount -t virtiofs myfs /sysroot > > > > > > > > host: > > > > # findmnt -o +PROPAGATION -N 6100 > > > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > > > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp > > > > # findmnt -o +PROPAGATION -N 6100 > > > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > > > └─/tmp /dev/null tmpfs rw,relatime,seclabel private,slave > > > > > > Why is it showing a mount point at "/tmp". If mount point propagated, then > > > inside guest we should see a mount point at /sysroot/tmp? > > > > That findmnt is on the host. > > > > > So there are two things. > > > > > > A. Propagation of mount from host to virtiofsd. > > > B. Visibility of that mount inside guest over fuse protocol (submount > > > functionality). > > > > > > I think A works for me without any patches. But don't think B is working > > > for me. I don't see the submount inside guest. > > > > > > > # touch /home/dgilbert/virtio-fs/fs/tmp/hello > > > > > > > > guest: > > > > # ls -l /sysroot/tmp > > > > total 0 > > > > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello > > > > > > Do a "findmnt /sysroot/tmp" inside guest and see what do you see. > > > > > > You will be able to see "hello" as long as virtiofsd sees the new > > > mount point, I think. And guest does not have to see that mount point > > > for this simple test to work. > > > > Right, the guest just sees: > > > > `-/sysroot myfs virtiof rw,relatime > > That is a good thing surely ? If I'm exporting "/sysroot" from the host, > I want the content in "/sysroot/some/sub/mount" to be visible to the > guest, but I don't want the guest to see "/sysroot/some/sub/mount" > as an actual mount point. That would be leaking information about the > host storage setup into the guest. The host admin should be free to > re-arrange submounts in the host OS, to bring more storage space online, > and have this be transparent to the guest OS. If we don't see mount inside guest, we run into the possibility of inode number collision. On host two files in shared dir can have same inode number (if they are on two different filesystem with different device numbers). But inside guest, we will show device number of virtiofs, and it will look as if two files in this filesystem have same inode number, breaking some workloads. By propagating mounts (submounts), we can assign a unique device number to these submounts and hence <dev,inode> number pair will become unique. W.r.t information lea, may be we can mask some of the information in submounts inside guest. Thanks Vivek
On Thu, Apr 30, 2020 at 10:41:16AM -0400, Vivek Goyal wrote: > On Thu, Apr 30, 2020 at 03:34:25PM +0100, Daniel P. Berrangé wrote: > > On Thu, Apr 30, 2020 at 03:20:13PM +0100, Dr. David Alan Gilbert wrote: > > > * Vivek Goyal (vgoyal@redhat.com) wrote: > > > > On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote: > > > > [..] > > > > > > > Even without this patch, the SLAVE stuff worked so if you start the > > > > > > > daemon and *then* mount under the shared directory, the guest sees it > > > > > > > with or without this patch. > > > > > > > > > > > > Hm, I don’t. Do you really? > > > > > > > > > > Yes! With your patch reverted: > > > > > > > > > > Start virtiofsd, mount in the guest: > > > > > > > > > > host: > > > > > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs -o log_level=warn -o no_writeback > > > > > > > > > > guest: > > > > > # mount -t virtiofs myfs /sysroot > > > > > > > > > > host: > > > > > # findmnt -o +PROPAGATION -N 6100 > > > > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > > > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > > > > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp > > > > > # findmnt -o +PROPAGATION -N 6100 > > > > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > > > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > > > > └─/tmp /dev/null tmpfs rw,relatime,seclabel private,slave > > > > > > > > Why is it showing a mount point at "/tmp". If mount point propagated, then > > > > inside guest we should see a mount point at /sysroot/tmp? > > > > > > That findmnt is on the host. > > > > > > > So there are two things. > > > > > > > > A. Propagation of mount from host to virtiofsd. > > > > B. Visibility of that mount inside guest over fuse protocol (submount > > > > functionality). > > > > > > > > I think A works for me without any patches. But don't think B is working > > > > for me. I don't see the submount inside guest. > > > > > > > > > # touch /home/dgilbert/virtio-fs/fs/tmp/hello > > > > > > > > > > guest: > > > > > # ls -l /sysroot/tmp > > > > > total 0 > > > > > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello > > > > > > > > Do a "findmnt /sysroot/tmp" inside guest and see what do you see. > > > > > > > > You will be able to see "hello" as long as virtiofsd sees the new > > > > mount point, I think. And guest does not have to see that mount point > > > > for this simple test to work. > > > > > > Right, the guest just sees: > > > > > > `-/sysroot myfs virtiof rw,relatime > > > > That is a good thing surely ? If I'm exporting "/sysroot" from the host, > > I want the content in "/sysroot/some/sub/mount" to be visible to the > > guest, but I don't want the guest to see "/sysroot/some/sub/mount" > > as an actual mount point. That would be leaking information about the > > host storage setup into the guest. The host admin should be free to > > re-arrange submounts in the host OS, to bring more storage space online, > > and have this be transparent to the guest OS. > > If we don't see mount inside guest, we run into the possibility of inode > number collision. On host two files in shared dir can have same inode > number (if they are on two different filesystem with different device > numbers). But inside guest, we will show device number of virtiofs, > and it will look as if two files in this filesystem have same inode > number, breaking some workloads. > > By propagating mounts (submounts), we can assign a unique device number > to these submounts and hence <dev,inode> number pair will become unique. Ah, yes, that's true. In 9pfs there was recent changes precisely because of this clash possibility: commit 1a6ed33cc56997479bbe5b48337ff8da44585bd4 Author: Antonios Motakis <antonios.motakis@huawei.com> Date: Thu Oct 10 11:36:05 2019 +0200 9p: Added virtfs option 'multidevs=remap|forbid|warn' 'warn' (default): Only log an error message (once) on host if more than one device is shared by same export, except of that just ignore this config error though. This is the default behaviour for not breaking existing installations implying that they really know what they are doing. 'forbid': Like 'warn', but except of just logging an error this also denies access of guest to additional devices. 'remap': Allows to share more than one device per export by remapping inodes from host to guest appropriately. To support multiple devices on the 9p share, and avoid qid path collisions we take the device id as input to generate a unique QID path. The lowest 48 bits of the path will be set equal to the file inode, and the top bits will be uniquely assigned based on the top 16 bits of the inode and the device id. Perhaps we should try to support the same options in virtio-fs. At least the "forbid" and "remap" options make sense I think. "warn" was only really there for backcompat. If we can expose it to the guest, then a further "expose" option would be viable. Regards, Daniel
On Thu, Apr 30, 2020 at 03:47:04PM +0100, Daniel P. Berrangé wrote: > On Thu, Apr 30, 2020 at 10:41:16AM -0400, Vivek Goyal wrote: > > On Thu, Apr 30, 2020 at 03:34:25PM +0100, Daniel P. Berrangé wrote: > > > On Thu, Apr 30, 2020 at 03:20:13PM +0100, Dr. David Alan Gilbert wrote: > > > > * Vivek Goyal (vgoyal@redhat.com) wrote: > > > > > On Thu, Apr 30, 2020 at 09:58:12AM +0100, Dr. David Alan Gilbert wrote: > > > > > [..] > > > > > > > > Even without this patch, the SLAVE stuff worked so if you start the > > > > > > > > daemon and *then* mount under the shared directory, the guest sees it > > > > > > > > with or without this patch. > > > > > > > > > > > > > > Hm, I don’t. Do you really? > > > > > > > > > > > > Yes! With your patch reverted: > > > > > > > > > > > > Start virtiofsd, mount in the guest: > > > > > > > > > > > > host: > > > > > > # ./virtiofsd --socket-path=/tmp/vhostqemu -o source=/home/dgilbert/virtio-fs/fs -o log_level=warn -o no_writeback > > > > > > > > > > > > guest: > > > > > > # mount -t virtiofs myfs /sysroot > > > > > > > > > > > > host: > > > > > > # findmnt -o +PROPAGATION -N 6100 > > > > > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > > > > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > > > > > # mount -t tmpfs /dev/null /home/dgilbert/virtio-fs/fs/tmp > > > > > > # findmnt -o +PROPAGATION -N 6100 > > > > > > TARGET SOURCE FSTYPE OPTIONS PROPAGATION > > > > > > / /dev/mapper/fedora_dgilbert--t580-root[/home/dgilbert/virtio-fs/fs] xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,no private,slave > > > > > > └─/tmp /dev/null tmpfs rw,relatime,seclabel private,slave > > > > > > > > > > Why is it showing a mount point at "/tmp". If mount point propagated, then > > > > > inside guest we should see a mount point at /sysroot/tmp? > > > > > > > > That findmnt is on the host. > > > > > > > > > So there are two things. > > > > > > > > > > A. Propagation of mount from host to virtiofsd. > > > > > B. Visibility of that mount inside guest over fuse protocol (submount > > > > > functionality). > > > > > > > > > > I think A works for me without any patches. But don't think B is working > > > > > for me. I don't see the submount inside guest. > > > > > > > > > > > # touch /home/dgilbert/virtio-fs/fs/tmp/hello > > > > > > > > > > > > guest: > > > > > > # ls -l /sysroot/tmp > > > > > > total 0 > > > > > > -rw-r--r-- 1 root root 0 Apr 30 08:50 hello > > > > > > > > > > Do a "findmnt /sysroot/tmp" inside guest and see what do you see. > > > > > > > > > > You will be able to see "hello" as long as virtiofsd sees the new > > > > > mount point, I think. And guest does not have to see that mount point > > > > > for this simple test to work. > > > > > > > > Right, the guest just sees: > > > > > > > > `-/sysroot myfs virtiof rw,relatime > > > > > > That is a good thing surely ? If I'm exporting "/sysroot" from the host, > > > I want the content in "/sysroot/some/sub/mount" to be visible to the > > > guest, but I don't want the guest to see "/sysroot/some/sub/mount" > > > as an actual mount point. That would be leaking information about the > > > host storage setup into the guest. The host admin should be free to > > > re-arrange submounts in the host OS, to bring more storage space online, > > > and have this be transparent to the guest OS. > > > > If we don't see mount inside guest, we run into the possibility of inode > > number collision. On host two files in shared dir can have same inode > > number (if they are on two different filesystem with different device > > numbers). But inside guest, we will show device number of virtiofs, > > and it will look as if two files in this filesystem have same inode > > number, breaking some workloads. > > > > By propagating mounts (submounts), we can assign a unique device number > > to these submounts and hence <dev,inode> number pair will become unique. > > Ah, yes, that's true. In 9pfs there was recent changes precisely > because of this clash possibility: > > commit 1a6ed33cc56997479bbe5b48337ff8da44585bd4 > Author: Antonios Motakis <antonios.motakis@huawei.com> > Date: Thu Oct 10 11:36:05 2019 +0200 > > 9p: Added virtfs option 'multidevs=remap|forbid|warn' > > 'warn' (default): Only log an error message (once) on host if more than one > device is shared by same export, except of that just ignore this config > error though. This is the default behaviour for not breaking existing > installations implying that they really know what they are doing. > > 'forbid': Like 'warn', but except of just logging an error this > also denies access of guest to additional devices. > > 'remap': Allows to share more than one device per export by remapping > inodes from host to guest appropriately. To support multiple devices on the > 9p share, and avoid qid path collisions we take the device id as input to > generate a unique QID path. The lowest 48 bits of the path will be set > equal to the file inode, and the top bits will be uniquely assigned based > on the top 16 bits of the inode and the device id. > > > Perhaps we should try to support the same options in virtio-fs. At least > the "forbid" and "remap" options make sense I think. "warn" was only > really there for backcompat. If we can expose it to the guest, then a > further "expose" option would be viable. Providing options in virtiofsd daemon so that use can tweak these settings makes sense. Given we are already allowing accessing multiple device inside guest, we also probably need a mode to continue to allow that so that not to break existing setups. "forbid", "remap" make sense. And I like the option "expose" too. "remap" will require little thought that how do we do that. We briefly discussed encoding device number in higher inode bits and Miklos was not too keen on it. What if filesystem is using all 64bits of inode. We also need to do encoding in such a way so that inode number is persistent. Thanks Vivek
* Max Reitz (mreitz@redhat.com) wrote: > Currently, setup_mounts() bind-mounts the shared directory without > MS_REC. This makes all submounts disappear. > > Pass MS_REC so that the guest can see submounts again. > > Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 > Signed-off-by: Max Reitz <mreitz@redhat.com> Queued. > --- > tools/virtiofsd/passthrough_ll.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c > index 4c35c95b25..9d7f863e66 100644 > --- a/tools/virtiofsd/passthrough_ll.c > +++ b/tools/virtiofsd/passthrough_ll.c > @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) > int oldroot; > int newroot; > > - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { > + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { > fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); > exit(1); > } > -- > 2.25.3 > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c index 4c35c95b25..9d7f863e66 100644 --- a/tools/virtiofsd/passthrough_ll.c +++ b/tools/virtiofsd/passthrough_ll.c @@ -2643,7 +2643,7 @@ static void setup_mounts(const char *source) int oldroot; int newroot; - if (mount(source, source, NULL, MS_BIND, NULL) < 0) { + if (mount(source, source, NULL, MS_BIND | MS_REC, NULL) < 0) { fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, source); exit(1); }
Currently, setup_mounts() bind-mounts the shared directory without MS_REC. This makes all submounts disappear. Pass MS_REC so that the guest can see submounts again. Fixes: 3ca8a2b1c83eb185c232a4e87abbb65495263756 Signed-off-by: Max Reitz <mreitz@redhat.com> --- tools/virtiofsd/passthrough_ll.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)