Message ID | 20191103185133.GR26530@ZenIV.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable | expand |
On Sun, Nov 3, 2019 at 8:52 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > lower_dentry can't go from positive to negative (we have it pinned), > but it *can* go from negative to positive. So fetching ->d_inode > into a local variable, doing a blocking allocation, checking that > now ->d_inode is non-NULL and feeding the value we'd fetched > earlier to a function that won't accept NULL is not a good idea. > > Cc: stable@vger.kernel.org > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > --- > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c > index a905d5f4f3b0..3c2298721359 100644 > --- a/fs/ecryptfs/inode.c > +++ b/fs/ecryptfs/inode.c > @@ -319,7 +319,7 @@ static int ecryptfs_i_size_read(struct dentry *dentry, struct inode *inode) > static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry, > struct dentry *lower_dentry) > { > - struct inode *inode, *lower_inode = d_inode(lower_dentry); > + struct inode *inode, *lower_inode; > struct ecryptfs_dentry_info *dentry_info; > struct vfsmount *lower_mnt; > int rc = 0; > @@ -339,7 +339,15 @@ static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry, > dentry_info->lower_path.mnt = lower_mnt; > dentry_info->lower_path.dentry = lower_dentry; > > - if (d_really_is_negative(lower_dentry)) { > + /* > + * negative dentry can go positive under us here - its parent is not > + * locked. That's OK and that could happen just as we return from > + * ecryptfs_lookup() anyway. Just need to be careful and fetch > + * ->d_inode only once - it's not stable here. > + */ > + lower_inode = READ_ONCE(lower_dentry->d_inode); > + > + if (!lower_inode) { > /* We want to add because we couldn't find in lower */ > d_add(dentry, NULL); > return NULL; Sigh! Open coding a human readable macro to solve a subtle lookup race. That doesn't sound like a scalable solution. I have a feeling this is not the last patch we will be seeing along those lines. Seeing that developers already confused about when they should use d_really_is_negative() over d_is_negative() [1] and we probably don't want to add d_really_really_is_negative(), how about applying that READ_ONCE into d_really_is_negative() and re-purpose it as a macro to be used when races with lookup are a concern? Thanks, Amir. [1] https://lore.kernel.org/linux-fsdevel/20190903135803.GA25692@hsiangkao-HP-ZHAN-66-Pro-G1/
On Wed, Nov 13, 2019 at 09:01:36AM +0200, Amir Goldstein wrote: > > - if (d_really_is_negative(lower_dentry)) { > > + /* > > + * negative dentry can go positive under us here - its parent is not > > + * locked. That's OK and that could happen just as we return from > > + * ecryptfs_lookup() anyway. Just need to be careful and fetch > > + * ->d_inode only once - it's not stable here. > > + */ > > + lower_inode = READ_ONCE(lower_dentry->d_inode); > > + > > + if (!lower_inode) { > > /* We want to add because we couldn't find in lower */ > > d_add(dentry, NULL); > > return NULL; > > Sigh! > > Open coding a human readable macro to solve a subtle lookup race. > That doesn't sound like a scalable solution. > I have a feeling this is not the last patch we will be seeing along > those lines. > > Seeing that developers already confused about when they should use > d_really_is_negative() over d_is_negative() [1] and we probably > don't want to add d_really_really_is_negative(), how about > applying that READ_ONCE into d_really_is_negative() and > re-purpose it as a macro to be used when races with lookup are > a concern? Would you care to explain what that "fix" would've achieved here, considering the fact that barriers are no-ops on UP and this is *NOT* an SMP race? And it's very much present on UP - we have fetch ->d_inode into local variable do blocking allocation check if ->d_inode is NULL now if it is not, use the value in local variable and expect it to be non-NULL That's not a case of missing barriers. At all. And no redefinition of d_really_is_negative() is going to help - it can't retroactively affect the value explicitly fetched into a local variable some time prior to that. There are other patches dealing with ->d_inode accesses, but they are generally not along the same lines. The problem is rarely the same...
On Wed, Nov 13, 2019 at 2:52 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Wed, Nov 13, 2019 at 09:01:36AM +0200, Amir Goldstein wrote: > > > - if (d_really_is_negative(lower_dentry)) { > > > + /* > > > + * negative dentry can go positive under us here - its parent is not > > > + * locked. That's OK and that could happen just as we return from > > > + * ecryptfs_lookup() anyway. Just need to be careful and fetch > > > + * ->d_inode only once - it's not stable here. > > > + */ > > > + lower_inode = READ_ONCE(lower_dentry->d_inode); > > > + > > > + if (!lower_inode) { > > > /* We want to add because we couldn't find in lower */ > > > d_add(dentry, NULL); > > > return NULL; > > > > Sigh! > > > > Open coding a human readable macro to solve a subtle lookup race. > > That doesn't sound like a scalable solution. > > I have a feeling this is not the last patch we will be seeing along > > those lines. > > > > Seeing that developers already confused about when they should use > > d_really_is_negative() over d_is_negative() [1] and we probably > > don't want to add d_really_really_is_negative(), how about > > applying that READ_ONCE into d_really_is_negative() and > > re-purpose it as a macro to be used when races with lookup are > > a concern? > > Would you care to explain what that "fix" would've achieved here, > considering the fact that barriers are no-ops on UP and this is > *NOT* an SMP race? > > And it's very much present on UP - we have > fetch ->d_inode into local variable > do blocking allocation > check if ->d_inode is NULL now > if it is not, use the value in local variable and expect it to be non-NULL > > That's not a case of missing barriers. At all. And no redefinition of > d_really_is_negative() is going to help - it can't retroactively affect > the value explicitly fetched into a local variable some time prior to > that. > Indeed. I missed that part of your commit message and didn't realize the variable was being used later. The language in the comment "can go positive under us" implied SMP race so I misunderstood the reason for READ_ONCE(). Sorry for the noise. Amir.
Please can you UNSUBSCRIBE me from this list? thx Le 13/11/2019 à 13:52, Al Viro a écrit : > On Wed, Nov 13, 2019 at 09:01:36AM +0200, Amir Goldstein wrote: >>> - if (d_really_is_negative(lower_dentry)) { >>> + /* >>> + * negative dentry can go positive under us here - its parent is not >>> + * locked. That's OK and that could happen just as we return from >>> + * ecryptfs_lookup() anyway. Just need to be careful and fetch >>> + * ->d_inode only once - it's not stable here. >>> + */ >>> + lower_inode = READ_ONCE(lower_dentry->d_inode); >>> + >>> + if (!lower_inode) { >>> /* We want to add because we couldn't find in lower */ >>> d_add(dentry, NULL); >>> return NULL; >> Sigh! >> >> Open coding a human readable macro to solve a subtle lookup race. >> That doesn't sound like a scalable solution. >> I have a feeling this is not the last patch we will be seeing along >> those lines. >> >> Seeing that developers already confused about when they should use >> d_really_is_negative() over d_is_negative() [1] and we probably >> don't want to add d_really_really_is_negative(), how about >> applying that READ_ONCE into d_really_is_negative() and >> re-purpose it as a macro to be used when races with lookup are >> a concern? > Would you care to explain what that "fix" would've achieved here, > considering the fact that barriers are no-ops on UP and this is > *NOT* an SMP race? > > And it's very much present on UP - we have > fetch ->d_inode into local variable > do blocking allocation > check if ->d_inode is NULL now > if it is not, use the value in local variable and expect it to be non-NULL > > That's not a case of missing barriers. At all. And no redefinition of > d_really_is_negative() is going to help - it can't retroactively affect > the value explicitly fetched into a local variable some time prior to > that. > > There are other patches dealing with ->d_inode accesses, but they are > generally not along the same lines. The problem is rarely the same... >
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index a905d5f4f3b0..3c2298721359 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -319,7 +319,7 @@ static int ecryptfs_i_size_read(struct dentry *dentry, struct inode *inode) static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry, struct dentry *lower_dentry) { - struct inode *inode, *lower_inode = d_inode(lower_dentry); + struct inode *inode, *lower_inode; struct ecryptfs_dentry_info *dentry_info; struct vfsmount *lower_mnt; int rc = 0; @@ -339,7 +339,15 @@ static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry, dentry_info->lower_path.mnt = lower_mnt; dentry_info->lower_path.dentry = lower_dentry; - if (d_really_is_negative(lower_dentry)) { + /* + * negative dentry can go positive under us here - its parent is not + * locked. That's OK and that could happen just as we return from + * ecryptfs_lookup() anyway. Just need to be careful and fetch + * ->d_inode only once - it's not stable here. + */ + lower_inode = READ_ONCE(lower_dentry->d_inode); + + if (!lower_inode) { /* We want to add because we couldn't find in lower */ d_add(dentry, NULL); return NULL;
lower_dentry can't go from positive to negative (we have it pinned), but it *can* go from negative to positive. So fetching ->d_inode into a local variable, doing a blocking allocation, checking that now ->d_inode is non-NULL and feeding the value we'd fetched earlier to a function that won't accept NULL is not a good idea. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> ---