diff mbox series

[RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable

Message ID 20191103185133.GR26530@ZenIV.linux.org.uk (mailing list archive)
State New, archived
Headers show
Series [RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable | expand

Commit Message

Al Viro Nov. 3, 2019, 6:51 p.m. UTC
lower_dentry can't go from positive to negative (we have it pinned),
but it *can* go from negative to positive.  So fetching ->d_inode
into a local variable, doing a blocking allocation, checking that
now ->d_inode is non-NULL and feeding the value we'd fetched
earlier to a function that won't accept NULL is not a good idea.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

Comments

Amir Goldstein Nov. 13, 2019, 7:01 a.m. UTC | #1
On Sun, Nov 3, 2019 at 8:52 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> lower_dentry can't go from positive to negative (we have it pinned),
> but it *can* go from negative to positive.  So fetching ->d_inode
> into a local variable, doing a blocking allocation, checking that
> now ->d_inode is non-NULL and feeding the value we'd fetched
> earlier to a function that won't accept NULL is not a good idea.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index a905d5f4f3b0..3c2298721359 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -319,7 +319,7 @@ static int ecryptfs_i_size_read(struct dentry *dentry, struct inode *inode)
>  static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry,
>                                      struct dentry *lower_dentry)
>  {
> -       struct inode *inode, *lower_inode = d_inode(lower_dentry);
> +       struct inode *inode, *lower_inode;
>         struct ecryptfs_dentry_info *dentry_info;
>         struct vfsmount *lower_mnt;
>         int rc = 0;
> @@ -339,7 +339,15 @@ static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry,
>         dentry_info->lower_path.mnt = lower_mnt;
>         dentry_info->lower_path.dentry = lower_dentry;
>
> -       if (d_really_is_negative(lower_dentry)) {
> +       /*
> +        * negative dentry can go positive under us here - its parent is not
> +        * locked.  That's OK and that could happen just as we return from
> +        * ecryptfs_lookup() anyway.  Just need to be careful and fetch
> +        * ->d_inode only once - it's not stable here.
> +        */
> +       lower_inode = READ_ONCE(lower_dentry->d_inode);
> +
> +       if (!lower_inode) {
>                 /* We want to add because we couldn't find in lower */
>                 d_add(dentry, NULL);
>                 return NULL;

Sigh!

Open coding a human readable macro to solve a subtle lookup race.
That doesn't sound like a scalable solution.
I have a feeling this is not the last patch we will be seeing along
those lines.

Seeing that developers already confused about when they should use
d_really_is_negative() over d_is_negative() [1] and we probably
don't want to add d_really_really_is_negative(), how about
applying that READ_ONCE into d_really_is_negative() and
re-purpose it as a macro to be used when races with lookup are
a concern?

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20190903135803.GA25692@hsiangkao-HP-ZHAN-66-Pro-G1/
Al Viro Nov. 13, 2019, 12:52 p.m. UTC | #2
On Wed, Nov 13, 2019 at 09:01:36AM +0200, Amir Goldstein wrote:
> > -       if (d_really_is_negative(lower_dentry)) {
> > +       /*
> > +        * negative dentry can go positive under us here - its parent is not
> > +        * locked.  That's OK and that could happen just as we return from
> > +        * ecryptfs_lookup() anyway.  Just need to be careful and fetch
> > +        * ->d_inode only once - it's not stable here.
> > +        */
> > +       lower_inode = READ_ONCE(lower_dentry->d_inode);
> > +
> > +       if (!lower_inode) {
> >                 /* We want to add because we couldn't find in lower */
> >                 d_add(dentry, NULL);
> >                 return NULL;
> 
> Sigh!
> 
> Open coding a human readable macro to solve a subtle lookup race.
> That doesn't sound like a scalable solution.
> I have a feeling this is not the last patch we will be seeing along
> those lines.
> 
> Seeing that developers already confused about when they should use
> d_really_is_negative() over d_is_negative() [1] and we probably
> don't want to add d_really_really_is_negative(), how about
> applying that READ_ONCE into d_really_is_negative() and
> re-purpose it as a macro to be used when races with lookup are
> a concern?

Would you care to explain what that "fix" would've achieved here,
considering the fact that barriers are no-ops on UP and this is
*NOT* an SMP race?

And it's very much present on UP - we have
	fetch ->d_inode into local variable
	do blocking allocation
	check if ->d_inode is NULL now
	if it is not, use the value in local variable and expect it to be non-NULL

That's not a case of missing barriers.  At all.  And no redefinition of
d_really_is_negative() is going to help - it can't retroactively affect
the value explicitly fetched into a local variable some time prior to
that.

There are other patches dealing with ->d_inode accesses, but they are
generally not along the same lines.  The problem is rarely the same...
Amir Goldstein Nov. 13, 2019, 4:22 p.m. UTC | #3
On Wed, Nov 13, 2019 at 2:52 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Nov 13, 2019 at 09:01:36AM +0200, Amir Goldstein wrote:
> > > -       if (d_really_is_negative(lower_dentry)) {
> > > +       /*
> > > +        * negative dentry can go positive under us here - its parent is not
> > > +        * locked.  That's OK and that could happen just as we return from
> > > +        * ecryptfs_lookup() anyway.  Just need to be careful and fetch
> > > +        * ->d_inode only once - it's not stable here.
> > > +        */
> > > +       lower_inode = READ_ONCE(lower_dentry->d_inode);
> > > +
> > > +       if (!lower_inode) {
> > >                 /* We want to add because we couldn't find in lower */
> > >                 d_add(dentry, NULL);
> > >                 return NULL;
> >
> > Sigh!
> >
> > Open coding a human readable macro to solve a subtle lookup race.
> > That doesn't sound like a scalable solution.
> > I have a feeling this is not the last patch we will be seeing along
> > those lines.
> >
> > Seeing that developers already confused about when they should use
> > d_really_is_negative() over d_is_negative() [1] and we probably
> > don't want to add d_really_really_is_negative(), how about
> > applying that READ_ONCE into d_really_is_negative() and
> > re-purpose it as a macro to be used when races with lookup are
> > a concern?
>
> Would you care to explain what that "fix" would've achieved here,
> considering the fact that barriers are no-ops on UP and this is
> *NOT* an SMP race?
>
> And it's very much present on UP - we have
>         fetch ->d_inode into local variable
>         do blocking allocation
>         check if ->d_inode is NULL now
>         if it is not, use the value in local variable and expect it to be non-NULL
>
> That's not a case of missing barriers.  At all.  And no redefinition of
> d_really_is_negative() is going to help - it can't retroactively affect
> the value explicitly fetched into a local variable some time prior to
> that.
>

Indeed. I missed that part of your commit message and didn't
realize the variable was being used later.
The language in the comment "can go positive under us" implied
SMP race so I misunderstood the reason for READ_ONCE().

Sorry for the noise.

Amir.
Jean-Louis Biasini Nov. 13, 2019, 8:18 p.m. UTC | #4
Please can you UNSUBSCRIBE me from this list?

thx

Le 13/11/2019 à 13:52, Al Viro a écrit :
> On Wed, Nov 13, 2019 at 09:01:36AM +0200, Amir Goldstein wrote:
>>> -       if (d_really_is_negative(lower_dentry)) {
>>> +       /*
>>> +        * negative dentry can go positive under us here - its parent is not
>>> +        * locked.  That's OK and that could happen just as we return from
>>> +        * ecryptfs_lookup() anyway.  Just need to be careful and fetch
>>> +        * ->d_inode only once - it's not stable here.
>>> +        */
>>> +       lower_inode = READ_ONCE(lower_dentry->d_inode);
>>> +
>>> +       if (!lower_inode) {
>>>                 /* We want to add because we couldn't find in lower */
>>>                 d_add(dentry, NULL);
>>>                 return NULL;
>> Sigh!
>>
>> Open coding a human readable macro to solve a subtle lookup race.
>> That doesn't sound like a scalable solution.
>> I have a feeling this is not the last patch we will be seeing along
>> those lines.
>>
>> Seeing that developers already confused about when they should use
>> d_really_is_negative() over d_is_negative() [1] and we probably
>> don't want to add d_really_really_is_negative(), how about
>> applying that READ_ONCE into d_really_is_negative() and
>> re-purpose it as a macro to be used when races with lookup are
>> a concern?
> Would you care to explain what that "fix" would've achieved here,
> considering the fact that barriers are no-ops on UP and this is
> *NOT* an SMP race?
>
> And it's very much present on UP - we have
> 	fetch ->d_inode into local variable
> 	do blocking allocation
> 	check if ->d_inode is NULL now
> 	if it is not, use the value in local variable and expect it to be non-NULL
>
> That's not a case of missing barriers.  At all.  And no redefinition of
> d_really_is_negative() is going to help - it can't retroactively affect
> the value explicitly fetched into a local variable some time prior to
> that.
>
> There are other patches dealing with ->d_inode accesses, but they are
> generally not along the same lines.  The problem is rarely the same...
>
diff mbox series

Patch

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index a905d5f4f3b0..3c2298721359 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -319,7 +319,7 @@  static int ecryptfs_i_size_read(struct dentry *dentry, struct inode *inode)
 static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry,
 				     struct dentry *lower_dentry)
 {
-	struct inode *inode, *lower_inode = d_inode(lower_dentry);
+	struct inode *inode, *lower_inode;
 	struct ecryptfs_dentry_info *dentry_info;
 	struct vfsmount *lower_mnt;
 	int rc = 0;
@@ -339,7 +339,15 @@  static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry,
 	dentry_info->lower_path.mnt = lower_mnt;
 	dentry_info->lower_path.dentry = lower_dentry;
 
-	if (d_really_is_negative(lower_dentry)) {
+	/*
+	 * negative dentry can go positive under us here - its parent is not
+	 * locked.  That's OK and that could happen just as we return from
+	 * ecryptfs_lookup() anyway.  Just need to be careful and fetch
+	 * ->d_inode only once - it's not stable here.
+	 */
+	lower_inode = READ_ONCE(lower_dentry->d_inode);
+
+	if (!lower_inode) {
 		/* We want to add because we couldn't find in lower */
 		d_add(dentry, NULL);
 		return NULL;