[v5,02/19] fs: don't take the i_lock in inode_inc_iversion

Message ID 20180109141059.25929-3-jlayton@kernel.org
State New

Commit Message

Jeff Layton Jan. 9, 2018, 2:10 p.m. UTC
From: Jeff Layton <jlayton@redhat.com>

The rationale for taking the i_lock when incrementing this value is
lost in antiquity. The readers of the field don't take it (at least
not universally), so my assumption is that it was only done here to
serialize incrementors.

If that is indeed the case, then we can drop the i_lock from this
codepath and treat it as an atomic64_t for the purposes of
incrementing it. This allows us to use inode_inc_iversion without
any danger of lock inversion.

Note that the read side is not fetched atomically with this change.
The assumption here is that that is not a critical issue since the
i_version is not fully synchronized with anything else anyway.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/iversion.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Jan Kara Jan. 9, 2018, 3:14 p.m. UTC | #1
On Tue 09-01-18 09:10:42, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> The rationale for taking the i_lock when incrementing this value is
> lost in antiquity. The readers of the field don't take it (at least
> not universally), so my assumption is that it was only done here to
> serialize incrementors.
> 
> If that is indeed the case, then we can drop the i_lock from this
> codepath and treat it as an atomic64_t for the purposes of
> incrementing it. This allows us to use inode_inc_iversion without
> any danger of lock inversion.
> 
> Note that the read side is not fetched atomically with this change.
> The assumption here is that that is not a critical issue since the
> i_version is not fully synchronized with anything else anyway.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>

This changes the memory barrier behavior but IMO it is good enough for an
intermediate version. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
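
A rough userspace illustration of the barrier difference mentioned above,
using C11 atomics. In the kernel, spin_lock() has acquire semantics and
spin_unlock() has release semantics, while atomic64_inc() returns no value
and is documented as unordered. The function names below are hypothetical,
and the acq_rel variant is only a loose approximation of what a lock/unlock
pair provides.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Analogue of atomic64_inc(): the increment itself is atomic, but it
 * imposes no ordering on surrounding loads and stores.
 */
static inline void inc_relaxed(_Atomic uint64_t *version)
{
	atomic_fetch_add_explicit(version, 1, memory_order_relaxed);
}

/*
 * Loose analogue of the old locked increment: acquire ordering before
 * the update and release ordering after it, as a lock/unlock pair
 * around the "++" would give.
 */
static inline void inc_acq_rel(_Atomic uint64_t *version)
{
	atomic_fetch_add_explicit(version, 1, memory_order_acq_rel);
}
```

Both helpers produce the same counter value; they differ only in what other
memory accesses are allowed to be reordered around the increment.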

J. Bruce Fields Jan. 18, 2018, 9:45 p.m. UTC | #2
On Tue, Jan 09, 2018 at 09:10:42AM -0500, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> The rationale for taking the i_lock when incrementing this value is
> lost in antiquity. The readers of the field don't take it (at least
> not universally), so my assumption is that it was only done here to
> serialize incrementors.
> 
> If that is indeed the case, then we can drop the i_lock from this
> codepath and treat it as an atomic64_t for the purposes of
> incrementing it. This allows us to use inode_inc_iversion without
> any danger of lock inversion.
> 
> Note that the read side is not fetched atomically with this change.
> The assumption here is that that is not a critical issue since the
> i_version is not fully synchronized with anything else anyway.

So I guess it's theoretically possible that e.g. if you read while it's
incrementing from 2^32-1 to 2^32 you could read 0, 1, or 2^32+1?

If so then you could see an i_version value reused and incorrectly
decide that a file hadn't changed.

But it's such a tiny case, and I think you convert this to atomic64_t
later anyway, so, whatever.
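
To make the torn-read scenario concrete, here is a small sketch that models a
32-bit machine loading a 64-bit counter as two separate 32-bit halves, with
exactly one increment landing between the two loads. The helper names are
hypothetical, and the single-increment model is a simplifying assumption:
if more increments race with the reader, other mixed values become possible.

```c
#include <stdint.h>

/* Rebuild a 64-bit value from separately loaded low and high halves. */
static inline uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Reader loads the low half before the increment, the high half after. */
static inline uint64_t torn_lo_old_hi_new(uint64_t before, uint64_t after)
{
	return combine_halves((uint32_t)before, (uint32_t)(after >> 32));
}

/* Reader loads the high half before the increment, the low half after. */
static inline uint64_t torn_lo_new_hi_old(uint64_t before, uint64_t after)
{
	return combine_halves((uint32_t)after, (uint32_t)(before >> 32));
}
```

With before = 2^32-1 and after = 2^32, the first helper yields 0x1FFFFFFFF
and the second yields 0. The second case is the troublesome one: 0 is a value
the counter has already held, so a reader comparing against an old sample
could wrongly conclude that nothing changed.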

--b.

Jeff Layton Jan. 19, 2018, 2:36 p.m. UTC | #3
On Thu, 2018-01-18 at 16:45 -0500, J. Bruce Fields wrote:
> On Tue, Jan 09, 2018 at 09:10:42AM -0500, Jeff Layton wrote:
> > From: Jeff Layton <jlayton@redhat.com>
> > 
> > The rationale for taking the i_lock when incrementing this value is
> > lost in antiquity. The readers of the field don't take it (at least
> > not universally), so my assumption is that it was only done here to
> > serialize incrementors.
> > 
> > If that is indeed the case, then we can drop the i_lock from this
> > codepath and treat it as an atomic64_t for the purposes of
> > incrementing it. This allows us to use inode_inc_iversion without
> > any danger of lock inversion.
> > 
> > Note that the read side is not fetched atomically with this change.
> > The assumption here is that that is not a critical issue since the
> > i_version is not fully synchronized with anything else anyway.
> 
> So I guess it's theoretically possible that e.g. if you read while it's
> incrementing from 2^32-1 to 2^32 you could read 0, 1, or 2^32+1?
>
> If so then you could see an i_version value reused and incorrectly
> decide that a file hadn't changed.
> 
> But it's such a tiny case, and I think you convert this to atomic64_t
> later anyway, so, whatever.
> 
> --b.
> 

Shrug...we have that problem with the spinlock in place too. The bottom
line is that reads of this value are not serialized with the increment
at all.

I'm not 100% thrilled with this patch, but I think it's probably better
not to add the i_lock all over the place, even as an interim step in
cleaning this stuff up.

The good news here (as you mention) is that this nastiness gets cleaned
up in the last patch when we convert the thing to an atomic64_t.
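
As a userspace analogy for that end state (not the code from the last patch
in the series), the sketch below declares the counter as a C11 atomic so that
both the increment and the read go through atomic accessors. The struct and
function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical inode whose version field is atomic by declaration. */
struct fake_inode_v2 {
	_Atomic uint64_t i_version;
};

/* Increment without a lock and without casting a plain field. */
static inline void fake_inc_iversion(struct fake_inode_v2 *inode)
{
	atomic_fetch_add_explicit(&inode->i_version, 1, memory_order_relaxed);
}

/* Read the whole 64-bit value in one atomic load, so it cannot tear. */
static inline uint64_t fake_peek_iversion(struct fake_inode_v2 *inode)
{
	return atomic_load_explicit(&inode->i_version, memory_order_relaxed);
}
```

With the field typed as atomic, a reader crossing the 2^32 boundary sees
either the old value or the new one, never a mix of halves.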

J. Bruce Fields Jan. 19, 2018, 2:43 p.m. UTC | #4
On Fri, Jan 19, 2018 at 09:36:34AM -0500, Jeff Layton wrote:
> Shrug...we have that problem with the spinlock in place too. The bottom
> line is that reads of this value are not serialized with the increment
> at all.

OK, so this wouldn't even be a new bug.

> I'm not 100% thrilled with this patch, but I think it's probably better
> not to add the i_lock all over the place, even as an interim step in
> cleaning this stuff up.

Makes sense to me.

I've got no comments on the rest of the series, except that I'm all for
it.

Thanks for persisting--it turned out to be more involved than I'd
imagined!

--b.

Patch

diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index d09cc3a08740..5ad9eaa3a9b0 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -104,12 +104,13 @@  inode_set_iversion_queried(struct inode *inode, u64 new)
 static inline bool
 inode_maybe_inc_iversion(struct inode *inode, bool force)
 {
-	spin_lock(&inode->i_lock);
-	inode->i_version++;
-	spin_unlock(&inode->i_lock);
+	atomic64_t *ivp = (atomic64_t *)&inode->i_version;
+
+	atomic64_inc(ivp);
 	return true;
 }
 
+
 /**
  * inode_inc_iversion - forcibly increment i_version
  * @inode: inode that needs to be updated