
fs: switch timespec64 fields in inode to discrete integers

Message ID 20240517-amtime-v1-1-7b804ca4be8f@kernel.org (mailing list archive)
State New
Series fs: switch timespec64 fields in inode to discrete integers

Commit Message

Jeff Layton May 18, 2024, 12:08 a.m. UTC
Adjacent struct timespec64's pack poorly. Switch them to discrete
integer fields for the seconds and nanoseconds. This shaves 8 bytes off
struct inode with a garden-variety Fedora Kconfig on x86_64, but that
also moves the i_lock into the previous cacheline, away from the fields
it protects.

To remedy that, move i_generation above the i_lock, which moves the new
4-byte hole to just after the i_fsnotify_mask in my setup. Amir has
plans to use that to expand the i_fsnotify_mask, so add a comment to
that effect as well.
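
For anyone counting, here is a minimal userspace sketch of the layout
arithmetic (stand-in typedefs, assuming the x86_64 ABI; an illustration,
not kernel code):

#include <stdio.h>

/* Stand-ins for the kernel types on x86_64. */
typedef long long time64_t;
typedef unsigned int u32;

struct timespec64 {
	time64_t tv_sec;	/* 8 bytes */
	long	 tv_nsec;	/* 8 bytes here, though < 10^9 fits in 4 */
};

/* Three adjacent timespec64s: 3 * 16 = 48 bytes. */
struct before { struct timespec64 atime, mtime, ctime; };

/* Discrete fields, with a u32 i_generation filling the tail: 40 bytes. */
struct after {
	time64_t a_sec, m_sec, c_sec;
	u32 a_nsec, m_nsec, c_nsec, gen;
};

int main(void)
{
	printf("before: %zu, after: %zu\n",
	       sizeof(struct before), sizeof(struct after));	/* 48, 40 */
	return 0;
}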

Cc: Amir Goldstein <amir73il@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
For reference (according to pahole):

    Before:	/* size: 624, cachelines: 10, members: 53 */
    After: 	/* size: 616, cachelines: 10, members: 56 */

I've done some lightweight testing with this and it seems fine, but I
wouldn't mind seeing it sit in -next for a bit to make sure I haven't
missed anything.
---
 include/linux/fs.h | 48 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 16 deletions(-)


---
base-commit: 7ee332c9f12bc5b380e36919cd7d056592a7073f
change-id: 20240517-amtime-68a7ebc76f45

Best regards,
-- 
Jeff Layton <jlayton@kernel.org>

Comments

Matthew Wilcox May 18, 2024, 5:23 a.m. UTC | #1
On Fri, May 17, 2024 at 08:08:40PM -0400, Jeff Layton wrote:
> For reference (according to pahole):
> 
>     Before:	/* size: 624, cachelines: 10, members: 53 */
>     After: 	/* size: 616, cachelines: 10, members: 56 */

Smaller is always better, but for a meaningful improvement, we'd want
to see more.  On my laptop running a Debian 6.6.15 kernel, I see:

inode_cache        11398  11475    640   25    4 : tunables    0    0    0 : slabdata    459    459      0

so there's 25 inodes per 4 pages.  Going down to 632 is still 25 per 4
pages.  At 628 bytes, we get 26 per 4 pages.  At 604 bytes, we're at 27.
And at 584 bytes, we get 28.
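
(The arithmetic behind those thresholds, assuming 4 KiB pages and
ignoring per-object slab metadata — a simplification of what SLUB
actually computes:)

#include <stdio.h>

int main(void)
{
	const unsigned long slab_bytes = 4 * 4096;	/* 4 pages per slab */
	const unsigned int obj_size[] = { 640, 632, 628, 604, 584 };

	for (int i = 0; i < 5; i++)
		printf("objsize %u -> %lu objects per slab\n",
		       obj_size[i], slab_bytes / obj_size[i]);
	return 0;
}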

Of course, struct inode gets embedded in a lot of filesystem inodes.
xfs_inode         142562 142720   1024   32    8 : tunables    0    0    0 : slabdata   4460   4460      0
ext4_inode_cache      81     81   1184   27    8 : tunables    0    0    0 : slabdata      3      3      0
sock_inode_cache    2123   2223    832   39    8 : tunables    0    0    0 : slabdata     57     57      0

So any of them might cross a magic boundary where we suddenly get more
objects per slab.

Not trying to diss the work you've done here, just pointing out the
limits for anyone who's trying to do something similar.  Or maybe
inspire someone to do more reductions ;-)
Jeff Layton May 18, 2024, 10:48 a.m. UTC | #2
On Sat, 2024-05-18 at 06:23 +0100, Matthew Wilcox wrote:
> On Fri, May 17, 2024 at 08:08:40PM -0400, Jeff Layton wrote:
> > For reference (according to pahole):
> > 
> >     Before:	/* size: 624, cachelines: 10, members: 53 */
> >     After: 	/* size: 616, cachelines: 10, members: 56 */
> 
> Smaller is always better, but for a meaningful improvement, we'd want
> to see more.  On my laptop running a Debian 6.6.15 kernel, I see:
> 
> inode_cache        11398  11475    640   25    4 : tunables    0    0    0 : slabdata    459    459      0
> 
> so there's 25 inodes per 4 pages.  Going down to 632 is still 25 per 4
> pages.  At 628 bytes, we get 26 per 4 pages.  At 604 bytes, we're at 27.
> And at 584 bytes, we get 28.
> 
> Of course, struct inode gets embedded in a lot of filesystem inodes.
> xfs_inode         142562 142720   1024   32    8 : tunables    0    0    0 : slabdata   4460   4460      0
> ext4_inode_cache      81     81   1184   27    8 : tunables    0    0    0 : slabdata      3      3      0
> sock_inode_cache    2123   2223    832   39    8 : tunables    0    0    0 : slabdata     57     57      0
> 
> So any of them might cross a magic boundary where we suddenly get more
> objects per slab.
> 
> Not trying to diss the work you've done here, just pointing out the
> limits for anyone who's trying to do something similar.  Or maybe
> inspire someone to do more reductions ;-)

Way to bust my bubble, Willy. ;-)

On a more serious note, I may be able to squeeze out another 4 bytes by
moving __i_ctime to a single 8-byte word. It's never settable from
userland, so we probably don't need the full range of times that a
timespec64 gives us there. Shrinking that may also make the multigrain
time rework simpler.
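
(For illustration only — one hypothetical encoding, not something this
patch does: 34 bits of seconds covers 1970 through roughly the year
2514, and 30 bits holds any nanosecond count below 10^9, so both fit in
a single 64-bit word.)

#include <stdint.h>

#define CTIME_NSEC_BITS	30
#define CTIME_NSEC_MASK	((UINT64_C(1) << CTIME_NSEC_BITS) - 1)

/* Hypothetical: pack seconds and nanoseconds into one 64-bit word. */
static inline uint64_t ctime_pack(uint64_t sec, uint32_t nsec)
{
	return (sec << CTIME_NSEC_BITS) | (nsec & CTIME_NSEC_MASK);
}

static inline uint64_t ctime_sec(uint64_t packed)
{
	return packed >> CTIME_NSEC_BITS;
}

static inline uint32_t ctime_nsec(uint64_t packed)
{
	return (uint32_t)(packed & CTIME_NSEC_MASK);
}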

David Howells was also looking at removing the i_private field.
Since these structs are usually embedded in a larger structure, it's
not clear that we need that field. If we can make that work, it'll mean
another 8 bytes goes away on 64-bit arches.

IOW, I think there may be some other opportunities for shrinkage in the
future.
Linus Torvalds May 18, 2024, 4:03 p.m. UTC | #3
On Fri, 17 May 2024 at 22:23, Matthew Wilcox <willy@infradead.org> wrote:
>
> Smaller is always better, but for a meaningful improvement, we'd want
> to see more.

I think one of the more interesting metrics for inodes is actually not
necessarily size per se, but cache footprint.

A *lot* of the inode is never actually touched in normal operation.
Inodes have all these fields that are only used for certain types, or
perhaps only for IO.

So inodes are big, but more important than shrinking them is likely to
try to make them dense in the cache for normal operations (ie
open/close/stat in particular). They cache very well, and actual
memory use - while still somewhat relevant - is less relevant than
cache misses.

             Linus
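
(A userspace way to eyeball Linus's point, assuming 64-byte cachelines:
what matters is which fields share the hot line, not the total size.
The struct below is a toy, not the real inode layout.)

#include <stdio.h>
#include <stddef.h>

struct toy_inode {
	unsigned int	i_mode;			/* hot: permission checks */
	long long	i_size;			/* hot: stat() */
	char		io_and_quota[384];	/* cold: only touched for IO */
	unsigned int	i_generation;		/* pushed onto a later line */
};

#define CACHELINE_OF(type, member)	(offsetof(type, member) / 64)

int main(void)
{
	printf("i_size       -> cacheline %zu\n",
	       CACHELINE_OF(struct toy_inode, i_size));
	printf("i_generation -> cacheline %zu\n",
	       CACHELINE_OF(struct toy_inode, i_generation));
	return 0;
}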
Jan Kara May 20, 2024, 10:43 a.m. UTC | #4
On Fri 17-05-24 20:08:40, Jeff Layton wrote:
> Adjacent struct timespec64's pack poorly. Switch them to discrete
> integer fields for the seconds and nanoseconds. This shaves 8 bytes off
> struct inode with a garden-variety Fedora Kconfig on x86_64, but that
> also moves the i_lock into the previous cacheline, away from the fields
> it protects.
> 
> To remedy that, move i_generation above the i_lock, which moves the new
> 4-byte hole to just after the i_fsnotify_mask in my setup. Amir has
> plans to use that to expand the i_fsnotify_mask, so add a comment to
> that effect as well.
> 
> Cc: Amir Goldstein <amir73il@gmail.com>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

I like this! Although, as Matthew points out, it possibly won't result
in less memory being used by the slab cache in the end, it still makes
the struct more dense, and if nothing else it gives us more space for
future growth ;).  So feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> For reference (according to pahole):
> 
>     Before:	/* size: 624, cachelines: 10, members: 53 */
>     After: 	/* size: 616, cachelines: 10, members: 56 */
> 
> I've done some lightweight testing with this and it seems fine, but I
> wouldn't mind seeing it sit in -next for a bit to make sure I haven't
> missed anything.
> ---
>  include/linux/fs.h | 48 ++++++++++++++++++++++++++++++++----------------
>  1 file changed, 32 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b7b9b3b79acc..45e8766de7d8 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -660,9 +660,13 @@ struct inode {
>  	};
>  	dev_t			i_rdev;
>  	loff_t			i_size;
> -	struct timespec64	__i_atime;
> -	struct timespec64	__i_mtime;
> -	struct timespec64	__i_ctime; /* use inode_*_ctime accessors! */
> +	time64_t		i_atime_sec;
> +	time64_t		i_mtime_sec;
> +	time64_t		i_ctime_sec;
> +	u32			i_atime_nsec;
> +	u32			i_mtime_nsec;
> +	u32			i_ctime_nsec;
> +	u32			i_generation;
>  	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
>  	unsigned short          i_bytes;
>  	u8			i_blkbits;
> @@ -719,10 +723,10 @@ struct inode {
>  		unsigned		i_dir_seq;
>  	};
>  
> -	__u32			i_generation;
>  
>  #ifdef CONFIG_FSNOTIFY
>  	__u32			i_fsnotify_mask; /* all events this inode cares about */
> +	/* 32-bit hole reserved for expanding i_fsnotify_mask */
>  	struct fsnotify_mark_connector __rcu	*i_fsnotify_marks;
>  #endif
>  
> @@ -1544,23 +1548,27 @@ struct timespec64 inode_set_ctime_current(struct inode *inode);
>  
>  static inline time64_t inode_get_atime_sec(const struct inode *inode)
>  {
> -	return inode->__i_atime.tv_sec;
> +	return inode->i_atime_sec;
>  }
>  
>  static inline long inode_get_atime_nsec(const struct inode *inode)
>  {
> -	return inode->__i_atime.tv_nsec;
> +	return inode->i_atime_nsec;
>  }
>  
>  static inline struct timespec64 inode_get_atime(const struct inode *inode)
>  {
> -	return inode->__i_atime;
> +	struct timespec64 ts = { .tv_sec  = inode_get_atime_sec(inode),
> +				 .tv_nsec = inode_get_atime_nsec(inode) };
> +
> +	return ts;
>  }
>  
>  static inline struct timespec64 inode_set_atime_to_ts(struct inode *inode,
>  						      struct timespec64 ts)
>  {
> -	inode->__i_atime = ts;
> +	inode->i_atime_sec = ts.tv_sec;
> +	inode->i_atime_nsec = ts.tv_nsec;
>  	return ts;
>  }
>  
> @@ -1569,28 +1577,32 @@ static inline struct timespec64 inode_set_atime(struct inode *inode,
>  {
>  	struct timespec64 ts = { .tv_sec  = sec,
>  				 .tv_nsec = nsec };
> +
>  	return inode_set_atime_to_ts(inode, ts);
>  }
>  
>  static inline time64_t inode_get_mtime_sec(const struct inode *inode)
>  {
> -	return inode->__i_mtime.tv_sec;
> +	return inode->i_mtime_sec;
>  }
>  
>  static inline long inode_get_mtime_nsec(const struct inode *inode)
>  {
> -	return inode->__i_mtime.tv_nsec;
> +	return inode->i_mtime_nsec;
>  }
>  
>  static inline struct timespec64 inode_get_mtime(const struct inode *inode)
>  {
> -	return inode->__i_mtime;
> +	struct timespec64 ts = { .tv_sec  = inode_get_mtime_sec(inode),
> +				 .tv_nsec = inode_get_mtime_nsec(inode) };
> +	return ts;
>  }
>  
>  static inline struct timespec64 inode_set_mtime_to_ts(struct inode *inode,
>  						      struct timespec64 ts)
>  {
> -	inode->__i_mtime = ts;
> +	inode->i_mtime_sec = ts.tv_sec;
> +	inode->i_mtime_nsec = ts.tv_nsec;
>  	return ts;
>  }
>  
> @@ -1604,23 +1616,27 @@ static inline struct timespec64 inode_set_mtime(struct inode *inode,
>  
>  static inline time64_t inode_get_ctime_sec(const struct inode *inode)
>  {
> -	return inode->__i_ctime.tv_sec;
> +	return inode->i_ctime_sec;
>  }
>  
>  static inline long inode_get_ctime_nsec(const struct inode *inode)
>  {
> -	return inode->__i_ctime.tv_nsec;
> +	return inode->i_ctime_nsec;
>  }
>  
>  static inline struct timespec64 inode_get_ctime(const struct inode *inode)
>  {
> -	return inode->__i_ctime;
> +	struct timespec64 ts = { .tv_sec  = inode_get_ctime_sec(inode),
> +				 .tv_nsec = inode_get_ctime_nsec(inode) };
> +
> +	return ts;
>  }
>  
>  static inline struct timespec64 inode_set_ctime_to_ts(struct inode *inode,
>  						      struct timespec64 ts)
>  {
> -	inode->__i_ctime = ts;
> +	inode->i_ctime_sec = ts.tv_sec;
> +	inode->i_ctime_nsec = ts.tv_nsec;
>  	return ts;
>  }
>  
> 
> ---
> base-commit: 7ee332c9f12bc5b380e36919cd7d056592a7073f
> change-id: 20240517-amtime-68a7ebc76f45
> 
> Best regards,
> -- 
> Jeff Layton <jlayton@kernel.org>
>
Christian Brauner May 21, 2024, 2:27 p.m. UTC | #5
On Fri, 17 May 2024 20:08:40 -0400, Jeff Layton wrote:
> Adjacent struct timespec64's pack poorly. Switch them to discrete
> integer fields for the seconds and nanoseconds. This shaves 8 bytes off
> struct inode with a garden-variety Fedora Kconfig on x86_64, but that
> also moves the i_lock into the previous cacheline, away from the fields
> it protects.
> 
> To remedy that, move i_generation above the i_lock, which moves the new
> 4-byte hole to just after the i_fsnotify_mask in my setup. Amir has
> plans to use that to expand the i_fsnotify_mask, so add a comment to
> that effect as well.
> 
> [...]

Let's see what the performance bot thing has to say...

---

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series, allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible, patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/1] fs: switch timespec64 fields in inode to discrete integers
      https://git.kernel.org/vfs/vfs/c/ad401d976810
David Sterba May 21, 2024, 4:08 p.m. UTC | #6
On Sat, May 18, 2024 at 06:48:30AM -0400, Jeff Layton wrote:
> On Sat, 2024-05-18 at 06:23 +0100, Matthew Wilcox wrote:
> > On Fri, May 17, 2024 at 08:08:40PM -0400, Jeff Layton wrote:
> > > For reference (according to pahole):
> > > 
> > >     Before:	/* size: 624, cachelines: 10, members: 53 */
> > >     After: 	/* size: 616, cachelines: 10, members: 56 */
> > 
> > Smaller is always better, but for a meaningful improvement, we'd want
> > to see more.  On my laptop running a Debian 6.6.15 kernel, I see:
> > 
> > inode_cache        11398  11475    640   25    4 : tunables    0    0    0 : slabdata    459    459      0
> > 
> > so there's 25 inodes per 4 pages.  Going down to 632 is still 25 per 4
> > pages.  At 628 bytes, we get 26 per 4 pages.  At 604 bytes, we're at 27.
> > And at 584 bytes, we get 28.
> > 
> > Of course, struct inode gets embedded in a lot of filesystem inodes.
> > xfs_inode         142562 142720   1024   32    8 : tunables    0    0    0 : slabdata   4460   4460      0
> > ext4_inode_cache      81     81   1184   27    8 : tunables    0    0    0 : slabdata      3      3      0
> > sock_inode_cache    2123   2223    832   39    8 : tunables    0    0    0 : slabdata     57     57      0
> > 
> > So any of them might cross a magic boundary where we suddenly get more
> > objects per slab.
> > 
> > Not trying to diss the work you've done here, just pointing out the
> > limits for anyone who's trying to do something similar.  Or maybe
> > inspire someone to do more reductions ;-)
> 
> Way to bust my bubble, Willy. ;-)
> 
> On a more serious note, I may be able to squeeze out another 4 bytes by
> moving __i_ctime to a single 8-byte word. It's never settable from
> userland, so we probably don't need the full range of times that a
> timespec64 gives us there. Shrinking that may also make the multigrain
> time rework simpler.
> 
> David Howells was also looking at removing the i_private field.
> Since these structs are usually embedded in a larger structure, it's
> not clear that we need that field. If we can make that work, it'll mean
> another 8 bytes goes away on 64-bit arches.
> 
> IOW, I think there may be some other opportunities for shrinkage in the
> future.

Incremental shrinking works well; we've managed to get btrfs_inode under
1024 bytes recently, but it took several releases of removing,
reordering, or otherwise optimizing members. It's easier to focus on
what's left there than to keep notes on all the other struct members
that could eventually be optimized.

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index b7b9b3b79acc..45e8766de7d8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -660,9 +660,13 @@  struct inode {
 	};
 	dev_t			i_rdev;
 	loff_t			i_size;
-	struct timespec64	__i_atime;
-	struct timespec64	__i_mtime;
-	struct timespec64	__i_ctime; /* use inode_*_ctime accessors! */
+	time64_t		i_atime_sec;
+	time64_t		i_mtime_sec;
+	time64_t		i_ctime_sec;
+	u32			i_atime_nsec;
+	u32			i_mtime_nsec;
+	u32			i_ctime_nsec;
+	u32			i_generation;
 	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
 	unsigned short          i_bytes;
 	u8			i_blkbits;
@@ -719,10 +723,10 @@  struct inode {
 		unsigned		i_dir_seq;
 	};
 
-	__u32			i_generation;
 
 #ifdef CONFIG_FSNOTIFY
 	__u32			i_fsnotify_mask; /* all events this inode cares about */
+	/* 32-bit hole reserved for expanding i_fsnotify_mask */
 	struct fsnotify_mark_connector __rcu	*i_fsnotify_marks;
 #endif
 
@@ -1544,23 +1548,27 @@  struct timespec64 inode_set_ctime_current(struct inode *inode);
 
 static inline time64_t inode_get_atime_sec(const struct inode *inode)
 {
-	return inode->__i_atime.tv_sec;
+	return inode->i_atime_sec;
 }
 
 static inline long inode_get_atime_nsec(const struct inode *inode)
 {
-	return inode->__i_atime.tv_nsec;
+	return inode->i_atime_nsec;
 }
 
 static inline struct timespec64 inode_get_atime(const struct inode *inode)
 {
-	return inode->__i_atime;
+	struct timespec64 ts = { .tv_sec  = inode_get_atime_sec(inode),
+				 .tv_nsec = inode_get_atime_nsec(inode) };
+
+	return ts;
 }
 
 static inline struct timespec64 inode_set_atime_to_ts(struct inode *inode,
 						      struct timespec64 ts)
 {
-	inode->__i_atime = ts;
+	inode->i_atime_sec = ts.tv_sec;
+	inode->i_atime_nsec = ts.tv_nsec;
 	return ts;
 }
 
@@ -1569,28 +1577,32 @@  static inline struct timespec64 inode_set_atime(struct inode *inode,
 {
 	struct timespec64 ts = { .tv_sec  = sec,
 				 .tv_nsec = nsec };
+
 	return inode_set_atime_to_ts(inode, ts);
 }
 
 static inline time64_t inode_get_mtime_sec(const struct inode *inode)
 {
-	return inode->__i_mtime.tv_sec;
+	return inode->i_mtime_sec;
 }
 
 static inline long inode_get_mtime_nsec(const struct inode *inode)
 {
-	return inode->__i_mtime.tv_nsec;
+	return inode->i_mtime_nsec;
 }
 
 static inline struct timespec64 inode_get_mtime(const struct inode *inode)
 {
-	return inode->__i_mtime;
+	struct timespec64 ts = { .tv_sec  = inode_get_mtime_sec(inode),
+				 .tv_nsec = inode_get_mtime_nsec(inode) };
+	return ts;
 }
 
 static inline struct timespec64 inode_set_mtime_to_ts(struct inode *inode,
 						      struct timespec64 ts)
 {
-	inode->__i_mtime = ts;
+	inode->i_mtime_sec = ts.tv_sec;
+	inode->i_mtime_nsec = ts.tv_nsec;
 	return ts;
 }
 
@@ -1604,23 +1616,27 @@  static inline struct timespec64 inode_set_mtime(struct inode *inode,
 
 static inline time64_t inode_get_ctime_sec(const struct inode *inode)
 {
-	return inode->__i_ctime.tv_sec;
+	return inode->i_ctime_sec;
 }
 
 static inline long inode_get_ctime_nsec(const struct inode *inode)
 {
-	return inode->__i_ctime.tv_nsec;
+	return inode->i_ctime_nsec;
 }
 
 static inline struct timespec64 inode_get_ctime(const struct inode *inode)
 {
-	return inode->__i_ctime;
+	struct timespec64 ts = { .tv_sec  = inode_get_ctime_sec(inode),
+				 .tv_nsec = inode_get_ctime_nsec(inode) };
+
+	return ts;
 }
 
 static inline struct timespec64 inode_set_ctime_to_ts(struct inode *inode,
 						      struct timespec64 ts)
 {
-	inode->__i_ctime = ts;
+	inode->i_ctime_sec = ts.tv_sec;
+	inode->i_ctime_nsec = ts.tv_nsec;
 	return ts;
 }
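
(For context, a hedged sketch of how filesystem code consumes the
accessors above once this lands; the inode helpers are from the patch,
but example_update_times and example_mtime_seconds are hypothetical:)

#include <linux/fs.h>	/* struct inode and the time accessors */

/* Hypothetical helper: stamp mtime and ctime together after a write. */
static inline void example_update_times(struct inode *inode,
					struct timespec64 now)
{
	inode_set_mtime_to_ts(inode, now);
	inode_set_ctime_to_ts(inode, now);
}

/* Reading back: the getters reassemble a struct timespec64 on the fly. */
static inline time64_t example_mtime_seconds(const struct inode *inode)
{
	return inode_get_mtime(inode).tv_sec;
}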