Message ID | 1528801890-25839-1-git-send-email-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue 12-06-18 14:11:30, Amir Goldstein wrote: > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > > Linus, > > I tried to find some space to cram i_generation in case > I get rid of i_fsnotify_mask, but stumbled on a possible > "lossless compression". > > Any caveats with this change? One concern I'd have is that i_blkbits is used relatively frequently (e.g. for block device access) and 1-byte fetch can be slower than 4-byte fetch on some architectures if I'm not mistaken. I'm not sure how big problem that would be through. But if you're looking at reducing struct inode size, it appears we can save one long by reorganizing struct address_space - there's currently 4-byte hole after i_mmap_writeable and after wb_err. Honza > include/linux/fs.h | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 760d8da1b6c7..6d0489613dc1 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -274,6 +274,7 @@ struct writeback_control; > > /* > * Write life time hint values. > + * Stored in struct inode as u8. > */ > enum rw_hint { > WRITE_LIFE_NOT_SET = 0, > @@ -607,8 +608,8 @@ struct inode { > struct timespec i_ctime; > spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ > unsigned short i_bytes; > - unsigned int i_blkbits; > - enum rw_hint i_write_hint; > + u8 i_blkbits; > + u8 i_write_hint; > blkcnt_t i_blocks; > > #ifdef __NEED_I_SIZE_ORDERED > -- > 2.7.4 >
On Tue, Jun 12, 2018 at 04:24:20PM +0200, Jan Kara wrote: > On Tue 12-06-18 14:11:30, Amir Goldstein wrote: > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > > > Linus, > > > > I tried to find some space to cram i_generation in case > > I get rid of i_fsnotify_mask, but stumbled on a possible > > "lossless compression". > > > > Any caveats with this change? > > One concern I'd have is that i_blkbits is used relatively frequently (e.g. > for block device access) and 1-byte fetch can be slower than 4-byte fetch > on some architectures if I'm not mistaken. I'm not sure how big problem > that would be through. > > But if you're looking at reducing struct inode size, it appears we can save > one long by reorganizing struct address_space - there's currently 4-byte > hole after i_mmap_writeable and after wb_err. I already have something along those lines in the XArray series. http://git.infradead.org/users/willy/linux-dax.git/commitdiff/527c1696148fd3044866dd56991bbb51c24c0d7b
On Tue, Jun 12, 2018 at 5:24 PM, Jan Kara <jack@suse.cz> wrote: > On Tue 12-06-18 14:11:30, Amir Goldstein wrote: >> Signed-off-by: Amir Goldstein <amir73il@gmail.com> >> --- >> >> Linus, >> >> I tried to find some space to cram i_generation in case >> I get rid of i_fsnotify_mask, but stumbled on a possible >> "lossless compression". >> >> Any caveats with this change? > > One concern I'd have is that i_blkbits is used relatively frequently (e.g. > for block device access) and 1-byte fetch can be slower than 4-byte fetch > on some architectures if I'm not mistaken. I'm not sure how big problem > that would be through. That's what I suspected. However, if using 1-byte access for i_write_hint is non controversial(?) then by re-arranging i_write_hint, I get the 4-byte hole that I need to move i_generation into (in place of i_write_hint): unsigned short i_bytes; + u8 i_write_hint; unsigned int i_blkbits; - enum rw_hint i_write_hint; blkcnt_t i_blocks; > > But if you're looking at reducing struct inode size, it appears we can save > one long by reorganizing struct address_space - there's currently 4-byte > hole after i_mmap_writeable and after wb_err. > That is not my goal, I just wanted to ask a naiive question, but seems that Matthew already has the struct address_space hole covered.. Thanks, Amir.
On Tue, Jun 12, 2018 at 7:38 AM Amir Goldstein <amir73il@gmail.com> wrote: > > That's what I suspected. However, if using 1-byte access for i_write_hint > is non controversial(?) then by re-arranging i_write_hint, I get the 4-byte > hole that I need to move i_generation into (in place of i_write_hint): > > unsigned short i_bytes; > + u8 i_write_hint; > unsigned int i_blkbits; > - enum rw_hint i_write_hint; > blkcnt_t i_blocks; Yeah, we shouldn't use 'enum' in structs anyway, since the size isn't well-defined. It depends on compiler flags etc. But using "__attribute__ ((__packed__))" for an enum like this is actually fine. Then you still get the type checking (not that there's much of it for enums, really), but I think gcc will act as if "-fshort-enums" was passed for that one case. In this case, I think "u8" is fine. And yes, byte accesses could be a cycle longer due to extra needed zero extension (or masking/shifting, for crazy architectures), but honestly, we'll never see that. Making 'struct inode' smaller is likely a bigger win (even if it might not be very noticeable _either, since allocation issues etc might make it not very noticeable). Linus
On Tue, Jun 12, 2018 at 7:31 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Jun 12, 2018 at 7:38 AM Amir Goldstein <amir73il@gmail.com> wrote: >> >> That's what I suspected. However, if using 1-byte access for i_write_hint >> is non controversial(?) then by re-arranging i_write_hint, I get the 4-byte >> hole that I need to move i_generation into (in place of i_write_hint): >> >> unsigned short i_bytes; >> + u8 i_write_hint; >> unsigned int i_blkbits; >> - enum rw_hint i_write_hint; >> blkcnt_t i_blocks; > > Yeah, we shouldn't use 'enum' in structs anyway, since the size isn't > well-defined. It depends on compiler flags etc. > > But using "__attribute__ ((__packed__))" for an enum like this is > actually fine. Then you still get the type checking (not that there's > much of it for enums, really), but I think gcc will act as if > "-fshort-enums" was passed for that one case. > > In this case, I think "u8" is fine. And yes, byte accesses could be a > cycle longer due to extra needed zero extension (or masking/shifting, > for crazy architectures), but honestly, we'll never see that. Making > 'struct inode' smaller is likely a bigger win (even if it might not be > very noticeable _either, since allocation issues etc might make it not > very noticeable). > Just to make sure I understand correctly, do you agree with Jan w.r.t original patch that we should leave i_blkbits int? Or do you think that shaving 8 bytes off of struct inode for both 64bit and 32bit arch is a bigger win? If latter, then I guess you can just take the patch as is. Thanks, Amir.
On Tue, Jun 12, 2018 at 10:50 AM Amir Goldstein <amir73il@gmail.com> wrote: > > Just to make sure I understand correctly, do you agree with Jan > w.r.t original patch that we should leave i_blkbits int? I think we could make i_blkbits be "u8" too. It really isn't that slow to load a byte, and i_blkbits isn't that critical. In fact, most uses of "i_blkbits" are (not surprisingly) for shifting stuff, which means that on at least x86, you don't even have any zero-extension issues. The x86 shift instructions only look at the low 6 bits of the count anyway, and any competent compiler will know that and could skip the zero extension if it's a problem. Of course, it so happens that gcc seems to use "movzbl mem,%ecx" in my trivial test, which is either because gcc is stupid or because gcc is smart knows that movzbl is actually mostly faster than doing "movb mem,%cl" (possibly due to partial register write stalls, for example). Linus
diff --git a/include/linux/fs.h b/include/linux/fs.h index 760d8da1b6c7..6d0489613dc1 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -274,6 +274,7 @@ struct writeback_control; /* * Write life time hint values. + * Stored in struct inode as u8. */ enum rw_hint { WRITE_LIFE_NOT_SET = 0, @@ -607,8 +608,8 @@ struct inode { struct timespec i_ctime; spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */ unsigned short i_bytes; - unsigned int i_blkbits; - enum rw_hint i_write_hint; + u8 i_blkbits; + u8 i_write_hint; blkcnt_t i_blocks; #ifdef __NEED_I_SIZE_ORDERED
Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- Linus, I tried to find some space to cram i_generation in case I get rid of i_fsnotify_mask, but stumbled on a possible "lossless compression". Any caveats with this change? Thanks, Amir. include/linux/fs.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)