diff mbox series

nfs: round down reported block numbers in statfs

Message ID 20200910200644.8165-1-fllinden@amazon.com (mailing list archive)
State New, archived
Headers show
Series nfs: round down reported block numbers in statfs | expand

Commit Message

Frank van der Linden Sept. 10, 2020, 8:06 p.m. UTC
nfs_statfs rounds up the numbers of blocks as computed
from the numbers the server return values.

This works out well if the client block size, which is
the same as wrsize, is smaller than or equal to the actual
filesystem block size on the server.

But, for NFS4, the size is usually larger (1M), meaning
that statfs reports more free space than actually is
available. This confuses, for example, fstest generic/103.

Given a choice between reporting too much or too little
space, the latter is the safer option, so don't round
up the number of blocks. This also simplifies the code.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/super.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

Comments

Frank van der Linden Sept. 10, 2020, 8:31 p.m. UTC | #1
On Thu, Sep 10, 2020 at 08:06:44PM +0000, Frank van der Linden wrote:
> nfs_statfs rounds up the numbers of blocks as computed
> from the numbers the server return values.
> 
> This works out well if the client block size, which is
> the same as wrsize, is smaller than or equal to the actual
> filesystem block size on the server.
> 
> But, for NFS4, the size is usually larger (1M), meaning
> that statfs reports more free space than actually is
> available. This confuses, for example, fstest generic/103.
> 
> Given a choice between reporting too much or too little
> space, the latter is the safer option, so don't round
> up the number of blocks. This also simplifies the code.
> 
> Signed-off-by: Frank van der Linden <fllinden@amazon.com>

I doubted whether I should send this in as an RFC, since this
is one of those things that might generate more discussion.

In any case, let me add some details here:

generic/103 is a test that sees if adding an xattr to a full
filesystem correctly returns ENOSPC. To achieve that, it gets
the number of free blocks (f_bavail), uses fallocate to allocate that
space minus some slack (512k), and then fills it up with 64k-sized
xattrs.

For NFS (4.2) this fails, because the filesystem rounds the free
blocks up to f_bsize (1M). So even f_bavail minus 512k can
be more than is actually available. The fallocate fails, and
the test fails before it gets to the xattr part.

Other client implementations simply use the lowest common
denominator for f_bsize (512), so the space reporting always
works out. But since wrsize is used here, you have to make
a choice between rounding up or rounding down, and the latter
seems safer.

Sure, all the caveats apply: the stats are just a snapshot, applications
shouldn't rely on the exact data, etc, but I think the xfstest
in question is a decent example of why rounding down is a bit better.

It also simplifies the code.

- Frank
kernel test robot Sept. 11, 2020, 4:32 a.m. UTC | #2
Hi Frank,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on nfs/linux-next]
[also build test ERROR on v5.9-rc4 next-20200910]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Frank-van-der-Linden/nfs-round-down-reported-block-numbers-in-statfs/20200911-040903
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: microblaze-nommu_defconfig (attached as .config)
compiler: microblaze-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=microblaze 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   microblaze-linux-ld: fs/nfs/super.o: in function `nfs_statfs':
>> (.text+0x140): undefined reference to `__udivdi3'
>> microblaze-linux-ld: (.text+0x168): undefined reference to `__udivdi3'
   microblaze-linux-ld: (.text+0x190): undefined reference to `__udivdi3'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
kernel test robot Sept. 11, 2020, 4:36 a.m. UTC | #3
Hi Frank,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on nfs/linux-next]
[also build test ERROR on v5.9-rc4 next-20200910]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Frank-van-der-Linden/nfs-round-down-reported-block-numbers-in-statfs/20200911-040903
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: riscv-rv32_defconfig (attached as .config)
compiler: riscv32-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=riscv 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   riscv32-linux-ld: fs/nfs/super.o: in function `nfs_statfs':
   super.c:(.text+0x204): undefined reference to `__udivdi3'
>> riscv32-linux-ld: super.c:(.text+0x220): undefined reference to `__udivdi3'
   riscv32-linux-ld: super.c:(.text+0x23c): undefined reference to `__udivdi3'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Frank van der Linden Sept. 11, 2020, 9:59 p.m. UTC | #4
On Fri, Sep 11, 2020 at 12:36:31PM +0800, kernel test robot wrote:
> 
> 
> Hi Frank,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on nfs/linux-next]
> [also build test ERROR on v5.9-rc4 next-20200910]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/Frank-van-der-Linden/nfs-round-down-reported-block-numbers-in-statfs/20200911-040903
> base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
> config: riscv-rv32_defconfig (attached as .config)
> compiler: riscv32-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=riscv
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
>    riscv32-linux-ld: fs/nfs/super.o: in function `nfs_statfs':
>    super.c:(.text+0x204): undefined reference to `__udivdi3'
> >> riscv32-linux-ld: super.c:(.text+0x220): undefined reference to `__udivdi3'
>    riscv32-linux-ld: super.c:(.text+0x23c): undefined reference to `__udivdi3'
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

Ah yeah, forgot about 64bit divides and 32bit archs..

Will send v2 with div64_u64.

- Frank
Trond Myklebust Sept. 16, 2020, 3:14 p.m. UTC | #5
On Thu, 2020-09-10 at 20:06 +0000, Frank van der Linden wrote:
> nfs_statfs rounds up the numbers of blocks as computed
> from the numbers the server return values.
> 
> This works out well if the client block size, which is
> the same as wrsize, is smaller than or equal to the actual
> filesystem block size on the server.
> 
> But, for NFS4, the size is usually larger (1M), meaning
> that statfs reports more free space than actually is
> available. This confuses, for example, fstest generic/103.
> 
> Given a choice between reporting too much or too little
> space, the latter is the safer option, so don't round
> up the number of blocks. This also simplifies the code.
> 
> Signed-off-by: Frank van der Linden <fllinden@amazon.com>
> ---
>  fs/nfs/super.c | 25 ++++---------------------
>  1 file changed, 4 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 7a70287f21a2..45d368a5cc95 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -217,8 +217,6 @@ EXPORT_SYMBOL_GPL(nfs_client_for_each_server);
>  int nfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  {
>  	struct nfs_server *server = NFS_SB(dentry->d_sb);
> -	unsigned char blockbits;
> -	unsigned long blockres;
>  	struct nfs_fh *fh = NFS_FH(d_inode(dentry));
>  	struct nfs_fsstat res;
>  	int error = -ENOMEM;
> @@ -241,26 +239,11 @@ int nfs_statfs(struct dentry *dentry, struct
> kstatfs *buf)
>  
>  	buf->f_type = NFS_SUPER_MAGIC;
>  
> -	/*
> -	 * Current versions of glibc do not correctly handle the
> -	 * case where f_frsize != f_bsize.  Eventually we want to
> -	 * report the value of wtmult in this field.
> -	 */
> -	buf->f_frsize = dentry->d_sb->s_blocksize;

NACK. This comment and explicit setting is there to document why we're
not propagating the true value of the filesystem block size. Please do
not remove it.

> -
> -	/*
> -	 * On most *nix systems, f_blocks, f_bfree, and f_bavail
> -	 * are reported in units of f_frsize.  Linux hasn't had
> -	 * an f_frsize field in its statfs struct until recently,
> -	 * thus historically Linux's sys_statfs reports these
> -	 * fields in units of f_bsize.
> -	 */
>  	buf->f_bsize = dentry->d_sb->s_blocksize;
> -	blockbits = dentry->d_sb->s_blocksize_bits;
> -	blockres = (1 << blockbits) - 1;
> -	buf->f_blocks = (res.tbytes + blockres) >> blockbits;
> -	buf->f_bfree = (res.fbytes + blockres) >> blockbits;
> -	buf->f_bavail = (res.abytes + blockres) >> blockbits;
> +
> +	buf->f_blocks = res.tbytes / buf->f_bsize;
> +	buf->f_bfree = res.fbytes / buf->f_bsize;
> +	buf->f_bavail = res.abytes / buf->f_bsize;
>  
>  	buf->f_files = res.tfiles;
>  	buf->f_ffree = res.afiles;

As far as I can tell, all we're doing here is changing rounding up to
rounding down, since dentry->d_sb->s_blocksize should always equal to
(1 << dentry->d_sb->s_blocksize_bits).

Otherwise, switching from a shift to division seems like it is just
making the calculation less efficient. Why?
Frank van der Linden Sept. 16, 2020, 4:35 p.m. UTC | #6
On Wed, Sep 16, 2020 at 03:14:52PM +0000, Trond Myklebust wrote:
> On Thu, 2020-09-10 at 20:06 +0000, Frank van der Linden wrote:
> > nfs_statfs rounds up the numbers of blocks as computed
> > from the numbers the server return values.
> >
> > This works out well if the client block size, which is
> > the same as wrsize, is smaller than or equal to the actual
> > filesystem block size on the server.
> >
> > But, for NFS4, the size is usually larger (1M), meaning
> > that statfs reports more free space than actually is
> > available. This confuses, for example, fstest generic/103.
> >
> > Given a choice between reporting too much or too little
> > space, the latter is the safer option, so don't round
> > up the number of blocks. This also simplifies the code.
> >
> > Signed-off-by: Frank van der Linden <fllinden@amazon.com>
> > ---
> >  fs/nfs/super.c | 25 ++++---------------------
> >  1 file changed, 4 insertions(+), 21 deletions(-)
> >
> > diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> > index 7a70287f21a2..45d368a5cc95 100644
> > --- a/fs/nfs/super.c
> > +++ b/fs/nfs/super.c
> > @@ -217,8 +217,6 @@ EXPORT_SYMBOL_GPL(nfs_client_for_each_server);
> >  int nfs_statfs(struct dentry *dentry, struct kstatfs *buf)
> >  {
> >       struct nfs_server *server = NFS_SB(dentry->d_sb);
> > -     unsigned char blockbits;
> > -     unsigned long blockres;
> >       struct nfs_fh *fh = NFS_FH(d_inode(dentry));
> >       struct nfs_fsstat res;
> >       int error = -ENOMEM;
> > @@ -241,26 +239,11 @@ int nfs_statfs(struct dentry *dentry, struct
> > kstatfs *buf)
> >
> >       buf->f_type = NFS_SUPER_MAGIC;
> >
> > -     /*
> > -      * Current versions of glibc do not correctly handle the
> > -      * case where f_frsize != f_bsize.  Eventually we want to
> > -      * report the value of wtmult in this field.
> > -      */
> > -     buf->f_frsize = dentry->d_sb->s_blocksize;
> 
> NACK. This comment and explicit setting is there to document why we're
> not propagating the true value of the filesystem block size. Please do
> not remove it.

It's a comment from 2006, which is a bit outdated. wtmult is an NFSv3
value, and glibc hasn't had that issue for a while. Maybe the comment
should be updated? I'm happy to not touch it, though - will leave that
change out.

> 
> > -
> > -     /*
> > -      * On most *nix systems, f_blocks, f_bfree, and f_bavail
> > -      * are reported in units of f_frsize.  Linux hasn't had
> > -      * an f_frsize field in its statfs struct until recently,
> > -      * thus historically Linux's sys_statfs reports these
> > -      * fields in units of f_bsize.
> > -      */
> >       buf->f_bsize = dentry->d_sb->s_blocksize;
> > -     blockbits = dentry->d_sb->s_blocksize_bits;
> > -     blockres = (1 << blockbits) - 1;
> > -     buf->f_blocks = (res.tbytes + blockres) >> blockbits;
> > -     buf->f_bfree = (res.fbytes + blockres) >> blockbits;
> > -     buf->f_bavail = (res.abytes + blockres) >> blockbits;
> > +
> > +     buf->f_blocks = res.tbytes / buf->f_bsize;
> > +     buf->f_bfree = res.fbytes / buf->f_bsize;
> > +     buf->f_bavail = res.abytes / buf->f_bsize;
> >
> >       buf->f_files = res.tfiles;
> >       buf->f_ffree = res.afiles;
> 
> As far as I can tell, all we're doing here is changing rounding up to
> rounding down, since dentry->d_sb->s_blocksize should always equal to
> (1 << dentry->d_sb->s_blocksize_bits).
> 
> Otherwise, switching from a shift to division seems like it is just
> making the calculation less efficient. Why?

Well, my thinking here was that using a straight division made the code
simpler, and it's not a code path that is performance-sensitive.

But, I forgot about needing a div64 macro for 32bit archs anyway, which
kind of undoes the 'make the code simpler' argument a bit..

I'll change it to just rounding down (e.g. remove blockres from the
equation).

Thanks,

- Frank
diff mbox series

Patch

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 7a70287f21a2..45d368a5cc95 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -217,8 +217,6 @@  EXPORT_SYMBOL_GPL(nfs_client_for_each_server);
 int nfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct nfs_server *server = NFS_SB(dentry->d_sb);
-	unsigned char blockbits;
-	unsigned long blockres;
 	struct nfs_fh *fh = NFS_FH(d_inode(dentry));
 	struct nfs_fsstat res;
 	int error = -ENOMEM;
@@ -241,26 +239,11 @@  int nfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 
 	buf->f_type = NFS_SUPER_MAGIC;
 
-	/*
-	 * Current versions of glibc do not correctly handle the
-	 * case where f_frsize != f_bsize.  Eventually we want to
-	 * report the value of wtmult in this field.
-	 */
-	buf->f_frsize = dentry->d_sb->s_blocksize;
-
-	/*
-	 * On most *nix systems, f_blocks, f_bfree, and f_bavail
-	 * are reported in units of f_frsize.  Linux hasn't had
-	 * an f_frsize field in its statfs struct until recently,
-	 * thus historically Linux's sys_statfs reports these
-	 * fields in units of f_bsize.
-	 */
 	buf->f_bsize = dentry->d_sb->s_blocksize;
-	blockbits = dentry->d_sb->s_blocksize_bits;
-	blockres = (1 << blockbits) - 1;
-	buf->f_blocks = (res.tbytes + blockres) >> blockbits;
-	buf->f_bfree = (res.fbytes + blockres) >> blockbits;
-	buf->f_bavail = (res.abytes + blockres) >> blockbits;
+
+	buf->f_blocks = res.tbytes / buf->f_bsize;
+	buf->f_bfree = res.fbytes / buf->f_bsize;
+	buf->f_bavail = res.abytes / buf->f_bsize;
 
 	buf->f_files = res.tfiles;
 	buf->f_ffree = res.afiles;