diff mbox series

nfs(5): Update rsize/wsize options

Message ID OSZPR01MB7772841F20140ACC90AA433B88582@OSZPR01MB7772.jpnprd01.prod.outlook.com (mailing list archive)
State New
Headers show
Series nfs(5): Update rsize/wsize options | expand

Commit Message

Seiichi Ikarashi (Fujitsu) Nov. 11, 2024, 7:23 a.m. UTC
The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().

Signed-off-by: Seiichi Ikarashi <s.ikarashi@fujitsu.com>
---
utils/mount/nfs.man | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

Comments

Chuck Lever III Nov. 11, 2024, 2:59 p.m. UTC | #1
> On Nov 11, 2024, at 2:23 AM, Seiichi Ikarashi (Fujitsu) <s.ikarashi@fujitsu.com> wrote:
> 
> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().

I think the behavior changed recently due to a kernel
code change Anna did? That's my recollection.

If you can identify that commit, it would be great
information to add in the patch description here.


> Signed-off-by: Seiichi Ikarashi <s.ikarashi@fujitsu.com>
> ---
> utils/mount/nfs.man | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
> index 233a717..01fa22c 100644
> --- a/utils/mount/nfs.man
> +++ b/utils/mount/nfs.man
> @@ -215,15 +215,18 @@ or smaller than the
> setting. The largest read payload supported by the Linux NFS client
> is 1,048,576 bytes (one megabyte).
> .IP
> -The
> +The allowed
> .B rsize
> -value is a positive integral multiple of 1024.
> +value is a positive integral multiple of
> +.BR PAGE_SIZE ,
> +or a power of two if it is less than
> +.BR PAGE_SIZE .
> Specified
> .B rsize
> values lower than 1024 are replaced with 4096; values larger than
> 1048576 are replaced with 1048576. If a specified value is within the supported
> -range but not a multiple of 1024, it is rounded down to the nearest
> -multiple of 1024.
> +range but not such an allowed value, it is rounded down to the nearest
> +allowed value.
> .IP
> If an
> .B rsize
> @@ -257,16 +260,19 @@ setting. The largest write payload supported by the Linux NFS client
> is 1,048,576 bytes (one megabyte).
> .IP
> Similar to
> -.B rsize
> -, the
> +.BR rsize ,
> +the allowed
> .B wsize
> -value is a positive integral multiple of 1024.
> +value is a positive integral multiple of
> +.BR PAGE_SIZE ,
> +or a power of two if it is less than
> +.BR PAGE_SIZE .
> Specified
> .B wsize
> values lower than 1024 are replaced with 4096; values larger than
> 1048576 are replaced with 1048576. If a specified value is within the supported
> -range but not a multiple of 1024, it is rounded down to the nearest
> -multiple of 1024.
> +range but not such an allowed value, it is rounded down to the nearest
> +allowed value.
> .IP
> If a
> .B wsize

--
Chuck Lever
Seiichi Ikarashi (Fujitsu) Nov. 12, 2024, 1:07 a.m. UTC | #2
> 
> > On Nov 11, 2024, at 2:23 AM, Seiichi Ikarashi (Fujitsu)
> <s.ikarashi@fujitsu.com> wrote:
> >
> > The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> > or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().
> 
> I think the behavior changed recently due to a kernel
> code change Anna did? That's my recollection.
> 
> If you can identify that commit, it would be great
> information to add in the patch description here.

I believe that your mentioned commits are
Commit 940261a ("NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE") and
Commit a60214c ("NFS: Allow very small rsize & wsize again").
Before 940261a, the values were "powers of 2".
After a60214c, they are "multiples of PAGE_SIZE or powers of 2 if < PAGE_SIZE".

I could not find the "multiples of 1024" implementation.
Only the range capping was implemented until Linux v2.1.31,
and the "powers of 2" era started from Linux v2.1.32.

Regards,
Ikarashi


> 
> 
> > Signed-off-by: Seiichi Ikarashi <s.ikarashi@fujitsu.com>
> > ---
> > utils/mount/nfs.man | 24 +++++++++++++++---------
> > 1 file changed, 15 insertions(+), 9 deletions(-)
> >
> > diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
> > index 233a717..01fa22c 100644
> > --- a/utils/mount/nfs.man
> > +++ b/utils/mount/nfs.man
> > @@ -215,15 +215,18 @@ or smaller than the
> > setting. The largest read payload supported by the Linux NFS client
> > is 1,048,576 bytes (one megabyte).
> > .IP
> > -The
> > +The allowed
> > .B rsize
> > -value is a positive integral multiple of 1024.
> > +value is a positive integral multiple of
> > +.BR PAGE_SIZE ,
> > +or a power of two if it is less than
> > +.BR PAGE_SIZE .
> > Specified
> > .B rsize
> > values lower than 1024 are replaced with 4096; values larger than
> > 1048576 are replaced with 1048576. If a specified value is within the
> supported
> > -range but not a multiple of 1024, it is rounded down to the nearest
> > -multiple of 1024.
> > +range but not such an allowed value, it is rounded down to the nearest
> > +allowed value.
> > .IP
> > If an
> > .B rsize
> > @@ -257,16 +260,19 @@ setting. The largest write payload supported by the
> Linux NFS client
> > is 1,048,576 bytes (one megabyte).
> > .IP
> > Similar to
> > -.B rsize
> > -, the
> > +.BR rsize ,
> > +the allowed
> > .B wsize
> > -value is a positive integral multiple of 1024.
> > +value is a positive integral multiple of
> > +.BR PAGE_SIZE ,
> > +or a power of two if it is less than
> > +.BR PAGE_SIZE .
> > Specified
> > .B wsize
> > values lower than 1024 are replaced with 4096; values larger than
> > 1048576 are replaced with 1048576. If a specified value is within the
> supported
> > -range but not a multiple of 1024, it is rounded down to the nearest
> > -multiple of 1024.
> > +range but not such an allowed value, it is rounded down to the nearest
> > +allowed value.
> > .IP
> > If a
> > .B wsize
> 
> --
> Chuck Lever
>
Sebastian Feld Nov. 12, 2024, 11:27 a.m. UTC | #3
On Mon, Nov 11, 2024 at 8:25 AM Seiichi Ikarashi (Fujitsu)
<s.ikarashi@fujitsu.com> wrote:
>
> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().

*facepalm*

How should this work at all in a heterogeneous environment where
pagesizes can be 4k or 64k (ARM)?

IMHO this is a BIG, rsize and wsize should count in 1024 bytes, and
warn if there is no exact match to a page size. Otherwise non-portable
chaos rules.

Sebi
Benjamin Coddington Nov. 12, 2024, 1:56 p.m. UTC | #4
On 12 Nov 2024, at 6:27, Sebastian Feld wrote:

> On Mon, Nov 11, 2024 at 8:25 AM Seiichi Ikarashi (Fujitsu)
> <s.ikarashi@fujitsu.com> wrote:
>>
>> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
>> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().
>
> *facepalm*
>
> How should this work at all in a heterogeneous environment where
> pagesizes can be 4k or 64k (ARM)?
>
> IMHO this is a BIG, rsize and wsize should count in 1024 bytes, and
> warn if there is no exact match to a page size. Otherwise non-portable
> chaos rules.


I'm not following you - is "BIG" an acronym?

Can you explain what you mean by non-portable chaos?  I'm having trouble
seeing the problem.

Ben
Sebastian Feld Nov. 12, 2024, 2:06 p.m. UTC | #5
On Tue, Nov 12, 2024 at 2:56 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 12 Nov 2024, at 6:27, Sebastian Feld wrote:
>
> > On Mon, Nov 11, 2024 at 8:25 AM Seiichi Ikarashi (Fujitsu)
> > <s.ikarashi@fujitsu.com> wrote:
> >>
> >> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> >> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().
> >
> > *facepalm*
> >
> > How should this work at all in a heterogeneous environment where
> > pagesizes can be 4k or 64k (ARM)?
> >
> > IMHO this is a BIG, rsize and wsize should count in 1024 bytes, and
> > warn if there is no exact match to a page size. Otherwise non-portable
> > chaos rules.
>
>
> I'm not following you - is "BIG" an acronym?

I hit the wrong key. I wanted to write "BUG"

>
> Can you explain what you mean by non-portable chaos?  I'm having trouble
> seeing the problem.

x86-only-world-view: There are other platforms like PowerPC or ARM
which can have other page sizes, and even the default page size for a
platform can vary. ARM can do 4k, 64k defaults, servers default to
64k, IOT machines to 4k.
So this is NOT a documentation bug, it's a bug in the code which
should do what the doc says. Not the other way around.

This is a common design problem if engineers only think about
x86-only, and then surprises admins if things go haywire because their
assumption about page size is wrong.

Sebi
Benjamin Coddington Nov. 12, 2024, 2:21 p.m. UTC | #6
On 12 Nov 2024, at 9:06, Sebastian Feld wrote:

> On Tue, Nov 12, 2024 at 2:56 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 12 Nov 2024, at 6:27, Sebastian Feld wrote:
>>
>>> On Mon, Nov 11, 2024 at 8:25 AM Seiichi Ikarashi (Fujitsu)
>>> <s.ikarashi@fujitsu.com> wrote:
>>>>
>>>> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
>>>> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().
>>>
>>> *facepalm*
>>>
>>> How should this work at all in a heterogeneous environment where
>>> pagesizes can be 4k or 64k (ARM)?
>>>
>>> IMHO this is a BIG, rsize and wsize should count in 1024 bytes, and
>>> warn if there is no exact match to a page size. Otherwise non-portable
>>> chaos rules.
>>
>>
>> I'm not following you - is "BIG" an acronym?
>
> I hit the wrong key. I wanted to write "BUG"
>
>>
>> Can you explain what you mean by non-portable chaos?  I'm having trouble
>> seeing the problem.
>
> x86-only-world-view: There are other platforms like PowerPC or ARM
> which can have other page sizes, and even the default page size for a
> platform can vary. ARM can do 4k, 64k defaults, servers default to
> 64k, IOT machines to 4k.
> So this is NOT a documentation bug, it's a bug in the code which
> should do what the doc says. Not the other way around.

What's the bug in the code again?  I'm still not seeing the bug.

Why should the code set the max io read/write size to a multiple of 1024
instead of a multiple of the page size?

Ben
Sebastian Feld Nov. 12, 2024, 2:25 p.m. UTC | #7
On Tue, Nov 12, 2024 at 3:21 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 12 Nov 2024, at 9:06, Sebastian Feld wrote:
>
> > On Tue, Nov 12, 2024 at 2:56 PM Benjamin Coddington <bcodding@redhat.com> wrote:
> >>
> >> On 12 Nov 2024, at 6:27, Sebastian Feld wrote:
> >>
> >>> On Mon, Nov 11, 2024 at 8:25 AM Seiichi Ikarashi (Fujitsu)
> >>> <s.ikarashi@fujitsu.com> wrote:
> >>>>
> >>>> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> >>>> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().
> >>>
> >>> *facepalm*
> >>>
> >>> How should this work at all in a heterogeneous environment where
> >>> pagesizes can be 4k or 64k (ARM)?
> >>>
> >>> IMHO this is a BIG, rsize and wsize should count in 1024 bytes, and
> >>> warn if there is no exact match to a page size. Otherwise non-portable
> >>> chaos rules.
> >>
> >>
> >> I'm not following you - is "BIG" an acronym?
> >
> > I hit the wrong key. I wanted to write "BUG"
> >
> >>
> >> Can you explain what you mean by non-portable chaos?  I'm having trouble
> >> seeing the problem.
> >
> > x86-only-world-view: There are other platforms like PowerPC or ARM
> > which can have other page sizes, and even the default page size for a
> > platform can vary. ARM can do 4k, 64k defaults, servers default to
> > 64k, IOT machines to 4k.
> > So this is NOT a documentation bug, it's a bug in the code which
> > should do what the doc says. Not the other way around.
>
> What's the bug in the code again?  I'm still not seeing the bug.
>
> Why should the code set the max io read/write size to a multiple of 1024
> instead of a multiple of the page size?

Because "pagesize" is a non-portable per-platform value?

Sebi
Benjamin Coddington Nov. 12, 2024, 2:37 p.m. UTC | #8
On 12 Nov 2024, at 9:25, Sebastian Feld wrote:

> Because "pagesize" is a non-portable per-platform value?

ah.  The code we're talking about is the linux kernel which is compiled for
the architecture and yes - not portable anyway.

Ben
Cedric Blancher Nov. 12, 2024, 6:52 p.m. UTC | #9
On Mon, 11 Nov 2024 at 08:25, Seiichi Ikarashi (Fujitsu)
<s.ikarashi@fujitsu.com> wrote:
>
> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().

This is an implementation bug, NOT a documentation bug.

>
> Signed-off-by: Seiichi Ikarashi <s.ikarashi@fujitsu.com>

r- (patch rejected)

The rsize/wsize value must not depend on a variable, per-machine
value. For example SPARCv9 can use 8k, 32k, 512k and so on.
AARCH64/ARM64 can use 4k or 64, all selectable at boot parameters.
Better we fix the kernel code to count in 1k for rsize and wsize options.
Only question is whether we "round up", or "round down" to the
machine's page size.

I shudder already at this stupid scenario: Entries in /etc/fstab can
no longer be deployed via puppet, because a script must always use
/usr/bin/pagesize to peel our the machine's default page size, do some
calculations with /bin/bc and do a mount via script.
Sarcastic bonus points go to the person who decided to put
/usr/bin/pagesize into a separate package which is not installed by
default in Debian+RH.

Ced

> ---
> utils/mount/nfs.man | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
> index 233a717..01fa22c 100644
> --- a/utils/mount/nfs.man
> +++ b/utils/mount/nfs.man
> @@ -215,15 +215,18 @@ or smaller than the
>  setting. The largest read payload supported by the Linux NFS client
>  is 1,048,576 bytes (one megabyte).
>  .IP
> -The
> +The allowed
>  .B rsize
> -value is a positive integral multiple of 1024.
> +value is a positive integral multiple of
> +.BR PAGE_SIZE ,
> +or a power of two if it is less than
> +.BR PAGE_SIZE .
>  Specified
>  .B rsize
>  values lower than 1024 are replaced with 4096; values larger than
>  1048576 are replaced with 1048576. If a specified value is within the supported
> -range but not a multiple of 1024, it is rounded down to the nearest
> -multiple of 1024.
> +range but not such an allowed value, it is rounded down to the nearest
> +allowed value.
>  .IP
>  If an
>  .B rsize
> @@ -257,16 +260,19 @@ setting. The largest write payload supported by the Linux NFS client
>  is 1,048,576 bytes (one megabyte).
>  .IP
>  Similar to
> -.B rsize
> -, the
> +.BR rsize ,
> +the allowed
>  .B wsize
> -value is a positive integral multiple of 1024.
> +value is a positive integral multiple of
> +.BR PAGE_SIZE ,
> +or a power of two if it is less than
> +.BR PAGE_SIZE .
>  Specified
>  .B wsize
>  values lower than 1024 are replaced with 4096; values larger than
>  1048576 are replaced with 1048576. If a specified value is within the supported
> -range but not a multiple of 1024, it is rounded down to the nearest
> -multiple of 1024.
> +range but not such an allowed value, it is rounded down to the nearest
> +allowed value.
>  .IP
>  If a
>  .B wsize
>
Cedric Blancher Nov. 12, 2024, 6:52 p.m. UTC | #10
On Tue, 12 Nov 2024 at 15:40, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 12 Nov 2024, at 9:25, Sebastian Feld wrote:
>
> > Because "pagesize" is a non-portable per-platform value?
>
> ah.  The code we're talking about is the linux kernel which is compiled for
> the architecture and yes - not portable anyway.

It has to be portable for an Administrator. NFS rsize and wsize should
not depend on a machine's page size.

Otherwise you cannot have such entries in /etc/fstab, instead an
Administrator has to rely on /usr/bin/pagesize, /bin/bc and manual
mount script just to pass the rsize+wsize in a portable manner. 100%
not compatible to puppet and common cluster software, and even less
portable for people using nfsroot.

Ced
Benjamin Coddington Nov. 12, 2024, 10:05 p.m. UTC | #11
On 12 Nov 2024, at 13:52, Cedric Blancher wrote:

> On Tue, 12 Nov 2024 at 15:40, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 12 Nov 2024, at 9:25, Sebastian Feld wrote:
>>
>>> Because "pagesize" is a non-portable per-platform value?
>>
>> ah.  The code we're talking about is the linux kernel which is compiled for
>> the architecture and yes - not portable anyway.
>
> It has to be portable for an Administrator. NFS rsize and wsize should
> not depend on a machine's page size.

They don't, they're just optimized to the machine's page size.

> Otherwise you cannot have such entries in /etc/fstab, instead an
> Administrator has to rely on /usr/bin/pagesize, /bin/bc and manual
> mount script just to pass the rsize+wsize in a portable manner. 100%
> not compatible to puppet and common cluster software, and even less
> portable for people using nfsroot.

Puppet and common cluster software absolutely can dynamically generate fstab
values based on machine types for this theoretical problem.

Anyway, I'm sure reasonable patches will be accepted.

Ben
diff mbox series

Patch

diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
index 233a717..01fa22c 100644
--- a/utils/mount/nfs.man
+++ b/utils/mount/nfs.man
@@ -215,15 +215,18 @@  or smaller than the
 setting. The largest read payload supported by the Linux NFS client
 is 1,048,576 bytes (one megabyte).
 .IP
-The
+The allowed
 .B rsize
-value is a positive integral multiple of 1024.
+value is a positive integral multiple of
+.BR PAGE_SIZE ,
+or a power of two if it is less than
+.BR PAGE_SIZE .
 Specified
 .B rsize
 values lower than 1024 are replaced with 4096; values larger than
 1048576 are replaced with 1048576. If a specified value is within the supported
-range but not a multiple of 1024, it is rounded down to the nearest
-multiple of 1024.
+range but not such an allowed value, it is rounded down to the nearest
+allowed value.
 .IP
 If an
 .B rsize
@@ -257,16 +260,19 @@  setting. The largest write payload supported by the Linux NFS client
 is 1,048,576 bytes (one megabyte).
 .IP
 Similar to
-.B rsize
-, the
+.BR rsize ,
+the allowed
 .B wsize
-value is a positive integral multiple of 1024.
+value is a positive integral multiple of
+.BR PAGE_SIZE ,
+or a power of two if it is less than
+.BR PAGE_SIZE .
 Specified
 .B wsize
 values lower than 1024 are replaced with 4096; values larger than
 1048576 are replaced with 1048576. If a specified value is within the supported
-range but not a multiple of 1024, it is rounded down to the nearest
-multiple of 1024.
+range but not such an allowed value, it is rounded down to the nearest
+allowed value.
 .IP
 If a
 .B wsize