Message ID | OSZPR01MB7772841F20140ACC90AA433B88582@OSZPR01MB7772.jpnprd01.prod.outlook.com (mailing list archive) |
---|---|
State | New |
Series | nfs(5): Update rsize/wsize options |
> On Nov 11, 2024, at 2:23 AM, Seiichi Ikarashi (Fujitsu) <s.ikarashi@fujitsu.com> wrote:
>
> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().

I think the behavior changed recently due to a kernel
code change Anna did? That's my recollection.

If you can identify that commit, it would be great
information to add in the patch description here.

> Signed-off-by: Seiichi Ikarashi <s.ikarashi@fujitsu.com>
> ---
>  utils/mount/nfs.man | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
> index 233a717..01fa22c 100644
> --- a/utils/mount/nfs.man
> +++ b/utils/mount/nfs.man
> @@ -215,15 +215,18 @@ or smaller than the
>  setting. The largest read payload supported by the Linux NFS client
>  is 1,048,576 bytes (one megabyte).
>  .IP
> -The
> +The allowed
>  .B rsize
> -value is a positive integral multiple of 1024.
> +value is a positive integral multiple of
> +.BR PAGE_SIZE ,
> +or a power of two if it is less than
> +.BR PAGE_SIZE .
>  Specified
>  .B rsize
>  values lower than 1024 are replaced with 4096; values larger than
>  1048576 are replaced with 1048576. If a specified value is within the supported
> -range but not a multiple of 1024, it is rounded down to the nearest
> -multiple of 1024.
> +range but not such an allowed value, it is rounded down to the nearest
> +allowed value.
>  .IP
>  If an
>  .B rsize
> @@ -257,16 +260,19 @@ setting. The largest write payload supported by the Linux NFS client
>  is 1,048,576 bytes (one megabyte).
>  .IP
>  Similar to
> -.B rsize
> -, the
> +.BR rsize ,
> +the allowed
>  .B wsize
> -value is a positive integral multiple of 1024.
> +value is a positive integral multiple of
> +.BR PAGE_SIZE ,
> +or a power of two if it is less than
> +.BR PAGE_SIZE .
>  Specified
>  .B wsize
>  values lower than 1024 are replaced with 4096; values larger than
>  1048576 are replaced with 1048576. If a specified value is within the supported
> -range but not a multiple of 1024, it is rounded down to the nearest
> -multiple of 1024.
> +range but not such an allowed value, it is rounded down to the nearest
> +allowed value.
>  .IP
>  If a
>  .B wsize

--
Chuck Lever
> > On Nov 11, 2024, at 2:23 AM, Seiichi Ikarashi (Fujitsu)
> > <s.ikarashi@fujitsu.com> wrote:
> >
> > The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> > or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().
>
> I think the behavior changed recently due to a kernel
> code change Anna did? That's my recollection.
>
> If you can identify that commit, it would be great
> information to add in the patch description here.

I believe the commits you mentioned are
commit 940261a ("NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE") and
commit a60214c ("NFS: Allow very small rsize & wsize again").

Before 940261a, the values were "powers of 2".
After a60214c, they are "multiples of PAGE_SIZE or powers of 2 if < PAGE_SIZE".

I could not find a "multiples of 1024" implementation.
Only the range capping was implemented until Linux v2.1.31,
and the "powers of 2" era started with Linux v2.1.32.

Regards,
Ikarashi

> > Signed-off-by: Seiichi Ikarashi <s.ikarashi@fujitsu.com>
> > ---
> >  utils/mount/nfs.man | 24 +++++++++++++++---------
> >  1 file changed, 15 insertions(+), 9 deletions(-)
> >
> > diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
> > index 233a717..01fa22c 100644
> > --- a/utils/mount/nfs.man
> > +++ b/utils/mount/nfs.man
> > @@ -215,15 +215,18 @@ or smaller than the
> >  setting. The largest read payload supported by the Linux NFS client
> >  is 1,048,576 bytes (one megabyte).
> >  .IP
> > -The
> > +The allowed
> >  .B rsize
> > -value is a positive integral multiple of 1024.
> > +value is a positive integral multiple of
> > +.BR PAGE_SIZE ,
> > +or a power of two if it is less than
> > +.BR PAGE_SIZE .
> >  Specified
> >  .B rsize
> >  values lower than 1024 are replaced with 4096; values larger than
> >  1048576 are replaced with 1048576. If a specified value is within the supported
> > -range but not a multiple of 1024, it is rounded down to the nearest
> > -multiple of 1024.
> > +range but not such an allowed value, it is rounded down to the nearest
> > +allowed value.
> >  .IP
> >  If an
> >  .B rsize
> > @@ -257,16 +260,19 @@ setting. The largest write payload supported by the Linux NFS client
> >  is 1,048,576 bytes (one megabyte).
> >  .IP
> >  Similar to
> > -.B rsize
> > -, the
> > +.BR rsize ,
> > +the allowed
> >  .B wsize
> > -value is a positive integral multiple of 1024.
> > +value is a positive integral multiple of
> > +.BR PAGE_SIZE ,
> > +or a power of two if it is less than
> > +.BR PAGE_SIZE .
> >  Specified
> >  .B wsize
> >  values lower than 1024 are replaced with 4096; values larger than
> >  1048576 are replaced with 1048576. If a specified value is within the supported
> > -range but not a multiple of 1024, it is rounded down to the nearest
> > -multiple of 1024.
> > +range but not such an allowed value, it is rounded down to the nearest
> > +allowed value.
> >  .IP
> >  If a
> >  .B wsize
>
> --
> Chuck Lever
On Mon, Nov 11, 2024 at 8:25 AM Seiichi Ikarashi (Fujitsu) <s.ikarashi@fujitsu.com> wrote:
>
> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().

*facepalm*

How should this work at all in a heterogeneous environment where
pagesizes can be 4k or 64k (ARM)?

IMHO this is a BIG, rsize and wsize should count in 1024 bytes, and
warn if there is no exact match to a page size. Otherwise non-portable
chaos rules.

Sebi
On 12 Nov 2024, at 6:27, Sebastian Feld wrote:

> On Mon, Nov 11, 2024 at 8:25 AM Seiichi Ikarashi (Fujitsu)
> <s.ikarashi@fujitsu.com> wrote:
>>
>> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
>> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().
>
> *facepalm*
>
> How should this work at all in a heterogeneous environment where
> pagesizes can be 4k or 64k (ARM)?
>
> IMHO this is a BIG, rsize and wsize should count in 1024 bytes, and
> warn if there is no exact match to a page size. Otherwise non-portable
> chaos rules.

I'm not following you - is "BIG" an acronym?

Can you explain what you mean by non-portable chaos? I'm having trouble
seeing the problem.

Ben
On Tue, Nov 12, 2024 at 2:56 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 12 Nov 2024, at 6:27, Sebastian Feld wrote:
>
> > On Mon, Nov 11, 2024 at 8:25 AM Seiichi Ikarashi (Fujitsu)
> > <s.ikarashi@fujitsu.com> wrote:
> >>
> >> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> >> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().
> >
> > *facepalm*
> >
> > How should this work at all in a heterogeneous environment where
> > pagesizes can be 4k or 64k (ARM)?
> >
> > IMHO this is a BIG, rsize and wsize should count in 1024 bytes, and
> > warn if there is no exact match to a page size. Otherwise non-portable
> > chaos rules.
>
>
> I'm not following you - is "BIG" an acronym?

I hit the wrong key. I wanted to write "BUG"

>
> Can you explain what you mean by non-portable chaos? I'm having trouble
> seeing the problem.

x86-only-world-view: There are other platforms like PowerPC or ARM
which can have other page sizes, and even the default page size for a
platform can vary. ARM can do 4k, 64k defaults, servers default to
64k, IOT machines to 4k.
So this is NOT a documentation bug, it's a bug in the code which
should do what the doc says. Not the other way around.
This is a common design problem if engineers only think about
x86-only, and then surprises admins if things go haywire because their
assumption about page size is wrong.

Sebi
On 12 Nov 2024, at 9:06, Sebastian Feld wrote:

> On Tue, Nov 12, 2024 at 2:56 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 12 Nov 2024, at 6:27, Sebastian Feld wrote:
>>
>>> On Mon, Nov 11, 2024 at 8:25 AM Seiichi Ikarashi (Fujitsu)
>>> <s.ikarashi@fujitsu.com> wrote:
>>>>
>>>> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
>>>> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().
>>>
>>> *facepalm*
>>>
>>> How should this work at all in a heterogeneous environment where
>>> pagesizes can be 4k or 64k (ARM)?
>>>
>>> IMHO this is a BIG, rsize and wsize should count in 1024 bytes, and
>>> warn if there is no exact match to a page size. Otherwise non-portable
>>> chaos rules.
>>
>>
>> I'm not following you - is "BIG" an acronym?
>
> I hit the wrong key. I wanted to write "BUG"
>
>>
>> Can you explain what you mean by non-portable chaos? I'm having trouble
>> seeing the problem.
>
> x86-only-world-view: There are other platforms like PowerPC or ARM
> which can have other page sizes, and even the default page size for a
> platform can vary. ARM can do 4k, 64k defaults, servers default to
> 64k, IOT machines to 4k.
> So this is NOT a documentation bug, it's a bug in the code which
> should do what the doc says. Not the other way around.

What's the bug in the code again? I'm still not seeing the bug.

Why should the code set the max io read/write size to a multiple of 1024
instead of a multiple of the page size?

Ben
On Tue, Nov 12, 2024 at 3:21 PM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 12 Nov 2024, at 9:06, Sebastian Feld wrote:
>
> > On Tue, Nov 12, 2024 at 2:56 PM Benjamin Coddington <bcodding@redhat.com> wrote:
> >>
> >> On 12 Nov 2024, at 6:27, Sebastian Feld wrote:
> >>
> >>> On Mon, Nov 11, 2024 at 8:25 AM Seiichi Ikarashi (Fujitsu)
> >>> <s.ikarashi@fujitsu.com> wrote:
> >>>>
> >>>> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> >>>> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().
> >>>
> >>> *facepalm*
> >>>
> >>> How should this work at all in a heterogeneous environment where
> >>> pagesizes can be 4k or 64k (ARM)?
> >>>
> >>> IMHO this is a BIG, rsize and wsize should count in 1024 bytes, and
> >>> warn if there is no exact match to a page size. Otherwise non-portable
> >>> chaos rules.
> >>
> >>
> >> I'm not following you - is "BIG" an acronym?
> >
> > I hit the wrong key. I wanted to write "BUG"
> >
> >>
> >> Can you explain what you mean by non-portable chaos? I'm having trouble
> >> seeing the problem.
> >
> > x86-only-world-view: There are other platforms like PowerPC or ARM
> > which can have other page sizes, and even the default page size for a
> > platform can vary. ARM can do 4k, 64k defaults, servers default to
> > 64k, IOT machines to 4k.
> > So this is NOT a documentation bug, it's a bug in the code which
> > should do what the doc says. Not the other way around.
>
> What's the bug in the code again? I'm still not seeing the bug.
>
> Why should the code set the max io read/write size to a multiple of 1024
> instead of a multiple of the page size?

Because "pagesize" is a non-portable per-platform value?

Sebi
On 12 Nov 2024, at 9:25, Sebastian Feld wrote:
> Because "pagesize" is a non-portable per-platform value?
ah. The code we're talking about is the linux kernel which is compiled for
the architecture and yes - not portable anyway.
Ben
On Mon, 11 Nov 2024 at 08:25, Seiichi Ikarashi (Fujitsu) <s.ikarashi@fujitsu.com> wrote:
>
> The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
> or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().

This is an implementation bug, NOT a documentation bug.

>
> Signed-off-by: Seiichi Ikarashi <s.ikarashi@fujitsu.com>

r- (patch rejected)

The rsize/wsize value must not depend on a variable, per-machine value.
For example SPARCv9 can use 8k, 32k, 512k and so on. AARCH64/ARM64 can
use 4k or 64k, all selectable at boot parameters.

Better we fix the kernel code to count in 1k for rsize and wsize
options. Only question is whether we "round up", or "round down" to
the machine's page size.

I shudder already at this stupid scenario: Entries in /etc/fstab can
no longer be deployed via puppet, because a script must always use
/usr/bin/pagesize to peel out the machine's default page size, do some
calculations with /bin/bc and do a mount via script.

Sarcastic bonus points go to the person who decided to put
/usr/bin/pagesize into a separate package which is not installed by
default in Debian+RH.

Ced

> ---
>  utils/mount/nfs.man | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
> index 233a717..01fa22c 100644
> --- a/utils/mount/nfs.man
> +++ b/utils/mount/nfs.man
> @@ -215,15 +215,18 @@ or smaller than the
>  setting. The largest read payload supported by the Linux NFS client
>  is 1,048,576 bytes (one megabyte).
>  .IP
> -The
> +The allowed
>  .B rsize
> -value is a positive integral multiple of 1024.
> +value is a positive integral multiple of
> +.BR PAGE_SIZE ,
> +or a power of two if it is less than
> +.BR PAGE_SIZE .
>  Specified
>  .B rsize
>  values lower than 1024 are replaced with 4096; values larger than
>  1048576 are replaced with 1048576. If a specified value is within the supported
> -range but not a multiple of 1024, it is rounded down to the nearest
> -multiple of 1024.
> +range but not such an allowed value, it is rounded down to the nearest
> +allowed value.
>  .IP
>  If an
>  .B rsize
> @@ -257,16 +260,19 @@ setting. The largest write payload supported by the Linux NFS client
>  is 1,048,576 bytes (one megabyte).
>  .IP
>  Similar to
> -.B rsize
> -, the
> +.BR rsize ,
> +the allowed
>  .B wsize
> -value is a positive integral multiple of 1024.
> +value is a positive integral multiple of
> +.BR PAGE_SIZE ,
> +or a power of two if it is less than
> +.BR PAGE_SIZE .
>  Specified
>  .B wsize
>  values lower than 1024 are replaced with 4096; values larger than
>  1048576 are replaced with 1048576. If a specified value is within the supported
> -range but not a multiple of 1024, it is rounded down to the nearest
> -multiple of 1024.
> +range but not such an allowed value, it is rounded down to the nearest
> +allowed value.
>  .IP
>  If a
>  .B wsize
>
On Tue, 12 Nov 2024 at 15:40, Benjamin Coddington <bcodding@redhat.com> wrote:
>
> On 12 Nov 2024, at 9:25, Sebastian Feld wrote:
>
> > Because "pagesize" is a non-portable per-platform value?
>
> ah. The code we're talking about is the linux kernel which is compiled for
> the architecture and yes - not portable anyway.

It has to be portable for an Administrator. NFS rsize and wsize should
not depend on a machine's page size.
Otherwise you cannot have such entries in /etc/fstab, instead an
Administrator has to rely on /usr/bin/pagesize, /bin/bc and manual
mount script just to pass the rsize+wsize in a portable manner. 100%
not compatible to puppet and common cluster software, and even less
portable for people using nfsroot.

Ced
On 12 Nov 2024, at 13:52, Cedric Blancher wrote:

> On Tue, 12 Nov 2024 at 15:40, Benjamin Coddington <bcodding@redhat.com> wrote:
>>
>> On 12 Nov 2024, at 9:25, Sebastian Feld wrote:
>>
>>> Because "pagesize" is a non-portable per-platform value?
>>
>> ah. The code we're talking about is the linux kernel which is compiled for
>> the architecture and yes - not portable anyway.
>
> It has to be portable for an Administrator. NFS rsize and wsize should
> not depend on a machine's page size.

They don't, they're just optimized to the machine's page size.

> Otherwise you cannot have such entries in /etc/fstab, instead an
> Administrator has to rely on /usr/bin/pagesize, /bin/bc and manual
> mount script just to pass the rsize+wsize in a portable manner. 100%
> not compatible to puppet and common cluster software, and even less
> portable for people using nfsroot.

Puppet and common cluster software absolutely can dynamically generate fstab
values based on machine types for this theoretical problem.

Anyway, I'm sure reasonable patches will be accepted.

Ben
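
Whether a given value is accepted unchanged on machines with different page sizes can be checked mechanically. The sketch below assumes only the rule as worded in the proposed man page text (a multiple of PAGE_SIZE, or a power of two when below PAGE_SIZE). The helper name `is_allowed` and the list of page sizes are illustrative choices, not taken from the kernel or nfs-utils.

```python
def is_allowed(size: int, page_size: int) -> bool:
    """True if `size` already satisfies the described rule (no rounding needed)."""
    if size < page_size:
        # Below the page size, only powers of two are allowed.
        return size > 0 and (size & (size - 1)) == 0
    # At or above the page size, only whole multiples of it are allowed.
    return size % page_size == 0


# Some page sizes mentioned in the thread (4k, 8k, 64k); illustrative only.
PAGE_SIZES = [4096, 8192, 65536]

# Powers of two between the documented limits of 1024 and 1048576.
candidates = [1 << n for n in range(10, 21)]

# Values that would be left untouched on every listed page size.
portable = [s for s in candidates if all(is_allowed(s, p) for p in PAGE_SIZES)]
print(portable == candidates)

# A multiple of 4096 that is not a power of two is allowed only on some machines.
print([p for p in PAGE_SIZES if is_allowed(98304, p)])
```

Under this reading of the rule, every power-of-two value within the documented range passes on each of the listed page sizes, while a value such as 98304 passes only where the page size divides it.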
diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
index 233a717..01fa22c 100644
--- a/utils/mount/nfs.man
+++ b/utils/mount/nfs.man
@@ -215,15 +215,18 @@ or smaller than the
 setting. The largest read payload supported by the Linux NFS client
 is 1,048,576 bytes (one megabyte).
 .IP
-The
+The allowed
 .B rsize
-value is a positive integral multiple of 1024.
+value is a positive integral multiple of
+.BR PAGE_SIZE ,
+or a power of two if it is less than
+.BR PAGE_SIZE .
 Specified
 .B rsize
 values lower than 1024 are replaced with 4096; values larger than
 1048576 are replaced with 1048576. If a specified value is within the supported
-range but not a multiple of 1024, it is rounded down to the nearest
-multiple of 1024.
+range but not such an allowed value, it is rounded down to the nearest
+allowed value.
 .IP
 If an
 .B rsize
@@ -257,16 +260,19 @@ setting. The largest write payload supported by the Linux NFS client
 is 1,048,576 bytes (one megabyte).
 .IP
 Similar to
-.B rsize
-, the
+.BR rsize ,
+the allowed
 .B wsize
-value is a positive integral multiple of 1024.
+value is a positive integral multiple of
+.BR PAGE_SIZE ,
+or a power of two if it is less than
+.BR PAGE_SIZE .
 Specified
 .B wsize
 values lower than 1024 are replaced with 4096; values larger than
 1048576 are replaced with 1048576. If a specified value is within the supported
-range but not a multiple of 1024, it is rounded down to the nearest
-multiple of 1024.
+range but not such an allowed value, it is rounded down to the nearest
+allowed value.
 .IP
 If a
 .B wsize
The rsize/wsize values are not multiples of 1024 but multiples of PAGE_SIZE
or powers of 2 if < PAGE_SIZE as defined in fs/nfs/internal.h:nfs_io_size().

Signed-off-by: Seiichi Ikarashi <s.ikarashi@fujitsu.com>
---
 utils/mount/nfs.man | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)
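
The rounding described in the commit message and the patched man page text can be sketched in a few lines. This is a minimal model built only from that wording (limits of 1024 and 1048576, a replacement value of 4096, rounding down to an allowed value); it is not a copy of the kernel's `nfs_io_size()`. The function name `round_io_size` and the `page_size` parameter are hypothetical, and any transport-specific handling is ignored.

```python
MIN_IO_SIZE = 1024         # values below this are replaced (per the man page text)
DEFAULT_IO_SIZE = 4096     # replacement for too-small values
MAX_IO_SIZE = 1048576      # values above this are capped


def round_io_size(requested: int, page_size: int = 4096) -> int:
    """Model of the documented rsize/wsize adjustment; not the kernel code."""
    if requested < MIN_IO_SIZE:
        size = DEFAULT_IO_SIZE
    elif requested > MAX_IO_SIZE:
        size = MAX_IO_SIZE
    else:
        size = requested

    if size < page_size:
        # Round down to the nearest power of two.
        return 1 << (size.bit_length() - 1)
    # Round down to the nearest multiple of the page size.
    return size - (size % page_size)


# The same request can land on different values depending on the page size.
for page in (4096, 65536):
    print(page, round_io_size(100000, page))
```

With these assumptions, a request of 100000 becomes 98304 with a 4 KiB page size and 65536 with a 64 KiB page size, which is the per-machine difference debated in the thread.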