
[RFC,v3,0/6] Introduce nfsrahead

Message ID 20220323201841.4166549-1-tbecker@redhat.com
Series Introduce nfsrahead

Message

Thiago Becker March 23, 2022, 8:18 p.m. UTC
Recent changes in the Linux kernel caused the NFS readahead to default to
128 KiB instead of the previous default of 15 * rsize. This causes
performance penalties for some read-heavy workloads, which can be fixed
by tuning the readahead for the affected mount.

Specifically, read throughput on a sec=krb5p mount is 50-75% lower with
the default readahead than with a readahead of 15360 KiB.
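
To put the two defaults side by side: the old value scales with rsize,
while the new one is a fixed 128 KiB. The small helper below only does
that arithmetic; the function name old_default_ra_kb is hypothetical and
not part of the series, and rsize is assumed to be given in bytes, as it
appears in /proc/self/mountinfo.

```shell
#!/bin/sh
# Readahead arithmetic only; nothing here touches the system.
# Assumption: rsize is passed in bytes, as shown in /proc/self/mountinfo.

NEW_DEFAULT_RA_KB=128   # fixed default after the kernel change

# Old default as described in the cover letter: 15 * rsize, printed in KiB.
old_default_ra_kb() {
    echo $(( 15 * $1 / 1024 ))
}

if [ $# -gt 0 ]; then
    old=$(old_default_ra_kb "$1")
    echo "rsize=$1: old default ${old} KiB, new default ${NEW_DEFAULT_RA_KB} KiB"
fi
```

With rsize=262144 (the value in the test mounts at the end of this
thread) the old formula gives 3840 KiB; with rsize=1048576 it gives
15360 KiB, the readahead used for comparison here. In both cases the new
default is 30 to 120 times smaller.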

Previous discussions:
https://lore.kernel.org/linux-nfs/20210803130717.2890565-1-trbecker@gmail.com/
I attempted to add a non-kernel option to mount.nfs, and it was
rejected.

https://lore.kernel.org/linux-nfs/20210811171402.947156-1-trbecker@gmail.com/
I attempted to add a mount option to the kernel; that was rejected as well.

I had started a separate tool to set the readahead of BDIs, but since
the scope is specific to NFS, I would like to get the community's
feeling about having this in nfs-utils.

This patch series introduces nfsrahead, a utility that automatically
sets the NFS readahead when an NFS share is mounted. The utility is
triggered by udev when a new BDI is added and returns to udev the
readahead value that should be used.

The tool currently supports setting the readahead per mount point, per
NFS major version, or through a global default value.
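
A minimal shell sketch of the lookup such a helper performs follows. It
is not the C code from main.c: RA_DEFAULT_KB, RA_NFS4_KB,
RA_MNT_OVERRIDE and MOUNTINFO are hypothetical stand-ins for the
nfs.conf settings, and the NFS major version is approximated by the
filesystem type (nfs vs nfs4). Given a BDI name such as 0:55, it finds
the matching entry in /proc/self/mountinfo, ignores non-NFS filesystems,
and prints a value with the precedence mount point, then NFS version,
then global default.

```shell
#!/bin/sh
# Sketch of a udev helper: print the readahead (KiB) for an NFS BDI.
# Hypothetical settings; the real tool reads its configuration from nfs.conf.
RA_DEFAULT_KB=128
RA_NFS4_KB=15360
RA_MNT_OVERRIDE="/mnt=16384"   # space-separated mountpoint=KiB pairs

# Print "mountpoint fstype" for the mount whose device is $1 (e.g. "0:55").
lookup_mount() {
    awk -v dev="$1" '
        $3 == dev {
            # Optional fields end at "-"; the filesystem type follows it.
            for (i = 7; i <= NF; i++)
                if ($i == "-") { print $5, $(i + 1); exit }
        }' "${MOUNTINFO:-/proc/self/mountinfo}"
}

nfs_readahead_kb() {
    info=$(lookup_mount "$1")
    mnt=${info% *}
    fstype=${info##* }

    # Only NFS BDIs get a value; print nothing for everything else.
    case "$fstype" in
        nfs|nfs4) ;;
        *) return 1 ;;
    esac

    # 1. Per-mountpoint override.
    for entry in $RA_MNT_OVERRIDE; do
        if [ "${entry%%=*}" = "$mnt" ]; then
            echo "${entry#*=}"
            return 0
        fi
    done

    # 2. Per-version value (fstype used as a stand-in for the major version).
    if [ "$fstype" = nfs4 ] && [ -n "$RA_NFS4_KB" ]; then
        echo "$RA_NFS4_KB"
        return 0
    fi

    # 3. Global default.
    echo "$RA_DEFAULT_KB"
}

if [ $# -gt 0 ]; then
    nfs_readahead_kb "$1"
fi
```

udev can consume a helper like this through a rule that runs it with
PROGRAM (passing the kernel name %k) and assigns its output %c to
ATTR{read_ahead_kb}; the rule actually shipped in 99-nfs_bdi.rules.in
may differ.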

v2:
    - explain the motivation

v3:
    - adopt already available facilities
    - nfsrahead is now configured in nfs.conf

Thiago Becker (6):
  Create nfsrahead
  nfsrahead: configure udev
  nfsrahead: only set readahead for nfs devices.
  nfsrahead: add logging
  nfsrahead: get the information from the config file.
  nfsrahead: User documentation

 .gitignore                          |   2 +
 configure.ac                        |   1 +
 systemd/nfs.conf.man                |  11 ++
 tools/Makefile.am                   |   2 +-
 tools/nfsrahead/99-nfs_bdi.rules.in |   1 +
 tools/nfsrahead/Makefile.am         |  15 +++
 tools/nfsrahead/main.c              | 179 ++++++++++++++++++++++++++++
 tools/nfsrahead/nfsrahead.man       |  72 +++++++++++
 8 files changed, 282 insertions(+), 1 deletion(-)
 create mode 100644 tools/nfsrahead/99-nfs_bdi.rules.in
 create mode 100644 tools/nfsrahead/Makefile.am
 create mode 100644 tools/nfsrahead/main.c
 create mode 100644 tools/nfsrahead/nfsrahead.man

Comments

Matthew Wilcox March 23, 2022, 9:32 p.m. UTC | #1
On Wed, Mar 23, 2022 at 05:18:35PM -0300, Thiago Becker wrote:
> Recent changes in the linux kernel caused NFS readahead to default to
> 128 from the previous default of 15 * rsize. This causes performance
> penalties to some read-heavy workloads, which can be fixed by
> tuning the readahead for that given mount.

Which recent changes?  Something in NFS or something in the VFS/MM?
Did you even think about asking a wider audience than the NFS mailing
list?  I only happened to notice this while I was looking for something
else, otherwise I would never have seen it.  The responses from other
people to your patches were right; you're trying to do this all wrong.

Let's start out with a bug report instead of a solution.  What changed
and when?

Trond Myklebust March 23, 2022, 9:58 p.m. UTC | #2
On Wed, 2022-03-23 at 21:32 +0000, Matthew Wilcox wrote:
> On Wed, Mar 23, 2022 at 05:18:35PM -0300, Thiago Becker wrote:
> > Recent changes in the linux kernel caused NFS readahead to default to
> > 128 from the previous default of 15 * rsize. This causes performance
> > penalties to some read-heavy workloads, which can be fixed by
> > tuning the readahead for that given mount.
> 
> Which recent changes?  Something in NFS or something in the VFS/MM?
> Did you even think about asking a wider audience than the NFS mailing
> list?  I only happened to notice this while I was looking for something
> else, otherwise I would never have seen it.  The responses from other
> people to your patches were right; you're trying to do this all wrong.
> 
> Let's start out with a bug report instead of a solution.  What changed
> and when?

I believe Thiago is talking about the changes introduced by commit
c128e575514c "NFS: Optimise the default readahead size" (i.e. we're
talking about Linux 5.4).

...and yes, as the commit description notes, users who want to change
the default can do so using the standard sysfs mechanism.
AFAICS, all this is doing is providing a toolset to allow users to more
easily set up and edit the udev scripts that will automate these
settings.
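
For a single mount, the sysfs mechanism mentioned above amounts to
writing to /sys/class/bdi/<major:minor>/read_ahead_kb, as done in the
tests further down this thread. A hedged sketch of that step follows:
set_bdi_readahead is a hypothetical name, SYSFS_BDI exists only so the
function can be exercised outside /sys, and the mount point is mapped to
its BDI with util-linux mountpoint -d.

```shell
#!/bin/sh
# Sketch: set read_ahead_kb for one BDI through sysfs (requires root).
SYSFS_BDI=${SYSFS_BDI:-/sys/class/bdi}

# Usage: set_bdi_readahead MAJOR:MINOR KIB
set_bdi_readahead() {
    dev=$1
    kb=$2
    if [ -z "$dev" ] || [ ! -d "$SYSFS_BDI/$dev" ]; then
        echo "no such BDI: '$dev'" >&2
        return 1
    fi
    case "$kb" in
        ''|*[!0-9]*) echo "invalid readahead: '$kb'" >&2; return 1 ;;
    esac
    echo "$kb" > "$SYSFS_BDI/$dev/read_ahead_kb"
}

# Usage: script MOUNTPOINT KIB, e.g. /mnt 15360
if [ $# -eq 2 ]; then
    set_bdi_readahead "$(mountpoint -d "$1")" "$2"
fi
```

Because each NFS mount normally gets its own BDI, a value set this way
is typically lost on remount, which is the gap the udev automation is
meant to close.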

Thiago Becker March 25, 2022, 12:31 p.m. UTC | #3
Hello,

On Wed, Mar 23, 2022 at 6:32 PM Matthew Wilcox <willy@infradead.org> wrote:
> Which recent changes?  Something in NFS or something in the VFS/MM?
> Did you even think about asking a wider audience than the NFS mailing
> list?  I only happened to notice this while I was looking for something
> else, otherwise I would never have seen it.  The responses from other
> people to your patches were right; you're trying to do this all wrong.
>
> Let's start out with a bug report instead of a solution.  What changed
> and when?
>

As Trond stated, c128e575514c ("NFS: Optimise the default readahead
size") changed the way readahead is calculated for NFS mounts. This
caused some read workloads to underperform compared to previous
revisions. To recap, the current policy is to adopt the system default
readahead of 128 KiB, and mounts with sec=krb5p take a performance hit
of 50-75% when the readahead is 128 KiB. I haven't performed an
exhaustive search for other affected workloads, but I also noticed a
meaningful drop in performance on sec=sys mounts; the test notes are at
the end.

The previous policy was to calculate the readahead as a multiple of
rsize, so we advised the affected users to increase the value, and this
fixed the issue. We are now trying to find a solution that we can
incorporate into the system.

thiago.

----- Tests
===== RAWHIDE (35% performance hit) =====
# uname -r
5.16.0-0.rc0.20211112git5833291ab6de.12.fc36.x86_64

# grep nfs /proc/self/mountinfo
601 60 0:55 / /mnt rw,relatime shared:332 - nfs4 192.168.122.225:/exports rw,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.83,local_lock=none,addr=192.168.122.225

# cat /sys/class/bdi/0\:55/read_ahead_kb
128

# for i in {0..3} ; do dd if=/mnt/testfile.bin of=/dev/null bs=1M 2>&1 | grep copied ; echo 3 > /proc/sys/vm/drop_caches ; done
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.5025 s, 260 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.4474 s, 261 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 18.0181 s, 238 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 18.2323 s, 236 MB/s

# echo 15360 > /sys/class/bdi/0\:55/read_ahead_kb

# for i in {0..3} ; do dd if=/mnt/testfile.bin of=/dev/null bs=1M 2>&1 | grep copied ; echo 3 > /proc/sys/vm/drop_caches ; done
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.2601 s, 381 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.1885 s, 384 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.5877 s, 371 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 10.9475 s, 392 MB/s

===== UPSTREAM (30% performance hit) =====
# uname -r
5.17.0+

# grep nfs /proc/self/mountinfo
583 60 0:55 / /mnt rw,relatime shared:302 - nfs4 192.168.122.225:/exports rw,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.83,local_lock=none,addr=192.168.122.225

# cat /sys/class/bdi/0\:55/read_ahead_kb
128

# for i in {0..3} ; do dd if=/mnt/testfile.bin of=/dev/null bs=1M 2>&1 | grep copied ; echo 3 > /proc/sys/vm/drop_caches ; done
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.056 s, 252 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.1258 s, 251 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.5981 s, 259 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.5487 s, 260 MB/s

# echo 15360 > /sys/class/bdi/0\:55/read_ahead_kb

# for i in {0..3} ; do dd if=/mnt/testfile.bin of=/dev/null bs=1M 2>&1 | grep copied ; echo 3 > /proc/sys/vm/drop_caches ; done
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 12.3855 s, 347 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.2528 s, 382 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.9849 s, 358 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 11.2953 s, 380 MB/s
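
The headline percentages can be reproduced from the dd output above. The
sketch below averages the MB/s figure of the "copied" lines and computes
the drop relative to the 15360 KiB runs. The function names mean_mbps
and pct_drop are hypothetical, and the sketch assumes dd reports the
rate in MB/s as the last two fields, as in these transcripts.

```shell
#!/bin/sh
# Summarize dd runs: mean throughput and relative drop.

# Mean MB/s over dd "copied" lines on stdin, e.g.
# "4294967296 bytes (4.3 GB, 4.0 GiB) copied, 16.5025 s, 260 MB/s"
mean_mbps() {
    awk '/copied/ && $NF == "MB/s" { sum += $(NF - 1); n++ }
         END { if (n) printf "%.2f\n", sum / n }'
}

# Percentage by which throughput $1 falls short of baseline $2, rounded.
pct_drop() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", (1 - slow / fast) * 100 }'
}

# Usage: script DEFAULT_RA_LOG TUNED_RA_LOG (files holding the dd output)
if [ $# -eq 2 ]; then
    slow=$(mean_mbps < "$1")
    fast=$(mean_mbps < "$2")
    echo "default: $slow MB/s, tuned: $fast MB/s, drop: $(pct_drop "$slow" "$fast")%"
fi
```

For the Rawhide runs the means are 248.75 MB/s (128 KiB) and 382.00 MB/s
(15360 KiB), a drop of about 35%; for upstream they are 255.50 and
366.75 MB/s, about 30%.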