[0/6] NFS: Add support for the v4.2 READ_PLUS operation

Message ID 20190222215918.20647-1-Anna.Schumaker@Netapp.com


Anna Schumaker Feb. 22, 2019, 9:59 p.m. UTC
From: Anna Schumaker <Anna.Schumaker@Netapp.com>

These patches add client support for the READ_PLUS operation.  This
operation is meant to improve file reading performance when working with
sparse files, but underlying filesystem performance on the server side
may have an effect on the actual read performance.  I've done a bunch of
testing on virtual machines, and I found that READ_PLUS performs best
when:

  1) The file being read is not yet in the server's page cache,
  2) The read request begins with a hole segment, and
  3) The server only performs one llseek() call during encoding.

I've added a "noreadplus" mount option to allow users to disable the new
operation if it becomes a problem, similar to the "nordirplus" mount
option that we already have.
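
As a rough illustration of how the option would be used, assuming a
kernel with this series applied (the server name, export path, and mount
point below are hypothetical placeholders):

```shell
# Hypothetical server and paths; "noreadplus" is the option proposed by
# this series, "nordirplus" is the existing READDIRPLUS counterpart.
# Both commands need root and a reachable NFS server.
mount -t nfs -o vers=4.2,noreadplus server:/export /mnt/nfs

# The two options are independent and could be combined:
# mount -t nfs -o vers=4.2,noreadplus,nordirplus server:/export /mnt/nfs
```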

Here are the results of my performance tests, separated by underlying
filesystem and by whether the file is already in the server's page cache.
The NFS v4.2 column is for the standard READ operation, and v4.2+ is with
READ_PLUS.  In addition to the 100% data and 100% hole cases, I also
tested with files that alternate between data and hole chunks.  I tested
with two files
for each chunk size, one beginning with a data segment and one beginning
with a hole.  I used the `vmtouch` utility to load and clear the file
from the server's cache, and I used the following `dd` command on the
client for reading back the file:

  $ dd if=$src of=/dev/null bs=$rsize_from_mount 2>&1
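
For readers who want to build comparable test files, the alternating
data/hole layout described above could be generated roughly as follows.
This is only a sketch and not necessarily how the files for these tests
were created: the function name `make_sparse`, its argument order, and
the fill byte are assumptions made for illustration, and it relies on
GNU coreutils (`truncate`, `dd status=none`).

```shell
# make_sparse FILE CHUNK_BYTES CHUNK_COUNT FIRST   (hypothetical helper)
# Creates FILE as CHUNK_COUNT chunks of CHUNK_BYTES each, alternating
# between data and hole.  FIRST is "data" or "hole" and selects the type
# of the first chunk.
make_sparse() {
    file=$1 chunk=$2 count=$3 first=$4

    # Start from an empty file extended to its final size (all hole).
    : > "$file"
    truncate -s "$((chunk * count))" "$file"

    i=0
    if [ "$first" = hole ]; then
        i=1
    fi
    while [ "$i" -lt "$count" ]; do
        # Fill every second chunk with non-zero bytes; the chunks that
        # are skipped are never written and stay holes on filesystems
        # that support sparse files.
        head -c "$chunk" /dev/zero | tr '\0' 'x' |
            dd of="$file" bs="$chunk" seek="$i" conv=notrunc status=none
        i=$((i + 2))
    done
}
```

For example, `make_sparse sparse-4k-hole 4096 1024 hole` would give a
4 MiB file of 4K chunks that begins with a hole; comparing `du -k` with
`du -k --apparent-size` on the result shows whether the holes were
actually left unallocated.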

     xfs (uncached)     |  NFS v3  NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
   Whole File (data)    |  3.228s    3.361s    3.679s    3.382s    3.483s
   Whole File (hole)    |  1.276s    1.086s    1.143s    1.066s    0.805s
   Sparse  4K (data)    |  3.473s    3.953s    3.740s    3.535s    3.515s
   Sparse  4K (hole)    |  3.373s    3.192s    3.120s    3.113s    2.709s
   Sparse  8K (data)    |  3.782s    3.527s    3.589s    3.476s    3.494s
   Sparse  8K (hole)    |  3.161s    3.328s    2.974s    2.889s    2.863s
   Sparse 16K (data)    |  3.804s    3.945s    3.885s    3.507s    3.569s
   Sparse 16K (hole)    |  2.961s    3.124s    3.413s    3.136s    2.712s
   Sparse 32K (data)    |  2.891s    3.632s    3.833s    3.643s    3.485s
   Sparse 32K (hole)    |  2.592s    2.216s    2.545s    2.665s    2.829s

      xfs (cached)      |  NFS v3  NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
   Whole File (data)    |  0.939s    0.943s    0.939s    0.942s    1.153s
   Whole File (hole)    |  0.982s    1.007s    0.991s    0.946s    0.826s
   Sparse  4K (data)    |  0.980s    0.999s    0.961s    0.996s    1.166s
   Sparse  4K (hole)    |  1.001s    0.972s    0.997s    1.001s    1.201s
   Sparse  8K (data)    |  1.272s    1.053s    0.999s    0.974s    1.200s
   Sparse  8K (hole)    |  0.965s    1.004s    1.036s    1.006s    1.248s
   Sparse 16K (data)    |  0.995s    0.993s    1.035s    1.054s    1.210s
   Sparse 16K (hole)    |  0.966s    0.982s    1.091s    1.038s    1.214s
   Sparse 32K (data)    |  1.054s    0.968s    1.045s    0.990s    1.203s
   Sparse 32K (hole)    |  1.019s    0.960s    1.001s    0.983s    1.254s

    ext4 (uncached)     |  NFS v3  NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
   Whole File (data)    |  6.089s    6.104s    6.489s    6.342s    6.137s
   Whole File (hole)    |  2.603s    2.258s    2.226s    2.315s    1.715s
   Sparse  4K (data)    |  7.063s    7.372s    7.064s    7.149s    7.459s
   Sparse  4K (hole)    |  7.231s    6.709s    6.495s    6.880s    6.138s
   Sparse  8K (data)    |  6.576s    6.938s    6.386s    6.086s    6.154s
   Sparse  8K (hole)    |  5.903s    6.089s    5.555s    5.578s    5.442s
   Sparse 16K (data)    |  6.556s    6.257s    6.135s    5.588s    5.856s
   Sparse 16K (hole)    |  5.504s    5.290s    5.545s    5.195s    4.983s
   Sparse 32K (data)    |  5.047s    5.490s    5.734s    5.578s    5.378s
   Sparse 32K (hole)    |  4.232s    3.860s    4.299s    4.466s    4.633s

     ext4 (cached)      |  NFS v3  NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
   Whole File (data)    |  1.873s    1.881s    1.869s    1.890s    2.344s
   Whole File (hole)    |  1.929s    2.009s    1.963s    1.917s    1.554s
   Sparse  4K (data)    |  1.961s    1.974s    1.957s    1.986s    2.408s
   Sparse  4K (hole)    |  2.056s    2.025s    1.977s    1.988s    2.458s
   Sparse  8K (data)    |  2.297s    2.038s    2.008s    1.954s    2.437s
   Sparse  8K (hole)    |  1.939s    2.011s    2.024s    2.015s    2.509s
   Sparse 16K (data)    |  1.907s    1.973s    2.053s    2.070s    2.411s
   Sparse 16K (hole)    |  1.940s    1.964s    2.075s    1.996s    2.422s
   Sparse 32K (data)    |  2.045s    1.921s    2.021s    2.013s    2.388s
   Sparse 32K (hole)    |  1.984s    1.944s    1.997s    1.974s    2.398s

    btrfs (uncached)    |  NFS v3  NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
   Whole File (data)    |  9.369s    9.438s    9.837s    9.840s   11.790s
   Whole File (hole)    |  4.052s    3.390s    3.380s    3.619s    2.519s
   Sparse  4K (data)    |  9.738s   10.110s    9.774s    9.819s   12.471s
   Sparse  4K (hole)    |  9.907s    9.504s    9.241s    9.610s    9.054s
   Sparse  8K (data)    |  9.132s    9.453s    8.954s    8.660s   10.555s
   Sparse  8K (hole)    |  8.290s    8.489s    8.305s    8.332s    7.850s
   Sparse 16K (data)    |  8.742s    8.507s    8.667s    8.002s    9.940s
   Sparse 16K (hole)    |  7.635s    7.604s    7.967s    7.558s    7.062s
   Sparse 32K (data)    |  7.279s    7.670s    8.006s    7.705s    9.219s
   Sparse 32K (hole)    |  6.200s    5.713s    6.268s    6.464s    6.486s

     btrfs (cached)     |  NFS v3  NFS v4.0  NFS v4.1  NFS v4.2  NFS v4.2+
   Whole File (data)    |  2.770s    2.814s    2.841s    2.854s    3.492s
   Whole File (hole)    |  2.871s    2.970s    3.001s    2.929s    2.372s
   Sparse  4K (data)    |  2.945s    2.905s    2.930s    2.951s    3.663s
   Sparse  4K (hole)    |  3.032s    3.057s    2.962s    3.050s    3.705s
   Sparse  8K (data)    |  3.277s    3.069s    3.127s    3.034s    3.652s
   Sparse  8K (hole)    |  2.866s    2.959s    3.078s    2.989s    3.762s
   Sparse 16K (data)    |  2.916s    2.923s    3.060s    3.081s    3.631s
   Sparse 16K (hole)    |  2.948s    2.969s    3.108s    2.990s    3.623s
   Sparse 32K (data)    |  3.044s    2.881s    3.052s    2.962s    3.585s
   Sparse 32K (hole)    |  2.954s    2.957s    3.018s    2.951s    3.639s
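
To make the READ vs. READ_PLUS comparison easier to read off the tables,
the relative change between the v4.2 and v4.2+ columns can be computed
with a few lines of awk.  This is just a convenience sketch: the helper
name `pct_change` and the row labels are made up for illustration, and
the three sample rows are copied from the tables above.

```shell
# Input columns: label, READ time (NFS v4.2), READ_PLUS time (NFS v4.2+),
# both in seconds.  Prints the relative change of READ_PLUS versus READ;
# negative values mean READ_PLUS was faster.
pct_change() {
    awk '{ printf "%-28s %+6.1f%%\n", $1, ($3 - $2) / $2 * 100 }'
}

pct_change <<'EOF'
xfs-uncached-whole-hole    1.066  0.805
xfs-cached-whole-data      0.942  1.153
btrfs-uncached-whole-data  9.840 11.790
EOF
```

For these rows this works out to about -24.5%, +22.4%, and +19.8%
respectively.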

I also have performance numbers for the case where we encode every hole
and data segment, but I figured this email was long enough already.  I'm
happy to share them if requested!



Anna Schumaker (6):
  SUNRPC: Split out a function for setting current page
  SUNRPC: Add the ability to expand holes in data pages
  SUNRPC: Add the ability to shift data to a specific offset
  NFS: Add basic READ_PLUS support
  NFS: Add support for decoding multiple segments
  NFS: Add a mount option for READ_PLUS

 fs/nfs/nfs42xdr.c          | 164 +++++++++++++++++++++++++
 fs/nfs/nfs4client.c        |   3 +
 fs/nfs/nfs4proc.c          |  32 ++++-
 fs/nfs/nfs4xdr.c           |   1 +
 fs/nfs/super.c             |  21 ++++
 include/linux/nfs4.h       |   3 +-
 include/linux/nfs_fs_sb.h  |   2 +
 include/linux/nfs_xdr.h    |   2 +-
 include/linux/sunrpc/xdr.h |   2 +
 net/sunrpc/xdr.c           | 244 ++++++++++++++++++++++++++++++++++++-
 10 files changed, 467 insertions(+), 7 deletions(-)