read/write performance regression due to better v4 lock state tracking

We've received several bug reports about a performance regression in
RHEL kernels that carry the backported RELEASE_LOCKOWNER changes
(commit f11ac8db5 in particular). I've been able to reproduce it on
recent mainline kernels as well.

Most of the reports concern dump(8), which apparently forks and has
multiple processes doing I/O to the same area of a file. There's no
fcntl locking involved.

Max Matveev did some analysis of the problem and wrote a reproducer for
it (attached). Simply run this on a current kernel and you'll see a
bunch of FILE_SYNC page-sized writes go out onto the wire. Older
kernels run this program much faster.

He discovered that the problem is primarily due to the bottom two
conditions in nfs_flush_incompatible that were added in commit
f11ac8db5:

 do_flush = req->wb_page != page || req->wb_context != ctx ||
            req->wb_lock_context->lockowner != current->files ||
            req->wb_lock_context->pid != current->tgid;

Because the reproducer does I/O from different tasks to the same page,
those pages all get flushed out with FILE_SYNC writes. Previously, such
requests were not considered incompatible and no flush was required.
Note that this happens even on a v3 mount, which shouldn't have any
issue with different lockowners.
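
To make the effect concrete, here is a minimal userspace sketch of that
check. It is not kernel code: the model_* structs and functions are
hypothetical stand-ins that keep only the fields the condition reads,
and the tgid values are made up. It assumes a parent dirties a page and
a forked child (which gets its own files_struct) then writes to the same
page through the same open context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * read by the do_flush condition are modeled. */
struct model_lock_context {
	const void *lockowner;	/* stands in for current->files */
	int pid;		/* stands in for current->tgid */
};

struct model_request {
	const void *wb_page;
	const void *wb_context;
	struct model_lock_context wb_lock_context;
};

struct model_task {
	const void *files;
	int tgid;
};

/* Same boolean expression as the do_flush assignment quoted above. */
static bool model_do_flush(const struct model_request *req,
			   const void *page, const void *ctx,
			   const struct model_task *cur)
{
	return req->wb_page != page || req->wb_context != ctx ||
	       req->wb_lock_context.lockowner != cur->files ||
	       req->wb_lock_context.pid != cur->tgid;
}

/* Parent dirties a page; a forked child then touches the same page. */
static bool model_forked_writer_forces_flush(void)
{
	int page, ctx, parent_files, child_files; /* used as addresses only */
	struct model_task parent = { .files = &parent_files, .tgid = 100 };
	struct model_task child = { .files = &child_files, .tgid = 101 };
	struct model_request req = {
		.wb_page = &page,
		.wb_context = &ctx,
		.wb_lock_context = { .lockowner = parent.files,
				     .pid = parent.tgid },
	};

	/* The task that created the request stays compatible with it. */
	assert(!model_do_flush(&req, &page, &ctx, &parent));

	/* The child differs only in files/tgid, yet must flush. */
	return model_do_flush(&req, &page, &ctx, &child);
}
```

In this model, model_forked_writer_forces_flush() returns true even
though the page and open context match, which is the case that used to
coalesce.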

The following patch is a proof-of-concept that corrects the problem,
but it's more of a hack than a real fix. Consider it a starting point
for discussion. What would be preferable is a real fix that could allow
v4 to perform better in this situation too, but I'm unclear on how best
to do that.

Perhaps we could look somehow at whether there were any fcntl locks
active and skip those two checks if not?
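
As a rough sketch of that idea only (this is not the attached patch),
the check could be gated on whether any locks are held. Everything here
is a hypothetical userspace model: the model_* names are invented for
illustration, and the locks_held flag stands in for whatever mechanism
the kernel would actually use to answer that question, which I haven't
worked out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; only the fields the check reads are modeled. */
struct model_lock_context {
	const void *lockowner;	/* stands in for current->files */
	int pid;		/* stands in for current->tgid */
};

struct model_request {
	const void *wb_page;
	const void *wb_context;
	struct model_lock_context wb_lock_context;
};

struct model_task {
	const void *files;
	int tgid;
};

/* Variant of the check that only compares lockowners when locks exist.
 * locks_held is an assumed input, not an existing kernel field. */
static bool model_do_flush_lock_aware(const struct model_request *req,
				      const void *page, const void *ctx,
				      const struct model_task *cur,
				      bool locks_held)
{
	if (req->wb_page != page || req->wb_context != ctx)
		return true;
	if (!locks_held)
		return false;	/* no locks: lockowner identity is moot */
	return req->wb_lock_context.lockowner != cur->files ||
	       req->wb_lock_context.pid != cur->tgid;
}

/* Forked child writes to a page that its parent already dirtied. */
static bool model_forked_writer_needs_flush(bool locks_held)
{
	int page, other_page, ctx, parent_files, child_files;
	struct model_task child = { .files = &child_files, .tgid = 101 };
	struct model_request req = {
		.wb_page = &page,
		.wb_context = &ctx,
		.wb_lock_context = { .lockowner = &parent_files, .pid = 100 },
	};

	/* A different page is still incompatible, locks or not. */
	assert(model_do_flush_lock_aware(&req, &other_page, &ctx, &child,
					 locks_held));

	return model_do_flush_lock_aware(&req, &page, &ctx, &child,
					 locks_held);
}
```

With no locks held the forked writer's request is treated as compatible
again; with locks held the model behaves like the current check.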

Thoughts?

--------------------------[snip]--------------------------

[PATCH] nfs: allow coalescing of requests from different lockowners for v2/3 mounts

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/pagelist.c | 12 ++++++++----
 fs/nfs/write.c    |  4 ++--
 2 files changed, 10 insertions(+), 6 deletions(-)
