[2/2] NFS: avoid deadlocks with loop-back mounted NFS filesystems.

Support for loop-back mounted NFS filesystems is useful when NFS is
used to access shared storage in a high-availability cluster.

If the node running the NFS server fails, some other node can mount the
filesystem and start providing NFS service.  If that node already had
the filesystem NFS mounted, it will now have it loop-back mounted.

nfsd can suffer a deadlock when allocating memory and entering direct
reclaim.
While direct reclaim does not write to the NFS filesystem it can send
and wait for a COMMIT through nfs_release_page().

This patch modifies nfs_release_page() and the functions it calls so
that if the COMMIT is sent to the local host (i.e. is routed over the
loop-back interface) then nfs_release_page() will not wait forever for
that COMMIT to complete.  This is achieved using a new flag
FLUSH_COND_CONNECTED.  When this is set the flush is conditional and
will only wait if the client is connected to a non-local server.

Aborting early should be safe as kswapd uses the same code but never
waits for the COMMIT.

Always aborting early could upset the balance of memory management so
when the commit was sent to the local host we still wait but with a
100ms timeout.  This is enough to significantly slow down any process
that is allocating lots of memory and often long enough to let the
commit complete.

In those rare cases where it is nfsd, or something that nfsd is
waiting for, that is calling nfs_release_page(), 100ms is not so long
that throughput will be seriously affected.

When fail-over happens a remote (foreign) client will first become
disconnected and then turn into a local client.
To prevent deadlocks from happening at this point, we still have a
timeout when the COMMIT has been sent to a remote client. In this case
the timeout is longer (1 second).

So when a node that has mounted a remote filesystem loses the
connection, nfs_release_page() will stop blocking and start failing.
Memory allocators will then be able to make progress without blocking
in NFS.  Any process which writes to a file will still be blocked in
balance_dirty_pages().

This patch makes use of the new 'private' field in
"struct wait_bit_key" to store the start time of a commit, so the
'action' function called by __wait_on_bit() knows how much longer
it is appropriate to wait.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/file.c               |    2 +
 fs/nfs/write.c              |   72 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/freezer.h     |   10 ++++++
 include/uapi/linux/nfs_fs.h |    3 ++
 4 files changed, 81 insertions(+), 6 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[2/2] NFS: avoid deadlocks with loop-back mounted NFS filesystems.

Commit Message

Comments

Patch