[RFC,v4,56/69] pick_link(): more straightforward handling of allocation failures

From: Al Viro <viro@zeniv.linux.org.uk>

pick_link() needs to push onto stack; we start with a two-element
array embedded into struct nameidata, and the first time we need
more than that we switch to a separately allocated array.
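
As a rough user-space illustration of the stack handling described above
(a sketch only, not the code in fs/namei.c): the names walk_state,
saved_link, walk_push() and MAX_SLOTS are made up for this example, and
malloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define EMBEDDED_SLOTS 2   /* size of the in-struct array, as in the text */
#define MAX_SLOTS      40  /* hypothetical capacity of the separate array */

struct saved_link {
	const char *name;
};

struct walk_state {
	struct saved_link *stack;                   /* internal[] or heap array */
	struct saved_link internal[EMBEDDED_SLOTS];
	unsigned depth;
};

static void walk_init(struct walk_state *ws)
{
	ws->stack = ws->internal;
	ws->depth = 0;
}

/* Push one entry; switch to a heap array the first time internal[] is full. */
static bool walk_push(struct walk_state *ws, const char *name)
{
	assert(ws->stack != NULL);
	if (ws->depth >= MAX_SLOTS)
		return false;
	if (ws->depth == EMBEDDED_SLOTS && ws->stack == ws->internal) {
		struct saved_link *p = malloc(MAX_SLOTS * sizeof(*p));

		if (!p)
			return false;   /* caller drops the link and gives up */
		memcpy(p, ws->internal, sizeof(ws->internal));
		ws->stack = p;
	}
	ws->stack[ws->depth++].name = name;
	return true;
}

/* Free the heap array, if any, and go back to the embedded one. */
static void walk_release(struct walk_state *ws)
{
	if (ws->stack != ws->internal)
		free(ws->stack);
	walk_init(ws);
}
```

The first two pushes cost no allocation at all; only a third nested
entry pays for one, and that is the allocation whose failure the rest
of this message is about.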

Allocation can fail, of course, and handling of that would be simple
enough - we need to drop 'link' and bugger off.  However, things
get more complicated in RCU mode.  There we must do a GFP_ATOMIC
allocation.  If that fails, we try to switch to non-RCU mode and
repeat the allocation.

To switch to non-RCU mode we need to grab references to 'link' and
to everything in nameidata.  The latter is done by unlazy_walk();
the former by legitimize_path().  'link' must go first - after
unlazy_walk() we are out of RCU-critical period and it's too
late to call legitimize_path() since the references in link->mnt
and link->dentry might be pointing to freed and reused memory.

So we do legitimize_path(), then unlazy_walk().  And that's where
it gets too subtle: what to do if the former fails?  We MUST
do path_put(link) to avoid leaks.  And we can't do that under
rcu_read_lock().  Solution in mainline was to empty the nameidata
manually, drop out of RCU mode and then do path_put().

In effect, we open-code the things eventual terminate_walk()
would've done on error in RCU mode.  That looks badly out of place
and confusing.  We could add a comment along the lines of the
explanation above, but... there's a simpler solution.  Call
unlazy_walk() even if legitimize_path() fails.  It will take
us out of RCU mode, so we'll be able to do path_put(link).
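
The ordering argued for above can be modelled in user space roughly as
follows.  This is a toy simulation, not the patch itself: struct
sim_walk and every sim_*() function are hypothetical stand-ins, and the
reference counting is reduced to a single counter.  It assumes that a
failed legitimize attempt can still leave something that has to be
dropped, which is why the cleanup is unconditional.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of the walk state; all fields are invented for this sketch. */
struct sim_walk {
	bool in_rcu;             /* still inside the RCU-critical period? */
	bool legitimize_ok;      /* outcome of grabbing 'link' */
	bool unlazy_ok;          /* outcome of grabbing the nameidata refs */
	bool blocking_alloc_ok;  /* outcome of the repeated allocation */
	int link_refs;           /* what path_put(link) would have to drop */
	int puts_under_rcu;      /* counts path_put() calls made in RCU mode */
};

static bool sim_legitimize_path(struct sim_walk *w)
{
	assert(w->in_rcu);   /* only valid before leaving RCU mode */
	w->link_refs = 1;    /* assumption: even a failure may need cleanup */
	return w->legitimize_ok;
}

static int sim_unlazy_walk(struct sim_walk *w)
{
	w->in_rcu = false;   /* leaves RCU mode whether or not it succeeds */
	return w->unlazy_ok ? 0 : -ECHILD;
}

static void sim_path_put(struct sim_walk *w)
{
	if (w->in_rcu)
		w->puts_under_rcu++;   /* would be a bug */
	w->link_refs = 0;
}

/* Called after the non-blocking allocation has failed in RCU mode. */
static int sim_handle_atomic_alloc_failure(struct sim_walk *w)
{
	bool grabbed = sim_legitimize_path(w);  /* 'link' must go first */
	int error = sim_unlazy_walk(w);         /* always, to get out of RCU */

	if (!grabbed)
		error = -ECHILD;
	if (!error && !w->blocking_alloc_ok)
		error = -ENOMEM;
	if (error)
		sim_path_put(w);                /* safe: no longer in RCU mode */
	return error;
}
```

In this model there is a single cleanup path for every failure, and
sim_path_put() is never reached while in_rcu is still set.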

Yes, it will do unnecessary work - attempt to grab references
on the stuff in nameidata, only to have them dropped as soon
as we return the error to upper layer and get terminate_walk()
called there.  So what?  We are thoroughly off the fast path
by that point - we had GFP_ATOMIC allocation fail, we had
->d_seq or mount_lock mismatch and we are about to try walking
the same path from scratch in non-RCU mode.  Which will need
to do the same allocation, this time with GFP_KERNEL, so it will
be able to apply memory pressure for blocking stuff.

Compared to that the cost of several lockref_get_not_dead() calls
is noise.  And the logic becomes much easier to understand
that way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namei.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

Message-Id: <20200313235357.2646756-56-viro@ZenIV.linux.org.uk>
Date: Fri, 13 Mar 2020 23:53:44 +0000
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Linus Torvalds <torvalds@linux-foundation.org>
