
[02/20] STAGING/lustre: limit follow_link recursion using stack space.

Message ID 20150323023738.8161.97062.stgit@notabene.brown (mailing list archive)
State New, archived

Commit Message

NeilBrown March 23, 2015, 2:37 a.m. UTC
lustre's ->follow_link() uses a lot of stack space and so
needs to limit symlink recursion based on stack size.

It currently tests current->link_count, but that will soon
become private to fs/namei.c.
So instead base the limit on the actual available stack space.
This patch aborts recursive symlinks when less than 2K of stack
space is available.  This seems consistent with the current code, but
hasn't been tested.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/staging/lustre/lustre/llite/symlink.c |   21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)
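
For readers who don't want to parse the full diff at the bottom of this
page, here is a minimal standalone sketch of the check the patch
introduces.  The helper name ll_stack_too_deep() is invented for this
example; end_of_stack(), current and CONFIG_STACK_GROWSUP are the kernel
interfaces the patch actually uses, the 2048-byte threshold is the value
proposed here, and the sketch does its arithmetic in bytes.

#include <linux/sched.h>	/* current, end_of_stack() */
#include <linux/types.h>	/* bool */

/*
 * Sketch only: estimate how much stack the current task has left by
 * comparing the address of a local variable against end_of_stack(),
 * and refuse further symlink recursion when less than 2K remains.
 */
static bool ll_stack_too_deep(void)
{
	unsigned long here = (unsigned long)&here;
	unsigned long end  = (unsigned long)end_of_stack(current);

#ifdef CONFIG_STACK_GROWSUP
	return end - here < 2048;	/* stack grows up: end is above us */
#else
	return here - end < 2048;	/* stack grows down: end is below us */
#endif
}

A caller in ->follow_link() would then fail with -ELOOP when this returns
true, and only otherwise go on to ll_readlink_internal() - the same shape
as the hunk shown below.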




Comments

Al Viro April 18, 2015, 3:01 a.m. UTC | #1
On Mon, Mar 23, 2015 at 01:37:38PM +1100, NeilBrown wrote:
> lustre's ->follow_link() uses a lot of stack space and so
> needs to limit symlink recursion based on stack size.
> 
> It currently tests current->link_count, but that will soon
> become private to fs/namei.c.
> So instead base the limit on the actual available stack space.
> This patch aborts recursive symlinks when less than 2K of stack
> space is available.  This seems consistent with the current code, but
> hasn't been tested.

BTW, in the best case that logic is fishy.  We have "up to 5 levels
with 4Kb stack and up to 7 with 8Kb one".  Could somebody manage to dig out
the reasons for such limits?  Preferably along with the kernel version where
the overflows had been observed, both for 4K and 8K cases.

I'm very tempted to rip that thing out in the "kill link_path_walk()
recursion completely" series...
Andreas Dilger April 19, 2015, 8:57 p.m. UTC | #2
On Apr 17, 2015, at 9:01 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> On Mon, Mar 23, 2015 at 01:37:38PM +1100, NeilBrown wrote:
>> lustre's ->follow_link() uses a lot of stack space and so
>> needs to limit symlink recursion based on stack size.
>> 
>> It currently tests current->link_count, but that will soon
>> become private to fs/namei.c.
>> So instead base the limit on the actual available stack space.
>> This patch aborts recursive symlinks when less than 2K of stack
>> space is available.  This seems consistent with the current code, but
>> hasn't been tested.
> 
> BTW, in the best case that logic is fishy.  We have "up to 5 levels with
> 4Kb stack and up to 7 with 8Kb one".  Could somebody manage to dig out
> the reasons for such limits?  Preferably along with the kernel version
> where the overflows had been observed, both for 4K and 8K cases.

Hi Al,
I checked in our bug history, and the 8KB stack limit was hit with
older clients running racer or our recursive-symlink regression test:

2.6.18: https://bugzilla.lustre.org/show_bug.cgi?id=18533#c0
2.6.16: https://bugzilla.lustre.org/show_bug.cgi?id=19380#c11

The 4KB stack limit for clients has existed a lot longer than that,
but CONFIG_4KSTACKS was not the default on all kernels for a while.
The following bug showed a stack overflow with 2.6.22 kernels:

https://bugzilla.lustre.org/show_bug.cgi?id=17379#c0

Prior to 2.6.16, when we needed client-side kernel patches and a custom
kernel build, we always forced CONFIG_4KSTACKS off in the config.

In general, Lustre is a heavy stack user because it is a network
filesystem, and doubly so if the Lustre client is re-exporting the
filesystem to NFS clients.

I'd be happy if symlink recursion was removed completely, but so far the
added symlink recursion limit hasn't been a problem for Lustre users.

Cheers, Andreas

Al Viro April 19, 2015, 9:33 p.m. UTC | #3
On Sun, Apr 19, 2015 at 02:57:07PM -0600, Andreas Dilger wrote:

> I'd be happy if symlink recursion was removed completely, but so far the
> added symlink recursion limit hasn't been a problem for Lustre users.

Well, it's gone in my tree; I've just pushed the current queue to
vfs.git#link_path_walk.  Right now I'm looking at the unholy mess
gcc does to stack footprint with inlining - the last commit in there
is a result of exactly that.  Inlines in there really need tuning ;-/
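
To make the inlining problem concrete, here is a hypothetical
illustration (not code from vfs.git; format_dbg() and its 256-byte
scratch buffer are invented for the example): when gcc inlines a helper
that owns a large temporary, that temporary is charged to the caller's
stack frame for the caller's entire lifetime, so forcing such helpers out
of line with noinline is one of the usual tuning knobs.

#include <linux/compiler.h>	/* noinline */
#include <linux/kernel.h>	/* scnprintf() */
#include <linux/string.h>	/* strlcpy() */

/*
 * Hypothetical helper: once gcc decides to inline this, the 256-byte
 * scratch buffer is accounted to every caller's frame.  noinline keeps
 * the footprint confined to this short-lived call instead.
 */
static noinline void format_dbg(char *out, size_t outlen, int value)
{
	char scratch[256];

	scnprintf(scratch, sizeof(scratch), "value=%d", value);
	strlcpy(out, scratch, outlen);
}
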
Al Viro April 20, 2015, 2:29 a.m. UTC | #4
On Sun, Apr 19, 2015 at 10:33:48PM +0100, Al Viro wrote:
> On Sun, Apr 19, 2015 at 02:57:07PM -0600, Andreas Dilger wrote:
> 
> > I'd be happy if symlink recursion was removed completely, but so far the
> > added symlink recursion limit hasn't been a problem for Lustre users.
> 
> Well, it's gone in my tree; I've just pushed the current queue to
> vfs.git#link_path_walk.  Right now I'm looking at the unholy mess
> gcc does to stack footprint with inlining - the last commit in there
> is a result of exactly that.  Inlines in there really need tuning ;-/

FWIW, right now in my tree the maximal stack footprint of call chains through
fs/namei.c (amd64, my test config, including aushit) is 1408 bytes.
Goes via rename() -> renameat2() -> user_path_parent() -> filename_lookup() ->
path_lookupat() -> path_init() or follow_link() -> link_path_walk() ->
walk_component() -> lookup_fast() -> follow_managed().  And that does *not*
depend upon the depth of symlink nesting.  The maximal depth when calling
any methods present in lustre is 1328; similar path, except that its tail
goes like walk_component() -> __lookup_hash() -> lookup_dcache() ->
->d_revalidate().  Again, independent from the symlink nesting depth.
->lookup() calls are at 1296 maximum, similar call chain, for ->permission()
it's 1152, for ->follow_link() - 1088.

For mainline it's _much_ worse.  Maximal depth on the same config is
2986 bytes (with 8 levels of nesting) and each level costs 208 bytes.
->d_revalidate() is at 2880; for lustre it would be reduced a bit (again,
208 per level), but if you have any symlinks at all, you will end up
deeper than in the non-recursive variant.

And frankly, the most scary thing in there isn't lustre-related - it's NFS4
(and AFS, etc.), where ->d_automount() might get called on _that_ depth.  With
quite a bit of stack footprint of its own - we are doing NFS referral handling.
With almost 3Kb of stack already eaten up.

Patch

diff --git a/drivers/staging/lustre/lustre/llite/symlink.c b/drivers/staging/lustre/lustre/llite/symlink.c
index 686b6a574cc5..ba37eb6b29dc 100644
--- a/drivers/staging/lustre/lustre/llite/symlink.c
+++ b/drivers/staging/lustre/lustre/llite/symlink.c
@@ -120,20 +120,27 @@  failed:
 
 static void *ll_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
+	unsigned long avail_space;
 	struct inode *inode = dentry->d_inode;
 	struct ptlrpc_request *request = NULL;
 	int rc;
 	char *symname = NULL;
 
 	CDEBUG(D_VFSTRACE, "VFS Op\n");
-	/* Limit the recursive symlink depth to 5 instead of default
-	 * 8 links when kernel has 4k stack to prevent stack overflow.
-	 * For 8k stacks we need to limit it to 7 for local servers. */
-	if (THREAD_SIZE < 8192 && current->link_count >= 6) {
-		rc = -ELOOP;
-	} else if (THREAD_SIZE == 8192 && current->link_count >= 8) {
+	/* Limit the recursive symlink depth.
+	 * Previously limited to 5 instead of default 8 links when
+	 * kernel has 4k stack to prevent stack overflow.
+	 * For 8k stacks, was limited to 7 for local servers.
+	 * Now limited to ensure 2K of stack is available for lustre.
+	 */
+#ifdef CONFIG_STACK_GROWSUP
+	avail_space = end_of_stack(current) - &avail_space;
+#else
+	avail_space = &avail_space - end_of_stack(current);
+#endif
+	if (avail_space < 2048)
 		rc = -ELOOP;
-	} else {
+	else {
 		ll_inode_size_lock(inode);
 		rc = ll_readlink_internal(inode, &request, &symname);
 		ll_inode_size_unlock(inode);