Message ID | 1429553588-24764-16-git-send-email-viro@ZenIV.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Apr 20, 2015 at 11:13 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > */ > static int link_path_walk(const char *name, struct nameidata *nd) > { > + struct saved { > + struct path link; > + void *cookie; > + const char *name; > + } stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1; > struct path next; > int err; I was going to complain about this, and suggest that you move it to the nameidata, but I see that you did that later. That said, you then introduce a stack-allocated "struct saved stack[]" in path_mountpoint[] instead, *and* nameidata is saved on stack, so this all ends up being very stack-intensive anyway. I might have missed some patch here, but would it be possible to just allocate a single per-thread nameidata, and just leave it at that? Because allocating that thing on the stack when it contains what is now one kilobyte of array data is *not* acceptable. Other than that, my quick scan through this looked fine. And maybe that quick scan missed where you already did that too. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 20, 2015 at 02:04:53PM -0700, Linus Torvalds wrote: > That said, you then introduce a stack-allocated "struct saved stack[]" > in path_mountpoint[] instead, *and* nameidata is saved on stack, so > this all ends up being very stack-intensive anyway. > > I might have missed some patch here, but would it be possible to just > allocate a single per-thread nameidata, and just leave it at that? > Because allocating that thing on the stack when it contains what is > now one kilobyte of array data is *not* acceptable. What kilobyte? It's 9*4 pointers, IOW, 288 bytes total (assuming 64bit box). And nd->saved_names[] goes away, so scratch 9 pointers we used to have. Sure, we can allocate that dynamically (or hold a couple of elements on stack and allocate when/if we overgrow that), but it's not particularly large win. Breakeven point is circa the second level of nesting - symlink met when traversing a symlink... That - on amd64; on something with fatter stack frames I would expect the comparison to be even worse for mainline... We need to preserve 4 pointers on stack per level of nesting. Seeing that single link_path_walk() stack frame in mainline is about 5-6 times bigger than that, "just put enough for all levels into auto array" is an obvious approach - a couple of link_path_walk() stack frames will be heavier than that. For renameat() (the worst-case user of link_path_walk() - there are two struct nameidata on stack) we end up with breakeven at *one* level of nesting, what with getting rid of 2*9 pointers in ->saved_names[] of those nameidata. And yes, I've measured the actual stack use before and after... A kilobyte would suffice for 32 levels. _IF_ we go for "lift the restrictions on nesting completely", sure, we want to switch to (on-demand) dynamic allocation. It's not particularly hard to add and it might be worth doing, but it's a separate story. This series leaves the set of accepted pathnames as-is... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 20, 2015 at 2:32 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > What kilobyte? It's 9*4 pointers, IOW, 288 bytes total (assuming 64bit box). You also said that you were going to up the recursion limit to 40.. So 40*3*8 bytes.. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 20, 2015 at 2:32 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > A kilobyte would suffice for 32 levels. _IF_ we go for "lift the restrictions > on nesting completely", sure, we want to switch to (on-demand) dynamic > allocation. And no, we will *never* lift the recursion limit. Not for 1kB, not for 1MB. Never. Because it's a latency and DoS issue too. We need to react well to true loops, but also to "very deep" non-loops. It's not about memory use, it's about users triggering unreasonable CPU resources. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 20, 2015 at 2:41 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > And no, we will *never* lift the recursion limit. Not for 1kB, not for > 1MB. Never. Just to clarify: that's for the "remove restrictions completely". Upping it to 32 or 40 would be fine. But not with allocations on the stack. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 20, 2015 at 02:39:43PM -0700, Linus Torvalds wrote: > On Mon, Apr 20, 2015 at 2:32 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > What kilobyte? It's 9*4 pointers, IOW, 288 bytes total (assuming 64bit box). > > You also said that you were going to up the recursion limit to 40.. So > 40*3*8 bytes.. Er... That's exactly what || We could reduce it further (see below), but I'm not sure it's worth || doing - it's not much extra complexity, but we only squeeze out ~250 bytes || that way, with the worst-case footprints (those are triggered by rename()) || are around 1.4Kb (instead of about twice as much in mainline). OTOH, || that extra complexity would've allowed us to get rid of the nesting limit || entirely (i.e. we get ELOOP when we encounter 40 symlinks, no matter in || which manner they are nested). That might be worth considering... had been about. And yes, it is easy to implement - new nameidata flag for "need to kfree() nd->stack", if (unlikely(current->link_count >= MAX_NESTED_LINKS)) { path_put_conditional(&next, nd); path_put(&nd->path); return -ELOOP; } BUG_ON(nd->depth >= MAX_NESTED_LINKS); nd->depth++; replaced with if (nd->depth == 2 && !(nd->flags & LOOKUP_KFREE)) { struct saved *p = kmalloc(41 * sizeof(*p)); if (!p) { path_put_conditional(&next, nd); path_put(&nd->path); return -ENOMEM; } memcpy(p, nd->stack, 2 * sizeof(*p)); nd->stack = p; nd->flags |= LOOKUP_KFREE; } nd->depth++; with obvious logics for freeing that crap afterwards. I really don't like the idea of putting it into nameidata, BTW - consider e.g. rename(). We don't need the contents of that thing after the link_path_walk() returns; no point duplicating it... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 20, 2015 at 02:41:31PM -0700, Linus Torvalds wrote: > On Mon, Apr 20, 2015 at 2:32 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > A kilobyte would suffice for 32 levels. _IF_ we go for "lift the restrictions > > on nesting completely", sure, we want to switch to (on-demand) dynamic > > allocation. > > And no, we will *never* lift the recursion limit. Not for 1kB, not for > 1MB. Never. We still have the "no more than 40 links total" limit, so it would obviously be limited by that... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 20, 2015 at 02:42:56PM -0700, Linus Torvalds wrote: > On Mon, Apr 20, 2015 at 2:41 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > And no, we will *never* lift the recursion limit. Not for 1kB, not for > > 1MB. Never. > > Just to clarify: that's for the "remove restrictions completely". > Upping it to 32 or 40 would be fine. But not with allocations on the > stack. Yep. There are two restrictions - one on the total amount of symlinks encountered (40), and another on the depth of nesting (8). The latter is due to recursion issues, and with dynamic nd->stack it goes away. The former, of course, stays, for the reasons you've mentioned - it has nothing to do with the depth of recursion. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/namei.c b/fs/namei.c index 6d79e00..4b98f4b 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1767,9 +1767,15 @@ static inline u64 hash_name(const char *name) */ static int link_path_walk(const char *name, struct nameidata *nd) { + struct saved { + struct path link; + void *cookie; + const char *name; + } stack[MAX_NESTED_LINKS], *last = stack + nd->depth - 1; struct path next; int err; - + +start: while (*name=='/') name++; if (!*name) @@ -1833,8 +1839,6 @@ Walked: goto Err; if (err) { - struct path link; - void *cookie; char *s; if (unlikely(nd->link_count >= MAX_NESTED_LINKS)) { @@ -1847,22 +1851,25 @@ Walked: nd->depth++; nd->link_count++; + last++; - link = next; - s = get_link(&link, nd, &cookie); + last->link = next; + s = get_link(&last->link, nd, &last->cookie); if (unlikely(IS_ERR(s))) { err = PTR_ERR(s); nd->link_count--; nd->depth--; + last--; goto Err; } err = 0; if (unlikely(!s)) { /* jumped */ - put_link(nd, &link, cookie); + put_link(nd, &last->link, last->cookie); nd->link_count--; nd->depth--; + last--; } else { if (*s == '/') { if (!nd->root.mnt) @@ -1873,17 +1880,24 @@ Walked: nd->flags |= LOOKUP_JUMPED; } nd->inode = nd->path.dentry->d_inode; - err = link_path_walk(s, nd); + last->name = name; + name = s; + goto start; + +back: + name = last->name; if (unlikely(err)) { - put_link(nd, &link, cookie); + put_link(nd, &last->link, last->cookie); nd->link_count--; nd->depth--; + last--; goto Err; } else { err = walk_component(nd, &next, LOOKUP_FOLLOW); - put_link(nd, &link, cookie); + put_link(nd, &last->link, last->cookie); nd->link_count--; nd->depth--; + last--; goto Walked; } } @@ -1895,9 +1909,13 @@ Walked: } terminate_walk(nd); Err: - return err; + if (likely(!nd->depth)) + return err; + goto back; OK: - return 0; + if (likely(!nd->depth)) + return 0; + goto back; } static int path_init(int dfd, const struct filename *name, unsigned int flags,