diff mbox

[1/2] vfs: make sure struct filename->iname is word-aligned

Message ID 1455662966-7977-1-git-send-email-linux@rasmusvillemoes.dk
State New
Headers show

Commit Message

Rasmus Villemoes Feb. 16, 2016, 10:49 p.m. UTC
I noticed that offsetof(struct filename, iname) is actually 28 on 64
bit platforms, so we always pass an unaligned pointer to
strncpy_from_user. This is mostly a problem for those 64 bit platforms
without HAVE_EFFICIENT_UNALIGNED_ACCESS, but even on x86_64, unaligned
accesses carry a penalty, especially when done in a loop.

Let's try to ensure we always pass an aligned destination pointer to
strncpy_from_user. I considered making refcnt a long instead of doing
the union thing, and mostly ended up tossing a coin.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
Cc'ing Linus, not because it's urgent in any way, but because he's
usually interested in strncpy_from_user and he can probably tell me
why this is completely immaterial.

 fs/namei.c         | 2 ++
 include/linux/fs.h | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

Comments

Al Viro Feb. 16, 2016, 10:57 p.m. UTC | #1
On Tue, Feb 16, 2016 at 11:49:24PM +0100, Rasmus Villemoes wrote:
> I noticed that offsetof(struct filename, iname) is actually 28 on 64
> bit platforms, so we always pass an unaligned pointer to
> strncpy_from_user. This is mostly a problem for those 64 bit platforms
> without HAVE_EFFICIENT_UNALIGNED_ACCESS, but even on x86_64, unaligned
> accesses carry a penalty, especially when done in a loop.
> 
> Let's try to ensure we always pass an aligned destination pointer to
> strncpy_from_user. I considered making refcnt a long instead of doing
> the union thing, and mostly ended up tossing a coin.

Why not swap it with the previous field, then?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rasmus Villemoes Feb. 18, 2016, 8:10 p.m. UTC | #2
On Tue, Feb 16 2016, Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Tue, Feb 16, 2016 at 11:49:24PM +0100, Rasmus Villemoes wrote:
>> I noticed that offsetof(struct filename, iname) is actually 28 on 64
>> bit platforms, so we always pass an unaligned pointer to
>> strncpy_from_user. This is mostly a problem for those 64 bit platforms
>> without HAVE_EFFICIENT_UNALIGNED_ACCESS, but even on x86_64, unaligned
>> accesses carry a penalty, especially when done in a loop.
>> 
>> Let's try to ensure we always pass an aligned destination pointer to
>> strncpy_from_user. I considered making refcnt a long instead of doing
>> the union thing, and mostly ended up tossing a coin.
>
> Why not swap it with the previous field, then?

Sure, that would work as well. I don't really care how ->iname is pushed
out to offset 32, but I'd like to know if it's worth it.

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Y. Ts'o Feb. 19, 2016, 3 p.m. UTC | #3
On Thu, Feb 18, 2016 at 09:10:21PM +0100, Rasmus Villemoes wrote:
> 
> Sure, that would work as well. I don't really care how ->iname is pushed
> out to offset 32, but I'd like to know if it's worth it.

Do you have access to one of these platforms where unaligned access is
really painful?  The usual thing is to benchmark something like "git
stat" which has to stat every single file in a repository's working
directory.  If you can't see it there, it seems unlikely you'd see it
anywhere else, yes?

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rasmus Villemoes Feb. 22, 2016, 11:59 p.m. UTC | #4
On Fri, Feb 19 2016, Theodore Ts'o <tytso@mit.edu> wrote:

> On Thu, Feb 18, 2016 at 09:10:21PM +0100, Rasmus Villemoes wrote:
>> 
>> Sure, that would work as well. I don't really care how ->iname is pushed
>> out to offset 32, but I'd like to know if it's worth it.
>
> Do you have access to one of these platforms where unaligned access is
> really painful?

No. But FWIW, I did a microbenchmark on my aging Core2, doing nothing
but lstat() on the same "aaaa..." string in a loop. 'before' is 4.4.2
with a few unrelated patches, 'after' is that plus 1/2 and 2/2. In
perf_x_y, x is length of "aaa..." string and y is alignment mod 8 in
userspace.

$ grep strncpy_from_user *.report
perf_30_0_after.report:     5.47%  s_f_u    [k] strncpy_from_user            
perf_30_0_before.report:     7.40%  s_f_u    [k] strncpy_from_user         
perf_30_3_after.report:     5.05%  s_f_u    [k] strncpy_from_user             
perf_30_3_before.report:     7.29%  s_f_u    [k] strncpy_from_user          
perf_30_4_after.report:     4.88%  s_f_u    [k] strncpy_from_user            
perf_30_4_before.report:     7.28%  s_f_u    [k] strncpy_from_user        
perf_30_6_after.report:     5.43%  s_f_u    [k] strncpy_from_user            
perf_30_6_before.report:     6.74%  s_f_u    [k] strncpy_from_user        
perf_40_0_after.report:     5.68%  s_f_u    [k] strncpy_from_user            
perf_40_0_before.report:    10.99%  s_f_u    [k] strncpy_from_user        
perf_40_3_after.report:     5.37%  s_f_u    [k] strncpy_from_user            
perf_40_3_before.report:    10.62%  s_f_u    [k] strncpy_from_user        
perf_40_4_after.report:     5.61%  s_f_u    [k] strncpy_from_user            
perf_40_4_before.report:    10.91%  s_f_u    [k] strncpy_from_user        
perf_40_6_after.report:     5.81%  s_f_u    [k] strncpy_from_user            
perf_40_6_before.report:    10.84%  s_f_u    [k] strncpy_from_user          
perf_50_0_after.report:     6.29%  s_f_u    [k] strncpy_from_user            
perf_50_0_before.report:    12.46%  s_f_u    [k] strncpy_from_user        
perf_50_3_after.report:     7.15%  s_f_u    [k] strncpy_from_user            
perf_50_3_before.report:    14.09%  s_f_u    [k] strncpy_from_user                   
perf_50_4_after.report:     7.64%  s_f_u    [k] strncpy_from_user            
perf_50_4_before.report:    14.10%  s_f_u    [k] strncpy_from_user        
perf_50_6_after.report:     7.30%  s_f_u    [k] strncpy_from_user            
perf_50_6_before.report:    14.10%  s_f_u    [k] strncpy_from_user        
perf_60_0_after.report:     6.81%  s_f_u    [k] strncpy_from_user            
perf_60_0_before.report:    13.25%  s_f_u    [k] strncpy_from_user         
perf_60_3_after.report:     9.48%  s_f_u    [k] strncpy_from_user            
perf_60_3_before.report:    13.26%  s_f_u    [k] strncpy_from_user         
perf_60_4_after.report:     9.90%  s_f_u    [k] strncpy_from_user            
perf_60_4_before.report:    15.09%  s_f_u    [k] strncpy_from_user          
perf_60_6_after.report:     9.91%  s_f_u    [k] strncpy_from_user            
perf_60_6_before.report:    13.85%  s_f_u    [k] strncpy_from_user        

So the numbers vary and it's a bit odd that some of the
userspace-unaligned cases seem faster than the corresponding aligned
ones, but overall I think it's ok to conclude there's a measurable
difference.

Note the huge jump from 30_y_before to 40_y_before. I suppose that's
because we do an unaligned store crossing a cache line boundary when the
string is > 32 bytes.

I suppose 2/2 is also responsible for some of the above, since it not
only aligns the kernel-side stores, but also means we stay within a
single cacheline for strings up to 56 bytes. I should measure the effect
of 1/2 by itself, but compiling a kernel takes forever for me, so I
won't get to that tonight.

[It turns out that 32 is the median length from 'git ls-files' in the
kernel tree, with 33.2 being the mean, so even though I used relatively
long paths above to get strncpy_from_user to stand out, such path
lengths are not totally uncommon.]

> The usual thing is to benchmark something like "git
> stat" which has to stat every single file in a repository's working
> directory.

I tried that as well; strncpy_from_user was around 0.5% both before and
after.

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/namei.c b/fs/namei.c
index f624d132e01e..bd150fa799a2 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -35,6 +35,7 @@ 
 #include <linux/fs_struct.h>
 #include <linux/posix_acl.h>
 #include <linux/hash.h>
+#include <linux/bug.h>
 #include <asm/uaccess.h>
 
 #include "internal.h"
@@ -127,6 +128,7 @@  getname_flags(const char __user *filename, int flags, int *empty)
 	struct filename *result;
 	char *kname;
 	int len;
+	BUILD_BUG_ON(offsetof(struct filename, iname) % sizeof(long) != 0);
 
 	result = audit_reusename(filename);
 	if (result)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ae681002100a..d522e6391855 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2245,7 +2245,10 @@  struct filename {
 	const char		*name;	/* pointer to actual string */
 	const __user char	*uptr;	/* original userland pointer */
 	struct audit_names	*aname;
-	int			refcnt;
+	union {
+		int		refcnt;
+		long		__padding;
+	};
 	const char		iname[];
 };