diff mbox

fix fanotify_mark() breakage on big endian 32bit kernel

Message ID 20140704151235.GA22454@ls3530.dhcp.wdf.sap.corp (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Helge Deller July 4, 2014, 3:12 p.m. UTC
This patch affects big endian architectures only.

On those with 32bit userspace and 64bit kernel (CONFIG_COMPAT=y) the
64bit mask parameter is correctly constructed out of two 32bit values in
the compat_fanotify_mark() function and then passed as 64bit parameter
to the fanotify_mark() syscall.

But for the CONFIG_COMPAT=n case (32bit kernel & userspace),
compat_fanotify_mark() isn't used and the fanotify_mark syscall implementation
is used directly. In that case the upper and lower 32 bits of the 64bit mask
parameter is still swapped on big endian machines and thus leads to
fanotify_mark failing with -EINVAL.

Here is a strace of the same 32bit executable (fanotify01 testcase from LTP):

On a 64bit kernel it suceeds:
syscall_322(0, 0, 0x3, 0x3, 0x266c8, 0x1) = 0x3
syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = 0

On a 32bit kernel it fails:
syscall_322(0, 0, 0x3, 0x3, 0x266c8, 0x1) = 0x3
syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = -1 (errno 22)

Below is the easiest fix for this problem by simply swapping the upper and
lower 32bit of the 64 bit mask parameter when building a pure 32bit kernel.

But on the other side, using __u64 in a syscall API is IMHO wrong.  This may
easily break 32bit kernel builds, esp. on big endian machines.

The clean solution would probably be to use SYSCALL_DEFINE5() when
building a 64bit-kernel, and SYSCALL_DEFINE6() for fanotify_mark() when
building a pure 32bit kernel, something like this:

#ifdef CONFIG_64BIT
SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
                              __u64, mask, int, dfd,
                              const char  __user *, pathname)
#else
SYSCALL_DEFINE6(fanotify_mark, int, fanotify_fd, unsigned int, flags,
                                __u32, mask0, __u32, mask1, int, dfd,
                                const char  __user *, pathname)
#endif


Signed-off-by: Helge Deller <deller@gmx.de>
To: Eric Paris <eparis@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>


--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Heinrich Schuchardt July 4, 2014, 4:48 p.m. UTC | #1
On 04.07.2014 17:12, Helge Deller wrote:
> This patch affects big endian architectures only.
>
> On those with 32bit userspace and 64bit kernel (CONFIG_COMPAT=y) the
> 64bit mask parameter is correctly constructed out of two 32bit values in
> the compat_fanotify_mark() function and then passed as 64bit parameter
> to the fanotify_mark() syscall.
>
> But for the CONFIG_COMPAT=n case (32bit kernel & userspace),
> compat_fanotify_mark() isn't used and the fanotify_mark syscall implementation

I was not able to find a symbol compat_fanotify_mark. Could you, please, 
indicate were this coding is.

> is used directly. In that case the upper and lower 32 bits of the 64bit mask
> parameter is still swapped on big endian machines and thus leads to
> fanotify_mark failing with -EINVAL.
>
> Here is a strace of the same 32bit executable (fanotify01 testcase from LTP):

https://github.com/linux-test-project/ltp
testcases/kernel/syscalls/fanotify/fanotify01.c
I guess.

>
> On a 64bit kernel it suceeds:
> syscall_322(0, 0, 0x3, 0x3, 0x266c8, 0x1) = 0x3
> syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = 0
>
> On a 32bit kernel it fails:
> syscall_322(0, 0, 0x3, 0x3, 0x266c8, 0x1) = 0x3
> syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = -1 (errno 22)

The syscall numbers are architecture specific.
Which architecture did you test on?

>
> Below is the easiest fix for this problem by simply swapping the upper and
> lower 32bit of the 64 bit mask parameter when building a pure 32bit kernel.

The problem you report is architecture specific.
Is fanotify_user.c really the right place for the correction?
Or should the fix be in the "arch" directory?

Best regards

Heinrich Schuchardt

>
> But on the other side, using __u64 in a syscall API is IMHO wrong.  This may
> easily break 32bit kernel builds, esp. on big endian machines.
>
> The clean solution would probably be to use SYSCALL_DEFINE5() when
> building a 64bit-kernel, and SYSCALL_DEFINE6() for fanotify_mark() when
> building a pure 32bit kernel, something like this:
>
> #ifdef CONFIG_64BIT
> SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
>                                __u64, mask, int, dfd,
>                                const char  __user *, pathname)
> #else
> SYSCALL_DEFINE6(fanotify_mark, int, fanotify_fd, unsigned int, flags,
>                                  __u32, mask0, __u32, mask1, int, dfd,
>                                  const char  __user *, pathname)
> #endif
>
>
> Signed-off-by: Helge Deller <deller@gmx.de>
> To: Eric Paris <eparis@redhat.com>
> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 3fdc8a3..374261c 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -787,6 +787,10 @@ SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
>   	struct path path;
>   	int ret;
>
> +#if defined(__BIG_ENDIAN) && !defined(CONFIG_64BIT)
> +	mask = (mask << 32) | (mask >> 32);
> +#endif
> +
>   	pr_debug("%s: fanotify_fd=%d flags=%x dfd=%d pathname=%p mask=%llx\n",
>   		 __func__, fanotify_fd, flags, dfd, pathname, mask);
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Helge Deller July 4, 2014, 5:03 p.m. UTC | #2
Hi Heinrich,

On 07/04/2014 06:48 PM, Heinrich Schuchardt wrote:
> On 04.07.2014 17:12, Helge Deller wrote:
>> This patch affects big endian architectures only.
>>
>> On those with 32bit userspace and 64bit kernel (CONFIG_COMPAT=y) the
>> 64bit mask parameter is correctly constructed out of two 32bit values in
>> the compat_fanotify_mark() function and then passed as 64bit parameter
>> to the fanotify_mark() syscall.
>>
>> But for the CONFIG_COMPAT=n case (32bit kernel & userspace),
>> compat_fanotify_mark() isn't used and the fanotify_mark syscall implementation
> 
> I was not able to find a symbol compat_fanotify_mark. Could you, please, indicate were this coding is.

fs/notify/fanotify/fanotify_user.c around line 892:
#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE6(fanotify_mark,
                                int, fanotify_fd, unsigned int, flags,
                                __u32, mask0, __u32, mask1, int, dfd,
                                const char  __user *, pathname)

>> is used directly. In that case the upper and lower 32 bits of the 64bit mask
>> parameter is still swapped on big endian machines and thus leads to
>> fanotify_mark failing with -EINVAL.
>>
>> Here is a strace of the same 32bit executable (fanotify01 testcase from LTP):
> 
> https://github.com/linux-test-project/ltp
> testcases/kernel/syscalls/fanotify/fanotify01.c
> I guess.

Yes.

>> On a 64bit kernel it suceeds:
>> syscall_322(0, 0, 0x3, 0x3, 0x266c8, 0x1) = 0x3
>> syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = 0
>>
>> On a 32bit kernel it fails:
>> syscall_322(0, 0, 0x3, 0x3, 0x266c8, 0x1) = 0x3
>> syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = -1 (errno 22)
> 
> The syscall numbers are architecture specific.
> Which architecture did you test on?

Yes, the numbers are architecture specifc.
I tested on HP-PARISC (parisc arch) with 32- and 64bit kernel.

>> Below is the easiest fix for this problem by simply swapping the upper and
>> lower 32bit of the 64 bit mask parameter when building a pure 32bit kernel.
> 
> The problem you report is architecture specific.

It affects all *big endian* architectures (parisc, s390, ppc, ...)
So, if people could test it with a 32bit kernel on those other architectures
it would be nice.

> Is fanotify_user.c really the right place for the correction?
> Or should the fix be in the "arch" directory?

I don't think the fix should go in the arch architectures, because
then you have to modify it for each big endian arch.

>> But on the other side, using __u64 in a syscall API is IMHO wrong.  This may
>> easily break 32bit kernel builds, esp. on big endian machines.
>>
>> The clean solution would probably be to use SYSCALL_DEFINE5() when
>> building a 64bit-kernel, and SYSCALL_DEFINE6() for fanotify_mark() when
>> building a pure 32bit kernel, something like this:

Again, I think using __u64 as type for a generic syscall is wrong, esp.
if the same code is compiled for 32- and 64bit. 
This is my (uncomplete!) suggestion, but it would add many more lines and 
makes reading the code more complicated.

>> #ifdef CONFIG_64BIT
>> SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
>>                                __u64, mask, int, dfd,
>>                                const char  __user *, pathname)
>> #else
>> SYSCALL_DEFINE6(fanotify_mark, int, fanotify_fd, unsigned int, flags,
>>                                  __u32, mask0, __u32, mask1, int, dfd,
>>                                  const char  __user *, pathname)
>> #endif
>>
>>
>> Signed-off-by: Helge Deller <deller@gmx.de>
>> To: Eric Paris <eparis@redhat.com>
>> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>>
>>
>> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
>> index 3fdc8a3..374261c 100644
>> --- a/fs/notify/fanotify/fanotify_user.c
>> +++ b/fs/notify/fanotify/fanotify_user.c
>> @@ -787,6 +787,10 @@ SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
>>       struct path path;
>>       int ret;
>>
>> +#if defined(__BIG_ENDIAN) && !defined(CONFIG_64BIT)
>> +    mask = (mask << 32) | (mask >> 32);
>> +#endif
>> +
>>       pr_debug("%s: fanotify_fd=%d flags=%x dfd=%d pathname=%p mask=%llx\n",
>>            __func__, fanotify_fd, flags, dfd, pathname, mask);
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Heiko Carstens July 6, 2014, 9:15 a.m. UTC | #3
On Fri, Jul 04, 2014 at 05:12:35PM +0200, Helge Deller wrote:
> This patch affects big endian architectures only.
> 
> On those with 32bit userspace and 64bit kernel (CONFIG_COMPAT=y) the
> 64bit mask parameter is correctly constructed out of two 32bit values in
> the compat_fanotify_mark() function and then passed as 64bit parameter
> to the fanotify_mark() syscall.
> 
> But for the CONFIG_COMPAT=n case (32bit kernel & userspace),
> compat_fanotify_mark() isn't used and the fanotify_mark syscall implementation
> is used directly. In that case the upper and lower 32 bits of the 64bit mask
> parameter is still swapped on big endian machines and thus leads to
> fanotify_mark failing with -EINVAL.

Why do you think upper and lower 32 bits are swapped on big endian machines?
At least an s390 the C ABI defines that 64 bit values are split into an
even odd register pair, where the most significant bits are in the even numbered
register.

So for sys_fanotify_mark everything is fine on s390, and probably most other
architectures as well. Having a 64 bit syscall parameter indeed does work,
if all the architecture specific details have been correctly considered.

> Here is a strace of the same 32bit executable (fanotify01 testcase from LTP):
> 
> On a 64bit kernel it suceeds:
> syscall_322(0, 0, 0x3, 0x3, 0x266c8, 0x1) = 0x3
> syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = 0
> 
> On a 32bit kernel it fails:
> syscall_322(0, 0, 0x3, 0x3, 0x266c8, 0x1) = 0x3
> syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = -1 (errno 22)

So "0" and "0x3b" together should be the 64 bit "0x3b" mask, this looks just
fine.

> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 3fdc8a3..374261c 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -787,6 +787,10 @@ SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
>  	struct path path;
>  	int ret;
> 
> +#if defined(__BIG_ENDIAN) && !defined(CONFIG_64BIT)
> +	mask = (mask << 32) | (mask >> 32);
> +#endif
> +
>  	pr_debug("%s: fanotify_fd=%d flags=%x dfd=%d pathname=%p mask=%llx\n",
>  		 __func__, fanotify_fd, flags, dfd, pathname, mask);

Did you activate this pr_debug()? I'm really wondering what the output looks
like on your machine.

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John David Anglin July 6, 2014, 3:08 p.m. UTC | #4
On 6-Jul-14, at 5:15 AM, Heiko Carstens wrote:

>> But for the CONFIG_COMPAT=n case (32bit kernel & userspace),
>> compat_fanotify_mark() isn't used and the fanotify_mark syscall  
>> implementation
>> is used directly. In that case the upper and lower 32 bits of the  
>> 64bit mask
>> parameter is still swapped on big endian machines and thus leads to
>> fanotify_mark failing with -EINVAL.
>
> Why do you think upper and lower 32 bits are swapped on big endian  
> machines?
> At least an s390 the C ABI defines that 64 bit values are split into  
> an
> even odd register pair, where the most significant bits are in the  
> even numbered
> register.


On hppa, there is no specific rule as to which registers are used to  
hold 64-bit integer values
on 32-bit machines except for the first two call arguments which are  
passed in registers.
For example, r25 and r26 contain the first argument and the most  
significant bits are in r25
and the least significant bits in r26..

In GCC, we typically have an odd even register pair to hold 64-bit  
values as register
r0 is not usable.

The rules are different for float values.

Dave
--
John David Anglin	dave.anglin@bell.net



--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Helge Deller July 7, 2014, 1:54 p.m. UTC | #5
Hi Heiko,

> On Fri, Jul 04, 2014 at 05:12:35PM +0200, Helge Deller wrote:
> > This patch affects big endian architectures only.
> > 
> > On those with 32bit userspace and 64bit kernel (CONFIG_COMPAT=y) the
> > 64bit mask parameter is correctly constructed out of two 32bit values in
> > the compat_fanotify_mark() function and then passed as 64bit parameter
> > to the fanotify_mark() syscall.
> > 
> > But for the CONFIG_COMPAT=n case (32bit kernel & userspace),
> > compat_fanotify_mark() isn't used and the fanotify_mark syscall implementation
> > is used directly. In that case the upper and lower 32 bits of the 64bit mask
> > parameter is still swapped on big endian machines and thus leads to
> > fanotify_mark failing with -EINVAL.
> 
> Why do you think upper and lower 32 bits are swapped on big endian machines?

I assumed it, because I see this behaviour on parisc, and because of this commit
from you regarding the compat-case. I do recognize, that in this patch the u64 value
is constructed out of the two 32bit values to hand it over. So, this patch is OK.

commit 592f6b842f64e416c7598a1b97c649b34241e22d
Author: Heiko Carstens <heiko.carstens@de.ibm.com>
Date:   Mon Jan 27 17:07:19 2014 -0800

    compat: fix sys_fanotify_mark
    
    Commit 91c2e0bcae72 ("unify compat fanotify_mark(2), switch to
    COMPAT_SYSCALL_DEFINE") added a new unified compat fanotify_mark syscall
    to be used by all architectures.
    
    Unfortunately the unified version merges the split mask parameter in a
    wrong way: the lower and higher word got swapped.
    
    This was discovered with glibc's tst-fanotify test case.


> > Here is a strace of the same 32bit executable (fanotify01 testcase from LTP):
> > 
> > On a 64bit kernel it suceeds:
> > syscall_322(0, 0, 0x3, 0x3, 0x266c8, 0x1) = 0x3
> > syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = 0
> > 
> > On a 32bit kernel it fails:
> > syscall_322(0, 0, 0x3, 0x3, 0x266c8, 0x1) = 0x3
> > syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = -1 (errno 22)
> 
> So "0" and "0x3b" together should be the 64 bit "0x3b" mask, this looks just
> fine.
> 
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index 3fdc8a3..374261c 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -787,6 +787,10 @@ SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
> >  	struct path path;
> >  	int ret;
> > 
> > +#if defined(__BIG_ENDIAN) && !defined(CONFIG_64BIT)
> > +	mask = (mask << 32) | (mask >> 32);
> > +#endif
> > +
> >  	pr_debug("%s: fanotify_fd=%d flags=%x dfd=%d pathname=%p mask=%llx\n",
> >  		 __func__, fanotify_fd, flags, dfd, pathname, mask);
> 
> Did you activate this pr_debug()? I'm really wondering what the output looks
> like on your machine.

Just tested it.
On 3.16.0-rc4-32bit (without my patch)
syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = -1 (errno 22)
gives:
SYSC_fanotify_mark: fanotify_fd=3 flags=1 dfd=-100 pathname=000266c8 mask=3b00000000

and on 3.16.0-rc4-32bit+ (*with* my patch, same executable file):
syscall_323(0x3, 0x1, 0, 0x3b, 0xffffff9c, 0x266c8) = 0
gives:
SYSC_fanotify_mark: fanotify_fd=3 flags=1 dfd=-100 pathname=000266c8 mask=3b

So, my patch works as expected.

The Linux Test Project (LTP) uses in testcases/kernel/syscalls/fanotify/fanotify.h this coding, which is IMHO 
correct as it would break your commit 592f6b842f64e416c7598a1b97c649b34241e22d otherwise:
long myfanotify_mark(int fd, unsigned int flags, uint64_t mask,
                     int dfd, const char *pathname)
{
#if LTP_USE_64_ABI
        return ltp_syscall(__NR_fanotify_mark, fd, flags, mask, dfd, pathname);
#else
        return ltp_syscall(__NR_fanotify_mark, fd, flags,
                         __LONG_LONG_PAIR((unsigned long) (mask >> 32),
                                          (unsigned long) mask),
                         dfd, (unsigned long) pathname);
#endif
}

with __LONG_LONG_PAIR() defined in /usr/include/endian.h:
#if __BYTE_ORDER == __LITTLE_ENDIAN
# define __LONG_LONG_PAIR(HI, LO) LO, HI
#elif __BYTE_ORDER == __BIG_ENDIAN
# define __LONG_LONG_PAIR(HI, LO) HI, LO
#endif

and in glibc sysdeps/unix/sysv/linux/sys/fanotify.h I see:
extern int fanotify_mark (int __fanotify_fd, unsigned int __flags, uint64_t __mask, int __dfd, const char *__pathname);
with
sysdeps/unix/sysv/linux/s390/s390-32/syscalls.list:fanotify_mark        EXTRA   fanotify_mark   i:iiiiis        fanotify_mark


> At least an s390 the C ABI defines that 64 bit values are split into an
> even odd register pair, where the most significant bits are in the even numbered
> register.

and Dave wrote for hppa:
> In GCC, we typically have an odd even register pair to hold 64-bit
> values as register r0 is not usable.

This seems different.

> So for sys_fanotify_mark everything is fine on s390, and probably most other
> architectures as well. Having a 64 bit syscall parameter indeed does work,
> if all the architecture specific details have been correctly considered.

I think this is the problem!
For parisc the architecture specifc details have not been considered correctly.
I tried this test:

static int low32, high32;
SYSCALL_DEFINE5(fanotify_mark_test, int, fanotify_fd, unsigned int, flags,
                   __u64, mask, int, dfd, const char  __user *, pathname)
{
        low32 = (int) mask;
        high32 = (int) (mask >> 32);
}

and got:

        .section        .text.SyS_fanotify_mark_test,"ax",@progbits
        .align 4
.globl SyS_fanotify_mark_test
        .type   SyS_fanotify_mark_test, @function
SyS_fanotify_mark_test:
        .PROC
        .CALLINFO FRAME=64,NO_CALLS,SAVE_SP,ENTRY_GR=3
        .ENTRY
        copy %r3,%r1
        copy %r30,%r3
        stwm %r1,64(%r30)
        addil LR'low32-$global$,%r27
        ldi 0,%r28
        stw %r24,RR'low32-$global$(%r1)
        addil LR'high32-$global$,%r27
        stw %r23,RR'high32-$global$(%r1)
        ldo 64(%r3),%r30
        bv %r0(%r2)
        ldwm -64(%r30),%r3
        .EXIT
        .PROCEND

So on hppa r26 is fanotify_fd, %r25 is flags, %r24/%r23 is lower/higher 32bits of mask.
For the mask parameter this is different to what the __LONG_LONG_PAIR() marcro
would hand over to the syscall (which would be %r24/%r23 as higher/lower 32bits).

So, the problem is the usage of __u64 in the 32bit API. It has to be handled architecture-specific.
It seems to work for little-endian machines, and probably (by luck?!?) for s390, but I'm not sure if
it maybe breaks (like on parisc) on other arches, e.g. what about sparc?

For parisc I can work around that problem in the architecture-specifc coding, but I still
think using __64 here is wrong and just may lead to such bugs.

Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Heiko Carstens July 7, 2014, 3:28 p.m. UTC | #6
On Mon, Jul 07, 2014 at 03:54:37PM +0200, Helge Deller wrote:
> Hi Heiko,
> > So for sys_fanotify_mark everything is fine on s390, and probably most other
> > architectures as well. Having a 64 bit syscall parameter indeed does work,
> > if all the architecture specific details have been correctly considered.
> 
> I think this is the problem!
[...]
> So on hppa r26 is fanotify_fd, %r25 is flags, %r24/%r23 is lower/higher 32bits of mask.
> For the mask parameter this is different to what the __LONG_LONG_PAIR() marcro
> would hand over to the syscall (which would be %r24/%r23 as higher/lower 32bits).
> 
> So, the problem is the usage of __u64 in the 32bit API. It has to be handled architecture-specific.
> It seems to work for little-endian machines, and probably (by luck?!?) for s390, but I'm not sure if
> it maybe breaks (like on parisc) on other arches, e.g. what about sparc?

No, it's not luck that it works on s390. Whenever we add a new entry to
our system call table, we make sure that 64 bit parameters, if present,
work on s390.
Otherwise we add s390 specific system calls like e.g. sys_s390_fallocate.

> For parisc I can work around that problem in the architecture-specifc coding, but I still
> think using __64 here is wrong and just may lead to such bugs.

There have always been problems with 64 bit system call parameters on 32 bit
architectures. See for example this thread:
http://oss.sgi.com/archives/xfs/2007-03/msg00557.html

David Woodhouse wrote summed up the system call ABI of a couple of
architectures a couple of years ago:
http://marc.info/?l=linux-arch&m=118277150812137&w=2

And there was also a system call howto from Ulrich Drepper:
http://copilotco.com/mail-archives/linux-kernel.2007/msg27844.html

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Helge Deller July 11, 2014, 8 a.m. UTC | #7
On 07/07/2014 05:28 PM, Heiko Carstens wrote:
> On Mon, Jul 07, 2014 at 03:54:37PM +0200, Helge Deller wrote:
>> Hi Heiko,
>>> So for sys_fanotify_mark everything is fine on s390, and probably most other
>>> architectures as well. Having a 64 bit syscall parameter indeed does work,
>>> if all the architecture specific details have been correctly considered.
>>
>> I think this is the problem!
> [...]
>> So on hppa r26 is fanotify_fd, %r25 is flags, %r24/%r23 is lower/higher 32bits of mask.
>> For the mask parameter this is different to what the __LONG_LONG_PAIR() marcro
>> would hand over to the syscall (which would be %r24/%r23 as higher/lower 32bits).
>>
>> So, the problem is the usage of __u64 in the 32bit API. It has to be handled architecture-specific.
>> It seems to work for little-endian machines, and probably (by luck?!?) for s390, but I'm not sure if
>> it maybe breaks (like on parisc) on other arches, e.g. what about sparc?
> 
> No, it's not luck that it works on s390. Whenever we add a new entry to
> our system call table, we make sure that 64 bit parameters, if present,
> work on s390.
> Otherwise we add s390 specific system calls like e.g. sys_s390_fallocate.
> 
>> For parisc I can work around that problem in the architecture-specifc coding, but I still
>> think using __64 here is wrong and just may lead to such bugs.
> 
> There have always been problems with 64 bit system call parameters on 32 bit
> architectures. See for example this thread:
> http://oss.sgi.com/archives/xfs/2007-03/msg00557.html
> 
> David Woodhouse wrote summed up the system call ABI of a couple of
> architectures a couple of years ago:
> http://marc.info/?l=linux-arch&m=118277150812137&w=2
> 
> And there was also a system call howto from Ulrich Drepper:
> http://copilotco.com/mail-archives/linux-kernel.2007/msg27844.html

It finally turned out, that on hppa we end up with different assignments of parameters to kernel arguments depending on if we call the glibc wrapper function
int fanotify_mark (int __fanotify_fd, unsigned int __flags, uint64_t __mask, int __dfd, const char *__pathname);
or directly calling the syscall manually
syscall(__NR_fanotify_mark, ...)

Reason is, that the syscall() function is implemented as C-function and because we now have the sysno as first parameter in front of the other parameters the compiler will unexpectedly add an empty paramenter in front of the u64 value to ensure the correct calling alignment for 64bit values.
This means, on hppa you can't simply use syscall() to call the kernel fanotify_mark() function directly, but you have to use the glibc function instead.

I'll push this patch through the parisc-linux git tree:
https://patchwork.kernel.org/patch/4524561/
which fixes the kernel in the hppa-arch specifc coding to adjust the parameters in a way as if userspace calls the glibc wrapper function fanotify_mark().

So, please ignore my previous patches in this thread.

Thanks!
Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 3fdc8a3..374261c 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -787,6 +787,10 @@  SYSCALL_DEFINE5(fanotify_mark, int, fanotify_fd, unsigned int, flags,
 	struct path path;
 	int ret;
 
+#if defined(__BIG_ENDIAN) && !defined(CONFIG_64BIT)
+	mask = (mask << 32) | (mask >> 32);
+#endif
+
 	pr_debug("%s: fanotify_fd=%d flags=%x dfd=%d pathname=%p mask=%llx\n",
 		 __func__, fanotify_fd, flags, dfd, pathname, mask);