Message ID | 4E177624.1030500@systemhalted.org (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
On Friday, July 08, 2011 17:27:00 Carlos O'Donell wrote: > Implements futex op support and makes futex cmpxchg atomic. > Tested on 64-bit SMP kernel running on 2 x PA8700s. seems like all the changes to this func are arch-independent and could be placed into asm-generic/futex.h. the current parisc futex.h looks like a copy of the asm-generic one ... -mike
On 7/8/2011 5:42 PM, Mike Frysinger wrote: > On Friday, July 08, 2011 17:27:00 Carlos O'Donell wrote: >> Implements futex op support and makes futex cmpxchg atomic. >> Tested on 64-bit SMP kernel running on 2 x PA8700s. > > seems like all the changes to this func are arch-independent and could be > placed into asm-generic/futex.h. the current parisc futex.h looks like a copy > of the asm-generic one ... > -mike That's true. I didn't want to rock the boat on this one until I hammered out all the userspace testing. Cheers, Carlos. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Tested on rp3440 with 3.0.0-rc6+. Two full builds and checks of GCC have been done with make -j6. No anomalous behaviour was observed (i.e., this change may fix the gnat1 bug that I previously reported). Attached is full patch set used in testing. Tested-by: John David Anglin <dave.anglin@bell.net> On 8-Jul-11, at 5:27 PM, Carlos O'Donell wrote: > Implements futex op support and makes futex cmpxchg atomic. > Tested on 64-bit SMP kernel running on 2 x PA8700s. > > Signed-off-by: Carlos O'Donell <carlos@systemhalted.org> > --- > futex.h | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++++----- > 1 file changed, 59 insertions(+), 5 deletions(-) > > diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/ > asm/futex.h > index 67a33cc..94f2f4d 100644 > --- a/arch/parisc/include/asm/futex.h > +++ b/arch/parisc/include/asm/futex.h > @@ -5,11 +5,14 @@ > > #include <linux/futex.h> > #include <linux/uaccess.h> > +#include <asm/atomic.h> > #include <asm/errno.h> > > static inline int > futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr) > { > + unsigned long int flags; > + u32 val; > int op = (encoded_op >> 28) & 7; > int cmp = (encoded_op >> 24) & 15; > int oparg = (encoded_op << 8) >> 20; > @@ -23,16 +26,53 @@ futex_atomic_op_inuser (int encoded_op, u32 > __user *uaddr) > > pagefault_disable(); > > + _atomic_spin_lock_irqsave (uaddr, flags); > + > switch (op) { > case FUTEX_OP_SET: > + /* *(int *)UADDR2 = OPARG; */ > + ret = get_user (oldval, uaddr); > + if (!ret) > + ret = put_user (oparg, uaddr); > + break; > case FUTEX_OP_ADD: > + /* *(int *)UADDR2 += OPARG; */ > + ret = get_user (oldval, uaddr); > + if (!ret) { > + val = oldval + oparg; > + ret = put_user (val, uaddr); > + } > + break; > case FUTEX_OP_OR: > + /* *(int *)UADDR2 |= OPARG; */ > + ret = get_user (oldval, uaddr); > + if (!ret) { > + val = oldval | oparg; > + ret = put_user (val, uaddr); > + } > + break; > case FUTEX_OP_ANDN: > + /* *(int *)UADDR2 &= ~OPARG; */ > + ret = get_user (oldval, uaddr); > + if (!ret) { > + val = oldval & ~oparg; > + ret = put_user (val, uaddr); > + } > + break; > case FUTEX_OP_XOR: > + /* *(int *)UADDR2 ^= OPARG; */ > + ret = get_user (oldval, uaddr); > + if (!ret) { > + val = oldval ^ oparg; > + ret = put_user (val, uaddr); > + } > + break; > default: > ret = -ENOSYS; > } > > + _atomic_spin_unlock_irqrestore (uaddr, flags); > + > pagefault_enable(); > > if (!ret) { > @@ -54,7 +94,9 @@ static inline int > futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > u32 oldval, u32 newval) > { > + int ret; > u32 val; > + unsigned long flags; > > /* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is > * our gateway page, and causes no end of trouble... > @@ -65,12 +107,24 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 > __user *uaddr, > if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) > return -EFAULT; > > - if (get_user(val, uaddr)) > - return -EFAULT; > - if (val == oldval && put_user(newval, uaddr)) > - return -EFAULT; > + /* HPPA has no cmpxchg in hardware and therefore the > + * best we can do here is use an array of locks. The > + * lock selected is based on a hash of the userspace > + * address. This should scale to a couple of CPUs. > + */ > + > + _atomic_spin_lock_irqsave (uaddr, flags); > + > + ret = get_user(val, uaddr); > + > + if (!ret && val == oldval) > + ret = put_user (newval, uaddr); > + > *uval = val; > - return 0; > + > + _atomic_spin_unlock_irqrestore (uaddr, flags); > + > + return ret; > } > > #endif /*__KERNEL__*/ > -- > To unsubscribe from this list: send the line "unsubscribe linux- > parisc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Dave -- John David Anglin dave.anglin@bell.net
On 8-Jul-11, at 6:13 PM, Carlos O'Donell wrote: > On 7/8/2011 5:42 PM, Mike Frysinger wrote: >> On Friday, July 08, 2011 17:27:00 Carlos O'Donell wrote: >>> Implements futex op support and makes futex cmpxchg atomic. >>> Tested on 64-bit SMP kernel running on 2 x PA8700s. >> >> seems like all the changes to this func are arch-independent and >> could be >> placed into asm-generic/futex.h. the current parisc futex.h looks >> like a copy >> of the asm-generic one ... >> -mike > > That's true. I didn't want to rock the boat on this one until > I hammered out all the userspace testing. Do you have a userspace patch? There was the thread stack allocation bug and possibly futex related issues. I can more or less consistently hpmc my rp3440 running the GCC libgomp testsuite with 3.0.0-rc7+. Unfortunately, it seems hpmcs are somewhat deferred, so it's really hard to to tell what's happening. Usually, the machine hpmcs in the idle loop. One instance seemed to have a corrupt user space stack pointer. Another instance seemed to occur with inconsistent space registers in the kernel. Often two cpus check almost simultaneously. When this happens, both have the same "Assist Check" value (space register). The "Check Summary" is always 0x8400000000800000. I think Kyle fixed one of these last March. The cause was missing IPC system calls. I'm a bit vague regarding whether the fix was installed or not. Dave -- John David Anglin dave.anglin@bell.net -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 22, 2011 at 5:51 PM, John David Anglin <dave.anglin@bell.net> wrote: > Do you have a userspace patch? There was the thread stack allocation bug > and possibly futex related issues. I do have a userspace patch, what would you like the patch against? > The "Check Summary" is always 0x8400000000800000. I think Kyle fixed one of > these last March. > The cause was missing IPC system calls. I'm a bit vague regarding whether > the fix was installed > or not. I wouldn't expect a missing syscall to HPMC the machine, it should return ENOSYS and userspace should return a failure code... eventually something might fail to check the function return and fail. That's far from an HPMC though. Cheers, Carlos. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24-Jul-11, at 2:42 PM, Carlos O'Donell wrote: > On Fri, Jul 22, 2011 at 5:51 PM, John David Anglin <dave.anglin@bell.net > > wrote: >> Do you have a userspace patch? There was the thread stack >> allocation bug >> and possibly futex related issues. > > I do have a userspace patch, what would you like the patch against? The current version is unstable or 2.11.2-10. > >> The "Check Summary" is always 0x8400000000800000. I think Kyle >> fixed one of >> these last March. >> The cause was missing IPC system calls. I'm a bit vague regarding >> whether >> the fix was installed >> or not. > > I wouldn't expect a missing syscall to HPMC the machine, it should > return ENOSYS > and userspace should return a failure code... eventually something > might fail to > check the function return and fail. > > That's far from an HPMC though. Yah, I now think it was a matter of luck exposing the missing IPC calls. I wish we knew how to decode the "Check Summary" but I haven't found any documentation on the net. My current theory is the value indicates some kind of cache malfunction. I suspect that the corruption may occur during range based flushes due code containing inequivalent aliases. I think 0x2000000000000000 indicates a memory timeout. I did find a small bug in flush_cache_range (sr3 has wrong type). If I'm correct about the problem being range based, this bug probably made things better by calling flush_cache_all more frequently. I have a whole collection of hpmcs and almost all have the same check summary. It very hard to see much consistency. Many have hpmc's in the idle loop. A couple have hpmc's in flush_data_cache. It's always possible that there is a hardware problem, but the problem occurs so frequently in the libgomp testsuite that I have to think it is software triggered. I haven't been able to trigger by running compilations or tests manually. There must be some interconnection between the cores because a fault in one almost triggers a TOC hpmc in the other. The "Assist Check" value is a space register value and it's probably the context of the process that caused the cache problem. I'm currently running a test to check operation using mainly flush_instruction_cache and flush_data_cache (whole cache flushes). So far, it hasn't hpmc'd in the libgomp testsuite, but it's very slooooow. I plan to rebuild the main packages used in the libgomp testsuite to see if this helps. Dave -- John David Anglin dave.anglin@bell.net -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24-Jul-11, at 4:30 PM, John David Anglin wrote: > > On 24-Jul-11, at 2:42 PM, Carlos O'Donell wrote: > >> On Fri, Jul 22, 2011 at 5:51 PM, John David Anglin <dave.anglin@bell.net >> > wrote: >>> Do you have a userspace patch? There was the thread stack >>> allocation bug >>> and possibly futex related issues. >> >> I do have a userspace patch, what would you like the patch against? > > The current version is unstable or 2.11.2-10. I think we need a patch against the current unstable version (2.13-13). I have been trying to fix the HPMCs caused by flush_cache_range. I have modified it to use flush_cache_page. This didn't fix the HPMCs, so I added a check for inequivalent aliases along the lines as that in flush_dcache_page. Without that, I'm seeing warnings like the following: INEQUIVALENT ALIASES 0x40095000 and 0x40096000 in file pam_deny.so INEQUIVALENT ALIASES 0x40096000 and 0x40095000 in file pam_deny.so INEQUIVALENT ALIASES 0x40095000 and 0x40096000 in file pam_deny.so INEQUIVALENT ALIASES 0x40096000 and 0x40095000 in file pam_deny.so INEQUIVALENT ALIASES 0x40095000 and 0x40096000 in file pam_deny.so INEQUIVALENT ALIASES 0x40092000 and 0x40093000 in file pam_permit.so INEQUIVALENT ALIASES 0x40093000 and 0x40092000 in file pam_permit.so INEQUIVALENT ALIASES 0x40092000 and 0x40093000 in file pam_permit.so INEQUIVALENT ALIASES 0x40093000 and 0x40092000 in file pam_permit.so INEQUIVALENT ALIASES 0x40092000 and 0x40093000 in file pam_permit.so INEQUIVALENT ALIASES 0x40707000 and 0x40708000 in file pam_env.so INEQUIVALENT ALIASES 0x4008b000 and 0x4008c000 in file pam_limits.so I tried rebuilding pam from source but the only version I could find is 1.1.3-2. It has a dependency on multiarch-support. While it builds, it won't install. multiarch-support appears to be provided by eglibc 2.13. I'm hoping this is the cause of the HPMCs as the messages appear somewhat randomly. Another issue that I saw in debugging flush_cache_range is that it is sometimes called with "64-bit" ranges. There are no pte's and probably no page table for these flushes. Here's an example: vm_start 0xfb0e9000, vm_end 0x7fff1001000 start 0x7fff0fff000, end 0x7fff1001000, cr25 0x3c376000 sr0 0x0, sr1 0x0, sr2 0x0, sr3 0x800 sr4 0x0, sr5 0x0, sr6 0x0, sr7 0x0 The first call and some occasional subsequent calls have this wierd range. I am currently building 2.13 and will see what needs fixing. Dave -- John David Anglin dave.anglin@bell.net -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5-Aug-11, at 5:44 PM, John David Anglin wrote:
> I am currently building 2.13 and will see what needs fixing.
This is what I see:
# Testsuite failures, someone should be working towards
# fixing these! They are listed here for the purpose of
# regression testing during builds.
# Format: <Failed test>, Error <Make error code> [(ignored)]
#
annexc.out, Error 1 (ignored)
check-execstack.out, Error 1
check-localplt.out, Error 1
check-textrel.out, Error 1
test-double.out, Error 1
test-float.out, Error 1
test-idouble.out, Error 1
test-ifloat.out, Error 1
tst-addr1.out, Error 1
tst-audit2.out, Error 139
tst-backtrace2.out, Error 1
tst-cancelx20.out, Error 1
tst-cancelx21.out, Error 1
tst-cancelx4.out, Error 1
tst-cancelx5.out, Error 1
tst-cleanupx4.out, Error 1
tst-cond10.out, Error 1
tst-cond18.out, Error 1
tst-cond20.out, Error 1
tst-cond21.out, Error 1
tst-cpuclock2.out, Error 1
tst-cputimer1.out, Error 1
tst-cputimer2.out, Error 1
tst-cputimer3.out, Error 1
tst-getcpu.out, Error 1
tst-longjmp_chk.out, Error 1
tst-mqueue3.out, Error 1
tst-sprofil.out, Error 137
tst-timer4.out, Error 1
tst-timer5.out, Error 1
tst-tls11.out, Error 139
tst-tls3.out, Error 1
***************
Encountered regressions that don't match expected failures:
test-double.out, Error 1
test-float.out, Error 1
test-idouble.out, Error 1
test-ifloat.out, Error 1
tst-cond10.out, Error 1
tst-cond18.out, Error 1
tst-cond20.out, Error 1
tst-cond21.out, Error 1
tst-sprofil.out, Error 137
tst-tls11.out, Error 139
tst-tls3.out, Error 1
make: *** [/home/dave/debian/multiarch-support/eglibc-2.13/stamp-dir/
check_libc] Error 1
dpkg-buildpackage: error: debian/rules build gave error exit status 2
Dave
--
John David Anglin dave.anglin@bell.net
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6-Aug-11, at 11:33 AM, John David Anglin wrote: > On 5-Aug-11, at 5:44 PM, John David Anglin wrote: > >> I am currently building 2.13 and will see what needs fixing. For better or worse, I hacked the glibc fail list and installed 2.13-13 with multiarch support. This was a bit hairy because I didn't realize some of the dependencies. I should have read the debian transition wiki... In any case, I have installed and rebuilt a bunch of stuff including pam. I have my fingers crossed that the main cache bug is fixed on my rp3440. There are still issues and a lot of stuff to do to complete the transition. nscd is still broken: http://www.spinics.net/lists/linux-parisc/msg02195.html I probably shouldn't have installed it. Problem is fixed by disabling passwd and group caching in conf file. This is probably a glibc timer issue as problem goes away when commands are repeated. Multiarch support is a pain in the neck. The debian patches to binutils and GCC haven't been sent upstream, so normal GCC testing is broken (includes and libraries are now in target specific directories). I think the same is also true for glibc. Other packages also need to be aware of the change. Probably, the library universe needs completely rebuilding. Currently, I'm stuck trying to build the attr package. There seems to be a glibc issue with main which causes the build to fail. However, it might be a libtool issue (it probably needs to be multiarch aware). So, this is where we have to go to support PA-RISC debian linux. Dave -- John David Anglin dave.anglin@bell.net -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h index 67a33cc..94f2f4d 100644 --- a/arch/parisc/include/asm/futex.h +++ b/arch/parisc/include/asm/futex.h @@ -5,11 +5,14 @@ #include <linux/futex.h> #include <linux/uaccess.h> +#include <asm/atomic.h> #include <asm/errno.h> static inline int futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr) { + unsigned long int flags; + u32 val; int op = (encoded_op >> 28) & 7; int cmp = (encoded_op >> 24) & 15; int oparg = (encoded_op << 8) >> 20; @@ -23,16 +26,53 @@ futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr) pagefault_disable(); + _atomic_spin_lock_irqsave (uaddr, flags); + switch (op) { case FUTEX_OP_SET: + /* *(int *)UADDR2 = OPARG; */ + ret = get_user (oldval, uaddr); + if (!ret) + ret = put_user (oparg, uaddr); + break; case FUTEX_OP_ADD: + /* *(int *)UADDR2 += OPARG; */ + ret = get_user (oldval, uaddr); + if (!ret) { + val = oldval + oparg; + ret = put_user (val, uaddr); + } + break; case FUTEX_OP_OR: + /* *(int *)UADDR2 |= OPARG; */ + ret = get_user (oldval, uaddr); + if (!ret) { + val = oldval | oparg; + ret = put_user (val, uaddr); + } + break; case FUTEX_OP_ANDN: + /* *(int *)UADDR2 &= ~OPARG; */ + ret = get_user (oldval, uaddr); + if (!ret) { + val = oldval & ~oparg; + ret = put_user (val, uaddr); + } + break; case FUTEX_OP_XOR: + /* *(int *)UADDR2 ^= OPARG; */ + ret = get_user (oldval, uaddr); + if (!ret) { + val = oldval ^ oparg; + ret = put_user (val, uaddr); + } + break; default: ret = -ENOSYS; } + _atomic_spin_unlock_irqrestore (uaddr, flags); + pagefault_enable(); if (!ret) { @@ -54,7 +94,9 @@ static inline int futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, u32 oldval, u32 newval) { + int ret; u32 val; + unsigned long flags; /* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is * our gateway page, and causes no end of trouble... @@ -65,12 +107,24 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) return -EFAULT; - if (get_user(val, uaddr)) - return -EFAULT; - if (val == oldval && put_user(newval, uaddr)) - return -EFAULT; + /* HPPA has no cmpxchg in hardware and therefore the + * best we can do here is use an array of locks. The + * lock selected is based on a hash of the userspace + * address. This should scale to a couple of CPUs. + */ + + _atomic_spin_lock_irqsave (uaddr, flags); + + ret = get_user(val, uaddr); + + if (!ret && val == oldval) + ret = put_user (newval, uaddr); + *uval = val; - return 0; + + _atomic_spin_unlock_irqrestore (uaddr, flags); + + return ret; } #endif /*__KERNEL__*/
Implements futex op support and makes futex cmpxchg atomic. Tested on 64-bit SMP kernel running on 2 x PA8700s. Signed-off-by: Carlos O'Donell <carlos@systemhalted.org> --- futex.h | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 59 insertions(+), 5 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html