diff mbox series

[09/16] sparc64: use the generic get_user_pages_fast code

Message ID 20190625143715.1689-10-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series [01/16] mm: use untagged_addr() for get_user_pages_fast addresses | expand

Commit Message

Christoph Hellwig June 25, 2019, 2:37 p.m. UTC
The sparc64 code is mostly equivalent to the generic one, minus various
bugfixes and two arch overrides that this patch adds to pgtable.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 arch/sparc/Kconfig                  |   1 +
 arch/sparc/include/asm/pgtable_64.h |  18 ++
 arch/sparc/mm/Makefile              |   2 +-
 arch/sparc/mm/gup.c                 | 340 ----------------------------
 4 files changed, 20 insertions(+), 341 deletions(-)
 delete mode 100644 arch/sparc/mm/gup.c

Comments

Dmitry V. Levin July 17, 2019, 9:59 p.m. UTC | #1
Hi,

On Tue, Jun 25, 2019 at 04:37:08PM +0200, Christoph Hellwig wrote:
> The sparc64 code is mostly equivalent to the generic one, minus various
> bugfixes and two arch overrides that this patch adds to pgtable.h.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
> ---
>  arch/sparc/Kconfig                  |   1 +
>  arch/sparc/include/asm/pgtable_64.h |  18 ++
>  arch/sparc/mm/Makefile              |   2 +-
>  arch/sparc/mm/gup.c                 | 340 ----------------------------
>  4 files changed, 20 insertions(+), 341 deletions(-)
>  delete mode 100644 arch/sparc/mm/gup.c

So this ended up as commit 7b9afb86b6328f10dc2cad9223d7def12d60e505
(thanks to Anatoly for bisecting) and introduced a regression: 
futex.test from the strace test suite now causes an Oops on sparc64
in futex syscall.

Here is a heavily stripped down reproducer:

// SPDX-License-Identifier: GPL-2.0-or-later
#include <err.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <asm/unistd.h>
int main(void)
{
	size_t page_size = sysconf(_SC_PAGESIZE);
	size_t alloc_size = 3 * page_size;
	void *p = mmap(NULL, alloc_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (MAP_FAILED == p)
		err(EXIT_FAILURE, "mmap(%zu)", alloc_size);
	void *hole = p + page_size;
	if (munmap(hole, page_size))
		err(EXIT_FAILURE, "munmap(%p, %zu)", hole, page_size);
	syscall(__NR_futex, (unsigned long) hole, 0L, 0L, 0L, 0L, 0L);
	return 0;
}
Linus Torvalds July 17, 2019, 10:04 p.m. UTC | #2
On Wed, Jul 17, 2019 at 2:59 PM Dmitry V. Levin <ldv@altlinux.org> wrote:
>
> So this ended up as commit 7b9afb86b6328f10dc2cad9223d7def12d60e505
> (thanks to Anatoly for bisecting) and introduced a regression:
> futex.test from the strace test suite now causes an Oops on sparc64
> in futex syscall.

Can you post the oops here in the same thread too? Maybe it's already
posted somewhere else, but I can't seem to find anything likely on
lkml at least..

On x86-64, it obviously just causes the (expected) EFAULT error from
the futex call.

Somebody with access to sparc64 probably needs to debug this, but
having the exact oops wouldn't hurt...

             Linus
Dmitry V. Levin July 17, 2019, 11:30 p.m. UTC | #3
On Wed, Jul 17, 2019 at 03:04:56PM -0700, Linus Torvalds wrote:
> On Wed, Jul 17, 2019 at 2:59 PM Dmitry V. Levin <ldv@altlinux.org> wrote:
> >
> > So this ended up as commit 7b9afb86b6328f10dc2cad9223d7def12d60e505
> > (thanks to Anatoly for bisecting) and introduced a regression:
> > futex.test from the strace test suite now causes an Oops on sparc64
> > in futex syscall.
> 
> Can you post the oops here in the same thread too? Maybe it's already
> posted somewhere else, but I can't seem to find anything likely on
> lkml at least..

Sure, here it is:

[  514.137217] Unable to handle kernel paging request at virtual address 00060000541d0000
[  514.137295] tsk->{mm,active_mm}->context = 00000000000005b2
[  514.137343] tsk->{mm,active_mm}->pgd = fff80024955a2000
[  514.137387]               \|/ ____ \|/
                             "@'/ .. \`@"
                             /_| \__/ |_\
                                \__U_/
[  514.137493] futex(1599): Oops [#1]
[  514.137533] CPU: 26 PID: 1599 Comm: futex Not tainted 5.2.0-05721-gd3649f68b433 #1096
[  514.137595] TSTATE: 0000000011001603 TPC: 000000000051adc4 TNPC: 000000000051adc8 Y: 00000000    Not tainted
[  514.137678] TPC: <get_futex_key+0xe4/0x6a0>
[  514.137712] g0: 0000000000000000 g1: 0000000000e75178 g2: 000000000009a21d g3: 0000000000000000
[  514.137769] g4: fff8002474fbc0e0 g5: fff80024aa80c000 g6: fff8002495aec000 g7: 0000000000000200
[  514.137825] o0: 0000000000000001 o1: 0000000000000001 o2: 0000000000000000 o3: fff8002495aefa28
[  514.137882] o4: fff8000100030000 o5: fff800010002e000 sp: fff8002495aef161 ret_pc: 000000000051ada4
[  514.137944] RPC: <get_futex_key+0xc4/0x6a0>
[  514.137978] l0: 000000000051b144 l1: 0000000000000001 l2: 0000000000c01950 l3: fff80024626051c0
[  514.138036] l4: 0000000000c01970 l5: 0000000000cf6800 l6: 00060000541d13f0 l7: 00000000014b3000
[  514.138094] i0: 0000000000000001 i1: 000000000051af30 i2: fff8002495aefc28 i3: 0000000000000001
[  514.138152] i4: 0000000000cf69b0 i5: fff800010002e000 i6: fff8002495aef231 i7: 000000000051b3a8
[  514.138211] I7: <futex_wait_setup+0x28/0x120>
[  514.138245] Call Trace:
[  514.138271]  [000000000051b3a8] futex_wait_setup+0x28/0x120
[  514.138313]  [000000000051b550] futex_wait+0xb0/0x200
[  514.138352]  [000000000051d734] do_futex+0xd4/0xc00
[  514.138390]  [000000000051e384] sys_futex+0x124/0x140
[  514.138435]  [0000000000406294] linux_sparc_syscall+0x34/0x44
[  514.138478] Disabling lock debugging due to kernel taint
[  514.138501] Caller[000000000051b3a8]: futex_wait_setup+0x28/0x120
[  514.138524] Caller[000000000051b550]: futex_wait+0xb0/0x200
[  514.138547] Caller[000000000051d734]: do_futex+0xd4/0xc00
[  514.138568] Caller[000000000051e384]: sys_futex+0x124/0x140
[  514.138590] Caller[0000000000406294]: linux_sparc_syscall+0x34/0x44
[  514.138614] Caller[0000010000000e90]: 0x10000000e90
[  514.138633] Instruction DUMP:
[  514.138635]  0640016e 
[  514.138650]  b13da000 
[  514.138663]  ec5fa7f7 
[  514.138676] <c25da008>
[  514.138689]  ae100016 
[  514.138702]  84086001 
[  514.138714]  82007fff 
[  514.138727]  af789401 
[  514.138740]  f05de018
Linus Torvalds July 18, 2019, 12:17 a.m. UTC | #4
On Wed, Jul 17, 2019 at 4:30 PM Dmitry V. Levin <ldv@altlinux.org> wrote:
>
> Sure, here it is:

Hmm. I'm not seeing anything obviously wrong in the generic gup conversion.

From the oops, I assume that the problem is that get_user_pages_fast()
returned an invalid page, causing the bad access later in
get_futex_key(). But that's odd too, considering that
get_user_pages_fast() had already accessed the page (both for looking
up the head, and for then doing things like SetPageReferenced(page)).

The only half-way subtle thing is the pte_access_permitted() movement,
but it looks like it matches what gup_pte_range() did in the original
sparc64 code. And the address masking is done the same way too, as far
as I can tell.

So clearly there's something wrong there, but I'm not seeing it. Maybe
I'm incorrectly looking at that pte case, and the problem happened
earlier.

Anyway, I suspect some sparc64 person needs to delve into it.

I know this got reviewed by sparc64 people (the final commit message
only has a single Reviewed-by, but I see an Ack by Davem in my maill
that seems to have gotten lost by the time the patch made it in), but
maybe actually nobody ever _tested_ it until it hit my tree?

                   Linus
David Miller July 18, 2019, 1:21 a.m. UTC | #5
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 17 Jul 2019 17:17:16 -0700

> Anyway, I suspect some sparc64 person needs to delve into it.

I'll take a look at it soon if someone doesn't figure it out before
me.
David Miller July 18, 2019, 9:14 p.m. UTC | #6
From: "Dmitry V. Levin" <ldv@altlinux.org>
Date: Thu, 18 Jul 2019 00:59:56 +0300

> So this ended up as commit 7b9afb86b6328f10dc2cad9223d7def12d60e505
> (thanks to Anatoly for bisecting) and introduced a regression: 
> futex.test from the strace test suite now causes an Oops on sparc64
> in futex syscall.
> 
> Here is a heavily stripped down reproducer:

Does not reproduce for me on a T4-2 machine.

So this problem might depend on the type of system you are on,
I suspect it's one of those "pre-Niagara vs. Niagara and later"
situations because that's the dividing line between two set of
wildly different TLB and cache management methods.

What kind of machine are you on?
David Miller July 18, 2019, 9:43 p.m. UTC | #7
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 17 Jul 2019 17:17:16 -0700

> From the oops, I assume that the problem is that get_user_pages_fast()
> returned an invalid page, causing the bad access later in
> get_futex_key().

That's correct.  It's the first deref of page that oops's.


> But that's odd too, considering that get_user_pages_fast() had
> already accessed the page (both for looking up the head, and for
> then doing things like SetPageReferenced(page)).

Even the huge page cases all do that dereference as well, so it is
indeed a mystery how the pointer works inside of get_user_pages_fast()
but becomes garbage in the caller.

This page pointer sits on the stack, so maybe something stores garbage
there meanwhile.  Maybe the issue is even compiler dependent.

I'll keep looking over the changes made here for clues.
Christoph Hellwig July 19, 2019, 6 a.m. UTC | #8
On Thu, Jul 18, 2019 at 02:14:05PM -0700, David Miller wrote:
> From: "Dmitry V. Levin" <ldv@altlinux.org>
> Date: Thu, 18 Jul 2019 00:59:56 +0300
> 
> > So this ended up as commit 7b9afb86b6328f10dc2cad9223d7def12d60e505
> > (thanks to Anatoly for bisecting) and introduced a regression: 
> > futex.test from the strace test suite now causes an Oops on sparc64
> > in futex syscall.
> > 
> > Here is a heavily stripped down reproducer:
> 
> Does not reproduce for me on a T4-2 machine.
> 
> So this problem might depend on the type of system you are on,
> I suspect it's one of those "pre-Niagara vs. Niagara and later"
> situations because that's the dividing line between two set of
> wildly different TLB and cache management methods.
> 
> What kind of machine are you on?

FYI, I'm pretty sure I tested this on Guenthers build test qemu
setup in the end, which further speaks for the different machines
issue.
Anatoly Pugachev July 24, 2019, 7:32 p.m. UTC | #9
On Fri, Jul 19, 2019 at 12:14 AM David Miller <davem@davemloft.net> wrote:
> > So this ended up as commit 7b9afb86b6328f10dc2cad9223d7def12d60e505
> > (thanks to Anatoly for bisecting) and introduced a regression:
> > futex.test from the strace test suite now causes an Oops on sparc64
> > in futex syscall.
> >
> > Here is a heavily stripped down reproducer:
>
> Does not reproduce for me on a T4-2 machine.
>
> So this problem might depend on the type of system you are on,
> I suspect it's one of those "pre-Niagara vs. Niagara and later"
> situations because that's the dividing line between two set of
> wildly different TLB and cache management methods.
>
> What kind of machine are you on?

David,

the first test where it was discovered was done on my test LDOM named
ttip, hardware (hypervisor) is T5-2 server, running under Solaris 11.4
OS.
ttip LDOM is debian sparc64 unstable , so with almost all the latest
software (gcc 8.3.0, binutils 2.32.51.20190707-1, debian GLIBC
2.28-10, etc..)

For another test, i also installed LDOM with oracle sparc linux
https://oss.oracle.com/projects/linux-sparc/ , but I've to install a
more fresh version of gcc on it first, since system installed gcc 4.4
is too old for a git kernel (linux-2.6/Documentation/Changes lists gcc
4.6 as a minimal version), so I choose to install gcc-7.4.0 to /opt/
(leaving system installed gcc 4.4 under /usr/bin). Compiled and
installed git kernel version, i.e. last tag 5.3.0-rc1 and ran the
test. Kernel still produced oops.
David Miller July 24, 2019, 8:13 p.m. UTC | #10
From: Anatoly Pugachev <matorola@gmail.com>
Date: Wed, 24 Jul 2019 22:32:17 +0300

> the first test where it was discovered was done on my test LDOM named
> ttip, hardware (hypervisor) is T5-2 server, running under Solaris 11.4
> OS.
> ttip LDOM is debian sparc64 unstable , so with almost all the latest
> software (gcc 8.3.0, binutils 2.32.51.20190707-1, debian GLIBC
> 2.28-10, etc..)
> 
> For another test, i also installed LDOM with oracle sparc linux
> https://oss.oracle.com/projects/linux-sparc/ , but I've to install a
> more fresh version of gcc on it first, since system installed gcc 4.4
> is too old for a git kernel (linux-2.6/Documentation/Changes lists gcc
> 4.6 as a minimal version), so I choose to install gcc-7.4.0 to /opt/
> (leaving system installed gcc 4.4 under /usr/bin). Compiled and
> installed git kernel version, i.e. last tag 5.3.0-rc1 and ran the
> test. Kernel still produced oops.

I suspect, therefore, that we have a miscompile.

Please put your unstripped vmlinux image somewhere so I can take a closer
look.

Thank you.
Anatoly Pugachev July 25, 2019, 6:33 p.m. UTC | #11
On Wed, Jul 24, 2019 at 11:13 PM David Miller <davem@davemloft.net> wrote:
>
> From: Anatoly Pugachev <matorola@gmail.com>
> Date: Wed, 24 Jul 2019 22:32:17 +0300
>
> > the first test where it was discovered was done on my test LDOM named
> > ttip, hardware (hypervisor) is T5-2 server, running under Solaris 11.4
> > OS.
> > ttip LDOM is debian sparc64 unstable , so with almost all the latest
> > software (gcc 8.3.0, binutils 2.32.51.20190707-1, debian GLIBC
> > 2.28-10, etc..)
> >
> > For another test, i also installed LDOM with oracle sparc linux
> > https://oss.oracle.com/projects/linux-sparc/ , but I've to install a
> > more fresh version of gcc on it first, since system installed gcc 4.4
> > is too old for a git kernel (linux-2.6/Documentation/Changes lists gcc
> > 4.6 as a minimal version), so I choose to install gcc-7.4.0 to /opt/
> > (leaving system installed gcc 4.4 under /usr/bin). Compiled and
> > installed git kernel version, i.e. last tag 5.3.0-rc1 and ran the
> > test. Kernel still produced oops.
>
> I suspect, therefore, that we have a miscompile.
>
> Please put your unstripped vmlinux image somewhere so I can take a closer
> look.

David,

http://u164.east.ru/kernel/

there's vmlinuz-5.3.0-rc1 kernel and archive 5.3.0-rc1-modules.tar.gz
of /lib/modules/5.3.0-rc1/
this is from oracle sparclinux LDOM , compiled with 7.4.0 gcc

Thank you.
David Miller July 25, 2019, 10:52 p.m. UTC | #12
From: Anatoly Pugachev <matorola@gmail.com>
Date: Thu, 25 Jul 2019 21:33:24 +0300

> http://u164.east.ru/kernel/
> 
> there's vmlinuz-5.3.0-rc1 kernel and archive 5.3.0-rc1-modules.tar.gz
> of /lib/modules/5.3.0-rc1/
> this is from oracle sparclinux LDOM , compiled with 7.4.0 gcc

Thank you, I'll take a look.
Khalid Aziz July 26, 2019, 5:58 p.m. UTC | #13
On 7/17/19 3:59 PM, Dmitry V. Levin wrote:
> Hi,
> 
> On Tue, Jun 25, 2019 at 04:37:08PM +0200, Christoph Hellwig wrote:
>> The sparc64 code is mostly equivalent to the generic one, minus various
>> bugfixes and two arch overrides that this patch adds to pgtable.h.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
>> ---
>>  arch/sparc/Kconfig                  |   1 +
>>  arch/sparc/include/asm/pgtable_64.h |  18 ++
>>  arch/sparc/mm/Makefile              |   2 +-
>>  arch/sparc/mm/gup.c                 | 340 ----------------------------
>>  4 files changed, 20 insertions(+), 341 deletions(-)
>>  delete mode 100644 arch/sparc/mm/gup.c
> 
> So this ended up as commit 7b9afb86b6328f10dc2cad9223d7def12d60e505
> (thanks to Anatoly for bisecting) and introduced a regression: 
> futex.test from the strace test suite now causes an Oops on sparc64
> in futex syscall
> 

I have been working on reproducing this problem but ran into a different
problem. I found 5.1 and newer kernels no longer boot on an S7 server or
in an ldom on a T7 server (kernel hangs after "crc32c_sparc64: Using
sparc64 crc32c opcode optimized CRC32C implementation" on console). A
long git bisect session between 5.0 and 5.1 pointed to commit
73a66023c937 ("sparc64: fix sparc_ipc type conversion") but that makes
no sense. I will keep working on finding root cause. I wonder if
Anatoly's git bisect result is also suspect.

--
Khalid
David Miller July 28, 2019, 2:09 a.m. UTC | #14
From: Anatoly Pugachev <matorola@gmail.com>
Date: Thu, 25 Jul 2019 21:33:24 +0300

> http://u164.east.ru/kernel/
> 
> there's vmlinuz-5.3.0-rc1 kernel and archive 5.3.0-rc1-modules.tar.gz
> of /lib/modules/5.3.0-rc1/
> this is from oracle sparclinux LDOM , compiled with 7.4.0 gcc

Please, I really really need the unstripped kernel image with all the
symbols.  This vmlinuz file is stripped already.  The System.map does
not serve as a replacement.

Thank you.
Anatoly Pugachev July 28, 2019, 8 p.m. UTC | #15
On Sun, Jul 28, 2019 at 5:09 AM David Miller <davem@davemloft.net> wrote:
> From: Anatoly Pugachev <matorola@gmail.com>
> Date: Thu, 25 Jul 2019 21:33:24 +0300
> > there's vmlinuz-5.3.0-rc1 kernel and archive 5.3.0-rc1-modules.tar.gz
> > of /lib/modules/5.3.0-rc1/
> > this is from oracle sparclinux LDOM , compiled with 7.4.0 gcc
>
> Please, I really really need the unstripped kernel image with all the
> symbols.  This vmlinuz file is stripped already.  The System.map does
> not serve as a replacement.

David,

http://u164.east.ru/kernel2/

I'm sorry missed debug kernel first. Enabled CONFIG_DEBUG_INFO=y
Anatoly Pugachev Aug. 9, 2019, 7:59 p.m. UTC | #16
On Thu, Jul 18, 2019 at 12:59 AM Dmitry V. Levin <ldv@altlinux.org> wrote:
> On Tue, Jun 25, 2019 at 04:37:08PM +0200, Christoph Hellwig wrote:
> > The sparc64 code is mostly equivalent to the generic one, minus various
> > bugfixes and two arch overrides that this patch adds to pgtable.h.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
> > ---
> >  arch/sparc/Kconfig                  |   1 +
> >  arch/sparc/include/asm/pgtable_64.h |  18 ++
> >  arch/sparc/mm/Makefile              |   2 +-
> >  arch/sparc/mm/gup.c                 | 340 ----------------------------
> >  4 files changed, 20 insertions(+), 341 deletions(-)
> >  delete mode 100644 arch/sparc/mm/gup.c
>
> So this ended up as commit 7b9afb86b6328f10dc2cad9223d7def12d60e505

I've tried to revert this commit on a current master branch , but i'm getting :

linux-2.6$ git show 7b9afb86b632 > /tmp/gup.patch
linux-2.6$ patch -p1 -R < /tmp/gup.patch
...
linux-2.6$ make -j && make -j modules
...
  CALL    scripts/atomic/check-atomics.sh
  CALL    scripts/checksyscalls.sh
<stdin>:1511:2: warning: #warning syscall clone3 not implemented [-Wcpp]
  CHK     include/generated/compile.h
  CHK     include/generated/autoksyms.h
  GEN     .version
  CHK     include/generated/compile.h
  UPD     include/generated/compile.h
  CC      init/version.o
  AR      init/built-in.a
  LD      vmlinux.o
ld: mm/gup.o: in function `__get_user_pages_fast':
gup.c:(.text+0x1bc0): multiple definition of `__get_user_pages_fast';
arch/sparc/mm/gup.o:gup.c:(.text+0x620): first defined here
ld: mm/gup.o: in function `get_user_pages_fast':
gup.c:(.text+0x1be0): multiple definition of `get_user_pages_fast';
arch/sparc/mm/gup.o:gup.c:(.text+0x740): first defined here
make: *** [Makefile:1060: vmlinux] Error 1

Can someone help me to revert this commit? Is it even possible? Since
it's not only futex strace calls getting killed and producing OOPS,
even util-linux.git 'make check' hangs machine/LDOM with multiple OOPS
in logs, while previous (before this commit) kernel passes tests ok
(and without kernel OOPS). I've already tried to compile current
master with gcc-6, gcc-7, gcc-8 debian versions, but all produce same
OOPS.

Thanks.
Christoph Hellwig Aug. 10, 2019, 7:17 a.m. UTC | #17
There isn't really a way to use an arch-specific get_user_pages_fast
in mainline, you'd need to revert the whole series.  As a relatively
quick workaround you can just remove the

        select HAVE_FAST_GUP if SPARC64

line from arch/sparc/Kconfig
Mikael Pettersson Aug. 10, 2019, 7:36 p.m. UTC | #18
For the record the futex test case OOPSes a 5.3-rc3 kernel running on
a Sun Blade 2500 (2 x USIIIi).  This system runs a custom distro with
a custom toolchain (gcc-8.3 based), so I doubt it's a distro problem.

On Sat, Aug 10, 2019 at 9:17 AM Christoph Hellwig <hch@lst.de> wrote:
>
> There isn't really a way to use an arch-specific get_user_pages_fast
> in mainline, you'd need to revert the whole series.  As a relatively
> quick workaround you can just remove the
>
>         select HAVE_FAST_GUP if SPARC64
>
> line from arch/sparc/Kconfig
Anatoly Pugachev Aug. 11, 2019, 8:30 p.m. UTC | #19
On Sat, Aug 10, 2019 at 10:36 PM Mikael Pettersson <mikpelinux@gmail.com> wrote:
> For the record the futex test case OOPSes a 5.3-rc3 kernel running on
> a Sun Blade 2500 (2 x USIIIi).  This system runs a custom distro with
> a custom toolchain (gcc-8.3 based), so I doubt it's a distro problem.

Mikael, Khalid,

can you please test util-linux source code with 'make check' on
current git kernel and post the results?
https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git

Thanks.

As with my test machine/LDOM, util-linux 'make check' hangs git kernel
with the OOPS in the end of this message.

PS: And I was able to revert patch so current kernel git master branch
works again, futex strace test works as before (not being killed and
does not produce kernel OOPS), as well util-linux 'make check' does
not kills kernel. If anyone interested I can post the patch, but I'm
not sure it's a right thing to do, if all other architectures were
converted to use generic GUP code (mm/gup.c).


[   47.600488] BUG: Bad rss-counter state mm:00000000ae46ef00 idx:0 val:-17
[   47.600645] BUG: Bad rss-counter state mm:00000000ae46ef00 idx:1 val:102
[   47.673090] fdisk[4270]: segfault at 20 ip fff8000100007ed8 (rpc
fff8000100007e30) sp 000007feffe79661 error 1 in
ld-2.28.so[fff8000100000000+22000]
[   47.674415] BUG: Bad rss-counter state mm:00000000ca65883c idx:0 val:17
[   47.674722] BUG: Bad rss-counter state mm:00000000ca65883c idx:1 val:1
[   47.785453] ------------[ cut here ]------------
[   47.785722] WARNING: CPU: 17 PID: 96 at mm/slab.h:410
kmem_cache_free+0xb4/0x300
[   47.785880] virt_to_cache: Object is not a Slab page!
[   47.786003] Modules linked in: tun ip_set_hash_net ip_set nf_tables
nfnetlink binfmt_misc camellia_sparc64 des_sparc64 des_generic
aes_sparc64 md5_sparc64 sha512_sparc64 sha256_sparc64 n2_rng rng_core
flash sha1_sparc64 ip_tables x_tables ipv6 nf_defrag_ipv6 autofs4 ext4
crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy
async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear
md_mod crc32c_sparc64
[   47.787041] CPU: 17 PID: 96 Comm: ksoftirqd/17 Not tainted 5.3.0-rc3 #1143
[   47.787181] Call Trace:
[   47.787268]  [0000000000464540] __warn+0xc0/0x100
[   47.787384]  [00000000004645b4] warn_slowpath_fmt+0x34/0x60
[   47.787512]  [00000000006758f4] kmem_cache_free+0xb4/0x300
[   47.787648]  [0000000000451b68] pgtable_free+0x28/0x40
[   47.787779]  [00000000006470fc] tlb_remove_table_rcu+0x3c/0x80
[   47.787928]  [00000000004fae94] rcu_core+0xbd4/0x1000
[   47.788048]  [00000000004fb7ac] rcu_core_si+0xc/0x20
[   47.788177]  [0000000000abc648] __do_softirq+0x288/0x500
[   47.788296]  [000000000046beb0] run_ksoftirqd+0x30/0x80
[   47.788433]  [0000000000493c64] smpboot_thread_fn+0x244/0x280
[   47.788575]  [000000000048ef50] kthread+0x110/0x140
[   47.788707]  [00000000004060e4] ret_from_fork+0x1c/0x2c
[   47.788835]  [0000000000000000] 0x0
[   47.788927] irq event stamp: 19420
[   47.789028] hardirqs last  enabled at (19428): [<00000000004e2910>]
console_unlock+0x630/0x6c0
[   47.789230] hardirqs last disabled at (19435): [<00000000004e23dc>]
console_unlock+0xfc/0x6c0
[   47.789420] softirqs last  enabled at (19254): [<0000000000abc854>]
__do_softirq+0x494/0x500
[   47.789612] softirqs last disabled at (19259): [<000000000046beb0>]
run_ksoftirqd+0x30/0x80
[   47.789795] ---[ end trace afb11a4826780c48 ]---
[   47.925975] Unable to handle kernel paging request at virtual
address 0006120000000000
[   47.926088] tsk->{mm,active_mm}->context = 0000000000001b68
[   47.926150] tsk->{mm,active_mm}->pgd = fff8002438f90000
[   47.926202]               \|/ ____ \|/
[   47.926202]               "@'/ .. \`@"
[   47.926202]               /_| \__/ |_\
[   47.926202]                  \__U_/
[   47.926311] kworker/25:2(653): Oops [#1]
[   47.926354] CPU: 25 PID: 653 Comm: kworker/25:2 Tainted: G        W
        5.3.0-rc3 #1143
[   47.926433] Workqueue: xfs-conv/dm-0 xfs_end_io
[   47.926479] TSTATE: 0000000080001605 TPC: 000000000067588c TNPC:
0000000000675890 Y: 00000000    Tainted: G        W
[   47.926570] TPC: <kmem_cache_free+0x4c/0x300>
[   47.926611] g0: 0000000000675668 g1: 0006120000000000 g2:
0000004000000000 g3: 0006000000000000
[   47.926682] g4: fff80024938c8e40 g5: fff80024a83bc000 g6:
fff8002490254000 g7: 0000000000000102
[   47.926751] o0: 0000000000000000 o1: 0000000000d02c30 o2:
fff80024938c96b8 o3: 000000000000c000
[   47.926821] o4: 00000000014c3000 o5: 0000000000000000 sp:
fff80024ad577121 ret_pc: 00000000004d5148
[   47.926898] RPC: <lock_is_held_type+0x68/0xe0>
[   47.926940] l0: 0000000000000000 l1: 0000000000000000 l2:
00000000f0000000 l3: 0000000000000080
[   47.927010] l4: 0000000000d953c0 l5: 0000000000d953c0 l6:
0000000000000002 l7: 000000000000000b
[   47.927080] i0: fff800003040b1e0 i1: 0000000000000000 i2:
fff80024938c96b8 i3: fff80024938c8e40
[   47.927149] i4: 0000000000000004 i5: fff80024938c9730 i6:
fff80024ad5771d1 i7: 00000000006409f4
[   47.927223] I7: <ptlock_free+0x14/0x40>
[   47.927261] Call Trace:
[   47.927291]  [00000000006409f4] ptlock_free+0x14/0x40
[   47.927342]  [0000000000450a54] __pte_free+0x34/0x80
[   47.927388]  [0000000000451b54] pgtable_free+0x14/0x40
[   47.927436]  [00000000006470fc] tlb_remove_table_rcu+0x3c/0x80
[   47.927497]  [00000000004fae94] rcu_core+0xbd4/0x1000
[   47.927543]  [00000000004fb7ac] rcu_core_si+0xc/0x20
[   47.927593]  [0000000000abc648] __do_softirq+0x288/0x500
[   47.927644]  [000000000042d054] do_softirq_own_stack+0x34/0x60
[   47.927697]  [000000000046c1c8] irq_exit+0x68/0xe0
[   47.927742]  [0000000000abc1b8] timer_interrupt+0x98/0xc0
[   47.927791]  [0000000000427490] sys_call_table+0x780/0x970
[   47.927845]  [0000000000609ba8] test_clear_page_writeback+0x2c8/0x300
[   47.927900]  [00000000005f9d18] end_page_writeback+0x58/0xa0
[   47.927951]  [00000000007b83f8] xfs_destroy_ioend+0xf8/0x240
[   47.928002]  [00000000007b86a4] xfs_end_ioend+0x164/0x1e0
[   47.928050]  [00000000007b9550] xfs_end_io+0x90/0xc0
[   47.928095] Disabling lock debugging due to kernel taint
[   47.928118] Caller[00000000006409f4]: ptlock_free+0x14/0x40
[   47.928140] Caller[0000000000450a54]: __pte_free+0x34/0x80
[   47.928162] Caller[0000000000451b54]: pgtable_free+0x14/0x40
[   47.928184] Caller[00000000006470fc]: tlb_remove_table_rcu+0x3c/0x80
[   47.928208] Caller[00000000004fae94]: rcu_core+0xbd4/0x1000
[   47.928230] Caller[00000000004fb7ac]: rcu_core_si+0xc/0x20
[   47.928252] Caller[0000000000abc648]: __do_softirq+0x288/0x500
[   47.928278] Caller[000000000042d054]: do_softirq_own_stack+0x34/0x60
[   47.928306] Caller[000000000046c1c8]: irq_exit+0x68/0xe0
[   47.928330] Caller[0000000000abc1b8]: timer_interrupt+0x98/0xc0
[   47.928357] Caller[0000000000427490]: sys_call_table+0x780/0x970
[   47.928384] Caller[0000000000609b9c]: test_clear_page_writeback+0x2bc/0x300
[   47.928412] Caller[00000000005f9d18]: end_page_writeback+0x58/0xa0
[   47.928736] Caller[00000000007b83f8]: xfs_destroy_ioend+0xf8/0x240
[   47.928770] Caller[00000000007b86a4]: xfs_end_ioend+0x164/0x1e0
[   47.928798] Caller[00000000007b9550]: xfs_end_io+0x90/0xc0
[   47.928829] Caller[0000000000486ea4]: process_one_work+0x3e4/0x720
[   47.928858] Caller[00000000004874b8]: worker_thread+0x2d8/0x5a0
[   47.928888] Caller[000000000048ef50]: kthread+0x110/0x140
[   47.928922] Caller[00000000004060e4]: ret_from_fork+0x1c/0x2c
[   47.928952] Caller[0000000000000000]: 0x0
[   47.928973] Instruction DUMP:
[   47.928976]  82004002
[   47.928995]  83287003
[   47.929013]  82004003
[   47.929031] <c4586008>
[   47.929048]  8608a001
[   47.929065]  8400bfff
[   47.929082]  8578c401
[   47.929098]  82100002
[   47.929115]  c458a008
[   47.929132]
[   47.929161] Kernel panic - not syncing: Aiee, killing interrupt handler!
[   47.933949] Press Stop-A (L1-A) from sun keyboard or send break
[   47.933949] twice on console to return to the boot prom
[   47.933995] ---[ end Kernel panic - not syncing: Aiee, killing
interrupt handler! ]---
diff mbox series

Patch

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 26ab6f5bbaaf..22435471f942 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -28,6 +28,7 @@  config SPARC
 	select RTC_DRV_M48T59
 	select RTC_SYSTOHC
 	select HAVE_ARCH_JUMP_LABEL if SPARC64
+	select HAVE_GENERIC_GUP if SPARC64
 	select GENERIC_IRQ_SHOW
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select GENERIC_PCI_IOMAP
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 1904782dcd39..547ff96fb228 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -1098,6 +1098,24 @@  static inline unsigned long untagged_addr(unsigned long start)
 }
 #define untagged_addr untagged_addr
 
+static inline bool pte_access_permitted(pte_t pte, bool write)
+{
+	u64 prot;
+
+	if (tlb_type == hypervisor) {
+		prot = _PAGE_PRESENT_4V | _PAGE_P_4V;
+		if (write)
+			prot |= _PAGE_WRITE_4V;
+	} else {
+		prot = _PAGE_PRESENT_4U | _PAGE_P_4U;
+		if (write)
+			prot |= _PAGE_WRITE_4U;
+	}
+
+	return (pte_val(pte) & (prot | _PAGE_SPECIAL)) == prot;
+}
+#define pte_access_permitted pte_access_permitted
+
 #include <asm/tlbflush.h>
 #include <asm-generic/pgtable.h>
 
diff --git a/arch/sparc/mm/Makefile b/arch/sparc/mm/Makefile
index d39075b1e3b7..b078205b70e0 100644
--- a/arch/sparc/mm/Makefile
+++ b/arch/sparc/mm/Makefile
@@ -5,7 +5,7 @@ 
 asflags-y := -ansi
 ccflags-y := -Werror
 
-obj-$(CONFIG_SPARC64)   += ultra.o tlb.o tsb.o gup.o
+obj-$(CONFIG_SPARC64)   += ultra.o tlb.o tsb.o
 obj-y                   += fault_$(BITS).o
 obj-y                   += init_$(BITS).o
 obj-$(CONFIG_SPARC32)   += extable.o srmmu.o iommu.o io-unit.o
diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
deleted file mode 100644
index 1e770a517d4a..000000000000
--- a/arch/sparc/mm/gup.c
+++ /dev/null
@@ -1,340 +0,0 @@ 
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Lockless get_user_pages_fast for sparc, cribbed from powerpc
- *
- * Copyright (C) 2008 Nick Piggin
- * Copyright (C) 2008 Novell Inc.
- */
-
-#include <linux/sched.h>
-#include <linux/mm.h>
-#include <linux/vmstat.h>
-#include <linux/pagemap.h>
-#include <linux/rwsem.h>
-#include <asm/pgtable.h>
-#include <asm/adi.h>
-
-/*
- * The performance critical leaf functions are made noinline otherwise gcc
- * inlines everything into a single function which results in too much
- * register pressure.
- */
-static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
-		unsigned long end, int write, struct page **pages, int *nr)
-{
-	unsigned long mask, result;
-	pte_t *ptep;
-
-	if (tlb_type == hypervisor) {
-		result = _PAGE_PRESENT_4V|_PAGE_P_4V;
-		if (write)
-			result |= _PAGE_WRITE_4V;
-	} else {
-		result = _PAGE_PRESENT_4U|_PAGE_P_4U;
-		if (write)
-			result |= _PAGE_WRITE_4U;
-	}
-	mask = result | _PAGE_SPECIAL;
-
-	ptep = pte_offset_kernel(&pmd, addr);
-	do {
-		struct page *page, *head;
-		pte_t pte = *ptep;
-
-		if ((pte_val(pte) & mask) != result)
-			return 0;
-		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
-
-		/* The hugepage case is simplified on sparc64 because
-		 * we encode the sub-page pfn offsets into the
-		 * hugepage PTEs.  We could optimize this in the future
-		 * use page_cache_add_speculative() for the hugepage case.
-		 */
-		page = pte_page(pte);
-		head = compound_head(page);
-		if (!page_cache_get_speculative(head))
-			return 0;
-		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-			put_page(head);
-			return 0;
-		}
-
-		pages[*nr] = page;
-		(*nr)++;
-	} while (ptep++, addr += PAGE_SIZE, addr != end);
-
-	return 1;
-}
-
-static int gup_huge_pmd(pmd_t *pmdp, pmd_t pmd, unsigned long addr,
-			unsigned long end, int write, struct page **pages,
-			int *nr)
-{
-	struct page *head, *page;
-	int refs;
-
-	if (!(pmd_val(pmd) & _PAGE_VALID))
-		return 0;
-
-	if (write && !pmd_write(pmd))
-		return 0;
-
-	refs = 0;
-	page = pmd_page(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
-	head = compound_head(page);
-	do {
-		VM_BUG_ON(compound_head(page) != head);
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
-
-	if (!page_cache_add_speculative(head, refs)) {
-		*nr -= refs;
-		return 0;
-	}
-
-	if (unlikely(pmd_val(pmd) != pmd_val(*pmdp))) {
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
-		return 0;
-	}
-
-	return 1;
-}
-
-static int gup_huge_pud(pud_t *pudp, pud_t pud, unsigned long addr,
-			unsigned long end, int write, struct page **pages,
-			int *nr)
-{
-	struct page *head, *page;
-	int refs;
-
-	if (!(pud_val(pud) & _PAGE_VALID))
-		return 0;
-
-	if (write && !pud_write(pud))
-		return 0;
-
-	refs = 0;
-	page = pud_page(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
-	head = compound_head(page);
-	do {
-		VM_BUG_ON(compound_head(page) != head);
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
-
-	if (!page_cache_add_speculative(head, refs)) {
-		*nr -= refs;
-		return 0;
-	}
-
-	if (unlikely(pud_val(pud) != pud_val(*pudp))) {
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
-		return 0;
-	}
-
-	return 1;
-}
-
-static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
-		int write, struct page **pages, int *nr)
-{
-	unsigned long next;
-	pmd_t *pmdp;
-
-	pmdp = pmd_offset(&pud, addr);
-	do {
-		pmd_t pmd = *pmdp;
-
-		next = pmd_addr_end(addr, end);
-		if (pmd_none(pmd))
-			return 0;
-		if (unlikely(pmd_large(pmd))) {
-			if (!gup_huge_pmd(pmdp, pmd, addr, next,
-					  write, pages, nr))
-				return 0;
-		} else if (!gup_pte_range(pmd, addr, next, write,
-					  pages, nr))
-			return 0;
-	} while (pmdp++, addr = next, addr != end);
-
-	return 1;
-}
-
-static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
-		int write, struct page **pages, int *nr)
-{
-	unsigned long next;
-	pud_t *pudp;
-
-	pudp = pud_offset(&pgd, addr);
-	do {
-		pud_t pud = *pudp;
-
-		next = pud_addr_end(addr, end);
-		if (pud_none(pud))
-			return 0;
-		if (unlikely(pud_large(pud))) {
-			if (!gup_huge_pud(pudp, pud, addr, next,
-					  write, pages, nr))
-				return 0;
-		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
-			return 0;
-	} while (pudp++, addr = next, addr != end);
-
-	return 1;
-}
-
-/*
- * Note a difference with get_user_pages_fast: this always returns the
- * number of pages pinned, 0 if no pages were pinned.
- */
-int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
-			  struct page **pages)
-{
-	struct mm_struct *mm = current->mm;
-	unsigned long addr, len, end;
-	unsigned long next, flags;
-	pgd_t *pgdp;
-	int nr = 0;
-
-#ifdef CONFIG_SPARC64
-	if (adi_capable()) {
-		long addr = start;
-
-		/* If userspace has passed a versioned address, kernel
-		 * will not find it in the VMAs since it does not store
-		 * the version tags in the list of VMAs. Storing version
-		 * tags in list of VMAs is impractical since they can be
-		 * changed any time from userspace without dropping into
-		 * kernel. Any address search in VMAs will be done with
-		 * non-versioned addresses. Ensure the ADI version bits
-		 * are dropped here by sign extending the last bit before
-		 * ADI bits. IOMMU does not implement version tags.
-		 */
-		addr = (addr << (long)adi_nbits()) >> (long)adi_nbits();
-		start = addr;
-	}
-#endif
-	start &= PAGE_MASK;
-	addr = start;
-	len = (unsigned long) nr_pages << PAGE_SHIFT;
-	end = start + len;
-
-	local_irq_save(flags);
-	pgdp = pgd_offset(mm, addr);
-	do {
-		pgd_t pgd = *pgdp;
-
-		next = pgd_addr_end(addr, end);
-		if (pgd_none(pgd))
-			break;
-		if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
-			break;
-	} while (pgdp++, addr = next, addr != end);
-	local_irq_restore(flags);
-
-	return nr;
-}
-
-int get_user_pages_fast(unsigned long start, int nr_pages,
-			unsigned int gup_flags, struct page **pages)
-{
-	struct mm_struct *mm = current->mm;
-	unsigned long addr, len, end;
-	unsigned long next;
-	pgd_t *pgdp;
-	int nr = 0;
-
-#ifdef CONFIG_SPARC64
-	if (adi_capable()) {
-		long addr = start;
-
-		/* If userspace has passed a versioned address, kernel
-		 * will not find it in the VMAs since it does not store
-		 * the version tags in the list of VMAs. Storing version
-		 * tags in list of VMAs is impractical since they can be
-		 * changed any time from userspace without dropping into
-		 * kernel. Any address search in VMAs will be done with
-		 * non-versioned addresses. Ensure the ADI version bits
-		 * are dropped here by sign extending the last bit before
-		 * ADI bits. IOMMU does not implements version tags,
-		 */
-		addr = (addr << (long)adi_nbits()) >> (long)adi_nbits();
-		start = addr;
-	}
-#endif
-	start &= PAGE_MASK;
-	addr = start;
-	len = (unsigned long) nr_pages << PAGE_SHIFT;
-	end = start + len;
-
-	/*
-	 * XXX: batch / limit 'nr', to avoid large irq off latency
-	 * needs some instrumenting to determine the common sizes used by
-	 * important workloads (eg. DB2), and whether limiting the batch size
-	 * will decrease performance.
-	 *
-	 * It seems like we're in the clear for the moment. Direct-IO is
-	 * the main guy that batches up lots of get_user_pages, and even
-	 * they are limited to 64-at-a-time which is not so many.
-	 */
-	/*
-	 * This doesn't prevent pagetable teardown, but does prevent
-	 * the pagetables from being freed on sparc.
-	 *
-	 * So long as we atomically load page table pointers versus teardown,
-	 * we can follow the address down to the the page and take a ref on it.
-	 */
-	local_irq_disable();
-
-	pgdp = pgd_offset(mm, addr);
-	do {
-		pgd_t pgd = *pgdp;
-
-		next = pgd_addr_end(addr, end);
-		if (pgd_none(pgd))
-			goto slow;
-		if (!gup_pud_range(pgd, addr, next, gup_flags & FOLL_WRITE,
-				   pages, &nr))
-			goto slow;
-	} while (pgdp++, addr = next, addr != end);
-
-	local_irq_enable();
-
-	VM_BUG_ON(nr != (end - start) >> PAGE_SHIFT);
-	return nr;
-
-	{
-		int ret;
-
-slow:
-		local_irq_enable();
-
-		/* Try to get the remaining pages with get_user_pages */
-		start += nr << PAGE_SHIFT;
-		pages += nr;
-
-		ret = get_user_pages_unlocked(start,
-			(end - start) >> PAGE_SHIFT, pages,
-			gup_flags);
-
-		/* Have to be a bit careful with return values */
-		if (nr > 0) {
-			if (ret < 0)
-				ret = nr;
-			else
-				ret += nr;
-		}
-
-		return ret;
-	}
-}