Message ID | 20170419114935.GE27829@leverpostej (mailing list archive) |
---|---|
State | New, archived |
On 2017/4/19 19:49, Mark Rutland wrote:
> Hi,
>
> Ard, this seems to be a nomap issue. Please see below.
>
> Xiaojun, for some reason, the first message in this thread didn't seem
> to make it to LAKML (or to me). In future could you please Cc me for
> emails regarding perf on arm/arm64?
>

Sorry, this is my negligence.

> On Wed, Apr 19, 2017 at 09:44:56AM +0530, Pratyush Anand wrote:
>> On Saturday 15 April 2017 02:18 PM, Tan Xiaojun wrote:
>>> My test server is Hisilicon D03/D05 (arm64).
>>> Kernel source code is 4.11-rc6 (up to date) and config (as an attachment in the end) is generated by defconfig.
>>> (Old version does not seem to have this problem. Linux-4.1 is fine and other versions I have not tested yet.)
>>
>> I tested with mustang(ARM64) and 4.11-rc6 and could not reproduce it.
>>
>>> When I do "perf top" and annotate a random kernel symbol (like vsnprintf or others), the system reports an OOPS below:
>>> (The probability of occurrence is very high, almost every time.)
>>>
>>> $ perf top
>>>
>>> Annotate vsnprintf ---- choose it
>>> Zoom into perf(7066) thread
>>> Zoom into the Kernel DSO
>>> Browse map details
>>> Run scripts for samples of thread [perf]
>>> Run scripts for samples of symbol [vsnprintf]
>>> Run scripts for all samples
>>> Exit
>
> Was perf built from the same v4.11-rc6 source tree, or was this an older
> perf binary?
>

No, I had used 4.10-rc7 or some other older perf binary.
At first, I found this problem in kernel-4.10-rc7 + perf-4.10-rc7, and then I tested the other kernel versions.

I am now using "git bisect", and the problem may have been introduced between v4.5 and v4.6-rc1.

I will try more and tell you the result.

> With a perf tool built from v4.11-rc6, even with CAP_SYS_RAWIO, I see perf top
> complaining that it cannot annotate the symbol due to a lack of a vmlinux
> file. I can't seem to convince it to use /proc/kcore.
>
> However, I can reproduce the issue by other means:
>
> # cat /proc/kcore > /dev/null
> [ 4544.984139] Unable to handle kernel paging request at virtual address ffff804392800000
> [ 4544.991995] pgd = ffff80096745f000
> [ 4544.995369] [ffff804392800000] *pgd=0000000000000000
> [ 4545.000297] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [ 4545.005815] Modules linked in:
> [ 4545.008843] CPU: 1 PID: 8976 Comm: cat Not tainted 4.11.0-rc6 #1
> [ 4545.014790] Hardware name: ARM Juno development board (r1) (DT)
> [ 4545.020653] task: ffff8009753fdb00 task.stack: ffff80097533c000
> [ 4545.026520] PC is at __memcpy+0x100/0x180
> [ 4545.030491] LR is at vread+0x144/0x280
> [ 4545.034202] pc : [<ffff0000083a1000>] lr : [<ffff0000081c126c>] pstate: 20000145
> [ 4545.041530] sp : ffff80097533fcb0
> [ 4545.044811] x29: ffff80097533fcb0 x28: ffff800962d24000
> [ 4545.050074] x27: 0000000000001000 x26: ffff8009753fdb00
> [ 4545.055337] x25: ffff000008200000 x24: ffff800977801380
> [ 4545.060600] x23: ffff8009753fdb00 x22: ffff800962d24000
> [ 4545.065863] x21: 0000000000001000 x20: ffff000008200000
> [ 4545.071125] x19: 0000000000001000 x18: 0000ffffefa323c0
> [ 4545.076387] x17: 0000ffffa9c87440 x16: ffff0000081fdfd0
> [ 4545.081649] x15: 0000ffffa9d01588 x14: 72a77346b2407be7
> [ 4545.086911] x13: 5299400690000000 x12: b0000001f9001a79
> [ 4545.092173] x11: 97fc098d91042260 x10: 0000000000000000
> [ 4545.097435] x9 : 0000000000000000 x8 : 9110626091260021
> [ 4545.102698] x7 : 0000000000001000 x6 : ffff800962d24000
> [ 4545.107960] x5 : ffff8009778013b0 x4 : 0000000000000000
> [ 4545.113222] x3 : 0400000000000001 x2 : 0000000000000f80
> [ 4545.118484] x1 : ffff804392800000 x0 : ffff800962d24000
> [ 4545.123745]
> [ 4545.125220] Process cat (pid: 8976, stack limit = 0xffff80097533c000)
> [ 4545.131598] Stack: (0xffff80097533fcb0 to 0xffff800975340000)
> [ 4545.137289] fca0: ffff80097533fd30 ffff000008270f64
> [ 4545.145049] fcc0: 000000000000e000 000000003956f000 ffff000008f950d0 ffff80097533feb8
> [ 4545.152809] fce0: 0000000000002000 ffff8009753fdb00 ffff800962d24000 ffff000008e8d3d8
> [ 4545.160568] fd00: 0000000000001000 ffff000008200000 0000000000001000 ffff800962d24000
> [ 4545.168327] fd20: 0000000000001000 ffff000008e884a0 ffff80097533fdb0 ffff00000826340c
> [ 4545.176086] fd40: ffff800976bf2800 fffffffffffffffb 000000003956d000 ffff80097533feb8
> [ 4545.183846] fd60: 0000000060000000 0000000000000015 0000000000000124 000000000000003f
> [ 4545.191605] fd80: ffff000008962000 ffff8009753fdb00 ffff8009753fdb00 ffff8009753fdb00
> [ 4545.199364] fda0: 0000000300000124 0000000000002000 ffff80097533fdd0 ffff0000081fb83c
> [ 4545.207123] fdc0: 0000000000010000 ffff80097514f900 ffff80097533fe50 ffff0000081fcb28
> [ 4545.214883] fde0: 0000000000010000 ffff80097514f900 0000000000000000 0000000000000000
> [ 4545.222642] fe00: ffff80097533fe30 ffff0000081fca1c ffff80097514f900 0000000000000000
> [ 4545.230401] fe20: 000000003956d000 ffff80097533feb8 ffff80097533fe50 ffff0000081fcb04
> [ 4545.238160] fe40: 0000000000010000 ffff80097514f900 ffff80097533fe80 ffff0000081fe014
> [ 4545.245919] fe60: ffff80097514f900 ffff80097514f900 000000003956d000 0000000000010000
> [ 4545.253678] fe80: 0000000000000000 ffff000008082f30 0000000000000000 0000800977146000
> [ 4545.261438] fea0: ffffffffffffffff 0000ffffa9c8745c 0000000000000124 0000000008202000
> [ 4545.269197] fec0: 0000000000000003 000000003956d000 0000000000010000 0000000000000000
> [ 4545.276956] fee0: 0000000000011011 0000000000000001 0000000000000011 0000000000000002
> [ 4545.284715] ff00: 000000000000003f 1f3c201f7372686b 00000000ffffffff 0000000000000030
> [ 4545.292474] ff20: 0000000000000038 0000000000000000 0000ffffa9bcca94 0000ffffa9d01588
> [ 4545.300233] ff40: 0000000000000000 0000ffffa9c87440 0000ffffefa323c0 0000000000010000
> [ 4545.307993] ff60: 000000000041a310 000000003956d000 0000000000000003 000000007fffe000
> [ 4545.315751] ff80: 00000000004088d0 0000000000010000 0000000000000000 0000000000000000
> [ 4545.323511] ffa0: 0000000000010000 0000ffffefa32690 0000000000404dcc 0000ffffefa32690
> [ 4545.331270] ffc0: 0000ffffa9c8745c 0000000060000000 0000000000000003 000000000000003f
> [ 4545.339029] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 4545.346786] Call trace:
> [ 4545.349207] Exception stack(0xffff80097533fae0 to 0xffff80097533fc10)
> [ 4545.355586] fae0: 0000000000001000 0001000000000000 ffff80097533fcb0 ffff0000083a1000
> [ 4545.363345] fb00: 000000003957c000 ffff80097533fc00 0000000020000145 0000000000000025
> [ 4545.371105] fb20: ffff800962d24000 ffff000008e8d3d8 0000000000001000 ffff8009753fdb00
> [ 4545.378864] fb40: 0000000000000000 0000000000000002 ffff80097533fd30 ffff000008082604
> [ 4545.386623] fb60: 0000000000001000 0001000000000000 ffff80097533fd30 ffff0000083a0a90
> [ 4545.394382] fb80: ffff800962d24000 ffff804392800000 0000000000000f80 0400000000000001
> [ 4545.402140] fba0: 0000000000000000 ffff8009778013b0 ffff800962d24000 0000000000001000
> [ 4545.409899] fbc0: 9110626091260021 0000000000000000 0000000000000000 97fc098d91042260
> [ 4545.417658] fbe0: b0000001f9001a79 5299400690000000 72a77346b2407be7 0000ffffa9d01588
> [ 4545.425416] fc00: ffff0000081fdfd0 0000ffffa9c87440
> [ 4545.430248] [<ffff0000083a1000>] __memcpy+0x100/0x180
> [ 4545.435253] [<ffff000008270f64>] read_kcore+0x21c/0x3b0
> [ 4545.440429] [<ffff00000826340c>] proc_reg_read+0x64/0x90
> [ 4545.445691] [<ffff0000081fb83c>] __vfs_read+0x1c/0x108
> [ 4545.450779] [<ffff0000081fcb28>] vfs_read+0x80/0x130
> [ 4545.455696] [<ffff0000081fe014>] SyS_read+0x44/0xa0
> [ 4545.460528] [<ffff000008082f30>] el0_svc_naked+0x24/0x28
> [ 4545.465790] Code: d503201f d503201f d503201f d503201f (a8c12027)
> [ 4545.471852] ---[ end trace 4d1897f94759f461 ]---
> [ 4545.476435] note: cat[8976] exited with preempt_count 2
>
> So the call flow is:
>
> read_kcore() // finds the address is vmalloc or module
> -> vread()
> --> aligned_vread()
>
> In aligned_vread(), we vmalloc_to_page() the address, and find a page.
> We then try to kmap_atomic() that. The generic kmap_atomic() returns the
> linear map alias of the address.
>
> However, it appears that the page is nomap'd memory, and the linear
> alias doesn't exist. Thus memcpy explodes when trying to access that
> address.
>
> I've verified that with:
>
> ----
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 0b05762..d7f48e0 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -9,6 +9,7 @@
>   */
>  
>  #include <linux/vmalloc.h>
> +#include <linux/memblock.h>
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/highmem.h>
> @@ -1978,6 +1979,8 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
>  {
>  	struct page *p;
>  	int copied = 0;
> +	phys_addr_t phys;
> +	bool nomap;
>  
>  	while (count) {
>  		unsigned long offset, length;
> @@ -2000,6 +2003,14 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
>  			 * function description)
>  			 */
>  			void *map = kmap_atomic(p);
> +
> +			phys = page_to_phys(p);
> +			nomap = !memblock_is_map_memory(phys);
> +
> +			pr_info("HARK: %s kmap'd %pa (%s memory) @ %p\n",
> +				__func__, &phys, (nomap ? "nomap" : "map"),
> +				map);
> +
>  			memcpy(buf, map + offset, length);
>  			kunmap_atomic(map);
>  		} else
> ----
>
> # cat /proc/kcore > /dev/null
>
> ... which eventually results in:
>
> [ 47.360980] HARK: aligned_vread kmap'd 0x000003e290005000 (nomap memory) @ ffff83e210005000
> [ 47.369297] Unable to handle kernel paging request at virtual address ffff83e210005000
>
> I'm not sure what we should do here.
>
> I'm not immediately sure what the nomap region is. I'm using UEFI && DT,
> so I guess it's not ACPI tables.
>
> I can try to dump more info later.
>
> Thanks,
> Mark.

Thanks a lot.
Xiaojun.

>
> .
>
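As a worked check of the "linear map alias" point above, the two addresses in the HARK line differ by a fixed offset, which suggests the alias is produced by arithmetic alone, with no test of whether the page is actually mapped. The sketch below is a userspace model only: `ASSUMED_PAGE_OFFSET`, `ASSUMED_MEMSTART` and `model_phys_to_virt()` are hypothetical names, and both constants are assumptions inferred from the log line rather than values stated anywhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions inferred from the HARK log line; not stated in the thread. */
#define ASSUMED_PAGE_OFFSET 0xffff800000000000ULL /* start of the linear map */
#define ASSUMED_MEMSTART    0x0000000080000000ULL /* physical base of RAM */

/*
 * Model of a linear-map translation: pure arithmetic on the physical
 * address. Nothing here consults the page tables, so a result is
 * produced even for memory that was never mapped (nomap).
 */
static uint64_t model_phys_to_virt(uint64_t phys)
{
	assert(phys >= ASSUMED_MEMSTART);
	return ASSUMED_PAGE_OFFSET + (phys - ASSUMED_MEMSTART);
}
```

Under those assumptions, `model_phys_to_virt(0x000003e290005000ULL)` evaluates to `0xffff83e210005000`, the faulting address reported in the second log.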
On 2017/4/20 9:38, Tan Xiaojun wrote:
> On 2017/4/19 19:49, Mark Rutland wrote:
>> Hi,
>>
>> Ard, this seems to be a nomap issue. Please see below.
>>
>> Xiaojun, for some reason, the first message in this thread didn't seem
>> to make it to LAKML (or to me). In future could you please Cc me for
>> emails regarding perf on arm/arm64?
>>
>
> Sorry, this is my negligence.
>
>> On Wed, Apr 19, 2017 at 09:44:56AM +0530, Pratyush Anand wrote:
>>> On Saturday 15 April 2017 02:18 PM, Tan Xiaojun wrote:
>>>> My test server is Hisilicon D03/D05 (arm64).
>>>> Kernel source code is 4.11-rc6 (up to date) and config (as an attachment in the end) is generated by defconfig.
>>>> (Old version does not seem to have this problem. Linux-4.1 is fine and other versions I have not tested yet.)
>>>
>>> I tested with mustang(ARM64) and 4.11-rc6 and could not reproduce it.
>>>

Hi,
Pratyush,

Sorry, could you test it again? I tested it many times and found that it is not triggered every time.
You can run "perf top -U" and try more kernel symbols to increase the probability of occurrence, or
you can try Mark's way, "cat /proc/kcore > /dev/null".

I would like to confirm whether this is hardware related, but I have no arm64 boards other than the
Hisilicon ones.

>>>> When I do "perf top" and annotate a random kernel symbol (like vsnprintf or others), the system reports an OOPS below:
>>>> (The probability of occurrence is very high, almost every time.)
>>>>
>>>> $ perf top
>>>>
>>>> Annotate vsnprintf ---- choose it
>>>> Zoom into perf(7066) thread
>>>> Zoom into the Kernel DSO
>>>> Browse map details
>>>> Run scripts for samples of thread [perf]
>>>> Run scripts for samples of symbol [vsnprintf]
>>>> Run scripts for all samples
>>>> Exit
>>
>> Was perf built from the same v4.11-rc6 source tree, or was this an older
>> perf binary?
>>
>
> No, I had used 4.10-rc7 or some other older perf binary.
> At first, I found this problem in kernel-4.10-rc7 + perf-4.10-rc7, and then I tested the other kernel versions.
>
> I am now using "git bisect", and the problem may have been introduced between v4.5 and v4.6-rc1.
>
> I will try more and tell you the result.
>

Hi,
Mark, Ard,

I found the patch which introduced the problem.
The commit is:

commit f9040773b7bbbd9e98eb6184a263512a7cfc133f
Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Date: Tue Feb 16 13:52:40 2016 +0100

    arm64: move kernel image to base of vmalloc area

    This moves the module area to right before the vmalloc area, and moves
    the kernel image to the base of the vmalloc area. This is an intermediate
    step towards implementing KASLR, which allows the kernel image to be
    located anywhere in the vmalloc area.

    Since other subsystems such as hibernate may still need to refer to the
    kernel text or data segments via their linears addresses, both are mapped
    in the linear region as well. The linear alias of the text region is
    mapped read-only/non-executable to prevent inadvertent modification or
    execution.

    Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Linux-4.5-rc4 works well without this patch, and triggers an OOPS with it.

I tried to revert it in v4.11-rc6, but there are too many conflicts.
So I need to understand this patch first. Then I can know where the problem is.

Thanks.
Xiaojun.

>> With a perf tool built from v4.11-rc6, even with CAP_SYS_RAWIO, I see perf top
>> complaining that it cannot annotate the symbol due to a lack of a vmlinux
>> file. I can't seem to convince it to use /proc/kcore.
Hi,

On Fri, Apr 21, 2017 at 01:46:43PM +0800, Tan Xiaojun wrote:
> On 2017/4/20 9:38, Tan Xiaojun wrote:
> > On 2017/4/19 19:49, Mark Rutland wrote:
> >> Hi,
> >>
> >> Ard, this seems to be a nomap issue. Please see below.
> >>
> >> Xiaojun, for some reason, the first message in this thread didn't seem
> >> to make it to LAKML (or to me). In future could you please Cc me for
> >> emails regarding perf on arm/arm64?
> >>
> >
> > Sorry, this is my negligence.
> >
> >> On Wed, Apr 19, 2017 at 09:44:56AM +0530, Pratyush Anand wrote:
> >>> On Saturday 15 April 2017 02:18 PM, Tan Xiaojun wrote:
> >>>> My test server is Hisilicon D03/D05 (arm64).
> >>>> Kernel source code is 4.11-rc6 (up to date) and config (as an attachment in the end) is generated by defconfig.
> >>>> (Old version does not seem to have this problem. Linux-4.1 is fine and other versions I have not tested yet.)
> >>>
> >>> I tested with mustang(ARM64) and 4.11-rc6 and could not reproduce it.
> >>>
>
> Hi,
> Pratyush,
>
> Sorry, could you test it again? I tested it many times and found that it is not triggered every time.
> You can run "perf top -U" and try more kernel symbols to increase the probability of occurrence, or
> you can try Mark's way, "cat /proc/kcore > /dev/null".
>
> I would like to confirm whether this is hardware related, but I have no arm64 boards other than the
> Hisilicon ones.

As I mentioned in my prior reply, this is a bug in the way we handle
nomap memory in the kernel.

This is not hardware related, and this is not specific to perf.

The kcore code expects that if a vmalloc mapping has a corresponding
struct page, that it can be accessed via the linear mapping. However,
this is not true for nomap memory.

> I found the patch which introduced the problem.
> The commit is:
>
> commit f9040773b7bbbd9e98eb6184a263512a7cfc133f
> Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Date: Tue Feb 16 13:52:40 2016 +0100
>
>     arm64: move kernel image to base of vmalloc area
>
>     This moves the module area to right before the vmalloc area, and moves
>     the kernel image to the base of the vmalloc area. This is an intermediate
>     step towards implementing KASLR, which allows the kernel image to be
>     located anywhere in the vmalloc area.
>
>     Since other subsystems such as hibernate may still need to refer to the
>     kernel text or data segments via their linears addresses, both are mapped
>     in the linear region as well. The linear alias of the text region is
>     mapped read-only/non-executable to prevent inadvertent modification or
>     execution.
>
>     Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>     Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>
> Linux-4.5-rc4 works well without this patch, and triggers an OOPS with it.
>
> I tried to revert it in v4.11-rc6, but there are too many conflicts.
> So I need to understand this patch first. Then I can know where the problem is.

Reverting this patch is not the correct fix.

The fix will either be changing the way we set things up for nomap
memory, or with additions to the kcore or vread code to cater for nomap.

Thanks,
Mark.
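To illustrate the second option Mark mentions (teaching the read path about nomap), here is a small userspace model, not kernel code and not a proposed patch. It assumes one possible policy: a page with no linear alias is zero-filled instead of dereferenced, the same way aligned_vread() already zero-fills when vmalloc_to_page() finds no page. `struct model_page`, `has_linear_alias`, `MODEL_PAGE_SIZE` and `model_vread()` are hypothetical names introduced only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16 /* tiny page size to keep the model readable */

/* Hypothetical stand-in for a struct page plus its linear-map state. */
struct model_page {
	bool has_linear_alias;      /* false models nomap memory */
	char data[MODEL_PAGE_SIZE];
};

/*
 * Copy `count` bytes starting at byte position `pos` within `pages`
 * into `buf`. Pages without a linear alias are zero-filled rather than
 * read. Returns the number of bytes copied from pages that were mapped.
 */
static size_t model_vread(char *buf, const struct model_page *pages,
			  size_t npages, size_t pos, size_t count)
{
	size_t copied = 0;

	assert(pos + count <= npages * MODEL_PAGE_SIZE);

	while (count) {
		const struct model_page *p = &pages[pos / MODEL_PAGE_SIZE];
		size_t offset = pos % MODEL_PAGE_SIZE;
		size_t length = MODEL_PAGE_SIZE - offset;

		if (length > count)
			length = count;

		if (p->has_linear_alias) {
			memcpy(buf, p->data + offset, length);
			copied += length;
		} else {
			/* Assumed policy: never touch the missing alias. */
			memset(buf, 0, length);
		}

		buf += length;
		pos += length;
		count -= length;
	}
	return copied;
}
```

In the kernel the equivalent test would presumably be something like the `memblock_is_map_memory()` check used in Mark's debug patch, but whether the check belongs in kcore, in vread, or in the nomap setup itself is exactly the question this thread leaves open.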
On 2017/4/21 17:34, Mark Rutland wrote:
> Hi,
>
> On Fri, Apr 21, 2017 at 01:46:43PM +0800, Tan Xiaojun wrote:
>> On 2017/4/20 9:38, Tan Xiaojun wrote:
>>> On 2017/4/19 19:49, Mark Rutland wrote:
>>>> Hi,
>>>>
>>>> Ard, this seems to be a nomap issue. Please see below.
>>>>
>>>> Xiaojun, for some reason, the first message in this thread didn't seem
>>>> to make it to LAKML (or to me). In future could you please Cc me for
>>>> emails regarding perf on arm/arm64?
>>>>
>>>
>>> Sorry, this is my negligence.
>>>
>>>> On Wed, Apr 19, 2017 at 09:44:56AM +0530, Pratyush Anand wrote:
>>>>> On Saturday 15 April 2017 02:18 PM, Tan Xiaojun wrote:
>>>>>> My test server is Hisilicon D03/D05 (arm64).
>>>>>> Kernel source code is 4.11-rc6 (up to date) and config (as an attachment in the end) is generated by defconfig.
>>>>>> (Old version does not seem to have this problem. Linux-4.1 is fine and other versions I have not tested yet.)
>>>>>
>>>>> I tested with mustang(ARM64) and 4.11-rc6 and could not reproduce it.
>>>>>
>>
>> Hi,
>> Pratyush,
>>
>> Sorry, could you test it again? I tested it many times and found that it is not triggered every time.
>> You can run "perf top -U" and try more kernel symbols to increase the probability of occurrence, or
>> you can try Mark's way, "cat /proc/kcore > /dev/null".
>>
>> I would like to confirm whether this is hardware related, but I have no arm64 boards other than the
>> Hisilicon ones.
>
> As I mentioned in my prior reply, this is a bug in the way we handle
> nomap memory in the kernel.
>
> This is not hardware related, and this is not specific to perf.
>
> The kcore code expects that if a vmalloc mapping has a corresponding
> struct page, that it can be accessed via the linear mapping. However,
> this is not true for nomap memory.
>

Yes, you are right.

>> I found the patch which introduced the problem.
>> The commit is:
>>
>> commit f9040773b7bbbd9e98eb6184a263512a7cfc133f
>> Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Date: Tue Feb 16 13:52:40 2016 +0100
>>
>>     arm64: move kernel image to base of vmalloc area
>>
>>     This moves the module area to right before the vmalloc area, and moves
>>     the kernel image to the base of the vmalloc area. This is an intermediate
>>     step towards implementing KASLR, which allows the kernel image to be
>>     located anywhere in the vmalloc area.
>>
>>     Since other subsystems such as hibernate may still need to refer to the
>>     kernel text or data segments via their linears addresses, both are mapped
>>     in the linear region as well. The linear alias of the text region is
>>     mapped read-only/non-executable to prevent inadvertent modification or
>>     execution.
>>
>>     Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>     Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>
>> Linux-4.5-rc4 works well without this patch, and triggers an OOPS with it.
>>
>> I tried to revert it in v4.11-rc6, but there are too many conflicts.
>> So I need to understand this patch first. Then I can know where the problem is.
>
> Reverting this patch is not the correct fix.
>
> The fix will either be changing the way we set things up for nomap
> memory, or with additions to the kcore or vread code to cater for nomap.
>

The problem seems serious and I would like to see it fixed as soon as possible, but I know little about nomap memory.
So if you can provide a fix patch, I will be glad to test it. ^-^

Thanks.
Xiaojun.

> Thanks,
> Mark.
>
> .
>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0b05762..d7f48e0 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -9,6 +9,7 @@
  */
 
 #include <linux/vmalloc.h>
+#include <linux/memblock.h>
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/highmem.h>
@@ -1978,6 +1979,8 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
 {
 	struct page *p;
 	int copied = 0;
+	phys_addr_t phys;
+	bool nomap;
 
 	while (count) {
 		unsigned long offset, length;
@@ -2000,6 +2003,14 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
 			 * function description)
 			 */
 			void *map = kmap_atomic(p);
+
+			phys = page_to_phys(p);
+			nomap = !memblock_is_map_memory(phys);
+
+			pr_info("HARK: %s kmap'd %pa (%s memory) @ %p\n",
+				__func__, &phys, (nomap ? "nomap" : "map"),
+				map);
+
 			memcpy(buf, map + offset, length);
 			kunmap_atomic(map);
 		} else