Message ID | 1474643941-109020-2-git-send-email-labath@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Pavel, On Fri, Sep 23, 2016 at 04:19:00PM +0100, Pavel Labath wrote: > Arm64 hardware does not always report a watchpoint hit address that > matches one of the watchpoints set. It can also report an address > "near" the watchpoint if a single instruction access both watched and > unwatched addresses. There is no straight-forward way, short of > disassembling the offending instruction, to map that address back to > the watchpoint. > > Previously, when the hardware reported a watchpoint hit on an address > that did not match our watchpoint (this happens in case of instructions > which access large chunks of memory such as "stp") the process would > enter a loop where we would be continually resuming it (because we did > not recognise that watchpoint hit) and it would keep hitting the > watchpoint again and again. The tracing process would never get > notified of the watchpoint hit. > > This commit fixes the problem by looking at the watchpoints near the > address reported by the hardware. If the address does not exactly match > one of the watchpoints we have set, it attributes the hit to the > nearest watchpoint we have. This heuristic is a bit dodgy, but I don't > think we can do much more, given the hardware limitations. > > Signed-off-by: Pavel Labath <labath@google.com> > --- > arch/arm64/kernel/hw_breakpoint.c | 98 +++++++++++++++++++++++++-------------- > 1 file changed, 64 insertions(+), 34 deletions(-) If the first patch in the series is no longer required (as you stated in your follow-up reply), then you can just drop it. > diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c > index 14562ae..3ce27ea 100644 > --- a/arch/arm64/kernel/hw_breakpoint.c > +++ b/arch/arm64/kernel/hw_breakpoint.c > @@ -664,49 +664,63 @@ unlock: > } > NOKPROBE_SYMBOL(breakpoint_handler); > > +/* > + * Arm64 hardware does not always report a watchpoint hit address that matches > + * one of the watchpoints set. It can also report an address "near" the > + * watchpoint if a single instruction access both watched and unwatched > + * addresses. There is no straight-forward way, short of disassembling the > + * offending instruction, to map that address back to the watchpoint. This > + * function computes the distance of the memory access from the watchpoint as a > + * heuristic for the likelyhood that a given access triggered the watchpoint. > + * > + * See Section D2.10.5 "Determining the memory location that caused a Watchpoint > + * exception" of ARMv8 Architecture Reference Manual for details. > + * > + * The function returns the distance of the address from the bytes watched by > + * the watchpoint. In case of an exact match, it returns 0. > + */ > +static u64 get_distance_from_watchpoint(unsigned long addr, int i, > + struct arch_hw_breakpoint *info) > +{ > + u64 wp_low, wp_high; > + int first_bit; > + > + first_bit = ffs(info->ctrl.len); > + if (first_bit == 0) > + return -1; > + > + wp_low = info->address + first_bit - 1; > + wp_high = info->address + fls(info->ctrl.len) - 1; This would all be cleaner if you just called get_hbp_len(info->ctrl.len) to get the size of the watchpoint. We don't do anything sophisticated with the BAS, so you can assume everything is base + len. > @@ -723,10 +748,15 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr, > /* Do we need to handle the stepping? */ > if (is_default_overflow_handler(wp)) > step = 1; > - > -unlock: > - rcu_read_unlock(); > } > + if (min_dist > 0 && min_dist != -1) { min_dist is unsigned, so this could be: if (min_dist + 1 > 1) Will
On Fri, Sep 23, 2016 at 8:49 PM, Pavel Labath <test.tberghammer@gmail.com> wrote: > Arm64 hardware does not always report a watchpoint hit address that > matches one of the watchpoints set. It can also report an address > "near" the watchpoint if a single instruction access both watched and > unwatched addresses. There is no straight-forward way, short of > disassembling the offending instruction, to map that address back to > the watchpoint. > > Previously, when the hardware reported a watchpoint hit on an address > that did not match our watchpoint (this happens in case of instructions > which access large chunks of memory such as "stp") the process would > enter a loop where we would be continually resuming it (because we did > not recognise that watchpoint hit) and it would keep hitting the > watchpoint again and again. The tracing process would never get > notified of the watchpoint hit. > > This commit fixes the problem by looking at the watchpoints near the > address reported by the hardware. If the address does not exactly match > one of the watchpoints we have set, it attributes the hit to the > nearest watchpoint we have. This heuristic is a bit dodgy, but I don't > think we can do much more, given the hardware limitations. IIUC, then you see an issue when an address watched is not the base address accessed by the instruction. For example, if an address 'a+8' is watched and an instruction accesses instruction from a to a +16. I tried to reproduce the issue with mustang using your test-case in patch3 (after couple of syntax modifcations for resolving compilation issue with gcc). All the test case did pass with existing code in v4.8. I noticed that, watchpoint exception is generated if any of the sub-location accessed from a single instruction is watched, provided watchdpoint watches either a byte, half word, word or double word from the base. So, either I must be missing something or the problem is not related to all arm64 platform. However, I did notice that it does not work if we watch an address which is at some offset from address programmed. For example, it works when byte_mask is 0x3, but it does not work if byte_mask if 0x2 (which is supported by hardware). I do have some patches to resolve that. https://github.com/pratyushanand/linux/commits/perf/upstream_arm64_devel I will send them for review comment after some testing. ~Pratyush > > Signed-off-by: Pavel Labath <labath@google.com> > --- > arch/arm64/kernel/hw_breakpoint.c | 98 +++++++++++++++++++++++++-------------- > 1 file changed, 64 insertions(+), 34 deletions(-) > > diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c > index 14562ae..3ce27ea 100644 > --- a/arch/arm64/kernel/hw_breakpoint.c > +++ b/arch/arm64/kernel/hw_breakpoint.c > @@ -664,49 +664,63 @@ unlock: > } > NOKPROBE_SYMBOL(breakpoint_handler); > > +/* > + * Arm64 hardware does not always report a watchpoint hit address that matches > + * one of the watchpoints set. It can also report an address "near" the > + * watchpoint if a single instruction access both watched and unwatched > + * addresses. There is no straight-forward way, short of disassembling the > + * offending instruction, to map that address back to the watchpoint. This > + * function computes the distance of the memory access from the watchpoint as a > + * heuristic for the likelyhood that a given access triggered the watchpoint. > + * > + * See Section D2.10.5 "Determining the memory location that caused a Watchpoint > + * exception" of ARMv8 Architecture Reference Manual for details. > + * > + * The function returns the distance of the address from the bytes watched by > + * the watchpoint. In case of an exact match, it returns 0. > + */ > +static u64 get_distance_from_watchpoint(unsigned long addr, int i, > + struct arch_hw_breakpoint *info) > +{ > + u64 wp_low, wp_high; > + int first_bit; > + > + first_bit = ffs(info->ctrl.len); > + if (first_bit == 0) > + return -1; > + > + wp_low = info->address + first_bit - 1; > + wp_high = info->address + fls(info->ctrl.len) - 1; > + if (addr < wp_low) > + return wp_low - addr; > + else if (addr > wp_high) > + return addr - wp_high; > + else > + return 0; > + > +} > + > static int watchpoint_handler(unsigned long addr, unsigned int esr, > struct pt_regs *regs) > { > - int i, step = 0, *kernel_step, access; > - u32 ctrl_reg; > - u64 val, alignment_mask; > + int i, step = 0, *kernel_step, access, closest_match = 0; > + u64 min_dist = -1, dist; > struct perf_event *wp, **slots; > struct debug_info *debug_info; > struct arch_hw_breakpoint *info; > - struct arch_hw_breakpoint_ctrl ctrl; > > slots = this_cpu_ptr(wp_on_reg); > debug_info = ¤t->thread.debug; > > + /* > + * Find all watchpoints that match the reported address. If no exact > + * match is found. Attribute the hit to the closest watchpoint. > + */ > + rcu_read_lock(); > for (i = 0; i < core_num_wrps; ++i) { > - rcu_read_lock(); > - > wp = slots[i]; > - > if (wp == NULL) > - goto unlock; > - > - info = counter_arch_bp(wp); > - /* AArch32 watchpoints are either 4 or 8 bytes aligned. */ > - if (is_compat_task()) { > - if (info->ctrl.len == ARM_BREAKPOINT_LEN_8) > - alignment_mask = 0x7; > - else > - alignment_mask = 0x3; > - } else { > - alignment_mask = 0x7; > - } > - > - /* Check if the watchpoint value matches. */ > - val = read_wb_reg(AARCH64_DBG_REG_WVR, i); > - if (val != (addr & ~alignment_mask)) > - goto unlock; > - > - /* Possible match, check the byte address select to confirm. */ > - ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i); > - decode_ctrl_reg(ctrl_reg, &ctrl); > - if (!((1 << (addr & alignment_mask)) & ctrl.len)) > - goto unlock; > + continue; > > /* > * Check that the access type matches. > @@ -715,7 +729,18 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr, > access = (esr & AARCH64_ESR_ACCESS_MASK) ? HW_BREAKPOINT_W : > HW_BREAKPOINT_R; > if (!(access & hw_breakpoint_type(wp))) > - goto unlock; > + continue; > + > + info = counter_arch_bp(wp); > + > + dist = get_distance_from_watchpoint(addr, i, info); > + if (dist < min_dist) { > + min_dist = dist; > + closest_match = i; > + } > + /* Is this an exact match? */ > + if (dist != 0) > + continue; > > info->trigger = addr; > perf_bp_event(wp, regs); > @@ -723,10 +748,15 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr, > /* Do we need to handle the stepping? */ > if (is_default_overflow_handler(wp)) > step = 1; > - > -unlock: > - rcu_read_unlock(); > } > + if (min_dist > 0 && min_dist != -1) { > + /* No exact match found. */ > + wp = slots[closest_match]; > + info = counter_arch_bp(wp); > + info->trigger = addr; > + perf_bp_event(wp, regs); > + } > + rcu_read_unlock(); > > if (!step) > return 0; > -- > 2.8.0.rc3.226.g39d4020 >
On 7 October 2016 at 09:38, Pratyush Anand <panand@redhat.com> wrote: > > > IIUC, then you see an issue when an address watched is not the base > address accessed by the instruction. For example, if an address 'a+8' > is watched and an instruction accesses instruction from a to a +16. I > tried to reproduce the issue with mustang using your test-case in > patch3 (after couple of syntax modifcations for resolving compilation > issue with gcc). All the test case did pass with existing code in > v4.8. I noticed that, watchpoint exception is generated if any of the > sub-location accessed from a single instruction is watched, provided > watchdpoint watches either a byte, half word, word or double word > from the base. > > > So, either I must be missing something or the problem is not related > to all arm64 platform. Hello Pratyush, Thank you for looking into this. The thing is, I have observed different behavior here depending on the exact hardware used. I don't have the exact parameters with me now, but I can look it up next week. The thing is that the spec is imprecise about what exact address the hardware can report for the watchpoint hit. I presume that is deliberate to give some leeway to implementers. The spec says the address can be anywhere in the range from the lowest memory address accessed by the instruction to the highest address watched by the watchpoint, but most hardware seems to be stricter than that and return an address that fits inside the watched range. On chip 1, I observed the behavior where the hardware would consistently report an address out of range of the watchpoint and we would just spin it in a loop. On chip 2, I observed the behavior where the hardware would report an out-of-range address for the first two dozen (~) iterations, after which it would "give up" and report an address that we were happy with. I don't really have an explanation for this - I can only assume that some external event like a reschedule to a different core caused some internal state of the hardware to be reset and cause it to report a different (better?) address instead. In the case where this was happening, it had no observable effects on userspace - it did not see the fact that we had re-executed the offending instruction a dozen times and as far as it was concerned, the watchpoint functionality worked perfectly. You can check whether this is happening in your case by instrumenting the code to print the reported address whenever it enters `watchpoint_handler`. (I am sorry about the test errors. I was compiling the test case with an android gcc - I'll make sure to check it with a vanilla linux gcc also.) > > However, I did notice that it does not work if we watch an address > which is at some offset from address programmed. For example, it works > when byte_mask is 0x3, but it does not work if byte_mask if 0x2 (which > is supported by hardware). > > I do have some patches to resolve that. > > https://github.com/pratyushanand/linux/commits/perf/upstream_arm64_devel > > I will send them for review comment after some testing. I am looking forward to these patches - they were the next on my list to look into after I got this resolved. :) However: Are sure about 0x2 not being a valid byte mask? According to my reading of the armv8 spec (section D7.3.11, "DBGWCR<n>_EL1, Debug Watchpoint Control Registers, n = 0 - 15") it should be fine. ==== The valid values for BAS are 0b0000000, or a binary number all of whose set bits are contiguous. All other values are reserved and must not be used by software. ==== So, 0x2 (as well as 0x6, 0xC, 0xE) should be fine as it has a contiguous sequence of set bit(s). I haven't tried yet whether any hardware actually handles that correctly, but I was certainly hoping we would be able to watch more precise memory regions. regards, Pavel
Hi Pavel, On Fri, Oct 7, 2016 at 10:54 PM, Pavel Labath <labath@google.com> wrote: > On 7 October 2016 at 09:38, Pratyush Anand <panand@redhat.com> wrote: >> >> >> IIUC, then you see an issue when an address watched is not the base >> address accessed by the instruction. For example, if an address 'a+8' >> is watched and an instruction accesses instruction from a to a +16. I >> tried to reproduce the issue with mustang using your test-case in >> patch3 (after couple of syntax modifcations for resolving compilation >> issue with gcc). All the test case did pass with existing code in >> v4.8. I noticed that, watchpoint exception is generated if any of the >> sub-location accessed from a single instruction is watched, provided >> watchdpoint watches either a byte, half word, word or double word >> from the base. >> >> >> So, either I must be missing something or the problem is not related >> to all arm64 platform. > > > Hello Pratyush, > > Thank you for looking into this. > > The thing is, I have observed different behavior here depending on the > exact hardware used. I don't have the exact parameters with me now, > but I can look it up next week. > > The thing is that the spec is imprecise about what exact address the > hardware can report for the watchpoint hit. I presume that is > deliberate to give some leeway to implementers. The spec says the > address can be anywhere in the range from the lowest memory address > accessed by the instruction to the highest address watched by the > watchpoint, I think, my patches should be able to take care of the above condition. > but most hardware seems to be stricter than that and > return an address that fits inside the watched range. > > On chip 1, I observed the behavior where the hardware would > consistently report an address out of range of the watchpoint and we > would just spin it in a loop. > > On chip 2, I observed the behavior where the hardware would report an > out-of-range address for the first two dozen (~) iterations, after > which it would "give up" and report an address that we were happy > with. I don't really have an explanation for this - I can only assume > that some external event like a reschedule to a different core caused > some internal state of the hardware to be reset and cause it to report > a different (better?) address instead. In the case where this was > happening, it had no observable effects on userspace - it did not see > the fact that we had re-executed the offending instruction a dozen > times and as far as it was concerned, the watchpoint functionality > worked perfectly. You can check whether this is happening in your case > by instrumenting the code to print the reported address whenever it > enters `watchpoint_handler`. > Yes, I had done that. In my case it was always starting memory address accessed by the instruction. e.g. if a strb writes to 0x400023, I would get addr=0x400023 in watchpoint_handler(), if a str writes to 0x400020-0x400027, I would get addr=0x400020. > (I am sorry about the test errors. I was compiling the test case with > an android gcc - I'll make sure to check it with a vanilla linux gcc > also.) > your test cases are helpful. I think, I would use them to build further test cases. > > >> However, I did notice that it does not work if we watch an address >> which is at some offset from address programmed. For example, it works >> when byte_mask is 0x3, but it does not work if byte_mask if 0x2 (which >> is supported by hardware). >> >> I do have some patches to resolve that. >> >> https://github.com/pratyushanand/linux/commits/perf/upstream_arm64_devel >> >> I will send them for review comment after some testing. > > I am looking forward to these patches - they were the next on my list > to look into after I got this resolved. :) > > However: Are sure about 0x2 not being a valid byte mask? According to It is a valid mask indeed, and you should be able to use all those mask after my patches. > my reading of the armv8 spec (section D7.3.11, "DBGWCR<n>_EL1, Debug > Watchpoint Control Registers, n = 0 - 15") it should be fine. > ==== > The valid values for BAS are 0b0000000, or a binary number all of > whose set bits are contiguous. All other values are reserved and must > not be used by software. > ==== > So, 0x2 (as well as 0x6, 0xC, 0xE) should be fine as it has a > contiguous sequence of set bit(s). I haven't tried yet whether any > hardware actually handles that correctly, but I was certainly hoping > we would be able to watch more precise memory regions. I have tested couple of them, and I think my patches should help with using any contiguous bit mask acceptable by spec. ~Pratyush
On Friday 07 October 2016 10:54 PM, Pavel Labath wrote: > However, I did notice that it does not work if we watch an address >> which is at some offset from address programmed. For example, it works >> when byte_mask is 0x3, but it does not work if byte_mask if 0x2 (which >> is supported by hardware). >> >> I do have some patches to resolve that. >> >> https://github.com/pratyushanand/linux/commits/perf/upstream_arm64_devel >> >> I will send them for review comment after some testing. This branch has been updated with a test code as well. So far they work fine at my end. Planning to send them tomorrow for review. ~Pratyush
Hello Pratyush, I am sorry about the delay. I have finally had a chance to try out your changes today. Response inline. On 8 October 2016 at 06:10, Pratyush Anand <panand@redhat.com> wrote: > On Fri, Oct 7, 2016 at 10:54 PM, Pavel Labath <labath@google.com> wrote: >> The thing is, I have observed different behavior here depending on the >> exact hardware used. I don't have the exact parameters with me now, >> but I can look it up next week. >> >> The thing is that the spec is imprecise about what exact address the >> hardware can report for the watchpoint hit. I presume that is >> deliberate to give some leeway to implementers. The spec says the >> address can be anywhere in the range from the lowest memory address >> accessed by the instruction to the highest address watched by the >> watchpoint, > > I think, my patches should be able to take care of the above condition. Unfortunately, the patch does not solve the problem for my hardware, because of the leeway you give in watchpoint_handler is not big enough. It does work however, if I change the line > if (addr + 7 < val + lens || addr > val + lene) to > if (addr + 15 < val + lens || addr > val + lene) I do not think we can assume that address reported by the hardware will be at most 7 bytes off from the address we put the watchpoint at. There is nothing in the spec that guarantees that, and it does not seem to be enough for some hardware. In fact, I am not sure we can assume 15 is enough either, but maybe it can do for now, until we can find hardware that does not work with that (I haven't yet tried what happens it the watchpoint is triggered by cache management instructions, which can access much larger blocks of memory). For reference, the hardware in question is: > Processor : AArch64 Processor rev 0 (aarch64) > processor : 0 > min_vddcx : 400000 > min_vddmx : 490000 > BogoMIPS : 38.00 > Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 > CPU implementer : 0x51 > CPU architecture: 8 > CPU variant : 0x2 > CPU part : 0x201 > CPU revision : 0 > CPU param : 300 468 468 621 939 299 445 445 621 1077 > Hardware : Qualcomm Technologies, Inc MSM8996pro And this is how it behaves: The output of the test app triggering the watchpoint (I have set a single-byte watchpoint at 555556705f) > > Writing to: 555556705f, size: 1 > Writing to: 555556705e, size: 2 > Writing to: 555556705c, size: 4 > Writing to: 5555567058, size: 8 > Writing to: 5555567050, size: 16 > Writing to: 5555567040, size: 32 The addresses received by the kernel: [ 251.812166] c1 3780 hw-breakpoint: watchpoint_handler: addr: 555556705f, val+lens: 555556705f, val+lene: 555556705f [ 251.820341] c1 3781 hw-breakpoint: watchpoint_handler: addr: 555556705e, val+lens: 555556705f, val+lene: 555556705f [ 251.825572] c0 3782 hw-breakpoint: watchpoint_handler: addr: 555556705c, val+lens: 555556705f, val+lene: 555556705f [ 251.831085] c0 3783 hw-breakpoint: watchpoint_handler: addr: 5555567058, val+lens: 555556705f, val+lene: 555556705f [ 251.835804] c0 3784 hw-breakpoint: watchpoint_handler: addr: 5555567050, val+lens: 555556705f, val+lene: 555556705f [ 251.841350] c0 3785 hw-breakpoint: watchpoint_handler: addr: 5555567050, val+lens: 555556705f, val+lene: 555556705f Note that for the case of 16 and 32-byte access it returns the address 5555567050 -- this is why the "+15" is sufficient for me. The other thing I am not so sure about in your patch is that it has potential to mis-attribute the watchpoint hit if we have two watchpoints close together. For example, if I have first watchpoint on 0x1008-0x100f and a second one on 0x1000-0x1007, *and* the application writes one byte to 0x1004, then your code will still attribute the hit to the first watchpoint, even though it was not really triggered. This is the reason I implemented my fix as a two-stage process, first looking for exact hits, and then falling back to the nearest one. That said, I don't know enough about the codebase to say if this is a real problem. On the plus side, I like the fact that we can watch arbitrary memory regions now, and the feature is working perfectly. :) My proposal would be to combine the two patches - take the byte mask handling code from yours, and the hit-attribution code from my patch. What do you think? regards, pavel
Hi Pavel, Thanks a lot for your testing. On Wed, Oct 12, 2016 at 7:20 PM, Pavel Labath <labath@google.com> wrote: > Hello Pratyush, > > I am sorry about the delay. I have finally had a chance to try out > your changes today. Response inline. > > On 8 October 2016 at 06:10, Pratyush Anand <panand@redhat.com> wrote: >> On Fri, Oct 7, 2016 at 10:54 PM, Pavel Labath <labath@google.com> wrote: >>> The thing is, I have observed different behavior here depending on the >>> exact hardware used. I don't have the exact parameters with me now, >>> but I can look it up next week. >>> >>> The thing is that the spec is imprecise about what exact address the >>> hardware can report for the watchpoint hit. I presume that is >>> deliberate to give some leeway to implementers. The spec says the >>> address can be anywhere in the range from the lowest memory address >>> accessed by the instruction to the highest address watched by the >>> watchpoint, >> >> I think, my patches should be able to take care of the above condition. > Unfortunately, the patch does not solve the problem for my hardware, > because of the leeway you give in watchpoint_handler is not big > enough. It does work however, if I change the line >> if (addr + 7 < val + lens || addr > val + lene) > to >> if (addr + 15 < val + lens || addr > val + lene) Yes, I missed that floating point register will be of size 16. > I do not think we can assume that address reported by the hardware > will be at most 7 bytes off from the address we put the watchpoint at. > There is nothing in the spec that guarantees that, and it does not > seem to be enough for some hardware. In fact, I am not sure we can > assume 15 is enough either, but maybe it can do for now, until we can Right. It might even be bigger, in case of cache maintenance instructions. > find hardware that does not work with that (I haven't yet tried what > happens it the watchpoint is triggered by cache management > instructions, which can access much larger blocks of memory). Not, sure, may be it can lie in cache line size range. > > For reference, the hardware in question is: >> Processor : AArch64 Processor rev 0 (aarch64) >> processor : 0 >> min_vddcx : 400000 >> min_vddmx : 490000 >> BogoMIPS : 38.00 >> Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 >> CPU implementer : 0x51 >> CPU architecture: 8 >> CPU variant : 0x2 >> CPU part : 0x201 >> CPU revision : 0 >> CPU param : 300 468 468 621 939 299 445 445 621 1077 >> Hardware : Qualcomm Technologies, Inc MSM8996pro > > And this is how it behaves: > The output of the test app triggering the watchpoint (I have set a > single-byte watchpoint at 555556705f) >> >> Writing to: 555556705f, size: 1 >> Writing to: 555556705e, size: 2 >> Writing to: 555556705c, size: 4 >> Writing to: 5555567058, size: 8 >> Writing to: 5555567050, size: 16 >> Writing to: 5555567040, size: 32 > > The addresses received by the kernel: > [ 251.812166] c1 3780 hw-breakpoint: watchpoint_handler: addr: > 555556705f, val+lens: 555556705f, val+lene: 555556705f > [ 251.820341] c1 3781 hw-breakpoint: watchpoint_handler: addr: > 555556705e, val+lens: 555556705f, val+lene: 555556705f > [ 251.825572] c0 3782 hw-breakpoint: watchpoint_handler: addr: > 555556705c, val+lens: 555556705f, val+lene: 555556705f > [ 251.831085] c0 3783 hw-breakpoint: watchpoint_handler: addr: > 5555567058, val+lens: 555556705f, val+lene: 555556705f > [ 251.835804] c0 3784 hw-breakpoint: watchpoint_handler: addr: > 5555567050, val+lens: 555556705f, val+lene: 555556705f > [ 251.841350] c0 3785 hw-breakpoint: watchpoint_handler: addr: > 5555567050, val+lens: 555556705f, val+lene: 555556705f > > Note that for the case of 16 and 32-byte access it returns the address > 5555567050 -- this is why the "+15" is sufficient for me. > > > The other thing I am not so sure about in your patch is that it has > potential to mis-attribute the watchpoint hit if we have two > watchpoints close together. For example, if I have first watchpoint on > 0x1008-0x100f and a second one on 0x1000-0x1007, *and* the application > writes one byte to 0x1004, then your code will still attribute the hit > to the first watchpoint, even though it was not really triggered. This Hummm..yes, thanks for pointing it out. There could be only two solutions for it: (1) We read instruction at the location regs->pc and analyse it and find the size of read/write. or(2) What you have suggested in your patch. I think, its easier to go with your implementation. So, I have taken your patch and updated my perf/upstream_arm64_devel branch. May be you can give it a test for your test cases. > is the reason I implemented my fix as a two-stage process, first > looking for exact hits, and then falling back to the nearest one. That > said, I don't know enough about the codebase to say if this is a real > problem. > > On the plus side, I like the fact that we can watch arbitrary memory > regions now, and the feature is working perfectly. :) > Thanks :-) > > My proposal would be to combine the two patches - take the byte mask > handling code from yours, and the hit-attribution code from my patch. > What do you think? I am ok with merging them together as well as sending them as different patch in my v2 series. ~Pratyush
On 13 October 2016 at 10:58, Pratyush Anand <panand@redhat.com> wrote: > Hi Pavel, > > Thanks a lot for your testing. > > On Wed, Oct 12, 2016 at 7:20 PM, Pavel Labath <labath@google.com> wrote: >> Hello Pratyush, >> >> I am sorry about the delay. I have finally had a chance to try out >> your changes today. Response inline. >> >> On 8 October 2016 at 06:10, Pratyush Anand <panand@redhat.com> wrote: >>> On Fri, Oct 7, 2016 at 10:54 PM, Pavel Labath <labath@google.com> wrote: >>>> The thing is, I have observed different behavior here depending on the >>>> exact hardware used. I don't have the exact parameters with me now, >>>> but I can look it up next week. >>>> >>>> The thing is that the spec is imprecise about what exact address the >>>> hardware can report for the watchpoint hit. I presume that is >>>> deliberate to give some leeway to implementers. The spec says the >>>> address can be anywhere in the range from the lowest memory address >>>> accessed by the instruction to the highest address watched by the >>>> watchpoint, >>> >>> I think, my patches should be able to take care of the above condition. >> Unfortunately, the patch does not solve the problem for my hardware, >> because of the leeway you give in watchpoint_handler is not big >> enough. It does work however, if I change the line >>> if (addr + 7 < val + lens || addr > val + lene) >> to >>> if (addr + 15 < val + lens || addr > val + lene) > > Yes, I missed that floating point register will be of size 16. > >> I do not think we can assume that address reported by the hardware >> will be at most 7 bytes off from the address we put the watchpoint at. >> There is nothing in the spec that guarantees that, and it does not >> seem to be enough for some hardware. In fact, I am not sure we can >> assume 15 is enough either, but maybe it can do for now, until we can > > Right. It might even be bigger, in case of cache maintenance instructions. > >> find hardware that does not work with that (I haven't yet tried what >> happens it the watchpoint is triggered by cache management >> instructions, which can access much larger blocks of memory). > > Not, sure, may be it can lie in cache line size range. > >> >> For reference, the hardware in question is: >>> Processor : AArch64 Processor rev 0 (aarch64) >>> processor : 0 >>> min_vddcx : 400000 >>> min_vddmx : 490000 >>> BogoMIPS : 38.00 >>> Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 >>> CPU implementer : 0x51 >>> CPU architecture: 8 >>> CPU variant : 0x2 >>> CPU part : 0x201 >>> CPU revision : 0 >>> CPU param : 300 468 468 621 939 299 445 445 621 1077 >>> Hardware : Qualcomm Technologies, Inc MSM8996pro >> >> And this is how it behaves: >> The output of the test app triggering the watchpoint (I have set a >> single-byte watchpoint at 555556705f) >>> >>> Writing to: 555556705f, size: 1 >>> Writing to: 555556705e, size: 2 >>> Writing to: 555556705c, size: 4 >>> Writing to: 5555567058, size: 8 >>> Writing to: 5555567050, size: 16 >>> Writing to: 5555567040, size: 32 >> >> The addresses received by the kernel: >> [ 251.812166] c1 3780 hw-breakpoint: watchpoint_handler: addr: >> 555556705f, val+lens: 555556705f, val+lene: 555556705f >> [ 251.820341] c1 3781 hw-breakpoint: watchpoint_handler: addr: >> 555556705e, val+lens: 555556705f, val+lene: 555556705f >> [ 251.825572] c0 3782 hw-breakpoint: watchpoint_handler: addr: >> 555556705c, val+lens: 555556705f, val+lene: 555556705f >> [ 251.831085] c0 3783 hw-breakpoint: watchpoint_handler: addr: >> 5555567058, val+lens: 555556705f, val+lene: 555556705f >> [ 251.835804] c0 3784 hw-breakpoint: watchpoint_handler: addr: >> 5555567050, val+lens: 555556705f, val+lene: 555556705f >> [ 251.841350] c0 3785 hw-breakpoint: watchpoint_handler: addr: >> 5555567050, val+lens: 555556705f, val+lene: 555556705f >> >> Note that for the case of 16 and 32-byte access it returns the address >> 5555567050 -- this is why the "+15" is sufficient for me. >> >> >> The other thing I am not so sure about in your patch is that it has >> potential to mis-attribute the watchpoint hit if we have two >> watchpoints close together. For example, if I have first watchpoint on >> 0x1008-0x100f and a second one on 0x1000-0x1007, *and* the application >> writes one byte to 0x1004, then your code will still attribute the hit >> to the first watchpoint, even though it was not really triggered. This > > Hummm..yes, thanks for pointing it out. > There could be only two solutions for it: > (1) We read instruction at the location regs->pc and analyse it and > find the size of read/write. > or(2) What you have suggested in your patch. > > I think, its easier to go with your implementation. So, I have taken > your patch and updated my perf/upstream_arm64_devel branch. May be you > can give it a test for your test cases. I've checked out the new version of your branch, and it works great. I'll write a patch with additional test cases to go on top of your branch, as the tests there do not capture the bug I was fixing. regards, Pavel
On Thursday 13 October 2016 10:33 PM, Pavel Labath wrote: >> I think, its easier to go with your implementation. So, I have taken >> > your patch and updated my perf/upstream_arm64_devel branch. May be you >> > can give it a test for your test cases. > I've checked out the new version of your branch, and it works great. > I'll write a patch with additional test cases to go on top of your > branch, as the tests there do not capture the bug I was fixing. That would be great. We can send them all together as V2. Thanks for your help. ~Pratyush
On Fri, Oct 14, 2016 at 08:45:43AM +0530, Pratyush Anand wrote: > > > On Thursday 13 October 2016 10:33 PM, Pavel Labath wrote: > >>I think, its easier to go with your implementation. So, I have taken > >>> your patch and updated my perf/upstream_arm64_devel branch. May be you > >>> can give it a test for your test cases. > >I've checked out the new version of your branch, and it works great. > >I'll write a patch with additional test cases to go on top of your > >branch, as the tests there do not capture the bug I was fixing. > > That would be great. We can send them all together as V2. Did you send a v2? I've been holding off reviewing this, but I just want to make sure I didn't miss the update. Cheers, Will
I've send the test suite update to Pratyush via github yesterday. I presume he will come with a v2 soon. :) regards, pavel On 19 October 2016 at 13:07, Will Deacon <will.deacon@arm.com> wrote: > On Fri, Oct 14, 2016 at 08:45:43AM +0530, Pratyush Anand wrote: >> >> >> On Thursday 13 October 2016 10:33 PM, Pavel Labath wrote: >> >>I think, its easier to go with your implementation. So, I have taken >> >>> your patch and updated my perf/upstream_arm64_devel branch. May be you >> >>> can give it a test for your test cases. >> >I've checked out the new version of your branch, and it works great. >> >I'll write a patch with additional test cases to go on top of your >> >branch, as the tests there do not capture the bug I was fixing. >> >> That would be great. We can send them all together as V2. > > Did you send a v2? I've been holding off reviewing this, but I just want > to make sure I didn't miss the update. > > Cheers, > > Will
Hi Will, On Wednesday 19 October 2016 05:37 PM, Will Deacon wrote: > On Fri, Oct 14, 2016 at 08:45:43AM +0530, Pratyush Anand wrote: >> >> >> On Thursday 13 October 2016 10:33 PM, Pavel Labath wrote: >>>> I think, its easier to go with your implementation. So, I have taken >>>>> your patch and updated my perf/upstream_arm64_devel branch. May be you >>>>> can give it a test for your test cases. >>> I've checked out the new version of your branch, and it works great. >>> I'll write a patch with additional test cases to go on top of your >>> branch, as the tests there do not capture the bug I was fixing. >> >> That would be great. We can send them all together as V2. > > Did you send a v2? I've been holding off reviewing this, but I just want > to make sure I didn't miss the update. I just posted V2. http://www.spinics.net/lists/arm-kernel/msg537194.html ~Pratyush
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index 14562ae..3ce27ea 100644 --- a/arch/arm64/kernel/hw_breakpoint.c +++ b/arch/arm64/kernel/hw_breakpoint.c @@ -664,49 +664,63 @@ unlock: } NOKPROBE_SYMBOL(breakpoint_handler); +/* + * Arm64 hardware does not always report a watchpoint hit address that matches + * one of the watchpoints set. It can also report an address "near" the + * watchpoint if a single instruction access both watched and unwatched + * addresses. There is no straight-forward way, short of disassembling the + * offending instruction, to map that address back to the watchpoint. This + * function computes the distance of the memory access from the watchpoint as a + * heuristic for the likelyhood that a given access triggered the watchpoint. + * + * See Section D2.10.5 "Determining the memory location that caused a Watchpoint + * exception" of ARMv8 Architecture Reference Manual for details. + * + * The function returns the distance of the address from the bytes watched by + * the watchpoint. In case of an exact match, it returns 0. + */ +static u64 get_distance_from_watchpoint(unsigned long addr, int i, + struct arch_hw_breakpoint *info) +{ + u64 wp_low, wp_high; + int first_bit; + + first_bit = ffs(info->ctrl.len); + if (first_bit == 0) + return -1; + + wp_low = info->address + first_bit - 1; + wp_high = info->address + fls(info->ctrl.len) - 1; + if (addr < wp_low) + return wp_low - addr; + else if (addr > wp_high) + return addr - wp_high; + else + return 0; + +} + static int watchpoint_handler(unsigned long addr, unsigned int esr, struct pt_regs *regs) { - int i, step = 0, *kernel_step, access; - u32 ctrl_reg; - u64 val, alignment_mask; + int i, step = 0, *kernel_step, access, closest_match = 0; + u64 min_dist = -1, dist; struct perf_event *wp, **slots; struct debug_info *debug_info; struct arch_hw_breakpoint *info; - struct arch_hw_breakpoint_ctrl ctrl; slots = this_cpu_ptr(wp_on_reg); debug_info = ¤t->thread.debug; + /* + * Find all watchpoints that match the reported address. If no exact + * match is found. Attribute the hit to the closest watchpoint. + */ + rcu_read_lock(); for (i = 0; i < core_num_wrps; ++i) { - rcu_read_lock(); - wp = slots[i]; - if (wp == NULL) - goto unlock; - - info = counter_arch_bp(wp); - /* AArch32 watchpoints are either 4 or 8 bytes aligned. */ - if (is_compat_task()) { - if (info->ctrl.len == ARM_BREAKPOINT_LEN_8) - alignment_mask = 0x7; - else - alignment_mask = 0x3; - } else { - alignment_mask = 0x7; - } - - /* Check if the watchpoint value matches. */ - val = read_wb_reg(AARCH64_DBG_REG_WVR, i); - if (val != (addr & ~alignment_mask)) - goto unlock; - - /* Possible match, check the byte address select to confirm. */ - ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i); - decode_ctrl_reg(ctrl_reg, &ctrl); - if (!((1 << (addr & alignment_mask)) & ctrl.len)) - goto unlock; + continue; /* * Check that the access type matches. @@ -715,7 +729,18 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr, access = (esr & AARCH64_ESR_ACCESS_MASK) ? HW_BREAKPOINT_W : HW_BREAKPOINT_R; if (!(access & hw_breakpoint_type(wp))) - goto unlock; + continue; + + info = counter_arch_bp(wp); + + dist = get_distance_from_watchpoint(addr, i, info); + if (dist < min_dist) { + min_dist = dist; + closest_match = i; + } + /* Is this an exact match? */ + if (dist != 0) + continue; info->trigger = addr; perf_bp_event(wp, regs); @@ -723,10 +748,15 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr, /* Do we need to handle the stepping? */ if (is_default_overflow_handler(wp)) step = 1; - -unlock: - rcu_read_unlock(); } + if (min_dist > 0 && min_dist != -1) { + /* No exact match found. */ + wp = slots[closest_match]; + info = counter_arch_bp(wp); + info->trigger = addr; + perf_bp_event(wp, regs); + } + rcu_read_unlock(); if (!step) return 0;
Arm64 hardware does not always report a watchpoint hit address that matches one of the watchpoints set. It can also report an address "near" the watchpoint if a single instruction access both watched and unwatched addresses. There is no straight-forward way, short of disassembling the offending instruction, to map that address back to the watchpoint. Previously, when the hardware reported a watchpoint hit on an address that did not match our watchpoint (this happens in case of instructions which access large chunks of memory such as "stp") the process would enter a loop where we would be continually resuming it (because we did not recognise that watchpoint hit) and it would keep hitting the watchpoint again and again. The tracing process would never get notified of the watchpoint hit. This commit fixes the problem by looking at the watchpoints near the address reported by the hardware. If the address does not exactly match one of the watchpoints we have set, it attributes the hit to the nearest watchpoint we have. This heuristic is a bit dodgy, but I don't think we can do much more, given the hardware limitations. Signed-off-by: Pavel Labath <labath@google.com> --- arch/arm64/kernel/hw_breakpoint.c | 98 +++++++++++++++++++++++++-------------- 1 file changed, 64 insertions(+), 34 deletions(-)