Message ID | 20231114051329.327572-1-anshuman.khandual@arm.com (mailing list archive) |
---|---|
Series | arm64/perf: Enable branch stack sampling |
On 14/11/2023 05:13, Anshuman Khandual wrote:
> This series enables perf branch stack sampling support on arm64 platform
> via a new arch feature called Branch Record Buffer Extension (BRBE). All
> the relevant register definitions could be accessed here.
>
[...]
>
> --------------------------- Virtualisation support ------------------------
>
> - Branch stack sampling is not currently supported inside the guest (TODO)
>
>   - FEAT_BRBE advertised as absent via clearing ID_AA64DFR0_EL1.BRBE
>   - Future support in guest requires emulating FEAT_BRBE

If you never add support for the host looking into a guest, and you save
and restore all the BRBINF[n] registers, I think you might be able to
just let the guest do whatever it wants with BRBE and not trap and
emulate it? Maybe there is some edge case why that wouldn't work, but
it's worth thinking about.

For BRBE specifically I don't see much of a use case for hosts looking
into a guest, at least not like with PMU counters.

>
> - Branch stack sampling the guest is not supported in the host (TODO)
>
>   - Tracing the guest with event->attr.exclude_guest = 0
>   - There are multiple challenges involved regarding mixing events
>     with mismatched branch_sample_type and exclude_guest and passing
>     on captured BRBE records to intended events during PMU interrupt
>
> - Guest access for BRBE registers and instructions has been blocked
>
> - BRBE state save is not required for VHE host (EL2) guest (EL1) transition
>
> - BRBE state is saved for NVHE host (EL1) guest (EL1) transition
>
> -------------------------------- Testing ---------------------------------
>
> - Cross compiled for both arm64 and arm32 platforms
> - Passes all branch tests with 'perf test branch' on arm64
>
> -------------------------------- Questions -------------------------------
>
> - Instead of configuring the BRBE HW with branch_sample_type from the last
>   event to be added on the PMU as proposed, could those be merged together
>   e.g all privilege requests ORed, to form a common BRBE configuration and
>   all events get branch records after a PMU interrupt ?
>
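
For illustration, the merge raised in the question above could look roughly like the following. This is a minimal sketch, not code from the series: `merge_privilege()` and `event_wants_record()` are hypothetical helpers, and the `BR_*` constants are local stand-ins that mirror the `PERF_SAMPLE_BRANCH_USER`, `PERF_SAMPLE_BRANCH_KERNEL` and `PERF_SAMPLE_BRANCH_HV` bits from the perf UAPI header. It assumes each captured record carries the privilege level it was taken at, so that records can be filtered per event after a PMU interrupt.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring the privilege bits of branch_sample_type. */
#define BR_USER    (1ULL << 0)  /* PERF_SAMPLE_BRANCH_USER */
#define BR_KERNEL  (1ULL << 1)  /* PERF_SAMPLE_BRANCH_KERNEL */
#define BR_HV      (1ULL << 2)  /* PERF_SAMPLE_BRANCH_HV */
#define BR_PLM_ALL (BR_USER | BR_KERNEL | BR_HV)

/*
 * Hypothetical helper: OR together the privilege requests of all events
 * on the PMU to form one common hardware configuration.
 */
static uint64_t merge_privilege(const uint64_t *sample_types, size_t n)
{
	uint64_t merged = 0;

	for (size_t i = 0; i < n; i++)
		merged |= sample_types[i] & BR_PLM_ALL;
	return merged;
}

/*
 * Hypothetical helper: a record captured at privilege 'rec_priv' is
 * passed to an event only if that event asked for that privilege.
 * Returns 1 if the event should receive the record, 0 otherwise.
 */
static int event_wants_record(uint64_t event_sample_type, uint64_t rec_priv)
{
	return (event_sample_type & rec_priv) != 0;
}
```

Merging widens what the hardware captures, so the per-event filter is what would keep a user-only event from being handed kernel branches.
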
On 11/14/23 22:47, James Clark wrote:
>
>
> On 14/11/2023 05:13, Anshuman Khandual wrote:
>> This series enables perf branch stack sampling support on arm64 platform
>> via a new arch feature called Branch Record Buffer Extension (BRBE). All
>> the relevant register definitions could be accessed here.
>>
> [...]
>>
>> --------------------------- Virtualisation support ------------------------
>>
>> - Branch stack sampling is not currently supported inside the guest (TODO)
>>
>>   - FEAT_BRBE advertised as absent via clearing ID_AA64DFR0_EL1.BRBE
>>   - Future support in guest requires emulating FEAT_BRBE
>
> If you never add support for the host looking into a guest, and you save

But that seems to be a valid use case though. Is there a particular concern
why such capability should or could not be added for BRBE ?

> and restore all the BRBINF[n] registers, I think you might be able to
> just let the guest do whatever it wants with BRBE and not trap and
> emulate it? Maybe there is some edge case why that wouldn't work, but
> it's worth thinking about.

Right, in case host tracing of the guest is not supported (although still
wondering why it should not be), saving and restoring complete BRBE state
i.e all system registers that can be accessed from guest, would let guest
do what ever it wants with BRBE without requiring the trap-emulate model.

>
> For BRBE specifically I don't see much of a use case for hosts looking
> into a guest, at least not like with PMU counters.

But how is it any different from normal PMU counters ? Branch records do
provide statistical insights into hot sections in the guest.
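
The save/restore approach discussed above can be modelled in a few lines. This is a sketch under stated assumptions, not the KVM implementation: `struct brbe_state`, `hw`, `guest_enter()` and `guest_exit()` are hypothetical, the register bank is simulated in memory rather than accessed through system register reads and writes, and `NR_RECORDS` is an arbitrary small number chosen for the example.

```c
#include <stdint.h>

#define NR_RECORDS 4	/* arbitrary for this sketch */

/* Hypothetical snapshot of every BRBE register a guest could touch. */
struct brbe_state {
	uint64_t ctrl;			/* stands in for the control registers */
	uint64_t src[NR_RECORDS];	/* branch source addresses */
	uint64_t tgt[NR_RECORDS];	/* branch target addresses */
	uint64_t inf[NR_RECORDS];	/* per-record info words */
};

/* Simulated hardware bank; real code would use sysreg accessors. */
static struct brbe_state hw;

static void brbe_save(struct brbe_state *ctx)
{
	*ctx = hw;
}

static void brbe_restore(const struct brbe_state *ctx)
{
	hw = *ctx;
}

/* Host -> guest: stash the host's state, load the guest's. */
static void guest_enter(struct brbe_state *host, const struct brbe_state *guest)
{
	brbe_save(host);
	brbe_restore(guest);
}

/* Guest -> host: stash the guest's state, reload the host's. */
static void guest_exit(struct brbe_state *guest, const struct brbe_state *host)
{
	brbe_save(guest);
	brbe_restore(host);
}
```

Because the whole bank is swapped on every transition, neither side sees the other's records, which is the property that would let the guest own BRBE without the trap-emulate model.
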
On 22/11/2023 05:15, Anshuman Khandual wrote:
> On 11/14/23 22:47, James Clark wrote:
>>
>>
>> On 14/11/2023 05:13, Anshuman Khandual wrote:
>>> This series enables perf branch stack sampling support on arm64 platform
>>> via a new arch feature called Branch Record Buffer Extension (BRBE). All
>>> the relevant register definitions could be accessed here.
>>>
>> [...]
>>>
>>> --------------------------- Virtualisation support ------------------------
>>>
>>> - Branch stack sampling is not currently supported inside the guest (TODO)
>>>
>>>   - FEAT_BRBE advertised as absent via clearing ID_AA64DFR0_EL1.BRBE
>>>   - Future support in guest requires emulating FEAT_BRBE
>>
>> If you never add support for the host looking into a guest, and you save
>
> But that seems to be a valid use case though. Is there a particular concern
> why such capability should or could not be added for BRBE ?
>

What's the use case exactly? You wouldn't even have the binary mappings
of the guest without running perf inside the guest too, and at that
point you might as well have just done the BRBE recording from inside
the guest.

My particular concern is only about the effort required to implement it,
vs its usefulness. Not that we shouldn't ever implement the fully shared
BRBE between host and guest, we could always do it later. My idea was
just to get BRBE working inside of guests quicker.

>> and restore all the BRBINF[n] registers, I think you might be able to
>> just let the guest do whatever it wants with BRBE and not trap and
>> emulate it? Maybe there is some edge case why that wouldn't work, but
>> it's worth thinking about.
>
> Right, in case host tracing of the guest is not supported (although still
> wondering why it should not be), saving and restoring complete BRBE state
> i.e all system registers that can be accessed from guest, would let guest
> do what ever it wants with BRBE without requiring the trap-emulate model.
>
>>
>> For BRBE specifically I don't see much of a use case for hosts looking
>> into a guest, at least not like with PMU counters.
> But how is it any different from normal PMU counters ? Branch records do
> provide statistical insights into hot sections in the guest.
>

There is a big difference, PMU counters can be used to infer general
things about a system without any extra information. That's something
that could be used by a monitoring task or someone looking at a guest
running a known workload. But for BRBE you need the binaries, mappings,
scheduling events, thread switches etc to make any sense of the pointers
in the branch buffers, otherwise they're just random numbers from who
knows which process.
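
The point about needing mappings can be made concrete with a small sketch. `struct mapping` and `resolve()` are hypothetical names, and any address ranges used with them are made up for illustration; a real tool would build the equivalent table from the mmap and context switch information of the traced system.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mapped binary: [start, end). */
struct mapping {
	uint64_t start;
	uint64_t end;
	const char *name;
};

/*
 * Hypothetical lookup: return the name of the mapping that contains
 * 'addr', or NULL if no mapping is known for it. With an empty table
 * every branch address is unresolvable, which is the situation a host
 * is in when it has raw guest branch records but no guest mappings.
 */
static const char *resolve(const struct mapping *maps, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= maps[i].start && addr < maps[i].end)
			return maps[i].name;
	return NULL;
}
```

A counter value is meaningful on its own, whereas a branch source or target address only becomes meaningful once a lookup like this succeeds for the right process.
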