Message ID | 20151221105528.GA19617@e106634-lin.cambridge.arm.com (mailing list archive)
---|---
State | New, archived
On Mon, Dec 21, 2015 at 10:55:29AM +0000, Suzuki K. Poulose wrote:
> Thanks for that hint. Here is what I came up with. We don't reschedule
> the events; all we need to do is group the writes to the counters. Hence
> we could as well add a flag for those events which need programming
> and perform the write in pmu::pmu_enable().

I'm still somewhat confused...

> Grouping the writes to counters can amortise the cost of the operation
> on PMUs where it is expensive (e.g., CCI-500).

This rationale makes me think you want to reduce the number of counter
writes, not batch them per se.

So why are you unconditionally writing all counters, instead of only
those that changed?
On 05/01/16 13:37, Peter Zijlstra wrote:
> On Mon, Dec 21, 2015 at 10:55:29AM +0000, Suzuki K. Poulose wrote:
>> Thanks for that hint. Here is what I came up with. We don't reschedule
>> the events; all we need to do is group the writes to the counters. Hence
>> we could as well add a flag for those events which need programming
>> and perform the write in pmu::pmu_enable().
>
> I'm still somewhat confused...
>
>> Grouping the writes to counters can amortise the cost of the operation
>> on PMUs where it is expensive (e.g., CCI-500).
>
> This rationale makes me think you want to reduce the number of counter
> writes, not batch them per se.
>
> So why are you unconditionally writing all counters, instead of only
> those that changed?

The ARM CCI PMU reprograms all the counters with a specific value (2^31)
to account for high interrupt latencies when recording the counters that
overflowed. So pmu_stop() updates the counter, and pmu_start() always
resets the counter to the above value.

Now, writing to a single counter requires:

1) Stopping and disabling all the counters in HW (so that step 3 doesn't
   interfere with the other counters).
2) Programming the target counter with an invalid event and enabling the
   counter.
3) Enabling the PMU and then writing to the counter.
4) Resetting everything back to normal.

So the approach here is to delay the writes to the counters as much as
possible and batch them, so that we don't have to repeat steps 1 & 4 for
every single counter.

Does it help?

Thanks,
Suzuki
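To make the cost argument concrete, here is a rough kernel-style C sketch contrasting the two approaches. The helpers cci_quiesce_all_counters(), cci_write_one_counter() and cci_restore_all_counters() are hypothetical stand-ins for steps 1-4 above, not functions from arm-cci.c:

/*
 * Illustrative sketch of the cost model behind batching.  Steps 1 & 4
 * (quiescing the whole PMU and restoring it) are expensive; steps 2 & 3
 * (programming and writing one counter) are cheap.  All helpers here are
 * hypothetical, not the real arm-cci.c API.
 */

/* Naive path: pay steps 1 & 4 once per counter that needs reprogramming. */
static void write_counters_one_by_one(unsigned long *mask, int num_cntrs,
				      u32 value)
{
	int i;

	for_each_set_bit(i, mask, num_cntrs) {
		cci_quiesce_all_counters();		/* step 1 */
		cci_write_one_counter(i, value);	/* steps 2 & 3 */
		cci_restore_all_counters();		/* step 4 */
	}
}

/* Batched path: pay steps 1 & 4 once, however many counters changed. */
static void write_counters_batched(unsigned long *mask, int num_cntrs,
				   u32 value)
{
	int i;

	cci_quiesce_all_counters();			/* step 1, once */
	for_each_set_bit(i, mask, num_cntrs)
		cci_write_one_counter(i, value);	/* steps 2 & 3 */
	cci_restore_all_counters();			/* step 4, once */
}

With N counters to reprogram, the naive path performs the expensive quiesce/restore pair N times; the batched path performs it exactly once.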
On Tue, Jan 05, 2016 at 01:43:30PM +0000, Suzuki K. Poulose wrote:
> On 05/01/16 13:37, Peter Zijlstra wrote:
> >On Mon, Dec 21, 2015 at 10:55:29AM +0000, Suzuki K. Poulose wrote:
> >>Thanks for that hint. Here is what I came up with. We don't reschedule
> >>the events; all we need to do is group the writes to the counters. Hence
> >>we could as well add a flag for those events which need programming
> >>and perform the write in pmu::pmu_enable().
> >
> >I'm still somewhat confused...
> >
> >>Grouping the writes to counters can amortise the cost of the operation
> >>on PMUs where it is expensive (e.g., CCI-500).
> >
> >This rationale makes me think you want to reduce the number of counter
> >writes, not batch them per se.
> >
> >So why are you unconditionally writing all counters, instead of only
> >those that changed?
>
> The ARM CCI PMU reprograms all the counters with a specific value (2^31)
> to account for high interrupt latencies when recording the counters that
> overflowed. So pmu_stop() updates the counter, and pmu_start() always
> resets the counter to the above value.
>
> Now, writing to a single counter requires:
>
> 1) Stopping and disabling all the counters in HW (so that step 3 doesn't
>    interfere with the other counters).
> 2) Programming the target counter with an invalid event and enabling the
>    counter.
> 3) Enabling the PMU and then writing to the counter.
> 4) Resetting everything back to normal.
>
> So the approach here is to delay the writes to the counters as much as
> possible and batch them, so that we don't have to repeat steps 1 & 4 for
> every single counter.
>
> Does it help?

Yes, thanks!
diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index 0189f3a..c768ee4 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -916,6 +916,40 @@ static void hw_perf_event_destroy(struct perf_event *event)
 	}
 }
 
+/*
+ * Program the CCI PMU counters which have PERF_HES_ARCH set
+ * with the event period and mark them ready before we enable
+ * PMU.
+ */
+void cci_pmu_update_counters(struct cci_pmu *cci_pmu)
+{
+	int i;
+	unsigned long mask[BITS_TO_LONGS(cci_pmu->num_cntrs)];
+
+	memset(mask, 0, BITS_TO_LONGS(cci_pmu->num_cntrs) * sizeof(unsigned long));
+
+	for_each_set_bit(i, cci_pmu->hw_events.used_mask, cci_pmu->num_cntrs) {
+		struct hw_perf_event *hwe;
+
+		if (!cci_pmu->hw_events.events[i]) {
+			WARN_ON(1);
+			continue;
+		}
+
+		hwe = &cci_pmu->hw_events.events[i]->hw;
+		/* Leave the events which are not counting */
+		if (hwe->state & PERF_HES_STOPPED)
+			continue;
+		if (hwe->state & PERF_HES_ARCH) {
+			set_bit(i, mask);
+			hwe->state &= ~PERF_HES_ARCH;
+			local64_set(&hwe->prev_count, CCI_CNTR_PERIOD);
+		}
+	}
+
+	pmu_write_counters(cci_pmu, mask, CCI_CNTR_PERIOD);
+}
+
 static void cci_pmu_enable(struct pmu *pmu)
 {
 	struct cci_pmu *cci_pmu = to_cci_pmu(pmu);
@@ -927,6 +961,7 @@ static void cci_pmu_enable(struct pmu *pmu)
 		return;
 
 	raw_spin_lock_irqsave(&hw_events->pmu_lock, flags);
+	cci_pmu_update_counters(cci_pmu);
 	__cci_pmu_enable();
 	raw_spin_unlock_irqrestore(&hw_events->pmu_lock, flags);
 
@@ -980,8 +1015,11 @@ static void cci_pmu_start(struct perf_event *event, int pmu_flags)
 	/* Configure the counter unless you are counting a fixed event */
 	if (!pmu_fixed_hw_idx(cci_pmu, idx))
 		pmu_set_event(cci_pmu, idx, hwc->config_base);
-
-	pmu_event_set_period(event);
+	/*
+	 * Mark this counter, so that we can program the
+	 * counter with the event_period. see cci_pmu_enable()
+	 */
+	hwc->state = PERF_HES_ARCH;
 	pmu_enable_counter(cci_pmu, idx);
 
 	raw_spin_unlock_irqrestore(&hw_events->pmu_lock, flags);
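For orientation when reading the hunks above, the sketch below models the callback ordering the patch depends on: the perf core brackets event scheduling with pmu_disable()/pmu_enable(), so the per-event counter writes deferred in cci_pmu_start() are flushed in one batch from cci_pmu_enable(). This is an illustrative model using the generic struct pmu callbacks, not code from kernel/events/core.c:

/*
 * Rough model of the scheduling sequence.  The perf core wraps event
 * scheduling in pmu_disable()/pmu_enable(), so every counter started in
 * between is programmed in a single batch from pmu_enable().
 */
static void sched_in_events_model(struct pmu *pmu,
				  struct perf_event **events, int n)
{
	int i;

	pmu->pmu_disable(pmu);		/* -> cci_pmu_disable() */

	for (i = 0; i < n; i++) {
		/*
		 * -> cci_pmu_start(): only marks hwc->state with
		 * PERF_HES_ARCH; the expensive counter write is deferred.
		 */
		pmu->add(events[i], PERF_EF_START);
	}

	/*
	 * -> cci_pmu_enable(): cci_pmu_update_counters() writes all counters
	 * marked PERF_HES_ARCH in one batch, then __cci_pmu_enable().
	 */
	pmu->pmu_enable(pmu);
}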