diff mbox series

[V12,08/10] arm64/perf: Add struct brbe_regset helper functions

Message ID 20230615133239.442736-9-anshuman.khandual@arm.com (mailing list archive)
State New, archived
Headers show
Series arm64/perf: Enable branch stack sampling | expand

Commit Message

Anshuman Khandual June 15, 2023, 1:32 p.m. UTC
The primary abstraction level for fetching branch records from BRBE HW has
been changed as 'struct brbe_regset', which contains storage for all three
BRBE registers i.e BRBSRC, BRBTGT, BRBINF. Whether branch record processing
happens in the task sched out path, or in the PMU IRQ handling path, these
registers need to be extracted from the HW. Afterwards both live and stored
sets need to be stitched together to create final branch records set. This
adds required helper functions for such operations.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Tested-by: James Clark <james.clark@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/arm_brbe.c | 127 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Comments

Mark Rutland June 21, 2023, 1:15 p.m. UTC | #1
Hi Anshuman,

Thanks, this is looking much better; I just a have a couple of minor comments.

With those fixed up:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

On Thu, Jun 15, 2023 at 07:02:37PM +0530, Anshuman Khandual wrote:
> The primary abstraction level for fetching branch records from BRBE HW has
> been changed as 'struct brbe_regset', which contains storage for all three
> BRBE registers i.e BRBSRC, BRBTGT, BRBINF. Whether branch record processing
> happens in the task sched out path, or in the PMU IRQ handling path, these
> registers need to be extracted from the HW. Afterwards both live and stored
> sets need to be stitched together to create final branch records set. This
> adds required helper functions for such operations.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Tested-by: James Clark <james.clark@arm.com>
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  drivers/perf/arm_brbe.c | 127 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 127 insertions(+)
> 
> diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
> index 4729cb49282b..f6693699fade 100644
> --- a/drivers/perf/arm_brbe.c
> +++ b/drivers/perf/arm_brbe.c
> @@ -44,6 +44,133 @@ static void select_brbe_bank(int bank)
>  	isb();
>  }
>  
> +static bool __read_brbe_regset(struct brbe_regset *entry, int idx)
> +{
> +	entry->brbinf = get_brbinf_reg(idx);
> +
> +	/*
> +	 * There are no valid entries anymore on the buffer.
> +	 * Abort the branch record processing to save some
> +	 * cycles and also reduce the capture/process load
> +	 * for the user space as well.
> +	 */

This comment refers to the process of handling multiple entries, though it's
only handling one entry, and I don't think we need to mention saving cycles here.

Could we please delete this comment entirely? The comment above
capture_brbe_regset() already explains that we read until the first invalid
entry.

> +	if (brbe_invalid(entry->brbinf))
> +		return false;
> +
> +	entry->brbsrc = get_brbsrc_reg(idx);
> +	entry->brbtgt = get_brbtgt_reg(idx);
> +	return true;
> +}
> +
> +/*
> + * This scans over BRBE register banks and captures individual branch records
> + * [BRBSRC, BRBTGT, BRBINF] into a pre-allocated 'struct brbe_regset' buffer,
> + * until an invalid one gets encountered. The caller for this function needs
> + * to ensure BRBE is an appropriate state before the records can be captured.
> + */

Could we simplify this to:

/*
 * Read all BRBE entries in HW until the first invalid entry.
 *
 * The caller must ensure that the BRBE is not concurrently modifying these
 * entries.
 */

> +static int capture_brbe_regset(int nr_hw_entries, struct brbe_regset *buf)
> +{
> +	int idx = 0;
> +
> +	select_brbe_bank(BRBE_BANK_IDX_0);
> +	while (idx < nr_hw_entries && idx < BRBE_BANK0_IDX_MAX) {
> +		if (!__read_brbe_regset(&buf[idx], idx))
> +			return idx;
> +		idx++;
> +	}
> +
> +	select_brbe_bank(BRBE_BANK_IDX_1);
> +	while (idx < nr_hw_entries && idx < BRBE_BANK1_IDX_MAX) {
> +		if (!__read_brbe_regset(&buf[idx], idx))
> +			return idx;
> +		idx++;
> +	}
> +	return idx;
> +}
> +
> +/*
> + * This function concatenates branch records from stored and live buffer
> + * up to maximum nr_max records and the stored buffer holds the resultant
> + * buffer. The concatenated buffer contains all the branch records from
> + * the live buffer but might contain some from stored buffer considering
> + * the maximum combined length does not exceed 'nr_max'.
> + *
> + *	Stored records	Live records
> + *	------------------------------------------------^
> + *	|	S0	|	L0	|	Newest	|
> + *	---------------------------------		|
> + *	|	S1	|	L1	|		|
> + *	---------------------------------		|
> + *	|	S2	|	L2	|		|
> + *	---------------------------------		|
> + *	|	S3	|	L3	|		|
> + *	---------------------------------		|
> + *	|	S4	|	L4	|		nr_max
> + *	---------------------------------		|
> + *	|		|	L5	|		|
> + *	---------------------------------		|
> + *	|		|	L6	|		|
> + *	---------------------------------		|
> + *	|		|	L7	|		|
> + *	---------------------------------		|
> + *	|		|		|		|
> + *	---------------------------------		|
> + *	|		|		|	Oldest	|
> + *	------------------------------------------------V
> + *
> + *
> + * S0 is the newest in the stored records, where as L7 is the oldest in
> + * the live records. Unless the live buffer is detected as being full
> + * thus potentially dropping off some older records, L7 and S0 records
> + * are contiguous in time for a user task context. The stitched buffer
> + * here represents maximum possible branch records, contiguous in time.
> + *
> + *	Stored records  Live records
> + *	------------------------------------------------^
> + *	|	L0	|	L0	|	Newest	|
> + *	---------------------------------		|
> + *	|	L0	|	L1	|		|
> + *	---------------------------------		|
> + *	|	L2	|	L2	|		|
> + *	---------------------------------		|
> + *	|	L3	|	L3	|		|
> + *	---------------------------------		|
> + *	|	L4	|	L4	|	      nr_max
> + *	---------------------------------		|
> + *	|	L5	|	L5	|		|
> + *	---------------------------------		|
> + *	|	L6	|	L6	|		|
> + *	---------------------------------		|
> + *	|	L7	|	L7	|		|
> + *	---------------------------------		|
> + *	|	S0	|		|		|
> + *	---------------------------------		|
> + *	|	S1	|		|    Oldest	|
> + *	------------------------------------------------V
> + *	|	S2	| <----|
> + *	-----------------      |
> + *	|	S3	| <----| Dropped off after nr_max
> + *	-----------------      |
> + *	|	S4	| <----|
> + *	-----------------
> + */
> +static int stitch_stored_live_entries(struct brbe_regset *stored,
> +				      struct brbe_regset *live,
> +				      int nr_stored, int nr_live,
> +				      int nr_max)
> +{
> +	int nr_move = min(nr_stored, nr_max - nr_live);
> +
> +	/* Move the tail of the buffer to make room for the new entries */
> +	memmove(&stored[nr_live], &stored[0], nr_move * sizeof(*stored));
> +
> +	/* Copy the new entries into the head of the buffer */
> +	memcpy(&stored[0], &live[0], nr_live * sizeof(*stored));
> +
> +	/* Return the number of entries in the stitched buffer */
> +	return min(nr_live + nr_stored, nr_max);
> +}
> +
>  /*
>   * Generic perf branch filters supported on BRBE
>   *
> -- 
> 2.25.1
>
Anshuman Khandual June 22, 2023, 2:07 a.m. UTC | #2
On 6/21/23 18:45, Mark Rutland wrote:
> Hi Anshuman,
> 
> Thanks, this is looking much better; I just a have a couple of minor comments.
> 
> With those fixed up:
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> 
> Mark.
> 
> On Thu, Jun 15, 2023 at 07:02:37PM +0530, Anshuman Khandual wrote:
>> The primary abstraction level for fetching branch records from BRBE HW has
>> been changed as 'struct brbe_regset', which contains storage for all three
>> BRBE registers i.e BRBSRC, BRBTGT, BRBINF. Whether branch record processing
>> happens in the task sched out path, or in the PMU IRQ handling path, these
>> registers need to be extracted from the HW. Afterwards both live and stored
>> sets need to be stitched together to create final branch records set. This
>> adds required helper functions for such operations.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Tested-by: James Clark <james.clark@arm.com>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>  drivers/perf/arm_brbe.c | 127 ++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 127 insertions(+)
>>
>> diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
>> index 4729cb49282b..f6693699fade 100644
>> --- a/drivers/perf/arm_brbe.c
>> +++ b/drivers/perf/arm_brbe.c
>> @@ -44,6 +44,133 @@ static void select_brbe_bank(int bank)
>>  	isb();
>>  }
>>  
>> +static bool __read_brbe_regset(struct brbe_regset *entry, int idx)
>> +{
>> +	entry->brbinf = get_brbinf_reg(idx);
>> +
>> +	/*
>> +	 * There are no valid entries anymore on the buffer.
>> +	 * Abort the branch record processing to save some
>> +	 * cycles and also reduce the capture/process load
>> +	 * for the user space as well.
>> +	 */
> 
> This comment refers to the process of handling multiple entries, though it's
> only handling one entry, and I don't think we need to mention saving cycles here.
> 
> Could we please delete this comment entirely? The comment above
> capture_brbe_regset() already explains that we read until the first invalid
> entry.

Sure, will drop the comment.

> 
>> +	if (brbe_invalid(entry->brbinf))
>> +		return false;
>> +
>> +	entry->brbsrc = get_brbsrc_reg(idx);
>> +	entry->brbtgt = get_brbtgt_reg(idx);
>> +	return true;
>> +}
>> +
>> +/*
>> + * This scans over BRBE register banks and captures individual branch records
>> + * [BRBSRC, BRBTGT, BRBINF] into a pre-allocated 'struct brbe_regset' buffer,
>> + * until an invalid one gets encountered. The caller for this function needs
>> + * to ensure BRBE is an appropriate state before the records can be captured.
>> + */
> 
> Could we simplify this to:
> 
> /*
>  * Read all BRBE entries in HW until the first invalid entry.
>  *
>  * The caller must ensure that the BRBE is not concurrently modifying these
>  * entries.
>  */

Okay, will change the comment as suggested.
diff mbox series

Patch

diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
index 4729cb49282b..f6693699fade 100644
--- a/drivers/perf/arm_brbe.c
+++ b/drivers/perf/arm_brbe.c
@@ -44,6 +44,133 @@  static void select_brbe_bank(int bank)
 	isb();
 }
 
+static bool __read_brbe_regset(struct brbe_regset *entry, int idx)
+{
+	entry->brbinf = get_brbinf_reg(idx);
+
+	/*
+	 * There are no valid entries anymore on the buffer.
+	 * Abort the branch record processing to save some
+	 * cycles and also reduce the capture/process load
+	 * for the user space as well.
+	 */
+	if (brbe_invalid(entry->brbinf))
+		return false;
+
+	entry->brbsrc = get_brbsrc_reg(idx);
+	entry->brbtgt = get_brbtgt_reg(idx);
+	return true;
+}
+
+/*
+ * This scans over BRBE register banks and captures individual branch records
+ * [BRBSRC, BRBTGT, BRBINF] into a pre-allocated 'struct brbe_regset' buffer,
+ * until an invalid one gets encountered. The caller for this function needs
+ * to ensure BRBE is an appropriate state before the records can be captured.
+ */
+static int capture_brbe_regset(int nr_hw_entries, struct brbe_regset *buf)
+{
+	int idx = 0;
+
+	select_brbe_bank(BRBE_BANK_IDX_0);
+	while (idx < nr_hw_entries && idx < BRBE_BANK0_IDX_MAX) {
+		if (!__read_brbe_regset(&buf[idx], idx))
+			return idx;
+		idx++;
+	}
+
+	select_brbe_bank(BRBE_BANK_IDX_1);
+	while (idx < nr_hw_entries && idx < BRBE_BANK1_IDX_MAX) {
+		if (!__read_brbe_regset(&buf[idx], idx))
+			return idx;
+		idx++;
+	}
+	return idx;
+}
+
+/*
+ * This function concatenates branch records from stored and live buffer
+ * up to maximum nr_max records and the stored buffer holds the resultant
+ * buffer. The concatenated buffer contains all the branch records from
+ * the live buffer but might contain some from stored buffer considering
+ * the maximum combined length does not exceed 'nr_max'.
+ *
+ *	Stored records	Live records
+ *	------------------------------------------------^
+ *	|	S0	|	L0	|	Newest	|
+ *	---------------------------------		|
+ *	|	S1	|	L1	|		|
+ *	---------------------------------		|
+ *	|	S2	|	L2	|		|
+ *	---------------------------------		|
+ *	|	S3	|	L3	|		|
+ *	---------------------------------		|
+ *	|	S4	|	L4	|		nr_max
+ *	---------------------------------		|
+ *	|		|	L5	|		|
+ *	---------------------------------		|
+ *	|		|	L6	|		|
+ *	---------------------------------		|
+ *	|		|	L7	|		|
+ *	---------------------------------		|
+ *	|		|		|		|
+ *	---------------------------------		|
+ *	|		|		|	Oldest	|
+ *	------------------------------------------------V
+ *
+ *
+ * S0 is the newest in the stored records, where as L7 is the oldest in
+ * the live records. Unless the live buffer is detected as being full
+ * thus potentially dropping off some older records, L7 and S0 records
+ * are contiguous in time for a user task context. The stitched buffer
+ * here represents maximum possible branch records, contiguous in time.
+ *
+ *	Stored records  Live records
+ *	------------------------------------------------^
+ *	|	L0	|	L0	|	Newest	|
+ *	---------------------------------		|
+ *	|	L0	|	L1	|		|
+ *	---------------------------------		|
+ *	|	L2	|	L2	|		|
+ *	---------------------------------		|
+ *	|	L3	|	L3	|		|
+ *	---------------------------------		|
+ *	|	L4	|	L4	|	      nr_max
+ *	---------------------------------		|
+ *	|	L5	|	L5	|		|
+ *	---------------------------------		|
+ *	|	L6	|	L6	|		|
+ *	---------------------------------		|
+ *	|	L7	|	L7	|		|
+ *	---------------------------------		|
+ *	|	S0	|		|		|
+ *	---------------------------------		|
+ *	|	S1	|		|    Oldest	|
+ *	------------------------------------------------V
+ *	|	S2	| <----|
+ *	-----------------      |
+ *	|	S3	| <----| Dropped off after nr_max
+ *	-----------------      |
+ *	|	S4	| <----|
+ *	-----------------
+ */
+static int stitch_stored_live_entries(struct brbe_regset *stored,
+				      struct brbe_regset *live,
+				      int nr_stored, int nr_live,
+				      int nr_max)
+{
+	int nr_move = min(nr_stored, nr_max - nr_live);
+
+	/* Move the tail of the buffer to make room for the new entries */
+	memmove(&stored[nr_live], &stored[0], nr_move * sizeof(*stored));
+
+	/* Copy the new entries into the head of the buffer */
+	memcpy(&stored[0], &live[0], nr_live * sizeof(*stored));
+
+	/* Return the number of entries in the stitched buffer */
+	return min(nr_live + nr_stored, nr_max);
+}
+
 /*
  * Generic perf branch filters supported on BRBE
  *