[v3,0/5] arm64: SPE ACPI enablement

Message ID 20190503232407.37195-1-jeremy.linton@arm.com (mailing list archive)
Series arm64: SPE ACPI enablement

Message

Jeremy Linton May 3, 2019, 11:24 p.m. UTC
This patch series enables the Arm Statistical Profiling
Extension (SPE) on ACPI platforms.

This is possible because ACPI 6.3 uses a previously
reserved field in the MADT to store the SPE interrupt
number, similarly to how the normal PMU is described.
If a consistent valid interrupt exists across all the
cores in the system, a platform device is registered.
That then triggers the SPE module, which runs as normal.
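
For illustration, the probe is roughly the sketch below (a simplified
sketch rather than the actual patch: the function name, the device
registration details and the error handling here are assumptions; patch 4
has the real code):

#include <linux/acpi.h>
#include <linux/cpumask.h>
#include <linux/ioport.h>
#include <linux/platform_device.h>
#include <asm/acpi.h>

/* Sketch only: register an SPE platform device from the ACPI 6.3 MADT. */
static void arm_spe_acpi_register_device(void)
{
	u16 gsi = 0;
	int cpu, irq;
	struct resource res;

	for_each_possible_cpu(cpu) {
		struct acpi_madt_generic_interrupt *gicc =
			acpi_cpu_get_madt_gicc(cpu);

		/* Pre-6.3 tables are too short to carry the SPE field */
		if (gicc->header.length <
		    offsetof(struct acpi_madt_generic_interrupt, spe_interrupt) +
		    sizeof(gicc->spe_interrupt))
			return;

		if (cpu == cpumask_first(cpu_possible_mask))
			gsi = gicc->spe_interrupt;
		else if (gicc->spe_interrupt != gsi)
			return;		/* interrupt must match on every core */
	}

	if (!gsi)
		return;			/* zero means SPE is not wired up */

	irq = acpi_register_gsi(NULL, gsi, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
	if (irq < 0)
		return;

	res = (struct resource)DEFINE_RES_IRQ(irq);
	platform_device_register_simple("arm,spe-v1", -1, &res, 1);
}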

We also add the ability to parse the PPTT for IDENTICAL
cores. We then use this to sanity-check the single SPE
device we create. This creates a bit of a problem with
respect to the specification, though: the specification
says that it's legal for multiple trees to exist in the
PPTT. We handle this fine, but in the case of multiple
trees the lack of a common node with IDENTICAL set
forces us to assume that there are multiple
non-IDENTICAL cores in the machine.
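
For illustration, the sanity check amounts to something like the sketch
below (hypothetical: the helper name is modelled on the existing
find_acpi_cpu_topology_*() family and the surrounding check is an
assumption, not the literal patch):

#include <linux/acpi.h>
#include <linux/cpumask.h>

/*
 * Sketch only: the helper name is assumed.  It is expected to return a
 * token identifying the nearest PPTT node with the ACPI 6.3 IDENTICAL
 * flag set, or a negative value if no such node covers the CPU.
 */
static bool all_cpus_identical(void)
{
	int cpu, token, first_token = -1;

	for_each_possible_cpu(cpu) {
		token = find_acpi_cpu_topology_hetero_id(cpu);
		if (token < 0)
			return false;	/* no PPTT, or no IDENTICAL node */
		if (first_token < 0)
			first_token = token;
		else if (token != first_token)
			return false;	/* separate trees: assume heterogeneous */
	}
	return true;
}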

v2->v3: Previously a function pointer was being used
	  to handle the more complex node checking
	  required by the IDENTICAL flag. This version
	  simply checks for the IDENTICAL flag and calls
	  flag_identical() to perform the revision
	  and next-node checks. (I think after reading
	  Raphael's comments for the Nth time, this is
	  actually what he was suggesting, which I
	  initially misinterpreted.)
	Modify the subject of the first patch so that it's
	  clear it's a capitalization change rather
	  than a logical C 'case' change.

v1->v2: Wrap the code which creates the SPE device in
	    a new CONFIG_ARM_SPE_ACPI ifdef.
	Move arm,spe-v1 device name into common header file
	Some comment/case sensitivity/function name changes.

Jeremy Linton (5):
  ACPI/PPTT: Trivial, change the capitalization of CPU
  ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
  ACPI/PPTT: Modify node flag detection to find last IDENTICAL
  arm_pmu: acpi: spe: Add initial MADT/SPE probing
  perf: arm_spe: Enable ACPI/Platform automatic module loading

 arch/arm64/include/asm/acpi.h |   3 +
 drivers/acpi/pptt.c           | 102 +++++++++++++++++++++++++---------
 drivers/perf/Kconfig          |   5 ++
 drivers/perf/arm_pmu_acpi.c   |  76 +++++++++++++++++++++++++
 drivers/perf/arm_spe_pmu.c    |  12 +++-
 include/linux/acpi.h          |   5 ++
 include/linux/perf/arm_pmu.h  |   2 +
 7 files changed, 176 insertions(+), 29 deletions(-)

Comments

Hanjun Guo May 4, 2019, 11:06 a.m. UTC | #1
Hi Jeremy, Mark,

On 2019/5/4 7:24, Jeremy Linton wrote:
> This patch series enables the Arm Statistical Profiling
> Extension (SPE) on ACPI platforms.
> 
> This is possible because ACPI 6.3 uses a previously
> reserved field in the MADT to store the SPE interrupt
> number, similarly to how the normal PMU is described.
> If a consistent valid interrupt exists across all the
> cores in the system, a platform device is registered.
> That then triggers the SPE module, which runs as normal.
> 
> We also add the ability to parse the PPTT for IDENTICAL
> cores. We then use this to sanity check the single SPE
> device we create. This creates a bit of a problem with
> respect to the specification though. The specification
> says that its legal for multiple tree's to exist in the
> PPTT. We handle this fine, but what happens in the
> case of multiple tree's is that the lack of a common
> node with IDENTICAL set forces us to assume that there
> are multiple non-IDENTICAL cores in the machine.

I applied this patch set on top of the latest mainline kernel and tested
it on D06, which has the SPE feature; the boot message shows it was
probed successfully:

arm_spe_pmu arm,spe-v1: probed for CPUs 0-95 [max_record_sz 128, align 4, features 0x7]

but when I test it with spe events such as

perf record -c 1024 -e arm_spe_0/branch_filter=0/ -o spe ls

it fails with:
failed to mmap with 12 (Cannot allocate memory),

Confirmed that patch [0] is merged and other perf events are working
fine.

I narrowed this issue down to mmap() failing to allocate 4M of memory in
the perf tool, but it seems to have no relationship with this SPE patch
set. I'm lost now; could you take a look please?

Thanks
Hanjun

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=528871b456026e6127d95b1b2bd8e3a003dc1614
Jeremy Linton May 7, 2019, 5:58 p.m. UTC | #2
Hi,

On 5/4/19 6:06 AM, Hanjun Guo wrote:
> Hi Jeremy, Mark,
> 
> On 2019/5/4 7:24, Jeremy Linton wrote:
>> This patch series enables the Arm Statistical Profiling
>> Extension (SPE) on ACPI platforms.
>>
>> This is possible because ACPI 6.3 uses a previously
>> reserved field in the MADT to store the SPE interrupt
>> number, similarly to how the normal PMU is described.
>> If a consistent valid interrupt exists across all the
>> cores in the system, a platform device is registered.
>> That then triggers the SPE module, which runs as normal.
>>
>> We also add the ability to parse the PPTT for IDENTICAL
>> cores. We then use this to sanity check the single SPE
>> device we create. This creates a bit of a problem with
>> respect to the specification though. The specification
>> says that its legal for multiple tree's to exist in the
>> PPTT. We handle this fine, but what happens in the
>> case of multiple tree's is that the lack of a common
>> node with IDENTICAL set forces us to assume that there
>> are multiple non-IDENTICAL cores in the machine.
> 
> Adding this patch set on top of latest mainline kernel,
> and tested on D06 which has the SPE feature, in boot message
> shows it was probed successfully:
> 
> arm_spe_pmu arm,spe-v1: probed for CPUs 0-95 [max_record_sz 128, align 4, features 0x7]
> 
> but when I test it with spe events such as
> 
> perf record -c 1024 -e arm_spe_0/branch_filter=0/ -o spe ls
> 
> it fails with:
> failed to mmap with 12 (Cannot allocate memory),
> 
> Confirmed that patch [0] is merged and other perf events are working
> fine.

It's pretty easy to get into the weeds with this driver; does it work
with examples like:

https://lkml.org/lkml/2018/1/14/122

> I narrowed this issue down that mmap() failed to alloc 4M memory
> in perf tool but seems have no relationship with this SPE patch set,
> then I'm lost, could you take look please?
> 
> Thanks
> Hanjun
> 
> [0]: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=528871b456026e6127d95b1b2bd8e3a003dc1614
>
Hanjun Guo May 8, 2019, 9:35 a.m. UTC | #3
+Cc Alexander.

On 2019/5/8 1:58, Jeremy Linton wrote:
> Hi,
> 
> On 5/4/19 6:06 AM, Hanjun Guo wrote:
>> Hi Jeremy, Mark,
>>
>> On 2019/5/4 7:24, Jeremy Linton wrote:
>>> This patch series enables the Arm Statistical Profiling
>>> Extension (SPE) on ACPI platforms.
>>>
>>> This is possible because ACPI 6.3 uses a previously
>>> reserved field in the MADT to store the SPE interrupt
>>> number, similarly to how the normal PMU is described.
>>> If a consistent valid interrupt exists across all the
>>> cores in the system, a platform device is registered.
>>> That then triggers the SPE module, which runs as normal.
>>>
>>> We also add the ability to parse the PPTT for IDENTICAL
>>> cores. We then use this to sanity check the single SPE
>>> device we create. This creates a bit of a problem with
>>> respect to the specification though. The specification
>>> says that its legal for multiple tree's to exist in the
>>> PPTT. We handle this fine, but what happens in the
>>> case of multiple tree's is that the lack of a common
>>> node with IDENTICAL set forces us to assume that there
>>> are multiple non-IDENTICAL cores in the machine.
>>
>> Adding this patch set on top of latest mainline kernel,
>> and tested on D06 which has the SPE feature, in boot message
>> shows it was probed successfully:
>>
>> arm_spe_pmu arm,spe-v1: probed for CPUs 0-95 [max_record_sz 128, align 4, features 0x7]
>>
>> but when I test it with spe events such as
>>
>> perf record -c 1024 -e arm_spe_0/branch_filter=0/ -o spe ls
>>
>> it fails with:
>> failed to mmap with 12 (Cannot allocate memory),
>>
>> Confirmed that patch [0] is merged and other perf events are working
>> fine.
> 
> Its pretty easy to get into the weeds with this driver, does it work with examples like:
> 
> https://lkml.org/lkml/2018/1/14/122

No, it doesn't work at all.

SPE works on 5.0 but not after 5.1-rc1; bisected to this commit:

5768402fd9c6 perf/ring_buffer: Use high order allocations for AUX buffers optimistically

CCed Alexander as well, as Alexander is the author of the above patch.

Thanks
Hanjun
Sudeep Holla May 8, 2019, 4:45 p.m. UTC | #4
On Sat, May 04, 2019 at 07:06:19PM +0800, Hanjun Guo wrote:
> Hi Jeremy, Mark,
>
> On 2019/5/4 7:24, Jeremy Linton wrote:
> > This patch series enables the Arm Statistical Profiling
> > Extension (SPE) on ACPI platforms.
> >
> > This is possible because ACPI 6.3 uses a previously
> > reserved field in the MADT to store the SPE interrupt
> > number, similarly to how the normal PMU is described.
> > If a consistent valid interrupt exists across all the
> > cores in the system, a platform device is registered.
> > That then triggers the SPE module, which runs as normal.
> >
> > We also add the ability to parse the PPTT for IDENTICAL
> > cores. We then use this to sanity check the single SPE
> > device we create. This creates a bit of a problem with
> > respect to the specification though. The specification
> > says that its legal for multiple tree's to exist in the
> > PPTT. We handle this fine, but what happens in the
> > case of multiple tree's is that the lack of a common
> > node with IDENTICAL set forces us to assume that there
> > are multiple non-IDENTICAL cores in the machine.
>
> Adding this patch set on top of latest mainline kernel,
> and tested on D06 which has the SPE feature, in boot message
> shows it was probed successfully:
>
> arm_spe_pmu arm,spe-v1: probed for CPUs 0-95 [max_record_sz 128, align 4, features 0x7]
>
> but when I test it with spe events such as
>
> perf record -c 1024 -e arm_spe_0/branch_filter=0/ -o spe ls
>
> it fails with:
> failed to mmap with 12 (Cannot allocate memory),
>
> Confirmed that patch [0] is merged and other perf events are working
> fine.
>
> I narrowed this issue down that mmap() failed to alloc 4M memory
> in perf tool but seems have no relationship with this SPE patch set,
> then I'm lost, could you take look please?
>

Thanks for pointing this out. I had last tested SPE only with v5.0 and
completely missed checking on v5.1. FWIW, I can reproduce this issue
on v5.1.

--
Regards,
Sudeep
Sudeep Holla May 8, 2019, 4:51 p.m. UTC | #5
On Wed, May 08, 2019 at 05:35:51PM +0800, Hanjun Guo wrote:
> +Cc Alexander.
>
> On 2019/5/8 1:58, Jeremy Linton wrote:
> > Hi,
> >
> > On 5/4/19 6:06 AM, Hanjun Guo wrote:
> >> Hi Jeremy, Mark,
> >>
> >> On 2019/5/4 7:24, Jeremy Linton wrote:
> >>> This patch series enables the Arm Statistical Profiling
> >>> Extension (SPE) on ACPI platforms.
> >>>
> >>> This is possible because ACPI 6.3 uses a previously
> >>> reserved field in the MADT to store the SPE interrupt
> >>> number, similarly to how the normal PMU is described.
> >>> If a consistent valid interrupt exists across all the
> >>> cores in the system, a platform device is registered.
> >>> That then triggers the SPE module, which runs as normal.
> >>>
> >>> We also add the ability to parse the PPTT for IDENTICAL
> >>> cores. We then use this to sanity check the single SPE
> >>> device we create. This creates a bit of a problem with
> >>> respect to the specification though. The specification
> >>> says that its legal for multiple tree's to exist in the
> >>> PPTT. We handle this fine, but what happens in the
> >>> case of multiple tree's is that the lack of a common
> >>> node with IDENTICAL set forces us to assume that there
> >>> are multiple non-IDENTICAL cores in the machine.
> >>
> >> Adding this patch set on top of latest mainline kernel,
> >> and tested on D06 which has the SPE feature, in boot message
> >> shows it was probed successfully:
> >>
> >> arm_spe_pmu arm,spe-v1: probed for CPUs 0-95 [max_record_sz 128, align 4, features 0x7]
> >>
> >> but when I test it with spe events such as
> >>
> >> perf record -c 1024 -e arm_spe_0/branch_filter=0/ -o spe ls
> >>
> >> it fails with:
> >> failed to mmap with 12 (Cannot allocate memory),
> >>
> >> Confirmed that patch [0] is merged and other perf events are working
> >> fine.
> >
> > Its pretty easy to get into the weeds with this driver, does it work with examples like:
> >
> > https://lkml.org/lkml/2018/1/14/122
>
> No, not work at all.
>
> SPE works on 5.0, but not work after 5.1-rc1, bisected to this commit:
>
> 5768402fd9c6 perf/ring_buffer: Use high order allocations for AUX buffers optimistically
>

Indeed this patch breaks SPE. As mentioned in that patch, it uses
high-order allocations for AUX buffers, and the SPE PMU's setup_aux
explicitly fails with the warning "unexpected high-order page for
auxbuf!" if it encounters one.

I don't know the intention of that check in SPE. Will?

--
Regards,
Sudeep
Will Deacon May 9, 2019, 9:28 a.m. UTC | #6
On Wed, May 08, 2019 at 05:51:49PM +0100, Sudeep Holla wrote:
> On Wed, May 08, 2019 at 05:35:51PM +0800, Hanjun Guo wrote:
> > +Cc Alexander.
> >
> > On 2019/5/8 1:58, Jeremy Linton wrote:
> > > Hi,
> > >
> > > On 5/4/19 6:06 AM, Hanjun Guo wrote:
> > >> Hi Jeremy, Mark,
> > >>
> > >> On 2019/5/4 7:24, Jeremy Linton wrote:
> > >>> This patch series enables the Arm Statistical Profiling
> > >>> Extension (SPE) on ACPI platforms.
> > >>>
> > >>> This is possible because ACPI 6.3 uses a previously
> > >>> reserved field in the MADT to store the SPE interrupt
> > >>> number, similarly to how the normal PMU is described.
> > >>> If a consistent valid interrupt exists across all the
> > >>> cores in the system, a platform device is registered.
> > >>> That then triggers the SPE module, which runs as normal.
> > >>>
> > >>> We also add the ability to parse the PPTT for IDENTICAL
> > >>> cores. We then use this to sanity check the single SPE
> > >>> device we create. This creates a bit of a problem with
> > >>> respect to the specification though. The specification
> > >>> says that its legal for multiple tree's to exist in the
> > >>> PPTT. We handle this fine, but what happens in the
> > >>> case of multiple tree's is that the lack of a common
> > >>> node with IDENTICAL set forces us to assume that there
> > >>> are multiple non-IDENTICAL cores in the machine.
> > >>
> > >> Adding this patch set on top of latest mainline kernel,
> > >> and tested on D06 which has the SPE feature, in boot message
> > >> shows it was probed successfully:
> > >>
> > >> arm_spe_pmu arm,spe-v1: probed for CPUs 0-95 [max_record_sz 128, align 4, features 0x7]
> > >>
> > >> but when I test it with spe events such as
> > >>
> > >> perf record -c 1024 -e arm_spe_0/branch_filter=0/ -o spe ls
> > >>
> > >> it fails with:
> > >> failed to mmap with 12 (Cannot allocate memory),
> > >>
> > >> Confirmed that patch [0] is merged and other perf events are working
> > >> fine.
> > >
> > > Its pretty easy to get into the weeds with this driver, does it work with examples like:
> > >
> > > https://lkml.org/lkml/2018/1/14/122
> >
> > No, not work at all.
> >
> > SPE works on 5.0, but not work after 5.1-rc1, bisected to this commit:
> >
> > 5768402fd9c6 perf/ring_buffer: Use high order allocations for AUX buffers optimistically
> >
> 
> Indeed this patch breaks SPE. As mentioned in the patch, it uses high
> order allocations for AUX buffers and SPE PMU setup_aux explicitly
> fails with the warning "unexpected high-order page for auxbuf!" if
> it encounters one.
> 
> I don't know the intention of that check in SPE. Will ?

Since SPE uses virtual addressing, we don't really care about the underlying
page layout so there's no need to use higher-order allocations. I suppose we
could theoretically map them at the pmd level in some cases, but ignoring
them should also be harmless and I suspect you can delete the check.

Does the patch below fix the problem?

Will

--->8

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 7cb766dafe85..e120f933412a 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -855,16 +855,8 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
 	if (!pglist)
 		goto out_free_buf;
 
-	for (i = 0; i < nr_pages; ++i) {
-		struct page *page = virt_to_page(pages[i]);
-
-		if (PagePrivate(page)) {
-			pr_warn("unexpected high-order page for auxbuf!");
-			goto out_free_pglist;
-		}
-
+	for (i = 0; i < nr_pages; ++i)
 		pglist[i] = virt_to_page(pages[i]);
-	}
 
 	buf->base = vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
 	if (!buf->base)
Sudeep Holla May 9, 2019, 10:35 a.m. UTC | #7
On Thu, May 09, 2019 at 10:28:11AM +0100, Will Deacon wrote:
> On Wed, May 08, 2019 at 05:51:49PM +0100, Sudeep Holla wrote:
> > On Wed, May 08, 2019 at 05:35:51PM +0800, Hanjun Guo wrote:
> > > +Cc Alexander.
> > >
> > > On 2019/5/8 1:58, Jeremy Linton wrote:
> > > > Hi,
> > > >
> > > > On 5/4/19 6:06 AM, Hanjun Guo wrote:
> > > >> Hi Jeremy, Mark,
> > > >>
> > > >> On 2019/5/4 7:24, Jeremy Linton wrote:
> > > >>> This patch series enables the Arm Statistical Profiling
> > > >>> Extension (SPE) on ACPI platforms.
> > > >>>
> > > >>> This is possible because ACPI 6.3 uses a previously
> > > >>> reserved field in the MADT to store the SPE interrupt
> > > >>> number, similarly to how the normal PMU is described.
> > > >>> If a consistent valid interrupt exists across all the
> > > >>> cores in the system, a platform device is registered.
> > > >>> That then triggers the SPE module, which runs as normal.
> > > >>>
> > > >>> We also add the ability to parse the PPTT for IDENTICAL
> > > >>> cores. We then use this to sanity check the single SPE
> > > >>> device we create. This creates a bit of a problem with
> > > >>> respect to the specification though. The specification
> > > >>> says that its legal for multiple tree's to exist in the
> > > >>> PPTT. We handle this fine, but what happens in the
> > > >>> case of multiple tree's is that the lack of a common
> > > >>> node with IDENTICAL set forces us to assume that there
> > > >>> are multiple non-IDENTICAL cores in the machine.
> > > >>
> > > >> Adding this patch set on top of latest mainline kernel,
> > > >> and tested on D06 which has the SPE feature, in boot message
> > > >> shows it was probed successfully:
> > > >>
> > > >> arm_spe_pmu arm,spe-v1: probed for CPUs 0-95 [max_record_sz 128, align 4, features 0x7]
> > > >>
> > > >> but when I test it with spe events such as
> > > >>
> > > >> perf record -c 1024 -e arm_spe_0/branch_filter=0/ -o spe ls
> > > >>
> > > >> it fails with:
> > > >> failed to mmap with 12 (Cannot allocate memory),
> > > >>
> > > >> Confirmed that patch [0] is merged and other perf events are working
> > > >> fine.
> > > >
> > > > Its pretty easy to get into the weeds with this driver, does it work with examples like:
> > > >
> > > > https://lkml.org/lkml/2018/1/14/122
> > >
> > > No, not work at all.
> > >
> > > SPE works on 5.0, but not work after 5.1-rc1, bisected to this commit:
> > >
> > > 5768402fd9c6 perf/ring_buffer: Use high order allocations for AUX buffers optimistically
> > >
> > 
> > Indeed this patch breaks SPE. As mentioned in the patch, it uses high
> > order allocations for AUX buffers and SPE PMU setup_aux explicitly
> > fails with the warning "unexpected high-order page for auxbuf!" if
> > it encounters one.
> > 
> > I don't know the intention of that check in SPE. Will ?
> 
> Since SPE uses virtual addressing, we don't really care about the underlying
> page layout so there's no need to use higher-order allocations. I suppose we
> could theoretically map them at the pmd level in some cases, but ignoring
> them should also be harmless and I suspect you can delete the check.
>

Yes, I took a quick look to see if we could do that, but couldn't find a clue.
Not sure if it's much of an optimisation, but we could use the order from
page_private and set the values accordingly?

> Does the patch below fix the problem?
>

Yes, it should help. I tried exactly the same thing yesterday and it does
fix the issue.

Regards,
Sudeep
Sudeep Holla May 9, 2019, 2:13 p.m. UTC | #8
On Thu, May 09, 2019 at 11:35:59AM +0100, Sudeep Holla wrote:
> On Thu, May 09, 2019 at 10:28:11AM +0100, Will Deacon wrote:

[...]

> >
> > Since SPE uses virtual addressing, we don't really care about the underlying
> > page layout so there's no need to use higher-order allocations. I suppose we
> > could theoretically map them at the pmd level in some cases, but ignoring
> > them should also be harmless and I suspect you can delete the check.
> >
>
> Yes, I did a quick look to see if we can do that, but couldn't find a clue.
> Not sure if that's any optimisation, we can use order from page_private
> and set the values accordingly ?
>
And I forgot to add the diff that I mentioned above; it's something like
the patch below.

Regards,
Sudeep

-->8

diff --git i/drivers/perf/arm_spe_pmu.c w/drivers/perf/arm_spe_pmu.c
index 7cb766dafe85..45cd62517080 100644
--- i/drivers/perf/arm_spe_pmu.c
+++ w/drivers/perf/arm_spe_pmu.c
@@ -827,7 +827,7 @@ static void arm_spe_pmu_read(struct perf_event *event)
 static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
 				   int nr_pages, bool snapshot)
 {
-	int i, cpu = event->cpu;
+	int i, j, cpu = event->cpu;
 	struct page **pglist;
 	struct arm_spe_pmu_buf *buf;
 
@@ -859,11 +859,12 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
 		struct page *page = virt_to_page(pages[i]);
 
 		if (PagePrivate(page)) {
-			pr_warn("unexpected high-order page for auxbuf!");
-			goto out_free_pglist;
+			for (j = 0; j < 1 << page_private(page); j++)
+				pglist[i + j] = page++;
+			i += j - 1;
+		} else {
+			pglist[i] = page;
 		}
-
-		pglist[i] = virt_to_page(pages[i]);
 	}
 
 	buf->base = vmap(pglist, nr_pages, VM_MAP, PAGE_KERNEL);
Will Deacon May 13, 2019, 10:56 a.m. UTC | #9
Hi Sudeep,

On Thu, May 09, 2019 at 03:13:50PM +0100, Sudeep Holla wrote:
> On Thu, May 09, 2019 at 11:35:59AM +0100, Sudeep Holla wrote:
> > On Thu, May 09, 2019 at 10:28:11AM +0100, Will Deacon wrote:
> 
> [...]
> 
> > >
> > > Since SPE uses virtual addressing, we don't really care about the underlying
> > > page layout so there's no need to use higher-order allocations. I suppose we
> > > could theoretically map them at the pmd level in some cases, but ignoring
> > > them should also be harmless and I suspect you can delete the check.
> > >
> >
> > Yes, I did a quick look to see if we can do that, but couldn't find a clue.
> > Not sure if that's any optimisation, we can use order from page_private
> > and set the values accordingly ?
> >
> And I forgot to add the diff that I mentioned above, something like the
> patch below.
> 
> Regards,
> Sudeep
> 
> -->8
> 
> diff --git i/drivers/perf/arm_spe_pmu.c w/drivers/perf/arm_spe_pmu.c
> index 7cb766dafe85..45cd62517080 100644
> --- i/drivers/perf/arm_spe_pmu.c
> +++ w/drivers/perf/arm_spe_pmu.c
> @@ -827,7 +827,7 @@ static void arm_spe_pmu_read(struct perf_event *event)
>  static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
>  				   int nr_pages, bool snapshot)
>  {
> -	int i, cpu = event->cpu;
> +	int i, j, cpu = event->cpu;
>  	struct page **pglist;
>  	struct arm_spe_pmu_buf *buf;
>  
> @@ -859,11 +859,12 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
>  		struct page *page = virt_to_page(pages[i]);
>  
>  		if (PagePrivate(page)) {
> -			pr_warn("unexpected high-order page for auxbuf!");
> -			goto out_free_pglist;
> +			for (j = 0; j < 1 << page_private(page); j++)
> +				pglist[i + j] = page++;
> +			i += j - 1;
> +		} else {
> +			pglist[i] = page;

Hmm. Given that vmap() doesn't do anything special for high-order pages
and rb_alloc_aux()/rb_alloc_aux_page() already split the allocation up
for the page array, what does your change accomplish on top of that?

Will
Hanjun Guo May 13, 2019, 11:10 a.m. UTC | #10
On 2019/5/9 18:35, Sudeep Holla wrote:
> On Thu, May 09, 2019 at 10:28:11AM +0100, Will Deacon wrote:
>> On Wed, May 08, 2019 at 05:51:49PM +0100, Sudeep Holla wrote:
>>> On Wed, May 08, 2019 at 05:35:51PM +0800, Hanjun Guo wrote:
[...]
>>>>>>
>>>>>> Adding this patch set on top of latest mainline kernel,
>>>>>> and tested on D06 which has the SPE feature, in boot message
>>>>>> shows it was probed successfully:
>>>>>>
>>>>>> arm_spe_pmu arm,spe-v1: probed for CPUs 0-95 [max_record_sz 128, align 4, features 0x7]
>>>>>>
>>>>>> but when I test it with spe events such as
>>>>>>
>>>>>> perf record -c 1024 -e arm_spe_0/branch_filter=0/ -o spe ls
>>>>>>
>>>>>> it fails with:
>>>>>> failed to mmap with 12 (Cannot allocate memory),
>>>>>>
>>>>>> Confirmed that patch [0] is merged and other perf events are working
>>>>>> fine.
>>>>>
>>>>> Its pretty easy to get into the weeds with this driver, does it work with examples like:
>>>>>
>>>>> https://lkml.org/lkml/2018/1/14/122
>>>>
>>>> No, not work at all.
>>>>
>>>> SPE works on 5.0, but not work after 5.1-rc1, bisected to this commit:
>>>>
>>>> 5768402fd9c6 perf/ring_buffer: Use high order allocations for AUX buffers optimistically
>>>>
>>>
>>> Indeed this patch breaks SPE. As mentioned in the patch, it uses high
>>> order allocations for AUX buffers and SPE PMU setup_aux explicitly
>>> fails with the warning "unexpected high-order page for auxbuf!" if
>>> it encounters one.
>>>
>>> I don't know the intention of that check in SPE. Will ?
>>
>> Since SPE uses virtual addressing, we don't really care about the underlying
>> page layout so there's no need to use higher-order allocations. I suppose we
>> could theoretically map them at the pmd level in some cases, but ignoring
>> them should also be harmless and I suspect you can delete the check.
>>
> 
> Yes, I did a quick look to see if we can do that, but couldn't find a clue.
> Not sure if that's any optimisation, we can use order from page_private
> and set the values accordingly ?
> 
>> Does the patch below fix the problem?
>>
> 
> Yes it should help, I tried exactly the same thing yesterday and it does
> fix the issue.

Works for me too. Thank you, Sudeep and Will, for looking into this issue.

Best Regards
Hanjun
Sudeep Holla May 13, 2019, 11:31 a.m. UTC | #11
On Mon, May 13, 2019 at 11:56:31AM +0100, Will Deacon wrote:
> Hi Sudeep,
> 
> On Thu, May 09, 2019 at 03:13:50PM +0100, Sudeep Holla wrote:
> > On Thu, May 09, 2019 at 11:35:59AM +0100, Sudeep Holla wrote:
> > > On Thu, May 09, 2019 at 10:28:11AM +0100, Will Deacon wrote:
> > 
> > [...]
> > 
> > > >
> > > > Since SPE uses virtual addressing, we don't really care about the underlying
> > > > page layout so there's no need to use higher-order allocations. I suppose we
> > > > could theoretically map them at the pmd level in some cases, but ignoring
> > > > them should also be harmless and I suspect you can delete the check.
> > > >
> > >
> > > Yes, I did a quick look to see if we can do that, but couldn't find a clue.
> > > Not sure if that's any optimisation, we can use order from page_private
> > > and set the values accordingly ?
> > >
> > And I forgot to add the diff that I mentioned above, something like the
> > patch below.
> > 
> > Regards,
> > Sudeep
> > 
> > -->8
> > 
> > diff --git i/drivers/perf/arm_spe_pmu.c w/drivers/perf/arm_spe_pmu.c
> > index 7cb766dafe85..45cd62517080 100644
> > --- i/drivers/perf/arm_spe_pmu.c
> > +++ w/drivers/perf/arm_spe_pmu.c
> > @@ -827,7 +827,7 @@ static void arm_spe_pmu_read(struct perf_event *event)
> >  static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
> >  				   int nr_pages, bool snapshot)
> >  {
> > -	int i, cpu = event->cpu;
> > +	int i, j, cpu = event->cpu;
> >  	struct page **pglist;
> >  	struct arm_spe_pmu_buf *buf;
> >  
> > @@ -859,11 +859,12 @@ static void *arm_spe_pmu_setup_aux(struct perf_event *event, void **pages,
> >  		struct page *page = virt_to_page(pages[i]);
> >  
> >  		if (PagePrivate(page)) {
> > -			pr_warn("unexpected high-order page for auxbuf!");
> > -			goto out_free_pglist;
> > +			for (j = 0; j < 1 << page_private(page); j++)
> > +				pglist[i + j] = page++;
> > +			i += j - 1;
> > +		} else {
> > +			pglist[i] = page;
> 
> Hmm. Given that vmap() doesn't do anything special for high-order pages
> and rb_alloc_aux()/rb_alloc_aux_page() already split the allocation up
> for the page array, what does your change accomplish on top of that?
> 

Not much. Instead of computing the page pointer for each page using
virt_to_page, we advance the pointer automatically across all the pages
belonging to a private (high-order) allocation.

page_private(page) holds the order, i.e. for a 2MB high-order allocation
we can skip calling virt_to_page for the 511 contiguous tail pages.

--
Regards,
Sudeep