diff mbox

[3/4] dt-binding: remoteproc: venus rproc dt binding document

Message ID 1471622000-1906-4-git-send-email-stanimir.varbanov@linaro.org (mailing list archive)
State Not Applicable, archived
Delegated to: Andy Gross
Headers show

Commit Message

Stanimir Varbanov Aug. 19, 2016, 3:53 p.m. UTC
Add devicetree binding document for Venus remote processor.

Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
---
 .../devicetree/bindings/remoteproc/qcom,venus.txt  | 33 ++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,venus.txt

Comments

Rob Herring Aug. 23, 2016, 5:32 p.m. UTC | #1
On Fri, Aug 19, 2016 at 06:53:19PM +0300, Stanimir Varbanov wrote:
> Add devicetree binding document for Venus remote processor.
> 
> Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
> ---
>  .../devicetree/bindings/remoteproc/qcom,venus.txt  | 33 ++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> 
> diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> new file mode 100644
> index 000000000000..06a2db60fa38
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> @@ -0,0 +1,33 @@
> +Qualcomm Venus Peripheral Image Loader
> +
> +This document defines the binding for a component that loads and boots firmware
> +on the Qualcomm Venus remote processor core.

This does not make sense to me. Venus is the video encoder/decoder h/w, 
right? Why is the firmware loader separate from the codec block? Why 
rproc is used? Are there multiple clients? Naming it rproc_venus implies 
there aren't. And why does the firmware loading need 8MB of memory at a 
fixed address?

> +
> +- compatible:
> +	Usage: required
> +	Value type: <string>
> +	Definition: must contain "qcom,venus-pil"
> +
> +- memory-region:
> +	Usage: required
> +	Value type: <phandle>
> +	Definition: a phandle to a node describing reserved memory
> +
> +* An example
> +	reserved-memory {
> +		#address-cells = <2>;
> +		#size-cells = <2>;
> +		ranges;
> +
> +		venus_mem: venus@89900000 {
> +			compatible = "shared-dma-pool";
> +			reg = <0x0 0x89900000 0x0 0x800000>;
> +			alignment = <0x1000>;
> +			no-map;
> +		};
> +	};
> +
> +	rproc_venus@0 {
> +		compatible = "qcom,venus-pil";
> +		memory-region = <&venus_mem>;
> +	};
> \ No newline at end of file
> -- 
> 2.7.4
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Andersson Aug. 23, 2016, 6:21 p.m. UTC | #2
On Tue 23 Aug 10:32 PDT 2016, Rob Herring wrote:

> On Fri, Aug 19, 2016 at 06:53:19PM +0300, Stanimir Varbanov wrote:
> > Add devicetree binding document for Venus remote processor.
> > 
> > Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
> > ---
> >  .../devicetree/bindings/remoteproc/qcom,venus.txt  | 33 ++++++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> > 
> > diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> > new file mode 100644
> > index 000000000000..06a2db60fa38
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> > @@ -0,0 +1,33 @@
> > +Qualcomm Venus Peripheral Image Loader
> > +
> > +This document defines the binding for a component that loads and boots firmware
> > +on the Qualcomm Venus remote processor core.
> 
> This does not make sense to me. Venus is the video encoder/decoder h/w, 
> right?

Yes, Venus is an ARM based thing doing video encoding and decoding.

> Why is the firmware loader separate from the codec block?
> Why rproc is used?

Implementation wise it shares structure and almost all the logic with
other remoteprocs in the Qualcomm platform; you load firmware into a
memory region, you grab a few resources (clocks, regulators,
power-domains), you jump into TrustZone for signature checks and you
release the resources as the remote is booted and have voted for these
with the RPM.

But there is also a second operation mode, where one of the Hexagon DSPs
"imitates" a Venus core; with slightly different transport mechanism for
transferring the command stream - so the Venus node might operate on a
non-Venus hardware.


That said, the Venus node (in Venus-hw mode) has a 1:1 life cycle with
the power-on-state of the remoteproc. So perhaps we should describe the
two parts in one DT node and have the rproc-venus implementation spawn
the v4l driver when the remote is running...

But that would mean that on a 8064 we would have 5-6 nodes describing
standalone remoteprocs and one describing the exact same thing but in a
completely different way.


If we keep it as two nodes, I think it would be better to describe the
video-part as a child of the venus-rproc; to show the link between the
two parts.

> Are there multiple clients?
> Naming it rproc_venus implies there aren't.

I'm still investigating this, but it looks like rproc part of the
8060/8960/8064 "vidc" is very similar.

> And why does the firmware loading need 8MB of memory at a fixed address?
> 

On msm8974 the Venus should be loaded into a 5MB region with a fixed
address, perhaps just because of some memory budgeting document. On 8916
it looks (downstream) like all we need is the size and it can be
positioned wherever.

But I would say this is not a property of the rproc-venus, but rather
about system configuration and the firmware. As such I think we should
omit the memory reserve from the example and make sure the
implementation can deal with either a fixed or only-sized reserved
memory region.

Regards,
Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stanimir Varbanov Aug. 24, 2016, 3:36 p.m. UTC | #3
Hi Rob,

On 08/23/2016 08:32 PM, Rob Herring wrote:
> On Fri, Aug 19, 2016 at 06:53:19PM +0300, Stanimir Varbanov wrote:
>> Add devicetree binding document for Venus remote processor.
>>
>> Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
>> ---
>>  .../devicetree/bindings/remoteproc/qcom,venus.txt  | 33 ++++++++++++++++++++++
>>  1 file changed, 33 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
>>
>> diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
>> new file mode 100644
>> index 000000000000..06a2db60fa38
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
>> @@ -0,0 +1,33 @@
>> +Qualcomm Venus Peripheral Image Loader
>> +
>> +This document defines the binding for a component that loads and boots firmware
>> +on the Qualcomm Venus remote processor core.
> 
> This does not make sense to me. Venus is the video encoder/decoder h/w, 
> right? Why is the firmware loader separate from the codec block? Why 
> rproc is used? Are there multiple clients? Naming it rproc_venus implies 
> there aren't. And why does the firmware loading need 8MB of memory at a 
> fixed address?
> 

The firmware for Venus case is 5MB. And here is 8MB because of
dma_alloc_from_coherent size restriction.

The address is not really fixed, cause the firmware could support
relocation. In this example I just picked up the next free memory region
in memory-reserved from msm8916.dtsi.

regards,
Stan
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Andersson Aug. 25, 2016, 12:05 a.m. UTC | #4
On Wed 24 Aug 08:36 PDT 2016, Stanimir Varbanov wrote:

> Hi Rob,
> 
> On 08/23/2016 08:32 PM, Rob Herring wrote:
> > On Fri, Aug 19, 2016 at 06:53:19PM +0300, Stanimir Varbanov wrote:
> >> Add devicetree binding document for Venus remote processor.
> >>
> >> Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
> >> ---
> >>  .../devicetree/bindings/remoteproc/qcom,venus.txt  | 33 ++++++++++++++++++++++
> >>  1 file changed, 33 insertions(+)
> >>  create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> >>
> >> diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> >> new file mode 100644
> >> index 000000000000..06a2db60fa38
> >> --- /dev/null
> >> +++ b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> >> @@ -0,0 +1,33 @@
> >> +Qualcomm Venus Peripheral Image Loader
> >> +
> >> +This document defines the binding for a component that loads and boots firmware
> >> +on the Qualcomm Venus remote processor core.
> > 
> > This does not make sense to me. Venus is the video encoder/decoder h/w, 
> > right? Why is the firmware loader separate from the codec block? Why 
> > rproc is used? Are there multiple clients? Naming it rproc_venus implies 
> > there aren't. And why does the firmware loading need 8MB of memory at a 
> > fixed address?
> > 
> 
> The firmware for Venus case is 5MB. And here is 8MB because of
> dma_alloc_from_coherent size restriction.
> 

Then you should specify it 5MB large and we'll have to deal with this
implementation issue in the code. I've created a JIRA ticket for
the dma_alloc_from_coherent() behavior.

> The address is not really fixed, cause the firmware could support
> relocation. In this example I just picked up the next free memory region
> in memory-reserved from msm8916.dtsi.
> 

In 8974 we do have a physical region where it's expected to be loaded.

So in line with upcoming remoteproc work we should support referencing a
reserved-memory node with either reg or size.

In the case of spotting a "reg" we're currently better off using
ioremap. We're looking at getting the remoteproc core to deal with this
mess.


So, on 8916 I think you should use the form:

venus_mem: venus {
	size = <0x500000>;
};

And I don't think you should use the shared-dma-pool compatible, because
this is not a region for multiple devices to allocate dma memory out of.


Regards,
Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stanimir Varbanov Aug. 25, 2016, 11:10 a.m. UTC | #5
Hi Bjorn,

On 08/25/2016 03:05 AM, Bjorn Andersson wrote:
> On Wed 24 Aug 08:36 PDT 2016, Stanimir Varbanov wrote:
> 
>> Hi Rob,
>>
>> On 08/23/2016 08:32 PM, Rob Herring wrote:
>>> On Fri, Aug 19, 2016 at 06:53:19PM +0300, Stanimir Varbanov wrote:
>>>> Add devicetree binding document for Venus remote processor.
>>>>
>>>> Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
>>>> ---
>>>>  .../devicetree/bindings/remoteproc/qcom,venus.txt  | 33 ++++++++++++++++++++++
>>>>  1 file changed, 33 insertions(+)
>>>>  create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
>>>> new file mode 100644
>>>> index 000000000000..06a2db60fa38
>>>> --- /dev/null
>>>> +++ b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
>>>> @@ -0,0 +1,33 @@
>>>> +Qualcomm Venus Peripheral Image Loader
>>>> +
>>>> +This document defines the binding for a component that loads and boots firmware
>>>> +on the Qualcomm Venus remote processor core.
>>>
>>> This does not make sense to me. Venus is the video encoder/decoder h/w, 
>>> right? Why is the firmware loader separate from the codec block? Why 
>>> rproc is used? Are there multiple clients? Naming it rproc_venus implies 
>>> there aren't. And why does the firmware loading need 8MB of memory at a 
>>> fixed address?
>>>
>>
>> The firmware for Venus case is 5MB. And here is 8MB because of
>> dma_alloc_from_coherent size restriction.
>>
> 
> Then you should specify it 5MB large and we'll have to deal with this
> implementation issue in the code. I've created a JIRA ticket for
> the dma_alloc_from_coherent() behavior.

Infact it should be 5MB + ~100KB for iommu page table.

> 
>> The address is not really fixed, cause the firmware could support
>> relocation. In this example I just picked up the next free memory region
>> in memory-reserved from msm8916.dtsi.
>>
> 
> In 8974 we do have a physical region where it's expected to be loaded.
> 
> So in line with upcoming remoteproc work we should support referencing a
> reserved-memory node with either reg or size.
> 
> In the case of spotting a "reg" we're currently better off using
> ioremap. We're looking at getting the remoteproc core to deal with this
> mess.

You mean that remoteproc core will parse memory-region property?

> 
> 
> So, on 8916 I think you should use the form:
> 
> venus_mem: venus {
> 	size = <0x500000>;
> };

Don't forget that the physical address where the firmware is stored has
some range, the scm call will fail if it is out of the expected range,
probably because of some security reasons. So maybe alloc-ranges should
be specified here.

> 
> And I don't think you should use the shared-dma-pool compatible, because
> this is not a region for multiple devices to allocate dma memory out of.

Then I cannot reuse reserved-mem infrastructure.
Bjorn Andersson Aug. 26, 2016, 10:23 p.m. UTC | #6
On Thu 25 Aug 04:10 PDT 2016, Stanimir Varbanov wrote:

> Hi Bjorn,
> 
> On 08/25/2016 03:05 AM, Bjorn Andersson wrote:
> > On Wed 24 Aug 08:36 PDT 2016, Stanimir Varbanov wrote:
> > 
> >> Hi Rob,
> >>
> >> On 08/23/2016 08:32 PM, Rob Herring wrote:
> >>> On Fri, Aug 19, 2016 at 06:53:19PM +0300, Stanimir Varbanov wrote:
> >>>> Add devicetree binding document for Venus remote processor.
> >>>>
> >>>> Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
> >>>> ---
> >>>>  .../devicetree/bindings/remoteproc/qcom,venus.txt  | 33 ++++++++++++++++++++++
> >>>>  1 file changed, 33 insertions(+)
> >>>>  create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> >>>>
> >>>> diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> >>>> new file mode 100644
> >>>> index 000000000000..06a2db60fa38
> >>>> --- /dev/null
> >>>> +++ b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
> >>>> @@ -0,0 +1,33 @@
> >>>> +Qualcomm Venus Peripheral Image Loader
> >>>> +
> >>>> +This document defines the binding for a component that loads and boots firmware
> >>>> +on the Qualcomm Venus remote processor core.
> >>>
> >>> This does not make sense to me. Venus is the video encoder/decoder h/w, 
> >>> right? Why is the firmware loader separate from the codec block? Why 
> >>> rproc is used? Are there multiple clients? Naming it rproc_venus implies 
> >>> there aren't. And why does the firmware loading need 8MB of memory at a 
> >>> fixed address?
> >>>
> >>
> >> The firmware for Venus case is 5MB. And here is 8MB because of
> >> dma_alloc_from_coherent size restriction.
> >>
> > 
> > Then you should specify it 5MB large and we'll have to deal with this
> > implementation issue in the code. I've created a JIRA ticket for
> > the dma_alloc_from_coherent() behavior.
> 
> Infact it should be 5MB + ~100KB for iommu page table.
> 

Trying to wrap my head around how the iommu part works here. The
downstream code seems to indicate that this is a "generic" secure iommu
interface - used by venus, camera and kgsl; likely for dealing with DRM
protected buffers.

As such the iommu tables are not part of the venus rproc; I believe they
should either be tied into the msm-iommu driver or perhaps exposed as
its own iommu(?).


But I presume from your inclusion that you've concluded that the venus
firmware we have refuses to execute without these tables at least
initialized, is this correct?

> > 
> >> The address is not really fixed, cause the firmware could support
> >> relocation. In this example I just picked up the next free memory region
> >> in memory-reserved from msm8916.dtsi.
> >>
> > 
> > In 8974 we do have a physical region where it's expected to be loaded.
> > 
> > So in line with upcoming remoteproc work we should support referencing a
> > reserved-memory node with either reg or size.
> > 
> > In the case of spotting a "reg" we're currently better off using
> > ioremap. We're looking at getting the remoteproc core to deal with this
> > mess.
> 
> You mean that remoteproc core will parse memory-region property?
> 

It has to, because it's a quite common scenario for remoteproc drivers
to either get its backing memory from a static region or be restricted
to part of system ram - properties that reserved-memory and
memory-region captures already.

> > 
> > 
> > So, on 8916 I think you should use the form:
> > 
> > venus_mem: venus {
> > 	size = <0x500000>;
> > };
> 
> Don't forget that the physical address where the firmware is stored has
> some range, the scm call will fail if it is out of the expected range,
> probably because of some security reasons. So maybe alloc-ranges should
> be specified here.
> 

Thanks for highlighting this.

> > 
> > And I don't think you should use the shared-dma-pool compatible, because
> > this is not a region for multiple devices to allocate dma memory out of.
> 
> Then I cannot reuse reserved-mem infrastructure.
> 

You're right. If I understand the code correctly we need to use the
compatible shared-dma-pool and mark it either "no-map" or "reusable", to
be able to use dma_alloc_coherent().


But I presume we have the implementation issue of dma_alloc_coherent()
failing in either case with the 5MB size. I think we need to look into
that - and have created a JIRA ticket for it.

Regards,
Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stanimir Varbanov Aug. 29, 2016, 11:48 a.m. UTC | #7
Hi,

On 08/27/2016 01:23 AM, Bjorn Andersson wrote:
> On Thu 25 Aug 04:10 PDT 2016, Stanimir Varbanov wrote:
> 
>> Hi Bjorn,
>>
>> On 08/25/2016 03:05 AM, Bjorn Andersson wrote:
>>> On Wed 24 Aug 08:36 PDT 2016, Stanimir Varbanov wrote:
>>>
>>>> Hi Rob,
>>>>
>>>> On 08/23/2016 08:32 PM, Rob Herring wrote:
>>>>> On Fri, Aug 19, 2016 at 06:53:19PM +0300, Stanimir Varbanov wrote:
>>>>>> Add devicetree binding document for Venus remote processor.
>>>>>>
>>>>>> Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
>>>>>> ---
>>>>>>  .../devicetree/bindings/remoteproc/qcom,venus.txt  | 33 ++++++++++++++++++++++
>>>>>>  1 file changed, 33 insertions(+)
>>>>>>  create mode 100644 Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
>>>>>>
>>>>>> diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
>>>>>> new file mode 100644
>>>>>> index 000000000000..06a2db60fa38
>>>>>> --- /dev/null
>>>>>> +++ b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
>>>>>> @@ -0,0 +1,33 @@
>>>>>> +Qualcomm Venus Peripheral Image Loader
>>>>>> +
>>>>>> +This document defines the binding for a component that loads and boots firmware
>>>>>> +on the Qualcomm Venus remote processor core.
>>>>>
>>>>> This does not make sense to me. Venus is the video encoder/decoder h/w, 
>>>>> right? Why is the firmware loader separate from the codec block? Why 
>>>>> rproc is used? Are there multiple clients? Naming it rproc_venus implies 
>>>>> there aren't. And why does the firmware loading need 8MB of memory at a 
>>>>> fixed address?
>>>>>
>>>>
>>>> The firmware for Venus case is 5MB. And here is 8MB because of
>>>> dma_alloc_from_coherent size restriction.
>>>>
>>>
>>> Then you should specify it 5MB large and we'll have to deal with this
>>> implementation issue in the code. I've created a JIRA ticket for
>>> the dma_alloc_from_coherent() behavior.
>>
>> Infact it should be 5MB + ~100KB for iommu page table.
>>
> 
> Trying to wrap my head around how the iommu part works here. The
> downstream code seems to indicate that this is a "generic" secure iommu
> interface - used by venus, camera and kgsl; likely for dealing with DRM
> protected buffers.

The secure iommu interface is for content protected buffers. But these
secure iommu contexts aren't used by msm DRM nor Venus in mainline. In
Venus case I use non-secure iommu context for data buffers.

> 
> As such the iommu tables are not part of the venus rproc; I believe they
> should either be tied into the msm-iommu driver or perhaps exposed as
> its own iommu(?).

The page tables are in msm-iommu driver.

> 
> 
> But I presume from your inclusion that you've concluded that the venus
> firmware we have refuses to execute without these tables at least
> initialized, is this correct?

Yes, the SMC call for PAS memory-setup will fail if this page table is
not initialized.

> 
>>>
>>>> The address is not really fixed, cause the firmware could support
>>>> relocation. In this example I just picked up the next free memory region
>>>> in memory-reserved from msm8916.dtsi.
>>>>
>>>
>>> In 8974 we do have a physical region where it's expected to be loaded.
>>>
>>> So in line with upcoming remoteproc work we should support referencing a
>>> reserved-memory node with either reg or size.
>>>
>>> In the case of spotting a "reg" we're currently better off using
>>> ioremap. We're looking at getting the remoteproc core to deal with this
>>> mess.
>>
>> You mean that remoteproc core will parse memory-region property?
>>
> 
> It has to, because it's a quite common scenario for remoteproc drivers
> to either get its backing memory from a static region or be restricted
> to part of system ram - properties that reserved-memory and
> memory-region captures already.

OK, I have no issues with that. My concern is the manual parsing of
'memory-region' and 'reg' properties in remoteproc core.

So that idea is to have generic binding for rproc, that would be good.

> 
>>>
>>>
>>> So, on 8916 I think you should use the form:
>>>
>>> venus_mem: venus {
>>> 	size = <0x500000>;
>>> };
>>
>> Don't forget that the physical address where the firmware is stored has
>> some range, the scm call will fail if it is out of the expected range,
>> probably because of some security reasons. So maybe alloc-ranges should
>> be specified here.
>>
> 
> Thanks for highlighting this.
> 
>>>
>>> And I don't think you should use the shared-dma-pool compatible, because
>>> this is not a region for multiple devices to allocate dma memory out of.
>>
>> Then I cannot reuse reserved-mem infrastructure.
>>
> 
> You're right. If I understand the code correctly we need to use the
> compatible shared-dma-pool and mark it either "no-map" or "reusable", to
> be able to use dma_alloc_coherent().

correct.

> 
> 
> But I presume we have the implementation issue of dma_alloc_coherent()
> failing in either case with the 5MB size. I think we need to look into

I'd be good to include Marek Szyprowski? At least he will know what
design restrictions there are.

> that - and have created a JIRA ticket for it.
> 
> Regards,
> Bjorn
>
Bjorn Andersson Aug. 30, 2016, 5:17 p.m. UTC | #8
On Mon 29 Aug 04:48 PDT 2016, Stanimir Varbanov wrote:

[..]
> > Trying to wrap my head around how the iommu part works here. The
> > downstream code seems to indicate that this is a "generic" secure iommu
> > interface - used by venus, camera and kgsl; likely for dealing with DRM
> > protected buffers.
> 
> The secure iommu interface is for content protected buffers. But these
> secure iommu contexts aren't used by msm DRM nor Venus in mainline. In
> Venus case I use non-secure iommu context for data buffers.
> 

We must consider the case when DRM, VFE and Venus handles protected
buffers.

> > 
> > As such the iommu tables are not part of the venus rproc; I believe they
> > should either be tied into the msm-iommu driver or perhaps exposed as
> > its own iommu(?).
> 
> The page tables are in msm-iommu driver.
> 

So, just to verify your answer, the msm-iommu driver will handle both
protected and unprotected?

> > 
> > 
> > But I presume from your inclusion that you've concluded that the venus
> > firmware we have refuses to execute without these tables at least
> > initialized, is this correct?
> 
> Yes, the SMC call for PAS memory-setup will fail if this page table is
> not initialized.
> 

If the msm-iommu driver will handle the protected buffers (or if there
will be a separate iommu driver for protected buffers) it should issue
these calls, to not be dependant on the rproc-venus driver.

With that I think we should make the rproc-venus driver depend on this
being setup (even if this means creating a "dummy" driver for the
protected iommu handling for now).

> > 
> >>>
> >>>> The address is not really fixed, cause the firmware could support
> >>>> relocation. In this example I just picked up the next free memory region
> >>>> in memory-reserved from msm8916.dtsi.
> >>>>
> >>>
> >>> In 8974 we do have a physical region where it's expected to be loaded.
> >>>
> >>> So in line with upcoming remoteproc work we should support referencing a
> >>> reserved-memory node with either reg or size.
> >>>
> >>> In the case of spotting a "reg" we're currently better off using
> >>> ioremap. We're looking at getting the remoteproc core to deal with this
> >>> mess.
> >>
> >> You mean that remoteproc core will parse memory-region property?
> >>
> > 
> > It has to, because it's a quite common scenario for remoteproc drivers
> > to either get its backing memory from a static region or be restricted
> > to part of system ram - properties that reserved-memory and
> > memory-region captures already.
> 
> OK, I have no issues with that. My concern is the manual parsing of
> 'memory-region' and 'reg' properties in remoteproc core.
> 
> So that idea is to have generic binding for rproc, that would be good.
> 

I do share your concerns here. But it's a recurring issue with
remoteproc drivers.

[..]
> > But I presume we have the implementation issue of dma_alloc_coherent()
> > failing in either case with the 5MB size. I think we need to look into
> 
> I'd be good to include Marek Szyprowski? At least he will know what
> design restrictions there are.
> 

Please do. The more I look at this the more I think we must use the
existing infrastructure for allocating "dma memory". Getting
dma_alloc_coherent() supporting non-power-of-2 memory regions would
allow us to use the existing infrastructure, for both fixed and
dynamically placed memory carveouts in remoteproc.

Regards,
Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stanimir Varbanov Sept. 1, 2016, 2:58 p.m. UTC | #9
Hi,

Cc: Marek

On 08/30/2016 08:17 PM, Bjorn Andersson wrote:
> On Mon 29 Aug 04:48 PDT 2016, Stanimir Varbanov wrote:
> 
> [..]
>>> Trying to wrap my head around how the iommu part works here. The
>>> downstream code seems to indicate that this is a "generic" secure iommu
>>> interface - used by venus, camera and kgsl; likely for dealing with DRM
>>> protected buffers.
>>
>> The secure iommu interface is for content protected buffers. But these
>> secure iommu contexts aren't used by msm DRM nor Venus in mainline. In
>> Venus case I use non-secure iommu context for data buffers.
>>
> 
> We must consider the case when DRM, VFE and Venus handles protected
> buffers.

That would be interesting topic.

> 
>>>
>>> As such the iommu tables are not part of the venus rproc; I believe they
>>> should either be tied into the msm-iommu driver or perhaps exposed as
>>> its own iommu(?).
>>
>> The page tables are in msm-iommu driver.
>>
> 
> So, just to verify your answer, the msm-iommu driver will handle both
> protected and unprotected?

yes, at least for Venus case, the secured buffers are tied to different
iommu contexts (and stream IDs). But this needs to be verified more
carefully.

> 
>>>
>>>
>>> But I presume from your inclusion that you've concluded that the venus
>>> firmware we have refuses to execute without these tables at least
>>> initialized, is this correct?
>>
>> Yes, the SMC call for PAS memory-setup will fail if this page table is
>> not initialized.
>>
> 
> If the msm-iommu driver will handle the protected buffers (or if there
> will be a separate iommu driver for protected buffers) it should issue
> these calls, to not be dependant on the rproc-venus driver.

For msm8916 we don't have iommu driver in mainline, I don't know what is
the status with msm8996.

> 
> With that I think we should make the rproc-venus driver depend on this
> being setup (even if this means creating a "dummy" driver for the
> protected iommu handling for now).

This sounds scary :)

> 
>>>
>>>>>
>>>>>> The address is not really fixed, cause the firmware could support
>>>>>> relocation. In this example I just picked up the next free memory region
>>>>>> in memory-reserved from msm8916.dtsi.
>>>>>>
>>>>>
>>>>> In 8974 we do have a physical region where it's expected to be loaded.
>>>>>
>>>>> So in line with upcoming remoteproc work we should support referencing a
>>>>> reserved-memory node with either reg or size.
>>>>>
>>>>> In the case of spotting a "reg" we're currently better off using
>>>>> ioremap. We're looking at getting the remoteproc core to deal with this
>>>>> mess.
>>>>
>>>> You mean that remoteproc core will parse memory-region property?
>>>>
>>>
>>> It has to, because it's a quite common scenario for remoteproc drivers
>>> to either get its backing memory from a static region or be restricted
>>> to part of system ram - properties that reserved-memory and
>>> memory-region captures already.
>>
>> OK, I have no issues with that. My concern is the manual parsing of
>> 'memory-region' and 'reg' properties in remoteproc core.
>>
>> So that idea is to have generic binding for rproc, that would be good.
>>
> 
> I do share your concerns here. But it's a recurring issue with
> remoteproc drivers.
> 
> [..]
>>> But I presume we have the implementation issue of dma_alloc_coherent()
>>> failing in either case with the 5MB size. I think we need to look into
>>
>> I'd be good to include Marek Szyprowski? At least he will know what
>> design restrictions there are.
>>
> 
> Please do. The more I look at this the more I think we must use the
> existing infrastructure for allocating "dma memory". Getting
> dma_alloc_coherent() supporting non-power-of-2 memory regions would

Just to be precise it should be dma_alloc_from_coherent().

Marek, what is your opinion on that, can we make dma_alloc_from_coherent
able to allocate memory for sizes with bigger granularity.

For your convenience here [1] is the mail thread.

> allow us to use the existing infrastructure, for both fixed and
> dynamically placed memory carveouts in remoteproc.

[1] https://lkml.org/lkml/2016/8/19/570
Andy Gross Sept. 1, 2016, 8:46 p.m. UTC | #10
On Thu, Sep 01, 2016 at 05:58:17PM +0300, Stanimir Varbanov wrote:
> Hi,
> 
> Cc: Marek
> 
> On 08/30/2016 08:17 PM, Bjorn Andersson wrote:
> > On Mon 29 Aug 04:48 PDT 2016, Stanimir Varbanov wrote:
> > 
> > [..]
> >>> Trying to wrap my head around how the iommu part works here. The
> >>> downstream code seems to indicate that this is a "generic" secure iommu
> >>> interface - used by venus, camera and kgsl; likely for dealing with DRM
> >>> protected buffers.
> >>
> >> The secure iommu interface is for content protected buffers. But these
> >> secure iommu contexts aren't used by msm DRM nor Venus in mainline. In
> >> Venus case I use non-secure iommu context for data buffers.
> >>
> > 
> > We must consider the case when DRM, VFE and Venus handles protected
> > buffers.
> 
> That would be interesting topic.
> 
> > 
> >>>
> >>> As such the iommu tables are not part of the venus rproc; I believe they
> >>> should either be tied into the msm-iommu driver or perhaps exposed as
> >>> its own iommu(?).
> >>
> >> The page tables are in msm-iommu driver.
> >>
> > 
> > So, just to verify your answer, the msm-iommu driver will handle both
> > protected and unprotected?
> 
> yes, at least for Venus case, the secured buffers are tied to different
> iommu contexts (and stream IDs). But this needs to be verified more
> carefully.
> 
> > 
> >>>
> >>>
> >>> But I presume from your inclusion that you've concluded that the venus
> >>> firmware we have refuses to execute without these tables at least
> >>> initialized, is this correct?
> >>
> >> Yes, the SMC call for PAS memory-setup will fail if this page table is
> >> not initialized.
> >>
> > 
> > If the msm-iommu driver will handle the protected buffers (or if there
> > will be a separate iommu driver for protected buffers) it should issue
> > these calls, to not be dependant on the rproc-venus driver.
> 
> For msm8916 we don't have iommu driver in mainline, I don't know what is
> the status with msm8996.

The iommu v2 transparently handles this and removes the requirement for the
specific SCM page tbl init calls.  However, the memory still has to be
configured to be accessible from the remote processor.

<snip>


Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marek Szyprowski Sept. 2, 2016, 11:52 a.m. UTC | #11
Hi,


On 2016-09-01 16:58, Stanimir Varbanov wrote:
> Hi,
>
> Cc: Marek
>

...

>>>> But I presume we have the implementation issue of dma_alloc_coherent()
>>>> failing in either case with the 5MB size. I think we need to look into
>>> I'd be good to include Marek Szyprowski? At least he will know what
>>> design restrictions there are.
>>>
>> Please do. The more I look at this the more I think we must use the
>> existing infrastructure for allocating "dma memory". Getting
>> dma_alloc_coherent() supporting non-power-of-2 memory regions would
> Just to be precise it should be dma_alloc_from_coherent().
>
> Marek, what is your opinion on that, can we make dma_alloc_from_coherent
> able to allocate memory for sizes with bigger granularity.
>
> For your convenience here [1] is the mail thread.

There should be no technical restrictions to add support for bigger 
granularity
than power-of-2. dma_alloc_from_coherent uses standard bitmap based 
allocator,
so it already support tracking allocations of arbitrary size. However 
for the
small allocations (smaller than 64KiB?, 512KiB?) it would make sense to keep
nearest-power-of-2 round up to prevent memory fragmentation.

Best regards
Bjorn Andersson Sept. 2, 2016, 8:12 p.m. UTC | #12
On Fri 02 Sep 04:52 PDT 2016, Marek Szyprowski wrote:

> Hi,
> 
> 
> On 2016-09-01 16:58, Stanimir Varbanov wrote:
> >Hi,
> >
> >Cc: Marek
> >
> 
> ...
> 
> >>>>But I presume we have the implementation issue of dma_alloc_coherent()
> >>>>failing in either case with the 5MB size. I think we need to look into
> >>>I'd be good to include Marek Szyprowski? At least he will know what
> >>>design restrictions there are.
> >>>
> >>Please do. The more I look at this the more I think we must use the
> >>existing infrastructure for allocating "dma memory". Getting
> >>dma_alloc_coherent() supporting non-power-of-2 memory regions would
> >Just to be precise it should be dma_alloc_from_coherent().
> >
> >Marek, what is your opinion on that, can we make dma_alloc_from_coherent
> >able to allocate memory for sizes with bigger granularity.
> >
> >For your convenience here [1] is the mail thread.
> 
> There should be no technical restrictions to add support for bigger
> granularity than power-of-2. dma_alloc_from_coherent uses standard
> bitmap based allocator, so it already support tracking allocations of
> arbitrary size.

I believe we should be able to change the parameter of
bitmap_{find_free,release,allocate}_region() to take a size rather than
an order.

The mask used in __reg_op() is an unsigned long, that is stamped over
the region to be masked or cleared, so there are some clear restrictions
in what parameters we can pass there - without having to break this
operation up in steps.

But if drive the offset by taking the next power-of-two of the size and
then align the number of bits to min(count, BITS_PER_LONG) we should
retain the performance characteristics and requirements of __reg_op().

> However for the small allocations (smaller than 64KiB?, 512KiB?) it
> would make sense to keep nearest-power-of-2 round up to prevent memory
> fragmentation.

But in our case each bit matches a single page, so by making sure the
mask always fills the unsigned long in the larger cases we would end up
with having to align things to 128kb (or 256kb if unsigned long is 64
bit).


Does this sound reasonable?

Regards,
Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stanimir Varbanov Sept. 7, 2016, 11:52 a.m. UTC | #13
Hi Bjorn,

On 09/02/2016 11:12 PM, Bjorn Andersson wrote:
> On Fri 02 Sep 04:52 PDT 2016, Marek Szyprowski wrote:
> 
>> Hi,
>>
>>
>> On 2016-09-01 16:58, Stanimir Varbanov wrote:
>>> Hi,
>>>
>>> Cc: Marek
>>>
>>
>> ...
>>
>>>>>> But I presume we have the implementation issue of dma_alloc_coherent()
>>>>>> failing in either case with the 5MB size. I think we need to look into
>>>>> I'd be good to include Marek Szyprowski? At least he will know what
>>>>> design restrictions there are.
>>>>>
>>>> Please do. The more I look at this the more I think we must use the
>>>> existing infrastructure for allocating "dma memory". Getting
>>>> dma_alloc_coherent() supporting non-power-of-2 memory regions would
>>> Just to be precise it should be dma_alloc_from_coherent().
>>>
>>> Marek, what is your opinion on that, can we make dma_alloc_from_coherent
>>> able to allocate memory for sizes with bigger granularity.
>>>
>>> For your convenience here [1] is the mail thread.
>>
>> There should be no technical restrictions to add support for bigger
>> granularity than power-of-2. dma_alloc_from_coherent uses standard
>> bitmap based allocator, so it already support tracking allocations of
>> arbitrary size.
> 
> I believe we should be able to change the parameter of
> bitmap_{find_free,release,allocate}_region() to take a size rather than
> an order.
> 
> The mask used in __reg_op() is an unsigned long, that is stamped over
> the region to be masked or cleared, so there are some clear restrictions
> in what parameters we can pass there - without having to break this
> operation up in steps.
> 
> But if drive the offset by taking the next power-of-two of the size and
> then align the number of bits to min(count, BITS_PER_LONG) we should
> retain the performance characteristics and requirements of __reg_op().
> 
>> However for the small allocations (smaller than 64KiB?, 512KiB?) it
>> would make sense to keep nearest-power-of-2 round up to prevent memory
>> fragmentation.
> 
> But in our case each bit matches a single page, so by making sure the
> mask always fills the unsigned long in the larger cases we would end up
> with having to align things to 128kb (or 256kb if unsigned long is 64
> bit).
> 
> 
> Does this sound reasonable?

I haven't looked in bitmap details, but can't we reuse genalloc?
Marek Szyprowski Sept. 15, 2016, 8:46 a.m. UTC | #14
Hi Bjorn,


On 2016-09-02 22:12, Bjorn Andersson wrote:
> On Fri 02 Sep 04:52 PDT 2016, Marek Szyprowski wrote:
>
>> On 2016-09-01 16:58, Stanimir Varbanov wrote:
>> ...
>>>>>> But I presume we have the implementation issue of dma_alloc_coherent()
>>>>>> failing in either case with the 5MB size. I think we need to look into
>>>>> I'd be good to include Marek Szyprowski? At least he will know what
>>>>> design restrictions there are.
>>>>>
>>>> Please do. The more I look at this the more I think we must use the
>>>> existing infrastructure for allocating "dma memory". Getting
>>>> dma_alloc_coherent() supporting non-power-of-2 memory regions would
>>> Just to be precise it should be dma_alloc_from_coherent().
>>>
>>> Marek, what is your opinion on that, can we make dma_alloc_from_coherent
>>> able to allocate memory for sizes with bigger granularity.
>>>
>>> For your convenience here [1] is the mail thread.
>> There should be no technical restrictions to add support for bigger
>> granularity than power-of-2. dma_alloc_from_coherent uses standard
>> bitmap based allocator, so it already support tracking allocations of
>> arbitrary size.
> I believe we should be able to change the parameter of
> bitmap_{find_free,release,allocate}_region() to take a size rather than
> an order.
>
> The mask used in __reg_op() is an unsigned long, that is stamped over
> the region to be masked or cleared, so there are some clear restrictions
> in what parameters we can pass there - without having to break this
> operation up in steps.
>
> But if drive the offset by taking the next power-of-two of the size and
> then align the number of bits to min(count, BITS_PER_LONG) we should
> retain the performance characteristics and requirements of __reg_op().
>
>> However for the small allocations (smaller than 64KiB?, 512KiB?) it
>> would make sense to keep nearest-power-of-2 round up to prevent memory
>> fragmentation.
> But in our case each bit matches a single page, so by making sure the
> mask always fills the unsigned long in the larger cases we would end up
> with having to align things to 128kb (or 256kb if unsigned long is 64
> bit).

By preventing memory fragmentation I wanted to align small allocations (less
than the mentioned 64KiB or 512KiB) to nearest-power-of-2 of their size just
like it is done now.

Best regards
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
new file mode 100644
index 000000000000..06a2db60fa38
--- /dev/null
+++ b/Documentation/devicetree/bindings/remoteproc/qcom,venus.txt
@@ -0,0 +1,33 @@ 
+Qualcomm Venus Peripheral Image Loader
+
+This document defines the binding for a component that loads and boots firmware
+on the Qualcomm Venus remote processor core.
+
+- compatible:
+	Usage: required
+	Value type: <string>
+	Definition: must contain "qcom,venus-pil"
+
+- memory-region:
+	Usage: required
+	Value type: <phandle>
+	Definition: a phandle to a node describing reserved memory
+
+* An example
+	reserved-memory {
+		#address-cells = <2>;
+		#size-cells = <2>;
+		ranges;
+
+		venus_mem: venus@89900000 {
+			compatible = "shared-dma-pool";
+			reg = <0x0 0x89900000 0x0 0x800000>;
+			alignment = <0x1000>;
+			no-map;
+		};
+	};
+
+	rproc_venus@0 {
+		compatible = "qcom,venus-pil";
+		memory-region = <&venus_mem>;
+	};
\ No newline at end of file