diff mbox series

[RFT,v2,1/8] media: hantro: jpeg: Relax register writes before write starting hardware

Message ID 20220107093455.73766-2-wenst@chromium.org (mailing list archive)
State New
Headers show
Series media: hantro: jpeg: Various improvements | expand

Commit Message

Chen-Yu Tsai Jan. 7, 2022, 9:34 a.m. UTC
In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
wmb() was inserted before the final register write that starts the
encoder. In v11, it was removed and the second-to-last register write
was changed to a non-relaxed write, which has an implicit wmb() [1].
The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
is another writel_relaxed() following the non-relaxed one.

Turns out only the last writel() needs to be non-relaxed. Device I/O
mappings already guarantee strict ordering to the same endpoint, and
the writel() triggering the hardware would force all writes to memory
to be observed before the writel() to the hardware is observed.

[1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
 drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
 drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

Comments

Ezequiel Garcia Jan. 18, 2022, 9:02 p.m. UTC | #1
Hi Chen-Yu,

The series looks good, thanks for picking up this task.

Just a one comment.

On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote:
> In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
> wmb() was inserted before the final register write that starts the
> encoder. In v11, it was removed and the second-to-last register write
> was changed to a non-relaxed write, which has an implicit wmb() [1].
> The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
> is another writel_relaxed() following the non-relaxed one.
> 
> Turns out only the last writel() needs to be non-relaxed. Device I/O
> mappings already guarantee strict ordering to the same endpoint, and
> the writel() triggering the hardware would force all writes to memory
> to be observed before the writel() to the hardware is observed.
> 
> [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/
> 
> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
> ---
>  drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
>  drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> index 1450013d3685..03db1c3444f8 100644
> --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
>  		| H1_REG_AXI_CTRL_INPUT_SWAP32
>  		| H1_REG_AXI_CTRL_OUTPUT_SWAP8
>  		| H1_REG_AXI_CTRL_INPUT_SWAP8;
> -	/* Make sure that all registers are written at this point. */
> -	vepu_write(vpu, reg, H1_REG_AXI_CTRL);
> +	vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
>  

As far as I can remember, this logic comes from really old Chromium Kernels.
You might be right, and this barrier isn't needed... but then OTOH the comment
is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision).

I don't have RK3288 boards near me, but in any case, I'm not sure
we'd be able to test this easily (maybe there are issues that only
trigger under a certain load).

I'd personally avoid this one change, but if you are confident enough with it
that's fine too.

Thanks!
Ezequiel

>  	reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width))
>  		| H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height))
> diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> index 4df16f59fb97..b931fc5fa1a9 100644
> --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
>  		| VEPU_REG_INPUT_SWAP8
>  		| VEPU_REG_INPUT_SWAP16
>  		| VEPU_REG_INPUT_SWAP32;
> -	/* Make sure that all registers are written at this point. */
> -	vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN);
> +	vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN);
>  
>  	reg = VEPU_REG_AXI_CTRL_BURST_LEN(16);
>  	vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL);
> -- 
> 2.34.1.575.g55b058a8bb-goog
>
Chen-Yu Tsai Jan. 19, 2022, 10:08 a.m. UTC | #2
Hi,

On Wed, Jan 19, 2022 at 5:02 AM Ezequiel Garcia
<ezequiel@vanguardiasur.com.ar> wrote:
>
> Hi Chen-Yu,
>
> The series looks good, thanks for picking up this task.
>
> Just a one comment.
>
> On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote:
> > In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
> > wmb() was inserted before the final register write that starts the
> > encoder. In v11, it was removed and the second-to-last register write
> > was changed to a non-relaxed write, which has an implicit wmb() [1].
> > The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
> > is another writel_relaxed() following the non-relaxed one.
> >
> > Turns out only the last writel() needs to be non-relaxed. Device I/O
> > mappings already guarantee strict ordering to the same endpoint, and
> > the writel() triggering the hardware would force all writes to memory
> > to be observed before the writel() to the hardware is observed.
> >
> > [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/
> >
> > Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
> > ---
> >  drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
> >  drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
> >  2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > index 1450013d3685..03db1c3444f8 100644
> > --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
> >               | H1_REG_AXI_CTRL_INPUT_SWAP32
> >               | H1_REG_AXI_CTRL_OUTPUT_SWAP8
> >               | H1_REG_AXI_CTRL_INPUT_SWAP8;
> > -     /* Make sure that all registers are written at this point. */
> > -     vepu_write(vpu, reg, H1_REG_AXI_CTRL);
> > +     vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
> >
>
> As far as I can remember, this logic comes from really old Chromium Kernels.
> You might be right, and this barrier isn't needed... but then OTOH the comment
> is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision).

I just realized that my commit log is wrong.

" ... a wmb() was inserted before the final register write that starts the
encoder. ... " . It is actually before the second-to-last register write.

> I don't have RK3288 boards near me, but in any case, I'm not sure
> we'd be able to test this easily (maybe there are issues that only
> trigger under a certain load).

I see. I do have a Veyron around that I haven't used in awhile. But as you
said, it might not be an obvious hardware limitation.

> I'd personally avoid this one change, but if you are confident enough with it
> that's fine too.

Unfortunately they didn't leave a whole lot of clues around. For most hardware,
as I mentioned in the commit log, I think the final non-relaxed write should
suffice. I'd point to the decoder drivers not having any barriers or
non-relaxed writes except the final one, but IIUC they are actually two
distinct pieces of hardware.

I suspect we will never know. This JPEG encoder doesn't seem to get used
a lot. The VP8 and H.264 encoders on ChromeOS work correctly without the
extra barrier and get tested a lot, but that's only testing the RK3399.

Hans, would it be possible for you to skip this patch and pick the rest?
Or would you like me to resent without this one?


Thanks
ChenYu

> Thanks!
> Ezequiel
>
> >       reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width))
> >               | H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height))
> > diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> > index 4df16f59fb97..b931fc5fa1a9 100644
> > --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> > +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> > @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
> >               | VEPU_REG_INPUT_SWAP8
> >               | VEPU_REG_INPUT_SWAP16
> >               | VEPU_REG_INPUT_SWAP32;
> > -     /* Make sure that all registers are written at this point. */
> > -     vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN);
> > +     vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN);
> >
> >       reg = VEPU_REG_AXI_CTRL_BURST_LEN(16);
> >       vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL);
> > --
> > 2.34.1.575.g55b058a8bb-goog
> >
Ezequiel Garcia Jan. 19, 2022, 11:59 a.m. UTC | #3
On Wed, 19 Jan 2022 at 07:09, Chen-Yu Tsai <wenst@chromium.org> wrote:
>
> Hi,
>
> On Wed, Jan 19, 2022 at 5:02 AM Ezequiel Garcia
> <ezequiel@vanguardiasur.com.ar> wrote:
> >
> > Hi Chen-Yu,
> >
> > The series looks good, thanks for picking up this task.
> >
> > Just a one comment.
> >
> > On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote:
> > > In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
> > > wmb() was inserted before the final register write that starts the
> > > encoder. In v11, it was removed and the second-to-last register write
> > > was changed to a non-relaxed write, which has an implicit wmb() [1].
> > > The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
> > > is another writel_relaxed() following the non-relaxed one.
> > >
> > > Turns out only the last writel() needs to be non-relaxed. Device I/O
> > > mappings already guarantee strict ordering to the same endpoint, and
> > > the writel() triggering the hardware would force all writes to memory
> > > to be observed before the writel() to the hardware is observed.
> > >
> > > [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/
> > >
> > > Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
> > > ---
> > >  drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
> > >  drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
> > >  2 files changed, 2 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > > index 1450013d3685..03db1c3444f8 100644
> > > --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > > +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> > > @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
> > >               | H1_REG_AXI_CTRL_INPUT_SWAP32
> > >               | H1_REG_AXI_CTRL_OUTPUT_SWAP8
> > >               | H1_REG_AXI_CTRL_INPUT_SWAP8;
> > > -     /* Make sure that all registers are written at this point. */
> > > -     vepu_write(vpu, reg, H1_REG_AXI_CTRL);
> > > +     vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
> > >
> >
> > As far as I can remember, this logic comes from really old Chromium Kernels.
> > You might be right, and this barrier isn't needed... but then OTOH the comment
> > is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision).
>
> I just realized that my commit log is wrong.
>
> " ... a wmb() was inserted before the final register write that starts the
> encoder. ... " . It is actually before the second-to-last register write.
>
> > I don't have RK3288 boards near me, but in any case, I'm not sure
> > we'd be able to test this easily (maybe there are issues that only
> > trigger under a certain load).
>
> I see. I do have a Veyron around that I haven't used in awhile. But as you
> said, it might not be an obvious hardware limitation.
>
> > I'd personally avoid this one change, but if you are confident enough with it
> > that's fine too.
>
> Unfortunately they didn't leave a whole lot of clues around. For most hardware,
> as I mentioned in the commit log, I think the final non-relaxed write should
> suffice. I'd point to the decoder drivers not having any barriers or
> non-relaxed writes except the final one, but IIUC they are actually two
> distinct pieces of hardware.
>
> I suspect we will never know. This JPEG encoder doesn't seem to get used
> a lot. The VP8 and H.264 encoders on ChromeOS work correctly without the
> extra barrier and get tested a lot, but that's only testing the RK3399.
>
> Hans, would it be possible for you to skip this patch and pick the rest?
> Or would you like me to resent without this one?
>

If you decide to resend, feel free to add:

Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>

to the whole series.

Thanks,
Ezequiel
Hans Verkuil Jan. 20, 2022, 2:20 p.m. UTC | #4
Hi Chen-Yu,

I'll take patches 2-8.

So should I mark patch 1/8 as 'Rejected' or 'Changes Requested' in patchwork?

Regards,

	Hans

On 1/19/22 11:08, Chen-Yu Tsai wrote:
> Hi,
> 
> On Wed, Jan 19, 2022 at 5:02 AM Ezequiel Garcia
> <ezequiel@vanguardiasur.com.ar> wrote:
>>
>> Hi Chen-Yu,
>>
>> The series looks good, thanks for picking up this task.
>>
>> Just a one comment.
>>
>> On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote:
>>> In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
>>> wmb() was inserted before the final register write that starts the
>>> encoder. In v11, it was removed and the second-to-last register write
>>> was changed to a non-relaxed write, which has an implicit wmb() [1].
>>> The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
>>> is another writel_relaxed() following the non-relaxed one.
>>>
>>> Turns out only the last writel() needs to be non-relaxed. Device I/O
>>> mappings already guarantee strict ordering to the same endpoint, and
>>> the writel() triggering the hardware would force all writes to memory
>>> to be observed before the writel() to the hardware is observed.
>>>
>>> [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/
>>>
>>> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
>>> ---
>>>  drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
>>>  drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
>>>  2 files changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
>>> index 1450013d3685..03db1c3444f8 100644
>>> --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
>>> +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
>>> @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
>>>               | H1_REG_AXI_CTRL_INPUT_SWAP32
>>>               | H1_REG_AXI_CTRL_OUTPUT_SWAP8
>>>               | H1_REG_AXI_CTRL_INPUT_SWAP8;
>>> -     /* Make sure that all registers are written at this point. */
>>> -     vepu_write(vpu, reg, H1_REG_AXI_CTRL);
>>> +     vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
>>>
>>
>> As far as I can remember, this logic comes from really old Chromium Kernels.
>> You might be right, and this barrier isn't needed... but then OTOH the comment
>> is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision).
> 
> I just realized that my commit log is wrong.
> 
> " ... a wmb() was inserted before the final register write that starts the
> encoder. ... " . It is actually before the second-to-last register write.
> 
>> I don't have RK3288 boards near me, but in any case, I'm not sure
>> we'd be able to test this easily (maybe there are issues that only
>> trigger under a certain load).
> 
> I see. I do have a Veyron around that I haven't used in awhile. But as you
> said, it might not be an obvious hardware limitation.
> 
>> I'd personally avoid this one change, but if you are confident enough with it
>> that's fine too.
> 
> Unfortunately they didn't leave a whole lot of clues around. For most hardware,
> as I mentioned in the commit log, I think the final non-relaxed write should
> suffice. I'd point to the decoder drivers not having any barriers or
> non-relaxed writes except the final one, but IIUC they are actually two
> distinct pieces of hardware.
> 
> I suspect we will never know. This JPEG encoder doesn't seem to get used
> a lot. The VP8 and H.264 encoders on ChromeOS work correctly without the
> extra barrier and get tested a lot, but that's only testing the RK3399.
> 
> Hans, would it be possible for you to skip this patch and pick the rest?
> Or would you like me to resent without this one?
> 
> 
> Thanks
> ChenYu
> 
>> Thanks!
>> Ezequiel
>>
>>>       reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width))
>>>               | H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height))
>>> diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
>>> index 4df16f59fb97..b931fc5fa1a9 100644
>>> --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
>>> +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
>>> @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
>>>               | VEPU_REG_INPUT_SWAP8
>>>               | VEPU_REG_INPUT_SWAP16
>>>               | VEPU_REG_INPUT_SWAP32;
>>> -     /* Make sure that all registers are written at this point. */
>>> -     vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN);
>>> +     vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN);
>>>
>>>       reg = VEPU_REG_AXI_CTRL_BURST_LEN(16);
>>>       vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL);
>>> --
>>> 2.34.1.575.g55b058a8bb-goog
>>>
Chen-Yu Tsai Jan. 21, 2022, 2:25 a.m. UTC | #5
On Thu, Jan 20, 2022 at 10:20 PM Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>
> Hi Chen-Yu,
>
> I'll take patches 2-8.

Got it.

> So should I mark patch 1/8 as 'Rejected' or 'Changes Requested' in patchwork?

I'd say "Changes Requested", but it won't really progress unless someone
from Rockchip fills in the blanks about the wmb() though.

ChenYu

> Regards,
>
>         Hans
>
> On 1/19/22 11:08, Chen-Yu Tsai wrote:
> > Hi,
> >
> > On Wed, Jan 19, 2022 at 5:02 AM Ezequiel Garcia
> > <ezequiel@vanguardiasur.com.ar> wrote:
> >>
> >> Hi Chen-Yu,
> >>
> >> The series looks good, thanks for picking up this task.
> >>
> >> Just a one comment.
> >>
> >> On Fri, Jan 07, 2022 at 05:34:48PM +0800, Chen-Yu Tsai wrote:
> >>> In the earlier submissions of the Hantro/Rockchip JPEG encoder driver, a
> >>> wmb() was inserted before the final register write that starts the
> >>> encoder. In v11, it was removed and the second-to-last register write
> >>> was changed to a non-relaxed write, which has an implicit wmb() [1].
> >>> The rockchip_vpu2 (then rk3399_vpu) variant is even weirder as there
> >>> is another writel_relaxed() following the non-relaxed one.
> >>>
> >>> Turns out only the last writel() needs to be non-relaxed. Device I/O
> >>> mappings already guarantee strict ordering to the same endpoint, and
> >>> the writel() triggering the hardware would force all writes to memory
> >>> to be observed before the writel() to the hardware is observed.
> >>>
> >>> [1] https://lore.kernel.org/linux-media/CAAFQd5ArFG0hU6MgcyLd+_UOP3+T_U-aw2FXv6sE7fGqVCVGqw@mail.gmail.com/
> >>>
> >>> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
> >>> ---
> >>>  drivers/staging/media/hantro/hantro_h1_jpeg_enc.c        | 3 +--
> >>>  drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c | 3 +--
> >>>  2 files changed, 2 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> >>> index 1450013d3685..03db1c3444f8 100644
> >>> --- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> >>> +++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
> >>> @@ -123,8 +123,7 @@ int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
> >>>               | H1_REG_AXI_CTRL_INPUT_SWAP32
> >>>               | H1_REG_AXI_CTRL_OUTPUT_SWAP8
> >>>               | H1_REG_AXI_CTRL_INPUT_SWAP8;
> >>> -     /* Make sure that all registers are written at this point. */
> >>> -     vepu_write(vpu, reg, H1_REG_AXI_CTRL);
> >>> +     vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
> >>>
> >>
> >> As far as I can remember, this logic comes from really old Chromium Kernels.
> >> You might be right, and this barrier isn't needed... but then OTOH the comment
> >> is here for a reason, so maybe it is needed (or was needed on some RK3288 SoC revision).
> >
> > I just realized that my commit log is wrong.
> >
> > " ... a wmb() was inserted before the final register write that starts the
> > encoder. ... " . It is actually before the second-to-last register write.
> >
> >> I don't have RK3288 boards near me, but in any case, I'm not sure
> >> we'd be able to test this easily (maybe there are issues that only
> >> trigger under a certain load).
> >
> > I see. I do have a Veyron around that I haven't used in awhile. But as you
> > said, it might not be an obvious hardware limitation.
> >
> >> I'd personally avoid this one change, but if you are confident enough with it
> >> that's fine too.
> >
> > Unfortunately they didn't leave a whole lot of clues around. For most hardware,
> > as I mentioned in the commit log, I think the final non-relaxed write should
> > suffice. I'd point to the decoder drivers not having any barriers or
> > non-relaxed writes except the final one, but IIUC they are actually two
> > distinct pieces of hardware.
> >
> > I suspect we will never know. This JPEG encoder doesn't seem to get used
> > a lot. The VP8 and H.264 encoders on ChromeOS work correctly without the
> > extra barrier and get tested a lot, but that's only testing the RK3399.
> >
> > Hans, would it be possible for you to skip this patch and pick the rest?
> > Or would you like me to resent without this one?
> >
> >
> > Thanks
> > ChenYu
> >
> >> Thanks!
> >> Ezequiel
> >>
> >>>       reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width))
> >>>               | H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height))
> >>> diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> >>> index 4df16f59fb97..b931fc5fa1a9 100644
> >>> --- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> >>> +++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
> >>> @@ -152,8 +152,7 @@ int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
> >>>               | VEPU_REG_INPUT_SWAP8
> >>>               | VEPU_REG_INPUT_SWAP16
> >>>               | VEPU_REG_INPUT_SWAP32;
> >>> -     /* Make sure that all registers are written at this point. */
> >>> -     vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN);
> >>> +     vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN);
> >>>
> >>>       reg = VEPU_REG_AXI_CTRL_BURST_LEN(16);
> >>>       vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL);
> >>> --
> >>> 2.34.1.575.g55b058a8bb-goog
> >>>
diff mbox series

Patch

diff --git a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
index 1450013d3685..03db1c3444f8 100644
--- a/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
+++ b/drivers/staging/media/hantro/hantro_h1_jpeg_enc.c
@@ -123,8 +123,7 @@  int hantro_h1_jpeg_enc_run(struct hantro_ctx *ctx)
 		| H1_REG_AXI_CTRL_INPUT_SWAP32
 		| H1_REG_AXI_CTRL_OUTPUT_SWAP8
 		| H1_REG_AXI_CTRL_INPUT_SWAP8;
-	/* Make sure that all registers are written at this point. */
-	vepu_write(vpu, reg, H1_REG_AXI_CTRL);
+	vepu_write_relaxed(vpu, reg, H1_REG_AXI_CTRL);
 
 	reg = H1_REG_ENC_CTRL_WIDTH(MB_WIDTH(ctx->src_fmt.width))
 		| H1_REG_ENC_CTRL_HEIGHT(MB_HEIGHT(ctx->src_fmt.height))
diff --git a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
index 4df16f59fb97..b931fc5fa1a9 100644
--- a/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
+++ b/drivers/staging/media/hantro/rockchip_vpu2_hw_jpeg_enc.c
@@ -152,8 +152,7 @@  int rockchip_vpu2_jpeg_enc_run(struct hantro_ctx *ctx)
 		| VEPU_REG_INPUT_SWAP8
 		| VEPU_REG_INPUT_SWAP16
 		| VEPU_REG_INPUT_SWAP32;
-	/* Make sure that all registers are written at this point. */
-	vepu_write(vpu, reg, VEPU_REG_DATA_ENDIAN);
+	vepu_write_relaxed(vpu, reg, VEPU_REG_DATA_ENDIAN);
 
 	reg = VEPU_REG_AXI_CTRL_BURST_LEN(16);
 	vepu_write_relaxed(vpu, reg, VEPU_REG_AXI_CTRL);