[V2] gpu: host1x: handle the correct # of syncpt regs

Message ID	1396650665-6992-1-git-send-email-swarren@wwwdotorg.org (mailing list archive)
State	New, archived
Headers	Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: Stephen Warren <swarren@wwwdotorg.org> To: Thierry Reding <thierry.reding@gmail.com>, Terje Bergström <tbergstrom@nvidia.com> Subject: [PATCH V2] gpu: host1x: handle the correct # of syncpt regs Date: Fri, 4 Apr 2014 16:31:05 -0600 Message-Id: <1396650665-6992-1-git-send-email-swarren@wwwdotorg.org> Cc: linux-tegra@vger.kernel.org, Stephen Warren <swarren@nvidia.com>, dri-devel@lists.freedesktop.org Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Stephen Warren April 4, 2014, 10:31 p.m. UTC

From: Stephen Warren <swarren@nvidia.com>

BIT_WORD() truncates rather than rounds, so the loops in
syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <=
rather than < in an attempt to process the correct number of registers
when rounding of the conversion of count of bits to count of words is
necessary. However, when rounding isn't necessary because the value is
already a multiple of the divisor (as is the case for all values of
nb_pts the code actually sees), this causes one too many registers to
be processed.

Solve this by using an explicit DIV_ROUND_UP() call, rather than
BIT_WORD(), and comparing with < rather than <=.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
---
v2: Use DIV_ROUND_UP rather than BITS_TO_LONGS to avoid problems on 64-bit.
---
 drivers/gpu/host1x/hw/intr_hw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
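
The off-by-one described in the commit message can be checked with a small standalone sketch. The macros below are local stand-ins for the kernel ones, BITS_PER_LONG is fixed at 32 to model a 32-bit kernel, and old_loop_count() and new_loop_count() are hypothetical helper names that are not part of the patch.

```c
/* Stand-ins for the kernel macros; BITS_PER_LONG is fixed at 32 here to
 * model a 32-bit kernel (an assumption made for this illustration). */
#define BITS_PER_LONG 32
#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Iterations of the old loop: for (i = 0; i <= BIT_WORD(nb_pts); i++) */
unsigned int old_loop_count(unsigned int nb_pts)
{
	unsigned int i, count = 0;

	for (i = 0; i <= BIT_WORD(nb_pts); i++)
		count++;
	return count;
}

/* Iterations of the new loop: for (i = 0; i < DIV_ROUND_UP(nb_pts, 32); i++) */
unsigned int new_loop_count(unsigned int nb_pts)
{
	unsigned int i, count = 0;

	for (i = 0; i < DIV_ROUND_UP(nb_pts, 32); i++)
		count++;
	return count;
}
```

With nb_pts = 32, the old bound yields two iterations although 32 sync points fit in a single 32-bit register, while the new bound yields one. For a value that is not a multiple of 32, such as 40 (chosen only for illustration), both bounds yield two.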

Thierry Reding April 7, 2014, 8:18 a.m. UTC | #1

On Fri, Apr 04, 2014 at 04:31:05PM -0600, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> BIT_WORD() truncates rather than rounds, so the loops in
> syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <=
> rather than < in an attempt to process the correct number of registers
> when rounding of the conversion of count of bits to count of words is
> necessary. However, when rounding isn't necessary because the value is
> already a multiple of the divisor (as is the case for all values of
> nb_pts the code actually sees), this causes one too many registers to
> be processed.
> 
> Solve this by using an explicit DIV_ROUND_UP() call, rather than
> BIT_WORD(), and comparing with < rather than <=.
> 
> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> ---
> v2: Use DIV_ROUND_UP rather than BITS_TO_LONGS to avoid problems on 64-bit.
> ---
>  drivers/gpu/host1x/hw/intr_hw.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

If I understand correctly there's no immediate need for this to go to
stable kernels, nor for it to be queued for 3.15, right? That is the
potential extra write isn't causing any harm on actual hardware, is it?

In that case I'll queue this up for 3.16.

Thierry

Terje Bergstrom April 7, 2014, 8:32 a.m. UTC | #2

On 05.04.2014 01:31, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> diff --git a/drivers/gpu/host1x/hw/intr_hw.c b/drivers/gpu/host1x/hw/intr_hw.c
> index db9017adfe2b..498b37e39058 100644
> --- a/drivers/gpu/host1x/hw/intr_hw.c
> +++ b/drivers/gpu/host1x/hw/intr_hw.c
> @@ -47,7 +47,7 @@ static irqreturn_t syncpt_thresh_isr(int irq, void *dev_id)
>  	unsigned long reg;
>  	int i, id;
>  
> -	for (i = 0; i <= BIT_WORD(host->info->nb_pts); i++) {
> +	for (i = 0; i < DIV_ROUND_UP(host->info->nb_pts, 32); i++) {
>  		reg = host1x_sync_readl(host,
>  			HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS(i));
>  		for_each_set_bit(id, &reg, BITS_PER_LONG) {
> @@ -64,7 +64,7 @@ static void _host1x_intr_disable_all_syncpt_intrs(struct host1x *host)
>  {
>  	u32 i;
>  
> -	for (i = 0; i <= BIT_WORD(host->info->nb_pts); ++i) {
> +	for (i = 0; i < DIV_ROUND_UP(host->info->nb_pts, 32); ++i) {
>  		host1x_sync_writel(host, 0xffffffffu,
>  			HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE(i));
>  		host1x_sync_writel(host, 0xffffffffu,
> 

Acked-by: Terje Bergstrom <tbergstrom@nvidia.com>

Terje

Terje Bergstrom April 7, 2014, 8:34 a.m. UTC | #3

On 07.04.2014 11:18, Thierry Reding wrote:
> If I understand correctly there's no immediate need for this to go to
> stable kernels, nor for it to be queued for 3.15, right? That is the
> potential extra write isn't causing any harm on actual hardware, is it?
> 
> In that case I'll queue this up for 3.16.

The reads and writes would get ignored on a 32-bit kernel. The change does
fix sync point behavior in a 64-bit kernel, so it is fixing a real issue.

Terje
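
One way to see why the kernel word size matters here, and why v2 divides by a literal 32 rather than using BITS_TO_LONGS(), is to compute the register count with the word size made explicit. This is only a sketch: regs_by_long() and regs_by_32() are hypothetical helpers, and the 32-bit register width is an assumption taken from the divisor used in the patch.

```c
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* BITS_TO_LONGS()-style count, with the word size passed in explicitly
 * (hypothetical helper; the kernel macro uses the build's long size). */
unsigned int regs_by_long(unsigned int nb_pts, unsigned int bits_per_long)
{
	return DIV_ROUND_UP(nb_pts, bits_per_long);
}

/* Count as computed by the patch: a fixed 32 sync points per register. */
unsigned int regs_by_32(unsigned int nb_pts)
{
	return DIV_ROUND_UP(nb_pts, 32);
}
```

For an illustrative nb_pts of 64, regs_by_long(64, 32) and regs_by_32(64) both give 2, but regs_by_long(64, 64) gives 1, so a count tied to the size of long would cover only half of the 32-bit registers on a 64-bit build.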

Thierry Reding April 7, 2014, 8:41 a.m. UTC | #4

On Mon, Apr 07, 2014 at 11:34:22AM +0300, Terje Bergström wrote:
> On 07.04.2014 11:18, Thierry Reding wrote:
> > If I understand correctly there's no immediate need for this to go to
> > stable kernels, nor for it to be queued for 3.15, right? That is the
> > potential extra write isn't causing any harm on actual hardware, is it?
> > 
> > In that case I'll queue this up for 3.16.
> 
> The reads and writes would get ignored on a 32-bit kernel. The change does
> fix sync point behavior in a 64-bit kernel, so it is fixing a real issue.

Okay, but given that we don't support any 64 bit Tegra hardware upstream
yet, 3.16 would still be enough, wouldn't it?

Thierry

Terje Bergstrom April 7, 2014, 8:47 a.m. UTC | #5

On 07.04.2014 11:41, Thierry Reding wrote:
> On Mon, Apr 07, 2014 at 11:34:22AM +0300, Terje Bergström wrote:
>> On 07.04.2014 11:18, Thierry Reding wrote:
>>> If I understand correctly there's no immediate need for this to go to
>>> stable kernels, nor for it to be queued for 3.15, right? That is the
>>> potential extra write isn't causing any harm on actual hardware, is it?
>>>
>>> In that case I'll queue this up for 3.16.
>>
>> The reads and writes would get ignored on a 32-bit kernel. The change does
>> fix sync point behavior in a 64-bit kernel, so it is fixing a real issue.
> 
> Okay, but given that we don't support any 64 bit Tegra hardware upstream
> yet, 3.16 would still be enough, wouldn't it?

Sure, sounds good.

Terje

Stephen Warren April 7, 2014, 3:39 p.m. UTC | #6

On 04/07/2014 02:18 AM, Thierry Reding wrote:
> On Fri, Apr 04, 2014 at 04:31:05PM -0600, Stephen Warren wrote:
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> BIT_WORD() truncates rather than rounds, so the loops in
>> syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <=
>> rather than < in an attempt to process the correct number of registers
>> when rounding of the conversion of count of bits to count of words is
>> necessary. However, when rounding isn't necessary because the value is
>> already a multiple of the divisor (as is the case for all values of
>> nb_pts the code actually sees), this causes one too many registers to
>> be processed.
>>
>> Solve this by using an explicit DIV_ROUND_UP() call, rather than
>> BIT_WORD(), and comparing with < rather than <=.
>>
>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>> ---
>> v2: Use DIV_ROUND_UP rather than BITS_TO_LONGS to avoid problems on 64-bit.
>> ---
>>  drivers/gpu/host1x/hw/intr_hw.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> If I understand correctly there's no immediate need for this to go to
> stable kernels, nor for it to be queued for 3.15, right? That is the
> potential extra write isn't causing any harm on actual hardware, is it?
> 
> In that case I'll queue this up for 3.16.

We should definitely apply this, and as far back as the code exists,
since the SW is touching non-existent registers, and that is presumably
undefined behaviour, which could potentially cause hard-to-diagnose bugs.

Besides, I want the mainline kernel to run on our simulator without
having to maintain patches for fixed issues.

Stephen Warren April 14, 2014, 8:53 p.m. UTC | #7

On 04/04/2014 04:31 PM, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> BIT_WORD() truncates rather than rounds, so the loops in
> syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <=
> rather than < in an attempt to process the correct number of registers
> when rounding of the conversion of count of bits to count of words is
> necessary. However, when rounding isn't necessary because the value is
> already a multiple of the divisor (as is the case for all values of
> nb_pts the code actually sees), this causes one too many registers to
> be processed.
> 
> Solve this by using an explicit DIV_ROUND_UP() call, rather than
> BIT_WORD(), and comparing with < rather than <=.

I don't see this in linux-next yet.

Thierry Reding April 14, 2014, 9:13 p.m. UTC | #8

On Mon, Apr 14, 2014 at 02:53:51PM -0600, Stephen Warren wrote:
> On 04/04/2014 04:31 PM, Stephen Warren wrote:
> > From: Stephen Warren <swarren@nvidia.com>
> > 
> > BIT_WORD() truncates rather than rounds, so the loops in
> > syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <=
> > rather than < in an attempt to process the correct number of registers
> > when rounding of the conversion of count of bits to count of words is
> > necessary. However, when rounding isn't necessary because the value is
> > already a multiple of the divisor (as is the case for all values of
> > nb_pts the code actually sees), this causes one too many registers to
> > be processed.
> > 
> > Solve this by using an explicit DIV_ROUND_UP() call, rather than
> > BIT_WORD(), and comparing with < rather than <=.
> 
> I don't see this in linux-next yet.

I've queued this locally but haven't pushed anything out yet.

Thierry

Thierry Reding April 22, 2014, 7:15 a.m. UTC | #9

On Mon, Apr 14, 2014 at 02:53:51PM -0600, Stephen Warren wrote:
> On 04/04/2014 04:31 PM, Stephen Warren wrote:
> > From: Stephen Warren <swarren@nvidia.com>
> > 
> > BIT_WORD() truncates rather than rounds, so the loops in
> > syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <=
> > rather than < in an attempt to process the correct number of registers
> > when rounding of the conversion of count of bits to count of words is
> > necessary. However, when rounding isn't necessary because the value is
> > already a multiple of the divisor (as is the case for all values of
> > nb_pts the code actually sees), this causes one too many registers to
> > be processed.
> > 
> > Solve this by using an explicit DIV_ROUND_UP() call, rather than
> > BIT_WORD(), and comparing with < rather than <=.
> 
> I don't see this in linux-next yet.

Just in case you haven't noticed, this was merged in v3.15-rc2.
I've also Cc'ed stable so that it can be applied as far back as 3.10
when the code it fixes was introduced.

Thierry
