Message ID | 1396650665-6992-1-git-send-email-swarren@wwwdotorg.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Apr 04, 2014 at 04:31:05PM -0600, Stephen Warren wrote: > From: Stephen Warren <swarren@nvidia.com> > > BIT_WORD() truncates rather than rounds, so the loops in > syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <= > rather than < in an attempt to process the correct number of registers > when rounding of the conversion of count of bits to count of words is > necessary. However, when rounding isn't necessary because the value is > already a multiple of the divisor (as is the case for all values of > nb_pts the code actually sees), this causes one too many registers to > be processed. > > Solve this by using and explicit DIV_ROUND_UP() call, rather than > BIT_WORD(), and comparing with < rather than <=. > > Signed-off-by: Stephen Warren <swarren@nvidia.com> > --- > v2: Use DIV_ROUND_UP rather than BITS_TO_LONGS to avoid problems on 64-bit. > --- > drivers/gpu/host1x/hw/intr_hw.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) If I understand correctly there's no immediate need for this to go to stable kernels, nor for it to be queued for 3.15, right? That is the potential extra write isn't causing any harm on actual hardware, is it? In that case I'll queue this up for 3.16. Thierry
On 05.04.2014 01:31, Stephen Warren wrote: > From: Stephen Warren <swarren@nvidia.com> > > diff --git a/drivers/gpu/host1x/hw/intr_hw.c b/drivers/gpu/host1x/hw/intr_hw.c > index db9017adfe2b..498b37e39058 100644 > --- a/drivers/gpu/host1x/hw/intr_hw.c > +++ b/drivers/gpu/host1x/hw/intr_hw.c > @@ -47,7 +47,7 @@ static irqreturn_t syncpt_thresh_isr(int irq, void *dev_id) > unsigned long reg; > int i, id; > > - for (i = 0; i <= BIT_WORD(host->info->nb_pts); i++) { > + for (i = 0; i < DIV_ROUND_UP(host->info->nb_pts, 32); i++) { > reg = host1x_sync_readl(host, > HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS(i)); > for_each_set_bit(id, ®, BITS_PER_LONG) { > @@ -64,7 +64,7 @@ static void _host1x_intr_disable_all_syncpt_intrs(struct host1x *host) > { > u32 i; > > - for (i = 0; i <= BIT_WORD(host->info->nb_pts); ++i) { > + for (i = 0; i < DIV_ROUND_UP(host->info->nb_pts, 32); ++i) { > host1x_sync_writel(host, 0xffffffffu, > HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE(i)); > host1x_sync_writel(host, 0xffffffffu, > Acked-By: Terje Bergstrom <tbergstrom@nvidia.com> Terje
On 07.04.2014 11:18, Thierry Reding wrote: > If I understand correctly there's no immediate need for this to go to > stable kernels, nor for it to be queued for 3.15, right? That is the > potential extra write isn't causing any harm on actual hardware, is it? > > In that case I'll queue this up for 3.16. The reads and writes would get ignored on 32-bit kernel. The change does fix sync point behavior in 64-bit kernel, so it is fixing a real issue. Terje
On Mon, Apr 07, 2014 at 11:34:22AM +0300, Terje Bergström wrote: > On 07.04.2014 11:18, Thierry Reding wrote: > > If I understand correctly there's no immediate need for this to go to > > stable kernels, nor for it to be queued for 3.15, right? That is the > > potential extra write isn't causing any harm on actual hardware, is it? > > > > In that case I'll queue this up for 3.16. > > The reads and writes would get ignored on 32-bit kernel. The change does > fix sync point behavior in 64-bit kernel, so it is fixing a real issue. Okay, but given that we don't support any 64 bit Tegra hardware upstream yet, 3.16 would still be enough, wouldn't it? Thierry
On 07.04.2014 11:41, Thierry Reding wrote: > On Mon, Apr 07, 2014 at 11:34:22AM +0300, Terje Bergström wrote: >> On 07.04.2014 11:18, Thierry Reding wrote: >>> If I understand correctly there's no immediate need for this to go to >>> stable kernels, nor for it to be queued for 3.15, right? That is the >>> potential extra write isn't causing any harm on actual hardware, is it? >>> >>> In that case I'll queue this up for 3.16. >> >> The reads and writes would get ignored on 32-bit kernel. The change does >> fix sync point behavior in 64-bit kernel, so it is fixing a real issue. > > Okay, but given that we don't support any 64 bit Tegra hardware upstream > yet, 3.16 would still be enough, wouldn't it? Sure, sounds good. Terje
On 04/07/2014 02:18 AM, Thierry Reding wrote: > On Fri, Apr 04, 2014 at 04:31:05PM -0600, Stephen Warren wrote: >> From: Stephen Warren <swarren@nvidia.com> >> >> BIT_WORD() truncates rather than rounds, so the loops in >> syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <= >> rather than < in an attempt to process the correct number of registers >> when rounding of the conversion of count of bits to count of words is >> necessary. However, when rounding isn't necessary because the value is >> already a multiple of the divisor (as is the case for all values of >> nb_pts the code actually sees), this causes one too many registers to >> be processed. >> >> Solve this by using and explicit DIV_ROUND_UP() call, rather than >> BIT_WORD(), and comparing with < rather than <=. >> >> Signed-off-by: Stephen Warren <swarren@nvidia.com> >> --- >> v2: Use DIV_ROUND_UP rather than BITS_TO_LONGS to avoid problems on 64-bit. >> --- >> drivers/gpu/host1x/hw/intr_hw.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) > > If I understand correctly there's no immediate need for this to go to > stable kernels, nor for it to be queued for 3.15, right? That is the > potential extra write isn't causing any harm on actual hardware, is it? > > In that case I'll queue this up for 3.16. We should definitely apply this, and as far back as the code exists, since the SW is touching non-existent registers, and that is presumably undefined behaviour, which could potentially cause hard-to-diagnose bugs. Besides, I want the mainline kernel to run on our simulator without having to maintain patches for fixed issues.
On 04/04/2014 04:31 PM, Stephen Warren wrote: > From: Stephen Warren <swarren@nvidia.com> > > BIT_WORD() truncates rather than rounds, so the loops in > syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <= > rather than < in an attempt to process the correct number of registers > when rounding of the conversion of count of bits to count of words is > necessary. However, when rounding isn't necessary because the value is > already a multiple of the divisor (as is the case for all values of > nb_pts the code actually sees), this causes one too many registers to > be processed. > > Solve this by using and explicit DIV_ROUND_UP() call, rather than > BIT_WORD(), and comparing with < rather than <=. I don't see this in linux-next yet.
On Mon, Apr 14, 2014 at 02:53:51PM -0600, Stephen Warren wrote: > On 04/04/2014 04:31 PM, Stephen Warren wrote: > > From: Stephen Warren <swarren@nvidia.com> > > > > BIT_WORD() truncates rather than rounds, so the loops in > > syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <= > > rather than < in an attempt to process the correct number of registers > > when rounding of the conversion of count of bits to count of words is > > necessary. However, when rounding isn't necessary because the value is > > already a multiple of the divisor (as is the case for all values of > > nb_pts the code actually sees), this causes one too many registers to > > be processed. > > > > Solve this by using and explicit DIV_ROUND_UP() call, rather than > > BIT_WORD(), and comparing with < rather than <=. > > I don't see this in linux-next yet. I've queued this locally but haven't pushed anything out yet. Thierry
On Mon, Apr 14, 2014 at 02:53:51PM -0600, Stephen Warren wrote: > On 04/04/2014 04:31 PM, Stephen Warren wrote: > > From: Stephen Warren <swarren@nvidia.com> > > > > BIT_WORD() truncates rather than rounds, so the loops in > > syncpt_thresh_isr() and _host1x_intr_disable_all_syncpt_intrs() use <= > > rather than < in an attempt to process the correct number of registers > > when rounding of the conversion of count of bits to count of words is > > necessary. However, when rounding isn't necessary because the value is > > already a multiple of the divisor (as is the case for all values of > > nb_pts the code actually sees), this causes one too many registers to > > be processed. > > > > Solve this by using and explicit DIV_ROUND_UP() call, rather than > > BIT_WORD(), and comparing with < rather than <=. > > I don't see this in linux-next yet. Just in case you haven't noticed, this this was merged in v3.15-rc2. I've also Cc'ed stable so that it can be applied as far back as 3.10 when the code it fixes was introduced. Thierry
diff --git a/drivers/gpu/host1x/hw/intr_hw.c b/drivers/gpu/host1x/hw/intr_hw.c index db9017adfe2b..498b37e39058 100644 --- a/drivers/gpu/host1x/hw/intr_hw.c +++ b/drivers/gpu/host1x/hw/intr_hw.c @@ -47,7 +47,7 @@ static irqreturn_t syncpt_thresh_isr(int irq, void *dev_id) unsigned long reg; int i, id; - for (i = 0; i <= BIT_WORD(host->info->nb_pts); i++) { + for (i = 0; i < DIV_ROUND_UP(host->info->nb_pts, 32); i++) { reg = host1x_sync_readl(host, HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS(i)); for_each_set_bit(id, ®, BITS_PER_LONG) { @@ -64,7 +64,7 @@ static void _host1x_intr_disable_all_syncpt_intrs(struct host1x *host) { u32 i; - for (i = 0; i <= BIT_WORD(host->info->nb_pts); ++i) { + for (i = 0; i < DIV_ROUND_UP(host->info->nb_pts, 32); ++i) { host1x_sync_writel(host, 0xffffffffu, HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE(i)); host1x_sync_writel(host, 0xffffffffu,