diff mbox series

[v2] net: macb: Restart tx only if queue pointer is lagging

Message ID 20220407161659.14532-1-tomas.melin@vaisala.com (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Headers show
Series [v2] net: macb: Restart tx only if queue pointer is lagging | expand

Checks

Context Check Description
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 25 this patch: 25
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 25 this patch: 25
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 20 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Tomas Melin April 7, 2022, 4:16 p.m. UTC
commit 4298388574da ("net: macb: restart tx after tx used bit read")
added support for restarting transmission. Restarting tx does not work
in case controller asserts TXUBR interrupt and TQBP is already at the end
of the tx queue. In that situation, restarting tx will immediately cause
assertion of another TXUBR interrupt. The driver will end up in an infinite
interrupt loop which it cannot break out of.

For cases where TQBP is at the end of the tx queue, instead
only clear TX_USED interrupt. As more data gets pushed to the queue,
transmission will resume.

This issue was observed on a Xilinx Zynq-7000 based board.
During stress test of the network interface,
driver would get stuck on interrupt loop within seconds or minutes
causing CPU to stall.

Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
---
Changes v2:
- change referenced commit to use original commit ID instead of stable branch ID

 drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Claudiu Beznea April 8, 2022, 7:42 a.m. UTC | #1
Hi, Tomas,

I'm returning to this new thread.

Sorry for the long delay. I looked though my emails for the steps to
reproduce the bug that introduces macb_tx_restart() but haven't found them.
Though the code in this patch should not affect at all SAMA5D4.

I have tested anyway SAMA5D4 with and without your code and saw no issues.
In case Dave, Jakub want to merge it you can add my
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>

The only thing with this patch, as mention earlier, is that freeing of
packet N may depend on sending packet N+1 and if packet N+1 blocks again
the HW then the freeing of packets N, N+1 may depend on packet N+2 etc. But
from your investigation it seems hardware has some bugs.

FYI, I looked though Xilinx github repository and saw no patches on macb
that may be related to this issue.

Anyway, it would be good if there would be some replies from Xilinx or at
least Cadence people on this (previous thread at [1]).

Thank you,
Claudiu Beznea

[1]
https://lore.kernel.org/netdev/82276bf7-72a5-6a2e-ff33-f8fe0c5e4a90@microchip.com/T/#m644c84a8709a65c40b8fc15a589e83b24e48ccfd

On 07.04.2022 19:16, Tomas Melin wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> commit 4298388574da ("net: macb: restart tx after tx used bit read")
> added support for restarting transmission. Restarting tx does not work
> in case controller asserts TXUBR interrupt and TQBP is already at the end
> of the tx queue. In that situation, restarting tx will immediately cause
> assertion of another TXUBR interrupt. The driver will end up in an infinite
> interrupt loop which it cannot break out of.
> 
> For cases where TQBP is at the end of the tx queue, instead
> only clear TX_USED interrupt. As more data gets pushed to the queue,
> transmission will resume.
> 
> This issue was observed on a Xilinx Zynq-7000 based board.
> During stress test of the network interface,
> driver would get stuck on interrupt loop within seconds or minutes
> causing CPU to stall.
> 
> Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
> ---
> Changes v2:
> - change referenced commit to use original commit ID instead of stable branch ID
> 
>  drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 800d5ced5800..e475be29845c 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue *queue)
>         unsigned int head = queue->tx_head;
>         unsigned int tail = queue->tx_tail;
>         struct macb *bp = queue->bp;
> +       unsigned int head_idx, tbqp;
> 
>         if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>                 queue_writel(queue, ISR, MACB_BIT(TXUBR));
> @@ -1665,6 +1666,13 @@ static void macb_tx_restart(struct macb_queue *queue)
>         if (head == tail)
>                 return;
> 
> +       tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> +       tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> +       head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> +
> +       if (tbqp == head_idx)
> +               return;
> +
>         macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>  }
> 
> --
> 2.35.1
>
Harini Katakam April 8, 2022, 8:47 a.m. UTC | #2
Hi Claudiu, Tomas,

> -----Original Message-----
> From: Claudiu.Beznea@microchip.com <Claudiu.Beznea@microchip.com>
> Sent: Friday, April 8, 2022 1:13 PM
> To: tomas.melin@vaisala.com; netdev@vger.kernel.org
> Cc: Nicolas.Ferre@microchip.com; davem@davemloft.net; kuba@kernel.org;
> pabeni@redhat.com; Harini Katakam <harinik@xilinx.com>; Shubhrajyoti
> Datta <shubhraj@xilinx.com>; Michal Simek <michals@xilinx.com>;
> pthombar@cadence.com; mparab@cadence.com; rafalo@cadence.com
> Subject: Re: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
> 
> Hi, Tomas,
> 
> I'm returning to this new thread.
> 
> Sorry for the long delay. I looked though my emails for the steps to
> reproduce the bug that introduces macb_tx_restart() but haven't found
> them.
> Though the code in this patch should not affect at all SAMA5D4.
> 
> I have tested anyway SAMA5D4 with and without your code and saw no
> issues.
> In case Dave, Jakub want to merge it you can add my
> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
> 
> The only thing with this patch, as mention earlier, is that freeing of packet N
> may depend on sending packet N+1 and if packet N+1 blocks again the HW
> then the freeing of packets N, N+1 may depend on packet N+2 etc. But from
> your investigation it seems hardware has some bugs.
> 
> FYI, I looked though Xilinx github repository and saw no patches on macb that
> may be related to this issue.
> 
> Anyway, it would be good if there would be some replies from Xilinx or at
> least Cadence people on this (previous thread at [1]).

Sorry for the delayed response.
I saw the condition you described and I'm not able to reproduce it.
But I agree with your assessment that restarting TX will not help in this case.
Also, the original patch restarting TX was also not reproduced on Zynq board
easily. We've had some users report the issue after > 1hr of traffic but that was
on a 4.xx kernel and I'm afraid I don’t have a case where I can reproduce the
original issue Claudiu described on any 5.xx kernel.

Based on the thread, there is one possibility for a HW bug that controller fails to
generate TCOMP when a TXUBR and restart conditions occur because these interrupts
are edge triggered on Zynq.

I'm going to check the errata and let you know if I find anything relevant and also
request Cadence folks to comment.
I'm sorry ask but is this condition reproducible on any later variants of the IP in Xilinx or
non-Xilinx devices?
Zynq US+ MPSoC has the r1p07 while Zynq has the older version IP r1p23 (old versioning)

Regards,
Harini

> 
> Thank you,
> Claudiu Beznea
> 
> [1]
> https://lore.kernel.org/netdev/82276bf7-72a5-6a2e-ff33-
> f8fe0c5e4a90@microchip.com/T/#m644c84a8709a65c40b8fc15a589e83b24e4
> 8ccfd
> 
> On 07.04.2022 19:16, Tomas Melin wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know
> > the content is safe
> >
> > commit 4298388574da ("net: macb: restart tx after tx used bit read")
> > added support for restarting transmission. Restarting tx does not work
> > in case controller asserts TXUBR interrupt and TQBP is already at the
> > end of the tx queue. In that situation, restarting tx will immediately
> > cause assertion of another TXUBR interrupt. The driver will end up in
> > an infinite interrupt loop which it cannot break out of.
> >
> > For cases where TQBP is at the end of the tx queue, instead only clear
> > TX_USED interrupt. As more data gets pushed to the queue, transmission
> > will resume.
> >
> > This issue was observed on a Xilinx Zynq-7000 based board.
> > During stress test of the network interface, driver would get stuck on
> > interrupt loop within seconds or minutes causing CPU to stall.
> >
> > Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
> > ---
> > Changes v2:
> > - change referenced commit to use original commit ID instead of stable
> > branch ID
> >
> >  drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/cadence/macb_main.c
> > b/drivers/net/ethernet/cadence/macb_main.c
> > index 800d5ced5800..e475be29845c 100644
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue
> *queue)
> >         unsigned int head = queue->tx_head;
> >         unsigned int tail = queue->tx_tail;
> >         struct macb *bp = queue->bp;
> > +       unsigned int head_idx, tbqp;
> >
> >         if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> >                 queue_writel(queue, ISR, MACB_BIT(TXUBR)); @@ -1665,6
> > +1666,13 @@ static void macb_tx_restart(struct macb_queue *queue)
> >         if (head == tail)
> >                 return;
> >
> > +       tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> > +       tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> > +       head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp,
> > + head));
> > +
> > +       if (tbqp == head_idx)
> > +               return;
> > +
> >         macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > }
> >
> > --
> > 2.35.1
> >
Tomas Melin April 8, 2022, 9:57 a.m. UTC | #3
Hi Claudiu, Harini,

On 08/04/2022 11:47, Harini Katakam wrote:
> Hi Claudiu, Tomas,
> 
>> -----Original Message-----
>> From: Claudiu.Beznea@microchip.com <Claudiu.Beznea@microchip.com>
>> Sent: Friday, April 8, 2022 1:13 PM
>> To: tomas.melin@vaisala.com; netdev@vger.kernel.org
>> Cc: Nicolas.Ferre@microchip.com; davem@davemloft.net; kuba@kernel.org;
>> pabeni@redhat.com; Harini Katakam <harinik@xilinx.com>; Shubhrajyoti
>> Datta <shubhraj@xilinx.com>; Michal Simek <michals@xilinx.com>;
>> pthombar@cadence.com; mparab@cadence.com; rafalo@cadence.com
>> Subject: Re: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
>>
>> Hi, Tomas,
>>
>> I'm returning to this new thread.
>>
>> Sorry for the long delay. I looked though my emails for the steps to
>> reproduce the bug that introduces macb_tx_restart() but haven't found
>> them.
>> Though the code in this patch should not affect at all SAMA5D4.
>>
>> I have tested anyway SAMA5D4 with and without your code and saw no
>> issues.
>> In case Dave, Jakub want to merge it you can add my
>> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
>> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>

Thank you for the effort to review and test this! Also thanks for the
discussions around this issue to provide further insights.


>>
>> The only thing with this patch, as mention earlier, is that freeing of packet N
>> may depend on sending packet N+1 and if packet N+1 blocks again the HW
>> then the freeing of packets N, N+1 may depend on packet N+2 etc. But from
>> your investigation it seems hardware has some bugs.

Indeed, this is not behaviour I have encountered in any testing. If we
were ever to encounter such issue, then it would need to be handled in
separate manner. Perhaps call tx_interrupt() to progress the queue. But
then again, this does not seem to happen.

>>
>> FYI, I looked though Xilinx github repository and saw no patches on macb that
>> may be related to this issue.
>>
>> Anyway, it would be good if there would be some replies from Xilinx or at
>> least Cadence people on this (previous thread at [1]).
> 
> Sorry for the delayed response.
> I saw the condition you described and I'm not able to reproduce it.
> But I agree with your assessment that restarting TX will not help in this case.
> Also, the original patch restarting TX was also not reproduced on Zynq board
> easily. We've had some users report the issue after > 1hr of traffic but that was
> on a 4.xx kernel and I'm afraid I don’t have a case where I can reproduce the
> original issue Claudiu described on any 5.xx kernel.
> 
> Based on the thread, there is one possibility for a HW bug that controller fails to
> generate TCOMP when a TXUBR and restart conditions occur because these interrupts
> are edge triggered on Zynq.

This is interesting hypothesis and that would indeed lead to this situation.


> 
> I'm going to check the errata and let you know if I find anything relevant and also
> request Cadence folks to comment.
> I'm sorry ask but is this condition reproducible on any later variants of the IP in Xilinx or
> non-Xilinx devices?

I have not seen this issue on MPSoC (atleast yet). Indeed this issue
seems to require the correct timing conditions for being able to trigger it.

So any additional information that we might get about possible issues in
IP is welcomed. However, the hardware on the boards we have at hand will
still be the same so the patch as such is relevant.

BR,
Tomas



> Zynq US+ MPSoC has the r1p07 while Zynq has the older version IP r1p23 (old versioning)
> 
> Regards,
> Harini
> 
>>
>> Thank you,
>> Claudiu Beznea
>>
>> [1]
>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fnetdev%2F82276bf7-72a5-6a2e-ff33-&amp;data=04%7C01%7Ctomas.melin%40vaisala.com%7C352a532fe14b42ad01d508da193c6320%7C6d7393e041f54c2e9b124c2be5da5c57%7C0%7C0%7C637850044400650522%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=rsBnJEVlDqpSUIfL%2BuXzAgTUL4w9rqaR6A6OLAi9gNQ%3D&amp;reserved=0
>> f8fe0c5e4a90@microchip.com/T/#m644c84a8709a65c40b8fc15a589e83b24e4
>> 8ccfd
>>
>> On 07.04.2022 19:16, Tomas Melin wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know
>>> the content is safe
>>>
>>> commit 4298388574da ("net: macb: restart tx after tx used bit read")
>>> added support for restarting transmission. Restarting tx does not work
>>> in case controller asserts TXUBR interrupt and TQBP is already at the
>>> end of the tx queue. In that situation, restarting tx will immediately
>>> cause assertion of another TXUBR interrupt. The driver will end up in
>>> an infinite interrupt loop which it cannot break out of.
>>>
>>> For cases where TQBP is at the end of the tx queue, instead only clear
>>> TX_USED interrupt. As more data gets pushed to the queue, transmission
>>> will resume.
>>>
>>> This issue was observed on a Xilinx Zynq-7000 based board.
>>> During stress test of the network interface, driver would get stuck on
>>> interrupt loop within seconds or minutes causing CPU to stall.
>>>
>>> Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
>>> ---
>>> Changes v2:
>>> - change referenced commit to use original commit ID instead of stable
>>> branch ID
>>>
>>>  drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c
>>> b/drivers/net/ethernet/cadence/macb_main.c
>>> index 800d5ced5800..e475be29845c 100644
>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>> @@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue
>> *queue)
>>>         unsigned int head = queue->tx_head;
>>>         unsigned int tail = queue->tx_tail;
>>>         struct macb *bp = queue->bp;
>>> +       unsigned int head_idx, tbqp;
>>>
>>>         if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>>>                 queue_writel(queue, ISR, MACB_BIT(TXUBR)); @@ -1665,6
>>> +1666,13 @@ static void macb_tx_restart(struct macb_queue *queue)
>>>         if (head == tail)
>>>                 return;
>>>
>>> +       tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
>>> +       tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
>>> +       head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp,
>>> + head));
>>> +
>>> +       if (tbqp == head_idx)
>>> +               return;
>>> +
>>>         macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>>> }
>>>
>>> --
>>> 2.35.1
>>>
>
Harini Katakam April 8, 2022, 12:02 p.m. UTC | #4
Hi Tomas,

> -----Original Message-----
> From: Tomas Melin <tomas.melin@vaisala.com>
> Sent: Friday, April 8, 2022 3:27 PM
> To: Harini Katakam <harinik@xilinx.com>; Claudiu.Beznea@microchip.com;
> netdev@vger.kernel.org
> Cc: Nicolas.Ferre@microchip.com; davem@davemloft.net; kuba@kernel.org;
> pabeni@redhat.com; Shubhrajyoti Datta <shubhraj@xilinx.com>; Michal
> Simek <michals@xilinx.com>; pthombar@cadence.com;
> mparab@cadence.com; rafalo@cadence.com
> Subject: Re: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
> 
> Hi Claudiu, Harini,
> 
> On 08/04/2022 11:47, Harini Katakam wrote:
> > Hi Claudiu, Tomas,
> >
> >> -----Original Message-----
> >> From: Claudiu.Beznea@microchip.com <Claudiu.Beznea@microchip.com>
> >> Sent: Friday, April 8, 2022 1:13 PM
> >> To: tomas.melin@vaisala.com; netdev@vger.kernel.org
> >> Cc: Nicolas.Ferre@microchip.com; davem@davemloft.net;
> kuba@kernel.org;
> >> pabeni@redhat.com; Harini Katakam <harinik@xilinx.com>; Shubhrajyoti
> >> Datta <shubhraj@xilinx.com>; Michal Simek <michals@xilinx.com>;
> >> pthombar@cadence.com; mparab@cadence.com; rafalo@cadence.com
> >> Subject: Re: [PATCH v2] net: macb: Restart tx only if queue pointer is
> lagging
> >>
> >> Hi, Tomas,
> >>
> >> I'm returning to this new thread.
> >>
> >> Sorry for the long delay. I looked though my emails for the steps to
> >> reproduce the bug that introduces macb_tx_restart() but haven't found
> >> them.
> >> Though the code in this patch should not affect at all SAMA5D4.
> >>
> >> I have tested anyway SAMA5D4 with and without your code and saw no
> >> issues.
> >> In case Dave, Jakub want to merge it you can add my
> >> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
> >> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
> 
> Thank you for the effort to review and test this! Also thanks for the
> discussions around this issue to provide further insights.
> 
> 
> >>
> >> The only thing with this patch, as mention earlier, is that freeing of packet
> N
> >> may depend on sending packet N+1 and if packet N+1 blocks again the
> HW
> >> then the freeing of packets N, N+1 may depend on packet N+2 etc. But
> from
> >> your investigation it seems hardware has some bugs.
> 
> Indeed, this is not behaviour I have encountered in any testing. If we
> were ever to encounter such issue, then it would need to be handled in
> separate manner. Perhaps call tx_interrupt() to progress the queue. But
> then again, this does not seem to happen.
> 
> >>
> >> FYI, I looked though Xilinx github repository and saw no patches on macb
> that
> >> may be related to this issue.
> >>
> >> Anyway, it would be good if there would be some replies from Xilinx or at
> >> least Cadence people on this (previous thread at [1]).
> >
> > Sorry for the delayed response.
> > I saw the condition you described and I'm not able to reproduce it.
> > But I agree with your assessment that restarting TX will not help in this
> case.
> > Also, the original patch restarting TX was also not reproduced on Zynq
> board
> > easily. We've had some users report the issue after > 1hr of traffic but that
> was
> > on a 4.xx kernel and I'm afraid I don’t have a case where I can reproduce
> the
> > original issue Claudiu described on any 5.xx kernel.
> >
> > Based on the thread, there is one possibility for a HW bug that controller
> fails to
> > generate TCOMP when a TXUBR and restart conditions occur because
> these interrupts
> > are edge triggered on Zynq.
> 
> This is interesting hypothesis and that would indeed lead to this situation.
> 
> 
> >
> > I'm going to check the errata and let you know if I find anything relevant
> and also
> > request Cadence folks to comment.
> > I'm sorry ask but is this condition reproducible on any later variants of the IP
> in Xilinx or
> > non-Xilinx devices?
> 
> I have not seen this issue on MPSoC (atleast yet). Indeed this issue
> seems to require the correct timing conditions for being able to trigger it.
> 
> So any additional information that we might get about possible issues in
> IP is welcomed. However, the hardware on the boards we have at hand will
> still be the same so the patch as such is relevant.

Yes, agreed. The patch is still required.

Regards,
Harini

> 
> BR,
> Tomas
> 
> 
> 
> > Zynq US+ MPSoC has the r1p07 while Zynq has the older version IP r1p23
> (old versioning)
> >
> > Regards,
> > Harini
> >
> >>
> >> Thank you,
> >> Claudiu Beznea
> >>
> >> [1]
> >>
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.k
> ernel.org%2Fnetdev%2F82276bf7-72a5-6a2e-ff33-
> &amp;data=04%7C01%7Ctomas.melin%40vaisala.com%7C352a532fe14b42ad
> 01d508da193c6320%7C6d7393e041f54c2e9b124c2be5da5c57%7C0%7C0%7C6
> 37850044400650522%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
> AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=
> rsBnJEVlDqpSUIfL%2BuXzAgTUL4w9rqaR6A6OLAi9gNQ%3D&amp;reserved=
> 0
> >>
> f8fe0c5e4a90@microchip.com/T/#m644c84a8709a65c40b8fc15a589e83b24e4
> >> 8ccfd
> >>
<snip>
Jakub Kicinski April 12, 2022, 1:19 a.m. UTC | #5
On Thu,  7 Apr 2022 19:16:59 +0300 Tomas Melin wrote:
> commit 4298388574da ("net: macb: restart tx after tx used bit read")
> added support for restarting transmission. Restarting tx does not work
> in case controller asserts TXUBR interrupt and TQBP is already at the end
> of the tx queue. In that situation, restarting tx will immediately cause
> assertion of another TXUBR interrupt. The driver will end up in an infinite
> interrupt loop which it cannot break out of.
> 
> For cases where TQBP is at the end of the tx queue, instead
> only clear TX_USED interrupt. As more data gets pushed to the queue,
> transmission will resume.
> 
> This issue was observed on a Xilinx Zynq-7000 based board.
> During stress test of the network interface,
> driver would get stuck on interrupt loop within seconds or minutes
> causing CPU to stall.
> 
> Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>

Applied, thanks. Commit 5ad7f18cd82c ("net: macb: Restart tx only if
queue pointer is lagging") in net.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 800d5ced5800..e475be29845c 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1658,6 +1658,7 @@  static void macb_tx_restart(struct macb_queue *queue)
 	unsigned int head = queue->tx_head;
 	unsigned int tail = queue->tx_tail;
 	struct macb *bp = queue->bp;
+	unsigned int head_idx, tbqp;
 
 	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
 		queue_writel(queue, ISR, MACB_BIT(TXUBR));
@@ -1665,6 +1666,13 @@  static void macb_tx_restart(struct macb_queue *queue)
 	if (head == tail)
 		return;
 
+	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
+	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
+	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
+
+	if (tbqp == head_idx)
+		return;
+
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
 }