diff mbox series

[V1,net-next] net: stmmac: should not modify RX descriptor when STMMAC resume

Message ID 20210527084911.20116-1-qiangqing.zhang@nxp.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [V1,net-next] net: stmmac: should not modify RX descriptor when STMMAC resume | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 2 maintainers not CCed: linux-arm-kernel@lists.infradead.org linux-stm32@st-md-mailman.stormreply.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 15 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link

Commit Message

Joakim Zhang May 27, 2021, 8:49 a.m. UTC
When system resume back, STMMAC will clear RX descriptors:
stmmac_resume()
	->stmmac_clear_descriptors()
		->stmmac_clear_rx_descriptors()
			->stmmac_init_rx_desc()
				->dwmac4_set_rx_owner()
				//p->des3 |= cpu_to_le32(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR);
It only asserts OWN and BUF1V bits in desc3 field, doesn't clear desc0/1/2 fields.

Let's take a case into account, when system suspend, it is possible that
there are packets have not received yet, so the RX descriptors are wrote
back by DMA, e.g.
008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040

When system resume back, after above process, it became a broken
descriptor:
008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040

The issue is that it only changes the owner of this descriptor, but do nothing
about desc0/1/2 fields. The descriptor of STMMAC a bit special, applicaton
prepares RX descriptors for DMA, after DMA recevie the packets, it will write
back the descriptors, so the same field of a descriptor have different
meanings to application and DMA. It should be a software bug there, and may
not easy to reproduce, but there is a certain probability that it will
occur.

i.MX8MP STMMAC DMA width is 34 bits, so desc0/desc1 indicates the buffer
address, after system resume, the buffer address changes to
0x40_00000000. And the correct rx descriptor is 008 [0x00000000c4310080]:
0x6511000 0x1 0x0 0x81000000, the valid buffer address is 0x1_6511000.
So when DMA tried to access the invalid address 0x40_00000000 would
generate fatal bus error.

But for other 32 bits width DMA, DMA still can work when this issue happened,
only desc0 indicates buffer address, so the buffer address is 0x00000000 when
system resume.

There is a NOTE in the Guide:
In the Receive Descriptor (Read Format), if the Buffer Address field is all 0s,
the module does not transfer data to that buffer and skips to the next buffer
or next descriptor.

Also a feedback from SYPS:
When buffer address field of Rx descriptor is all 0's, DMA skips such descriptor
means DMA closes Rx descriptor as Intermediate descriptor with OWN bit set to 0,
indicates that the application owns this descriptor.

It now appears that this issue seems only can be reproduced on DMA width more
than 32 bits, this may be why other SoCs which integrated the same STMMAC IP
can't reproduce it.

Commit 9c63faaa931e ("net: stmmac: re-init rx buffers when mac resume back") tried
to re-init desc0/desc1 (buffer address fields) to fix this issue, but it
is not a proper solution, and made regression on Jetson TX2 boards.

It is unreasonable to modify RX descriptors outside of stmmac_rx_refill() function,
where it will clear all desc0/desc1/desc2/desc3 fields together.

This patch removes RX descriptors modification when STMMAC resume.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
---
ChangeLogs:
	V1: remove RFC tag, please come here for RFC discussion:
	    https://lore.kernel.org/netdev/cec17489-2ef9-7862-94c8-202d31507a0c@nvidia.com/T/
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Jon Hunter May 27, 2021, 12:43 p.m. UTC | #1
On 27/05/2021 09:49, Joakim Zhang wrote:
> When system resume back, STMMAC will clear RX descriptors:
> stmmac_resume()
> 	->stmmac_clear_descriptors()
> 		->stmmac_clear_rx_descriptors()
> 			->stmmac_init_rx_desc()
> 				->dwmac4_set_rx_owner()
> 				//p->des3 |= cpu_to_le32(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR);
> It only asserts OWN and BUF1V bits in desc3 field, doesn't clear desc0/1/2 fields.
> 
> Let's take a case into account, when system suspend, it is possible that
> there are packets have not received yet, so the RX descriptors are wrote
> back by DMA, e.g.
> 008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040
> 
> When system resume back, after above process, it became a broken
> descriptor:
> 008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040
> 
> The issue is that it only changes the owner of this descriptor, but do nothing
> about desc0/1/2 fields. The descriptor of STMMAC a bit special, applicaton
> prepares RX descriptors for DMA, after DMA recevie the packets, it will write
> back the descriptors, so the same field of a descriptor have different
> meanings to application and DMA. It should be a software bug there, and may
> not easy to reproduce, but there is a certain probability that it will
> occur.
> 
> i.MX8MP STMMAC DMA width is 34 bits, so desc0/desc1 indicates the buffer
> address, after system resume, the buffer address changes to
> 0x40_00000000. And the correct rx descriptor is 008 [0x00000000c4310080]:
> 0x6511000 0x1 0x0 0x81000000, the valid buffer address is 0x1_6511000.
> So when DMA tried to access the invalid address 0x40_00000000 would
> generate fatal bus error.
> 
> But for other 32 bits width DMA, DMA still can work when this issue happened,
> only desc0 indicates buffer address, so the buffer address is 0x00000000 when
> system resume.
> 
> There is a NOTE in the Guide:
> In the Receive Descriptor (Read Format), if the Buffer Address field is all 0s,
> the module does not transfer data to that buffer and skips to the next buffer
> or next descriptor.
> 
> Also a feedback from SYPS:
> When buffer address field of Rx descriptor is all 0's, DMA skips such descriptor
> means DMA closes Rx descriptor as Intermediate descriptor with OWN bit set to 0,
> indicates that the application owns this descriptor.
> 
> It now appears that this issue seems only can be reproduced on DMA width more
> than 32 bits, this may be why other SoCs which integrated the same STMMAC IP
> can't reproduce it.
> 
> Commit 9c63faaa931e ("net: stmmac: re-init rx buffers when mac resume back") tried
> to re-init desc0/desc1 (buffer address fields) to fix this issue, but it
> is not a proper solution, and made regression on Jetson TX2 boards.
> 
> It is unreasonable to modify RX descriptors outside of stmmac_rx_refill() function,
> where it will clear all desc0/desc1/desc2/desc3 fields together.
> 
> This patch removes RX descriptors modification when STMMAC resume.
> 
> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> ---
> ChangeLogs:
> 	V1: remove RFC tag, please come here for RFC discussion:
> 	    https://lore.kernel.org/netdev/cec17489-2ef9-7862-94c8-202d31507a0c@nvidia.com/T/
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index bf9fe25fed69..2570d26286ea 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -7187,6 +7187,8 @@ static void stmmac_reset_queues_param(struct stmmac_priv *priv)
>  		tx_q->mss = 0;
>  
>  		netdev_tx_reset_queue(netdev_get_tx_queue(priv->dev, queue));
> +
> +		stmmac_clear_tx_descriptors(priv, queue);
>  	}
>  }
>  
> @@ -7251,7 +7253,6 @@ int stmmac_resume(struct device *dev)
>  	stmmac_reset_queues_param(priv);
>  
>  	stmmac_free_tx_skbufs(priv);
> -	stmmac_clear_descriptors(priv);
>  
>  	stmmac_hw_setup(ndev, false);
>  	stmmac_init_coalesce(priv);
> 


So as previously mentioned this still causing a regression when resuming
from suspend on Jetson TX2 platform. I am not sure why you are still
attempting to push this patch as-is when it causes a complete failure
for another platform. I am quite disappointed that you are ignoring the
issue we have reported :-(

To summarise we do not see any issues with suspend on Jetson TX2 without
this patch. I have stressed suspend on this board doing 2000 suspend
iterations and so no issues. However, this patch completely breaks
resuming from suspend for us. Therefore, I don't see how we can merge this.

Given that this fixes a problem, that appears to be specific to your
platform, why do you not implement this in away such that this is only
done for your platform?

Jon
Joakim Zhang May 27, 2021, 1:13 p.m. UTC | #2
Hi Joh,

> -----Original Message-----
> From: Jon Hunter <jonathanh@nvidia.com>
> Sent: 2021年5月27日 20:43
> To: Joakim Zhang <qiangqing.zhang@nxp.com>; f.fainelli@gmail.com;
> peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;
> joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;
> mcoquelin.stm32@gmail.com; andrew@lunn.ch; treding@nvidia.com
> Cc: netdev@vger.kernel.org; dl-linux-imx <linux-imx@nxp.com>
> Subject: Re: [PATCH V1 net-next] net: stmmac: should not modify RX descriptor
> when STMMAC resume
> 
> 
> On 27/05/2021 09:49, Joakim Zhang wrote:
> > When system resume back, STMMAC will clear RX descriptors:
> > stmmac_resume()
> > 	->stmmac_clear_descriptors()
> > 		->stmmac_clear_rx_descriptors()
> > 			->stmmac_init_rx_desc()
> > 				->dwmac4_set_rx_owner()
> > 				//p->des3 |= cpu_to_le32(RDES3_OWN |
> RDES3_BUFFER1_VALID_ADDR); It
> > only asserts OWN and BUF1V bits in desc3 field, doesn't clear desc0/1/2
> fields.
> >
> > Let's take a case into account, when system suspend, it is possible
> > that there are packets have not received yet, so the RX descriptors
> > are wrote back by DMA, e.g.
> > 008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040
> >
> > When system resume back, after above process, it became a broken
> > descriptor:
> > 008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040
> >
> > The issue is that it only changes the owner of this descriptor, but do
> > nothing about desc0/1/2 fields. The descriptor of STMMAC a bit
> > special, applicaton prepares RX descriptors for DMA, after DMA recevie
> > the packets, it will write back the descriptors, so the same field of
> > a descriptor have different meanings to application and DMA. It should
> > be a software bug there, and may not easy to reproduce, but there is a
> > certain probability that it will occur.
> >
> > i.MX8MP STMMAC DMA width is 34 bits, so desc0/desc1 indicates the
> > buffer address, after system resume, the buffer address changes to
> > 0x40_00000000. And the correct rx descriptor is 008 [0x00000000c4310080]:
> > 0x6511000 0x1 0x0 0x81000000, the valid buffer address is 0x1_6511000.
> > So when DMA tried to access the invalid address 0x40_00000000 would
> > generate fatal bus error.
> >
> > But for other 32 bits width DMA, DMA still can work when this issue
> > happened, only desc0 indicates buffer address, so the buffer address
> > is 0x00000000 when system resume.
> >
> > There is a NOTE in the Guide:
> > In the Receive Descriptor (Read Format), if the Buffer Address field
> > is all 0s, the module does not transfer data to that buffer and skips
> > to the next buffer or next descriptor.
> >
> > Also a feedback from SYPS:
> > When buffer address field of Rx descriptor is all 0's, DMA skips such
> > descriptor means DMA closes Rx descriptor as Intermediate descriptor
> > with OWN bit set to 0, indicates that the application owns this descriptor.
> >
> > It now appears that this issue seems only can be reproduced on DMA
> > width more than 32 bits, this may be why other SoCs which integrated
> > the same STMMAC IP can't reproduce it.
> >
> > Commit 9c63faaa931e ("net: stmmac: re-init rx buffers when mac resume
> > back") tried to re-init desc0/desc1 (buffer address fields) to fix
> > this issue, but it is not a proper solution, and made regression on Jetson TX2
> boards.
> >
> > It is unreasonable to modify RX descriptors outside of
> > stmmac_rx_refill() function, where it will clear all desc0/desc1/desc2/desc3
> fields together.
> >
> > This patch removes RX descriptors modification when STMMAC resume.
> >
> > Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> > ---
> > ChangeLogs:
> > 	V1: remove RFC tag, please come here for RFC discussion:
> >
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore
> > .kernel.org%2Fnetdev%2Fcec17489-2ef9-7862-94c8-202d31507a0c%40nvidia
> .c
> >
> om%2FT%2F&amp;data=04%7C01%7Cqiangqing.zhang%40nxp.com%7C16be3
> 6b4a2584
> >
> a4c208608d9210d09ef%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%
> 7C63757
> >
> 7162155074221%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJ
> QIjoiV2lu
> >
> MzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=BdsNu6l4%2Bt
> Q6WllA
> > tVn%2BP1jD3sRXfpIH2XErmRh%2BjLA%3D&amp;reserved=0
> > ---
> >  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index bf9fe25fed69..2570d26286ea 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -7187,6 +7187,8 @@ static void stmmac_reset_queues_param(struct
> stmmac_priv *priv)
> >  		tx_q->mss = 0;
> >
> >  		netdev_tx_reset_queue(netdev_get_tx_queue(priv->dev, queue));
> > +
> > +		stmmac_clear_tx_descriptors(priv, queue);
> >  	}
> >  }
> >
> > @@ -7251,7 +7253,6 @@ int stmmac_resume(struct device *dev)
> >  	stmmac_reset_queues_param(priv);
> >
> >  	stmmac_free_tx_skbufs(priv);
> > -	stmmac_clear_descriptors(priv);
> >
> >  	stmmac_hw_setup(ndev, false);
> >  	stmmac_init_coalesce(priv);
> >
> 
> 
> So as previously mentioned this still causing a regression when resuming from
> suspend on Jetson TX2 platform. I am not sure why you are still attempting to
> push this patch as-is when it causes a complete failure for another platform. I
> am quite disappointed that you are ignoring the issue we have reported :-(
I first pushed the RFC and discussed about the issue, I think this patch trigger a potential issue at your side. 
IMHO, you may need try to find the root case why this patch make regression on your platform.

> To summarise we do not see any issues with suspend on Jetson TX2 without
> this patch. I have stressed suspend on this board doing 2000 suspend iterations
> and so no issues. However, this patch completely breaks resuming from
> suspend for us. Therefore, I don't see how we can merge this.
If you read the commit message, you should know you can't reproduce this issue if your DMA bit width is 32 bits.
Please take the commit message seriously, do you admit this is a real bug? After the analysis, this may a common issue.

> Given that this fixes a problem, that appears to be specific to your platform,
> why do you not implement this in away such that this is only done for your
> platform?
I really don't know how to take your case into account, let us add a flag in "struct plat_stmmacenet_data" to 
separate different cases? If maintainer agree with this, I can do.

I agree keep this patch doesn't merge until a way also fit your platform. If there is still no good solutions, I will try to add
a specific flag in platform data.

Best Regards,
Joakim Zhang
> Jon
> 
> --
> nvpublic
Jon Hunter May 27, 2021, 1:38 p.m. UTC | #3
On 27/05/2021 14:13, Joakim Zhang wrote:

...

>> So as previously mentioned this still causing a regression when resuming from
>> suspend on Jetson TX2 platform. I am not sure why you are still attempting to
>> push this patch as-is when it causes a complete failure for another platform. I
>> am quite disappointed that you are ignoring the issue we have reported :-(
> I first pushed the RFC and discussed about the issue, I think this patch trigger a potential issue at your side. 
> IMHO, you may need try to find the root case why this patch make regression on your platform.

I am always happy to test patches, but I am not sure it is completely
fair to ask us to debug an issue your patch is introducing. Yes this
could be uncovering an underlying issue with the driver, but that does
not mean we can introduce regressions.

TBH where are the maintainers of this driver? We really need their
inputs to help us figure this out.

>> To summarise we do not see any issues with suspend on Jetson TX2 without
>> this patch. I have stressed suspend on this board doing 2000 suspend iterations
>> and so no issues. However, this patch completely breaks resuming from
>> suspend for us. Therefore, I don't see how we can merge this.
> If you read the commit message, you should know you can't reproduce this issue if your DMA bit width is 32 bits.
> Please take the commit message seriously, do you admit this is a real bug? After the analysis, this may a common issue.

I did, but it seems that you are OK to break devices with 32-bit DMAs.
If it breaks Jetson then it could also break others but you don't know
that yet. I am not saying there is not a bug, but I am saying that you
cannot attempt to fix it, by causing regressions for other platforms.

>> Given that this fixes a problem, that appears to be specific to your platform,
>> why do you not implement this in away such that this is only done for your
>> platform?
> I really don't know how to take your case into account, let us add a flag in "struct plat_stmmacenet_data" to 
> separate different cases? If maintainer agree with this, I can do.
> 
> I agree keep this patch doesn't merge until a way also fit your platform. If there is still no good solutions, I will try to add
> a specific flag in platform data.

OK, so I am not sure why you are trying to push this patch again without
coming to a resolution with known issues this change is causing. And
that is the problem I have with this.

Jon
Thierry Reding May 27, 2021, 2:13 p.m. UTC | #4
On Thu, May 27, 2021 at 01:13:06PM +0000, Joakim Zhang wrote:
> 
> Hi Joh,
> 
> > -----Original Message-----
> > From: Jon Hunter <jonathanh@nvidia.com>
> > Sent: 2021年5月27日 20:43
> > To: Joakim Zhang <qiangqing.zhang@nxp.com>; f.fainelli@gmail.com;
> > peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;
> > joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;
> > mcoquelin.stm32@gmail.com; andrew@lunn.ch; treding@nvidia.com
> > Cc: netdev@vger.kernel.org; dl-linux-imx <linux-imx@nxp.com>
> > Subject: Re: [PATCH V1 net-next] net: stmmac: should not modify RX descriptor
> > when STMMAC resume
> > 
> > 
> > On 27/05/2021 09:49, Joakim Zhang wrote:
> > > When system resume back, STMMAC will clear RX descriptors:
> > > stmmac_resume()
> > > 	->stmmac_clear_descriptors()
> > > 		->stmmac_clear_rx_descriptors()
> > > 			->stmmac_init_rx_desc()
> > > 				->dwmac4_set_rx_owner()
> > > 				//p->des3 |= cpu_to_le32(RDES3_OWN |
> > RDES3_BUFFER1_VALID_ADDR); It
> > > only asserts OWN and BUF1V bits in desc3 field, doesn't clear desc0/1/2
> > fields.
> > >
> > > Let's take a case into account, when system suspend, it is possible
> > > that there are packets have not received yet, so the RX descriptors
> > > are wrote back by DMA, e.g.
> > > 008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040
> > >
> > > When system resume back, after above process, it became a broken
> > > descriptor:
> > > 008 [0x00000000c4310080]: 0x0 0x40 0x0 0xb5010040
> > >
> > > The issue is that it only changes the owner of this descriptor, but do
> > > nothing about desc0/1/2 fields. The descriptor of STMMAC a bit
> > > special, applicaton prepares RX descriptors for DMA, after DMA recevie
> > > the packets, it will write back the descriptors, so the same field of
> > > a descriptor have different meanings to application and DMA. It should
> > > be a software bug there, and may not easy to reproduce, but there is a
> > > certain probability that it will occur.
> > >
> > > i.MX8MP STMMAC DMA width is 34 bits, so desc0/desc1 indicates the
> > > buffer address, after system resume, the buffer address changes to
> > > 0x40_00000000. And the correct rx descriptor is 008 [0x00000000c4310080]:
> > > 0x6511000 0x1 0x0 0x81000000, the valid buffer address is 0x1_6511000.
> > > So when DMA tried to access the invalid address 0x40_00000000 would
> > > generate fatal bus error.
> > >
> > > But for other 32 bits width DMA, DMA still can work when this issue
> > > happened, only desc0 indicates buffer address, so the buffer address
> > > is 0x00000000 when system resume.
> > >
> > > There is a NOTE in the Guide:
> > > In the Receive Descriptor (Read Format), if the Buffer Address field
> > > is all 0s, the module does not transfer data to that buffer and skips
> > > to the next buffer or next descriptor.
> > >
> > > Also a feedback from SYPS:
> > > When buffer address field of Rx descriptor is all 0's, DMA skips such
> > > descriptor means DMA closes Rx descriptor as Intermediate descriptor
> > > with OWN bit set to 0, indicates that the application owns this descriptor.
> > >
> > > It now appears that this issue seems only can be reproduced on DMA
> > > width more than 32 bits, this may be why other SoCs which integrated
> > > the same STMMAC IP can't reproduce it.
> > >
> > > Commit 9c63faaa931e ("net: stmmac: re-init rx buffers when mac resume
> > > back") tried to re-init desc0/desc1 (buffer address fields) to fix
> > > this issue, but it is not a proper solution, and made regression on Jetson TX2
> > boards.
> > >
> > > It is unreasonable to modify RX descriptors outside of
> > > stmmac_rx_refill() function, where it will clear all desc0/desc1/desc2/desc3
> > fields together.
> > >
> > > This patch removes RX descriptors modification when STMMAC resume.
> > >
> > > Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> > > ---
> > > ChangeLogs:
> > > 	V1: remove RFC tag, please come here for RFC discussion:
> > >
> > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore
> > > .kernel.org%2Fnetdev%2Fcec17489-2ef9-7862-94c8-202d31507a0c%40nvidia
> > .c
> > >
> > om%2FT%2F&amp;data=04%7C01%7Cqiangqing.zhang%40nxp.com%7C16be3
> > 6b4a2584
> > >
> > a4c208608d9210d09ef%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%
> > 7C63757
> > >
> > 7162155074221%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJ
> > QIjoiV2lu
> > >
> > MzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=BdsNu6l4%2Bt
> > Q6WllA
> > > tVn%2BP1jD3sRXfpIH2XErmRh%2BjLA%3D&amp;reserved=0
> > > ---
> > >  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > > index bf9fe25fed69..2570d26286ea 100644
> > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > > @@ -7187,6 +7187,8 @@ static void stmmac_reset_queues_param(struct
> > stmmac_priv *priv)
> > >  		tx_q->mss = 0;
> > >
> > >  		netdev_tx_reset_queue(netdev_get_tx_queue(priv->dev, queue));
> > > +
> > > +		stmmac_clear_tx_descriptors(priv, queue);
> > >  	}
> > >  }
> > >
> > > @@ -7251,7 +7253,6 @@ int stmmac_resume(struct device *dev)
> > >  	stmmac_reset_queues_param(priv);
> > >
> > >  	stmmac_free_tx_skbufs(priv);
> > > -	stmmac_clear_descriptors(priv);
> > >
> > >  	stmmac_hw_setup(ndev, false);
> > >  	stmmac_init_coalesce(priv);
> > >
> > 
> > 
> > So as previously mentioned this still causing a regression when resuming from
> > suspend on Jetson TX2 platform. I am not sure why you are still attempting to
> > push this patch as-is when it causes a complete failure for another platform. I
> > am quite disappointed that you are ignoring the issue we have reported :-(
> I first pushed the RFC and discussed about the issue, I think this patch trigger a potential issue at your side. 
> IMHO, you may need try to find the root case why this patch make regression on your platform.
> 
> > To summarise we do not see any issues with suspend on Jetson TX2 without
> > this patch. I have stressed suspend on this board doing 2000 suspend iterations
> > and so no issues. However, this patch completely breaks resuming from
> > suspend for us. Therefore, I don't see how we can merge this.
> If you read the commit message, you should know you can't reproduce
> this issue if your DMA bit width is 32 bits.

Tegra186 and Tegra194 have DMA bit masks of 40 and 39 bits,
respectively, so according to what you're saying we should be able to
reproduce this, but as Jon explained we were unable to even reproduce
this once.

Thierry
Thierry Reding May 27, 2021, 2:26 p.m. UTC | #5
On Thu, May 27, 2021 at 04:49:11PM +0800, Joakim Zhang wrote:
> When system resume back, STMMAC will clear RX descriptors:
> stmmac_resume()
> 	->stmmac_clear_descriptors()
> 		->stmmac_clear_rx_descriptors()
> 			->stmmac_init_rx_desc()
> 				->dwmac4_set_rx_owner()
> 				//p->des3 |= cpu_to_le32(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR);
> It only asserts OWN and BUF1V bits in desc3 field, doesn't clear desc0/1/2 fields.
> 
> Let's take a case into account, when system suspend, it is possible that
> there are packets have not received yet, so the RX descriptors are wrote
> back by DMA, e.g.
> 008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040

This is something that completely baffles me. Why is DMA still writing
back RX descriptors on system suspend? stmmac_suspend() should take care
of completely quiescing the device so that DMA is no longer active. It
sounds like for some reason that doesn't happen when you run into this
problematic situation.

I see that some platform adaptations override the DMA ->stop_rx()
callback (see dwmac-sun8i.c and dwxgmac2_dma.c), so perhaps some similar
override is required on your platform to actually stop DMA?

Thierry
Joakim Zhang May 31, 2021, 9:28 a.m. UTC | #6
Hi Thierry,

> -----Original Message-----
> From: Thierry Reding <treding@nvidia.com>
> Sent: 2021年5月27日 22:27
> To: Joakim Zhang <qiangqing.zhang@nxp.com>
> Cc: f.fainelli@gmail.com; jonathanh@nvidia.com; peppe.cavallaro@st.com;
> alexandre.torgue@foss.st.com; joabreu@synopsys.com;
> davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;
> andrew@lunn.ch; netdev@vger.kernel.org; dl-linux-imx <linux-imx@nxp.com>
> Subject: Re: [PATCH V1 net-next] net: stmmac: should not modify RX descriptor
> when STMMAC resume
> 
> On Thu, May 27, 2021 at 04:49:11PM +0800, Joakim Zhang wrote:
> > When system resume back, STMMAC will clear RX descriptors:
> > stmmac_resume()
> > 	->stmmac_clear_descriptors()
> > 		->stmmac_clear_rx_descriptors()
> > 			->stmmac_init_rx_desc()
> > 				->dwmac4_set_rx_owner()
> > 				//p->des3 |= cpu_to_le32(RDES3_OWN |
> RDES3_BUFFER1_VALID_ADDR); It
> > only asserts OWN and BUF1V bits in desc3 field, doesn't clear desc0/1/2
> fields.
> >
> > Let's take a case into account, when system suspend, it is possible
> > that there are packets have not received yet, so the RX descriptors
> > are wrote back by DMA, e.g.
> > 008 [0x00000000c4310080]: 0x0 0x40 0x0 0x34010040
>
> This is something that completely baffles me. Why is DMA still writing back RX
> descriptors on system suspend? stmmac_suspend() should take care of
> completely quiescing the device so that DMA is no longer active. It sounds like
> for some reason that doesn't happen when you run into this problematic
> situation.
Thanks. I don't think so. stmmac_stop_all_dma() in stmmac_suspend() to stop RX DMA, RX descriptors
have been wrote back before stop DMA, and these descriptors have not been handled yet, so they stay there.

Explain more detailed, NAPI scheduled finished, DMA receive frames and write back RX descriptors, but RX interrupt
doesn't issue yet, system suspending, stop RX DMA. After system suspended, RX descriptors is the write back format.
  
> I see that some platform adaptations override the DMA ->stop_rx() callback
> (see dwmac-sun8i.c and dwxgmac2_dma.c), so perhaps some similar override
> is required on your platform to actually stop DMA?
I check the driver you mentioned, it seems they have SoC level registers to configure DMA. But i.MX8MP doesn't implement it.

Best Regards,
Joakim Zhang
> Thierry
diff mbox series

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index bf9fe25fed69..2570d26286ea 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -7187,6 +7187,8 @@  static void stmmac_reset_queues_param(struct stmmac_priv *priv)
 		tx_q->mss = 0;
 
 		netdev_tx_reset_queue(netdev_get_tx_queue(priv->dev, queue));
+
+		stmmac_clear_tx_descriptors(priv, queue);
 	}
 }
 
@@ -7251,7 +7253,6 @@  int stmmac_resume(struct device *dev)
 	stmmac_reset_queues_param(priv);
 
 	stmmac_free_tx_skbufs(priv);
-	stmmac_clear_descriptors(priv);
 
 	stmmac_hw_setup(ndev, false);
 	stmmac_init_coalesce(priv);