[0/2] meson: Fix IRQ trigger type

Message ID 20181204160447.27869-1-ccaione@baylibre.com (mailing list archive)

Message

Carlo Caione Dec. 4, 2018, 4:04 p.m. UTC
The wrong IRQ trigger type for the macirq was causing the connection
speed to drop after a few hours when stress testing the DUT. The fix
also seems to fix another long-standing issue with EEE.

The fixes were tested on an AXG board, but we think that the same fix is
also valid for all the other Amlogic SoC families.

Carlo Caione (2):
  arm64: dts: meson: Fix IRQ trigger type for macirq
  arm64: dts: meson: Remove eee-broken-1000t quirk

 arch/arm/boot/dts/meson.dtsi                        | 2 +-
 arch/arm/boot/dts/meson8b-odroidc1.dts              | 1 -
 arch/arm64/boot/dts/amlogic/meson-axg-s400.dts      | 1 -
 arch/arm64/boot/dts/amlogic/meson-axg.dtsi          | 2 +-
 arch/arm64/boot/dts/amlogic/meson-gx.dtsi           | 2 +-
 arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 1 -
 arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi   | 1 -
 7 files changed, 3 insertions(+), 7 deletions(-)

Comments

Martin Blumenstingl Dec. 4, 2018, 7:59 p.m. UTC | #1
adding Emiliano because he experienced high packet loss on Odroid-C1
without "eee-broken-1000t"

On Tue, Dec 4, 2018 at 5:05 PM Carlo Caione <ccaione@baylibre.com> wrote:
>
> The wrong IRQ trigger type for the macirq was causing the connection
> speed to drop after a few hours when stress testing the DUT. The fix
> seems also to fix another long standing issue with EEE.
the other two DesignWare controllers (2x dwc2) are also using
IRQ_TYPE_LEVEL_HIGH
so this is not unlikely - good job detective!

> The fixes are tested on a AXG board but we think that the same fix is
> valid also for all the others Amlogic SoC families.
I checked Amlogic's 3.10 kernel for the 32-bit SoCs and it seems they
are setting all IRQs to be edge triggered: [0]
however, Emiliano reported an issue with IRQ_TYPE_EDGE_RISING for the
dwc2 controllers as well. 291f45dd6da5fa6 "ARM: dts: meson: fixing USB
support on Meson6, Meson8 and Meson8b" fixed it for him whereas it
worked for me with IRQ_TYPE_EDGE_RISING

I find it strange though that Amlogic's buildroot kernel (even the
latest buildroot_openlinux_kernel_4.9_fbdev_20180706) uses:
  interrupts = <0 8 1>
which translates to:
  interrupts = <GIC_SPI 8 IRQ_TYPE_EDGE_RISING>
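
For reference, a minimal sketch of how the raw three-cell specifier maps to
the symbolic names. The helper name decode_gic_interrupt is made up for
illustration; the numeric values are the ones defined in the kernel's
dt-bindings headers (interrupt-controller/arm-gic.h and
interrupt-controller/irq.h).

```python
# Values mirror the kernel's dt-bindings headers; the helper itself is
# hypothetical and only meant to illustrate the translation.
GIC_KINDS = {0: "GIC_SPI", 1: "GIC_PPI"}
IRQ_TYPES = {
    0: "IRQ_TYPE_NONE",
    1: "IRQ_TYPE_EDGE_RISING",
    2: "IRQ_TYPE_EDGE_FALLING",
    3: "IRQ_TYPE_EDGE_BOTH",
    4: "IRQ_TYPE_LEVEL_HIGH",
    8: "IRQ_TYPE_LEVEL_LOW",
}


def decode_gic_interrupt(cells):
    """Turn (kind, number, flags) into the symbolic device tree form."""
    kind, number, flags = cells
    # Only the low four bits of the third cell carry the trigger type.
    trigger = IRQ_TYPES[flags & 0xF]
    return f"<{GIC_KINDS[kind]} {number} {trigger}>"


print(decode_gic_interrupt((0, 8, 1)))  # the vendor kernel's setting
print(decode_gic_interrupt((0, 8, 4)))  # the level-triggered alternative
```

With this mapping, the level-triggered variant discussed in this series
would correspond to a raw third cell of 4 instead of 1.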

does the datasheet give a hint that this IRQ should be level triggered
or did you find out by trial and error?

> Carlo Caione (2):
>   arm64: dts: meson: Fix IRQ trigger type for macirq
>   arm64: dts: meson: Remove eee-broken-1000t quirk
>
>  arch/arm/boot/dts/meson.dtsi                        | 2 +-
>  arch/arm/boot/dts/meson8b-odroidc1.dts              | 1 -
these two should be in separate patches with "ARM: dts: " as prefix

>  arch/arm64/boot/dts/amlogic/meson-axg-s400.dts      | 1 -
>  arch/arm64/boot/dts/amlogic/meson-axg.dtsi          | 2 +-
>  arch/arm64/boot/dts/amlogic/meson-gx.dtsi           | 2 +-
>  arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 1 -
>  arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi   | 1 -
>  7 files changed, 3 insertions(+), 7 deletions(-)
>
> --
> 2.19.1
>

Regards
Martin

[0] https://github.com/endlessm/linux-meson/blob/cd4096c3ff4eb5b8a8a5581bb46508601c5470dc/drivers/irqchip/irq-gic.c#L400
Carlo Caione Dec. 4, 2018, 8:51 p.m. UTC | #2
On Tue, 2018-12-04 at 20:59 +0100, Martin Blumenstingl wrote:
> adding Emiliano because he experienced high packet loss on Odroid-C1
> without "eee-broken-1000t"

Yes, it would be nice to have confirmation. I tested this using an AXG
board with iperf3 in both TX and RX directions.

> I find it strange though that Amlogic's buildroot kernel (even the
> latest buildroot_openlinux_kernel_4.9_fbdev_20180706) uses:
>   interrupts = <0 8 1>
> which translates to:
>   interrupts = <GIC_SPI 8 IRQ_TYPE_EDGE_RISING>
> 
> does the datasheet give a hint that this IRQ should be level
> triggered
> or did you find out by trial and error?

The datasheet says nothing about that. Looking at the GMAC registers
you can get some clues but we confirmed that with long lasting tests.

> > Carlo Caione (2):
> >   arm64: dts: meson: Fix IRQ trigger type for macirq
> >   arm64: dts: meson: Remove eee-broken-1000t quirk
> > 
> >  arch/arm/boot/dts/meson.dtsi                        | 2 +-
> >  arch/arm/boot/dts/meson8b-odroidc1.dts              | 1 -
> these two should be in separate patches with "ARM: dts: " as prefix

Right, they slipped in after I had already written the commit message.

Kevin, Neil, let me push a V2 so that I can fix the commit messages.

> >  arch/arm64/boot/dts/amlogic/meson-axg-s400.dts      | 1 -
> >  arch/arm64/boot/dts/amlogic/meson-axg.dtsi          | 2 +-
> >  arch/arm64/boot/dts/amlogic/meson-gx.dtsi           | 2 +-
> >  arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 1 -
> >  arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi   | 1 -
> >  7 files changed, 3 insertions(+), 7 deletions(-)
> > 
> > --
> > 2.19.1

Cheers,

--
Carlo Caione
Emiliano Ingrassia Dec. 6, 2018, 12:43 p.m. UTC | #3
Hi all,

thank you for involving me.

I applied Carlo's patches[0] on a vanilla 4.19.6 kernel
and tested them with the kernel packet generator, monitoring
bandwidth usage with "nload".

All tests were conducted on an Odroid-C1+ Rev. 0.4-20150930 board
with a short ethernet cable directly attached to a laptop with a
1G ethernet interface, with "nload" running on the board.

The tests I performed consist of the following steps:

1) Start packet generator with "rate 1000M" on laptop;

2) Keep packet generator active on the laptop and
   start the packet generator on the board with "rate 1000M";

3) Stop both packet generators;

4) Start packet generator on the board;

5) Keep packet generator active on the board and
   start the packet generator on the laptop.
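
To make the setup easier to reproduce, here is a rough sketch of how one
generator instance could be configured. It assumes that the "kernel packet
generator" is pktgen, driven through /proc/net/pktgen; only "rate 1000M"
comes from the steps above. The function names, the device name, and the
destination addresses are placeholders for illustration.

```python
def pktgen_commands(dev, dst_ip, dst_mac, rate="1000M", thread=0):
    """Return (proc file, command) pairs that configure one pktgen device."""
    base = "/proc/net/pktgen"
    return [
        (f"{base}/kpktgend_{thread}", "rem_device_all"),
        (f"{base}/kpktgend_{thread}", f"add_device {dev}"),
        (f"{base}/{dev}", "count 0"),            # 0 means run until stopped
        (f"{base}/{dev}", f"rate {rate}"),       # same rate as in the steps
        (f"{base}/{dev}", f"dst {dst_ip}"),
        (f"{base}/{dev}", f"dst_mac {dst_mac}"),
        (f"{base}/pgctrl", "start"),
    ]


def apply_commands(commands):
    """Write each command to its proc file (needs root and the pktgen module)."""
    for path, command in commands:
        with open(path, "w") as proc_file:
            proc_file.write(command + "\n")


# Placeholder addresses, printed instead of applied so nothing is touched.
for path, command in pktgen_commands("eth0", "192.168.1.2", "00:11:22:33:44:55"):
    print(f"echo '{command}' > {path}")
```

Note that writing "start" to pgctrl blocks until the run ends, so in
practice it has to be interrupted or stopped from another shell.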


Test results without Carlo's patches applied:

1) "nload" shows an incoming traffic of ~950Mbps;

2) "nload" shows an incoming traffic of ~400Mbps
   and an outgoing traffic of ~250Mbps;

3) "nload" shows 0Mbps both for incoming and outgoing traffic;

4) "nload" shows an outgoing traffic of ~950Mbps from the board;

5) "nload" shows incoming traffic of 0Mbps
   and an outgoing traffic of ~950Mbps.

Applying only the first patch (change mac IRQ type) I got the same results.

Applying only the second patch (drop eee-broken-1000t) I got the same results!

With both patches applied I got the same results but with an incoming traffic
of ~3Mbps on the board.

Consider that the described tests were performed for a few minutes.


The tests I performed clearly show that currently the MAC does not
perform as 1G full-duplex.
I can't say if this depends on the hardware, the driver or
the IP description in the board's device tree.

From the results shown above I think that the patches regarding 32 bit
Meson SoCs should NOT be applied together, but you could consider applying
only the second one, which removes the "eee-broken-1000t" flag
from the board MAC IP description.
In particular, I think that more tests are needed to better understand
what's happening in the case of Meson8b SoC.

To better investigate the MAC behaviour on Odroid-C1+, should I use
the Amlogic development kernel[1]? If yes, what branch should I use?


On Tue, Dec 04, 2018 at 08:59:20PM +0100, Martin Blumenstingl wrote:
> adding Emiliano because he experienced high packet loss on Odroid-C1
> without "eee-broken-1000t"
>
> On Tue, Dec 4, 2018 at 5:05 PM Carlo Caione <ccaione@baylibre.com> wrote:
> >
> > The wrong IRQ trigger type for the macirq was causing the connection
> > speed to drop after a few hours when stress testing the DUT. The fix
> > seems also to fix another long standing issue with EEE.


Carlo, can you describe precisely the tests you conducted
on your board and the tools used?


> the other two DesignWare controllers (2x dwc2) are also using
> IRQ_TYPE_LEVEL_HIGH
> so this is not unlikely - good job detective!
>

Consider that currently the USB ports do not work correctly.
In particular, USB pendrive insertion is not recognized at runtime.


> > The fixes are tested on a AXG board but we think that the same fix is
> > valid also for all the others Amlogic SoC families.
> I checked Amlogic's 3.10 kernel for the 32-bit SoCs and it seems they
> are setting all IRQs to be edge triggered: [0]
> however, Emiliano reported an issue with IRQ_TYPE_EDGE_RISING for the
> dwc2 controllers as well. 291f45dd6da5fa6 "ARM: dts: meson: fixing USB
> support on Meson6, Meson8 and Meson8b" fixed it for him whereas it
> worked for me with IRQ_TYPE_EDGE_RISING
>
> I find it strange though that Amlogic's buildroot kernel (even the
> latest buildroot_openlinux_kernel_4.9_fbdev_20180706) uses:
>   interrupts = <0 8 1>
> which translates to:
>   interrupts = <GIC_SPI 8 IRQ_TYPE_EDGE_RISING>
>
> does the datasheet give a hint that this IRQ should be level triggered
> or did you find out by trial and error?
>
> > Carlo Caione (2):
> >   arm64: dts: meson: Fix IRQ trigger type for macirq
> >   arm64: dts: meson: Remove eee-broken-1000t quirk
> >
> >  arch/arm/boot/dts/meson.dtsi                        | 2 +-
> >  arch/arm/boot/dts/meson8b-odroidc1.dts              | 1 -
> these two should be in separate patches with "ARM: dts: " as prefix
>
> >  arch/arm64/boot/dts/amlogic/meson-axg-s400.dts      | 1 -
> >  arch/arm64/boot/dts/amlogic/meson-axg.dtsi          | 2 +-
> >  arch/arm64/boot/dts/amlogic/meson-gx.dtsi           | 2 +-
> >  arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 1 -
> >  arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi   | 1 -
> >  7 files changed, 3 insertions(+), 7 deletions(-)
> >
> > --
> > 2.19.1
> >
>
> Regards
> Martin
>
> [0] https://github.com/endlessm/linux-meson/blob/cd4096c3ff4eb5b8a8a5581bb46508601c5470dc/drivers/irqchip/irq-gic.c#L400
>
> _______________________________________________
> linux-amlogic mailing list
> linux-amlogic@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-amlogic

Best regards,

Emiliano

[0] http://lists.infradead.org/pipermail/linux-amlogic/2018-December/009325.html
[1] https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic.git/
Carlo Caione Dec. 6, 2018, 1:17 p.m. UTC | #4
On Thu, 2018-12-06 at 13:43 +0100, Emiliano Ingrassia wrote:
> Hi all,

Hi Emiliano,

> thank you for involving me.
> 
> I applied Carlo's patches[0] on a kernel vanilla 4.19.6
> and tested it with kernel packet generator, monitoring
> bandwidth usage with "nload".
> 
> All tests were conducted on an Odroid-C1+ Rev. 0.4-20150930 board
> with a short ethernet cable directly attached to a laptop with
> 1G ethernet interface, with "nload" running on the board.
> 
> The tests I performed are composed by the following steps:
> 
> 1) Start packet generator with "rate 1000M" on laptop;
> 
> 2) Keep packet generator active on the laptop and
>    start the packet generator on the board with "rate 1000M";
> 
> 3) Stop both packet generators;
> 
> 4) Start packet generator on the board;
> 
> 5) Keep packet generator active on the board and
>    start the packet generator on the laptop.

out of curiosity: why do you expect to see something different from
point (2)?

> Test results without Carlo's patches applied:
> 
> 1) "nload" shows an incoming traffic of ~950Mbps;
> 
> 2) "nload" shows an incoming traffic of ~400Mbps
>    and an outgoing traffic of ~250Mbps;
> 
> 3) "nload" shows 0Mbps both for incoming and outgoing traffic;
> 
> 4) "nload" shows an outgoing traffic of ~950Mbps from the board;
> 
> 5) "nload" shows incoming traffic of 0Mbps
>    and an outgoing traffic of ~950Mbps.
> 
> Applying only the first patch (change mac IRQ type) I got the same
> results.

This is expected. The change in the IRQ type solves an issue that
you can only see if you run a stress test involving multiple components for
several hours.

> Applying only the second patch (drop eee-broken-1000t) I got the same
> results!

I am a bit confused here. Wasn't the eee-broken-1000t added to fix a
problem with the ethernet? Are you suggesting that for some reason you
can no longer reproduce the problem for which the quirk was
introduced?

> With both patches applied I got the same results but with an incoming
> traffic
> of ~3Mbps on the board.

On all the tests and immediately from the start of the tests?

When you hit the problem, can you check in /proc/interrupts whether the
IRQ counter for eth0 is incrementing or not?
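
A small sketch of that check, in case it helps: it sums the per-CPU
counters of the /proc/interrupts row whose device name matches, and compares
two samples taken a few seconds apart. The function names and the 5 second
interval are arbitrary choices for illustration, and the parsing assumes the
usual layout (a header row of CPU columns, then one row per IRQ).

```python
import time


def irq_count(interrupts_text, name):
    """Sum the per-CPU counters of every row that lists the device `name`."""
    lines = interrupts_text.splitlines()
    ncpus = len(lines[0].split())              # header row: CPU0 CPU1 ...
    total = 0
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= 1 + ncpus:
            continue                           # rows like "Err:" have no names
        names = [field.rstrip(",") for field in fields[1 + ncpus:]]
        if name in names:
            total += sum(int(field) for field in fields[1:1 + ncpus])
    return total


def is_incrementing(name="eth0", interval=5.0, path="/proc/interrupts"):
    """Sample the counter twice and report whether it moved."""
    with open(path) as proc_file:
        before = irq_count(proc_file.read(), name)
    time.sleep(interval)
    with open(path) as proc_file:
        after = irq_count(proc_file.read(), name)
    return after > before
```

Calling is_incrementing() on the board while the traffic is stuck would
tell whether the MAC interrupt is still being delivered.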

Cheers,

--
Carlo Caione
Jerome Brunet Dec. 6, 2018, 1:26 p.m. UTC | #5
On Thu, 2018-12-06 at 13:43 +0100, Emiliano Ingrassia wrote:
> Hi all,
> 
> thank you for involving me.
> 
> I applied Carlo's patches[0] on a kernel vanilla 4.19.6
> and tested it with kernel packet generator, monitoring
> bandwidth usage with "nload".
> 
> All tests were conducted on an Odroid-C1+ Rev. 0.4-20150930 board
> with a short ethernet cable directly attached to a laptop with
> 1G ethernet interface, with "nload" running on the board.
> 
> The tests I performed are composed by the following steps:
> 
> 1) Start packet generator with "rate 1000M" on laptop;
> 
> 2) Keep packet generator active on the laptop and
>    start the packet generator on the board with "rate 1000M";
> 
> 3) Stop both packet generators;
> 
> 4) Start packet generator on the board;
> 
> 5) Keep packet generator active on the board and
>    start the packet generator on the laptop.
> 
> 
> Test results without Carlo's patches applied:
> 
> 1) "nload" shows an incoming traffic of ~950Mbps;
> 
> 2) "nload" shows an incoming traffic of ~400Mbps
>    and an outgoing traffic of ~250Mbps;
> 
> 3) "nload" shows 0Mbps both for incoming and outgoing traffic;
> 
> 4) "nload" shows an outgoing traffic of ~950Mbps from the board;
> 
> 5) "nload" shows incoming traffic of 0Mbps
>    and an outgoing traffic of ~950Mbps.
> 
> Applying only the first patch (change mac IRQ type) I got the same results.
> 
> Applying only the second patch (drop eee-broken-1000t) I got the same
> results!
> 
> With both patches applied I got the same results but with an incoming
> traffic
> of ~3Mbps on the board.

Are you sure you did not mix up the results?
I would expect this kind of drop when only the EEE patch is applied.

> 
> Consider that the described tests were performed for a few minutes.
> 
> 
> The tests I performed clearly show that currently the MAC does not
> perform as 1G full-duplex.

Do you really get 1G full duplex without any of these patches?
I would be surprised if they had any meaningful impact on throughput.


> I can't say if this depends on the hardware, the driver or
> the IP description in the board's device tree.
> 
> From the results shown above I think that the patches regarding 32 bit
> Meson SoCs should NOT be applied together, but you can consider to apply
> only the second one which remove the "eee-broken-1000t" flag
> from the board MAC IP description.

I would definitely advise against that.

> In particular, I think that more tests are needed to better understand
> what's happening in the case of Meson8b SoC.
> 
> To better investigate the MAC behaviour on Odroid-C1+, should I use
> the Amlogic development kernel[1]? If yes, what branch should I use?

A bit of background:
The MAC found in all the Amlogic SoCs we have seen so far comes from Synopsys
(dwmac).

The kernel provided by the vendor uses the IRQ type 'EDGE_RISING' for this IP.
This means that the HW block is supposed to generate a rising edge on the IRQ
line every time there is an event. This is opposed to the type "LEVEL_HIGH",
which keeps the IRQ line high as long as there are pending IRQs.

Of course, when adding mainline support, we did the same as the vendor without
thinking about it.

We started to investigate the network because, after a while, we noticed
severe performance drops on the AXG family: the throughput would drop from
900Mbps to 30Mbps after something like 12+ hours of iperf tests.

We noticed that IRQs were not triggered anymore. Manually acking the IRQ in
the register would revive the interface. Since the IRQ is supposed to be acked
in the ISR, we were clearly missing IRQs and, as a consequence, never acking
them.
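
To illustrate the failure mode, here is a toy model (not the dwmac or GIC
code; all names are made up). It assumes the IRQ line is simply high
whenever any status bit is pending, and that one event lands after the ISR
has read the status register but before it acks. With edge detection the
line never drops, so no new edge is seen and every later event is stuck
behind the un-acked one. With level detection the ISR simply runs again.

```python
def simulate(trigger, later_events=("rx2", "rx3")):
    """Toy model: the IRQ line is high while any status bit is pending."""
    if trigger not in ("edge", "level"):
        raise ValueError(trigger)
    pending, handled = set(), []
    prev_line = False

    def controller_fires():
        # Edge: fire only on a low-to-high transition. Level: fire while high.
        nonlocal prev_line
        line = bool(pending)
        fire = line and (trigger == "level" or not prev_line)
        prev_line = line
        return fire

    def isr(racing_event=None):
        snapshot = sorted(pending)           # ISR reads the status register
        if racing_event is not None:
            pending.add(racing_event)        # new event lands before the ack
        pending.difference_update(snapshot)  # ack only what was read
        handled.extend(snapshot)

    pending.add("rx0")
    if controller_fires():
        isr(racing_event="rx1")
    for event in (None,) + tuple(later_events):
        if event is not None:
            pending.add(event)
        while controller_fires():
            isr()
    return handled, sorted(pending)


print("edge :", simulate("edge"))
print("level:", simulate("level"))
```

In the edge case only the first event is handled and the rest stay pending
forever, which matches the observation that manually acking the register
revives the interface.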

All HW using the dwmac out there are using "LEVEL_HIGH", except amlogic.
Changing this fixes the problem.

Now regarding EEE: about 2 years ago, the network would break on the OC-2. We
noticed that EEE was generating a *LOT* of IRQs. Deactivating EEE solved the
problem ... or so we thought. Fact is, it was an un-acked IRQ as well, and we
just made it harder to trigger by disabling EEE.

So applying the EEE patch without the IRQ_LEVEL one would clearly be a mistake:
you would be back in the situation we investigated 2 years ago, with a very
unstable ethernet connection.

Anyway, I have been able to test it on S905 and A113 and I think this series
should be applied, at least for the arm64 family ... most likely for all of them.

If issues persist on meson8, maybe there is something else? Something that was
hidden before?

Jerome Brunet Dec. 6, 2018, 1:26 p.m. UTC | #6
On Tue, 2018-12-04 at 16:04 +0000, Carlo Caione wrote:
> The wrong IRQ trigger type for the macirq was causing the connection
> speed to drop after a few hours when stress testing the DUT. The fix
> seems also to fix another long standing issue with EEE.
> 
> The fixes are tested on a AXG board but we think that the same fix is
> valid also for all the others Amlogic SoC families.
> 
> Carlo Caione (2):
>   arm64: dts: meson: Fix IRQ trigger type for macirq
>   arm64: dts: meson: Remove eee-broken-1000t quirk
> 
>  arch/arm/boot/dts/meson.dtsi                        | 2 +-
>  arch/arm/boot/dts/meson8b-odroidc1.dts              | 1 -
>  arch/arm64/boot/dts/amlogic/meson-axg-s400.dts      | 1 -
>  arch/arm64/boot/dts/amlogic/meson-axg.dtsi          | 2 +-
>  arch/arm64/boot/dts/amlogic/meson-gx.dtsi           | 2 +-
>  arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 1 -
>  arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi   | 1 -
>  7 files changed, 3 insertions(+), 7 deletions(-)
> 

Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>

on the S905 odroid-c2 and A113 s400
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Emiliano Ingrassia Dec. 6, 2018, 3:52 p.m. UTC | #7
Hi Carlo,

thanks for the answer.

On Thu, Dec 06, 2018 at 01:17:58PM +0000, Carlo Caione wrote:
> On Thu, 2018-12-06 at 13:43 +0100, Emiliano Ingrassia wrote:
> > Hi all,
>
> Hi Emiliano,
>
> > thank you for involving me.
> >
> > I applied Carlo's patches[0] on a kernel vanilla 4.19.6
> > and tested it with kernel packet generator, monitoring
> > bandwidth usage with "nload".
> >
> > All tests were conducted on an Odroid-C1+ Rev. 0.4-20150930 board
> > with a short ethernet cable directly attached to a laptop with
> > 1G ethernet interface, with "nload" running on the board.
> >
> > The tests I performed are composed by the following steps:
> >
> > 1) Start packet generator with "rate 1000M" on laptop;
> >
> > 2) Keep packet generator active on the laptop and
> >    start the packet generator on the board with "rate 1000M";
> >
> > 3) Stop both packet generators;
> >
> > 4) Start packet generator on the board;
> >
> > 5) Keep packet generator active on the board and
> >    start the packet generator on the laptop.
>
> out of curiosity: why do you expect to see something different from
> point (2)?
>

Indeed, I did not expect it; I just tried and got different results.

> > Test results without Carlo's patches applied:
> >
> > 1) "nload" shows an incoming traffic of ~950Mbps;
> >
> > 2) "nload" shows an incoming traffic of ~400Mbps
> >    and an outgoing traffic of ~250Mbps;
> >
> > 3) "nload" shows 0Mbps both for incoming and outgoing traffic;
> >
> > 4) "nload" shows an outgoing traffic of ~950Mbps from the board;
> >
> > 5) "nload" shows incoming traffic of 0Mbps
> >    and an outgoing traffic of ~950Mbps.
> >
> > Applying only the first patch (change mac IRQ type) I got the same
> > results.
>
> This is expected. The change in the IRQ type is solving an issue that
> you can see if the run a stress test involving multiple components for
> several hours.
>

OK, did you use the "stress-ng" tool for the tests?

> > Applying only the second patch (drop eee-broken-1000t) I got the same
> > results!
>
> I am a bit confused here. Wasn't the eee-broken-1000t added to fix a
> problem with the ethernet? Are you suggesting that for some reason you
> cannot reproduce anymore the problem for which the quirk was
> introduced?
>

Problems without the "eee-broken-1000t" flag were experienced
one and a half years ago on an Amlogic development kernel from [0],
probably a 4.14 version.
Many patches for the Meson8b SoC, dwmac-meson8b and the dwmac driver
have been introduced since then, so yes, "eee-broken-1000t" was added
to fix a problem with the ethernet (one and a half years ago),
but new tests are needed to say if it is still necessary.

> > With both patches applied I got the same results but with an incoming
> > traffic
> > of ~3Mbps on the board.
>
> On all the tests and immediately from the start of the tests?
>

Yes, in all the 5 steps immediately from the start.

I also tried to execute "nload" on both sides to see the bandwidth
usage.

With both patches applied, after starting the kernel packet generator
on my laptop with a 1Gbps rate, "nload" on the laptop side shows me
an outgoing traffic of ~940Mbps while "nload" on the board side shows
me an incoming traffic of ~3Mbps.
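
As a quick sanity check on these two numbers (just arithmetic on the rates
reported by "nload", with a made-up helper name):

```python
def loss_fraction(sent_mbps, received_mbps):
    """Fraction of the offered traffic that never shows up on the receiver."""
    return 1.0 - received_mbps / sent_mbps


# ~940Mbps leaving the laptop, ~3Mbps arriving on the board.
print(f"{loss_fraction(940, 3):.1%}")  # prints 99.7%
```

So with both patches applied almost all of the generated traffic is lost on
the receive side of the board.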

Also consider that a pinging test from my laptop to the board shows
a packet loss of about 90%.

> When you hit the problem con you check in /proc/interrupts if you see
> the IRQ counter for the eth0 incrementing or not?
>

The eth0 IRQ counter is incremented during the test.

> Cheers,
>
> --
> Carlo Caione
>
>

I would like to conduct other tests with iperf3 to be sure about
the obtained results. What do you think?
Should I apply your patches on the latest Amlogic development kernel?

Regards,

Emiliano

[0] https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic.git/
Emiliano Ingrassia Dec. 6, 2018, 4:24 p.m. UTC | #8
Hi Jerome,

On Thu, Dec 06, 2018 at 02:26:34PM +0100, Jerome Brunet wrote:
> On Thu, 2018-12-06 at 13:43 +0100, Emiliano Ingrassia wrote:
> > Hi all,
> >
> > thank you for involving me.
> >
> > I applied Carlo's patches[0] on a kernel vanilla 4.19.6
> > and tested it with kernel packet generator, monitoring
> > bandwidth usage with "nload".
> >
> > All tests were conducted on an Odroid-C1+ Rev. 0.4-20150930 board
> > with a short ethernet cable directly attached to a laptop with
> > 1G ethernet interface, with "nload" running on the board.
> >
> > The tests I performed are composed by the following steps:
> >
> > 1) Start packet generator with "rate 1000M" on laptop;
> >
> > 2) Keep packet generator active on the laptop and
> >    start the packet generator on the board with "rate 1000M";
> >
> > 3) Stop both packet generators;
> >
> > 4) Start packet generator on the board;
> >
> > 5) Keep packet generator active on the board and
> >    start the packet generator on the laptop.
> >
> >
> > Test results without Carlo's patches applied:
> >
> > 1) "nload" shows an incoming traffic of ~950Mbps;
> >
> > 2) "nload" shows an incoming traffic of ~400Mbps
> >    and an outgoing traffic of ~250Mbps;
> >
> > 3) "nload" shows 0Mbps both for incoming and outgoing traffic;
> >
> > 4) "nload" shows an outgoing traffic of ~950Mbps from the board;
> >
> > 5) "nload" shows incoming traffic of 0Mbps
> >    and an outgoing traffic of ~950Mbps.
> >
> > Applying only the first patch (change mac IRQ type) I got the same results.
> >
> > Applying only the second patch (drop eee-broken-1000t) I got the same
> > results!
> >
> > With both patches applied I got the same results but with an incoming
> > traffic
> > of ~3Mbps on the board.
>
> Are you sure you did not mix up the result ?
> I would expect this kind of drop when only the eee patch is applied.

Yes, I'm sure.

>
> >
> > Consider that the described tests were performed for a few minutes.
> >
> >
> > The tests I performed clearly show that currently the MAC does not
> > perform as 1G full-duplex.
>
> Do you really get 1G full duplex w/o any of these patch ?
> I would be surprised if they had any meaningful impact on throughput

As I wrote in the previous mail, without the two patches applied
I see an incoming traffic on the board of about 460 Mbps and an outgoing
traffic of 256 Mbps.
On the laptop side I see an outgoing traffic of about 940 Mbps and an incoming
traffic of about 256 Mbps.
So it seems that the board (without Carlo's patches) is losing traffic.
I'll keep investigating to see if the patches can solve this problem.

>
> > I can't say if this depends on the hardware, the driver or
> > the IP description in the board's device tree.
> >
> > From the results shown above I think that the patches regarding 32 bit
> > Meson SoCs should NOT be applied together, but you can consider to apply
> > only the second one which remove the "eee-broken-1000t" flag
> > from the board MAC IP description.
>
> I would defenitely advise against that.
>
> > In particular, I think that more tests are needed to better understand
> > what's happening in the case of Meson8b SoC.
> >
> > To better investigate the MAC behaviour on Odroid-C1+, should I use
> > the Amlogic development kernel[1]? If yes, what branch should I use?
>
> And bit of background:
> The MAC found in all Amlogic SoC we have seen so far comes from Synopsys
> (dwmac).

Yes, I know. I was referring to the patches regarding Meson8b SoCs
currently not in mainline that possibly can impact this problem.

>
> The kernel provided by the vendor use the IRQ type 'EDGE_RISING' for this IP
> This means that the HW block is supposed to generate a rising edge on the irq
> line every time there is an event. This is opposed to the Type "LEVEL_HIGH"
> with keep the irq line high as long as their pending IRQs
>
> Of course, when adding mainline support, we did the same as the vendor without
> thinking about it
>
> We started to investigate the network because, after a while, we noticed
> severe performance drops on the AXG family: the throughput would drop from
> 900MBps to 30MBps after somethings 12+ hours of iperf tests.
>
> We noticed that irqs were not triggered anymore. Manually acking the IRQ in
> the register would revive the interface. Since the IRQ is supposed to be acked
> in the ISR, we were clearly missing IRQs and as a consequence, never acking
> them.
>
> All HW using the dwmac out there are using "LEVEL_HIGH", except amlogic.
> Changing this fixes the problem.
>
> Now regarding EEE: about 2 years ago, the network would break on the OC-2. We
> noticed the EEE was generating a *LOT* of IRQs. Deactivating EEE solved the
> problem ... or so we thought. Fact is, it was an un-acked IRQ as well, and we
> just made it harder to trigger by disabling EEE.
>
> So applying the EEE patch without the IRQ_LEVEL would clearly be a mistake,
> you would be back in the situation we investigated 2 years, with a very
> unstable ethernet connection.
>
> Anyways, I have been able to test it on S905 and A113 and I think this series
> should applied, at least for the arm64 family ... most likely of all.
>
> If issues persist on meson8, maybe there is something else ? soemthing hidden
> before ?
>
> >
> >
> > On Tue, Dec 04, 2018 at 08:59:20PM +0100, Martin Blumenstingl wrote:
> > > adding Emiliano because he experienced high packet loss on Odroid-C1
> > > without "eee-broken-1000t"
> > >
> > > On Tue, Dec 4, 2018 at 5:05 PM Carlo Caione <ccaione@baylibre.com> wrote:
> > > > The wrong IRQ trigger type for the macirq was causing the connection
> > > > speed to drop after a few hours when stress testing the DUT. The fix
> > > > seems also to fix another long standing issue with EEE.
> >
> > Carlo, can you describe precisely the tests you conducted
> > on your board and the tools used?
> >
> >
> > > the other two DesignWare controllers (2x dwc2) are also using
> > > IRQ_TYPE_LEVEL_HIGH
> > > so this is not unlikely - good job detective!
> > >
> >
> > Consider that currently the USB ports do not work correctly.
> > In particular, USB pendrive insertion is not recognized at runtime.
> >
> >
> > > > The fixes are tested on a AXG board but we think that the same fix is
> > > > valid also for all the others Amlogic SoC families.
> > > I checked Amlogic's 3.10 kernel for the 32-bit SoCs and it seems they
> > > are setting all IRQs to be edge triggered: [0]
> > > however, Emiliano reported an issue with IRQ_TYPE_EDGE_RISING for the
> > > dwc2 controllers as well. 291f45dd6da5fa6 "ARM: dts: meson: fixing USB
> > > support on Meson6, Meson8 and Meson8b" fixed it for him whereas it
> > > worked for me with IRQ_TYPE_EDGE_RISING
> > >
> > > I find it strange though that Amlogic's buildroot kernel (even the
> > > latest buildroot_openlinux_kernel_4.9_fbdev_20180706) uses:
> > >   interrupts = <0 8 1>
> > > which translates to:
> > >   interrupts = <GIC_SPI 8 IRQ_TYPE_EDGE_RISING>
> > >
> > > does the datasheet give a hint that this IRQ should be level triggered
> > > or did you find out by trial and error?
> > >
> > > > Carlo Caione (2):
> > > >   arm64: dts: meson: Fix IRQ trigger type for macirq
> > > >   arm64: dts: meson: Remove eee-broken-1000t quirk
> > > >
> > > >  arch/arm/boot/dts/meson.dtsi                        | 2 +-
> > > >  arch/arm/boot/dts/meson8b-odroidc1.dts              | 1 -
> > > these two should be in separate patches with "ARM: dts: " as prefix
> > >
> > > >  arch/arm64/boot/dts/amlogic/meson-axg-s400.dts      | 1 -
> > > >  arch/arm64/boot/dts/amlogic/meson-axg.dtsi          | 2 +-
> > > >  arch/arm64/boot/dts/amlogic/meson-gx.dtsi           | 2 +-
> > > >  arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 1 -
> > > >  arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi   | 1 -
> > > >  7 files changed, 3 insertions(+), 7 deletions(-)
> > > >
> > > > --
> > > > 2.19.1
> > > >
> > >
> > > Regards
> > > Martin
> > >
> > > [0]
> > > https://github.com/endlessm/linux-meson/blob/cd4096c3ff4eb5b8a8a5581bb46508601c5470dc/drivers/irqchip/irq-gic.c#L400
> > >
> >
> > Best regards,
> >
> > Emiliano
> >
> > [0]
> > http://lists.infradead.org/pipermail/linux-amlogic/2018-December/009325.html
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic.git/
> >
>
>

Emiliano
Emiliano Ingrassia Dec. 6, 2018, 6:51 p.m. UTC | #9
Hi Carlo,

I kept running tests with the packet generator,
using nload to show bandwidth usage.

Here are my test results with packet generators
running on both the laptop and the board at a rate of 1 Gbps.

TEST 0: no patch applied.

1) Start packet generator on laptop:

              | incoming traffic | outgoing traffic
=====================================================
nload (board) |     ~940 Mbps    |       0 Mbps
-----------------------------------------------------
nload (laptop)|       0 Mbps     |     ~940 Mbps
=====================================================

2) Start packet generator on board:

              | incoming traffic | outgoing traffic
=====================================================
nload (board) |     ~460 Mbps    |     ~256 Mbps
-----------------------------------------------------
nload (laptop)|     ~256 Mbps    |     ~940 Mbps
=====================================================

3) Stop packet generator on laptop:

              | incoming traffic | outgoing traffic
=====================================================
nload (board) |       0 Mbps     |    ~940 Mbps
-----------------------------------------------------
nload (laptop)|       ~940 Mbps  |      0 Mbps
=====================================================

4) Restart packet generator on laptop:

              | incoming traffic | outgoing traffic
=====================================================
nload (board) |     ~0 Mbps      |     ~940 Mbps
-----------------------------------------------------
nload (laptop)|     ~940 Mbps    |     ~940 Mbps
=====================================================

In the last case the "ifconfig" statistics about RX packets
remain fixed which probably indicates that the incoming traffic
to the board is effectively being dropped.

The eth0 interrupt counter keeps incrementing.
Simple ping test works correctly.
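
For anyone who wants to repeat the interrupt counter check, it can be scripted instead of read by eye. The following is only a minimal sketch: the sample text, the IRQ numbers and the helper name irq_counts are made up for illustration and are not taken from the board under test.

```python
def irq_counts(text, name):
    """Sum the per-CPU counters of every /proc/interrupts row whose
    last column contains `name` (for example "eth0")."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())      # header row: CPU0 CPU1 ...
    total = 0
    for line in lines[1:]:
        fields = line.split()
        if not fields or name not in fields[-1]:
            continue
        # fields[0] is "NN:", followed by one counter per CPU
        total += sum(int(f) for f in fields[1:1 + ncpus])
    return total


# Hypothetical snapshots, shaped like /proc/interrupts on a 4-core board.
SAMPLE_BEFORE = """\
           CPU0       CPU1       CPU2       CPU3
 40:       1200        300          0          0     GIC-0  40 Edge      eth0
 41:         10          0          0          0     GIC-0  41 Level     dwc2
"""

SAMPLE_AFTER = """\
           CPU0       CPU1       CPU2       CPU3
 40:       1700        450          0          0     GIC-0  40 Edge      eth0
 41:         10          0          0          0     GIC-0  41 Level     dwc2
"""

# A positive delta means the MAC interrupt is still firing.
delta = irq_counts(SAMPLE_AFTER, "eth0") - irq_counts(SAMPLE_BEFORE, "eth0")
```

On a real system the two snapshots would come from reading /proc/interrupts twice, a few seconds apart, while the test is running.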


TEST 1: IRQ type patch applied

Same results as TEST 0.


TEST 2: eee-broken-1000t flag removed

1) Start packet generator on laptop:

              | incoming traffic | outgoing traffic
=====================================================
nload (board) |      ~3 Mbps     |       0 Mbps
-----------------------------------------------------
nload (laptop)|       0 Mbps     |     ~940 Mbps
=====================================================

2) Start packet generator on board:

              | incoming traffic | outgoing traffic
=====================================================
nload (board) |     ~0 Mbps      |     ~940 Mbps
-----------------------------------------------------
nload (laptop)|     ~940 Mbps    |     ~940 Mbps
=====================================================

3) Stop packet generator on laptop:

              | incoming traffic | outgoing traffic
=====================================================
nload (board) |       0 Mbps     |    ~940 Mbps
-----------------------------------------------------
nload (laptop)|       ~940 Mbps  |      0 Mbps
=====================================================

4) Restart packet generator on laptop:

              | incoming traffic | outgoing traffic
=====================================================
nload (board) |     ~0 Mbps      |     ~940 Mbps
-----------------------------------------------------
nload (laptop)|     ~940 Mbps    |     ~940 Mbps
=====================================================

In the first case the "ifconfig" statistics about RX packets
are incremented consistently with the incoming traffic value
shown by nload (board side).

In the last case the "ifconfig" statistics about RX packets
remain fixed which probably indicates that the incoming traffic
to the board is effectively being dropped.
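
The "RX packets remain fixed" observation can also be checked without ifconfig, since Linux exposes the same counters under /sys/class/net/<iface>/statistics/. This is a rough sketch under that assumption; the helper names and the sysfs_root parameter are hypothetical and only there to keep the example self-contained.

```python
from pathlib import Path


def read_counter(iface, counter, sysfs_root="/sys/class/net"):
    """Return one integer counter (e.g. rx_packets) for a network interface."""
    path = Path(sysfs_root) / iface / "statistics" / counter
    return int(path.read_text().strip())


def rate_mbps(bytes_before, bytes_after, seconds):
    """Convert a byte-counter delta over `seconds` into megabits per second."""
    return (bytes_after - bytes_before) * 8 / seconds / 1e6


def rx_stalled(packets_before, packets_after):
    """True when the RX packet counter did not move between two samples."""
    return packets_after == packets_before
```

Sampling read_counter("eth0", "rx_packets") twice while the laptop generator is running, and passing both values to rx_stalled(), would show whether incoming frames are being counted at all.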

The eth0 interrupt counter keeps incrementing.
A simple ping test from laptop to board shows a packet loss
of 90% or more, while there is no packet loss when pinging
the laptop from the board.


TEST 3: both patches applied.

Same results as TEST 2.


From the results of these tests,
which are more accurate than the previous ones,
I can say that the second patch (remove eee-broken-1000t flag)
should NOT be applied.

About the first one (change MAC IRQ type), I would like
to do more tests with other tools like iperf3.
With these results alone, I would say not to apply it
because nothing changed, but if your stress test failed on
long runs and this patch fixes it, I would like to test it more deeply.

As a final thought, the tests clearly show that if the board
transmits at full rate, all the incoming traffic is dropped.
I think that this behaviour should be fixed, but I don't know if
it depends on the driver or the device tree description.
I'll keep investigating.

Regards,

Emiliano

On Thu, Dec 06, 2018 at 04:52:28PM +0100, Emiliano Ingrassia wrote:
> Hi Carlo,
>
> thanks for the answer.
>
> On Thu, Dec 06, 2018 at 01:17:58PM +0000, Carlo Caione wrote:
> > On Thu, 2018-12-06 at 13:43 +0100, Emiliano Ingrassia wrote:
> > > Hi all,
> >
> > Hi Emiliano,
> >
> > > thank you for involving me.
> > >
> > > I applied Carlo's patches[0] on a kernel vanilla 4.19.6
> > > and tested it with kernel packet generator, monitoring
> > > bandwidth usage with "nload".
> > >
> > > All tests were conducted on an Odroid-C1+ Rev. 0.4-20150930 board
> > > with a short ethernet cable directly attached to a laptop with
> > > 1G ethernet interface, with "nload" running on the board.
> > >
> > > The tests I performed are composed by the following steps:
> > >
> > > 1) Start packet generator with "rate 1000M" on laptop;
> > >
> > > 2) Keep packet generator active on the laptop and
> > >    start the packet generator on the board with "rate 1000M";
> > >
> > > 3) Stop both packet generators;
> > >
> > > 4) Start packet generator on the board;
> > >
> > > 5) Keep packet generator active on the board and
> > >    start the packet generator on the laptop.
> >
> > out of curiosity: why do you expect to see something different from
> > point (2)?
> >
>
> I did not expect it indeed, I tried and got different results.
>
> > > Test results without Carlo's patches applied:
> > >
> > > 1) "nload" shows an incoming traffic of ~950Mbps;
> > >
> > > 2) "nload" shows an incoming traffic of ~400Mbps
> > >    and an outgoing traffic of ~250Mbps;
> > >
> > > 3) "nload" shows 0Mbps both for incoming and outgoing traffic;
> > >
> > > 4) "nload" shows an outgoing traffic of ~950Mbps from the board;
> > >
> > > 5) "nload" shows incoming traffic of 0Mbps
> > >    and an outgoing traffic of ~950Mbps.
> > >
> > > Applying only the first patch (change mac IRQ type) I got the same
> > > results.
> >
> > This is expected. The change in the IRQ type is solving an issue that
> > you can see if you run a stress test involving multiple components for
> > several hours.
> >
>
> OK, did you use "stress-ng" tool for tests?
>
> > > Applying only the second patch (drop eee-broken-1000t) I got the same
> > > results!
> >
> > I am a bit confused here. Wasn't the eee-broken-1000t added to fix a
> > problem with the ethernet? Are you suggesting that for some reason you
> > cannot reproduce anymore the problem for which the quirk was
> > introduced?
> >
>
> Problems without the "eee-broken-1000t" flag were experienced
> one and a half years ago on an Amlogic development kernel from [0],
> probably a 4.14 version.
> Many patches for the Meson8b SoC, dwmac-meson8b and the dwmac driver
> have been introduced since, so yes, "eee-broken-1000t" was added
> to fix a problem with the ethernet (one and a half years ago),
> but new tests are needed to say if it is still necessary.
>
> > > With both patches applied I got the same results but with an incoming
> > > traffic
> > > of ~3Mbps on the board.
> >
> > On all the tests and immediately from the start of the tests?
> >
>
> Yes, in all the 5 steps immediately from the start.
>
> I also tried to execute "nload" on both sides to see the bandwidth
> usage.
>
> With both patches applied, after starting kernel packet generator
> on my laptop with 1Gbps rate, "nload" on the laptop side shows me
> an outgoing traffic of ~940Mbps while "nload" on the board side shows
> me an incoming traffic of ~3Mbps.
>
> Also consider that a pinging test from my laptop to the board shows
> a packet loss of about 90%.
>
> > When you hit the problem can you check in /proc/interrupts if you see
> > the IRQ counter for the eth0 incrementing or not?
> >
>
> The eth0 IRQ counter is incremented during the test.
>
> > Cheers,
> >
> > --
> > Carlo Caione
> >
> >
>
> I would like to conduct other tests with iperf3 to be sure about
> the obtained results. What do you think?
> Should I apply your patches on the latest Amlogic development kernel?
>
> Regards,
>
> Emiliano
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic.git/

Cheers,

Emiliano
Kevin Hilman Dec. 7, 2018, 4:17 a.m. UTC | #10
Carlo Caione <ccaione@baylibre.com> writes:

> The wrong IRQ trigger type for the macirq was causing the connection
> speed to drop after a few hours when stress testing the DUT. The fix
> seems also to fix another long standing issue with EEE.
>
> The fixes are tested on a AXG board but we think that the same fix is
> valid also for all the others Amlogic SoC families.

For broader testing, I've created a v4.21/dt64-testing branch (on top of
the current v4.21/dt64 branch) which I've merged into the integ branch
so it gets a spin through kernelCI.org, and others can easily test the
integ branch which has all the other stuff currently queued for v4.21.

I'll wait for the discussions to settle down and testing results before
deciding whether to officially queue this for v4.21.

Kevin
Jerome Brunet Dec. 7, 2018, 10:49 a.m. UTC | #11
On Thu, 2018-12-06 at 19:51 +0100, Emiliano Ingrassia wrote:
> Hi Carlo,
> 
> I keep running tests with packet generator,
> using nload to show bandwidth usage.
> 
> Here is my test results with packet generators
> both running on laptop and board with rate 1 Gbps.

Testing with UDP is unlikely to give us a clear picture of anything for this
sort of fix.

All your tests seem to show is the fact that the Amlogic SoC usually
prioritizes TX traffic over RX, which is something we've known about for a
while.

It would be helpful if you could provide TCP figures with a traffic generator
we can all share and inspect, such as iperf3.

Finally, your tests do not show the original issue regarding EEE. So the
workaround we put in place (yes, it was never considered a solution) should not
be kept IMHO. Your numbers for EEE may be due to the way the traffic is
generated and the PHY entering LPI and taking a bit of time to exit it. Again,
UDP is not very helpful to characterize this.

Carlo Caione Dec. 7, 2018, 11:03 a.m. UTC | #12
On Fri, 2018-12-07 at 11:49 +0100, Jerome Brunet wrote:
> On Thu, 2018-12-06 at 19:51 +0100, Emiliano Ingrassia wrote:
> > Hi Carlo,
> > 
> > I keep running tests with packet generator,
> > using nload to show bandwidth usage.
> > 
> > Here is my test results with packet generators
> > both running on laptop and board with rate 1 Gbps.
> 
> > Testing with UDP is unlikely to give us a clear picture of anything for
> > this sort of fix.
> >
> > All your tests seem to show is the fact that the Amlogic SoC usually
> > prioritizes TX traffic over RX, which is something we've known about
> > for a while.
> >
> > It would be helpful if you could provide TCP figures with a traffic
> > generator we can all share and inspect, such as iperf3

For reference you can use something like:

iperf3 -c ${IP} -i ${IPERF_LOGPERIOD} -w 400k -p ${PORT} -t ${DURATION}
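
To make results easier to compare across boards, the same invocation could be wrapped in a small script that asks iperf3 for JSON output. The sketch below assumes iperf3 is installed and that its -J (--json) report for a TCP run carries end.sum_received.bits_per_second; the function names and default values are hypothetical choices for illustration.

```python
import json
import subprocess


def iperf3_argv(server, port=5201, duration=60, interval=10, window="400k"):
    """Build the client command line shown above, plus -J for JSON output."""
    return ["iperf3", "-c", server, "-i", str(interval), "-w", window,
            "-p", str(port), "-t", str(duration), "-J"]


def received_mbps(report):
    """Extract the receiver-side TCP throughput in Mbps from a parsed report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def run(server, **kwargs):
    """Run the iperf3 client and return the received throughput in Mbps."""
    result = subprocess.run(iperf3_argv(server, **kwargs), check=True,
                            capture_output=True, text=True)
    return received_mbps(json.loads(result.stdout))
```

Calling run("192.168.1.2", duration=3600) against a board running "iperf3 -s" would give one number per run; the address is a placeholder.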

Cheers,

--
Carlo Caione
Emiliano Ingrassia Dec. 7, 2018, 6:28 p.m. UTC | #13
On Fri, Dec 07, 2018 at 11:49:15AM +0100, Jerome Brunet wrote:
> On Thu, 2018-12-06 at 19:51 +0100, Emiliano Ingrassia wrote:
> > Hi Carlo,
> >
> > I keep running tests with packet generator,
> > using nload to show bandwidth usage.
> >
> > Here is my test results with packet generators
> > both running on laptop and board with rate 1 Gbps.
>
> Testing with UDP is unlikely to give us a clear picture of anything for this
> sort of fix.
>

Why? Would you mind explaining your reasoning?

> All your tests seem to show is the fact that the Amlogic SoC usually
> prioritizes TX traffic over RX, which is something we've known about for a
> while.
>

Is that normal and/or acceptable?

> It would be helpful if you could provide TCP figures with a traffic generator
> we can all share and inspect, such as iperf3.
>
> Finally, your tests do not show the original issue regarding EEE. So the
> workaround we put in place (yes, it was never considered a solution) should not
> be kept IMHO. Your numbers for EEE may be due to the way the traffic is
> generated and the PHY entering LPI and taking a bit of time to exit it. Again,
> UDP is not very helpful to characterize this.
>

Did you read my email in full, or are you kidding me?

TEST 3 clearly shows that the issue regarding EEE is still there
with both patches applied.

My comment about TEST 3 (same results as TEST 2):
"Simple ping test from laptop to board shows a packet loss
 of 90% and more while no packet loss achieved pinging
 the laptop from the board."

I definitely advise against the second patch (the part regarding
32 bit Meson SoC).

About the first one, there is still no evidence that it is needed on
the Meson8b SoC. And I'm saying this because I tested both patches on
real hardware, not just guessing!

Furthermore, as Martin reported in one of the previous mails,
even Amlogic's buildroot kernel uses an edge rising IRQ type
for the Meson8b MAC. This is further evidence that the need
for the first patch on 32 bit Meson SoCs is not so clear.
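
For readers comparing the vendor device tree with the proposed change, the raw three-cell GIC interrupt specifier can be decoded mechanically. The sketch below uses the constant values from the kernel's dt-bindings headers (GIC_SPI = 0, GIC_PPI = 1, IRQ_TYPE_EDGE_RISING = 1, IRQ_TYPE_LEVEL_HIGH = 4, and so on); the decode() helper itself is hypothetical.

```python
# Values as defined in include/dt-bindings/interrupt-controller/arm-gic.h
GIC_KIND = {0: "GIC_SPI", 1: "GIC_PPI"}

# Values as defined in include/dt-bindings/interrupt-controller/irq.h
IRQ_TYPE = {
    0: "IRQ_TYPE_NONE",
    1: "IRQ_TYPE_EDGE_RISING",
    2: "IRQ_TYPE_EDGE_FALLING",
    3: "IRQ_TYPE_EDGE_BOTH",
    4: "IRQ_TYPE_LEVEL_HIGH",
    8: "IRQ_TYPE_LEVEL_LOW",
}


def decode(cells):
    """Turn a raw <kind number flags> triple into its symbolic form."""
    kind, number, flags = cells
    # Only the low four bits carry the trigger type; PPIs may also carry
    # a CPU mask in the upper bits, which is ignored here.
    return f"<{GIC_KIND[kind]} {number} {IRQ_TYPE[flags & 0xF]}>"


vendor = decode((0, 8, 1))     # the vendor kernel's <0 8 1>
proposed = decode((0, 8, 4))   # the same line with a level-high trigger
```

Here vendor is the edge rising form quoted earlier in the thread, and proposed is what the same interrupt looks like once switched to IRQ_TYPE_LEVEL_HIGH.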

> > On Thu, Dec 06, 2018 at 04:52:28PM +0100, Emiliano Ingrassia wrote:
> > > Hi Carlo,
> > >
> > > thanks for the answer.
> > >
> > > On Thu, Dec 06, 2018 at 01:17:58PM +0000, Carlo Caione wrote:
> > > > On Thu, 2018-12-06 at 13:43 +0100, Emiliano Ingrassia wrote:
> > > > > Hi all,
> > > >
> > > > Hi Emiliano,
> > > >
> > > > > thank you for involving me.
> > > > >
> > > > > I applied Carlo's patches[0] on a kernel vanilla 4.19.6
> > > > > and tested it with kernel packet generator, monitoring
> > > > > bandwidth usage with "nload".
> > > > >
> > > > > All tests were conducted on an Odroid-C1+ Rev. 0.4-20150930 board
> > > > > with a short ethernet cable directly attached to a laptop with
> > > > > 1G ethernet interface, with "nload" running on the board.
> > > > >
> > > > > The tests I performed are composed by the following steps:
> > > > >
> > > > > 1) Start packet generator with "rate 1000M" on laptop;
> > > > >
> > > > > 2) Keep packet generator active on the laptop and
> > > > >    start the packet generator on the board with "rate 1000M";
> > > > >
> > > > > 3) Stop both packet generators;
> > > > >
> > > > > 4) Start packet generator on the board;
> > > > >
> > > > > 5) Keep packet generator active on the board and
> > > > >    start the packet generator on the laptop.
> > > >
> > > > out of curiosity: why do you expect to see something different from
> > > > point (2)?
> > > >
> > >
> > > I did not expect it indeed, I tried and got different results.
> > >
> > > > > Test results without Carlo's patches applied:
> > > > >
> > > > > 1) "nload" shows an incoming traffic of ~950Mbps;
> > > > >
> > > > > 2) "nload" shows an incoming traffic of ~400Mbps
> > > > >    and an outgoing traffic of ~250Mbps;
> > > > >
> > > > > 3) "nload" shows 0Mbps both for incoming and outgoing traffic;
> > > > >
> > > > > 4) "nload" shows an outgoing traffic of ~950Mbps from the board;
> > > > >
> > > > > 5) "nload" shows incoming traffic of 0Mbps
> > > > >    and an outgoing traffic of ~950Mbps.
> > > > >
> > > > > Applying only the first patch (change mac IRQ type) I got the same
> > > > > results.
> > > >
> > > > This is expected. The change in the IRQ type is solving an issue that
> > > > you can see if the run a stress test involving multiple components for
> > > > several hours.
> > > >
> > >
> > > OK, did you use "stress-ng" tool for tests?
> > >
> > > > > Applying only the second patch (drop eee-broken-1000t) I got the same
> > > > > results!
> > > >
> > > > I am a bit confused here. Wasn't the eee-broken-1000t added to fix a
> > > > problem with the ethernet? Are you suggesting that for some reason you
> > > > cannot reproduce anymore the problem for which the quirk was
> > > > introduced?
> > > >
> > >
> > > Problems without the "eee-broken-1000t" flags were experimented
> > > one and a half years ago on a Amlogic development kernel from [0],
> > > probably a 4.14 version.
> > > Many patches about Meson8b SoC, dwmac-meson8b and dwmac driver
> > > were introduced so yes, the "eee-broken-1000t" was added
> > > to fix a problem with the ethernet (one and a half years ago),
> > > but new tests are needed to say if it still necessary.
> > >
> > > > > With both patches applied I got the same results but with an incoming
> > > > > traffic
> > > > > of ~3Mbps on the board.
> > > >
> > > > On all the tests and immediately from the start of the tests?
> > > >
> > >
> > > Yes, in all the 5 steps immediately from the start.
> > >
> > > I also tried to execute "nload" on both sides to see the bandwidth
> > > usage.
> > >
> > > With bot patches applied, after starting kernel packet generator
> > > on my laptop with 1Gbps rate, "nload" on the laptop side shows me
> > > an outgoing traffic of ~940Mbps while "nload" on the board side shows
> > > me an incoming traffic of ~3Mbps.
> > >
> > > Also consider that a pinging test from my laptop to the board shows
> > > a packet loss of about 90%.
> > >
> > > > When you hit the problem con you check in /proc/interrupts if you see
> > > > the IRQ counter for the eth0 incrementing or not?
> > > >
> > >
> > > The eth0 IRQ counter is incremented during the test.
> > >
> > > > Cheers,
> > > >
> > > > --
> > > > Carlo Caione
> > > >
> > > >
> > >
> > > I would like to conduct other tests with iperf3 to be sure about
> > > the obtained results. What do you think?
> > > Should I apply your patches on the latest Amlogic development kernel?
> > >
> > > Regards,
> > >
> > > Emiliano
> > >
> > > [0]
> > > https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic.git/
> >
> > Cheers,
> >
> > Emiliano
Emiliano Ingrassia Dec. 7, 2018, 6:33 p.m. UTC | #14
On Fri, Dec 07, 2018 at 11:03:20AM +0000, Carlo Caione wrote:
> On Fri, 2018-12-07 at 11:49 +0100, Jerome Brunet wrote:
> > On Thu, 2018-12-06 at 19:51 +0100, Emiliano Ingrassia wrote:
> > > Hi Carlo,
> > >
> > > I keep running tests with packet generator,
> > > using nload to show bandwidth usage.
> > >
> > > Here is my test results with packet generators
> > > both running on laptop and board with rate 1 Gbps.
> >
> > Testing in UDP is unlikely to give us clear picture of anything for
> > this sort
> > of fixes.
> >
> > All your test seems in show it the fact the Amlogic SoC usually
> > prioritize the
> > TX traffic over RX, which is something we've known about for a while.
> >
> > It would be helpful if you could provide TCP figures with a traffic
> > generator
> > we can all share an inspect, such as iperf3
>
> For reference you can use something like:
>
> iperf3 -c ${IP} -i ${IPERF_LOGPERIOD} -w 400k -p ${PORT} -t ${DURATION}
>

Thanks Carlo. Just to be consistent, what value did you use for DURATION?

> Cheers,
>
> --
> Carlo Caione
>

Regards,

Emiliano
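
For reference, the iperf3 invocation quoted above can be wrapped in a small script so that everyone runs exactly the same TCP test. This is only a sketch: the server address, port, log period and duration below are placeholder assumptions, not the values Carlo used (those are not stated in this thread). Only the "-w 400k" window comes from the quoted command.

```python
import shlex


def iperf3_client_cmd(ip, port, log_period_s, duration_s, window="400k"):
    """Build the argv for the TCP client test quoted above."""
    return [
        "iperf3",
        "-c", ip,                 # client mode, connect to this server
        "-i", str(log_period_s),  # seconds between periodic reports
        "-w", window,             # socket buffer / window size
        "-p", str(port),          # server port
        "-t", str(duration_s),    # test duration in seconds
    ]


# Placeholder values, chosen only for illustration (assumptions).
cmd = iperf3_client_cmd("192.168.1.100", 5201, 10, 3600)
print(shlex.join(cmd))
```

The list can be passed directly to subprocess.run(cmd) on the board, with "iperf3 -s -p <port>" running on the other end of the link.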
Jerome Brunet Dec. 7, 2018, 7:58 p.m. UTC | #15
On Fri, 2018-12-07 at 19:28 +0100, Emiliano Ingrassia wrote:
> On Fri, Dec 07, 2018 at 11:49:15AM +0100, Jerome Brunet wrote:
> > On Thu, 2018-12-06 at 19:51 +0100, Emiliano Ingrassia wrote:
> > > Hi Carlo,
> > > 
> > > I keep running tests with packet generator,
> > > using nload to show bandwidth usage.
> > > 
> > > Here is my test results with packet generators
> > > both running on laptop and board with rate 1 Gbps.
> > 
> > Testing in UDP is unlikely to give us clear picture of anything for this
> > sort
> > of fixes.
> > 
> 
> Why? Would you mind to explain your reasoning?

Because with UDP we have no idea why packets are lost.

> 
> > All your test seems in show it the fact the Amlogic SoC usually prioritize
> > the
> > TX traffic over RX, which is something we've known about for a while.
> > 
> 
> Is that normal

Yes

> and/or acceptable?

Free software, Free world .. you are free to (try to) do something about it.

> 
> > It would be helpful if you could provide TCP figures with a traffic
> > generator
> > we can all share an inspect, such as iperf3
> > 
> > Finally, Your test do not show the original issue regarding EEE. So the
> > work
> > around we put (yes, it was never considered a solution) for it should not
> > be
> > kept IHMO. Your numbers for EEE may be due to the way the traffic is
> > generated
> > and the PHY entering LPI and taking a bit of time to exit it. Again UDP is
> > not
> > very helpful to characterize this.
> > 
> 
> Did you read my email entirely

Yes I did (and I'm starting to regret it)

> or just kidding me?

I don't know what you hope to accomplish with that, but you can be sure it
won't help.

> 
> TEST 3 clearly shows that the issue regarding EEE is still there
> with both patches applied.

TEST 3 clearly shows nothing. Once again, with TX priority and UDP, no
conclusion can be drawn from your test.

> 
> My comment about TEST 3 (same results as TEST 2):
> "Simple ping test from laptop to board shows a packet loss
>  of 90% and more while no packet loss achieved pinging
>  the laptop from the board."
> 
> I definitively advice against the second patch (the part regarding
> 32 bit Meson SoC).
> 
> About the first one, still no evidence that is needed on Meson8b SoC.
> And I'm saying it because I tested both patches on real hardware,
> not just guessing!

LOL. What are you trying to imply exactly ?

> 
> Furthermore, as Martin reported in one of the previous mail,
> even Amlogic's buildroot kernel uses an edge rising IRQ type
> for the Meson8b MAC.

As it does for every other Amlogic MAC ... And I think we pointed out that every
other dwmac user out there uses LEVEL, which makes a lot more sense and is stable.
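
To illustrate why LEVEL is generally the safer choice for a device that keeps its line asserted while any status bit is pending, here is a toy model of one common way an edge-triggered setup can lose an interrupt. It is a sketch under stated assumptions, not a model of the dwmac hardware or of the GIC: the event names "rx0"/"rx1" and the handler behaviour (snapshot the status, then clear only the snapshotted bits) are hypothetical, and nothing in this thread establishes that this is the exact failure seen on Meson.

```python
def simulate(trigger):
    """Toy IRQ model. Returns (serviced events, events left pending)."""
    pending = {"rx0"}                  # status bits raised by the device
    arrivals_during_handler = ["rx1"]  # event landing mid-handler
    serviced = []
    line_was_high = False

    for _ in range(4):                 # a few controller evaluation cycles
        line_high = bool(pending)      # line is asserted while bits are pending
        if trigger == "level":
            fire = line_high                        # fires as long as line is high
        else:
            fire = line_high and not line_was_high  # fires on low->high edge only
        line_was_high = line_high

        if fire:
            snapshot = set(pending)    # handler reads the status register
            if arrivals_during_handler:
                # A new event arrives after the snapshot was taken.
                pending.add(arrivals_during_handler.pop(0))
            pending -= snapshot        # handler clears only what it saw
            serviced.extend(sorted(snapshot))

    return serviced, sorted(pending)


print("level:", simulate("level"))
print("edge: ", simulate("edge"))
```

In this model the line never drops between the two events, so with the edge trigger no second edge is seen and "rx1" stays pending forever, while the level trigger simply runs the handler again.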


>  Other evidence that is not so clear
> the need for the first patch on 32 bit Meson SoC.

Not evidence at all: there is no difference between the 32-bit and 64-bit arch.



> 
> > > TEST 0: no patch applied.
> > > 
> > > 1) Start packet generator on laptop:
> > > 
> > >               | incoming traffic | outgoing traffic
> > > =====================================================
> > > nload (board) |     ~940 Mbps    |       0 Mbps
> > > -----------------------------------------------------
> > > nload (laptop)|       0 Mbps     |     ~940 Mbps
> > > =====================================================
> > > 
> > > 2) Start packet generator on board:
> > > 
> > >               | incoming traffic  | outgoing traffic
> > > ==============+===================+==================
> > > nload (board) |     ~460 Mbps     |     ~256 Mbps
> > > --------------+-------------------+------------------
> > > nload (laptop)|     ~256 Mbps     |     ~940 Mbps
> > > =====================================================
> > > 
> > > 3) Stop packet generator on laptop:
> > > 
> > >               | incoming traffic | outgoing traffic
> > > =====================================================
> > > nload (board) |       0 Mbps     |    ~940 Mbps
> > > -----------------------------------------------------
> > > nload (laptop)|       ~940 Mbps  |      0 Mbps
> > > =====================================================
> > > 
> > > 4) Restart packet generator on laptop:
> > > 
> > >               | incoming traffic | outgoing traffic
> > > =====================================================
> > > nload (board) |     ~0 Mbps      |     ~940 Mbps
> > > -----------------------------------------------------
> > > nload (laptop)|     ~940 Mbps    |     ~940 Mbps
> > > =====================================================
> > > 
> > > In the last case the "ifconfig" statistics about RX packets
> > > remain fixed which probably indicates that the incoming traffic
> > > to the board is effectively being dropped.
> > > 
> > > The eth0 interrupt counter keeps incrementing.
> > > Simple ping test works correctly.
> > > 
> > > 
> > > TEST 1: IRQ type patch applied
> > > 
> > > Same results as TEST 0.
> > > 
> > > 
> > > TEST 2: eee-broken-1000t flag removed
> > > 
> > > 1) Start packet generator on laptop:
> > > 
> > >               | incoming traffic | outgoing traffic
> > > =====================================================
> > > nload (board) |      ~3Mbps      |       0 Mbps
> > > -----------------------------------------------------
> > > nload (laptop)|       0 Mbps     |     ~940 Mbps
> > > =====================================================
> > > 
> > > 2) Start packet generator on board:
> > > 
> > >               | incoming traffic  | outgoing traffic
> > > ==============+===================+==================
> > > nload (board) |     ~0 Mbps       |     ~940 Mbps
> > > --------------+-------------------+------------------
> > > nload (laptop)|     ~940 Mbps     |     ~940 Mbps
> > > =====================================================
> > > 
> > > 3) Stop packet generator on laptop:
> > > 
> > >               | incoming traffic | outgoing traffic
> > > =====================================================
> > > nload (board) |       0 Mbps     |    ~940 Mbps
> > > -----------------------------------------------------
> > > nload (laptop)|       ~940 Mbps  |      0 Mbps
> > > =====================================================
> > > 
> > > 4) Restart packet generator on laptop:
> > > 
> > >               | incoming traffic | outgoing traffic
> > > =====================================================
> > > nload (board) |     ~0 Mbps      |     ~940 Mbps
> > > -----------------------------------------------------
> > > nload (laptop)|     ~940 Mbps    |     ~940 Mbps
> > > =====================================================
> > > 
> > > In the first case the "ifconfig" statistics about RX packets
> > > are incremented consistently with the incoming traffic value
> > > showed by the nload (board side).
> > > 
> > > In the last case the "ifconfig" statistics about RX packets
> > > remain fixed which probably indicates that the incoming traffic
> > > to the board is effectively being dropped.
> > > 
> > > The eth0 interrupt counter keeps incrementing.
> > > Simple ping test from laptop to board shows a packet loss
> > > of 90% and more while no packet loss achieved pinging
> > > the laptop from the board.
> > > 
> > > 
> > > TEST 3: both patches applied.
> > > 
> > > Same results as TEST 2.
> > > 
> > > 
> > > From the results obtained from these tests,
> > > which are more accurate than the previous one,
> > > I can say that the second patch (remove eee-broken-1000t flag)
> > > should NOT be applied.
> > > 
> > > About the first one (change MAC IRQ type), I would like
> > > to do other tests with other tools like iperf3.
> > > With these results only, I would say to not apply it
> > > because nothing changed but if your stress test failed on
> > > long running and this patch fix it I would like to test it more deeply.
> > > 
> > > As final thought, the conducted tests clearly show that if the board
> > > transmits at full rate, all the incoming traffic is dropped.
> > > I think that this behaviour should be fixed but don't know if
> > > it depends on the driver or device tree description.
> > > I'll keep investigating.
> > > 
> > > Regards,
> > > 
> > > Emiliano
> > > 
Martin Blumenstingl Dec. 7, 2018, 9:56 p.m. UTC | #16
Hi Emiliano,

On Fri, Dec 7, 2018 at 7:28 PM Emiliano Ingrassia
<ingrassia@epigenesys.com> wrote:
[...]
> > All your test seems in show it the fact the Amlogic SoC usually prioritize the
> > TX traffic over RX, which is something we've known about for a while.
> >
>
> Is that normal and/or acceptable?
the public S805 datasheet mentions in the "Ethernet MAC" features
section (22.2) on page 120:
"RX FIFO 4KB, TX FIFO 2KB"
this suggests that

I did some tests using an Armbian 3.10 kernel on my Odroid-C1:
root@odroidc1:~# iperf3 -c 192.168.1.100
Connecting to host 192.168.1.100, port 5201
[  4] local 192.168.1.163 port 44297 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  49.9 MBytes   419 Mbits/sec    0    809 KBytes
[  4]   1.00-2.00   sec  48.7 MBytes   408 Mbits/sec    0    809 KBytes
[  4]   2.00-3.00   sec  48.4 MBytes   407 Mbits/sec    0    809 KBytes
[  4]   3.00-4.00   sec  48.9 MBytes   409 Mbits/sec    0    809 KBytes
[  4]   4.00-5.00   sec  48.2 MBytes   406 Mbits/sec    0    809 KBytes
[  4]   5.00-6.00   sec  48.8 MBytes   409 Mbits/sec    0    809 KBytes
[  4]   6.00-7.00   sec  48.7 MBytes   408 Mbits/sec    0    809 KBytes
[  4]   7.00-8.00   sec  48.0 MBytes   404 Mbits/sec    0    809 KBytes
[  4]   8.00-9.00   sec  48.1 MBytes   403 Mbits/sec    0    809 KBytes
[  4]   9.00-10.00  sec  48.1 MBytes   404 Mbits/sec    0    809 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   486 MBytes   408 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   485 MBytes   407 Mbits/sec                  receiver

iperf Done.
root@odroidc1:~# iperf3 -c 192.168.1.100 -R
Connecting to host 192.168.1.100, port 5201
Reverse mode, remote host 192.168.1.100 is sending
[  4] local 192.168.1.163 port 44301 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  87.5 MBytes   734 Mbits/sec
[  4]   1.00-2.00   sec  89.2 MBytes   748 Mbits/sec
[  4]   2.00-3.00   sec  89.0 MBytes   747 Mbits/sec
[  4]   3.00-4.00   sec  88.9 MBytes   746 Mbits/sec
[  4]   4.00-5.00   sec  89.2 MBytes   748 Mbits/sec
[  4]   5.00-6.00   sec  89.0 MBytes   747 Mbits/sec
[  4]   6.00-7.00   sec  88.5 MBytes   742 Mbits/sec
[  4]   7.00-8.00   sec  88.5 MBytes   742 Mbits/sec
[  4]   8.00-9.00   sec  88.5 MBytes   742 Mbits/sec
[  4]   9.00-10.00  sec  88.2 MBytes   740 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   889 MBytes   745 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   887 MBytes   744 Mbits/sec                  receiver

iperf Done.
root@odroidc1:~#
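
To compare the two directions without reading the tables by eye, the summary lines of the two runs above can be parsed with a few lines of Python. This is a rough sketch: the regular expression is an assumption based only on the text layout shown here, and iperf3's JSON output (-J) would be a more robust source for anything automated.

```python
import re

# Matches the final "sender"/"receiver" summary lines of iperf3 text output.
SUMMARY = re.compile(r"([\d.]+)\s+Mbits/sec.*\b(sender|receiver)\s*$")


def summary_mbits(text):
    """Return {'sender': Mbit/s, 'receiver': Mbit/s} from iperf3 text output."""
    result = {}
    for line in text.splitlines():
        m = SUMMARY.search(line)
        if m:
            result[m.group(2)] = float(m.group(1))
    return result


# Summary lines copied from the two runs above.
forward = """\
[  4]   0.00-10.00  sec   486 MBytes   408 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   485 MBytes   407 Mbits/sec                  receiver
"""
reverse = """\
[  4]   0.00-10.00  sec   889 MBytes   745 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   887 MBytes   744 Mbits/sec                  receiver
"""

tx = summary_mbits(forward)["receiver"]  # board -> peer (normal mode)
rx = summary_mbits(reverse)["receiver"]  # peer -> board (-R, reverse mode)
print(f"board TX {tx:.0f} Mbit/s, board RX {rx:.0f} Mbit/s, RX/TX {rx / tx:.2f}")
```

The receiver figure is used in both directions because it reflects what actually made it across the link.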

[...]
> Furthermore, as Martin reported in one of the previous mail,
> even Amlogic's buildroot kernel uses an edge rising IRQ type
> for the Meson8b MAC. Other evidence that is not so clear
> the need for the first patch on 32 bit Meson SoC.
please note that the dwc2 USB controllers are also using
IRQ_TYPE_EDGE_RISING in Amlogic's 3.10 kernel.
mainline on the other hand uses IRQ_TYPE_LEVEL_HIGH after your commit
291f45dd6da5fa "ARM: dts: meson: fixing USB support on Meson6, Meson8
and Meson8b"
what I want to say is: in some cases we need to use different settings
than the 3.10 kernel!


Regards
Martin