[net-next,v1,1/6] lan743x: boost performance on cpu archs w/o dma cache snooping

Message ID 20210129195240.31871-2-TheSven73@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series: lan743x speed boost

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count success
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit fail Errors and warnings before: 0 this patch: 1
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns
netdev/build_allmodconfig_warn fail Errors and warnings before: 0 this patch: 1
netdev/header_inline success
netdev/stable success Stable not CCed

Commit Message

Sven Van Asbroeck Jan. 29, 2021, 7:52 p.m. UTC
From: Sven Van Asbroeck <thesven73@gmail.com>

The buffers in the lan743x driver's receive ring are always 9K,
even when the largest packet that can be received (the mtu) is
much smaller. This performs particularly badly on cpu archs
without dma cache snooping (such as ARM): each received packet
results in a 9K dma_{map|unmap} operation, which is very expensive
because cpu caches need to be invalidated.

Careful measurement of the driver rx path on armv7 reveals that
the cpu spends the majority of its time waiting for cache
invalidation.

Optimize as follows:

1. Set the rx ring buffer size equal to the mtu. This limits the
   amount of cache that needs to be invalidated per dma_map().

2. When dma_unmap()ping, skip the cpu sync. Sync only the packet data
   actually received, the size of which the chip indicates in its
   rx ring descriptors. This limits the amount of cache that needs
   to be invalidated per dma_unmap().
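
In generic form, the pattern looks like this (a minimal sketch using
the kernel DMA API; names mirror the driver's, and error handling and
descriptor bookkeeping are omitted):

	/* allocate and map an rx buffer sized to the mtu, not to 9K */
	length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
	skb = __netdev_alloc_skb(netdev, length, GFP_ATOMIC | GFP_DMA);
	dma_ptr = dma_map_single(dev, skb->data, length, DMA_FROM_DEVICE);

	/* on completion: invalidate cache only for the bytes the chip
	 * actually wrote, then unmap without a second full sync
	 */
	dma_sync_single_for_cpu(dev, dma_ptr, packet_length, DMA_FROM_DEVICE);
	dma_unmap_single_attrs(dev, dma_ptr, length, DMA_FROM_DEVICE,
			       DMA_ATTR_SKIP_CPU_SYNC);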

These optimizations double the rx performance on armv7.
Third parties report 3x rx speedup on armv8.

Performance on dma cache snooping architectures (such as x86)
is expected to stay the same.

Tested with iperf3 on a freescale imx6qp + lan7430, both sides
set to mtu 1500 bytes, measure rx performance:

Before:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-20.00  sec   550 MBytes   231 Mbits/sec    0
After:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-20.00  sec  1.33 GBytes   570 Mbits/sec    0

Tested by Anders Roenningen (anders@ronningen.priv.no) on armv8,
rx iperf3:
Before 102 Mbits/sec
After  279 Mbits/sec
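
A typical way to reproduce such a measurement, assuming the board under
test runs the iperf3 server and the link partner transmits towards it
for 20 seconds:

	# on the lan743x board
	iperf3 -s
	# on the link partner
	iperf3 -c <board-ip> -t 20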

Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
---

Tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git # 46eb3c108fe1

To: Bryan Whitehead <bryan.whitehead@microchip.com>
To: UNGLinuxDriver@microchip.com
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Alexey Denisov <rtgbnm@gmail.com>
Cc: Sergej Bauer <sbauer@blackbox.su>
Cc: Tim Harvey <tharvey@gateworks.com>
Cc: Anders Rønningen <anders@ronningen.priv.no>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org (open list)

 drivers/net/ethernet/microchip/lan743x_main.c | 35 ++++++++++++-------
 1 file changed, 23 insertions(+), 12 deletions(-)

Comments

Andrew Lunn Jan. 29, 2021, 8:36 p.m. UTC | #1
> diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
> index f1f6eba4ace4..f485320e5784 100644
> --- a/drivers/net/ethernet/microchip/lan743x_main.c
> +++ b/drivers/net/ethernet/microchip/lan743x_main.c
> @@ -1957,11 +1957,11 @@ static int lan743x_rx_next_index(struct lan743x_rx *rx, int index)
>  
>  static struct sk_buff *lan743x_rx_allocate_skb(struct lan743x_rx *rx)
>  {
> -	int length = 0;
> +	struct net_device *netdev = rx->adapter->netdev;
>  
> -	length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
> -	return __netdev_alloc_skb(rx->adapter->netdev,
> -				  length, GFP_ATOMIC | GFP_DMA);
> +	return __netdev_alloc_skb(netdev,
> +				  netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING,
> +				  GFP_ATOMIC | GFP_DMA);
>  }
>  
>  static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
> @@ -1977,9 +1977,10 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
>  {
>  	struct lan743x_rx_buffer_info *buffer_info;
>  	struct lan743x_rx_descriptor *descriptor;
> -	int length = 0;
> +	struct net_device *netdev = rx->adapter->netdev;
> +	int length;

Please keep to reverse christmas tree.
>  
> -	length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
> +	length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
>  	descriptor = &rx->ring_cpu_ptr[index];
>  	buffer_info = &rx->buffer_info[index];
>  	buffer_info->skb = skb;
> @@ -2148,11 +2149,18 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx)
>  			descriptor = &rx->ring_cpu_ptr[first_index];
>  
>  			/* unmap from dma */
> +			packet_length =	RX_DESC_DATA0_FRAME_LENGTH_GET_
> +					(descriptor->data0);
>  			if (buffer_info->dma_ptr) {
> -				dma_unmap_single(&rx->adapter->pdev->dev,
> -						 buffer_info->dma_ptr,
> -						 buffer_info->buffer_length,
> -						 DMA_FROM_DEVICE);
> +				dma_sync_single_for_cpu(&rx->adapter->pdev->dev,
> +							buffer_info->dma_ptr,
> +							packet_length,
> +							DMA_FROM_DEVICE);
> +				dma_unmap_single_attrs(&rx->adapter->pdev->dev,
> +						       buffer_info->dma_ptr,
> +						       buffer_info->buffer_length,
> +						       DMA_FROM_DEVICE,
> +						       DMA_ATTR_SKIP_CPU_SYNC);

So this patch appears to contain two different changes
1) You only allocate a receive buffer as big as the MTU plus overheads
2) You change the cache operations to operate on the received length.

The first change should be completely safe and, I guess, is giving
most of the benefits. The second one is where interesting things might
happen. So please split this patch into two. If it does break, we can
git bisect, and will probably end up on the second patch.

Thanks
	Andrew
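
For reference, "reverse christmas tree" is the netdev convention of
ordering local variable declarations from longest line to shortest.
Applied to the hunk Andrew quotes above, the declaration block would
presumably become:

	struct net_device *netdev = rx->adapter->netdev;
	struct lan743x_rx_buffer_info *buffer_info;
	struct lan743x_rx_descriptor *descriptor;
	int length;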
Jakub Kicinski Jan. 29, 2021, 10:01 p.m. UTC | #2
On Fri, 29 Jan 2021 14:52:35 -0500 Sven Van Asbroeck wrote:
> From: Sven Van Asbroeck <thesven73@gmail.com>
> 
> [ full commit message snipped ]

You may need to rebase to see this:

drivers/net/ethernet/microchip/lan743x_main.c:2123:41: warning: restricted __le32 degrades to integer
Sven Van Asbroeck Jan. 29, 2021, 10:46 p.m. UTC | #3
On Fri, Jan 29, 2021 at 5:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> You may need to rebase to see this:
>
> drivers/net/ethernet/microchip/lan743x_main.c:2123:41: warning: restricted __le32 degrades to integer

Good catch. The problem goes away with the next commit in the set.
This is probably because I rebased to the little endian/big endian patch at
the last minute. I'll fix it up in v2.
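
For reference, the warning above comes from sparse; it can be reproduced
on just this driver with something like:

	make C=2 M=drivers/net/ethernet/microchip

(C=1 checks only the files being re-compiled; C=2 forces a check of all
source files in the given directory.)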
Sven Van Asbroeck Jan. 29, 2021, 10:49 p.m. UTC | #4
Hi Andrew, thank you so much for looking at this patch !

On Fri, Jan 29, 2021 at 3:36 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> So this patch appears to contain two different changes
> 1) You only allocate a receive buffer as big as the MTU plus overheads
> 2) You change the cache operations to operate on the received length.
>
> The first change should be completely safe, and i guess, is giving
> most of the benefits. The second one is where interesting things might
> happen. So please split this patch into two.  If it does break, we can
> git bisect, and probably end up on the second patch.
>

Yes, I tested this extensively on armv7, but you're right, it might behave
differently on other platforms. I will split into two, as you suggested.
Bryan.Whitehead@microchip.com Jan. 30, 2021, 10:10 p.m. UTC | #5
Sven, see comments below.

> @@ -2148,11 +2149,18 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx)
>                         descriptor = &rx->ring_cpu_ptr[first_index];
> 
>                         /* unmap from dma */
> +                       packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
> +                                       (descriptor->data0);
It appears you moved this packet_length assignment from just below the following if block; however, you left out the le32_to_cpu. See next comment.

>                         if (buffer_info->dma_ptr) {
> -                               dma_unmap_single(&rx->adapter->pdev->dev,
> -                                                buffer_info->dma_ptr,
> -                                                buffer_info->buffer_length,
> -                                                DMA_FROM_DEVICE);
> +                               dma_sync_single_for_cpu(&rx->adapter->pdev->dev,
> +                                                       buffer_info->dma_ptr,
> +                                                       packet_length,
> +                                                       DMA_FROM_DEVICE);
> +                               dma_unmap_single_attrs(&rx->adapter->pdev->dev,
> +                                                      buffer_info->dma_ptr,
> +                                                      buffer_info->buffer_length,
> +                                                      DMA_FROM_DEVICE,
> +                                                      DMA_ATTR_SKIP_CPU_SYNC);
>                                 buffer_info->dma_ptr = 0;
>                                 buffer_info->buffer_length = 0;
>                         }
Just below here is the following line
		packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
				(le32_to_cpu(descriptor->data0));
This line was moved above the previous if block, but the le32_to_cpu was removed. Is that intentional?
Also, I don't see this packet_length assignment (after the if block) being removed.
Since packet_length already contains this value, it seems unnecessary to keep this assignment.

> @@ -2167,8 +2175,8 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx)
>                         int index = first_index;
> 
>                         /* multi buffer packet not supported */
> -                       /* this should not happen since
> -                        * buffers are allocated to be at least jumbo size
> +                       /* this should not happen since buffers are allocated
> +                        * to be at least the mtu size configured in the mac.
>                          */
> 
>                         /* clean up buffers */
> @@ -2628,6 +2636,9 @@ static int lan743x_netdev_change_mtu(struct net_device *netdev, int new_mtu)
>         struct lan743x_adapter *adapter = netdev_priv(netdev);
>         int ret = 0;
> 
> +       if (netif_running(netdev))
> +               return -EBUSY;
> +
>         ret = lan743x_mac_set_mtu(adapter, new_mtu);
>         if (!ret)
>                 netdev->mtu = new_mtu;
> --
> 2.17.1
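
Note the netif_running() check in the last hunk above: with this patch
the mtu can no longer be changed while the interface is up. A user would
have to bring the link down first, e.g. (interface name illustrative):

	ip link set eth0 down
	ip link set eth0 mtu 1500
	ip link set eth0 up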
Sven Van Asbroeck Jan. 30, 2021, 11:59 p.m. UTC | #6
Hi Bryan, thank you so much for reviewing, I really appreciate it.

On Sat, Jan 30, 2021 at 5:11 PM <Bryan.Whitehead@microchip.com> wrote:
>
> >                         /* unmap from dma */
> > +                       packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
> > +                                       (descriptor->data0);
> It appears you moved this packet_length assignment from just below the following if block; however, you left out the le32_to_cpu. See next comment.
>

Correct on both counts. This is a merge snafu that crept in when I
rebased to Alexey's very recent little/big endian patch, at the last
minute.
My tests didn't catch it, because I'm running on a little-endian cpu,
where le32_to_cpu() compiles to a nop.

Had I had the good sense to run sparse on every patch, like Jakub has,
I would have caught it...
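
For completeness, the corrected assignment in v2 would presumably
restore the conversion at the new location:

	packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
			(le32_to_cpu(descriptor->data0));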
Sven Van Asbroeck Jan. 31, 2021, 12:14 a.m. UTC | #7
On Sat, Jan 30, 2021 at 5:11 PM <Bryan.Whitehead@microchip.com> wrote:
>
> It appears you moved this packet_length assignment from just below the following if block; however, you left out the le32_to_cpu. See next comment.

PS this merge snafu is removed completely by the next patch in the set.
So this will not prevent you from reviewing/testing the multi-buffer support,
should you want to.
Christoph Hellwig Feb. 5, 2021, 9:31 a.m. UTC | #8
This is a pattern we've seen in a few other net drivers, so we should
be ok.  It just is rather hairy and needs a good justification, which
seems to be given here.
Sergej Bauer Feb. 5, 2021, 12:44 p.m. UTC | #9
Hi Sven
I can confirm a great stability improvement after your patch
"lan743x: boost performance on cpu archs w/o dma cache snooping".
Please note that test_ber opens raw sockets as
s = socket(AF_PACKET, SOCK_RAW, ETH_P_ALL)
and the resulting 'average speed' is the average egress speed.

Test machine is Intel Pentium G4560 3.50GHz
lan743x with the rejected virtual phy patch 'inside'

What I had before:
$ ifmtu eth7 500
$ test_ber -l eth7 -c 1000 -n 1000000 -f500 --no-conf
...
number of sent packets      = 1000000
number of received packets  = 289017
number of lost packets = 710983
number of out of order packets = 0
number of bit errors   = 0
total errors detected  = 710983
bit error rate         = 0.710983
average speed: 429.3799 Mbit/s

$ ifmtu eth7 1500
$ sudo test_ber -l eth7 -c 1000 -n 1000000 -f1500 --no-conf
...
number of sent packets      = 1000000
number of received packets  = 577194
number of lost packets = 422806
number of out of order packets = 0
number of bit errors   = 0
total errors detected  = 422806
bit error rate         = 0.422806
average speed: 644.6557 Mbit/s
---

and what I had after your patch:
$ ifmtu eth7 500
$ test_ber -l eth7 -c 1000 -n 1000000 -f500 --no-conf
...
number of sent packets      = 1000000
number of received packets  = 711329
number of lost packets = 288671
number of out of order packets = 0
number of bit errors   = 0
total errors detected  = 288671
bit error rate         = 0.288671
average speed: 429.2263 Mbit/s

$ ifmtu eth7 1500
$ test_ber -l eth7 -c 1000 -n 1000000 -f1500 --no-conf
...
number of sent packets      = 1000000
number of received packets  = 1000000
number of lost packets = 0
number of out of order packets = 0
number of bit errors   = 0
total errors detected  = 0
bit error rate         = 0
average speed: 644.5405 Mbit/s
Sven Van Asbroeck Feb. 5, 2021, 2:01 p.m. UTC | #10
Hi Christoph,

On Fri, Feb 5, 2021 at 4:31 AM Christoph Hellwig <hch@lst.de> wrote:
>
> This is a pattern we've seen in a few other net driver, so we should
> be ok.  It just is rather hairy and needs a good justification, which
> seems to be given here.

Thank you so much for taking the time to look into this.
That is certainly good news !!
Sven Van Asbroeck Feb. 5, 2021, 2:07 p.m. UTC | #11
Hi Sergej,

On Fri, Feb 5, 2021 at 7:44 AM Sergej Bauer <sbauer@blackbox.su> wrote:
>
> Hi Sven
> I can confirm great stability improvement after your patch
> "lan743x: boost performance on cpu archs w/o dma cache snooping".
>
> Test machine is Intel Pentium G4560 3.50GHz
> lan743x with rejected virtual phy 'inside'

Interesting, so the speed boost patch seems to improve things even on Intel...

Would you be able to apply and test the multi-buffer patch as well?
To do that, you can simply apply patches [2/6] and [3/6] on top of
what you already have.

Keep in mind that Bryan has identified an issue with the above
patch, which will get fixed in v2. So YMMV.
Sergej Bauer Feb. 5, 2021, 3:09 p.m. UTC | #12
On Friday, February 5, 2021 5:07:22 PM MSK you wrote:
> Hi Sergej,
> 
> On Fri, Feb 5, 2021 at 7:44 AM Sergej Bauer <sbauer@blackbox.su> wrote:
> > Hi Sven
> > I can confirm great stability improvement after your patch
> > "lan743x: boost performance on cpu archs w/o dma cache snooping".
> > 
> > Test machine is Intel Pentium G4560 3.50GHz
> > lan743x with rejected virtual phy 'inside'
> 
> Interesting, so the speed boost patch seems to improve things even on
> Intel...
> 
> Would you be able to apply and test the multi-buffer patch as well?
> To do that, you can simply apply patches [2/6] and [3/6] on top of
> what you already have.
> 

Hi Sven
Tests after applying patches [2/6] and [3/6] are:
$ ifmtu eth7 500
$ sudo test_ber -l eth7 -c 1000 -n 1000000 -f500 --no-conf
...
number of sent packets      = 1000000
number of received packets  = 713288
number of lost packets = 286712
number of out of order packets = 0
number of bit errors   = 0
total errors detected  = 286712
bit error rate         = 0.286712
average speed: 427.8043 Mbit/s

$ ifmtu eth7 1500
$ sudo test_ber -l eth7 -c 1000 -n 1000000 -f500 --no-conf
...
number of sent packets      = 1000000
number of received packets  = 707869
number of lost packets = 292131
number of out of order packets = 0
number of bit errors   = 0
total errors detected  = 292131
bit error rate         = 0.292131
average speed: 431.0163 Mbit/s

$ sudo test_ber -l eth7 -c 1000 -n 1000000 -f1500 --no-conf
...
number of sent packets      = 1000000
number of received packets  = 1000000
number of lost packets = 0
number of out of order packets = 0
number of bit errors   = 0
total errors detected  = 0
bit error rate         = 0
average speed: 646.4932 Mbit/s

> Keep in mind that Bryan has identified an issue with the above
> patch, which will get fixed in v2. So YMMV.
I'll perform tests with v2 as well.
Sven Van Asbroeck Feb. 5, 2021, 4:39 p.m. UTC | #13
Hi Sergej,

On Fri, Feb 5, 2021 at 10:09 AM Sergej Bauer <sbauer@blackbox.su> wrote:
>
> Tests after applying patches [2/6] and [3/6] are:
> $ ifmtu eth7 500
> $ sudo test_ber -l eth7 -c 1000 -n 1000000 -f500 --no-conf

Thank you! Is there a way for me to run test_ber myself?
Is this a standard, or a bespoke testing tool?
Sergej Bauer Feb. 5, 2021, 4:59 p.m. UTC | #14
On Friday, February 5, 2021 7:39:40 PM MSK Sven Van Asbroeck wrote:
> Hi Sergej,
> 
> On Fri, Feb 5, 2021 at 10:09 AM Sergej Bauer <sbauer@blackbox.su> wrote:
> > Tests after applying patches [2/6] and [3/6] are:
> > $ ifmtu eth7 500
> > $ sudo test_ber -l eth7 -c 1000 -n 1000000 -f500 --no-conf
> 
> Thank you! Is there a way for me to run test_ber myself?
> Is this a standard, or a bespoke testing tool?
It's kind of bespoke... part of a framework to assist HW guys in developing
a PHY device. But the project is finished, so I could ask for permission to
send you the source.

Patch

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index f1f6eba4ace4..f485320e5784 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1957,11 +1957,11 @@  static int lan743x_rx_next_index(struct lan743x_rx *rx, int index)
 
 static struct sk_buff *lan743x_rx_allocate_skb(struct lan743x_rx *rx)
 {
-	int length = 0;
+	struct net_device *netdev = rx->adapter->netdev;
 
-	length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
-	return __netdev_alloc_skb(rx->adapter->netdev,
-				  length, GFP_ATOMIC | GFP_DMA);
+	return __netdev_alloc_skb(netdev,
+				  netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING,
+				  GFP_ATOMIC | GFP_DMA);
 }
 
 static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
@@ -1977,9 +1977,10 @@  static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
 {
 	struct lan743x_rx_buffer_info *buffer_info;
 	struct lan743x_rx_descriptor *descriptor;
-	int length = 0;
+	struct net_device *netdev = rx->adapter->netdev;
+	int length;
 
-	length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
+	length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
 	descriptor = &rx->ring_cpu_ptr[index];
 	buffer_info = &rx->buffer_info[index];
 	buffer_info->skb = skb;
@@ -2148,11 +2149,18 @@  static int lan743x_rx_process_packet(struct lan743x_rx *rx)
 			descriptor = &rx->ring_cpu_ptr[first_index];
 
 			/* unmap from dma */
+			packet_length =	RX_DESC_DATA0_FRAME_LENGTH_GET_
+					(descriptor->data0);
 			if (buffer_info->dma_ptr) {
-				dma_unmap_single(&rx->adapter->pdev->dev,
-						 buffer_info->dma_ptr,
-						 buffer_info->buffer_length,
-						 DMA_FROM_DEVICE);
+				dma_sync_single_for_cpu(&rx->adapter->pdev->dev,
+							buffer_info->dma_ptr,
+							packet_length,
+							DMA_FROM_DEVICE);
+				dma_unmap_single_attrs(&rx->adapter->pdev->dev,
+						       buffer_info->dma_ptr,
+						       buffer_info->buffer_length,
+						       DMA_FROM_DEVICE,
+						       DMA_ATTR_SKIP_CPU_SYNC);
 				buffer_info->dma_ptr = 0;
 				buffer_info->buffer_length = 0;
 			}
@@ -2167,8 +2175,8 @@  static int lan743x_rx_process_packet(struct lan743x_rx *rx)
 			int index = first_index;
 
 			/* multi buffer packet not supported */
-			/* this should not happen since
-			 * buffers are allocated to be at least jumbo size
+			/* this should not happen since buffers are allocated
+			 * to be at least the mtu size configured in the mac.
 			 */
 
 			/* clean up buffers */
@@ -2628,6 +2636,9 @@  static int lan743x_netdev_change_mtu(struct net_device *netdev, int new_mtu)
 	struct lan743x_adapter *adapter = netdev_priv(netdev);
 	int ret = 0;
 
+	if (netif_running(netdev))
+		return -EBUSY;
+
 	ret = lan743x_mac_set_mtu(adapter, new_mtu);
 	if (!ret)
 		netdev->mtu = new_mtu;