
[3/3] can: c_can: cache frames to operate as a true FIFO

Message ID 20210509124309.30024-4-dariobin@libero.it (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Series: can: c_can: cache frames to operate as a true FIFO

Checks

Context                 Check     Description
netdev/tree_selection   success   Series ignored based on subject

Commit Message

Dario Binacchi May 9, 2021, 12:43 p.m. UTC
As reported by a comment in c_can_start_xmit(), this was not a FIFO: the
C/D_CAN controller sends out the buffers in priority order, so the lowest
buffer number wins.

What did c_can_start_xmit() do if it found tx_active = 0x80000000? It
waited until the only frame in the FIFO was actually transmitted by the
controller. Even though there was only one message in the FIFO, we had to
wait for the FIFO to empty completely to ensure that the messages were
transmitted in the order in which they were loaded.

By storing the frames in the FIFO without requesting their transmission,
we will be able to use the full size of the FIFO even in cases such as
the one described above. The transmission interrupt will trigger their
transmission only when all the messages previously loaded into
lower-priority (higher-numbered) buffer positions have been transmitted.

Suggested-by: Gianluca Falavigna <gianluca.falavigna@inwind.it>
Signed-off-by: Dario Binacchi <dariobin@libero.it>


---

 drivers/net/can/c_can/c_can.h      |  3 ++
 drivers/net/can/c_can/c_can_main.c | 63 ++++++++++++++++++++++++------
 2 files changed, 55 insertions(+), 11 deletions(-)
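
The mechanism described in the commit message can be modelled with a few
lines of standalone C (illustration only, not driver code: the helper names
are made up, locking is ignored, and 32 TX objects are assumed to match the
tx_active value quoted above):

#include <stdio.h>

#define TX_NUM 32

static unsigned int tx_active;	/* objects with TXRQST already set */
static unsigned int tx_cached;	/* objects written, TXRQST not set yet */

static int fls32(unsigned int x)	/* like the kernel's fls() */
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* xmit path: returns the message object index used for the frame */
static int model_xmit(void)
{
	int idx = fls32(tx_active);

	if (idx > TX_NUM - 1) {
		/* requesting transmission now would violate the load
		 * order: store the frame in a low-numbered object
		 * without setting TXRQST (IF_COMM_TX_FRAME)
		 */
		idx = fls32(tx_cached);
		tx_cached |= 1U << idx;
	} else {
		tx_active |= 1U << idx;	/* normal path, TXRQST set */
	}
	return idx;
}

/* TX-complete path: once nothing is in flight any more, request
 * transmission of all cached objects, preserving the load order
 */
static void model_tx_complete(unsigned int done_mask)
{
	tx_active &= ~done_mask;
	if (!tx_active && tx_cached) {
		tx_active = tx_cached;
		tx_cached = 0;
	}
}

int main(void)
{
	tx_active = 0x80000000;	/* the case from the commit message */

	printf("frame cached in object %d\n", model_xmit());
	model_tx_complete(0x80000000);
	printf("tx_active after the kick: 0x%08x\n", tx_active);
	return 0;
}

With the tx_active value above, the frame ends up cached in object 0 and is
only requested for transmission once the in-flight frame has completed, so
the load order is preserved.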

Comments

Marc Kleine-Budde May 10, 2021, 12:25 p.m. UTC | #1
On 09.05.2021 14:43:09, Dario Binacchi wrote:
> As reported by a comment in c_can_start_xmit(), this was not a FIFO: the
> C/D_CAN controller sends out the buffers in priority order, so the lowest
> buffer number wins.
> 
> What did c_can_start_xmit() do if it found tx_active = 0x80000000? It
> waited until the only frame in the FIFO was actually transmitted by the
> controller. Even though there was only one message in the FIFO, we had to
> wait for the FIFO to empty completely to ensure that the messages were
> transmitted in the order in which they were loaded.
> 
> By storing the frames in the FIFO without requesting their transmission,
> we will be able to use the full size of the FIFO even in cases such as
> the one described above. The transmission interrupt will trigger their
> transmission only when all the messages previously loaded into
> lower-priority (higher-numbered) buffer positions have been transmitted.

The algorithm you implemented looks a bit too complicated to me. Let me
sketch the algorithm that's implemented by several other drivers.

- have a power of two number of TX objects
- add a number of objects to struct priv (tx_num)
  (or make it a define, if the number of tx objects is compile time fixed)
- add two "unsigned int" variables to your struct priv,
  one "tx_head", one "tx_tail"
- the hard_start_xmit() writes to priv->tx_head & (priv->tx_num - 1)
- increment tx_head
- stop the tx_queue if there is no space or if the object with the
  lowest prio has been written
- in TX complete IRQ, handle priv->tx_tail object
- increment tx_tail
- wake queue if there is space but don't wake if we wait for the lowest
  prio object to be TX completed.

Special care needs to be taken to implement that lock-less and race-free.
I suggest to look at the mcp251xfd driver.
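
A minimal sketch of the head/tail bookkeeping outlined above might look
like this (the struct and helper names are hypothetical, and the lock-less,
race-free part is deliberately left out; see the mcp251xfd driver for the
real implementation):

#include <stdbool.h>

struct tx_ring {
	unsigned int head;	/* advanced only by the xmit path */
	unsigned int tail;	/* advanced only by the TX-complete IRQ */
	unsigned int num;	/* number of TX objects, power of two */
};

static unsigned int tx_ring_free(const struct tx_ring *r)
{
	return r->num - (r->head - r->tail);
}

/* Is the lowest-priority object (index num - 1) still in flight? */
static bool tx_ring_last_obj_pending(const struct tx_ring *r)
{
	unsigned int mask = r->num - 1;

	return (mask - (r->tail & mask)) < (r->head - r->tail);
}

/* xmit side: pick the object for the next frame; stop the queue if the
 * ring is full or the lowest-priority object has just been written.
 */
static unsigned int tx_ring_get(struct tx_ring *r, bool *stop_queue)
{
	unsigned int obj = r->head++ & (r->num - 1);

	*stop_queue = !tx_ring_free(r) || obj == r->num - 1;
	return obj;
}

/* TX-complete side: return the finished object; wake the queue only if
 * there is room and we are not waiting for the lowest-priority object.
 */
static unsigned int tx_ring_put(struct tx_ring *r, bool *wake_queue)
{
	unsigned int obj = r->tail++ & (r->num - 1);

	*wake_queue = tx_ring_free(r) && !tx_ring_last_obj_pending(r);
	return obj;
}

Masking the free-running counters gives the object index without any modulo
operation, which is why the number of TX objects needs to be a power of two.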

Marc
Marc Kleine-Budde May 10, 2021, 12:36 p.m. UTC | #2
On 10.05.2021 14:25:15, Marc Kleine-Budde wrote:
> On 09.05.2021 14:43:09, Dario Binacchi wrote:
> > As reported by a comment in c_can_start_xmit(), this was not a FIFO:
> > the C/D_CAN controller sends out the buffers in priority order, so the
> > lowest buffer number wins.
> > 
> > What did c_can_start_xmit() do if it found tx_active = 0x80000000? It
> > waited until the only frame in the FIFO was actually transmitted by the
> > controller. Even though there was only one message in the FIFO, we had
> > to wait for the FIFO to empty completely to ensure that the messages
> > were transmitted in the order in which they were loaded.
> > 
> > By storing the frames in the FIFO without requesting their
> > transmission, we will be able to use the full size of the FIFO even in
> > cases such as the one described above. The transmission interrupt will
> > trigger their transmission only when all the messages previously loaded
> > into lower-priority (higher-numbered) buffer positions have been
> > transmitted.
> 
> The algorithm you implemented looks a bit too complicated to me. Let me
> sketch the algorithm that's implemented by several other drivers.
> 
> - have a power of two number of TX objects
> - add a number of objects to struct priv (tx_num)
>   (or make it a define, if the number of tx objects is compile time fixed)
> - add two "unsigned int" variables to your struct priv,
>   one "tx_head", one "tx_tail"
> - the hard_start_xmit() writes to priv->tx_head & (priv->tx_num - 1)
> - increment tx_head
> - stop the tx_queue if there is no space or if the object with the
>   lowest prio has been written
> - in TX complete IRQ, handle priv->tx_tail object
> - increment tx_tail
> - wake queue if there is space but don't wake if we wait for the lowest
>   prio object to be TX completed.
> 
> Special care needs to be taken to implement that lock-less and race-free.
> I suggest to look at the mcp251xfd driver.

After converting the driver to the implementation outlined above, it
should be more straightforward to add the caching you implemented.

regards,
Marc
Dario Binacchi May 13, 2021, 11:23 a.m. UTC | #3
Hi Marc,

> On 10/05/2021 14:36, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> 
>  
> On 10.05.2021 14:25:15, Marc Kleine-Budde wrote:
> > On 09.05.2021 14:43:09, Dario Binacchi wrote:
> > > As reported by a comment in c_can_start_xmit(), this was not a FIFO:
> > > the C/D_CAN controller sends out the buffers in priority order, so
> > > the lowest buffer number wins.
> > > 
> > > What did c_can_start_xmit() do if it found tx_active = 0x80000000? It
> > > waited until the only frame in the FIFO was actually transmitted by
> > > the controller. Even though there was only one message in the FIFO,
> > > we had to wait for the FIFO to empty completely to ensure that the
> > > messages were transmitted in the order in which they were loaded.
> > > 
> > > By storing the frames in the FIFO without requesting their
> > > transmission, we will be able to use the full size of the FIFO even
> > > in cases such as the one described above. The transmission interrupt
> > > will trigger their transmission only when all the messages previously
> > > loaded into lower-priority (higher-numbered) buffer positions have
> > > been transmitted.
> > 
> > The algorithm you implemented looks a bit too complicated to me. Let me
> > sketch the algorithm that's implemented by several other drivers.
> > 
> > - have a power of two number of TX objects
> > - add a number of objects to struct priv (tx_num)
> >   (or make it a define, if the number of tx objects is compile time fixed)
> > - add two "unsigned int" variables to your struct priv,
> >   one "tx_head", one "tx_tail"
> > - the hard_start_xmit() writes to priv->tx_head & (priv->tx_num - 1)
> > - increment tx_head
> > - stop the tx_queue if there is no space or if the object with the
> >   lowest prio has been written
> > - in TX complete IRQ, handle priv->tx_tail object
> > - increment tx_tail
> > - wake queue if there is space but don't wake if we wait for the lowest
> >   prio object to be TX completed.
> > 
> > Special care needs to be taken to implement that lock-less and
> > race-free. I suggest to look at the mcp251xfd driver.
> 
> After converting the driver to the implementation outlined above, it
> should be more straightforward to add the caching you implemented.
> 

I took some time to think about your suggestions.
The submitted patch was developed to improve CAN transmission while
keeping the current driver design, in order to minimize the risk of
introducing bugs.
If I'm not missing something, you are suggesting that I change the
driver design as a precondition for applying an updated version of my
patch. IMHO this would increase the possibility of introducing bugs,
even in parts of the code that are considered stable.
If the algorithm I have implemented is a bit too complicated, let's try
to simplify it, starting from the submitted patch.

Waiting for your reply, thanks and regards
Dario

> regards,
> Marc
> 
> -- 
> Pengutronix e.K.                 | Marc Kleine-Budde           |
> Embedded Linux                   | https://www.pengutronix.de  |
> Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

Patch

diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
index 4247ff80a29c..6abde6cbc0b1 100644
--- a/drivers/net/can/c_can/c_can.h
+++ b/drivers/net/can/c_can/c_can.h
@@ -191,6 +191,9 @@  struct c_can_priv {
 	unsigned int msg_obj_tx_last;
 	u32 msg_obj_rx_mask;
 	atomic_t tx_active;
+	atomic_t tx_cached;
+	spinlock_t tx_cached_lock;
+	atomic_t tx_avail;
 	atomic_t sie_pending;
 	unsigned long tx_dir;
 	int last_status;
diff --git a/drivers/net/can/c_can/c_can_main.c b/drivers/net/can/c_can/c_can_main.c
index 7588f70ca0fe..d2f44c07d47f 100644
--- a/drivers/net/can/c_can/c_can_main.c
+++ b/drivers/net/can/c_can/c_can_main.c
@@ -124,6 +124,9 @@ 
 				 IF_COMM_TXRQST |		 \
 				 IF_COMM_DATAA | IF_COMM_DATAB)
 
+#define IF_COMM_TX_FRAME	(IF_COMM_ARB | IF_COMM_CONTROL | \
+				 IF_COMM_DATAA | IF_COMM_DATAB)
+
 /* For the low buffers we clear the interrupt bit, but keep newdat */
 #define IF_COMM_RCV_LOW		(IF_COMM_MASK | IF_COMM_ARB | \
 				 IF_COMM_CONTROL | IF_COMM_CLR_INT_PND | \
@@ -432,19 +435,36 @@  static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 {
 	struct can_frame *frame = (struct can_frame *)skb->data;
 	struct c_can_priv *priv = netdev_priv(dev);
-	u32 idx, obj;
+	u32 idx, obj, tx_active, tx_cached;
 
 	if (can_dropped_invalid_skb(dev, skb))
 		return NETDEV_TX_OK;
-	/* This is not a FIFO. C/D_CAN sends out the buffers
-	 * prioritized. The lowest buffer number wins.
-	 */
-	idx = fls(atomic_read(&priv->tx_active));
-	obj = idx + priv->msg_obj_tx_first;
 
-	/* If this is the last buffer, stop the xmit queue */
-	if (idx == priv->msg_obj_tx_num - 1)
+	if (atomic_read(&priv->tx_avail) == 0)
 		netif_stop_queue(dev);
+
+	tx_active = atomic_read(&priv->tx_active);
+	tx_cached = atomic_read(&priv->tx_cached);
+	idx = fls(tx_active);
+	if (idx > priv->msg_obj_tx_num - 1) {
+		idx = fls(tx_cached);
+
+		obj = idx + priv->msg_obj_tx_first;
+		spin_lock_bh(&priv->tx_cached_lock);
+		/* prepare message object for transmission */
+		c_can_setup_tx_object(dev, IF_TX, frame, idx);
+		/* Store the message but don't ask for its transmission */
+		c_can_object_put(dev, IF_TX, obj, IF_COMM_TX_FRAME);
+		spin_unlock_bh(&priv->tx_cached_lock);
+		priv->dlc[idx] = frame->len;
+		can_put_echo_skb(skb, dev, idx, 0);
+		atomic_dec(&priv->tx_avail);
+		atomic_add(BIT(idx), &priv->tx_cached);
+		return NETDEV_TX_OK;
+	}
+
+	obj = idx + priv->msg_obj_tx_first;
+
 	/* Store the message in the interface so we can call
 	 * can_put_echo_skb(). We must do this before we enable
 	 * transmit as we might race against do_tx().
@@ -453,6 +473,7 @@  static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 	priv->dlc[idx] = frame->len;
 	can_put_echo_skb(skb, dev, idx, 0);
 
+	atomic_dec(&priv->tx_avail);
 	/* Update the active bits */
 	atomic_add(BIT(idx), &priv->tx_active);
 	/* Start transmission */
@@ -599,6 +620,8 @@  static int c_can_chip_config(struct net_device *dev)
 
 	/* Clear all internal status */
 	atomic_set(&priv->tx_active, 0);
+	atomic_set(&priv->tx_cached, 0);
+	atomic_set(&priv->tx_avail, priv->msg_obj_tx_num);
 	priv->tx_dir = 0;
 
 	/* set bittiming params */
@@ -723,14 +746,31 @@  static void c_can_do_tx(struct net_device *dev)
 	/* Clear the bits in the tx_active mask */
 	atomic_sub(clr, &priv->tx_active);
 
-	if (clr & BIT(priv->msg_obj_tx_num - 1))
-		netif_wake_queue(dev);
-
 	if (pkts) {
+		atomic_add(pkts, &priv->tx_avail);
+
+		if (netif_queue_stopped(dev))
+			netif_wake_queue(dev);
+
 		stats->tx_bytes += bytes;
 		stats->tx_packets += pkts;
 		can_led_event(dev, CAN_LED_EVENT_TX);
 	}
+
+	if (atomic_read(&priv->tx_active) == 0) {
+		pend = atomic_read(&priv->tx_cached);
+
+		clr = pend;
+		while ((idx = ffs(pend))) {
+			idx--;
+			pend &= ~(1 << idx);
+
+			obj = idx + priv->msg_obj_tx_first;
+			c_can_object_put(dev, IF_TX, obj, IF_COMM_TXRQST);
+		}
+		atomic_sub(clr, &priv->tx_cached);
+		atomic_add(clr, &priv->tx_active);
+	}
 }
 
 /* If we have a gap in the pending bits, that means we either
@@ -1193,6 +1233,7 @@  struct net_device *alloc_c_can_dev(int msg_obj_num)
 		return NULL;
 
 	priv = netdev_priv(dev);
+	spin_lock_init(&priv->tx_cached_lock);
 	priv->msg_obj_num = msg_obj_num;
 	priv->msg_obj_rx_num = msg_obj_num - msg_obj_tx_num;
 	priv->msg_obj_rx_first = 1;