diff mbox series

[v3,net] sfc: disable softirqs for ptp TX

Message ID 20220726064504.49613-1-alejandro.lucero-palau@amd.com (mailing list archive)
State Accepted
Commit 67c3b611d92fc238c43734878bc3e232ab570c79
Delegated to: Netdev Maintainers
Headers show
Series [v3,net] sfc: disable softirqs for ptp TX | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 29 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Lucero Palau, Alejandro July 26, 2022, 6:45 a.m. UTC
From: Alejandro Lucero <alejandro.lucero-palau@amd.com>

Sending a PTP packet can imply to use the normal TX driver datapath but
invoked from the driver's ptp worker. The kernel generic TX code
disables softirqs and preemption before calling specific driver TX code,
but the ptp worker does not. Although current ptp driver functionality
does not require it, there are several reasons for doing so:

   1) The invoked code is always executed with softirqs disabled for non
      PTP packets.
   2) Better if a ptp packet transmission is not interrupted by softirq
      handling which could lead to high latencies.
   3) netdev_xmit_more used by the TX code requires preemption to be
      disabled.

Indeed a solution for dealing with kernel preemption state based on static
kernel configuration is not possible since the introduction of dynamic
preemption level configuration at boot time using the static calls
functionality.

Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper")
Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
---
 drivers/net/ethernet/sfc/ptp.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Taehee Yoo July 27, 2022, 3:56 a.m. UTC | #1
Hi Alejandro Lucero,

On 7/26/22 15:45, alejandro.lucero-palau@amd.com wrote:
 > From: Alejandro Lucero <alejandro.lucero-palau@amd.com>
 >
 > Sending a PTP packet can imply to use the normal TX driver datapath but
 > invoked from the driver's ptp worker. The kernel generic TX code
 > disables softirqs and preemption before calling specific driver TX code,
 > but the ptp worker does not. Although current ptp driver functionality
 > does not require it, there are several reasons for doing so:
 >
 >     1) The invoked code is always executed with softirqs disabled for non
 >        PTP packets.
 >     2) Better if a ptp packet transmission is not interrupted by softirq
 >        handling which could lead to high latencies.
 >     3) netdev_xmit_more used by the TX code requires preemption to be
 >        disabled.
 >
 > Indeed a solution for dealing with kernel preemption state based on 
static
 > kernel configuration is not possible since the introduction of dynamic
 > preemption level configuration at boot time using the static calls
 > functionality.
 >
 > Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper")
 > Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
 > ---
 >   drivers/net/ethernet/sfc/ptp.c | 22 ++++++++++++++++++++++
 >   1 file changed, 22 insertions(+)
 >
 > diff --git a/drivers/net/ethernet/sfc/ptp.c 
b/drivers/net/ethernet/sfc/ptp.c
 > index 4625f85acab2..10ad0b93d283 100644
 > --- a/drivers/net/ethernet/sfc/ptp.c
 > +++ b/drivers/net/ethernet/sfc/ptp.c
 > @@ -1100,7 +1100,29 @@ static void efx_ptp_xmit_skb_queue(struct 
efx_nic *efx, struct sk_buff *skb)
 >
 >   	tx_queue = efx_channel_get_tx_queue(ptp_data->channel, type);
 >   	if (tx_queue && tx_queue->timestamping) {
 > +		/* This code invokes normal driver TX code which is always
 > +		 * protected from softirqs when called from generic TX code,
 > +		 * which in turn disables preemption. Look at __dev_queue_xmit
 > +		 * which uses rcu_read_lock_bh disabling preemption for RCU
 > +		 * plus disabling softirqs. We do not need RCU reader
 > +		 * protection here.
 > +		 *
 > +		 * Although it is theoretically safe for current PTP TX/RX code
 > +		 * running without disabling softirqs, there are three good
 > +		 * reasond for doing so:
 > +		 *
 > +		 *      1) The code invoked is mainly implemented for non-PTP
 > +		 *         packets and it is always executed with softirqs
 > +		 *         disabled.
 > +		 *      2) This being a single PTP packet, better to not
 > +		 *         interrupt its processing by softirqs which can lead
 > +		 *         to high latencies.
 > +		 *      3) netdev_xmit_more checks preemption is disabled and
 > +		 *         triggers a BUG_ON if not.
 > +		 */
 > +		local_bh_disable();
 >   		efx_enqueue_skb(tx_queue, skb);
 > +		local_bh_enable();
 >   	} else {
 >   		WARN_ONCE(1, "PTP channel has no timestamped tx queue\n");
 >   		dev_kfree_skb_any(skb);

I tested this patch with my X2522, it works well.

test command:
     ptp4l -H -i <sfc interface>

Before this patch, splat looks like:
[ 1464.606891] BUG: using __this_cpu_read() in preemptible [00000000] 
code: kworker/u8:6/100
[ 1464.606949] caller is __efx_enqueue_skb+0xbf/0x1b30 [sfc]
[ 1464.607037] CPU: 3 PID: 100 Comm: kworker/u8:6 Tainted: G        W 
       5.19.0-rc7+ #285 c4ddba47419033e42679f633134da4cdb2a6de42
[ 1464.607081] Hardware name: ASUS System Product Name/PRIME Z690-P D4, 
BIOS 0603 11/01/2021
[ 1464.607109] Workqueue: sfc_ptp efx_ptp_worker [sfc]
[ 1464.607191] Call Trace:
[ 1464.607210]  <TASK>
[ 1464.607228]  dump_stack_lvl+0x57/0x81
[ 1464.607260]  check_preemption_disabled+0xdd/0xe0
[ 1464.607293]  __efx_enqueue_skb+0xbf/0x1b30 [sfc 
f1a1bef35bcab479f96f2aeb5d51c271b89f71ae]
[ 1464.607387]  ? lock_downgrade+0x700/0x700
[ 1464.607420]  ? rcu_read_lock_sched_held+0x12/0x80
[ 1464.607449]  ? lock_acquire+0x478/0x560
[ 1464.607478]  ? rcu_read_lock_sched_held+0x12/0x80
[ 1464.607505]  ? lock_release+0x5c6/0xdf0
[ 1464.607533]  ? rcu_read_lock_sched_held+0x12/0x80
[ 1464.607562]  ? efx_tx_get_copy_buffer_limited+0x230/0x230 [sfc 
f1a1bef35bcab479f96f2aeb5d51c271b89f71ae]
[ 1464.607654]  ? lock_downgrade+0x700/0x700
[ 1464.607684]  ? lock_contended+0xd80/0xd80
[ 1464.607713]  ? do_raw_spin_lock+0x270/0x270
[ 1464.607745]  ? _raw_spin_unlock_irqrestore+0x59/0x70
[ 1464.607774]  ? trace_hardirqs_on+0x3c/0x140
[ 1464.607806]  ? _raw_spin_unlock_irqrestore+0x42/0x70
[ 1464.607840]  efx_ptp_worker+0x6ac/0xec0 [sfc 
f1a1bef35bcab479f96f2aeb5d51c271b89f71ae]
[ 1464.607928]  ? osq_unlock+0x1e0/0x1e0
[ 1464.607961]  ? efx_ptp_rx+0x660/0x660 [sfc 
f1a1bef35bcab479f96f2aeb5d51c271b89f71ae]
[ 1464.608049]  ? lock_downgrade+0x700/0x700
[ 1464.608081]  ? lock_contended+0xd80/0xd80
[ 1464.608112]  ? read_word_at_a_time+0xe/0x20
[ 1464.608149]  process_one_work+0x7c3/0x1300
[ 1464.608184]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 1464.608219]  ? pwq_dec_nr_in_flight+0x230/0x230
[ 1464.608247]  ? lock_acquired+0x37e/0xbc0
[ 1464.608291]  worker_thread+0x5ac/0xed0
[ 1464.608329]  ? process_one_work+0x1300/0x1300
[ 1464.608361]  kthread+0x2a4/0x350
[ 1464.608385]  ? kthread_complete_and_exit+0x20/0x20
[ 1464.608416]  ret_from_fork+0x1f/0x30
[ 1464.608458]  </TASK>

After this patch, I can't see any splats.

Thanks,
Taehee Yoo
patchwork-bot+netdevbpf@kernel.org July 28, 2022, 1:30 a.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 26 Jul 2022 08:45:04 +0200 you wrote:
> From: Alejandro Lucero <alejandro.lucero-palau@amd.com>
> 
> Sending a PTP packet can imply to use the normal TX driver datapath but
> invoked from the driver's ptp worker. The kernel generic TX code
> disables softirqs and preemption before calling specific driver TX code,
> but the ptp worker does not. Although current ptp driver functionality
> does not require it, there are several reasons for doing so:
> 
> [...]

Here is the summary with links:
  - [v3,net] sfc: disable softirqs for ptp TX
    https://git.kernel.org/netdev/net/c/67c3b611d92f

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/sfc/ptp.c b/drivers/net/ethernet/sfc/ptp.c
index 4625f85acab2..10ad0b93d283 100644
--- a/drivers/net/ethernet/sfc/ptp.c
+++ b/drivers/net/ethernet/sfc/ptp.c
@@ -1100,7 +1100,29 @@  static void efx_ptp_xmit_skb_queue(struct efx_nic *efx, struct sk_buff *skb)
 
 	tx_queue = efx_channel_get_tx_queue(ptp_data->channel, type);
 	if (tx_queue && tx_queue->timestamping) {
+		/* This code invokes normal driver TX code which is always
+		 * protected from softirqs when called from generic TX code,
+		 * which in turn disables preemption. Look at __dev_queue_xmit
+		 * which uses rcu_read_lock_bh disabling preemption for RCU
+		 * plus disabling softirqs. We do not need RCU reader
+		 * protection here.
+		 *
+		 * Although it is theoretically safe for current PTP TX/RX code
+		 * running without disabling softirqs, there are three good
+		 * reasond for doing so:
+		 *
+		 *      1) The code invoked is mainly implemented for non-PTP
+		 *         packets and it is always executed with softirqs
+		 *         disabled.
+		 *      2) This being a single PTP packet, better to not
+		 *         interrupt its processing by softirqs which can lead
+		 *         to high latencies.
+		 *      3) netdev_xmit_more checks preemption is disabled and
+		 *         triggers a BUG_ON if not.
+		 */
+		local_bh_disable();
 		efx_enqueue_skb(tx_queue, skb);
+		local_bh_enable();
 	} else {
 		WARN_ONCE(1, "PTP channel has no timestamped tx queue\n");
 		dev_kfree_skb_any(skb);