diff mbox

[RFC,v1,07/11] net/mlx4: Cache irq_desc->affinity instead of irq_desc

Message ID 1430709339-29083-8-git-send-email-jiang.liu@linux.intel.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Jiang Liu May 4, 2015, 3:15 a.m. UTC
The field 'affinity' in irq_desc won't change once the irq_desc data
structure is created. So cache irq_desc->affinity instead of irq_desc.
This also helps to hide struct irq_desc from device drivers.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c   |    6 +++---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   |    5 +----
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

Comments

Amir Vadai May 4, 2015, 12:10 p.m. UTC | #1
On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote:
> The field 'affinity' in irq_desc won't change once the irq_desc data
> structure is created. So cache irq_desc->affinity instead of irq_desc.
> This also helps to hide struct irq_desc from device drivers.

Hi Jiang,

I might not understand the new changes irq core, but up until now
affinity was changed when the user changed it through
/proc/irq/<IRQ>/smp_affinity.
This code is monitoring the affinity from the napi_poll context to
detect affinity changes, and prevent napi from keep running on the
wrong CPU.
Therefore, the affinity can't be cached at the beginning. Please
revert this caching.

Thanks,
Amir
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiang Liu May 4, 2015, 2 p.m. UTC | #2
On 2015/5/4 20:10, Amir Vadai wrote:
> On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote:
>> The field 'affinity' in irq_desc won't change once the irq_desc data
>> structure is created. So cache irq_desc->affinity instead of irq_desc.
>> This also helps to hide struct irq_desc from device drivers.
> 
> Hi Jiang,
> 
> I might not understand the new changes irq core, but up until now
> affinity was changed when the user changed it through
> /proc/irq/<IRQ>/smp_affinity.
> This code is monitoring the affinity from the napi_poll context to
> detect affinity changes, and prevent napi from keep running on the
> wrong CPU.
> Therefore, the affinity can't be cached at the beginning. Please
> revert this caching.
Hi Amir,
	Thanks for review:) We want to hide irq_desc implementation
details from device drivers, so made these changes.
	Function irq_get_affinity_mask() returns 'struct cpumask *'
and we cache the returned pointer. On the other hand, user may change
IRQ affinity through /proc/irq/<IRQ>/smp_affinity, but that only
changes the bitmap pointed to by the cached pointer and won't change
the pointer itself. So it should always return the latest affinity
setting by calling cpumask_test_cpu(cpu_curr, cq->irq_affinity).
Or am I missing something here?
Thanks!
Gerry
> 
> Thanks,
> Amir
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Gleixner May 4, 2015, 3:10 p.m. UTC | #3
On Mon, 4 May 2015, Amir Vadai wrote:

> On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote:
> > The field 'affinity' in irq_desc won't change once the irq_desc data
> > structure is created. So cache irq_desc->affinity instead of irq_desc.
> > This also helps to hide struct irq_desc from device drivers.
> 
> Hi Jiang,
> 
> I might not understand the new changes irq core, but up until now
> affinity was changed when the user changed it through
> /proc/irq/<IRQ>/smp_affinity.
> This code is monitoring the affinity from the napi_poll context to
> detect affinity changes, and prevent napi from keep running on the
> wrong CPU.
> Therefore, the affinity can't be cached at the beginning. Please
> revert this caching.

Sigh. All of this is completely wrong. Polling in the internals of
irq_desc is just crap.

irq_set_affinity_notifier() has been added for exactly this purpose.

Please get rid of your irq_desc abuse.

Thanks,

	tglx


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amir Vadai May 5, 2015, 9:07 a.m. UTC | #4
On Mon, May 4, 2015 at 5:00 PM, Jiang Liu <jiang.liu@linux.intel.com> wrote:
> On 2015/5/4 20:10, Amir Vadai wrote:
>> On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote:
>>> The field 'affinity' in irq_desc won't change once the irq_desc data
>>> structure is created. So cache irq_desc->affinity instead of irq_desc.
>>> This also helps to hide struct irq_desc from device drivers.
>>
>> Hi Jiang,
>>
>> I might not understand the new changes irq core, but up until now
>> affinity was changed when the user changed it through
>> /proc/irq/<IRQ>/smp_affinity.
>> This code is monitoring the affinity from the napi_poll context to
>> detect affinity changes, and prevent napi from keep running on the
>> wrong CPU.
>> Therefore, the affinity can't be cached at the beginning. Please
>> revert this caching.
> Hi Amir,
>         Thanks for review:) We want to hide irq_desc implementation
> details from device drivers, so made these changes.
>         Function irq_get_affinity_mask() returns 'struct cpumask *'
> and we cache the returned pointer. On the other hand, user may change
> IRQ affinity through /proc/irq/<IRQ>/smp_affinity, but that only
> changes the bitmap pointed to by the cached pointer and won't change
> the pointer itself. So it should always return the latest affinity
> setting by calling cpumask_test_cpu(cpu_curr, cq->irq_affinity).
> Or am I missing something here?
No - you are completely right - I missed it.

Anyway I tested it to make sure, and everything works as expected.

Acked-By: Amir Vadai <amirv@mellanox.com>

Thanks,
Amir


> Thanks!
> Gerry
>>
>> Thanks,
>> Amir
>>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amir Vadai May 5, 2015, 9:17 a.m. UTC | #5
On Mon, May 4, 2015 at 6:10 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 4 May 2015, Amir Vadai wrote:
>
>> On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote:
>> > The field 'affinity' in irq_desc won't change once the irq_desc data
>> > structure is created. So cache irq_desc->affinity instead of irq_desc.
>> > This also helps to hide struct irq_desc from device drivers.
>>
>> Hi Jiang,
>>
>> I might not understand the new changes irq core, but up until now
>> affinity was changed when the user changed it through
>> /proc/irq/<IRQ>/smp_affinity.
>> This code is monitoring the affinity from the napi_poll context to
>> detect affinity changes, and prevent napi from keep running on the
>> wrong CPU.
>> Therefore, the affinity can't be cached at the beginning. Please
>> revert this caching.
>
> Sigh. All of this is completely wrong. Polling in the internals of
> irq_desc is just crap.
Hi,

>
> irq_set_affinity_notifier() has been added for exactly this purpose.
rmap is already registered on irq_set_affinity_notifier().
Actually, I did post an extension to the affinity notifier, to use
affinity chain which you rejected [1] (and BTW, convinced me that this
is not the way to do it).
Current code is according to your suggestion.

[1] - https://lkml.org/lkml/2014/5/26/222

Amir

>
> Please get rid of your irq_desc abuse.
>
> Thanks,
>
>         tglx
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Gleixner May 5, 2015, 2:53 p.m. UTC | #6
On Tue, 5 May 2015, Amir Vadai wrote:

> On Mon, May 4, 2015 at 6:10 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Mon, 4 May 2015, Amir Vadai wrote:
> >
> >> On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote:
> >> > The field 'affinity' in irq_desc won't change once the irq_desc data
> >> > structure is created. So cache irq_desc->affinity instead of irq_desc.
> >> > This also helps to hide struct irq_desc from device drivers.
> >>
> >> Hi Jiang,
> >>
> >> I might not understand the new changes irq core, but up until now
> >> affinity was changed when the user changed it through
> >> /proc/irq/<IRQ>/smp_affinity.
> >> This code is monitoring the affinity from the napi_poll context to
> >> detect affinity changes, and prevent napi from keep running on the
> >> wrong CPU.
> >> Therefore, the affinity can't be cached at the beginning. Please
> >> revert this caching.
> >
> > Sigh. All of this is completely wrong. Polling in the internals of
> > irq_desc is just crap.
> Hi,
> 
> >
> > irq_set_affinity_notifier() has been added for exactly this purpose.
> rmap is already registered on irq_set_affinity_notifier().
> Actually, I did post an extension to the affinity notifier, to use
> affinity chain which you rejected [1] (and BTW, convinced me that this
> is not the way to do it).
> Current code is according to your suggestion.
> 
> [1] - https://lkml.org/lkml/2014/5/26/222

No, it's not. I wrote:

 "So what's the point of adding notifier call chain complexity,
  ordering problems etc., if you can simply note the fact that the
  affinity changed in the rmap itself and check that in the RX path?"

I did not tell you to fiddle in irqdesc, really.

Thanks,

	tglx

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amir Vadai May 7, 2015, 10:41 a.m. UTC | #7
On 5/5/2015 5:53 PM, Thomas Gleixner wrote:
> On Tue, 5 May 2015, Amir Vadai wrote:
> 
>> On Mon, May 4, 2015 at 6:10 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> On Mon, 4 May 2015, Amir Vadai wrote:
>>>
>>>> On Mon, May 4, 2015 at 6:15 AM, Jiang Liu <jiang.liu@linux.intel.com> wrote:
>>>>> The field 'affinity' in irq_desc won't change once the irq_desc data
>>>>> structure is created. So cache irq_desc->affinity instead of irq_desc.
>>>>> This also helps to hide struct irq_desc from device drivers.
>>>>
>>>> Hi Jiang,
>>>>
>>>> I might not understand the new changes irq core, but up until now
>>>> affinity was changed when the user changed it through
>>>> /proc/irq/<IRQ>/smp_affinity.
>>>> This code is monitoring the affinity from the napi_poll context to
>>>> detect affinity changes, and prevent napi from keep running on the
>>>> wrong CPU.
>>>> Therefore, the affinity can't be cached at the beginning. Please
>>>> revert this caching.
>>>
>>> Sigh. All of this is completely wrong. Polling in the internals of
>>> irq_desc is just crap.
>> Hi,
>>
>>>
>>> irq_set_affinity_notifier() has been added for exactly this purpose.
>> rmap is already registered on irq_set_affinity_notifier().
>> Actually, I did post an extension to the affinity notifier, to use
>> affinity chain which you rejected [1] (and BTW, convinced me that this
>> is not the way to do it).
>> Current code is according to your suggestion.
>>
>> [1] - https://lkml.org/lkml/2014/5/26/222
> 
> No, it's not. I wrote:
> 
>  "So what's the point of adding notifier call chain complexity,
>   ordering problems etc., if you can simply note the fact that the
>   affinity changed in the rmap itself and check that in the RX path?"
> 
> I did not tell you to fiddle in irqdesc, really.
> 
> Thanks,
> 
> 	tglx
> 

It is problematic, since I didn't want to link the napi polling affinity
issue and CONFIG_CPU_RMAP being selected. That's why I CC'd you when
submitted the patch [1]. Guess it was my bad that I didn't do it in a
better way that would give you a chance to fully review it.

Anyway, I understand now that I shouldn't access irqdesc attributes from
a device driver.
I will see if I can live with disabling this whole feature (migrate a
currently looping NAPI when affinity is changed), when CONFIG_CPU_RMAP
is not selected. If not, I might modify the rmap code to give me this
information even when it is not selected. I assume that rmap notifier
callback can access this information.

Thanks,
Amir

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 22da4d0d0f05..a03a01625398 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -31,6 +31,7 @@ 
  *
  */
 
+#include <linux/irq.h>
 #include <linux/mlx4/cq.h>
 #include <linux/mlx4/qp.h>
 #include <linux/mlx4/cmd.h>
@@ -135,9 +136,8 @@  int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 				mdev->dev->caps.num_comp_vectors;
 		}
 
-		cq->irq_desc =
-			irq_to_desc(mlx4_eq_get_irq(mdev->dev,
-						    cq->vector));
+		cq->irq_affinity = irq_get_affinity_mask(
+					mlx4_eq_get_irq(mdev->dev, cq->vector));
 	} else {
 		/* For TX we use the same irq per
 		ring we assigned for the RX    */
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 4fdd3c37e47b..59d8395ef31c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1022,14 +1022,11 @@  int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
 	/* If we used up all the quota - we're probably not done yet... */
 	if (done == budget) {
 		int cpu_curr;
-		const struct cpumask *aff;
 
 		INC_PERF_COUNTER(priv->pstats.napi_quota);
 
 		cpu_curr = smp_processor_id();
-		aff = irq_desc_get_irq_data(cq->irq_desc)->affinity;
-
-		if (likely(cpumask_test_cpu(cpu_curr, aff)))
+		if (likely(cpumask_test_cpu(cpu_curr, cq->irq_affinity)))
 			return budget;
 
 		/* Current cpu is not according to smp_irq_affinity -
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 9de30216b146..5de5e9a51d76 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -357,7 +357,7 @@  struct mlx4_en_cq {
 #define CQ_USER_PEND (MLX4_EN_CQ_STATE_POLL | MLX4_EN_CQ_STATE_POLL_YIELD)
 	spinlock_t poll_lock; /* protects from LLS/napi conflicts */
 #endif  /* CONFIG_NET_RX_BUSY_POLL */
-	struct irq_desc *irq_desc;
+	struct cpumask	*irq_affinity;
 };
 
 struct mlx4_en_port_profile {