
[1/2] net/mlx5: increase async EQ to avoid EQ overrun

Message ID 1517840992-29813-1-git-send-email-maxg@mellanox.com (mailing list archive)
State Accepted

Commit Message

Max Gurtovoy Feb. 5, 2018, 2:29 p.m. UTC
Currently the async EQ has only 256 entries, which might not be big enough
for the SW to handle all the pending events. For example, if many QPs (say
1024) are connected to an SRQ created by an NVMeOF target and the target
goes down, the FW will raise 1024 "last WQE reached" events and may cause
an EQ overrun. Increase the EQ to a more reasonable size; beyond it, the FW
should be able to delay the events and raise them later using an internal
backpressure mechanism.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
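
As a rough, illustrative sketch of the scenario in the commit message (not
part of the patch; the constants are taken from the commit message and the
diff at the bottom of the page):

/*
 * Illustrative only, not driver code: compares the burst of "last WQE
 * reached" events from the NVMeOF teardown example against the async EQ
 * depth before and after the patch.
 */
#include <stdio.h>

#define NUM_QPS_ON_SRQ		1024	/* QPs hanging off one SRQ (example) */
#define OLD_NUM_ASYNC_EQE	0x100	/* 256 entries before the patch      */
#define NEW_NUM_ASYNC_EQE	0x1000	/* 4096 entries after the patch      */

int main(void)
{
	printf("events raised at once: %d\n", NUM_QPS_ON_SRQ);
	printf("old async EQ depth   : %d -> %s\n", OLD_NUM_ASYNC_EQE,
	       NUM_QPS_ON_SRQ > OLD_NUM_ASYNC_EQE ? "overrun" : "fits");
	printf("new async EQ depth   : %d -> %s\n", NEW_NUM_ASYNC_EQE,
	       NUM_QPS_ON_SRQ > NEW_NUM_ASYNC_EQE ? "overrun" : "fits");
	return 0;
}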

Comments

Sagi Grimberg Feb. 5, 2018, 3:28 p.m. UTC | #1
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Doug Ledford Feb. 5, 2018, 4:02 p.m. UTC | #2
On Mon, 2018-02-05 at 16:29 +0200, Max Gurtovoy wrote:
> Currently the async EQ has 256 entries only. It might not be big enough
> for the SW to handle all the needed pending events. For example, in case
> of many QPs (let's say 1024) connected to a SRQ created using NVMeOF target
> and the target goes down, the FW will raise 1024 "last WQE reached" events
> and may cause EQ overrun. Increase the EQ to more reasonable size, that beyond
> it the FW should be able to delay the event and raise it later on using internal
> backpressure mechanism.
> 
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>

Thanks, applied.  But if this gets me in trouble with DaveM for sending
a patch to drivers/net, I'm gonna tell him the only reason I took it is
that you sent it here, excluded netdev@, and made it part of a series
related to an RDMA issue, all while giving you the evil eye ;-)
Leon Romanovsky Feb. 5, 2018, 4:10 p.m. UTC | #3
On Mon, Feb 05, 2018 at 11:02:10AM -0500, Doug Ledford wrote:
> On Mon, 2018-02-05 at 16:29 +0200, Max Gurtovoy wrote:
> > Currently the async EQ has 256 entries only. It might not be big enough
> > for the SW to handle all the needed pending events. For example, in case
> > of many QPs (let's say 1024) connected to a SRQ created using NVMeOF target
> > and the target goes down, the FW will raise 1024 "last WQE reached" events
> > and may cause EQ overrun. Increase the EQ to more reasonable size, that beyond
> > it the FW should be able to delay the event and raise it later on using internal
> > backpressure mechanism.
> >
> > Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>
> Thanks, applied.  But if this gets me in trouble with DaveM for sending
> a patch to drivers/net, I'm gonna tell him it's because you sent it here
> and excluded netdev@ and that it was part of a series related to an RDMA
> issue that I took it, all while giving you the evil eye ;-)

You can add that neither Saeed nor I tested it for any conflicts.

>
> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
Jason Gunthorpe Feb. 5, 2018, 6:09 p.m. UTC | #4
On Mon, Feb 05, 2018 at 04:29:51PM +0200, Max Gurtovoy wrote:
> Currently the async EQ has 256 entries only. It might not be big enough
> for the SW to handle all the needed pending events. For example, in case
> of many QPs (let's say 1024) connected to a SRQ created using NVMeOF target
> and the target goes down, the FW will raise 1024 "last WQE reached" events
> and may cause EQ overrun. Increase the EQ to more reasonable size, that beyond
> it the FW should be able to delay the event and raise it later on using internal
> backpressure mechanism.

If the firmware has an internal backpressure mechanism, then why
would we get an EQ overrun?

Do we need to block adding too many QPs to a SRQ as well or something
like that?

Jason
Max Gurtovoy Feb. 5, 2018, 11:11 p.m. UTC | #5
On 2/5/2018 8:09 PM, Jason Gunthorpe wrote:
> On Mon, Feb 05, 2018 at 04:29:51PM +0200, Max Gurtovoy wrote:
>> Currently the async EQ has 256 entries only. It might not be big enough
>> for the SW to handle all the needed pending events. For example, in case
>> of many QPs (let's say 1024) connected to a SRQ created using NVMeOF target
>> and the target goes down, the FW will raise 1024 "last WQE reached" events
>> and may cause EQ overrun. Increase the EQ to more reasonable size, that beyond
>> it the FW should be able to delay the event and raise it later on using internal
>> backpressure mechanism.
> 
> If the firmware has an internal backpressure meachanism then why
> would we get a EQ overrun?

The FW backpressure mechanism is still WIP; that's why we get the overrun.
After consulting with the FW team, we concluded that an EQ depth of 256 is
too small. Do you think it's reasonable to allocate 4k entries (256KB of
contiguous memory) for the async EQ?
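
As a quick sanity check on that figure (illustrative only, assuming a
64-byte EQE and 4KB pages, consistent with the 256KB estimate above):

/* Rough size check, not driver code; EQE size and page size are assumptions. */
#include <stdio.h>

int main(void)
{
	unsigned int eqe_size = 64;      /* assumed bytes per async EQ entry */
	unsigned int entries  = 0x1000;  /* proposed async EQ depth (4k)     */
	unsigned int bytes    = entries * eqe_size;

	printf("%u entries x %u B = %u KB (%u contiguous 4KB pages)\n",
	       entries, eqe_size, bytes / 1024, bytes / 4096);
	return 0;
}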

> 
> Do we need to block adding too many QPs to a SRQ as well or something
> like that?

Hard to say. In the storage world, this may lead to a situation where
initiator X has priority over initiator Y without any good reason
(only because X was served before Y).

> 
> Jason
> 
Jason Gunthorpe Feb. 5, 2018, 11:16 p.m. UTC | #6
On Tue, Feb 06, 2018 at 01:11:41AM +0200, Max Gurtovoy wrote:
> 
> 
> On 2/5/2018 8:09 PM, Jason Gunthorpe wrote:
> >On Mon, Feb 05, 2018 at 04:29:51PM +0200, Max Gurtovoy wrote:
> >>Currently the async EQ has 256 entries only. It might not be big enough
> >>for the SW to handle all the needed pending events. For example, in case
> >>of many QPs (let's say 1024) connected to a SRQ created using NVMeOF target
> >>and the target goes down, the FW will raise 1024 "last WQE reached" events
> >>and may cause EQ overrun. Increase the EQ to more reasonable size, that beyond
> >>it the FW should be able to delay the event and raise it later on using internal
> >>backpressure mechanism.
> >
> >If the firmware has an internal backpressure meachanism then why
> >would we get a EQ overrun?
> 
> FW backpressure mechanism is WIP, that's why we get the overrun.

Ah, so current HW blows up if the EQ is overrun, and that can actually be
triggered by ULPs? Yuk

> After consulting with FW team, we conclude that 256 EQ depth is small.
> Do you think it's reasonable to allocate 4k entries (256KB of contig memory)
> for async EQ ?

No idea, ask Saeed?

> >Do we need to block adding too many QPs to a SRQ as well or something
> >like that?
> 
> Hard to say. In the storage world, this may lead to a situation that
> initiator X has priority over initiator Y on without any good reason (only
> because X was served before Y)..

Well, correctness comes first, so the device does have to protect
itself from rogue ULPs. If that means enforcing a goofy limit, then
so be it :(

Presumably someday fixed firmware will remove the limitation?

Jason
Saeed Mahameed Feb. 8, 2018, 5:45 a.m. UTC | #7
On Mon, Feb 5, 2018 at 3:16 PM, Jason Gunthorpe <jgg@mellanox.com> wrote:
> On Tue, Feb 06, 2018 at 01:11:41AM +0200, Max Gurtovoy wrote:
>>
>>
>> On 2/5/2018 8:09 PM, Jason Gunthorpe wrote:
>> >On Mon, Feb 05, 2018 at 04:29:51PM +0200, Max Gurtovoy wrote:
>> >>Currently the async EQ has 256 entries only. It might not be big enough
>> >>for the SW to handle all the needed pending events. For example, in case
>> >>of many QPs (let's say 1024) connected to a SRQ created using NVMeOF target
>> >>and the target goes down, the FW will raise 1024 "last WQE reached" events
>> >>and may cause EQ overrun. Increase the EQ to more reasonable size, that beyond
>> >>it the FW should be able to delay the event and raise it later on using internal
>> >>backpressure mechanism.
>> >
>> >If the firmware has an internal backpressure meachanism then why
>> >would we get a EQ overrun?
>>
>> FW backpressure mechanism is WIP, that's why we get the overrun.
>
> Ah, so current HW blows up if EQ is overrun and that can actually be
> triggered by ULPs? Yuk
>
>> After consulting with FW team, we conclude that 256 EQ depth is small.
>> Do you think it's reasonable to allocate 4k entries (256KB of contig memory)
>> for async EQ ?
>
> No idea, ask Saeed?

Thank you, Jason, for raising those concerns; they are valid, and the whole
issue is already being discussed internally.

Max, you have been cc'ed on my emails regarding this issue since last week;
next time I would expect you to roll back such a patch.

I see that this patch is already on its way to Linus, with no proper
mlx5 maintainer sign-off, nice.

There is a well-defined flow we have internally for each patch to pass
review, regression, and merge tests. Why did you go behind our backs with
this patch?

>
>> >Do we need to block adding too many QPs to a SRQ as well or something
>> >like that?
>>
>> Hard to say. In the storage world, this may lead to a situation that
>> initiator X has priority over initiator Y on without any good reason (only
>> because X was served before Y)..
>
> Well, correctness comes first, so if the device does have to protect
> itself from rouge ULPS.. If that means enforcing a goofy limit, then
> so be it :(
>
> Presumably someday fixed firmware will remove the limitation?
>
> Jason
Jason Gunthorpe Feb. 8, 2018, 4:26 p.m. UTC | #8
On Wed, Feb 07, 2018 at 09:45:52PM -0800, Saeed Mahameed wrote:

> I see, that this patch is already on its way to linus, with no proper
> mlx5 maintainer sign-off, nice.

Doug and I can be more careful not to accept patches without
maintainer acks - but this means I will want to see acks from you no
matter who sends the patch (including patches sent by Leon).

Jason
Leon Romanovsky Feb. 8, 2018, 4:39 p.m. UTC | #9
On Thu, Feb 08, 2018 at 09:26:05AM -0700, Jason Gunthorpe wrote:
> On Wed, Feb 07, 2018 at 09:45:52PM -0800, Saeed Mahameed wrote:
>
> > I see, that this patch is already on its way to linus, with no proper
> > mlx5 maintainer sign-off, nice.
>
> Doug and I can be more careful to not accept patches without
> maintainer ack's - but this means I will want to see Acks from you no
> matter who sends the patch (including patches sent by Leon).

Please don't reinvent the wheel; mlx5 is my responsibility too,
I just successfully avoid dealing with netdev patches.

 MELLANOX MLX5 core VPI driver
 M:      Saeed Mahameed <saeedm@mellanox.com>
 M:      Matan Barak <matanb@mellanox.com>
 M:      Leon Romanovsky <leonro@mellanox.com>
 L:      netdev@vger.kernel.org
 L:      linux-rdma@vger.kernel.org
 W:      http://www.mellanox.com
 Q:      http://patchwork.ozlabs.org/project/netdev/list/
 S:      Supported
 F:      drivers/net/ethernet/mellanox/mlx5/core/
 F:      include/linux/mlx5/

Thanks

>
> Jason
Saeed Mahameed Feb. 8, 2018, 6:58 p.m. UTC | #10
On Thu, Feb 8, 2018 at 8:26 AM, Jason Gunthorpe <jgg@mellanox.com> wrote:
> On Wed, Feb 07, 2018 at 09:45:52PM -0800, Saeed Mahameed wrote:
>
>> I see, that this patch is already on its way to linus, with no proper
>> mlx5 maintainer sign-off, nice.
>
> Doug and I can be more careful to not accept patches without
> maintainer ack's - but this means I will want to see Acks from you no
> matter who sends the patch (including patches sent by Leon).
>

Thanks for understanding. Leon and I are almost always in sync;
we don't submit patches to each other's tree (especially pure patches)
without notifying the other first.

> Jason

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index e7e7cef..9ce4add 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -51,7 +51,7 @@  enum {
 
 enum {
 	MLX5_NUM_SPARE_EQE	= 0x80,
-	MLX5_NUM_ASYNC_EQE	= 0x100,
+	MLX5_NUM_ASYNC_EQE	= 0x1000,
 	MLX5_NUM_CMD_EQE	= 32,
 	MLX5_NUM_PF_DRAIN	= 64,
 };