| Message ID | 1517840992-29813-1-git-send-email-maxg@mellanox.com (mailing list archive) |
|---|---|
| State | Accepted |
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
On Mon, 2018-02-05 at 16:29 +0200, Max Gurtovoy wrote:
> Currently the async EQ has only 256 entries. That might not be big
> enough for the SW to handle all the pending events. For example, with
> many QPs (say 1024) connected to an SRQ created by an NVMeoF target,
> if the target goes down the FW will raise 1024 "last WQE reached"
> events and may cause an EQ overrun. Increase the EQ to a more
> reasonable size, beyond which the FW should be able to delay events
> and raise them later using an internal backpressure mechanism.
>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>

Thanks, applied.  But if this gets me in trouble with DaveM for sending
a patch to drivers/net, I'm gonna tell him it's because you sent it
here and excluded netdev@, and that I only took it because it was part
of a series related to an RDMA issue, all while giving you the evil
eye ;-)
On Mon, Feb 05, 2018 at 11:02:10AM -0500, Doug Ledford wrote:
> On Mon, 2018-02-05 at 16:29 +0200, Max Gurtovoy wrote:
> > Currently the async EQ has only 256 entries. That might not be big
> > enough for the SW to handle all the pending events. For example, with
> > many QPs (say 1024) connected to an SRQ created by an NVMeoF target,
> > if the target goes down the FW will raise 1024 "last WQE reached"
> > events and may cause an EQ overrun. Increase the EQ to a more
> > reasonable size, beyond which the FW should be able to delay events
> > and raise them later using an internal backpressure mechanism.
> >
> > Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>
> Thanks, applied.  But if this gets me in trouble with DaveM for sending
> a patch to drivers/net, I'm gonna tell him it's because you sent it
> here and excluded netdev@, and that I only took it because it was part
> of a series related to an RDMA issue, all while giving you the evil
> eye ;-)

You can add that neither I nor Saeed tested it for any conflicts.

> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
On Mon, Feb 05, 2018 at 04:29:51PM +0200, Max Gurtovoy wrote:
> Currently the async EQ has only 256 entries. That might not be big
> enough for the SW to handle all the pending events. For example, with
> many QPs (say 1024) connected to an SRQ created by an NVMeoF target,
> if the target goes down the FW will raise 1024 "last WQE reached"
> events and may cause an EQ overrun. Increase the EQ to a more
> reasonable size, beyond which the FW should be able to delay events
> and raise them later using an internal backpressure mechanism.

If the firmware has an internal backpressure mechanism, then why would
we get an EQ overrun?

Do we need to block adding too many QPs to an SRQ as well, or something
like that?

Jason
On 2/5/2018 8:09 PM, Jason Gunthorpe wrote:
> On Mon, Feb 05, 2018 at 04:29:51PM +0200, Max Gurtovoy wrote:
> > Currently the async EQ has only 256 entries. That might not be big
> > enough for the SW to handle all the pending events. For example, with
> > many QPs (say 1024) connected to an SRQ created by an NVMeoF target,
> > if the target goes down the FW will raise 1024 "last WQE reached"
> > events and may cause an EQ overrun. Increase the EQ to a more
> > reasonable size, beyond which the FW should be able to delay events
> > and raise them later using an internal backpressure mechanism.
>
> If the firmware has an internal backpressure mechanism, then why would
> we get an EQ overrun?

The FW backpressure mechanism is WIP; that's why we get the overrun.
After consulting with the FW team, we concluded that a 256-entry EQ is
too small. Do you think it's reasonable to allocate 4k entries (256KB
of contiguous memory) for the async EQ?

> Do we need to block adding too many QPs to an SRQ as well, or something
> like that?

Hard to say. In the storage world, this may lead to a situation where
initiator X has priority over initiator Y without any good reason (only
because X was served before Y).

> Jason
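For reference, a minimal sketch of the arithmetic behind the 4k-entry / 256KB figure above, assuming a 64-byte EQE; the constant names and values below are illustrative, mirroring the numbers in the patch rather than quoting driver code:

```c
#include <stdio.h>

/* Back-of-the-envelope async EQ sizing. Assumption: each event queue
 * entry (EQE) is 64 bytes; entry counts mirror the patch below. */
#define EQE_SIZE_BYTES       64
#define OLD_NUM_ASYNC_EQE    0x100   /* 256 entries before the patch  */
#define NEW_NUM_ASYNC_EQE    0x1000  /* 4096 entries after the patch  */

int main(void)
{
	printf("old async EQ: %d entries -> %d KiB\n", OLD_NUM_ASYNC_EQE,
	       OLD_NUM_ASYNC_EQE * EQE_SIZE_BYTES / 1024);
	printf("new async EQ: %d entries -> %d KiB\n", NEW_NUM_ASYNC_EQE,
	       NEW_NUM_ASYNC_EQE * EQE_SIZE_BYTES / 1024);
	return 0;
}
```

With these assumptions the old queue occupies 16 KiB and the new one 256 KiB of contiguous memory, which matches the figure quoted in the email.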
On Tue, Feb 06, 2018 at 01:11:41AM +0200, Max Gurtovoy wrote:
> On 2/5/2018 8:09 PM, Jason Gunthorpe wrote:
> > On Mon, Feb 05, 2018 at 04:29:51PM +0200, Max Gurtovoy wrote:
> > > Currently the async EQ has only 256 entries. That might not be big
> > > enough for the SW to handle all the pending events. For example, with
> > > many QPs (say 1024) connected to an SRQ created by an NVMeoF target,
> > > if the target goes down the FW will raise 1024 "last WQE reached"
> > > events and may cause an EQ overrun. Increase the EQ to a more
> > > reasonable size, beyond which the FW should be able to delay events
> > > and raise them later using an internal backpressure mechanism.
> >
> > If the firmware has an internal backpressure mechanism, then why would
> > we get an EQ overrun?
>
> The FW backpressure mechanism is WIP; that's why we get the overrun.

Ah, so current HW blows up if the EQ is overrun, and that can actually
be triggered by ULPs? Yuk.

> After consulting with the FW team, we concluded that a 256-entry EQ is
> too small. Do you think it's reasonable to allocate 4k entries (256KB
> of contiguous memory) for the async EQ?

No idea, ask Saeed?

> > Do we need to block adding too many QPs to an SRQ as well, or something
> > like that?
>
> Hard to say. In the storage world, this may lead to a situation where
> initiator X has priority over initiator Y without any good reason (only
> because X was served before Y).

Well, correctness comes first, so if the device does have to protect
itself from rogue ULPs... If that means enforcing a goofy limit, then
so be it :(

Presumably someday fixed firmware will remove the limitation?

Jason
On Mon, Feb 5, 2018 at 3:16 PM, Jason Gunthorpe <jgg@mellanox.com> wrote:
> On Tue, Feb 06, 2018 at 01:11:41AM +0200, Max Gurtovoy wrote:
> > On 2/5/2018 8:09 PM, Jason Gunthorpe wrote:
> > > On Mon, Feb 05, 2018 at 04:29:51PM +0200, Max Gurtovoy wrote:
> > > > Currently the async EQ has only 256 entries. That might not be big
> > > > enough for the SW to handle all the pending events. For example, with
> > > > many QPs (say 1024) connected to an SRQ created by an NVMeoF target,
> > > > if the target goes down the FW will raise 1024 "last WQE reached"
> > > > events and may cause an EQ overrun. Increase the EQ to a more
> > > > reasonable size, beyond which the FW should be able to delay events
> > > > and raise them later using an internal backpressure mechanism.
> > >
> > > If the firmware has an internal backpressure mechanism, then why would
> > > we get an EQ overrun?
> >
> > The FW backpressure mechanism is WIP; that's why we get the overrun.
>
> Ah, so current HW blows up if the EQ is overrun, and that can actually
> be triggered by ULPs? Yuk.
>
> > After consulting with the FW team, we concluded that a 256-entry EQ is
> > too small. Do you think it's reasonable to allocate 4k entries (256KB
> > of contiguous memory) for the async EQ?
>
> No idea, ask Saeed?

Thank you Jason for raising those concerns; they are well founded, and
the whole issue is already being discussed internally.

Max, you have been cc'ed on my emails regarding this issue since last
week; next time I would expect you to roll back such a patch. I see
that this patch is already on its way to Linus, with no proper mlx5
maintainer sign-off, nice. There is a well-defined flow we have
internally for each patch to pass review, regression and merge tests,
so why did you go behind our backs with this patch?

> > > Do we need to block adding too many QPs to an SRQ as well, or something
> > > like that?
> >
> > Hard to say. In the storage world, this may lead to a situation where
> > initiator X has priority over initiator Y without any good reason (only
> > because X was served before Y).
>
> Well, correctness comes first, so if the device does have to protect
> itself from rogue ULPs... If that means enforcing a goofy limit, then
> so be it :(
>
> Presumably someday fixed firmware will remove the limitation?
>
> Jason
On Wed, Feb 07, 2018 at 09:45:52PM -0800, Saeed Mahameed wrote:

> I see that this patch is already on its way to Linus, with no proper
> mlx5 maintainer sign-off, nice.

Doug and I can be more careful not to accept patches without maintainer
acks - but this means I will want to see acks from you no matter who
sends the patch (including patches sent by Leon).

Jason
On Thu, Feb 08, 2018 at 09:26:05AM -0700, Jason Gunthorpe wrote:
> On Wed, Feb 07, 2018 at 09:45:52PM -0800, Saeed Mahameed wrote:
>
> > I see that this patch is already on its way to Linus, with no proper
> > mlx5 maintainer sign-off, nice.
>
> Doug and I can be more careful not to accept patches without maintainer
> acks - but this means I will want to see acks from you no matter who
> sends the patch (including patches sent by Leon).

Please don't reinvent the wheel; mlx5 is my responsibility too, I just
successfully avoid dealing with netdev patches.

MELLANOX MLX5 core VPI driver
M:	Saeed Mahameed <saeedm@mellanox.com>
M:	Matan Barak <matanb@mellanox.com>
M:	Leon Romanovsky <leonro@mellanox.com>
L:	netdev@vger.kernel.org
L:	linux-rdma@vger.kernel.org
W:	http://www.mellanox.com
Q:	http://patchwork.ozlabs.org/project/netdev/list/
S:	Supported
F:	drivers/net/ethernet/mellanox/mlx5/core/
F:	include/linux/mlx5/

Thanks

> Jason
On Thu, Feb 8, 2018 at 8:26 AM, Jason Gunthorpe <jgg@mellanox.com> wrote:
> On Wed, Feb 07, 2018 at 09:45:52PM -0800, Saeed Mahameed wrote:
>
> > I see that this patch is already on its way to Linus, with no proper
> > mlx5 maintainer sign-off, nice.
>
> Doug and I can be more careful not to accept patches without maintainer
> acks - but this means I will want to see acks from you no matter who
> sends the patch (including patches sent by Leon).

Thanks for understanding. Leon and I are almost always in sync; we don't
submit patches to each other's tree (especially pure patches) without
notifying the other first.

> Jason
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index e7e7cef..9ce4add 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -51,7 +51,7 @@ enum {
 
 enum {
 	MLX5_NUM_SPARE_EQE	= 0x80,
-	MLX5_NUM_ASYNC_EQE	= 0x100,
+	MLX5_NUM_ASYNC_EQE	= 0x1000,
 	MLX5_NUM_CMD_EQE	= 32,
 	MLX5_NUM_PF_DRAIN	= 64,
 };
Currently the async EQ has only 256 entries. That might not be big
enough for the SW to handle all the pending events. For example, with
many QPs (say 1024) connected to an SRQ created by an NVMeoF target,
if the target goes down the FW will raise 1024 "last WQE reached"
events and may cause an EQ overrun. Increase the EQ to a more
reasonable size, beyond which the FW should be able to delay events
and raise them later using an internal backpressure mechanism.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
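To make the failure mode described above concrete, here is a small illustrative model (not mlx5 driver code) of the burst-versus-depth mismatch; the entry counts mirror the patch, and treating the spare entries as usable headroom is an assumption made only for this sketch:

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative model only. The counts below mirror the constants shown
 * in the diff; whether MLX5_NUM_SPARE_EQE counts as extra headroom for
 * a burst is an assumption for the sake of the example. */
enum {
	MLX5_NUM_SPARE_EQE = 0x80,
	OLD_NUM_ASYNC_EQE  = 0x100,
	NEW_NUM_ASYNC_EQE  = 0x1000,
};

/* One "last WQE reached" event is raised per QP attached to the SRQ
 * when the remote target goes away; an overrun is possible if that
 * burst exceeds the EQ depth before software consumes any entries. */
static bool burst_fits(unsigned int qps_on_srq, unsigned int eq_entries)
{
	return qps_on_srq <= eq_entries + MLX5_NUM_SPARE_EQE;
}

int main(void)
{
	printf("1024 QPs, 256-entry EQ : fits = %d\n",
	       burst_fits(1024, OLD_NUM_ASYNC_EQE));
	printf("1024 QPs, 4096-entry EQ: fits = %d\n",
	       burst_fits(1024, NEW_NUM_ASYNC_EQE));
	return 0;
}
```

Under this model the 1024-event burst from the commit message cannot fit in the old 256-entry queue but fits comfortably in the enlarged 4096-entry one, which is the whole point of the patch.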