Message ID | 3eb5814f-14cb-2b94-adf8-335d4b2eb7e9@grimberg.me (mailing list archive)
---|---
State | Not Applicable
> Hi Sagi,
>
> I think we need to add fence to the UMR wqe.
>
> so let's try this one:
>
> diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
> index ad8a263..c38c4fa 100644
> --- a/drivers/infiniband/hw/mlx5/qp.c
> +++ b/drivers/infiniband/hw/mlx5/qp.c
> @@ -3737,8 +3737,7 @@ static void dump_wqe(struct mlx5_ib_qp *qp, int idx, int size_16)
>
>  static u8 get_fence(u8 fence, struct ib_send_wr *wr)
>  {
> -	if (unlikely(wr->opcode == IB_WR_LOCAL_INV &&
> -		     wr->send_flags & IB_SEND_FENCE))
> +	if (wr->opcode == IB_WR_LOCAL_INV || wr->opcode == IB_WR_REG_MR)
>  		return MLX5_FENCE_MODE_STRONG_ORDERING;
>
>  	if (unlikely(fence)) {

This will kill performance, isn't there another fix that can be applied
just for the retransmission flow?

> Couldn't repro that case, but I ran some initial tests in my lab (with my
> patch above) - not performance servers:
>
> Initiator with 24 CPUs (2 threads/core, 6 cores/socket, 2 sockets),
> Connect-IB (same driver, mlx5_ib), kernel 4.10.0, fio test with 24 jobs
> and 128 iodepth.
> register_always=N
>
> Target - 1 subsystem with 1 ns (null_blk)
>
> bs    read (without/with patch)    write (without/with patch)
> ---   --------------------------   ---------------------------
> 512   1019k / 1008k                1004k / 992k
> 1k    1021k / 1013k                1002k / 991k
> 4k    1030k / 1022k                 978k / 969k
>
> CPU usage is 100% in both cases on the initiator side.
> Haven't seen a difference with bs = 16k.
> Not as big a drop as we would expect.

Obviously you won't see a drop without registering memory for small I/O
(register_always=N), this would bypass registration altogether...
Please retest with register_always=Y.
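For anyone trying to reproduce the measurement above, a rough sketch of the
setup follows; the namespace device name, runtime, and the choice of a random
read workload are assumptions rather than details given in the thread, and
register_always is the nvme_rdma module parameter being discussed.

  # Assumed reproduction sketch: 24 jobs, queue depth 128, 4k direct I/O
  # against an already-connected NVMe-oF namespace (device name is a guess).
  modprobe nvme_rdma register_always=Y   # or =N to bypass registration for small I/O

  fio --name=nvmf-4k --filename=/dev/nvme1n1 \
      --ioengine=libaio --direct=1 --rw=randread --bs=4k \
      --iodepth=128 --numjobs=24 --time_based=1 --runtime=60 \
      --group_reporting=1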
> Hi,
>
> Shahar/Joseph, what is your link layer conf (IB/Eth)?
> In the Eth case, have you configured PFC? If not, can you try it?
> I suspect that this is the root cause

The root cause is that the device fails FRWR in the retransmission flow;
if PFC is not on, it will happen almost immediately, if it is, it will
happen at some point...

> and it might help you avoid
> this case, meanwhile we're looking for the best solution.
>
> Adding Vladimir, who will run iSER on his performance setup with the new
> fencing patch (not an NVMEoF related issue).
> We can also run NVMEoF later on if needed.

Thanks.
Hi,

Sorry for the delay, I have been OOO for the past few days.

Indeed the underlying transport is Ethernet, and we have found that flow
control is disabled on the switch side. I still do not have access to the
system for re-tests with flow control and 4.10, but I am working on
assembling another system.

A little off-topic, but something that may help me get the other system up
faster: can I use a dual-port CX4 card as both initiator and target (i.e.
one port going up to the switch, and the other coming back into the same
system) without the kernel looping back? I have a spare CX4, so if
possible, I will use it to build a mini system for recreates of this sort.

Thanks,
Shahar

On 03/07/2017 03:41 PM, Sagi Grimberg wrote:
>
>> Hi,
>>
>> Shahar/Joseph, what is your link layer conf (IB/Eth)?
>> In the Eth case, have you configured PFC? If not, can you try it?
>> I suspect that this is the root cause
>
> The root cause is that the device fails FRWR in the retransmission
> flow; if PFC is not on, it will happen almost immediately, if it is,
> it will happen at some point...
>
>> and it might help you avoid
>> this case, meanwhile we're looking for the best solution.
>>
>> Adding Vladimir, who will run iSER on his performance setup with the new
>> fencing patch (not an NVMEoF related issue).
>> We can also run NVMEoF later on if needed.
>
> Thanks.
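Not an authoritative answer to the loopback question, but one approach
commonly used for single-box testing over RoCE is to move one port's netdev
into its own network namespace so the IP stack cannot short-circuit traffic
between the two ports; whether the RDMA path behaves the same way on a given
kernel and firmware is worth verifying. A rough sketch (interface names and
addresses are made up):

  # Hypothetical single-host "through the switch" setup for a dual-port card.
  # eth2 = port 1 (initiator side, default namespace)
  # eth3 = port 2 (target side, isolated namespace)
  ip netns add tgt
  ip link set eth3 netns tgt
  ip addr add 192.168.10.1/24 dev eth2
  ip link set eth2 up
  ip netns exec tgt ip addr add 192.168.10.2/24 dev eth3
  ip netns exec tgt ip link set eth3 up
  # Traffic between .1 and .2 now has to leave the host and come back
  # through the switch instead of being looped back in software.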
> With Max's patch applied (block 4K):
>
>            always reg = Y    always reg = N
> write      1902K             1923.3K
> read       1315K             2009K
>
> Original OFED code (block 4K):
>
>            always reg = Y    always reg = N
> write      1947K             1982K
> read       1273K             1978K

First, the write comparison is redundant because we send immediate data
without memory registration. And I'd compare against upstream code, not
OFED.

So it seems that strong fencing does not affect performance from the ULP
point of view, surprising... I'd suggest comparing on nvmf and srp as
well. If this is the case, and it indeed resolves the issue, we should
move forward with it as is.
Hey Joseph,

> Hey folks. Apologies if this message comes through twice, but when I
> originally sent it the list flagged it as too large due to the dmesg log
> attachments, and then a coworker just told me they never saw it, so I
> don't think it made it through on the first attempt.
>
> Please see the last note above and the dmesg example attached - after
> more extensive testing with Max's patch we are still able to produce cqe
> dump errors (at a much lower frequency), as well as a new failure mode
> involving a crash dump.

This is a different issue AFAICT. Looks like nvmet_sq_destroy() is stuck
waiting for the final reference to drop (which seems to never happen).
I'm trying to look for a code path where this may happen.

Can you tell if the backend block device completed all of its I/O when
this happens? (You can check for active tags in debugfs.)
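For the debugfs check mentioned above, something along these lines usually
works on kernels with blk-mq debugfs support (the null_blk device name and
the exact path layout are assumptions; some kernel versions expose the
per-hctx files under an mq/ subdirectory):

  # Sketch: look for outstanding tags on the backend null_blk device.
  mount -t debugfs none /sys/kernel/debug 2>/dev/null
  grep -H busy /sys/kernel/debug/block/nullb0/hctx*/tags 2>/dev/null ||
      grep -H busy /sys/kernel/debug/block/nullb0/mq/*/tags
  # busy=0 on every hardware queue means the backend has completed all I/O.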
> On 3/17/2017 8:37 PM, Gruher, Joseph R wrote:
>>
>>> -----Original Message-----
>>> From: Max Gurtovoy [mailto:maxg@mellanox.com]
>>>
>>> I think we need to add fence to the UMR wqe.
>>>
>>> diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
>>> index ad8a263..c38c4fa 100644
>>> --- a/drivers/infiniband/hw/mlx5/qp.c
>>> +++ b/drivers/infiniband/hw/mlx5/qp.c
>>> @@ -3737,8 +3737,7 @@ static void dump_wqe(struct mlx5_ib_qp *qp, int idx, int size_16)
>>>
>>>  static u8 get_fence(u8 fence, struct ib_send_wr *wr)
>>>  {
>>> -	if (unlikely(wr->opcode == IB_WR_LOCAL_INV &&
>>> -		     wr->send_flags & IB_SEND_FENCE))
>>> +	if (wr->opcode == IB_WR_LOCAL_INV || wr->opcode == IB_WR_REG_MR)
>>>  		return MLX5_FENCE_MODE_STRONG_ORDERING;
>>>
>>>  	if (unlikely(fence)) {
>>>
>>
>> You mention these patches are only for testing. How do we get to something
>> which can be submitted upstream?
>
> Yes, we need to be careful and not apply the strong fence when it's not
> a must. I'll be out for the upcoming week, but I'll ask our mlx5
> maintainers to prepare a suitable patch and check some other
> applications' performance numbers.
> Thanks for the testing; you can use this patch meanwhile, until we push
> the formal solution.

Hello Max,

We're seeing the same issue in our setup and we've been running this patch
on our system for some time already. It seems to have fixed the issue.
When there is a final patch available, we can test it too.

Thanks,
Marta
----- Mail original ----- > On 4/10/2017 2:40 PM, Marta Rybczynska wrote: >>> On 3/17/2017 8:37 PM, Gruher, Joseph R wrote: >>>> >>>> >>>>> -----Original Message----- >>>>> From: Max Gurtovoy [mailto:maxg@mellanox.com] >>>>> >>>>> I think we need to add fence to the UMR wqe. >>>>> >>>>> diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c >>>>> index ad8a263..c38c4fa 100644 >>>>> --- a/drivers/infiniband/hw/mlx5/qp.c >>>>> +++ b/drivers/infiniband/hw/mlx5/qp.c >>>>> @@ -3737,8 +3737,7 @@ static void dump_wqe(struct mlx5_ib_qp *qp, int idx, >>>>> int size_16) >>>>> >>>>> static u8 get_fence(u8 fence, struct ib_send_wr *wr) >>>>> { >>>>> - if (unlikely(wr->opcode == IB_WR_LOCAL_INV && >>>>> - wr->send_flags & IB_SEND_FENCE)) >>>>> + if (wr->opcode == IB_WR_LOCAL_INV || wr->opcode == IB_WR_REG_MR) >>>>> return MLX5_FENCE_MODE_STRONG_ORDERING; >>>>> >>>>> if (unlikely(fence)) { >>>>> >>>>> >>>> >>>> You mention these patches are only for testing. How do we get to something >>>> which can be submitted to upstream? >>> >>> Yes, we need to be careful and not put the strong_fence if it's not a must. >>> I'll be out for the upcoming week, but I'll ask our mlx5 maintainers to >>> prepare a suitable patch and check some other applications performance >>> numbers. >>> Thanks for the testing, you can use this patch meanwhile till we push >>> the formal solution. >>> >> >> Hello Max, >> We're seeing the same issue in our setup and we're running this patch >> on our system for some time already. It seems to have fixed the issue. >> When there is a final patch available, we can test it too. >> >> Thanks, >> Marta >> > > Hi Marta, > thanks for testing my patch. I'll send it early next week (holiday's in > our country) so it will be available in 4.11 kernel hopefully. > if you can share on which NIC's you tested it and the perf numbers you > get (with and without the patch), it will be great. > Hello Max, That's great news. We're ready to start testing as the final version of the patch is out. We're using ConnectX 4. Unfortunately in our use case the patch is not a question of performance: the workload doesn't work without it. We may think about running other workloads for the tests. Marta -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Max,

> Hi Marta,
> thanks for testing my patch. I'll send it early next week (holidays in
> our country) so it will hopefully be available in the 4.11 kernel.
> if you can share which NICs you tested it on and the perf numbers you
> get (with and without the patch), that would be great.

Can we get it into 4.12 please? (we need you to send it out today)
>> Can we get it into 4.12 please? (we need you to send it out today)
>
> Sagi,
> we are preparing a cap bit for it.
> I'll send it as soon as it is ready (otherwise we'd be adding a fence for
> devices that can handle things without it - such as ConnectX-5).

Why? Can't you just get what you have in as a fix and incrementally add
the optimization for CX5?
> Any update here? Would love to be able to load up the new kernels without patching them every time. :)
I would like that too,
Max, can you send a patch? or should I?
On Mon, May 15, 2017 at 03:00:07PM +0300, Sagi Grimberg wrote:
> > Any update here? Would love to be able to load up the new kernels without patching them every time. :)
>
> I would like that too,
>
> Max, can you send a patch? or should I?

Sagi,

Max is doing his best to provide a patch; unfortunately he is limited
by various architecture implications which he needs to resolve before
sending it.

Thanks
>>> Any update here? Would love to be able to load up the new kernels without patching them every time. :)
>>
>> I would like that too,
>>
>> Max, can you send a patch? or should I?
>
> Sagi,
>
> Max is doing his best to provide a patch; unfortunately he is limited
> by various architecture implications which he needs to resolve before
> sending it.

Well, he already sent a patch that fixes the issue. He said that he
needs an additional optimization for CX5 (which I assume doesn't need
the strong fence), but that still does not change the fact that CX4 is
broken. I asked to include his patch, fix the existing bug and
incrementally optimize CX5.

What do you mean by architecture implications? It's broken and there is
a request from the community to fix it. Are you suggesting that it
doesn't solve the issue?
On Mon, May 15, 2017 at 04:43:18PM +0300, Sagi Grimberg wrote: > > > > Any update here? Would love to be able to load up the new kernels without patching them every time. :) > > > > > > I would like that too, > > > > > > Max, can you send a patch? or should I? > > > > Sagi, > > > > Max is doing his best to provide a patch, unfortunately he is limited > > by various architecture implications which he needs to resolve before > > sending it. > > Well, he already sent a patch that fixes the issue. > He said that he needs additional optimization for CX5 (which I assume > doesn't need the strong fence), but that still does not change the > fact that CX4 is broken. I asked to include his patch, fix the > existing bug and incrementally optimize CX5. > > What do you mean by architecture implications? it's broken and there is > a request from the community to fix it. Are you suggesting that it > doesn't solve the issue? I understand you and both Max and me are feeling the same as you. For more than 2 months, we constantly (almost on daily basis) asked for a solution from architecture group, but received different answers. The proposals were extremely broad from need for strong fence for all cards to no need for strong fence at all. Thanks >
On Mon, May 15, 2017 at 05:36:32PM +0300, Leon Romanovsky wrote:
> I understand you, and both Max and I feel the same as you. For more than
> 2 months, we constantly (almost on a daily basis) asked for a solution
> from the architecture group, but received different answers. The
> proposals were extremely broad, from needing a strong fence for all
> cards to no need for a strong fence at all.

So let's get the patch to do a strong fence everywhere now, and relax
it later where possible.

Correctness before speed..
On Mon, May 15, 2017 at 07:59:52AM -0700, Christoph Hellwig wrote:
> On Mon, May 15, 2017 at 05:36:32PM +0300, Leon Romanovsky wrote:
> > I understand you, and both Max and I feel the same as you. For more than
> > 2 months, we constantly (almost on a daily basis) asked for a solution
> > from the architecture group, but received different answers. The
> > proposals were extremely broad, from needing a strong fence for all
> > cards to no need for a strong fence at all.
>
> So let's get the patch to do a strong fence everywhere now, and relax
> it later where possible.
>
> Correctness before speed..

OK, please give me and Max till EOW to stop this saga. One of two options:
Max will resend the original patch, or Max will send a patch blessed by
the architecture group.

Thanks
> On Mon, May 15, 2017 at 07:59:52AM -0700, Christoph Hellwig wrote: >> On Mon, May 15, 2017 at 05:36:32PM +0300, Leon Romanovsky wrote: >> > I understand you and both Max and me are feeling the same as you. For more >> > than 2 months, we constantly (almost on daily basis) asked for a solution from >> > architecture group, but received different answers. The proposals were >> > extremely broad from need for strong fence for all cards to no need for >> > strong fence at all. >> >> So let's get the patch to do a strong fence everywhere now, and relax >> it later where possible. >> >> Correntness before speed.. > > OK, please give me and Max till EOW to stop this saga. One of the two > options will be: Max will resend original patch, or Max will send patch > blessed by architecture group. > Good luck with this Max & Leon! It seems to be a complicated problem. Just an idea: in our case it *seems* that the problem started appearing after a firmware upgrade, older ones do not seem to have the same behaviour. Maybe it's a hint for you. Thanks! Marta -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 17, 2017 at 02:56:36PM +0200, Marta Rybczynska wrote: > > On Mon, May 15, 2017 at 07:59:52AM -0700, Christoph Hellwig wrote: > >> On Mon, May 15, 2017 at 05:36:32PM +0300, Leon Romanovsky wrote: > >> > I understand you and both Max and me are feeling the same as you. For more > >> > than 2 months, we constantly (almost on daily basis) asked for a solution from > >> > architecture group, but received different answers. The proposals were > >> > extremely broad from need for strong fence for all cards to no need for > >> > strong fence at all. > >> > >> So let's get the patch to do a strong fence everywhere now, and relax > >> it later where possible. > >> > >> Correntness before speed.. > > > > OK, please give me and Max till EOW to stop this saga. One of the two > > options will be: Max will resend original patch, or Max will send patch > > blessed by architecture group. > > > > Good luck with this Max & Leon! It seems to be a complicated problem. > Just an idea: in our case it *seems* that the problem started appearing > after a firmware upgrade, older ones do not seem to have the same > behaviour. Maybe it's a hint for you. OK, we came to the agreement which capability bits we should add. Max will return to the office at the middle of the next week and we will proceed with the submission of proper patch once our shared code will be accepted. In the meantime, i put the original patch to be part of our regression. https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/commit/?h=testing/queue-next&id=a40ac569f243db552661e6efad70080bb406823c Thank you for your patience. > > Thanks! > Marta > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
I ran into this with 4.9.32 when I rebooted the target. I tested
4.12-rc6 and this particular error seems to have been resolved, but I
now get a new one on the initiator. This one doesn't seem as impactful.

[Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
[Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2
[Mon Jun 19 11:17:20 2017] iser: iser_err_comp: command failure: local protection error (4) vend_err 52
[Mon Jun 19 11:17:20 2017] connection3:0: detected conn error (1011)
[Mon Jun 19 11:17:31 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
[Mon Jun 19 11:17:31 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
[Mon Jun 19 11:17:31 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:31 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:31 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:31 2017] 00000000 93005204 0a0001e7 45dd82d2
[Mon Jun 19 11:17:31 2017] iser: iser_err_comp: command failure: local protection error (4) vend_err 52
[Mon Jun 19 11:17:31 2017] connection4:0: detected conn error (1011)
[Mon Jun 19 11:17:31 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:31 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:31 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:31 2017] 00000000 93005204 0a0001f4 004915d2
[Mon Jun 19 11:17:31 2017] iser: iser_err_comp: command failure: local protection error (4) vend_err 52
[Mon Jun 19 11:17:31 2017] connection3:0: detected conn error (1011)
[Mon Jun 19 11:17:44 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
[Mon Jun 19 11:17:44 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:44 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:44 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:17:44 2017] 00000000 93005204 0a0001f6 004519d2
[Mon Jun 19 11:17:44 2017] iser: iser_err_comp: command failure: local protection error (4) vend_err 52
[Mon Jun 19 11:17:44 2017] connection3:0: detected conn error (1011)
[Mon Jun 19 11:18:55 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
[Mon Jun 19 11:18:55 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:18:55 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:18:55 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:18:55 2017] 00000000 93005204 0a0001f7 01934fd2
[Mon Jun 19 11:18:55 2017] iser: iser_err_comp: command failure: local protection error (4) vend_err 52
[Mon Jun 19 11:18:55 2017] connection3:0: detected conn error (1011)
[Mon Jun 19 11:20:25 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
[Mon Jun 19 11:20:25 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:20:25 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:20:25 2017] 00000000 00000000 00000000 00000000
[Mon Jun 19 11:20:25 2017] 00000000 93005204 0a0001f8 0274edd2
[Mon Jun 19 11:20:25 2017] iser: iser_err_comp: command failure: local protection error (4) vend_err 52
[Mon Jun 19 11:20:25 2017] connection3:0: detected conn error (1011)

I'm going to try to cherry-pick the fix to 4.9.x and do some testing there.
Thanks, ---------------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, May 18, 2017 at 7:34 AM, Leon Romanovsky <leon@kernel.org> wrote: > On Wed, May 17, 2017 at 02:56:36PM +0200, Marta Rybczynska wrote: >> > On Mon, May 15, 2017 at 07:59:52AM -0700, Christoph Hellwig wrote: >> >> On Mon, May 15, 2017 at 05:36:32PM +0300, Leon Romanovsky wrote: >> >> > I understand you and both Max and me are feeling the same as you. For more >> >> > than 2 months, we constantly (almost on daily basis) asked for a solution from >> >> > architecture group, but received different answers. The proposals were >> >> > extremely broad from need for strong fence for all cards to no need for >> >> > strong fence at all. >> >> >> >> So let's get the patch to do a strong fence everywhere now, and relax >> >> it later where possible. >> >> >> >> Correntness before speed.. >> > >> > OK, please give me and Max till EOW to stop this saga. One of the two >> > options will be: Max will resend original patch, or Max will send patch >> > blessed by architecture group. >> > >> >> Good luck with this Max & Leon! It seems to be a complicated problem. >> Just an idea: in our case it *seems* that the problem started appearing >> after a firmware upgrade, older ones do not seem to have the same >> behaviour. Maybe it's a hint for you. > > OK, we came to the agreement which capability bits we should add. Max > will return to the office at the middle of the next week and we will > proceed with the submission of proper patch once our shared code will > be accepted. > > In the meantime, i put the original patch to be part of our regression. > https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/commit/?h=testing/queue-next&id=a40ac569f243db552661e6efad70080bb406823c > > Thank you for your patience. > >> >> Thanks! >> Marta >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Robert,

> I ran into this with 4.9.32 when I rebooted the target. I tested
> 4.12-rc6 and this particular error seems to have been resolved, but I
> now get a new one on the initiator. This one doesn't seem as
> impactful.
>
> [Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
> [Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2

Max, Leon,

Care to parse this syndrome for us? ;)
On Tue, Jun 20, 2017 at 09:39:36AM +0300, Sagi Grimberg wrote:
> Hi Robert,
>
> > I ran into this with 4.9.32 when I rebooted the target. I tested
> > 4.12-rc6 and this particular error seems to have been resolved, but I
> > now get a new one on the initiator. This one doesn't seem as
> > impactful.
> >
> > [Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
> > [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
> > [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
> > [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
> > [Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2
>
> Max, Leon,
>
> Care to parse this syndrome for us? ;)

Here is the parsed output; it says that it was an access to an mkey which
is free.

======== cqe_with_error ========
wqe_id                : 0x0
srqn_usr_index        : 0x0
byte_cnt              : 0x0
hw_error_syndrome     : 0x93
hw_syndrome_type      : 0x0
vendor_error_syndrome : 0x52
syndrome              : LOCAL_PROTECTION_ERROR (0x4)
s_wqe_opcode          : SEND (0xa)
qpn_dctn_flow_tag     : 0x1bd
wqe_counter           : 0x45c8
signature             : 0xe0
opcode                : REQUESTOR_ERROR (0xd)
cqe_format            : NO_INLINE_DATA (0x0)
owner                 : 0x0

Thanks
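For reference, the non-zero dwords of the raw dump line up with the parsed
fields like this; a small bash illustration, with the byte offsets inferred
purely from matching the dump against the parsed output above, not from any
official CQE definition:

  # Illustration only: map the last dwords of the dump to the parsed fields.
  cqe="93005204 0a0001bd 45c8e0d2"
  set -- $cqe
  echo "hw_error_syndrome     : 0x${1:0:2}"   # 0x93
  echo "vendor_error_syndrome : 0x${1:4:2}"   # 0x52
  echo "syndrome              : 0x${1:6:2}"   # 0x04 -> LOCAL_PROTECTION_ERROR
  echo "s_wqe_opcode          : 0x${2:0:2}"   # 0x0a -> SEND
  echo "qpn_dctn_flow_tag     : 0x${2:2:6}"   # 0x0001bd
  echo "wqe_counter           : 0x${3:0:4}"   # 0x45c8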
>> Hi Robert, >> >>> I ran into this with 4.9.32 when I rebooted the target. I tested >>> 4.12-rc6 and this particular error seems to have been resolved, but I >>> now get a new one on the initiator. This one doesn't seem as >>> impactful. >>> >>> [Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe >>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000 >>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000 >>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000 >>> [Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2 >> >> Max, Leon, >> >> Care to parse this syndrome for us? ;) > > Here the parsed output, it says that it was access to mkey which is > free. > > ======== cqe_with_error ======== > wqe_id : 0x0 > srqn_usr_index : 0x0 > byte_cnt : 0x0 > hw_error_syndrome : 0x93 > hw_syndrome_type : 0x0 > vendor_error_syndrome : 0x52 Can you share the check that correlates to the vendor+hw syndrome? > syndrome : LOCAL_PROTECTION_ERROR (0x4) > s_wqe_opcode : SEND (0xa) That's interesting, the opcode is a send operation. I'm assuming that this is immediate-data write? Robert, did this happen when you issued >4k writes to the target? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 20, 2017 at 10:58:47AM +0300, Sagi Grimberg wrote: > > > > Hi Robert, > > > > > > > I ran into this with 4.9.32 when I rebooted the target. I tested > > > > 4.12-rc6 and this particular error seems to have been resolved, but I > > > > now get a new one on the initiator. This one doesn't seem as > > > > impactful. > > > > > > > > [Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe > > > > [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000 > > > > [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000 > > > > [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000 > > > > [Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2 > > > > > > Max, Leon, > > > > > > Care to parse this syndrome for us? ;) > > > > Here the parsed output, it says that it was access to mkey which is > > free. > > > > ======== cqe_with_error ======== > > wqe_id : 0x0 > > srqn_usr_index : 0x0 > > byte_cnt : 0x0 > > hw_error_syndrome : 0x93 > > hw_syndrome_type : 0x0 > > vendor_error_syndrome : 0x52 > > Can you share the check that correlates to the vendor+hw syndrome? mkey.free == 1 > > > syndrome : LOCAL_PROTECTION_ERROR (0x4) > > s_wqe_opcode : SEND (0xa) > > That's interesting, the opcode is a send operation. I'm assuming > that this is immediate-data write? Robert, did this happen when > you issued >4k writes to the target?
>>> Here is the parsed output; it says that it was an access to an mkey
>>> which is free.

Missed that :)

>>> ======== cqe_with_error ========
>>> wqe_id                : 0x0
>>> srqn_usr_index        : 0x0
>>> byte_cnt              : 0x0
>>> hw_error_syndrome     : 0x93
>>> hw_syndrome_type      : 0x0
>>> vendor_error_syndrome : 0x52
>>
>> Can you share the check that correlates to the vendor+hw syndrome?
>
> mkey.free == 1

Hmm, the way I understand it is that the HW is trying to access
(locally via send) an MR which was already invalidated.

Thinking of this further, this can happen in a case where the target
already completed the transaction and sent SEND_WITH_INVALIDATE, but the
original send ack was lost somewhere, causing the device to retransmit
from the MR (which was already invalidated). This is highly unlikely
though.

Shouldn't this be protected somehow by the device?
Can someone explain why the above cannot happen? Jason? Liran? Anyone?

Say the host registers MR (a) and sends (1) from that MR to a target,
the send (1) ack gets lost, and the target issues SEND_WITH_INVALIDATE
on MR (a) which the host HCA processes; then the host HCA times out on
send (1) so it retries, but ehh, it's already invalidated.

Or, we can also have a race where we destroy all our MRs while I/O
is still running (but from the code we should be safe here).

Robert, when you rebooted the target, I assume the iSCSI ping
timeout expired and the connection teardown started, correct?
On Tue, Jun 20, 2017 at 1:58 AM, Sagi Grimberg <sagi@grimberg.me> wrote: > >>> Hi Robert, >>> >>>> I ran into this with 4.9.32 when I rebooted the target. I tested >>>> 4.12-rc6 and this particular error seems to have been resolved, but I >>>> now get a new one on the initiator. This one doesn't seem as >>>> impactful. >>>> >>>> [Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe >>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000 >>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000 >>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000 >>>> [Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2 >>> >>> >>> Max, Leon, >>> >>> Care to parse this syndrome for us? ;) >> >> >> Here the parsed output, it says that it was access to mkey which is >> free. >> >> ======== cqe_with_error ======== >> wqe_id : 0x0 >> srqn_usr_index : 0x0 >> byte_cnt : 0x0 >> hw_error_syndrome : 0x93 >> hw_syndrome_type : 0x0 >> vendor_error_syndrome : 0x52 > > > Can you share the check that correlates to the vendor+hw syndrome? > >> syndrome : LOCAL_PROTECTION_ERROR (0x4) >> s_wqe_opcode : SEND (0xa) > > > That's interesting, the opcode is a send operation. I'm assuming > that this is immediate-data write? Robert, did this happen when > you issued >4k writes to the target? I was running dd with oflag=direct, so yes. ---------------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 20, 2017 at 3:33 AM, Sagi Grimberg <sagi@grimberg.me> wrote: > >>>> Here the parsed output, it says that it was access to mkey which is >>>> free. > > > Missed that :) > >>>> ======== cqe_with_error ======== >>>> wqe_id : 0x0 >>>> srqn_usr_index : 0x0 >>>> byte_cnt : 0x0 >>>> hw_error_syndrome : 0x93 >>>> hw_syndrome_type : 0x0 >>>> vendor_error_syndrome : 0x52 >>> >>> >>> Can you share the check that correlates to the vendor+hw syndrome? >> >> >> mkey.free == 1 > > > Hmm, the way I understand it is that the HW is trying to access > (locally via send) a MR which was already invalidated. > > Thinking of this further, this can happen in a case where the target > already completed the transaction, sent SEND_WITH_INVALIDATE but the > original send ack was lost somewhere causing the device to retransmit > from the MR (which was already invalidated). This is highly unlikely > though. > > Shouldn't this be protected somehow by the device? > Can someone explain why the above cannot happen? Jason? Liran? Anyone? > > Say host register MR (a) and send (1) from that MR to a target, > send (1) ack got lost, and the target issues SEND_WITH_INVALIDATE > on MR (a) and the host HCA process it, then host HCA timeout on send (1) > so it retries, but ehh, its already invalidated. > > Or, we can also have a race where we destroy all our MRs when I/O > is still running (but from the code we should be safe here). > > Robert, when you rebooted the target, I assume iscsi ping > timeout expired and the connection teardown started correct? I do remember that the ping timed out and the connection was torn down according to the messages. ---------------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 20, 2017 at 4:31 AM, Max Gurtovoy <maxg@mellanox.com> wrote:
>
> On 6/20/2017 12:33 PM, Sagi Grimberg wrote:
>>
>>>>> Here is the parsed output; it says that it was an access to an mkey
>>>>> which is free.
>>
>> Missed that :)
>>
>>>>> ======== cqe_with_error ========
>>>>> wqe_id                : 0x0
>>>>> srqn_usr_index        : 0x0
>>>>> byte_cnt              : 0x0
>>>>> hw_error_syndrome     : 0x93
>>>>> hw_syndrome_type      : 0x0
>>>>> vendor_error_syndrome : 0x52
>>>>
>>>> Can you share the check that correlates to the vendor+hw syndrome?
>>>
>>> mkey.free == 1
>>
>> Hmm, the way I understand it is that the HW is trying to access
>> (locally via send) an MR which was already invalidated.
>>
>> Thinking of this further, this can happen in a case where the target
>> already completed the transaction and sent SEND_WITH_INVALIDATE, but
>> the original send ack was lost somewhere, causing the device to
>> retransmit from the MR (which was already invalidated). This is highly
>> unlikely though.
>>
>> Shouldn't this be protected somehow by the device?
>> Can someone explain why the above cannot happen? Jason? Liran? Anyone?
>>
>> Say the host registers MR (a) and sends (1) from that MR to a target,
>> the send (1) ack gets lost, and the target issues SEND_WITH_INVALIDATE
>> on MR (a) which the host HCA processes; then the host HCA times out on
>> send (1) so it retries, but ehh, it's already invalidated.
>
> This might happen IMO.
> Robert, can you test this untested patch (this is not the full solution,
> just something to think about):
>
> diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
> index c538a38..e93bd40 100644
> --- a/drivers/infiniband/ulp/iser/iser_verbs.c
> +++ b/drivers/infiniband/ulp/iser/iser_verbs.c
> @@ -1079,7 +1079,7 @@ int iser_post_send(struct ib_conn *ib_conn, struct iser_tx_desc *tx_desc,
>  	wr->sg_list = tx_desc->tx_sg;
>  	wr->num_sge = tx_desc->num_sge;
>  	wr->opcode = IB_WR_SEND;
> -	wr->send_flags = signal ? IB_SEND_SIGNALED : 0;
> +	wr->send_flags = IB_SEND_SIGNALED;
>
>  	ib_ret = ib_post_send(ib_conn->qp, &tx_desc->wrs[0].send, &bad_wr);
>  	if (ib_ret)
>
>> Or, we can also have a race where we destroy all our MRs while I/O
>> is still running (but from the code we should be safe here).
>>
>> Robert, when you rebooted the target, I assume the iSCSI ping
>> timeout expired and the connection teardown started, correct?

I still get the local protection errors with this patch. I am seeing a
ping timeout on the initiator when I reboot the target.

[Tue Jun 20 16:41:21 2017] connection7:0: detected conn error (1011)
[Tue Jun 20 16:41:26 2017] session7: session recovery timed out after 5 secs

Since I'm gracefully shutting down the targets in this case, shouldn't
the connection be closed gracefully by the target instead of the
initiator having to wait for the ping to fail?

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
> I still get the local protection errors with this patch. I am seeing a
> ping timeout on the initiator when I reboot the target.
> [Tue Jun 20 16:41:21 2017] connection7:0: detected conn error (1011)
> [Tue Jun 20 16:41:26 2017] session7: session recovery timed out after 5 secs

Not a big surprise, as it's not really addressing the issue...

> Since I'm gracefully shutting down the targets in this case, shouldn't
> the connection be closed gracefully by the target instead of the
> initiator having to wait for ping to fail?

Not really. Even in an orderly shutdown, the device driver (mlx5 in this
case) shutdown sequence is triggered before ib_isert and fires
DEVICE_REMOVAL events to all its upper layer users (ib_isert being one of
them), which forces resource teardown (no disconnect).

We could register a shutdown handler in ib_isert, but it's not really its
responsibility as a transport driver..

It would be nice if we had targetcli daemonized as a service that
registers a shutdown notification, orderly removes and saves the existing
configuration before the kernel even sees it. But that's a different
scope really...

CC'ing target-devel
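As a data point on the orderly-shutdown side: distributions that package
targetcli-fb typically ship a target.service unit that restores the saved
configuration on boot and clears it on shutdown, which gives an ordered
teardown before the NIC driver goes away. Roughly (the unit and helper names
are the stock targetcli-fb/rtslib-fb ones; verify against your distro):

  # Sketch: persist the running LIO config and let the packaged unit
  # handle teardown ordering at shutdown.
  targetcli saveconfig              # writes /etc/target/saveconfig.json
  systemctl enable --now target     # ExecStart: targetctl restore, ExecStop: targetctl clear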
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index ad8a2638e339..0f9a12570262 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3893,7 +3893,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			goto out;

 		case IB_WR_LOCAL_INV:
-			next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
+			next_fence = MLX5_FENCE_MODE_STRONG_ORDERING;