Kernel NULL pointer observed on initiator side after 'nvmetcli clear' on target side

Message ID 644fc4ab-df6b-a337-1431-bad881ef56ee@grimberg.me (mailing list archive)
State Not Applicable

Commit Message

Sagi Grimberg March 6, 2017, 11:25 a.m. UTC
> Hi experts
>
> If I offline one CPU on the initiator side and then run 'nvmetcli clear' on the target side, the initiator hits a kernel NULL pointer dereference. Could you help check it? Thanks.
>
> Steps to reproduce:
> 1. setup nvmet target with null-blk device:
> #modprobe nvmet
> #modprobe nvmet-rdma
> #modprobe null_blk nr_devices=1
> #nvmetcli restore rdma.json
>
> 2. connect the target on initiator side and offline one cpu:
> #modprobe nvme-rdma
> #nvme connect-all -t rdma -a 172.31.2.3 -s 1023
> #echo 0 > /sys/devices/system/cpu/cpu1/online
>
> 3. nvmetcli clear on target side
> #nvmetcli clear
>
> Kernel log:
>
> [  125.039340] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.2.3:1023
> [  125.160587] nvme nvme0: creating 16 I/O queues.
> [  125.602244] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.2.3:1023
> [  140.930343] Broke affinity for irq 16
> [  140.950295] Broke affinity for irq 28
> [  140.969957] Broke affinity for irq 70
> [  140.986584] Broke affinity for irq 90
> [  141.003160] Broke affinity for irq 93
> [  141.019779] Broke affinity for irq 97
> [  141.036341] Broke affinity for irq 100
> [  141.053782] Broke affinity for irq 104
> [  141.072860] smpboot: CPU 1 is now offline
> [  154.768104] nvme nvme0: reconnecting in 10 seconds
> [  165.349689] BUG: unable to handle kernel NULL pointer dereference at           (null)
> [  165.387783] IP: blk_mq_reinit_tagset+0x35/0x80

Looks like blk_mq_reinit_tagset is not aware that tags can go away with
cpu hotplug...

Does this fix your issue:
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Yi Zhang March 9, 2017, 4:02 a.m. UTC | #1
> Looks like blk_mq_reinit_tagset is not aware that tags can go away with
> cpu hotplug...
>
> Does this fix your issue:
> -- 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index e48bc2c72615..9d97bfc4d465 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -295,6 +295,9 @@ int blk_mq_reinit_tagset(struct blk_mq_tag_set *set)
>         for (i = 0; i < set->nr_hw_queues; i++) {
>                 struct blk_mq_tags *tags = set->tags[i];
>
> +               if (!tags)
> +                       continue;
> +
>                 for (j = 0; j < tags->nr_tags; j++) {
>                         if (!tags->static_rqs[j])
>                                 continue;
> -- 
Hi Sagi
With this patch, the NULL pointer dereference is fixed.
But the log below shows that the initiator keeps retrying the reconnect
every 10 seconds, and it cannot be stopped.

[36288.963890] Broke affinity for irq 16
[36288.983090] Broke affinity for irq 28
[36289.003104] Broke affinity for irq 90
[36289.020488] Broke affinity for irq 93
[36289.036911] Broke affinity for irq 97
[36289.053344] Broke affinity for irq 100
[36289.070166] Broke affinity for irq 104
[36289.088076] smpboot: CPU 1 is now offline
[36302.371160] nvme nvme0: reconnecting in 10 seconds
[36312.953684] blk_mq_reinit_tagset: tag is null, continue
[36312.983267] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36313.017290] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36313.044937] nvme nvme0: Failed reconnect attempt, requeueing...
[36323.171983] blk_mq_reinit_tagset: tag is null, continue
[36323.200733] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36323.233820] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36323.261027] nvme nvme0: Failed reconnect attempt, requeueing...
[36333.412341] blk_mq_reinit_tagset: tag is null, continue
[36333.441346] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36333.476139] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36333.502794] nvme nvme0: Failed reconnect attempt, requeueing...
[36343.652755] blk_mq_reinit_tagset: tag is null, continue
[36343.682103] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36343.716645] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36343.743581] nvme nvme0: Failed reconnect attempt, requeueing...
[36353.893103] blk_mq_reinit_tagset: tag is null, continue
[36353.921041] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36353.953541] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36353.983528] nvme nvme0: Failed reconnect attempt, requeueing...
[36364.133544] blk_mq_reinit_tagset: tag is null, continue
[36364.162012] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[36364.195002] nvme nvme0: rdma_resolve_addr wait failed (-104).
[36364.221671] nvme nvme0: Failed reconnect attempt, requeueing...


Patch

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index e48bc2c72615..9d97bfc4d465 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -295,6 +295,9 @@  int blk_mq_reinit_tagset(struct blk_mq_tag_set *set)
         for (i = 0; i < set->nr_hw_queues; i++) {
                 struct blk_mq_tags *tags = set->tags[i];

+               if (!tags)
+                       continue;
+
                 for (j = 0; j < tags->nr_tags; j++) {
                         if (!tags->static_rqs[j])
                                 continue;
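
The guard added by this patch can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: the struct definitions are simplified stand-ins that keep only the fields the loop touches, and `reinit_count`, `reinit_tagset()` and `demo_reinit_tagset()` are hypothetical names introduced for illustration. It models a tag set in which one hardware queue has lost its tags (as the thread suggests can happen after a CPU goes offline) and shows the loop skipping that entry instead of dereferencing NULL.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields used by the loop are modelled; these are not the real layouts). */
struct request {
	int reinit_count;	/* hypothetical: counts reinit calls */
};

struct blk_mq_tags {
	unsigned int nr_tags;
	struct request **static_rqs;
};

struct blk_mq_tag_set {
	unsigned int nr_hw_queues;
	struct blk_mq_tags **tags;
};

/* Walk every hardware queue and "reinit" each allocated request.
 * Returns the number of requests that were visited. */
static int reinit_tagset(struct blk_mq_tag_set *set)
{
	unsigned int i, j;
	int visited = 0;

	for (i = 0; i < set->nr_hw_queues; i++) {
		struct blk_mq_tags *tags = set->tags[i];

		/* The check from the patch: this queue's tags may be gone. */
		if (!tags)
			continue;

		for (j = 0; j < tags->nr_tags; j++) {
			if (!tags->static_rqs[j])
				continue;
			tags->static_rqs[j]->reinit_count++;
			visited++;
		}
	}
	return visited;
}

/* Build a three-queue set whose middle queue has no tags, then run the
 * loop over it. Without the NULL check this would crash on queue 1. */
static int demo_reinit_tagset(void)
{
	struct request rq_a = { 0 }, rq_b = { 0 };
	struct request *rqs0[] = { &rq_a, NULL };	/* one empty slot */
	struct request *rqs2[] = { &rq_b };
	struct blk_mq_tags t0 = { 2, rqs0 };
	struct blk_mq_tags t2 = { 1, rqs2 };
	struct blk_mq_tags *tags[] = { &t0, NULL, &t2 };
	struct blk_mq_tag_set set = { 3, tags };

	return reinit_tagset(&set);
}
```

Calling `demo_reinit_tagset()` from a small `main()` visits the two allocated requests and skips both the empty slot and the queue whose `tags` pointer is NULL. The sketch only illustrates the missing-tags case; it says nothing about the reconnect loop reported in comment #1.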