dm: use noio when sending kobject event

Message ID alpine.LRH.2.02.2007081222500.4103@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive)
State Accepted, archived
Delegated to: Mike Snitzer
Series: dm: use noio when sending kobject event

Commit Message

Mikulas Patocka July 8, 2020, 4:25 p.m. UTC
kobject_uevent may allocate memory and it may be called while there are dm
devices suspended. The allocation may recurse into a suspended device,
causing a deadlock. We must set the noio flag when sending a uevent.

This is the observed deadlock:

iSCSI-write
     holding: rx_queue_mutex
     waiting: uevent_sock_mutex

     kobject_uevent_env+0x1bd/0x419
     kobject_uevent+0xb/0xd
     device_add+0x48a/0x678
     scsi_add_host_with_dma+0xc5/0x22d
     iscsi_host_add+0x53/0x55
     iscsi_sw_tcp_session_create+0xa6/0x129
     iscsi_if_rx+0x100/0x1247
     netlink_unicast+0x213/0x4f0
     netlink_sendmsg+0x230/0x3c0

iscsi_fail iscsi_conn_failure
     waiting: rx_queue_mutex

     schedule_preempt_disabled+0x325/0x734
     __mutex_lock_slowpath+0x18b/0x230
     mutex_lock+0x22/0x40
     iscsi_conn_failure+0x42/0x149
     worker_thread+0x24a/0xbc0

EventManager_
     holding: uevent_sock_mutex
     waiting: dm_bufio_client->lock

     dm_bufio_lock+0xe/0x10
     shrink+0x34/0xf7
     shrink_slab+0x177/0x5d0
     do_try_to_free_pages+0x129/0x470
     try_to_free_mem_cgroup_pages+0x14f/0x210
     memcg_kmem_newpage_charge+0xa6d/0x13b0
     __alloc_pages_nodemask+0x4a3/0x1a70
     fallback_alloc+0x1b2/0x36c
     __kmalloc_node_track_caller+0xb9/0x10d0
     __alloc_skb+0x83/0x2f0
     kobject_uevent_env+0x26b/0x419
     dm_kobject_uevent+0x70/0x79
     dev_suspend+0x1a9/0x1e7
     ctl_ioctl+0x3e9/0x411
     dm_ctl_ioctl+0x13/0x17
     do_vfs_ioctl+0xb3/0x460
     SyS_ioctl+0x5e/0x90

MemcgReclaimerD
     holding: dm_bufio_client->lock
     waiting: stuck io to finish (needs iscsi_fail thread to progress)

     schedule at ffffffffbd603618
     io_schedule at ffffffffbd603ba4
     do_io_schedule at ffffffffbdaf0d94
     __wait_on_bit at ffffffffbd6008a6
     out_of_line_wait_on_bit at ffffffffbd600960
     wait_on_bit.constprop.10 at ffffffffbdaf0f17
     __make_buffer_clean at ffffffffbdaf18ba
     __cleanup_old_buffer at ffffffffbdaf192f
     shrink at ffffffffbdaf19fd
     do_shrink_slab at ffffffffbd6ec000
     shrink_slab at ffffffffbd6ec24a
     do_try_to_free_pages at ffffffffbd6eda09
     try_to_free_mem_cgroup_pages at ffffffffbd6ede7e
     mem_cgroup_resize_limit at ffffffffbd7024c0
     mem_cgroup_write at ffffffffbd703149
     cgroup_file_write at ffffffffbd6d9c6e
     sys_write at ffffffffbd6662ea
     system_call_fastpath at ffffffffbdbc34a2


Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Khazhismel Kumykov <khazhy@google.com>
Reported-by: Tahsin Erdogan <tahsin@google.com>
Reported-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: stable@vger.kernel.org

---
 drivers/md/dm.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Mike Snitzer July 8, 2020, 5:33 p.m. UTC | #1
On Wed, Jul 08 2020 at  2:26pm -0400,
Gabriel Krisman Bertazi <krisman@collabora.com> wrote:

> Mikulas Patocka <mpatocka@redhat.com> writes:
> 
> > kobject_uevent may allocate memory and it may be called while there are dm
> > devices suspended. The allocation may recurse into a suspended device,
> > causing a deadlock. We must set the noio flag when sending a uevent.
> 
> If I understand it correctly, considering the deadlock you shared, this
> doesn't solve the entire issue. For instance, kobject_uevent_env on the
> GFP_NOIO thread waits on uevent_sock_mutex, and another thread with
> GFP_IO holding the mutex might have triggered the shrinker from inside
> kobject_uevent_net_broadcast.  I believe 7e7cd796f277 ("scsi: iscsi: Fix
> deadlock on recovery path during GFP_IO reclaim") solved the one you
> shared and other similar cases for iSCSI in a different way.

I staged a different fix, from Mikulas, for 5.9 that is meant to address
the original report, please see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.9&id=e5bfe9baf23dca211f4b794b651e871032c427ec

I'd appreciate it if you could try this commit to see if it fixes the
original issue you reported.

> I know this is similar to the log I shared on an earlier patch. Are you
> able to reproduce the deadlock with the above patch applied?

Mikulas seized on the fact that the backtrace shows the uevent upcall
to have occurred while suspending.  I know he didn't reproduce your
issue.

> That said, I think this patch is an improvement as we shouldn't be using
> GFP_IO in this path to begin with, so please add:
> 
> Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>

FYI, while I do appreciate your Reviewed-by, I already staged this for
5.8 so I'd rather not rebase to add your Reviewed-by, see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.8&id=6958c1c640af8c3f40fa8a2eee3b5b905d95b677

Thanks,
Mike

Gabriel Krisman Bertazi July 8, 2020, 6:26 p.m. UTC | #2
Mikulas Patocka <mpatocka@redhat.com> writes:

> kobject_uevent may allocate memory and it may be called while there are dm
> devices suspended. The allocation may recurse into a suspended device,
> causing a deadlock. We must set the noio flag when sending a uevent.

If I understand it correctly, considering the deadlock you shared, this
doesn't solve the entire issue. For instance, kobject_uevent_env on the
GFP_NOIO thread waits on uevent_sock_mutex, and another thread with
GFP_IO holding the mutex might have triggered the shrinker from inside
kobject_uevent_net_broadcast.  I believe 7e7cd796f277 ("scsi: iscsi: Fix
deadlock on recovery path during GFP_IO reclaim") solved the one you
shared and other similar cases for iSCSI in a different way.

I know this is similar to the log I shared on an earlier patch. Are you
able to reproduce the deadlock with the above patch applied?

That said, I think this patch is an improvement as we shouldn't be using
GFP_IO in this path to begin with, so please add:

Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>


> This is the observed deadlock:
>
> iSCSI-write
>      holding: rx_queue_mutex
>      waiting: uevent_sock_mutex
>
>      kobject_uevent_env+0x1bd/0x419
>      kobject_uevent+0xb/0xd
>      device_add+0x48a/0x678
>      scsi_add_host_with_dma+0xc5/0x22d
>      iscsi_host_add+0x53/0x55
>      iscsi_sw_tcp_session_create+0xa6/0x129
>      iscsi_if_rx+0x100/0x1247
>      netlink_unicast+0x213/0x4f0
>      netlink_sendmsg+0x230/0x3c0
>
> iscsi_fail iscsi_conn_failure
>      waiting: rx_queue_mutex
>
>      schedule_preempt_disabled+0x325/0x734
>      __mutex_lock_slowpath+0x18b/0x230
>      mutex_lock+0x22/0x40
>      iscsi_conn_failure+0x42/0x149
>      worker_thread+0x24a/0xbc0
>
> EventManager_
>      holding: uevent_sock_mutex
>      waiting: dm_bufio_client->lock
>
>      dm_bufio_lock+0xe/0x10
>      shrink+0x34/0xf7
>      shrink_slab+0x177/0x5d0
>      do_try_to_free_pages+0x129/0x470
>      try_to_free_mem_cgroup_pages+0x14f/0x210
>      memcg_kmem_newpage_charge+0xa6d/0x13b0
>      __alloc_pages_nodemask+0x4a3/0x1a70
>      fallback_alloc+0x1b2/0x36c
>      __kmalloc_node_track_caller+0xb9/0x10d0
>      __alloc_skb+0x83/0x2f0
>      kobject_uevent_env+0x26b/0x419
>      dm_kobject_uevent+0x70/0x79
>      dev_suspend+0x1a9/0x1e7
>      ctl_ioctl+0x3e9/0x411
>      dm_ctl_ioctl+0x13/0x17
>      do_vfs_ioctl+0xb3/0x460
>      SyS_ioctl+0x5e/0x90
>
> MemcgReclaimerD
>      holding: dm_bufio_client->lock
>      waiting: stuck io to finish (needs iscsi_fail thread to progress)
>
>      schedule at ffffffffbd603618
>      io_schedule at ffffffffbd603ba4
>      do_io_schedule at ffffffffbdaf0d94
>      __wait_on_bit at ffffffffbd6008a6
>      out_of_line_wait_on_bit at ffffffffbd600960
>      wait_on_bit.constprop.10 at ffffffffbdaf0f17
>      __make_buffer_clean at ffffffffbdaf18ba
>      __cleanup_old_buffer at ffffffffbdaf192f
>      shrink at ffffffffbdaf19fd
>      do_shrink_slab at ffffffffbd6ec000
>      shrink_slab at ffffffffbd6ec24a
>      do_try_to_free_pages at ffffffffbd6eda09
>      try_to_free_mem_cgroup_pages at ffffffffbd6ede7e
>      mem_cgroup_resize_limit at ffffffffbd7024c0
>      mem_cgroup_write at ffffffffbd703149
>      cgroup_file_write at ffffffffbd6d9c6e
>      sys_write at ffffffffbd6662ea
>      system_call_fastpath at ffffffffbdbc34a2
>
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Reported-by: Khazhismel Kumykov <khazhy@google.com>
> Reported-by: Tahsin Erdogan <tahsin@google.com>
> Reported-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Cc: stable@vger.kernel.org
>
> ---
>  drivers/md/dm.c |   15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> Index: linux-2.6/drivers/md/dm.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm.c	2020-06-29 14:50:22.000000000 +0200
> +++ linux-2.6/drivers/md/dm.c	2020-07-08 18:17:55.000000000 +0200
> @@ -12,6 +12,7 @@
>  #include <linux/init.h>
>  #include <linux/module.h>
>  #include <linux/mutex.h>
> +#include <linux/sched/mm.h>
>  #include <linux/sched/signal.h>
>  #include <linux/blkpg.h>
>  #include <linux/bio.h>
> @@ -2926,17 +2927,25 @@ EXPORT_SYMBOL_GPL(dm_internal_resume_fas
>  int dm_kobject_uevent(struct mapped_device *md, enum kobject_action action,
>  		       unsigned cookie)
>  {
> +	int r;
> +	unsigned noio_flag;
>  	char udev_cookie[DM_COOKIE_LENGTH];
>  	char *envp[] = { udev_cookie, NULL };
>  
> +	noio_flag = memalloc_noio_save();
> +
>  	if (!cookie)
> -		return kobject_uevent(&disk_to_dev(md->disk)->kobj, action);
> +		r = kobject_uevent(&disk_to_dev(md->disk)->kobj, action);
>  	else {
>  		snprintf(udev_cookie, DM_COOKIE_LENGTH, "%s=%u",
>  			 DM_COOKIE_ENV_VAR_NAME, cookie);
> -		return kobject_uevent_env(&disk_to_dev(md->disk)->kobj,
> -					  action, envp);
> +		r = kobject_uevent_env(&disk_to_dev(md->disk)->kobj,
> +				       action, envp);
>  	}
> +
> +	memalloc_noio_restore(noio_flag);
> +
> +	return r;
>  }
>  
>  uint32_t dm_next_uevent_seq(struct mapped_device *md)
>
Gabriel Krisman Bertazi July 8, 2020, 7:37 p.m. UTC | #3
Mike Snitzer <snitzer@redhat.com> writes:

> On Wed, Jul 08 2020 at  2:26pm -0400,
> Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
>
>> If I understand it correctly, considering the deadlock you shared, this
>> doesn't solve the entire issue. For instance, kobject_uevent_env on the
>> GFP_NOIO thread waits on uevent_sock_mutex, and another thread with
>> GFP_IO holding the mutex might have triggered the shrinker from inside
>> kobject_uevent_net_broadcast.  I believe 7e7cd796f277 ("scsi: iscsi: Fix
>> deadlock on recovery path during GFP_IO reclaim") solved the one you
>> shared and other similar cases for iSCSI in a different way.
>
> I staged a different fix, from Mikulas, for 5.9 that is meant to address
> the original report, please see:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.9&id=e5bfe9baf23dca211f4b794b651e871032c427ec
>
> I'd appreciate it if you could try this commit to see if it fixes the
> original issue you reported.

I reverted 7e7cd796f277 and cherry-picked e5bfe9baf23dc on my tree.
After a few iterations, I could see the conditions that formerly
triggered the deadlock happening, but this patch successfully allowed
the reclaim to succeed and the iscsi recovery thread to run.

My reproducer is a bit artificial, as I wrote it from only the
problem description provided by Google.  They were hitting this in
production and might have a better final word on the fix, though I know
they don't have a simple way to reproduce it.

>> That said, I think this patch is an improvement as we shouldn't be using
>> GFP_IO in this path to begin with, so please add:
>> 
>> Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
>
> FYI, while I do appreciate your Reviewed-by, I already staged this for
> 5.8 so I'd rather not rebase to add your Reviewed-by, see:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.8&id=6958c1c640af8c3f40fa8a2eee3b5b905d95b677

No worries.  Actually, thank you guys for helping with this issue.
Mike Snitzer July 10, 2020, 6:22 p.m. UTC | #4
On Wed, Jul 08 2020 at  3:37pm -0400,
Gabriel Krisman Bertazi <krisman@collabora.com> wrote:

> Mike Snitzer <snitzer@redhat.com> writes:
> 
> > On Wed, Jul 08 2020 at  2:26pm -0400,
> > Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
> >
> >> If I understand it correctly, considering the deadlock you shared, this
> >> doesn't solve the entire issue. For instance, kobject_uevent_env on the
> >> GFP_NOIO thread waits on uevent_sock_mutex, and another thread with
> >> GFP_IO holding the mutex might have triggered the shrinker from inside
> >> kobject_uevent_net_broadcast.  I believe 7e7cd796f277 ("scsi: iscsi: Fix
> >> deadlock on recovery path during GFP_IO reclaim") solved the one you
> >> shared and other similar cases for iSCSI in a different way.
> >
> > I staged a different fix, from Mikulas, for 5.9 that is meant to address
> > the original report, please see:
> > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.9&id=e5bfe9baf23dca211f4b794b651e871032c427ec
> >
> > I'd appreciate it if you could try this commit to see if it fixes the
> > original issue you reported.
> 
> I reverted 7e7cd796f277 and cherry-picked e5bfe9baf23dc on my tree.
> After a few iterations, I could see the conditions that formerly
> triggered the deadlock happening, but this patch successfully allowed
> the reclaim to succeed and the iscsi recovery thread to run.
> 
> My reproducer is a bit artificial, as I wrote it from only the
> problem description provided by Google.  They were hitting this in
> production and might have a better final word on the fix, though I know
> they don't have a simple way to reproduce it.

Nice job on putting together a reproducer that even begins to model
the issue Google's production setup teased out.

Thanks for testing, I've added your Tested-by to the commit.

Mike


Patch

Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c	2020-06-29 14:50:22.000000000 +0200
+++ linux-2.6/drivers/md/dm.c	2020-07-08 18:17:55.000000000 +0200
@@ -12,6 +12,7 @@ 
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
+#include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 #include <linux/blkpg.h>
 #include <linux/bio.h>
@@ -2926,17 +2927,25 @@  EXPORT_SYMBOL_GPL(dm_internal_resume_fas
 int dm_kobject_uevent(struct mapped_device *md, enum kobject_action action,
 		       unsigned cookie)
 {
+	int r;
+	unsigned noio_flag;
 	char udev_cookie[DM_COOKIE_LENGTH];
 	char *envp[] = { udev_cookie, NULL };
 
+	noio_flag = memalloc_noio_save();
+
 	if (!cookie)
-		return kobject_uevent(&disk_to_dev(md->disk)->kobj, action);
+		r = kobject_uevent(&disk_to_dev(md->disk)->kobj, action);
 	else {
 		snprintf(udev_cookie, DM_COOKIE_LENGTH, "%s=%u",
 			 DM_COOKIE_ENV_VAR_NAME, cookie);
-		return kobject_uevent_env(&disk_to_dev(md->disk)->kobj,
-					  action, envp);
+		r = kobject_uevent_env(&disk_to_dev(md->disk)->kobj,
+				       action, envp);
 	}
+
+	memalloc_noio_restore(noio_flag);
+
+	return r;
 }
 
 uint32_t dm_next_uevent_seq(struct mapped_device *md)