block: flush queued bios when the process blocks

From: Mikulas Patocka <mpatocka@redhat.com>

Hi Jens

Would you please consider applying this patch?

Regarding your idea about merging bio and request plugging - I think this 
could be done later when it is properly designed by Kent Overstreet or 
someone else.

We need this patch to fix the deadlock in dm snapshot.

Mikulas

---------- Forwarded message ----------
Date: Tue, 27 May 2014 11:03:36 -0400 (EDT)
From: Mikulas Patocka <mpatocka@redhat.com>
To: Jens Axboe <axboe@kernel.dk>, Kent Overstreet <kmo@daterainc.com>
Cc: linux-kernel@vger.kernel.org, dm-devel@redhat.com,
    Alasdair G. Kergon <agk@redhat.com>, Mike Snitzer <msnitzer@redhat.com>
Subject: [PATCH] block: flush queued bios when the process blocks

The block layer uses a per-process bio list to avoid recursion in
generic_make_request. When generic_make_request is called recursively, the
bio is added to current->bio_list and the function returns immediately.
The top-level instance of generic_make_request takes bios from
current->bio_list and processes them.
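A minimal userspace sketch of this recursion-avoidance scheme follows. It uses simplified stand-ins: struct bio, struct bio_list, the global current_bio_list (playing the role of current->bio_list) and the hypothetical model_make_request() (playing a stacking driver's make_request_fn) are illustrative, not the kernel's definitions. Each bio with depth > 0 is remapped by submitting a child bio one level lower. The child is queued rather than processed recursively, so the C call depth stays at one however deep the device stack is.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for kernel structures (illustrative only). */
struct bio {
	int id;
	int depth;		/* stacking levels left below this bio */
	struct bio *bi_next;
};

struct bio_list {
	struct bio *head;
	struct bio *tail;
};

/* Plays the role of current->bio_list: NULL when no
 * model_generic_make_request() is active on this "task". */
static struct bio_list *current_bio_list;

static struct bio pool[16];	/* storage for child bios */
static int npool;
static int processed[16];	/* ids of bios that reached the bottom */
static int nprocessed;
static int nesting, max_nesting;

static void bio_list_add(struct bio_list *bl, struct bio *bio)
{
	bio->bi_next = NULL;
	if (bl->tail)
		bl->tail->bi_next = bio;
	else
		bl->head = bio;
	bl->tail = bio;
}

static struct bio *bio_list_pop(struct bio_list *bl)
{
	struct bio *bio = bl->head;

	if (bio) {
		bl->head = bio->bi_next;
		if (!bl->head)
			bl->tail = NULL;
		bio->bi_next = NULL;
	}
	return bio;
}

static void model_generic_make_request(struct bio *bio);

/* Hypothetical stacking driver: remaps by submitting a child bio one
 * level lower; at depth 0 the bottom device "processes" the bio. */
static void model_make_request(struct bio *bio)
{
	if (++nesting > max_nesting)
		max_nesting = nesting;

	if (bio->depth > 0) {
		struct bio *child = &pool[npool++];

		child->id = bio->id + 1;
		child->depth = bio->depth - 1;
		child->bi_next = NULL;
		model_generic_make_request(child);	/* only queues it */
	} else {
		processed[nprocessed++] = bio->id;
	}
	nesting--;
}

static void model_generic_make_request(struct bio *bio)
{
	struct bio_list list_on_stack = { NULL, NULL };

	if (current_bio_list) {
		/* Recursive call: queue the bio and return immediately. */
		bio_list_add(current_bio_list, bio);
		return;
	}

	/* Top-level call: keep processing until the list drains. */
	current_bio_list = &list_on_stack;
	do {
		model_make_request(bio);
		bio = bio_list_pop(current_bio_list);
	} while (bio);
	current_bio_list = NULL;
}
```

Submitting a bio with depth 3 ends with exactly one bio (the bottom-level one, id 3) processed, while max_nesting stays at 1.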

This bio queuing can result in deadlocks. The following deadlock was
observed:

1) Process A sends a one-page read bio to the dm-snapshot target. The bio
spans a snapshot chunk boundary, so device mapper splits it into two
bios.

2) Device mapper creates the first sub-bio and sends it to the snapshot
driver.

3) The function snapshot_map calls track_chunk (which allocates a
dm_snap_tracked_chunk structure and adds it to tracked_chunk_hash), then
remaps the bio to the underlying linear target and returns
DM_MAPIO_REMAPPED.

4) The remapped bio is submitted with generic_make_request, but it isn't
processed - it is added to current->bio_list instead.

5) Meanwhile, process B executes pending_complete for the affected chunk:
it takes down_write(&s->lock) and then loops in
__check_for_conflicting_io, waiting for the dm_snap_tracked_chunk created
in step 3) to be released.

6) Process A continues and creates a new bio for the rest of the original
bio.

7) snapshot_map is called for this new bio and waits on
down_write(&s->lock), which process B has held since step 5).
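The split in step 1) is plain sector arithmetic. A small sketch, assuming 512-byte sectors and a power-of-two chunk size expressed in sectors (first_split_sectors is a hypothetical helper, not a device-mapper function): the first sub-bio covers only the sectors up to the next chunk boundary, and the remainder becomes the second bio.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* 512-byte sectors (assumption) */

/* Number of sectors of a bio that fit before the next chunk boundary.
 * Assumes chunk_sectors is a power of two. */
static sector_t first_split_sectors(sector_t sector, sector_t nr_sectors,
				    sector_t chunk_sectors)
{
	sector_t to_boundary = chunk_sectors - (sector & (chunk_sectors - 1));

	return nr_sectors < to_boundary ? nr_sectors : to_boundary;
}
```

For example, a 4 KiB (8-sector) read starting at sector 12 with 16-sector chunks is split into 4 + 4 sectors; the same read starting at sector 0 is not split.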

The resulting deadlock:
* bio added to current->bio_list at step 4) waits until the function in
  step 7) finishes
* the function in step 7) waits until s->lock held in step 5) is released
* the process in step 5) waits until the bio queued in step 4) finishes
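These three waits form a cycle in a wait-for graph. A minimal sketch, with hypothetical node labels rather than kernel symbols, encodes the three edges and detects the cycle with a depth-first search. If the queued bio no longer waits behind step 7), the cycle disappears.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three parties in the deadlock above. */
enum node { QUEUED_BIO_STEP4, SNAPSHOT_MAP_STEP7, PROCESS_B_STEP5, NNODES };
enum color { WHITE, GREY, BLACK };	/* unvisited, on DFS stack, done */

/* waits_for[a][b]: a cannot make progress until b does. */
static bool waits_for[NNODES][NNODES];

static void build_scenario(bool bio_waits_behind_step7)
{
	for (int a = 0; a < NNODES; a++)
		for (int b = 0; b < NNODES; b++)
			waits_for[a][b] = false;

	/* The bio on current->bio_list runs only after step 7) returns. */
	waits_for[QUEUED_BIO_STEP4][SNAPSHOT_MAP_STEP7] = bio_waits_behind_step7;
	/* Step 7) waits for s->lock, held by process B. */
	waits_for[SNAPSHOT_MAP_STEP7][PROCESS_B_STEP5] = true;
	/* Process B waits for the tracked chunk, released when the bio ends. */
	waits_for[PROCESS_B_STEP5][QUEUED_BIO_STEP4] = true;
}

static bool dfs_finds_cycle(int n, enum color *color)
{
	color[n] = GREY;
	for (int m = 0; m < NNODES; m++) {
		if (!waits_for[n][m])
			continue;
		if (color[m] == GREY)
			return true;	/* back edge: a cycle of waits */
		if (color[m] == WHITE && dfs_finds_cycle(m, color))
			return true;
	}
	color[n] = BLACK;
	return false;
}

static bool has_deadlock(void)
{
	enum color color[NNODES] = { WHITE, WHITE, WHITE };

	for (int n = 0; n < NNODES; n++)
		if (color[n] == WHITE && dfs_finds_cycle(n, color))
			return true;
	return false;
}
```

build_scenario(true) reproduces the three dependencies listed above and has_deadlock() reports the cycle; build_scenario(false) removes only the first edge and the graph becomes acyclic.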

The general problem is that queuing bios on current->bio_list introduces
additional lock dependencies. When a device mapper target sends a bio to
some block device, it assumes that the bio only takes locks of the target
block device or of devices below it. However, if the bio is added to the
queue on current->bio_list, it acquires an artificial dependency on locks
taken by the other bios on current->bio_list. In the above scenario, this
artificial locking dependency results in a deadlock.

Kent Overstreet has already created a workqueue for every bio_set, and
there is code that tries to resolve some low-memory deadlocks by
redirecting bios queued on current->bio_list to the workqueue if the
system is low on memory. However, other deadlocks (such as the one
described above) may happen without any low-memory condition.
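The existing low-memory path (modelled here on punt_bios_to_rescuer() in fs/bio.c, which bio_alloc_bioset() calls when an allocation would otherwise block) moves only the bios that belong to the bio_set being allocated from. A simplified userspace sketch of that partitioning, assuming stand-in structures; the rescue_lock spinlock and the queue_work() call are omitted, and punt_bios_to_rescuer_model is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

struct bio_set;

/* Simplified stand-ins for kernel structures (illustrative only). */
struct bio {
	int id;
	struct bio_set *bi_pool;	/* bio_set the bio was allocated from */
	struct bio *bi_next;
};

struct bio_list {
	struct bio *head;
	struct bio *tail;
};

struct bio_set {
	struct bio_list rescue_list;	/* drained by the rescuer workqueue */
};

static void bio_list_add(struct bio_list *bl, struct bio *bio)
{
	bio->bi_next = NULL;
	if (bl->tail)
		bl->tail->bi_next = bio;
	else
		bl->head = bio;
	bl->tail = bio;
}

static struct bio *bio_list_pop(struct bio_list *bl)
{
	struct bio *bio = bl->head;

	if (bio) {
		bl->head = bio->bi_next;
		if (!bl->head)
			bl->tail = NULL;
		bio->bi_next = NULL;
	}
	return bio;
}

static int bio_list_count(const struct bio_list *bl)
{
	int n = 0;

	for (const struct bio *b = bl->head; b; b = b->bi_next)
		n++;
	return n;
}

/* Split the task's queue: bios from bs go to its rescuer, others stay. */
static void punt_bios_to_rescuer_model(struct bio_list *current_list,
				       struct bio_set *bs)
{
	struct bio_list punt = { NULL, NULL }, nopunt = { NULL, NULL };
	struct bio *bio;

	while ((bio = bio_list_pop(current_list)))
		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);

	*current_list = nopunt;

	/* The kernel would take bs->rescue_lock here and then queue_work(). */
	while ((bio = bio_list_pop(&punt)))
		bio_list_add(&bs->rescue_list, bio);
}
```

Note that a bio from a different bio_set stays on the task's list even though it may be the one the blocked process is really waiting for; this is the gap the paragraph above describes.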

This patch generalizes Kent's concept: it redirects bios on
current->bio_list to the bio_set's workqueue on every schedule call.
Consequently, when the process blocks on a mutex, the bios queued on
current->bio_list are dispatched to independent workqueues and can
complete without waiting for the mutex to become available.

Bios allocated with bio_kmalloc do not have a bio_set, so they are not
redirected. However, bio_kmalloc shouldn't be used by stacking drivers (it
is currently used by raid1.c and raid10.c; these need to be changed to use
a bio_set).
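Under the same simplifying assumptions, the generalization can be sketched as a hook on the blocking path. It redirects every queued bio that has a bio_set to that set's rescuer list, whichever set it belongs to, and leaves bio_kmalloc-style bios (modelled by a NULL bi_pool) on the task's list. flush_bio_list_on_block() and model_schedule() are hypothetical names; the patch's actual changes to kernel/sched/core.c and include/linux/blkdev.h are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

struct bio_set;

/* Simplified stand-ins for kernel structures (illustrative only). */
struct bio {
	int id;
	struct bio_set *bi_pool;	/* NULL models a bio_kmalloc() bio */
	struct bio *bi_next;
};

struct bio_list {
	struct bio *head;
	struct bio *tail;
};

struct bio_set {
	struct bio_list rescue_list;
	int work_queued;		/* stands in for queue_work() */
};

static void bio_list_add(struct bio_list *bl, struct bio *bio)
{
	bio->bi_next = NULL;
	if (bl->tail)
		bl->tail->bi_next = bio;
	else
		bl->head = bio;
	bl->tail = bio;
}

static struct bio *bio_list_pop(struct bio_list *bl)
{
	struct bio *bio = bl->head;

	if (bio) {
		bl->head = bio->bi_next;
		if (!bl->head)
			bl->tail = NULL;
		bio->bi_next = NULL;
	}
	return bio;
}

/* Hypothetical: hand every bio that has a bio_set to that set's rescuer;
 * bios without a bio_set cannot be redirected and stay queued. */
static void flush_bio_list_on_block(struct bio_list *current_list)
{
	struct bio_list keep = { NULL, NULL };
	struct bio *bio;

	while ((bio = bio_list_pop(current_list))) {
		if (bio->bi_pool) {
			bio_list_add(&bio->bi_pool->rescue_list, bio);
			bio->bi_pool->work_queued = 1;
		} else {
			bio_list_add(&keep, bio);
		}
	}
	*current_list = keep;
}

/* Hypothetical stand-in for the check on the schedule() path. */
static void model_schedule(struct bio_list *current_list)
{
	if (current_list && current_list->head)
		flush_bio_list_on_block(current_list);
	/* ...the task would now actually go to sleep... */
}
```

Unlike the low-memory path, the flush is triggered by blocking itself, so in the snapshot scenario the bio queued in step 4) would reach its rescuer as soon as process A sleeps on s->lock in step 7).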

Note to stable kernel maintainers: before backporting this patch, you also
need to backport df2cb6daa4cbc34406bc4b1ac9b9335df1083a72.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 fs/bio.c               |   84 ++++++++++++++-----------------------------------
 include/linux/blkdev.h |    7 +++-
 kernel/sched/core.c    |    7 ++++
 3 files changed, 37 insertions(+), 61 deletions(-)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Message ID	alpine.LRH.2.02.1406261927280.4570@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive)
State	Deferred, archived
Delegated to:	Mike Snitzer
Headers:
Date: Thu, 26 Jun 2014 19:46:57 -0400 (EDT)
From: Mikulas Patocka <mpatocka@redhat.com>
To: Jens Axboe <axboe@kernel.dk>, Kent Overstreet <kmo@daterainc.com>
Cc: dm-devel@redhat.com, Mike Snitzer <msnitzer@redhat.com>,
    linux-kernel@vger.kernel.org, "Alasdair G. Kergon" <agk@redhat.com>
Subject: [dm-devel] [PATCH] block: flush queued bios when the process blocks
