dm: use cloned bio as head, not remainder, in __split_and_process_bio()

Message ID 878tewzjz3.fsf@notabene.neil.brown.name (mailing list archive)
State New, archived

Commit Message

NeilBrown Nov. 23, 2017, 10:52 p.m. UTC
When we use bio_clone_bioset() to split off the front part of a bio
and chain the two together and submit the remainder to
generic_make_request(), it is important that the newly allocated
bio is used as the head to be processed immediately, and the original
bio gets "bio_advance()"d and sent to generic_make_request() as the
remainder.
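
Roughly, the intended ordering is (sketch only - the split_front()
wrapper, process_head() helper, and remaining_sectors argument are
illustrative, not actual dm.c code):

static void split_front(struct bio *bio, unsigned int remaining_sectors,
			struct bio_set *bs)
{
	/* The clone is the head and is processed immediately. */
	struct bio *head = bio_clone_bioset(bio, GFP_NOIO, bs);

	/* The original becomes the remainder: skip the part the head covers. */
	bio_advance(bio, (bio_sectors(bio) - remaining_sectors) << 9);

	/* Completion of the head counts towards completion of the original. */
	bio_chain(head, bio);

	/* Queue the remainder; it is handled after already-submitted bios. */
	generic_make_request(bio);

	process_head(head);	/* illustrative: map/submit the front portion */
}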

If the newly allocated bio is used as the remainder, and if it then
needs to be split again, then the next bio_clone_bioset() call will
be made while holding a reference to a bio (the result of the first clone)
from the same bioset.  This can potentially exhaust the bioset mempool
and result in a memory allocation deadlock.
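
Schematically, the backwards ordering looks like this (sketch, reusing
the illustrative names from above):

	/* Backwards: the clone is advanced and resubmitted as the remainder. */
	struct bio *b = bio_clone_bioset(bio, GFP_NOIO, bs);

	bio_advance(b, (bio_sectors(b) - remaining_sectors) << 9);
	bio_chain(b, bio);
	/*
	 * If 'b' needs to be split again, the next bio_clone_bioset(..., bs)
	 * runs while 'b' still pins an entry from the same bioset, so a deep
	 * chain of splits can exhaust the mempool and deadlock.
	 */
	generic_make_request(b);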

So the result of the bio_clone_bioset() must be attached to the new
dm_io struct, and the original must be resubmitted.  The current code
is backwards.

Note that there is no race caused by reassigning ci.io->bio after already
calling __map_bio().  This bio will only be dereferenced again after
dec_pending() has found io->io_count to be zero, and this cannot happen
before the dec_pending() call at the end of __split_and_process_bio().

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---

Hi,
 I think this should resolve the problem Mikulas noticed that the
 bios form a deep chain instead of a wide tree.

Thanks,
NeilBrown

 drivers/md/dm.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Mike Snitzer Nov. 27, 2017, 2:23 p.m. UTC | #1
On Thu, Nov 23 2017 at  5:52pm -0500,
NeilBrown <neilb@suse.com> wrote:

> [...]
> Hi,
>  I think this should resolve the problem Mikulas noticed that the
>  bios form a deep chain instead of a wide tree.

I'm inclined to just fold this into the original commit.
I'd update that header to make mention of the details captured in this
header.

Would you be OK with that?

Mike
NeilBrown Nov. 28, 2017, 10:18 p.m. UTC | #2
On Mon, Nov 27 2017, Mike Snitzer wrote:

> On Thu, Nov 23 2017 at  5:52pm -0500,
> NeilBrown <neilb@suse.com> wrote:
>
>> [...]
>> Hi,
>>  I think this should resolve the problem Mikulas noticed that the
>>  bios form a deep chain instead of a wide tree.
>
> I'm inclined to just fold this into the original commit.
> I'd update that header to make mention of the details captured in this
> header.
>
> Would you be OK with that?

Perfectly OK with that.  Thanks for asking.

NeilBrown

Patch

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 99ec215f7dcb..2e0e10a1c030 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1482,12 +1482,19 @@  static void __split_and_process_bio(struct mapped_device *md,
 				 * Remainder must be passed to generic_make_request()
 				 * so that it gets handled *after* bios already submitted
 				 * have been completely processed.
+				 * We take a clone of the original to store in
+				 * ci.io->bio to be used by end_io_acct() and
+				 * for dec_pending to use for completion handling.
+				 * As this path is not used for REQ_OP_ZONE_REPORT,
+				 * the usage of io->bio in dm_remap_zone_report()
+				 * won't be affected by this reassignment.
 				 */
 				struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
 								 md->queue->bio_split);
-				bio_advance(b, (bio_sectors(b) - ci.sector_count) << 9);
+				ci.io->bio = b;
+				bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
 				bio_chain(b, bio);
-				generic_make_request(b);
+				generic_make_request(bio);
 				break;
 			}
 		}