diff mbox

[0/6] Support DAX for device-mapper dm-linear devices

Message ID 1466457467.3504.249.camel@hpe.com (mailing list archive)
State New, archived
Headers show

Commit Message

Kani, Toshi June 20, 2016, 9:28 p.m. UTC
On Mon, 2016-06-20 at 14:01 -0600, Kani, Toshimitsu wrote:
> On Mon, 2016-06-20 at 15:52 -0400, Mike Snitzer wrote:
> > 
> > On Mon, Jun 20 2016 at  3:40pm -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> >  
 :
> > > If I don't use XFS, and only issue IO directly to the /dev/pmem/lv, I
> > > don't see this corruption.
> >
> > I did the same test with ext4 instead of xfs and it resulted in the same
> > type of systemic corruption (lvm2 metadata corrupted too):
> > 
 :
> I will look into the issue.

Hi Mike,

Can you fold the following patch to the dm-linear patch?

Thanks,
-Toshi

------
Subject: [PATCH] dm-linear: Fix partition handling for DAX

Partition handling was missing in linear_direct_access().
Call bdev_direct_access() instead of directly calling the
target's direct_access() function.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
---
 drivers/md/dm-linear.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
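
Why routing through bdev_direct_access() fixes the corruption: the helper
bounds-checks the request against the partition and adds the partition's
start offset before calling into the driver, which is exactly the step
linear_direct_access() was skipping. A condensed sketch of the helper's
logic around this time (abbreviated from fs/block_dev.c, so details may
differ from the exact tree; per the discussion below, this series also has
it check GENHD_FL_DAX):

long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
{
	const struct block_device_operations *ops = bdev->bd_disk->fops;
	sector_t sector = dax->sector;
	long avail, size = dax->size;

	if (!ops->direct_access)
		return -EOPNOTSUPP;
	/* stay within the partition (or whole device) ... */
	if ((sector + DIV_ROUND_UP(size, 512)) >
	    part_nr_sects_read(bdev->bd_part))
		return -ERANGE;
	/* ... and translate to a whole-device sector: the missing step */
	sector += get_start_sect(bdev);
	if (sector % (PAGE_SIZE / 512))
		return -EINVAL;
	avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn, size);
	return avail < 0 ? avail : min(avail, size);
}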


Comments

Mike Snitzer June 20, 2016, 10:22 p.m. UTC | #1
On Mon, Jun 20 2016 at  5:28pm -0400,
Kani, Toshimitsu <toshi.kani@hpe.com> wrote:

> On Mon, 2016-06-20 at 14:01 -0600, Kani, Toshimitsu wrote:
> > On Mon, 2016-06-20 at 15:52 -0400, Mike Snitzer wrote:
> > > 
> > > On Mon, Jun 20 2016 at  3:40pm -0400,
> > > Mike Snitzer <snitzer@redhat.com> wrote:
> > >  
>  :
> > > > If I don't use XFS, and only issue IO directly to the /dev/pmem/lv, I
> > > > don't see this corruption.
> > >
> > > I did the same test with ext4 instead of xfs and it resulted in the same
> > > type of systemic corruption (lvm2 metadata corrupted too):
> > > 
>  :
> > I will look into the issue.
> 
> Hi Mike,
> 
> Can you fold the following patch to the dm-linear patch?
> 
> Thanks,
> -Toshi
> 
> ------
> Subject: [PATCH] dm-linear: Fix partition handling for DAX
> 
> Partition handling was missing in linear_direct_access().
> Call bdev_direct_access() instead of directly calling the
> target's direct_access() function.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> ---
>  drivers/md/dm-linear.c |   14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> index 325aa06..38323e4 100644
> --- a/drivers/md/dm-linear.c
> +++ b/drivers/md/dm-linear.c
> @@ -148,10 +148,16 @@ static long linear_direct_access(struct dm_target *ti, sector_t sector,
>  {
>  	struct linear_c *lc = ti->private;
>  	struct block_device *bdev = lc->dev->bdev;
> -	const struct block_device_operations *bd_ops = bdev->bd_disk->fops;
> -
> -	return bd_ops->direct_access(bdev, linear_map_sector(ti, sector),
> -				     kaddr, pfn, size);
> +	struct blk_dax_ctl dax = {
> +		.sector = linear_map_sector(ti, sector),
> +		.size = size,
> +	};
> +	long ret;
> +
> +	ret = bdev_direct_access(bdev, &dax);
> +	*kaddr = dax.addr;
> +	*pfn = dax.pfn;
> +	return ret;
>  }
>  
>  static struct target_type linear_target = {

Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
branch.

No longer seeing any corruption in my test that was using partitions to
span pmem devices with a dm-linear device.

Jens, any chance you'd be open to picking up the first 2 patches in this
series?  Or would you like to see them folded or something different?

Mike
Mike Snitzer June 21, 2016, 1:41 p.m. UTC | #2
On Mon, Jun 20 2016 at  6:22pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Jun 20 2016 at  5:28pm -0400,
> Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> 
> > 
> > Hi Mike,
> > 
> > Can you fold the following patch to the dm-linear patch?
> > 
> > Thanks,
> > -Toshi
> > 
> > ------
> > Subject: [PATCH] dm-linear: Fix partition handling for DAX
> > 
> > Partition handling was missing in linear_direct_access().
> > Call bdev_direct_access() instead of directly calling the
> > target's direct_access() function.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> > ---
> >  drivers/md/dm-linear.c |   14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> > index 325aa06..38323e4 100644
> > --- a/drivers/md/dm-linear.c
> > +++ b/drivers/md/dm-linear.c
> > @@ -148,10 +148,16 @@ static long linear_direct_access(struct dm_target *ti, sector_t sector,
> >  {
> >  	struct linear_c *lc = ti->private;
> >  	struct block_device *bdev = lc->dev->bdev;
> > -	const struct block_device_operations *bd_ops = bdev->bd_disk->fops;
> > -
> > -	return bd_ops->direct_access(bdev, linear_map_sector(ti, sector),
> > -				     kaddr, pfn, size);
> > +	struct blk_dax_ctl dax = {
> > +		.sector = linear_map_sector(ti, sector),
> > +		.size = size,
> > +	};
> > +	long ret;
> > +
> > +	ret = bdev_direct_access(bdev, &dax);
> > +	*kaddr = dax.addr;
> > +	*pfn = dax.pfn;
> > +	return ret;
> >  }
> >  
> >  static struct target_type linear_target = {
> 
> Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
> branch.
> 
> No longer seeing any corruption in my test that was using partitions to
> span pmem devices with a dm-linear device.
> 
> Jens, any chance you'd be open to picking up the first 2 patches in this
> series?  Or would you like to see them folded or something different?

I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
rather than establish GENHD_FL_DAX on the genhd?

It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
check for a queue flag.
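
To make the suggestion concrete, a minimal sketch of what a queue-flag
version could look like. QUEUE_FLAG_DAX is just the name floated here; the
bit value and the blk_queue_dax() helper below are assumptions modeled on
the existing queue-flag accessors, not a settled interface:

/* sketch only: flag value and helper name are assumptions */
#define QUEUE_FLAG_DAX		26
#define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)

/* driver side, while setting up a DAX-capable queue: */
queue_flag_set_unlocked(QUEUE_FLAG_DAX, q);

/* upper layer, e.g. a filesystem validating -o dax at mount: */
if (!blk_queue_dax(bdev_get_queue(sb->s_bdev)))
	return -EINVAL;	/* device cannot back -o dax */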
Kani, Toshi June 21, 2016, 3:44 p.m. UTC | #3
On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> On Mon, Jun 20 2016 at  6:22pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> > 
> > On Mon, Jun 20 2016 at  5:28pm -0400,
> > Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> > 
 :
> > Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
> > branch.
> > 
> > No longer seeing any corruption in my test that was using partitions to
> > span pmem devices with a dm-linear device.
> > 
> > Jens, any chance you'd be open to picking up the first 2 patches in this
> > series?  Or would you like to see them folded or something different?
>
> I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> rather than establish GENHD_FL_DAX on the genhd?
> 
> It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
> check for a queue flag.

I think GENHD_FL_DAX is more appropriate since DAX does not use a request
queue, except for protecting the underlying device being disabled while
direct_access() is called (b2e0d1625e19).

About protecting direct_access, this patch assumes that the underlying
device cannot be disabled until dtr() is called.  Is this correct?  If not,
I will need to call dax_map_atomic().

Thanks,
-Toshi
Kani, Toshi June 21, 2016, 3:50 p.m. UTC | #4
On Tue, 2016-06-21 at 09:34 -0600, Kani, Toshimitsu wrote:
> On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> > 
> > On Mon, Jun 20 2016 at  6:22pm -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> > > 
> > > On Mon, Jun 20 2016 at  5:28pm -0400,
> > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> > > 
>  :
> > > 
> > > Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
> > > branch.
> > > 
> > > No longer seeing any corruption in my test that was using partitions
> > > to span pmem devices with a dm-linear device.
> > > 
> > > Jens, any chance you'd be open to picking up the first 2 patches in
> > > this series?  Or would you like to see them folded or something
> > > different?
> >
> > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> > rather than establish GENHD_FL_DAX on the genhd?
> > 
> > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
> > check for a queue flag.
>
> I think GENHD_FL_DAX is more appropriate since DAX does not use a request
> queue, except for protecting the underlying device being disabled while
> direct_access() is called (b2e0d1625e19).

Forgot to mention that there are bdev_dax_supported() and bdev_dax_capable()
interfaces that can be called from upper layers.  They both call
bdev_direct_access() which checks GENHD_FL_DAX.

Thanks,
-Toshi

> About protecting direct_access, this patch assumes that the underlying
> device cannot be disabled until dtr() is called.  Is this correct?  If
> not, I will need to call dax_map_atomic().
> 
> Thanks,
> -Toshi
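
As a sketch of how a filesystem might consume bdev_dax_supported() at mount
time (ext4-flavored; the surrounding option test, variables, and error label
are illustrative, only the helper itself comes from the discussion above):

/* illustrative mount-path check; the helper returns 0 when DAX is usable */
if (test_opt(sb, DAX)) {
	err = bdev_dax_supported(sb, blocksize);
	if (err)
		goto failed_mount;	/* refuse -o dax on this device */
}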
Dan Williams June 21, 2016, 4:25 p.m. UTC | #5
On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
>> On Mon, Jun 20 2016 at  6:22pm -0400,
>> Mike Snitzer <snitzer@redhat.com> wrote:
>> >
>> > On Mon, Jun 20 2016 at  5:28pm -0400,
>> > Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
>> >
>  :
>> > Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
>> > branch.
>> >
>> > No longer seeing any corruption in my test that was using partitions to
>> > span pmem devices with a dm-linear device.
>> >
>> > Jens, any chance you'd be open to picking up the first 2 patches in this
>> > series?  Or would you like to see them folded or something different?
>>
>> I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
>> rather than establish GENHD_FL_DAX on the genhd?
>>
>> It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
>> check for a queue flag.
>
> I think GENHD_FL_DAX is more appropriate since DAX does not use a request
> queue, except for protecting the underlying device being disabled while
> direct_access() is called (b2e0d1625e19).
>
> About protecting direct_access, this patch assumes that the underlying
> device cannot be disabled until dtr() is called.  Is this correct?  If not,
> I will need to call dax_map_atomic().

Kernel internal usages of dax should be using dax_map_atomic() to
safely resolve device removal races.
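
For reference, the helper Dan refers to (added to fs/dax.c by b2e0d1625e19)
brackets bdev_direct_access() with a request_queue reference so the device
cannot be torn down mid-access. A condensed sketch, with the __pmem
annotations and some error plumbing trimmed; exact signatures drifted across
releases:

static void *dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax)
{
	struct request_queue *q = bdev->bd_queue;
	long rc;

	/* fails once blk_cleanup_queue() has marked the queue dying */
	if (blk_queue_enter(q, GFP_NOWAIT) != 0)
		return ERR_PTR(-EIO);

	rc = bdev_direct_access(bdev, dax);
	if (rc < 0) {
		blk_queue_exit(q);	/* drop the reference on error */
		return ERR_PTR(rc);
	}
	return dax->addr;	/* caller must pair with dax_unmap_atomic() */
}

static void dax_unmap_atomic(struct block_device *bdev, struct blk_dax_ctl *dax)
{
	if (IS_ERR(dax->addr))
		return;
	blk_queue_exit(bdev->bd_queue);
}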
Kani, Toshi June 21, 2016, 4:35 p.m. UTC | #6
On Tue, 2016-06-21 at 09:25 -0700, Dan Williams wrote:
> On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com>
> wrote:
> > 
> > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> > > 
> > > On Mon, Jun 20 2016 at  6:22pm -0400,
> > > Mike Snitzer <snitzer@redhat.com> wrote:
> > > > 
> > > > On Mon, Jun 20 2016 at  5:28pm -0400,
> > > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> >  :
> > > > Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
> > > > branch.
> > > > 
> > > > No longer seeing any corruption in my test that was using partitions
> > > > to span pmem devices with a dm-linear device.
> > > > 
> > > > Jens, any chance you'd be open to picking up the first 2 patches in
> > > > this series?  Or would you like to see them folded or something
> > > > different?
> > >
> > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> > > rather than establish GENHD_FL_DAX on the genhd?
> > > 
> > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
> > > check for a queue flag.
> >
> > I think GENHD_FL_DAX is more appropriate since DAX does not use a request
> > queue, except for protecting the underlying device being disabled while
> > direct_access() is called (b2e0d1625e19).
> > 
> > About protecting direct_access, this patch assumes that the underlying
> > device cannot be disabled until dtr() is called.  Is this correct?  If
> > not, I will need to call dax_map_atomic().
>
> Kernel internal usages of dax should be using dax_map_atomic() to
> safely resolve device removal races.

Will do.  In such case, shall I move dax_[un]map_atomic() to block_dev.c and
rename them to bdev_dax_[un]map_atomic()?

Thanks,
-Toshi
Dan Williams June 21, 2016, 4:45 p.m. UTC | #7
On Tue, Jun 21, 2016 at 9:35 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Tue, 2016-06-21 at 09:25 -0700, Dan Williams wrote:
>> On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com>
>> wrote:
>> >
>> > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
>> > >
>> > > On Mon, Jun 20 2016 at  6:22pm -0400,
>> > > Mike Snitzer <snitzer@redhat.com> wrote:
>> > > >
>> > > > On Mon, Jun 20 2016 at  5:28pm -0400,
>> > > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
>> >  :
>> > > > Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
>> > > > branch.
>> > > >
>> > > > No longer seeing any corruption in my test that was using partitions
>> > > > to span pmem devices with a dm-linear device.
>> > > >
>> > > > Jens, any chance you'd be open to picking up the first 2 patches in
>> > > > this series?  Or would you like to see them folded or something
>> > > > different?
>> > >
>> > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
>> > > rather than establish GENHD_FL_DAX on the genhd?
>> > >
>> > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
>> > > check for a queue flag.
>> >
>> > I think GENHD_FL_DAX is more appropriate since DAX does not use a request
>> > queue, except for protecting the underlying device being disabled while
>> > direct_access() is called (b2e0d1625e19).
>> >
>> > About protecting direct_access, this patch assumes that the underlying
>> > device cannot be disabled until dtr() is called.  Is this correct?  If
>> > not, I will need to call dax_map_atomic().
>>
>> Kernel internal usages of dax should be using dax_map_atomic() to
>> safely resolve device removal races.
>
> Will do.  In such case, shall I move dax_[un]map_atomic() to block_dev.c and
> rename them to bdev_dax_[un]map_atomic()?

Sounds good to me.  I know Jeff and Christoph don't like the current
calling convention of passing in a structure.  Just note that they
might ask you to change it back to a list of parameters if it moves to
bdev_dax_map_atomic().
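
To make the calling-convention point concrete, the two shapes side by side;
bdev_dax_map_atomic() does not exist yet, so its argument list below is
purely hypothetical:

/* current style: blk_dax_ctl carries sector/size in, addr/pfn out */
struct blk_dax_ctl dax = { .sector = sector, .size = size };
rc = bdev_direct_access(bdev, &dax);	/* fills dax.addr and dax.pfn */

/* hypothetical parameter-list style after the move/rename */
rc = bdev_dax_map_atomic(bdev, sector, &kaddr, &pfn, size);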
Kani, Toshi June 21, 2016, 4:56 p.m. UTC | #8
On Tue, 2016-06-21 at 09:45 -0700, Dan Williams wrote:
> On Tue, Jun 21, 2016 at 9:35 AM, Kani, Toshimitsu <toshi.kani@hpe.com>
> wrote:
> > On Tue, 2016-06-21 at 09:25 -0700, Dan Williams wrote:
> > > On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com>
> > > wrote:
> > > > 
 :
> > > > I think GENHD_FL_DAX is more appropriate since DAX does not use a
> > > > request queue, except for protecting the underlying device being
> > > > disabled while direct_access() is called (b2e0d1625e19).
> > > > 
> > > > About protecting direct_access, this patch assumes that the
> > > > underlying device cannot be disabled until dtr() is called.  Is this
> > > > correct?  If not, I will need to call dax_map_atomic().
> > >
> > > Kernel internal usages of dax should be using dax_map_atomic() to
> > > safely resolve device removal races.
> >
> > Will do.  In such case, shall I move dax_[un]map_atomic() to block_dev.c
> > and rename them to bdev_dax_[un]map_atomic()?
>
> Sounds good to me.  I know Jeff and Christoph don't like the current
> calling convention of passing in a structure.  Just note that they
> might ask you to change it back to a list of parameters if it moves to
> bdev_dax_map_atomic().

OK, I will change it back to a list of parameters as well.

Thanks,
-Toshi
Mike Snitzer June 21, 2016, 6:17 p.m. UTC | #9
On Tue, Jun 21 2016 at 11:44am -0400,
Kani, Toshimitsu <toshi.kani@hpe.com> wrote:

> On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> > On Mon, Jun 20 2016 at  6:22pm -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> > > 
> > > On Mon, Jun 20 2016 at  5:28pm -0400,
> > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> > > 
>  :
> > > Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
> > > branch.
> > > 
> > > No longer seeing any corruption in my test that was using partitions to
> > > span pmem devices with a dm-linear device.
> > > 
> > > Jens, any chance you'd be open to picking up the first 2 patches in this
> > > series?  Or would you like to see them folded or something different?
> >
> > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> > rather than establish GENHD_FL_DAX on the genhd?
> > 
> > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
> > check for a queue flag.
> 
> I think GENHD_FL_DAX is more appropriate since DAX does not use a request
> queue, except for protecting the underlying device being disabled while
> direct_access() is called (b2e0d1625e19).  

The devices in question have a request_queue.  All bio-based devices have
a request_queue.

I don't have a big problem with GENHD_FL_DAX.  Just wanted to point out
that such block device capabilities are generally advertised in terms of
a QUEUE_FLAG.
 
> About protecting direct_access, this patch assumes that the underlying
> device cannot be disabled until dtr() is called.  Is this correct?  If not,
> I will need to call dax_map_atomic().

One of the big design considerations for DM is that a DM device can be
suspended (with or without flush) and any new IO will be blocked until
the DM device is resumed.

So ideally DM should be able to have the same capability even if using
DAX.

But that is different than what commit b2e0d1625e19 is addressing.  For
DM, I wouldn't think you'd need the extra protections that
dax_map_atomic() is providing given that the underlying block device
lifetime is managed via DM core's dm_get_device/dm_put_device (see also:
dm.c:open_table_device/close_table_device).
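
For reference, that lifetime pairing is visible in dm-linear itself: the
constructor takes a reference on the underlying device and only the
destructor drops it (abridged from drivers/md/dm-linear.c, with argument
parsing and some error handling trimmed):

static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
	struct linear_c *lc;
	int ret;

	lc = kmalloc(sizeof(*lc), GFP_KERNEL);
	if (lc == NULL) {
		ti->error = "Cannot allocate linear context";
		return -ENOMEM;
	}
	/* ... parse argv[1] into lc->start ... */

	/* pin the underlying device for the lifetime of this table */
	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &lc->dev);
	if (ret) {
		ti->error = "Device lookup failed";
		kfree(lc);
		return ret;
	}
	ti->private = lc;
	return 0;
}

static void linear_dtr(struct dm_target *ti)
{
	struct linear_c *lc = ti->private;

	dm_put_device(ti, lc->dev);	/* reference dropped only here */
	kfree(lc);
}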
Kani, Toshi June 22, 2016, 5:44 p.m. UTC | #10
On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote:
> On Tue, Jun 21 2016 at 11:44am -0400,
> Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> > 
> > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> > > 
> > > On Mon, Jun 20 2016 at  6:22pm -0400,
> > > Mike Snitzer <snitzer@redhat.com> wrote:
 :
> > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> > > rather than establish GENHD_FL_DAX on the genhd?
> > > 
> > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
> > > check for a queue flag.
> >
> > I think GENHD_FL_DAX is more appropriate since DAX does not use a request
> > queue, except for protecting the underlying device being disabled while
> > direct_access() is called (b2e0d1625e19).
>
> The devices in question have a request_queue.  All bio-based devices have
> a request_queue.

DAX-capable devices have two operation modes, bio-based and DAX.  I agree that
bio-based operation is associated with a request queue, and its capabilities
should be set on it.  DAX, on the other hand, is rather independent of a
request queue.

> I don't have a big problem with GENHD_FL_DAX.  Just wanted to point out
> that such block device capabilities are generally advertised in terms of
> a QUEUE_FLAG.

I do not have a strong opinion, but it feels a bit odd to associate DAX with a
request queue.

> > About protecting direct_access, this patch assumes that the underlying
> > device cannot be disabled until dtr() is called.  Is this correct?  If
> > not, I will need to call dax_map_atomic().
>
> One of the big design considerations for DM is that a DM device can be
> suspended (with or without flush) and any new IO will be blocked until
> the DM device is resumed.
> 
> So ideally DM should be able to have the same capability even if using
> DAX.

Supporting suspend for DAX is challenging since it allows user applications to
access a device directly.  Once a device range is mmap'd, there is no kernel
intervention to access the range, unless we invalidate user mappings.  This
isn't done today even after a driver is unbind'd from a device.

> But that is different than what commit b2e0d1625e19 is addressing.  For
> DM, I wouldn't think you'd need the extra protections that
> dax_map_atomic() is providing given that the underlying block device
> lifetime is managed via DM core's dm_get_device/dm_put_device (see also:
> dm.c:open_table_device/close_table_device).

I thought so as well.  But I realized that there is (almost) nothing that can
prevent the unbind operation.  It cannot fail, either.  This unbind proceeds
even when a device is in use.  In the case of a pmem device, it is only
protected by pmem_release_queue(), which is called when a pmem device is being
deleted and calls blk_cleanup_queue() to serialize a critical section between
blk_queue_enter() and blk_queue_exit() per b2e0d1625e19.  This prevents a
kernel DTLB fault, but does not prevent the device from disappearing while
in use.

Protecting DM's underlying device with blk_queue_enter() (or something
similar) requires more thought...  blk_queue_enter() on a DM device cannot be
redirected to its underlying device.  So, this is TBD for now.  But I do not
think this is a blocker issue since doing unbind on an underlying device is
quite harmful no matter what we do - even if it is protected with
blk_queue_enter().

Thanks,
-Toshi
Dan Williams June 22, 2016, 7:15 p.m. UTC | #11
On Wed, Jun 22, 2016 at 10:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote:
>> On Tue, Jun 21 2016 at 11:44am -0400,
>> Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
>> >
>> > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
>> > >
>> > > On Mon, Jun 20 2016 at  6:22pm -0400,
>> > > Mike Snitzer <snitzer@redhat.com> wrote:
>  :
>> > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
>> > > rather than establish GENHD_FL_DAX on the genhd?
>> > >
>> > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
>> > > check for a queue flag.
>> >
>> > I think GENHD_FL_DAX is more appropriate since DAX does not use a request
>> > queue, except for protecting the underlying device being disabled while
>> > direct_access() is called (b2e0d1625e19).
>>
>> The devices in question have a request_queue.  All bio-based devices have
>> a request_queue.
>
> DAX-capable devices have two operation modes, bio-based and DAX.  I agree that
> bio-based operation is associated with a request queue, and its capabilities
> should be set on it.  DAX, on the other hand, is rather independent of a
> request queue.
>
>> I don't have a big problem with GENHD_FL_DAX.  Just wanted to point out
>> that such block device capabilities are generally advertised in terms of
>> a QUEUE_FLAG.
>
> I do not have a strong opinion, but it feels a bit odd to associate DAX
> with a request queue.

Given that we do not support dax to a raw block device [1] it seems a
gendisk flag is more misleading than a request_queue flag that specifies
what requests can be made of the device.

[1]: acc93d30d7d4 Revert "block: enable dax for raw block devices"


>> > About protecting direct_access, this patch assumes that the underlying
>> > device cannot be disabled until dtr() is called.  Is this correct?  If
>> > not, I will need to call dax_map_atomic().
>>
>> One of the big design considerations for DM is that a DM device can be
>> suspended (with or without flush) and any new IO will be blocked until
>> the DM device is resumed.
>>
>> So ideally DM should be able to have the same capability even if using
>> DAX.
>
> Supporting suspend for DAX is challenging since it allows user applications to
> access a device directly.  Once a device range is mmap'd, there is no kernel
> intervention to access the range, unless we invalidate user mappings.  This
> isn't done today even after a driver is unbind'd from a device.
>
>> But that is different than what commit b2e0d1625e19 is addressing.  For
>> DM, I wouldn't think you'd need the extra protections that
>> dax_map_atomic() is providing given that the underlying block device
>> lifetime is managed via DM core's dm_get_device/dm_put_device (see also:
>> dm.c:open_table_device/close_table_device).
>
> I thought so as well.  But I realized that there is (almost) nothing that can
> prevent the unbind operation.  It cannot fail, either.  This unbind proceeds
> even when a device is in use.  In the case of a pmem device, it is only
> protected by pmem_release_queue(), which is called when a pmem device is being
> deleted and calls blk_cleanup_queue() to serialize a critical section between
> blk_queue_enter() and blk_queue_exit() per b2e0d1625e19.  This prevents a
> kernel DTLB fault, but does not prevent the device from disappearing while
> in use.
>
> Protecting DM's underlying device with blk_queue_enter() (or something
> similar) requires more thought...  blk_queue_enter() on a DM device cannot be
> redirected to its underlying device.  So, this is TBD for now.  But I do not
> think this is a blocker issue since doing unbind on an underlying device is
> quite harmful no matter what we do - even if it is protected with
> blk_queue_enter().

I still have the "block device removed" notification patches on my
todo list.  It's not a blocker, but there are scenarios where we can
keep accessing memory via dax of a disabled device leading to memory
corruption.  I'll bump that up in my queue now that we are looking at
additional scenarios where letting DAX mappings leak past the
reconfiguration of a block device could lead to trouble.
Kani, Toshi June 22, 2016, 8:16 p.m. UTC | #12
On Wed, 2016-06-22 at 12:15 -0700, Dan Williams wrote:
> On Wed, Jun 22, 2016 at 10:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com>
> wrote:
> > On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote:
> > > 
> > > On Tue, Jun 21 2016 at 11:44am -0400,
> > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> > > > 
> > > > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> > > > > On Mon, Jun 20 2016 at  6:22pm -0400,
> > > > > Mike Snitzer <snitzer@redhat.com> wrote:
> > > > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> > > > > rather than establish GENHD_FL_DAX on the genhd?
> > > > > 
> > > > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4)
> > > > > to check for a queue flag.
> > > > 
> > > > I think GENHD_FL_DAX is more appropriate since DAX does not use a
> > > > request queue, except for protecting the underlying device being
> > > > disabled while direct_access() is called (b2e0d1625e19).
> > > 
> > > The devices in question have a request_queue.  All bio-based devices
> > > have a request_queue.
> >
> > DAX-capable devices have two operation modes, bio-based and DAX.  I agree
> > that bio-based operation is associated with a request queue, and its
> > capabilities should be set on it.  DAX, on the other hand, is rather
> > independent of a request queue.
> > 
> > > I don't have a big problem with GENHD_FL_DAX.  Just wanted to point out
> > > that such block device capabilities are generally advertised in terms of
> > > a QUEUE_FLAG.
> >
> > I do not have a strong opinion, but it feels a bit odd to associate DAX
> > with a request queue.
>
> Given that we do not support dax to a raw block device [1] it seems a
> gendisk flag is more misleading than a request_queue flag that specifies
> what requests can be made of the device.
> 
> [1]: acc93d30d7d4 Revert "block: enable dax for raw block devices"

Oh, I see.  I will change to use a request_queue flag.

> > > > About protecting direct_access, this patch assumes that the
> > > > underlying device cannot be disabled until dtr() is called.  Is this
> > > > correct?  If not, I will need to call dax_map_atomic().
> > >
> > > One of the big design considerations for DM is that a DM device can be
> > > suspended (with or without flush) and any new IO will be blocked until
> > > the DM device is resumed.
> > > 
> > > So ideally DM should be able to have the same capability even if using
> > > DAX.
> >
> > Supporting suspend for DAX is challenging since it allows user
> > applications to access a device directly.  Once a device range is mmap'd,
> > there is no kernel intervention to access the range, unless we invalidate
> > user mappings.  This isn't done today even after a driver is unbind'd from
> > a device.
> > 
> > > But that is different than what commit b2e0d1625e19 is addressing.  For
> > > DM, I wouldn't think you'd need the extra protections that
> > > dax_map_atomic() is providing given that the underlying block device
> > > lifetime is managed via DM core's dm_get_device/dm_put_device (see also:
> > > dm.c:open_table_device/close_table_device).
> >
> > I thought so as well.  But I realized that there is (almost) nothing that
> > can prevent the unbind operation.  It cannot fail, either.  This unbind
> > proceeds even when a device is in use.  In the case of a pmem device, it
> > is only protected by pmem_release_queue(), which is called when a pmem
> > device is being deleted and calls blk_cleanup_queue() to serialize a
> > critical section between blk_queue_enter() and blk_queue_exit() per
> > b2e0d1625e19.  This prevents a kernel DTLB fault, but does not prevent
> > the device from disappearing while in use.
> > 
> > Protecting DM's underlying device with blk_queue_enter() (or something
> > similar) requires more thought...  blk_queue_enter() on a DM device
> > cannot be redirected to its underlying device.  So, this is TBD for
> > now.  But I do not think this is a blocker issue since doing unbind on an
> > underlying device is quite harmful no matter what we do - even if it is
> > protected with blk_queue_enter().
>
> I still have the "block device removed" notification patches on my
> todo list.  It's not a blocker, but there are scenarios where we can
> keep accessing memory via dax of a disabled device leading to memory
> corruption.

Right, I noticed that user applications can access mmap'd ranges on a disabled
device.

> I'll bump that up in my queue now that we are looking at
> additional scenarios where letting DAX mappings leak past the
> reconfiguration of a block device could lead to trouble.

Great.  With DM, removing an underlying device while in use can lead to
trouble, esp. with RAID0.  Users need to remove a device from DM first...

Thanks,
-Toshi
diff mbox

Patch

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 325aa06..38323e4 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -148,10 +148,16 @@ static long linear_direct_access(struct dm_target *ti, sector_t sector,
 {
 	struct linear_c *lc = ti->private;
 	struct block_device *bdev = lc->dev->bdev;
-	const struct block_device_operations *bd_ops = bdev->bd_disk->fops;
-
-	return bd_ops->direct_access(bdev, linear_map_sector(ti, sector),
-				     kaddr, pfn, size);
+	struct blk_dax_ctl dax = {
+		.sector = linear_map_sector(ti, sector),
+		.size = size,
+	};
+	long ret;
+
+	ret = bdev_direct_access(bdev, &dax);
+	*kaddr = dax.addr;
+	*pfn = dax.pfn;
+	return ret;
 }