Message ID | 1466457467.3504.249.camel@hpe.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Jun 20 2016 at  5:28pm -0400,
Kani, Toshimitsu <toshi.kani@hpe.com> wrote:

> On Mon, 2016-06-20 at 14:01 -0600, Kani, Toshimitsu wrote:
> > On Mon, 2016-06-20 at 15:52 -0400, Mike Snitzer wrote:
> > >
> > > On Mon, Jun 20 2016 at  3:40pm -0400,
> > > Mike Snitzer <snitzer@redhat.com> wrote:
> > > >
 :
> > > > If I don't use XFS, and only issue IO directly to the /dev/pmem/lv, I
> > > > don't see this corruption.
> > >
> > > I did the same test with ext4 instead of xfs and it resulted in the same
> > > type of systemic corruption (lvm2 metadata corrupted too):
> >
 :
> > I will look into the issue.
>
> Hi Mike,
>
> Can you fold the following patch to the dm-linear patch?
>
> Thanks,
> -Toshi
>
> ------
> Subject: [PATCH] dm-linear: Fix partition handling for DAX
>
> Partition handling was missing in linear_direct_access().
> Call bdev_direct_access(), instead of directly calling
> target direct_access function.
>
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> ---
>  drivers/md/dm-linear.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> index 325aa06..38323e4 100644
> --- a/drivers/md/dm-linear.c
> +++ b/drivers/md/dm-linear.c
> @@ -148,10 +148,16 @@ static long linear_direct_access(struct dm_target *ti, sector_t sector,
>  {
>  	struct linear_c *lc = ti->private;
>  	struct block_device *bdev = lc->dev->bdev;
> -	const struct block_device_operations *bd_ops = bdev->bd_disk->fops;
> -
> -	return bd_ops->direct_access(bdev, linear_map_sector(ti, sector),
> -				     kaddr, pfn, size);
> +	struct blk_dax_ctl dax = {
> +		.sector = linear_map_sector(ti, sector),
> +		.size = size,
> +	};
> +	long ret;
> +
> +	ret = bdev_direct_access(bdev, &dax);
> +	*kaddr = dax.addr;
> +	*pfn = dax.pfn;
> +	return ret;
>  }
>
>  static struct target_type linear_target = {

Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
branch.
No longer seeing any corruption in my test that was using partitions to
span pmem devices with a dm-linear device.

Jens, any chance you'd be open to picking up the first 2 patches in this
series?  Or would you like to see them folded or something different?

Mike
On Mon, Jun 20 2016 at  6:22pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Jun 20 2016 at  5:28pm -0400,
> Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> >
> > Hi Mike,
> >
> > Can you fold the following patch to the dm-linear patch?
> >
> > Thanks,
> > -Toshi
> >
> > ------
> > Subject: [PATCH] dm-linear: Fix partition handling for DAX
> >
> > Partition handling was missing in linear_direct_access().
> > Call bdev_direct_access(), instead of directly calling
> > target direct_access function.
> >
> > Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> > ---
> >  drivers/md/dm-linear.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> > index 325aa06..38323e4 100644
> > --- a/drivers/md/dm-linear.c
> > +++ b/drivers/md/dm-linear.c
> > @@ -148,10 +148,16 @@ static long linear_direct_access(struct dm_target *ti, sector_t sector,
> >  {
> >  	struct linear_c *lc = ti->private;
> >  	struct block_device *bdev = lc->dev->bdev;
> > -	const struct block_device_operations *bd_ops = bdev->bd_disk->fops;
> > -
> > -	return bd_ops->direct_access(bdev, linear_map_sector(ti, sector),
> > -				     kaddr, pfn, size);
> > +	struct blk_dax_ctl dax = {
> > +		.sector = linear_map_sector(ti, sector),
> > +		.size = size,
> > +	};
> > +	long ret;
> > +
> > +	ret = bdev_direct_access(bdev, &dax);
> > +	*kaddr = dax.addr;
> > +	*pfn = dax.pfn;
> > +	return ret;
> >  }
> >
> >  static struct target_type linear_target = {
>
> Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
> branch.
>
> No longer seeing any corruption in my test that was using partitions to
> span pmem devices with a dm-linear device.
>
> Jens, any chance you'd be open to picking up the first 2 patches in this
> series?  Or would you like to see them folded or something different?

I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
rather than establish GENHD_FL_DAX on the genhd?
It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to check for a queue flag.
On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> On Mon, Jun 20 2016 at  6:22pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > On Mon, Jun 20 2016 at  5:28pm -0400,
> > Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
 :
> > Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
> > branch.
> >
> > No longer seeing any corruption in my test that was using partitions to
> > span pmem devices with a dm-linear device.
> >
> > Jens, any chance you'd be open to picking up the first 2 patches in this
> > series?  Or would you like to see them folded or something different?
>
> I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> rather than establish GENHD_FL_DAX on the genhd?
>
> It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
> check for a queue flag.

I think GENHD_FL_DAX is more appropriate since DAX does not use a request
queue, except for protecting the underlying device from being disabled
while direct_access() is called (b2e0d1625e19).

About protecting direct_access, this patch assumes that the underlying
device cannot be disabled until dtr() is called.  Is this correct?  If
not, I will need to call dax_map_atomic().

Thanks,
-Toshi
On Tue, 2016-06-21 at 09:34 -0600, Kani, Toshimitsu wrote: > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: > > > > On Mon, Jun 20 2016 at 6:22pm -0400, > > Mike Snitzer <snitzer@redhat.com> wrote: > > > > > > On Mon, Jun 20 2016 at 5:28pm -0400, > > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > > > > : > > > > > > Looks good, I folded it in and tested it to work. Pushed to my 'wip' > > > branch. > > > > > > No longer seeing any corruption in my test that was using partitions > > > to span pmem devices with a dm-linear device. > > > > > > Jens, any chance you'd be open to picking up the first 2 patches in > > > this series? Or would you like to see them folded or something > > > different? > > > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX > > rather than establish GENHD_FL_DAX on the genhd? > > > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to > > check for a queue flag. > > I think GENHD_FL_DAX is more appropriate since DAX does not use a request > queue, except for protecting the underlining device being disabled while > direct_access() is called (b2e0d1625e19). Forgot to mention that there are bdev_dax_supported() and bdev_dax_capable() interfaces that can be called from upper layers. They both call bdev_direct_access() which checks GENHD_FL_DAX. Thanks, -Toshi > About protecting direct_access, this patch assumes that the underlining > device cannot be disabled until dtr() is called. Is this correct? If > not, I will need to call dax_map_atomic(). > > Thanks, > -Toshi
On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: >> On Mon, Jun 20 2016 at 6:22pm -0400, >> Mike Snitzer <snitzer@redhat.com> wrote: >> > >> > On Mon, Jun 20 2016 at 5:28pm -0400, >> > Kani, Toshimitsu <toshi.kani@hpe.com> wrote: >> > > : >> > Looks good, I folded it in and tested it to work. Pushed to my 'wip' >> > branch. >> > >> > No longer seeing any corruption in my test that was using partitions to >> > span pmem devices with a dm-linear device. >> > >> > Jens, any chance you'd be open to picking up the first 2 patches in this >> > series? Or would you like to see them folded or something different? >> >> I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX >> rather than establish GENHD_FL_DAX on the genhd? >> >> It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to >> check for a queue flag. > > I think GENHD_FL_DAX is more appropriate since DAX does not use a request > queue, except for protecting the underlining device being disabled while > direct_access() is called (b2e0d1625e19). > > About protecting direct_access, this patch assumes that the underlining > device cannot be disabled until dtr() is called. Is this correct? If not, > I will need to call dax_map_atomic(). Kernel internal usages of dax should be using dax_map_atomic() to safely resolve device removal races.
On Tue, 2016-06-21 at 09:25 -0700, Dan Williams wrote: > On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: > > > > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: > > > > > > On Mon, Jun 20 2016 at 6:22pm -0400, > > > Mike Snitzer <snitzer@redhat.com> wrote: > > > > > > > > On Mon, Jun 20 2016 at 5:28pm -0400, > > > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > > : > > > > Looks good, I folded it in and tested it to work. Pushed to my 'wip' > > > > branch. > > > > > > > > No longer seeing any corruption in my test that was using partitions > > > > to span pmem devices with a dm-linear device. > > > > > > > > Jens, any chance you'd be open to picking up the first 2 patches in > > > > this series? Or would you like to see them folded or something > > > > different? > > > > > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX > > > rather than establish GENHD_FL_DAX on the genhd? > > > > > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to > > > check for a queue flag. > > > > I think GENHD_FL_DAX is more appropriate since DAX does not use a request > > queue, except for protecting the underlining device being disabled while > > direct_access() is called (b2e0d1625e19). > > > > About protecting direct_access, this patch assumes that the underlining > > device cannot be disabled until dtr() is called. Is this correct? If > > not, I will need to call dax_map_atomic(). > > Kernel internal usages of dax should be using dax_map_atomic() to > safely resolve device removal races. Will do. In such case, shall I move dax_[un]map_atomic() to block_dev.c and rename them to bdev_dax_[un]map_atomic()? Thanks, -Toshi
On Tue, Jun 21, 2016 at 9:35 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > On Tue, 2016-06-21 at 09:25 -0700, Dan Williams wrote: >> On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> >> wrote: >> > >> > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: >> > > >> > > On Mon, Jun 20 2016 at 6:22pm -0400, >> > > Mike Snitzer <snitzer@redhat.com> wrote: >> > > > >> > > > On Mon, Jun 20 2016 at 5:28pm -0400, >> > > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote: >> > : >> > > > Looks good, I folded it in and tested it to work. Pushed to my 'wip' >> > > > branch. >> > > > >> > > > No longer seeing any corruption in my test that was using partitions >> > > > to span pmem devices with a dm-linear device. >> > > > >> > > > Jens, any chance you'd be open to picking up the first 2 patches in >> > > > this series? Or would you like to see them folded or something >> > > > different? >> > > >> > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX >> > > rather than establish GENHD_FL_DAX on the genhd? >> > > >> > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to >> > > check for a queue flag. >> > >> > I think GENHD_FL_DAX is more appropriate since DAX does not use a request >> > queue, except for protecting the underlining device being disabled while >> > direct_access() is called (b2e0d1625e19). >> > >> > About protecting direct_access, this patch assumes that the underlining >> > device cannot be disabled until dtr() is called. Is this correct? If >> > not, I will need to call dax_map_atomic(). >> >> Kernel internal usages of dax should be using dax_map_atomic() to >> safely resolve device removal races. > > Will do. In such case, shall I move dax_[un]map_atomic() to block_dev.c and > rename them to bdev_dax_[un]map_atomic()? Sounds good to me. I know Jeff and Christoph don't like the current calling convention of passing in a structure. 
Just note that they might ask you to change it back to a list of parameters if it moves to bdev_dax_map_atomic().
On Tue, 2016-06-21 at 09:45 -0700, Dan Williams wrote: > On Tue, Jun 21, 2016 at 9:35 AM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: > > On Tue, 2016-06-21 at 09:25 -0700, Dan Williams wrote: > > > On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> > > > wrote: > > > > : > > > > I think GENHD_FL_DAX is more appropriate since DAX does not use a > > > > request queue, except for protecting the underlining device being > > > > disabled while direct_access() is called (b2e0d1625e19). > > > > > > > > About protecting direct_access, this patch assumes that the > > > > underlining device cannot be disabled until dtr() is called. Is this > > > > correct? If not, I will need to call dax_map_atomic(). > > > > > > Kernel internal usages of dax should be using dax_map_atomic() to > > > safely resolve device removal races. > > > > Will do. In such case, shall I move dax_[un]map_atomic() to block_dev.c > > and rename them to bdev_dax_[un]map_atomic()? > > Sounds good to me. I know Jeff and Christoph don't like the current > calling convention of passing in a structure. Just note that they > might ask you to change it back to a list of parameters if it moves to > bdev_dax_map_atomic(). OK, I will change it back to a list of parameters as well. Thanks, -Toshi
On Tue, Jun 21 2016 at 11:44am -0400,
Kani, Toshimitsu <toshi.kani@hpe.com> wrote:

> On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> > On Mon, Jun 20 2016 at  6:22pm -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> > >
> > > On Mon, Jun 20 2016 at  5:28pm -0400,
> > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
 :
> > > Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
> > > branch.
> > >
> > > No longer seeing any corruption in my test that was using partitions to
> > > span pmem devices with a dm-linear device.
> > >
> > > Jens, any chance you'd be open to picking up the first 2 patches in this
> > > series?  Or would you like to see them folded or something different?
> >
> > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> > rather than establish GENHD_FL_DAX on the genhd?
> >
> > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
> > check for a queue flag.
>
> I think GENHD_FL_DAX is more appropriate since DAX does not use a request
> queue, except for protecting the underlying device being disabled while
> direct_access() is called (b2e0d1625e19).

The devices in question have a request_queue.  All bio-based devices have
a request_queue.

I don't have a big problem with GENHD_FL_DAX.  Just wanted to point out
that such block device capabilities are generally advertised in terms of
a QUEUE_FLAG.

> About protecting direct_access, this patch assumes that the underlying
> device cannot be disabled until dtr() is called.  Is this correct?  If
> not, I will need to call dax_map_atomic().

One of the big design considerations for DM is that a DM device can be
suspended (with or without flush) and any new IO will be blocked until
the DM device is resumed.

So ideally DM should be able to have the same capability even if using
DAX.  But that is different than what commit b2e0d1625e19 is addressing.
For DM, I wouldn't think you'd need the extra protections that dax_map_atomic() is providing given that the underlying block device lifetime is managed via DM core's dm_get_device/dm_put_device (see also: dm.c:open_table_device/close_table_device).
On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote:
> On Tue, Jun 21 2016 at 11:44am -0400,
> Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> >
> > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> > >
> > > On Mon, Jun 20 2016 at  6:22pm -0400,
> > > Mike Snitzer <snitzer@redhat.com> wrote:
 :
> > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> > > rather than establish GENHD_FL_DAX on the genhd?
> > >
> > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
> > > check for a queue flag.
> >
> > I think GENHD_FL_DAX is more appropriate since DAX does not use a request
> > queue, except for protecting the underlying device being disabled while
> > direct_access() is called (b2e0d1625e19).
>
> The devices in question have a request_queue.  All bio-based devices have
> a request_queue.

DAX-capable devices have two operation modes, bio-based and DAX.  I agree
that bio-based operation is associated with a request queue, and its
capabilities should be set there.  DAX, on the other hand, is rather
independent from a request queue.

> I don't have a big problem with GENHD_FL_DAX.  Just wanted to point out
> that such block device capabilities are generally advertised in terms of
> a QUEUE_FLAG.

I do not have a strong opinion, but it feels a bit odd to associate DAX
with a request queue.

> > About protecting direct_access, this patch assumes that the underlying
> > device cannot be disabled until dtr() is called.  Is this correct?  If
> > not, I will need to call dax_map_atomic().
>
> One of the big design considerations for DM is that a DM device can be
> suspended (with or without flush) and any new IO will be blocked until
> the DM device is resumed.
>
> So ideally DM should be able to have the same capability even if using
> DAX.

Supporting suspend for DAX is challenging since it allows user applications
to access a device directly.
Once a device range is mmap'd, there is no kernel intervention to access
the range, unless we invalidate user mappings.  This isn't done today even
after a driver is unbound from a device.

> But that is different than what commit b2e0d1625e19 is addressing.  For
> DM, I wouldn't think you'd need the extra protections that
> dax_map_atomic() is providing given that the underlying block device
> lifetime is managed via DM core's dm_get_device/dm_put_device (see also:
> dm.c:open_table_device/close_table_device).

I thought so as well.  But I realized that there is (almost) nothing that
can prevent the unbind operation.  It cannot fail, either.  This unbind
proceeds even when a device is in use.  In the case of a pmem device, it
is only protected by pmem_release_queue(), which is called when a pmem
device is being deleted and calls blk_cleanup_queue() to serialize against
the critical section between blk_queue_enter() and blk_queue_exit() per
b2e0d1625e19.  This prevents a kernel DTLB fault, but does not prevent the
device from disappearing while in use.

Protecting DM's underlying device with blk_queue_enter() (or something
similar) requires more thought...  blk_queue_enter() on a DM device cannot
be redirected to its underlying device.  So, this is TBD for now.  But I
do not think this is a blocker issue since doing unbind on an underlying
device is quite harmful no matter what we do - even if it is protected
with blk_queue_enter().

Thanks,
-Toshi
On Wed, Jun 22, 2016 at 10:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote: >> On Tue, Jun 21 2016 at 11:44am -0400, >> Kani, Toshimitsu <toshi.kani@hpe.com> wrote: >> > >> > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: >> > > >> > > On Mon, Jun 20 2016 at 6:22pm -0400, >> > > Mike Snitzer <snitzer@redhat.com> wrote: > : >> > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX >> > > rather than establish GENHD_FL_DAX on the genhd? >> > > >> > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to >> > > check for a queue flag. >> > >> > I think GENHD_FL_DAX is more appropriate since DAX does not use a request >> > queue, except for protecting the underlining device being disabled while >> > direct_access() is called (b2e0d1625e19). >> >> The devices in question have a request_queue. All bio-based device have >> a request_queue. > > DAX-capable devices have two operation modes, bio-based and DAX. I agree that > bio-based operation is associated with a request queue, and its capabilities > should be set to it. DAX, on the other hand, is rather independent from a > request queue. > >> I don't have a big problem with GENHD_FL_DAX. Just wanted to point out >> that such block device capabilities are generally advertised in terms of >> a QUEUE_FLAG. > > I do not have a strong opinion, but feel a bit odd to associate DAX to a > request queue. Given that we do not support dax to a raw block device [1] it seems a gendisk flag is more misleading than request_queue flag that specifies what requests can be made of the device. [1]: acc93d30d7d4 Revert "block: enable dax for raw block devices" >> > About protecting direct_access, this patch assumes that the underlining >> > device cannot be disabled until dtr() is called. Is this correct? If >> > not, I will need to call dax_map_atomic(). 
>> >> One of the big design considerations for DM that a DM device can be >> suspended (with or without flush) and any new IO will be blocked until >> the DM device is resumed. >> >> So ideally DM should be able to have the same capability even if using >> DAX. > > Supporting suspend for DAX is challenging since it allows user applications to > access a device directly. Once a device range is mmap'd, there is no kernel > intervention to access the range, unless we invalidate user mappings. This > isn't done today even after a driver is unbind'd from a device. > >> But that is different than what commit b2e0d1625e19 is addressing. For >> DM, I wouldn't think you'd need the extra protections that >> dax_map_atomic() is providing given that the underlying block device >> lifetime is managed via DM core's dm_get_device/dm_put_device (see also: >> dm.c:open_table_device/close_table_device). > > I thought so as well. But I realized that there is (almost) nothing that can > prevent the unbind operation. It cannot fail, either. This unbind proceeds > even when a device is in-use. In case of a pmem device, it is only protected > by pmem_release_queue(), which is called when a pmem device is being deleted > and calls blk_cleanup_queue() to serialize a critical section between > blk_queue_enter() and blk_queue_exit() per b2e0d1625e19. This prevents from a > kernel DTLB fault, but does not prevent a device disappeared while in-use. > > Protecting DM's underlining device with blk_queue_enter() (or something > similar) requires more thoughts... blk_queue_enter() to a DM device cannot be > redirected to its underlining device. So, this is TBD for now. But I do not > think this is a blocker issue since doing unbind to a underlining device is > quite harmful no matter what we do - even if it is protected with > blk_queue_enter(). I still have the "block device removed" notification patches on my todo list. 
It's not a blocker, but there are scenarios where we can keep accessing memory via dax of a disabled device leading to memory corruption. I'll bump that up in my queue now that we are looking at additional scenarios where letting DAX mappings leak past the reconfiguration of a block device could lead to trouble.
On Wed, 2016-06-22 at 12:15 -0700, Dan Williams wrote: > On Wed, Jun 22, 2016 at 10:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: > > On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote: > > > > > > On Tue, Jun 21 2016 at 11:44am -0400, > > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > > > > > > > > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: > > > > > On Mon, Jun 20 2016 at 6:22pm -0400, > > > > > Mike Snitzer <snitzer@redhat.com> wrote: > > > > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX > > > > > rather than establish GENHD_FL_DAX on the genhd? > > > > > > > > > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) > > > > > to check for a queue flag. > > > > > > > > I think GENHD_FL_DAX is more appropriate since DAX does not use a > > > > request queue, except for protecting the underlining device being > > > > disabled while direct_access() is called (b2e0d1625e19). > > > > > > The devices in question have a request_queue. All bio-based device have > > > a request_queue. > > > > DAX-capable devices have two operation modes, bio-based and DAX. I agree > > that bio-based operation is associated with a request queue, and its > > capabilities should be set to it. DAX, on the other hand, is rather > > independent from a request queue. > > > > > I don't have a big problem with GENHD_FL_DAX. Just wanted to point out > > > that such block device capabilities are generally advertised in terms of > > > a QUEUE_FLAG. > > > > I do not have a strong opinion, but feel a bit odd to associate DAX to a > > request queue. > > Given that we do not support dax to a raw block device [1] it seems a > gendisk flag is more misleading than request_queue flag that specifies > what requests can be made of the device. > > [1]: acc93d30d7d4 Revert "block: enable dax for raw block devices" Oh, I see. I will change to use request_queue flag. 
> > > > About protecting direct_access, this patch assumes that the > > > > underlining device cannot be disabled until dtr() is called. Is this > > > > correct? If not, I will need to call dax_map_atomic(). > > > > > > One of the big design considerations for DM that a DM device can be > > > suspended (with or without flush) and any new IO will be blocked until > > > the DM device is resumed. > > > > > > So ideally DM should be able to have the same capability even if using > > > DAX. > > > > Supporting suspend for DAX is challenging since it allows user > > applications to access a device directly. Once a device range is mmap'd, > > there is no kernel intervention to access the range, unless we invalidate > > user mappings. This isn't done today even after a driver is unbind'd from > > a device. > > > > > But that is different than what commit b2e0d1625e19 is addressing. For > > > DM, I wouldn't think you'd need the extra protections that > > > dax_map_atomic() is providing given that the underlying block device > > > lifetime is managed via DM core's dm_get_device/dm_put_device (see also: > > > dm.c:open_table_device/close_table_device). > > > > I thought so as well. But I realized that there is (almost) nothing that > > can prevent the unbind operation. It cannot fail, either. This unbind > > proceeds even when a device is in-use. In case of a pmem device, it is > > only protected by pmem_release_queue(), which is called when a pmem device > > is being deleted and calls blk_cleanup_queue() to serialize a critical > > section between > > blk_queue_enter() and blk_queue_exit() per b2e0d1625e19. This prevents > > from a kernel DTLB fault, but does not prevent a device disappeared while > > in-use. > > > > Protecting DM's underlining device with blk_queue_enter() (or something > > similar) requires more thoughts... blk_queue_enter() to a DM device > > cannot be redirected to its underlining device. So, this is TBD for > > now. 
> > But I do not think this is a blocker issue since doing unbind on an
> > underlying device is quite harmful no matter what we do - even if it
> > is protected with blk_queue_enter().
>
> I still have the "block device removed" notification patches on my
> todo list.  It's not a blocker, but there are scenarios where we can
> keep accessing memory via dax of a disabled device leading to memory
> corruption.

Right, I noticed that user applications can access mmap'd ranges on a
disabled device.

> I'll bump that up in my queue now that we are looking at
> additional scenarios where letting DAX mappings leak past the
> reconfiguration of a block device could lead to trouble.

Great.  With DM, removing an underlying device while in use can lead to
trouble, esp. with RAID0.  Users need to remove a device from DM first...

Thanks,
-Toshi
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 325aa06..38323e4 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -148,10 +148,16 @@ static long linear_direct_access(struct dm_target *ti, sector_t sector,
 {
 	struct linear_c *lc = ti->private;
 	struct block_device *bdev = lc->dev->bdev;
-	const struct block_device_operations *bd_ops = bdev->bd_disk->fops;
-
-	return bd_ops->direct_access(bdev, linear_map_sector(ti, sector),
-				     kaddr, pfn, size);
+	struct blk_dax_ctl dax = {
+		.sector = linear_map_sector(ti, sector),
+		.size = size,
+	};
+	long ret;
+
+	ret = bdev_direct_access(bdev, &dax);
+	*kaddr = dax.addr;
+	*pfn = dax.pfn;
+	return ret;
 }