Message ID | 148559261791.11180.12085685820239925499.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sat, Jan 28, 2017 at 12:36:58AM -0800, Dan Williams wrote: > Provide a replacement for bdev_direct_access() that uses > dax_operations.direct_access() instead of > block_device_operations.direct_access(). Once all consumers of the old > api have been converted bdev_direct_access() will be deleted. > > Given that block device partitioning decisions can cause dax page > alignment constraints to be violated we still need to validate the > block_device before calling the dax ->direct_access method. > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > block/Kconfig | 1 + > drivers/dax/super.c | 33 +++++++++++++++++++++++++++++++++ > fs/block_dev.c | 28 ++++++++++++++++++++++++++++ > include/linux/blkdev.h | 3 +++ > include/linux/dax.h | 2 ++ > 5 files changed, 67 insertions(+) > > diff --git a/block/Kconfig b/block/Kconfig > index 8bf114a3858a..9be785173280 100644 > --- a/block/Kconfig > +++ b/block/Kconfig > @@ -6,6 +6,7 @@ menuconfig BLOCK > default y > select SBITMAP > select SRCU > + select DAX > help > Provide block layer support for the kernel. > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c > index eb844ffea3cf..ab5b082df5dd 100644 > --- a/drivers/dax/super.c > +++ b/drivers/dax/super.c > @@ -65,6 +65,39 @@ struct dax_inode { > const struct dax_operations *ops; > }; > > +long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr, > + void **kaddr, pfn_t *pfn, long size) > +{ > + long avail; > + > + /* > + * The device driver is allowed to sleep, in order to make the > + * memory directly accessible. > + */ > + might_sleep(); > + > + if (!dax_inode) > + return -EOPNOTSUPP; > + > + if (!dax_inode_alive(dax_inode)) > + return -ENXIO; > + > + if (size < 0) > + return size; > + > + if (dev_addr % PAGE_SIZE) > + return -EINVAL; > + > + avail = dax_inode->ops->direct_access(dax_inode, dev_addr, kaddr, pfn, > + size); > + if (!avail) > + return -ERANGE; > + if (avail > 0 && avail & ~PAGE_MASK) > + return -ENXIO; > + return min(avail, size); > +} > +EXPORT_SYMBOL_GPL(dax_direct_access); > + > bool dax_inode_alive(struct dax_inode *dax_inode) > { > lockdep_assert_held(&dax_srcu); > diff --git a/fs/block_dev.c b/fs/block_dev.c > index edb1d2b16b8f..bf4b51a3a412 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -18,6 +18,7 @@ > #include <linux/module.h> > #include <linux/blkpg.h> > #include <linux/magic.h> > +#include <linux/dax.h> > #include <linux/buffer_head.h> > #include <linux/swap.h> > #include <linux/pagevec.h> > @@ -763,6 +764,33 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax) > EXPORT_SYMBOL_GPL(bdev_direct_access); > > /** > + * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address > + * @bdev: host block device for @dax_inode > + * @dax_inode: interface data and operations for a memory device > + * @dax: control and output parameters for ->direct_access > + * > + * Return: negative errno if an error occurs, otherwise the number of bytes > + * accessible at this address. > + * > + * Locking: must be called with dax_read_lock() held > + */ > +long bdev_dax_direct_access(struct block_device *bdev, > + struct dax_inode *dax_inode, struct blk_dax_ctl *dax) > +{ > + sector_t sector = dax->sector; > + > + if (!blk_queue_dax(bdev->bd_queue)) > + return -EOPNOTSUPP; I don't think this should take a bdev - the caller should know if it has a dax_inode. Also if you touch this anyway can we kill the annoying struct blk_dax_ctl calling convention? Passing the four arguments explicitly is just a lot more readable and understandable. > + if ((sector + DIV_ROUND_UP(dax->size, 512)) > + > part_nr_sects_read(bdev->bd_part)) > + return -ERANGE; > + sector += get_start_sect(bdev); > + return dax_direct_access(dax_inode, sector * 512, &dax->addr, > + &dax->pfn, dax->size); And please switch to using bytes as the granularity given that we're deadling with byte addressable memory. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 30, 2017 at 4:32 AM, Christoph Hellwig <hch@lst.de> wrote: > On Sat, Jan 28, 2017 at 12:36:58AM -0800, Dan Williams wrote: >> Provide a replacement for bdev_direct_access() that uses >> dax_operations.direct_access() instead of >> block_device_operations.direct_access(). Once all consumers of the old >> api have been converted bdev_direct_access() will be deleted. >> >> Given that block device partitioning decisions can cause dax page >> alignment constraints to be violated we still need to validate the >> block_device before calling the dax ->direct_access method. >> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> >> --- >> block/Kconfig | 1 + >> drivers/dax/super.c | 33 +++++++++++++++++++++++++++++++++ >> fs/block_dev.c | 28 ++++++++++++++++++++++++++++ >> include/linux/blkdev.h | 3 +++ >> include/linux/dax.h | 2 ++ >> 5 files changed, 67 insertions(+) >> >> diff --git a/block/Kconfig b/block/Kconfig >> index 8bf114a3858a..9be785173280 100644 >> --- a/block/Kconfig >> +++ b/block/Kconfig >> @@ -6,6 +6,7 @@ menuconfig BLOCK >> default y >> select SBITMAP >> select SRCU >> + select DAX >> help >> Provide block layer support for the kernel. >> >> diff --git a/drivers/dax/super.c b/drivers/dax/super.c >> index eb844ffea3cf..ab5b082df5dd 100644 >> --- a/drivers/dax/super.c >> +++ b/drivers/dax/super.c >> @@ -65,6 +65,39 @@ struct dax_inode { >> const struct dax_operations *ops; >> }; >> >> +long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr, >> + void **kaddr, pfn_t *pfn, long size) >> +{ >> + long avail; >> + >> + /* >> + * The device driver is allowed to sleep, in order to make the >> + * memory directly accessible. >> + */ >> + might_sleep(); >> + >> + if (!dax_inode) >> + return -EOPNOTSUPP; >> + >> + if (!dax_inode_alive(dax_inode)) >> + return -ENXIO; >> + >> + if (size < 0) >> + return size; >> + >> + if (dev_addr % PAGE_SIZE) >> + return -EINVAL; >> + >> + avail = dax_inode->ops->direct_access(dax_inode, dev_addr, kaddr, pfn, >> + size); >> + if (!avail) >> + return -ERANGE; >> + if (avail > 0 && avail & ~PAGE_MASK) >> + return -ENXIO; >> + return min(avail, size); >> +} >> +EXPORT_SYMBOL_GPL(dax_direct_access); >> + >> bool dax_inode_alive(struct dax_inode *dax_inode) >> { >> lockdep_assert_held(&dax_srcu); >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index edb1d2b16b8f..bf4b51a3a412 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -18,6 +18,7 @@ >> #include <linux/module.h> >> #include <linux/blkpg.h> >> #include <linux/magic.h> >> +#include <linux/dax.h> >> #include <linux/buffer_head.h> >> #include <linux/swap.h> >> #include <linux/pagevec.h> >> @@ -763,6 +764,33 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax) >> EXPORT_SYMBOL_GPL(bdev_direct_access); >> >> /** >> + * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address >> + * @bdev: host block device for @dax_inode >> + * @dax_inode: interface data and operations for a memory device >> + * @dax: control and output parameters for ->direct_access >> + * >> + * Return: negative errno if an error occurs, otherwise the number of bytes >> + * accessible at this address. >> + * >> + * Locking: must be called with dax_read_lock() held >> + */ >> +long bdev_dax_direct_access(struct block_device *bdev, >> + struct dax_inode *dax_inode, struct blk_dax_ctl *dax) >> +{ >> + sector_t sector = dax->sector; >> + >> + if (!blk_queue_dax(bdev->bd_queue)) >> + return -EOPNOTSUPP; > > I don't think this should take a bdev - the caller should know if > it has a dax_inode. Also if you touch this anyway can we kill > the annoying struct blk_dax_ctl calling convention? Passing the > four arguments explicitly is just a lot more readable and understandable. Ok, now that dax_map_atomic() is gone, it's much easier to remove struct blk_dax_ctl. We can also move the partition alignment checks to be a one-time check at bdev_dax_capable() time and kill bdev_dax_direct_access() in favor of calling dax_direct_access() directly. >> + if ((sector + DIV_ROUND_UP(dax->size, 512)) >> + > part_nr_sects_read(bdev->bd_part)) >> + return -ERANGE; >> + sector += get_start_sect(bdev); >> + return dax_direct_access(dax_inode, sector * 512, &dax->addr, >> + &dax->pfn, dax->size); > > And please switch to using bytes as the granularity given that we're > deadling with byte addressable memory. dax_direct_access() does take a byte aligned physical address, but it needs to be at least page aligned since we are returning a pfn_t... Hmm, perhaps the input should be raw page frame number. We could reduce one of the arguments by making the current 'pfn_t *' parameter an in/out-parameter. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 30, 2017 at 10:16:29AM -0800, Dan Williams wrote: > Ok, now that dax_map_atomic() is gone, it's much easier to remove > struct blk_dax_ctl. > > We can also move the partition alignment checks to be a one-time check > at bdev_dax_capable() time and kill bdev_dax_direct_access() in favor > of calling dax_direct_access() directly. Yes, please. > >> + if ((sector + DIV_ROUND_UP(dax->size, 512)) > >> + > part_nr_sects_read(bdev->bd_part)) > >> + return -ERANGE; > >> + sector += get_start_sect(bdev); > >> + return dax_direct_access(dax_inode, sector * 512, &dax->addr, > >> + &dax->pfn, dax->size); > > > > And please switch to using bytes as the granularity given that we're > > deadling with byte addressable memory. > > dax_direct_access() does take a byte aligned physical address, but it > needs to be at least page aligned since we are returning a pfn_t... > > Hmm, perhaps the input should be raw page frame number. We could > reduce one of the arguments by making the current 'pfn_t *' parameter > an in/out-parameter. In/Out parameters are always a bit problematic in terms of API clarity. And updating a device-relative address with an absolute physical one sounds like an odd API for sure. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Feb 1, 2017 at 12:10 AM, Christoph Hellwig <hch@lst.de> wrote: > On Mon, Jan 30, 2017 at 10:16:29AM -0800, Dan Williams wrote: >> Ok, now that dax_map_atomic() is gone, it's much easier to remove >> struct blk_dax_ctl. >> >> We can also move the partition alignment checks to be a one-time check >> at bdev_dax_capable() time and kill bdev_dax_direct_access() in favor >> of calling dax_direct_access() directly. > > Yes, please. > >> >> + if ((sector + DIV_ROUND_UP(dax->size, 512)) >> >> + > part_nr_sects_read(bdev->bd_part)) >> >> + return -ERANGE; >> >> + sector += get_start_sect(bdev); >> >> + return dax_direct_access(dax_inode, sector * 512, &dax->addr, >> >> + &dax->pfn, dax->size); >> > >> > And please switch to using bytes as the granularity given that we're >> > deadling with byte addressable memory. >> >> dax_direct_access() does take a byte aligned physical address, but it >> needs to be at least page aligned since we are returning a pfn_t... >> >> Hmm, perhaps the input should be raw page frame number. We could >> reduce one of the arguments by making the current 'pfn_t *' parameter >> an in/out-parameter. > > In/Out parameters are always a bit problematic in terms of API clarity. > And updating a device-relative address with an absolute physical one > sounds like an odd API for sure. Yes, it does, and I thought better of it shortly after sending that. How about: long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, unsigned long nr_pages, void **kaddr, pfn_t *pfn) -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Feb 01, 2017 at 01:21:40AM -0800, Dan Williams wrote: > > In/Out parameters are always a bit problematic in terms of API clarity. > > And updating a device-relative address with an absolute physical one > > sounds like an odd API for sure. > > Yes, it does, and I thought better of it shortly after sending that. How about: > > long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, > unsigned long nr_pages, void **kaddr, pfn_t *pfn) Yes, that looks good to me. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/block/Kconfig b/block/Kconfig index 8bf114a3858a..9be785173280 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -6,6 +6,7 @@ menuconfig BLOCK default y select SBITMAP select SRCU + select DAX help Provide block layer support for the kernel. diff --git a/drivers/dax/super.c b/drivers/dax/super.c index eb844ffea3cf..ab5b082df5dd 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -65,6 +65,39 @@ struct dax_inode { const struct dax_operations *ops; }; +long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr, + void **kaddr, pfn_t *pfn, long size) +{ + long avail; + + /* + * The device driver is allowed to sleep, in order to make the + * memory directly accessible. + */ + might_sleep(); + + if (!dax_inode) + return -EOPNOTSUPP; + + if (!dax_inode_alive(dax_inode)) + return -ENXIO; + + if (size < 0) + return size; + + if (dev_addr % PAGE_SIZE) + return -EINVAL; + + avail = dax_inode->ops->direct_access(dax_inode, dev_addr, kaddr, pfn, + size); + if (!avail) + return -ERANGE; + if (avail > 0 && avail & ~PAGE_MASK) + return -ENXIO; + return min(avail, size); +} +EXPORT_SYMBOL_GPL(dax_direct_access); + bool dax_inode_alive(struct dax_inode *dax_inode) { lockdep_assert_held(&dax_srcu); diff --git a/fs/block_dev.c b/fs/block_dev.c index edb1d2b16b8f..bf4b51a3a412 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -18,6 +18,7 @@ #include <linux/module.h> #include <linux/blkpg.h> #include <linux/magic.h> +#include <linux/dax.h> #include <linux/buffer_head.h> #include <linux/swap.h> #include <linux/pagevec.h> @@ -763,6 +764,33 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax) EXPORT_SYMBOL_GPL(bdev_direct_access); /** + * bdev_dax_direct_access() - bdev-sector to pfn_t and kernel virtual address + * @bdev: host block device for @dax_inode + * @dax_inode: interface data and operations for a memory device + * @dax: control and output parameters for ->direct_access + * + * Return: negative errno if an error occurs, otherwise the number of bytes + * accessible at this address. + * + * Locking: must be called with dax_read_lock() held + */ +long bdev_dax_direct_access(struct block_device *bdev, + struct dax_inode *dax_inode, struct blk_dax_ctl *dax) +{ + sector_t sector = dax->sector; + + if (!blk_queue_dax(bdev->bd_queue)) + return -EOPNOTSUPP; + if ((sector + DIV_ROUND_UP(dax->size, 512)) + > part_nr_sects_read(bdev->bd_part)) + return -ERANGE; + sector += get_start_sect(bdev); + return dax_direct_access(dax_inode, sector * 512, &dax->addr, + &dax->pfn, dax->size); +} +EXPORT_SYMBOL_GPL(bdev_dax_direct_access); + +/** * bdev_dax_supported() - Check if the device supports dax for filesystem * @sb: The superblock of the device * @blocksize: The block size of the device diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 5e7706f7d533..3b3c5ce376fd 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1903,6 +1903,9 @@ extern int bdev_read_page(struct block_device *, sector_t, struct page *); extern int bdev_write_page(struct block_device *, sector_t, struct page *, struct writeback_control *); extern long bdev_direct_access(struct block_device *, struct blk_dax_ctl *); +struct dax_inode; +extern long bdev_dax_direct_access(struct block_device *bdev, + struct dax_inode *dax_inode, struct blk_dax_ctl *dax); extern int bdev_dax_supported(struct super_block *, int); #else /* CONFIG_BLOCK */ diff --git a/include/linux/dax.h b/include/linux/dax.h index 5aa620e8e5a2..2ef8e18e2587 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -22,6 +22,8 @@ void *dax_inode_get_private(struct dax_inode *dax_inode); void put_dax_inode(struct dax_inode *dax_inode); bool dax_inode_alive(struct dax_inode *dax_inode); void kill_dax_inode(struct dax_inode *dax_inode); +long dax_direct_access(struct dax_inode *dax_inode, phys_addr_t dev_addr, + void **kaddr, pfn_t *pfn, long size); /* * We use lowest available bit in exceptional entry for locking, one bit for
Provide a replacement for bdev_direct_access() that uses dax_operations.direct_access() instead of block_device_operations.direct_access(). Once all consumers of the old api have been converted bdev_direct_access() will be deleted. Given that block device partitioning decisions can cause dax page alignment constraints to be violated we still need to validate the block_device before calling the dax ->direct_access method. Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- block/Kconfig | 1 + drivers/dax/super.c | 33 +++++++++++++++++++++++++++++++++ fs/block_dev.c | 28 ++++++++++++++++++++++++++++ include/linux/blkdev.h | 3 +++ include/linux/dax.h | 2 ++ 5 files changed, 67 insertions(+) -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html