Message ID | 150846714747.24336.14704246566580871364.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> --- a/arch/powerpc/sysdev/axonram.c > +++ b/arch/powerpc/sysdev/axonram.c > @@ -172,6 +172,7 @@ static size_t axon_ram_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, > > static const struct dax_operations axon_ram_dax_ops = { > .direct_access = axon_ram_dax_direct_access, > + > .copy_from_iter = axon_ram_copy_from_iter, Unrelated whitespace change. That being said - I don't think axonram has devmap support in any form, so this basically becomes dead code, doesn't it? > diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c > index 7abb240847c0..e7e5db07e339 100644 > --- a/drivers/s390/block/dcssblk.c > +++ b/drivers/s390/block/dcssblk.c > @@ -52,6 +52,7 @@ static size_t dcssblk_dax_copy_from_iter(struct dax_device *dax_dev, > > static const struct dax_operations dcssblk_dax_ops = { > .direct_access = dcssblk_dax_direct_access, > + > .copy_from_iter = dcssblk_dax_copy_from_iter, Same comments apply here.
On Fri, Oct 20, 2017 at 12:57 AM, Christoph Hellwig <hch@lst.de> wrote: >> --- a/arch/powerpc/sysdev/axonram.c >> +++ b/arch/powerpc/sysdev/axonram.c >> @@ -172,6 +172,7 @@ static size_t axon_ram_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, >> >> static const struct dax_operations axon_ram_dax_ops = { >> .direct_access = axon_ram_dax_direct_access, >> + >> .copy_from_iter = axon_ram_copy_from_iter, > > Unrelated whitespace change. That being said - I don't think axonram has > devmap support in any form, so this basically becomes dead code, doesn't > it? > >> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c >> index 7abb240847c0..e7e5db07e339 100644 >> --- a/drivers/s390/block/dcssblk.c >> +++ b/drivers/s390/block/dcssblk.c >> @@ -52,6 +52,7 @@ static size_t dcssblk_dax_copy_from_iter(struct dax_device *dax_dev, >> >> static const struct dax_operations dcssblk_dax_ops = { >> .direct_access = dcssblk_dax_direct_access, >> + >> .copy_from_iter = dcssblk_dax_copy_from_iter, > > Same comments apply here. Yes, however it seems these drivers / platforms have been living with the lack of struct page for a long time. So they either don't use DAX, or they have a constrained use case that never triggers get_user_pages(). If it is the latter then they could introduce a new configuration option that bypasses the pfn_t_devmap() check in bdev_dax_supported() and fix up the get_user_pages() paths to fail. So, I'd like to understand how these drivers have been using DAX support without struct page to see if we need a workaround or we can go ahead delete this support. If the usage is limited to execute-in-place perhaps we can do a constrained ->direct_access() for just that case.
On Fri, Oct 20, 2017 at 08:23:02AM -0700, Dan Williams wrote: > Yes, however it seems these drivers / platforms have been living with > the lack of struct page for a long time. So they either don't use DAX, > or they have a constrained use case that never triggers > get_user_pages(). If it is the latter then they could introduce a new > configuration option that bypasses the pfn_t_devmap() check in > bdev_dax_supported() and fix up the get_user_pages() paths to fail. > So, I'd like to understand how these drivers have been using DAX > support without struct page to see if we need a workaround or we can > go ahead delete this support. If the usage is limited to > execute-in-place perhaps we can do a constrained ->direct_access() for > just that case. For axonram I doubt anyone is using it any more - it was a very for the IBM Cell blades, which were produceѕ in a rather limited number. And Cell basically seems to be dead as far as I can tell. For S/390 Martin might be able to help out what the status of xpram in general and DAX support in particular is.
On Fri, Oct 20, 2017 at 9:29 AM, Christoph Hellwig <hch@lst.de> wrote: > On Fri, Oct 20, 2017 at 08:23:02AM -0700, Dan Williams wrote: >> Yes, however it seems these drivers / platforms have been living with >> the lack of struct page for a long time. So they either don't use DAX, >> or they have a constrained use case that never triggers >> get_user_pages(). If it is the latter then they could introduce a new >> configuration option that bypasses the pfn_t_devmap() check in >> bdev_dax_supported() and fix up the get_user_pages() paths to fail. >> So, I'd like to understand how these drivers have been using DAX >> support without struct page to see if we need a workaround or we can >> go ahead delete this support. If the usage is limited to >> execute-in-place perhaps we can do a constrained ->direct_access() for >> just that case. > > For axonram I doubt anyone is using it any more - it was a very for > the IBM Cell blades, which were produceѕ in a rather limited number. > And Cell basically seems to be dead as far as I can tell. > > For S/390 Martin might be able to help out what the status of xpram > in general and DAX support in particular is. Ok, I'd also like to kill DAX support in the brd driver. It's a source of complexity and maintenance burden for zero benefit. It's the only ->direct_access() implementation that sleeps and it's the only implementation where there is a non-linear relationship between sectors and pfns. Having a 1:1 sector to pfn relationship will help with the dma-extent-busy management since we don't need to keep calling into the driver to map pfns back to sectors once we know the pfn[0] sector[0] relationship.
On Fri, Oct 20, 2017 at 03:29:57PM -0700, Dan Williams wrote: > Ok, I'd also like to kill DAX support in the brd driver. It's a source > of complexity and maintenance burden for zero benefit. It's the only > ->direct_access() implementation that sleeps and it's the only > implementation where there is a non-linear relationship between > sectors and pfns. Having a 1:1 sector to pfn relationship will help > with the dma-extent-busy management since we don't need to keep > calling into the driver to map pfns back to sectors once we know the > pfn[0] sector[0] relationship. But these are important things that other block devices may / will want. For example, I think it's entirely sensible to support ->direct_access for RAID-0. Dell are looking at various different options for having one pmemX device per DIMM and using RAID to lash them together. ->direct_access makes no sense for RAID-5 or RAID-1, but RAID-0 makes sense to me. Last time we tried to take sleeping out, there were grumblings from people with network block devices who thought they'd want to bring pages in across the network. I'm a bit less sympathetic to this because I don't know anyone actively working on it, but the RAID-0 case is something I think we should care about.
On Fri, Oct 20, 2017 at 8:20 PM, Matthew Wilcox <willy@infradead.org> wrote: > On Fri, Oct 20, 2017 at 03:29:57PM -0700, Dan Williams wrote: >> Ok, I'd also like to kill DAX support in the brd driver. It's a source >> of complexity and maintenance burden for zero benefit. It's the only >> ->direct_access() implementation that sleeps and it's the only >> implementation where there is a non-linear relationship between >> sectors and pfns. Having a 1:1 sector to pfn relationship will help >> with the dma-extent-busy management since we don't need to keep >> calling into the driver to map pfns back to sectors once we know the >> pfn[0] sector[0] relationship. > > But these are important things that other block devices may / will want. > > For example, I think it's entirely sensible to support ->direct_access > for RAID-0. Dell are looking at various different options for having > one pmemX device per DIMM and using RAID to lash them together. > ->direct_access makes no sense for RAID-5 or RAID-1, but RAID-0 makes > sense to me. > > Last time we tried to take sleeping out, there were grumblings from people > with network block devices who thought they'd want to bring pages in > across the network. I'm a bit less sympathetic to this because I don't > know anyone actively working on it, but the RAID-0 case is something I > think we should care about. True, good point. In fact we already support device-mapper striping with ->direct_access(). I'd still like to go ahead with the sleeping removal. When those folks come back and add network direct_access they can do the hard work of figuring out cases where we need to call direct_access in atomic contexts.
On Fri, Oct 20, 2017 at 09:16:21PM -0700, Dan Williams wrote: > > For example, I think it's entirely sensible to support ->direct_access > > for RAID-0. Dell are looking at various different options for having > > one pmemX device per DIMM and using RAID to lash them together. > > ->direct_access makes no sense for RAID-5 or RAID-1, but RAID-0 makes > > sense to me. > > > > Last time we tried to take sleeping out, there were grumblings from people > > with network block devices who thought they'd want to bring pages in > > across the network. I'm a bit less sympathetic to this because I don't > > know anyone actively working on it, but the RAID-0 case is something I > > think we should care about. > > True, good point. In fact we already support device-mapper striping > with ->direct_access(). I'd still like to go ahead with the sleeping > removal. When those folks come back and add network direct_access they > can do the hard work of figuring out cases where we need to call > direct_access in atomic contexts. It would be great to move DAX striping out of DM so that we don't need to keep fake block devices around just for that. In fact if Dell is so interested in it it would be great if they get a strip/concact table into ACPI so that the bios and OS can agree on it in a standardized way, and we can just implement it in the nvdimm layer. I agree that there is no reason at all to support sleeping in ->direct_access - it makes life painful for no gain at all. If you network access remote memory you will need local memory to support mmap, so we might as well use the page cache instead of reinventing it. (saying that with my remote pmem over NFS hat on).
On Fri, 20 Oct 2017 18:29:33 +0200 Christoph Hellwig <hch@lst.de> wrote: > On Fri, Oct 20, 2017 at 08:23:02AM -0700, Dan Williams wrote: > > Yes, however it seems these drivers / platforms have been living with > > the lack of struct page for a long time. So they either don't use DAX, > > or they have a constrained use case that never triggers > > get_user_pages(). If it is the latter then they could introduce a new > > configuration option that bypasses the pfn_t_devmap() check in > > bdev_dax_supported() and fix up the get_user_pages() paths to fail. > > So, I'd like to understand how these drivers have been using DAX > > support without struct page to see if we need a workaround or we can > > go ahead delete this support. If the usage is limited to > > execute-in-place perhaps we can do a constrained ->direct_access() for > > just that case. > > For axonram I doubt anyone is using it any more - it was a very for > the IBM Cell blades, which were produceѕ in a rather limited number. > And Cell basically seems to be dead as far as I can tell. > > For S/390 Martin might be able to help out what the status of xpram > in general and DAX support in particular is. The goes back to the time where DAX was called XIP. The initial design point has been *not* to have struct pages for a large read-only memory area. There is a block device driver for z/VM that maps a DCSS segment somewhere in memore (no struct page!) with e.g. the complete /usr filesystem. The xpram driver is a different beast and has nothing to do with XIP/DAX. Now, if any there are very few users of the dcssblk driver out there. The idea to save a few megabyte for /usr never really took of. We have to look at our get_user_pages() implementation to see how hard it would be to make it fail if the target address is for an area without struct pages.
On Sun, Oct 22, 2017 at 10:18 PM, Martin Schwidefsky <schwidefsky@de.ibm.com> wrote: > On Fri, 20 Oct 2017 18:29:33 +0200 > Christoph Hellwig <hch@lst.de> wrote: > >> On Fri, Oct 20, 2017 at 08:23:02AM -0700, Dan Williams wrote: >> > Yes, however it seems these drivers / platforms have been living with >> > the lack of struct page for a long time. So they either don't use DAX, >> > or they have a constrained use case that never triggers >> > get_user_pages(). If it is the latter then they could introduce a new >> > configuration option that bypasses the pfn_t_devmap() check in >> > bdev_dax_supported() and fix up the get_user_pages() paths to fail. >> > So, I'd like to understand how these drivers have been using DAX >> > support without struct page to see if we need a workaround or we can >> > go ahead delete this support. If the usage is limited to >> > execute-in-place perhaps we can do a constrained ->direct_access() for >> > just that case. >> >> For axonram I doubt anyone is using it any more - it was a very for >> the IBM Cell blades, which were produceѕ in a rather limited number. >> And Cell basically seems to be dead as far as I can tell. >> >> For S/390 Martin might be able to help out what the status of xpram >> in general and DAX support in particular is. > > The goes back to the time where DAX was called XIP. The initial design > point has been *not* to have struct pages for a large read-only memory > area. There is a block device driver for z/VM that maps a DCSS segment > somewhere in memore (no struct page!) with e.g. the complete /usr > filesystem. The xpram driver is a different beast and has nothing to > do with XIP/DAX. > > Now, if any there are very few users of the dcssblk driver out there. > The idea to save a few megabyte for /usr never really took of. > > We have to look at our get_user_pages() implementation to see how hard > it would be to make it fail if the target address is for an area without > struct pages. For read-only memory I think we can enable a subset of DAX, and explicitly turn off the paths that require get_user_pages(). However, I wonder if anyone has tested DAX with dcssblk because fork() requires get_user_pages()?
On Mon, 23 Oct 2017 01:55:20 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > On Sun, Oct 22, 2017 at 10:18 PM, Martin Schwidefsky > <schwidefsky@de.ibm.com> wrote: > > On Fri, 20 Oct 2017 18:29:33 +0200 > > Christoph Hellwig <hch@lst.de> wrote: > > > >> On Fri, Oct 20, 2017 at 08:23:02AM -0700, Dan Williams wrote: > >> > Yes, however it seems these drivers / platforms have been living with > >> > the lack of struct page for a long time. So they either don't use DAX, > >> > or they have a constrained use case that never triggers > >> > get_user_pages(). If it is the latter then they could introduce a new > >> > configuration option that bypasses the pfn_t_devmap() check in > >> > bdev_dax_supported() and fix up the get_user_pages() paths to fail. > >> > So, I'd like to understand how these drivers have been using DAX > >> > support without struct page to see if we need a workaround or we can > >> > go ahead delete this support. If the usage is limited to > >> > execute-in-place perhaps we can do a constrained ->direct_access() for > >> > just that case. > >> > >> For axonram I doubt anyone is using it any more - it was a very for > >> the IBM Cell blades, which were produceѕ in a rather limited number. > >> And Cell basically seems to be dead as far as I can tell. > >> > >> For S/390 Martin might be able to help out what the status of xpram > >> in general and DAX support in particular is. > > > > The goes back to the time where DAX was called XIP. The initial design > > point has been *not* to have struct pages for a large read-only memory > > area. There is a block device driver for z/VM that maps a DCSS segment > > somewhere in memore (no struct page!) with e.g. the complete /usr > > filesystem. The xpram driver is a different beast and has nothing to > > do with XIP/DAX. > > > > Now, if any there are very few users of the dcssblk driver out there. > > The idea to save a few megabyte for /usr never really took of. > > > > We have to look at our get_user_pages() implementation to see how hard > > it would be to make it fail if the target address is for an area without > > struct pages. > > For read-only memory I think we can enable a subset of DAX, and > explicitly turn off the paths that require get_user_pages(). However, > I wonder if anyone has tested DAX with dcssblk because fork() requires > get_user_pages()? I did not test it recently, someone else might have. Gerald? Looking at the code I see this in the s390 version of gup_pte_range: mask = (write ? _PAGE_PROTECT : 0) | _PAGE_INVALID | _PAGE_SPECIAL; ... if ((pte_val(pte) & mask) != 0) return 0; ... The XIP code used the pte_mkspecial mechanics to make it work. As far as I can see the pfn_t_devmap returns true for the DAX mappins, yes? Then I would say that dcssblk and DAX currently do not work together.
On Mon, Oct 23, 2017 at 3:44 AM, Martin Schwidefsky <schwidefsky@de.ibm.com> wrote: > On Mon, 23 Oct 2017 01:55:20 -0700 > Dan Williams <dan.j.williams@intel.com> wrote: > >> On Sun, Oct 22, 2017 at 10:18 PM, Martin Schwidefsky >> <schwidefsky@de.ibm.com> wrote: >> > On Fri, 20 Oct 2017 18:29:33 +0200 >> > Christoph Hellwig <hch@lst.de> wrote: >> > >> >> On Fri, Oct 20, 2017 at 08:23:02AM -0700, Dan Williams wrote: >> >> > Yes, however it seems these drivers / platforms have been living with >> >> > the lack of struct page for a long time. So they either don't use DAX, >> >> > or they have a constrained use case that never triggers >> >> > get_user_pages(). If it is the latter then they could introduce a new >> >> > configuration option that bypasses the pfn_t_devmap() check in >> >> > bdev_dax_supported() and fix up the get_user_pages() paths to fail. >> >> > So, I'd like to understand how these drivers have been using DAX >> >> > support without struct page to see if we need a workaround or we can >> >> > go ahead delete this support. If the usage is limited to >> >> > execute-in-place perhaps we can do a constrained ->direct_access() for >> >> > just that case. >> >> >> >> For axonram I doubt anyone is using it any more - it was a very for >> >> the IBM Cell blades, which were produceѕ in a rather limited number. >> >> And Cell basically seems to be dead as far as I can tell. >> >> >> >> For S/390 Martin might be able to help out what the status of xpram >> >> in general and DAX support in particular is. >> > >> > The goes back to the time where DAX was called XIP. The initial design >> > point has been *not* to have struct pages for a large read-only memory >> > area. There is a block device driver for z/VM that maps a DCSS segment >> > somewhere in memore (no struct page!) with e.g. the complete /usr >> > filesystem. The xpram driver is a different beast and has nothing to >> > do with XIP/DAX. >> > >> > Now, if any there are very few users of the dcssblk driver out there. >> > The idea to save a few megabyte for /usr never really took of. >> > >> > We have to look at our get_user_pages() implementation to see how hard >> > it would be to make it fail if the target address is for an area without >> > struct pages. >> >> For read-only memory I think we can enable a subset of DAX, and >> explicitly turn off the paths that require get_user_pages(). However, >> I wonder if anyone has tested DAX with dcssblk because fork() requires >> get_user_pages()? > > I did not test it recently, someone else might have. Gerald? > > Looking at the code I see this in the s390 version of gup_pte_range: > > mask = (write ? _PAGE_PROTECT : 0) | _PAGE_INVALID | _PAGE_SPECIAL; > ... > if ((pte_val(pte) & mask) != 0) > return 0; > ... > > The XIP code used the pte_mkspecial mechanics to make it work. As far as > I can see the pfn_t_devmap returns true for the DAX mappins, yes? Yes, but that's only for get_user_pages_fast() support. > Then I would say that dcssblk and DAX currently do not work together. I think at a minimum we need a new pfn_t flag for the 'special' bit to at least indicate that DAX mappings of dcssblk and axonram do not support normal get_user_pages(). Then I don't need to explicitly disable DAX in the !pfn_t_devmap() case. I think I also want to split the "pfn_to_virt()" and the "sector to pfn" operations into distinct dax_operations rather than doing both in one ->direct_access(). This supports storing pfns in the fs/dax radix rather than sectors. In other words, the pfn_t_devmap() requirement was only about making get_user_pages() safely fail, and pte_special() fills that requirement.
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c index c60e84e4558d..9da64d95e6f1 100644 --- a/arch/powerpc/sysdev/axonram.c +++ b/arch/powerpc/sysdev/axonram.c @@ -172,6 +172,7 @@ static size_t axon_ram_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, static const struct dax_operations axon_ram_dax_ops = { .direct_access = axon_ram_dax_direct_access, + .copy_from_iter = axon_ram_copy_from_iter, }; diff --git a/drivers/dax/super.c b/drivers/dax/super.c index b0cc8117eebe..26c324a5aef4 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -15,6 +15,7 @@ #include <linux/mount.h> #include <linux/magic.h> #include <linux/genhd.h> +#include <linux/pfn_t.h> #include <linux/cdev.h> #include <linux/hash.h> #include <linux/slab.h> @@ -123,6 +124,12 @@ int __bdev_dax_supported(struct super_block *sb, int blocksize) return len < 0 ? len : -EIO; } + if (!pfn_t_devmap(pfn)) { + pr_debug("VFS (%s): error: dax support not enabled\n", + sb->s_id); + return -EOPNOTSUPP; + } + return 0; } EXPORT_SYMBOL_GPL(__bdev_dax_supported); diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c index 7abb240847c0..e7e5db07e339 100644 --- a/drivers/s390/block/dcssblk.c +++ b/drivers/s390/block/dcssblk.c @@ -52,6 +52,7 @@ static size_t dcssblk_dax_copy_from_iter(struct dax_device *dax_dev, static const struct dax_operations dcssblk_dax_ops = { .direct_access = dcssblk_dax_direct_access, + .copy_from_iter = dcssblk_dax_copy_from_iter, };
If a dax buffer from a device that does not map pages is passed to read(2) or write(2) as a target for direct-I/O it triggers SIGBUS. If gdb attempts to examine the contents of a dax buffer from a device that does not map pages it triggers SIGBUS. If fork(2) is called on a process with a dax mapping from a device that does not map pages it triggers SIGBUS. 'struct page' is required otherwise several kernel code paths break in surprising ways. Disable filesystem-dax on devices that do not map pages. In addition to needing pfn_to_page() to be valid we also require devmap pages. We need this to detect dax pages in the get_user_pages_fast() path and so that we can stop managing the VM_MIXEDMAP flag. This impacts the dax drivers that do not use dev_memremap_pages(): brd, dcssblk, and axonram. Note that when the initial dax support was being merged a few years back there was concern that struct page was unsuitable for use with next generation persistent memory devices. The theoretical concern was that struct page access, being such a hotly used data structure in the kernel, would lead to media wear out. While that was a reasonable conservative starting position it has not held true in practice. We have long since committed to using devm_memremap_pages() to support higher order kernel functionality that needs get_user_pages() and pfn_to_page(). Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- arch/powerpc/sysdev/axonram.c | 1 + drivers/dax/super.c | 7 +++++++ drivers/s390/block/dcssblk.c | 1 + 3 files changed, 9 insertions(+)