diff mbox

[v2,3/6] ext4: Use ext4_get_block_write() for DAX

Message ID 1435934443-17090-4-git-send-email-matthew.r.wilcox@intel.com
State New
Headers show

Commit Message

Wilcox, Matthew R July 3, 2015, 2:40 p.m. UTC
From: Matthew Wilcox <willy@linux.intel.com>

DAX relies on the get_block function either zeroing newly allocated blocks
before they're findable by subsequent calls to get_block, or marking newly
allocated blocks as unwritten.  ext4_get_block() cannot create unwritten
extents, but ext4_get_block_write() can.

Reported-by: Andy Rudoff <andy.rudoff@intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 fs/ext4/file.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Theodore Y. Ts'o July 3, 2015, 6:30 p.m. UTC | #1
On Fri, Jul 03, 2015 at 10:40:40AM -0400, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> DAX relies on the get_block function either zeroing newly allocated blocks
> before they're findable by subsequent calls to get_block, or marking newly
> allocated blocks as unwritten.  ext4_get_block() cannot create unwritten
> extents, but ext4_get_block_write() can.

To be clear, this patch has no prerequistes or dependencies, right?
That is, it would be fine if I take this through the ext4 git tree?
Or is there a reason or a preference for carrying this patch
somewhere else?

Also, is there a way I can test the DAX functionality in ext4 using
KVM?  If so, can you give me a cheat sheet about how I can do that?

Thanks,

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Wilcox July 3, 2015, 6:48 p.m. UTC | #2
On Fri, Jul 03, 2015 at 02:30:27PM -0400, Theodore Ts'o wrote:
> On Fri, Jul 03, 2015 at 10:40:40AM -0400, Matthew Wilcox wrote:
> > From: Matthew Wilcox <willy@linux.intel.com>
> > 
> > DAX relies on the get_block function either zeroing newly allocated blocks
> > before they're findable by subsequent calls to get_block, or marking newly
> > allocated blocks as unwritten.  ext4_get_block() cannot create unwritten
> > extents, but ext4_get_block_write() can.
> 
> To be clear, this patch has no prerequistes or dependencies, right?
> That is, it would be fine if I take this through the ext4 git tree?
> Or is there a reason or a preference for carrying this patch
> somewhere else?

Right, no dependencies or prerequisites, completely independent of all
the other patches.

> Also, is there a way I can test the DAX functionality in ext4 using
> KVM?  If so, can you give me a cheat sheet about how I can do that?

I don't use KVM, but I can tell you what I do ... (additional explanation
added, not for Ted's benefit, but because people less familiar with
Linux than Ted is may happen upon this email for their own purposes).

In /etc/default/grub, I have this line:

GRUB_CMDLINE_LINUX="memmap=4G!4G"

        memmap=nn[KMG]!ss[KMG]
                        [KNL,X86] Mark specific memory as protected.
                        Region of memory to be used, from ss to ss+nn.
                        The memory region may be marked as e820 type 12 (0xc)
                        and is NVDIMM or ADR memory.


In my kernel config, I have:

CONFIG_X86_PMEM_LEGACY=y
CONFIG_LIBNVDIMM=y
CONFIG_BLK_DEV_PMEM=m

At boot, I "modprobe pmem".  On the desktop-class system I'm using as
my development machine, the BIOS doesn't clear RAM between boots (only
power cycles), so the partition table and ext4 filesystem stays good,
and all I have to do is:

mount -odax /dev/pmem0p1 /mnt/ram0/

Also my xfstests local.config:

TEST_DEV=/dev/pmem0p1
TEST_DIR=/mnt/ram0
SCRATCH_DEV=/dev/pmem0p2
SCRATCH_MNT=/mnt/ram1
TEST_FS_MOUNT_OPTS="-o dax"
EXT_MOUNT_OPTIONS="-o dax"
MKFS_OPTIONS="-b4096"

Hope I didn't forget anything.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Y. Ts'o July 3, 2015, 7:07 p.m. UTC | #3
On Fri, Jul 03, 2015 at 02:48:24PM -0400, Matthew Wilcox wrote:
> 
> At boot, I "modprobe pmem".

Is there a reason why it's important to build and load pmem as a
module?  If I use CONFIG_BLK_DEV_PMEM=y (which is more convenient
given how I launch my KVM test appliance), should I expect any
problems?

I assume that this won't detect any bugs caused by missing CLFLUSH
instructions, but I assume that when using NVM as a block device, this
isn't much of an issue, as long as we don't care about torn writes?
(How using NVM with metdata checksums, or any checksums for that
matter, seems to be an interesting question --- how do we recover from
a checksum failure after a power failure?)

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boaz Harrosh July 5, 2015, 1:29 p.m. UTC | #4
On 07/03/2015 10:07 PM, Theodore Ts'o wrote:
> On Fri, Jul 03, 2015 at 02:48:24PM -0400, Matthew Wilcox wrote:
>>
>> At boot, I "modprobe pmem".
> 
> Is there a reason why it's important to build and load pmem as a
> module?  If I use CONFIG_BLK_DEV_PMEM=y (which is more convenient
> given how I launch my KVM test appliance), should I expect any
> problems?
> 

This (=y) should work fine. We use it a lot. (with KVM even boot an
image with -dax, with a trick)

Note that DAX need not be tested with pmem only, you can always use brd
at any given point without any reboot.

One more trick for xfstest I use:
	memmap=2G!4G,2G!6G

And have two pmem0/1 and don't need to bother with any fdisk. Do need
to mkfs every boot though.

BTW: with kvm a reboot with above memmap will give you back the exact
same memory. halt and "virsh start" is a different story.

> I assume that this won't detect any bugs caused by missing CLFLUSH
> instructions, but I assume that when using NVM as a block device, this
> isn't much of an issue, as long as we don't care about torn writes?
> (How using NVM with metdata checksums, or any checksums for that
> matter, seems to be an interesting question --- how do we recover from
> a checksum failure after a power failure?)
> 

Currently pmem maps a very-(very) slow ioremap_nocache. So any Kernel
memory access should be pmem persistent. For a real world faster ioremap,
there are few major pieces still missing in the stack to make it
persistent. Note the even today with  ioremap_nocache, any application
mmap (like git) is not persistent.

> 					- Ted

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index bc313ac..e5bdcb7 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -195,7 +195,7 @@  out:
 static void ext4_end_io_unwritten(struct buffer_head *bh, int uptodate)
 {
 	struct inode *inode = bh->b_assoc_map->host;
-	/* XXX: breaks on 32-bit > 16GB. Is that even supported? */
+	/* XXX: breaks on 32-bit > 16TB. Is that even supported? */
 	loff_t offset = (loff_t)(uintptr_t)bh->b_private << inode->i_blkbits;
 	int err;
 	if (!uptodate)
@@ -206,13 +206,13 @@  static void ext4_end_io_unwritten(struct buffer_head *bh, int uptodate)
 
 static int ext4_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	return dax_fault(vma, vmf, ext4_get_block, ext4_end_io_unwritten);
-					/* Is this the right get_block? */
+	return dax_fault(vma, vmf, ext4_get_block_write, ext4_end_io_unwritten);
 }
 
 static int ext4_dax_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	return dax_mkwrite(vma, vmf, ext4_get_block, ext4_end_io_unwritten);
+	return dax_mkwrite(vma, vmf, ext4_get_block_write,
+				ext4_end_io_unwritten);
 }
 
 static const struct vm_operations_struct ext4_dax_vm_ops = {