nvdimm, btt: make sure initializing new metadata clears poison

Message ID 20170426224307.28075-1-vishal.l.verma@intel.com (mailing list archive)
State New, archived

Commit Message

Verma, Vishal L April 26, 2017, 10:43 p.m. UTC
If we had badblocks/poison in the metadata area of a BTT, recreating the
BTT would not clear the poison in all cases, notably the flog area. This
is because rw_bytes will only clear errors if the request being sent
down is 512B aligned and sized.

Make sure that when writing the map and info blocks, the rw_bytes being
sent are of the correct size/alignment. For the flog, instead of doing
the smaller log_entry writes only, first do a 'wipe' of the entire area
by writing zeroes in large enough chunks so that errors get cleared.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/nvdimm/btt.c | 54 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 47 insertions(+), 7 deletions(-)
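
The constraint the commit message leans on is that the nsio rw_bytes path only attempts to clear known badblocks when a write covers whole 512B sectors. Below is a simplified sketch of that gate; helper names approximate the nvdimm code of this era, but treat the exact signatures as assumptions rather than the literal upstream function:

/*
 * Illustrative sketch of the error-clearing gate in the rw_bytes path:
 * poison is only cleared when both offset and size are 512B aligned,
 * otherwise a write into a known-bad range simply fails.
 */
static int sketch_rw_bytes(struct nd_namespace_io *nsio,
		resource_size_t offset, void *buf, size_t size, int rw)
{
	if (rw == WRITE && is_bad_pmem(&nsio->bb, offset / 512,
				ALIGN(size, 512))) {
		if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, 512))
			nvdimm_clear_poison(&nsio->common.dev, offset, size);
		else
			return -EIO;	/* poison is left in place */
	}
	memcpy_to_pmem(nsio->addr + offset, buf, size);
	return 0;
}

This is why the small per-log_entry writes in btt_log_init() could leave flog poison behind, and why the patch zeroes the whole flog area in large, sector-aligned chunks first.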

Comments

Dan Williams May 8, 2017, 5 p.m. UTC | #1
On Wed, Apr 26, 2017 at 3:43 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
> If we had badblocks/poison in the metadata area of a BTT, recreating the
> BTT would not clear the poison in all cases, notably the flog area. This
> is because rw_bytes will only clear errors if the request being sent
> down is 512B aligned and sized.
>
> Make sure that when writing the map and info blocks, the rw_bytes being
> sent are of the correct size/alignment. For the flog, instead of doing
> the smaller log_entry writes only, first do a 'wipe' of the entire area
> by writing zeroes in large enough chunks so that errors get cleared.

These eventually call nsio_rw_bytes() while the namespace is claimed by a
btt instance, so this collides with the hack to disable error clearing
for btt. If we want to fix this up for 4.12 I think we need to pass a
'flags' parameter to the ->rw_bytes() routine to indicate that we are
in atomic context (NVDIMM_IO_ATOMIC), rather than assuming that all
BTT I/O is atomic. Care to code that up? I think we can include it in
a pull request before the merge window closes.
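
One plausible shape of the flags plumbing being suggested here; NVDIMM_IO_ATOMIC is the name from the mail above, and everything else is an assumption about how it might land:

/* Hypothetical sketch: thread an I/O-context flag through ->rw_bytes() */
enum {
	NVDIMM_IO_ATOMIC = (1 << 0),	/* caller may not sleep; skip error clearing */
};

/* in struct nd_namespace_common */
	int (*rw_bytes)(struct nd_namespace_common *ndns, resource_size_t offset,
			void *buf, size_t size, int rw, unsigned long flags);

BTT runtime I/O (the bio paths) would pass NVDIMM_IO_ATOMIC, while the btt_*_init() metadata writes would pass 0, leaving the lower layer free to clear poison during initialization.
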
Verma, Vishal L May 8, 2017, 9:17 p.m. UTC | #2
On Mon, 2017-05-08 at 10:00 -0700, Dan Williams wrote:
> On Wed, Apr 26, 2017 at 3:43 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
> > If we had badblocks/poison in the metadata area of a BTT, recreating the
> > BTT would not clear the poison in all cases, notably the flog area. This
> > is because rw_bytes will only clear errors if the request being sent
> > down is 512B aligned and sized.
> > 
> > Make sure that when writing the map and info blocks, the rw_bytes being
> > sent are of the correct size/alignment. For the flog, instead of doing
> > the smaller log_entry writes only, first do a 'wipe' of the entire area
> > by writing zeroes in large enough chunks so that errors get cleared.
> 
> These eventually call nsio_rw_bytes() while the namespace is claimed by a
> btt instance, so this collides with the hack to disable error
> clearing
> for btt. If we want to fix this up for 4.12 I think we need to pass a
> 'flags' parameter to the ->rw_bytes() routine to indicate that we are
> in atomic context (NVDIMM_IO_ATOMIC), rather than assuming that all
> BTT I/O is atomic. Care to code that up? I think we can include it in
> a pull request before the merge window closes.

Ah good point. You mean a flag to say that we're in process context,
right? We want to skip the clearing in atomic context.
Dan Williams May 8, 2017, 9:22 p.m. UTC | #3
On Mon, May 8, 2017 at 2:17 PM, Verma, Vishal L
<vishal.l.verma@intel.com> wrote:
> On Mon, 2017-05-08 at 10:00 -0700, Dan Williams wrote:
>> On Wed, Apr 26, 2017 at 3:43 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
>> > If we had badblocks/poison in the metadata area of a BTT, recreating the
>> > BTT would not clear the poison in all cases, notably the flog area. This
>> > is because rw_bytes will only clear errors if the request being sent
>> > down is 512B aligned and sized.
>> >
>> > Make sure that when writing the map and info blocks, the rw_bytes being
>> > sent are of the correct size/alignment. For the flog, instead of doing
>> > the smaller log_entry writes only, first do a 'wipe' of the entire area
>> > by writing zeroes in large enough chunks so that errors get cleared.
>>
>> These eventually call nsio_rw_bytes() while the namespace is claimed by a
>> btt instance, so this collides with the hack to disable error
>> clearing
>> for btt. If we want to fix this up for 4.12 I think we need to pass a
>> 'flags' parameter to the ->rw_bytes() routine to indicate that we are
>> in atomic context (NVDIMM_IO_ATOMIC), rather than assuming that all
>> BTT I/O is atomic. Care to code that up? I think we can include it in
>> a pull request before the merge window closes.
>
> Ah good point. You mean a flag to say that we're in process context,
> right? We want to skip the clearing in atomic context.

Either way, a flag to indicate atomic vs process so that we can decide
whether to skip vs allow error clearing.
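
A minimal sketch of how the lower layer could then honor such a flag, assuming the plumbing sketched earlier (illustrative only, not the final patch):

	/* In the rw_bytes write path, once the range is known to be bad: */
	if (flags & NVDIMM_IO_ATOMIC)
		return -EIO;	/* atomic caller: cannot clear poison here */
	if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, 512))
		nvdimm_clear_poison(&nsio->common.dev, offset, size);
	else
		return -EIO;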

Patch

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 368795a..6054e83 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -57,6 +57,14 @@  static int btt_info_write(struct arena_info *arena, struct btt_sb *super)
 {
 	int ret;
 
+	/*
+	 * infooff and info2off should always be at least 512B aligned.
+	 * We rely on that to make sure rw_bytes does error clearing
+	 * correctly, so make sure that is the case.
+	 */
+	WARN_ON_ONCE(!IS_ALIGNED(arena->infooff, 512));
+	WARN_ON_ONCE(!IS_ALIGNED(arena->info2off, 512));
+
 	ret = arena_write_bytes(arena, arena->info2off, super,
 			sizeof(struct btt_sb));
 	if (ret)
@@ -393,9 +401,17 @@  static int btt_map_init(struct arena_info *arena)
 	if (!zerobuf)
 		return -ENOMEM;
 
+	/*
+	 * mapoff should always be at least 512B aligned. We rely on that to
+	 * make sure rw_bytes does error clearing correctly, so make sure that
+	 * is the case.
+	 */
+	WARN_ON_ONCE(!IS_ALIGNED(arena->mapoff, 512));
+
 	while (mapsize) {
 		size_t size = min(mapsize, chunk_size);
 
+		WARN_ON_ONCE(size < 512);
 		ret = arena_write_bytes(arena, arena->mapoff + offset, zerobuf,
 				size);
 		if (ret)
@@ -417,11 +433,36 @@  static int btt_map_init(struct arena_info *arena)
  */
 static int btt_log_init(struct arena_info *arena)
 {
+	size_t logsize = arena->info2off - arena->logoff;
+	size_t chunk_size = SZ_4K, offset = 0;
+	struct log_entry log;
+	void *zerobuf;
 	int ret;
 	u32 i;
-	struct log_entry log, zerolog;
 
-	memset(&zerolog, 0, sizeof(zerolog));
+	zerobuf = kzalloc(chunk_size, GFP_KERNEL);
+	if (!zerobuf)
+		return -ENOMEM;
+	/*
+	 * logoff should always be at least 512B aligned. We rely on that to
+	 * make sure rw_bytes does error clearing correctly, so make sure that
+	 * is the case.
+	 */
+	WARN_ON_ONCE(!IS_ALIGNED(arena->logoff, 512));
+
+	while (logsize) {
+		size_t size = min(logsize, chunk_size);
+
+		WARN_ON_ONCE(size < 512);
+		ret = arena_write_bytes(arena, arena->logoff + offset, zerobuf,
+				size);
+		if (ret)
+			goto free;
+
+		offset += size;
+		logsize -= size;
+		cond_resched();
+	}
 
 	for (i = 0; i < arena->nfree; i++) {
 		log.lba = cpu_to_le32(i);
@@ -430,13 +471,12 @@  static int btt_log_init(struct arena_info *arena)
 		log.seq = cpu_to_le32(LOG_SEQ_INIT);
 		ret = __btt_log_write(arena, i, 0, &log);
 		if (ret)
-			return ret;
-		ret = __btt_log_write(arena, i, 1, &zerolog);
-		if (ret)
-			return ret;
+			goto free;
 	}
 
-	return 0;
+ free:
+	kfree(zerobuf);
+	return ret;
 }
 
 static int btt_freelist_init(struct arena_info *arena)