Message ID | 20170426224307.28075-1-vishal.l.verma@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Apr 26, 2017 at 3:43 PM, Vishal Verma <vishal.l.verma@intel.com> wrote: > If we had badblocks/poison in the metadata area of a BTT, recreating the > BTT would not clear the poison in all cases, notably the flog area. This > is because rw_bytes will only clear errors if the request being sent > down is 512B aligned and sized. > > Make sure that when writing the map and info blocks, the rw_bytes being > sent are of the correct size/alignment. For the flog, instead of doing > the smaller log_entry writes only, first do a 'wipe' of the entire area > by writing zeroes in large enough chunks so that errors get cleared. These eventually nsio_rw_bytes() while the namespace is claimed by a btt instance, so this collides with the hack to disable error clearing for btt. If we want to fix this up for 4.12 I think we need to pass a 'flags' parameter to the ->rw_bytes() routine to indicate that we are in atomic context (NVDIMM_IO_ATOMIC), rather than assuming that all BTT I/O is atomic. Care to code that up? I think we can include it in a pull request before the merge window closes.
On Mon, 2017-05-08 at 10:00 -0700, Dan Williams wrote: > On Wed, Apr 26, 2017 at 3:43 PM, Vishal Verma <vishal.l.verma@intel.c > om> wrote: > > If we had badblocks/poison in the metadata area of a BTT, > > recreating the > > BTT would not clear the poison in all cases, notably the flog area. > > This > > is because rw_bytes will only clear errors if the request being > > sent > > down is 512B aligned and sized. > > > > Make sure that when writing the map and info blocks, the rw_bytes > > being > > sent are of the correct size/alignment. For the flog, instead of > > doing > > the smaller log_entry writes only, first do a 'wipe' of the entire > > area > > by writing zeroes in large enough chunks so that errors get > > cleared. > > These eventually nsio_rw_bytes() while the namespace is claimed by a > btt instance, so this collides with the hack to disable error > clearing > for btt. If we want to fix this up for 4.12 I think we need to pass a > 'flags' parameter to the ->rw_bytes() routine to indicate that we are > in atomic context (NVDIMM_IO_ATOMIC), rather than assuming that all > BTT I/O is atomic. Care to code that up? I think we can include it in > a pull request before the merge window closes. Ah good point. You mean a flag to say that we're in process context right?We want to skip the clearing in atomic context..
On Mon, May 8, 2017 at 2:17 PM, Verma, Vishal L <vishal.l.verma@intel.com> wrote: > On Mon, 2017-05-08 at 10:00 -0700, Dan Williams wrote: >> On Wed, Apr 26, 2017 at 3:43 PM, Vishal Verma <vishal.l.verma@intel.c >> om> wrote: >> > If we had badblocks/poison in the metadata area of a BTT, >> > recreating the >> > BTT would not clear the poison in all cases, notably the flog area. >> > This >> > is because rw_bytes will only clear errors if the request being >> > sent >> > down is 512B aligned and sized. >> > >> > Make sure that when writing the map and info blocks, the rw_bytes >> > being >> > sent are of the correct size/alignment. For the flog, instead of >> > doing >> > the smaller log_entry writes only, first do a 'wipe' of the entire >> > area >> > by writing zeroes in large enough chunks so that errors get >> > cleared. >> >> These eventually nsio_rw_bytes() while the namespace is claimed by a >> btt instance, so this collides with the hack to disable error >> clearing >> for btt. If we want to fix this up for 4.12 I think we need to pass a >> 'flags' parameter to the ->rw_bytes() routine to indicate that we are >> in atomic context (NVDIMM_IO_ATOMIC), rather than assuming that all >> BTT I/O is atomic. Care to code that up? I think we can include it in >> a pull request before the merge window closes. > > Ah good point. You mean a flag to say that we're in process context > right?We want to skip the clearing in atomic context.. Either way, a flag to indicate atomic vs process so that we can decide whether to skip vs allow error clearing.
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index 368795a..6054e83 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -57,6 +57,14 @@ static int btt_info_write(struct arena_info *arena, struct btt_sb *super) { int ret; + /* + * infooff and info2off should always be at least 512B aligned. + * We rely on that to make sure rw_bytes does error clearing + * correctly, so make sure that is the case. + */ + WARN_ON_ONCE(!IS_ALIGNED(arena->infooff, 512)); + WARN_ON_ONCE(!IS_ALIGNED(arena->info2off, 512)); + ret = arena_write_bytes(arena, arena->info2off, super, sizeof(struct btt_sb)); if (ret) @@ -393,9 +401,17 @@ static int btt_map_init(struct arena_info *arena) if (!zerobuf) return -ENOMEM; + /* + * mapoff should always be at least 512B aligned. We rely on that to + * make sure rw_bytes does error clearing correctly, so make sure that + * is the case. + */ + WARN_ON_ONCE(!IS_ALIGNED(arena->mapoff, 512)); + while (mapsize) { size_t size = min(mapsize, chunk_size); + WARN_ON_ONCE(size < 512); ret = arena_write_bytes(arena, arena->mapoff + offset, zerobuf, size); if (ret) @@ -417,11 +433,36 @@ static int btt_map_init(struct arena_info *arena) */ static int btt_log_init(struct arena_info *arena) { + size_t logsize = arena->info2off - arena->logoff; + size_t chunk_size = SZ_4K, offset = 0; + struct log_entry log; + void *zerobuf; int ret; u32 i; - struct log_entry log, zerolog; - memset(&zerolog, 0, sizeof(zerolog)); + zerobuf = kzalloc(chunk_size, GFP_KERNEL); + if (!zerobuf) + return -ENOMEM; + /* + * logoff should always be at least 512B aligned. We rely on that to + * make sure rw_bytes does error clearing correctly, so make sure that + * is the case. + */ + WARN_ON_ONCE(!IS_ALIGNED(arena->logoff, 512)); + + while (logsize) { + size_t size = min(logsize, chunk_size); + + WARN_ON_ONCE(size < 512); + ret = arena_write_bytes(arena, arena->logoff + offset, zerobuf, + size); + if (ret) + goto free; + + offset += size; + logsize -= size; + cond_resched(); + } for (i = 0; i < arena->nfree; i++) { log.lba = cpu_to_le32(i); @@ -430,13 +471,12 @@ static int btt_log_init(struct arena_info *arena) log.seq = cpu_to_le32(LOG_SEQ_INIT); ret = __btt_log_write(arena, i, 0, &log); if (ret) - return ret; - ret = __btt_log_write(arena, i, 1, &zerolog); - if (ret) - return ret; + goto free; } - return 0; + free: + kfree(zerobuf); + return ret; } static int btt_freelist_init(struct arena_info *arena)
If we had badblocks/poison in the metadata area of a BTT, recreating the BTT would not clear the poison in all cases, notably the flog area. This is because rw_bytes will only clear errors if the request being sent down is 512B aligned and sized. Make sure that when writing the map and info blocks, the rw_bytes being sent are of the correct size/alignment. For the flog, instead of doing the smaller log_entry writes only, first do a 'wipe' of the entire area by writing zeroes in large enough chunks so that errors get cleared. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> --- drivers/nvdimm/btt.c | 54 +++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 47 insertions(+), 7 deletions(-)