Message ID | 20190806105012.15170-1-vaibhav@linux.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 7f10fd29826c8604b064a7344037473d773ff443 |
Headers | show |
Series | [v3] ndctl, check: Ensure mmap of BTT sections work with 64K page-sizes | expand |
Vaibhav Jain <vaibhav@linux.ibm.com> writes: > On PPC64 which uses a 64K page-size, ndtl-check command fails on a BTT > namespace with following error: > > $ sudo ndctl check-namespace namespace0.0 -vv > namespace0.0: namespace_check: checking namespace0.0 > namespace0.0: btt_discover_arenas: found 1 BTT arena > namespace0.0: btt_create_mappings: mmap arena[0].info [sz = 0x1000, off = 0x1000] failed: Invalid argument > error checking namespaces: Invalid argument > checked 0 namespaces > > An error happens when btt_create_mappings() tries to mmap the sections > of the BTT device which are usually 4K offset aligned. However the > mmap() syscall expects the 'offset' argument to be in multiples of > page-size, hence it returns EINVAL causing the btt_create_mappings() > to error out. > > As a fix for the issue this patch proposes addition of two new > functions to 'check.c' namely btt_mmap/btt_unmap that can be used to > map/unmap parts of BTT device to ndctl process address-space. The > functions tweaks the requested 'offset' argument to mmap() ensuring > that its page-size aligned and then fix-ups the returned pointer such > that it points to the requested offset within mmapped region. > > With these newly introduced functions the patch updates the call-sites > in 'check.c' to use these functions instead of mmap/unmap syscalls. > Also since btt_mmap returns NULL if mmap operation fails, the > error check call-sites can be made against NULL instead of MAP_FAILED. > > Reported-by: Harish Sriram <harish@linux.ibm.com> > Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> > --- > Changelog: > v3: > * Fixed a log string that was splitted across multiple lines [Vishal] > > v2: > * Updated patch description to include canonical email address of bug > reporter. [Vishal] > * Improved the comment describing function btt_mmap() in 'check.c' > [Vishal] > * Simplified the argument list for btt_mmap() to just accept bttc, > length and offset. Other arguments for mmap() are derived from these > args. [Vishal] > * Renamed 'shift' variable in btt_mmap()/unmap() to 'page_offset' > [Vishal] > * Improved the dbg() messages logged inside > btt_mmap()/unmap(). [Vishal] > * Return NULL from btt_mmap() in case of an error. [Vishal] > * Changed the all sites to btt_mmap() to perform error check against > NULL return value instead of MAP_FAILED. [Vishal] > --- > ndctl/check.c | 93 +++++++++++++++++++++++++++++++++++++-------------- > 1 file changed, 67 insertions(+), 26 deletions(-) > > diff --git a/ndctl/check.c b/ndctl/check.c > index 8a7125053865..5969012eba84 100644 > --- a/ndctl/check.c > +++ b/ndctl/check.c > @@ -907,59 +907,100 @@ static int btt_discover_arenas(struct btt_chk *bttc) > return ret; > } > > -static int btt_create_mappings(struct btt_chk *bttc) > +/* > + * Wrap call to mmap(2) to work with btt device offsets that are not aligned > + * to system page boundary. It works by rounding down the requested offset > + * to sys_page_size when calling mmap(2) and then returning a fixed-up pointer > + * to the correct offset in the mmaped region. > + */ > +static void *btt_mmap(struct btt_chk *bttc, size_t length, off_t offset) > { > - struct arena_info *a; > - int mmap_flags; > - int i; > + off_t page_offset; > + int prot_flags; > + uint8_t *addr; > > if (!bttc->opts->repair) > - mmap_flags = PROT_READ; > + prot_flags = PROT_READ; > else > - mmap_flags = PROT_READ|PROT_WRITE; > + prot_flags = PROT_READ|PROT_WRITE; > + > + /* Calculate the page_offset from the system page boundary */ > + page_offset = offset - rounddown(offset, bttc->sys_page_size); > + > + /* Update the offset and length with the page_offset calculated above */ > + offset -= page_offset; > + length += page_offset; Don't you have to ensure that the length is also a multiple of the system page size? -Jeff > + > + addr = mmap(NULL, length, prot_flags, MAP_SHARED, bttc->fd, offset); > + > + /* If needed fixup the return pointer to correct offset requested */ > + if (addr != MAP_FAILED) > + addr += page_offset; > + > + dbg(bttc, "addr = %p, length = %#lx, offset = %#lx, page_offset = %#lx\n", > + (void *) addr, length, offset, page_offset); > + > + return addr == MAP_FAILED ? NULL : addr; > +} > + > +static void btt_unmap(struct btt_chk *bttc, void *ptr, size_t length) > +{ > + off_t page_offset; > + uintptr_t addr = (uintptr_t) ptr; > + > + /* Calculate the page_offset from system page boundary */ > + page_offset = addr - rounddown(addr, bttc->sys_page_size); > + > + addr -= page_offset; > + length += page_offset; > + > + munmap((void *) addr, length); > + dbg(bttc, "addr = %p, length = %#lx, page_offset = %#lx\n", > + (void *) addr, length, page_offset); > +} > + > +static int btt_create_mappings(struct btt_chk *bttc) > +{ > + struct arena_info *a; > + int i; > > for (i = 0; i < bttc->num_arenas; i++) { > a = &bttc->arena[i]; > a->map.info_len = BTT_INFO_SIZE; > - a->map.info = mmap(NULL, a->map.info_len, mmap_flags, > - MAP_SHARED, bttc->fd, a->infooff); > - if (a->map.info == MAP_FAILED) { > + a->map.info = btt_mmap(bttc, a->map.info_len, a->infooff); > + if (!a->map.info) { > err(bttc, "mmap arena[%d].info [sz = %#lx, off = %#lx] failed: %s\n", > i, a->map.info_len, a->infooff, strerror(errno)); > return -errno; > } > > a->map.data_len = a->mapoff - a->dataoff; > - a->map.data = mmap(NULL, a->map.data_len, mmap_flags, > - MAP_SHARED, bttc->fd, a->dataoff); > - if (a->map.data == MAP_FAILED) { > + a->map.data = btt_mmap(bttc, a->map.data_len, a->dataoff); > + if (!a->map.data) { > err(bttc, "mmap arena[%d].data [sz = %#lx, off = %#lx] failed: %s\n", > i, a->map.data_len, a->dataoff, strerror(errno)); > return -errno; > } > > a->map.map_len = a->logoff - a->mapoff; > - a->map.map = mmap(NULL, a->map.map_len, mmap_flags, > - MAP_SHARED, bttc->fd, a->mapoff); > - if (a->map.map == MAP_FAILED) { > + a->map.map = btt_mmap(bttc, a->map.map_len, a->mapoff); > + if (!a->map.map) { > err(bttc, "mmap arena[%d].map [sz = %#lx, off = %#lx] failed: %s\n", > i, a->map.map_len, a->mapoff, strerror(errno)); > return -errno; > } > > a->map.log_len = a->info2off - a->logoff; > - a->map.log = mmap(NULL, a->map.log_len, mmap_flags, > - MAP_SHARED, bttc->fd, a->logoff); > - if (a->map.log == MAP_FAILED) { > + a->map.log = btt_mmap(bttc, a->map.log_len, a->logoff); > + if (!a->map.log) { > err(bttc, "mmap arena[%d].log [sz = %#lx, off = %#lx] failed: %s\n", > i, a->map.log_len, a->logoff, strerror(errno)); > return -errno; > } > > a->map.info2_len = BTT_INFO_SIZE; > - a->map.info2 = mmap(NULL, a->map.info2_len, mmap_flags, > - MAP_SHARED, bttc->fd, a->info2off); > - if (a->map.info2 == MAP_FAILED) { > + a->map.info2 = btt_mmap(bttc, a->map.info2_len, a->info2off); > + if (!a->map.info2) { > err(bttc, "mmap arena[%d].info2 [sz = %#lx, off = %#lx] failed: %s\n", > i, a->map.info2_len, a->info2off, strerror(errno)); > return -errno; > @@ -977,15 +1018,15 @@ static void btt_remove_mappings(struct btt_chk *bttc) > for (i = 0; i < bttc->num_arenas; i++) { > a = &bttc->arena[i]; > if (a->map.info) > - munmap(a->map.info, a->map.info_len); > + btt_unmap(bttc, a->map.info, a->map.info_len); > if (a->map.data) > - munmap(a->map.data, a->map.data_len); > + btt_unmap(bttc, a->map.data, a->map.data_len); > if (a->map.map) > - munmap(a->map.map, a->map.map_len); > + btt_unmap(bttc, a->map.map, a->map.map_len); > if (a->map.log) > - munmap(a->map.log, a->map.log_len); > + btt_unmap(bttc, a->map.log, a->map.log_len); > if (a->map.info2) > - munmap(a->map.info2, a->map.info2_len); > + btt_unmap(bttc, a->map.info2, a->map.info2_len); > } > }
Thanks for reviewing this patch Jeff. Jeff Moyer <jmoyer@redhat.com> writes: > Vaibhav Jain <vaibhav@linux.ibm.com> writes: > >> On PPC64 which uses a 64K page-size, ndtl-check command fails on a BTT >> namespace with following error: >> >> $ sudo ndctl check-namespace namespace0.0 -vv >> namespace0.0: namespace_check: checking namespace0.0 >> namespace0.0: btt_discover_arenas: found 1 BTT arena >> namespace0.0: btt_create_mappings: mmap arena[0].info [sz = 0x1000, off = 0x1000] failed: Invalid argument >> error checking namespaces: Invalid argument >> checked 0 namespaces >> >> An error happens when btt_create_mappings() tries to mmap the sections >> of the BTT device which are usually 4K offset aligned. However the >> mmap() syscall expects the 'offset' argument to be in multiples of >> page-size, hence it returns EINVAL causing the btt_create_mappings() >> to error out. >> >> As a fix for the issue this patch proposes addition of two new >> functions to 'check.c' namely btt_mmap/btt_unmap that can be used to >> map/unmap parts of BTT device to ndctl process address-space. The >> functions tweaks the requested 'offset' argument to mmap() ensuring >> that its page-size aligned and then fix-ups the returned pointer such >> that it points to the requested offset within mmapped region. >> >> With these newly introduced functions the patch updates the call-sites >> in 'check.c' to use these functions instead of mmap/unmap syscalls. >> Also since btt_mmap returns NULL if mmap operation fails, the >> error check call-sites can be made against NULL instead of MAP_FAILED. >> >> Reported-by: Harish Sriram <harish@linux.ibm.com> >> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> >> --- >> Changelog: >> v3: >> * Fixed a log string that was splitted across multiple lines [Vishal] >> >> v2: >> * Updated patch description to include canonical email address of bug >> reporter. [Vishal] >> * Improved the comment describing function btt_mmap() in 'check.c' >> [Vishal] >> * Simplified the argument list for btt_mmap() to just accept bttc, >> length and offset. Other arguments for mmap() are derived from these >> args. [Vishal] >> * Renamed 'shift' variable in btt_mmap()/unmap() to 'page_offset' >> [Vishal] >> * Improved the dbg() messages logged inside >> btt_mmap()/unmap(). [Vishal] >> * Return NULL from btt_mmap() in case of an error. [Vishal] >> * Changed the all sites to btt_mmap() to perform error check against >> NULL return value instead of MAP_FAILED. [Vishal] >> --- >> ndctl/check.c | 93 +++++++++++++++++++++++++++++++++++++-------------- >> 1 file changed, 67 insertions(+), 26 deletions(-) >> >> diff --git a/ndctl/check.c b/ndctl/check.c >> index 8a7125053865..5969012eba84 100644 >> --- a/ndctl/check.c >> +++ b/ndctl/check.c >> @@ -907,59 +907,100 @@ static int btt_discover_arenas(struct btt_chk *bttc) >> return ret; >> } >> >> -static int btt_create_mappings(struct btt_chk *bttc) >> +/* >> + * Wrap call to mmap(2) to work with btt device offsets that are not aligned >> + * to system page boundary. It works by rounding down the requested offset >> + * to sys_page_size when calling mmap(2) and then returning a fixed-up pointer >> + * to the correct offset in the mmaped region. >> + */ >> +static void *btt_mmap(struct btt_chk *bttc, size_t length, off_t offset) >> { >> - struct arena_info *a; >> - int mmap_flags; >> - int i; >> + off_t page_offset; >> + int prot_flags; >> + uint8_t *addr; >> >> if (!bttc->opts->repair) >> - mmap_flags = PROT_READ; >> + prot_flags = PROT_READ; >> else >> - mmap_flags = PROT_READ|PROT_WRITE; >> + prot_flags = PROT_READ|PROT_WRITE; >> + >> + /* Calculate the page_offset from the system page boundary */ >> + page_offset = offset - rounddown(offset, bttc->sys_page_size); >> + >> + /* Update the offset and length with the page_offset calculated above */ >> + offset -= page_offset; >> + length += page_offset; > > Don't you have to ensure that the length is also a multiple of the > system page size? > > -Jeff No, as the BTT info header is 4K in size which isnt in multiple of page size on PPC64 where PAGE_SIZE == 64K. Also I see 'do_mmap()' in kernel always rounding up the 'length' to PAGE_SIZE. So any unaligned value for 'length' arg will be handled by the kernel. Finally mmap(2) doesn't put any constraint on the 'length' argument to mmap except it should > 0. The 'offset' arg on other hand has a constraint which needs to be in multiple of PAGE_SIZE. > >> + >> + addr = mmap(NULL, length, prot_flags, MAP_SHARED, bttc->fd, offset); >> + >> + /* If needed fixup the return pointer to correct offset requested */ >> + if (addr != MAP_FAILED) >> + addr += page_offset; >> + >> + dbg(bttc, "addr = %p, length = %#lx, offset = %#lx, page_offset = %#lx\n", >> + (void *) addr, length, offset, page_offset); >> + >> + return addr == MAP_FAILED ? NULL : addr; >> +} >> + >> +static void btt_unmap(struct btt_chk *bttc, void *ptr, size_t length) >> +{ >> + off_t page_offset; >> + uintptr_t addr = (uintptr_t) ptr; >> + >> + /* Calculate the page_offset from system page boundary */ >> + page_offset = addr - rounddown(addr, bttc->sys_page_size); >> + >> + addr -= page_offset; >> + length += page_offset; >> + >> + munmap((void *) addr, length); >> + dbg(bttc, "addr = %p, length = %#lx, page_offset = %#lx\n", >> + (void *) addr, length, page_offset); >> +} >> + >> +static int btt_create_mappings(struct btt_chk *bttc) >> +{ >> + struct arena_info *a; >> + int i; >> >> for (i = 0; i < bttc->num_arenas; i++) { >> a = &bttc->arena[i]; >> a->map.info_len = BTT_INFO_SIZE; >> - a->map.info = mmap(NULL, a->map.info_len, mmap_flags, >> - MAP_SHARED, bttc->fd, a->infooff); >> - if (a->map.info == MAP_FAILED) { >> + a->map.info = btt_mmap(bttc, a->map.info_len, a->infooff); >> + if (!a->map.info) { >> err(bttc, "mmap arena[%d].info [sz = %#lx, off = %#lx] failed: %s\n", >> i, a->map.info_len, a->infooff, strerror(errno)); >> return -errno; >> } >> >> a->map.data_len = a->mapoff - a->dataoff; >> - a->map.data = mmap(NULL, a->map.data_len, mmap_flags, >> - MAP_SHARED, bttc->fd, a->dataoff); >> - if (a->map.data == MAP_FAILED) { >> + a->map.data = btt_mmap(bttc, a->map.data_len, a->dataoff); >> + if (!a->map.data) { >> err(bttc, "mmap arena[%d].data [sz = %#lx, off = %#lx] failed: %s\n", >> i, a->map.data_len, a->dataoff, strerror(errno)); >> return -errno; >> } >> >> a->map.map_len = a->logoff - a->mapoff; >> - a->map.map = mmap(NULL, a->map.map_len, mmap_flags, >> - MAP_SHARED, bttc->fd, a->mapoff); >> - if (a->map.map == MAP_FAILED) { >> + a->map.map = btt_mmap(bttc, a->map.map_len, a->mapoff); >> + if (!a->map.map) { >> err(bttc, "mmap arena[%d].map [sz = %#lx, off = %#lx] failed: %s\n", >> i, a->map.map_len, a->mapoff, strerror(errno)); >> return -errno; >> } >> >> a->map.log_len = a->info2off - a->logoff; >> - a->map.log = mmap(NULL, a->map.log_len, mmap_flags, >> - MAP_SHARED, bttc->fd, a->logoff); >> - if (a->map.log == MAP_FAILED) { >> + a->map.log = btt_mmap(bttc, a->map.log_len, a->logoff); >> + if (!a->map.log) { >> err(bttc, "mmap arena[%d].log [sz = %#lx, off = %#lx] failed: %s\n", >> i, a->map.log_len, a->logoff, strerror(errno)); >> return -errno; >> } >> >> a->map.info2_len = BTT_INFO_SIZE; >> - a->map.info2 = mmap(NULL, a->map.info2_len, mmap_flags, >> - MAP_SHARED, bttc->fd, a->info2off); >> - if (a->map.info2 == MAP_FAILED) { >> + a->map.info2 = btt_mmap(bttc, a->map.info2_len, a->info2off); >> + if (!a->map.info2) { >> err(bttc, "mmap arena[%d].info2 [sz = %#lx, off = %#lx] failed: %s\n", >> i, a->map.info2_len, a->info2off, strerror(errno)); >> return -errno; >> @@ -977,15 +1018,15 @@ static void btt_remove_mappings(struct btt_chk *bttc) >> for (i = 0; i < bttc->num_arenas; i++) { >> a = &bttc->arena[i]; >> if (a->map.info) >> - munmap(a->map.info, a->map.info_len); >> + btt_unmap(bttc, a->map.info, a->map.info_len); >> if (a->map.data) >> - munmap(a->map.data, a->map.data_len); >> + btt_unmap(bttc, a->map.data, a->map.data_len); >> if (a->map.map) >> - munmap(a->map.map, a->map.map_len); >> + btt_unmap(bttc, a->map.map, a->map.map_len); >> if (a->map.log) >> - munmap(a->map.log, a->map.log_len); >> + btt_unmap(bttc, a->map.log, a->map.log_len); >> if (a->map.info2) >> - munmap(a->map.info2, a->map.info2_len); >> + btt_unmap(bttc, a->map.info2, a->map.info2_len); >> } >> } >
Vaibhav Jain <vaibhav@linux.ibm.com> writes: > Thanks for reviewing this patch Jeff. > > Jeff Moyer <jmoyer@redhat.com> writes: > >> Vaibhav Jain <vaibhav@linux.ibm.com> writes: >> >>> On PPC64 which uses a 64K page-size, ndtl-check command fails on a BTT >>> namespace with following error: >>> >>> $ sudo ndctl check-namespace namespace0.0 -vv >>> namespace0.0: namespace_check: checking namespace0.0 >>> namespace0.0: btt_discover_arenas: found 1 BTT arena >>> namespace0.0: btt_create_mappings: mmap arena[0].info [sz = 0x1000, off = 0x1000] failed: Invalid argument >>> error checking namespaces: Invalid argument >>> checked 0 namespaces >>> >>> An error happens when btt_create_mappings() tries to mmap the sections >>> of the BTT device which are usually 4K offset aligned. However the >>> mmap() syscall expects the 'offset' argument to be in multiples of >>> page-size, hence it returns EINVAL causing the btt_create_mappings() >>> to error out. >>> >>> As a fix for the issue this patch proposes addition of two new >>> functions to 'check.c' namely btt_mmap/btt_unmap that can be used to >>> map/unmap parts of BTT device to ndctl process address-space. The >>> functions tweaks the requested 'offset' argument to mmap() ensuring >>> that its page-size aligned and then fix-ups the returned pointer such >>> that it points to the requested offset within mmapped region. >>> >>> With these newly introduced functions the patch updates the call-sites >>> in 'check.c' to use these functions instead of mmap/unmap syscalls. >>> Also since btt_mmap returns NULL if mmap operation fails, the >>> error check call-sites can be made against NULL instead of MAP_FAILED. >>> >>> Reported-by: Harish Sriram <harish@linux.ibm.com> >>> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> >>> --- >>> Changelog: >>> v3: >>> * Fixed a log string that was splitted across multiple lines [Vishal] >>> >>> v2: >>> * Updated patch description to include canonical email address of bug >>> reporter. [Vishal] >>> * Improved the comment describing function btt_mmap() in 'check.c' >>> [Vishal] >>> * Simplified the argument list for btt_mmap() to just accept bttc, >>> length and offset. Other arguments for mmap() are derived from these >>> args. [Vishal] >>> * Renamed 'shift' variable in btt_mmap()/unmap() to 'page_offset' >>> [Vishal] >>> * Improved the dbg() messages logged inside >>> btt_mmap()/unmap(). [Vishal] >>> * Return NULL from btt_mmap() in case of an error. [Vishal] >>> * Changed the all sites to btt_mmap() to perform error check against >>> NULL return value instead of MAP_FAILED. [Vishal] >>> --- >>> ndctl/check.c | 93 +++++++++++++++++++++++++++++++++++++-------------- >>> 1 file changed, 67 insertions(+), 26 deletions(-) >>> >>> diff --git a/ndctl/check.c b/ndctl/check.c >>> index 8a7125053865..5969012eba84 100644 >>> --- a/ndctl/check.c >>> +++ b/ndctl/check.c >>> @@ -907,59 +907,100 @@ static int btt_discover_arenas(struct btt_chk *bttc) >>> return ret; >>> } >>> >>> -static int btt_create_mappings(struct btt_chk *bttc) >>> +/* >>> + * Wrap call to mmap(2) to work with btt device offsets that are not aligned >>> + * to system page boundary. It works by rounding down the requested offset >>> + * to sys_page_size when calling mmap(2) and then returning a fixed-up pointer >>> + * to the correct offset in the mmaped region. >>> + */ >>> +static void *btt_mmap(struct btt_chk *bttc, size_t length, off_t offset) >>> { >>> - struct arena_info *a; >>> - int mmap_flags; >>> - int i; >>> + off_t page_offset; >>> + int prot_flags; >>> + uint8_t *addr; >>> >>> if (!bttc->opts->repair) >>> - mmap_flags = PROT_READ; >>> + prot_flags = PROT_READ; >>> else >>> - mmap_flags = PROT_READ|PROT_WRITE; >>> + prot_flags = PROT_READ|PROT_WRITE; >>> + >>> + /* Calculate the page_offset from the system page boundary */ >>> + page_offset = offset - rounddown(offset, bttc->sys_page_size); >>> + >>> + /* Update the offset and length with the page_offset calculated above */ >>> + offset -= page_offset; >>> + length += page_offset; >> >> Don't you have to ensure that the length is also a multiple of the >> system page size? >> >> -Jeff > > No, as the BTT info header is 4K in size which isnt in multiple of page > size on PPC64 where PAGE_SIZE == 64K. > > Also I see 'do_mmap()' in kernel always rounding up the 'length' to > PAGE_SIZE. So any unaligned value for 'length' arg will be handled by the > kernel. > > Finally mmap(2) doesn't put any constraint on the 'length' argument to > mmap except it should > 0. The 'offset' arg on other hand has a > constraint which needs to be in multiple of PAGE_SIZE. Right, this is the part I forgot. I'd probaby write the map and unmap routines a bit differently, but what you wrote works. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> >> >>> + >>> + addr = mmap(NULL, length, prot_flags, MAP_SHARED, bttc->fd, offset); >>> + >>> + /* If needed fixup the return pointer to correct offset requested */ >>> + if (addr != MAP_FAILED) >>> + addr += page_offset; >>> + >>> + dbg(bttc, "addr = %p, length = %#lx, offset = %#lx, page_offset = %#lx\n", >>> + (void *) addr, length, offset, page_offset); >>> + >>> + return addr == MAP_FAILED ? NULL : addr; >>> +} >>> + >>> +static void btt_unmap(struct btt_chk *bttc, void *ptr, size_t length) >>> +{ >>> + off_t page_offset; >>> + uintptr_t addr = (uintptr_t) ptr; >>> + >>> + /* Calculate the page_offset from system page boundary */ >>> + page_offset = addr - rounddown(addr, bttc->sys_page_size); >>> + >>> + addr -= page_offset; >>> + length += page_offset; >>> + >>> + munmap((void *) addr, length); >>> + dbg(bttc, "addr = %p, length = %#lx, page_offset = %#lx\n", >>> + (void *) addr, length, page_offset); >>> +} >>> + >>> +static int btt_create_mappings(struct btt_chk *bttc) >>> +{ >>> + struct arena_info *a; >>> + int i; >>> >>> for (i = 0; i < bttc->num_arenas; i++) { >>> a = &bttc->arena[i]; >>> a->map.info_len = BTT_INFO_SIZE; >>> - a->map.info = mmap(NULL, a->map.info_len, mmap_flags, >>> - MAP_SHARED, bttc->fd, a->infooff); >>> - if (a->map.info == MAP_FAILED) { >>> + a->map.info = btt_mmap(bttc, a->map.info_len, a->infooff); >>> + if (!a->map.info) { >>> err(bttc, "mmap arena[%d].info [sz = %#lx, off = %#lx] failed: %s\n", >>> i, a->map.info_len, a->infooff, strerror(errno)); >>> return -errno; >>> } >>> >>> a->map.data_len = a->mapoff - a->dataoff; >>> - a->map.data = mmap(NULL, a->map.data_len, mmap_flags, >>> - MAP_SHARED, bttc->fd, a->dataoff); >>> - if (a->map.data == MAP_FAILED) { >>> + a->map.data = btt_mmap(bttc, a->map.data_len, a->dataoff); >>> + if (!a->map.data) { >>> err(bttc, "mmap arena[%d].data [sz = %#lx, off = %#lx] failed: %s\n", >>> i, a->map.data_len, a->dataoff, strerror(errno)); >>> return -errno; >>> } >>> >>> a->map.map_len = a->logoff - a->mapoff; >>> - a->map.map = mmap(NULL, a->map.map_len, mmap_flags, >>> - MAP_SHARED, bttc->fd, a->mapoff); >>> - if (a->map.map == MAP_FAILED) { >>> + a->map.map = btt_mmap(bttc, a->map.map_len, a->mapoff); >>> + if (!a->map.map) { >>> err(bttc, "mmap arena[%d].map [sz = %#lx, off = %#lx] failed: %s\n", >>> i, a->map.map_len, a->mapoff, strerror(errno)); >>> return -errno; >>> } >>> >>> a->map.log_len = a->info2off - a->logoff; >>> - a->map.log = mmap(NULL, a->map.log_len, mmap_flags, >>> - MAP_SHARED, bttc->fd, a->logoff); >>> - if (a->map.log == MAP_FAILED) { >>> + a->map.log = btt_mmap(bttc, a->map.log_len, a->logoff); >>> + if (!a->map.log) { >>> err(bttc, "mmap arena[%d].log [sz = %#lx, off = %#lx] failed: %s\n", >>> i, a->map.log_len, a->logoff, strerror(errno)); >>> return -errno; >>> } >>> >>> a->map.info2_len = BTT_INFO_SIZE; >>> - a->map.info2 = mmap(NULL, a->map.info2_len, mmap_flags, >>> - MAP_SHARED, bttc->fd, a->info2off); >>> - if (a->map.info2 == MAP_FAILED) { >>> + a->map.info2 = btt_mmap(bttc, a->map.info2_len, a->info2off); >>> + if (!a->map.info2) { >>> err(bttc, "mmap arena[%d].info2 [sz = %#lx, off = %#lx] failed: %s\n", >>> i, a->map.info2_len, a->info2off, strerror(errno)); >>> return -errno; >>> @@ -977,15 +1018,15 @@ static void btt_remove_mappings(struct btt_chk *bttc) >>> for (i = 0; i < bttc->num_arenas; i++) { >>> a = &bttc->arena[i]; >>> if (a->map.info) >>> - munmap(a->map.info, a->map.info_len); >>> + btt_unmap(bttc, a->map.info, a->map.info_len); >>> if (a->map.data) >>> - munmap(a->map.data, a->map.data_len); >>> + btt_unmap(bttc, a->map.data, a->map.data_len); >>> if (a->map.map) >>> - munmap(a->map.map, a->map.map_len); >>> + btt_unmap(bttc, a->map.map, a->map.map_len); >>> if (a->map.log) >>> - munmap(a->map.log, a->map.log_len); >>> + btt_unmap(bttc, a->map.log, a->map.log_len); >>> if (a->map.info2) >>> - munmap(a->map.info2, a->map.info2_len); >>> + btt_unmap(bttc, a->map.info2, a->map.info2_len); >>> } >>> } >>
diff --git a/ndctl/check.c b/ndctl/check.c index 8a7125053865..5969012eba84 100644 --- a/ndctl/check.c +++ b/ndctl/check.c @@ -907,59 +907,100 @@ static int btt_discover_arenas(struct btt_chk *bttc) return ret; } -static int btt_create_mappings(struct btt_chk *bttc) +/* + * Wrap call to mmap(2) to work with btt device offsets that are not aligned + * to system page boundary. It works by rounding down the requested offset + * to sys_page_size when calling mmap(2) and then returning a fixed-up pointer + * to the correct offset in the mmaped region. + */ +static void *btt_mmap(struct btt_chk *bttc, size_t length, off_t offset) { - struct arena_info *a; - int mmap_flags; - int i; + off_t page_offset; + int prot_flags; + uint8_t *addr; if (!bttc->opts->repair) - mmap_flags = PROT_READ; + prot_flags = PROT_READ; else - mmap_flags = PROT_READ|PROT_WRITE; + prot_flags = PROT_READ|PROT_WRITE; + + /* Calculate the page_offset from the system page boundary */ + page_offset = offset - rounddown(offset, bttc->sys_page_size); + + /* Update the offset and length with the page_offset calculated above */ + offset -= page_offset; + length += page_offset; + + addr = mmap(NULL, length, prot_flags, MAP_SHARED, bttc->fd, offset); + + /* If needed fixup the return pointer to correct offset requested */ + if (addr != MAP_FAILED) + addr += page_offset; + + dbg(bttc, "addr = %p, length = %#lx, offset = %#lx, page_offset = %#lx\n", + (void *) addr, length, offset, page_offset); + + return addr == MAP_FAILED ? NULL : addr; +} + +static void btt_unmap(struct btt_chk *bttc, void *ptr, size_t length) +{ + off_t page_offset; + uintptr_t addr = (uintptr_t) ptr; + + /* Calculate the page_offset from system page boundary */ + page_offset = addr - rounddown(addr, bttc->sys_page_size); + + addr -= page_offset; + length += page_offset; + + munmap((void *) addr, length); + dbg(bttc, "addr = %p, length = %#lx, page_offset = %#lx\n", + (void *) addr, length, page_offset); +} + +static int btt_create_mappings(struct btt_chk *bttc) +{ + struct arena_info *a; + int i; for (i = 0; i < bttc->num_arenas; i++) { a = &bttc->arena[i]; a->map.info_len = BTT_INFO_SIZE; - a->map.info = mmap(NULL, a->map.info_len, mmap_flags, - MAP_SHARED, bttc->fd, a->infooff); - if (a->map.info == MAP_FAILED) { + a->map.info = btt_mmap(bttc, a->map.info_len, a->infooff); + if (!a->map.info) { err(bttc, "mmap arena[%d].info [sz = %#lx, off = %#lx] failed: %s\n", i, a->map.info_len, a->infooff, strerror(errno)); return -errno; } a->map.data_len = a->mapoff - a->dataoff; - a->map.data = mmap(NULL, a->map.data_len, mmap_flags, - MAP_SHARED, bttc->fd, a->dataoff); - if (a->map.data == MAP_FAILED) { + a->map.data = btt_mmap(bttc, a->map.data_len, a->dataoff); + if (!a->map.data) { err(bttc, "mmap arena[%d].data [sz = %#lx, off = %#lx] failed: %s\n", i, a->map.data_len, a->dataoff, strerror(errno)); return -errno; } a->map.map_len = a->logoff - a->mapoff; - a->map.map = mmap(NULL, a->map.map_len, mmap_flags, - MAP_SHARED, bttc->fd, a->mapoff); - if (a->map.map == MAP_FAILED) { + a->map.map = btt_mmap(bttc, a->map.map_len, a->mapoff); + if (!a->map.map) { err(bttc, "mmap arena[%d].map [sz = %#lx, off = %#lx] failed: %s\n", i, a->map.map_len, a->mapoff, strerror(errno)); return -errno; } a->map.log_len = a->info2off - a->logoff; - a->map.log = mmap(NULL, a->map.log_len, mmap_flags, - MAP_SHARED, bttc->fd, a->logoff); - if (a->map.log == MAP_FAILED) { + a->map.log = btt_mmap(bttc, a->map.log_len, a->logoff); + if (!a->map.log) { err(bttc, "mmap arena[%d].log [sz = %#lx, off = %#lx] failed: %s\n", i, a->map.log_len, a->logoff, strerror(errno)); return -errno; } a->map.info2_len = BTT_INFO_SIZE; - a->map.info2 = mmap(NULL, a->map.info2_len, mmap_flags, - MAP_SHARED, bttc->fd, a->info2off); - if (a->map.info2 == MAP_FAILED) { + a->map.info2 = btt_mmap(bttc, a->map.info2_len, a->info2off); + if (!a->map.info2) { err(bttc, "mmap arena[%d].info2 [sz = %#lx, off = %#lx] failed: %s\n", i, a->map.info2_len, a->info2off, strerror(errno)); return -errno; @@ -977,15 +1018,15 @@ static void btt_remove_mappings(struct btt_chk *bttc) for (i = 0; i < bttc->num_arenas; i++) { a = &bttc->arena[i]; if (a->map.info) - munmap(a->map.info, a->map.info_len); + btt_unmap(bttc, a->map.info, a->map.info_len); if (a->map.data) - munmap(a->map.data, a->map.data_len); + btt_unmap(bttc, a->map.data, a->map.data_len); if (a->map.map) - munmap(a->map.map, a->map.map_len); + btt_unmap(bttc, a->map.map, a->map.map_len); if (a->map.log) - munmap(a->map.log, a->map.log_len); + btt_unmap(bttc, a->map.log, a->map.log_len); if (a->map.info2) - munmap(a->map.info2, a->map.info2_len); + btt_unmap(bttc, a->map.info2, a->map.info2_len); } }
On PPC64 which uses a 64K page-size, ndtl-check command fails on a BTT namespace with following error: $ sudo ndctl check-namespace namespace0.0 -vv namespace0.0: namespace_check: checking namespace0.0 namespace0.0: btt_discover_arenas: found 1 BTT arena namespace0.0: btt_create_mappings: mmap arena[0].info [sz = 0x1000, off = 0x1000] failed: Invalid argument error checking namespaces: Invalid argument checked 0 namespaces An error happens when btt_create_mappings() tries to mmap the sections of the BTT device which are usually 4K offset aligned. However the mmap() syscall expects the 'offset' argument to be in multiples of page-size, hence it returns EINVAL causing the btt_create_mappings() to error out. As a fix for the issue this patch proposes addition of two new functions to 'check.c' namely btt_mmap/btt_unmap that can be used to map/unmap parts of BTT device to ndctl process address-space. The functions tweaks the requested 'offset' argument to mmap() ensuring that its page-size aligned and then fix-ups the returned pointer such that it points to the requested offset within mmapped region. With these newly introduced functions the patch updates the call-sites in 'check.c' to use these functions instead of mmap/unmap syscalls. Also since btt_mmap returns NULL if mmap operation fails, the error check call-sites can be made against NULL instead of MAP_FAILED. Reported-by: Harish Sriram <harish@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> --- Changelog: v3: * Fixed a log string that was splitted across multiple lines [Vishal] v2: * Updated patch description to include canonical email address of bug reporter. [Vishal] * Improved the comment describing function btt_mmap() in 'check.c' [Vishal] * Simplified the argument list for btt_mmap() to just accept bttc, length and offset. Other arguments for mmap() are derived from these args. [Vishal] * Renamed 'shift' variable in btt_mmap()/unmap() to 'page_offset' [Vishal] * Improved the dbg() messages logged inside btt_mmap()/unmap(). [Vishal] * Return NULL from btt_mmap() in case of an error. [Vishal] * Changed the all sites to btt_mmap() to perform error check against NULL return value instead of MAP_FAILED. [Vishal] --- ndctl/check.c | 93 +++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 67 insertions(+), 26 deletions(-)