mbox series

[ndctl,V2,0/8] fix serverl issues reported by Coverity

Message ID 3211fe8a-33fb-37ca-e192-ad1f116f4acd@huawei.com (mailing list archive)
Headers show
Series fix serverl issues reported by Coverity | expand

Message

Zhiqiang Liu Nov. 25, 2020, 1 a.m. UTC
Changes: V1->V2
- add one empty line in 1/8 patch as suggested by Jeff Moyer <jmoyer@redhat.com>.


Recently, we use Coverity to analysis the ndctl package.
Several issues should be resolved to make Coverity happy.

Zhiqiang Liu (8):
  namespace: check whether pfn|dax|btt is NULL in setup_namespace
  lib/libndctl: fix memory leakage problem in add_bus
  libdaxctl: fix memory leakage in add_dax_region()
  dimm: fix potential fd leakage in dimm_action()
  util/help: check whether strdup returns NULL in exec_man_konqueror
  lib/inject: check whether cmd is created successfully
  libndctl: check whether ndctl_btt_get_namespace returns NULL in
    callers
  namespace: check whether seed is NULL in validate_namespace_options

 daxctl/lib/libdaxctl.c |  3 +++
 ndctl/dimm.c           | 12 +++++++-----
 ndctl/lib/inject.c     |  8 ++++++++
 ndctl/lib/libndctl.c   |  1 +
 ndctl/namespace.c      | 23 ++++++++++++++++++-----
 test/libndctl.c        | 16 +++++++++++-----
 test/parent-uuid.c     |  2 +-
 util/help.c            |  8 +++++++-
 util/json.c            |  3 +++
 9 files changed, 59 insertions(+), 17 deletions(-)

Comments

Jane Chu Dec. 9, 2020, 12:20 a.m. UTC | #1
Hi,

I actually just ran into the NULL deref issue that is fixed here.

Bu I have a question for the experts:
what might cause libndctl to run into the NULL deref like below ?

Program terminated with signal 11, Segmentation fault.
#0  ndctl_pfn_get_bus (pfn=pfn@entry=0x0) at libndctl.c:5540
5540            return pfn->region->bus;

(gdb) print pfn
$1 = (struct ndctl_pfn *) 0x0
(gdb) frame 4
#4  0x000000000040ca70 in setup_namespace (region=region@entry=0x109d910,
     ndns=ndns@entry=0x10a7d40, p=p@entry=0x7ffd8ff73b90) at namespace.c:570
570                     try(ndctl_dax, set_uuid, dax, uuid);
(gdb) info locals
__rc = <optimized out>
dax = 0x0

What I did was to let 2 threads run "create-namespace all" in a tight 
loop, and 2 other threads run "destroy-namespace all" in a tight loop,
while chasing an year old issue that randomly resurfaces -
"nd_region region1: allocation underrun: 0x0 of 0x40000000 bytes"

In addition, there are kmemleaks,
# cat /sys/kernel/debug/kmemleak
[..]
unreferenced object 0xffff976bd46f6240 (size 64):
   comm "ndctl", pid 23556, jiffies 4299514316 (age 5406.733s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 00 20 c3 37 00 00 00  .......... .7...
     ff ff ff 7f 38 00 00 00 00 00 00 00 00 00 00 00  ....8...........
   backtrace:
     [<00000000064003cf>] __kmalloc_track_caller+0x136/0x379
     [<00000000d85e3c52>] krealloc+0x67/0x92
     [<00000000d7d3ba8a>] __alloc_dev_dax_range+0x73/0x25c
     [<0000000027d58626>] devm_create_dev_dax+0x27d/0x416
     [<00000000434abd43>] __dax_pmem_probe+0x1c9/0x1000 [dax_pmem_core]
     [<0000000083726c1c>] dax_pmem_probe+0x10/0x1f [dax_pmem]
     [<00000000b5f2319c>] nvdimm_bus_probe+0x9d/0x340 [libnvdimm]
     [<00000000c055e544>] really_probe+0x230/0x48d
     [<000000006cabd38e>] driver_probe_device+0x122/0x13b
     [<0000000029c7b95a>] device_driver_attach+0x5b/0x60
     [<0000000053e5659b>] bind_store+0xb7/0xc3
     [<00000000d3bdaadc>] drv_attr_store+0x27/0x31
     [<00000000949069c5>] sysfs_kf_write+0x4a/0x57
     [<000000004a8b5adf>] kernfs_fop_write+0x150/0x1e5
     [<00000000bded60f0>] __vfs_write+0x1b/0x34
     [<00000000b92900f0>] vfs_write+0xd8/0x1d1


thanks,
-jane


On 11/24/2020 5:00 PM, Zhiqiang Liu wrote:
> Changes: V1->V2
> - add one empty line in 1/8 patch as suggested by Jeff Moyer <jmoyer@redhat.com>.
> 
> 
> Recently, we use Coverity to analysis the ndctl package.
> Several issues should be resolved to make Coverity happy.
> 
> Zhiqiang Liu (8):
>    namespace: check whether pfn|dax|btt is NULL in setup_namespace
>    lib/libndctl: fix memory leakage problem in add_bus
>    libdaxctl: fix memory leakage in add_dax_region()
>    dimm: fix potential fd leakage in dimm_action()
>    util/help: check whether strdup returns NULL in exec_man_konqueror
>    lib/inject: check whether cmd is created successfully
>    libndctl: check whether ndctl_btt_get_namespace returns NULL in
>      callers
>    namespace: check whether seed is NULL in validate_namespace_options
> 
>   daxctl/lib/libdaxctl.c |  3 +++
>   ndctl/dimm.c           | 12 +++++++-----
>   ndctl/lib/inject.c     |  8 ++++++++
>   ndctl/lib/libndctl.c   |  1 +
>   ndctl/namespace.c      | 23 ++++++++++++++++++-----
>   test/libndctl.c        | 16 +++++++++++-----
>   test/parent-uuid.c     |  2 +-
>   util/help.c            |  8 +++++++-
>   util/json.c            |  3 +++
>   9 files changed, 59 insertions(+), 17 deletions(-)
>
Verma, Vishal L Dec. 17, 2020, 3:41 a.m. UTC | #2
On Wed, 2020-11-25 at 09:00 +0800, Zhiqiang Liu wrote:
> Changes: V1->V2
> - add one empty line in 1/8 patch as suggested by Jeff Moyer <jmoyer@redhat.com>.
> 
> 
> Recently, we use Coverity to analysis the ndctl package.
> Several issues should be resolved to make Coverity happy.
> 
> Zhiqiang Liu (8):
>   namespace: check whether pfn|dax|btt is NULL in setup_namespace
>   lib/libndctl: fix memory leakage problem in add_bus
>   libdaxctl: fix memory leakage in add_dax_region()
>   dimm: fix potential fd leakage in dimm_action()
>   util/help: check whether strdup returns NULL in exec_man_konqueror
>   lib/inject: check whether cmd is created successfully
>   libndctl: check whether ndctl_btt_get_namespace returns NULL in
>     callers
>   namespace: check whether seed is NULL in validate_namespace_options
> 
>  daxctl/lib/libdaxctl.c |  3 +++
>  ndctl/dimm.c           | 12 +++++++-----
>  ndctl/lib/inject.c     |  8 ++++++++
>  ndctl/lib/libndctl.c   |  1 +
>  ndctl/namespace.c      | 23 ++++++++++++++++++-----
>  test/libndctl.c        | 16 +++++++++++-----
>  test/parent-uuid.c     |  2 +-
>  util/help.c            |  8 +++++++-
>  util/json.c            |  3 +++
>  9 files changed, 59 insertions(+), 17 deletions(-)
> 
Hi Zhiquiang,

The patches look good, and I've applied them for v71. However one thing
to note:

If you're sending a v2, it is preferable to respin the whole series,
even if you're only changing a subset of (even a single) patch in the
series. That allows tools like 'b4' to just Do The Right Thing, and make
sure all the latest patches are grabbed.

In this case, especially, your cover letter promises 8 patches (0/8),
but there is only one that follows. This confuses 'b4':

   ERROR: missing [2/8]!
   ERROR: missing [3/8]!
   ERROR: missing [4/8]!
   ...etc

I've fixed it up manually for this, but just some things to consider for
the future.

Thanks,
-Vishal
Zhiqiang Liu Dec. 17, 2020, 6:18 a.m. UTC | #3
On 2020/12/17 11:41, Verma, Vishal L wrote:
> On Wed, 2020-11-25 at 09:00 +0800, Zhiqiang Liu wrote:
>> Changes: V1->V2
>> - add one empty line in 1/8 patch as suggested by Jeff Moyer <jmoyer@redhat.com>.
>>
>>
>> Recently, we use Coverity to analysis the ndctl package.
>> Several issues should be resolved to make Coverity happy.
>>
>> Zhiqiang Liu (8):
>>   namespace: check whether pfn|dax|btt is NULL in setup_namespace
>>   lib/libndctl: fix memory leakage problem in add_bus
>>   libdaxctl: fix memory leakage in add_dax_region()
>>   dimm: fix potential fd leakage in dimm_action()
>>   util/help: check whether strdup returns NULL in exec_man_konqueror
>>   lib/inject: check whether cmd is created successfully
>>   libndctl: check whether ndctl_btt_get_namespace returns NULL in
>>     callers
>>   namespace: check whether seed is NULL in validate_namespace_options
>>
>>  daxctl/lib/libdaxctl.c |  3 +++
>>  ndctl/dimm.c           | 12 +++++++-----
>>  ndctl/lib/inject.c     |  8 ++++++++
>>  ndctl/lib/libndctl.c   |  1 +
>>  ndctl/namespace.c      | 23 ++++++++++++++++++-----
>>  test/libndctl.c        | 16 +++++++++++-----
>>  test/parent-uuid.c     |  2 +-
>>  util/help.c            |  8 +++++++-
>>  util/json.c            |  3 +++
>>  9 files changed, 59 insertions(+), 17 deletions(-)
>>
> Hi Zhiquiang,
> 
> The patches look good, and I've applied them for v71. However one thing
> to note:
> 
> If you're sending a v2, it is preferable to respin the whole series,
> even if you're only changing a subset of (even a single) patch in the
> series. That allows tools like 'b4' to just Do The Right Thing, and make
> sure all the latest patches are grabbed.
> 
> In this case, especially, your cover letter promises 8 patches (0/8),
> but there is only one that follows. This confuses 'b4':
> 
>    ERROR: missing [2/8]!
>    ERROR: missing [3/8]!
>    ERROR: missing [4/8]!
>    ...etc
> 
> I've fixed it up manually for this, but just some things to consider for
> the future.
> 

Thanks for the suggestion。

Regards
Zhiqiang Liu

> Thanks,
> -Vishal
>
Dan Williams Dec. 17, 2020, 8:18 a.m. UTC | #4
On Tue, Dec 8, 2020 at 4:20 PM Jane Chu <jane.chu@oracle.com> wrote:
>
> Hi,
>
> I actually just ran into the NULL deref issue that is fixed here.
>
> Bu I have a question for the experts:
> what might cause libndctl to run into the NULL deref like below ?
>
> Program terminated with signal 11, Segmentation fault.
> #0  ndctl_pfn_get_bus (pfn=pfn@entry=0x0) at libndctl.c:5540
> 5540            return pfn->region->bus;
>
> (gdb) print pfn
> $1 = (struct ndctl_pfn *) 0x0
> (gdb) frame 4
> #4  0x000000000040ca70 in setup_namespace (region=region@entry=0x109d910,
>      ndns=ndns@entry=0x10a7d40, p=p@entry=0x7ffd8ff73b90) at namespace.c:570
> 570                     try(ndctl_dax, set_uuid, dax, uuid);
> (gdb) info locals
> __rc = <optimized out>
> dax = 0x0
>
> What I did was to let 2 threads run "create-namespace all" in a tight
> loop, and 2 other threads run "destroy-namespace all" in a tight loop,

This will definitely cause libndctl to get confused the expectation is
that only one ndctl instance is operating on one region at a time.
What likely happened is TOCTOU in ndctl_region_get_pfn_seed() where
the seed device name is destroyed by the time it tries to convert it
to an ndctl object. The reason libndctl does not perform its own
locking is to keep the library stateless and allow locking to imposed
from a higher level. Two ndctl instances must not be allowed to
operate in the same region otherwise the 2 libndctl instances will get
out of sync.

This is no different than 2 fdisk processes running at the same time,
they are going to invalidate each other's view of the cached partition
state. The fix is not for fdisk to implement locking internally
instead it requires the admin to arrange for only one fdisk to run
against one disk at a time.

> while chasing an year old issue that randomly resurfaces -
> "nd_region region1: allocation underrun: 0x0 of 0x40000000 bytes"

It would be interesting to get more data on the sequence of
allocations that lead up to this event, and a dump of the resource
tree when this happens:

for (i = 0; i < nd_region->ndr_mappings; i++)
    ndd = to_ndd(&nd_region->mapping[i]);
    for_each_dpa_resource(...)
        nd_dbg_dpa(...)


>
> In addition, there are kmemleaks,
> # cat /sys/kernel/debug/kmemleak
> [..]
> unreferenced object 0xffff976bd46f6240 (size 64):
>    comm "ndctl", pid 23556, jiffies 4299514316 (age 5406.733s)
>    hex dump (first 32 bytes):
>      00 00 00 00 00 00 00 00 00 00 20 c3 37 00 00 00  .......... .7...
>      ff ff ff 7f 38 00 00 00 00 00 00 00 00 00 00 00  ....8...........
>    backtrace:
>      [<00000000064003cf>] __kmalloc_track_caller+0x136/0x379
>      [<00000000d85e3c52>] krealloc+0x67/0x92
>      [<00000000d7d3ba8a>] __alloc_dev_dax_range+0x73/0x25c
>      [<0000000027d58626>] devm_create_dev_dax+0x27d/0x416
>      [<00000000434abd43>] __dax_pmem_probe+0x1c9/0x1000 [dax_pmem_core]
>      [<0000000083726c1c>] dax_pmem_probe+0x10/0x1f [dax_pmem]
>      [<00000000b5f2319c>] nvdimm_bus_probe+0x9d/0x340 [libnvdimm]
>      [<00000000c055e544>] really_probe+0x230/0x48d
>      [<000000006cabd38e>] driver_probe_device+0x122/0x13b
>      [<0000000029c7b95a>] device_driver_attach+0x5b/0x60
>      [<0000000053e5659b>] bind_store+0xb7/0xc3
>      [<00000000d3bdaadc>] drv_attr_store+0x27/0x31
>      [<00000000949069c5>] sysfs_kf_write+0x4a/0x57
>      [<000000004a8b5adf>] kernfs_fop_write+0x150/0x1e5
>      [<00000000bded60f0>] __vfs_write+0x1b/0x34
>      [<00000000b92900f0>] vfs_write+0xd8/0x1d1

Hmm... maybe this?

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 27513d311242..506549235e03 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1393,6 +1393,7 @@ struct dev_dax *devm_create_dev_dax(struct
dev_dax_data *data)
 err_pgmap:
        free_dev_dax_ranges(dev_dax);
 err_range:
+       kfree(dev_dax->ranges);
        free_dev_dax_id(dev_dax);
 err_id:
        kfree(dev_dax);