
Panic when make check for ndctl

Message ID 20170614114155.6D1E.E1E9C6FF@jp.fujitsu.com (mailing list archive)
State New, archived

Commit Message

Gotou, Yasunori/五島 康文 June 14, 2017, 2:41 a.m. UTC
Hi, Dan-san, Linda-san,

I chased down the root cause of this panic, and I think I have found it.

> > > Hmmm, though I made Fedora 25 environment, this panic still occurs...
> > > I'll attach syslog and .config again.
> > >
> > >
> > [..]
> > > [  117.804948] general protection fault: 0000 [#1] SMP
> > [..]
> > > [  117.820866] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > [..]
> > > [  117.843262] Call Trace:
> > > [  117.843985]  release_nodes+0x76/0x260
> > > [  117.845062]  devres_release_all+0x3c/0x50
> > > [  117.846225]  device_release_driver_internal+0x159/0x200
> > > [  117.847748]  device_release_driver+0x12/0x20
> > > [  117.849029]  bus_remove_device+0xfd/0x170
> > > [  117.850192]  device_del+0x1e8/0x330
> > > [  117.851284]  platform_device_del+0x28/0x90
> > > [  117.852485]  platform_device_unregister+0x12/0x30
> > > [  117.853846]  nfit_test_exit+0x2a/0x93b [nfit_test]
> > > [  117.855219]  SyS_delete_module+0x171/0x250
> > > [  117.856403]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> > 
> > Can you also attach the qemu-kvm command line you are using?
> > 
> >     ps aux | grep qemu


The cause of this problem is that num_pm for nfit_test1 is wrong.
nfit_test_init() sets num_pm to 1, but it must be 2.

----
static __init int nfit_test_init(void)
{
        int rc, i;
        :
        :
                case 1:
                        nfit_test->num_pm = 1;            <---- !!!
                        nfit_test->dcr_idx = NUM_DCR;
-----

num_pm determines the size of the devm_kcalloc() allocations in nfit_test_probe().

----
static int nfit_test_probe(struct platform_device *pdev)
{
        if (nfit_test->num_pm) {
                int num = nfit_test->num_pm;   <----!!!

                nfit_test->spa_set = devm_kcalloc(dev, num, sizeof(void *),
                                GFP_KERNEL);   <---!!!!
                nfit_test->spa_set_dma = devm_kcalloc(dev, num,   
                                sizeof(dma_addr_t), GFP_KERNEL);
-----
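
To make the sizing concrete, here is a small user-space sketch (not kernel
code) of the arithmetic behind a calloc-style allocation. The helper names
array_bytes() and element_fits() are hypothetical and exist only for this
illustration; the point is that an array allocated with a count of 1 has room
for index 0 only.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes covered by an array of n elements of elem_size bytes each. */
static size_t array_bytes(size_t n, size_t elem_size)
{
	return n * elem_size;
}

/* Nonzero if element idx lies entirely inside an n-element array. */
static int element_fits(size_t idx, size_t n, size_t elem_size)
{
	assert(elem_size > 0);
	return (idx + 1) * elem_size <= array_bytes(n, elem_size);
}
```

With a count of 1 and sizeof(void *) as the element size, element_fits()
accepts index 0 and rejects index 1; with a count of 2 it accepts both.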

However, nfit_test1_alloc() needs two elements in the spa_set[] array.

---
static int nfit_test1_alloc(struct nfit_test *t)
{
               :
        t->spa_set[0] = test_alloc(t, SPA2_SIZE, &t->spa_set_dma[0]);  <--- first element
        if (!t->spa_set[0])
                return -ENOMEM;
               :

        t->spa_set[1] = test_alloc(t, SPA_VCD_SIZE, &t->spa_set_dma[1]);  <---- The second element!!!!
-----

The write to the second element goes past the end of the allocation and
corrupts adjacent memory, which is often the devres linked list.
As a result, the panic occurred in release_nodes().
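
The mismatch can be modelled in user space as follows. This is only a sketch
under stated assumptions: struct model_test and the model_*() helpers are
hypothetical stand-ins, calloc() takes the place of devm_kcalloc(), and the
bounds check in model_set() is added for illustration. The code in nfit.c has
no such check, so there the out-of-range store silently lands in neighbouring
memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant fields of struct nfit_test. */
struct model_test {
	size_t num_pm;		/* number of slots allocated for spa_set */
	void **spa_set;
};

/* Allocate spa_set with num_pm slots, mirroring the sizing in probe. */
static int model_probe(struct model_test *t, size_t num_pm)
{
	assert(t != NULL);
	t->num_pm = num_pm;
	t->spa_set = calloc(num_pm, sizeof(void *));
	return t->spa_set ? 0 : -1;
}

/* Store into slot idx only if it lies inside the allocation. */
static int model_set(struct model_test *t, size_t idx, void *val)
{
	assert(t != NULL);
	if (idx >= t->num_pm)
		return -1;	/* unchecked code would write past the end here */
	t->spa_set[idx] = val;
	return 0;
}

/* Free the array and reset the model. */
static void model_release(struct model_test *t)
{
	assert(t != NULL);
	free(t->spa_set);
	t->spa_set = NULL;
	t->num_pm = 0;
}
```

With num_pm = 1, model_set() accepts index 0 and rejects index 1; with
num_pm = 2, both stores succeed, which matches the behaviour expected after
the fix.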

I confirmed that the panic no longer occurs with the following patch.

---
----

Thanks,
Yasunori Goto

Comments

Dan Williams June 14, 2017, 3:05 a.m. UTC | #1
On Tue, Jun 13, 2017 at 7:41 PM, Yasunori Goto <y-goto@jp.fujitsu.com> wrote:
> Hi, Dan-san, Linda-san,
>
> I had chased the root cause of this panic problem, and maybe I found it.
>
>> > > Hmmm, though I made Fedora 25 environment, this panic still occurs...
>> > > I'll attach syslog and .config again.
>> > >
>> > >
>> > [..]
>> > > [  117.804948] general protection fault: 0000 [#1] SMP
>> > [..]
>> > > [  117.820866] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> > [..]
>> > > [  117.843262] Call Trace:
>> > > [  117.843985]  release_nodes+0x76/0x260
>> > > [  117.845062]  devres_release_all+0x3c/0x50
>> > > [  117.846225]  device_release_driver_internal+0x159/0x200
>> > > [  117.847748]  device_release_driver+0x12/0x20
>> > > [  117.849029]  bus_remove_device+0xfd/0x170
>> > > [  117.850192]  device_del+0x1e8/0x330
>> > > [  117.851284]  platform_device_del+0x28/0x90
>> > > [  117.852485]  platform_device_unregister+0x12/0x30
>> > > [  117.853846]  nfit_test_exit+0x2a/0x93b [nfit_test]
>> > > [  117.855219]  SyS_delete_module+0x171/0x250
>> > > [  117.856403]  entry_SYSCALL_64_fastpath+0x1a/0xa5
>> >
>> > Can you also attach the qemu-kvm command line you are using?
>> >
>> >     ps aux | grep qemu
>
>
> The cause of this problem is the num_pm of nfit_test1 is wrong.
> Though 1 is specified for num_pm at nfit_test_init(), it must be 2.
>
> ----
> static __init int nfit_test_init(void)
> {
>         int rc, i;
>         :
>         :
>                 case 1:
>                         nfit_test->num_pm = 1;            <---- !!!
>                         nfit_test->dcr_idx = NUM_DCR;
> -----
>
> The num_pm affects size of devm_kcalloc() at nfit_test_probe().
>
> ----
> static int nfit_test_probe(struct platform_device *pdev)
> {
>         if (nfit_test->num_pm) {
>                 int num = nfit_test->num_pm;   <----!!!
>
>                 nfit_test->spa_set = devm_kcalloc(dev, num, sizeof(void *),
>                                 GFP_KERNEL);   <---!!!!
>                 nfit_test->spa_set_dma = devm_kcalloc(dev, num,
>                                 sizeof(dma_addr_t), GFP_KERNEL);
> -----
>
> However,  spa_set[] array needs 2 elements at nfit_test1_alloc().
>
> ---
> static int nfit_test1_alloc(struct nfit_test *t)
> {
>                :
>         t->spa_set[0] = test_alloc(t, SPA2_SIZE, &t->spa_set_dma[0]);  <--- first element
>         if (!t->spa_set[0])
>                 return -ENOMEM;
>                :
>
>        t->spa_set[1] = test_alloc(t, SPA_VCD_SIZE, &t->spa_set_dma[1]);  <---- The second element!!!!
> -----
>
> This breaks other area, and the area is often the link list of devres.
> As a result, the panic occured on release_nodes().
>
> I confirmed that this panic never occurred with the following patch.

Wow!

>
> ---
> diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
> index c218717..548b6d4 100644
> --- a/tools/testing/nvdimm/test/nfit.c
> +++ b/tools/testing/nvdimm/test/nfit.c
> @@ -1943,7 +1943,7 @@ static __init int nfit_test_init(void)
>                         nfit_test->setup = nfit_test0_setup;
>                         break;
>                 case 1:
> -                       nfit_test->num_pm = 1;
> +                       nfit_test->num_pm = 2;
>                         nfit_test->dcr_idx = NUM_DCR;
>                         nfit_test->num_dcr = 2;
>                         nfit_test->alloc = nfit_test1_alloc;

This change looks correct to me. I'm going to try it out.

...but, "Wow!" again and a big "Thank You!".

Linda Knippers June 14, 2017, 3:31 p.m. UTC | #2
On 6/13/2017 11:05 PM, Dan Williams wrote:
> On Tue, Jun 13, 2017 at 7:41 PM, Yasunori Goto <y-goto@jp.fujitsu.com> wrote:
>> Hi, Dan-san, Linda-san,
>>
>> I had chased the root cause of this panic problem, and maybe I found it.

Very good!  Thank you!

-- ljk

>>
>>>>> Hmmm, though I made Fedora 25 environment, this panic still occurs...
>>>>> I'll attach syslog and .config again.
>>>>>
>>>>>
>>>> [..]
>>>>> [  117.804948] general protection fault: 0000 [#1] SMP
>>>> [..]
>>>>> [  117.820866] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>>>> [..]
>>>>> [  117.843262] Call Trace:
>>>>> [  117.843985]  release_nodes+0x76/0x260
>>>>> [  117.845062]  devres_release_all+0x3c/0x50
>>>>> [  117.846225]  device_release_driver_internal+0x159/0x200
>>>>> [  117.847748]  device_release_driver+0x12/0x20
>>>>> [  117.849029]  bus_remove_device+0xfd/0x170
>>>>> [  117.850192]  device_del+0x1e8/0x330
>>>>> [  117.851284]  platform_device_del+0x28/0x90
>>>>> [  117.852485]  platform_device_unregister+0x12/0x30
>>>>> [  117.853846]  nfit_test_exit+0x2a/0x93b [nfit_test]
>>>>> [  117.855219]  SyS_delete_module+0x171/0x250
>>>>> [  117.856403]  entry_SYSCALL_64_fastpath+0x1a/0xa5
>>>>
>>>> Can you also attach the qemu-kvm command line you are using?
>>>>
>>>>     ps aux | grep qemu
>>
>>
>> The cause of this problem is the num_pm of nfit_test1 is wrong.
>> Though 1 is specified for num_pm at nfit_test_init(), it must be 2.
>>
>> ----
>> static __init int nfit_test_init(void)
>> {
>>         int rc, i;
>>         :
>>         :
>>                 case 1:
>>                         nfit_test->num_pm = 1;            <---- !!!
>>                         nfit_test->dcr_idx = NUM_DCR;
>> -----
>>
>> The num_pm affects size of devm_kcalloc() at nfit_test_probe().
>>
>> ----
>> static int nfit_test_probe(struct platform_device *pdev)
>> {
>>         if (nfit_test->num_pm) {
>>                 int num = nfit_test->num_pm;   <----!!!
>>
>>                 nfit_test->spa_set = devm_kcalloc(dev, num, sizeof(void *),
>>                                 GFP_KERNEL);   <---!!!!
>>                 nfit_test->spa_set_dma = devm_kcalloc(dev, num,
>>                                 sizeof(dma_addr_t), GFP_KERNEL);
>> -----
>>
>> However,  spa_set[] array needs 2 elements at nfit_test1_alloc().
>>
>> ---
>> static int nfit_test1_alloc(struct nfit_test *t)
>> {
>>                :
>>         t->spa_set[0] = test_alloc(t, SPA2_SIZE, &t->spa_set_dma[0]);  <--- first element
>>         if (!t->spa_set[0])
>>                 return -ENOMEM;
>>                :
>>
>>        t->spa_set[1] = test_alloc(t, SPA_VCD_SIZE, &t->spa_set_dma[1]);  <---- The second element!!!!
>> -----
>>
>> This breaks other area, and the area is often the link list of devres.
>> As a result, the panic occured on release_nodes().
>>
>> I confirmed that this panic never occurred with the following patch.
>
> Wow!
>
>>
>> ---
>> diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
>> index c218717..548b6d4 100644
>> --- a/tools/testing/nvdimm/test/nfit.c
>> +++ b/tools/testing/nvdimm/test/nfit.c
>> @@ -1943,7 +1943,7 @@ static __init int nfit_test_init(void)
>>                         nfit_test->setup = nfit_test0_setup;
>>                         break;
>>                 case 1:
>> -                       nfit_test->num_pm = 1;
>> +                       nfit_test->num_pm = 2;
>>                         nfit_test->dcr_idx = NUM_DCR;
>>                         nfit_test->num_dcr = 2;
>>                         nfit_test->alloc = nfit_test1_alloc;
>
> This change looks correct to me. I'm going to try it out.
>
> ...but, "Wow!" again and a big "Thank You!".
>

Dan Williams June 14, 2017, 4:06 p.m. UTC | #3
On Tue, Jun 13, 2017 at 7:41 PM, Yasunori Goto <y-goto@jp.fujitsu.com> wrote:
> Hi, Dan-san, Linda-san,
>
> I had chased the root cause of this panic problem, and maybe I found it.
>
>> > > Hmmm, though I made Fedora 25 environment, this panic still occurs...
>> > > I'll attach syslog and .config again.
>> > >
>> > >
>> > [..]
>> > > [  117.804948] general protection fault: 0000 [#1] SMP
>> > [..]
>> > > [  117.820866] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> > [..]
>> > > [  117.843262] Call Trace:
>> > > [  117.843985]  release_nodes+0x76/0x260
>> > > [  117.845062]  devres_release_all+0x3c/0x50
>> > > [  117.846225]  device_release_driver_internal+0x159/0x200
>> > > [  117.847748]  device_release_driver+0x12/0x20
>> > > [  117.849029]  bus_remove_device+0xfd/0x170
>> > > [  117.850192]  device_del+0x1e8/0x330
>> > > [  117.851284]  platform_device_del+0x28/0x90
>> > > [  117.852485]  platform_device_unregister+0x12/0x30
>> > > [  117.853846]  nfit_test_exit+0x2a/0x93b [nfit_test]
>> > > [  117.855219]  SyS_delete_module+0x171/0x250
>> > > [  117.856403]  entry_SYSCALL_64_fastpath+0x1a/0xa5
>> >
>> > Can you also attach the qemu-kvm command line you are using?
>> >
>> >     ps aux | grep qemu
>
>
> The cause of this problem is the num_pm of nfit_test1 is wrong.
> Though 1 is specified for num_pm at nfit_test_init(), it must be 2.
>
> ----
> static __init int nfit_test_init(void)
> {
>         int rc, i;
>         :
>         :
>                 case 1:
>                         nfit_test->num_pm = 1;            <---- !!!
>                         nfit_test->dcr_idx = NUM_DCR;
> -----
>
> The num_pm affects size of devm_kcalloc() at nfit_test_probe().
>
> ----
> static int nfit_test_probe(struct platform_device *pdev)
> {
>         if (nfit_test->num_pm) {
>                 int num = nfit_test->num_pm;   <----!!!
>
>                 nfit_test->spa_set = devm_kcalloc(dev, num, sizeof(void *),
>                                 GFP_KERNEL);   <---!!!!
>                 nfit_test->spa_set_dma = devm_kcalloc(dev, num,
>                                 sizeof(dma_addr_t), GFP_KERNEL);
> -----
>
> However,  spa_set[] array needs 2 elements at nfit_test1_alloc().
>
> ---
> static int nfit_test1_alloc(struct nfit_test *t)
> {
>                :
>         t->spa_set[0] = test_alloc(t, SPA2_SIZE, &t->spa_set_dma[0]);  <--- first element
>         if (!t->spa_set[0])
>                 return -ENOMEM;
>                :
>
>        t->spa_set[1] = test_alloc(t, SPA_VCD_SIZE, &t->spa_set_dma[1]);  <---- The second element!!!!
> -----
>
> This breaks other area, and the area is often the link list of devres.
> As a result, the panic occured on release_nodes().
>
> I confirmed that this panic never occurred with the following patch.

Care to resend this as a formal patch with a "Signed-off-by:"? I'll
get it applied.

Patch

diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index c218717..548b6d4 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -1943,7 +1943,7 @@  static __init int nfit_test_init(void)
                        nfit_test->setup = nfit_test0_setup;
                        break;
                case 1:
-                       nfit_test->num_pm = 1;
+                       nfit_test->num_pm = 2;
                        nfit_test->dcr_idx = NUM_DCR;
                        nfit_test->num_dcr = 2;
                        nfit_test->alloc = nfit_test1_alloc;