diff mbox series

crypto: caam - use the same jr for rng init/exit

Message ID VI1PR0402MB34859D108C03F3AB0F64EE6598B10@VI1PR0402MB3485.eurprd04.prod.outlook.com (mailing list archive)
State Changes Requested
Delegated to: Herbert Xu
Headers show
Series crypto: caam - use the same jr for rng init/exit | expand

Commit Message

Horia Geanta Sept. 11, 2019, 9:35 a.m. UTC
On 9/4/2019 5:35 AM, Andrey Smirnov wrote:
> In order to allow caam_jr_enqueue() to lock underlying JR's
> device (via device_lock(), see commit that follows) we need to make
> sure that no code calls caam_jr_enqueue() as a part of caam_jr_probe()
> to avoid a deadlock. Unfortunately, current implementation of caamrng
> code does exactly that in caam_init_buf().
> 
> Another big problem with original caamrng initialization is a
> circular reference in the form of:
> 
>  1. caam_rng_init() aquires JR via caam_jr_alloc(). Freed only by
>     caam_rng_exit()
> 
>  2. caam_rng_exit() is only called by unregister_algs() once last JR
>     is shut down
> 
>  3. caam_jr_remove() won't call unregister_algs() for last JR
>     until tfm_count reaches zero, which can only happen via
>     unregister_algs() -> caam_rng_exit() call chain.
> 
> To avoid all of that, convert caamrng code to a platform device driver
> and extend caam_probe() to create corresponding platform device.
> 
> Additionally, this change also allows us to remove any access to
> struct caam_drv_private in caamrng.c as well as simplify resource
> ownership/deallocation via devres.
> 
I would avoid adding platform devices that don't have
corresponding DT nodes.

There's some generic advice here:
https://www.kernel.org/doc/html/latest/driver-api/driver-model/platform.html#legacy-drivers-device-probing

and there's also previous experience in caam driver:
6b175685b4a1 crypto: caam/qi - don't allocate an extra platform device

The issue in caamrng is actually that caam/jr driver (jr.c) tries to call
caam_rng_exit() on the last available jr device.
Instead, caam_rng_exit() must be called on the same jr device that
was used during caam_rng_init().

Otherwise, by the time:

void caam_rng_exit(void)
{
        if (!init_done)
                return;

        caam_jr_free(rng_ctx->jrdev);
	hwrng_unregister(&caam_rng);
[...]

is executed, rng_ctx->jrdev might have been removed.

This will cause an oops in caam_jr_free().
caam_cleanup() - .cleanup hwrng callback that is called when doing
hwrng_unregister() - also needs to be executed on that jr device.

The problem can be easily reproduced as follows.

If caamrng was initialized on jr0:
[...]
caam_jr 2101000.jr0: registering rng-caam
[...]

then by manually unbinding jr0 first, then jr1 (using i.MX6SX):
# echo -n "2101000.jr0" > /sys/bus/platform/drivers/caam_jr/
# echo -n "2102000.jr1" > /sys/bus/platform/drivers/caam_jr/

Unable to handle kernel NULL pointer dereference at virtual address 00000040
pgd = 572e14e7
[00000040] *pgd=be2e8831
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 629 Comm: sh Not tainted 5.3.0-rc1-00299-g8e2a2738e5d3-dirty #8
Hardware name: Freescale i.MX6 SoloX (Device Tree)
PC is at caam_jr_free+0xc/0x24
LR is at caam_rng_exit+0x20/0x3c
pc : [<c08aef20>]    lr : [<c08bab1c>]    psr: 200f0013
sp : e9669e68  ip : 00001488  fp : 00000000
r10: 00000000  r9 : e9669f80  r8 : e9284010
r7 : e872d410  r6 : e872d410  r5 : e872d400  r4 : c1aa7cd8
r3 : 00000000  r2 : 00000040  r1 : 00000000  r0 : e872d010
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: a969004a  DAC: 00000051
Process sh (pid: 629, stack limit = 0xfc1b6e94)
Stack: (0xe9669e68 to 0xe966a000)
9e60:                   e865c940 c08af7dc e872d410 e872d410 c13a9cec c06b223c
9e80: c06b2218 e872d410 e81a9410 c06b08dc c13806f0 0000000b c13a9cec c06aeaf8
9ea0: 0000000b 00000000 0000000b e9284000 e91189c0 c0318c3c 00000000 00000000
9ec0: e95ddbd0 e8843500 c1308b08 c6614c9f e8843500 00438340 e9668000 00000004
9ee0: 00000000 c0285e00 00000001 00000000 00000000 c0288a44 00000000 00000000
9f00: c1308b28 00000000 00000001 c130911c 00000001 c13e81d1 c0288a44 00000000
9f20: e8ed9800 c019df00 e8ed9a7c c028ac08 00000001 00000000 c0288a44 c6614c9f
9f40: c1308b08 0000000b 00438340 e9669f80 e8843500 00438340 e9668000 c028899c
9f60: e95ddbc0 c6614c9f e8843500 e8843500 c1308b08 0000000b 00438340 c0288bfc
9f80: 00000000 00000000 00000000 c6614c9f 0000000b 00438340 b6ef1da0 00000004
9fa0: c01011c4 c0101000 0000000b 00438340 00000001 00438340 0000000b 00000000
9fc0: 0000000b 00438340 b6ef1da0 00000004 00438340 0000000b 00000000 00000000
9fe0: 0000006c bea7f908 b6e19e58 b6e7325c 600f0010 00000001 00000000 00000000
[<c08aef20>] (caam_jr_free) from [<c08bab1c>] (caam_rng_exit+0x20/0x3c)
[<c08bab1c>] (caam_rng_exit) from [<c08af7dc>] (caam_jr_remove+0x38/0xc0)
[<c08af7dc>] (caam_jr_remove) from [<c06b223c>] (platform_drv_remove+0x24/0x3c)
[<c06b223c>] (platform_drv_remove) from [<c06b08dc>] (device_release_driver_internal+0xdc/0x1a0)
[<c06b08dc>] (device_release_driver_internal) from [<c06aeaf8>] (unbind_store+0x5c/0xcc)
[<c06aeaf8>] (unbind_store) from [<c0318c3c>] (kernfs_fop_write+0xfc/0x1e0)
[<c0318c3c>] (kernfs_fop_write) from [<c0285e00>] (__vfs_write+0x2c/0x1d0)
[<c0285e00>] (__vfs_write) from [<c028899c>] (vfs_write+0xa0/0x180)
[<c028899c>] (vfs_write) from [<c0288bfc>] (ksys_write+0x5c/0xdc)
[<c0288bfc>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xe9669fa8 to 0xe9669ff0)
9fa0:                   0000000b 00438340 00000001 00438340 0000000b 00000000
9fc0: 0000000b 00438340 b6ef1da0 00000004 00438340 0000000b 00000000 00000000
9fe0: 0000006c bea7f908 b6e19e58 b6e7325c
Code: eaffff49 e5903040 e2832040 f5d2f000 (e1921f9f)

Thus, I'd say the following fix is needed:

-- >8 --

When caam_rng_exit() executes, the jr device that was used
during caam_rng_init() must be available.

This means that current scheme - where caam_rng_exit() is called
when last jr device is removed - is incorrect.
Instead, caam_rng_exit() has to run when the jr acquired
during caam_rng_init() is removed.

Fixes: 1b46c90c8e00 ("crypto: caam - convert top level drivers to libraries")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>

Comments

Andrey Smirnov Sept. 18, 2019, 6:01 a.m. UTC | #1
On Wed, Sep 11, 2019 at 2:35 AM Horia Geanta <horia.geanta@nxp.com> wrote:
>
> On 9/4/2019 5:35 AM, Andrey Smirnov wrote:
> > In order to allow caam_jr_enqueue() to lock underlying JR's
> > device (via device_lock(), see commit that follows) we need to make
> > sure that no code calls caam_jr_enqueue() as a part of caam_jr_probe()
> > to avoid a deadlock. Unfortunately, current implementation of caamrng
> > code does exactly that in caam_init_buf().
> >

I don't think your patch addresses the above, so it can probably be
cut out of the message.

> > Another big problem with original caamrng initialization is a
> > circular reference in the form of:
> >
> >  1. caam_rng_init() aquires JR via caam_jr_alloc(). Freed only by
> >     caam_rng_exit()
> >
> >  2. caam_rng_exit() is only called by unregister_algs() once last JR
> >     is shut down
> >
> >  3. caam_jr_remove() won't call unregister_algs() for last JR
> >     until tfm_count reaches zero, which can only happen via
> >     unregister_algs() -> caam_rng_exit() call chain.
> >
> > To avoid all of that, convert caamrng code to a platform device driver
> > and extend caam_probe() to create corresponding platform device.
> >
> > Additionally, this change also allows us to remove any access to
> > struct caam_drv_private in caamrng.c as well as simplify resource
> > ownership/deallocation via devres.
> >
> I would avoid adding platform devices that don't have
> corresponding DT nodes.
>
> There's some generic advice here:
> https://www.kernel.org/doc/html/latest/driver-api/driver-model/platform.html#legacy-drivers-device-probing
>
> and there's also previous experience in caam driver:
> 6b175685b4a1 crypto: caam/qi - don't allocate an extra platform device
>

Hmm, I am not sure how that experience relates to the case we have
with hwrng, but OK, I'm going to assume that platform driver approach
is a no-go.

> The issue in caamrng is actually that caam/jr driver (jr.c) tries to call
> caam_rng_exit() on the last available jr device.
> Instead, caam_rng_exit() must be called on the same jr device that
> was used during caam_rng_init().
>
> Otherwise, by the time:
>
> void caam_rng_exit(void)
> {
>         if (!init_done)
>                 return;
>
>         caam_jr_free(rng_ctx->jrdev);
>         hwrng_unregister(&caam_rng);
> [...]
>
> is executed, rng_ctx->jrdev might have been removed.
>
> This will cause an oops in caam_jr_free().
> caam_cleanup() - .cleanup hwrng callback that is called when doing
> hwrng_unregister() - also needs to be executed on that jr device.
>
> The problem can be easily reproduced as follows.
>
> If caamrng was initialized on jr0:
> [...]
> caam_jr 2101000.jr0: registering rng-caam
> [...]
>
> then by manually unbinding jr0 first, then jr1 (using i.MX6SX):
> # echo -n "2101000.jr0" > /sys/bus/platform/drivers/caam_jr/
> # echo -n "2102000.jr1" > /sys/bus/platform/drivers/caam_jr/
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000040
> pgd = 572e14e7
> [00000040] *pgd=be2e8831
> Internal error: Oops: 17 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 629 Comm: sh Not tainted 5.3.0-rc1-00299-g8e2a2738e5d3-dirty #8
> Hardware name: Freescale i.MX6 SoloX (Device Tree)
> PC is at caam_jr_free+0xc/0x24
> LR is at caam_rng_exit+0x20/0x3c
> pc : [<c08aef20>]    lr : [<c08bab1c>]    psr: 200f0013
> sp : e9669e68  ip : 00001488  fp : 00000000
> r10: 00000000  r9 : e9669f80  r8 : e9284010
> r7 : e872d410  r6 : e872d410  r5 : e872d400  r4 : c1aa7cd8
> r3 : 00000000  r2 : 00000040  r1 : 00000000  r0 : e872d010
> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: a969004a  DAC: 00000051
> Process sh (pid: 629, stack limit = 0xfc1b6e94)
> Stack: (0xe9669e68 to 0xe966a000)
> 9e60:                   e865c940 c08af7dc e872d410 e872d410 c13a9cec c06b223c
> 9e80: c06b2218 e872d410 e81a9410 c06b08dc c13806f0 0000000b c13a9cec c06aeaf8
> 9ea0: 0000000b 00000000 0000000b e9284000 e91189c0 c0318c3c 00000000 00000000
> 9ec0: e95ddbd0 e8843500 c1308b08 c6614c9f e8843500 00438340 e9668000 00000004
> 9ee0: 00000000 c0285e00 00000001 00000000 00000000 c0288a44 00000000 00000000
> 9f00: c1308b28 00000000 00000001 c130911c 00000001 c13e81d1 c0288a44 00000000
> 9f20: e8ed9800 c019df00 e8ed9a7c c028ac08 00000001 00000000 c0288a44 c6614c9f
> 9f40: c1308b08 0000000b 00438340 e9669f80 e8843500 00438340 e9668000 c028899c
> 9f60: e95ddbc0 c6614c9f e8843500 e8843500 c1308b08 0000000b 00438340 c0288bfc
> 9f80: 00000000 00000000 00000000 c6614c9f 0000000b 00438340 b6ef1da0 00000004
> 9fa0: c01011c4 c0101000 0000000b 00438340 00000001 00438340 0000000b 00000000
> 9fc0: 0000000b 00438340 b6ef1da0 00000004 00438340 0000000b 00000000 00000000
> 9fe0: 0000006c bea7f908 b6e19e58 b6e7325c 600f0010 00000001 00000000 00000000
> [<c08aef20>] (caam_jr_free) from [<c08bab1c>] (caam_rng_exit+0x20/0x3c)
> [<c08bab1c>] (caam_rng_exit) from [<c08af7dc>] (caam_jr_remove+0x38/0xc0)
> [<c08af7dc>] (caam_jr_remove) from [<c06b223c>] (platform_drv_remove+0x24/0x3c)
> [<c06b223c>] (platform_drv_remove) from [<c06b08dc>] (device_release_driver_internal+0xdc/0x1a0)
> [<c06b08dc>] (device_release_driver_internal) from [<c06aeaf8>] (unbind_store+0x5c/0xcc)
> [<c06aeaf8>] (unbind_store) from [<c0318c3c>] (kernfs_fop_write+0xfc/0x1e0)
> [<c0318c3c>] (kernfs_fop_write) from [<c0285e00>] (__vfs_write+0x2c/0x1d0)
> [<c0285e00>] (__vfs_write) from [<c028899c>] (vfs_write+0xa0/0x180)
> [<c028899c>] (vfs_write) from [<c0288bfc>] (ksys_write+0x5c/0xdc)
> [<c0288bfc>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
> Exception stack(0xe9669fa8 to 0xe9669ff0)
> 9fa0:                   0000000b 00438340 00000001 00438340 0000000b 00000000
> 9fc0: 0000000b 00438340 b6ef1da0 00000004 00438340 0000000b 00000000 00000000
> 9fe0: 0000006c bea7f908 b6e19e58 b6e7325c
> Code: eaffff49 e5903040 e2832040 f5d2f000 (e1921f9f)
>
> Thus, I'd say the following fix is needed:
>
> -- >8 --
>
> When caam_rng_exit() executes, the jr device that was used
> during caam_rng_init() must be available.
>
> This means that current scheme - where caam_rng_exit() is called
> when last jr device is removed - is incorrect.
> Instead, caam_rng_exit() has to run when the jr acquired
> during caam_rng_init() is removed.
>
> Fixes: 1b46c90c8e00 ("crypto: caam - convert top level drivers to libraries")
> Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
>
> diff --git a/drivers/crypto/caam/caamrng.c b/drivers/crypto/caam/caamrng.c
> index e8baacaabe07..ec40178fa688 100644
> --- a/drivers/crypto/caam/caamrng.c
> +++ b/drivers/crypto/caam/caamrng.c
> @@ -300,9 +300,9 @@ static struct hwrng caam_rng = {
>         .read           = caam_read,
>  };
>
> -void caam_rng_exit(void)
> +void caam_rng_exit(struct device *jrdev)
>  {
> -       if (!init_done)
> +       if (!init_done || jrdev != rng_ctx->jrdev)
>                 return;
>
>         caam_jr_free(rng_ctx->jrdev);
> diff --git a/drivers/crypto/caam/intern.h b/drivers/crypto/caam/intern.h
> index 731b06becd9c..4795530203ad 100644
> --- a/drivers/crypto/caam/intern.h
> +++ b/drivers/crypto/caam/intern.h
> @@ -165,7 +165,7 @@ static inline void caam_pkc_exit(void)
>  #ifdef CONFIG_CRYPTO_DEV_FSL_CAAM_RNG_API
>
>  int caam_rng_init(struct device *dev);
> -void caam_rng_exit(void);
> +void caam_rng_exit(struct device *jrdev);
>
>  #else
>
> @@ -174,7 +174,7 @@ static inline int caam_rng_init(struct device *dev)
>         return 0;
>  }
>
> -static inline void caam_rng_exit(void)
> +static inline void caam_rng_exit(struct device *jrdev)
>  {
>  }
>
> diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c
> index d11956bc358f..61aea11773a6 100644
> --- a/drivers/crypto/caam/jr.c
> +++ b/drivers/crypto/caam/jr.c
> @@ -53,7 +53,6 @@ static void unregister_algs(void)
>
>         caam_qi_algapi_exit();
>
> -       caam_rng_exit();
>         caam_pkc_exit();
>         caam_algapi_hash_exit();
>         caam_algapi_exit();
> @@ -126,6 +125,8 @@ static int caam_jr_remove(struct platform_device *pdev)
>         jrdev = &pdev->dev;
>         jrpriv = dev_get_drvdata(jrdev);
>
> +       caam_rng_exit(jrdev);
> +
>         /*
>          * Return EBUSY if job ring already allocated.
>          */
Horia Geanta Sept. 20, 2019, 3:50 p.m. UTC | #2
On 9/18/2019 9:01 AM, Andrey Smirnov wrote:
> On Wed, Sep 11, 2019 at 2:35 AM Horia Geanta <horia.geanta@nxp.com> wrote:
>>
>> On 9/4/2019 5:35 AM, Andrey Smirnov wrote:
>>> In order to allow caam_jr_enqueue() to lock underlying JR's
>>> device (via device_lock(), see commit that follows) we need to make
>>> sure that no code calls caam_jr_enqueue() as a part of caam_jr_probe()
>>> to avoid a deadlock. Unfortunately, current implementation of caamrng
>>> code does exactly that in caam_init_buf().
>>>
> 
> I don't think your patch addresses the above, so it can probably be
> cut out of the message.
> 
No, it does not, it addresses the issue right below.

Not sure what you mean by "cut out of the message". If you look carefully
in the original message, at some pointer there is a scissors line.

>>> Another big problem with original caamrng initialization is a
>>> circular reference in the form of:
>>>
>>>  1. caam_rng_init() aquires JR via caam_jr_alloc(). Freed only by
>>>     caam_rng_exit()
>>>
>>>  2. caam_rng_exit() is only called by unregister_algs() once last JR
>>>     is shut down
>>>
>>>  3. caam_jr_remove() won't call unregister_algs() for last JR
>>>     until tfm_count reaches zero, which can only happen via
>>>     unregister_algs() -> caam_rng_exit() call chain.
>>>
>>> To avoid all of that, convert caamrng code to a platform device driver
>>> and extend caam_probe() to create corresponding platform device.
>>>
>>> Additionally, this change also allows us to remove any access to
>>> struct caam_drv_private in caamrng.c as well as simplify resource
>>> ownership/deallocation via devres.
>>>
>> I would avoid adding platform devices that don't have
>> corresponding DT nodes.
>>
>> There's some generic advice here:
>> https://www.kernel.org/doc/html/latest/driver-api/driver-model/platform.html#legacy-drivers-device-probing
>>
>> and there's also previous experience in caam driver:
>> 6b175685b4a1 crypto: caam/qi - don't allocate an extra platform device
>>
> 
> Hmm, I am not sure how that experience relates to the case we have
> with hwrng, but OK, I'm going to assume that platform driver approach
> is a no-go.
> 
Not specific to hwrng, but platform drivers in general:
    [...]
    SMMU. A platform device allocated using platform_device_register_full()
    is not completely set up - most importantly .dma_configure()
    is not called.

Horia
diff mbox series

Patch

diff --git a/drivers/crypto/caam/caamrng.c b/drivers/crypto/caam/caamrng.c
index e8baacaabe07..ec40178fa688 100644
--- a/drivers/crypto/caam/caamrng.c
+++ b/drivers/crypto/caam/caamrng.c
@@ -300,9 +300,9 @@  static struct hwrng caam_rng = {
        .read           = caam_read,
 };

-void caam_rng_exit(void)
+void caam_rng_exit(struct device *jrdev)
 {
-       if (!init_done)
+       if (!init_done || jrdev != rng_ctx->jrdev)
                return;

        caam_jr_free(rng_ctx->jrdev);
diff --git a/drivers/crypto/caam/intern.h b/drivers/crypto/caam/intern.h
index 731b06becd9c..4795530203ad 100644
--- a/drivers/crypto/caam/intern.h
+++ b/drivers/crypto/caam/intern.h
@@ -165,7 +165,7 @@  static inline void caam_pkc_exit(void)
 #ifdef CONFIG_CRYPTO_DEV_FSL_CAAM_RNG_API

 int caam_rng_init(struct device *dev);
-void caam_rng_exit(void);
+void caam_rng_exit(struct device *jrdev);

 #else

@@ -174,7 +174,7 @@  static inline int caam_rng_init(struct device *dev)
        return 0;
 }

-static inline void caam_rng_exit(void)
+static inline void caam_rng_exit(struct device *jrdev)
 {
 }

diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c
index d11956bc358f..61aea11773a6 100644
--- a/drivers/crypto/caam/jr.c
+++ b/drivers/crypto/caam/jr.c
@@ -53,7 +53,6 @@  static void unregister_algs(void)

        caam_qi_algapi_exit();

-       caam_rng_exit();
        caam_pkc_exit();
        caam_algapi_hash_exit();
        caam_algapi_exit();
@@ -126,6 +125,8 @@  static int caam_jr_remove(struct platform_device *pdev)
        jrdev = &pdev->dev;
        jrpriv = dev_get_drvdata(jrdev);

+       caam_rng_exit(jrdev);
+
        /*
         * Return EBUSY if job ring already allocated.
         */