lib/mpi: call cond_resched() from mpi_powm() loop

Message ID	20171107061951.861-1-ebiggers3@gmail.com (mailing list archive)
State	Superseded
Delegated to:	Herbert Xu
Headers	show Return-Path: <linux-crypto-owner@kernel.org> From: Eric Biggers <ebiggers3@gmail.com> To: linux-crypto@vger.kernel.org, Herbert Xu <herbert@gondor.apana.org.au> Cc: Tudor-Dan Ambarus <tudor.ambarus@microchip.com>, Mat Martineau <mathew.j.martineau@linux.intel.com>, Salvatore Benedetto <salvatore.benedetto@intel.com>, keyrings@vger.kernel.org, linux-kernel@vger.kernel.org, Eric Biggers <ebiggers@google.com>, stable@vger.kernel.org Subject: [PATCH] lib/mpi: call cond_resched() from mpi_powm() loop Date: Mon, 6 Nov 2017 22:19:51 -0800 Message-Id: <20171107061951.861-1-ebiggers3@gmail.com> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk

Message ID

20171107061951.861-1-ebiggers3@gmail.com (mailing list archive)

State

Superseded

Delegated to:

Herbert Xu

Headers

From: Eric Biggers <ebiggers3@gmail.com>
To: linux-crypto@vger.kernel.org, Herbert Xu <herbert@gondor.apana.org.au>
Cc: Tudor-Dan Ambarus <tudor.ambarus@microchip.com>,
	Mat Martineau <mathew.j.martineau@linux.intel.com>,
	Salvatore Benedetto <salvatore.benedetto@intel.com>,
	keyrings@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eric Biggers <ebiggers@google.com>, stable@vger.kernel.org
Subject: [PATCH] lib/mpi: call cond_resched() from mpi_powm() loop
Date: Mon,  6 Nov 2017 22:19:51 -0800
Message-Id: <20171107061951.861-1-ebiggers3@gmail.com>
Sender: linux-crypto-owner@vger.kernel.org
Precedence: bulk

Commit Message

Eric Biggers Nov. 7, 2017, 6:19 a.m. UTC

From: Eric Biggers <ebiggers@google.com>

On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
largest permitted inputs (16384 bits), the kernel spends 10+ seconds
doing modular exponentiation in mpi_powm() without rescheduling.  If all
threads do it, it locks up the system.  Moreover, it can cause
rcu_sched-stall warnings.

Notwithstanding the insanity of doing this calculation in kernel mode
rather than in userspace, fix it by calling cond_resched() as each bit
from the exponent is processed.  It's still noninterruptible, but at
least it's preemptible now.

Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 lib/mpi/mpi-pow.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Mat Martineau Nov. 7, 2017, 6:38 p.m. UTC | #1

Eric,

On Mon, 6 Nov 2017, Eric Biggers wrote:

> From: Eric Biggers <ebiggers@google.com>
>
> On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
> largest permitted inputs (16384 bits), the kernel spends 10+ seconds
> doing modular exponentiation in mpi_powm() without rescheduling.  If all
> threads do it, it locks up the system.  Moreover, it can cause
> rcu_sched-stall warnings.
>
> Notwithstanding the insanity of doing this calculation in kernel mode
> rather than in userspace, fix it by calling cond_resched() as each bit
> from the exponent is processed.  It's still noninterruptible, but at
> least it's preemptible now.

cond_resched() is in the outer loop and gets called every BITS_PER_LONG 
bits. That seems to be often enough for the system that was taking 10+ 
seconds, and might be ok for slower processors.

Was your intent to call cond_resched() for every bit as you described in 
the commit message?

Thanks for the fix.


Mat


>
> Cc: stable@vger.kernel.org # v4.12+
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
> lib/mpi/mpi-pow.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/lib/mpi/mpi-pow.c b/lib/mpi/mpi-pow.c
> index e24388a863a7..f089a52dbbdb 100644
> --- a/lib/mpi/mpi-pow.c
> +++ b/lib/mpi/mpi-pow.c
> @@ -26,6 +26,7 @@
>  *	 however I decided to publish this code under the plain GPL.
>  */
>
> +#include <linux/sched.h>
> #include <linux/string.h>
> #include "mpi-internal.h"
> #include "longlong.h"
> @@ -263,6 +264,8 @@ int mpi_powm(MPI res, MPI base, MPI exp, MPI mod)
> 				break;
> 			e = ep[i];
> 			c = BITS_PER_MPI_LIMB;
> +
> +			cond_resched();
> 		}
>
> 		/* We shifted MOD, the modulo reduction argument, left MOD_SHIFT_CNT
> -- 
> 2.15.0
>
>

--
Mat Martineau
Intel OTC

Eric Biggers Nov. 7, 2017, 10:03 p.m. UTC | #2

On Tue, Nov 07, 2017 at 10:38:30AM -0800, Mat Martineau wrote:
> 
> Eric,
> 
> On Mon, 6 Nov 2017, Eric Biggers wrote:
> 
> >From: Eric Biggers <ebiggers@google.com>
> >
> >On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
> >largest permitted inputs (16384 bits), the kernel spends 10+ seconds
> >doing modular exponentiation in mpi_powm() without rescheduling.  If all
> >threads do it, it locks up the system.  Moreover, it can cause
> >rcu_sched-stall warnings.
> >
> >Notwithstanding the insanity of doing this calculation in kernel mode
> >rather than in userspace, fix it by calling cond_resched() as each bit
> >from the exponent is processed.  It's still noninterruptible, but at
> >least it's preemptible now.
> 
> cond_resched() is in the outer loop and gets called every
> BITS_PER_LONG bits. That seems to be often enough for the system
> that was taking 10+ seconds, and might be ok for slower processors.
> 
> Was your intent to call cond_resched() for every bit as you
> described in the commit message?
> 

You're right, the cond_resched() is actually once per "limb", not once per bit.

With the largest permitted inputs (16384 bits), each limb of the exponent takes
about 38 milliseconds on an x86_64 CPU.  Therefore on some other CPUs it will
probably take 100+ milliseconds, which is much too long.  So I guess it should
do cond_resched() for each bit.  I'll send a revised patch...

Eric

Herbert Xu Nov. 10, 2017, 11:37 a.m. UTC | #3

On Mon, Nov 06, 2017 at 10:19:51PM -0800, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
> largest permitted inputs (16384 bits), the kernel spends 10+ seconds
> doing modular exponentiation in mpi_powm() without rescheduling.  If all
> threads do it, it locks up the system.  Moreover, it can cause
> rcu_sched-stall warnings.
> 
> Notwithstanding the insanity of doing this calculation in kernel mode
> rather than in userspace, fix it by calling cond_resched() as each bit
> from the exponent is processed.  It's still noninterruptible, but at
> least it's preemptible now.
> 
> Cc: stable@vger.kernel.org # v4.12+
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Patch applied.  Thanks.

Eric Biggers Nov. 10, 2017, 6:41 p.m. UTC | #4

On Fri, Nov 10, 2017 at 10:37:30PM +1100, Herbert Xu wrote:
> On Mon, Nov 06, 2017 at 10:19:51PM -0800, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the
> > largest permitted inputs (16384 bits), the kernel spends 10+ seconds
> > doing modular exponentiation in mpi_powm() without rescheduling.  If all
> > threads do it, it locks up the system.  Moreover, it can cause
> > rcu_sched-stall warnings.
> > 
> > Notwithstanding the insanity of doing this calculation in kernel mode
> > rather than in userspace, fix it by calling cond_resched() as each bit
> > from the exponent is processed.  It's still noninterruptible, but at
> > least it's preemptible now.
> > 
> > Cc: stable@vger.kernel.org # v4.12+
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> 
> Patch applied.  Thanks.
> -- 

If it's not too late can you fix the stable line to be just

	Cc: stable@vger.kernel.org

As Mat pointed out KEYCTL_DH_COMPUTE was actually introduced in v4.7.  Also I
think the code is also reachable through RSA by adding an x509 certificate using
the "asymmetric" key type, although that appears to be limited to 4096-bit
inputs rather than 16384 bits.

Eric

diff --git a/lib/mpi/mpi-pow.c b/lib/mpi/mpi-pow.c
index e24388a863a7..f089a52dbbdb 100644
--- a/lib/mpi/mpi-pow.c
+++ b/lib/mpi/mpi-pow.c
@@ -26,6 +26,7 @@ 
  *	 however I decided to publish this code under the plain GPL.
  */
 
+#include <linux/sched.h>
 #include <linux/string.h>
 #include "mpi-internal.h"
 #include "longlong.h"
@@ -263,6 +264,8 @@  int mpi_powm(MPI res, MPI base, MPI exp, MPI mod)
 				break;
 			e = ep[i];
 			c = BITS_PER_MPI_LIMB;
+
+			cond_resched();
 		}
 
 		/* We shifted MOD, the modulo reduction argument, left MOD_SHIFT_CNT