mbox series

[v5,0/5] Lazy flush for the auth session

Message ID 20240921120811.1264985-1-jarkko@kernel.org (mailing list archive)
Headers show
Series Lazy flush for the auth session | expand

Message

Jarkko Sakkinen Sept. 21, 2024, 12:08 p.m. UTC
This patch set aims to fix:
https://bugzilla.kernel.org/show_bug.cgi?id=219229.

The baseline for the series is the v6.11 tag.

v4:
https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/
v3:
https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/
v2:
https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/
v1:
https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/

Jarkko Sakkinen (5):
  tpm: Return on tpm2_create_null_primary() failure
  tpm: Implement tpm2_load_null() rollback
  tpm: flush the null key only when /dev/tpm0 is accessed
  tpm: Allocate chip->auth in tpm2_start_auth_session()
  tpm: flush the auth session only when /dev/tpm0 is open

 drivers/char/tpm/tpm-chip.c       |  14 ++++
 drivers/char/tpm/tpm-dev-common.c |   8 +++
 drivers/char/tpm/tpm-interface.c  |  10 ++-
 drivers/char/tpm/tpm2-cmd.c       |   3 +
 drivers/char/tpm/tpm2-sessions.c  | 109 ++++++++++++++++++------------
 include/linux/tpm.h               |   2 +
 6 files changed, 102 insertions(+), 44 deletions(-)

Comments

Paul Menzel Sept. 21, 2024, 12:36 p.m. UTC | #1
Dear Jarkko,


Thank you for working on this and your patches.

Am 21.09.24 um 14:08 schrieb Jarkko Sakkinen:
> This patch set aims to fix:
> https://bugzilla.kernel.org/show_bug.cgi?id=219229.

If I am not mistaken this is about reducing the boot time, right? It’d 
be great if you documented the numbers in the commit messages.


Kind regards,

Paul
Jarkko Sakkinen Sept. 21, 2024, 1:13 p.m. UTC | #2
On Sat Sep 21, 2024 at 3:36 PM EEST, Paul Menzel wrote:
> Dear Jarkko,
>
>
> Thank you for working on this and your patches.
>
> Am 21.09.24 um 14:08 schrieb Jarkko Sakkinen:
> > This patch set aims to fix:
> > https://bugzilla.kernel.org/show_bug.cgi?id=219229.
>
> If I am not mistaken this is about reducing the boot time, right? It’d 
> be great if you documented the numbers in the commit messages.

So what you're asking is already documented by:

1. https://bugzilla.kernel.org/show_bug.cgi?id=219229
2. Tested-by

I added lore link to the bugzilla bug, in order to address your comment.

I don't think my "QEMU numbers" will bring any value here tbh.

BR, Jarkko
Jarkko Sakkinen Sept. 21, 2024, 2:38 p.m. UTC | #3
On Sat Sep 21, 2024 at 4:13 PM EEST, Jarkko Sakkinen wrote:
> On Sat Sep 21, 2024 at 3:36 PM EEST, Paul Menzel wrote:
> > Dear Jarkko,
> >
> >
> > Thank you for working on this and your patches.
> >
> > Am 21.09.24 um 14:08 schrieb Jarkko Sakkinen:
> > > This patch set aims to fix:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=219229.
> >
> > If I am not mistaken this is about reducing the boot time, right? It’d 
> > be great if you documented the numbers in the commit messages.
>
> So what you're asking is already documented by:
>
> 1. https://bugzilla.kernel.org/show_bug.cgi?id=219229
> 2. Tested-by
>
> I added lore link to the bugzilla bug, in order to address your comment.

... to the email which contains the figures.

20 s -> 8.7 s

BR, Jarkko
Jarkko Sakkinen Sept. 22, 2024, 5:51 p.m. UTC | #4
On Sat Sep 21, 2024 at 3:08 PM EEST, Jarkko Sakkinen wrote:
> This patch set aims to fix:
> https://bugzilla.kernel.org/show_bug.cgi?id=219229.
>
> The baseline for the series is the v6.11 tag.
>
> v4:
> https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/
> v3:
> https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/
> v2:
> https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/
> v1:
> https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/
>
> Jarkko Sakkinen (5):
>   tpm: Return on tpm2_create_null_primary() failure
>   tpm: Implement tpm2_load_null() rollback
>   tpm: flush the null key only when /dev/tpm0 is accessed
>   tpm: Allocate chip->auth in tpm2_start_auth_session()
>   tpm: flush the auth session only when /dev/tpm0 is open
>
>  drivers/char/tpm/tpm-chip.c       |  14 ++++
>  drivers/char/tpm/tpm-dev-common.c |   8 +++
>  drivers/char/tpm/tpm-interface.c  |  10 ++-
>  drivers/char/tpm/tpm2-cmd.c       |   3 +
>  drivers/char/tpm/tpm2-sessions.c  | 109 ++++++++++++++++++------------
>  include/linux/tpm.h               |   2 +
>  6 files changed, 102 insertions(+), 44 deletions(-)


Roberto, James, speaking of digest cache. This patch set has no aim to
fix those issues but I do believe that it should improve also that 
feature.

If I don't get soon patch reviews for the patch set, I'll pick the 2nd
best option: disable bus encryption on all architectures including x86
and ARM64 (being by default on).

It's a force majeure situation. I know this would sort out the issue
but I really cannot send these as a pull request with zero reviewe-by's.

I expect this to be closed by tomorrow.

BR, Jarkko
James Bottomley Sept. 24, 2024, 1:48 p.m. UTC | #5
On Sun, 2024-09-22 at 20:51 +0300, Jarkko Sakkinen wrote:
> On Sat Sep 21, 2024 at 3:08 PM EEST, Jarkko Sakkinen wrote:
> > This patch set aims to fix:
> > https://bugzilla.kernel.org/show_bug.cgi?id=219229.
> > 
> > The baseline for the series is the v6.11 tag.
> > 
> > v4:
> > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/
> > v3:
> > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/
> > v2:
> > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/
> > v1:
> > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/
> > 
> > Jarkko Sakkinen (5):
> >   tpm: Return on tpm2_create_null_primary() failure
> >   tpm: Implement tpm2_load_null() rollback
> >   tpm: flush the null key only when /dev/tpm0 is accessed
> >   tpm: Allocate chip->auth in tpm2_start_auth_session()
> >   tpm: flush the auth session only when /dev/tpm0 is open
> > 
> >  drivers/char/tpm/tpm-chip.c       |  14 ++++
> >  drivers/char/tpm/tpm-dev-common.c |   8 +++
> >  drivers/char/tpm/tpm-interface.c  |  10 ++-
> >  drivers/char/tpm/tpm2-cmd.c       |   3 +
> >  drivers/char/tpm/tpm2-sessions.c  | 109 ++++++++++++++++++--------
> > ----
> >  include/linux/tpm.h               |   2 +
> >  6 files changed, 102 insertions(+), 44 deletions(-)
> 
> 
> Roberto, James, speaking of digest cache. This patch set has no aim
> to fix those issues but I do believe that it should improve also that
> feature.
> 
> If I don't get soon patch reviews for the patch set, I'll pick the
> 2nd best option: disable bus encryption on all architectures
> including x86 and ARM64 (being by default on).
> 
> It's a force majeure situation. I know this would sort out the issue
> but I really cannot send these as a pull request with zero reviewe-
> by's.
> 
> I expect this to be closed by tomorrow.

Hey come on, you knew I was running plumbers last week so I had all the
lead up and teardown stuff to do as well.  I'm only just digging
through accumulated email.

Patches 1-2 are fully irrelevant to the bug, so I ignored them on the
grounds that improvement to the error flow could be done through the
normal patch process

Patch 3 is completely unnecessary: the null key is only used to salt
the session and is not required to be resident while the session is
used (so can be flushed after session creation) therefore keeping it
around serves no purpose once the session is created and simply
clutters up the TPM volatile handle slots. (I don't know of a case
where we use all the slots in a kernel operation, but since we don't
need it lets not find out when we get one).  So I advise dropping patch
3.

I've reviewed 4 and 5.

Regards,

James
Jarkko Sakkinen Sept. 24, 2024, 4:29 p.m. UTC | #6
On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote:
> On Sun, 2024-09-22 at 20:51 +0300, Jarkko Sakkinen wrote:
> > On Sat Sep 21, 2024 at 3:08 PM EEST, Jarkko Sakkinen wrote:
> > > This patch set aims to fix:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=219229.
> > > 
> > > The baseline for the series is the v6.11 tag.
> > > 
> > > v4:
> > > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/
> > > v3:
> > > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/
> > > v2:
> > > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/
> > > v1:
> > > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/
> > > 
> > > Jarkko Sakkinen (5):
> > >   tpm: Return on tpm2_create_null_primary() failure
> > >   tpm: Implement tpm2_load_null() rollback
> > >   tpm: flush the null key only when /dev/tpm0 is accessed
> > >   tpm: Allocate chip->auth in tpm2_start_auth_session()
> > >   tpm: flush the auth session only when /dev/tpm0 is open
> > > 
> > >  drivers/char/tpm/tpm-chip.c       |  14 ++++
> > >  drivers/char/tpm/tpm-dev-common.c |   8 +++
> > >  drivers/char/tpm/tpm-interface.c  |  10 ++-
> > >  drivers/char/tpm/tpm2-cmd.c       |   3 +
> > >  drivers/char/tpm/tpm2-sessions.c  | 109 ++++++++++++++++++--------
> > > ----
> > >  include/linux/tpm.h               |   2 +
> > >  6 files changed, 102 insertions(+), 44 deletions(-)
> > 
> > 
> > Roberto, James, speaking of digest cache. This patch set has no aim
> > to fix those issues but I do believe that it should improve also that
> > feature.
> > 
> > If I don't get soon patch reviews for the patch set, I'll pick the
> > 2nd best option: disable bus encryption on all architectures
> > including x86 and ARM64 (being by default on).
> > 
> > It's a force majeure situation. I know this would sort out the issue
> > but I really cannot send these as a pull request with zero reviewe-
> > by's.
> > 
> > I expect this to be closed by tomorrow.
>
> Hey come on, you knew I was running plumbers last week so I had all the
> lead up and teardown stuff to do as well.  I'm only just digging
> through accumulated email.

Fair enough, I actually do not want to disable the feature. That
was my main concern here. Now if we get this fixed we might be
able to revisit earlier decisions on defconfig and widen the
support eventually, not shrink it.


>
> Patches 1-2 are fully irrelevant to the bug, so I ignored them on the
> grounds that improvement to the error flow could be done through the
> normal patch process

Hmm.. I'll revisit this for v6. Not sure what to say on this yet
because I need to address the other remarks and based on that
reflect. So might drop or keep them but not 100% sure yet.


> Patch 3 is completely unnecessary: the null key is only used to salt
> the session and is not required to be resident while the session is
> used (so can be flushed after session creation) therefore keeping it
> around serves no purpose once the session is created and simply
> clutters up the TPM volatile handle slots. (I don't know of a case
> where we use all the slots in a kernel operation, but since we don't
> need it lets not find out when we get one).  So I advise dropping patch
> 3.

Let's go this through just to check I'm understanding.

Holding null key had radical effect on boot time: it cut it down by
5 secons down to 15 seconds:

https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/

Then in subsequent version I implemented lazy auth session and boot
time went down to 9.7 seconds.

So is the point you're trying to make that since auth session is 
already held as long as we can and they flushed in synchronous
point too, I can just as well drop patch 3?

I think I reach your point but just want to check that I do it
for the matching reasons. It is evolutionary cruft in the patch
set :-)

BR, Jarkko
James Bottomley Sept. 24, 2024, 4:33 p.m. UTC | #7
On Tue, 2024-09-24 at 19:29 +0300, Jarkko Sakkinen wrote:
> On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote:
[...]
> > Patch 3 is completely unnecessary: the null key is only used to
> > salt the session and is not required to be resident while the
> > session is used (so can be flushed after session creation)
> > therefore keeping it around serves no purpose once the session is
> > created and simply clutters up the TPM volatile handle slots. (I
> > don't know of a case where we use all the slots in a kernel
> > operation, but since we don't need it lets not find out when we get
> > one).  So I advise dropping patch 3.
> 
> Let's go this through just to check I'm understanding.
> 
> Holding null key had radical effect on boot time: it cut it down by
> 5 secons down to 15 seconds:
> 
> https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/
> 
> Then in subsequent version I implemented lazy auth session and boot
> time went down to 9.7 seconds.
> 
> So is the point you're trying to make that since auth session is 
> already held as long as we can and they flushed in synchronous
> point too, I can just as well drop patch 3?

Yes, because the null key is only used in session generation which is
now lazy, it adds or subtracts nothing from the timings.  When you're
forced to flush the session, the null key goes too, so you again have
to restore it from the context.  When you can keep the session you
don't need the null key because you're not regenerating it.

> I think I reach your point but just want to check that I do it
> for the matching reasons. It is evolutionary cruft in the patch
> set :-)

Regards,

James
Jarkko Sakkinen Sept. 24, 2024, 4:36 p.m. UTC | #8
On Tue Sep 24, 2024 at 7:33 PM EEST, James Bottomley wrote:
> On Tue, 2024-09-24 at 19:29 +0300, Jarkko Sakkinen wrote:
> > On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote:
> [...]
> > > Patch 3 is completely unnecessary: the null key is only used to
> > > salt the session and is not required to be resident while the
> > > session is used (so can be flushed after session creation)
> > > therefore keeping it around serves no purpose once the session is
> > > created and simply clutters up the TPM volatile handle slots. (I
> > > don't know of a case where we use all the slots in a kernel
> > > operation, but since we don't need it lets not find out when we get
> > > one).  So I advise dropping patch 3.
> > 
> > Let's go this through just to check I'm understanding.
> > 
> > Holding null key had radical effect on boot time: it cut it down by
> > 5 secons down to 15 seconds:
> > 
> > https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/
> > 
> > Then in subsequent version I implemented lazy auth session and boot
> > time went down to 9.7 seconds.
> > 
> > So is the point you're trying to make that since auth session is 
> > already held as long as we can and they flushed in synchronous
> > point too, I can just as well drop patch 3?
>
> Yes, because the null key is only used in session generation which is
> now lazy, it adds or subtracts nothing from the timings.  When you're
> forced to flush the session, the null key goes too, so you again have
> to restore it from the context.  When you can keep the session you
> don't need the null key because you're not regenerating it.

Yeah, OK, then we're in sync with this. It's evolutionary cruft.

Just had to check that the logic matches how I projected your earlier
comment because these are sensitive changes.

BR, Jarkko
Jarkko Sakkinen Sept. 24, 2024, 5:26 p.m. UTC | #9
On Tue Sep 24, 2024 at 7:36 PM EEST, Jarkko Sakkinen wrote:
> On Tue Sep 24, 2024 at 7:33 PM EEST, James Bottomley wrote:
> > On Tue, 2024-09-24 at 19:29 +0300, Jarkko Sakkinen wrote:
> > > On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote:
> > [...]
> > > > Patch 3 is completely unnecessary: the null key is only used to
> > > > salt the session and is not required to be resident while the
> > > > session is used (so can be flushed after session creation)
> > > > therefore keeping it around serves no purpose once the session is
> > > > created and simply clutters up the TPM volatile handle slots. (I
> > > > don't know of a case where we use all the slots in a kernel
> > > > operation, but since we don't need it lets not find out when we get
> > > > one).  So I advise dropping patch 3.
> > > 
> > > Let's go this through just to check I'm understanding.
> > > 
> > > Holding null key had radical effect on boot time: it cut it down by
> > > 5 secons down to 15 seconds:
> > > 
> > > https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/
> > > 
> > > Then in subsequent version I implemented lazy auth session and boot
> > > time went down to 9.7 seconds.
> > > 
> > > So is the point you're trying to make that since auth session is 
> > > already held as long as we can and they flushed in synchronous
> > > point too, I can just as well drop patch 3?
> >
> > Yes, because the null key is only used in session generation which is
> > now lazy, it adds or subtracts nothing from the timings.  When you're
> > forced to flush the session, the null key goes too, so you again have
> > to restore it from the context.  When you can keep the session you
> > don't need the null key because you're not regenerating it.
>
> Yeah, OK, then we're in sync with this. It's evolutionary cruft.
>
> Just had to check that the logic matches how I projected your earlier
> comment because these are sensitive changes.

I'm definitely going keeep 1/5 and 2/5 as they are still bug fixes.

So they will appear in v6 unchanged and perf fixes (which are not
functional fixes) should not be built on top of broken code.

BR, Jarkko
Jarkko Sakkinen Sept. 24, 2024, 5:28 p.m. UTC | #10
On Tue Sep 24, 2024 at 8:26 PM EEST, Jarkko Sakkinen wrote:
> On Tue Sep 24, 2024 at 7:36 PM EEST, Jarkko Sakkinen wrote:
> > On Tue Sep 24, 2024 at 7:33 PM EEST, James Bottomley wrote:
> > > On Tue, 2024-09-24 at 19:29 +0300, Jarkko Sakkinen wrote:
> > > > On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote:
> > > [...]
> > > > > Patch 3 is completely unnecessary: the null key is only used to
> > > > > salt the session and is not required to be resident while the
> > > > > session is used (so can be flushed after session creation)
> > > > > therefore keeping it around serves no purpose once the session is
> > > > > created and simply clutters up the TPM volatile handle slots. (I
> > > > > don't know of a case where we use all the slots in a kernel
> > > > > operation, but since we don't need it lets not find out when we get
> > > > > one).  So I advise dropping patch 3.
> > > > 
> > > > Let's go this through just to check I'm understanding.
> > > > 
> > > > Holding null key had radical effect on boot time: it cut it down by
> > > > 5 secons down to 15 seconds:
> > > > 
> > > > https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/
> > > > 
> > > > Then in subsequent version I implemented lazy auth session and boot
> > > > time went down to 9.7 seconds.
> > > > 
> > > > So is the point you're trying to make that since auth session is 
> > > > already held as long as we can and they flushed in synchronous
> > > > point too, I can just as well drop patch 3?
> > >
> > > Yes, because the null key is only used in session generation which is
> > > now lazy, it adds or subtracts nothing from the timings.  When you're
> > > forced to flush the session, the null key goes too, so you again have
> > > to restore it from the context.  When you can keep the session you
> > > don't need the null key because you're not regenerating it.
> >
> > Yeah, OK, then we're in sync with this. It's evolutionary cruft.
> >
> > Just had to check that the logic matches how I projected your earlier
> > comment because these are sensitive changes.
>
> I'm definitely going keeep 1/5 and 2/5 as they are still bug fixes.
>
> So they will appear in v6 unchanged and perf fixes (which are not
> functional fixes) should not be built on top of broken code.

And 3/5 is actually required because it saves of doing flush during
the boot if nothing else.

We are wasting more time so I don't want to waste it for nothing.

BR, Jarkko
Jarkko Sakkinen Sept. 24, 2024, 6:01 p.m. UTC | #11
On Tue Sep 24, 2024 at 8:28 PM EEST, Jarkko Sakkinen wrote:
> On Tue Sep 24, 2024 at 8:26 PM EEST, Jarkko Sakkinen wrote:
> > On Tue Sep 24, 2024 at 7:36 PM EEST, Jarkko Sakkinen wrote:
> > > On Tue Sep 24, 2024 at 7:33 PM EEST, James Bottomley wrote:
> > > > On Tue, 2024-09-24 at 19:29 +0300, Jarkko Sakkinen wrote:
> > > > > On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote:
> > > > [...]
> > > > > > Patch 3 is completely unnecessary: the null key is only used to
> > > > > > salt the session and is not required to be resident while the
> > > > > > session is used (so can be flushed after session creation)
> > > > > > therefore keeping it around serves no purpose once the session is
> > > > > > created and simply clutters up the TPM volatile handle slots. (I
> > > > > > don't know of a case where we use all the slots in a kernel
> > > > > > operation, but since we don't need it lets not find out when we get
> > > > > > one).  So I advise dropping patch 3.
> > > > > 
> > > > > Let's go this through just to check I'm understanding.
> > > > > 
> > > > > Holding null key had radical effect on boot time: it cut it down by
> > > > > 5 secons down to 15 seconds:
> > > > > 
> > > > > https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/
> > > > > 
> > > > > Then in subsequent version I implemented lazy auth session and boot
> > > > > time went down to 9.7 seconds.
> > > > > 
> > > > > So is the point you're trying to make that since auth session is 
> > > > > already held as long as we can and they flushed in synchronous
> > > > > point too, I can just as well drop patch 3?
> > > >
> > > > Yes, because the null key is only used in session generation which is
> > > > now lazy, it adds or subtracts nothing from the timings.  When you're
> > > > forced to flush the session, the null key goes too, so you again have
> > > > to restore it from the context.  When you can keep the session you
> > > > don't need the null key because you're not regenerating it.
> > >
> > > Yeah, OK, then we're in sync with this. It's evolutionary cruft.
> > >
> > > Just had to check that the logic matches how I projected your earlier
> > > comment because these are sensitive changes.
> >
> > I'm definitely going keeep 1/5 and 2/5 as they are still bug fixes.
> >
> > So they will appear in v6 unchanged and perf fixes (which are not
> > functional fixes) should not be built on top of broken code.
>
> And 3/5 is actually required because it saves of doing flush during
> the boot if nothing else.
>
> We are wasting more time so I don't want to waste it for nothing.

Anything beyong 50 ms matters and that flush certainly costs more than
that. As I already stated in earlier version, we need to find more of
these "50 ms and 100 ms there sites.

The functional fixes are required because perf fix is always *less
critical* than perf fix.

Please pay more attention to proper error rollback next time, that's
all I can say on that. It's not my fault that it is broken.

BR, Jarkko
Mimi Zohar Oct. 1, 2024, 6:10 p.m. UTC | #12
On Sun, 2024-09-22 at 20:51 +0300, Jarkko Sakkinen wrote:
> On Sat Sep 21, 2024 at 3:08 PM EEST, Jarkko Sakkinen wrote:
> > This patch set aims to fix:
> > https://bugzilla.kernel.org/show_bug.cgi?id=219229.
> > 
> > The baseline for the series is the v6.11 tag.
> > 
> > v4:
> > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/
> > v3:
> > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/
> > v2:
> > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/
> > v1:
> > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/
> > 
> > Jarkko Sakkinen (5):
> >   tpm: Return on tpm2_create_null_primary() failure
> >   tpm: Implement tpm2_load_null() rollback
> >   tpm: flush the null key only when /dev/tpm0 is accessed
> >   tpm: Allocate chip->auth in tpm2_start_auth_session()
> >   tpm: flush the auth session only when /dev/tpm0 is open
> > 
> >  drivers/char/tpm/tpm-chip.c       |  14 ++++
> >  drivers/char/tpm/tpm-dev-common.c |   8 +++
> >  drivers/char/tpm/tpm-interface.c  |  10 ++-
> >  drivers/char/tpm/tpm2-cmd.c       |   3 +
> >  drivers/char/tpm/tpm2-sessions.c  | 109 ++++++++++++++++++------------
> >  include/linux/tpm.h               |   2 +
> >  6 files changed, 102 insertions(+), 44 deletions(-)
> 
> 
> Roberto, James, speaking of digest cache. This patch set has no aim to
> fix those issues but I do believe that it should improve also that 
> feature.
> 
> If I don't get soon patch reviews for the patch set, I'll pick the 2nd
> best option: disable bus encryption on all architectures including x86
> and ARM64 (being by default on).
> 
> It's a force majeure situation. I know this would sort out the issue
> but I really cannot send these as a pull request with zero reviewe-by's.
> 
> I expect this to be closed by tomorrow.

Jarkko, sorry to be so late to this discussion.  The bus HMAC/encryption really
impacts IMA as well.  Even with this patch set, it's slow.  My preference would
be to disable bus encryption on all architectures until there is a boot/runtime
option allowing it to be disabled for IMA as discussed in the other thread.

In the other thread, I also mentioned that the Kconfig is incorrectly worded. 
The performance degradation is not limited to encryption, but the HMAC itself. 
Please change "Saying Y here adds some encryption overhead to all kernel to TPM
transactions." to "Saying Y here adds overhead to all kernel to TPM
transactions."

thanks,

Mimi
Stefan Berger Oct. 3, 2024, 3:14 p.m. UTC | #13
On 9/21/24 8:08 AM, Jarkko Sakkinen wrote:
> This patch set aims to fix:
> https://bugzilla.kernel.org/show_bug.cgi?id=219229.
> 
> The baseline for the series is the v6.11 tag.

I was testing this with 6.12-rc1 on ppc64/kvm + vtpm boot time from 
pressing return on grub until login prompt appears while using an IMA 
measure policy:

with HMAC2: 36s
with HMAC2+this series: 29s
without HMAC2: 28s

Looks good to me, though using a hardware TPM would probably be more 
critical in this type of measurement.

> 
> v4:
> https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/
> v3:
> https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/
> v2:
> https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/
> v1:
> https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/
> 
> Jarkko Sakkinen (5):
>    tpm: Return on tpm2_create_null_primary() failure
>    tpm: Implement tpm2_load_null() rollback
>    tpm: flush the null key only when /dev/tpm0 is accessed
>    tpm: Allocate chip->auth in tpm2_start_auth_session()
>    tpm: flush the auth session only when /dev/tpm0 is open
> 
>   drivers/char/tpm/tpm-chip.c       |  14 ++++
>   drivers/char/tpm/tpm-dev-common.c |   8 +++
>   drivers/char/tpm/tpm-interface.c  |  10 ++-
>   drivers/char/tpm/tpm2-cmd.c       |   3 +
>   drivers/char/tpm/tpm2-sessions.c  | 109 ++++++++++++++++++------------
>   include/linux/tpm.h               |   2 +
>   6 files changed, 102 insertions(+), 44 deletions(-)
>
Jarkko Sakkinen Oct. 7, 2024, 11:45 p.m. UTC | #14
On Tue, 2024-10-01 at 14:10 -0400, Mimi Zohar wrote:
> On Sun, 2024-09-22 at 20:51 +0300, Jarkko Sakkinen wrote:
> > On Sat Sep 21, 2024 at 3:08 PM EEST, Jarkko Sakkinen wrote:
> > > This patch set aims to fix:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=219229.
> > > 
> > > The baseline for the series is the v6.11 tag.
> > > 
> > > v4:
> > > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/
> > > v3:
> > > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/
> > > v2:
> > > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/
> > > v1:
> > > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/
> > > 
> > > Jarkko Sakkinen (5):
> > >   tpm: Return on tpm2_create_null_primary() failure
> > >   tpm: Implement tpm2_load_null() rollback
> > >   tpm: flush the null key only when /dev/tpm0 is accessed
> > >   tpm: Allocate chip->auth in tpm2_start_auth_session()
> > >   tpm: flush the auth session only when /dev/tpm0 is open
> > > 
> > >  drivers/char/tpm/tpm-chip.c       |  14 ++++
> > >  drivers/char/tpm/tpm-dev-common.c |   8 +++
> > >  drivers/char/tpm/tpm-interface.c  |  10 ++-
> > >  drivers/char/tpm/tpm2-cmd.c       |   3 +
> > >  drivers/char/tpm/tpm2-sessions.c  | 109 ++++++++++++++++++------
> > > ------
> > >  include/linux/tpm.h               |   2 +
> > >  6 files changed, 102 insertions(+), 44 deletions(-)
> > 
> > 
> > Roberto, James, speaking of digest cache. This patch set has no aim
> > to
> > fix those issues but I do believe that it should improve also that 
> > feature.
> > 
> > If I don't get soon patch reviews for the patch set, I'll pick the
> > 2nd
> > best option: disable bus encryption on all architectures including
> > x86
> > and ARM64 (being by default on).
> > 
> > It's a force majeure situation. I know this would sort out the
> > issue
> > but I really cannot send these as a pull request with zero reviewe-
> > by's.
> > 
> > I expect this to be closed by tomorrow.
> 
> Jarkko, sorry to be so late to this discussion.  The bus
> HMAC/encryption really
> impacts IMA as well.  Even with this patch set, it's slow.  My
> preference would
> be to disable bus encryption on all architectures until there is a
> boot/runtime
> option allowing it to be disabled for IMA as discussed in the other
> thread.

No worries, I was getting nervous because of a job switch, now
I have time since cannot move this forward for week or two anyway
:-)

I'm totally +1 to make bus encyption opt-in instead of opt-out.
It's just not there yet.

My fixes fix one use case, i.e. the boot process for AMD, so in
that sense they are totally legit. But it is pretty clear by now
that tons of similar patches and small tweaks will be required.

As it is my 2nd work week, I can implement such patch, *next
week*. Up until that there is time to give any feedback.

> 
> In the other thread, I also mentioned that the Kconfig is incorrectly
> worded. 
> The performance degradation is not limited to encryption, but the
> HMAC itself. 
> Please change "Saying Y here adds some encryption overhead to all
> kernel to TPM
> transactions." to "Saying Y here adds overhead to all kernel to TPM
> transactions."

I'll keep this in mind!

I'd prefer to do a single patch set probably with my previous fixes
and this, as they are already tested by the reporter anyway and pile
this as a new patch on top. I.e. have basically all that I'd put to
the next PR.

> 
> thanks,
> 
> Mimi

BR, Jarkko
Jarkko Sakkinen Oct. 7, 2024, 11:49 p.m. UTC | #15
On Thu, 2024-10-03 at 11:14 -0400, Stefan Berger wrote:
> 
> 
> On 9/21/24 8:08 AM, Jarkko Sakkinen wrote:
> > This patch set aims to fix:
> > https://bugzilla.kernel.org/show_bug.cgi?id=219229.
> > 
> > The baseline for the series is the v6.11 tag.
> 
> I was testing this with 6.12-rc1 on ppc64/kvm + vtpm boot time from 
> pressing return on grub until login prompt appears while using an IMA
> measure policy:
> 
> with HMAC2: 36s
> with HMAC2+this series: 29s
> without HMAC2: 28s
> 
> Looks good to me, though using a hardware TPM would probably be more 
> critical in this type of measurement.

Yep, exactly. QEMU fails here...

BR, Jarkko
Jarkko Sakkinen Oct. 11, 2024, 2:06 p.m. UTC | #16
On Sat, 2024-09-21 at 15:08 +0300, Jarkko Sakkinen wrote:
> This patch set aims to fix:
> https://bugzilla.kernel.org/show_bug.cgi?id=219229.
> 
> The baseline for the series is the v6.11 tag.
> 
> v4:
> https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/
> v3:
> https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/
> v2:
> https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/
> v1:
> https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/
> 
> Jarkko Sakkinen (5):
>   tpm: Return on tpm2_create_null_primary() failure
>   tpm: Implement tpm2_load_null() rollback
>   tpm: flush the null key only when /dev/tpm0 is accessed
>   tpm: Allocate chip->auth in tpm2_start_auth_session()
>   tpm: flush the auth session only when /dev/tpm0 is open
> 
>  drivers/char/tpm/tpm-chip.c       |  14 ++++
>  drivers/char/tpm/tpm-dev-common.c |   8 +++
>  drivers/char/tpm/tpm-interface.c  |  10 ++-
>  drivers/char/tpm/tpm2-cmd.c       |   3 +
>  drivers/char/tpm/tpm2-sessions.c  | 109 ++++++++++++++++++----------
> --
>  include/linux/tpm.h               |   2 +
>  6 files changed, 102 insertions(+), 44 deletions(-)

The summarize some discussions:

1. I'll address Stefan's remarks.
2. We know that these patches address the desktop boot.
3. IMA is too slow => add a boot option for IMA default off. I.e.
   IMA will not use the feature unless you specifically ask.
4. Random generation can be optimized a lot with or without
   encryption. Not sure if  I have time to do ths right now
   but I have already patch planned for this.

What is blocking me is the James' request to not include
functional fixes. The problem with that is that if comply
to that request I will have to postpone all the performacne
fixes and send a patch set with only functional fixes and
go all review rounds with that before moving forward.

This is just how priorities go in kernel and doing by the
book. Is that really necessary?

Since I've just started in a new job any patches can be
expected earliest next week. That's why I was rushing with
the patch set in the first place because I knew that there
will be otherwise a few week delay but we'll get there :-)

BR, Jarkko
Roberto Sassu Oct. 11, 2024, 4:10 p.m. UTC | #17
On Fri, 2024-10-11 at 17:06 +0300, Jarkko Sakkinen wrote:
> On Sat, 2024-09-21 at 15:08 +0300, Jarkko Sakkinen wrote:
> > This patch set aims to fix:
> > https://bugzilla.kernel.org/show_bug.cgi?id=219229.
> > 
> > The baseline for the series is the v6.11 tag.
> > 
> > v4:
> > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/
> > v3:
> > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/
> > v2:
> > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/
> > v1:
> > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/
> > 
> > Jarkko Sakkinen (5):
> >   tpm: Return on tpm2_create_null_primary() failure
> >   tpm: Implement tpm2_load_null() rollback
> >   tpm: flush the null key only when /dev/tpm0 is accessed
> >   tpm: Allocate chip->auth in tpm2_start_auth_session()
> >   tpm: flush the auth session only when /dev/tpm0 is open
> > 
> >  drivers/char/tpm/tpm-chip.c       |  14 ++++
> >  drivers/char/tpm/tpm-dev-common.c |   8 +++
> >  drivers/char/tpm/tpm-interface.c  |  10 ++-
> >  drivers/char/tpm/tpm2-cmd.c       |   3 +
> >  drivers/char/tpm/tpm2-sessions.c  | 109 ++++++++++++++++++----------
> > --
> >  include/linux/tpm.h               |   2 +
> >  6 files changed, 102 insertions(+), 44 deletions(-)
> 
> The summarize some discussions:
> 
> 1. I'll address Stefan's remarks.
> 2. We know that these patches address the desktop boot.
> 3. IMA is too slow => add a boot option for IMA default off. I.e.
>    IMA will not use the feature unless you specifically ask.

Initially, I thought that maybe it would not be good to have an event
log with unmodified and altered measurement entries. Then, I tried to
think if we can really prevent an active interposer from injecting
arbitrary PCR extends and pretending that those events actually
happened.

If I understood James's cover letter correctly, the kernel can detect
whether a TPM reset occurred, but not that a PCR extend occurred (maybe
with a shadow PCR?).

Second point, do we really want to take the responsibility to disable
the protection on behalf of users? Maybe a better choice is to let them
consciously disable HMAC protection.

So, maybe we should keep the HMAC protection enabled, and if the number
of PCR extends is above a certain threshold, we can print a warning
message in the kernel log.

Roberto

> 4. Random generation can be optimized a lot with or without
>    encryption. Not sure if  I have time to do ths right now
>    but I have already patch planned for this.
> 
> What is blocking me is the James' request to not include
> functional fixes. The problem with that is that if comply
> to that request I will have to postpone all the performacne
> fixes and send a patch set with only functional fixes and
> go all review rounds with that before moving forward.
> 
> This is just how priorities go in kernel and doing by the
> book. Is that really necessary?
> 
> Since I've just started in a new job any patches can be
> expected earliest next week. That's why I was rushing with
> the patch set in the first place because I knew that there
> will be otherwise a few week delay but we'll get there :-)
> 
> BR, Jarkko
>
Jarkko Sakkinen Oct. 11, 2024, 4:25 p.m. UTC | #18
On Fri, 2024-10-11 at 18:10 +0200, Roberto Sassu wrote:
> Initially, I thought that maybe it would not be good to have an event
> log with unmodified and altered measurement entries. Then, I tried to
> think if we can really prevent an active interposer from injecting
> arbitrary PCR extends and pretending that those events actually
> happened.
> 
> If I understood James's cover letter correctly, the kernel can detect
> whether a TPM reset occurred, but not that a PCR extend occurred
> (maybe
> with a shadow PCR?).

We can detect TPM reset indirectly. I.e. null seed re-randomizes
per reset.

> 
> Second point, do we really want to take the responsibility to disable
> the protection on behalf of users? Maybe a better choice is to let
> them
> consciously disable HMAC protection.

So when IMA is not used already with these fixes we get good
results. And for tpm2_get_random() we can make the algorithm
smarter. All in all we have good path ongoing for "desktop
use case" that I would keep thing way there are or at least
postpone any major decisions just a bit.

For server/IMA use case I'll add a boot parameter it can be
either on or off by default, I will state that in the commit
message and we'll go from there.

> 
> So, maybe we should keep the HMAC protection enabled, and if the
> number
> of PCR extends is above a certain threshold, we can print a warning
> message in the kernel log.
> 
> Roberto

BR, Jarkko
Jarkko Sakkinen Oct. 12, 2024, 10:56 a.m. UTC | #19
On Fri, 2024-10-11 at 19:25 +0300, Jarkko Sakkinen wrote:
> On Fri, 2024-10-11 at 18:10 +0200, Roberto Sassu wrote:
> > Initially, I thought that maybe it would not be good to have an
> > event
> > log with unmodified and altered measurement entries. Then, I tried
> > to
> > think if we can really prevent an active interposer from injecting
> > arbitrary PCR extends and pretending that those events actually
> > happened.
> > 
> > If I understood James's cover letter correctly, the kernel can
> > detect
> > whether a TPM reset occurred, but not that a PCR extend occurred
> > (maybe
> > with a shadow PCR?).
> 
> We can detect TPM reset indirectly. I.e. null seed re-randomizes
> per reset.
> 
> > 
> > Second point, do we really want to take the responsibility to
> > disable
> > the protection on behalf of users? Maybe a better choice is to let
> > them
> > consciously disable HMAC protection.
> 
> So when IMA is not used already with these fixes we get good
> results. And for tpm2_get_random() we can make the algorithm
> smarter. All in all we have good path ongoing for "desktop
> use case" that I would keep thing way there are or at least
> postpone any major decisions just a bit.
> 
> For server/IMA use case I'll add a boot parameter it can be
> either on or off by default, I will state that in the commit
> message and we'll go from there.

Up until legit fixes are place distributors can easily disable
the feature. It would be worse if TCG_TPM2_HMAC did not exist.

So I think it is better to focus on doing right things right,
since the feature itself is useful objectively, and make sure
that those fixes bring the wanted results.

BR, Jarkko
Mimi Zohar Oct. 14, 2024, 11:45 a.m. UTC | #20
On Sat, 2024-10-12 at 13:56 +0300, Jarkko Sakkinen wrote:
> On Fri, 2024-10-11 at 19:25 +0300, Jarkko Sakkinen wrote:
> > On Fri, 2024-10-11 at 18:10 +0200, Roberto Sassu wrote:
> > > Initially, I thought that maybe it would not be good to have an
> > > event
> > > log with unmodified and altered measurement entries. Then, I tried
> > > to
> > > think if we can really prevent an active interposer from injecting
> > > arbitrary PCR extends and pretending that those events actually
> > > happened.
> > > 
> > > If I understood James's cover letter correctly, the kernel can
> > > detect
> > > whether a TPM reset occurred, but not that a PCR extend occurred
> > > (maybe
> > > with a shadow PCR?).
> > 
> > We can detect TPM reset indirectly. I.e. null seed re-randomizes
> > per reset.
> > 
> > > 
> > > Second point, do we really want to take the responsibility to
> > > disable
> > > the protection on behalf of users? Maybe a better choice is to let
> > > them
> > > consciously disable HMAC protection.
> > 
> > So when IMA is not used already with these fixes we get good
> > results. And for tpm2_get_random() we can make the algorithm
> > smarter. All in all we have good path ongoing for "desktop
> > use case" that I would keep thing way there are or at least
> > postpone any major decisions just a bit.
> > 
> > For server/IMA use case I'll add a boot parameter it can be
> > either on or off by default, I will state that in the commit
> > message and we'll go from there.

Sounds good.

> 
> Up until legit fixes are place distributors can easily disable
> the feature. It would be worse if TCG_TPM2_HMAC did not exist.
> 
> So I think it is better to focus on doing right things right,
> since the feature itself is useful objectively, and make sure
> that those fixes bring the wanted results.

Are you backtracking on having a boot parameter here specifically to turn on/off
HMAC encryption for IMA?

Mimi
Jarkko Sakkinen Oct. 14, 2024, 12:34 p.m. UTC | #21
On Mon, 2024-10-14 at 07:45 -0400, Mimi Zohar wrote:
> > > For server/IMA use case I'll add a boot parameter it can be
> > > either on or off by default, I will state that in the commit
> > > message and we'll go from there.
> 
> Sounds good.

But only after this patch set lands. I gave this a thought and since
this patch set is specifically for a specific Bugzilla bug that it
closes, I have no interest to increase its scope.

> 
> > 
> > Up until legit fixes are place distributors can easily disable
> > the feature. It would be worse if TCG_TPM2_HMAC did not exist.
> > 
> > So I think it is better to focus on doing right things right,
> > since the feature itself is useful objectively, and make sure
> > that those fixes bring the wanted results.
> 
> Are you backtracking on having a boot parameter here specifically to
> turn on/off
> HMAC encryption for IMA?

I'm not really sure yet but obviously any change goes through review.

Also fastest route is to send your own RFC's to IMA specific issue.
For me it will take some time (post this patch set).

> 
> Mimi
> 
> 

BR, Jarkko

BR, Jarkko
Mimi Zohar Oct. 15, 2024, 8:08 p.m. UTC | #22
On Mon, 2024-10-14 at 15:34 +0300, Jarkko Sakkinen wrote:
> On Mon, 2024-10-14 at 07:45 -0400, Mimi Zohar wrote:
> > > > For server/IMA use case I'll add a boot parameter it can be
> > > > either on or off by default, I will state that in the commit
> > > > message and we'll go from there.
> > 
> > Sounds good.
> 
> But only after this patch set lands. I gave this a thought and since
> this patch set is specifically for a specific Bugzilla bug that it
> closes, I have no interest to increase its scope.

Prior to your performance improvement patch set it took >10 minutes to boot,
when it succeeded booting.  Now on Fedora 40 with "ima_policy=tcb" on the boot
command line, it's taking ~3 minutes to boot.  Do you really think that is
acceptable?!

> > 
> > > 
> > > Up until legit fixes are place distributors can easily disable
> > > the feature. It would be worse if TCG_TPM2_HMAC did not exist.
> > > 
> > > So I think it is better to focus on doing right things right,
> > > since the feature itself is useful objectively, and make sure
> > > that those fixes bring the wanted results.

The right thing would have been to listen to my concerns when this was initially
being discussed.  The right thing wasn't enabling TCG_TPM2_HMAC by default.

> > 
> > Are you backtracking on having a boot parameter here specifically to
> > turn on/off
> > HMAC encryption for IMA?
> 
> I'm not really sure yet but obviously any change goes through review.
> 
> Also fastest route is to send your own RFC's to IMA specific issue.
> For me it will take some time (post this patch set).

Done.  The patch applies cleanly with/without the TPM performance improvement
patch set.
https://lore.kernel.org/linux-integrity/20241015193916.59964-1-zohar@linux.ibm.com/

Mimi
Jarkko Sakkinen Oct. 15, 2024, 10:14 p.m. UTC | #23
On Tue Oct 15, 2024 at 11:08 PM EEST, Mimi Zohar wrote:
> > > > since the feature itself is useful objectively, and make sure
> > > > that those fixes bring the wanted results.
>
> The right thing would have been to listen to my concerns when this was initially
> being discussed.  The right thing wasn't enabling TCG_TPM2_HMAC by default.

This is debatable as for laptops and desktops having hard drive
encryption do benefit with this. If systemd hadn't added
systemd-cryptenroll I would agree with this. I learned about this
feature two years after its inception in that project, so we needed to
address this as a priority (I did not and will not follow systemd
development proactively, as don't have time for that).

I feel more safe using my laptop with the feature in place at least.

Besides, it is complicated feature enough that it would have been
impossible ever "zero glitch" land it. I don't think there is any
rigid "data centers first" rule really, except maybe for those
businesses that run data centers (and I'm not one of those
businesses).

BR, Jarkko