Message ID | 20240921120811.1264985-1-jarkko@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | Lazy flush for the auth session | expand |
Dear Jarkko, Thank you for working on this and your patches. Am 21.09.24 um 14:08 schrieb Jarkko Sakkinen: > This patch set aims to fix: > https://bugzilla.kernel.org/show_bug.cgi?id=219229. If I am not mistaken this is about reducing the boot time, right? It’d be great if you documented the numbers in the commit messages. Kind regards, Paul
On Sat Sep 21, 2024 at 3:36 PM EEST, Paul Menzel wrote: > Dear Jarkko, > > > Thank you for working on this and your patches. > > Am 21.09.24 um 14:08 schrieb Jarkko Sakkinen: > > This patch set aims to fix: > > https://bugzilla.kernel.org/show_bug.cgi?id=219229. > > If I am not mistaken this is about reducing the boot time, right? It’d > be great if you documented the numbers in the commit messages. So what you're asking is already documented by: 1. https://bugzilla.kernel.org/show_bug.cgi?id=219229 2. Tested-by I added lore link to the bugzilla bug, in order to address your comment. I don't think my "QEMU numbers" will bring any value here tbh. BR, Jarkko
On Sat Sep 21, 2024 at 4:13 PM EEST, Jarkko Sakkinen wrote: > On Sat Sep 21, 2024 at 3:36 PM EEST, Paul Menzel wrote: > > Dear Jarkko, > > > > > > Thank you for working on this and your patches. > > > > Am 21.09.24 um 14:08 schrieb Jarkko Sakkinen: > > > This patch set aims to fix: > > > https://bugzilla.kernel.org/show_bug.cgi?id=219229. > > > > If I am not mistaken this is about reducing the boot time, right? It’d > > be great if you documented the numbers in the commit messages. > > So what you're asking is already documented by: > > 1. https://bugzilla.kernel.org/show_bug.cgi?id=219229 > 2. Tested-by > > I added lore link to the bugzilla bug, in order to address your comment. ... to the email which contains the figures. 20 s -> 8.7 s BR, Jarkko
On Sat Sep 21, 2024 at 3:08 PM EEST, Jarkko Sakkinen wrote: > This patch set aims to fix: > https://bugzilla.kernel.org/show_bug.cgi?id=219229. > > The baseline for the series is the v6.11 tag. > > v4: > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/ > v3: > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/ > v2: > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/ > v1: > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/ > > Jarkko Sakkinen (5): > tpm: Return on tpm2_create_null_primary() failure > tpm: Implement tpm2_load_null() rollback > tpm: flush the null key only when /dev/tpm0 is accessed > tpm: Allocate chip->auth in tpm2_start_auth_session() > tpm: flush the auth session only when /dev/tpm0 is open > > drivers/char/tpm/tpm-chip.c | 14 ++++ > drivers/char/tpm/tpm-dev-common.c | 8 +++ > drivers/char/tpm/tpm-interface.c | 10 ++- > drivers/char/tpm/tpm2-cmd.c | 3 + > drivers/char/tpm/tpm2-sessions.c | 109 ++++++++++++++++++------------ > include/linux/tpm.h | 2 + > 6 files changed, 102 insertions(+), 44 deletions(-) Roberto, James, speaking of digest cache. This patch set has no aim to fix those issues but I do believe that it should improve also that feature. If I don't get soon patch reviews for the patch set, I'll pick the 2nd best option: disable bus encryption on all architectures including x86 and ARM64 (being by default on). It's a force majeure situation. I know this would sort out the issue but I really cannot send these as a pull request with zero reviewe-by's. I expect this to be closed by tomorrow. BR, Jarkko
On Sun, 2024-09-22 at 20:51 +0300, Jarkko Sakkinen wrote: > On Sat Sep 21, 2024 at 3:08 PM EEST, Jarkko Sakkinen wrote: > > This patch set aims to fix: > > https://bugzilla.kernel.org/show_bug.cgi?id=219229. > > > > The baseline for the series is the v6.11 tag. > > > > v4: > > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/ > > v3: > > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/ > > v2: > > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/ > > v1: > > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/ > > > > Jarkko Sakkinen (5): > > tpm: Return on tpm2_create_null_primary() failure > > tpm: Implement tpm2_load_null() rollback > > tpm: flush the null key only when /dev/tpm0 is accessed > > tpm: Allocate chip->auth in tpm2_start_auth_session() > > tpm: flush the auth session only when /dev/tpm0 is open > > > > drivers/char/tpm/tpm-chip.c | 14 ++++ > > drivers/char/tpm/tpm-dev-common.c | 8 +++ > > drivers/char/tpm/tpm-interface.c | 10 ++- > > drivers/char/tpm/tpm2-cmd.c | 3 + > > drivers/char/tpm/tpm2-sessions.c | 109 ++++++++++++++++++-------- > > ---- > > include/linux/tpm.h | 2 + > > 6 files changed, 102 insertions(+), 44 deletions(-) > > > Roberto, James, speaking of digest cache. This patch set has no aim > to fix those issues but I do believe that it should improve also that > feature. > > If I don't get soon patch reviews for the patch set, I'll pick the > 2nd best option: disable bus encryption on all architectures > including x86 and ARM64 (being by default on). > > It's a force majeure situation. I know this would sort out the issue > but I really cannot send these as a pull request with zero reviewe- > by's. > > I expect this to be closed by tomorrow. Hey come on, you knew I was running plumbers last week so I had all the lead up and teardown stuff to do as well. I'm only just digging through accumulated email. Patches 1-2 are fully irrelevant to the bug, so I ignored them on the grounds that improvement to the error flow could be done through the normal patch process Patch 3 is completely unnecessary: the null key is only used to salt the session and is not required to be resident while the session is used (so can be flushed after session creation) therefore keeping it around serves no purpose once the session is created and simply clutters up the TPM volatile handle slots. (I don't know of a case where we use all the slots in a kernel operation, but since we don't need it lets not find out when we get one). So I advise dropping patch 3. I've reviewed 4 and 5. Regards, James
On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote: > On Sun, 2024-09-22 at 20:51 +0300, Jarkko Sakkinen wrote: > > On Sat Sep 21, 2024 at 3:08 PM EEST, Jarkko Sakkinen wrote: > > > This patch set aims to fix: > > > https://bugzilla.kernel.org/show_bug.cgi?id=219229. > > > > > > The baseline for the series is the v6.11 tag. > > > > > > v4: > > > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/ > > > v3: > > > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/ > > > v2: > > > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/ > > > v1: > > > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/ > > > > > > Jarkko Sakkinen (5): > > > tpm: Return on tpm2_create_null_primary() failure > > > tpm: Implement tpm2_load_null() rollback > > > tpm: flush the null key only when /dev/tpm0 is accessed > > > tpm: Allocate chip->auth in tpm2_start_auth_session() > > > tpm: flush the auth session only when /dev/tpm0 is open > > > > > > drivers/char/tpm/tpm-chip.c | 14 ++++ > > > drivers/char/tpm/tpm-dev-common.c | 8 +++ > > > drivers/char/tpm/tpm-interface.c | 10 ++- > > > drivers/char/tpm/tpm2-cmd.c | 3 + > > > drivers/char/tpm/tpm2-sessions.c | 109 ++++++++++++++++++-------- > > > ---- > > > include/linux/tpm.h | 2 + > > > 6 files changed, 102 insertions(+), 44 deletions(-) > > > > > > Roberto, James, speaking of digest cache. This patch set has no aim > > to fix those issues but I do believe that it should improve also that > > feature. > > > > If I don't get soon patch reviews for the patch set, I'll pick the > > 2nd best option: disable bus encryption on all architectures > > including x86 and ARM64 (being by default on). > > > > It's a force majeure situation. I know this would sort out the issue > > but I really cannot send these as a pull request with zero reviewe- > > by's. > > > > I expect this to be closed by tomorrow. > > Hey come on, you knew I was running plumbers last week so I had all the > lead up and teardown stuff to do as well. I'm only just digging > through accumulated email. Fair enough, I actually do not want to disable the feature. That was my main concern here. Now if we get this fixed we might be able to revisit earlier decisions on defconfig and widen the support eventually, not shrink it. > > Patches 1-2 are fully irrelevant to the bug, so I ignored them on the > grounds that improvement to the error flow could be done through the > normal patch process Hmm.. I'll revisit this for v6. Not sure what to say on this yet because I need to address the other remarks and based on that reflect. So might drop or keep them but not 100% sure yet. > Patch 3 is completely unnecessary: the null key is only used to salt > the session and is not required to be resident while the session is > used (so can be flushed after session creation) therefore keeping it > around serves no purpose once the session is created and simply > clutters up the TPM volatile handle slots. (I don't know of a case > where we use all the slots in a kernel operation, but since we don't > need it lets not find out when we get one). So I advise dropping patch > 3. Let's go this through just to check I'm understanding. Holding null key had radical effect on boot time: it cut it down by 5 secons down to 15 seconds: https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/ Then in subsequent version I implemented lazy auth session and boot time went down to 9.7 seconds. So is the point you're trying to make that since auth session is already held as long as we can and they flushed in synchronous point too, I can just as well drop patch 3? I think I reach your point but just want to check that I do it for the matching reasons. It is evolutionary cruft in the patch set :-) BR, Jarkko
On Tue, 2024-09-24 at 19:29 +0300, Jarkko Sakkinen wrote: > On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote: [...] > > Patch 3 is completely unnecessary: the null key is only used to > > salt the session and is not required to be resident while the > > session is used (so can be flushed after session creation) > > therefore keeping it around serves no purpose once the session is > > created and simply clutters up the TPM volatile handle slots. (I > > don't know of a case where we use all the slots in a kernel > > operation, but since we don't need it lets not find out when we get > > one). So I advise dropping patch 3. > > Let's go this through just to check I'm understanding. > > Holding null key had radical effect on boot time: it cut it down by > 5 secons down to 15 seconds: > > https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/ > > Then in subsequent version I implemented lazy auth session and boot > time went down to 9.7 seconds. > > So is the point you're trying to make that since auth session is > already held as long as we can and they flushed in synchronous > point too, I can just as well drop patch 3? Yes, because the null key is only used in session generation which is now lazy, it adds or subtracts nothing from the timings. When you're forced to flush the session, the null key goes too, so you again have to restore it from the context. When you can keep the session you don't need the null key because you're not regenerating it. > I think I reach your point but just want to check that I do it > for the matching reasons. It is evolutionary cruft in the patch > set :-) Regards, James
On Tue Sep 24, 2024 at 7:33 PM EEST, James Bottomley wrote: > On Tue, 2024-09-24 at 19:29 +0300, Jarkko Sakkinen wrote: > > On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote: > [...] > > > Patch 3 is completely unnecessary: the null key is only used to > > > salt the session and is not required to be resident while the > > > session is used (so can be flushed after session creation) > > > therefore keeping it around serves no purpose once the session is > > > created and simply clutters up the TPM volatile handle slots. (I > > > don't know of a case where we use all the slots in a kernel > > > operation, but since we don't need it lets not find out when we get > > > one). So I advise dropping patch 3. > > > > Let's go this through just to check I'm understanding. > > > > Holding null key had radical effect on boot time: it cut it down by > > 5 secons down to 15 seconds: > > > > https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/ > > > > Then in subsequent version I implemented lazy auth session and boot > > time went down to 9.7 seconds. > > > > So is the point you're trying to make that since auth session is > > already held as long as we can and they flushed in synchronous > > point too, I can just as well drop patch 3? > > Yes, because the null key is only used in session generation which is > now lazy, it adds or subtracts nothing from the timings. When you're > forced to flush the session, the null key goes too, so you again have > to restore it from the context. When you can keep the session you > don't need the null key because you're not regenerating it. Yeah, OK, then we're in sync with this. It's evolutionary cruft. Just had to check that the logic matches how I projected your earlier comment because these are sensitive changes. BR, Jarkko
On Tue Sep 24, 2024 at 7:36 PM EEST, Jarkko Sakkinen wrote: > On Tue Sep 24, 2024 at 7:33 PM EEST, James Bottomley wrote: > > On Tue, 2024-09-24 at 19:29 +0300, Jarkko Sakkinen wrote: > > > On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote: > > [...] > > > > Patch 3 is completely unnecessary: the null key is only used to > > > > salt the session and is not required to be resident while the > > > > session is used (so can be flushed after session creation) > > > > therefore keeping it around serves no purpose once the session is > > > > created and simply clutters up the TPM volatile handle slots. (I > > > > don't know of a case where we use all the slots in a kernel > > > > operation, but since we don't need it lets not find out when we get > > > > one). So I advise dropping patch 3. > > > > > > Let's go this through just to check I'm understanding. > > > > > > Holding null key had radical effect on boot time: it cut it down by > > > 5 secons down to 15 seconds: > > > > > > https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/ > > > > > > Then in subsequent version I implemented lazy auth session and boot > > > time went down to 9.7 seconds. > > > > > > So is the point you're trying to make that since auth session is > > > already held as long as we can and they flushed in synchronous > > > point too, I can just as well drop patch 3? > > > > Yes, because the null key is only used in session generation which is > > now lazy, it adds or subtracts nothing from the timings. When you're > > forced to flush the session, the null key goes too, so you again have > > to restore it from the context. When you can keep the session you > > don't need the null key because you're not regenerating it. > > Yeah, OK, then we're in sync with this. It's evolutionary cruft. > > Just had to check that the logic matches how I projected your earlier > comment because these are sensitive changes. I'm definitely going keeep 1/5 and 2/5 as they are still bug fixes. So they will appear in v6 unchanged and perf fixes (which are not functional fixes) should not be built on top of broken code. BR, Jarkko
On Tue Sep 24, 2024 at 8:26 PM EEST, Jarkko Sakkinen wrote: > On Tue Sep 24, 2024 at 7:36 PM EEST, Jarkko Sakkinen wrote: > > On Tue Sep 24, 2024 at 7:33 PM EEST, James Bottomley wrote: > > > On Tue, 2024-09-24 at 19:29 +0300, Jarkko Sakkinen wrote: > > > > On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote: > > > [...] > > > > > Patch 3 is completely unnecessary: the null key is only used to > > > > > salt the session and is not required to be resident while the > > > > > session is used (so can be flushed after session creation) > > > > > therefore keeping it around serves no purpose once the session is > > > > > created and simply clutters up the TPM volatile handle slots. (I > > > > > don't know of a case where we use all the slots in a kernel > > > > > operation, but since we don't need it lets not find out when we get > > > > > one). So I advise dropping patch 3. > > > > > > > > Let's go this through just to check I'm understanding. > > > > > > > > Holding null key had radical effect on boot time: it cut it down by > > > > 5 secons down to 15 seconds: > > > > > > > > https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/ > > > > > > > > Then in subsequent version I implemented lazy auth session and boot > > > > time went down to 9.7 seconds. > > > > > > > > So is the point you're trying to make that since auth session is > > > > already held as long as we can and they flushed in synchronous > > > > point too, I can just as well drop patch 3? > > > > > > Yes, because the null key is only used in session generation which is > > > now lazy, it adds or subtracts nothing from the timings. When you're > > > forced to flush the session, the null key goes too, so you again have > > > to restore it from the context. When you can keep the session you > > > don't need the null key because you're not regenerating it. > > > > Yeah, OK, then we're in sync with this. It's evolutionary cruft. > > > > Just had to check that the logic matches how I projected your earlier > > comment because these are sensitive changes. > > I'm definitely going keeep 1/5 and 2/5 as they are still bug fixes. > > So they will appear in v6 unchanged and perf fixes (which are not > functional fixes) should not be built on top of broken code. And 3/5 is actually required because it saves of doing flush during the boot if nothing else. We are wasting more time so I don't want to waste it for nothing. BR, Jarkko
On Tue Sep 24, 2024 at 8:28 PM EEST, Jarkko Sakkinen wrote: > On Tue Sep 24, 2024 at 8:26 PM EEST, Jarkko Sakkinen wrote: > > On Tue Sep 24, 2024 at 7:36 PM EEST, Jarkko Sakkinen wrote: > > > On Tue Sep 24, 2024 at 7:33 PM EEST, James Bottomley wrote: > > > > On Tue, 2024-09-24 at 19:29 +0300, Jarkko Sakkinen wrote: > > > > > On Tue Sep 24, 2024 at 4:48 PM EEST, James Bottomley wrote: > > > > [...] > > > > > > Patch 3 is completely unnecessary: the null key is only used to > > > > > > salt the session and is not required to be resident while the > > > > > > session is used (so can be flushed after session creation) > > > > > > therefore keeping it around serves no purpose once the session is > > > > > > created and simply clutters up the TPM volatile handle slots. (I > > > > > > don't know of a case where we use all the slots in a kernel > > > > > > operation, but since we don't need it lets not find out when we get > > > > > > one). So I advise dropping patch 3. > > > > > > > > > > Let's go this through just to check I'm understanding. > > > > > > > > > > Holding null key had radical effect on boot time: it cut it down by > > > > > 5 secons down to 15 seconds: > > > > > > > > > > https://lore.kernel.org/linux-integrity/CALSz7m1WG7fZ9UuO0URgCZEDG7r_wB4Ev_4mOHJThH_d1Ed1nw@mail.gmail.com/ > > > > > > > > > > Then in subsequent version I implemented lazy auth session and boot > > > > > time went down to 9.7 seconds. > > > > > > > > > > So is the point you're trying to make that since auth session is > > > > > already held as long as we can and they flushed in synchronous > > > > > point too, I can just as well drop patch 3? > > > > > > > > Yes, because the null key is only used in session generation which is > > > > now lazy, it adds or subtracts nothing from the timings. When you're > > > > forced to flush the session, the null key goes too, so you again have > > > > to restore it from the context. When you can keep the session you > > > > don't need the null key because you're not regenerating it. > > > > > > Yeah, OK, then we're in sync with this. It's evolutionary cruft. > > > > > > Just had to check that the logic matches how I projected your earlier > > > comment because these are sensitive changes. > > > > I'm definitely going keeep 1/5 and 2/5 as they are still bug fixes. > > > > So they will appear in v6 unchanged and perf fixes (which are not > > functional fixes) should not be built on top of broken code. > > And 3/5 is actually required because it saves of doing flush during > the boot if nothing else. > > We are wasting more time so I don't want to waste it for nothing. Anything beyong 50 ms matters and that flush certainly costs more than that. As I already stated in earlier version, we need to find more of these "50 ms and 100 ms there sites. The functional fixes are required because perf fix is always *less critical* than perf fix. Please pay more attention to proper error rollback next time, that's all I can say on that. It's not my fault that it is broken. BR, Jarkko
On Sun, 2024-09-22 at 20:51 +0300, Jarkko Sakkinen wrote: > On Sat Sep 21, 2024 at 3:08 PM EEST, Jarkko Sakkinen wrote: > > This patch set aims to fix: > > https://bugzilla.kernel.org/show_bug.cgi?id=219229. > > > > The baseline for the series is the v6.11 tag. > > > > v4: > > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/ > > v3: > > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/ > > v2: > > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/ > > v1: > > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/ > > > > Jarkko Sakkinen (5): > > tpm: Return on tpm2_create_null_primary() failure > > tpm: Implement tpm2_load_null() rollback > > tpm: flush the null key only when /dev/tpm0 is accessed > > tpm: Allocate chip->auth in tpm2_start_auth_session() > > tpm: flush the auth session only when /dev/tpm0 is open > > > > drivers/char/tpm/tpm-chip.c | 14 ++++ > > drivers/char/tpm/tpm-dev-common.c | 8 +++ > > drivers/char/tpm/tpm-interface.c | 10 ++- > > drivers/char/tpm/tpm2-cmd.c | 3 + > > drivers/char/tpm/tpm2-sessions.c | 109 ++++++++++++++++++------------ > > include/linux/tpm.h | 2 + > > 6 files changed, 102 insertions(+), 44 deletions(-) > > > Roberto, James, speaking of digest cache. This patch set has no aim to > fix those issues but I do believe that it should improve also that > feature. > > If I don't get soon patch reviews for the patch set, I'll pick the 2nd > best option: disable bus encryption on all architectures including x86 > and ARM64 (being by default on). > > It's a force majeure situation. I know this would sort out the issue > but I really cannot send these as a pull request with zero reviewe-by's. > > I expect this to be closed by tomorrow. Jarkko, sorry to be so late to this discussion. The bus HMAC/encryption really impacts IMA as well. Even with this patch set, it's slow. My preference would be to disable bus encryption on all architectures until there is a boot/runtime option allowing it to be disabled for IMA as discussed in the other thread. In the other thread, I also mentioned that the Kconfig is incorrectly worded. The performance degradation is not limited to encryption, but the HMAC itself. Please change "Saying Y here adds some encryption overhead to all kernel to TPM transactions." to "Saying Y here adds overhead to all kernel to TPM transactions." thanks, Mimi
On 9/21/24 8:08 AM, Jarkko Sakkinen wrote: > This patch set aims to fix: > https://bugzilla.kernel.org/show_bug.cgi?id=219229. > > The baseline for the series is the v6.11 tag. I was testing this with 6.12-rc1 on ppc64/kvm + vtpm boot time from pressing return on grub until login prompt appears while using an IMA measure policy: with HMAC2: 36s with HMAC2+this series: 29s without HMAC2: 28s Looks good to me, though using a hardware TPM would probably be more critical in this type of measurement. > > v4: > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/ > v3: > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/ > v2: > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/ > v1: > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/ > > Jarkko Sakkinen (5): > tpm: Return on tpm2_create_null_primary() failure > tpm: Implement tpm2_load_null() rollback > tpm: flush the null key only when /dev/tpm0 is accessed > tpm: Allocate chip->auth in tpm2_start_auth_session() > tpm: flush the auth session only when /dev/tpm0 is open > > drivers/char/tpm/tpm-chip.c | 14 ++++ > drivers/char/tpm/tpm-dev-common.c | 8 +++ > drivers/char/tpm/tpm-interface.c | 10 ++- > drivers/char/tpm/tpm2-cmd.c | 3 + > drivers/char/tpm/tpm2-sessions.c | 109 ++++++++++++++++++------------ > include/linux/tpm.h | 2 + > 6 files changed, 102 insertions(+), 44 deletions(-) >
On Tue, 2024-10-01 at 14:10 -0400, Mimi Zohar wrote: > On Sun, 2024-09-22 at 20:51 +0300, Jarkko Sakkinen wrote: > > On Sat Sep 21, 2024 at 3:08 PM EEST, Jarkko Sakkinen wrote: > > > This patch set aims to fix: > > > https://bugzilla.kernel.org/show_bug.cgi?id=219229. > > > > > > The baseline for the series is the v6.11 tag. > > > > > > v4: > > > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/ > > > v3: > > > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/ > > > v2: > > > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/ > > > v1: > > > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/ > > > > > > Jarkko Sakkinen (5): > > > tpm: Return on tpm2_create_null_primary() failure > > > tpm: Implement tpm2_load_null() rollback > > > tpm: flush the null key only when /dev/tpm0 is accessed > > > tpm: Allocate chip->auth in tpm2_start_auth_session() > > > tpm: flush the auth session only when /dev/tpm0 is open > > > > > > drivers/char/tpm/tpm-chip.c | 14 ++++ > > > drivers/char/tpm/tpm-dev-common.c | 8 +++ > > > drivers/char/tpm/tpm-interface.c | 10 ++- > > > drivers/char/tpm/tpm2-cmd.c | 3 + > > > drivers/char/tpm/tpm2-sessions.c | 109 ++++++++++++++++++------ > > > ------ > > > include/linux/tpm.h | 2 + > > > 6 files changed, 102 insertions(+), 44 deletions(-) > > > > > > Roberto, James, speaking of digest cache. This patch set has no aim > > to > > fix those issues but I do believe that it should improve also that > > feature. > > > > If I don't get soon patch reviews for the patch set, I'll pick the > > 2nd > > best option: disable bus encryption on all architectures including > > x86 > > and ARM64 (being by default on). > > > > It's a force majeure situation. I know this would sort out the > > issue > > but I really cannot send these as a pull request with zero reviewe- > > by's. > > > > I expect this to be closed by tomorrow. > > Jarkko, sorry to be so late to this discussion. The bus > HMAC/encryption really > impacts IMA as well. Even with this patch set, it's slow. My > preference would > be to disable bus encryption on all architectures until there is a > boot/runtime > option allowing it to be disabled for IMA as discussed in the other > thread. No worries, I was getting nervous because of a job switch, now I have time since cannot move this forward for week or two anyway :-) I'm totally +1 to make bus encyption opt-in instead of opt-out. It's just not there yet. My fixes fix one use case, i.e. the boot process for AMD, so in that sense they are totally legit. But it is pretty clear by now that tons of similar patches and small tweaks will be required. As it is my 2nd work week, I can implement such patch, *next week*. Up until that there is time to give any feedback. > > In the other thread, I also mentioned that the Kconfig is incorrectly > worded. > The performance degradation is not limited to encryption, but the > HMAC itself. > Please change "Saying Y here adds some encryption overhead to all > kernel to TPM > transactions." to "Saying Y here adds overhead to all kernel to TPM > transactions." I'll keep this in mind! I'd prefer to do a single patch set probably with my previous fixes and this, as they are already tested by the reporter anyway and pile this as a new patch on top. I.e. have basically all that I'd put to the next PR. > > thanks, > > Mimi BR, Jarkko
On Thu, 2024-10-03 at 11:14 -0400, Stefan Berger wrote: > > > On 9/21/24 8:08 AM, Jarkko Sakkinen wrote: > > This patch set aims to fix: > > https://bugzilla.kernel.org/show_bug.cgi?id=219229. > > > > The baseline for the series is the v6.11 tag. > > I was testing this with 6.12-rc1 on ppc64/kvm + vtpm boot time from > pressing return on grub until login prompt appears while using an IMA > measure policy: > > with HMAC2: 36s > with HMAC2+this series: 29s > without HMAC2: 28s > > Looks good to me, though using a hardware TPM would probably be more > critical in this type of measurement. Yep, exactly. QEMU fails here... BR, Jarkko
On Sat, 2024-09-21 at 15:08 +0300, Jarkko Sakkinen wrote: > This patch set aims to fix: > https://bugzilla.kernel.org/show_bug.cgi?id=219229. > > The baseline for the series is the v6.11 tag. > > v4: > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/ > v3: > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/ > v2: > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/ > v1: > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/ > > Jarkko Sakkinen (5): > tpm: Return on tpm2_create_null_primary() failure > tpm: Implement tpm2_load_null() rollback > tpm: flush the null key only when /dev/tpm0 is accessed > tpm: Allocate chip->auth in tpm2_start_auth_session() > tpm: flush the auth session only when /dev/tpm0 is open > > drivers/char/tpm/tpm-chip.c | 14 ++++ > drivers/char/tpm/tpm-dev-common.c | 8 +++ > drivers/char/tpm/tpm-interface.c | 10 ++- > drivers/char/tpm/tpm2-cmd.c | 3 + > drivers/char/tpm/tpm2-sessions.c | 109 ++++++++++++++++++---------- > -- > include/linux/tpm.h | 2 + > 6 files changed, 102 insertions(+), 44 deletions(-) The summarize some discussions: 1. I'll address Stefan's remarks. 2. We know that these patches address the desktop boot. 3. IMA is too slow => add a boot option for IMA default off. I.e. IMA will not use the feature unless you specifically ask. 4. Random generation can be optimized a lot with or without encryption. Not sure if I have time to do ths right now but I have already patch planned for this. What is blocking me is the James' request to not include functional fixes. The problem with that is that if comply to that request I will have to postpone all the performacne fixes and send a patch set with only functional fixes and go all review rounds with that before moving forward. This is just how priorities go in kernel and doing by the book. Is that really necessary? Since I've just started in a new job any patches can be expected earliest next week. That's why I was rushing with the patch set in the first place because I knew that there will be otherwise a few week delay but we'll get there :-) BR, Jarkko
On Fri, 2024-10-11 at 17:06 +0300, Jarkko Sakkinen wrote: > On Sat, 2024-09-21 at 15:08 +0300, Jarkko Sakkinen wrote: > > This patch set aims to fix: > > https://bugzilla.kernel.org/show_bug.cgi?id=219229. > > > > The baseline for the series is the v6.11 tag. > > > > v4: > > https://lore.kernel.org/linux-integrity/20240918203559.192605-1-jarkko@kernel.org/ > > v3: > > https://lore.kernel.org/linux-integrity/20240917154444.702370-1-jarkko@kernel.org/ > > v2: > > https://lore.kernel.org/linux-integrity/20240916110714.1396407-1-jarkko@kernel.org/ > > v1: > > https://lore.kernel.org/linux-integrity/20240915180448.2030115-1-jarkko@kernel.org/ > > > > Jarkko Sakkinen (5): > > tpm: Return on tpm2_create_null_primary() failure > > tpm: Implement tpm2_load_null() rollback > > tpm: flush the null key only when /dev/tpm0 is accessed > > tpm: Allocate chip->auth in tpm2_start_auth_session() > > tpm: flush the auth session only when /dev/tpm0 is open > > > > drivers/char/tpm/tpm-chip.c | 14 ++++ > > drivers/char/tpm/tpm-dev-common.c | 8 +++ > > drivers/char/tpm/tpm-interface.c | 10 ++- > > drivers/char/tpm/tpm2-cmd.c | 3 + > > drivers/char/tpm/tpm2-sessions.c | 109 ++++++++++++++++++---------- > > -- > > include/linux/tpm.h | 2 + > > 6 files changed, 102 insertions(+), 44 deletions(-) > > The summarize some discussions: > > 1. I'll address Stefan's remarks. > 2. We know that these patches address the desktop boot. > 3. IMA is too slow => add a boot option for IMA default off. I.e. > IMA will not use the feature unless you specifically ask. Initially, I thought that maybe it would not be good to have an event log with unmodified and altered measurement entries. Then, I tried to think if we can really prevent an active interposer from injecting arbitrary PCR extends and pretending that those events actually happened. If I understood James's cover letter correctly, the kernel can detect whether a TPM reset occurred, but not that a PCR extend occurred (maybe with a shadow PCR?). Second point, do we really want to take the responsibility to disable the protection on behalf of users? Maybe a better choice is to let them consciously disable HMAC protection. So, maybe we should keep the HMAC protection enabled, and if the number of PCR extends is above a certain threshold, we can print a warning message in the kernel log. Roberto > 4. Random generation can be optimized a lot with or without > encryption. Not sure if I have time to do ths right now > but I have already patch planned for this. > > What is blocking me is the James' request to not include > functional fixes. The problem with that is that if comply > to that request I will have to postpone all the performacne > fixes and send a patch set with only functional fixes and > go all review rounds with that before moving forward. > > This is just how priorities go in kernel and doing by the > book. Is that really necessary? > > Since I've just started in a new job any patches can be > expected earliest next week. That's why I was rushing with > the patch set in the first place because I knew that there > will be otherwise a few week delay but we'll get there :-) > > BR, Jarkko >
On Fri, 2024-10-11 at 18:10 +0200, Roberto Sassu wrote: > Initially, I thought that maybe it would not be good to have an event > log with unmodified and altered measurement entries. Then, I tried to > think if we can really prevent an active interposer from injecting > arbitrary PCR extends and pretending that those events actually > happened. > > If I understood James's cover letter correctly, the kernel can detect > whether a TPM reset occurred, but not that a PCR extend occurred > (maybe > with a shadow PCR?). We can detect TPM reset indirectly. I.e. null seed re-randomizes per reset. > > Second point, do we really want to take the responsibility to disable > the protection on behalf of users? Maybe a better choice is to let > them > consciously disable HMAC protection. So when IMA is not used already with these fixes we get good results. And for tpm2_get_random() we can make the algorithm smarter. All in all we have good path ongoing for "desktop use case" that I would keep thing way there are or at least postpone any major decisions just a bit. For server/IMA use case I'll add a boot parameter it can be either on or off by default, I will state that in the commit message and we'll go from there. > > So, maybe we should keep the HMAC protection enabled, and if the > number > of PCR extends is above a certain threshold, we can print a warning > message in the kernel log. > > Roberto BR, Jarkko
On Fri, 2024-10-11 at 19:25 +0300, Jarkko Sakkinen wrote: > On Fri, 2024-10-11 at 18:10 +0200, Roberto Sassu wrote: > > Initially, I thought that maybe it would not be good to have an > > event > > log with unmodified and altered measurement entries. Then, I tried > > to > > think if we can really prevent an active interposer from injecting > > arbitrary PCR extends and pretending that those events actually > > happened. > > > > If I understood James's cover letter correctly, the kernel can > > detect > > whether a TPM reset occurred, but not that a PCR extend occurred > > (maybe > > with a shadow PCR?). > > We can detect TPM reset indirectly. I.e. null seed re-randomizes > per reset. > > > > > Second point, do we really want to take the responsibility to > > disable > > the protection on behalf of users? Maybe a better choice is to let > > them > > consciously disable HMAC protection. > > So when IMA is not used already with these fixes we get good > results. And for tpm2_get_random() we can make the algorithm > smarter. All in all we have good path ongoing for "desktop > use case" that I would keep thing way there are or at least > postpone any major decisions just a bit. > > For server/IMA use case I'll add a boot parameter it can be > either on or off by default, I will state that in the commit > message and we'll go from there. Up until legit fixes are place distributors can easily disable the feature. It would be worse if TCG_TPM2_HMAC did not exist. So I think it is better to focus on doing right things right, since the feature itself is useful objectively, and make sure that those fixes bring the wanted results. BR, Jarkko
On Sat, 2024-10-12 at 13:56 +0300, Jarkko Sakkinen wrote: > On Fri, 2024-10-11 at 19:25 +0300, Jarkko Sakkinen wrote: > > On Fri, 2024-10-11 at 18:10 +0200, Roberto Sassu wrote: > > > Initially, I thought that maybe it would not be good to have an > > > event > > > log with unmodified and altered measurement entries. Then, I tried > > > to > > > think if we can really prevent an active interposer from injecting > > > arbitrary PCR extends and pretending that those events actually > > > happened. > > > > > > If I understood James's cover letter correctly, the kernel can > > > detect > > > whether a TPM reset occurred, but not that a PCR extend occurred > > > (maybe > > > with a shadow PCR?). > > > > We can detect TPM reset indirectly. I.e. null seed re-randomizes > > per reset. > > > > > > > > Second point, do we really want to take the responsibility to > > > disable > > > the protection on behalf of users? Maybe a better choice is to let > > > them > > > consciously disable HMAC protection. > > > > So when IMA is not used already with these fixes we get good > > results. And for tpm2_get_random() we can make the algorithm > > smarter. All in all we have good path ongoing for "desktop > > use case" that I would keep thing way there are or at least > > postpone any major decisions just a bit. > > > > For server/IMA use case I'll add a boot parameter it can be > > either on or off by default, I will state that in the commit > > message and we'll go from there. Sounds good. > > Up until legit fixes are place distributors can easily disable > the feature. It would be worse if TCG_TPM2_HMAC did not exist. > > So I think it is better to focus on doing right things right, > since the feature itself is useful objectively, and make sure > that those fixes bring the wanted results. Are you backtracking on having a boot parameter here specifically to turn on/off HMAC encryption for IMA? Mimi
On Mon, 2024-10-14 at 07:45 -0400, Mimi Zohar wrote: > > > For server/IMA use case I'll add a boot parameter it can be > > > either on or off by default, I will state that in the commit > > > message and we'll go from there. > > Sounds good. But only after this patch set lands. I gave this a thought and since this patch set is specifically for a specific Bugzilla bug that it closes, I have no interest to increase its scope. > > > > > Up until legit fixes are place distributors can easily disable > > the feature. It would be worse if TCG_TPM2_HMAC did not exist. > > > > So I think it is better to focus on doing right things right, > > since the feature itself is useful objectively, and make sure > > that those fixes bring the wanted results. > > Are you backtracking on having a boot parameter here specifically to > turn on/off > HMAC encryption for IMA? I'm not really sure yet but obviously any change goes through review. Also fastest route is to send your own RFC's to IMA specific issue. For me it will take some time (post this patch set). > > Mimi > > BR, Jarkko BR, Jarkko
On Mon, 2024-10-14 at 15:34 +0300, Jarkko Sakkinen wrote: > On Mon, 2024-10-14 at 07:45 -0400, Mimi Zohar wrote: > > > > For server/IMA use case I'll add a boot parameter it can be > > > > either on or off by default, I will state that in the commit > > > > message and we'll go from there. > > > > Sounds good. > > But only after this patch set lands. I gave this a thought and since > this patch set is specifically for a specific Bugzilla bug that it > closes, I have no interest to increase its scope. Prior to your performance improvement patch set it took >10 minutes to boot, when it succeeded booting. Now on Fedora 40 with "ima_policy=tcb" on the boot command line, it's taking ~3 minutes to boot. Do you really think that is acceptable?! > > > > > > > > Up until legit fixes are place distributors can easily disable > > > the feature. It would be worse if TCG_TPM2_HMAC did not exist. > > > > > > So I think it is better to focus on doing right things right, > > > since the feature itself is useful objectively, and make sure > > > that those fixes bring the wanted results. The right thing would have been to listen to my concerns when this was initially being discussed. The right thing wasn't enabling TCG_TPM2_HMAC by default. > > > > Are you backtracking on having a boot parameter here specifically to > > turn on/off > > HMAC encryption for IMA? > > I'm not really sure yet but obviously any change goes through review. > > Also fastest route is to send your own RFC's to IMA specific issue. > For me it will take some time (post this patch set). Done. The patch applies cleanly with/without the TPM performance improvement patch set. https://lore.kernel.org/linux-integrity/20241015193916.59964-1-zohar@linux.ibm.com/ Mimi
On Tue Oct 15, 2024 at 11:08 PM EEST, Mimi Zohar wrote: > > > > since the feature itself is useful objectively, and make sure > > > > that those fixes bring the wanted results. > > The right thing would have been to listen to my concerns when this was initially > being discussed. The right thing wasn't enabling TCG_TPM2_HMAC by default. This is debatable as for laptops and desktops having hard drive encryption do benefit with this. If systemd hadn't added systemd-cryptenroll I would agree with this. I learned about this feature two years after its inception in that project, so we needed to address this as a priority (I did not and will not follow systemd development proactively, as don't have time for that). I feel more safe using my laptop with the feature in place at least. Besides, it is complicated feature enough that it would have been impossible ever "zero glitch" land it. I don't think there is any rigid "data centers first" rule really, except maybe for those businesses that run data centers (and I'm not one of those businesses). BR, Jarkko