Message ID | 20221215180125.24632-3-jejb@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | tpm: add mssim backend |
On 12/15/22 13:01, James Bottomley wrote:
> From: James Bottomley <James.Bottomley@HansenPartnership.com>
>
> The Microsoft Simulator (mssim) is the reference emulation platform
> for the TCG TPM 2.0 specification.
>
> https://github.com/Microsoft/ms-tpm-20-ref.git
>
> It exports a fairly simple network socket based protocol on two
> sockets, one for command (default 2321) and one for control (default
> 2322). This patch adds a simple backend that can speak the mssim
> protocol over the network. It also allows the host, and two ports to
> be specified on the qemu command line. The benefits are twofold:
> firstly it gives us a backend that actually speaks a standard TPM
> emulation protocol instead of the linux specific TPM driver format of
> the current emulated TPM backend and secondly, using the microsoft
> protocol, the end point of the emulator can be anywhere on the
> network, facilitating the cloud use case where a central TPM service
> can be used over a control network.
>
> The implementation does basic control commands like power off/on, but
> doesn't implement cancellation or startup. The former because
> cancellation is pretty much useless on a fast operating TPM emulator
> and the latter because this emulator is designed to be used with OVMF
> which itself does TPM startup and I wanted to validate that.
>
> To run this, simply download an emulator based on the MS specification
> (package ibmswtpm2 on openSUSE) and run it, then add these two lines
> to the qemu command and it will use the emulator.
>
>     -tpmdev mssim,id=tpm0 \
>     -device tpm-crb,tpmdev=tpm0 \
>
> to use a remote emulator replace the first line with
>
>     -tpmdev "{'type':'mssim','id':'tpm0','command':{'type':inet,'host':'remote','port':'2321'}}"
>
> tpm-tis also works as the backend.

Since this device does not properly support migration you have to register a migration blocker.

Stefan
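Editorial illustration of the two-socket protocol described in the patch: a minimal Python sketch of how the framing might look. The signal numbers and the field layout are recalled from `TpmTcpProtocol.h` in ms-tpm-20-ref and should be treated as assumptions to check against that header. The helper names are hypothetical, and none of this is taken from the patch itself, which is C code inside QEMU.

```python
import struct

# Assumed values, as recalled from TpmTcpProtocol.h in ms-tpm-20-ref.
TPM_SIGNAL_POWER_ON = 1
TPM_SIGNAL_POWER_OFF = 2
TPM_SEND_COMMAND = 8

DEFAULT_COMMAND_PORT = 2321  # from the patch description
DEFAULT_CONTROL_PORT = 2322  # from the patch description


def frame_control(signal: int) -> bytes:
    """Control-socket request: a single big-endian 32-bit signal number."""
    return struct.pack(">I", signal)


def frame_command(tpm_command: bytes, locality: int = 0) -> bytes:
    """Command-socket request: opcode, locality byte, length, then the TPM command."""
    header = struct.pack(">IBI", TPM_SEND_COMMAND, locality, len(tpm_command))
    return header + tpm_command


def parse_response(payload: bytes) -> bytes:
    """Split a reply of the assumed form: 32-bit length, TPM response, 32-bit ack."""
    (length,) = struct.unpack_from(">I", payload, 0)
    body = payload[4:4 + length]
    (ack,) = struct.unpack_from(">I", payload, 4 + length)
    if ack != 0:
        raise ValueError(f"simulator returned non-zero ack {ack}")
    return body
```

The functions only build and parse byte strings, so they can be checked without a running simulator; sending them over sockets connected to the command and control ports is left out on purpose.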
On Thu, 2022-12-15 at 13:46 -0500, Stefan Berger wrote:
> On 12/15/22 13:01, James Bottomley wrote:
[...]
> > tpm-tis also works as the backend.
>
> Since this device does not properly support migration you have to
> register a migration blocker.

Actually it seems to support migration just fine. Currently the PCR's get zero'd which is my fault for doing a TPM power off/on, but switching that based on state should be an easy fix.

James
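Editorial illustration of "switching that based on state": a minimal Python sketch of the decision only, assuming the backend can tell a cold boot from an incoming migration. The enum, the function name, and the signal numbers (recalled from `TpmTcpProtocol.h` in ms-tpm-20-ref) are assumptions for illustration, not code from the patch.

```python
from enum import Enum

# Assumed values, as recalled from TpmTcpProtocol.h in ms-tpm-20-ref.
TPM_SIGNAL_POWER_ON = 1
TPM_SIGNAL_POWER_OFF = 2


class VmStart(Enum):
    """Hypothetical description of how the guest is being started."""
    COLD_BOOT = "cold-boot"
    INCOMING_MIGRATION = "incoming-migration"


def startup_signals(start: VmStart) -> list[int]:
    """Return the control signals to send when the backend starts up.

    A power cycle resets the PCRs, so it is skipped when the guest is
    resuming from saved or migrated state.
    """
    if start is VmStart.INCOMING_MIGRATION:
        return []
    return [TPM_SIGNAL_POWER_OFF, TPM_SIGNAL_POWER_ON]
```

With this shape, the resume path sends nothing on the control socket, and the simulator keeps whatever PCR values it already holds.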
On 12/15/22 14:22, James Bottomley wrote:
> On Thu, 2022-12-15 at 13:46 -0500, Stefan Berger wrote:
[...]
>> Since this device does not properly support migration you have to
>> register a migration blocker.
>
> Actually it seems to support migration just fine. Currently the PCR's
> get zero'd which is my fault for doing a TPM power off/on, but
> switching that based on state should be an easy fix.

How do you handle virsh save -> host reboot -> virsh restore?

You should also add a description to docs/specs/tpm.rst.

Stefan
On Thu, 2022-12-15 at 14:35 -0500, Stefan Berger wrote:
> On 12/15/22 14:22, James Bottomley wrote:
[...]
> > Actually it seems to support migration just fine. Currently the
> > PCR's get zero'd which is my fault for doing a TPM power off/on, but
> > switching that based on state should be an easy fix.
>
> How do you handle virsh save -> host reboot -> virsh restore?

I didn't. I just pulled out the TPM power state changes and followed the guide here using the migrate "exec:gzip -c > STATEFILE.gz" recipe:

https://www.linux-kvm.org/page/Migration

and verified the TPM pcrs and the null name were unchanged.

> You should also add a description to docs/specs/tpm.rst.

Description of what? It functions exactly like passthrough on migration. Since the TPM state is retained in the server a reconnection just brings everything back to where it was.

James
On 12/15/22 14:40, James Bottomley wrote:
> On Thu, 2022-12-15 at 14:35 -0500, Stefan Berger wrote:
[...]
>> How do you handle virsh save -> host reboot -> virsh restore?
>
> I didn't. I just pulled out the TPM power state changes and followed
> the guide here using the migrate "exec:gzip -c > STATEFILE.gz" recipe:
>
> https://www.linux-kvm.org/page/Migration
>
> and verified the TPM pcrs and the null name were unchanged.
>
>> You should also add a description to docs/specs/tpm.rst.
>
> Description of what? It functions exactly like passthrough on

Please describe all the scenarios so that someone else can repeat them when trying out **your** device.

There are sections describing how things work for swtpm and you should add how things work for the mssim TPM.

https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#the-qemu-tpm-emulator-device
https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#migration-with-the-tpm-emulator

> migration. Since the TPM state is retained in the server a
> reconnection just brings everything back to where it was.

So it's remote. And the ports are always open and someone can just connect to the open ports and power cycle the device?

This may not be the most important scenario but nevertheless I wouldn't want to deal with bug reports if someone does 'VM snapshotting' -- how this is correctly handled would be of interest.

Stefan
On Thu, 2022-12-15 at 14:57 -0500, Stefan Berger wrote:
> On 12/15/22 14:40, James Bottomley wrote:
> > On Thu, 2022-12-15 at 14:35 -0500, Stefan Berger wrote:
[...]
> > > You should also add a description to docs/specs/tpm.rst.
> >
> > Description of what? It functions exactly like passthrough on
>
> Please describe all the scenarios so that someone else can repeat
> them when trying out **your** device.
>
> There are sections describing how things work for swtpm and you should
> add how things work for the mssim TPM.
>
> https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#the-qemu-tpm-emulator-device
> https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#migration-with-the-tpm-emulator

The passthrough snapshot/restore isn't described there either. This behaves exactly the same in that it's caveat emptor. If something happens in the interim to upset the TPM state then the restore will have unexpected effects due to the externally changed TPM state. This is actually a feature: I'm checking our interposer defences by doing external state manipulation.

> > migration. Since the TPM state is retained in the server a
> > reconnection just brings everything back to where it was.
>
> So it's remote. And the ports are always open and someone can just
> connect to the open ports and power cycle the device?

in the same way as you can power off the hardware and have issues with a passthrough TPM on vm restore, yes.

> This may not be the most important scenario but nevertheless I
> wouldn't want to deal with bug reports if someone does 'VM
> snapshotting' -- how this is correctly handled would be of interest.

I'd rather say nothing, like passthrough, then there are no expectations beyond it might work if you know what you're doing. I don't really have much interest in the migration use case, but I knew it should work like the passthrough case, so that's what I tested.

James
On 12/15/22 15:07, James Bottomley wrote:
> On Thu, 2022-12-15 at 14:57 -0500, Stefan Berger wrote:
>> On 12/15/22 14:40, James Bottomley wrote:
>>> On Thu, 2022-12-15 at 14:35 -0500, Stefan Berger wrote:
> [...]
>>>> You should also add a description to docs/specs/tpm.rst.
>>>
>>> Description of what? It functions exactly like passthrough on
>>
>> Please describe all the scenarios so that someone else can repeat
>> them when trying out **your** device.
>>
>> There are sections describing how things work for swtpm and you should
>> add how things work for the mssim TPM.
>>
>> https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#the-qemu-tpm-emulator-device
>> https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#migration-with-the-tpm-emulator
>
> The passthrough snapshot/restore isn't described there either. This

Forget about passthrough, rather compare it to swtpm.

> behaves exactly the same in that it's caveat emptor. If something
> happens in the interim to upset the TPM state then the restore will
> have unexpected effects due to the externally changed TPM state. This
> is actually a feature: I'm checking our interposer defences by doing
> external state manipulation.
>
>>> migration. Since the TPM state is retained in the server a
>>> reconnection just brings everything back to where it was.
>>
>> So it's remote. And the ports are always open and someone can just
>> connect to the open ports and power cycle the device?
>
> in the same way as you can power off the hardware and have issues with
> a passthrough TPM on vm restore, yes.

I don't think you should compare the mssim TPM with passthrough but rather with swtpm emulator + tpm_emulator backend. That's a much better comparison.

>> This may not be the most important scenario but nevertheless I
>> wouldn't want to deal with bug reports if someone does 'VM
>> snapshotting' -- how this is correctly handled would be of interest.
>
> I'd rather say nothing, like passthrough, then there are no
> expectations beyond it might work if you know what you're doing. I

Why do we need this device then if it doesn't handle migration scenarios in the same or better way than swtpm + tpm_emulator backends already do?

> don't really have much interest in the migration use case, but I knew
> it should work like the passthrough case, so that's what I tested.

I think your device needs to block migrations since it doesn't handle all migration scenarios correctly.

Stefan
On Thu, 2022-12-15 at 15:22 -0500, Stefan Berger wrote:
> On 12/15/22 15:07, James Bottomley wrote:
[...]
> > don't really have much interest in the migration use case, but I
> > knew it should work like the passthrough case, so that's what I
> > tested.
>
> I think your device needs to block migrations since it doesn't handle
> all migration scenarios correctly.

Passthrough doesn't block migrations either, presumably because it can also be made to work if you know what you're doing. I might not be particularly interested in migrations, but that's not really a good reason to prevent anyone from ever using them, particularly when the experiment says they do work.

James
On 12/15/22 15:30, James Bottomley wrote:
> On Thu, 2022-12-15 at 15:22 -0500, Stefan Berger wrote:
>> On 12/15/22 15:07, James Bottomley wrote:
> [...]
>>> don't really have much interest in the migration use case, but I
>>> knew it should work like the passthrough case, so that's what I
>>> tested.
>>
>> I think your device needs to block migrations since it doesn't handle
>> all migration scenarios correctly.
>
> Passthrough doesn't block migrations either, presumably because it can
> also be made to work if you know what you're doing. I might not be

Don't compare it to passthrough, compare it to swtpm. It should have at least the same features as swtpm or be better, otherwise I don't see why we need to have the backend device in the upstream repo.

Stefan
On Thu, Dec 15, 2022 at 03:53:43PM -0500, Stefan Berger wrote:
> On 12/15/22 15:30, James Bottomley wrote:
> > On Thu, 2022-12-15 at 15:22 -0500, Stefan Berger wrote:
> > > On 12/15/22 15:07, James Bottomley wrote:
> > [...]
> > > > don't really have much interest in the migration use case, but I
> > > > knew it should work like the passthrough case, so that's what I
> > > > tested.
> > >
> > > I think your device needs to block migrations since it doesn't handle
> > > all migration scenarios correctly.
> >
> > Passthrough doesn't block migrations either, presumably because it can
> > also be made to work if you know what you're doing. I might not be
>
> Don't compare it to passthrough, compare it to swtpm. It should
> have at least the same features as swtpm or be better, otherwise
> I don't see why we need to have the backend device in the upstream
> repo.

James has explained multiple times that mssim is a beneficial thing to support, given that it is the reference implementation of TPM2. Requiring the same or greater features than swtpm is an unreasonable thing to demand.

With regards,
Daniel
On 12/16/22 05:27, Daniel P. Berrangé wrote:
> On Thu, Dec 15, 2022 at 03:53:43PM -0500, Stefan Berger wrote:
[...]
>> Don't compare it to passthrough, compare it to swtpm. It should
>> have at least the same features as swtpm or be better, otherwise
>> I don't see why we need to have the backend device in the upstream
>> repo.
>
> James has explained multiple times that mssim is a beneficial
> thing to support, given that it is the reference implementation
> of TPM2. Requiring the same or greater features than swtpm is
> an unreasonable thing to demand.

Nevertheless it needs documentation and has to handle migration scenarios either via a blocker or it has to handle them all correctly. Since it's supposed to be a TPM running remote you had asked for TLS support iirc.

Stefan
On Fri, Dec 16, 2022 at 07:28:59AM -0500, Stefan Berger wrote:
> On 12/16/22 05:27, Daniel P. Berrangé wrote:
[...]
> > James has explained multiple times that mssim is a beneficial
> > thing to support, given that it is the reference implementation
> > of TPM2. Requiring the same or greater features than swtpm is
> > an unreasonable thing to demand.
>
> Nevertheless it needs documentation and has to handle migration
> scenarios either via a blocker or it has to handle them all
> correctly. Since it's supposed to be a TPM running remote you
> had asked for TLS support iirc.

If the mssim implementation doesn't provide TLS itself, then I don't consider that a blocker on the QEMU side, merely a nice-to-have.

With swtpm the control channel is being used to load and store state during the migration dance. This makes the use of an external process largely transparent to the user, since QEMU handles all the state save/load as part of its migration data stream.

With mssim there is no state save/load co-ordination with QEMU. Instead whoever/whatever is managing the mssim instance is responsible for ensuring it is running with the correct state at the time QEMU does a vmstate load. If doing a live migration this co-ordination is trivial if you just use the same mssim instance for both src/dst to connect to.

If doing save/store to disk, the user needs to be able to save the mssim state and load it again later. If doing snapshots and reverting to old snapshots, then again whoever manages mssim needs to be keeping saved TPM state corresponding to each QEMU snapshot saved, and picking the right one when restoring to old snapshots.

QEMU exposes enough functionality to enable a mgmt app / admin user to achieve all of this.

This is not as seamlessly integrated as swtpm is, but it is still technically possible to do the right thing with migration from QEMU's POV. Whether or not the app/person managing the mssim instance actually does the right thing in practice is not a concern of QEMU. I don't see a need for a migration blocker here.

With regards,
Daniel
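Editorial illustration of the bookkeeping described above: a minimal Python sketch of a management-side helper that keeps one saved copy of the simulator's state file per QEMU snapshot name and puts the right one back before a revert. The class, its method names, and the idea that the simulator's state lives in a single file that can be copied while the simulator is stopped are all assumptions for illustration; whether that file captures everything a given scenario needs is a separate question that this sketch does not settle.

```python
import shutil
from pathlib import Path


class TpmStateStore:
    """Keeps one copy of the simulator's state file per QEMU snapshot name."""

    def __init__(self, live_state: Path, archive_dir: Path) -> None:
        self.live_state = live_state
        self.archive_dir = archive_dir
        self.archive_dir.mkdir(parents=True, exist_ok=True)

    def _slot(self, snapshot: str) -> Path:
        # Reject names that could escape the archive directory.
        if not snapshot or "/" in snapshot or snapshot in (".", ".."):
            raise ValueError(f"unsafe snapshot name: {snapshot!r}")
        return self.archive_dir / f"{snapshot}.state"

    def save(self, snapshot: str) -> Path:
        """Copy the live state file aside; call with the simulator stopped."""
        return Path(shutil.copy2(self.live_state, self._slot(snapshot)))

    def restore(self, snapshot: str) -> None:
        """Put a saved copy back before the simulator is restarted."""
        slot = self._slot(snapshot)
        if not slot.exists():
            raise KeyError(f"no TPM state saved for snapshot {snapshot!r}")
        shutil.copy2(slot, self.live_state)
```

A management tool would call `save("name")` alongside taking the QEMU snapshot and `restore("name")` before reverting to it, keeping the snapshot name as the only link between the two sides.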
On 12/16/22 07:54, Daniel P. Berrangé wrote:
> On Fri, Dec 16, 2022 at 07:28:59AM -0500, Stefan Berger wrote:
[...]
> With mssim there is no state save/load co-ordination with QEMU. Instead
> whoever/whatever is managing the mssim instance is responsible for
> ensuring it is running with the correct state at the time QEMU does
> a vmstate load. If doing a live migration this co-ordination is trivial
> if you just use the same mssim instance for both src/dst to connect to.
>
> If doing save/store to disk, the user needs to be able to save the mssim
> state and load it again later. If doing snapshots and reverting to old

There is no way for storing and loading the *volatile state* of the mssim device.

> snapshots, then again whoever manages mssim needs to be keeping saved
> TPM state corresponding to each QEMU snapshot saved, and picking the
> right one when restoring to old snapshots.

This doesn't work.

Either way, if it's possible it can be documented and shown how this works.

> QEMU exposes enough functionality to enable a mgmt app / admin user to
> achieve all of this.

How do you store the volatile state of this device, like the current state of the PCRs, loaded sessions etc? It doesn't support this.

> This is not as seamlessly integrated as swtpm is, but it is still
> technically possible to do the right thing with migration from QEMU's
> POV. Whether or not the app/person managing the mssim instance actually
> does the right thing in practice is not a concern of QEMU. I don't
> see a need for a migration blocker here.

I do see it because the *volatile state* cannot be extracted from this device. The state of the PCRs is going to be lost.

Regards,
Stefan
On Fri, 2022-12-16 at 08:32 -0500, Stefan Berger wrote:
> On 12/16/22 07:54, Daniel P. Berrangé wrote:
> > On Fri, Dec 16, 2022 at 07:28:59AM -0500, Stefan Berger wrote:
[...]
> > If doing save/store to disk, the user needs to be able to save the
> > mssim state and load it again later. If doing snapshots and
> > reverting to old
>
> There is no way for storing and loading the *volatile state* of the
> mssim device.

Well, yes there is, it saves internal TPM state to an NVChip file:

https://github.com/microsoft/ms-tpm-20-ref/blob/main/TPMCmd/Platform/src/NVMem.c

However, if I were running this as a service, I'd condition saving and restoring state on a connection protocol, which would mean QEMU wouldn't have to worry about it. The simplest approach, of course, is just to keep the service running even when the VM is suspended so the state is kept internally.

> > snapshots, then again whoever manages mssim needs to be keeping
> > saved TPM state corresponding to each QEMU snapshot saved, and
> > picking the right one when restoring to old snapshots.
>
> This doesn't work.

I already told you I tested this and it does work. I'll actually add the migration state check to the power on/off path because I need that for testing S3 anyway.

> Either way, if it's possible it can be documented and shown how this
> works.

I could do a blog post, but I really don't think you want this in official documentation because that creates support expectations.

> > QEMU exposes enough functionality to enable a mgmt app / admin user
> > to achieve all of this.
>
> How do you store the volatile state of this device, like the current
> state of the PCRs, loaded sessions etc? It doesn't support this.

That's not the only way of doing migration. This precise problem exists for VFIO and PCI pass through devices as well: external state is stored in the card and that state must be matched in some way for the card to work on resume. Pretty much any external device coupled to the VM has this problem.

As I keep saying you're thinking about this in the wrong way: it's not a system directly slaved to QEMU; it's an independent daemon which must be managed separately. The design is for it to function like a passthrough.

> > This is not as seamlessly integrated as swtpm is, but it is still
> > technically possible to do the right thing with migration from
> > QEMU's POV. Whether or not the app/person managing the mssim instance
> > actually does the right thing in practice is not a concern of QEMU.
> > I don't see a need for a migration blocker here.
>
> I do see it because the *volatile state* cannot be extracted from
> this device. The state of the PCRs is going to be lost.

Installing a migration blocker would prevent me from exercising the S3 paths, which I want to test.

James
On 12/16/22 08:53, James Bottomley wrote: > On Fri, 2022-12-16 at 08:32 -0500, Stefan Berger wrote: >> On 12/16/22 07:54, Daniel P. Berrangé wrote: >>> On Fri, Dec 16, 2022 at 07:28:59AM -0500, Stefan Berger wrote: > [...] >>>> Nevertheless it needs documentation and has to handle migration >>>> scenarios either via a blocker or it has to handle them all >>>> correctly. Since it's supposed to be a TPM running remote you >>>> had asked for TLS support iirc. >>> >>> If the mssim implmentation doesn't provide TLS itself, then I don't >>> consider that a blocker on the QEMU side, merely a nice-to-have. >>> >>> With swtpm the control channel is being used to load and store >>> state during the migration dance. This makes the use of an external >>> process largely transparent to the user, since QEMU handles all the >>> state save/load as part of its migration data stream. >>> >>> With mssim there is state save/load co-ordination with QEMU. >>> Instead whomever/whatever is managing the mssim instance, is >>> responsible for ensuring it is running with the correct state at >>> the time QEMU does a vmstate load. If doing a live migration this >>> co-ordination is trivial if you just use the same mssim instance >>> for both src/dst to connect to. >>> >>> If doing save/store to disk, the user needs to be able to save the >>> mssim state and load it again later. If doing snapshots and >>> reverting to old >> >> There is no way for storing and loading the *volatile state* of the >> mssim device. > > Well, yes there is, it saves internal TPM state to an NVChip file: > > https://github.com/microsoft/ms-tpm-20-ref/blob/main/TPMCmd/Platform/src/NVMem.c > > However, if I were running this as a service, I'd condition saving and > restoring state on a connection protocol, which would mean QEMU > wouldn't have to worry about it. The simplest approach, of course, is > just to keep the service running even when the VM is suspended so the > state is kept internally. 
> >>> snapshots, then again whomever manages mssim needs to be keeping >>> saved TPM state corresponding to each QEMU snapshot saved, and >>> picking the right one when restoring to old snapshots. >> >> This doesn't work. > > I already told you I tested this and it does work. I'll actually add > the migration state check to the power on/off path because I need that > for testing S3 anyway. Please document how this needs to be done. > >> Either way, if it's possible it can be documented and shown how this >> works. > > I could do a blog post, but I really don't think you want this in > official documentation because that creates support expectations. We have documentation for passthrough and tpm_emulator. If you don't want to add documentation for it to QEMU then please add the driver in as 'unsupported'. diff --git a/MAINTAINERS b/MAINTAINERS index 1729c0901c..32fa2eb282 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3017,6 +3017,7 @@ F: include/hw/acpi/tpm.h F: include/sysemu/tpm* F: qapi/tpm.json F: backends/tpm/ +X: backends/tpm/tpm_mssim.* F: tests/qtest/*tpm* T: git https://github.com/stefanberger/qemu-tpm.git tpm-next Stefan
On Fri, Dec 16, 2022 at 08:32:44AM -0500, Stefan Berger wrote: > > > On 12/16/22 07:54, Daniel P. Berrangé wrote: > > On Fri, Dec 16, 2022 at 07:28:59AM -0500, Stefan Berger wrote: > > > > > > > > > On 12/16/22 05:27, Daniel P. Berrangé wrote: > > > > On Thu, Dec 15, 2022 at 03:53:43PM -0500, Stefan Berger wrote: > > > > > > > > > > > > > > > On 12/15/22 15:30, James Bottomley wrote: > > > > > > On Thu, 2022-12-15 at 15:22 -0500, Stefan Berger wrote: > > > > > > > On 12/15/22 15:07, James Bottomley wrote: > > > > > > [...] > > > > > > > > don't really have much interest in the migration use case, but I > > > > > > > > knew it should work like the passthrough case, so that's what I > > > > > > > > tested. > > > > > > > > > > > > > > I think your device needs to block migrations since it doesn't handle > > > > > > > all migration scenarios correctly. > > > > > > > > > > > > Passthrough doesn't block migrations either, presumably because it can > > > > > > also be made to work if you know what you're doing. I might not be > > > > > > > > > > Don't compare it to passthrough, compare it to swtpm. It should > > > > > have at least the same features as swtpm or be better, otherwise > > > > > I don't see why we need to have the backend device in the upstream > > > > > repo. > > > > > > > > James has explained multiple times that mssim is a beneficial > > > > thing to support, given that it is the reference implementation > > > > of TPM2. Requiring the same or greater features than swtpm is > > > > an unreasonable thing to demand. > > > > > > Nevertheless it needs documentation and has to handle migration > > > scenarios either via a blocker or it has to handle them all > > > correctly. Since it's supposed to be a TPM running remote you > > > had asked for TLS support iirc. > > > > If the mssim implmentation doesn't provide TLS itself, then I don't > > consider that a blocker on the QEMU side, merely a nice-to-have. 
> > > > With swtpm the control channel is being used to load and store state > > during the migration dance. This makes the use of an external process > > largely transparent to the user, since QEMU handles all the state > > save/load as part of its migration data stream. > > > > With mssim there is no state save/load co-ordination with QEMU. Instead > > whomever/whatever is managing the mssim instance is responsible for > > ensuring it is running with the correct state at the time QEMU does > > a vmstate load. If doing a live migration this co-ordination is trivial > > if you just use the same mssim instance for both src/dst to connect to. > > > > If doing save/store to disk, the user needs to be able to save the mssim > > state and load it again later. If doing snapshots and reverting to old > > There is no way for storing and loading the *volatile state* of the > mssim device. > > > snapshots, then again whomever manages mssim needs to be keeping saved > > TPM state corresponding to each QEMU snapshot saved, and picking the > > right one when restoring to old snapshots. > > This doesn't work. > Either way, if it's possible it can be documented and shown how this works. > > > > > QEMU exposes enough functionality to enable a mgmt app / admin user to achieve all of this. > > How do you store the volatile state of this device, like the current > state of the PCRs, loaded sessions etc? It doesn't support this. > > > > > This is not as seamlessly integrated as swtpm is, but it is still > > technically possible to do the right thing with migration from QEMU's > > POV. Whether or not the app/person managing the mssim instance actually > > does the right thing in practice is not a concern of QEMU. I don't > > see a need for a migration blocker here. > > I do see it because the *volatile state* cannot be extracted from > this device. The state of the PCRs is going to be lost.
All the objections you're raising are related to the current specifics of the implementation of the mssim remote server. While valid, this is of no concern to QEMU when deciding whether to require a migration blocker on the client side. This is a 3rd party remote service that should be considered a black box from QEMU's POV. It is possible to write a remote server that supports the mssim network protocol, and has the ability to serialize its state. Whether such an impl exists today or not is separate. With regards, Daniel
On 12/16/22 09:29, Daniel P. Berrangé wrote: > > All the objections you're raising are related to the current > specifics of the implementation of the mssim remote server. > While valid, this is of no concern to QEMU when deciding whether > to require a migration blocker on the client side. This is 3rd > party remote service that should be considered a black box from > QEMU's POV. It is possible to write a remote server that supports > the mssim network protocol, and has the ability to serialize > its state. Whether such an impl exists today or not is separate. Then let's document the scenarios so someone can repeat them, I think this is just fair. James said he tested state migration scenarios and it works, so let's enable others to do it as well. I am open to someone maintaining just this driver and the dynamics that may develop around it. Regards, Stefan > With regards, > Daniel
On Fri, 2022-12-16 at 09:55 -0500, Stefan Berger wrote: > > > On 12/16/22 09:29, Daniel P. Berrangé wrote: > > > > > All the objections you're raising are related to the current > > specifics of the implementation of the mssim remote server. > > While valid, this is of no concern to QEMU when deciding whether > > to require a migration blocker on the client side. This is 3rd > > party remote service that should be considered a black box from > > QEMU's POV. It is possible to write a remote server that supports > > the mssim network protocol, and has the ability to serialize > > its state. Whether such an impl exists today or not is separate. > > Then let's document the scenarios so someone can repeat them, I think > this is just fair. James said he tested state migration scenarios and > it works, so let's enable others to do it as well. I am open to > someone maintaining just this driver and the dynamics that may > develop around it. Well, OK, this is what I think would be appropriate ... I'll fold it in to the second patch. James --- diff --git a/docs/specs/tpm.rst b/docs/specs/tpm.rst index 535912a92b..985d0775a0 100644 --- a/docs/specs/tpm.rst +++ b/docs/specs/tpm.rst @@ -270,6 +270,38 @@ available as a module (assuming a TPM 2 is passed through): /sys/devices/LNXSYSTEM:00/LNXSYBUS:00/MSFT0101:00/tpm/tpm0/pcr-sha256/9 ... +The QEMU TPM Microsoft Simulator Device +--------------------------------------- + +The TCG provides a reference implementation for TPM 2.0 written by +Microsoft (See `ms-tpm-20-ref`_ on github). The reference implementation +starts a network server and listens for TPM commands on port 2321 and +TPM Platform control commands on port 2322, although these can be +altered. The QEMU mssim TPM backend talks to this implementation. By +default it connects to the default ports on localhost: + +.. 
code-block:: console + + qemu-system-x86_64 <qemu-options> \ + -tpmdev mssim,id=tpm0 \ + -device tpm-crb,tpmdev=tpm0 + + +It can also communicate with a remote host, which must be +specified as a SocketAddress via JSON on the command line for each of +the command and control ports: + +.. code-block:: console + + qemu-system-x86_64 <qemu-options> \ + -tpmdev "{'type':'mssim','id':'tpm0','command':{'type':'inet','host':'remote','port':'2321'},'control':{'type':'inet','host':'remote','port':'2322'}}" \ + -device tpm-crb,tpmdev=tpm0 + + +The mssim backend supports snapshotting and migration, but the state +of the Microsoft Simulator server must be preserved (or the server +kept running) outside of QEMU for restore to be successful. + The QEMU TPM emulator device ---------------------------- @@ -526,3 +558,6 @@ the following: .. _SWTPM protocol: https://github.com/stefanberger/swtpm/blob/master/man/man3/swtpm_ioctls.pod + +.. _ms-tpm-20-ref: + https://github.com/microsoft/ms-tpm-20-ref
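As an illustration of the remote form described in the doc patch above, here is a minimal Python sketch that builds the JSON value for `-tpmdev` from a host and the two ports. The helper name `mssim_tpmdev_arg` and its parameters are hypothetical and not part of QEMU; the keys (`type`, `id`, `command`, `control`) and the default ports 2321/2322 are taken from the patch text. It emits standard double-quoted JSON rather than the single-quoted form shown in the patch, on the assumption that QEMU accepts either.

```python
import json


def mssim_tpmdev_arg(host, command_port=2321, control_port=2322, dev_id="tpm0"):
    """Build the JSON string to pass as the -tpmdev value for a remote mssim server.

    Hypothetical helper for illustration only; not a QEMU API.
    """
    def inet(port):
        # SocketAddress of type 'inet'; the patch passes the port as a string.
        return {"type": "inet", "host": host, "port": str(port)}

    return json.dumps({
        "type": "mssim",
        "id": dev_id,
        "command": inet(command_port),   # TPM command socket (default 2321)
        "control": inet(control_port),   # platform control socket (default 2322)
    })


if __name__ == "__main__":
    # Wrap the result in single quotes when pasting into a shell command line.
    print("-tpmdev '" + mssim_tpmdev_arg("remote") + "'")
```

Generating the argument this way keeps the quoting of every `type` field consistent, which is easy to get wrong when the JSON is typed by hand.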
On 12/16/22 10:48, James Bottomley wrote: > On Fri, 2022-12-16 at 09:55 -0500, Stefan Berger wrote: >> >> >> On 12/16/22 09:29, Daniel P. Berrangé wrote: >> >>> >>> All the objections you're raising are related to the current >>> specifics of the implementation of the mssim remote server. >>> While valid, this is of no concern to QEMU when deciding whether >>> to require a migration blocker on the client side. This is 3rd >>> party remote service that should be considered a black box from >>> QEMU's POV. It is possible to write a remote server that supports >>> the mssim network protocol, and has the ability to serialize >>> its state. Whether such an impl exists today or not is separate. >> >> Then let's document the scenarios so someone can repeat them, I think >> this is just fair. James said he tested state migration scenarios and >> it works, so let's enable others to do it as well. I am open to >> someone maintaining just this driver and the dynamics that may >> develop around it. > > Well, OK, this is what I think would be appropriate ... I'll fold it in > to the second patch. > > James > > --- > > diff --git a/docs/specs/tpm.rst b/docs/specs/tpm.rst > index 535912a92b..985d0775a0 100644 > --- a/docs/specs/tpm.rst > +++ b/docs/specs/tpm.rst > @@ -270,6 +270,38 @@ available as a module (assuming a TPM 2 is passed through): > /sys/devices/LNXSYSTEM:00/LNXSYBUS:00/MSFT0101:00/tpm/tpm0/pcr-sha256/9 > ... > > +The QEMU TPM Microsoft Simulator Device > +--------------------------------------- > + > +The TCG provides a reference implementation for TPM 2.0 written by > +Microsoft (See `ms-tpm-20-ref`_ on github). The reference implementation > +starts a network server and listens for TPM commands on port 2321 and > +TPM Platform control commands on port 2322, although these can be > +altered. The QEMU mssim TPM backend talks to this implementation. By > +default it connects to the default ports on localhost: > + > +.. 
code-block:: console > + > + qemu-system-x86_64 <qemu-options> \ > + -tpmdev mssim,id=tpm0 \ > + -device tpm-crb,tpmdev=tpm0 > + > + > +Although it can also communicate with a remote host, which must be > +specified as a SocketAddress via json on the command line for each of > +the command and control ports: > + > +.. code-block:: console > + > + qemu-system-x86_64 <qemu-options> \ > + -tpmdev "{'type':'mssim','id':'tpm0','command':{'type':inet,'host':'remote','port':'2321'},'control':{'type':'inet','host':'remote','port':'2322'}}" \ > + -device tpm-crb,tpmdev=tpm0 > + > + > +The mssim backend supports snapshotting and migration, but the state > +of the Microsoft Simulator server must be preserved (or the server > +kept running) outside of QEMU for restore to be successful. You said you tested it. Can you show how to set it up with command lines? I want to try out at least suspend and resume . Stefan > + > The QEMU TPM emulator device > ---------------------------- > > @@ -526,3 +558,6 @@ the following: > > .. _SWTPM protocol: > https://github.com/stefanberger/swtpm/blob/master/man/man3/swtpm_ioctls.pod > + > +.. _ms-tpm-20-ref: > + https://github.com/microsoft/ms-tpm-20-ref >
On Fri, 2022-12-16 at 11:08 -0500, Stefan Berger wrote: > On 12/16/22 10:48, James Bottomley wrote: [...] > > +The mssim backend supports snapshotting and migration, but the > > state > > +of the Microsoft Simulator server must be preserved (or the server > > +kept running) outside of QEMU for restore to be successful. > > You said you tested it. Can you show how to set it up with command > lines? I want to try out at least suspend and resume.

I already did here:

https://lore.kernel.org/qemu-devel/77bc5a11fcb7b06deba1c54b1ef2de28e0c53fb1.camel@linux.ibm.com/

But to recap, it's

    stop
    migrate "exec:gzip -c > STATEFILE.gz"
    quit

Followed by a restart with

    <qemu-command-line> -incoming "exec: gzip -c -d STATEFILE.gz"

James
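The recap above can be sketched as a pair of small Python helpers that produce the monitor commands for the save step and the extra argument for the restore step. The function names are hypothetical; the command strings are copied from the message. Per the thread, the mssim server has to stay running (or have its state preserved) between the two steps, and nothing in this sketch takes care of that.

```python
def save_commands(statefile):
    """Monitor commands that stop the guest and write its state through gzip.

    Hypothetical helper; the command strings follow the recap in the thread.
    """
    return [
        "stop",
        f'migrate "exec:gzip -c > {statefile}"',
        "quit",
    ]


def restore_args(qemu_cmdline, statefile):
    """Original QEMU argument list plus the -incoming option that reloads the state."""
    return list(qemu_cmdline) + ["-incoming", f"exec: gzip -c -d {statefile}"]


if __name__ == "__main__":
    for cmd in save_commands("STATEFILE.gz"):
        print(cmd)
    base = ["qemu-system-x86_64", "-tpmdev", "mssim,id=tpm0",
            "-device", "tpm-crb,tpmdev=tpm0"]
    print(" ".join(restore_args(base, "STATEFILE.gz")))
```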
On 12/16/22 11:13, James Bottomley wrote: > On Fri, 2022-12-16 at 11:08 -0500, Stefan Berger wrote: >> On 12/16/22 10:48, James Bottomley wrote: > [...] >>> +The mssim backend supports snapshotting and migration, but the >>> state >>> +of the Microsoft Simulator server must be preserved (or the server >>> +kept running) outside of QEMU for restore to be successful. >> >> You said you tested it. Can you show how to set it up with command >> lines? I want to try out at least suspend and resume . > > I already did here: > > https://lore.kernel.org/qemu-devel/77bc5a11fcb7b06deba1c54b1ef2de28e0c53fb1.camel@linux.ibm.com/ > > But to recap, it's > > stop > migrate "exec:gzip -c > STATEFILE.gz" > quit > > Followed by a restart with > > <qemu-command-line> -incoming "exec: gzip -c -d STATEFILE.gz" Good, you can put it into the documentation. Can I do a reboot of the host in between or does the TPM have to keep on running? Stefan > > James >
On 12/16/22 08:53, James Bottomley wrote: > > I could do a blog post, but I really don't think you want this in > official documentation because that creates support expectations. We get support expectations if we don't mention it as not being supported. So, since this driver is not supported the documentation for QEMU should state something along the lines of 'this driver is for experimental or testing purposes and is otherwise unsupported.' That's fair to the user and maintainer. Nevertheless, if the documentation (or as a matter of fact the code) was to claim that VM / TPM state migration scenarios, such as VM snapshotting, are working then users should be able to ask someone 'how' this can be done with the mssim protocol **today**. Since I cannot answer that question you may need to find a way for how to address this concern. Regards, Stefan
On Mon, 2022-12-19 at 06:49 -0500, Stefan Berger wrote: > > > On 12/16/22 08:53, James Bottomley wrote: > > > > > I could do a blog post, but I really don't think you want this in > > official documentation because that creates support expectations. > > We get support expectations if we don't mention it as not being > supported. So, since this driver is not supported the documentation > for QEMU should state something along the lines of 'this driver is > for experimental or testing purposes and is otherwise unsupported.' > That's fair to the user and maintainer. Open source projects don't provide support. I already added a Maintainer entry for it, so I'll maintain it. > Nevertheless, if the documentation (or as a matter of fact the code) > was to claim that VM / TPM state migration scenarios, such as VM > snapshotting, are working then users should be able to ask someone > 'how' this can be done with the mssim protocol **today**. Since I > cannot answer that question you may need to find a way for how to > address this concern. I already proposed all of this ... you were the one wanting to document migration. The current wording is: The mssim backend supports snapshotting and migration, but the state of the Microsoft Simulator server must be preserved (or the server kept running) outside of QEMU for restore to be successful. James
On 12/19/22 08:02, James Bottomley wrote: > On Mon, 2022-12-19 at 06:49 -0500, Stefan Berger wrote: >> >> >> On 12/16/22 08:53, James Bottomley wrote: >> >>> >>> I could do a blog post, but I really don't think you want this in >>> official documentation because that creates support expectations. >> >> We get support expectations if we don't mention it as not being >> supported. So, since this driver is not supported the documentation >> for QEMU should state something along the lines of 'this driver is >> for experimental or testing purposes and is otherwise unsupported.' >> That's fair to the user and maintainer. > > Open source projects don't provide support. I already added a > Maintainer entry for it, so I'll maintain it. Support for me means reacting to user questions and addressing issues. Good that you maintain this now. > >> Nevertheless, if the documentation (or as a matter of fact the code) >> was to claim that VM / TPM state migration scenarios, such as VM >> snapshotting, are working then users should be able to ask someone >> 'how' this can be done with the mssim protocol **today**. Since I >> cannot answer that question you may need to find a way for how to >> address this concern. > > I already proposed all of this ... you were the one wanting to document > migration. The current wording is: By documenting, I meant I wanted to see how users need to provide command lines for the mssim TPM. > > The mssim backend supports snapshotting and migration, but the state > of the Microsoft Simulator server must be preserved (or the server > kept running) outside of QEMU for restore to be successful. VM snapshotting is basically VM suspend / resume on steroids, requiring permanent and volatile state to be saved and restorable from possibly very different points in time, with possibly different seeds, NVRAM locations etc.
How the mssim protocol does this is non-obvious to me and how one coordinates the restoring and saving of the TPM's state without direct coordination by QEMU is also non-obvious. Stefan > > James > >
* Daniel P. Berrangé (berrange@redhat.com) wrote: > On Fri, Dec 16, 2022 at 08:32:44AM -0500, Stefan Berger wrote: > > > > > > On 12/16/22 07:54, Daniel P. Berrangé wrote: > > > On Fri, Dec 16, 2022 at 07:28:59AM -0500, Stefan Berger wrote: > > > > > > > > > > > > On 12/16/22 05:27, Daniel P. Berrangé wrote: > > > > > On Thu, Dec 15, 2022 at 03:53:43PM -0500, Stefan Berger wrote: > > > > > > > > > > > > > > > > > > On 12/15/22 15:30, James Bottomley wrote: > > > > > > > On Thu, 2022-12-15 at 15:22 -0500, Stefan Berger wrote: > > > > > > > > On 12/15/22 15:07, James Bottomley wrote: > > > > > > > [...] > > > > > > > > > don't really have much interest in the migration use case, but I > > > > > > > > > knew it should work like the passthrough case, so that's what I > > > > > > > > > tested. > > > > > > > > > > > > > > > > I think your device needs to block migrations since it doesn't handle > > > > > > > > all migration scenarios correctly. > > > > > > > > > > > > > > Passthrough doesn't block migrations either, presumably because it can > > > > > > > also be made to work if you know what you're doing. I might not be > > > > > > > > > > > > Don't compare it to passthrough, compare it to swtpm. It should > > > > > > have at least the same features as swtpm or be better, otherwise > > > > > > I don't see why we need to have the backend device in the upstream > > > > > > repo. > > > > > > > > > > James has explained multiple times that mssim is a beneficial > > > > > thing to support, given that it is the reference implementation > > > > > of TPM2. Requiring the same or greater features than swtpm is > > > > > an unreasonable thing to demand. > > > > > > > > Nevertheless it needs documentation and has to handle migration > > > > scenarios either via a blocker or it has to handle them all > > > > correctly. Since it's supposed to be a TPM running remote you > > > > had asked for TLS support iirc. 
> > > > > > If the mssim implmentation doesn't provide TLS itself, then I don't > > > consider that a blocker on the QEMU side, merely a nice-to-have. > > > > > > With swtpm the control channel is being used to load and store state > > > during the migration dance. This makes the use of an external process > > > largely transparent to the user, since QEMU handles all the state > > > save/load as part of its migration data stream. > > > > > > With mssim there is state save/load co-ordination with QEMU. Instead > > > whomever/whatever is managing the mssim instance, is responsible for > > > ensuring it is running with the correct state at the time QEMU does > > > a vmstate load. If doing a live migration this co-ordination is trivial > > > if you just use the same mssim instance for both src/dst to connect to. > > > > > > If doing save/store to disk, the user needs to be able to save the mssim > > > state and load it again later. If doing snapshots and reverting to old > > > > There is no way for storing and loading the *volatile state* of the > > mssim device. > > > > > snapshots, then again whomever manages mssim needs to be keeping saved > > > TPM state corresponding to each QEMU snapshot saved, and picking the > > > right one when restoring to old snapshots. > > > > This doesn't work. > > Either way, if it's possible it can be documented and shown how this works. > > > > > > > > QEMU exposes enough functionality to enable a mgmt app / admin us> achieve all of this. > > > > How do you store the volatile state of this device, like the current > > state of the PCRs, loaded sessions etc? It doesn't support this. > > > > > > > > This is not as seemlessly integrated with swtpm is, but it is still > > > technically posssible todo the right thing with migration from QEMU's > > > POV. Whether or not the app/person managing mssim instance actually > > > does the right thing in practice is not a concern of QEMU. I don't > > > see a need for a migration blocker here. 
> > > > I do see it because the *volatile state* cannot be extracted from > > this device. The state of the PCRs is going to be lost. > > All the objections you're raising are related to the current > specifics of the implementation of the mssim remote server. > While valid, this is of no concern to QEMU when deciding whether > to require a migration blocker on the client side. This is 3rd > party remote service that should be considered a black box from > QEMU's POV. It is possible to write a remote server that supports > the mssim network protocol, and has the ability to serialize > its state. Whether such an impl exists today or not is separate. We would normally want an example of a working implementation though wouldn't we? So I think it's fair to at least want some documentation; if it can be documented and works, fine; if it doesn't work, then it needs a blocker. Dave > With regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > >
On Mon, 2023-01-09 at 16:59 +0000, Dr. David Alan Gilbert wrote: > * Daniel P. Berrangé (berrange@redhat.com) wrote: > > On Fri, Dec 16, 2022 at 08:32:44AM -0500, Stefan Berger wrote: [...] > > > I do see it because the *volatile state* cannot be extracted from > > > this device. The state of the PCRs is going to be lost. > > > > All the objections you're raising are related to the current > > specifics of the implementation of the mssim remote server. > > While valid, this is of no concern to QEMU when deciding whether > > to require a migration blocker on the client side. This is 3rd > > party remote service that should be considered a black box from > > QEMU's POV. It is possible to write a remote server that supports > > the mssim network protocol, and has the ability to serialize > > its state. Whether such an impl exists today or not is separate. > > We would normally want an example of a working implementation though > wouldn't we? > > So I think it's fair to at least want some documentation; if it can > be documented and works, fine; if it doesn't work, then it needs a > blocker. It works under limited circumstances ... in fact similar circumstances passthrough migration works under, which is also not documented. The external MSSIM TPM emulator has to be kept running to preserve the state. If you restart it, the migration will fail. James
* James Bottomley (jejb@linux.ibm.com) wrote: > On Mon, 2023-01-09 at 16:59 +0000, Dr. David Alan Gilbert wrote: > > * Daniel P. Berrangé (berrange@redhat.com) wrote: > > > On Fri, Dec 16, 2022 at 08:32:44AM -0500, Stefan Berger wrote: > [...] > > > > I do see it because the *volatile state* cannot be extracted from > > > > this device. The state of the PCRs is going to be lost. > > > > > > All the objections you're raising are related to the current > > > specifics of the implementation of the mssim remote server. > > > While valid, this is of no concern to QEMU when deciding whether > > > to require a migration blocker on the client side. This is 3rd > > > party remote service that should be considered a black box from > > > QEMU's POV. It is possible to write a remote server that supports > > > the mssim network protocol, and has the ability to serialize > > > its state. Whether such an impl exists today or not is separate. > > > > We would normally want an example of a working implementation though > > wouldn't we? > > > > So I think it's fair to at least want some documentation; if it can > > be documented and works, fine; if it doesn't work, then it needs a > > blocker. > > It works under limited circumstances ... in fact similar circumstances > passthrough migration works under, Well, not that similar - people expect passthrough migration to fail because, being nailed to a physical server's hardware, it's not likely to migrate; whereas you're creating a new virtual thing which people might imagine is similar to the existing swtpm. Their imagination might be wrong and thus you need to say why. > which is also not documented. The Inductive proof that we should have no good documentation doesn't get us anywhere. > external MSSIM TPM emulator has to be kept running to preserve the > state. If you restart it, the migration will fail. Document that and we're getting there. Dave > James >
On Mon, 2023-01-09 at 17:52 +0000, Dr. David Alan Gilbert wrote: > * James Bottomley (jejb@linux.ibm.com) wrote: [...] > > external MSSIM TPM emulator has to be kept running to preserve the > > state. If you restart it, the migration will fail. > > Document that and we're getting there. The documentation in the current patch series says ---- The mssim backend supports snapshotting and migration, but the state of the Microsoft Simulator server must be preserved (or the server kept running) outside of QEMU for restore to be successful. ---- What, beyond this would you want to see? James
On 1/9/23 12:55, James Bottomley wrote: > On Mon, 2023-01-09 at 17:52 +0000, Dr. David Alan Gilbert wrote: >> * James Bottomley (jejb@linux.ibm.com) wrote: > [...] >>> external MSSIM TPM emulator has to be kept running to preserve the >>> state. If you restart it, the migration will fail. >> >> Document that and we're getting there. > > > The documentation in the current patch series says > > ---- > The mssim backend supports snapshotting and migration, but the state > of the Microsoft Simulator server must be preserved (or the server > kept running) outside of QEMU for restore to be successful. > ---- > > What, beyond this would you want to see? mssim today lacks the functionality of marshalling and unmarshalling the permanent and volatile state of the TPM 2, which are both needed for snapshot support. How does this work with mssim? Stefan > > James >
On Mon, 2023-01-09 at 13:34 -0500, Stefan Berger wrote: > > > On 1/9/23 12:55, James Bottomley wrote: > > On Mon, 2023-01-09 at 17:52 +0000, Dr. David Alan Gilbert wrote: > > > * James Bottomley (jejb@linux.ibm.com) wrote: > > [...] > > > > external MSSIM TPM emulator has to be kept running to preserve > > > > the state. If you restart it, the migration will fail. > > > > > > Document that and we're getting there. > > > > > > The documentation in the current patch series says > > > > ---- > > The mssim backend supports snapshotting and migration, but the > > state of the Microsoft Simulator server must be preserved (or the > > server kept running) outside of QEMU for restore to be successful. > > ---- > > > > What, beyond this would you want to see? > > mssim today lacks the functionality of marshalling and unmarshalling > the permanent and volatile state of the TPM 2, which are both needed > for snapshot support. How does this work with mssim? You preserve the state by keeping the simulator running as the above says. As long as you can preserve the state, there's no maximum time between snapshots. There's no need of marshal/unmarshal if you do this. James
* James Bottomley (jejb@linux.ibm.com) wrote: > On Mon, 2023-01-09 at 13:34 -0500, Stefan Berger wrote: > > > > > > On 1/9/23 12:55, James Bottomley wrote: > > > On Mon, 2023-01-09 at 17:52 +0000, Dr. David Alan Gilbert wrote: > > > > * James Bottomley (jejb@linux.ibm.com) wrote: > > > [...] > > > > > external MSSIM TPM emulator has to be kept running to preserve > > > > > the state. If you restart it, the migration will fail. > > > > > > > > Document that and we're getting there. > > > > > > > > > The documentation in the current patch series says > > > > > > ---- > > > The mssim backend supports snapshotting and migration, but the > > > state of the Microsoft Simulator server must be preserved (or the > > > server kept running) outside of QEMU for restore to be successful. > > > ---- > > > > > > What, beyond this would you want to see? > > > > mssim today lacks the functionality of marshalling and unmarshalling > > the permanent and volatile state of the TPM 2, which are both needed > > for snapshot support. How does this work with mssim? > > You preserve the state by keeping the simulator running as the above > says. As long as you can preserve the state, there's no maximum time > between snapshots. There's no need of marshal/unmarshal if you do > this. So I think I can understand how that works with a suspend/resume; I'm less sure about a live migration. In a live migration, you normally start up the destination VM qemu process and other processes attached to it, prior to the inwards live migration of state. Then you live migrate the state, then kill the source. With this mssim setup, will the start up of the destination attempt to change the vtpm state during the initialisation? Dave > James >
On Mon, 2023-01-09 at 18:54 +0000, Dr. David Alan Gilbert wrote: > * James Bottomley (jejb@linux.ibm.com) wrote: > > On Mon, 2023-01-09 at 13:34 -0500, Stefan Berger wrote: > > > > > > > > > On 1/9/23 12:55, James Bottomley wrote: > > > > On Mon, 2023-01-09 at 17:52 +0000, Dr. David Alan Gilbert > > > > wrote: > > > > > * James Bottomley (jejb@linux.ibm.com) wrote: > > > > [...] > > > > > > external MSSIM TPM emulator has to be kept running to > > > > > > preserve > > > > > > the state. If you restart it, the migration will fail. > > > > > > > > > > Document that and we're getting there. > > > > > > > > > > > > The documentation in the current patch series says > > > > > > > > ---- > > > > The mssim backend supports snapshotting and migration, but the > > > > state of the Microsoft Simulator server must be preserved (or > > > > the > > > > server kept running) outside of QEMU for restore to be > > > > successful. > > > > ---- > > > > > > > > What, beyond this would you want to see? > > > > > > mssim today lacks the functionality of marshalling and > > > unmarshalling > > > the permanent and volatile state of the TPM 2, which are both > > > needed > > > for snapshot support. How does this work with mssim? > > > > You preserve the state by keeping the simulator running as the > > above > > says. As long as you can preserve the state, there's no maximum > > time > > between snapshots. There's no need of marshal/unmarshal if you do > > this. > > So I think I can understand how that works with a suspend/resume; I'm > less sure about a live migration. > > In a live migration, you normally start up the destination VM > qemu process and other processes attached to it, prior to the inwards > live migration of state. Then you live migrate the state, then kill > the source. > > With this mssim setup, will the start up of the destination attempt > to change the vtpm state during the initialisation? 
The backend driver contains state checks to prevent this, so if you follow the standard migration in https://www.qemu.org/docs/master/devel/migration.html it detects that you have done a migration on shutdown and simply closes the TPM socket. On start up it sees you're in migrate and doesn't do the power on reset of the TPM. James
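The state checks described above can be modelled outside QEMU as follows. This is only a sketch under stated assumptions: `enum vm_state`, `struct ctrl_log`, `backend_startup` and `backend_shutdown` are hypothetical names invented for illustration and are not taken from the patch; only the `TPM_SIGNAL_*` values come from tpm_mssim.h. The idea is that the destination of a migration skips the power cycle, and the source skips the power off, so the state held by the external simulator is left untouched.

```c
#include <assert.h>

/* Hypothetical stand-ins for the VM run states; not QEMU's real enum. */
enum vm_state { VM_RUNNING, VM_INMIGRATE, VM_FINISH_MIGRATE };

/* Control signals; values as listed in tpm_mssim.h in the patch. */
#define TPM_SIGNAL_POWER_ON  1
#define TPM_SIGNAL_POWER_OFF 2
#define TPM_SIGNAL_NV_ON     11

/* Records the signals a backend would send, so they can be inspected. */
struct ctrl_log {
    int sent[8];
    int count;
};

static void send_ctrl(struct ctrl_log *log, int sig)
{
    assert(log->count < 8);
    log->sent[log->count++] = sig;
}

/* Startup: power cycle the TPM unless we are the migration destination. */
static void backend_startup(struct ctrl_log *log, enum vm_state st)
{
    if (st == VM_INMIGRATE) {
        return; /* state lives in the external simulator; leave it alone */
    }
    send_ctrl(log, TPM_SIGNAL_POWER_OFF);
    send_ctrl(log, TPM_SIGNAL_POWER_ON);
    send_ctrl(log, TPM_SIGNAL_NV_ON);
}

/* Shutdown: power off unless the source is completing a migration. */
static void backend_shutdown(struct ctrl_log *log, enum vm_state st)
{
    if (st == VM_FINISH_MIGRATE) {
        return; /* only the sockets would be closed here */
    }
    send_ctrl(log, TPM_SIGNAL_POWER_OFF);
}
```

In this model an ordinary start sends the three-signal power cycle, while a start in the in-migrate state sends nothing at all.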
On 1/9/23 13:51, James Bottomley wrote: > On Mon, 2023-01-09 at 13:34 -0500, Stefan Berger wrote: >> >> >> On 1/9/23 12:55, James Bottomley wrote: >>> On Mon, 2023-01-09 at 17:52 +0000, Dr. David Alan Gilbert wrote: >>>> * James Bottomley (jejb@linux.ibm.com) wrote: >>> [...] >>>>> external MSSIM TPM emulator has to be kept running to preserve >>>>> the state. If you restart it, the migration will fail. >>>> >>>> Document that and we're getting there. >>> >>> >>> The documentation in the current patch series says >>> >>> ---- >>> The mssim backend supports snapshotting and migration, but the >>> state of the Microsoft Simulator server must be preserved (or the >>> server kept running) outside of QEMU for restore to be successful. >>> ---- >>> >>> What, beyond this would you want to see? >> >> mssim today lacks the functionality of marshalling and unmarshalling >> the permanent and volatile state of the TPM 2, which are both needed >> for snapshot support. How does this work with mssim? > > You preserve the state by keeping the simulator running as the above > says. As long as you can preserve the state, there's no maximum time > between snapshots. There's no need of marshal/unmarshal if you do > this From https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg03146.html "VM snapshotting is basically VM suspend / resume on steroids requiring permanent and volatile state to be saved and restoreable from possible very different points in time with possibly different seeds, NVRAM locations etc. How the mssim protocol does this is non-obvious to me and how one coordinates the restoring and saving of the TPM's state without direct coordination by QEMU is also non-obvious." Stefan . > > James >
On 1/9/23 14:01, Stefan Berger wrote: > > > On 1/9/23 13:51, James Bottomley wrote: >> On Mon, 2023-01-09 at 13:34 -0500, Stefan Berger wrote: >>> >>> >>> On 1/9/23 12:55, James Bottomley wrote: >>>> On Mon, 2023-01-09 at 17:52 +0000, Dr. David Alan Gilbert >>>> wrote: >>>>> * James Bottomley (jejb@linux.ibm.com) wrote: >>>> [...] >>>>>> external MSSIM TPM emulator has to be kept running to >>>>>> preserve the state. If you restart it, the migration will >>>>>> fail. >>>>> >>>>> Document that and we're getting there. >>>> >>>> >>>> The documentation in the current patch series says >>>> >>>> ---- The mssim backend supports snapshotting and migration, >>>> but the state of the Microsoft Simulator server must be >>>> preserved (or the server kept running) outside of QEMU for >>>> restore to be successful. ---- >>>> >>>> What, beyond this would you want to see? >>> >>> mssim today lacks the functionality of marshalling and >>> unmarshalling the permanent and volatile state of the TPM 2, >>> which are both needed for snapshot support. How does this work >>> with mssim? >> >> You preserve the state by keeping the simulator running as the >> above says. As long as you can preserve the state, there's no >> maximum time between snapshots. There's no need of >> marshal/unmarshal if you do this > > From > https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg03146.html > > "VM snapshotting is basically VM suspend / resume on steroids > requiring permanent and volatile state to be saved and restoreable > from possible very different points in time with possibly different > seeds, NVRAM locations etc. How the mssim protocol does this is > non-obvious to me and how one coordinates the restoring and saving > of the TPM's state without direct coordination by QEMU is also > non-obvious." One thing, though: I am aware of the issues that may arise due to support for TPM state migration. However, whether TPM state migration becomes an issue depends on how you use the TPM 2. 
If the use case is to use the TPM 2 as a local crypto device then state migration is likely not an issue. You may have different keys in the TPM 2 at different points in time and even snapshotting may not be an issue but possibly quite a welcome feature to have along with support of scenarios of VM suspend + host upgrade + host reboot + VM resume. If you use TPM 2 for attestation then certain TPM 2 state migration scenarios may become problematic. One could construct a scenario where attestation precedes some action that requires trust to have been established in the system in the preceding attestation step and support for snapshotting the state of the TPM 2 could become an issue if I was to wait for the attestation to have been concluded and then I quickly restart a different snapshot that is not trustworthy and the client proceeds thinking that the system is trustworthy (maybe a few SYNs from the client went into the void). Eliminating TPM 2 state migration is probably not a good idea, because environments where attestation may occur may also support VM suspend/resume along with upgrading a host and rebooting the host or VM migration for some sort of host evacuation before upgrade. When it comes to snapshotting and using the TPM 2 as a crypto device, just saying that VM snapshot is supported by leaving the TPM 2 running and not touching it doesn't make this function correctly for all scenarios where the TPM 2 may have had different keys loaded. It is an even worse idea for attestation where I could construct a snapshot A and wait until the attestation has passed and then resume with a snapshot A' that runs untrustworthy software but uses the state of the TPM 2 from snapshot A times and remains happy to quote the state of the PCRs from before.
If launching a snapshot also restores the state of the PCRs that goes along with the state of the system at that time then that at least allows for quotes to have valid contents of PCRs that reflect the system state at snapshot A'. Kexec also comes to mind in this context where I could quickly start a new system post attestation. So a physical system could possibly be used for fooling clients as well. A solution for how to resolve this may involve some sort of protocol and a connection that may not be broken *while* the system needs to be in a trusted state. The protocol would have to help detect substantial changes of state such as resume of some snapshot or kexec into a system. Repeated attestation (with correctly restored TPM 2 state) may also help resolve the issue. Cheers! Stefan > > > Stefan . >> >> James >> >
On Mon, 2023-01-09 at 16:06 -0500, Stefan Berger wrote: > On 1/9/23 14:01, Stefan Berger wrote: [...] > If you use TPM 2 for attestation then certain TPM 2 state migration > scenarios may become problematic. One could construct a scenario > where attestation preceeds some action that requires trust to have > been established in the system in the preceeding attestation step and > support for snapshotting the state of the TPM 2 could become an issue > if I was to wait for the attestation to have been concluded and then > I quickly restart a different snapshot that is not trustworthy and > the client proceeds thinking that the system is trustworthy (maybe a > few SYNs from the client went into the void) You're over thinking this. For a non-confidential VM, Migration gives you a saved image you can always replay from (this is seen as a feature for fast starts) and if you use the tpm_simulator the TPM state is stored in the migration image, so you can always roll it back if you have access to the migration file. Saving the image state is also a huge problem because the TPM seeds are in the clear if the migration image isn't encrypted. The other big problem is that an external software TPM is always going to give up its state to the service provider, regardless of migration, so you have to have some trust in the provider and thus you'd also have to trust them with the migration replay policy. For Confidential VMs, this is a bit different because the vTPM runs in a secure ring inside the confidential enclave and the secure migration agent ensures that either migration and startup happen or migration doesn't happen at all, so for them you don't have to worry about rollback. 
Provided you can trust the vTPM provider, having external state not stored in the migration image has the potential actually to solve the rollback problem because you could keep the TPM clock running and potentially increase the reset count, so migrations would show up in TPM quotes and you don't have control of the state of the vTPM to replay it. James
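To illustrate how a running clock and reset count could show up in quotes, here is a hedged sketch of a check a remote verifier might apply to two successive quotes. The fields mirror clock, resetCount and restartCount from the TPM 2.0 TPMS_CLOCK_INFO structure; the struct name `clock_info`, the function `quote_is_continuous` and the policy itself are assumptions made for illustration, and nothing here says the mssim backend actually increments the reset count on migration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of the TPMS_CLOCK_INFO fields that accompany a TPM 2.0 quote. */
struct clock_info {
    uint64_t clock;         /* time the TPM has been powered */
    uint32_t reset_count;   /* number of TPM resets */
    uint32_t restart_count; /* number of restarts/resumes since last reset */
};

/*
 * Hypothetical verifier policy: the newer quote is accepted as a
 * continuation of the older one only if no reset or restart happened
 * in between and the clock did not go backwards.
 */
static bool quote_is_continuous(const struct clock_info *prev,
                                const struct clock_info *now)
{
    return now->reset_count == prev->reset_count &&
           now->restart_count == prev->restart_count &&
           now->clock >= prev->clock;
}
```

Under this policy a replayed image paired with an external TPM whose counters have moved on would fail the check, whereas TPM state restored from the image would carry the old counters with it.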
On 1/10/23 09:14, James Bottomley wrote: > On Mon, 2023-01-09 at 16:06 -0500, Stefan Berger wrote: >> On 1/9/23 14:01, Stefan Berger wrote: > [...] >> If you use TPM 2 for attestation then certain TPM 2 state migration >> scenarios may become problematic. One could construct a scenario >> where attestation preceeds some action that requires trust to have >> been established in the system in the preceeding attestation step and >> support for snapshotting the state of the TPM 2 could become an issue >> if I was to wait for the attestation to have been concluded and then >> I quickly restart a different snapshot that is not trustworthy and >> the client proceeds thinking that the system is trustworthy (maybe a >> few SYNs from the client went into the void) > > You're over thinking this. For a non-confidential VM, Migration gives > you a saved image you can always replay from (this is seen as a feature > for fast starts) and if you use the tpm_simulator the TPM state is > stored in the migration image, so you can always roll it back if you 'How' is it stored in the migration image? Does tpm_simulator marshal and unmarshal the state so that it is carried inside the save image? For the tpm_emulator backend this particular code is here: - https://github.com/qemu/qemu/blob/master/backends/tpm/tpm_emulator.c#L758 - https://github.com/qemu/qemu/blob/master/backends/tpm/tpm_emulator.c#L792 > have access to the migration file. Saving the image state is also a > huge problem because the TPM seeds are in the clear if the migration > image isn't encrypted. The other big problem is that an external True. DAC protection of the file versus protection via encryption. Neither really helps against malicious root. > software TPM is always going to give up its state to the service > provider, regardless of migration, so you have to have some trust in > the provider and thus you'd also have to trust them with the migration > replay policy. 
For Confidential VMs, this is a bit different because > the vTPM runs in a secure ring inside the confidential enclave and the > secure migration agent ensures that either migration and startup happen > or migration doesn't happen at all, so for them you don't have to worry > about rollback. what is the enclave here? Is it an SGX enclave or is it running somewhere inside the address space of the VM? > > Provided you can trust the vTPM provider, having external state not > stored in the migration image has the potential actually to solve the > rollback problem because you could keep the TPM clock running and > potentially increase the reset count, so migrations would show up in > TPM quotes and you don't have control of the state of the vTPM to > replay it. I just don't see how you do that and prevent scenarios where VM A is suspended and then the tpm_simulator just sits there with the state and one resumes VM B with the state. Stefan > > James >
On Tue, 2023-01-10 at 09:47 -0500, Stefan Berger wrote: > On 1/10/23 09:14, James Bottomley wrote: > > On Mon, 2023-01-09 at 16:06 -0500, Stefan Berger wrote: > > > On 1/9/23 14:01, Stefan Berger wrote: > > [...] > > > If you use TPM 2 for attestation then certain TPM 2 state > > > migration scenarios may become problematic. One could construct a > > > scenario where attestation preceeds some action that requires > > > trust to have been established in the system in the preceeding > > > attestation step and support for snapshotting the state of the > > > TPM 2 could become an issue if I was to wait for the attestation > > > to have been concluded and then I quickly restart a different > > > snapshot that is not trustworthy and the client proceeds thinking > > > that the system is trustworthy (maybe a few SYNs from the client > > > went into the void) > > > > You're over thinking this. For a non-confidential VM, Migration > > gives you a saved image you can always replay from (this is seen as > > a feature for fast starts) and if you use the tpm_simulator the TPM > > state is stored in the migration image, so you can always roll it > > back if you > > 'How' is it stored in the migration image? Does tpm_simulator marshal > and unmarshal the state so that it is carried inside the save image? > For the tpm_emulator backend this particular code is here: > - > https://github.com/qemu/qemu/blob/master/backends/tpm/tpm_emulator.c#L758 > - > https://github.com/qemu/qemu/blob/master/backends/tpm/tpm_emulator.c#L792 We seem to be going around in circles: your TPM simulator stores the TPM state in the migration image, mine keeps it in the external TPM. The above paragraph is referring to your simulator. > > have access to the migration file. Saving the image state is also > > a huge problem because the TPM seeds are in the clear if the > > migration image isn't encrypted. The other big problem is that an > > external > > True. 
DAC protection of the file versus protection via encryption. > Neither really helps against malicious root. > > > software TPM is always going to give up its state to the service > > provider, regardless of migration, so you have to have some trust > > in the provider and thus you'd also have to trust them with the > > migration replay policy. For Confidential VMs, this is a bit > > different because the vTPM runs in a secure ring inside the > > confidential enclave and the secure migration agent ensures that > > either migration and startup happen or migration doesn't happen at > > all, so for them you don't have to worry about rollback. > > what is the enclave here? Is it an SGX enclave or is it running > somewhere inside the address space of the VM? The only current one we're playing with is the SEV-SNP SVSM vTPM which runs the TPM in VMPL0. > > > > Provided you can trust the vTPM provider, having external state not > > stored in the migration image has the potential actually to solve > > the rollback problem because you could keep the TPM clock running > > and potentially increase the reset count, so migrations would show > > up in TPM quotes and you don't have control of the state of the > > vTPM to replay it. > > I just don't see how you do that and prevent scenarios where VM A is > suspended and then the tpm_simulator just sits there with > the state and one resumes VM B with the state. You can't with your TPM simulator because it stores state in the image. If the state is external (not stored in the image) then rolling back the image doesn't roll back the TPM state. James
On 1/10/23 09:55, James Bottomley wrote: > On Tue, 2023-01-10 at 09:47 -0500, Stefan Berger wrote: >> On 1/10/23 09:14, James Bottomley wrote: >>> On Mon, 2023-01-09 at 16:06 -0500, Stefan Berger wrote: >>>> On 1/9/23 14:01, Stefan Berger wrote: >>> [...] >>>> If you use TPM 2 for attestation then certain TPM 2 state >>>> migration scenarios may become problematic. One could construct a >>>> scenario where attestation preceeds some action that requires >>>> trust to have been established in the system in the preceeding >>>> attestation step and support for snapshotting the state of the >>>> TPM 2 could become an issue if I was to wait for the attestation >>>> to have been concluded and then I quickly restart a different >>>> snapshot that is not trustworthy and the client proceeds thinking >>>> that the system is trustworthy (maybe a few SYNs from the client >>>> went into the void) >>> >>> You're over thinking this. For a non-confidential VM, Migration >>> gives you a saved image you can always replay from (this is seen as >>> a feature for fast starts) and if you use the tpm_simulator the TPM >>> state is stored in the migration image, so you can always roll it >>> back if you >> >> 'How' is it stored in the migration image? Does tpm_simulator marshal >> and unmarshal the state so that it is carried inside the save image? >> For the tpm_emulator backend this particular code is here: >> - >> https://github.com/qemu/qemu/blob/master/backends/tpm/tpm_emulator.c#L758 >> - >> https://github.com/qemu/qemu/blob/master/backends/tpm/tpm_emulator.c#L792 > > We seem to be going around in circles: your TPM simulator stores the > TPM state in the migration image, mine keeps it in the external TPM. > The above paragraph is referring to your simulator. My simulator is typically called 'swtpm'. > >>> have access to the migration file. Saving the image state is also >>> a huge problem because the TPM seeds are in the clear if the >>> migration image isn't encrypted. 
The other big problem is that an >>> external >> >> True. DAC protection of the file versus protection via encryption. >> Neither really helps against malicious root. >> >>> software TPM is always going to give up its state to the service >>> provider, regardless of migration, so you have to have some trust >>> in the provider and thus you'd also have to trust them with the >>> migration replay policy. For Confidential VMs, this is a bit >>> different because the vTPM runs in a secure ring inside the >>> confidential enclave and the secure migration agent ensures that >>> either migration and startup happen or migration doesn't happen at >>> all, so for them you don't have to worry about rollback. >> >> what is the enclave here? Is it an SGX enclave or is it running >> somewhere inside the address space of the VM? > > The only current one we're playing with is the SEV-SNP SVSM vTPM which > runs the TPM in VMPL0. And how is this related to this PR? > >>> >>> Provided you can trust the vTPM provider, having external state not >>> stored in the migration image has the potential actually to solve >>> the rollback problem because you could keep the TPM clock running >>> and potentially increase the reset count, so migrations would show >>> up in TPM quotes and you don't have control of the state of the >>> vTPM to replay it. >> >> I just don't see how you do that and prevent scenarios where VM A is >> suspended and then the tpm_simulator just sits there with >> the state and one resumes VM B with the state. > > You can't with your TPM simulator because it stores state in the image. > If the state is external (not stored in the image) then rolling back > the image doesn't roll back the TPM state. And resuming VM B with the TPM state of suspend VM A is considered 'good'? Stefan > > James >
diff --git a/MAINTAINERS b/MAINTAINERS index 6966490c94..a4a3bf9ab4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3046,6 +3046,11 @@ F: backends/tpm/ F: tests/qtest/*tpm* T: git https://github.com/stefanberger/qemu-tpm.git tpm-next +MSSIM TPM Backend +M: James Bottomley <jejb@linux.ibm.com> +S: Maintained +F: backends/tpm/tpm_mssim.* + Checkpatch S: Odd Fixes F: scripts/checkpatch.pl diff --git a/backends/tpm/Kconfig b/backends/tpm/Kconfig index 5d91eb89c2..d6d6fa53e9 100644 --- a/backends/tpm/Kconfig +++ b/backends/tpm/Kconfig @@ -12,3 +12,8 @@ config TPM_EMULATOR bool default y depends on TPM_BACKEND + +config TPM_MSSIM + bool + default y + depends on TPM_BACKEND diff --git a/backends/tpm/meson.build b/backends/tpm/meson.build index 7f2503f84e..c7c3c79125 100644 --- a/backends/tpm/meson.build +++ b/backends/tpm/meson.build @@ -3,4 +3,5 @@ if have_tpm softmmu_ss.add(files('tpm_util.c')) softmmu_ss.add(when: 'CONFIG_TPM_PASSTHROUGH', if_true: files('tpm_passthrough.c')) softmmu_ss.add(when: 'CONFIG_TPM_EMULATOR', if_true: files('tpm_emulator.c')) + softmmu_ss.add(when: 'CONFIG_TPM_MSSIM', if_true: files('tpm_mssim.c')) endif diff --git a/backends/tpm/tpm_mssim.c b/backends/tpm/tpm_mssim.c new file mode 100644 index 0000000000..7c10ce2944 --- /dev/null +++ b/backends/tpm/tpm_mssim.c @@ -0,0 +1,251 @@ +/* + * Emulator TPM driver which connects over the mssim protocol + * SPDX-License-Identifier: GPL-2.0-or-later + * + * Copyright (c) 2022 + * Author: James Bottomley <jejb@linux.ibm.com> + */ + +#include "qemu/osdep.h" +#include "qemu/error-report.h" +#include "qemu/sockets.h" + +#include "qapi/clone-visitor.h" +#include "qapi/qapi-visit-tpm.h" + +#include "io/channel-socket.h" + +#include "sysemu/tpm_backend.h" +#include "sysemu/tpm_util.h" + +#include "qom/object.h" + +#include "tpm_int.h" +#include "tpm_mssim.h" + +#define ERROR_PREFIX "TPM mssim Emulator: " + +#define TYPE_TPM_MSSIM "tpm-mssim" +OBJECT_DECLARE_SIMPLE_TYPE(TPMmssim, TPM_MSSIM) + +struct TPMmssim { 
+ TPMBackend parent; + + TpmTypeOptions *opts; + + QIOChannelSocket *cmd_qc, *ctrl_qc; +}; + +static int tpm_send_ctrl(TPMmssim *t, uint32_t cmd, Error **errp) +{ + int ret; + + cmd = htonl(cmd); + ret = qio_channel_write_all(QIO_CHANNEL(t->ctrl_qc), (char *)&cmd, sizeof(cmd), errp); + if (ret != 0) + return ret; + ret = qio_channel_read_all(QIO_CHANNEL(t->ctrl_qc), (char *)&cmd, sizeof(cmd), errp); + if (ret != 0) + return ret; + if (cmd != 0) { + error_setg(errp, ERROR_PREFIX "Incorrect ACK received on control channel 0x%x", cmd); + return -1; + } + return 0; +} + +static void tpm_mssim_instance_init(Object *obj) +{ +} + +static void tpm_mssim_instance_finalize(Object *obj) +{ + TPMmssim *t = TPM_MSSIM(obj); + + if (t->ctrl_qc) + tpm_send_ctrl(t, TPM_SIGNAL_POWER_OFF, NULL); + + object_unref(OBJECT(t->ctrl_qc)); + object_unref(OBJECT(t->cmd_qc)); +} + +static void tpm_mssim_cancel_cmd(TPMBackend *tb) +{ + return; +} + +static TPMVersion tpm_mssim_get_version(TPMBackend *tb) +{ + return TPM_VERSION_2_0; +} + +static size_t tpm_mssim_get_buffer_size(TPMBackend *tb) +{ + /* TCG standard profile max buffer size */ + return 4096; +} + +static TpmTypeOptions *tpm_mssim_get_opts(TPMBackend *tb) +{ + TPMmssim *t = TPM_MSSIM(tb); + TpmTypeOptions *opts; + + opts = QAPI_CLONE(TpmTypeOptions, t->opts); + + return opts; +} + +static void tpm_mssim_handle_request(TPMBackend *tb, TPMBackendCmd *cmd, + Error **errp) +{ + TPMmssim *t = TPM_MSSIM(tb); + uint32_t header, len; + uint8_t locality = cmd->locty; + struct iovec iov[4]; + int ret; + + header = htonl(TPM_SEND_COMMAND); + len = htonl(cmd->in_len); + + iov[0].iov_base = &header; + iov[0].iov_len = sizeof(header); + iov[1].iov_base = &locality; + iov[1].iov_len = sizeof(locality); + iov[2].iov_base = &len; + iov[2].iov_len = sizeof(len); + iov[3].iov_base = (void *)cmd->in; + iov[3].iov_len = cmd->in_len; + + ret = qio_channel_writev_all(QIO_CHANNEL(t->cmd_qc), iov, 4, errp); + if (ret != 0) + goto fail; + + ret =
qio_channel_read_all(QIO_CHANNEL(t->cmd_qc), (char *)&len, sizeof(len), errp); + if (ret != 0) + goto fail; + len = ntohl(len); + if (len > cmd->out_len) { + error_setg(errp, "receive size is too large"); + goto fail; + } + ret = qio_channel_read_all(QIO_CHANNEL(t->cmd_qc), (char *)cmd->out, len, errp); + if (ret != 0) + goto fail; + /* ACK packet */ + ret = qio_channel_read_all(QIO_CHANNEL(t->cmd_qc), (char *)&header, sizeof(header), errp); + if (ret != 0) + goto fail; + if (header != 0) { + error_setg(errp, "incorrect ACK received on command channel 0x%x", header); + goto fail; + } + + return; + + fail: + error_prepend(errp, ERROR_PREFIX); + tpm_util_write_fatal_error_response(cmd->out, cmd->out_len); +} + +static TPMBackend *tpm_mssim_create(TpmTypeOptions *opts) +{ + TPMBackend *be = TPM_BACKEND(object_new(TYPE_TPM_MSSIM)); + TPMmssim *t = TPM_MSSIM(be); + int sock; + Error *errp = NULL; + TPMmssimOptions *mo = &opts->u.mssim; + + t->opts = opts; + if (!mo->has_command) { + mo->has_command = true; + mo->command = g_new0(SocketAddress, 1); + mo->command->type = SOCKET_ADDRESS_TYPE_INET; + mo->command->u.inet.host = g_strdup("localhost"); + mo->command->u.inet.port = g_strdup("2321"); + } + if (!mo->has_control) { + mo->has_control = true; + mo->control = g_new0(SocketAddress, 1); + mo->control->type = SOCKET_ADDRESS_TYPE_INET; + mo->control->u.inet.host = g_strdup(mo->command->u.inet.host); + mo->control->u.inet.port = g_strdup("2322"); + } + + t->cmd_qc = qio_channel_socket_new(); + t->ctrl_qc = qio_channel_socket_new(); + + if (qio_channel_socket_connect_sync(t->cmd_qc, mo->command, &errp) < 0) + goto fail; + + if (qio_channel_socket_connect_sync(t->ctrl_qc, mo->control, &errp) < 0) + goto fail; + + /* reset the TPM using a power cycle sequence, in case someone + * has previously powered it up */ + sock = tpm_send_ctrl(t, TPM_SIGNAL_POWER_OFF, &errp); + if (sock != 0) + goto fail; + sock = tpm_send_ctrl(t, TPM_SIGNAL_POWER_ON, &errp); + if (sock != 0) + goto
fail; + sock = tpm_send_ctrl(t, TPM_SIGNAL_NV_ON, &errp); + if (sock != 0) + goto fail; + + return be; + + fail: + object_unref(OBJECT(t->ctrl_qc)); + object_unref(OBJECT(t->cmd_qc)); + t->ctrl_qc = NULL; + error_prepend(&errp, ERROR_PREFIX); + error_report_err(errp); + object_unref(OBJECT(be)); + + return NULL; +} + +static const QemuOptDesc tpm_mssim_cmdline_opts[] = { + TPM_STANDARD_CMDLINE_OPTS, + { + .name = "command", + .type = QEMU_OPT_STRING, + .help = "Command socket (default localhost:2321)", + }, + { + .name = "control", + .type = QEMU_OPT_STRING, + .help = "control socket (default localhost:2322)", + }, +}; + +static void tpm_mssim_class_init(ObjectClass *klass, void *data) +{ + TPMBackendClass *cl = TPM_BACKEND_CLASS(klass); + + cl->type = TPM_TYPE_MSSIM; + cl->opts = tpm_mssim_cmdline_opts; + cl->desc = "TPM mssim emulator backend driver"; + cl->create = tpm_mssim_create; + cl->cancel_cmd = tpm_mssim_cancel_cmd; + cl->get_tpm_version = tpm_mssim_get_version; + cl->get_buffer_size = tpm_mssim_get_buffer_size; + cl->get_tpm_options = tpm_mssim_get_opts; + cl->handle_request = tpm_mssim_handle_request; +} + +static const TypeInfo tpm_mssim_info = { + .name = TYPE_TPM_MSSIM, + .parent = TYPE_TPM_BACKEND, + .instance_size = sizeof(TPMmssim), + .class_init = tpm_mssim_class_init, + .instance_init = tpm_mssim_instance_init, + .instance_finalize = tpm_mssim_instance_finalize, +}; + +static void tpm_mssim_register(void) +{ + type_register_static(&tpm_mssim_info); +} + +type_init(tpm_mssim_register) diff --git a/backends/tpm/tpm_mssim.h b/backends/tpm/tpm_mssim.h new file mode 100644 index 0000000000..04a270338a --- /dev/null +++ b/backends/tpm/tpm_mssim.h @@ -0,0 +1,43 @@ +/* + * SPDX-License-Identifier: BSD-2-Clause + * + * The code below is copied from the Microsoft/TCG Reference implementation + * + * https://github.com/Microsoft/ms-tpm-20-ref.git + * + * In file TPMCmd/Simulator/include/TpmTcpProtocol.h + */ + +#define TPM_SIGNAL_POWER_ON 1 +#define 
TPM_SIGNAL_POWER_OFF 2 +#define TPM_SIGNAL_PHYS_PRES_ON 3 +#define TPM_SIGNAL_PHYS_PRES_OFF 4 +#define TPM_SIGNAL_HASH_START 5 +#define TPM_SIGNAL_HASH_DATA 6 + // {uint32_t BufferSize, uint8_t[BufferSize] Buffer} +#define TPM_SIGNAL_HASH_END 7 +#define TPM_SEND_COMMAND 8 + // {uint8_t Locality, uint32_t InBufferSize, uint8_t[InBufferSize] InBuffer} -> + // {uint32_t OutBufferSize, uint8_t[OutBufferSize] OutBuffer} + +#define TPM_SIGNAL_CANCEL_ON 9 +#define TPM_SIGNAL_CANCEL_OFF 10 +#define TPM_SIGNAL_NV_ON 11 +#define TPM_SIGNAL_NV_OFF 12 +#define TPM_SIGNAL_KEY_CACHE_ON 13 +#define TPM_SIGNAL_KEY_CACHE_OFF 14 + +#define TPM_REMOTE_HANDSHAKE 15 +#define TPM_SET_ALTERNATIVE_RESULT 16 + +#define TPM_SIGNAL_RESET 17 +#define TPM_SIGNAL_RESTART 18 + +#define TPM_SESSION_END 20 +#define TPM_STOP 21 + +#define TPM_GET_COMMAND_RESPONSE_SIZES 25 + +#define TPM_ACT_GET_SIGNALED 26 + +#define TPM_TEST_FAILURE_MODE 30 diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c index e99447ad68..319f9eeeb6 100644 --- a/monitor/hmp-cmds.c +++ b/monitor/hmp-cmds.c @@ -841,6 +841,7 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict) unsigned int c = 0; TPMPassthroughOptions *tpo; TPMEmulatorOptions *teo; + TPMmssimOptions *tmo; info_list = qmp_query_tpm(&err); if (err) { @@ -874,6 +875,12 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict) teo = &ti->options->u.emulator; monitor_printf(mon, ",chardev=%s", teo->chardev); break; + case TPM_TYPE_MSSIM: + tmo = &ti->options->u.mssim; + monitor_printf(mon, ",command=%s:%s,control=%s:%s", + tmo->command->u.inet.host, tmo->command->u.inet.port, + tmo->control->u.inet.host, tmo->control->u.inet.port); + break; case TPM_TYPE__MAX: break; } diff --git a/qapi/tpm.json b/qapi/tpm.json index d8cbd5ea0e..b773bde2ff 100644 --- a/qapi/tpm.json +++ b/qapi/tpm.json @@ -5,6 +5,7 @@ ## # = TPM (trusted platform module) devices ## +{ 'include': 'sockets.json' } ## # @TpmModel: @@ -49,7 +50,7 @@ # # Since: 1.5 ## -{ 'enum': 'TpmType', 'data': [ 
'passthrough', 'emulator' ], +{ 'enum': 'TpmType', 'data': [ 'passthrough', 'emulator', 'mssim' ], 'if': 'CONFIG_TPM' } ## @@ -64,7 +65,7 @@ # Example: # # -> { "execute": "query-tpm-types" } -# <- { "return": [ "passthrough", "emulator" ] } +# <- { "return": [ "passthrough", "emulator", "mssim" ] } # ## { 'command': 'query-tpm-types', 'returns': ['TpmType'], @@ -99,6 +100,22 @@ { 'struct': 'TPMEmulatorOptions', 'data': { 'chardev' : 'str' }, 'if': 'CONFIG_TPM' } +## +# @TPMmssimOptions: +# +# Information for the mssim emulator connection +# +# @command: command socket for the TPM emulator +# @control: control socket for the TPM emulator +# +# Since: 7.2.0 +## +{ 'struct': 'TPMmssimOptions', + 'data': { + '*command': 'SocketAddress', + '*control': 'SocketAddress' }, + 'if': 'CONFIG_TPM' } + ## # @TpmTypeOptions: # @@ -107,6 +124,7 @@ # @id: identifier of the backend # @type: - 'passthrough' The configuration options for the TPM passthrough type # - 'emulator' The configuration options for TPM emulator backend type +# - 'mssim' The configuration options for TPM emulator mssim type # # Since: 1.5 ## @@ -115,7 +133,8 @@ 'id': 'str' }, 'discriminator': 'type', 'data': { 'passthrough' : 'TPMPassthroughOptions', - 'emulator': 'TPMEmulatorOptions' }, + 'emulator': 'TPMEmulatorOptions', + 'mssim': 'TPMmssimOptions' }, 'if': 'CONFIG_TPM' } ##
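For readers following the wire format used by tpm_mssim_handle_request() in the patch, the request framing can be sketched without any QEMU dependencies. The layout follows the comment in tpm_mssim.h (a 32-bit command code, then locality, 32-bit length and the command bytes), with integers in network byte order as the patch's htonl() calls imply. The helper names `put_be32` and `mssim_frame_command` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TPM_SEND_COMMAND 8 /* value from tpm_mssim.h in the patch */

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Build one TPM_SEND_COMMAND request:
 *   uint32 header | uint8 locality | uint32 length | command bytes
 * Returns the number of bytes written, or 0 if `out` is too small.
 */
static size_t mssim_frame_command(uint8_t *out, size_t out_cap,
                                  uint8_t locality,
                                  const uint8_t *cmd, uint32_t cmd_len)
{
    size_t need = 4 + 1 + 4 + (size_t)cmd_len;

    assert(out != NULL);
    if (out_cap < need) {
        return 0;
    }
    put_be32(out, TPM_SEND_COMMAND);
    out[4] = locality;
    put_be32(out + 5, cmd_len);
    if (cmd_len > 0) {
        memcpy(out + 9, cmd, cmd_len);
    }
    return need;
}
```

The reply side, as read by the patch, is a 32-bit length, that many response bytes, and a trailing 32-bit acknowledgement that is expected to be zero.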