Message ID | 20220130211838.8382-1-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | Shadow stacks for userspace | expand |
Rick, On Sun, Jan 30 2022 at 13:18, Rick Edgecombe wrote: > This is a slight reboot of the userspace CET series. I will be taking over the > series from Yu-cheng. Per some internal recommendations, I’ve reset the version > number and am calling it a new series. Hopefully, it doesn’t cause > confusion. That's fine as it seems to be a major change in course, so a reset to V1 is justified. Don't worry about confusion, we can easily confuse ourself with minor things than that version reset :) > The new plan is to upstream only userspace Shadow Stack support at this point. > IBT can follow later, but for now I’ll focus solely on the most in-demand and > widely available (with the feature on AMD CPUs now) part of CET. We just have to keep in IBT mind so that we don't add roadblocks which we regret some time later. > I thought as part of this reset, it might be useful to more fully write-up the > design and summarize the history of the previous CET series. So this slightly > long cover letter does that. The "Updates" section has the changes, if anyone > doesn't want the history. Thanks for that lengthy writeup. It's appreciated. There is too much confusion already so a coherent summary is helpful. > Why is Shadow Stack Wanted > ========================== > The main use case for userspace shadow stack is providing protection against > return oriented programming attacks. Fedora and Ubuntu already have many/most > packages enabled for shadow stack. Which is unfortunately part of the overall problem ... > History > ======= > The branding “CET” really consists of two features: “Shadow Stack” and > “Indirect Branch Tracking”. They both restrict previously allowed, but rarely > valid behaviors and require userspace to change to avoid these behaviors before > enabling the protection. These raw HW features need to be assembled into a > software solution across userspace and kernel in order to add security value. > The kernel part of this solution has evolved iteratively starting with a lengthy > RFC period. > > Until now, the enabling effort was trying to support both Shadow Stack and IBT. > This history will focus on a few areas of the shadow stack development history > that I thought stood out. > > Signals > ------- > Originally signals placed the location of the shadow stack restore > token inside the saved state on the stack. This was problematic from a > past ABI promises perspective. So the restore location was instead just > assumed from the shadow stack pointer. This works because in normal > allowed cases of calling sigreturn, the shadow stack pointer should be > right at the restore token at that time. There is no alternate shadow > stack support. If an alt shadow stack is added later we would > need to So how is that going to work? altstack is not an esoteric corner case. > Enabling Interface > ------------------ > For the entire history of the original CET series, the design was to > enable shadow stack automatically if the feature bit was detected in > the elf header. Then it was userspace’s responsibility to turn it off > via an arch_prctl() if it was not desired, and this was handled by the > glibc dynamic loader. Glibc’s standard behavior (when CET if configured > is to leave shadow stack enabled if the executable and all linked > libraries are marked with shadow stacks. > > Many distros (Fedora and others) have binaries already marked with > shadow stack, waiting for kernel support. Unfortunately their glibc > binaries expect the original arch_prctl() interface for allocating > shadow stacks, as those changes were pushed ahead of kernel support. > The net result of it all is, when updating to a kernel with shadow > stack these binaries would suddenly get shadow stack enabled and expect > the arch_prctl() interface to be there. And so calls to makecontext() > will fail, resulting in visible breakages. This series deals with this > problem as described below in "Updates". I'm really impressed by the well thought out coordination on the glibc and distro side. Designed by committee never worked ... > Updates > ======= > These updates were mostly driven by public comments, but a lot of the design > elements are new. I would like some extra scrutiny on the updates. > > New syscall for Shadow Stack Allocation > --------------------------------------- > A new syscall is added for allocating shadow stacks to replace > PROT_SHADOW_STACK. Several options were considered, as described in the > “x86/cet/shstk: Introduce map_shadow_stack syscall”. > > Xsave Managed Supervisor State Modifications > -------------------------------------------- > The shadow stack feature requires the kernel to modify xsaves managed > state. On one of the last versions of Yu-cheng’s series Boris had > commented on the pattern it was using to do this not necessarily being > ideal. The pattern was to force a restore to the registers and always > do the modification there. Then Thomas did an overhaul of the fpu code, > part of which consisted of making raw access to the xsave buffer > private to the fpu code. So this series tries to expose access again, > and in a way that addresses Boris’ comments. > > The method is to provide functions like wmsrl/rdmsrl, but that can > direct the operation to the correct location (registers or buffer), > while giving the proper notice to the fpu subsystem so things don’t get > clobbered or corrupted. > > In the past a solution like this was discussed as part of the PASID > series, and Thomas was not in favor. In CET’s case there is a more > logic around the CET MSR’s than in PASID's, and wrapping this logic > minimizes near identical open coded logic needed to do this more > efficiently. In addition it resolves the above described problem of > having no access to the xsave buffer. So it is being put forward here > under the supposition that CET’s usage may lead to a different > conclusion, not to try to ignore past direction. > > The user interrupt series has similar needs as CET, and will also use > this internal interface if it’s found acceptable. I'll have a look. > Switch Enabling Interface > ------------------------- > But there are existing downsides to automatic elf header processing > based enabling. The elf header feature spec is not defined by the > kernel and there are proposals to expand it to describe additional > logic. A simpler interface where the kernel is simply told what to > enable, and leaves all the decision making to userspace, is more > flexible for userspace and simpler for the kernel. There also already > needs to be an ARCH_X86_FEATURE_ENABLE arch_prctl() for WRSS (and > likely LAM will use it too), so it avoids there being two ways to turn > on these types of features. The only tricky part for shadow stack, is > that it has to be enabled very early. Wherever the shadow stack is > enabled, the app cannot return from that point, otherwise there will be > a shadow stack violation. It turns out glibc can enable shadow stack > this early, so it works nicely. So not automatically enabling any > features in the elf header will cleanly disable all old binaries, which > expect the kernel to enable CET features automatically. Then after the > kernel changes are upstream, glibc can be updated to use the new > interface. This is the solution implemented in this series. Makes sense. > Expand Commit Logs > ------------------ > As part of spinning up on this series, I found some of the commit logs > did not describe the changes in enough detail for me understand their > purpose. I tried to expand the logs and comments, where I had to go > digging. Hopefully it’s useful. Proper changelogs are always appreciated. > Limit to only Intel Processors > ------------------------------ > Shadow stack is supported on some AMD processors, but this revision > (with expanded HW usage and xsaves changes) has only has been tested on > Intel ones. So this series has a patch to limit shadow stack support to > Intel processors. Ideally the patch would not even make it to mainline, > and should be dropped as soon as this testing is done. It's included > just in case. Ha. I can give you access to an AMD machine with CET SS supported :) > Future Work > =========== > Even though this is now exclusively a shadow stack series, there is still some > remaining shadow stack work to be done. > > Ptrace > ------ > Early in the series, there was a patch to allow IA32_U_CET and > IA32_PL3_SSP to be set. This patch was dropped and planned as a follow > up to basic support, and it remains the plan. It will be needed for > in-progress gdb support. It's pretty much a prerequisite for enabling it, right? > CRIU Support > ------------ > In the past there was some speculation on the mailing list about > whether CRIU would need to be taught about CET. It turns out, it does. > The first issue hit is that CRIU calls sigreturn directly from its > “parasite code” that it injects into the dumper process. This violates > this shadow stack implementation’s protection that intends to prevent > attackers from doing this. > > With so many packages already enabled with shadow stack, there is > probably desire to make it work seamlessly. But in the meantime if > distros want to support shadow stack and CRIU, users could manually > disabled shadow stack via “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for > a process they will wants to dump. It’s not ideal. > > I’d like to hear what people think about having shadow stack in the > kernel without this resolved. Nothing would change for any users until > they enable shadow stack in the kernel and update to a glibc configured > with CET. Should CRIU userspace be solved before kernel support? Definitely yes. Making CRIU users add a glibc tunable is not really an option. We can't break CRIU systems with a kernel upgrade. Thanks, tglx
Hi Thomas, Thanks for feedback on the plan. On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote: > > Until now, the enabling effort was trying to support both Shadow > > Stack and IBT. > > This history will focus on a few areas of the shadow stack > > development history > > that I thought stood out. > > > > Signals > > ------- > > Originally signals placed the location of the shadow stack > > restore > > token inside the saved state on the stack. This was > > problematic from a > > past ABI promises perspective. So the restore location was > > instead just > > assumed from the shadow stack pointer. This works because in > > normal > > allowed cases of calling sigreturn, the shadow stack pointer > > should be > > right at the restore token at that time. There is no > > alternate shadow > > stack support. If an alt shadow stack is added later we > > would > > need to > > So how is that going to work? altstack is not an esoteric corner > case. My understanding is that the main usages for the signal stack were handling stack overflows and corruption. Since the shadow stack only contains return addresses rather than large stack allocations, and is not generally writable or pivotable, I thought there was a good possibility an alt shadow stack would not end up being especially useful. Does it seem like reasonable guesswork? If it does seems likely, I'll give it some more thought than that hand wavy plan. > > > Limit to only Intel Processors > > ------------------------------ > > Shadow stack is supported on some AMD processors, but this > > revision > > (with expanded HW usage and xsaves changes) has only has > > been tested on > > Intel ones. So this series has a patch to limit shadow stack > > support to > > Intel processors. Ideally the patch would not even make it > > to mainline, > > and should be dropped as soon as this testing is done. It's > > included > > just in case. > > Ha. I can give you access to an AMD machine with CET SS supported :) Thanks for the offer. It sounds like John Allen can do this testing. > > > Future Work > > =========== > > Even though this is now exclusively a shadow stack series, there is > > still some > > remaining shadow stack work to be done. > > > > Ptrace > > ------ > > Early in the series, there was a patch to allow IA32_U_CET > > and > > IA32_PL3_SSP to be set. This patch was dropped and planned > > as a follow > > up to basic support, and it remains the plan. It will be > > needed for > > in-progress gdb support. > > It's pretty much a prerequisite for enabling it, right? Yes. > > > CRIU Support > > ------------ > > In the past there was some speculation on the mailing list > > about > > whether CRIU would need to be taught about CET. It turns > > out, it does. > > The first issue hit is that CRIU calls sigreturn directly > > from its > > “parasite code” that it injects into the dumper process. > > This violates > > this shadow stack implementation’s protection that intends > > to prevent > > attackers from doing this. > > > > With so many packages already enabled with shadow stack, > > there is > > probably desire to make it work seamlessly. But in the > > meantime if > > distros want to support shadow stack and CRIU, users could > > manually > > disabled shadow stack via > > “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for > > a process they will wants to dump. It’s not ideal. > > > > I’d like to hear what people think about having shadow stack > > in the > > kernel without this resolved. Nothing would change for any > > users until > > they enable shadow stack in the kernel and update to a glibc > > configured > > with CET. Should CRIU userspace be solved before kernel > > support? > > Definitely yes. Making CRIU users add a glibc tunable is not really > an > option. We can't break CRIU systems with a kernel upgrade. Ok got it, thanks. Just to be clear though, existing distros/binaries out there will not have shadow stack enabled with just an updated kernel (due to the enabling changes). So the CRIU tools would only break after future glibc binaries enable CET, which users/distros would have to do specifically (glibc doesn't even enable cet by default). Since the purpose of this feature is to restrict previously allowed behaviors, and it’s apparently getting enabled by default in some distro's packages, I guess there is a decent chance that once a system is updated with a future glibc some app somewhere will break. I was under the impression that as long as there were no breakages under a current set of binaries (including glibc), this was not considered a kernel regression. Please correct me if this is wrong. I think there are other options if we want to make this softer. Of course none of that prevents known breakages from being fixed for normal reasons and I’ll look into that for CRIU.
On Thu, Feb 3, 2022, at 5:08 PM, Edgecombe, Rick P wrote: > Hi Thomas, >> > Signals >> > ------- >> > Originally signals placed the location of the shadow stack >> > restore >> > token inside the saved state on the stack. This was >> > problematic from a >> > past ABI promises perspective. What was the actual problem? >> > So the restore location was >> > instead just >> > assumed from the shadow stack pointer. This works because in >> > normal >> > allowed cases of calling sigreturn, the shadow stack pointer >> > should be >> > right at the restore token at that time. There is no >> > alternate shadow >> > stack support. If an alt shadow stack is added later we >> > would >> > need to >> >> So how is that going to work? altstack is not an esoteric corner >> case. > > My understanding is that the main usages for the signal stack were > handling stack overflows and corruption. Since the shadow stack only > contains return addresses rather than large stack allocations, and is > not generally writable or pivotable, I thought there was a good > possibility an alt shadow stack would not end up being especially > useful. Does it seem like reasonable guesswork? It's also used for things like DOSEMU that execute in a weird context and then trap back out to the outer program using a signal handler and an altstack. Also, imagine someone writing a SIGSEGV handler specifically intended to handle shadow stack overflow. The shadow stack can be pivoted using RSTORSSP.
On Thu, 2022-02-03 at 21:20 -0800, Andy Lutomirski wrote: > On Thu, Feb 3, 2022, at 5:08 PM, Edgecombe, Rick P wrote: > > Hi Thomas, > > > > Signals > > > > ------- > > > > Originally signals placed the location of the shadow > > > > stack > > > > restore > > > > token inside the saved state on the stack. This was > > > > problematic from a > > > > past ABI promises perspective. > > What was the actual problem? The meat of the discussion that I saw was this thread: https://lore.kernel.org/lkml/CALCETrVTeYfzO-XWh+VwTuKCyPyp-oOMGH=QR_msG9tPQ4xPmA@mail.gmail.com/ The problem was that the saved shadow stack pointer was placed in a location unexpected by userspace (at the end of the floating point state), which apparently can be relocated by userspace that would not be aware of the shadow stack state at the end. I think an earlier version was not extendable either. It does not seem to be a fully dead end, but I have to admit I didn’t fully internalize the limits imposed by the userspace expectations of the sigframe because the current solution seemed good. I’ll have to dig into it a little more because alt stack, CRIU and these expectations all seem intertwined. Here is a question for you guys though – is a general ucontext extension solution a nice-to-have on its own? I was thinking that it would be better to keep CET stuff out of the sigframe related structures if it can be avoided. One reason is that it keeps security related register values out of writable locations. I’m not thinking of any specific problem, but just a general idea of not opening that stuff up if it’s not needed. An example is in the IBT series, the wait-for-endbranch state was saved in a ucontext flag. Some usages may want to keep it from being cleared in a signal and the endbranch check skipped. So for shadow stack, just the general notion that this is not ideal state to open up. On where to keep the wait-for-endbranch state, PeterZ had suggested the possibility of having a per-mm hashmap of (userspace stack addresses)-> (kernel side saved state), limited to some sensible limit of items. This could be extendable to other state besides CET stuff. I was thinking to look in that direction if it’s needed for the alt shadow stack. But, is it swimming against the place where saved state is "supposed" to be? There is some optimum of compatibility (more apps able to opt- in) and security. I guess it's probably not good to have the kernel bend over backwards trying to get both. > > > > So the restore location was > > > > instead just > > > > assumed from the shadow stack pointer. This works > > > > because in > > > > normal > > > > allowed cases of calling sigreturn, the shadow stack > > > > pointer > > > > should be > > > > right at the restore token at that time. There is no > > > > alternate shadow > > > > stack support. If an alt shadow stack is added later we > > > > would > > > > need to > > > > > > So how is that going to work? altstack is not an esoteric corner > > > case. > > > > My understanding is that the main usages for the signal stack were > > handling stack overflows and corruption. Since the shadow stack > > only > > contains return addresses rather than large stack allocations, and > > is > > not generally writable or pivotable, I thought there was a good > > possibility an alt shadow stack would not end up being especially > > useful. Does it seem like reasonable guesswork? > > It's also used for things like DOSEMU that execute in a weird context > and then trap back out to the outer program using a signal handler > and an altstack. Also, imagine someone writing a SIGSEGV handler > specifically intended to handle shadow stack overflow. Interesting, thanks. I had been thinking that an alt shadow stack would require a new interface that would mostly just sit dormant taking up space. But probably an (obvious) better way would be to just have the signalstack() syscall automatically create a new shadow stack, like the rest of the series does automatically for new threads. I’ll think I’ll see how that would look. > > The shadow stack can be pivoted using RSTORSSP. Yes, I just meant that the ability to pivot or modify is restricted (in RSTORSSP's case by restore token checks) and so with less ability to interact with it, it could be less likely for there to be corruptions. This if of course just speculation.
From: Edgecombe, Rick P > Sent: 04 February 2022 01:08 > Hi Thomas, > > Thanks for feedback on the plan. > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote: > > > Until now, the enabling effort was trying to support both Shadow > > > Stack and IBT. > > > This history will focus on a few areas of the shadow stack > > > development history > > > that I thought stood out. > > > > > > Signals > > > ------- > > > Originally signals placed the location of the shadow stack > > > restore > > > token inside the saved state on the stack. This was > > > problematic from a > > > past ABI promises perspective. So the restore location was > > > instead just > > > assumed from the shadow stack pointer. This works because in > > > normal > > > allowed cases of calling sigreturn, the shadow stack pointer > > > should be > > > right at the restore token at that time. There is no > > > alternate shadow > > > stack support. If an alt shadow stack is added later we > > > would > > > need to > > > > So how is that going to work? altstack is not an esoteric corner > > case. > > My understanding is that the main usages for the signal stack were > handling stack overflows and corruption. Since the shadow stack only > contains return addresses rather than large stack allocations, and is > not generally writable or pivotable, I thought there was a good > possibility an alt shadow stack would not end up being especially > useful. Does it seem like reasonable guesswork? The other 'problem' is that it is valid to longjump out of a signal handler. These days you have to use siglongjmp() not longjmp() but it is still used. It is probably also valid to use siglongjmp() to jump from a nested signal handler into the outer handler. Given both signal handlers can have their own stack, there can be three stacks involved. I think the shadow stack pointer has to be in ucontext - which also means the application can change it before returning from a signal. In much the same way as all the segment registers can be changed leading to all the nasty bugs when the final 'return to user' code traps in kernel when loading invalid segment registers or executing iret. Hmmm... do shadow stacks mean that longjmp() has to be a system call? David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com> wrote: > > From: Edgecombe, Rick P > > Sent: 04 February 2022 01:08 > > Hi Thomas, > > > > Thanks for feedback on the plan. > > > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote: > > > > Until now, the enabling effort was trying to support both Shadow > > > > Stack and IBT. > > > > This history will focus on a few areas of the shadow stack > > > > development history > > > > that I thought stood out. > > > > > > > > Signals > > > > ------- > > > > Originally signals placed the location of the shadow stack > > > > restore > > > > token inside the saved state on the stack. This was > > > > problematic from a > > > > past ABI promises perspective. So the restore location was > > > > instead just > > > > assumed from the shadow stack pointer. This works because in > > > > normal > > > > allowed cases of calling sigreturn, the shadow stack pointer > > > > should be > > > > right at the restore token at that time. There is no > > > > alternate shadow > > > > stack support. If an alt shadow stack is added later we > > > > would > > > > need to > > > > > > So how is that going to work? altstack is not an esoteric corner > > > case. > > > > My understanding is that the main usages for the signal stack were > > handling stack overflows and corruption. Since the shadow stack only > > contains return addresses rather than large stack allocations, and is > > not generally writable or pivotable, I thought there was a good > > possibility an alt shadow stack would not end up being especially > > useful. Does it seem like reasonable guesswork? > > The other 'problem' is that it is valid to longjump out of a signal handler. > These days you have to use siglongjmp() not longjmp() but it is still used. > > It is probably also valid to use siglongjmp() to jump from a nested > signal handler into the outer handler. > Given both signal handlers can have their own stack, there can be three > stacks involved. > > I think the shadow stack pointer has to be in ucontext - which also > means the application can change it before returning from a signal. > In much the same way as all the segment registers can be changed > leading to all the nasty bugs when the final 'return to user' code > traps in kernel when loading invalid segment registers or executing iret. > > Hmmm... do shadow stacks mean that longjmp() has to be a system call? No. setjmp/longjmp save and restore shadow stack pointer. -- H.J.
On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote: > On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com> > wrote: > > > > From: Edgecombe, Rick P > > > Sent: 04 February 2022 01:08 > > > Hi Thomas, > > > > > > Thanks for feedback on the plan. > > > > > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote: > > > > > Until now, the enabling effort was trying to support both > > > > > Shadow > > > > > Stack and IBT. > > > > > This history will focus on a few areas of the shadow stack > > > > > development history > > > > > that I thought stood out. > > > > > > > > > > Signals > > > > > ------- > > > > > Originally signals placed the location of the shadow > > > > > stack > > > > > restore > > > > > token inside the saved state on the stack. This was > > > > > problematic from a > > > > > past ABI promises perspective. So the restore location > > > > > was > > > > > instead just > > > > > assumed from the shadow stack pointer. This works > > > > > because in > > > > > normal > > > > > allowed cases of calling sigreturn, the shadow stack > > > > > pointer > > > > > should be > > > > > right at the restore token at that time. There is no > > > > > alternate shadow > > > > > stack support. If an alt shadow stack is added later > > > > > we > > > > > would > > > > > need to > > > > > > > > So how is that going to work? altstack is not an esoteric > > > > corner > > > > case. > > > > > > My understanding is that the main usages for the signal stack > > > were > > > handling stack overflows and corruption. Since the shadow stack > > > only > > > contains return addresses rather than large stack allocations, > > > and is > > > not generally writable or pivotable, I thought there was a good > > > possibility an alt shadow stack would not end up being especially > > > useful. Does it seem like reasonable guesswork? > > > > The other 'problem' is that it is valid to longjump out of a signal > > handler. > > These days you have to use siglongjmp() not longjmp() but it is > > still used. > > > > It is probably also valid to use siglongjmp() to jump from a nested > > signal handler into the outer handler. > > Given both signal handlers can have their own stack, there can be > > three > > stacks involved. So the scenario is? 1. Handle signal 1 2. sigsetjmp() 3. signalstack() 4. Handle signal 2 on alt stack 5. siglongjmp() I'll check that it is covered by the tests, but I think it should work in this series that has no alt shadow stack. I have only done a high level overview of how the shadow stack stuff, that doesn't involve the kernel, works in glibc. Sounds like I'll need to do a deeper dive. > > > > I think the shadow stack pointer has to be in ucontext - which also > > means the application can change it before returning from a signal. Yes we might need to change it to support alt shadow stacks. Can you elaborate why you think it has to be in ucontext? I was thinking of looking at three options for storing the ssp: - Stored in the shadow stack like a token using WRUSS from the kernel. - Stored on the kernel side using a hashmap that maps ucontext or sigframe userspace address to ssp (this is of course similar to storing in ucontext, except that the user can’t change the ssp). - Stored writable in userspace in ucontext. But in this version, without alt shadow stacks, the shadow stack pointer is not stored in ucontext. This causes the limitation that userspace can only call sigreturn when it has returned back to a point where there is a restore token on the shadow stack (which was placed there by the kernel). This doesn’t mean it can’t switch to a different shadow stack or handle a nested signal, but it limits the possibility for calling sigreturn with a totally different sigframe (like CRIU and SROP attacks do). It should hopefully be a helpful, protective limitation for most apps and I'm hoping CRIU can be fixed without removing it. I am not aware of other limitations to signals (besides normal shadow stack enforcement), but I could be missing it. And people's skepticism is making me want to go back over it with more scrutiny. > > In much the same way as all the segment registers can be changed > > leading to all the nasty bugs when the final 'return to user' code > > traps in kernel when loading invalid segment registers or executing > > iret. I don't think this is as difficult to avoid because userspace ssp has its own register that should not be accessed at that point, but I have not given this aspect enough analysis. Thanks for bringing it up. > > > > Hmmm... do shadow stacks mean that longjmp() has to be a system > > call? > > No. setjmp/longjmp save and restore shadow stack pointer. > It sounds like it would help to write up in a lot more detail exactly how all the signal and specialer stack manipulation scenarios work in glibc.
On Sat, Feb 5, 2022 at 12:15 PM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > > On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote: > > On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com> > > wrote: > > > > > > From: Edgecombe, Rick P > > > > Sent: 04 February 2022 01:08 > > > > Hi Thomas, > > > > > > > > Thanks for feedback on the plan. > > > > > > > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote: > > > > > > Until now, the enabling effort was trying to support both > > > > > > Shadow > > > > > > Stack and IBT. > > > > > > This history will focus on a few areas of the shadow stack > > > > > > development history > > > > > > that I thought stood out. > > > > > > > > > > > > Signals > > > > > > ------- > > > > > > Originally signals placed the location of the shadow > > > > > > stack > > > > > > restore > > > > > > token inside the saved state on the stack. This was > > > > > > problematic from a > > > > > > past ABI promises perspective. So the restore location > > > > > > was > > > > > > instead just > > > > > > assumed from the shadow stack pointer. This works > > > > > > because in > > > > > > normal > > > > > > allowed cases of calling sigreturn, the shadow stack > > > > > > pointer > > > > > > should be > > > > > > right at the restore token at that time. There is no > > > > > > alternate shadow > > > > > > stack support. If an alt shadow stack is added later > > > > > > we > > > > > > would > > > > > > need to > > > > > > > > > > So how is that going to work? altstack is not an esoteric > > > > > corner > > > > > case. > > > > > > > > My understanding is that the main usages for the signal stack > > > > were > > > > handling stack overflows and corruption. Since the shadow stack > > > > only > > > > contains return addresses rather than large stack allocations, > > > > and is > > > > not generally writable or pivotable, I thought there was a good > > > > possibility an alt shadow stack would not end up being especially > > > > useful. Does it seem like reasonable guesswork? > > > > > > The other 'problem' is that it is valid to longjump out of a signal > > > handler. > > > These days you have to use siglongjmp() not longjmp() but it is > > > still used. > > > > > > It is probably also valid to use siglongjmp() to jump from a nested > > > signal handler into the outer handler. > > > Given both signal handlers can have their own stack, there can be > > > three > > > stacks involved. > > So the scenario is? > > 1. Handle signal 1 > 2. sigsetjmp() > 3. signalstack() > 4. Handle signal 2 on alt stack > 5. siglongjmp() > > I'll check that it is covered by the tests, but I think it should work > in this series that has no alt shadow stack. I have only done a high > level overview of how the shadow stack stuff, that doesn't involve the > kernel, works in glibc. Sounds like I'll need to do a deeper dive. > > > > > > > I think the shadow stack pointer has to be in ucontext - which also > > > means the application can change it before returning from a signal. > > Yes we might need to change it to support alt shadow stacks. Can you > elaborate why you think it has to be in ucontext? I was thinking of > looking at three options for storing the ssp: > - Stored in the shadow stack like a token using WRUSS from the kernel. > - Stored on the kernel side using a hashmap that maps ucontext or > sigframe userspace address to ssp (this is of course similar to > storing in ucontext, except that the user can’t change the ssp). > - Stored writable in userspace in ucontext. > > But in this version, without alt shadow stacks, the shadow stack > pointer is not stored in ucontext. This causes the limitation that > userspace can only call sigreturn when it has returned back to a point > where there is a restore token on the shadow stack (which was placed > there by the kernel). This doesn’t mean it can’t switch to a different > shadow stack or handle a nested signal, but it limits the possibility > for calling sigreturn with a totally different sigframe (like CRIU and > SROP attacks do). It should hopefully be a helpful, protective > limitation for most apps and I'm hoping CRIU can be fixed without > removing it. > > I am not aware of other limitations to signals (besides normal shadow > stack enforcement), but I could be missing it. And people's skepticism > is making me want to go back over it with more scrutiny. > > > > In much the same way as all the segment registers can be changed > > > leading to all the nasty bugs when the final 'return to user' code > > > traps in kernel when loading invalid segment registers or executing > > > iret. > > I don't think this is as difficult to avoid because userspace ssp has > its own register that should not be accessed at that point, but I have > not given this aspect enough analysis. Thanks for bringing it up. > > > > > > > Hmmm... do shadow stacks mean that longjmp() has to be a system > > > call? > > > > No. setjmp/longjmp save and restore shadow stack pointer. > > > > It sounds like it would help to write up in a lot more detail exactly > how all the signal and specialer stack manipulation scenarios work in > glibc. > setjmp/longjmp work on the same sigjmp_buf. Shadow stack pointer is saved and restored, just like any other callee-saved registers.
On Fri, Feb 04, 2022 at 01:08:25AM +0000, Edgecombe, Rick P wrote: > > So how is that going to work? altstack is not an esoteric corner > > case. > > My understanding is that the main usages for the signal stack were > handling stack overflows and corruption. Since the shadow stack only > contains return addresses rather than large stack allocations, and is > not generally writable or pivotable, I thought there was a good > possibility an alt shadow stack would not end up being especially > useful. Does it seem like reasonable guesswork? altstacks are also used in userspace threading implemenations.
On Sat, Feb 05, 2022 at 12:21:12PM -0800, H.J. Lu wrote: > setjmp/longjmp work on the same sigjmp_buf. Shadow stack pointer > is saved and restored, just like any other callee-saved registers. How is having that shadow stack pointer in user-writable memory not a problem? That seems like a prime target to subvert the whole shadow stack machinery.
From: Edgecombe, Rick P > Sent: 05 February 2022 20:15 > > On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote: > > On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com> > > wrote: > > > > > > From: Edgecombe, Rick P > > > > Sent: 04 February 2022 01:08 > > > > Hi Thomas, > > > > > > > > Thanks for feedback on the plan. > > > > > > > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote: > > > > > > Until now, the enabling effort was trying to support both > > > > > > Shadow > > > > > > Stack and IBT. > > > > > > This history will focus on a few areas of the shadow stack > > > > > > development history > > > > > > that I thought stood out. > > > > > > > > > > > > Signals > > > > > > ------- > > > > > > Originally signals placed the location of the shadow > > > > > > stack > > > > > > restore > > > > > > token inside the saved state on the stack. This was > > > > > > problematic from a > > > > > > past ABI promises perspective. So the restore location > > > > > > was > > > > > > instead just > > > > > > assumed from the shadow stack pointer. This works > > > > > > because in > > > > > > normal > > > > > > allowed cases of calling sigreturn, the shadow stack > > > > > > pointer > > > > > > should be > > > > > > right at the restore token at that time. There is no > > > > > > alternate shadow > > > > > > stack support. If an alt shadow stack is added later > > > > > > we > > > > > > would > > > > > > need to > > > > > > > > > > So how is that going to work? altstack is not an esoteric > > > > > corner > > > > > case. > > > > > > > > My understanding is that the main usages for the signal stack > > > > were > > > > handling stack overflows and corruption. Since the shadow stack > > > > only > > > > contains return addresses rather than large stack allocations, > > > > and is > > > > not generally writable or pivotable, I thought there was a good > > > > possibility an alt shadow stack would not end up being especially > > > > useful. Does it seem like reasonable guesswork? > > > > > > The other 'problem' is that it is valid to longjump out of a signal > > > handler. > > > These days you have to use siglongjmp() not longjmp() but it is > > > still used. > > > > > > It is probably also valid to use siglongjmp() to jump from a nested > > > signal handler into the outer handler. > > > Given both signal handlers can have their own stack, there can be > > > three > > > stacks involved. > > So the scenario is? > > 1. Handle signal 1 > 2. sigsetjmp() > 3. signalstack() > 4. Handle signal 2 on alt stack > 5. siglongjmp() > > I'll check that it is covered by the tests, but I think it should work > in this series that has no alt shadow stack. I have only done a high > level overview of how the shadow stack stuff, that doesn't involve the > kernel, works in glibc. Sounds like I'll need to do a deeper dive. The posix/xopen definition for setjmp/longjmp doesn't require such longjmp requests to work. Although they still have to do something that doesn't break badly. Aborting the process is probably fine! > > > I think the shadow stack pointer has to be in ucontext - which also > > > means the application can change it before returning from a signal. > > Yes we might need to change it to support alt shadow stacks. Can you > elaborate why you think it has to be in ucontext? I was thinking of > looking at three options for storing the ssp: > - Stored in the shadow stack like a token using WRUSS from the kernel. > - Stored on the kernel side using a hashmap that maps ucontext or > sigframe userspace address to ssp (this is of course similar to > storing in ucontext, except that the user can’t change the ssp). > - Stored writable in userspace in ucontext. > > But in this version, without alt shadow stacks, the shadow stack > pointer is not stored in ucontext. This causes the limitation that > userspace can only call sigreturn when it has returned back to a point > where there is a restore token on the shadow stack (which was placed > there by the kernel). This doesn’t mean it can’t switch to a different > shadow stack or handle a nested signal, but it limits the possibility > for calling sigreturn with a totally different sigframe (like CRIU and > SROP attacks do). It should hopefully be a helpful, protective > limitation for most apps and I'm hoping CRIU can be fixed without > removing it. > > I am not aware of other limitations to signals (besides normal shadow > stack enforcement), but I could be missing it. And people's skepticism > is making me want to go back over it with more scrutiny. > > > > In much the same way as all the segment registers can be changed > > > leading to all the nasty bugs when the final 'return to user' code > > > traps in kernel when loading invalid segment registers or executing > > > iret. > > I don't think this is as difficult to avoid because userspace ssp has > its own register that should not be accessed at that point, but I have > not given this aspect enough analysis. Thanks for bringing it up. So the user ssp isn't saved (or restored) by the trap entry/exit. So it needs to be saved by the context switch code? Much like the user segment registers? So you are likely to get the same problems if restoring it can fault in kernel (eg for a non-canonical address). > > > Hmmm... do shadow stacks mean that longjmp() has to be a system > > > call? > > > > No. setjmp/longjmp save and restore shadow stack pointer. Ok, I was thinking that direct access to the user ssp would be a privileged operation. If it can be written you don't really have to worry about what code is trying to do - it can actually do what it likes. It just catches unintentional operations (like buffer overflows). Was there any 'spare' space in struct jmpbuf ? Otherwise you can only enable shadow stacks if everything has been recompiled - including any shared libraries the might be dlopen()ed. (or does the compiler invent an alloca() call somehow for a size that comes back from glibc?) I've never really considered how setjmp/longjmp handle callee saved register variables (apart from it being hard). The original pdp11 implementation probably only needed to save r6 and r7. What does happen to all the 'extended state' that XSAVE handles? IIRC all the AVX registers are caller saved (so should probably be zerod), but some of the SSE ones are callee saved, and one or two of the fpu flags are sticky and annoying enough the save/restore at the best of times. > It sounds like it would help to write up in a lot more detail exactly > how all the signal and specialer stack manipulation scenarios work in > glibc. Some cross references might have made people notice that the ucontext extensions for AVX512 (if not earlier ones) broke the minimal/default signal stack size. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Sun, Feb 6, 2022 at 5:42 AM David Laight <David.Laight@aculab.com> wrote: > > From: Edgecombe, Rick P > > Sent: 05 February 2022 20:15 > > > > On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote: > > > On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com> > > > wrote: > > > > > > > > From: Edgecombe, Rick P > > > > > Sent: 04 February 2022 01:08 > > > > > Hi Thomas, > > > > > > > > > > Thanks for feedback on the plan. > > > > > > > > > > On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote: > > > > > > > Until now, the enabling effort was trying to support both > > > > > > > Shadow > > > > > > > Stack and IBT. > > > > > > > This history will focus on a few areas of the shadow stack > > > > > > > development history > > > > > > > that I thought stood out. > > > > > > > > > > > > > > Signals > > > > > > > ------- > > > > > > > Originally signals placed the location of the shadow > > > > > > > stack > > > > > > > restore > > > > > > > token inside the saved state on the stack. This was > > > > > > > problematic from a > > > > > > > past ABI promises perspective. So the restore location > > > > > > > was > > > > > > > instead just > > > > > > > assumed from the shadow stack pointer. This works > > > > > > > because in > > > > > > > normal > > > > > > > allowed cases of calling sigreturn, the shadow stack > > > > > > > pointer > > > > > > > should be > > > > > > > right at the restore token at that time. There is no > > > > > > > alternate shadow > > > > > > > stack support. If an alt shadow stack is added later > > > > > > > we > > > > > > > would > > > > > > > need to > > > > > > > > > > > > So how is that going to work? altstack is not an esoteric > > > > > > corner > > > > > > case. > > > > > > > > > > My understanding is that the main usages for the signal stack > > > > > were > > > > > handling stack overflows and corruption. Since the shadow stack > > > > > only > > > > > contains return addresses rather than large stack allocations, > > > > > and is > > > > > not generally writable or pivotable, I thought there was a good > > > > > possibility an alt shadow stack would not end up being especially > > > > > useful. Does it seem like reasonable guesswork? > > > > > > > > The other 'problem' is that it is valid to longjump out of a signal > > > > handler. > > > > These days you have to use siglongjmp() not longjmp() but it is > > > > still used. > > > > > > > > It is probably also valid to use siglongjmp() to jump from a nested > > > > signal handler into the outer handler. > > > > Given both signal handlers can have their own stack, there can be > > > > three > > > > stacks involved. > > > > So the scenario is? > > > > 1. Handle signal 1 > > 2. sigsetjmp() > > 3. signalstack() > > 4. Handle signal 2 on alt stack > > 5. siglongjmp() > > > > I'll check that it is covered by the tests, but I think it should work > > in this series that has no alt shadow stack. I have only done a high > > level overview of how the shadow stack stuff, that doesn't involve the > > kernel, works in glibc. Sounds like I'll need to do a deeper dive. > > The posix/xopen definition for setjmp/longjmp doesn't require such > longjmp requests to work. > > Although they still have to do something that doesn't break badly. > Aborting the process is probably fine! > > > > > I think the shadow stack pointer has to be in ucontext - which also > > > > means the application can change it before returning from a signal. > > > > Yes we might need to change it to support alt shadow stacks. Can you > > elaborate why you think it has to be in ucontext? I was thinking of > > looking at three options for storing the ssp: > > - Stored in the shadow stack like a token using WRUSS from the kernel. > > - Stored on the kernel side using a hashmap that maps ucontext or > > sigframe userspace address to ssp (this is of course similar to > > storing in ucontext, except that the user can’t change the ssp). > > - Stored writable in userspace in ucontext. > > > > But in this version, without alt shadow stacks, the shadow stack > > pointer is not stored in ucontext. This causes the limitation that > > userspace can only call sigreturn when it has returned back to a point > > where there is a restore token on the shadow stack (which was placed > > there by the kernel). This doesn’t mean it can’t switch to a different > > shadow stack or handle a nested signal, but it limits the possibility > > for calling sigreturn with a totally different sigframe (like CRIU and > > SROP attacks do). It should hopefully be a helpful, protective > > limitation for most apps and I'm hoping CRIU can be fixed without > > removing it. > > > > I am not aware of other limitations to signals (besides normal shadow > > stack enforcement), but I could be missing it. And people's skepticism > > is making me want to go back over it with more scrutiny. > > > > > > In much the same way as all the segment registers can be changed > > > > leading to all the nasty bugs when the final 'return to user' code > > > > traps in kernel when loading invalid segment registers or executing > > > > iret. > > > > I don't think this is as difficult to avoid because userspace ssp has > > its own register that should not be accessed at that point, but I have > > not given this aspect enough analysis. Thanks for bringing it up. > > So the user ssp isn't saved (or restored) by the trap entry/exit. > So it needs to be saved by the context switch code? > Much like the user segment registers? > So you are likely to get the same problems if restoring it can fault > in kernel (eg for a non-canonical address). > > > > > Hmmm... do shadow stacks mean that longjmp() has to be a system > > > > call? > > > > > > No. setjmp/longjmp save and restore shadow stack pointer. > > Ok, I was thinking that direct access to the user ssp would be > a privileged operation. User space can only pop shadow stack. longjmp does #ifdef SHADOW_STACK_POINTER_OFFSET # if IS_IN (libc) && defined SHARED && defined FEATURE_1_OFFSET /* Check if Shadow Stack is enabled. */ testl $X86_FEATURE_1_SHSTK, %fs:FEATURE_1_OFFSET jz L(skip_ssp) # else xorl %eax, %eax # endif /* Check and adjust the Shadow-Stack-Pointer. */ /* Get the current ssp. */ rdsspq %rax /* And compare it with the saved ssp value. */ subq SHADOW_STACK_POINTER_OFFSET(%rdi), %rax je L(skip_ssp) /* Count the number of frames to adjust and adjust it with incssp instruction. The instruction can adjust the ssp by [0..255] value only thus use a loop if the number of frames is bigger than 255. */ negq %rax shrq $3, %rax /* NB: We saved Shadow-Stack-Pointer of setjmp. Since we are restoring Shadow-Stack-Pointer of setjmp's caller, we need to unwind shadow stack by one more frame. */ addq $1, %rax movl $255, %ebx L(loop): cmpq %rbx, %rax cmovb %rax, %rbx incsspq %rbx subq %rbx, %rax ja L(loop) L(skip_ssp): #endif > If it can be written you don't really have to worry about what code > is trying to do - it can actually do what it likes. > It just catches unintentional operations (like buffer overflows). > > Was there any 'spare' space in struct jmpbuf ? By pure luck, we have ONE spare space in sigjmp_buf. > Otherwise you can only enable shadow stacks if everything has been > recompiled - including any shared libraries the might be dlopen()ed. > (or does the compiler invent an alloca() call somehow for a > size that comes back from glibc?) > > I've never really considered how setjmp/longjmp handle callee saved > register variables (apart from it being hard). > The original pdp11 implementation probably only needed to save r6 and r7. > > What does happen to all the 'extended state' that XSAVE handles? > IIRC all the AVX registers are caller saved (so should probably > be zerod), but some of the SSE ones are callee saved, and one or > two of the fpu flags are sticky and annoying enough the save/restore > at the best of times. > > > It sounds like it would help to write up in a lot more detail exactly > > how all the signal and specialer stack manipulation scenarios work in > > glibc. > > Some cross references might have made people notice that the ucontext > extensions for AVX512 (if not earlier ones) broke the minimal/default > signal stack size. > > David >
(added more CRIU people) On Sun, Jan 30, 2022 at 01:18:03PM -0800, Rick Edgecombe wrote: > Hi, > > This is a slight reboot of the userspace CET series. I will be taking over the > series from Yu-cheng. Per some internal recommendations, I’ve reset the version > number and am calling it a new series. Hopefully, it doesn’t cause confusion. > > The new plan is to upstream only userspace Shadow Stack support at this point. > IBT can follow later, but for now I’ll focus solely on the most in-demand and > widely available (with the feature on AMD CPUs now) part of CET. > > I thought as part of this reset, it might be useful to more fully write-up the > design and summarize the history of the previous CET series. So this slightly > long cover letter does that. The "Updates" section has the changes, if anyone > doesn't want the history. > > > Why is Shadow Stack Wanted > ========================== > The main use case for userspace shadow stack is providing protection against > return oriented programming attacks. Fedora and Ubuntu already have many/most > packages enabled for shadow stack. The main missing piece is Linux kernel > support and there seems to be a high amount of interest in the ecosystem for > getting this feature supported. Besides security, Google has also done some > work on using shadow stack to improve performance and reliability of tracing. > > > Userspace Shadow Stack Implementation > ===================================== > Shadow stack works by maintaining a secondary (shadow) stack that cannot be > directly modified by applications. When executing a CALL instruction, the > processor pushes the return address to both the normal stack and to the special > permissioned shadow stack. Upon ret, the processor pops the shadow stack copy > and compares it to the normal stack copy. If the two differ, the processor > raises a control protection fault. This implementation supports shadow stack on > 64 bit kernels only, with support for 32 bit only via IA32 emulation. > > Shadow Stack Memory > ------------------- > The majority of this series deals with changes for handling the special > shadow stack memory permissions. This memory is specified by the > Dirty+RO PTE bits. A tricky aspect of this is that this combination was > previously used to specify COW memory. So Linux needs to handle COW > differently when shadow stack is in use. The solution is to use a > software PTE bit to denote COW memory, and take care to clear the dirty > bit when setting the memory RO. > > Setup and Upkeep of HW Registers > -------------------------------- > Using userspace CET requires a CR4 bit set, and also the manipulation > of two xsave managed MSRs. The kernel needs to modify these registers > during various operations like clone and signal handling. These > operations may happen when the registers are restored to the CPU, or > saved in an xsave buffer. Since the recent AMX triggered FPU overhaul > removed direct access to the xsave buffer, this series adds an > interface to operate on the supervisor xstate. > > New ABIs > -------- > This series introduces some new ABIs. The primary one is the shadow > stack itself. Since it is readable and the shadow stack pointer is > exposed to user space, applications can easily read and process the > shadow stack. And in fact the tracing usages plan to do exactly that. > > Most of the shadow stack contents are written by HW, but some of the > entries are added by the kernel. The main place for this is signals. As > part of handling the signal the kernel does some manual adjustment of > the shadow stack that userspace depends on. > > In addition to the contents of the shadow stack there is also user > visible behavior around when new shadow stacks are created and set in > the shadow stack pointer (SSP) register. This is relatively > straightforward – shadow stacks are created when new stacks are created > (thread creation, fork, etc). It is more or less what is required to > keep apps working. > > For situations when userspace creates a new stack (i.e. makecontext(), > fibers, etc), a new syscall is provided for creating shadow stack > memory. To make the shadow stack usable, it needs to have a restore > token written to the protected memory. So the syscall provides a way to > specificity this should be done by the kernel. > > When a shadow stack violation happens (when the return address of stack > not matching return address in shadow stack), a segfault is generated > with a new si_code specific to CET violations. > > Lastly, a new arch_prctl interface is created for controlling the > enablement of CET-like features. It is intended to also be used for > LAM. It operates on the feature status per-thread, so for process wide > enabling it is intended to be used early in things like dynamic > linker/loaders. However, it can be used later for per-thread enablement > of features like WRSS. > > WRSS > ---- > WRSS is an instruction that can write to shadow stacks. The HW provides > a way to enable this instruction for userspace use. Since shadow > stack’s are created initially protected, enabling WRSS allows any apps > that want to do unusual things with their stacks to have a way to > weaken protection and make things more flexible. A new feature bit is > defined to control enabling/disabling of WRSS. > > > History > ======= > The branding “CET” really consists of two features: “Shadow Stack” and > “Indirect Branch Tracking”. They both restrict previously allowed, but rarely > valid behaviors and require userspace to change to avoid these behaviors before > enabling the protection. These raw HW features need to be assembled into a > software solution across userspace and kernel in order to add security value. > The kernel part of this solution has evolved iteratively starting with a lengthy > RFC period. > > Until now, the enabling effort was trying to support both Shadow Stack and IBT. > This history will focus on a few areas of the shadow stack development history > that I thought stood out. > > Signals > ------- > Originally signals placed the location of the shadow stack restore > token inside the saved state on the stack. This was problematic from a > past ABI promises perspective. So the restore location was instead just > assumed from the shadow stack pointer. This works because in normal > allowed cases of calling sigreturn, the shadow stack pointer should be > right at the restore token at that time. There is no alternate shadow > stack support. If an alt shadow stack is added later we would need to > find a place to store the regular shadow stack token location. Options > could be to push something on the alt shadow stack, or to keep > something on the kernel side. So the current design keeps things simple > while slightly kicking the can down the road if alt shadow stacks > become a thing later. Siglongjmp is handled in glibc, using the incssp > instruction to unwind the shadow stack over the token. > > Shadow Stack Allocation > ----------------------- > makecontext() implementations need a way to create new shadow stacks > with restore token’s such that they can be pivoted to from userspace. > The first interface to do this was an arch_prctl(). It created a shadow > stack with a restore token pre-setup, since the kernel has an > instruction that can write to user shadow stacks. However, this > interface was abandoned for being strange. > > The next version created PROT_SHADOW_STACK. This interface had two > problems. One, it left no options but for userspace to create writable > memory, write a restore token, then mproctect() it PROT_SHADOW_STACK. > The writable window left the shadow stack exposed, weakening the > security. Second, it caused problems with the guard pages. Since the > memory was initially created writable it did not have a guard page, but > then was mprotected later to a type of memory that should have one. > This resulted in missing guard pages and confused rb_subtree_gap’s. > > This version introduces a new syscall that behaves similarly to the > initial arch_prctl() interface in that it has the kernel write the > restore token. > > Enabling Interface > ------------------ > For the entire history of the original CET series, the design was to > enable shadow stack automatically if the feature bit was detected in > the elf header. Then it was userspace’s responsibility to turn it off > via an arch_prctl() if it was not desired, and this was handled by the > glibc dynamic loader. Glibc’s standard behavior (when CET if configured > is to leave shadow stack enabled if the executable and all linked > libraries are marked with shadow stacks. > > Many distros (Fedora and others) have binaries already marked with > shadow stack, waiting for kernel support. Unfortunately their glibc > binaries expect the original arch_prctl() interface for allocating > shadow stacks, as those changes were pushed ahead of kernel support. > The net result of it all is, when updating to a kernel with shadow > stack these binaries would suddenly get shadow stack enabled and expect > the arch_prctl() interface to be there. And so calls to makecontext() > will fail, resulting in visible breakages. This series deals with this > problem as described below in "Updates". > > > Updates > ======= > These updates were mostly driven by public comments, but a lot of the design > elements are new. I would like some extra scrutiny on the updates. > > New syscall for Shadow Stack Allocation > --------------------------------------- > A new syscall is added for allocating shadow stacks to replace > PROT_SHADOW_STACK. Several options were considered, as described in the > “x86/cet/shstk: Introduce map_shadow_stack syscall”. > > Xsave Managed Supervisor State Modifications > -------------------------------------------- > The shadow stack feature requires the kernel to modify xsaves managed > state. On one of the last versions of Yu-cheng’s series Boris had > commented on the pattern it was using to do this not necessarily being > ideal. The pattern was to force a restore to the registers and always > do the modification there. Then Thomas did an overhaul of the fpu code, > part of which consisted of making raw access to the xsave buffer > private to the fpu code. So this series tries to expose access again, > and in a way that addresses Boris’ comments. > > The method is to provide functions like wmsrl/rdmsrl, but that can > direct the operation to the correct location (registers or buffer), > while giving the proper notice to the fpu subsystem so things don’t get > clobbered or corrupted. > > In the past a solution like this was discussed as part of the PASID > series, and Thomas was not in favor. In CET’s case there is a more > logic around the CET MSR’s than in PASID's, and wrapping this logic > minimizes near identical open coded logic needed to do this more > efficiently. In addition it resolves the above described problem of > having no access to the xsave buffer. So it is being put forward here > under the supposition that CET’s usage may lead to a different > conclusion, not to try to ignore past direction. > > The user interrupt series has similar needs as CET, and will also use > this internal interface if it’s found acceptable. > > Support for WRSS > ---------------- > Andy Lutomirski had asked if we change the shadow stack allocation API > such that userspace cannot create arbitrary shadow stacks, then we look > at exposing an interface to enable the WRSS instruction for userspace. > This way app’s that want to do unexpected things with shadow stacks > would still have the option to create shadow stacks with arbitrary > data. > > Switch Enabling Interface > ------------------------- > As described above there is a problem with userspace binaries waiting > to break as soon as the kernel supports CET. This needs to be prevented > by changing the interface such that the old binaries will not enable > shadow stack AND behave as if shadow stack is not enabled. They should > run normally without shadow stack protection. Creating a new feature > (SHSTK2) for shadow stack was explored. SHSTK would never be supported > by the kernel, and all the userspace build tools would be updated to > target SHSTK2 instead of SHSTK. So old SHSTK binaries would be cleanly > disabled. > > But there are existing downsides to automatic elf header processing > based enabling. The elf header feature spec is not defined by the > kernel and there are proposals to expand it to describe additional > logic. A simpler interface where the kernel is simply told what to > enable, and leaves all the decision making to userspace, is more > flexible for userspace and simpler for the kernel. There also already > needs to be an ARCH_X86_FEATURE_ENABLE arch_prctl() for WRSS (and > likely LAM will use it too), so it avoids there being two ways to turn > on these types of features. The only tricky part for shadow stack, is > that it has to be enabled very early. Wherever the shadow stack is > enabled, the app cannot return from that point, otherwise there will be > a shadow stack violation. It turns out glibc can enable shadow stack > this early, so it works nicely. So not automatically enabling any > features in the elf header will cleanly disable all old binaries, which > expect the kernel to enable CET features automatically. Then after the > kernel changes are upstream, glibc can be updated to use the new > interface. This is the solution implemented in this series. > > Expand Commit Logs > ------------------ > As part of spinning up on this series, I found some of the commit logs > did not describe the changes in enough detail for me understand their > purpose. I tried to expand the logs and comments, where I had to go > digging. Hopefully it’s useful. > > Limit to only Intel Processors > ------------------------------ > Shadow stack is supported on some AMD processors, but this revision > (with expanded HW usage and xsaves changes) has only has been tested on > Intel ones. So this series has a patch to limit shadow stack support to > Intel processors. Ideally the patch would not even make it to mainline, > and should be dropped as soon as this testing is done. It's included > just in case. > > > Future Work > =========== > Even though this is now exclusively a shadow stack series, there is still some > remaining shadow stack work to be done. > > Ptrace > ------ > Early in the series, there was a patch to allow IA32_U_CET and > IA32_PL3_SSP to be set. This patch was dropped and planned as a follow > up to basic support, and it remains the plan. It will be needed for > in-progress gdb support. > > CRIU Support > ------------ > In the past there was some speculation on the mailing list about > whether CRIU would need to be taught about CET. It turns out, it does. > The first issue hit is that CRIU calls sigreturn directly from its > “parasite code” that it injects into the dumper process. This violates > this shadow stack implementation’s protection that intends to prevent > attackers from doing this. > > With so many packages already enabled with shadow stack, there is > probably desire to make it work seamlessly. But in the meantime if > distros want to support shadow stack and CRIU, users could manually > disabled shadow stack via “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for > a process they will wants to dump. It’s not ideal. > > I’d like to hear what people think about having shadow stack in the > kernel without this resolved. Nothing would change for any users until > they enable shadow stack in the kernel and update to a glibc configured > with CET. Should CRIU userspace be solved before kernel support? > > Selftests > --------- > There are some CET selftests being worked on and they are not included > here. > > Thanks, > > Rick > > Rick Edgecombe (7): > x86/mm: Prevent VM_WRITE shadow stacks > x86/fpu: Add helpers for modifying supervisor xstate > x86/fpu: Add unsafe xsave buffer helpers > x86/cet/shstk: Introduce map_shadow_stack syscall > selftests/x86: Add map_shadow_stack syscall test > x86/cet/shstk: Support wrss for userspace > x86/cpufeatures: Limit shadow stack to Intel CPUs > > Yu-cheng Yu (28): > Documentation/x86: Add CET description > x86/cet/shstk: Add Kconfig option for Shadow Stack > x86/cpufeatures: Add CET CPU feature flags for Control-flow > Enforcement Technology (CET) > x86/cpufeatures: Introduce CPU setup and option parsing for CET > x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states > x86/cet: Add control-protection fault handler > x86/mm: Remove _PAGE_DIRTY from kernel RO pages > x86/mm: Move pmd_write(), pud_write() up in the file > x86/mm: Introduce _PAGE_COW > drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS > x86/mm: Update pte_modify for _PAGE_COW > x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for > transition from _PAGE_DIRTY to _PAGE_COW > mm: Move VM_UFFD_MINOR_BIT from 37 to 38 > mm: Introduce VM_SHADOW_STACK for shadow stack memory > x86/mm: Check Shadow Stack page fault errors > x86/mm: Update maybe_mkwrite() for shadow stack > mm: Fixup places that call pte_mkwrite() directly > mm: Add guard pages around a shadow stack. > mm/mmap: Add shadow stack pages to memory accounting > mm: Update can_follow_write_pte() for shadow stack > mm/mprotect: Exclude shadow stack from preserve_write > mm: Re-introduce vm_flags to do_mmap() > x86/cet/shstk: Add user-mode shadow stack support > x86/process: Change copy_thread() argument 'arg' to 'stack_size' > x86/cet/shstk: Handle thread shadow stack > x86/cet/shstk: Introduce shadow stack token setup/verify routines > x86/cet/shstk: Handle signals for shadow stack > x86/cet/shstk: Add arch_prctl elf feature functions > > .../admin-guide/kernel-parameters.txt | 4 + > Documentation/filesystems/proc.rst | 1 + > Documentation/x86/cet.rst | 145 ++++++ > Documentation/x86/index.rst | 1 + > arch/arm/kernel/signal.c | 2 +- > arch/arm64/kernel/signal.c | 2 +- > arch/arm64/kernel/signal32.c | 2 +- > arch/sparc/kernel/signal32.c | 2 +- > arch/sparc/kernel/signal_64.c | 2 +- > arch/x86/Kconfig | 22 + > arch/x86/Kconfig.assembler | 5 + > arch/x86/entry/syscalls/syscall_32.tbl | 1 + > arch/x86/entry/syscalls/syscall_64.tbl | 1 + > arch/x86/ia32/ia32_signal.c | 25 +- > arch/x86/include/asm/cet.h | 54 +++ > arch/x86/include/asm/cpufeatures.h | 1 + > arch/x86/include/asm/disabled-features.h | 8 +- > arch/x86/include/asm/fpu/api.h | 8 + > arch/x86/include/asm/fpu/types.h | 23 +- > arch/x86/include/asm/fpu/xstate.h | 6 +- > arch/x86/include/asm/idtentry.h | 4 + > arch/x86/include/asm/mman.h | 24 + > arch/x86/include/asm/mmu_context.h | 2 + > arch/x86/include/asm/msr-index.h | 20 + > arch/x86/include/asm/page_types.h | 7 + > arch/x86/include/asm/pgtable.h | 302 ++++++++++-- > arch/x86/include/asm/pgtable_types.h | 48 +- > arch/x86/include/asm/processor.h | 6 + > arch/x86/include/asm/special_insns.h | 30 ++ > arch/x86/include/asm/trap_pf.h | 2 + > arch/x86/include/uapi/asm/mman.h | 8 +- > arch/x86/include/uapi/asm/prctl.h | 10 + > arch/x86/include/uapi/asm/processor-flags.h | 2 + > arch/x86/kernel/Makefile | 1 + > arch/x86/kernel/cpu/common.c | 20 + > arch/x86/kernel/cpu/cpuid-deps.c | 1 + > arch/x86/kernel/elf_feature_prctl.c | 72 +++ > arch/x86/kernel/fpu/xstate.c | 167 ++++++- > arch/x86/kernel/idt.c | 4 + > arch/x86/kernel/process.c | 17 +- > arch/x86/kernel/process_64.c | 2 + > arch/x86/kernel/shstk.c | 446 ++++++++++++++++++ > arch/x86/kernel/signal.c | 13 + > arch/x86/kernel/signal_compat.c | 2 +- > arch/x86/kernel/traps.c | 62 +++ > arch/x86/mm/fault.c | 19 + > arch/x86/mm/mmap.c | 48 ++ > arch/x86/mm/pat/set_memory.c | 2 +- > arch/x86/mm/pgtable.c | 25 + > drivers/gpu/drm/i915/gvt/gtt.c | 2 +- > fs/aio.c | 2 +- > fs/proc/task_mmu.c | 3 + > include/linux/mm.h | 19 +- > include/linux/pgtable.h | 8 + > include/linux/syscalls.h | 1 + > include/uapi/asm-generic/siginfo.h | 3 +- > include/uapi/asm-generic/unistd.h | 2 +- > ipc/shm.c | 2 +- > kernel/sys_ni.c | 1 + > mm/gup.c | 16 +- > mm/huge_memory.c | 27 +- > mm/memory.c | 5 +- > mm/migrate.c | 3 +- > mm/mmap.c | 15 +- > mm/mprotect.c | 9 +- > mm/nommu.c | 4 +- > mm/util.c | 2 +- > tools/testing/selftests/x86/Makefile | 9 +- > .../selftests/x86/test_map_shadow_stack.c | 75 +++ > 69 files changed, 1797 insertions(+), 92 deletions(-) > create mode 100644 Documentation/x86/cet.rst > create mode 100644 arch/x86/include/asm/cet.h > create mode 100644 arch/x86/include/asm/mman.h > create mode 100644 arch/x86/kernel/elf_feature_prctl.c > create mode 100644 arch/x86/kernel/shstk.c > create mode 100644 tools/testing/selftests/x86/test_map_shadow_stack.c > > > base-commit: e783362eb54cd99b2cac8b3a9aeac942e6f6ac07 > -- > 2.17.1
On Sun, Feb 06, 2022 at 08:42:03PM +0200, Mike Rapoport wrote: > (added more CRIU people) Thanks, Mike. > On Sun, Jan 30, 2022 at 01:18:03PM -0800, Rick Edgecombe wrote: > > This is a slight reboot of the userspace CET series. I will be taking over the > > series from Yu-cheng. Per some internal recommendations, I’ve reset the version > > number and am calling it a new series. Hopefully, it doesn’t cause confusion. > > > > The new plan is to upstream only userspace Shadow Stack support at this point. > > IBT can follow later, but for now I’ll focus solely on the most in-demand and > > widely available (with the feature on AMD CPUs now) part of CET. > > > > I thought as part of this reset, it might be useful to more fully write-up the > > design and summarize the history of the previous CET series. So this slightly > > long cover letter does that. The "Updates" section has the changes, if anyone > > doesn't want the history. [...] > > CRIU Support > > ------------ > > In the past there was some speculation on the mailing list about > > whether CRIU would need to be taught about CET. It turns out, it does. > > The first issue hit is that CRIU calls sigreturn directly from its > > “parasite code” that it injects into the dumper process. This violates > > this shadow stack implementation’s protection that intends to prevent > > attackers from doing this. > > > > With so many packages already enabled with shadow stack, there is > > probably desire to make it work seamlessly. But in the meantime if > > distros want to support shadow stack and CRIU, users could manually > > disabled shadow stack via “GLIBC_TUNABLES=glibc.cpu.x86_shstk=off” for > > a process they will wants to dump. It’s not ideal. > > > > I’d like to hear what people think about having shadow stack in the > > kernel without this resolved. Nothing would change for any users until > > they enable shadow stack in the kernel and update to a glibc configured > > with CET. Should CRIU userspace be solved before kernel support? From the CRIU side I can say that I would definitely like to see this resolved. CRIU just went through a similar exercise with rseq() being enabled in glibc and CI broke all around for us and other projects relying on CRIU. Although rseq() was around for a long time we were not aware of it but luckily 5.13 introduced a way to handle it for CRIU with ptrace. An environment variable existed but did not really help when CRIU is called somewhere in the middle of the container software stack. From my point of view a solution not involving an environment variable would definitely be preferred. Adrian
* David Laight:
> Was there any 'spare' space in struct jmpbuf ?
jmp_buf in glibc looks like this:
(gdb) ptype/o jmp_buf
type = struct __jmp_buf_tag {
/* 0 | 64 */ __jmp_buf __jmpbuf;
/* 64 | 4 */ int __mask_was_saved;
/* XXX 4-byte hole */
/* 72 | 128 */ __sigset_t __saved_mask;
/* total size (bytes): 200 */
} [1]
(gdb) ptype/o __jmp_buf
type = long [8]
The glibc ABI reserves space for 1024 signals, something that Linux is
never going to implement. We can use that space to store a few extra
registers in __save_mask. There is a complication because the
pthread_cancel unwinding allocates only space for the __jmpbuf member.
Fortunately, we do not need to unwind the shadow stack for thread
cancellation, so we don't need that extra space in that case.
Thanks,
Florian
On 2/6/22 23:20, Adrian Reber wrote: >>> CRIU Support >>> ------------ >>> In the past there was some speculation on the mailing list about >>> whether CRIU would need to be taught about CET. It turns out, it does. >>> The first issue hit is that CRIU calls sigreturn directly from its >>> “parasite code” that it injects into the dumper process. This violates >>> this shadow stack implementation’s protection that intends to prevent >>> attackers from doing this. ... >>From the CRIU side I can say that I would definitely like to see this > resolved. CRIU just went through a similar exercise with rseq() being > enabled in glibc and CI broke all around for us and other projects > relying on CRIU. Although rseq() was around for a long time we were not > aware of it but luckily 5.13 introduced a way to handle it for CRIU with > ptrace. An environment variable existed but did not really help when > CRIU is called somewhere in the middle of the container software stack. > >>From my point of view a solution not involving an environment variable > would definitely be preferred. Have there been things like this for CRIU in the past? Something where CRIU needs control but that's also security-sensitive? Any thoughts on how you would _like_ to see this resolved?
On 2/5/22 12:15, Edgecombe, Rick P wrote: > On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote: >> On Sat, Feb 5, 2022 at 5:27 AM David Laight <David.Laight@aculab.com> >> wrote: >>> >>> From: Edgecombe, Rick P >>>> Sent: 04 February 2022 01:08 >>>> Hi Thomas, >>>> >>>> Thanks for feedback on the plan. >>>> >>>> On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote: >>>>>> Until now, the enabling effort was trying to support both >>>>>> Shadow >>>>>> Stack and IBT. >>>>>> This history will focus on a few areas of the shadow stack >>>>>> development history >>>>>> that I thought stood out. >>>>>> >>>>>> Signals >>>>>> ------- >>>>>> Originally signals placed the location of the shadow >>>>>> stack >>>>>> restore >>>>>> token inside the saved state on the stack. This was >>>>>> problematic from a >>>>>> past ABI promises perspective. So the restore location >>>>>> was >>>>>> instead just >>>>>> assumed from the shadow stack pointer. This works >>>>>> because in >>>>>> normal >>>>>> allowed cases of calling sigreturn, the shadow stack >>>>>> pointer >>>>>> should be >>>>>> right at the restore token at that time. There is no >>>>>> alternate shadow >>>>>> stack support. If an alt shadow stack is added later >>>>>> we >>>>>> would >>>>>> need to >>>>> >>>>> So how is that going to work? altstack is not an esoteric >>>>> corner >>>>> case. >>>> >>>> My understanding is that the main usages for the signal stack >>>> were >>>> handling stack overflows and corruption. Since the shadow stack >>>> only >>>> contains return addresses rather than large stack allocations, >>>> and is >>>> not generally writable or pivotable, I thought there was a good >>>> possibility an alt shadow stack would not end up being especially >>>> useful. Does it seem like reasonable guesswork? >>> >>> The other 'problem' is that it is valid to longjump out of a signal >>> handler. >>> These days you have to use siglongjmp() not longjmp() but it is >>> still used. >>> >>> It is probably also valid to use siglongjmp() to jump from a nested >>> signal handler into the outer handler. >>> Given both signal handlers can have their own stack, there can be >>> three >>> stacks involved. > > So the scenario is? > > 1. Handle signal 1 > 2. sigsetjmp() > 3. signalstack() > 4. Handle signal 2 on alt stack > 5. siglongjmp() > > I'll check that it is covered by the tests, but I think it should work > in this series that has no alt shadow stack. I have only done a high > level overview of how the shadow stack stuff, that doesn't involve the > kernel, works in glibc. Sounds like I'll need to do a deeper dive. > >>> >>> I think the shadow stack pointer has to be in ucontext - which also >>> means the application can change it before returning from a signal. > > Yes we might need to change it to support alt shadow stacks. Can you > elaborate why you think it has to be in ucontext? I was thinking of > looking at three options for storing the ssp: > - Stored in the shadow stack like a token using WRUSS from the kernel. > - Stored on the kernel side using a hashmap that maps ucontext or > sigframe userspace address to ssp (this is of course similar to > storing in ucontext, except that the user can’t change the ssp). > - Stored writable in userspace in ucontext. > > But in this version, without alt shadow stacks, the shadow stack > pointer is not stored in ucontext. This causes the limitation that > userspace can only call sigreturn when it has returned back to a point > where there is a restore token on the shadow stack (which was placed > there by the kernel). I'll reply here and maybe cover multiple things. User code already needs to rewind the regular stack to call sigreturn -- sigreturn find the signal frame based on ESP/RSP. So if you call it from the wrong place, you go boom. I think that the Linux SHSTK ABI should have the property that no amount of tampering with just the ucontext and associated structures can cause sigreturn to redirect to the wrong IP -- there should be something on the shadow stack that also gets verified in sigreturn. IIRC the series does this, but it's been a while. The post-sigreturn SSP should be entirely implied by pre-sigreturn SSP (or perhaps something on the shadow stack), so, in the absence of an altshadowstack feature, no ucontext changes should be needed. We can also return from a signal or from more than one signal at once, as above, using siglongjmp. It seems like this should Just Work (tm), at least in the absence of altshadowstack. So this leaves altshadowstack. If we want to allow userspace to handle a shstk overflow, I think we need altshadowstack. And I can easily imagine signal handling in a coroutine or user-threading evironment (Go? UMCG or whatever it's called?) wanting this. As noted, this obnoxious Andy person didn't like putting any shstk-related extensions in the FPU state. For better or for worse, altshadowstack is (I think) fundamentally a new API. No amount of ucontext magic is going to materialize an entire shadow stack out of nowhere when someone calls sigaltstack(). So the questions are: should we support altshadowstack from day one and, if so, what should it look like? If we want to be clever, we could attempt to make altstadowstack compatible with RSTORSSP. Signal delivery pushes a restore token to the old stack (hah! what if the old stack is full?) and pushes the RSTORSSP busy magic to the new stack, and sigreturn inverts it. Code that wants to return without sigreturn does it manually with RSTORSSP. (Assuming that I've understood the arcane RSTORSSP sequence right. Intel wins major points for documentation quality here.) Or we could invent our own scheme. In either case, I don't immediately see any reason that the ucontext needs to contain a shadow stack pointer. There's a delightful wart to consider, though. siglongjmp, at least as currently envisioned, can't return off an altshadowstack: the whole point of the INCSSP distance restrictions to to avoid incrementing right off the top of the current stack, but siglongjmp off an altshadowstack fundamentally switches stacks. So either siglongjmp off an altshadowstack needs to be illegal or it needs to work differently. (By incssp-ing to the top of the altshadowstack, then switching, then incssp-ing some more? How does it even find the top of the current altshadowstack?) And the plot thickens if one tries to siglongjmp off two nested altshadowstack-using signals in a single call. Fortunately, since altshadowstack is a new API, it's not entirely crazy to have different rules. So I don't have a complete or even almost complete design in mind, but I think we do need to make a conscious decision either to design this right or to skip it for v1. As for CRIU, I don't think anyone really expects a new kernel, running new userspace that takes advantage of features in the new kernel, to work with old CRIU. Upgrading to a SHSTK kernel should still allow using CRIU with non-SHSTK userspace, but I don't see how it's possible for CRIU to handle SHSTK without updates. We should certainly do our best to make CRIU's life easy, though. This doesn’t mean it can’t switch to a different > shadow stack or handle a nested signal, but it limits the possibility > for calling sigreturn with a totally different sigframe (like CRIU and > SROP attacks do). It should hopefully be a helpful, protective > limitation for most apps and I'm hoping CRIU can be fixed without > removing it. > > I am not aware of other limitations to signals (besides normal shadow > stack enforcement), but I could be missing it. And people's skepticism > is making me want to go back over it with more scrutiny. > >>> In much the same way as all the segment registers can be changed >>> leading to all the nasty bugs when the final 'return to user' code >>> traps in kernel when loading invalid segment registers or executing >>> iret. > > I don't think this is as difficult to avoid because userspace ssp has > its own register that should not be accessed at that point, but I have > not given this aspect enough analysis. Thanks for bringing it up. > >>> >>> Hmmm... do shadow stacks mean that longjmp() has to be a system >>> call? >> >> No. setjmp/longjmp save and restore shadow stack pointer. >> > > It sounds like it would help to write up in a lot more detail exactly > how all the signal and specialer stack manipulation scenarios work in > glibc. >
On Sun, 2022-02-06 at 13:42 +0000, David Laight wrote: > > I don't think this is as difficult to avoid because userspace ssp > > has > > its own register that should not be accessed at that point, but I > > have > > not given this aspect enough analysis. Thanks for bringing it up. > > So the user ssp isn't saved (or restored) by the trap entry/exit. > So it needs to be saved by the context switch code? > Much like the user segment registers? > So you are likely to get the same problems if restoring it can fault > in kernel (eg for a non-canonical address). PL3_SSP is lazily saved and restored by the FPU supervisor xsave code, which has its buffer in kernel memory. For the most part it is userspace instructions that use this register and they can only modify it in limited ways. It does look like IRET can cause a #CP if the PL3 SSP is not aligned, but only after RIP and CPL are set back to userspace. I'm not confident enough interpreting the specs to assert the specific behavior and will follow up internally to clarify.
On Mon, Feb 07, 2022 at 08:30:50AM -0800, Dave Hansen wrote: > On 2/6/22 23:20, Adrian Reber wrote: > >>> CRIU Support > >>> ------------ > >>> In the past there was some speculation on the mailing list about > >>> whether CRIU would need to be taught about CET. It turns out, it does. > >>> The first issue hit is that CRIU calls sigreturn directly from its > >>> “parasite code” that it injects into the dumper process. This violates > >>> this shadow stack implementation’s protection that intends to prevent > >>> attackers from doing this. > ... > >>From the CRIU side I can say that I would definitely like to see this > > resolved. CRIU just went through a similar exercise with rseq() being > > enabled in glibc and CI broke all around for us and other projects > > relying on CRIU. Although rseq() was around for a long time we were not > > aware of it but luckily 5.13 introduced a way to handle it for CRIU with > > ptrace. An environment variable existed but did not really help when > > CRIU is called somewhere in the middle of the container software stack. > > > >>From my point of view a solution not involving an environment variable > > would definitely be preferred. > > Have there been things like this for CRIU in the past? Something where > CRIU needs control but that's also security-sensitive? Generally CRIU requires (almost) root privileges to work, but I don't think it handles something as security sensitive and restrictive as shadow stacks. > Any thoughts on how you would _like_ to see this resolved? Ideally, CRIU will need a knob that will tell the kernel/CET machinery where the next RET will jump, along the lines of restore_signal_shadow_stack() AFAIU. But such a knob will immediately reduce the security value of the entire thing, and I don't have good ideas how to deal with it :(
On Tue, Feb 08, 2022 at 11:16:51AM +0200, Mike Rapoport wrote: > > > Any thoughts on how you would _like_ to see this resolved? > > Ideally, CRIU will need a knob that will tell the kernel/CET machinery > where the next RET will jump, along the lines of > restore_signal_shadow_stack() AFAIU. > > But such a knob will immediately reduce the security value of the entire > thing, and I don't have good ideas how to deal with it :( Probably a kind of latch in the task_struct which would trigger off once returt to a different address happened, thus we would be able to jump inside paratite code. Of course such trigger should be available under proper capability only.
On Mon, Feb 07 2022 at 17:31, Andy Lutomirski wrote: > So this leaves altshadowstack. If we want to allow userspace to handle > a shstk overflow, I think we need altshadowstack. And I can easily > imagine signal handling in a coroutine or user-threading evironment (Go? > UMCG or whatever it's called?) wanting this. As noted, this obnoxious > Andy person didn't like putting any shstk-related extensions in the FPU > state. > > For better or for worse, altshadowstack is (I think) fundamentally a new > API. No amount of ucontext magic is going to materialize an entire > shadow stack out of nowhere when someone calls sigaltstack(). So the > questions are: should we support altshadowstack from day one and, if so, > what should it look like? I think we should support them from day one. > So I don't have a complete or even almost complete design in mind, but I > think we do need to make a conscious decision either to design this > right or to skip it for v1. Skipping it might create a fundamental design fail situation as it might require changes to the shadow stack signal handling in general which becomes a nightmare once a non-altstack API is exposed. > As for CRIU, I don't think anyone really expects a new kernel, running > new userspace that takes advantage of features in the new kernel, to > work with old CRIU. Yes, CRIU needs updates, but what ensures that CRIU managed user space does not use SHSTK if CRIU is not updated yet? > Upgrading to a SHSTK kernel should still allow using CRIU with > non-SHSTK userspace, but I don't see how it's possible for CRIU to > handle SHSTK without updates. We should certainly do our best to make > CRIU's life easy, though. Handling CRIU with SHSTK enabled has to be part of the overall design otherwise we'll either end up with horrible hacks or with a requirement to change the V1 UAPI.... Thanks, tglx
On Tue, Feb 8, 2022, at 1:31 AM, Thomas Gleixner wrote: > On Mon, Feb 07 2022 at 17:31, Andy Lutomirski wrote: >> So this leaves altshadowstack. If we want to allow userspace to handle >> a shstk overflow, I think we need altshadowstack. And I can easily >> imagine signal handling in a coroutine or user-threading evironment (Go? >> UMCG or whatever it's called?) wanting this. As noted, this obnoxious >> Andy person didn't like putting any shstk-related extensions in the FPU >> state. >> >> For better or for worse, altshadowstack is (I think) fundamentally a new >> API. No amount of ucontext magic is going to materialize an entire >> shadow stack out of nowhere when someone calls sigaltstack(). So the >> questions are: should we support altshadowstack from day one and, if so, >> what should it look like? > > I think we should support them from day one. > >> So I don't have a complete or even almost complete design in mind, but I >> think we do need to make a conscious decision either to design this >> right or to skip it for v1. > > Skipping it might create a fundamental design fail situation as it might > require changes to the shadow stack signal handling in general which > becomes a nightmare once a non-altstack API is exposed. It would also expose a range of kernels in which shstk is on but programs that want altshadowstack don't have it. That would be annoying. > >> As for CRIU, I don't think anyone really expects a new kernel, running >> new userspace that takes advantage of features in the new kernel, to >> work with old CRIU. > > Yes, CRIU needs updates, but what ensures that CRIU managed user space > does not use SHSTK if CRIU is not updated yet? In some sense this is like any other feature. If a program uses timerfd but CRIU doesn't support timerfd, then it won't work. SHSTK is a bit unique because it's likely that all programs on a system will start using it all at once.
On Tue, Feb 8, 2022, at 1:29 AM, Cyrill Gorcunov wrote: > On Tue, Feb 08, 2022 at 11:16:51AM +0200, Mike Rapoport wrote: >> >> > Any thoughts on how you would _like_ to see this resolved? >> >> Ideally, CRIU will need a knob that will tell the kernel/CET machinery >> where the next RET will jump, along the lines of >> restore_signal_shadow_stack() AFAIU. >> >> But such a knob will immediately reduce the security value of the entire >> thing, and I don't have good ideas how to deal with it :( > > Probably a kind of latch in the task_struct which would trigger off once > returt to a different address happened, thus we would be able to jump inside > paratite code. Of course such trigger should be available under proper > capability only. I'm not fully in touch with how parasite, etc works. Are we talking about save or restore? If it's restore, what exactly does CRIU need to do? Is it just that CRIU needs to return out from its resume code into the to-be-resumed program without tripping CET? Would it be acceptable for CRIU to require that at least one shstk slot be free at save time? Or do we need a mechanism to atomically switch to a completely full shadow stack at resume? Off the top of my head, a sigreturn (or sigreturn-like mechanism) that is intended for use for altshadowstack could safely verify a token on the altshdowstack, possibly compare to something in ucontext (or not -- this isn't clearly necessary) and switch back to the previous stack. CRIU could use that too. Obviously CRIU will need a way to populate the relevant stacks, but WRUSS can be used for that, and I think this is a fundamental requirement for CRIU -- CRIU restore absolutely needs a way to write the saved shadow stack data into the shadow stack. So I think the only special capability that CRIU really needs is WRUSS, and we need to wire that up anyway.
On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote: > >> But such a knob will immediately reduce the security value of the entire > >> thing, and I don't have good ideas how to deal with it :( > > > > Probably a kind of latch in the task_struct which would trigger off once > > returt to a different address happened, thus we would be able to jump inside > > paratite code. Of course such trigger should be available under proper > > capability only. > > I'm not fully in touch with how parasite, etc works. Are we talking about save or restore? We use parasite code in question during checkpoint phase as far as I remember. push addr/lret trick is used to run "injected" code (code injection itself is done via ptrace) in compat mode at least. Dima, Andrei, I didn't look into this code for years already, do we still need to support compat mode at all? > If it's restore, what exactly does CRIU need to do? Is it just that CRIU needs to return > out from its resume code into the to-be-resumed program without tripping CET? Would it > be acceptable for CRIU to require that at least one shstk slot be free at save time? > Or do we need a mechanism to atomically switch to a completely full shadow stack at resume? > > Off the top of my head, a sigreturn (or sigreturn-like mechanism) that is intended for > use for altshadowstack could safely verify a token on the altshdowstack, possibly > compare to something in ucontext (or not -- this isn't clearly necessary) and switch > back to the previous stack. CRIU could use that too. Obviously CRIU will need a way > to populate the relevant stacks, but WRUSS can be used for that, and I think this > is a fundamental requirement for CRIU -- CRIU restore absolutely needs a way to write > the saved shadow stack data into the shadow stack. > > So I think the only special capability that CRIU really needs is WRUSS, and > we need to wire that up anyway. Thanks for these notes, Andy! I can't provide any sane answer here since didn't read tech spec for this feature yet :-)
On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: > On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote: > > > > But such a knob will immediately reduce the security value of > > > > the entire > > > > thing, and I don't have good ideas how to deal with it :( > > > > > > Probably a kind of latch in the task_struct which would trigger > > > off once > > > returt to a different address happened, thus we would be able to > > > jump inside > > > paratite code. Of course such trigger should be available under > > > proper > > > capability only. > > > > I'm not fully in touch with how parasite, etc works. Are we > > talking about save or restore? > > We use parasite code in question during checkpoint phase as far as I > remember. > push addr/lret trick is used to run "injected" code (code injection > itself is > done via ptrace) in compat mode at least. Dima, Andrei, I didn't look > into this code > for years already, do we still need to support compat mode at all? > > > If it's restore, what exactly does CRIU need to do? Is it just > > that CRIU needs to return > > out from its resume code into the to-be-resumed program without > > tripping CET? Would it > > be acceptable for CRIU to require that at least one shstk slot be > > free at save time? > > Or do we need a mechanism to atomically switch to a completely full > > shadow stack at resume? > > > > Off the top of my head, a sigreturn (or sigreturn-like mechanism) > > that is intended for > > use for altshadowstack could safely verify a token on the > > altshdowstack, possibly > > compare to something in ucontext (or not -- this isn't clearly > > necessary) and switch > > back to the previous stack. CRIU could use that too. Obviously > > CRIU will need a way > > to populate the relevant stacks, but WRUSS can be used for that, > > and I think this > > is a fundamental requirement for CRIU -- CRIU restore absolutely > > needs a way to write > > the saved shadow stack data into the shadow stack. Still wrapping my head around the CRIU save and restore steps, but another general approach might be to give ptrace the ability to temporarily pause/resume/set CET enablement and SSP for a stopped thread. Then injected code doesn't need to jump through any hoops or possibly run into road blocks. I'm not sure how much this opens things up if the thread has to be stopped... Cyrill, could it fit into the CRIU pause and resume flow? What action causes the final resuming of execution of the restored process for checkpointing and for restore? Wondering if we could somehow make CET re-enable exactly then. And I guess this also needs a way to create shadow stack allocations at a specific address to match where they were in the dumped process. That is missing in this series. > > > > So I think the only special capability that CRIU really needs is > > WRUSS, and > > we need to wire that up anyway. > > Thanks for these notes, Andy! I can't provide any sane answer here > since didn't > read tech spec for this feature yet :-)
On Wed, Feb 09, 2022 at 02:18:42AM +0000, Edgecombe, Rick P wrote: ... > > Still wrapping my head around the CRIU save and restore steps, but > another general approach might be to give ptrace the ability to > temporarily pause/resume/set CET enablement and SSP for a stopped > thread. Then injected code doesn't need to jump through any hoops or > possibly run into road blocks. I'm not sure how much this opens things > up if the thread has to be stopped... > > Cyrill, could it fit into the CRIU pause and resume flow? What action > causes the final resuming of execution of the restored process for > checkpointing and for restore? Wondering if we could somehow make CET > re-enable exactly then. > > And I guess this also needs a way to create shadow stack allocations at > a specific address to match where they were in the dumped process. That > is missing in this series. Thanks Rick! This sounds like an option. I need a couple of days to refresh my memory about criu internals. Let me CC a few current active criu developers (CC list is already big enough though:), maybe this will speedup the procedure.
Hi Rick, On Wed, Feb 09, 2022 at 02:18:42AM +0000, Edgecombe, Rick P wrote: > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: > > On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote: > > > > > But such a knob will immediately reduce the security value of > > > > > the entire > > > > > thing, and I don't have good ideas how to deal with it :( > > > > > > > > Probably a kind of latch in the task_struct which would trigger > > > > off once > > > > returt to a different address happened, thus we would be able to > > > > jump inside > > > > paratite code. Of course such trigger should be available under > > > > proper > > > > capability only. > > > > > > I'm not fully in touch with how parasite, etc works. Are we > > > talking about save or restore? > > > > We use parasite code in question during checkpoint phase as far as I > > remember. > > push addr/lret trick is used to run "injected" code (code injection > > itself is > > done via ptrace) in compat mode at least. Dima, Andrei, I didn't look > > into this code > > for years already, do we still need to support compat mode at all? > > > > > If it's restore, what exactly does CRIU need to do? Is it just > > > that CRIU needs to return > > > out from its resume code into the to-be-resumed program without > > > tripping CET? Would it > > > be acceptable for CRIU to require that at least one shstk slot be > > > free at save time? > > > Or do we need a mechanism to atomically switch to a completely full > > > shadow stack at resume? > > > > > > Off the top of my head, a sigreturn (or sigreturn-like mechanism) > > > that is intended for > > > use for altshadowstack could safely verify a token on the > > > altshdowstack, possibly > > > compare to something in ucontext (or not -- this isn't clearly > > > necessary) and switch > > > back to the previous stack. CRIU could use that too. Obviously > > > CRIU will need a way > > > to populate the relevant stacks, but WRUSS can be used for that, > > > and I think this > > > is a fundamental requirement for CRIU -- CRIU restore absolutely > > > needs a way to write > > > the saved shadow stack data into the shadow stack. > > Still wrapping my head around the CRIU save and restore steps, but > another general approach might be to give ptrace the ability to > temporarily pause/resume/set CET enablement and SSP for a stopped > thread. Then injected code doesn't need to jump through any hoops or > possibly run into road blocks. I'm not sure how much this opens things > up if the thread has to be stopped... IIRC, criu dump does something like this: * Stop the process being dumped (victim) with ptrace * Inject parasite code and data into the victim, again with ptrace. Among other things the parasite data contains a sigreturn frame with saved victim state. * Resume the victim process, which will run parasite code now. * When parasite finishes it uses that frame to sigreturn to normal victim execution So, my feeling is that for dump side WRUSS should be enough. > Cyrill, could it fit into the CRIU pause and resume flow? What action > causes the final resuming of execution of the restored process for > checkpointing and for restore? Wondering if we could somehow make CET > re-enable exactly then. > > And I guess this also needs a way to create shadow stack allocations at > a specific address to match where they were in the dumped process. That > is missing in this series. Yes, criu restore will need to recreate shadow stack mappings. Currently, we recreate the restored process (target) address space based on /proc/pid/maps and /proc/pid/smaps. CRIU preserves the virtual addresses and VMA flags. The relevant steps of restore process can be summarised as: * Clone() the target process tree * Recreate VMAs with the needed size and flags, but not necessarily at the correct place yet * Partially populate memory data from the saved images * Move VMAs to their exact addresses * Complete restoring the data * Create a frame for sigreturn and jump to the target. Here, the stack used after sigreturn contains the data that was captured during dump and it entirely different from what shadow stack will contain. There are several points when the target threads are stopped, so pausing/resuming CET may help. > > > So I think the only special capability that CRIU really needs is > > > WRUSS, and > > > we need to wire that up anyway. > > > > Thanks for these notes, Andy! I can't provide any sane answer here > > since didn't > > read tech spec for this feature yet :-) > > > >
On 2/8/22 18:18, Edgecombe, Rick P wrote: > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: >> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote: >>>>> But such a knob will immediately reduce the security value of >>>>> the entire >>>>> thing, and I don't have good ideas how to deal with it :( >>>> >>>> Probably a kind of latch in the task_struct which would trigger >>>> off once >>>> returt to a different address happened, thus we would be able to >>>> jump inside >>>> paratite code. Of course such trigger should be available under >>>> proper >>>> capability only. >>> >>> I'm not fully in touch with how parasite, etc works. Are we >>> talking about save or restore? >> >> We use parasite code in question during checkpoint phase as far as I >> remember. >> push addr/lret trick is used to run "injected" code (code injection >> itself is >> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look >> into this code >> for years already, do we still need to support compat mode at all? >> >>> If it's restore, what exactly does CRIU need to do? Is it just >>> that CRIU needs to return >>> out from its resume code into the to-be-resumed program without >>> tripping CET? Would it >>> be acceptable for CRIU to require that at least one shstk slot be >>> free at save time? >>> Or do we need a mechanism to atomically switch to a completely full >>> shadow stack at resume? >>> >>> Off the top of my head, a sigreturn (or sigreturn-like mechanism) >>> that is intended for >>> use for altshadowstack could safely verify a token on the >>> altshdowstack, possibly >>> compare to something in ucontext (or not -- this isn't clearly >>> necessary) and switch >>> back to the previous stack. CRIU could use that too. Obviously >>> CRIU will need a way >>> to populate the relevant stacks, but WRUSS can be used for that, >>> and I think this >>> is a fundamental requirement for CRIU -- CRIU restore absolutely >>> needs a way to write >>> the saved shadow stack data into the shadow stack. > > Still wrapping my head around the CRIU save and restore steps, but > another general approach might be to give ptrace the ability to > temporarily pause/resume/set CET enablement and SSP for a stopped > thread. Then injected code doesn't need to jump through any hoops or > possibly run into road blocks. I'm not sure how much this opens things > up if the thread has to be stopped... Hmm, that's maybe not insane. An alternative would be to add a bona fide ptrace call-a-function mechanism. I can think of two potentially usable variants: 1. Straight call. PTRACE_CALL_FUNCTION(addr) just emulates CALL addr, shadow stack push and all. 2. Signal-style. PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal frame just like a real signal is being delivered with the specified handler. There could be a variant to opt-in to also using a specified altstack and altshadowstack. 2 would be more expensive but would avoid the need for much in the way of asm magic. The injected code could be plain C (or Rust or Zig or whatever). All of this only really handles save, not restore. I don't understand restore enough to fully understand the issue. --Andy
On Wed, Feb 9, 2022 at 6:37 PM Andy Lutomirski <luto@kernel.org> wrote: > > On 2/8/22 18:18, Edgecombe, Rick P wrote: > > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: > >> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote: > >>>>> But such a knob will immediately reduce the security value of > >>>>> the entire > >>>>> thing, and I don't have good ideas how to deal with it :( > >>>> > >>>> Probably a kind of latch in the task_struct which would trigger > >>>> off once > >>>> returt to a different address happened, thus we would be able to > >>>> jump inside > >>>> paratite code. Of course such trigger should be available under > >>>> proper > >>>> capability only. > >>> > >>> I'm not fully in touch with how parasite, etc works. Are we > >>> talking about save or restore? > >> > >> We use parasite code in question during checkpoint phase as far as I > >> remember. > >> push addr/lret trick is used to run "injected" code (code injection > >> itself is > >> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look > >> into this code > >> for years already, do we still need to support compat mode at all? > >> > >>> If it's restore, what exactly does CRIU need to do? Is it just > >>> that CRIU needs to return > >>> out from its resume code into the to-be-resumed program without > >>> tripping CET? Would it > >>> be acceptable for CRIU to require that at least one shstk slot be > >>> free at save time? > >>> Or do we need a mechanism to atomically switch to a completely full > >>> shadow stack at resume? > >>> > >>> Off the top of my head, a sigreturn (or sigreturn-like mechanism) > >>> that is intended for > >>> use for altshadowstack could safely verify a token on the > >>> altshdowstack, possibly > >>> compare to something in ucontext (or not -- this isn't clearly > >>> necessary) and switch > >>> back to the previous stack. CRIU could use that too. Obviously > >>> CRIU will need a way > >>> to populate the relevant stacks, but WRUSS can be used for that, > >>> and I think this > >>> is a fundamental requirement for CRIU -- CRIU restore absolutely > >>> needs a way to write > >>> the saved shadow stack data into the shadow stack. > > > > Still wrapping my head around the CRIU save and restore steps, but > > another general approach might be to give ptrace the ability to > > temporarily pause/resume/set CET enablement and SSP for a stopped > > thread. Then injected code doesn't need to jump through any hoops or > > possibly run into road blocks. I'm not sure how much this opens things > > up if the thread has to be stopped... > > Hmm, that's maybe not insane. > > An alternative would be to add a bona fide ptrace call-a-function > mechanism. I can think of two potentially usable variants: > > 1. Straight call. PTRACE_CALL_FUNCTION(addr) just emulates CALL addr, > shadow stack push and all. > > 2. Signal-style. PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal > frame just like a real signal is being delivered with the specified > handler. There could be a variant to opt-in to also using a specified > altstack and altshadowstack. > > 2 would be more expensive but would avoid the need for much in the way > of asm magic. The injected code could be plain C (or Rust or Zig or > whatever). > > All of this only really handles save, not restore. I don't understand > restore enough to fully understand the issue. FWIW, CET enabled GDB can call a function in a CET enabled process. Adding Felix who may know more about it.
> -----Original Message----- > From: H.J. Lu <hjl.tools@gmail.com> > Sent: Donnerstag, 10. Februar 2022 03:54 > To: Lutomirski, Andy <luto@kernel.org>; Willgerodt, Felix > <felix.willgerodt@intel.com> > Cc: Edgecombe, Rick P <rick.p.edgecombe@intel.com>; gorcunov@gmail.com; > bsingharora@gmail.com; hpa@zytor.com; Syromiatnikov, Eugene > <esyr@redhat.com>; peterz@infradead.org; rdunlap@infradead.org; > keescook@chromium.org; 0x7f454c46@gmail.com; > dave.hansen@linux.intel.com; kirill.shutemov@linux.intel.com; Eranian, > Stephane <eranian@google.com>; linux-mm@kvack.org; adrian@lisas.de; > fweimer@redhat.com; nadav.amit@gmail.com; jannh@google.com; > avagin@gmail.com; linux-arch@vger.kernel.org; kcc@google.com; > bp@alien8.de; oleg@redhat.com; pavel@ucw.cz; linux-doc@vger.kernel.org; > arnd@arndb.de; Moreira, Joao <joao.moreira@intel.com>; tglx@linutronix.de; > mike.kravetz@oracle.com; x86@kernel.org; Yang, Weijiang > <weijiang.yang@intel.com>; rppt@kernel.org; Dave.Martin@arm.com; > john.allen@amd.com; mingo@redhat.com; Hansen, Dave > <dave.hansen@intel.com>; corbet@lwn.net; linux-kernel@vger.kernel.org; > linux-api@vger.kernel.org; Shankar, Ravi V <ravi.v.shankar@intel.com> > Subject: Re: [PATCH 00/35] Shadow stacks for userspace > > On Wed, Feb 9, 2022 at 6:37 PM Andy Lutomirski <luto@kernel.org> wrote: > > > > On 2/8/22 18:18, Edgecombe, Rick P wrote: > > > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: > > >> On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote: > > >>>>> But such a knob will immediately reduce the security value of > > >>>>> the entire > > >>>>> thing, and I don't have good ideas how to deal with it :( > > >>>> > > >>>> Probably a kind of latch in the task_struct which would trigger > > >>>> off once > > >>>> returt to a different address happened, thus we would be able to > > >>>> jump inside > > >>>> paratite code. Of course such trigger should be available under > > >>>> proper > > >>>> capability only. > > >>> > > >>> I'm not fully in touch with how parasite, etc works. Are we > > >>> talking about save or restore? > > >> > > >> We use parasite code in question during checkpoint phase as far as I > > >> remember. > > >> push addr/lret trick is used to run "injected" code (code injection > > >> itself is > > >> done via ptrace) in compat mode at least. Dima, Andrei, I didn't look > > >> into this code > > >> for years already, do we still need to support compat mode at all? > > >> > > >>> If it's restore, what exactly does CRIU need to do? Is it just > > >>> that CRIU needs to return > > >>> out from its resume code into the to-be-resumed program without > > >>> tripping CET? Would it > > >>> be acceptable for CRIU to require that at least one shstk slot be > > >>> free at save time? > > >>> Or do we need a mechanism to atomically switch to a completely full > > >>> shadow stack at resume? > > >>> > > >>> Off the top of my head, a sigreturn (or sigreturn-like mechanism) > > >>> that is intended for > > >>> use for altshadowstack could safely verify a token on the > > >>> altshdowstack, possibly > > >>> compare to something in ucontext (or not -- this isn't clearly > > >>> necessary) and switch > > >>> back to the previous stack. CRIU could use that too. Obviously > > >>> CRIU will need a way > > >>> to populate the relevant stacks, but WRUSS can be used for that, > > >>> and I think this > > >>> is a fundamental requirement for CRIU -- CRIU restore absolutely > > >>> needs a way to write > > >>> the saved shadow stack data into the shadow stack. > > > > > > Still wrapping my head around the CRIU save and restore steps, but > > > another general approach might be to give ptrace the ability to > > > temporarily pause/resume/set CET enablement and SSP for a stopped > > > thread. Then injected code doesn't need to jump through any hoops or > > > possibly run into road blocks. I'm not sure how much this opens things > > > up if the thread has to be stopped... > > > > Hmm, that's maybe not insane. > > > > An alternative would be to add a bona fide ptrace call-a-function > > mechanism. I can think of two potentially usable variants: > > > > 1. Straight call. PTRACE_CALL_FUNCTION(addr) just emulates CALL addr, > > shadow stack push and all. > > > > 2. Signal-style. PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal > > frame just like a real signal is being delivered with the specified > > handler. There could be a variant to opt-in to also using a specified > > altstack and altshadowstack. > > > > 2 would be more expensive but would avoid the need for much in the way > > of asm magic. The injected code could be plain C (or Rust or Zig or > > whatever). > > > > All of this only really handles save, not restore. I don't understand > > restore enough to fully understand the issue. > > FWIW, CET enabled GDB can call a function in a CET enabled process. > Adding Felix who may know more about it. > > > -- > H.J. I don't know much about CRIU or kernel code, so I will stick to explaining what our GDB patches for CET (not upstream yet) currently do. GDB does inferior calls by setting the PC to the function it wants to call and by manipulating the return address. It basically creates a dummy frame and runs that on top of where it currently is. To enable this for CET, our GDB CET patches push onto the shstk of the inferior by writing to the inferiors memory, if it isn't out of range, and by incrementing the SSP (using NT_X86_CET), both via ptrace. (GDB also has a command called 'return', which basically returns early from a function. Here GDB just decrements the SSP via ptrace.) This was based on earlier versions of the kernel patches. If the interface needs to change or if new interfaces would be available to do this better, that is fine from our pov. We didn't upstream this yet. Also, if you have any concerns with this approach please let me know. Regards, Felix Intel Deutschland GmbH Registered Address: Am Campeon 10, 85579 Neubiberg, Germany Tel: +49 89 99 8853-0, www.intel.de <http://www.intel.de> Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva Chairperson of the Supervisory Board: Nicole Lau Registered Office: Munich Commercial Register: Amtsgericht Muenchen HRB 186928
On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote: > On 2/8/22 18:18, Edgecombe, Rick P wrote: > > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: > > > On Tue, Feb 08, 2022 at 08:21:20AM -0800, Andy Lutomirski wrote: > > > > > > But such a knob will immediately reduce the security value of > > > > > > the entire > > > > > > thing, and I don't have good ideas how to deal with it :( > > > > > > > > > > Probably a kind of latch in the task_struct which would trigger > > > > > off once > > > > > returt to a different address happened, thus we would be able to > > > > > jump inside > > > > > paratite code. Of course such trigger should be available under > > > > > proper > > > > > capability only. > > > > > > > > I'm not fully in touch with how parasite, etc works. Are we > > > > talking about save or restore? > > > > > > We use parasite code in question during checkpoint phase as far as I > > > remember. > > > push addr/lret trick is used to run "injected" code (code injection > > > itself is > > > done via ptrace) in compat mode at least. Dima, Andrei, I didn't look > > > into this code > > > for years already, do we still need to support compat mode at all? > > > > > > > If it's restore, what exactly does CRIU need to do? Is it just > > > > that CRIU needs to return > > > > out from its resume code into the to-be-resumed program without > > > > tripping CET? Would it > > > > be acceptable for CRIU to require that at least one shstk slot be > > > > free at save time? > > > > Or do we need a mechanism to atomically switch to a completely full > > > > shadow stack at resume? > > > > > > > > Off the top of my head, a sigreturn (or sigreturn-like mechanism) > > > > that is intended for > > > > use for altshadowstack could safely verify a token on the > > > > altshdowstack, possibly > > > > compare to something in ucontext (or not -- this isn't clearly > > > > necessary) and switch > > > > back to the previous stack. CRIU could use that too. Obviously > > > > CRIU will need a way > > > > to populate the relevant stacks, but WRUSS can be used for that, > > > > and I think this > > > > is a fundamental requirement for CRIU -- CRIU restore absolutely > > > > needs a way to write > > > > the saved shadow stack data into the shadow stack. > > > > Still wrapping my head around the CRIU save and restore steps, but > > another general approach might be to give ptrace the ability to > > temporarily pause/resume/set CET enablement and SSP for a stopped > > thread. Then injected code doesn't need to jump through any hoops or > > possibly run into road blocks. I'm not sure how much this opens things > > up if the thread has to be stopped... > > Hmm, that's maybe not insane. > > An alternative would be to add a bona fide ptrace call-a-function mechanism. > I can think of two potentially usable variants: > > 1. Straight call. PTRACE_CALL_FUNCTION(addr) just emulates CALL addr, > shadow stack push and all. > > 2. Signal-style. PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal > frame just like a real signal is being delivered with the specified handler. > There could be a variant to opt-in to also using a specified altstack and > altshadowstack. I think this would be ideal. In CRIU, the parasite code is executed in the "daemon" mode and returns back via sigreturn. Right now, CRIU needs to generate a signal frame. If I understand your idea right, the signal frame will be generated by the kernel. > > 2 would be more expensive but would avoid the need for much in the way of > asm magic. The injected code could be plain C (or Rust or Zig or whatever). > > All of this only really handles save, not restore. I don't understand > restore enough to fully understand the issue. In a few words, it works like this: CRIU restores all required resources and prepares a signal frame with a target process state, then it switches to a small PIE blob, where it restores vma-s and calls rt_sigreturn. > > --Andy
On Thu, Feb 10, 2022 at 11:41:16PM -0800, avagin@gmail.com wrote: > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote: > > > > An alternative would be to add a bona fide ptrace call-a-function mechanism. > > I can think of two potentially usable variants: > > > > 1. Straight call. PTRACE_CALL_FUNCTION(addr) just emulates CALL addr, > > shadow stack push and all. > > > > 2. Signal-style. PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal > > frame just like a real signal is being delivered with the specified handler. > > There could be a variant to opt-in to also using a specified altstack and > > altshadowstack. > > I think this would be ideal. In CRIU, the parasite code is executed in > the "daemon" mode and returns back via sigreturn. Right now, CRIU needs > to generate a signal frame. If I understand your idea right, the signal > frame will be generated by the kernel. > > > > > 2 would be more expensive but would avoid the need for much in the way of > > asm magic. The injected code could be plain C (or Rust or Zig or whatever). > > > > All of this only really handles save, not restore. I don't understand > > restore enough to fully understand the issue. > > In a few words, it works like this: CRIU restores all required resources > and prepares a signal frame with a target process state, then it > switches to a small PIE blob, where it restores vma-s and calls > rt_sigreturn. I think it's also important to note that the stack is restored as a part of the process memory, i.e. its contents is read from the images. > > > > --Andy
On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote: > On 2/8/22 18:18, Edgecombe, Rick P wrote: > > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: > > > > Still wrapping my head around the CRIU save and restore steps, but > > another general approach might be to give ptrace the ability to > > temporarily pause/resume/set CET enablement and SSP for a stopped > > thread. Then injected code doesn't need to jump through any hoops or > > possibly run into road blocks. I'm not sure how much this opens things > > up if the thread has to be stopped... > > Hmm, that's maybe not insane. > > An alternative would be to add a bona fide ptrace call-a-function mechanism. > I can think of two potentially usable variants: > > 1. Straight call. PTRACE_CALL_FUNCTION(addr) just emulates CALL addr, > shadow stack push and all. > > 2. Signal-style. PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal > frame just like a real signal is being delivered with the specified handler. > There could be a variant to opt-in to also using a specified altstack and > altshadowstack. Using ptrace() will not solve CRIU's issue with sigreturn because sigreturn is called from the victim context rather than from the criu process that controls the dump and uses ptrace(). Even with the current shadow stack interface Rick proposed, CRIU can restore the victim using ptrace without any additional knobs, but we loose an important ability to "self-cure" the victim from the parasite in case anything goes wrong with criu control process. Moreover, the issue with backward compatibility is not with ptrace but with sigreturn and it seems that criu is not its only user. So I think we need a way to allow direct calls to sigreturn that will bypass check and restore of the shadow stack. I only know that there are sigreturn users except criu that show up in Debian codesearch, and I don't know how do they use it, but for full backward compatibility we'd need to have no-CET sigreturn as default and add a new, say UC_CHECK_SHSTK flag to rt_sigframe->uc.uc_flags or even a new syscall for libc signal handling. > 2 would be more expensive but would avoid the need for much in the way of > asm magic. The injected code could be plain C (or Rust or Zig or whatever). > > All of this only really handles save, not restore. I don't understand > restore enough to fully understand the issue. Restore is more complex, will get to it later. > --Andy
On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote: > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote: >> On 2/8/22 18:18, Edgecombe, Rick P wrote: >> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: >> > >> > Still wrapping my head around the CRIU save and restore steps, but >> > another general approach might be to give ptrace the ability to >> > temporarily pause/resume/set CET enablement and SSP for a stopped >> > thread. Then injected code doesn't need to jump through any hoops or >> > possibly run into road blocks. I'm not sure how much this opens things >> > up if the thread has to be stopped... >> >> Hmm, that's maybe not insane. >> >> An alternative would be to add a bona fide ptrace call-a-function mechanism. >> I can think of two potentially usable variants: >> >> 1. Straight call. PTRACE_CALL_FUNCTION(addr) just emulates CALL addr, >> shadow stack push and all. >> >> 2. Signal-style. PTRACE_CALL_FUNCTION_SIGFRAME injects an actual signal >> frame just like a real signal is being delivered with the specified handler. >> There could be a variant to opt-in to also using a specified altstack and >> altshadowstack. > > Using ptrace() will not solve CRIU's issue with sigreturn because sigreturn > is called from the victim context rather than from the criu process that > controls the dump and uses ptrace(). I'm not sure I follow. > > Even with the current shadow stack interface Rick proposed, CRIU can restore > the victim using ptrace without any additional knobs, but we loose an > important ability to "self-cure" the victim from the parasite in case > anything goes wrong with criu control process. > > Moreover, the issue with backward compatibility is not with ptrace but with > sigreturn and it seems that criu is not its only user. So we need an ability for a tracer to cause the tracee to call a function and to return successfully. Apparently a gdb branch can already do this with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the trick. I don't see why we need a sigretur-but-dont-verify -- we just need this mechanism to create a frame such that sigreturn actually works. --Andy
On Mon, Feb 28, 2022 at 12:30:41PM -0800, Andy Lutomirski wrote: > > > On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote: > > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote: > >> On 2/8/22 18:18, Edgecombe, Rick P wrote: > >> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: > >> > > > > > Even with the current shadow stack interface Rick proposed, CRIU can restore > > the victim using ptrace without any additional knobs, but we loose an > > important ability to "self-cure" the victim from the parasite in case > > anything goes wrong with criu control process. > > > > Moreover, the issue with backward compatibility is not with ptrace but with > > sigreturn and it seems that criu is not its only user. > > So we need an ability for a tracer to cause the tracee to call a function > and to return successfully. Apparently a gdb branch can already do this > with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the > trick. I don't see why we need a sigretur-but-dont-verify -- we just > need this mechanism to create a frame such that sigreturn actually works. If I understand correctly, PTRACE_CALL_FUNCTION_SIGFRAME() injects a frame into the tracee and makes the tracee call sigreturn. I.e. the tracee is stopped and this is used pretty much as PTRACE_CONT or PTRACE_SYSCALL. In such case this defeats the purpose of sigreturn in CRIU because it is called asynchronously by the tracee when the tracer is about to detach or even already detached. For synchronous use-case PTRACE_SETREGSET will be enough, the rest of the sigframe can be restored by other means. And with 'criu restore' there may be even no tracer by the time sigreturn is called. > --Andy
On Mon, Feb 28, 2022, at 1:30 PM, Mike Rapoport wrote: > On Mon, Feb 28, 2022 at 12:30:41PM -0800, Andy Lutomirski wrote: >> >> >> On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote: >> > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote: >> >> On 2/8/22 18:18, Edgecombe, Rick P wrote: >> >> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: >> >> > >> > >> > Even with the current shadow stack interface Rick proposed, CRIU can restore >> > the victim using ptrace without any additional knobs, but we loose an >> > important ability to "self-cure" the victim from the parasite in case >> > anything goes wrong with criu control process. >> > >> > Moreover, the issue with backward compatibility is not with ptrace but with >> > sigreturn and it seems that criu is not its only user. >> >> So we need an ability for a tracer to cause the tracee to call a function >> and to return successfully. Apparently a gdb branch can already do this >> with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the >> trick. I don't see why we need a sigretur-but-dont-verify -- we just >> need this mechanism to create a frame such that sigreturn actually works. > > If I understand correctly, PTRACE_CALL_FUNCTION_SIGFRAME() injects a frame > into the tracee and makes the tracee call sigreturn. > I.e. the tracee is stopped and this is used pretty much as PTRACE_CONT or > PTRACE_SYSCALL. > > In such case this defeats the purpose of sigreturn in CRIU because it is > called asynchronously by the tracee when the tracer is about to detach or > even already detached. The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal frame onto the stack and call a function. That function should then be able to call sigreturn just like any normal signal handler. There should be no requirement that the tracer still be attached when this happens, although the code calling sigreturn still needs to be mapped. (Specifically, on modern arches, the user runtime is expected to provide a "restorer" that calls sigreturn. A hypotheticall PTRACE_CALL_FUNCTION_SIGFRAME would either need to call sigreturn directly or provide a restorer.)
On Mon, Feb 28, 2022 at 02:55:30PM -0800, Andy Lutomirski wrote: > > > On Mon, Feb 28, 2022, at 1:30 PM, Mike Rapoport wrote: > > On Mon, Feb 28, 2022 at 12:30:41PM -0800, Andy Lutomirski wrote: > >> > >> > >> On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote: > >> > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote: > >> >> On 2/8/22 18:18, Edgecombe, Rick P wrote: > >> >> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: > >> >> > > >> > > >> > Even with the current shadow stack interface Rick proposed, CRIU can restore > >> > the victim using ptrace without any additional knobs, but we loose an > >> > important ability to "self-cure" the victim from the parasite in case > >> > anything goes wrong with criu control process. > >> > > >> > Moreover, the issue with backward compatibility is not with ptrace but with > >> > sigreturn and it seems that criu is not its only user. > >> > >> So we need an ability for a tracer to cause the tracee to call a function > >> and to return successfully. Apparently a gdb branch can already do this > >> with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the > >> trick. I don't see why we need a sigretur-but-dont-verify -- we just > >> need this mechanism to create a frame such that sigreturn actually works. > > > > If I understand correctly, PTRACE_CALL_FUNCTION_SIGFRAME() injects a frame > > into the tracee and makes the tracee call sigreturn. > > I.e. the tracee is stopped and this is used pretty much as PTRACE_CONT or > > PTRACE_SYSCALL. > > > > In such case this defeats the purpose of sigreturn in CRIU because it is > > called asynchronously by the tracee when the tracer is about to detach or > > even already detached. > > The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal frame onto > the stack and call a function. That function should then be able to call > sigreturn just like any normal signal handler. Ok, let me reiterate. We have a seized and stopped tracee, use PTRACE_CALL_FUNCTION_SIGFRAME to push a signal frame onto the tracee's stack so that sigreturn could use that frame, then set the tracee %rip to the function we'd like to call and then we PTRACE_CONT the tracee. Tracee continues to execute the parasite code that calls sigreturn to clean up and restore the tracee process. PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the shadow stack, just like setup_rt_frame() does, so that sys_rt_sigreturn() won't bail out at restore_signal_shadow_stack(). The only thing that CRIU actually needs is to push a restore token to the shadow stack, so for us a ptrace call that does that would be ideal.
On Thu, Mar 3, 2022 at 11:43 AM Mike Rapoport <rppt@kernel.org> wrote: > > On Mon, Feb 28, 2022 at 02:55:30PM -0800, Andy Lutomirski wrote: > > > > > > On Mon, Feb 28, 2022, at 1:30 PM, Mike Rapoport wrote: > > > On Mon, Feb 28, 2022 at 12:30:41PM -0800, Andy Lutomirski wrote: > > >> > > >> > > >> On Mon, Feb 28, 2022, at 12:27 PM, Mike Rapoport wrote: > > >> > On Wed, Feb 09, 2022 at 06:37:53PM -0800, Andy Lutomirski wrote: > > >> >> On 2/8/22 18:18, Edgecombe, Rick P wrote: > > >> >> > On Tue, 2022-02-08 at 20:02 +0300, Cyrill Gorcunov wrote: > > >> >> > > > >> > > > >> > Even with the current shadow stack interface Rick proposed, CRIU can restore > > >> > the victim using ptrace without any additional knobs, but we loose an > > >> > important ability to "self-cure" the victim from the parasite in case > > >> > anything goes wrong with criu control process. > > >> > > > >> > Moreover, the issue with backward compatibility is not with ptrace but with > > >> > sigreturn and it seems that criu is not its only user. > > >> > > >> So we need an ability for a tracer to cause the tracee to call a function > > >> and to return successfully. Apparently a gdb branch can already do this > > >> with shstk, and my PTRACE_CALL_FUNCTION_SIGFRAME should also do the > > >> trick. I don't see why we need a sigretur-but-dont-verify -- we just > > >> need this mechanism to create a frame such that sigreturn actually works. > > > > > > If I understand correctly, PTRACE_CALL_FUNCTION_SIGFRAME() injects a frame > > > into the tracee and makes the tracee call sigreturn. > > > I.e. the tracee is stopped and this is used pretty much as PTRACE_CONT or > > > PTRACE_SYSCALL. > > > > > > In such case this defeats the purpose of sigreturn in CRIU because it is > > > called asynchronously by the tracee when the tracer is about to detach or > > > even already detached. > > > > The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal frame onto > > the stack and call a function. That function should then be able to call > > sigreturn just like any normal signal handler. > > Ok, let me reiterate. > > We have a seized and stopped tracee, use PTRACE_CALL_FUNCTION_SIGFRAME > to push a signal frame onto the tracee's stack so that sigreturn could use > that frame, then set the tracee %rip to the function we'd like to call and > then we PTRACE_CONT the tracee. Tracee continues to execute the parasite > code that calls sigreturn to clean up and restore the tracee process. > > PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the shadow > stack, just like setup_rt_frame() does, so that sys_rt_sigreturn() won't > bail out at restore_signal_shadow_stack(). That is the intent. > > The only thing that CRIU actually needs is to push a restore token to the > shadow stack, so for us a ptrace call that does that would be ideal. > That seems fine too. The main benefit of the SIGFRAME approach is that, AIUI, CRIU eventually constructs a signal frame anyway, and getting one ready-made seems plausibly helpful. But if it's not actually that useful, then there's no need to do it.
On Thu, 2022-03-03 at 15:00 -0800, Andy Lutomirski wrote: > > > The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal > > > frame onto > > > the stack and call a function. That function should then be able > > > to call > > > sigreturn just like any normal signal handler. > > > > Ok, let me reiterate. > > > > We have a seized and stopped tracee, use > > PTRACE_CALL_FUNCTION_SIGFRAME > > to push a signal frame onto the tracee's stack so that sigreturn > > could use > > that frame, then set the tracee %rip to the function we'd like to > > call and > > then we PTRACE_CONT the tracee. Tracee continues to execute the > > parasite > > code that calls sigreturn to clean up and restore the tracee > > process. > > > > PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the > > shadow > > stack, just like setup_rt_frame() does, so that sys_rt_sigreturn() > > won't > > bail out at restore_signal_shadow_stack(). > > That is the intent. > > > > > The only thing that CRIU actually needs is to push a restore token > > to the > > shadow stack, so for us a ptrace call that does that would be > > ideal. > > > > That seems fine too. The main benefit of the SIGFRAME approach is > that, AIUI, CRIU eventually constructs a signal frame anyway, and > getting one ready-made seems plausibly helpful. But if it's not > actually that useful, then there's no need to do it. I guess pushing a token to the shadow stack could be done like GDB does calls, with just the basic CET ptrace support. So do we even need a specific push token operation? I suppose if CRIU already used some kernel encapsulation of a seized call/return operation it would have been easier to make CRIU work with the introduction of CET. But the design of CRIU seems to be to have the kernel expose just enough and then tie it all together in userspace. Andy, did you have any other usages for PTRACE_CALL_FUNCTION in mind? I couldn't find any other CRIU-like users of sigreturn in the debian source search (but didn't read all 819 pages that come up with "sigreturn"). It seemed to be mostly seccomp sandbox references.
On 3/3/22 17:30, Edgecombe, Rick P wrote: > On Thu, 2022-03-03 at 15:00 -0800, Andy Lutomirski wrote: >>>> The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal >>>> frame onto >>>> the stack and call a function. That function should then be able >>>> to call >>>> sigreturn just like any normal signal handler. >>> >>> Ok, let me reiterate. >>> >>> We have a seized and stopped tracee, use >>> PTRACE_CALL_FUNCTION_SIGFRAME >>> to push a signal frame onto the tracee's stack so that sigreturn >>> could use >>> that frame, then set the tracee %rip to the function we'd like to >>> call and >>> then we PTRACE_CONT the tracee. Tracee continues to execute the >>> parasite >>> code that calls sigreturn to clean up and restore the tracee >>> process. >>> >>> PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the >>> shadow >>> stack, just like setup_rt_frame() does, so that sys_rt_sigreturn() >>> won't >>> bail out at restore_signal_shadow_stack(). >> >> That is the intent. >> >>> >>> The only thing that CRIU actually needs is to push a restore token >>> to the >>> shadow stack, so for us a ptrace call that does that would be >>> ideal. >>> >> >> That seems fine too. The main benefit of the SIGFRAME approach is >> that, AIUI, CRIU eventually constructs a signal frame anyway, and >> getting one ready-made seems plausibly helpful. But if it's not >> actually that useful, then there's no need to do it. > > I guess pushing a token to the shadow stack could be done like GDB does > calls, with just the basic CET ptrace support. So do we even need a > specific push token operation? > > I suppose if CRIU already used some kernel encapsulation of a seized > call/return operation it would have been easier to make CRIU work with > the introduction of CET. But the design of CRIU seems to be to have the > kernel expose just enough and then tie it all together in userspace. > > Andy, did you have any other usages for PTRACE_CALL_FUNCTION in mind? I > couldn't find any other CRIU-like users of sigreturn in the debian > source search (but didn't read all 819 pages that come up with > "sigreturn"). It seemed to be mostly seccomp sandbox references. I don't see a benefit compelling enough to justify the added complexity, given that existing mechanisms can do it. The sigframe thing, OTOH, seems genuinely useful if CRIU would actually use it to save the full register state. Generating a signal frame from scratch is a pain. That being said, if CRIU isn't excited, then don't bother.
On Fri, Mar 04, 2022 at 11:13:19AM -0800, Andy Lutomirski wrote: > On 3/3/22 17:30, Edgecombe, Rick P wrote: > > On Thu, 2022-03-03 at 15:00 -0800, Andy Lutomirski wrote: > > > > > The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal > > > > > frame onto > > > > > the stack and call a function. That function should then be able > > > > > to call > > > > > sigreturn just like any normal signal handler. > > > > > > > > Ok, let me reiterate. > > > > > > > > We have a seized and stopped tracee, use > > > > PTRACE_CALL_FUNCTION_SIGFRAME > > > > to push a signal frame onto the tracee's stack so that sigreturn > > > > could use > > > > that frame, then set the tracee %rip to the function we'd like to > > > > call and > > > > then we PTRACE_CONT the tracee. Tracee continues to execute the > > > > parasite > > > > code that calls sigreturn to clean up and restore the tracee > > > > process. > > > > > > > > PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the > > > > shadow > > > > stack, just like setup_rt_frame() does, so that sys_rt_sigreturn() > > > > won't > > > > bail out at restore_signal_shadow_stack(). > > > > > > That is the intent. > > > > > > > > > > > The only thing that CRIU actually needs is to push a restore token > > > > to the > > > > shadow stack, so for us a ptrace call that does that would be > > > > ideal. > > > > > > > > > > That seems fine too. The main benefit of the SIGFRAME approach is > > > that, AIUI, CRIU eventually constructs a signal frame anyway, and > > > getting one ready-made seems plausibly helpful. But if it's not > > > actually that useful, then there's no need to do it. > > > > I guess pushing a token to the shadow stack could be done like GDB does > > calls, with just the basic CET ptrace support. So do we even need a > > specific push token operation? I've tried to follow gdb CET push implementation, but got lost. What is "basic CET ptrace support"? I don't see any ptrace changes in this series. > > I suppose if CRIU already used some kernel encapsulation of a seized > > call/return operation it would have been easier to make CRIU work with > > the introduction of CET. But the design of CRIU seems to be to have the > > kernel expose just enough and then tie it all together in userspace. > > > > Andy, did you have any other usages for PTRACE_CALL_FUNCTION in mind? I > > couldn't find any other CRIU-like users of sigreturn in the debian > > source search (but didn't read all 819 pages that come up with > > "sigreturn"). It seemed to be mostly seccomp sandbox references. > > I don't see a benefit compelling enough to justify the added complexity, > given that existing mechanisms can do it. > > The sigframe thing, OTOH, seems genuinely useful if CRIU would actually use > it to save the full register state. Generating a signal frame from scratch > is a pain. That being said, if CRIU isn't excited, then don't bother. CRIU is excited :) I just was looking for the minimal possible interface that will allow us to call sigreturn. Rick is right and CRIU does try to expose as little as possible and handle the pain in the userspace. The SIGFRAME approach is indeed very helpful, especially if we can make it work on other architectures eventually.
On Mon, Mar 7, 2022 at 10:57 AM Mike Rapoport <rppt@kernel.org> wrote: > > On Fri, Mar 04, 2022 at 11:13:19AM -0800, Andy Lutomirski wrote: > > On 3/3/22 17:30, Edgecombe, Rick P wrote: > > > On Thu, 2022-03-03 at 15:00 -0800, Andy Lutomirski wrote: > > > > > > The intent of PTRACE_CALL_FUNCTION_SIGFRAME is push a signal > > > > > > frame onto > > > > > > the stack and call a function. That function should then be able > > > > > > to call > > > > > > sigreturn just like any normal signal handler. > > > > > > > > > > Ok, let me reiterate. > > > > > > > > > > We have a seized and stopped tracee, use > > > > > PTRACE_CALL_FUNCTION_SIGFRAME > > > > > to push a signal frame onto the tracee's stack so that sigreturn > > > > > could use > > > > > that frame, then set the tracee %rip to the function we'd like to > > > > > call and > > > > > then we PTRACE_CONT the tracee. Tracee continues to execute the > > > > > parasite > > > > > code that calls sigreturn to clean up and restore the tracee > > > > > process. > > > > > > > > > > PTRACE_CALL_FUNCTION_SIGFRAME also pushes a restore token to the > > > > > shadow > > > > > stack, just like setup_rt_frame() does, so that sys_rt_sigreturn() > > > > > won't > > > > > bail out at restore_signal_shadow_stack(). > > > > > > > > That is the intent. > > > > > > > > > > > > > > The only thing that CRIU actually needs is to push a restore token > > > > > to the > > > > > shadow stack, so for us a ptrace call that does that would be > > > > > ideal. > > > > > > > > > > > > > That seems fine too. The main benefit of the SIGFRAME approach is > > > > that, AIUI, CRIU eventually constructs a signal frame anyway, and > > > > getting one ready-made seems plausibly helpful. But if it's not > > > > actually that useful, then there's no need to do it. > > > > > > I guess pushing a token to the shadow stack could be done like GDB does > > > calls, with just the basic CET ptrace support. So do we even need a > > > specific push token operation? > > I've tried to follow gdb CET push implementation, but got lost. > What is "basic CET ptrace support"? I don't see any ptrace changes in this > series. Here is the CET ptrace patch on CET 5.16 kernel branch: https://github.com/hjl-tools/linux/commit/3a43ec29ddac56f87807161b5aeafa80f632363d > > > I suppose if CRIU already used some kernel encapsulation of a seized > > > call/return operation it would have been easier to make CRIU work with > > > the introduction of CET. But the design of CRIU seems to be to have the > > > kernel expose just enough and then tie it all together in userspace. > > > > > > Andy, did you have any other usages for PTRACE_CALL_FUNCTION in mind? I > > > couldn't find any other CRIU-like users of sigreturn in the debian > > > source search (but didn't read all 819 pages that come up with > > > "sigreturn"). It seemed to be mostly seccomp sandbox references. > > > > I don't see a benefit compelling enough to justify the added complexity, > > given that existing mechanisms can do it. > > > > The sigframe thing, OTOH, seems genuinely useful if CRIU would actually use > > it to save the full register state. Generating a signal frame from scratch > > is a pain. That being said, if CRIU isn't excited, then don't bother. > > CRIU is excited :) > > I just was looking for the minimal possible interface that will allow us to > call sigreturn. Rick is right and CRIU does try to expose as little as > possible and handle the pain in the userspace. > > The SIGFRAME approach is indeed very helpful, especially if we can make it > work on other architectures eventually. > > -- > Sincerely yours, > Mike.
From: Mike Rapoport > Sent: 07 March 2022 18:57 ... > > The sigframe thing, OTOH, seems genuinely useful if CRIU would actually use > > it to save the full register state. Generating a signal frame from scratch > > is a pain. That being said, if CRIU isn't excited, then don't bother. > > CRIU is excited :) > > I just was looking for the minimal possible interface that will allow us to > call sigreturn. Rick is right and CRIU does try to expose as little as > possible and handle the pain in the userspace. > > The SIGFRAME approach is indeed very helpful, especially if we can make it > work on other architectures eventually. I thought the full sigframe layout depends very much on what the kernel decides it needs to save? Some parts are exposed to the signal handler, but there are large blocks of data that XSAVE (etc) save that have to be put onto the signal stack. Is it even vaguely feasible to replicate what a specific kernel generates on specific hardware in a userspace library? The size of this data is getting bigger and bigger - causing issues with the SIGALTSTACK (and even thread stack) minimum sizes. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Hi all, On Mon, Mar 07, 2022 at 11:07:01AM -0800, H.J. Lu wrote: > On Mon, Mar 7, 2022 at 10:57 AM Mike Rapoport <rppt@kernel.org> wrote: > > > > On Fri, Mar 04, 2022 at 11:13:19AM -0800, Andy Lutomirski wrote: > > > On 3/3/22 17:30, Edgecombe, Rick P wrote: > > Here is the CET ptrace patch on CET 5.16 kernel branch: > > https://github.com/hjl-tools/linux/commit/3a43ec29ddac56f87807161b5aeafa80f632363d It took me a while, but at last I have a version of CRIU that knows how to handle shadow stack. For the shadow stack manipulation during dump and for the creation of the sigframe for sigreturn I used the CET ptrace patch for 5.16 (thanks H.J). For the restore I had to add two modifications to the kernel APIs on top of this version of the shadow stack series: * add address parameter to map_shadow_stack() so that it'll call mmap() with MAP_FIXED if the address is requested. This is required to restore the shadow stack at the same address as it was at dump time. * add ability to unlock shadow stack features using ptrace. This is required because the current glibc (or at least in the version I used for tests) locks shadow stack state when it loads a program. This locking means that a process will either have shadow stack disabled without an ability to enable it or it will have shadow stack enabled with WRSS disabled and again, there is no way to re-enable WRSS. With that, ptrace looked like the most sensible interface to interfere with the shadow stack locking. I've pushed the kernel modifications here: https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=cet/kvm and CRIU modifications here: https://github.com/rppt/criu/tree/cet/v0.1
Mike, Thanks for doing this. Glad to hear this is solvable with the current paradigm. On Tue, 2022-05-31 at 14:59 +0300, Mike Rapoport wrote: > * add ability to unlock shadow stack features using ptrace. This is > required because the current glibc (or at least in the version I used > for > tests) locks shadow stack state when it loads a program. This locking > means > that a process will either have shadow stack disabled without an > ability to > enable it or it will have shadow stack enabled with WRSS disabled and > again, there is no way to re-enable WRSS. With that, ptrace looked > like the > most sensible interface to interfere with the shadow stack locking. So whatever glibc you have lock's features even if it doesn't enable shadow stack? Hmm, I've not encountered this. Which glibc is it? WRSS is a feature where you would usually want to lock it as disabled, but WRSS cannot be enabled if shadow stack is not enabled. Locking shadow stack and WRSS off together doesn't have any security benefits in theory. so I'm thinking glibc doesn't need to do this. The kernel could even refuse to lock WRSS without shadow stack being enabled. Could we avoid the extra ptrace functionality then? Rick
On Tue, May 31, 2022 at 04:25:13PM +0000, Edgecombe, Rick P wrote: > Mike, > > Thanks for doing this. Glad to hear this is solvable with the current > paradigm. > > On Tue, 2022-05-31 at 14:59 +0300, Mike Rapoport wrote: > > * add ability to unlock shadow stack features using ptrace. This is > > required because the current glibc (or at least in the version I used > > for > > tests) locks shadow stack state when it loads a program. This locking > > means > > that a process will either have shadow stack disabled without an > > ability to > > enable it or it will have shadow stack enabled with WRSS disabled and > > again, there is no way to re-enable WRSS. With that, ptrace looked > > like the > > most sensible interface to interfere with the shadow stack locking. > > So whatever glibc you have lock's features even if it doesn't enable > shadow stack? Hmm, I've not encountered this. Which glibc is it? I use glibc from here: https://gitlab.com/x86-glibc/glibc/, commit b6f9a22a00c1f8ae8c0991886f0a714f2f5da002 AFAIU, it's H.J cet work. > WRSS is a feature where you would usually want to lock it as disabled, > but WRSS cannot be enabled if shadow stack is not enabled. Locking > shadow stack and WRSS off together doesn't have any security benefits > in theory. so I'm thinking glibc doesn't need to do this. The kernel > could even refuse to lock WRSS without shadow stack being enabled. > Could we avoid the extra ptrace functionality then? What I see for is that a program can support shadow stack, glibc enables shadow stack, does not enable WRSS and than calls arch_prctl(ARCH_X86_FEATURE_LOCK, LINUX_X86_FEATURE_SHSTK | LINUX_X86_FEATURE_WRSS); so that WRSS cannot be re-enabled. For the programs that do not support shadow stack, both SHSTK and WRSS are disabled, but still there is the same call to arch_prctl(ARCH_X86_FEATURE_LOCK, ...) and then neither shadow stack nor WRSS can be enabled. My original plan was to run CRIU with no shadow stack, enable shadow stack and WRSS in the restored tasks using arch_prct() and after the shadow stack contents is restored disable WRSS. Obviously, this didn't work with glibc I have :) On the bright side, having a ptrace call to unlock shadow stack and wrss allows running CRIU itself with shadow stack. > Rick
On Tue, 2022-05-31 at 19:36 +0300, Mike Rapoport wrote: > > WRSS is a feature where you would usually want to lock it as > > disabled, > > but WRSS cannot be enabled if shadow stack is not enabled. Locking > > shadow stack and WRSS off together doesn't have any security > > benefits > > in theory. so I'm thinking glibc doesn't need to do this. The > > kernel > > could even refuse to lock WRSS without shadow stack being enabled. > > Could we avoid the extra ptrace functionality then? > > What I see for is that a program can support shadow stack, glibc > enables > shadow stack, does not enable WRSS and than calls > > arch_prctl(ARCH_X86_FEATURE_LOCK, > LINUX_X86_FEATURE_SHSTK | LINUX_X86_FEATURE_WRSS); I see the logic is glibc will lock SHSTK|IBT if either is enabled in the elf header. I guess that is why I didn't see the locking happening for me, because my manual enablement test doesn't have either set in the header. It can't see where that glibc knows about WRSS though... The glibc logic seems wrong to me also, because shadow stack or IBT could be force-disabled via glibc tunables. I don't see why the elf header bit should exclusively control the feature locking. Or why both should be locked if only one is in the header. > > so that WRSS cannot be re-enabled. > > For the programs that do not support shadow stack, both SHSTK and > WRSS are > disabled, but still there is the same call to > arch_prctl(ARCH_X86_FEATURE_LOCK, ...) and then neither shadow stack > nor > WRSS can be enabled. > > My original plan was to run CRIU with no shadow stack, enable shadow > stack > and WRSS in the restored tasks using arch_prct() and after the shadow > stack > contents is restored disable WRSS. > > Obviously, this didn't work with glibc I have :) Were you disabling shadow stack via glibc tunnable? Or was the elf header marked for IBT? If it was a plain old binary, the code looks to me like it should not lock any features. > > On the bright side, having a ptrace call to unlock shadow stack and > wrss > allows running CRIU itself with shadow stack. > Yea, having something working is really great. My only hesitancy is that, per a discussion on the LAM patchset, we are going to make this enabling API CET only (same semantics for though). I suppose the locking API arch_prctl() could still be support other arch features, but it might be a second CET only regset. It's not the end of the world. I guess the other consideration is tieing CRIU to glibc peculiarities. Like even if we fix glibc, then CRIU may not work with some other libc or app that force disables for some weird reason. Is it supposed to be libc-agnostic?
On Tue, May 31, 2022 at 10:34 AM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > > On Tue, 2022-05-31 at 19:36 +0300, Mike Rapoport wrote: > > > WRSS is a feature where you would usually want to lock it as > > > disabled, > > > but WRSS cannot be enabled if shadow stack is not enabled. Locking > > > shadow stack and WRSS off together doesn't have any security > > > benefits > > > in theory. so I'm thinking glibc doesn't need to do this. The > > > kernel > > > could even refuse to lock WRSS without shadow stack being enabled. > > > Could we avoid the extra ptrace functionality then? > > > > What I see for is that a program can support shadow stack, glibc > > enables > > shadow stack, does not enable WRSS and than calls > > > > arch_prctl(ARCH_X86_FEATURE_LOCK, > > LINUX_X86_FEATURE_SHSTK | LINUX_X86_FEATURE_WRSS); > > I see the logic is glibc will lock SHSTK|IBT if either is enabled in > the elf header. I guess that is why I didn't see the locking happening > for me, because my manual enablement test doesn't have either set in > the header. > > It can't see where that glibc knows about WRSS though... > > The glibc logic seems wrong to me also, because shadow stack or IBT > could be force-disabled via glibc tunables. I don't see why the elf > header bit should exclusively control the feature locking. Or why both > should be locked if only one is in the header. glibc locks SHSTK and IBT only if they are enabled at run-time. It doesn't enable/disable/lock WRSS at the moment. If WRSS can be enabled via arch_prctl at any time, we can't lock it. If WRSS should be locked early, how should it be enabled in application? Also can WRSS be enabled from a dlopened object? > > > > so that WRSS cannot be re-enabled. > > > > For the programs that do not support shadow stack, both SHSTK and > > WRSS are > > disabled, but still there is the same call to > > arch_prctl(ARCH_X86_FEATURE_LOCK, ...) and then neither shadow stack > > nor > > WRSS can be enabled. > > > > My original plan was to run CRIU with no shadow stack, enable shadow > > stack > > and WRSS in the restored tasks using arch_prct() and after the shadow > > stack > > contents is restored disable WRSS. > > > > Obviously, this didn't work with glibc I have :) > > Were you disabling shadow stack via glibc tunnable? Or was the elf > header marked for IBT? If it was a plain old binary, the code looks to > me like it should not lock any features. > > > > > On the bright side, having a ptrace call to unlock shadow stack and > > wrss > > allows running CRIU itself with shadow stack. > > > > Yea, having something working is really great. My only hesitancy is > that, per a discussion on the LAM patchset, we are going to make this > enabling API CET only (same semantics for though). I suppose the > locking API arch_prctl() could still be support other arch features, > but it might be a second CET only regset. It's not the end of the > world. > > I guess the other consideration is tieing CRIU to glibc peculiarities. > Like even if we fix glibc, then CRIU may not work with some other libc > or app that force disables for some weird reason. Is it supposed to be > libc-agnostic? >
On Tue, May 31, 2022 at 05:34:50PM +0000, Edgecombe, Rick P wrote: > On Tue, 2022-05-31 at 19:36 +0300, Mike Rapoport wrote: > > > WRSS is a feature where you would usually want to lock it as > > > disabled, > > > but WRSS cannot be enabled if shadow stack is not enabled. Locking > > > shadow stack and WRSS off together doesn't have any security > > > benefits > > > in theory. so I'm thinking glibc doesn't need to do this. The > > > kernel > > > could even refuse to lock WRSS without shadow stack being enabled. > > > Could we avoid the extra ptrace functionality then? > > > > What I see for is that a program can support shadow stack, glibc > > enables > > shadow stack, does not enable WRSS and than calls > > > > arch_prctl(ARCH_X86_FEATURE_LOCK, > > LINUX_X86_FEATURE_SHSTK | LINUX_X86_FEATURE_WRSS); > > I see the logic is glibc will lock SHSTK|IBT if either is enabled in > the elf header. I guess that is why I didn't see the locking happening > for me, because my manual enablement test doesn't have either set in > the header. The locking was quite a surprise for me when I moved from standalone test to a system with CET-enabled glibc :) > It can't see where that glibc knows about WRSS though... Right, it was my mistake, as H.J. said glibc locks SHSTK and IBT. > The glibc logic seems wrong to me also, because shadow stack or IBT > could be force-disabled via glibc tunables. I don't see why the elf > header bit should exclusively control the feature locking. Or why both > should be locked if only one is in the header. > > > > > so that WRSS cannot be re-enabled. > > > > For the programs that do not support shadow stack, both SHSTK and > > WRSS are > > disabled, but still there is the same call to > > arch_prctl(ARCH_X86_FEATURE_LOCK, ...) and then neither shadow stack > > nor > > WRSS can be enabled. > > > > My original plan was to run CRIU with no shadow stack, enable shadow > > stack > > and WRSS in the restored tasks using arch_prct() and after the shadow > > stack > > contents is restored disable WRSS. > > > > Obviously, this didn't work with glibc I have :) > > Were you disabling shadow stack via glibc tunnable? Or was the elf > header marked for IBT? If it was a plain old binary, the code looks to > me like it should not lock any features. I built criu as a plain old binary, there were no SHSTK or IBT markers. And I've seen that there was a call to arch_prctl that locked the features as disabled. > > On the bright side, having a ptrace call to unlock shadow stack and > > wrss > > allows running CRIU itself with shadow stack. > > Yea, having something working is really great. My only hesitancy is > that, per a discussion on the LAM patchset, we are going to make this > enabling API CET only (same semantics for though). I suppose the > locking API arch_prctl() could still be support other arch features, > but it might be a second CET only regset. It's not the end of the > world. The support for CET in criu is anyway experimental for now, if the kernel API will be slightly different in the end, we'll update criu. The important things are the ability to control tracee shadow stack from ptrace, the ability to map the shadow stack at fixed address and the ability to control the features at least from ptrace. As long as we have APIs that provide those, it should be Ok. > I guess the other consideration is tieing CRIU to glibc peculiarities. > Like even if we fix glibc, then CRIU may not work with some other libc > or app that force disables for some weird reason. Is it supposed to be > libc-agnostic? Actually using the ptrace to control the CET features does not tie criu to glibc. The current proposal for the arch_prctl() allows libc to lock CET features and having a ptrace call to control the lock makes criu agnostic to libc behaviour.
On Wed, 2022-06-01 at 11:06 +0300, Mike Rapoport wrote: > > Yea, having something working is really great. My only hesitancy is > > that, per a discussion on the LAM patchset, we are going to make > > this > > enabling API CET only (same semantics for though). I suppose the > > locking API arch_prctl() could still be support other arch > > features, > > but it might be a second CET only regset. It's not the end of the > > world. > > The support for CET in criu is anyway experimental for now, if the > kernel > API will be slightly different in the end, we'll update criu. > The important things are the ability to control tracee shadow stack > from ptrace, the ability to map the shadow stack at fixed address and > the > ability to control the features at least from ptrace. > As long as we have APIs that provide those, it should be Ok. > > > I guess the other consideration is tieing CRIU to glibc > > peculiarities. > > Like even if we fix glibc, then CRIU may not work with some other > > libc > > or app that force disables for some weird reason. Is it supposed to > > be > > libc-agnostic? > > Actually using the ptrace to control the CET features does not tie > criu to > glibc. The current proposal for the arch_prctl() allows libc to lock > CET > features and having a ptrace call to control the lock makes criu > agnostic > to libc behaviour. From staring at the glibc code, I'm suspicious something was weird with your test setup, as I don't think it should be locking. But I guess to be completely proper you would need to save and restore the lock state anyway. So, ok yea, on balance probably better to have an extra interface. Should we make it a GET/SET interface?
On Tue, 2022-05-31 at 11:00 -0700, H.J. Lu wrote: > > The glibc logic seems wrong to me also, because shadow stack or IBT > > could be force-disabled via glibc tunables. I don't see why the elf > > header bit should exclusively control the feature locking. Or why > > both > > should be locked if only one is in the header. > > glibc locks SHSTK and IBT only if they are enabled at run-time. It's not what I saw in the code. Somehow Mike saw something different as well. > It doesn't > enable/disable/lock WRSS at the moment. If WRSS can be enabled > via arch_prctl at any time, we can't lock it. If WRSS should be > locked early, > how should it be enabled in application? Also can WRSS be enabled > from a dlopened object? I think in the past we discussed having another elf header bit that behaved differently (OR vs AND).
On Wed, Jun 1, 2022 at 10:27 AM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > > On Tue, 2022-05-31 at 11:00 -0700, H.J. Lu wrote: > > > The glibc logic seems wrong to me also, because shadow stack or IBT > > > could be force-disabled via glibc tunables. I don't see why the elf > > > header bit should exclusively control the feature locking. Or why > > > both > > > should be locked if only one is in the header. > > > > glibc locks SHSTK and IBT only if they are enabled at run-time. > > It's not what I saw in the code. Somehow Mike saw something different > as well. The current glibc cet branch: https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/cet/master locks only available CET features. Since only SHSTK is available, I saw arch_prctl(0x3003 /* ARCH_??? */, 0x2) = 0 CET features are always enabled early in ld.so to allow function calls in the CET enabled ld.so. ld.so always locks CET features even if they are disabled when the program or a dependency library isn't CET enabled. > > It doesn't > > enable/disable/lock WRSS at the moment. If WRSS can be enabled > > via arch_prctl at any time, we can't lock it. If WRSS should be > > locked early, > > how should it be enabled in application? Also can WRSS be enabled > > from a dlopened object? > > I think in the past we discussed having another elf header bit that > behaved differently (OR vs AND). We should have a complete list of use cases and design a way to support them.
On Wed, Jun 01, 2022 at 05:24:26PM +0000, Edgecombe, Rick P wrote: > On Wed, 2022-06-01 at 11:06 +0300, Mike Rapoport wrote: > > > Yea, having something working is really great. My only hesitancy is > > > that, per a discussion on the LAM patchset, we are going to make > > > this > > > enabling API CET only (same semantics for though). I suppose the > > > locking API arch_prctl() could still be support other arch > > > features, > > > but it might be a second CET only regset. It's not the end of the > > > world. > > > > The support for CET in criu is anyway experimental for now, if the > > kernel > > API will be slightly different in the end, we'll update criu. > > The important things are the ability to control tracee shadow stack > > from ptrace, the ability to map the shadow stack at fixed address and > > the > > ability to control the features at least from ptrace. > > As long as we have APIs that provide those, it should be Ok. > > > > > I guess the other consideration is tieing CRIU to glibc > > > peculiarities. > > > Like even if we fix glibc, then CRIU may not work with some other > > > libc > > > or app that force disables for some weird reason. Is it supposed to > > > be > > > libc-agnostic? > > > > Actually using the ptrace to control the CET features does not tie > > criu to > > glibc. The current proposal for the arch_prctl() allows libc to lock > > CET > > features and having a ptrace call to control the lock makes criu > > agnostic > > to libc behaviour. > > From staring at the glibc code, I'm suspicious something was weird with > your test setup, as I don't think it should be locking. But I guess to > be completely proper you would need to save and restore the lock state > anyway. So, ok yea, on balance probably better to have an extra > interface. > > Should we make it a GET/SET interface? Yes, I think so.