Message ID | 20230307102924.70577-3-chengyou@linux.alibaba.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Jason Gunthorpe |
Headers | show |
Series | RDMA/erdma: Add non-4K page size support | expand |
On Tue, Mar 07, 2023 at 06:29:24PM +0800, Cheng Xu wrote: > Doorbell resources are exposed to userspace by mmap. The size unit of mmap > is PAGE_SIZE, previous implementation can not work correctly if PAGE_SIZE > is not 4K. We support non-4K page size in this commit. Why do you need this information in rdma-core? Can you use sysconf(_SC_PAGESIZE) there to understand the page size like other providers? Thanks
On 3/14/23 7:31 PM, Cheng Xu wrote: > > > On 3/14/23 6:23 PM, Leon Romanovsky wrote: >> On Tue, Mar 07, 2023 at 06:29:24PM +0800, Cheng Xu wrote: >>> Doorbell resources are exposed to userspace by mmap. The size unit of mmap >>> is PAGE_SIZE, previous implementation can not work correctly if PAGE_SIZE >>> is not 4K. We support non-4K page size in this commit. >> >> Why do you need this information in rdma-core? >> Can you use sysconf(_SC_PAGESIZE) there to understand the page size like >> other providers? >> > > I don't expose PAGE_SIZE to userspace in this patchset, but the *offset* in > PAGE of the DBs: > > diff --git a/include/uapi/rdma/erdma-abi.h b/include/uapi/rdma/erdma-abi.h > index b7a0222f978f..57f8942a3c56 100644 > --- a/include/uapi/rdma/erdma-abi.h > +++ b/include/uapi/rdma/erdma-abi.h > @@ -40,10 +40,13 @@ struct erdma_uresp_alloc_ctx { > __u32 dev_id; > __u32 pad; > __u32 sdb_type; > - __u32 sdb_offset; > + __u32 sdb_entid; > __aligned_u64 sdb; > __aligned_u64 rdb; > __aligned_u64 cdb; > + __u32 sdb_off; > + __u32 rdb_off; > + __u32 cdb_off; > }; > > Our doorbell space is aligned to 4096 > Please ignore these two response. There is something wrong with my Thunderbird: When I press ctrl + s, it send the email, not save it. Sorry for this. Cheng Xu > > >> Thanks
On 3/14/23 6:23 PM, Leon Romanovsky wrote: > On Tue, Mar 07, 2023 at 06:29:24PM +0800, Cheng Xu wrote: >> Doorbell resources are exposed to userspace by mmap. The size unit of mmap >> is PAGE_SIZE, previous implementation can not work correctly if PAGE_SIZE >> is not 4K. We support non-4K page size in this commit. > > Why do you need this information in rdma-core? > Can you use sysconf(_SC_PAGESIZE) there to understand the page size like > other providers? > I don't expose PAGE_SIZE to userspace in this patchset, but the *offset* in PAGE of the DBs: diff --git a/include/uapi/rdma/erdma-abi.h b/include/uapi/rdma/erdma-abi.h index b7a0222f978f..57f8942a3c56 100644 --- a/include/uapi/rdma/erdma-abi.h +++ b/include/uapi/rdma/erdma-abi.h @@ -40,10 +40,13 @@ struct erdma_uresp_alloc_ctx { __u32 dev_id; __u32 pad; __u32 sdb_type; - __u32 sdb_offset; + __u32 sdb_entid; __aligned_u64 sdb; __aligned_u64 rdb; __aligned_u64 cdb; + __u32 sdb_off; + __u32 rdb_off; + __u32 cdb_off; }; Our doorbell space is aligned to 4096, this works fine when PAGE_SIZE is also 4096, and the doorbell space starts from the mapped page. When PAGE_SIZE is not 4096, the doorbell space may starts from the middle of the mapped page. For example, our SQ doorbell starts from the offset 4096 in PCIe bar 0. When we map the first SQ doorbell to userspace when PAGE_SIZE is 64K, the doorbell space starts from the offset 4096 in mmap returned address. So the userspace needs to know the doorbell space offset in mmaped page. Thanks, Cheng Xu > Thanks
On Tue, Mar 14, 2023 at 07:50:19PM +0800, Cheng Xu wrote: > > > On 3/14/23 6:23 PM, Leon Romanovsky wrote: > > On Tue, Mar 07, 2023 at 06:29:24PM +0800, Cheng Xu wrote: > >> Doorbell resources are exposed to userspace by mmap. The size unit of mmap > >> is PAGE_SIZE, previous implementation can not work correctly if PAGE_SIZE > >> is not 4K. We support non-4K page size in this commit. > > > > Why do you need this information in rdma-core? > > Can you use sysconf(_SC_PAGESIZE) there to understand the page size like > > other providers? > > > > I don't expose PAGE_SIZE to userspace in this patchset, but the *offset* in > PAGE of the DBs: > > diff --git a/include/uapi/rdma/erdma-abi.h b/include/uapi/rdma/erdma-abi.h > index b7a0222f978f..57f8942a3c56 100644 > --- a/include/uapi/rdma/erdma-abi.h > +++ b/include/uapi/rdma/erdma-abi.h > @@ -40,10 +40,13 @@ struct erdma_uresp_alloc_ctx { > __u32 dev_id; > __u32 pad; > __u32 sdb_type; > - __u32 sdb_offset; > + __u32 sdb_entid; > __aligned_u64 sdb; > __aligned_u64 rdb; > __aligned_u64 cdb; > + __u32 sdb_off; > + __u32 rdb_off; > + __u32 cdb_off; > }; > > Our doorbell space is aligned to 4096, this works fine when PAGE_SIZE is > also 4096, and the doorbell space starts from the mapped page. When > PAGE_SIZE is not 4096, the doorbell space may starts from the middle of > the mapped page. > > For example, our SQ doorbell starts from the offset 4096 in PCIe bar 0. > When we map the first SQ doorbell to userspace when PAGE_SIZE is 64K, > the doorbell space starts from the offset 4096 in mmap returned address. > > So the userspace needs to know the doorbell space offset in mmaped page. And can't you preserve same alignment in the kernel for doorbells for every page size? Just always start from 0. > > Thanks, > Cheng Xu > > > > > Thanks
On 3/14/23 10:10 PM, Leon Romanovsky wrote: > On Tue, Mar 14, 2023 at 07:50:19PM +0800, Cheng Xu wrote: >> >> <...> >> >> Our doorbell space is aligned to 4096, this works fine when PAGE_SIZE is >> also 4096, and the doorbell space starts from the mapped page. When >> PAGE_SIZE is not 4096, the doorbell space may starts from the middle of >> the mapped page. >> >> For example, our SQ doorbell starts from the offset 4096 in PCIe bar 0. >> When we map the first SQ doorbell to userspace when PAGE_SIZE is 64K, >> the doorbell space starts from the offset 4096 in mmap returned address. >> >> So the userspace needs to know the doorbell space offset in mmaped page. > > And can't you preserve same alignment in the kernel for doorbells for every page size? > Just always start from 0. > I've considered this option before, but unfortunately can't, at least for CQ DB. The size of our PCIe bar 0 is 512K, and offset [484K, 508K] are CQ doorbells. CQ doorbell space is located in offset [36K, 60K] when PAGE_SIZE = 64K, and can't start from offset 0 in this case. Another reason is that we want to organize SQ doorbell space in unit of 4096. In current implementation, each ucontext will be assigned a SQ doorbell space for both normal doorbell and direct wqe usage. Unit of 4096, compared with larger unit, more ucontexts can be assigned exclusive doorbell space for direct wqe. Thanks, Cheng Xu
On Wed, Mar 15, 2023 at 09:58:06AM +0800, Cheng Xu wrote: > > > On 3/14/23 10:10 PM, Leon Romanovsky wrote: > > On Tue, Mar 14, 2023 at 07:50:19PM +0800, Cheng Xu wrote: > >> > >> > <...> > >> > >> Our doorbell space is aligned to 4096, this works fine when PAGE_SIZE is > >> also 4096, and the doorbell space starts from the mapped page. When > >> PAGE_SIZE is not 4096, the doorbell space may starts from the middle of > >> the mapped page. > >> > >> For example, our SQ doorbell starts from the offset 4096 in PCIe bar 0. > >> When we map the first SQ doorbell to userspace when PAGE_SIZE is 64K, > >> the doorbell space starts from the offset 4096 in mmap returned address. > >> > >> So the userspace needs to know the doorbell space offset in mmaped page. > > > > And can't you preserve same alignment in the kernel for doorbells for every page size? > > Just always start from 0. > > > > I've considered this option before, but unfortunately can't, at least for CQ DB. > The size of our PCIe bar 0 is 512K, and offset [484K, 508K] are CQ doorbells. > CQ doorbell space is located in offset [36K, 60K] when PAGE_SIZE = 64K, and can't > start from offset 0 in this case. > > Another reason is that we want to organize SQ doorbell space in unit of 4096. > In current implementation, each ucontext will be assigned a SQ doorbell space > for both normal doorbell and direct wqe usage. Unit of 4096, compared with > larger unit, more ucontexts can be assigned exclusive doorbell space for direct > wqe. I have a feeling that there is an existing API for it already. Let's give a chance for Jason to chime in. Thanks > > Thanks, > Cheng Xu
On Wed, Mar 15, 2023 at 12:22:10PM +0200, Leon Romanovsky wrote: > On Wed, Mar 15, 2023 at 09:58:06AM +0800, Cheng Xu wrote: > > > > > > On 3/14/23 10:10 PM, Leon Romanovsky wrote: > > > On Tue, Mar 14, 2023 at 07:50:19PM +0800, Cheng Xu wrote: > > >> > > >> > > <...> > > >> > > >> Our doorbell space is aligned to 4096, this works fine when PAGE_SIZE is > > >> also 4096, and the doorbell space starts from the mapped page. When > > >> PAGE_SIZE is not 4096, the doorbell space may starts from the middle of > > >> the mapped page. > > >> > > >> For example, our SQ doorbell starts from the offset 4096 in PCIe bar 0. > > >> When we map the first SQ doorbell to userspace when PAGE_SIZE is 64K, > > >> the doorbell space starts from the offset 4096 in mmap returned address. > > >> > > >> So the userspace needs to know the doorbell space offset in mmaped page. > > > > > > And can't you preserve same alignment in the kernel for doorbells for every page size? > > > Just always start from 0. > > > > > > > I've considered this option before, but unfortunately can't, at least for CQ DB. > > The size of our PCIe bar 0 is 512K, and offset [484K, 508K] are CQ doorbells. > > CQ doorbell space is located in offset [36K, 60K] when PAGE_SIZE = 64K, and can't > > start from offset 0 in this case. > > > > Another reason is that we want to organize SQ doorbell space in unit of 4096. > > In current implementation, each ucontext will be assigned a SQ doorbell space > > for both normal doorbell and direct wqe usage. Unit of 4096, compared with > > larger unit, more ucontexts can be assigned exclusive doorbell space for direct > > wqe. > > I have a feeling that there is an existing API for it already. > Let's give a chance for Jason to chime in. This sounds similar to how mlx5 chops up its doorbell space But I don't understand your device security model. In mlx5 it is not allowed to share doorbells between unrelated processes. Doorbells have built in security and a doorbell can only trigger QP/CQ's that are explicitly linked to that doorbell. Thus mlx5 is careful to only mmap doorbells that are linked to the QPs. On 64K page size userspace receives alot of doorbells per mmap, all linked to the same security context. Improperly sharing MMIO pages can result in various security problems if a hostile userspace can write arbitary things to MMIO space. Here you seem to be talking about overmapping stuff. What is the security argument that it is safe to leak to userspace parts of the device MMIO BAR beyond that strictly cotnained to the single doorbell? This has to be clearly explained in the commit message and a comment. Jason
On 3/21/23 10:30 PM, Jason Gunthorpe wrote: > On Wed, Mar 15, 2023 at 12:22:10PM +0200, Leon Romanovsky wrote: <...> > > This sounds similar to how mlx5 chops up its doorbell space > > But I don't understand your device security model. > > In mlx5 it is not allowed to share doorbells between unrelated > processes. Doorbells have built in security and a doorbell can only > trigger QP/CQ's that are explicitly linked to that doorbell. > > Thus mlx5 is careful to only mmap doorbells that are linked to the > QPs. On 64K page size userspace receives alot of doorbells per mmap, > all linked to the same security context. > > Improperly sharing MMIO pages can result in various security problems > if a hostile userspace can write arbitary things to MMIO space. > > Here you seem to be talking about overmapping stuff. What is the > security argument that it is safe to leak to userspace parts of the > device MMIO BAR beyond that strictly cotnained to the single doorbell? Thank you for your explanation. IIUC, this security mechanism of mlx5 needs the support of HW, and the HW can reject doorbells from unauthorized doorbell space. The current generation of erdma devices do not have this capability due to implementation complexity. Without this HW capability, isolating the MMIO space in software doesn't prevent the attack, because the malicious APPs can map mmio itself, not through verbs interface. Our consideration of security is mainly focused on the VM/VF level, not at the process/ucontext level: no matter what the user does inside the VM, it cannot affect other VMs. Therefore, The userspace isolation of mmio inside one VM is incomplete and shared mmio pages cannot be avoided in this generation. > This has to be clearly explained in the commit message and a comment. All right, I will do this in v3. Thanks, Cheng Xu
On Wed, Mar 22, 2023 at 03:05:29PM +0800, Cheng Xu wrote: > The current generation of erdma devices do not have this capability due to > implementation complexity. Without this HW capability, isolating the MMIO > space in software doesn't prevent the attack, because the malicious APPs > can map mmio itself, not through verbs interface. This doesn't meet the security model of Linux, verbs HW is expected to protect one process from another process. if this is the case we should consider restricting this HW to CAP_SYS_RAW_IO only. You should come with an explanation why this HW is safe enough to avoid this. Jason
On 3/22/23 7:54 PM, Jason Gunthorpe wrote: > On Wed, Mar 22, 2023 at 03:05:29PM +0800, Cheng Xu wrote: > >> The current generation of erdma devices do not have this capability due to >> implementation complexity. Without this HW capability, isolating the MMIO >> space in software doesn't prevent the attack, because the malicious APPs >> can map mmio itself, not through verbs interface. > > This doesn't meet the security model of Linux, verbs HW is expected to > protect one process from another process. OK, I see. So the key point is that HW should restrict each process to use its own doorbell space. If hardware can do this, share or do not share MMIO pages both will meet the security requirement. Do I get it right? It seems that EFA uses shared MMIO pages with hardware security assurance. > if this is the case we should consider restricting this HW to > CAP_SYS_RAW_IO only. > Please give us a chance to fix this issue first. > You should come with an explanation why this HW is safe enough to > avoid this. I need to discuss with our HW guys and implement the similar security check in HW, and this won't be long. Thanks, Cheng Xu
On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote: > > > On 3/22/23 7:54 PM, Jason Gunthorpe wrote: > > On Wed, Mar 22, 2023 at 03:05:29PM +0800, Cheng Xu wrote: > > > >> The current generation of erdma devices do not have this capability due to > >> implementation complexity. Without this HW capability, isolating the MMIO > >> space in software doesn't prevent the attack, because the malicious APPs > >> can map mmio itself, not through verbs interface. > > > > This doesn't meet the security model of Linux, verbs HW is expected to > > protect one process from another process. > > OK, I see. > > So the key point is that HW should restrict each process to use its own doorbell > space. If hardware can do this, share or do not share MMIO pages both will meet > the security requirement. Do I get it right? HW can never do that, HW is supposed to rely on the system MMU to isolate doorbell registers The HW responsibility is to make doorbell MMIO registers safe in the hands of other processes. Simple doorbells that only 'kick' and don't convey any information are probably safe to share, and don't require HW checks between the doorbell page and the PD/QP/CQ/etc Doorbells that deliver data - eg a head pointer - are not safe because the wrong head pointer can corrupt the HW state. Process B must not be able to corrupt the head pointer of a QP/CQ owned by Process A under any circumstances. Definitely they cannot have access to the MMIO and also the HW must ensure that writes coming from process B are rejected if they touch resources owned by process a (eg by PD/QPN/CQN checks in HW) Doorbells that accept entire WQE's are definately not safe as a hostile process could execute a WQE on a QP it does not own. > It seems that EFA uses shared MMIO pages with hardware security assurance. I'm not sure how EFA works, it writes this: EFA_SET(&db, EFA_IO_REGS_CQ_DB_CONSUMER_INDEX, cq->cc); EFA_SET(&db, EFA_IO_REGS_CQ_DB_CMD_SN, cq->cmd_sn & 0x3); EFA_SET(&db, EFA_IO_REGS_CQ_DB_ARM, arm); But interestingly there is no CQN value, so I have no idea how it associates the doorbell register with the CQ to load the data into. If it is using a DB register offset to learn the CQN then obviously it cannot share the same doorbell page (or associated CQNs) across verbs FD contexts, that would be a security bug. EFA folks? Jason
On 22/03/2023 16:01, Jason Gunthorpe wrote: >> It seems that EFA uses shared MMIO pages with hardware security assurance. > > I'm not sure how EFA works, it writes this: > > EFA_SET(&db, EFA_IO_REGS_CQ_DB_CONSUMER_INDEX, cq->cc); > EFA_SET(&db, EFA_IO_REGS_CQ_DB_CMD_SN, cq->cmd_sn & 0x3); > EFA_SET(&db, EFA_IO_REGS_CQ_DB_ARM, arm); > > But interestingly there is no CQN value, so I have no idea how it > associates the doorbell register with the CQ to load the data into. > > If it is using a DB register offset to learn the CQN then obviously it > cannot share the same doorbell page (or associated CQNs) across verbs > FD contexts, that would be a security bug. EFA folks? The doorbell's BAR address is dictated by the device, and the CQN is derived from the address. Each doorbell resides in a separate page, there is no sharing of doorbells pages between different UARs.
On 3/22/23 10:01 PM, Jason Gunthorpe wrote: > On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote: >> >> >> On 3/22/23 7:54 PM, Jason Gunthorpe wrote: >>> On Wed, Mar 22, 2023 at 03:05:29PM +0800, Cheng Xu wrote: >>> >>>> The current generation of erdma devices do not have this capability due to >>>> implementation complexity. Without this HW capability, isolating the MMIO >>>> space in software doesn't prevent the attack, because the malicious APPs >>>> can map mmio itself, not through verbs interface. >>> >>> This doesn't meet the security model of Linux, verbs HW is expected to >>> protect one process from another process. >> >> OK, I see. >> >> So the key point is that HW should restrict each process to use its own doorbell >> space. If hardware can do this, share or do not share MMIO pages both will meet >> the security requirement. Do I get it right? > > HW can never do that, HW is supposed to rely on the system MMU to > isolate doorbell registers > > The HW responsibility is to make doorbell MMIO registers safe in the > hands of other processes. > > Simple doorbells that only 'kick' and don't convey any information are > probably safe to share, and don't require HW checks between the > doorbell page and the PD/QP/CQ/etc > > Doorbells that deliver data - eg a head pointer - are not safe because > the wrong head pointer can corrupt the HW state. Process B must not be > able to corrupt the head pointer of a QP/CQ owned by Process A under > any circumstances. Definitely they cannot have access to the MMIO and > also the HW must ensure that writes coming from process B are rejected > if they touch resources owned by process a (eg by PD/QPN/CQN checks in > HW) > > Doorbells that accept entire WQE's are definately not safe as a > hostile process could execute a WQE on a QP it does not own. > It's much clear, thanks for your explanation and patience. Back to erdma context, we have rethought our implementation. For QPs, we have a field *wqe_index* in SQE/RQE, which indicates the validity of the current WQE. Incorrect doorbell value from other processes can not corrupt the QPC in hardware due to PI range and WQE content validation in HW. Unlike SQ/RQ, for CQ doorbell, It seems that we need some more works to protect it. We have started analyzing the details. Thanks, Cheng Xu
On Thu, Mar 23, 2023 at 02:57:49PM +0800, Cheng Xu wrote: > > > On 3/22/23 10:01 PM, Jason Gunthorpe wrote: > > On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote: > >> > >> > >> On 3/22/23 7:54 PM, Jason Gunthorpe wrote: > >>> On Wed, Mar 22, 2023 at 03:05:29PM +0800, Cheng Xu wrote: > >>> > >>>> The current generation of erdma devices do not have this capability due to > >>>> implementation complexity. Without this HW capability, isolating the MMIO > >>>> space in software doesn't prevent the attack, because the malicious APPs > >>>> can map mmio itself, not through verbs interface. > >>> > >>> This doesn't meet the security model of Linux, verbs HW is expected to > >>> protect one process from another process. > >> > >> OK, I see. > >> > >> So the key point is that HW should restrict each process to use its own doorbell > >> space. If hardware can do this, share or do not share MMIO pages both will meet > >> the security requirement. Do I get it right? > > > > HW can never do that, HW is supposed to rely on the system MMU to > > isolate doorbell registers > > > > The HW responsibility is to make doorbell MMIO registers safe in the > > hands of other processes. > > > > Simple doorbells that only 'kick' and don't convey any information are > > probably safe to share, and don't require HW checks between the > > doorbell page and the PD/QP/CQ/etc > > > > Doorbells that deliver data - eg a head pointer - are not safe because > > the wrong head pointer can corrupt the HW state. Process B must not be > > able to corrupt the head pointer of a QP/CQ owned by Process A under > > any circumstances. Definitely they cannot have access to the MMIO and > > also the HW must ensure that writes coming from process B are rejected > > if they touch resources owned by process a (eg by PD/QPN/CQN checks in > > HW) > > > > Doorbells that accept entire WQE's are definately not safe as a > > hostile process could execute a WQE on a QP it does not own. > > > > It's much clear, thanks for your explanation and patience. > > Back to erdma context, we have rethought our implementation. For QPs, > we have a field *wqe_index* in SQE/RQE, which indicates the validity > of the current WQE. Incorrect doorbell value from other processes can > not corrupt the QPC in hardware due to PI range and WQE content > validation in HW. No, validating the DB content is not acceptable security. The attacker process can always generate valid content if it tries hard enough. The only acceptable answer is to do like every other NIC did and link the DB register to the HW object it is allowed to affect. Jason
On 3/23/23 7:53 PM, Jason Gunthorpe wrote: > On Thu, Mar 23, 2023 at 02:57:49PM +0800, Cheng Xu wrote: >> >> >> On 3/22/23 10:01 PM, Jason Gunthorpe wrote: >>> On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote: >>>> >>>> >>>> On 3/22/23 7:54 PM, Jason Gunthorpe wrote: <...> >> >> It's much clear, thanks for your explanation and patience. >> >> Back to erdma context, we have rethought our implementation. For QPs, >> we have a field *wqe_index* in SQE/RQE, which indicates the validity >> of the current WQE. Incorrect doorbell value from other processes can >> not corrupt the QPC in hardware due to PI range and WQE content >> validation in HW. > > No, validating the DB content is not acceptable security. The attacker > process can always generate valid content if it tries hard enough. > Oh, you may misunderstand what I said, our HW validates the *WQE* content, not *DB* content. The attacker can not generate the WQE of other QPs. This protection and correction is already implemented in our HW. > The only acceptable answer is to do like every other NIC did and link > the DB register to the HW object it is allowed to affect. > Emm, still not acceptable with WQE content validation? If it's acceptable, will reduce some works. Thanks, Cheng Xu
On Thu, Mar 23, 2023 at 08:33:53PM +0800, Cheng Xu wrote: > > > On 3/23/23 7:53 PM, Jason Gunthorpe wrote: > > On Thu, Mar 23, 2023 at 02:57:49PM +0800, Cheng Xu wrote: > >> > >> > >> On 3/22/23 10:01 PM, Jason Gunthorpe wrote: > >>> On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote: > >>>> > >>>> > >>>> On 3/22/23 7:54 PM, Jason Gunthorpe wrote: > <...> > >> > >> It's much clear, thanks for your explanation and patience. > >> > >> Back to erdma context, we have rethought our implementation. For QPs, > >> we have a field *wqe_index* in SQE/RQE, which indicates the validity > >> of the current WQE. Incorrect doorbell value from other processes can > >> not corrupt the QPC in hardware due to PI range and WQE content > >> validation in HW. > > > > No, validating the DB content is not acceptable security. The attacker > > process can always generate valid content if it tries hard enough. > > > > Oh, you may misunderstand what I said, our HW validates the *WQE* content, > not *DB* content. The attacker can not generate the WQE of other QPs. This > protection and correction is already implemented in our HW. Why are you talking about WQEs in a discussion about doorbell security? WQE's are read via DMA from their SQ/RQs - that path doesn't have a doorbell security problem. The issue comes if you try to deliver the WQE via a doorbell write. Jason
On 3/23/23 9:05 PM, Jason Gunthorpe wrote: > On Thu, Mar 23, 2023 at 08:33:53PM +0800, Cheng Xu wrote: >> >> >> On 3/23/23 7:53 PM, Jason Gunthorpe wrote: >>> On Thu, Mar 23, 2023 at 02:57:49PM +0800, Cheng Xu wrote: >>>> >>>> >>>> On 3/22/23 10:01 PM, Jason Gunthorpe wrote: >>>>> On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote: >>>>>> >>>>>> >>>>>> On 3/22/23 7:54 PM, Jason Gunthorpe wrote: >> <...> >>>> >>>> It's much clear, thanks for your explanation and patience. >>>> >>>> Back to erdma context, we have rethought our implementation. For QPs, >>>> we have a field *wqe_index* in SQE/RQE, which indicates the validity >>>> of the current WQE. Incorrect doorbell value from other processes can >>>> not corrupt the QPC in hardware due to PI range and WQE content >>>> validation in HW. >>> >>> No, validating the DB content is not acceptable security. The attacker >>> process can always generate valid content if it tries hard enough. >>> >> >> Oh, you may misunderstand what I said, our HW validates the *WQE* content, >> not *DB* content. The attacker can not generate the WQE of other QPs. This >> protection and correction is already implemented in our HW. > > Why are you talking about WQEs in a discussion about doorbell > security? Malicious doorbell will corrupt the head pointer in QPC, and then invalid WQEs will be processed. My point is that WQE validation can correct the head pointer carried in malicious doorbell, and can prevent such attack. It looks like the first kind of the doorbell types you mentioned before, but only conveying the QPN. Thanks, Cheng Xu
On Thu, Mar 23, 2023 at 10:10:25PM +0800, Cheng Xu wrote: > > > On 3/23/23 9:05 PM, Jason Gunthorpe wrote: > > On Thu, Mar 23, 2023 at 08:33:53PM +0800, Cheng Xu wrote: > >> > >> > >> On 3/23/23 7:53 PM, Jason Gunthorpe wrote: > >>> On Thu, Mar 23, 2023 at 02:57:49PM +0800, Cheng Xu wrote: > >>>> > >>>> > >>>> On 3/22/23 10:01 PM, Jason Gunthorpe wrote: > >>>>> On Wed, Mar 22, 2023 at 09:30:41PM +0800, Cheng Xu wrote: > >>>>>> > >>>>>> > >>>>>> On 3/22/23 7:54 PM, Jason Gunthorpe wrote: > >> <...> > >>>> > >>>> It's much clear, thanks for your explanation and patience. > >>>> > >>>> Back to erdma context, we have rethought our implementation. For QPs, > >>>> we have a field *wqe_index* in SQE/RQE, which indicates the validity > >>>> of the current WQE. Incorrect doorbell value from other processes can > >>>> not corrupt the QPC in hardware due to PI range and WQE content > >>>> validation in HW. > >>> > >>> No, validating the DB content is not acceptable security. The attacker > >>> process can always generate valid content if it tries hard enough. > >>> > >> > >> Oh, you may misunderstand what I said, our HW validates the *WQE* content, > >> not *DB* content. The attacker can not generate the WQE of other QPs. This > >> protection and correction is already implemented in our HW. > > > > Why are you talking about WQEs in a discussion about doorbell > > security? > > Malicious doorbell will corrupt the head pointer in QPC, and then invalid WQEs > will be processed. My point is that WQE validation can correct the head pointer > carried in malicious doorbell, and can prevent such attack. No, if the head pointer is incorrect an attack can stall the QP by guessing a head pointer that is valid but before already submitted WQEs. There is no WQE based recovery for such a thing. Jason
On 3/23/23 10:18 PM, Jason Gunthorpe wrote: > On Thu, Mar 23, 2023 at 10:10:25PM +0800, Cheng Xu wrote: >> >> <...> >> >> Malicious doorbell will corrupt the head pointer in QPC, and then invalid WQEs >> will be processed. My point is that WQE validation can correct the head pointer >> carried in malicious doorbell, and can prevent such attack. > > No, if the head pointer is incorrect an attack can stall the QP by > guessing a head pointer that is valid but before already submitted > WQEs. You are right. Thanks very much for this discussion and your advisement. Cheng Xu > > There is no WQE based recovery for such a thing. > > Jason
diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c index 83e1b0d55977..0cd8dd61f569 100644 --- a/drivers/infiniband/hw/erdma/erdma_verbs.c +++ b/drivers/infiniband/hw/erdma/erdma_verbs.c @@ -1137,8 +1137,8 @@ void erdma_mmap_free(struct rdma_user_mmap_entry *rdma_entry) static void alloc_db_resources(struct erdma_dev *dev, struct erdma_ucontext *ctx) { - u32 bitmap_idx; struct erdma_devattr *attrs = &dev->attrs; + u32 bitmap_idx, hw_page_idx; if (attrs->disable_dwqe) goto alloc_normal_db; @@ -1151,11 +1151,9 @@ static void alloc_db_resources(struct erdma_dev *dev, spin_unlock(&dev->db_bitmap_lock); ctx->sdb_type = ERDMA_SDB_PAGE; - ctx->sdb_idx = bitmap_idx; - ctx->sdb_page_idx = bitmap_idx; + ctx->sdb_bitmap_idx = bitmap_idx; ctx->sdb = dev->func_bar_addr + ERDMA_BAR_SQDB_SPACE_OFFSET + - (bitmap_idx << PAGE_SHIFT); - ctx->sdb_page_off = 0; + (bitmap_idx << ERDMA_HW_PAGE_SHIFT); return; } @@ -1166,13 +1164,13 @@ static void alloc_db_resources(struct erdma_dev *dev, spin_unlock(&dev->db_bitmap_lock); ctx->sdb_type = ERDMA_SDB_ENTRY; - ctx->sdb_idx = bitmap_idx; - ctx->sdb_page_idx = attrs->dwqe_pages + + ctx->sdb_bitmap_idx = bitmap_idx; + hw_page_idx = attrs->dwqe_pages + bitmap_idx / ERDMA_DWQE_TYPE1_CNT_PER_PAGE; - ctx->sdb_page_off = bitmap_idx % ERDMA_DWQE_TYPE1_CNT_PER_PAGE; + ctx->sdb_entid = bitmap_idx % ERDMA_DWQE_TYPE1_CNT_PER_PAGE; ctx->sdb = dev->func_bar_addr + ERDMA_BAR_SQDB_SPACE_OFFSET + - (ctx->sdb_page_idx << PAGE_SHIFT); + (hw_page_idx << PAGE_SHIFT); return; } @@ -1181,11 +1179,8 @@ static void alloc_db_resources(struct erdma_dev *dev, alloc_normal_db: ctx->sdb_type = ERDMA_SDB_SHARED; - ctx->sdb_idx = 0; - ctx->sdb_page_idx = ERDMA_SDB_SHARED_PAGE_INDEX; - ctx->sdb_page_off = 0; - - ctx->sdb = dev->func_bar_addr + (ctx->sdb_page_idx << PAGE_SHIFT); + ctx->sdb = dev->func_bar_addr + + (ERDMA_SDB_SHARED_PAGE_INDEX << ERDMA_HW_PAGE_SHIFT); } static void erdma_uctx_user_mmap_entries_remove(struct erdma_ucontext *uctx) @@ -1215,11 +1210,6 @@ int erdma_alloc_ucontext(struct ib_ucontext *ibctx, struct ib_udata *udata) ctx->rdb = dev->func_bar_addr + ERDMA_BAR_RQDB_SPACE_OFFSET; ctx->cdb = dev->func_bar_addr + ERDMA_BAR_CQDB_SPACE_OFFSET; - if (udata->outlen < sizeof(uresp)) { - ret = -EINVAL; - goto err_out; - } - ctx->sq_db_mmap_entry = erdma_user_mmap_entry_insert( ctx, (void *)ctx->sdb, PAGE_SIZE, ERDMA_MMAP_IO_NC, &uresp.sdb); if (!ctx->sq_db_mmap_entry) { @@ -1243,9 +1233,13 @@ int erdma_alloc_ucontext(struct ib_ucontext *ibctx, struct ib_udata *udata) uresp.dev_id = dev->pdev->device; uresp.sdb_type = ctx->sdb_type; - uresp.sdb_offset = ctx->sdb_page_off; + uresp.sdb_entid = ctx->sdb_entid; + uresp.sdb_off = ctx->sdb & ~PAGE_MASK; + uresp.rdb_off = ctx->rdb & ~PAGE_MASK; + uresp.cdb_off = ctx->cdb & ~PAGE_MASK; - ret = ib_copy_to_udata(udata, &uresp, sizeof(uresp)); + ret = ib_copy_to_udata(udata, &uresp, + min(sizeof(uresp), udata->outlen)); if (ret) goto err_out; @@ -1264,9 +1258,9 @@ void erdma_dealloc_ucontext(struct ib_ucontext *ibctx) spin_lock(&dev->db_bitmap_lock); if (ctx->sdb_type == ERDMA_SDB_PAGE) - clear_bit(ctx->sdb_idx, dev->sdb_page); + clear_bit(ctx->sdb_bitmap_idx, dev->sdb_page); else if (ctx->sdb_type == ERDMA_SDB_ENTRY) - clear_bit(ctx->sdb_idx, dev->sdb_entry); + clear_bit(ctx->sdb_bitmap_idx, dev->sdb_entry); erdma_uctx_user_mmap_entries_remove(ctx); diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.h b/drivers/infiniband/hw/erdma/erdma_verbs.h index e0a993bc032a..4dbef1483027 100644 --- a/drivers/infiniband/hw/erdma/erdma_verbs.h +++ b/drivers/infiniband/hw/erdma/erdma_verbs.h @@ -35,9 +35,8 @@ struct erdma_ucontext { struct ib_ucontext ibucontext; u32 sdb_type; - u32 sdb_idx; - u32 sdb_page_idx; - u32 sdb_page_off; + u32 sdb_bitmap_idx; + u32 sdb_entid; u64 sdb; u64 rdb; u64 cdb; diff --git a/include/uapi/rdma/erdma-abi.h b/include/uapi/rdma/erdma-abi.h index b7a0222f978f..57f8942a3c56 100644 --- a/include/uapi/rdma/erdma-abi.h +++ b/include/uapi/rdma/erdma-abi.h @@ -40,10 +40,13 @@ struct erdma_uresp_alloc_ctx { __u32 dev_id; __u32 pad; __u32 sdb_type; - __u32 sdb_offset; + __u32 sdb_entid; __aligned_u64 sdb; __aligned_u64 rdb; __aligned_u64 cdb; + __u32 sdb_off; + __u32 rdb_off; + __u32 cdb_off; }; #endif
Doorbell resources are exposed to userspace by mmap. The size unit of mmap is PAGE_SIZE, previous implementation can not work correctly if PAGE_SIZE is not 4K. We support non-4K page size in this commit. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> --- drivers/infiniband/hw/erdma/erdma_verbs.c | 40 ++++++++++------------- drivers/infiniband/hw/erdma/erdma_verbs.h | 5 ++- include/uapi/rdma/erdma-abi.h | 5 ++- 3 files changed, 23 insertions(+), 27 deletions(-)