Message ID | f611857594c5c53918d782f104d6f4e028ba465d.1667997522.git.leonro@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Extend XFRM core to allow packet offload configuration | expand |
On Wed, Nov 09, 2022 at 02:54:34PM +0200, Leon Romanovsky wrote: > From: Leon Romanovsky <leonro@nvidia.com> > > @@ -1166,16 +1187,24 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > spin_lock_bh(&net->xfrm.xfrm_state_lock); > x->km.state = XFRM_STATE_ACQ; > list_add(&x->km.all, &net->xfrm.state_all); > - hlist_add_head_rcu(&x->bydst, net->xfrm.state_bydst + h); > + XFRM_STATE_INSERT(bydst, &x->bydst, > + net->xfrm.state_bydst + h, > + x->xso.type); > h = xfrm_src_hash(net, daddr, saddr, encap_family); > - hlist_add_head_rcu(&x->bysrc, net->xfrm.state_bysrc + h); > + XFRM_STATE_INSERT(bysrc, &x->bysrc, > + net->xfrm.state_bysrc + h, > + x->xso.type); > if (x->id.spi) { > h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, encap_family); > - hlist_add_head_rcu(&x->byspi, net->xfrm.state_byspi + h); > + XFRM_STATE_INSERT(byspi, &x->byspi, > + net->xfrm.state_byspi + h, > + x->xso.type); > } > if (x->km.seq) { > h = xfrm_seq_hash(net, x->km.seq); > - hlist_add_head_rcu(&x->byseq, net->xfrm.state_byseq + h); > + XFRM_STATE_INSERT(byseq, &x->byseq, > + net->xfrm.state_byseq + h, > + x->xso.type); > } This does not work. A larval state will never have a x->xso.type set. So this raises the question how to handle acquires with this packet offload. You could place the type and offload device to the template, but we also have to make sure not to mess too much with the non offloaded codepath. This is yet another corner case where the concept of doing policy and state lookup in software for a HW offload does not work so well. I fear this is not the last corner case that comes up once you put this into a real network.
On Thu, Nov 17, 2022 at 01:12:43PM +0100, Steffen Klassert wrote: > On Wed, Nov 09, 2022 at 02:54:34PM +0200, Leon Romanovsky wrote: > > From: Leon Romanovsky <leonro@nvidia.com> > > > > @@ -1166,16 +1187,24 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > spin_lock_bh(&net->xfrm.xfrm_state_lock); > > x->km.state = XFRM_STATE_ACQ; > > list_add(&x->km.all, &net->xfrm.state_all); > > - hlist_add_head_rcu(&x->bydst, net->xfrm.state_bydst + h); > > + XFRM_STATE_INSERT(bydst, &x->bydst, > > + net->xfrm.state_bydst + h, > > + x->xso.type); > > h = xfrm_src_hash(net, daddr, saddr, encap_family); > > - hlist_add_head_rcu(&x->bysrc, net->xfrm.state_bysrc + h); > > + XFRM_STATE_INSERT(bysrc, &x->bysrc, > > + net->xfrm.state_bysrc + h, > > + x->xso.type); > > if (x->id.spi) { > > h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, encap_family); > > - hlist_add_head_rcu(&x->byspi, net->xfrm.state_byspi + h); > > + XFRM_STATE_INSERT(byspi, &x->byspi, > > + net->xfrm.state_byspi + h, > > + x->xso.type); > > } > > if (x->km.seq) { > > h = xfrm_seq_hash(net, x->km.seq); > > - hlist_add_head_rcu(&x->byseq, net->xfrm.state_byseq + h); > > + XFRM_STATE_INSERT(byseq, &x->byseq, > > + net->xfrm.state_byseq + h, > > + x->xso.type); > > } > > This does not work. A larval state will never have a x->xso.type set. x->xso.type always exists. Default is 0, which is XFRM_DEV_OFFLOAD_UNSPECIFIED. It means this XFRM_STATE_INSERT() will behave exactly as hlist_add_head_rcu() before. > So this raises the question how to handle acquires with this packet > offload. We handle acquires as SW policies and don't offload them. > You could place the type and offload device to the template, > but we also have to make sure not to mess too much with the non > offloaded codepath. > > This is yet another corner case where the concept of doing policy and > state lookup in software for a HW offload does not work so well. I > fear this is not the last corner case that comes up once you put this > into a real network. > It is not different from any other kernel code, bugs will be fixed. BTW, IPsec packet offload mode is in use for almost two years already in real networks. https://docs.nvidia.com/networking/display/OFEDv521040/Changes+and+New+Features Thanks
On Thu, Nov 17, 2022 at 02:51:33PM +0200, Leon Romanovsky wrote: > On Thu, Nov 17, 2022 at 01:12:43PM +0100, Steffen Klassert wrote: > > On Wed, Nov 09, 2022 at 02:54:34PM +0200, Leon Romanovsky wrote: > > > From: Leon Romanovsky <leonro@nvidia.com> > > > > This does not work. A larval state will never have a x->xso.type set. > > x->xso.type always exists. Default is 0, which is XFRM_DEV_OFFLOAD_UNSPECIFIED. > It means this XFRM_STATE_INSERT() will behave exactly as hlist_add_head_rcu() before. Sure it exists, and is always 0 here. > > > So this raises the question how to handle acquires with this packet > > offload. > > We handle acquires as SW policies and don't offload them. We trigger acquires with states, not policies. The thing is, we might match a HW policy but create a SW acquire state. This will not match anymore as soon as the lookup is implemented correctly. > > You could place the type and offload device to the template, > > but we also have to make sure not to mess too much with the non > > offloaded codepath. > > > > This is yet another corner case where the concept of doing policy and > > state lookup in software for a HW offload does not work so well. I > > fear this is not the last corner case that comes up once you put this > > into a real network. > > > > It is not different from any other kernel code, bugs will be fixed. The thing that is different here is, that the concept is already broken. We can't split the datapath to be partially handled in SW and HW in any sane way, this becomes clearer and clearer. The full protocol offload simply does not fit well into HW, but we try to make it fit with a hammer. This is the problem why I do not really like this, and is also the reason why this is still not merged. We might be much better of by doing a HW frindly redesign of the protocol and offload this then. But, yes that takes time and will have issues too.
On Fri, Nov 18, 2022 at 11:49:07AM +0100, Steffen Klassert wrote: > On Thu, Nov 17, 2022 at 02:51:33PM +0200, Leon Romanovsky wrote: > > On Thu, Nov 17, 2022 at 01:12:43PM +0100, Steffen Klassert wrote: > > > On Wed, Nov 09, 2022 at 02:54:34PM +0200, Leon Romanovsky wrote: > > > > From: Leon Romanovsky <leonro@nvidia.com> > > > > > > This does not work. A larval state will never have a x->xso.type set. > > > > x->xso.type always exists. Default is 0, which is XFRM_DEV_OFFLOAD_UNSPECIFIED. > > It means this XFRM_STATE_INSERT() will behave exactly as hlist_add_head_rcu() before. > > Sure it exists, and is always 0 here. > > > > > > So this raises the question how to handle acquires with this packet > > > offload. > > > > We handle acquires as SW policies and don't offload them. > > We trigger acquires with states, not policies. The thing is, > we might match a HW policy but create a SW acquire state. > This will not match anymore as soon as the lookup is > implemented correctly. For now, all such packets will be dropped as we have offlaoded policy but not SA. Like we said in one of our IPsec coffee hours, this flow will be supported later. Right now, it is not important. > > > > You could place the type and offload device to the template, > > > but we also have to make sure not to mess too much with the non > > > offloaded codepath. > > > > > > This is yet another corner case where the concept of doing policy and > > > state lookup in software for a HW offload does not work so well. I > > > fear this is not the last corner case that comes up once you put this > > > into a real network. > > > > > > > It is not different from any other kernel code, bugs will be fixed. > > The thing that is different here is, that the concept is already > broken. We can't split the datapath to be partially handled in > SW and HW in any sane way, this becomes clearer and clearer. > > The full protocol offload simply does not fit well into HW, > but we try to make it fit with a hammer. This is the problem > why I do not really like this, and is also the reason why this > is still not merged. We might be much better of by doing a > HW frindly redesign of the protocol and offload this then. > But, yes that takes time and will have issues too. When you say "protocol", what do you mean? Many users, who have deployed IPsec solutions, just want to have same look and feel but much faster. I truly believe that this packet offload fits SW model and the (small) amount of changes supports it. There are almost no changes to the stack to natively support this offload. As long as HW involved, you will never have solution without issues, and like you said even redesign "will have issues". Plus, we called it packet offload, better and redesigned version will be called something else. My patches which enhance XFRM to support more than type of offload are exactly for that. Thanks
On Sun, Nov 20, 2022 at 09:17:02PM +0200, Leon Romanovsky wrote: > On Fri, Nov 18, 2022 at 11:49:07AM +0100, Steffen Klassert wrote: > > On Thu, Nov 17, 2022 at 02:51:33PM +0200, Leon Romanovsky wrote: > > > On Thu, Nov 17, 2022 at 01:12:43PM +0100, Steffen Klassert wrote: > > > > On Wed, Nov 09, 2022 at 02:54:34PM +0200, Leon Romanovsky wrote: > > > > > From: Leon Romanovsky <leonro@nvidia.com> > > > > > > > So this raises the question how to handle acquires with this packet > > > > offload. > > > > > > We handle acquires as SW policies and don't offload them. > > > > We trigger acquires with states, not policies. The thing is, > > we might match a HW policy but create a SW acquire state. > > This will not match anymore as soon as the lookup is > > implemented correctly. > > For now, all such packets will be dropped as we have offlaoded > policy but not SA. I think you missed my point. If the HW policy does not match the SW acquire state, then each packet will geneate a new acquire. So you need to make sure that policy and acquire state will match to send the acquire just once to userspace. > > > It is not different from any other kernel code, bugs will be fixed. > > > > The thing that is different here is, that the concept is already > > broken. We can't split the datapath to be partially handled in > > SW and HW in any sane way, this becomes clearer and clearer. > > > > The full protocol offload simply does not fit well into HW, > > but we try to make it fit with a hammer. This is the problem > > why I do not really like this, and is also the reason why this > > is still not merged. We might be much better of by doing a > > HW frindly redesign of the protocol and offload this then. > > But, yes that takes time and will have issues too. > > When you say "protocol", what do you mean? Many users, who have > deployed IPsec solutions, just want to have same look and feel > but much faster. > > I truly believe that this packet offload fits SW model and the > (small) amount of changes supports it. There are almost no changes > to the stack to natively support this offload. > > As long as HW involved, you will never have solution without issues, > and like you said even redesign "will have issues". Things would be much easier, if we don't need to add HW policies and states to SW databases. But yes, a redesign might have issues too. That's why we are still working on the current soluion :)
On Mon, Nov 21, 2022 at 10:44:04AM +0100, Steffen Klassert wrote: > On Sun, Nov 20, 2022 at 09:17:02PM +0200, Leon Romanovsky wrote: > > On Fri, Nov 18, 2022 at 11:49:07AM +0100, Steffen Klassert wrote: > > > On Thu, Nov 17, 2022 at 02:51:33PM +0200, Leon Romanovsky wrote: > > > > On Thu, Nov 17, 2022 at 01:12:43PM +0100, Steffen Klassert wrote: > > > > > On Wed, Nov 09, 2022 at 02:54:34PM +0200, Leon Romanovsky wrote: > > > > > > From: Leon Romanovsky <leonro@nvidia.com> > > > > > > > > > So this raises the question how to handle acquires with this packet > > > > > offload. > > > > > > > > We handle acquires as SW policies and don't offload them. > > > > > > We trigger acquires with states, not policies. The thing is, > > > we might match a HW policy but create a SW acquire state. > > > This will not match anymore as soon as the lookup is > > > implemented correctly. > > > > For now, all such packets will be dropped as we have offlaoded > > policy but not SA. > > I think you missed my point. If the HW policy does not match > the SW acquire state, then each packet will geneate a new > acquire. So you need to make sure that policy and acquire > state will match to send the acquire just once to userspace. I think that I'm still missing the point. We require both policy and SA to be offloaded. It means that once we hit HW policy, we must hit SA too (at least this is how mlx5 part is implemented). If it doesn't hit HW policy, it means that whole path is not offloaded, so no harm will be if we create SW SA for this path. > > > > > It is not different from any other kernel code, bugs will be fixed. > > > > > > The thing that is different here is, that the concept is already > > > broken. We can't split the datapath to be partially handled in > > > SW and HW in any sane way, this becomes clearer and clearer. > > > > > > The full protocol offload simply does not fit well into HW, > > > but we try to make it fit with a hammer. This is the problem > > > why I do not really like this, and is also the reason why this > > > is still not merged. We might be much better of by doing a > > > HW frindly redesign of the protocol and offload this then. > > > But, yes that takes time and will have issues too. > > > > When you say "protocol", what do you mean? Many users, who have > > deployed IPsec solutions, just want to have same look and feel > > but much faster. > > > > I truly believe that this packet offload fits SW model and the > > (small) amount of changes supports it. There are almost no changes > > to the stack to natively support this offload. > > > > As long as HW involved, you will never have solution without issues, > > and like you said even redesign "will have issues". > > Things would be much easier, if we don't need to add HW policies > and states to SW databases. But yes, a redesign might have issues > too. That's why we are still working on the current soluion :) > There will be PSP for that, but it is not IPsec. Thanks
On Mon, Nov 21, 2022 at 12:27:01PM +0200, Leon Romanovsky wrote: > On Mon, Nov 21, 2022 at 10:44:04AM +0100, Steffen Klassert wrote: > > On Sun, Nov 20, 2022 at 09:17:02PM +0200, Leon Romanovsky wrote: > > > On Fri, Nov 18, 2022 at 11:49:07AM +0100, Steffen Klassert wrote: > > > > On Thu, Nov 17, 2022 at 02:51:33PM +0200, Leon Romanovsky wrote: > > > > > On Thu, Nov 17, 2022 at 01:12:43PM +0100, Steffen Klassert wrote: > > > > > > On Wed, Nov 09, 2022 at 02:54:34PM +0200, Leon Romanovsky wrote: > > > > > > > From: Leon Romanovsky <leonro@nvidia.com> > > > > > > > > > > > So this raises the question how to handle acquires with this packet > > > > > > offload. > > > > > > > > > > We handle acquires as SW policies and don't offload them. > > > > > > > > We trigger acquires with states, not policies. The thing is, > > > > we might match a HW policy but create a SW acquire state. > > > > This will not match anymore as soon as the lookup is > > > > implemented correctly. > > > > > > For now, all such packets will be dropped as we have offlaoded > > > policy but not SA. > > > > I think you missed my point. If the HW policy does not match > > the SW acquire state, then each packet will geneate a new > > acquire. So you need to make sure that policy and acquire > > state will match to send the acquire just once to userspace. > > I think that I'm still missing the point. > > We require both policy and SA to be offloaded. It means that once > we hit HW policy, we must hit SA too (at least this is how mlx5 part > is implemented). Let's assume a packet hits a HW policy. Then this HW policy must match a HW state. In case there is no matching HW state, we generate an acquire and insert a larval state. Currently, larval states are never marked as HW. Now, the next packet from the same flow maches again this HW policy, but it does not find the larval state because it is not marked as a HW state. So we generate another acquire and insert another larval state. Same happens for packets 3,4,5... Expected behaviour for subsequent packets is that the lookup will find a matching HW larval state and the packet is dropped without creating another acquire + larval state for the same flow.
On Mon, Nov 21, 2022 at 12:09:26PM +0100, Steffen Klassert wrote: > On Mon, Nov 21, 2022 at 12:27:01PM +0200, Leon Romanovsky wrote: > > On Mon, Nov 21, 2022 at 10:44:04AM +0100, Steffen Klassert wrote: > > > On Sun, Nov 20, 2022 at 09:17:02PM +0200, Leon Romanovsky wrote: > > > > On Fri, Nov 18, 2022 at 11:49:07AM +0100, Steffen Klassert wrote: > > > > > On Thu, Nov 17, 2022 at 02:51:33PM +0200, Leon Romanovsky wrote: > > > > > > On Thu, Nov 17, 2022 at 01:12:43PM +0100, Steffen Klassert wrote: > > > > > > > On Wed, Nov 09, 2022 at 02:54:34PM +0200, Leon Romanovsky wrote: > > > > > > > > From: Leon Romanovsky <leonro@nvidia.com> > > > > > > > > > > > > > So this raises the question how to handle acquires with this packet > > > > > > > offload. > > > > > > > > > > > > We handle acquires as SW policies and don't offload them. > > > > > > > > > > We trigger acquires with states, not policies. The thing is, > > > > > we might match a HW policy but create a SW acquire state. > > > > > This will not match anymore as soon as the lookup is > > > > > implemented correctly. > > > > > > > > For now, all such packets will be dropped as we have offlaoded > > > > policy but not SA. > > > > > > I think you missed my point. If the HW policy does not match > > > the SW acquire state, then each packet will geneate a new > > > acquire. So you need to make sure that policy and acquire > > > state will match to send the acquire just once to userspace. > > > > I think that I'm still missing the point. > > > > We require both policy and SA to be offloaded. It means that once > > we hit HW policy, we must hit SA too (at least this is how mlx5 part > > is implemented). > > Let's assume a packet hits a HW policy. Then this HW policy must match > a HW state. In case there is no matching HW state, we generate an acquire > and insert a larval state. Currently, larval states are never marked as HW. And this is there our views are different. If HW (in RX) sees policy but doesn't have state, this packet will be dropped in HW. It won't get to stack and no acquire request will be issues. > > Now, the next packet from the same flow maches again this HW policy, > but it does not find the larval state because it is not marked as > a HW state. So we generate another acquire and insert another > larval state. Same happens for packets 3,4,5... > > Expected behaviour for subsequent packets is that the lookup will > find a matching HW larval state and the packet is dropped without > creating another acquire + larval state for the same flow. > This is why we don't support acquire for now as it will require mixing HW and SW paths which we don't want for now. Thanks
On Mon, Nov 21, 2022 at 01:15:36PM +0200, Leon Romanovsky wrote: > On Mon, Nov 21, 2022 at 12:09:26PM +0100, Steffen Klassert wrote: > > On Mon, Nov 21, 2022 at 12:27:01PM +0200, Leon Romanovsky wrote: > > > On Mon, Nov 21, 2022 at 10:44:04AM +0100, Steffen Klassert wrote: > > > > On Sun, Nov 20, 2022 at 09:17:02PM +0200, Leon Romanovsky wrote: > > > > > On Fri, Nov 18, 2022 at 11:49:07AM +0100, Steffen Klassert wrote: > > > > > > On Thu, Nov 17, 2022 at 02:51:33PM +0200, Leon Romanovsky wrote: > > > > > > > On Thu, Nov 17, 2022 at 01:12:43PM +0100, Steffen Klassert wrote: > > > > > > > > On Wed, Nov 09, 2022 at 02:54:34PM +0200, Leon Romanovsky wrote: > > > > > > > > > From: Leon Romanovsky <leonro@nvidia.com> > > > > > > > > > > > > > > > So this raises the question how to handle acquires with this packet > > > > > > > > offload. > > > > > > > > > > > > > > We handle acquires as SW policies and don't offload them. > > > > > > > > > > > > We trigger acquires with states, not policies. The thing is, > > > > > > we might match a HW policy but create a SW acquire state. > > > > > > This will not match anymore as soon as the lookup is > > > > > > implemented correctly. > > > > > > > > > > For now, all such packets will be dropped as we have offlaoded > > > > > policy but not SA. > > > > > > > > I think you missed my point. If the HW policy does not match > > > > the SW acquire state, then each packet will geneate a new > > > > acquire. So you need to make sure that policy and acquire > > > > state will match to send the acquire just once to userspace. > > > > > > I think that I'm still missing the point. > > > > > > We require both policy and SA to be offloaded. It means that once > > > we hit HW policy, we must hit SA too (at least this is how mlx5 part > > > is implemented). > > > > Let's assume a packet hits a HW policy. Then this HW policy must match > > a HW state. In case there is no matching HW state, we generate an acquire > > and insert a larval state. Currently, larval states are never marked as HW. > > And this is there our views are different. If HW (in RX) sees policy but > doesn't have state, this packet will be dropped in HW. It won't get to > stack and no acquire request will be issues. This makes no sense. Acquires are always generated at TX, never at RX. On RX, the state lookup happens first, the policy must match to the decapsulated packet.
On Mon, Nov 21, 2022 at 12:25:21PM +0100, Steffen Klassert wrote: > On Mon, Nov 21, 2022 at 01:15:36PM +0200, Leon Romanovsky wrote: > > On Mon, Nov 21, 2022 at 12:09:26PM +0100, Steffen Klassert wrote: > > > On Mon, Nov 21, 2022 at 12:27:01PM +0200, Leon Romanovsky wrote: > > > > On Mon, Nov 21, 2022 at 10:44:04AM +0100, Steffen Klassert wrote: > > > > > On Sun, Nov 20, 2022 at 09:17:02PM +0200, Leon Romanovsky wrote: > > > > > > On Fri, Nov 18, 2022 at 11:49:07AM +0100, Steffen Klassert wrote: > > > > > > > On Thu, Nov 17, 2022 at 02:51:33PM +0200, Leon Romanovsky wrote: > > > > > > > > On Thu, Nov 17, 2022 at 01:12:43PM +0100, Steffen Klassert wrote: > > > > > > > > > On Wed, Nov 09, 2022 at 02:54:34PM +0200, Leon Romanovsky wrote: > > > > > > > > > > From: Leon Romanovsky <leonro@nvidia.com> > > > > > > > > > > > > > > > > > So this raises the question how to handle acquires with this packet > > > > > > > > > offload. > > > > > > > > > > > > > > > > We handle acquires as SW policies and don't offload them. > > > > > > > > > > > > > > We trigger acquires with states, not policies. The thing is, > > > > > > > we might match a HW policy but create a SW acquire state. > > > > > > > This will not match anymore as soon as the lookup is > > > > > > > implemented correctly. > > > > > > > > > > > > For now, all such packets will be dropped as we have offlaoded > > > > > > policy but not SA. > > > > > > > > > > I think you missed my point. If the HW policy does not match > > > > > the SW acquire state, then each packet will geneate a new > > > > > acquire. So you need to make sure that policy and acquire > > > > > state will match to send the acquire just once to userspace. > > > > > > > > I think that I'm still missing the point. > > > > > > > > We require both policy and SA to be offloaded. It means that once > > > > we hit HW policy, we must hit SA too (at least this is how mlx5 part > > > > is implemented). > > > > > > Let's assume a packet hits a HW policy. Then this HW policy must match > > > a HW state. In case there is no matching HW state, we generate an acquire > > > and insert a larval state. Currently, larval states are never marked as HW. > > > > And this is there our views are different. If HW (in RX) sees policy but > > doesn't have state, this packet will be dropped in HW. It won't get to > > stack and no acquire request will be issues. > > This makes no sense. Acquires are always generated at TX, never at RX. Sorry, my bad. But why can't we drop all packets that don't have HW state? Why do we need to add larval? > > On RX, the state lookup happens first, the policy must match to the > decapsulated packet. >
On Mon, Nov 21, 2022 at 01:34:30PM +0200, Leon Romanovsky wrote: > On Mon, Nov 21, 2022 at 12:25:21PM +0100, Steffen Klassert wrote: > > On Mon, Nov 21, 2022 at 01:15:36PM +0200, Leon Romanovsky wrote: > > > On Mon, Nov 21, 2022 at 12:09:26PM +0100, Steffen Klassert wrote: > > > > On Mon, Nov 21, 2022 at 12:27:01PM +0200, Leon Romanovsky wrote: > > > > > On Mon, Nov 21, 2022 at 10:44:04AM +0100, Steffen Klassert wrote: > > > > > > On Sun, Nov 20, 2022 at 09:17:02PM +0200, Leon Romanovsky wrote: > > > > > > > On Fri, Nov 18, 2022 at 11:49:07AM +0100, Steffen Klassert wrote: > > > > > > > > On Thu, Nov 17, 2022 at 02:51:33PM +0200, Leon Romanovsky wrote: > > > > > > > > > On Thu, Nov 17, 2022 at 01:12:43PM +0100, Steffen Klassert wrote: > > > > > > > > > > On Wed, Nov 09, 2022 at 02:54:34PM +0200, Leon Romanovsky wrote: > > > > > > > > > > > From: Leon Romanovsky <leonro@nvidia.com> > > > > > > > > > > > > > > > > > > > So this raises the question how to handle acquires with this packet > > > > > > > > > > offload. > > > > > > > > > > > > > > > > > > We handle acquires as SW policies and don't offload them. > > > > > > > > > > > > > > > > We trigger acquires with states, not policies. The thing is, > > > > > > > > we might match a HW policy but create a SW acquire state. > > > > > > > > This will not match anymore as soon as the lookup is > > > > > > > > implemented correctly. > > > > > > > > > > > > > > For now, all such packets will be dropped as we have offlaoded > > > > > > > policy but not SA. > > > > > > > > > > > > I think you missed my point. If the HW policy does not match > > > > > > the SW acquire state, then each packet will geneate a new > > > > > > acquire. So you need to make sure that policy and acquire > > > > > > state will match to send the acquire just once to userspace. > > > > > > > > > > I think that I'm still missing the point. > > > > > > > > > > We require both policy and SA to be offloaded. It means that once > > > > > we hit HW policy, we must hit SA too (at least this is how mlx5 part > > > > > is implemented). > > > > > > > > Let's assume a packet hits a HW policy. Then this HW policy must match > > > > a HW state. In case there is no matching HW state, we generate an acquire > > > > and insert a larval state. Currently, larval states are never marked as HW. > > > > > > And this is there our views are different. If HW (in RX) sees policy but > > > doesn't have state, this packet will be dropped in HW. It won't get to > > > stack and no acquire request will be issues. > > > > This makes no sense. Acquires are always generated at TX, never at RX. > > Sorry, my bad. But why can't we drop all packets that don't have HW > state? Why do we need to add larval? I think that something like this will do the trick. diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 5076f9d7a752..d1c9ef857755 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1090,6 +1090,28 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x, } } +static bool xfrm_state_and_policy_mixed(struct xfrm_state *x, + struct xfrm_policy *p) +{ + /* Packet offload: both policy and SA should be offloaded */ + if (p->xdo.type == XFRM_DEV_OFFLOAD_PACKET && + x->xso.type != XFRM_DEV_OFFLOAD_PACKET) + return true; + + if (p->xdo.type != XFRM_DEV_OFFLOAD_PACKET && + x->xso.type == XFRM_DEV_OFFLOAD_PACKET) + return true; + + if (p->xdo.type != XFRM_DEV_OFFLOAD_PACKET) + return false; + + /* Packet offload: both policy and SA should have same device */ + if (p->xdo.dev != x->xso.dev) + return true; + + return false; +} + struct xfrm_state * xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, const struct flowi *fl, struct xfrm_tmpl *tmpl, @@ -1147,7 +1169,8 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, found: x = best; - if (!x && !error && !acquire_in_progress) { + if (!x && !error && !acquire_in_progress && + pol->xdo.type != XFRM_DEV_OFFLOAD_PACKET) { if (tmpl->id.spi && (x0 = __xfrm_state_lookup(net, mark, daddr, tmpl->id.spi, tmpl->id.proto, encap_family)) != NULL) { @@ -1228,7 +1251,14 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, *err = -EAGAIN; x = NULL; } + if (x && xfrm_state_and_policy_mixed(x, pol)) { + *err = -EINVAL; + x = NULL; + } } else { + if (pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) + error = -EINVAL; + *err = acquire_in_progress ? -EAGAIN : error; } rcu_read_unlock(); (END) > > > > > On RX, the state lookup happens first, the policy must match to the > > decapsulated packet. > >
On Mon, Nov 21, 2022 at 01:34:30PM +0200, Leon Romanovsky wrote: > > Sorry, my bad. But why can't we drop all packets that don't have HW > state? Why do we need to add larval? The first packet of a flow tiggers an acquire and inserts a larval state. On a traffic triggered connection, we need this to get a state with keys installed. We need this larval state then, because that tells us we sent already an acquire to userspace. All subsequent packets of that flow will be dropped without sending another acquire. Otherwise each subsequent packet will generate another acquire until the keys are negotiated. If a flow starts sending on a high rate, this would be not so nice for userspace :)
On Mon, Nov 21, 2022 at 02:02:52PM +0200, Leon Romanovsky wrote: > > I think that something like this will do the trick. > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > index 5076f9d7a752..d1c9ef857755 100644 > --- a/net/xfrm/xfrm_state.c > +++ b/net/xfrm/xfrm_state.c > @@ -1090,6 +1090,28 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x, > } > } > > +static bool xfrm_state_and_policy_mixed(struct xfrm_state *x, > + struct xfrm_policy *p) > +{ > + /* Packet offload: both policy and SA should be offloaded */ > + if (p->xdo.type == XFRM_DEV_OFFLOAD_PACKET && > + x->xso.type != XFRM_DEV_OFFLOAD_PACKET) > + return true; > + > + if (p->xdo.type != XFRM_DEV_OFFLOAD_PACKET && > + x->xso.type == XFRM_DEV_OFFLOAD_PACKET) > + return true; > + > + if (p->xdo.type != XFRM_DEV_OFFLOAD_PACKET) > + return false; > + > + /* Packet offload: both policy and SA should have same device */ > + if (p->xdo.dev != x->xso.dev) > + return true; > + > + return false; > +} > + > struct xfrm_state * > xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > const struct flowi *fl, struct xfrm_tmpl *tmpl, > @@ -1147,7 +1169,8 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > found: > x = best; > - if (!x && !error && !acquire_in_progress) { > + if (!x && !error && !acquire_in_progress && > + pol->xdo.type != XFRM_DEV_OFFLOAD_PACKET) { > if (tmpl->id.spi && > (x0 = __xfrm_state_lookup(net, mark, daddr, tmpl->id.spi, > tmpl->id.proto, encap_family)) != NULL) { > @@ -1228,7 +1251,14 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > *err = -EAGAIN; > x = NULL; > } > + if (x && xfrm_state_and_policy_mixed(x, pol)) { > + *err = -EINVAL; > + x = NULL; If policy and state do not match here, this means the lookup returned the wrong state. The correct state might still sit in the database. At this point, you should either have found a matching state, or no state at all.
On Mon, Nov 21, 2022 at 01:43:49PM +0100, Steffen Klassert wrote: > On Mon, Nov 21, 2022 at 02:02:52PM +0200, Leon Romanovsky wrote: > > > > I think that something like this will do the trick. > > > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > > index 5076f9d7a752..d1c9ef857755 100644 > > --- a/net/xfrm/xfrm_state.c > > +++ b/net/xfrm/xfrm_state.c > > @@ -1090,6 +1090,28 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x, > > } > > } > > > > +static bool xfrm_state_and_policy_mixed(struct xfrm_state *x, > > + struct xfrm_policy *p) > > +{ > > + /* Packet offload: both policy and SA should be offloaded */ > > + if (p->xdo.type == XFRM_DEV_OFFLOAD_PACKET && > > + x->xso.type != XFRM_DEV_OFFLOAD_PACKET) > > + return true; > > + > > + if (p->xdo.type != XFRM_DEV_OFFLOAD_PACKET && > > + x->xso.type == XFRM_DEV_OFFLOAD_PACKET) > > + return true; > > + > > + if (p->xdo.type != XFRM_DEV_OFFLOAD_PACKET) > > + return false; > > + > > + /* Packet offload: both policy and SA should have same device */ > > + if (p->xdo.dev != x->xso.dev) > > + return true; > > + > > + return false; > > +} > > + > > struct xfrm_state * > > xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > const struct flowi *fl, struct xfrm_tmpl *tmpl, > > @@ -1147,7 +1169,8 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > > > found: > > x = best; > > - if (!x && !error && !acquire_in_progress) { > > + if (!x && !error && !acquire_in_progress && > > + pol->xdo.type != XFRM_DEV_OFFLOAD_PACKET) { > > if (tmpl->id.spi && > > (x0 = __xfrm_state_lookup(net, mark, daddr, tmpl->id.spi, > > tmpl->id.proto, encap_family)) != NULL) { > > @@ -1228,7 +1251,14 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > *err = -EAGAIN; > > x = NULL; > > } > > + if (x && xfrm_state_and_policy_mixed(x, pol)) { > > + *err = -EINVAL; > > + x = NULL; > > If policy and state do not match here, this means the lookup > returned the wrong state. The correct state might still sit > in the database. At this point, you should either have found > a matching state, or no state at all. I check for "x" because of "x = NULL" above. Thanks
On Mon, Nov 21, 2022 at 01:10:40PM +0100, Steffen Klassert wrote: > On Mon, Nov 21, 2022 at 01:34:30PM +0200, Leon Romanovsky wrote: > > > > Sorry, my bad. But why can't we drop all packets that don't have HW > > state? Why do we need to add larval? > > The first packet of a flow tiggers an acquire and inserts a larval > state. On a traffic triggered connection, we need this to get > a state with keys installed. > > We need this larval state then, because that tells us we sent already an > acquire to userspace. All subsequent packets of that flow will be > dropped without sending another acquire. Otherwise each subsequent > packet will generate another acquire until the keys are negotiated. > If a flow starts sending on a high rate, this would be not so nice > for userspace :) The thing is that this SW acquire flow is a fraction case, as it applies to locally generated traffic. In my mind, there are other cases, like eswitch mode and tunnel mode. In these cases, the packets are arrived to HW without even passing SW stack. What we want to do is to catch in HW all TX packets, which don't have SAs (applicable for all types of traffic), mark them and route back to the driver. The driver will be responsible to talk with XFRM core to generate acquire. The same logic of rerouting packets is required for audit and will be done later. Right now, we rely on *swan implementations which configure everything in advance. Also larval is default to 1 (drop) in all distros. I hope that this larval/acquire is not must for this series to be merged. And it is going to be implemented later as I'm assigned to work on this offload feature till feature complete :). Thanks
On Mon, Nov 21, 2022 at 03:21:45PM +0200, Leon Romanovsky wrote: > > The thing is that this SW acquire flow is a fraction case, as it applies > to locally generated traffic. A router can trigger an acquire on forwarded packets too. Without larvals this could quickly overwhelm the router. Cheers,
On Tue, Nov 22, 2022 at 12:29:12PM +0800, Herbert Xu wrote: > On Mon, Nov 21, 2022 at 03:21:45PM +0200, Leon Romanovsky wrote: > > > > The thing is that this SW acquire flow is a fraction case, as it applies > > to locally generated traffic. > > A router can trigger an acquire on forwarded packets too. Without > larvals this could quickly overwhelm the router. This series doesn't support tunnel mode yet. Maybe I was not clear, but I wanted to say what in eswitch case and tunnel mode, the packets will be handled purely by HW without raising into SW core. It is so called transparent IPsec, where all configuration is done on hypervisor, so VMs connected through eswitch will get already decrypted traffic which is routed through eswitch NIC logic without passing hypervisor data path. Steffen expected to see changes to acquire logic as part of this series and in my explanation, I tried to explain why it is not needed now and how will it be implemented later. Thanks > > Cheers, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Tue, Nov 22, 2022 at 08:27:48AM +0200, Leon Romanovsky wrote: > On Tue, Nov 22, 2022 at 12:29:12PM +0800, Herbert Xu wrote: > > On Mon, Nov 21, 2022 at 03:21:45PM +0200, Leon Romanovsky wrote: > > > > > > The thing is that this SW acquire flow is a fraction case, as it applies > > > to locally generated traffic. > > > > A router can trigger an acquire on forwarded packets too. Without > > larvals this could quickly overwhelm the router. > > This series doesn't support tunnel mode yet. It does not matter if tunnel or transport mode, acquires must work as expected. This is a fundamental concept of IPsec, there is no way to tell userspace that we don't support this. > Maybe I was not clear, but I wanted to say what in eswitch case and > tunnel mode, the packets will be handled purely by HW without raising > into SW core. Can you please explain why we need host interaction for transport, but not for tunnel mode? Staying away with HW policies and states from SW databases is what I wanted to have from the beginning. If that is possible for tunnel mode, it should be possible for transport mode too. And as said already, I want to see the full picture (transport + tunnel mode) before we merge it.
On Mon, Nov 21, 2022 at 03:01:42PM +0200, Leon Romanovsky wrote: > On Mon, Nov 21, 2022 at 01:43:49PM +0100, Steffen Klassert wrote: > > On Mon, Nov 21, 2022 at 02:02:52PM +0200, Leon Romanovsky wrote: > > > > > > I think that something like this will do the trick. > > > > > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > > > index 5076f9d7a752..d1c9ef857755 100644 > > > --- a/net/xfrm/xfrm_state.c > > > +++ b/net/xfrm/xfrm_state.c > > > @@ -1090,6 +1090,28 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x, > > > } > > > } > > > > > > +static bool xfrm_state_and_policy_mixed(struct xfrm_state *x, > > > + struct xfrm_policy *p) > > > +{ > > > + /* Packet offload: both policy and SA should be offloaded */ > > > + if (p->xdo.type == XFRM_DEV_OFFLOAD_PACKET && > > > + x->xso.type != XFRM_DEV_OFFLOAD_PACKET) > > > + return true; > > > + > > > + if (p->xdo.type != XFRM_DEV_OFFLOAD_PACKET && > > > + x->xso.type == XFRM_DEV_OFFLOAD_PACKET) > > > + return true; > > > + > > > + if (p->xdo.type != XFRM_DEV_OFFLOAD_PACKET) > > > + return false; > > > + > > > + /* Packet offload: both policy and SA should have same device */ > > > + if (p->xdo.dev != x->xso.dev) > > > + return true; > > > + > > > + return false; > > > +} > > > + > > > struct xfrm_state * > > > xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > > const struct flowi *fl, struct xfrm_tmpl *tmpl, > > > @@ -1147,7 +1169,8 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > > > > > found: > > > x = best; > > > - if (!x && !error && !acquire_in_progress) { > > > + if (!x && !error && !acquire_in_progress && > > > + pol->xdo.type != XFRM_DEV_OFFLOAD_PACKET) { > > > if (tmpl->id.spi && > > > (x0 = __xfrm_state_lookup(net, mark, daddr, tmpl->id.spi, > > > tmpl->id.proto, encap_family)) != NULL) { > > > @@ -1228,7 +1251,14 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > > *err = -EAGAIN; > > > x = NULL; > > > } > > > + if (x && xfrm_state_and_policy_mixed(x, pol)) { > > > + *err = -EINVAL; > > > + x = NULL; > > > > If policy and state do not match here, this means the lookup > > returned the wrong state. The correct state might still sit > > in the database. At this point, you should either have found > > a matching state, or no state at all. > > I check for "x" because of "x = NULL" above. This does not change the fact that the lookup returned the wrong state.
On Tue, Nov 22, 2022 at 02:00:02PM +0100, Steffen Klassert wrote: > On Tue, Nov 22, 2022 at 08:27:48AM +0200, Leon Romanovsky wrote: > > On Tue, Nov 22, 2022 at 12:29:12PM +0800, Herbert Xu wrote: > > > On Mon, Nov 21, 2022 at 03:21:45PM +0200, Leon Romanovsky wrote: > > > > > > > > The thing is that this SW acquire flow is a fraction case, as it applies > > > > to locally generated traffic. > > > > > > A router can trigger an acquire on forwarded packets too. Without > > > larvals this could quickly overwhelm the router. > > > > This series doesn't support tunnel mode yet. > > It does not matter if tunnel or transport mode, acquires must > work as expected. This is a fundamental concept of IPsec, there > is no way to tell userspace that we don't support this. > > > Maybe I was not clear, but I wanted to say what in eswitch case and > > tunnel mode, the packets will be handled purely by HW without raising > > into SW core. > > Can you please explain why we need host interaction for > transport, but not for tunnel mode? The main difference is that in transport mode, you must bring packet to the kernel in which you configured SA/policy. It means that we must ensure that such packets won't be checked again in SW because all packets (encrypted and not) pass XFRM logic. - wire -> RX NIC -> kernel -> XFRM stack (we need HW DB here to skip this stage) -> .... ... -> kernel -> XFRM stack (skip for HW SA/policies) -> TX NIC -> wire. In tunnel mode, we arrive to XFRM when nothing IPsec related is configured. - wire -> RX PF NIC -> eswitch NIC logic -> TX uplink NIC -> RX representors -> XFRM stack in VM (nothing configured here) -> kernel The most troublesome part is in TX path, where you must skip "double check" before NIC. This check doesn't exist in tunnel mode. In RX, there is also difference between modes due to how we are supposed to treat headers. Raed will add more details. > > And as said already, I want to see the full picture (transport > + tunnel mode) before we merge it. It looks like we already have this picture. Thanks
On Tue, Nov 22, 2022 at 02:10:02PM +0100, Steffen Klassert wrote: > On Mon, Nov 21, 2022 at 03:01:42PM +0200, Leon Romanovsky wrote: > > On Mon, Nov 21, 2022 at 01:43:49PM +0100, Steffen Klassert wrote: > > > On Mon, Nov 21, 2022 at 02:02:52PM +0200, Leon Romanovsky wrote: > > > > > > > > I think that something like this will do the trick. > > > > > > > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > > > > index 5076f9d7a752..d1c9ef857755 100644 > > > > --- a/net/xfrm/xfrm_state.c > > > > +++ b/net/xfrm/xfrm_state.c > > > > @@ -1090,6 +1090,28 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x, > > > > } > > > > } > > > > > > > > +static bool xfrm_state_and_policy_mixed(struct xfrm_state *x, > > > > + struct xfrm_policy *p) > > > > +{ > > > > + /* Packet offload: both policy and SA should be offloaded */ > > > > + if (p->xdo.type == XFRM_DEV_OFFLOAD_PACKET && > > > > + x->xso.type != XFRM_DEV_OFFLOAD_PACKET) > > > > + return true; > > > > + > > > > + if (p->xdo.type != XFRM_DEV_OFFLOAD_PACKET && > > > > + x->xso.type == XFRM_DEV_OFFLOAD_PACKET) > > > > + return true; > > > > + > > > > + if (p->xdo.type != XFRM_DEV_OFFLOAD_PACKET) > > > > + return false; > > > > + > > > > + /* Packet offload: both policy and SA should have same device */ > > > > + if (p->xdo.dev != x->xso.dev) > > > > + return true; > > > > + > > > > + return false; > > > > +} > > > > + > > > > struct xfrm_state * > > > > xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > > > const struct flowi *fl, struct xfrm_tmpl *tmpl, > > > > @@ -1147,7 +1169,8 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > > > > > > > found: > > > > x = best; > > > > - if (!x && !error && !acquire_in_progress) { > > > > + if (!x && !error && !acquire_in_progress && > > > > + pol->xdo.type != XFRM_DEV_OFFLOAD_PACKET) { > > > > if (tmpl->id.spi && > > > > (x0 = __xfrm_state_lookup(net, mark, daddr, tmpl->id.spi, > > > > tmpl->id.proto, encap_family)) != NULL) { > > > > @@ -1228,7 +1251,14 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > > > *err = -EAGAIN; > > > > x = NULL; > > > > } > > > > + if (x && xfrm_state_and_policy_mixed(x, pol)) { > > > > + *err = -EINVAL; > > > > + x = NULL; > > > > > > If policy and state do not match here, this means the lookup > > > returned the wrong state. The correct state might still sit > > > in the database. At this point, you should either have found > > > a matching state, or no state at all. > > > > I check for "x" because of "x = NULL" above. > > This does not change the fact that the lookup returned the wrong state. Steffen, but this is exactly why we added this check - to catch wrong states and configurations. If lookup was successful, we know that HW handles this packet, if lookup failed to find SA and we have HW policy, we should drop such packet. Thanks
On Tue, Nov 22, 2022 at 03:54:42PM +0200, Leon Romanovsky wrote: > On Tue, Nov 22, 2022 at 02:00:02PM +0100, Steffen Klassert wrote: > > On Tue, Nov 22, 2022 at 08:27:48AM +0200, Leon Romanovsky wrote: > > > On Tue, Nov 22, 2022 at 12:29:12PM +0800, Herbert Xu wrote: > > > > On Mon, Nov 21, 2022 at 03:21:45PM +0200, Leon Romanovsky wrote: > > > > Can you please explain why we need host interaction for > > transport, but not for tunnel mode? > > The main difference is that in transport mode, you must bring packet > to the kernel in which you configured SA/policy. It means that we must > ensure that such packets won't be checked again in SW because all packets > (encrypted and not) pass XFRM logic. > > - wire -> RX NIC -> kernel -> XFRM stack (we need HW DB here to skip this stage) -> .... > ... -> kernel -> XFRM stack (skip for HW SA/policies) -> TX NIC -> wire. > > In tunnel mode, we arrive to XFRM when nothing IPsec related is configured. > > - wire -> RX PF NIC -> eswitch NIC logic -> TX uplink NIC -> RX > representors -> XFRM stack in VM (nothing configured here) -> kernel Forget about eswitch, VM, etc. for a moment. I'm interested how the simplest possible tunnel mode cases will work. Forwarding: wire -> random NIC RX -> kernel -> IPsec tunnel offload NIC TX -> wire wire -> IPsec tunnel offload NIC RX -> kernel -> random NIC TX -> wire Local endpoints: Application -> kernel -> IPsec tunnel offload NIC TX -> wire wire -> IPsec tunnel offload NIC RX -> kernel -> Application These two must work, so how are these cases handled? If you can do more fancy things with tunnel mode and special NICs at TX and RX, that's fine but not absolutely required.
On Tue, Nov 22, 2022 at 03:57:43PM +0200, Leon Romanovsky wrote: > On Tue, Nov 22, 2022 at 02:10:02PM +0100, Steffen Klassert wrote: > > On Mon, Nov 21, 2022 at 03:01:42PM +0200, Leon Romanovsky wrote: > > > On Mon, Nov 21, 2022 at 01:43:49PM +0100, Steffen Klassert wrote: > > > > On Mon, Nov 21, 2022 at 02:02:52PM +0200, Leon Romanovsky wrote: > > > > > > > > If policy and state do not match here, this means the lookup > > > > returned the wrong state. The correct state might still sit > > > > in the database. At this point, you should either have found > > > > a matching state, or no state at all. > > > > > > I check for "x" because of "x = NULL" above. > > > > This does not change the fact that the lookup returned the wrong state. > > Steffen, but this is exactly why we added this check - to catch wrong > states and configurations. No, you have to adjust the lookup so that this can't happen. This is not a missconfiguration, The lookup found the wrong SA, this is a difference. Use the offload type and dev as a lookup key and don't consider SAs that don't match this in the lookup. This is really not too hard to do. The thing that could be a bit more difficult is that the lookup should be only adjusted when we really have HW policies installed. Otherwise this affects even systems that don't use this kind of offload.
On Wed, Nov 23, 2022 at 09:37:20AM +0100, Steffen Klassert wrote: > On Tue, Nov 22, 2022 at 03:57:43PM +0200, Leon Romanovsky wrote: > > On Tue, Nov 22, 2022 at 02:10:02PM +0100, Steffen Klassert wrote: > > > On Mon, Nov 21, 2022 at 03:01:42PM +0200, Leon Romanovsky wrote: > > > > On Mon, Nov 21, 2022 at 01:43:49PM +0100, Steffen Klassert wrote: > > > > > On Mon, Nov 21, 2022 at 02:02:52PM +0200, Leon Romanovsky wrote: > > > > > > > > > > If policy and state do not match here, this means the lookup > > > > > returned the wrong state. The correct state might still sit > > > > > in the database. At this point, you should either have found > > > > > a matching state, or no state at all. > > > > > > > > I check for "x" because of "x = NULL" above. > > > > > > This does not change the fact that the lookup returned the wrong state. > > > > Steffen, but this is exactly why we added this check - to catch wrong > > states and configurations. > > No, you have to adjust the lookup so that this can't happen. > This is not a missconfiguration, The lookup found the wrong > SA, this is a difference. > > Use the offload type and dev as a lookup key and don't consider > SAs that don't match this in the lookup. > > This is really not too hard to do. The thing that could be a bit > more difficult is that the lookup should be only adjusted when > we really have HW policies installed. Otherwise this affects > even systems that don't use this kind of offload. Thanks for an explanation, trying it now.
On Wed, Nov 23, 2022 at 09:23:58AM +0100, Steffen Klassert wrote: > On Tue, Nov 22, 2022 at 03:54:42PM +0200, Leon Romanovsky wrote: > > On Tue, Nov 22, 2022 at 02:00:02PM +0100, Steffen Klassert wrote: > > > On Tue, Nov 22, 2022 at 08:27:48AM +0200, Leon Romanovsky wrote: > > > > On Tue, Nov 22, 2022 at 12:29:12PM +0800, Herbert Xu wrote: > > > > > On Mon, Nov 21, 2022 at 03:21:45PM +0200, Leon Romanovsky wrote: > > > > > > Can you please explain why we need host interaction for > > > transport, but not for tunnel mode? > > > > The main difference is that in transport mode, you must bring packet > > to the kernel in which you configured SA/policy. It means that we must > > ensure that such packets won't be checked again in SW because all packets > > (encrypted and not) pass XFRM logic. > > > > - wire -> RX NIC -> kernel -> XFRM stack (we need HW DB here to skip this stage) -> .... > > ... -> kernel -> XFRM stack (skip for HW SA/policies) -> TX NIC -> wire. > > > > In tunnel mode, we arrive to XFRM when nothing IPsec related is configured. > > > > - wire -> RX PF NIC -> eswitch NIC logic -> TX uplink NIC -> RX > > representors -> XFRM stack in VM (nothing configured here) -> kernel > > Forget about eswitch, VM, etc. for a moment. I'm interested how the > simplest possible tunnel mode cases will work. > > Forwarding: > > wire -> random NIC RX -> kernel -> IPsec tunnel offload NIC TX -> wire > wire -> IPsec tunnel offload NIC RX -> kernel -> random NIC TX -> wire > > Local endpoints: > > Application -> kernel -> IPsec tunnel offload NIC TX -> wire > wire -> IPsec tunnel offload NIC RX -> kernel -> Application > > These two must work, so how are these cases handled? These two cases conceptually no different from transport modes. The difference is how HW handles IP packets. If packet comes from RX, it will be received as plain packet in the kernel. If packet goes to TX, it must be skipped in the XFRM. For all "wire -> IPsec tunnel offload NIC RX ...", everything works as you would expect. HW handles everything, and feeds the kernel with plain packet. These packets will have CRYPTO_DONE and status so they can skip all XFRM logic. All this complexity is For "... kernel -> IPsec tunnel offload NIC TX -> wire" flow. You need a way to say to the kernel that XFRM should be skipped. In TX path, we will need to perform neighbor resolution to fill proper MAC address for outer IP header. In RX path, once the packet is decrypted, there is a need to change MAC address for the inner IP header. This will be done by kernel as HW doesn't have such knowledge. Of course, there are many possible implementations of how to have right MAC address (static during SA creations, or dynamic if we listen to ARP events), but it is not XFRM related. Thanks > > If you can do more fancy things with tunnel mode and special NICs > at TX and RX, that's fine but not absolutely required.
On Wed, Nov 23, 2022 at 11:36:19AM +0200, Leon Romanovsky wrote: > On Wed, Nov 23, 2022 at 09:37:20AM +0100, Steffen Klassert wrote: > > On Tue, Nov 22, 2022 at 03:57:43PM +0200, Leon Romanovsky wrote: > > > On Tue, Nov 22, 2022 at 02:10:02PM +0100, Steffen Klassert wrote: > > > > On Mon, Nov 21, 2022 at 03:01:42PM +0200, Leon Romanovsky wrote: > > > > > On Mon, Nov 21, 2022 at 01:43:49PM +0100, Steffen Klassert wrote: > > > > > > On Mon, Nov 21, 2022 at 02:02:52PM +0200, Leon Romanovsky wrote: > > > > > > > > > > > > If policy and state do not match here, this means the lookup > > > > > > returned the wrong state. The correct state might still sit > > > > > > in the database. At this point, you should either have found > > > > > > a matching state, or no state at all. > > > > > > > > > > I check for "x" because of "x = NULL" above. > > > > > > > > This does not change the fact that the lookup returned the wrong state. > > > > > > Steffen, but this is exactly why we added this check - to catch wrong > > > states and configurations. > > > > No, you have to adjust the lookup so that this can't happen. > > This is not a missconfiguration, The lookup found the wrong > > SA, this is a difference. > > > > Use the offload type and dev as a lookup key and don't consider > > SAs that don't match this in the lookup. > > > > This is really not too hard to do. The thing that could be a bit > > more difficult is that the lookup should be only adjusted when > > we really have HW policies installed. Otherwise this affects > > even systems that don't use this kind of offload. > > Thanks for an explanation, trying it now. Something like that? The code is untested yet. diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 5076f9d7a752..5819023c32ba 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1115,6 +1115,19 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, rcu_read_lock(); h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family); hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h, bydst) { + if (IS_ENABLED(CONFIG_XFRM_OFFLOAD) && + pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { + if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) + /* HW states are in the head of list, there is no need + * to iterate further. + */ + break; + + /* Packet offload: both policy and SA should have same device */ + if (pol->xdo.dev != x->xso.dev) + continue; + } + if (x->props.family == encap_family && x->props.reqid == tmpl->reqid && (mark & x->mark.m) == x->mark.v && @@ -1132,6 +1145,19 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, h_wildcard = xfrm_dst_hash(net, daddr, &saddr_wildcard, tmpl->reqid, encap_family); hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h_wildcard, bydst) { + if (IS_ENABLED(CONFIG_XFRM_OFFLOAD) && + pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { + if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) + /* HW states are in the head of list, there is no need + * to iterate further. + */ + break; + + /* Packet offload: both policy and SA should have same device */ + if (pol->xdo.dev != x->xso.dev) + continue; + } + if (x->props.family == encap_family && x->props.reqid == tmpl->reqid && (mark & x->mark.m) == x->mark.v && @@ -1185,6 +1211,17 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, goto out; } + if (pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { + memcpy(&x->xso, &pol->xdo, sizeof(x->xso)); + error = pol->xdo.dev->xfrmdev_ops->xdo_dev_state_add(x); + if (error) { + x->km.state = XFRM_STATE_DEAD; + to_put = x; + x = NULL; + goto out; + } + } + if (km_query(x, tmpl, pol) == 0) { spin_lock_bh(&net->xfrm.xfrm_state_lock); x->km.state = XFRM_STATE_ACQ;
On Wed, Nov 23, 2022 at 02:53:10PM +0200, Leon Romanovsky wrote: > On Wed, Nov 23, 2022 at 11:36:19AM +0200, Leon Romanovsky wrote: > > Thanks for an explanation, trying it now. > > Something like that? Yes :) > > The code is untested yet. > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > index 5076f9d7a752..5819023c32ba 100644 > --- a/net/xfrm/xfrm_state.c > +++ b/net/xfrm/xfrm_state.c > @@ -1115,6 +1115,19 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > rcu_read_lock(); > h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family); > hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h, bydst) { > + if (IS_ENABLED(CONFIG_XFRM_OFFLOAD) && > + pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { Please try to avoid that check for every state in the list. Maybe enable this code with a static key if packet offload is used? > + if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) > + /* HW states are in the head of list, there is no need > + * to iterate further. > + */ > + break; > + > + /* Packet offload: both policy and SA should have same device */ > + if (pol->xdo.dev != x->xso.dev) > + continue; > + } > + > if (x->props.family == encap_family && > x->props.reqid == tmpl->reqid && > (mark & x->mark.m) == x->mark.v && > @@ -1132,6 +1145,19 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > h_wildcard = xfrm_dst_hash(net, daddr, &saddr_wildcard, tmpl->reqid, encap_family); > hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h_wildcard, bydst) { > + if (IS_ENABLED(CONFIG_XFRM_OFFLOAD) && > + pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { > + if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) > + /* HW states are in the head of list, there is no need > + * to iterate further. > + */ > + break; > + > + /* Packet offload: both policy and SA should have same device */ > + if (pol->xdo.dev != x->xso.dev) > + continue; > + } > + > if (x->props.family == encap_family && > x->props.reqid == tmpl->reqid && > (mark & x->mark.m) == x->mark.v && > @@ -1185,6 +1211,17 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > goto out; > } > > + if (pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { > + memcpy(&x->xso, &pol->xdo, sizeof(x->xso)); > + error = pol->xdo.dev->xfrmdev_ops->xdo_dev_state_add(x); > + if (error) { > + x->km.state = XFRM_STATE_DEAD; > + to_put = x; > + x = NULL; > + goto out; > + } > + } I guess that is to handle acquires, right? What is the idea behind that? xdo_dev_state_add sets offload type and dev?
On Thu, Nov 24, 2022 at 12:07:48PM +0100, Steffen Klassert wrote: > On Wed, Nov 23, 2022 at 02:53:10PM +0200, Leon Romanovsky wrote: > > On Wed, Nov 23, 2022 at 11:36:19AM +0200, Leon Romanovsky wrote: > > > Thanks for an explanation, trying it now. > > > > Something like that? > > Yes :) Great, will send proper version on Sunday. > > > > > The code is untested yet. > > > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > > index 5076f9d7a752..5819023c32ba 100644 > > --- a/net/xfrm/xfrm_state.c > > +++ b/net/xfrm/xfrm_state.c > > @@ -1115,6 +1115,19 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > rcu_read_lock(); > > h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family); > > hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h, bydst) { > > + if (IS_ENABLED(CONFIG_XFRM_OFFLOAD) && > > + pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { > > Please try to avoid that check for every state in the list. > Maybe enable this code with a static key if packet offload > is used? I assumed that it will be optimized by compiled because "pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET)" is constant here. I'll take a look for more fancy solutions, but I have serious doubts if they give any benefits. > > > + if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) > > + /* HW states are in the head of list, there is no need > > + * to iterate further. > > + */ > > + break; > > + > > + /* Packet offload: both policy and SA should have same device */ > > + if (pol->xdo.dev != x->xso.dev) > > + continue; > > + } > > + > > if (x->props.family == encap_family && > > x->props.reqid == tmpl->reqid && > > (mark & x->mark.m) == x->mark.v && > > @@ -1132,6 +1145,19 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > > > h_wildcard = xfrm_dst_hash(net, daddr, &saddr_wildcard, tmpl->reqid, encap_family); > > hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h_wildcard, bydst) { > > + if (IS_ENABLED(CONFIG_XFRM_OFFLOAD) && > > + pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { > > + if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) > > + /* HW states are in the head of list, there is no need > > + * to iterate further. > > + */ > > + break; > > + > > + /* Packet offload: both policy and SA should have same device */ > > + if (pol->xdo.dev != x->xso.dev) > > + continue; > > + } > > + > > if (x->props.family == encap_family && > > x->props.reqid == tmpl->reqid && > > (mark & x->mark.m) == x->mark.v && > > @@ -1185,6 +1211,17 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, > > goto out; > > } > > > > + if (pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { > > + memcpy(&x->xso, &pol->xdo, sizeof(x->xso)); > > + error = pol->xdo.dev->xfrmdev_ops->xdo_dev_state_add(x); > > + if (error) { > > + x->km.state = XFRM_STATE_DEAD; > > + to_put = x; > > + x = NULL; > > + goto out; > > + } > > + } > > I guess that is to handle acquires, right? Yes > What is the idea behind that? In previous patches, I made sure that policy and SA uses same struct xfrm_dev_offload and set fields exactly the same. This line sets all properties:: memcpy(&x->xso, &pol->xdo, sizeof(x->xso)); And .xdo_dev_state_add() gets everything pre-configured. But yes, it will be different in final patch to make sure that offload_handle is cleared and dev_tracker is valid. Thanks
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 06226942a152..93a4a9149f8c 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -536,7 +536,7 @@ static void xfrm_dst_hash_transfer(struct net *net, __get_hash_thresh(net, pol->family, dir, &dbits, &sbits); h = __addr_hash(&pol->selector.daddr, &pol->selector.saddr, pol->family, nhashmask, dbits, sbits); - if (!entry0) { + if (!entry0 || pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { hlist_del_rcu(&pol->bydst); hlist_add_head_rcu(&pol->bydst, ndsttable + h); h0 = h; @@ -867,7 +867,7 @@ static void xfrm_policy_inexact_list_reinsert(struct net *net, break; } - if (newpos) + if (newpos && policy->xdo.type != XFRM_DEV_OFFLOAD_PACKET) hlist_add_behind_rcu(&policy->bydst, newpos); else hlist_add_head_rcu(&policy->bydst, &n->hhead); @@ -1348,7 +1348,7 @@ static void xfrm_hash_rebuild(struct work_struct *work) else break; } - if (newpos) + if (newpos && policy->xdo.type != XFRM_DEV_OFFLOAD_PACKET) hlist_add_behind_rcu(&policy->bydst, newpos); else hlist_add_head_rcu(&policy->bydst, chain); @@ -1525,7 +1525,7 @@ static void xfrm_policy_insert_inexact_list(struct hlist_head *chain, break; } - if (newpos) + if (newpos && policy->xdo.type != XFRM_DEV_OFFLOAD_PACKET) hlist_add_behind_rcu(&policy->bydst_inexact_list, newpos); else hlist_add_head_rcu(&policy->bydst_inexact_list, chain); @@ -1562,9 +1562,12 @@ static struct xfrm_policy *xfrm_policy_insert_list(struct hlist_head *chain, break; } - if (newpos) + if (newpos && policy->xdo.type != XFRM_DEV_OFFLOAD_PACKET) hlist_add_behind_rcu(&policy->bydst, &newpos->bydst); else + /* Packet offload policies enter to the head + * to speed-up lookups. + */ hlist_add_head_rcu(&policy->bydst, chain); return delpol; @@ -2181,6 +2184,9 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type, break; } } + if (ret && ret->xdo.type == XFRM_DEV_OFFLOAD_PACKET) + goto skip_inexact; + bin = xfrm_policy_inexact_lookup_rcu(net, type, family, dir, if_id); if (!bin || !xfrm_policy_find_inexact_candidates(&cand, bin, saddr, daddr)) diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 3d2fe7712ac5..cfc8c72b173d 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -84,6 +84,25 @@ static unsigned int xfrm_seq_hash(struct net *net, u32 seq) return __xfrm_seq_hash(seq, net->xfrm.state_hmask); } +#define XFRM_STATE_INSERT(by, _n, _h, _type) \ + { \ + struct xfrm_state *_x = NULL; \ + \ + if (_type != XFRM_DEV_OFFLOAD_PACKET) { \ + hlist_for_each_entry_rcu(_x, _h, by) { \ + if (_x->xso.type == XFRM_DEV_OFFLOAD_PACKET) \ + continue; \ + break; \ + } \ + } \ + \ + if (!_x || _x->xso.type == XFRM_DEV_OFFLOAD_PACKET) \ + /* SAD is empty or consist from HW SAs only */ \ + hlist_add_head_rcu(_n, _h); \ + else \ + hlist_add_before_rcu(_n, &_x->by); \ + } + static void xfrm_hash_transfer(struct hlist_head *list, struct hlist_head *ndsttable, struct hlist_head *nsrctable, @@ -100,23 +119,25 @@ static void xfrm_hash_transfer(struct hlist_head *list, h = __xfrm_dst_hash(&x->id.daddr, &x->props.saddr, x->props.reqid, x->props.family, nhashmask); - hlist_add_head_rcu(&x->bydst, ndsttable + h); + XFRM_STATE_INSERT(bydst, &x->bydst, ndsttable + h, x->xso.type); h = __xfrm_src_hash(&x->id.daddr, &x->props.saddr, x->props.family, nhashmask); - hlist_add_head_rcu(&x->bysrc, nsrctable + h); + XFRM_STATE_INSERT(bysrc, &x->bysrc, nsrctable + h, x->xso.type); if (x->id.spi) { h = __xfrm_spi_hash(&x->id.daddr, x->id.spi, x->id.proto, x->props.family, nhashmask); - hlist_add_head_rcu(&x->byspi, nspitable + h); + XFRM_STATE_INSERT(byspi, &x->byspi, nspitable + h, + x->xso.type); } if (x->km.seq) { h = __xfrm_seq_hash(x->km.seq, nhashmask); - hlist_add_head_rcu(&x->byseq, nseqtable + h); + XFRM_STATE_INSERT(byseq, &x->byseq, nseqtable + h, + x->xso.type); } } } @@ -1166,16 +1187,24 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, spin_lock_bh(&net->xfrm.xfrm_state_lock); x->km.state = XFRM_STATE_ACQ; list_add(&x->km.all, &net->xfrm.state_all); - hlist_add_head_rcu(&x->bydst, net->xfrm.state_bydst + h); + XFRM_STATE_INSERT(bydst, &x->bydst, + net->xfrm.state_bydst + h, + x->xso.type); h = xfrm_src_hash(net, daddr, saddr, encap_family); - hlist_add_head_rcu(&x->bysrc, net->xfrm.state_bysrc + h); + XFRM_STATE_INSERT(bysrc, &x->bysrc, + net->xfrm.state_bysrc + h, + x->xso.type); if (x->id.spi) { h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, encap_family); - hlist_add_head_rcu(&x->byspi, net->xfrm.state_byspi + h); + XFRM_STATE_INSERT(byspi, &x->byspi, + net->xfrm.state_byspi + h, + x->xso.type); } if (x->km.seq) { h = xfrm_seq_hash(net, x->km.seq); - hlist_add_head_rcu(&x->byseq, net->xfrm.state_byseq + h); + XFRM_STATE_INSERT(byseq, &x->byseq, + net->xfrm.state_byseq + h, + x->xso.type); } x->lft.hard_add_expires_seconds = net->xfrm.sysctl_acq_expires; hrtimer_start(&x->mtimer, @@ -1280,22 +1309,26 @@ static void __xfrm_state_insert(struct xfrm_state *x) h = xfrm_dst_hash(net, &x->id.daddr, &x->props.saddr, x->props.reqid, x->props.family); - hlist_add_head_rcu(&x->bydst, net->xfrm.state_bydst + h); + XFRM_STATE_INSERT(bydst, &x->bydst, net->xfrm.state_bydst + h, + x->xso.type); h = xfrm_src_hash(net, &x->id.daddr, &x->props.saddr, x->props.family); - hlist_add_head_rcu(&x->bysrc, net->xfrm.state_bysrc + h); + XFRM_STATE_INSERT(bysrc, &x->bysrc, net->xfrm.state_bysrc + h, + x->xso.type); if (x->id.spi) { h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, x->props.family); - hlist_add_head_rcu(&x->byspi, net->xfrm.state_byspi + h); + XFRM_STATE_INSERT(byspi, &x->byspi, net->xfrm.state_byspi + h, + x->xso.type); } if (x->km.seq) { h = xfrm_seq_hash(net, x->km.seq); - hlist_add_head_rcu(&x->byseq, net->xfrm.state_byseq + h); + XFRM_STATE_INSERT(byseq, &x->byseq, net->xfrm.state_byseq + h, + x->xso.type); } hrtimer_start(&x->mtimer, ktime_set(1, 0), HRTIMER_MODE_REL_SOFT); @@ -1409,9 +1442,11 @@ static struct xfrm_state *__find_acq_core(struct net *net, ktime_set(net->xfrm.sysctl_acq_expires, 0), HRTIMER_MODE_REL_SOFT); list_add(&x->km.all, &net->xfrm.state_all); - hlist_add_head_rcu(&x->bydst, net->xfrm.state_bydst + h); + XFRM_STATE_INSERT(bydst, &x->bydst, net->xfrm.state_bydst + h, + x->xso.type); h = xfrm_src_hash(net, daddr, saddr, family); - hlist_add_head_rcu(&x->bysrc, net->xfrm.state_bysrc + h); + XFRM_STATE_INSERT(bysrc, &x->bysrc, net->xfrm.state_bysrc + h, + x->xso.type); net->xfrm.state_num++; @@ -2085,7 +2120,8 @@ int xfrm_alloc_spi(struct xfrm_state *x, u32 low, u32 high) spin_lock_bh(&net->xfrm.xfrm_state_lock); x->id.spi = newspi; h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, x->props.family); - hlist_add_head_rcu(&x->byspi, net->xfrm.state_byspi + h); + XFRM_STATE_INSERT(byspi, &x->byspi, net->xfrm.state_byspi + h, + x->xso.type); spin_unlock_bh(&net->xfrm.xfrm_state_lock); err = 0;