Message ID | 1562584584-13132-1-git-send-email-dag.moxnes@oracle.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Jason Gunthorpe |
Headers | show |
Series | [v3] RDMA/core: Fix race when resolving IP address | expand |
On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote: > Use neighbour lock when copying MAC address from neighbour data struct > in dst_fetch_ha. > > When not using the lock, it is possible for the function to race with > neigh_update, causing it to copy an invalid MAC address. > > It is possible to provoke this error by calling rdma_resolve_addr in a > tight loop, while deleting the corresponding ARP entry in another tight > loop. > > This will cause the race shown it the following sample trace: > > rdma_resolve_addr() > rdma_resolve_ip() > addr_resolve() > addr_resolve_neigh() > fetch_ha() > dst_fetch_ha() > n->nud_state == NUD_VALID It isn't nud_state that is the problem here, it is the parallel memcpy's onto ha. I fixed the commit message This could also have been solved by using the ha_lock, but I don't think we have a reason to particularly over-optimize this. > drivers/infiniband/core/addr.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) Applied to for-next, thanks Jason
Thanks Jason, Regards, Dag Den 08.07.2019 19:50, skrev Jason Gunthorpe: > On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote: >> Use neighbour lock when copying MAC address from neighbour data struct >> in dst_fetch_ha. >> >> When not using the lock, it is possible for the function to race with >> neigh_update, causing it to copy an invalid MAC address. >> >> It is possible to provoke this error by calling rdma_resolve_addr in a >> tight loop, while deleting the corresponding ARP entry in another tight >> loop. >> >> This will cause the race shown it the following sample trace: >> >> rdma_resolve_addr() >> rdma_resolve_ip() >> addr_resolve() >> addr_resolve_neigh() >> fetch_ha() >> dst_fetch_ha() >> n->nud_state == NUD_VALID > It isn't nud_state that is the problem here, it is the parallel > memcpy's onto ha. I fixed the commit message > > This could also have been solved by using the ha_lock, but I don't > think we have a reason to particularly over-optimize this. > >> drivers/infiniband/core/addr.c | 9 ++++++--- >> 1 file changed, 6 insertions(+), 3 deletions(-) > Applied to for-next, thanks > > Jason
On 7/8/19 11:47 AM, Dag Moxnes wrote: > Thanks Jason, > > Regards, > Dag > > Den 08.07.2019 19:50, skrev Jason Gunthorpe: >> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote: >>> Use neighbour lock when copying MAC address from neighbour data struct >>> in dst_fetch_ha. >>> >>> When not using the lock, it is possible for the function to race with >>> neigh_update, causing it to copy an invalid MAC address. >>> >>> It is possible to provoke this error by calling rdma_resolve_addr in a >>> tight loop, while deleting the corresponding ARP entry in another tight >>> loop. >>> >>> This will cause the race shown it the following sample trace: >>> >>> rdma_resolve_addr() >>> rdma_resolve_ip() >>> addr_resolve() >>> addr_resolve_neigh() >>> fetch_ha() >>> dst_fetch_ha() >>> n->nud_state == NUD_VALID >> It isn't nud_state that is the problem here, it is the parallel >> memcpy's onto ha. I fixed the commit message >> >> This could also have been solved by using the ha_lock, but I don't >> think we have a reason to particularly over-optimize this. Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()? >> >>> drivers/infiniband/core/addr.c | 9 ++++++--- >>> 1 file changed, 6 insertions(+), 3 deletions(-) >> Applied to for-next, thanks >> >> Jason > Mark
On Mon, Jul 08, 2019 at 07:22:45PM +0000, Mark Bloch wrote: > > > On 7/8/19 11:47 AM, Dag Moxnes wrote: > > Thanks Jason, > > > > Regards, > > Dag > > > > Den 08.07.2019 19:50, skrev Jason Gunthorpe: > >> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote: > >>> Use neighbour lock when copying MAC address from neighbour data struct > >>> in dst_fetch_ha. > >>> > >>> When not using the lock, it is possible for the function to race with > >>> neigh_update, causing it to copy an invalid MAC address. > >>> > >>> It is possible to provoke this error by calling rdma_resolve_addr in a > >>> tight loop, while deleting the corresponding ARP entry in another tight > >>> loop. > >>> > >>> This will cause the race shown it the following sample trace: > >>> > >>> rdma_resolve_addr() > >>> rdma_resolve_ip() > >>> addr_resolve() > >>> addr_resolve_neigh() > >>> fetch_ha() > >>> dst_fetch_ha() > >>> n->nud_state == NUD_VALID > >> It isn't nud_state that is the problem here, it is the parallel > >> memcpy's onto ha. I fixed the commit message > >> > >> This could also have been solved by using the ha_lock, but I don't > >> think we have a reason to particularly over-optimize this. > > Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()? Yes, that is much better, please respin this Thanks, Jason
Den 08.07.2019 21:38, skrev Jason Gunthorpe: > On Mon, Jul 08, 2019 at 07:22:45PM +0000, Mark Bloch wrote: >> >> On 7/8/19 11:47 AM, Dag Moxnes wrote: >>> Thanks Jason, >>> >>> Regards, >>> Dag >>> >>> Den 08.07.2019 19:50, skrev Jason Gunthorpe: >>>> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote: >>>>> Use neighbour lock when copying MAC address from neighbour data struct >>>>> in dst_fetch_ha. >>>>> >>>>> When not using the lock, it is possible for the function to race with >>>>> neigh_update, causing it to copy an invalid MAC address. >>>>> >>>>> It is possible to provoke this error by calling rdma_resolve_addr in a >>>>> tight loop, while deleting the corresponding ARP entry in another tight >>>>> loop. >>>>> >>>>> This will cause the race shown it the following sample trace: >>>>> >>>>> rdma_resolve_addr() >>>>> rdma_resolve_ip() >>>>> addr_resolve() >>>>> addr_resolve_neigh() >>>>> fetch_ha() >>>>> dst_fetch_ha() >>>>> n->nud_state == NUD_VALID >>>> It isn't nud_state that is the problem here, it is the parallel >>>> memcpy's onto ha. I fixed the commit message >>>> >>>> This could also have been solved by using the ha_lock, but I don't >>>> think we have a reason to particularly over-optimize this. >> Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()? > Yes, that is much better, please respin this OK, will do! Can I still post it as a v4? Or should I do it differently as you already applied it? Regards, -Dag > > Thanks, > Jason
On Mon, Jul 08, 2019 at 10:11:29PM +0200, Dag Moxnes wrote: > > > Den 08.07.2019 21:38, skrev Jason Gunthorpe: > > On Mon, Jul 08, 2019 at 07:22:45PM +0000, Mark Bloch wrote: > > > > > > On 7/8/19 11:47 AM, Dag Moxnes wrote: > > > > Thanks Jason, > > > > > > > > Regards, > > > > Dag > > > > > > > > Den 08.07.2019 19:50, skrev Jason Gunthorpe: > > > > > On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote: > > > > > > Use neighbour lock when copying MAC address from neighbour data struct > > > > > > in dst_fetch_ha. > > > > > > > > > > > > When not using the lock, it is possible for the function to race with > > > > > > neigh_update, causing it to copy an invalid MAC address. > > > > > > > > > > > > It is possible to provoke this error by calling rdma_resolve_addr in a > > > > > > tight loop, while deleting the corresponding ARP entry in another tight > > > > > > loop. > > > > > > > > > > > > This will cause the race shown it the following sample trace: > > > > > > > > > > > > rdma_resolve_addr() > > > > > > rdma_resolve_ip() > > > > > > addr_resolve() > > > > > > addr_resolve_neigh() > > > > > > fetch_ha() > > > > > > dst_fetch_ha() > > > > > > n->nud_state == NUD_VALID > > > > > It isn't nud_state that is the problem here, it is the parallel > > > > > memcpy's onto ha. I fixed the commit message > > > > > > > > > > This could also have been solved by using the ha_lock, but I don't > > > > > think we have a reason to particularly over-optimize this. > > > Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()? > > Yes, that is much better, please respin this > OK, will do! > Can I still post it as a v4? Or should I do it differently as you already > applied it? post a v4 Jason
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index 2f7d141598..51323ffbc5 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -333,11 +333,14 @@ static int dst_fetch_ha(const struct dst_entry *dst, if (!n) return -ENODATA; - if (!(n->nud_state & NUD_VALID)) { + read_lock_bh(&n->lock); + if (n->nud_state & NUD_VALID) { + memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN); + read_unlock_bh(&n->lock); + } else { + read_unlock_bh(&n->lock); neigh_event_send(n, NULL); ret = -ENODATA; - } else { - memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN); } neigh_release(n);