diff mbox series

[v3] RDMA/core: Fix race when resolving IP address

Message ID 1562584584-13132-1-git-send-email-dag.moxnes@oracle.com (mailing list archive)
State Superseded
Delegated to: Jason Gunthorpe
Headers show
Series [v3] RDMA/core: Fix race when resolving IP address | expand

Commit Message

Dag Moxnes July 8, 2019, 11:16 a.m. UTC
Use neighbour lock when copying MAC address from neighbour data struct
in dst_fetch_ha.

When not using the lock, it is possible for the function to race with
neigh_update, causing it to copy an invalid MAC address.

It is possible to provoke this error by calling rdma_resolve_addr in a
tight loop, while deleting the corresponding ARP entry in another tight
loop.

This will cause the race shown it the following sample trace:

rdma_resolve_addr()
  rdma_resolve_ip()
    addr_resolve()
      addr_resolve_neigh()
        fetch_ha()
          dst_fetch_ha()
            n->nud_state == NUD_VALID

and

net_ioctl()
  arp_ioctl()
    arp_rec_delete()
      arp_invalidate()
        neigh_update()
          __neigh_update()
            neigh->nud_state = new;

Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
---
v1 -> v2:
   * Modified implementation to improve readability

v2 -> v3:
   * Added sample trace as suggested by Parav Pandit
   * Added Reviewed-by


---
 drivers/infiniband/core/addr.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Jason Gunthorpe July 8, 2019, 5:50 p.m. UTC | #1
On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
> Use neighbour lock when copying MAC address from neighbour data struct
> in dst_fetch_ha.
> 
> When not using the lock, it is possible for the function to race with
> neigh_update, causing it to copy an invalid MAC address.
> 
> It is possible to provoke this error by calling rdma_resolve_addr in a
> tight loop, while deleting the corresponding ARP entry in another tight
> loop.
> 
> This will cause the race shown it the following sample trace:
> 
> rdma_resolve_addr()
>   rdma_resolve_ip()
>     addr_resolve()
>       addr_resolve_neigh()
>         fetch_ha()
>           dst_fetch_ha()
>             n->nud_state == NUD_VALID

It isn't nud_state that is the problem here, it is the parallel
memcpy's onto ha. I fixed the commit message

This could also have been solved by using the ha_lock, but I don't
think we have a reason to particularly over-optimize this.

>  drivers/infiniband/core/addr.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Applied to for-next, thanks

Jason
Dag Moxnes July 8, 2019, 6:47 p.m. UTC | #2
Thanks Jason,

Regards,
Dag

Den 08.07.2019 19:50, skrev Jason Gunthorpe:
> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
>> Use neighbour lock when copying MAC address from neighbour data struct
>> in dst_fetch_ha.
>>
>> When not using the lock, it is possible for the function to race with
>> neigh_update, causing it to copy an invalid MAC address.
>>
>> It is possible to provoke this error by calling rdma_resolve_addr in a
>> tight loop, while deleting the corresponding ARP entry in another tight
>> loop.
>>
>> This will cause the race shown it the following sample trace:
>>
>> rdma_resolve_addr()
>>    rdma_resolve_ip()
>>      addr_resolve()
>>        addr_resolve_neigh()
>>          fetch_ha()
>>            dst_fetch_ha()
>>              n->nud_state == NUD_VALID
> It isn't nud_state that is the problem here, it is the parallel
> memcpy's onto ha. I fixed the commit message
>
> This could also have been solved by using the ha_lock, but I don't
> think we have a reason to particularly over-optimize this.
>
>>   drivers/infiniband/core/addr.c | 9 ++++++---
>>   1 file changed, 6 insertions(+), 3 deletions(-)
> Applied to for-next, thanks
>
> Jason
Mark Bloch July 8, 2019, 7:22 p.m. UTC | #3
On 7/8/19 11:47 AM, Dag Moxnes wrote:
> Thanks Jason,
> 
> Regards,
> Dag
> 
> Den 08.07.2019 19:50, skrev Jason Gunthorpe:
>> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
>>> Use neighbour lock when copying MAC address from neighbour data struct
>>> in dst_fetch_ha.
>>>
>>> When not using the lock, it is possible for the function to race with
>>> neigh_update, causing it to copy an invalid MAC address.
>>>
>>> It is possible to provoke this error by calling rdma_resolve_addr in a
>>> tight loop, while deleting the corresponding ARP entry in another tight
>>> loop.
>>>
>>> This will cause the race shown it the following sample trace:
>>>
>>> rdma_resolve_addr()
>>>    rdma_resolve_ip()
>>>      addr_resolve()
>>>        addr_resolve_neigh()
>>>          fetch_ha()
>>>            dst_fetch_ha()
>>>              n->nud_state == NUD_VALID
>> It isn't nud_state that is the problem here, it is the parallel
>> memcpy's onto ha. I fixed the commit message
>>
>> This could also have been solved by using the ha_lock, but I don't
>> think we have a reason to particularly over-optimize this.

Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()?

>>
>>>   drivers/infiniband/core/addr.c | 9 ++++++---
>>>   1 file changed, 6 insertions(+), 3 deletions(-)
>> Applied to for-next, thanks
>>
>> Jason
> 

Mark
Jason Gunthorpe July 8, 2019, 7:38 p.m. UTC | #4
On Mon, Jul 08, 2019 at 07:22:45PM +0000, Mark Bloch wrote:
> 
> 
> On 7/8/19 11:47 AM, Dag Moxnes wrote:
> > Thanks Jason,
> > 
> > Regards,
> > Dag
> > 
> > Den 08.07.2019 19:50, skrev Jason Gunthorpe:
> >> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
> >>> Use neighbour lock when copying MAC address from neighbour data struct
> >>> in dst_fetch_ha.
> >>>
> >>> When not using the lock, it is possible for the function to race with
> >>> neigh_update, causing it to copy an invalid MAC address.
> >>>
> >>> It is possible to provoke this error by calling rdma_resolve_addr in a
> >>> tight loop, while deleting the corresponding ARP entry in another tight
> >>> loop.
> >>>
> >>> This will cause the race shown it the following sample trace:
> >>>
> >>> rdma_resolve_addr()
> >>>    rdma_resolve_ip()
> >>>      addr_resolve()
> >>>        addr_resolve_neigh()
> >>>          fetch_ha()
> >>>            dst_fetch_ha()
> >>>              n->nud_state == NUD_VALID
> >> It isn't nud_state that is the problem here, it is the parallel
> >> memcpy's onto ha. I fixed the commit message
> >>
> >> This could also have been solved by using the ha_lock, but I don't
> >> think we have a reason to particularly over-optimize this.
> 
> Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()?

Yes, that is much better, please respin this

Thanks,
Jason
Dag Moxnes July 8, 2019, 8:11 p.m. UTC | #5
Den 08.07.2019 21:38, skrev Jason Gunthorpe:
> On Mon, Jul 08, 2019 at 07:22:45PM +0000, Mark Bloch wrote:
>>
>> On 7/8/19 11:47 AM, Dag Moxnes wrote:
>>> Thanks Jason,
>>>
>>> Regards,
>>> Dag
>>>
>>> Den 08.07.2019 19:50, skrev Jason Gunthorpe:
>>>> On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
>>>>> Use neighbour lock when copying MAC address from neighbour data struct
>>>>> in dst_fetch_ha.
>>>>>
>>>>> When not using the lock, it is possible for the function to race with
>>>>> neigh_update, causing it to copy an invalid MAC address.
>>>>>
>>>>> It is possible to provoke this error by calling rdma_resolve_addr in a
>>>>> tight loop, while deleting the corresponding ARP entry in another tight
>>>>> loop.
>>>>>
>>>>> This will cause the race shown it the following sample trace:
>>>>>
>>>>> rdma_resolve_addr()
>>>>>     rdma_resolve_ip()
>>>>>       addr_resolve()
>>>>>         addr_resolve_neigh()
>>>>>           fetch_ha()
>>>>>             dst_fetch_ha()
>>>>>               n->nud_state == NUD_VALID
>>>> It isn't nud_state that is the problem here, it is the parallel
>>>> memcpy's onto ha. I fixed the commit message
>>>>
>>>> This could also have been solved by using the ha_lock, but I don't
>>>> think we have a reason to particularly over-optimize this.
>> Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()?
> Yes, that is much better, please respin this
OK, will do!
Can I still post it as a v4? Or should I do it differently as you 
already applied it?

Regards,
-Dag
>
> Thanks,
> Jason
Jason Gunthorpe July 8, 2019, 11:05 p.m. UTC | #6
On Mon, Jul 08, 2019 at 10:11:29PM +0200, Dag Moxnes wrote:
> 
> 
> Den 08.07.2019 21:38, skrev Jason Gunthorpe:
> > On Mon, Jul 08, 2019 at 07:22:45PM +0000, Mark Bloch wrote:
> > > 
> > > On 7/8/19 11:47 AM, Dag Moxnes wrote:
> > > > Thanks Jason,
> > > > 
> > > > Regards,
> > > > Dag
> > > > 
> > > > Den 08.07.2019 19:50, skrev Jason Gunthorpe:
> > > > > On Mon, Jul 08, 2019 at 01:16:24PM +0200, Dag Moxnes wrote:
> > > > > > Use neighbour lock when copying MAC address from neighbour data struct
> > > > > > in dst_fetch_ha.
> > > > > > 
> > > > > > When not using the lock, it is possible for the function to race with
> > > > > > neigh_update, causing it to copy an invalid MAC address.
> > > > > > 
> > > > > > It is possible to provoke this error by calling rdma_resolve_addr in a
> > > > > > tight loop, while deleting the corresponding ARP entry in another tight
> > > > > > loop.
> > > > > > 
> > > > > > This will cause the race shown it the following sample trace:
> > > > > > 
> > > > > > rdma_resolve_addr()
> > > > > >     rdma_resolve_ip()
> > > > > >       addr_resolve()
> > > > > >         addr_resolve_neigh()
> > > > > >           fetch_ha()
> > > > > >             dst_fetch_ha()
> > > > > >               n->nud_state == NUD_VALID
> > > > > It isn't nud_state that is the problem here, it is the parallel
> > > > > memcpy's onto ha. I fixed the commit message
> > > > > 
> > > > > This could also have been solved by using the ha_lock, but I don't
> > > > > think we have a reason to particularly over-optimize this.
> > > Sorry I'm late to the party, but why not just use: neigh_ha_snapshot()?
> > Yes, that is much better, please respin this
> OK, will do!
> Can I still post it as a v4? Or should I do it differently as you already
> applied it?

post a v4

Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 2f7d141598..51323ffbc5 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -333,11 +333,14 @@  static int dst_fetch_ha(const struct dst_entry *dst,
 	if (!n)
 		return -ENODATA;
 
-	if (!(n->nud_state & NUD_VALID)) {
+	read_lock_bh(&n->lock);
+	if (n->nud_state & NUD_VALID) {
+		memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
+		read_unlock_bh(&n->lock);
+	} else {
+		read_unlock_bh(&n->lock);
 		neigh_event_send(n, NULL);
 		ret = -ENODATA;
-	} else {
-		memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
 	}
 
 	neigh_release(n);