diff mbox

xen-netfront crash when detaching network while some network activity

Message ID 20160122192343.GL31058@mail-itl (mailing list archive)
State New, archived
Headers show

Commit Message

Marek Marczykowski-Górecki Jan. 22, 2016, 7:23 p.m. UTC
On Thu, Jan 21, 2016 at 12:30:48PM +0000, Joao Martins wrote:
> 
> 
> On 01/20/2016 09:59 PM, Konrad Rzeszutek Wilk wrote:
> > On Tue, Dec 01, 2015 at 11:32:58PM +0100, Marek Marczykowski-Górecki wrote:
> >> On Tue, Dec 01, 2015 at 05:00:42PM -0500, Konrad Rzeszutek Wilk wrote:
> >>> On Tue, Nov 17, 2015 at 03:45:15AM +0100, Marek Marczykowski-Górecki wrote:
> >>>> On Wed, Oct 21, 2015 at 08:57:34PM +0200, Marek Marczykowski-Górecki wrote:
> >>>>> On Wed, May 27, 2015 at 12:03:12AM +0200, Marek Marczykowski-Górecki wrote:
> >>>>>> On Tue, May 26, 2015 at 11:56:00AM +0100, David Vrabel wrote:
> >>>>>>> On 22/05/15 12:49, Marek Marczykowski-Górecki wrote:
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> I'm experiencing xen-netfront crash when doing xl network-detach while
> >>>>>>>> some network activity is going on at the same time. It happens only when
> >>>>>>>> domU has more than one vcpu. Not sure if this matters, but the backend
> >>>>>>>> is in another domU (not dom0). I'm using Xen 4.2.2. It happens on kernel
> >>>>>>>> 3.9.4 and 4.1-rc1 as well.
> >>>>>>>>
> >>>>>>>> Steps to reproduce:
> >>>>>>>> 1. Start the domU with some network interface
> >>>>>>>> 2. Call there 'ping -f some-IP'
> >>>>>>>> 3. Call 'xl network-detach NAME 0'
> >>>
> >>> Do you see this all the time or just on occassions?
> >>
> >> Using above procedure - all the time.
> >>
> >>> I tried to reproduce it and couldn't see it. Is your VM an PV or HVM?
> >>
> >> PV, started by libvirt. This may have something to do, the problem didn't
> >> existed on older Xen (4.1) and started by xl. I'm not sure about kernel
> >> version there, but I think I've tried there 3.18 too, which has this
> >> problem.
> >>
> >> But I don't see anything special in domU config file (neither backend
> >> nor frontend) - it may be some libvirt default. If that's really the
> >> cause. Can I (and how) get any useful information about that?
> > 
> > libvirt naturally does some libxl calls, and they may be different.
> > 
> > Any chance you could give me an idea of:
> >  - What commands you use in libvirt?
> >  - Do you use a bond or bridge?
> >  - What version of libvirt you are using?
> > 
> > Thanks!
> > CC-ing Joao just in case he has seen this.
> >>
> Hm, So far I couldn't reproduce the issue with upstream Xen/linux/libvirt, using
> both libvirt or plain xl (both on a bridge setup) and also irrespective of the
> both load and direction of traffic (be it a ping flood, pktgen with min.
> sized packets or iperf).

I've ran the test again, on vanilla 4.4 and collected some info:
 - xenstore dump of frontend (xs-frontend-before.txt)
 - xenstore dump of backend (xs-backend-before.txt)
 - kernel messages (console output) (console.log)
 - kernel config (config-4.4)
 - libvirt config of that domain (netdebug.conf)

Versions:
 - kernel 4.4 (frontend), 4.2.8 (backend)
 - libvirt 1.2.20
 - xen 4.6.0

In backend domain there is no bridge or anything like that - only
routing. The same in frontend - nothing fancy - just IP set on eth0
there.

Steps to reproduce were the same:
 - start frontend domain (virsh create ...)
 - call ping -f
 - xl network-detach NAME 0

Note that the crash doesn't happen with attached patch applied (as noted
in mail on Oct 21), but I have no idea whether is it a proper fix, or
just prevents the crash by a coincidence.
diff mbox

Patch

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index f821a97..a5efbb0 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1065,9 +1069,10 @@  static void xennet_release_tx_bufs(struct netfront_queue *queue)
 
 		skb = queue->tx_skbs[i].skb;
 		get_page(queue->grant_tx_page[i]);
-		gnttab_end_foreign_access(queue->grant_tx_ref[i],
-					  GNTMAP_readonly,
-					  (unsigned long)page_address(queue->grant_tx_page[i]));
+		gnttab_end_foreign_access_ref(
+				queue->grant_tx_ref[i], GNTMAP_readonly);
+		gnttab_release_grant_reference(
+				&queue->gref_tx_head, queue->grant_tx_ref[i]);
 		queue->grant_tx_page[i] = NULL;
 		queue->grant_tx_ref[i] = GRANT_INVALID_REF;
 		add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, i);