[net-next,15/16] gve: DQO: Add TX path

Message ID 20210624180632.3659809-16-bcf@google.com (mailing list archive)
State Accepted
Commit a57e5de476be0b4b7f42beb6a21c19ad9c577aa3
Delegated to: Netdev Maintainers
Headers show
Series gve: Introduce DQO descriptor format

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count fail Series longer than 15 patches
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success
netdev/cc_maintainers warning 3 maintainers not CCed: sagis@google.com jonolson@google.com kuba@kernel.org
netdev/source_inline fail Was 0 now: 1
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit fail Errors and warnings before: 0 this patch: 3
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success
netdev/checkpatch warning WARNING: line length of 88 exceeds 80 columns
netdev/build_allmodconfig_warn fail Errors and warnings before: 0 this patch: 1
netdev/header_inline success

Commit Message

Bailey Forrest June 24, 2021, 6:06 p.m. UTC
TX SKBs will have their buffers DMA-mapped for the device. Each buffer
will have at least one TX descriptor associated with it. Each SKB will
also have a metadata descriptor.
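
For a concrete picture, a TSO SKB with two fragments occupies five ring
slots, assuming each buffer fits within GVE_TX_MAX_BUF_SIZE_DQO (this
layout follows gve_tx_add_skb_no_copy_dqo() in the patch below):

	tso_ctx      - TSO context descriptor (GSO packets only)
	general_ctx  - general context (metadata) descriptor, one per SKB
	pkt          - linear portion of the SKB
	pkt          - fragment 0
	pkt          - fragment 1, end_of_packet = 1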

Each TX queue maintains an array of `gve_tx_pending_packet_dqo` objects.
Every TX SKB will have an associated pending_packet object. A TX SKB's
descriptors will use its pending_packet's index as the completion tag,
which will be returned on the TX completion queue.
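
Condensed from the patch below, the tag round-trip looks like this
(allocation checks and error handling omitted):

	/* TX path: the pending_packet's array index doubles as the
	 * completion tag placed in each of the SKB's descriptors.
	 */
	pending_packet = gve_alloc_pending_packet(tx);
	completion_tag = pending_packet - tx->dqo.pending_packets;

	/* Completion path: the tag echoed by the device maps straight
	 * back to the pending_packet to complete.
	 */
	compl_tag = le16_to_cpu(compl_desc->completion_tag);
	pending_packet = &tx->dqo.pending_packets[compl_tag];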

The device implements a "flow-miss model". Most packets simply receive
a packet completion. The flow-miss system may choose to process a
packet based on its contents; a TX packet which experiences a flow miss
instead receives a miss completion followed by a later reinjection
completion. The miss completion arrives when the flow-miss system
starts processing the packet, and the reinjection completion arrives
when the flow-miss system finishes processing the packet and sends it
on the wire.
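
In terms of the packet states introduced by this patch (names
abbreviated from GVE_PACKET_STATE_*), the two sequences are:

	/* Normal:    PENDING_DATA_COMPL --(packet compl)--> freed
	 *
	 * Flow miss: PENDING_DATA_COMPL --(miss compl)--> PENDING_REINJECT_COMPL
	 *            PENDING_REINJECT_COMPL --(reinject compl)--> freed
	 */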

Notable mentions:

- Buffers may be freed after receiving the miss completion, but in order
  to avoid packet reordering, we do not complete the SKB until receiving
  the reinjection completion.

- The driver must robustly handle the unlikely scenario where a miss
  completion does not have an associated reinjection completion. This is
  accomplished by maintaining a list of packets that have a pending
  reinjection completion. After a short timeout (5 seconds), the
  SKB and buffers are released and the pending_packet is moved to a
  second list with a longer timeout (60 seconds), where the
  pending_packet will not be reused. When the longer timeout elapses,
  the driver may assume the reinjection completion will never be
  received and the pending_packet may be reused.

- Completion handling is triggered by an interrupt and is done in the
  NAPI poll function. Because the TX path and completion handling run
  in different threading contexts, each maintains its own list of free
  pending_packet objects. The TX path uses a lock-free approach to steal
  the list from the completion path (see the condensed sketch after
  this list).

- Both the TSO context and general context descriptors have metadata
  bytes. The device requires that if multiple descriptors contain the
  same field, each descriptor must have the same value set for that
  field.
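
A condensed sketch of that lock-free hand-off, taken from
gve_alloc_pending_packet() and gve_free_pending_packet() in the patch
below:

	/* TX path: steal the completion handler's whole free list in
	 * one atomic exchange.
	 */
	if (tx->dqo_tx.free_pending_packets == -1)
		tx->dqo_tx.free_pending_packets =
			atomic_xchg(&tx->dqo_compl.free_pending_packets, -1);

	/* Completion path: push a freed entry back with a CAS loop. */
	do {
		old_head = atomic_read_acquire(&tx->dqo_compl.free_pending_packets);
		pending_packet->next = old_head;
	} while (atomic_cmpxchg(&tx->dqo_compl.free_pending_packets,
				old_head, index) != old_head);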

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
---
 drivers/net/ethernet/google/gve/gve_dqo.h    |  12 +
 drivers/net/ethernet/google/gve/gve_tx_dqo.c | 819 ++++++++++++++++++-
 2 files changed, 829 insertions(+), 2 deletions(-)

Comments

kernel test robot June 24, 2021, 9:55 p.m. UTC | #1
Hi Bailey,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on 35713d9b8f090d7a226e4aaeeb742265cde33c82]

url:    https://github.com/0day-ci/linux/commits/Bailey-Forrest/gve-Introduce-DQO-descriptor-format/20210625-021110
base:   35713d9b8f090d7a226e4aaeeb742265cde33c82
config: i386-randconfig-a011-20210622 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/af0833aafca5d9abd931a16ee9e761e85f5ad965
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Bailey-Forrest/gve-Introduce-DQO-descriptor-format/20210625-021110
        git checkout af0833aafca5d9abd931a16ee9e761e85f5ad965
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/google/gve/gve_tx_dqo.c: In function 'gve_tx_clean_pending_packets':
   drivers/net/ethernet/google/gve/gve_tx_dqo.c:88:27: warning: unused variable 'buf' [-Wunused-variable]
      88 |    struct gve_tx_dma_buf *buf = &cur_state->bufs[j];
         |                           ^~~
   drivers/net/ethernet/google/gve/gve_tx_dqo.c: In function 'gve_tx_add_skb_no_copy_dqo':
   drivers/net/ethernet/google/gve/gve_tx_dqo.c:496:26: warning: unused variable 'buf' [-Wunused-variable]
     496 |   struct gve_tx_dma_buf *buf =
         |                          ^~~
   drivers/net/ethernet/google/gve/gve_tx_dqo.c:515:26: warning: unused variable 'buf' [-Wunused-variable]
     515 |   struct gve_tx_dma_buf *buf =
         |                          ^~~
   drivers/net/ethernet/google/gve/gve_tx_dqo.c:556:26: warning: unused variable 'buf' [-Wunused-variable]
     556 |   struct gve_tx_dma_buf *buf = &pending_packet->bufs[i];
         |                          ^~~
   drivers/net/ethernet/google/gve/gve_tx_dqo.c: In function 'remove_from_list':
>> drivers/net/ethernet/google/gve/gve_tx_dqo.c:730:6: warning: variable 'index' set but not used [-Wunused-but-set-variable]
     730 |  s16 index, prev_index, next_index;
         |      ^~~~~
   drivers/net/ethernet/google/gve/gve_tx_dqo.c: In function 'gve_unmap_packet':
>> drivers/net/ethernet/google/gve/gve_tx_dqo.c:753:25: warning: variable 'buf' set but not used [-Wunused-but-set-variable]
     753 |  struct gve_tx_dma_buf *buf;
         |                         ^~~
   In file included from include/linux/printk.h:7,
                    from include/linux/kernel.h:17,
                    from arch/x86/include/asm/percpu.h:27,
                    from arch/x86/include/asm/current.h:6,
                    from include/linux/sched.h:12,
                    from include/linux/ratelimit.h:6,
                    from include/linux/dev_printk.h:16,
                    from include/linux/device.h:15,
                    from include/linux/dma-mapping.h:7,
                    from drivers/net/ethernet/google/gve/gve.h:10,
                    from drivers/net/ethernet/google/gve/gve_tx_dqo.c:7:
   drivers/net/ethernet/google/gve/gve_tx_dqo.c: In function 'remove_miss_completions':
>> include/linux/kern_levels.h:5:18: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'int' [-Wformat=]
       5 | #define KERN_SOH "\001"  /* ASCII Start Of Header */
         |                  ^~~~~~
   include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
      11 | #define KERN_ERR KERN_SOH "3" /* error conditions */
         |                  ^~~~~~~~
   include/linux/printk.h:343:9: note: in expansion of macro 'KERN_ERR'
     343 |  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         |         ^~~~~~~~
   include/linux/net.h:247:3: note: in expansion of macro 'pr_err'
     247 |   function(__VA_ARGS__);    \
         |   ^~~~~~~~
   include/linux/net.h:257:2: note: in expansion of macro 'net_ratelimited_function'
     257 |  net_ratelimited_function(pr_err, fmt, ##__VA_ARGS__)
         |  ^~~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/google/gve/gve_tx_dqo.c:893:3: note: in expansion of macro 'net_err_ratelimited'
     893 |   net_err_ratelimited("%s: No reinjection completion was received for: %ld.\n",
         |   ^~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/google/gve/gve_tx_dqo.c:893:74: note: format string is defined here
     893 |   net_err_ratelimited("%s: No reinjection completion was received for: %ld.\n",
         |                                                                        ~~^
         |                                                                          |
         |                                                                          long int
         |                                                                        %d


vim +/index +730 drivers/net/ethernet/google/gve/gve_tx_dqo.c

   447	
   448	/* Returns 0 on success, or < 0 on error.
   449	 *
   450	 * Before this function is called, the caller must ensure
   451	 * gve_has_pending_packet(tx) returns true.
   452	 */
   453	static int gve_tx_add_skb_no_copy_dqo(struct gve_tx_ring *tx,
   454					      struct sk_buff *skb)
   455	{
   456		const struct skb_shared_info *shinfo = skb_shinfo(skb);
   457		const bool is_gso = skb_is_gso(skb);
   458		u32 desc_idx = tx->dqo_tx.tail;
   459	
   460		struct gve_tx_pending_packet_dqo *pending_packet;
   461		struct gve_tx_metadata_dqo metadata;
   462		s16 completion_tag;
   463		int i;
   464	
   465		pending_packet = gve_alloc_pending_packet(tx);
   466		pending_packet->skb = skb;
   467		pending_packet->num_bufs = 0;
   468		completion_tag = pending_packet - tx->dqo.pending_packets;
   469	
   470		gve_extract_tx_metadata_dqo(skb, &metadata);
   471		if (is_gso) {
   472			int header_len = gve_prep_tso(skb);
   473	
   474			if (unlikely(header_len < 0))
   475				goto err;
   476	
   477			gve_tx_fill_tso_ctx_desc(&tx->dqo.tx_ring[desc_idx].tso_ctx,
   478						 skb, &metadata, header_len);
   479			desc_idx = (desc_idx + 1) & tx->mask;
   480		}
   481	
   482		gve_tx_fill_general_ctx_desc(&tx->dqo.tx_ring[desc_idx].general_ctx,
   483					     &metadata);
   484		desc_idx = (desc_idx + 1) & tx->mask;
   485	
   486		/* Note: HW requires that the size of a non-TSO packet be within the
   487		 * range of [17, 9728].
   488		 *
   489		 * We don't double check because
   490		 * - We limited `netdev->min_mtu` to ETH_MIN_MTU.
   491		 * - Hypervisor won't allow MTU larger than 9216.
   492		 */
   493	
   494		/* Map the linear portion of skb */
   495		{
   496			struct gve_tx_dma_buf *buf =
   497				&pending_packet->bufs[pending_packet->num_bufs];
   498			u32 len = skb_headlen(skb);
   499			dma_addr_t addr;
   500	
   501			addr = dma_map_single(tx->dev, skb->data, len, DMA_TO_DEVICE);
   502			if (unlikely(dma_mapping_error(tx->dev, addr)))
   503				goto err;
   504	
   505			dma_unmap_len_set(buf, len, len);
   506			dma_unmap_addr_set(buf, dma, addr);
   507			++pending_packet->num_bufs;
   508	
   509			gve_tx_fill_pkt_desc_dqo(tx, &desc_idx, skb, len, addr,
   510						 completion_tag,
   511						 /*eop=*/shinfo->nr_frags == 0, is_gso);
   512		}
   513	
   514		for (i = 0; i < shinfo->nr_frags; i++) {
   515			struct gve_tx_dma_buf *buf =
   516				&pending_packet->bufs[pending_packet->num_bufs];
   517			const skb_frag_t *frag = &shinfo->frags[i];
   518			bool is_eop = i == (shinfo->nr_frags - 1);
   519			u32 len = skb_frag_size(frag);
   520			dma_addr_t addr;
   521	
   522			addr = skb_frag_dma_map(tx->dev, frag, 0, len, DMA_TO_DEVICE);
   523			if (unlikely(dma_mapping_error(tx->dev, addr)))
   524				goto err;
   525	
   526			dma_unmap_len_set(buf, len, len);
   527			dma_unmap_addr_set(buf, dma, addr);
   528			++pending_packet->num_bufs;
   529	
   530			gve_tx_fill_pkt_desc_dqo(tx, &desc_idx, skb, len, addr,
   531						 completion_tag, is_eop, is_gso);
   532		}
   533	
   534		/* Commit the changes to our state */
   535		tx->dqo_tx.tail = desc_idx;
   536	
   537		/* Request a descriptor completion on the last descriptor of the
   538		 * packet if we are allowed to by the HW enforced interval.
   539		 */
   540		{
   541			u32 last_desc_idx = (desc_idx - 1) & tx->mask;
   542			u32 last_report_event_interval =
   543				(last_desc_idx - tx->dqo_tx.last_re_idx) & tx->mask;
   544	
   545			if (unlikely(last_report_event_interval >=
   546				     GVE_TX_MIN_RE_INTERVAL)) {
   547				tx->dqo.tx_ring[last_desc_idx].pkt.report_event = true;
   548				tx->dqo_tx.last_re_idx = last_desc_idx;
   549			}
   550		}
   551	
   552		return 0;
   553	
   554	err:
   555		for (i = 0; i < pending_packet->num_bufs; i++) {
 > 556			struct gve_tx_dma_buf *buf = &pending_packet->bufs[i];
   557	
   558			if (i == 0) {
   559				dma_unmap_single(tx->dev, dma_unmap_addr(buf, dma),
   560						 dma_unmap_len(buf, len),
   561						 DMA_TO_DEVICE);
   562			} else {
   563				dma_unmap_page(tx->dev, dma_unmap_addr(buf, dma),
   564					       dma_unmap_len(buf, len), DMA_TO_DEVICE);
   565			}
   566		}
   567	
   568		pending_packet->skb = NULL;
   569		pending_packet->num_bufs = 0;
   570		gve_free_pending_packet(tx, pending_packet);
   571	
   572		return -1;
   573	}
   574	
   575	static int gve_num_descs_per_buf(size_t size)
   576	{
   577		return DIV_ROUND_UP(size, GVE_TX_MAX_BUF_SIZE_DQO);
   578	}
   579	
   580	static int gve_num_buffer_descs_needed(const struct sk_buff *skb)
   581	{
   582		const struct skb_shared_info *shinfo = skb_shinfo(skb);
   583		int num_descs;
   584		int i;
   585	
   586		num_descs = gve_num_descs_per_buf(skb_headlen(skb));
   587	
   588		for (i = 0; i < shinfo->nr_frags; i++) {
   589			unsigned int frag_size = skb_frag_size(&shinfo->frags[i]);
   590	
   591			num_descs += gve_num_descs_per_buf(frag_size);
   592		}
   593	
   594		return num_descs;
   595	}
   596	
   597	/* Returns true if HW is capable of sending TSO represented by `skb`.
   598	 *
   599	 * Each segment must not span more than GVE_TX_MAX_DATA_DESCS buffers.
   600	 * - The header is counted as one buffer for every single segment.
   601	 * - A buffer which is split between two segments is counted for both.
   602	 * - If a buffer contains both header and payload, it is counted as two buffers.
   603	 */
   604	static bool gve_can_send_tso(const struct sk_buff *skb)
   605	{
   606		const int header_len = skb_checksum_start_offset(skb) + tcp_hdrlen(skb);
   607		const int max_bufs_per_seg = GVE_TX_MAX_DATA_DESCS - 1;
   608		const struct skb_shared_info *shinfo = skb_shinfo(skb);
   609		const int gso_size = shinfo->gso_size;
   610		int cur_seg_num_bufs;
   611		int cur_seg_size;
   612		int i;
   613	
   614		cur_seg_size = skb_headlen(skb) - header_len;
   615		cur_seg_num_bufs = cur_seg_size > 0;
   616	
   617		for (i = 0; i < shinfo->nr_frags; i++) {
   618			if (cur_seg_size >= gso_size) {
   619				cur_seg_size %= gso_size;
   620				cur_seg_num_bufs = cur_seg_size > 0;
   621			}
   622	
   623			if (unlikely(++cur_seg_num_bufs > max_bufs_per_seg))
   624				return false;
   625	
   626			cur_seg_size += skb_frag_size(&shinfo->frags[i]);
   627		}
   628	
   629		return true;
   630	}
   631	
   632	/* Attempt to transmit specified SKB.
   633	 *
   634	 * Returns 0 if the SKB was transmitted or dropped.
   635	 * Returns -1 if there is not currently enough space to transmit the SKB.
   636	 */
   637	static int gve_try_tx_skb(struct gve_priv *priv, struct gve_tx_ring *tx,
   638				  struct sk_buff *skb)
   639	{
   640		int num_buffer_descs;
   641		int total_num_descs;
   642	
   643		if (skb_is_gso(skb)) {
   644			/* If TSO doesn't meet HW requirements, attempt to linearize the
   645			 * packet.
   646			 */
   647			if (unlikely(!gve_can_send_tso(skb) &&
   648				     skb_linearize(skb) < 0)) {
   649				net_err_ratelimited("%s: Failed to transmit TSO packet\n",
   650						    priv->dev->name);
   651				goto drop;
   652			}
   653	
   654			num_buffer_descs = gve_num_buffer_descs_needed(skb);
   655		} else {
   656			num_buffer_descs = gve_num_buffer_descs_needed(skb);
   657	
   658			if (unlikely(num_buffer_descs > GVE_TX_MAX_DATA_DESCS)) {
   659				if (unlikely(skb_linearize(skb) < 0))
   660					goto drop;
   661	
   662				num_buffer_descs = 1;
   663			}
   664		}
   665	
   666		/* Metadata + (optional TSO) + data descriptors. */
   667		total_num_descs = 1 + skb_is_gso(skb) + num_buffer_descs;
   668		if (unlikely(gve_maybe_stop_tx_dqo(tx, total_num_descs +
   669				GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP))) {
   670			return -1;
   671		}
   672	
   673		if (unlikely(gve_tx_add_skb_no_copy_dqo(tx, skb) < 0))
   674			goto drop;
   675	
   676		netdev_tx_sent_queue(tx->netdev_txq, skb->len);
   677		skb_tx_timestamp(skb);
   678		return 0;
   679	
   680	drop:
   681		tx->dropped_pkt++;
   682		dev_kfree_skb_any(skb);
   683		return 0;
   684	}
   685	
   686	/* Transmit a given skb and ring the doorbell. */
   687	netdev_tx_t gve_tx_dqo(struct sk_buff *skb, struct net_device *dev)
   688	{
   689		struct gve_priv *priv = netdev_priv(dev);
   690		struct gve_tx_ring *tx;
   691	
   692		tx = &priv->tx[skb_get_queue_mapping(skb)];
   693		if (unlikely(gve_try_tx_skb(priv, tx, skb) < 0)) {
   694			/* We need to ring the txq doorbell -- we have stopped the Tx
   695			 * queue for want of resources, but prior calls to gve_tx()
   696			 * may have added descriptors without ringing the doorbell.
   697			 */
   698			gve_tx_put_doorbell_dqo(priv, tx->q_resources, tx->dqo_tx.tail);
   699			return NETDEV_TX_BUSY;
   700		}
   701	
   702		if (!netif_xmit_stopped(tx->netdev_txq) && netdev_xmit_more())
   703			return NETDEV_TX_OK;
   704	
   705		gve_tx_put_doorbell_dqo(priv, tx->q_resources, tx->dqo_tx.tail);
   706		return NETDEV_TX_OK;
   707	}
   708	
   709	static void add_to_list(struct gve_tx_ring *tx, struct gve_index_list *list,
   710				struct gve_tx_pending_packet_dqo *pending_packet)
   711	{
   712		s16 old_tail, index;
   713	
   714		index = pending_packet - tx->dqo.pending_packets;
   715		old_tail = list->tail;
   716		list->tail = index;
   717		if (old_tail == -1)
   718			list->head = index;
   719		else
   720			tx->dqo.pending_packets[old_tail].next = index;
   721	
   722		pending_packet->next = -1;
   723		pending_packet->prev = old_tail;
   724	}
   725	
   726	static void remove_from_list(struct gve_tx_ring *tx,
   727				     struct gve_index_list *list,
   728				     struct gve_tx_pending_packet_dqo *pending_packet)
   729	{
 > 730		s16 index, prev_index, next_index;
   731	
   732		index = pending_packet - tx->dqo.pending_packets;
   733		prev_index = pending_packet->prev;
   734		next_index = pending_packet->next;
   735	
   736		if (prev_index == -1) {
   737			/* Node is head */
   738			list->head = next_index;
   739		} else {
   740			tx->dqo.pending_packets[prev_index].next = next_index;
   741		}
   742		if (next_index == -1) {
   743			/* Node is tail */
   744			list->tail = prev_index;
   745		} else {
   746			tx->dqo.pending_packets[next_index].prev = prev_index;
   747		}
   748	}
   749	
   750	static void gve_unmap_packet(struct device *dev,
   751				     struct gve_tx_pending_packet_dqo *pending_packet)
   752	{
 > 753		struct gve_tx_dma_buf *buf;
   754		int i;
   755	
   756		/* SKB linear portion is guaranteed to be mapped */
   757		buf = &pending_packet->bufs[0];
   758		dma_unmap_single(dev, dma_unmap_addr(buf, dma),
   759				 dma_unmap_len(buf, len), DMA_TO_DEVICE);
   760		for (i = 1; i < pending_packet->num_bufs; i++) {
   761			buf = &pending_packet->bufs[i];
   762			dma_unmap_page(dev, dma_unmap_addr(buf, dma),
   763				       dma_unmap_len(buf, len), DMA_TO_DEVICE);
   764		}
   765		pending_packet->num_bufs = 0;
   766	}
   767	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
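
The -Wformat warning above is triggered because the difference of two
pointers has type ptrdiff_t, which is 'int' on 32-bit targets while
%ld expects 'long'. One portable fix would be an explicit cast, e.g.:

	net_err_ratelimited("%s: No reinjection completion was received for: %d.\n",
			    priv->dev->name,
			    (int)(pending_packet - tx->dqo.pending_packets));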

Patch

diff --git a/drivers/net/ethernet/google/gve/gve_dqo.h b/drivers/net/ethernet/google/gve/gve_dqo.h
index 3b300223ea15..836042364124 100644
--- a/drivers/net/ethernet/google/gve/gve_dqo.h
+++ b/drivers/net/ethernet/google/gve/gve_dqo.h
@@ -19,6 +19,18 @@ 
 #define GVE_TX_IRQ_RATELIMIT_US_DQO 50
 #define GVE_RX_IRQ_RATELIMIT_US_DQO 20
 
+/* Timeout in seconds to wait for a reinjection completion after receiving
+ * its corresponding miss completion.
+ */
+#define GVE_REINJECT_COMPL_TIMEOUT 1
+
+/* Timeout in seconds to deallocate the completion tag for a packet that was
+ * prematurely freed for not receiving a valid completion. This should be large
+ * enough to rule out the possibility of receiving the corresponding valid
+ * completion after this interval.
+ */
+#define GVE_DEALLOCATE_COMPL_TIMEOUT 60
+
 netdev_tx_t gve_tx_dqo(struct sk_buff *skb, struct net_device *dev);
 bool gve_tx_poll_dqo(struct gve_notify_block *block, bool do_clean);
 int gve_rx_poll_dqo(struct gve_notify_block *block, int budget);
diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
index bde8f90ac8bd..a4906b9df540 100644
--- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
@@ -12,6 +12,67 @@ 
 #include <linux/slab.h>
 #include <linux/skbuff.h>
 
+/* Returns true if a gve_tx_pending_packet_dqo object is available. */
+static bool gve_has_pending_packet(struct gve_tx_ring *tx)
+{
+	/* Check TX path's list. */
+	if (tx->dqo_tx.free_pending_packets != -1)
+		return true;
+
+	/* Check completion handler's list. */
+	if (atomic_read_acquire(&tx->dqo_compl.free_pending_packets) != -1)
+		return true;
+
+	return false;
+}
+
+static struct gve_tx_pending_packet_dqo *
+gve_alloc_pending_packet(struct gve_tx_ring *tx)
+{
+	struct gve_tx_pending_packet_dqo *pending_packet;
+	s16 index;
+
+	index = tx->dqo_tx.free_pending_packets;
+
+	/* No pending_packets available, try to steal the list from the
+	 * completion handler.
+	 */
+	if (unlikely(index == -1)) {
+		tx->dqo_tx.free_pending_packets =
+			atomic_xchg(&tx->dqo_compl.free_pending_packets, -1);
+		index = tx->dqo_tx.free_pending_packets;
+
+		if (unlikely(index == -1))
+			return NULL;
+	}
+
+	pending_packet = &tx->dqo.pending_packets[index];
+
+	/* Remove pending_packet from free list */
+	tx->dqo_tx.free_pending_packets = pending_packet->next;
+	pending_packet->state = GVE_PACKET_STATE_PENDING_DATA_COMPL;
+
+	return pending_packet;
+}
+
+static void
+gve_free_pending_packet(struct gve_tx_ring *tx,
+			struct gve_tx_pending_packet_dqo *pending_packet)
+{
+	s16 index = pending_packet - tx->dqo.pending_packets;
+
+	pending_packet->state = GVE_PACKET_STATE_UNALLOCATED;
+	while (true) {
+		s16 old_head = atomic_read_acquire(&tx->dqo_compl.free_pending_packets);
+
+		pending_packet->next = old_head;
+		if (atomic_cmpxchg(&tx->dqo_compl.free_pending_packets,
+				   old_head, index) == old_head) {
+			break;
+		}
+	}
+}
+
 /* gve_tx_free_desc - Cleans up all pending tx requests and buffers.
  */
 static void gve_tx_clean_pending_packets(struct gve_tx_ring *tx)
@@ -199,18 +260,772 @@  void gve_tx_free_rings_dqo(struct gve_priv *priv)
 	}
 }
 
+/* Returns the number of slots available in the ring */
+static inline u32 num_avail_tx_slots(const struct gve_tx_ring *tx)
+{
+	u32 num_used = (tx->dqo_tx.tail - tx->dqo_tx.head) & tx->mask;
+
+	return tx->mask - num_used;
+}
+
+/* Stops the queue if available descriptors is less than 'count'.
+ * Return: 0 if stop is not required.
+ */
+static int gve_maybe_stop_tx_dqo(struct gve_tx_ring *tx, int count)
+{
+	if (likely(gve_has_pending_packet(tx) &&
+		   num_avail_tx_slots(tx) >= count))
+		return 0;
+
+	/* Update cached TX head pointer */
+	tx->dqo_tx.head = atomic_read_acquire(&tx->dqo_compl.hw_tx_head);
+
+	if (likely(gve_has_pending_packet(tx) &&
+		   num_avail_tx_slots(tx) >= count))
+		return 0;
+
+	/* No space, so stop the queue */
+	tx->stop_queue++;
+	netif_tx_stop_queue(tx->netdev_txq);
+
+	/* Sync with restarting queue in `gve_tx_poll_dqo()` */
+	mb();
+
+	/* After stopping queue, check if we can transmit again in order to
+	 * avoid TOCTOU bug.
+	 */
+	tx->dqo_tx.head = atomic_read_acquire(&tx->dqo_compl.hw_tx_head);
+
+	if (likely(!gve_has_pending_packet(tx) ||
+		   num_avail_tx_slots(tx) < count))
+		return -EBUSY;
+
+	netif_tx_start_queue(tx->netdev_txq);
+	tx->wake_queue++;
+	return 0;
+}
+
+static void gve_extract_tx_metadata_dqo(const struct sk_buff *skb,
+					struct gve_tx_metadata_dqo *metadata)
+{
+	memset(metadata, 0, sizeof(*metadata));
+	metadata->version = GVE_TX_METADATA_VERSION_DQO;
+
+	if (skb->l4_hash) {
+		u16 path_hash = skb->hash ^ (skb->hash >> 16);
+
+		path_hash &= (1 << 15) - 1;
+		if (unlikely(path_hash == 0))
+			path_hash = ~path_hash;
+
+		metadata->path_hash = path_hash;
+	}
+}
+
+static void gve_tx_fill_pkt_desc_dqo(struct gve_tx_ring *tx, u32 *desc_idx,
+				     struct sk_buff *skb, u32 len, u64 addr,
+				     s16 compl_tag, bool eop, bool is_gso)
+{
+	const bool checksum_offload_en = skb->ip_summed == CHECKSUM_PARTIAL;
+
+	while (len > 0) {
+		struct gve_tx_pkt_desc_dqo *desc =
+			&tx->dqo.tx_ring[*desc_idx].pkt;
+		u32 cur_len = min_t(u32, len, GVE_TX_MAX_BUF_SIZE_DQO);
+		bool cur_eop = eop && cur_len == len;
+
+		*desc = (struct gve_tx_pkt_desc_dqo){
+			.buf_addr = cpu_to_le64(addr),
+			.dtype = GVE_TX_PKT_DESC_DTYPE_DQO,
+			.end_of_packet = cur_eop,
+			.checksum_offload_enable = checksum_offload_en,
+			.compl_tag = cpu_to_le16(compl_tag),
+			.buf_size = cur_len,
+		};
+
+		addr += cur_len;
+		len -= cur_len;
+		*desc_idx = (*desc_idx + 1) & tx->mask;
+	}
+}
+
+/* Validates and prepares `skb` for TSO.
+ *
+ * Returns header length, or < 0 if invalid.
+ */
+static int gve_prep_tso(struct sk_buff *skb)
+{
+	struct tcphdr *tcp;
+	int header_len;
+	u32 paylen;
+	int err;
+
+	/* Note: HW requires MSS (gso_size) to be <= 9728 and the total length
+	 * of the TSO to be <= 262143.
+	 *
+	 * However, we don't validate these because:
+	 * - Hypervisor enforces a limit of 9K MTU
+	 * - Kernel will not produce a TSO larger than 64k
+	 */
+
+	if (unlikely(skb_shinfo(skb)->gso_size < GVE_TX_MIN_TSO_MSS_DQO))
+		return -1;
+
+	/* Needed because we will modify header. */
+	err = skb_cow_head(skb, 0);
+	if (err < 0)
+		return err;
+
+	tcp = tcp_hdr(skb);
+
+	/* Remove payload length from checksum. */
+	paylen = skb->len - skb_transport_offset(skb);
+
+	switch (skb_shinfo(skb)->gso_type) {
+	case SKB_GSO_TCPV4:
+	case SKB_GSO_TCPV6:
+		csum_replace_by_diff(&tcp->check,
+				     (__force __wsum)htonl(paylen));
+
+		/* Compute length of segmentation header. */
+		header_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (unlikely(header_len > GVE_TX_MAX_HDR_SIZE_DQO))
+		return -EINVAL;
+
+	return header_len;
+}
+
+static void gve_tx_fill_tso_ctx_desc(struct gve_tx_tso_context_desc_dqo *desc,
+				     const struct sk_buff *skb,
+				     const struct gve_tx_metadata_dqo *metadata,
+				     int header_len)
+{
+	*desc = (struct gve_tx_tso_context_desc_dqo){
+		.header_len = header_len,
+		.cmd_dtype = {
+			.dtype = GVE_TX_TSO_CTX_DESC_DTYPE_DQO,
+			.tso = 1,
+		},
+		.flex0 = metadata->bytes[0],
+		.flex5 = metadata->bytes[5],
+		.flex6 = metadata->bytes[6],
+		.flex7 = metadata->bytes[7],
+		.flex8 = metadata->bytes[8],
+		.flex9 = metadata->bytes[9],
+		.flex10 = metadata->bytes[10],
+		.flex11 = metadata->bytes[11],
+	};
+	desc->tso_total_len = skb->len - header_len;
+	desc->mss = skb_shinfo(skb)->gso_size;
+}
+
+static void
+gve_tx_fill_general_ctx_desc(struct gve_tx_general_context_desc_dqo *desc,
+			     const struct gve_tx_metadata_dqo *metadata)
+{
+	*desc = (struct gve_tx_general_context_desc_dqo){
+		.flex0 = metadata->bytes[0],
+		.flex1 = metadata->bytes[1],
+		.flex2 = metadata->bytes[2],
+		.flex3 = metadata->bytes[3],
+		.flex4 = metadata->bytes[4],
+		.flex5 = metadata->bytes[5],
+		.flex6 = metadata->bytes[6],
+		.flex7 = metadata->bytes[7],
+		.flex8 = metadata->bytes[8],
+		.flex9 = metadata->bytes[9],
+		.flex10 = metadata->bytes[10],
+		.flex11 = metadata->bytes[11],
+		.cmd_dtype = {.dtype = GVE_TX_GENERAL_CTX_DESC_DTYPE_DQO},
+	};
+}
+
+/* Returns 0 on success, or < 0 on error.
+ *
+ * Before this function is called, the caller must ensure
+ * gve_has_pending_packet(tx) returns true.
+ */
+static int gve_tx_add_skb_no_copy_dqo(struct gve_tx_ring *tx,
+				      struct sk_buff *skb)
+{
+	const struct skb_shared_info *shinfo = skb_shinfo(skb);
+	const bool is_gso = skb_is_gso(skb);
+	u32 desc_idx = tx->dqo_tx.tail;
+
+	struct gve_tx_pending_packet_dqo *pending_packet;
+	struct gve_tx_metadata_dqo metadata;
+	s16 completion_tag;
+	int i;
+
+	pending_packet = gve_alloc_pending_packet(tx);
+	pending_packet->skb = skb;
+	pending_packet->num_bufs = 0;
+	completion_tag = pending_packet - tx->dqo.pending_packets;
+
+	gve_extract_tx_metadata_dqo(skb, &metadata);
+	if (is_gso) {
+		int header_len = gve_prep_tso(skb);
+
+		if (unlikely(header_len < 0))
+			goto err;
+
+		gve_tx_fill_tso_ctx_desc(&tx->dqo.tx_ring[desc_idx].tso_ctx,
+					 skb, &metadata, header_len);
+		desc_idx = (desc_idx + 1) & tx->mask;
+	}
+
+	gve_tx_fill_general_ctx_desc(&tx->dqo.tx_ring[desc_idx].general_ctx,
+				     &metadata);
+	desc_idx = (desc_idx + 1) & tx->mask;
+
+	/* Note: HW requires that the size of a non-TSO packet be within the
+	 * range of [17, 9728].
+	 *
+	 * We don't double check because
+	 * - We limited `netdev->min_mtu` to ETH_MIN_MTU.
+	 * - Hypervisor won't allow MTU larger than 9216.
+	 */
+
+	/* Map the linear portion of skb */
+	{
+		struct gve_tx_dma_buf *buf =
+			&pending_packet->bufs[pending_packet->num_bufs];
+		u32 len = skb_headlen(skb);
+		dma_addr_t addr;
+
+		addr = dma_map_single(tx->dev, skb->data, len, DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(tx->dev, addr)))
+			goto err;
+
+		dma_unmap_len_set(buf, len, len);
+		dma_unmap_addr_set(buf, dma, addr);
+		++pending_packet->num_bufs;
+
+		gve_tx_fill_pkt_desc_dqo(tx, &desc_idx, skb, len, addr,
+					 completion_tag,
+					 /*eop=*/shinfo->nr_frags == 0, is_gso);
+	}
+
+	for (i = 0; i < shinfo->nr_frags; i++) {
+		struct gve_tx_dma_buf *buf =
+			&pending_packet->bufs[pending_packet->num_bufs];
+		const skb_frag_t *frag = &shinfo->frags[i];
+		bool is_eop = i == (shinfo->nr_frags - 1);
+		u32 len = skb_frag_size(frag);
+		dma_addr_t addr;
+
+		addr = skb_frag_dma_map(tx->dev, frag, 0, len, DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(tx->dev, addr)))
+			goto err;
+
+		dma_unmap_len_set(buf, len, len);
+		dma_unmap_addr_set(buf, dma, addr);
+		++pending_packet->num_bufs;
+
+		gve_tx_fill_pkt_desc_dqo(tx, &desc_idx, skb, len, addr,
+					 completion_tag, is_eop, is_gso);
+	}
+
+	/* Commit the changes to our state */
+	tx->dqo_tx.tail = desc_idx;
+
+	/* Request a descriptor completion on the last descriptor of the
+	 * packet if we are allowed to by the HW enforced interval.
+	 */
+	{
+		u32 last_desc_idx = (desc_idx - 1) & tx->mask;
+		u32 last_report_event_interval =
+			(last_desc_idx - tx->dqo_tx.last_re_idx) & tx->mask;
+
+		if (unlikely(last_report_event_interval >=
+			     GVE_TX_MIN_RE_INTERVAL)) {
+			tx->dqo.tx_ring[last_desc_idx].pkt.report_event = true;
+			tx->dqo_tx.last_re_idx = last_desc_idx;
+		}
+	}
+
+	return 0;
+
+err:
+	for (i = 0; i < pending_packet->num_bufs; i++) {
+		struct gve_tx_dma_buf *buf = &pending_packet->bufs[i];
+
+		if (i == 0) {
+			dma_unmap_single(tx->dev, dma_unmap_addr(buf, dma),
+					 dma_unmap_len(buf, len),
+					 DMA_TO_DEVICE);
+		} else {
+			dma_unmap_page(tx->dev, dma_unmap_addr(buf, dma),
+				       dma_unmap_len(buf, len), DMA_TO_DEVICE);
+		}
+	}
+
+	pending_packet->skb = NULL;
+	pending_packet->num_bufs = 0;
+	gve_free_pending_packet(tx, pending_packet);
+
+	return -1;
+}
+
+static int gve_num_descs_per_buf(size_t size)
+{
+	return DIV_ROUND_UP(size, GVE_TX_MAX_BUF_SIZE_DQO);
+}
+
+static int gve_num_buffer_descs_needed(const struct sk_buff *skb)
+{
+	const struct skb_shared_info *shinfo = skb_shinfo(skb);
+	int num_descs;
+	int i;
+
+	num_descs = gve_num_descs_per_buf(skb_headlen(skb));
+
+	for (i = 0; i < shinfo->nr_frags; i++) {
+		unsigned int frag_size = skb_frag_size(&shinfo->frags[i]);
+
+		num_descs += gve_num_descs_per_buf(frag_size);
+	}
+
+	return num_descs;
+}
+
+/* Returns true if HW is capable of sending TSO represented by `skb`.
+ *
+ * Each segment must not span more than GVE_TX_MAX_DATA_DESCS buffers.
+ * - The header is counted as one buffer for every single segment.
+ * - A buffer which is split between two segments is counted for both.
+ * - If a buffer contains both header and payload, it is counted as two buffers.
+ */
+static bool gve_can_send_tso(const struct sk_buff *skb)
+{
+	const int header_len = skb_checksum_start_offset(skb) + tcp_hdrlen(skb);
+	const int max_bufs_per_seg = GVE_TX_MAX_DATA_DESCS - 1;
+	const struct skb_shared_info *shinfo = skb_shinfo(skb);
+	const int gso_size = shinfo->gso_size;
+	int cur_seg_num_bufs;
+	int cur_seg_size;
+	int i;
+
+	cur_seg_size = skb_headlen(skb) - header_len;
+	cur_seg_num_bufs = cur_seg_size > 0;
+
+	for (i = 0; i < shinfo->nr_frags; i++) {
+		if (cur_seg_size >= gso_size) {
+			cur_seg_size %= gso_size;
+			cur_seg_num_bufs = cur_seg_size > 0;
+		}
+
+		if (unlikely(++cur_seg_num_bufs > max_bufs_per_seg))
+			return false;
+
+		cur_seg_size += skb_frag_size(&shinfo->frags[i]);
+	}
+
+	return true;
+}
+
+/* Attempt to transmit specified SKB.
+ *
+ * Returns 0 if the SKB was transmitted or dropped.
+ * Returns -1 if there is not currently enough space to transmit the SKB.
+ */
+static int gve_try_tx_skb(struct gve_priv *priv, struct gve_tx_ring *tx,
+			  struct sk_buff *skb)
+{
+	int num_buffer_descs;
+	int total_num_descs;
+
+	if (skb_is_gso(skb)) {
+		/* If TSO doesn't meet HW requirements, attempt to linearize the
+		 * packet.
+		 */
+		if (unlikely(!gve_can_send_tso(skb) &&
+			     skb_linearize(skb) < 0)) {
+			net_err_ratelimited("%s: Failed to transmit TSO packet\n",
+					    priv->dev->name);
+			goto drop;
+		}
+
+		num_buffer_descs = gve_num_buffer_descs_needed(skb);
+	} else {
+		num_buffer_descs = gve_num_buffer_descs_needed(skb);
+
+		if (unlikely(num_buffer_descs > GVE_TX_MAX_DATA_DESCS)) {
+			if (unlikely(skb_linearize(skb) < 0))
+				goto drop;
+
+			num_buffer_descs = 1;
+		}
+	}
+
+	/* Metadata + (optional TSO) + data descriptors. */
+	total_num_descs = 1 + skb_is_gso(skb) + num_buffer_descs;
+	if (unlikely(gve_maybe_stop_tx_dqo(tx, total_num_descs +
+			GVE_TX_MIN_DESC_PREVENT_CACHE_OVERLAP))) {
+		return -1;
+	}
+
+	if (unlikely(gve_tx_add_skb_no_copy_dqo(tx, skb) < 0))
+		goto drop;
+
+	netdev_tx_sent_queue(tx->netdev_txq, skb->len);
+	skb_tx_timestamp(skb);
+	return 0;
+
+drop:
+	tx->dropped_pkt++;
+	dev_kfree_skb_any(skb);
+	return 0;
+}
+
+/* Transmit a given skb and ring the doorbell. */
 netdev_tx_t gve_tx_dqo(struct sk_buff *skb, struct net_device *dev)
 {
+	struct gve_priv *priv = netdev_priv(dev);
+	struct gve_tx_ring *tx;
+
+	tx = &priv->tx[skb_get_queue_mapping(skb)];
+	if (unlikely(gve_try_tx_skb(priv, tx, skb) < 0)) {
+		/* We need to ring the txq doorbell -- we have stopped the Tx
+		 * queue for want of resources, but prior calls to gve_tx()
+		 * may have added descriptors without ringing the doorbell.
+		 */
+		gve_tx_put_doorbell_dqo(priv, tx->q_resources, tx->dqo_tx.tail);
+		return NETDEV_TX_BUSY;
+	}
+
+	if (!netif_xmit_stopped(tx->netdev_txq) && netdev_xmit_more())
+		return NETDEV_TX_OK;
+
+	gve_tx_put_doorbell_dqo(priv, tx->q_resources, tx->dqo_tx.tail);
 	return NETDEV_TX_OK;
 }
 
+static void add_to_list(struct gve_tx_ring *tx, struct gve_index_list *list,
+			struct gve_tx_pending_packet_dqo *pending_packet)
+{
+	s16 old_tail, index;
+
+	index = pending_packet - tx->dqo.pending_packets;
+	old_tail = list->tail;
+	list->tail = index;
+	if (old_tail == -1)
+		list->head = index;
+	else
+		tx->dqo.pending_packets[old_tail].next = index;
+
+	pending_packet->next = -1;
+	pending_packet->prev = old_tail;
+}
+
+static void remove_from_list(struct gve_tx_ring *tx,
+			     struct gve_index_list *list,
+			     struct gve_tx_pending_packet_dqo *pending_packet)
+{
+	s16 index, prev_index, next_index;
+
+	index = pending_packet - tx->dqo.pending_packets;
+	prev_index = pending_packet->prev;
+	next_index = pending_packet->next;
+
+	if (prev_index == -1) {
+		/* Node is head */
+		list->head = next_index;
+	} else {
+		tx->dqo.pending_packets[prev_index].next = next_index;
+	}
+	if (next_index == -1) {
+		/* Node is tail */
+		list->tail = prev_index;
+	} else {
+		tx->dqo.pending_packets[next_index].prev = prev_index;
+	}
+}
+
+static void gve_unmap_packet(struct device *dev,
+			     struct gve_tx_pending_packet_dqo *pending_packet)
+{
+	struct gve_tx_dma_buf *buf;
+	int i;
+
+	/* SKB linear portion is guaranteed to be mapped */
+	buf = &pending_packet->bufs[0];
+	dma_unmap_single(dev, dma_unmap_addr(buf, dma),
+			 dma_unmap_len(buf, len), DMA_TO_DEVICE);
+	for (i = 1; i < pending_packet->num_bufs; i++) {
+		buf = &pending_packet->bufs[i];
+		dma_unmap_page(dev, dma_unmap_addr(buf, dma),
+			       dma_unmap_len(buf, len), DMA_TO_DEVICE);
+	}
+	pending_packet->num_bufs = 0;
+}
+
+/* Completion types and expected behavior:
+ * No Miss compl + Packet compl = Packet completed normally.
+ * Miss compl + Re-inject compl = Packet completed normally.
+ * No Miss compl + Re-inject compl = Skipped i.e. packet not completed.
+ * Miss compl + Packet compl = Skipped i.e. packet not completed.
+ */
+static void gve_handle_packet_completion(struct gve_priv *priv,
+					 struct gve_tx_ring *tx, bool is_napi,
+					 u16 compl_tag, u64 *bytes, u64 *pkts,
+					 bool is_reinjection)
+{
+	struct gve_tx_pending_packet_dqo *pending_packet;
+
+	if (unlikely(compl_tag >= tx->dqo.num_pending_packets)) {
+		net_err_ratelimited("%s: Invalid TX completion tag: %d\n",
+				    priv->dev->name, (int)compl_tag);
+		return;
+	}
+
+	pending_packet = &tx->dqo.pending_packets[compl_tag];
+
+	if (unlikely(is_reinjection)) {
+		if (unlikely(pending_packet->state ==
+			     GVE_PACKET_STATE_TIMED_OUT_COMPL)) {
+			net_err_ratelimited("%s: Re-injection completion: %d received after timeout.\n",
+					    priv->dev->name, (int)compl_tag);
+			/* Packet was already completed as a result of timeout,
+			 * so just remove from list and free pending packet.
+			 */
+			remove_from_list(tx,
+					 &tx->dqo_compl.timed_out_completions,
+					 pending_packet);
+			gve_free_pending_packet(tx, pending_packet);
+			return;
+		}
+		if (unlikely(pending_packet->state !=
+			     GVE_PACKET_STATE_PENDING_REINJECT_COMPL)) {
+			/* No outstanding miss completion but packet allocated
+			 * implies packet receives a re-injection completion
+			 * without a prior miss completion. Return without
+			 * completing the packet.
+			 */
+			net_err_ratelimited("%s: Re-injection completion received without corresponding miss completion: %d\n",
+					    priv->dev->name, (int)compl_tag);
+			return;
+		}
+		remove_from_list(tx, &tx->dqo_compl.miss_completions,
+				 pending_packet);
+	} else {
+		/* Packet is allocated but not a pending data completion. */
+		if (unlikely(pending_packet->state !=
+			     GVE_PACKET_STATE_PENDING_DATA_COMPL)) {
+			net_err_ratelimited("%s: No pending data completion: %d\n",
+					    priv->dev->name, (int)compl_tag);
+			return;
+		}
+	}
+	gve_unmap_packet(tx->dev, pending_packet);
+
+	*bytes += pending_packet->skb->len;
+	(*pkts)++;
+	napi_consume_skb(pending_packet->skb, is_napi);
+	pending_packet->skb = NULL;
+	gve_free_pending_packet(tx, pending_packet);
+}
+
+static void gve_handle_miss_completion(struct gve_priv *priv,
+				       struct gve_tx_ring *tx, u16 compl_tag,
+				       u64 *bytes, u64 *pkts)
+{
+	struct gve_tx_pending_packet_dqo *pending_packet;
+
+	if (unlikely(compl_tag >= tx->dqo.num_pending_packets)) {
+		net_err_ratelimited("%s: Invalid TX completion tag: %d\n",
+				    priv->dev->name, (int)compl_tag);
+		return;
+	}
+
+	pending_packet = &tx->dqo.pending_packets[compl_tag];
+	if (unlikely(pending_packet->state !=
+				GVE_PACKET_STATE_PENDING_DATA_COMPL)) {
+		net_err_ratelimited("%s: Unexpected packet state: %d for completion tag : %d\n",
+				    priv->dev->name, (int)pending_packet->state,
+				    (int)compl_tag);
+		return;
+	}
+
+	pending_packet->state = GVE_PACKET_STATE_PENDING_REINJECT_COMPL;
+	/* jiffies can wraparound but time comparisons can handle overflows. */
+	pending_packet->timeout_jiffies =
+			jiffies +
+			msecs_to_jiffies(GVE_REINJECT_COMPL_TIMEOUT *
+					 MSEC_PER_SEC);
+	add_to_list(tx, &tx->dqo_compl.miss_completions, pending_packet);
+
+	*bytes += pending_packet->skb->len;
+	(*pkts)++;
+}
+
+static void remove_miss_completions(struct gve_priv *priv,
+				    struct gve_tx_ring *tx)
+{
+	struct gve_tx_pending_packet_dqo *pending_packet;
+	s16 next_index;
+
+	next_index = tx->dqo_compl.miss_completions.head;
+	while (next_index != -1) {
+		pending_packet = &tx->dqo.pending_packets[next_index];
+		next_index = pending_packet->next;
+		/* Break early because packets should timeout in order. */
+		if (time_is_after_jiffies(pending_packet->timeout_jiffies))
+			break;
+
+		remove_from_list(tx, &tx->dqo_compl.miss_completions,
+				 pending_packet);
+		/* Unmap buffers and free skb but do not unallocate packet i.e.
+		 * the completion tag is not freed to ensure that the driver
+		 * can take appropriate action if a corresponding valid
+		 * completion is received later.
+		 */
+		gve_unmap_packet(tx->dev, pending_packet);
+		/* This indicates the packet was dropped. */
+		dev_kfree_skb_any(pending_packet->skb);
+		pending_packet->skb = NULL;
+		tx->dropped_pkt++;
+		net_err_ratelimited("%s: No reinjection completion was received for: %ld.\n",
+				    priv->dev->name,
+				    (pending_packet - tx->dqo.pending_packets));
+
+		pending_packet->state = GVE_PACKET_STATE_TIMED_OUT_COMPL;
+		pending_packet->timeout_jiffies =
+				jiffies +
+				msecs_to_jiffies(GVE_DEALLOCATE_COMPL_TIMEOUT *
+						 MSEC_PER_SEC);
+		/* Maintain pending packet in another list so the packet can be
+		 * unallocated at a later time.
+		 */
+		add_to_list(tx, &tx->dqo_compl.timed_out_completions,
+			    pending_packet);
+	}
+}
+
+static void remove_timed_out_completions(struct gve_priv *priv,
+					 struct gve_tx_ring *tx)
+{
+	struct gve_tx_pending_packet_dqo *pending_packet;
+	s16 next_index;
+
+	next_index = tx->dqo_compl.timed_out_completions.head;
+	while (next_index != -1) {
+		pending_packet = &tx->dqo.pending_packets[next_index];
+		next_index = pending_packet->next;
+		/* Break early because packets should timeout in order. */
+		if (time_is_after_jiffies(pending_packet->timeout_jiffies))
+			break;
+
+		remove_from_list(tx, &tx->dqo_compl.timed_out_completions,
+				 pending_packet);
+		gve_free_pending_packet(tx, pending_packet);
+	}
+}
+
 int gve_clean_tx_done_dqo(struct gve_priv *priv, struct gve_tx_ring *tx,
 			  struct napi_struct *napi)
 {
-	return 0;
+	u64 reinject_compl_bytes = 0;
+	u64 reinject_compl_pkts = 0;
+	int num_descs_cleaned = 0;
+	u64 miss_compl_bytes = 0;
+	u64 miss_compl_pkts = 0;
+	u64 pkt_compl_bytes = 0;
+	u64 pkt_compl_pkts = 0;
+
+	/* Limit in order to avoid blocking for too long */
+	while (!napi || pkt_compl_pkts < napi->weight) {
+		struct gve_tx_compl_desc *compl_desc =
+			&tx->dqo.compl_ring[tx->dqo_compl.head];
+		u16 type;
+
+		if (compl_desc->generation == tx->dqo_compl.cur_gen_bit)
+			break;
+
+		/* Prefetch the next descriptor. */
+		prefetch(&tx->dqo.compl_ring[(tx->dqo_compl.head + 1) &
+				tx->dqo.complq_mask]);
+
+		/* Do not read data until we own the descriptor */
+		dma_rmb();
+		type = compl_desc->type;
+
+		if (type == GVE_COMPL_TYPE_DQO_DESC) {
+			/* This is the last descriptor fetched by HW plus one */
+			u16 tx_head = le16_to_cpu(compl_desc->tx_head);
+
+			atomic_set_release(&tx->dqo_compl.hw_tx_head, tx_head);
+		} else if (type == GVE_COMPL_TYPE_DQO_PKT) {
+			u16 compl_tag = le16_to_cpu(compl_desc->completion_tag);
+
+			gve_handle_packet_completion(priv, tx, !!napi,
+						     compl_tag,
+						     &pkt_compl_bytes,
+						     &pkt_compl_pkts,
+						     /*is_reinjection=*/false);
+		} else if (type == GVE_COMPL_TYPE_DQO_MISS) {
+			u16 compl_tag = le16_to_cpu(compl_desc->completion_tag);
+
+			gve_handle_miss_completion(priv, tx, compl_tag,
+						   &miss_compl_bytes,
+						   &miss_compl_pkts);
+		} else if (type == GVE_COMPL_TYPE_DQO_REINJECTION) {
+			u16 compl_tag = le16_to_cpu(compl_desc->completion_tag);
+
+			gve_handle_packet_completion(priv, tx, !!napi,
+						     compl_tag,
+						     &reinject_compl_bytes,
+						     &reinject_compl_pkts,
+						     /*is_reinjection=*/true);
+		}
+
+		tx->dqo_compl.head =
+			(tx->dqo_compl.head + 1) & tx->dqo.complq_mask;
+		/* Flip the generation bit when we wrap around */
+		tx->dqo_compl.cur_gen_bit ^= tx->dqo_compl.head == 0;
+		num_descs_cleaned++;
+	}
+
+	netdev_tx_completed_queue(tx->netdev_txq,
+				  pkt_compl_pkts + miss_compl_pkts,
+				  pkt_compl_bytes + miss_compl_bytes);
+
+	remove_miss_completions(priv, tx);
+	remove_timed_out_completions(priv, tx);
+
+	u64_stats_update_begin(&tx->statss);
+	tx->bytes_done += pkt_compl_bytes + reinject_compl_bytes;
+	tx->pkt_done += pkt_compl_pkts + reinject_compl_pkts;
+	u64_stats_update_end(&tx->statss);
+	return num_descs_cleaned;
 }
 
 bool gve_tx_poll_dqo(struct gve_notify_block *block, bool do_clean)
 {
-	return false;
+	struct gve_tx_compl_desc *compl_desc;
+	struct gve_tx_ring *tx = block->tx;
+	struct gve_priv *priv = block->priv;
+
+	if (do_clean) {
+		int num_descs_cleaned = gve_clean_tx_done_dqo(priv, tx,
+							      &block->napi);
+
+		/* Sync with queue being stopped in `gve_maybe_stop_tx_dqo()` */
+		mb();
+
+		if (netif_tx_queue_stopped(tx->netdev_txq) &&
+		    num_descs_cleaned > 0) {
+			tx->wake_queue++;
+			netif_tx_wake_queue(tx->netdev_txq);
+		}
+	}
+
+	/* Return true if we still have work. */
+	compl_desc = &tx->dqo.compl_ring[tx->dqo_compl.head];
+	return compl_desc->generation != tx->dqo_compl.cur_gen_bit;
 }