Message ID | 20191002193553.1633467-2-jernej.skrabec@siol.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | media: cedrus: improvements | expand |
Hi, On Wed 02 Oct 19, 21:35, Jernej Skrabec wrote: > It seems that for some H264 videos at least one bitstream parsing > trigger must be called in order to be decoded correctly. There is no > explanation why this helps, but it was observed that two sample videos > with this fix are now decoded correctly and there is no regression with > others. I understand there might be some magic going on under the hood here, but I would be interested in trying to understand what's really going on. For instance, comparing register dumps of the whole H264 block before/after calling the hardware parser could help, and comparing that to using the previous code (without hardware parsing). I could try and have a look if you have an available sample for testing the erroneous case! Another minor thing: do you have some idea of whether the udelay call adds significant delay in the process? Cheers and thanks for the patch! Paul > Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> > --- > .../staging/media/sunxi/cedrus/cedrus_h264.c | 30 +++++++++++++++++-- > .../staging/media/sunxi/cedrus/cedrus_regs.h | 3 ++ > 2 files changed, 30 insertions(+), 3 deletions(-) > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > index d6a782703c9b..bd848146eada 100644 > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > @@ -6,6 +6,7 @@ > * Copyright (c) 2018 Bootlin > */ > > +#include <linux/delay.h> > #include <linux/types.h> > > #include <media/videobuf2-dma-contig.h> > @@ -289,6 +290,28 @@ static void cedrus_write_pred_weight_table(struct cedrus_ctx *ctx, > } > } > > +/* > + * It turns out that using VE_H264_VLD_OFFSET to skip bits is not reliable. In > + * rare cases frame is not decoded correctly. However, setting offset to 0 and > + * skipping appropriate amount of bits with flush bits trigger always works. > + */ > +static void cedrus_skip_bits(struct cedrus_dev *dev, int num) > +{ > + int count = 0; > + > + while (count < num) { > + int tmp = min(num - count, 32); > > + > + cedrus_write(dev, VE_H264_TRIGGER_TYPE, > + VE_H264_TRIGGER_TYPE_FLUSH_BITS | > + VE_H264_TRIGGER_TYPE_N_BITS(tmp)); > + while (cedrus_read(dev, VE_H264_STATUS) & VE_H264_STATUS_VLD_BUSY) > + udelay(1); > + > + count += tmp; > + } > +} > + > static void cedrus_set_params(struct cedrus_ctx *ctx, > struct cedrus_run *run) > { > @@ -299,12 +322,11 @@ static void cedrus_set_params(struct cedrus_ctx *ctx, > struct vb2_buffer *src_buf = &run->src->vb2_buf; > struct cedrus_dev *dev = ctx->dev; > dma_addr_t src_buf_addr; > - u32 offset = slice->header_bit_size; > - u32 len = (slice->size * 8) - offset; > + u32 len = slice->size * 8; > u32 reg; > > cedrus_write(dev, VE_H264_VLD_LEN, len); > - cedrus_write(dev, VE_H264_VLD_OFFSET, offset); > + cedrus_write(dev, VE_H264_VLD_OFFSET, 0); > > src_buf_addr = vb2_dma_contig_plane_dma_addr(src_buf, 0); > cedrus_write(dev, VE_H264_VLD_END, > @@ -323,6 +345,8 @@ static void cedrus_set_params(struct cedrus_ctx *ctx, > cedrus_write(dev, VE_H264_TRIGGER_TYPE, > VE_H264_TRIGGER_TYPE_INIT_SWDEC); > > + cedrus_skip_bits(dev, slice->header_bit_size); > + > if (((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) && > (slice->slice_type == V4L2_H264_SLICE_TYPE_P || > slice->slice_type == V4L2_H264_SLICE_TYPE_SP)) || > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h > index 3329f9aaf975..b52926a54025 100644 > --- a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h > @@ -538,13 +538,16 @@ > VE_H264_CTRL_SLICE_DECODE_INT) > > #define VE_H264_TRIGGER_TYPE 0x224 > +#define VE_H264_TRIGGER_TYPE_N_BITS(x) (((x) & 0x3f) << 8) > #define VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE (8 << 0) > #define VE_H264_TRIGGER_TYPE_INIT_SWDEC (7 << 0) > +#define VE_H264_TRIGGER_TYPE_FLUSH_BITS (3 << 0) > > #define VE_H264_STATUS 0x228 > #define VE_H264_STATUS_VLD_DATA_REQ_INT VE_H264_CTRL_VLD_DATA_REQ_INT > #define VE_H264_STATUS_DECODE_ERR_INT VE_H264_CTRL_DECODE_ERR_INT > #define VE_H264_STATUS_SLICE_DECODE_INT VE_H264_CTRL_SLICE_DECODE_INT > +#define VE_H264_STATUS_VLD_BUSY BIT(8) > > #define VE_H264_STATUS_INT_MASK VE_H264_CTRL_INT_MASK > > -- > 2.23.0 >
Hi! Sorry for late reponse, technical issues... Dne sreda, 02. oktober 2019 ob 23:54:47 CEST je Paul Kocialkowski napisal(a): > Hi, > > On Wed 02 Oct 19, 21:35, Jernej Skrabec wrote: > > It seems that for some H264 videos at least one bitstream parsing > > trigger must be called in order to be decoded correctly. There is no > > explanation why this helps, but it was observed that two sample videos > > with this fix are now decoded correctly and there is no regression with > > others. > > I understand there might be some magic going on under the hood here, but I > would be interested in trying to understand what's really going on. Me too. > > For instance, comparing register dumps of the whole H264 block before/after > calling the hardware parser could help, and comparing that to using the > previous code (without hardware parsing). Please understand that I was working on this on and off for almost half a year and checked many times all register values. At one point I tried libvdpau- sunxi which has no problem with sample video. Still, all relevant register values were the same. In a desperate attempt, I tried with HW header parsing which magically solved the issue. After that, I reused values provided in controls and then finally I made minimal solution as suggested in this patch. > > I could try and have a look if you have an available sample for testing the > erroneous case! Of course: http://jernej.libreelec.tv/videos/h264/test.mkv > > Another minor thing: do you have some idea of whether the udelay call adds > significant delay in the process? I didn't notice any issue with it. Do you have any better idea? I just didn't want to make empty loop and udelay is the shortest delay that is provided by the kernel API. Best regards, Jernej > > Cheers and thanks for the patch! > > Paul > > > Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> > > --- > > > > .../staging/media/sunxi/cedrus/cedrus_h264.c | 30 +++++++++++++++++-- > > .../staging/media/sunxi/cedrus/cedrus_regs.h | 3 ++ > > 2 files changed, 30 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c index > > d6a782703c9b..bd848146eada 100644 > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > @@ -6,6 +6,7 @@ > > > > * Copyright (c) 2018 Bootlin > > */ > > > > +#include <linux/delay.h> > > > > #include <linux/types.h> > > > > #include <media/videobuf2-dma-contig.h> > > > > @@ -289,6 +290,28 @@ static void cedrus_write_pred_weight_table(struct > > cedrus_ctx *ctx,> > > } > > > > } > > > > +/* > > + * It turns out that using VE_H264_VLD_OFFSET to skip bits is not > > reliable. In + * rare cases frame is not decoded correctly. However, > > setting offset to 0 and + * skipping appropriate amount of bits with > > flush bits trigger always works. + */ > > +static void cedrus_skip_bits(struct cedrus_dev *dev, int num) > > +{ > > + int count = 0; > > + > > + while (count < num) { > > + int tmp = min(num - count, 32); > > > > + > > + cedrus_write(dev, VE_H264_TRIGGER_TYPE, > > + VE_H264_TRIGGER_TYPE_FLUSH_BITS | > > + VE_H264_TRIGGER_TYPE_N_BITS(tmp)); > > + while (cedrus_read(dev, VE_H264_STATUS) & VE_H264_STATUS_VLD_BUSY) > > + udelay(1); > > + > > + count += tmp; > > + } > > +} > > + > > > > static void cedrus_set_params(struct cedrus_ctx *ctx, > > > > struct cedrus_run *run) > > > > { > > > > @@ -299,12 +322,11 @@ static void cedrus_set_params(struct cedrus_ctx > > *ctx, > > > > struct vb2_buffer *src_buf = &run->src->vb2_buf; > > struct cedrus_dev *dev = ctx->dev; > > dma_addr_t src_buf_addr; > > > > - u32 offset = slice->header_bit_size; > > - u32 len = (slice->size * 8) - offset; > > + u32 len = slice->size * 8; > > > > u32 reg; > > > > cedrus_write(dev, VE_H264_VLD_LEN, len); > > > > - cedrus_write(dev, VE_H264_VLD_OFFSET, offset); > > + cedrus_write(dev, VE_H264_VLD_OFFSET, 0); > > > > src_buf_addr = vb2_dma_contig_plane_dma_addr(src_buf, 0); > > cedrus_write(dev, VE_H264_VLD_END, > > > > @@ -323,6 +345,8 @@ static void cedrus_set_params(struct cedrus_ctx *ctx, > > > > cedrus_write(dev, VE_H264_TRIGGER_TYPE, > > > > VE_H264_TRIGGER_TYPE_INIT_SWDEC); > > > > + cedrus_skip_bits(dev, slice->header_bit_size); > > + > > > > if (((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) && > > > > (slice->slice_type == V4L2_H264_SLICE_TYPE_P || > > > > slice->slice_type == V4L2_H264_SLICE_TYPE_SP)) || > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h > > b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h index > > 3329f9aaf975..b52926a54025 100644 > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h > > @@ -538,13 +538,16 @@ > > > > VE_H264_CTRL_SLICE_DECODE_INT) > > > > #define VE_H264_TRIGGER_TYPE 0x224 > > > > +#define VE_H264_TRIGGER_TYPE_N_BITS(x) (((x) & 0x3f) << 8) > > > > #define VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE (8 << 0) > > #define VE_H264_TRIGGER_TYPE_INIT_SWDEC (7 << 0) > > > > +#define VE_H264_TRIGGER_TYPE_FLUSH_BITS (3 << 0) > > > > #define VE_H264_STATUS 0x228 > > #define VE_H264_STATUS_VLD_DATA_REQ_INT VE_H264_CTRL_VLD_DATA_REQ_INT > > #define VE_H264_STATUS_DECODE_ERR_INT VE_H264_CTRL_DECODE_ERR_INT > > #define VE_H264_STATUS_SLICE_DECODE_INT VE_H264_CTRL_SLICE_DECODE_INT > > > > +#define VE_H264_STATUS_VLD_BUSY BIT(8) > > > > #define VE_H264_STATUS_INT_MASK VE_H264_CTRL_INT_MASK
Hi, On Tue 15 Oct 19, 19:16, Jernej Škrabec wrote: > Please understand that I was working on this on and off for almost half a year > and checked many times all register values. At one point I tried libvdpau- > sunxi which has no problem with sample video. Still, all relevant register > values were the same. In a desperate attempt, I tried with HW header parsing > which magically solved the issue. After that, I reused values provided in > controls and then finally I made minimal solution as suggested in this patch. Okay thanks for the details. I think I've delayed this for far too long already so I think we should get it in without further delay. The patch apparently no longer applies on top of media/master, but feel free to send out a rebased series with: Acked-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Let's leave out 2/3 though, I think I will submit a series adding the flag as indication for the per-slice value in the uAPI and use it in cedrus. Cheers, Paul > > > > I could try and have a look if you have an available sample for testing the > > erroneous case! > > Of course: http://jernej.libreelec.tv/videos/h264/test.mkv > > > > > Another minor thing: do you have some idea of whether the udelay call adds > > significant delay in the process? > > I didn't notice any issue with it. Do you have any better idea? I just didn't > want to make empty loop and udelay is the shortest delay that is provided by > the kernel API. > > Best regards, > Jernej > > > > > Cheers and thanks for the patch! > > > > Paul > > > > > Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> > > > --- > > > > > > .../staging/media/sunxi/cedrus/cedrus_h264.c | 30 +++++++++++++++++-- > > > .../staging/media/sunxi/cedrus/cedrus_regs.h | 3 ++ > > > 2 files changed, 30 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c index > > > d6a782703c9b..bd848146eada 100644 > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c > > > @@ -6,6 +6,7 @@ > > > > > > * Copyright (c) 2018 Bootlin > > > */ > > > > > > +#include <linux/delay.h> > > > > > > #include <linux/types.h> > > > > > > #include <media/videobuf2-dma-contig.h> > > > > > > @@ -289,6 +290,28 @@ static void cedrus_write_pred_weight_table(struct > > > cedrus_ctx *ctx,> > > > } > > > > > > } > > > > > > +/* > > > + * It turns out that using VE_H264_VLD_OFFSET to skip bits is not > > > reliable. In + * rare cases frame is not decoded correctly. However, > > > setting offset to 0 and + * skipping appropriate amount of bits with > > > flush bits trigger always works. + */ > > > +static void cedrus_skip_bits(struct cedrus_dev *dev, int num) > > > +{ > > > + int count = 0; > > > + > > > + while (count < num) { > > > + int tmp = min(num - count, 32); > > > > > > + > > > + cedrus_write(dev, VE_H264_TRIGGER_TYPE, > > > + VE_H264_TRIGGER_TYPE_FLUSH_BITS | > > > + VE_H264_TRIGGER_TYPE_N_BITS(tmp)); > > > + while (cedrus_read(dev, VE_H264_STATUS) & > VE_H264_STATUS_VLD_BUSY) > > > + udelay(1); > > > + > > > + count += tmp; > > > + } > > > +} > > > + > > > > > > static void cedrus_set_params(struct cedrus_ctx *ctx, > > > > > > struct cedrus_run *run) > > > > > > { > > > > > > @@ -299,12 +322,11 @@ static void cedrus_set_params(struct cedrus_ctx > > > *ctx, > > > > > > struct vb2_buffer *src_buf = &run->src->vb2_buf; > > > struct cedrus_dev *dev = ctx->dev; > > > dma_addr_t src_buf_addr; > > > > > > - u32 offset = slice->header_bit_size; > > > - u32 len = (slice->size * 8) - offset; > > > + u32 len = slice->size * 8; > > > > > > u32 reg; > > > > > > cedrus_write(dev, VE_H264_VLD_LEN, len); > > > > > > - cedrus_write(dev, VE_H264_VLD_OFFSET, offset); > > > + cedrus_write(dev, VE_H264_VLD_OFFSET, 0); > > > > > > src_buf_addr = vb2_dma_contig_plane_dma_addr(src_buf, 0); > > > cedrus_write(dev, VE_H264_VLD_END, > > > > > > @@ -323,6 +345,8 @@ static void cedrus_set_params(struct cedrus_ctx *ctx, > > > > > > cedrus_write(dev, VE_H264_TRIGGER_TYPE, > > > > > > VE_H264_TRIGGER_TYPE_INIT_SWDEC); > > > > > > + cedrus_skip_bits(dev, slice->header_bit_size); > > > + > > > > > > if (((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) && > > > > > > (slice->slice_type == V4L2_H264_SLICE_TYPE_P || > > > > > > slice->slice_type == V4L2_H264_SLICE_TYPE_SP)) || > > > > > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h > > > b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h index > > > 3329f9aaf975..b52926a54025 100644 > > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h > > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h > > > @@ -538,13 +538,16 @@ > > > > > > > VE_H264_CTRL_SLICE_DECODE_INT) > > > > > > #define VE_H264_TRIGGER_TYPE 0x224 > > > > > > +#define VE_H264_TRIGGER_TYPE_N_BITS(x) (((x) & 0x3f) << 8) > > > > > > #define VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE (8 << 0) > > > #define VE_H264_TRIGGER_TYPE_INIT_SWDEC (7 << 0) > > > > > > +#define VE_H264_TRIGGER_TYPE_FLUSH_BITS (3 << 0) > > > > > > #define VE_H264_STATUS 0x228 > > > #define VE_H264_STATUS_VLD_DATA_REQ_INT > VE_H264_CTRL_VLD_DATA_REQ_INT > > > #define VE_H264_STATUS_DECODE_ERR_INT > VE_H264_CTRL_DECODE_ERR_INT > > > #define VE_H264_STATUS_SLICE_DECODE_INT > VE_H264_CTRL_SLICE_DECODE_INT > > > > > > +#define VE_H264_STATUS_VLD_BUSY BIT(8) > > > > > > #define VE_H264_STATUS_INT_MASK > VE_H264_CTRL_INT_MASK > > > >
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c index d6a782703c9b..bd848146eada 100644 --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c @@ -6,6 +6,7 @@ * Copyright (c) 2018 Bootlin */ +#include <linux/delay.h> #include <linux/types.h> #include <media/videobuf2-dma-contig.h> @@ -289,6 +290,28 @@ static void cedrus_write_pred_weight_table(struct cedrus_ctx *ctx, } } +/* + * It turns out that using VE_H264_VLD_OFFSET to skip bits is not reliable. In + * rare cases frame is not decoded correctly. However, setting offset to 0 and + * skipping appropriate amount of bits with flush bits trigger always works. + */ +static void cedrus_skip_bits(struct cedrus_dev *dev, int num) +{ + int count = 0; + + while (count < num) { + int tmp = min(num - count, 32); + + cedrus_write(dev, VE_H264_TRIGGER_TYPE, + VE_H264_TRIGGER_TYPE_FLUSH_BITS | + VE_H264_TRIGGER_TYPE_N_BITS(tmp)); + while (cedrus_read(dev, VE_H264_STATUS) & VE_H264_STATUS_VLD_BUSY) + udelay(1); + + count += tmp; + } +} + static void cedrus_set_params(struct cedrus_ctx *ctx, struct cedrus_run *run) { @@ -299,12 +322,11 @@ static void cedrus_set_params(struct cedrus_ctx *ctx, struct vb2_buffer *src_buf = &run->src->vb2_buf; struct cedrus_dev *dev = ctx->dev; dma_addr_t src_buf_addr; - u32 offset = slice->header_bit_size; - u32 len = (slice->size * 8) - offset; + u32 len = slice->size * 8; u32 reg; cedrus_write(dev, VE_H264_VLD_LEN, len); - cedrus_write(dev, VE_H264_VLD_OFFSET, offset); + cedrus_write(dev, VE_H264_VLD_OFFSET, 0); src_buf_addr = vb2_dma_contig_plane_dma_addr(src_buf, 0); cedrus_write(dev, VE_H264_VLD_END, @@ -323,6 +345,8 @@ static void cedrus_set_params(struct cedrus_ctx *ctx, cedrus_write(dev, VE_H264_TRIGGER_TYPE, VE_H264_TRIGGER_TYPE_INIT_SWDEC); + cedrus_skip_bits(dev, slice->header_bit_size); + if (((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) && (slice->slice_type == V4L2_H264_SLICE_TYPE_P || slice->slice_type == V4L2_H264_SLICE_TYPE_SP)) || diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h index 3329f9aaf975..b52926a54025 100644 --- a/drivers/staging/media/sunxi/cedrus/cedrus_regs.h +++ b/drivers/staging/media/sunxi/cedrus/cedrus_regs.h @@ -538,13 +538,16 @@ VE_H264_CTRL_SLICE_DECODE_INT) #define VE_H264_TRIGGER_TYPE 0x224 +#define VE_H264_TRIGGER_TYPE_N_BITS(x) (((x) & 0x3f) << 8) #define VE_H264_TRIGGER_TYPE_AVC_SLICE_DECODE (8 << 0) #define VE_H264_TRIGGER_TYPE_INIT_SWDEC (7 << 0) +#define VE_H264_TRIGGER_TYPE_FLUSH_BITS (3 << 0) #define VE_H264_STATUS 0x228 #define VE_H264_STATUS_VLD_DATA_REQ_INT VE_H264_CTRL_VLD_DATA_REQ_INT #define VE_H264_STATUS_DECODE_ERR_INT VE_H264_CTRL_DECODE_ERR_INT #define VE_H264_STATUS_SLICE_DECODE_INT VE_H264_CTRL_SLICE_DECODE_INT +#define VE_H264_STATUS_VLD_BUSY BIT(8) #define VE_H264_STATUS_INT_MASK VE_H264_CTRL_INT_MASK
It seems that for some H264 videos at least one bitstream parsing trigger must be called in order to be decoded correctly. There is no explanation why this helps, but it was observed that two sample videos with this fix are now decoded correctly and there is no regression with others. Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> --- .../staging/media/sunxi/cedrus/cedrus_h264.c | 30 +++++++++++++++++-- .../staging/media/sunxi/cedrus/cedrus_regs.h | 3 ++ 2 files changed, 30 insertions(+), 3 deletions(-)