Message ID | 20200715202233.185680-9-ezequiel@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Series | media: Clean H264 stateless uAPI |

On Thu, Jul 16, 2020 at 5:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>
> The H.264 specification requires in its "Slice header semantics"
> section that the following values shall be the same in all slice headers:
>
> pic_parameter_set_id
> frame_num
> field_pic_flag
> bottom_field_flag
> idr_pic_id
> pic_order_cnt_lsb
> delta_pic_order_cnt_bottom
> delta_pic_order_cnt[ 0 ]
> delta_pic_order_cnt[ 1 ]
> sp_for_switch_flag
> slice_group_change_cycle
>
> and can therefore be moved to the per-frame decode parameters control.

I am really not a H.264 expert, so this question may not be relevant,
but are these values specified for every slice header in the
bitstream, or are they specified only once per frame?

I am asking this because it would certainly make user-space code
simpler if we could remain as close to the bitstream as possible. If
these values are specified once per slice, then factorizing them would
leave user-space with the burden of deciding what to do if they change
across slices.

Note that this is a double-edged sword, because it is not necessarily
better to leave the firmware in charge of deciding what to do in such
a case. :) So hopefully these are only specified once per frame in the
bitstream, in which case your proposal makes complete sense.

Hi Alexandre,

Thanks a lot for the review.

On Sat, 2020-07-25 at 23:34 +0900, Alexandre Courbot wrote:
> On Thu, Jul 16, 2020 at 5:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > The H.264 specification requires in its "Slice header semantics"
> > section that the following values shall be the same in all slice headers:
> >
> > pic_parameter_set_id
> > frame_num
> > field_pic_flag
> > bottom_field_flag
> > idr_pic_id
> > pic_order_cnt_lsb
> > delta_pic_order_cnt_bottom
> > delta_pic_order_cnt[ 0 ]
> > delta_pic_order_cnt[ 1 ]
> > sp_for_switch_flag
> > slice_group_change_cycle
> >
> > and can therefore be moved to the per-frame decode parameters control.
>
> I am really not a H.264 expert, so this question may not be relevant,

All questions are welcome. I'm more than happy to discuss this patchset.

> but are these values specified for every slice header in the
> bitstream, or are they specified only once per frame?
>
> I am asking this because it would certainly make user-space code
> simpler if we could remain as close to the bitstream as possible. If
> these values are specified once per slice, then factorizing them would
> leave user-space with the burden of deciding what to do if they change
> across slices.
>
> Note that this is a double-edged sword, because it is not necessarily
> better to leave the firmware in charge of deciding what to do in such
> a case. :) So hopefully these are only specified once per frame in the
> bitstream, in which case your proposal makes complete sense.

Frame-based hardware accelerators such as Hantro and Rockchip VDEC
are doing the slice header parsing themselves. Therefore, the
driver is not really parsing these fields on each slice header.

Currently, we are already using only the first slice in a frame,
as you can see from:

    if (slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
        reg |= G1_REG_DEC_CTRL0_PIC_FIELDMODE_E;

Even if these fields are transported in the slice header,
I think it makes sense for us to split them into the decode params
(per-frame) control.

They are really specified to be the same across all slices,
so I'd even say that if a bitstream violates this, it's likely
either a corrupted bitstream or an encoder bug.

OTOH, one thing this makes me realize is that the slice params control
is wrongly specified as an array. Namely, this text
should be removed:

  This structure is expected to be passed as an array, with one
  entry for each slice included in the bitstream buffer.

As the API is really not defined that way.

I'll remove that in the next iteration.

Thanks for raising this point,
Ezequiel

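To make the argument above concrete, here is a minimal sketch in C of what "taking the per-picture values from the first slice" could look like, together with a check that the remaining slices agree. The `struct pic_invariants` type and the `collect_frame_params()` helper are hypothetical names made up for this illustration; they are not part of the V4L2 uAPI or of any driver, and only a subset of the listed syntax elements is modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical container for a few of the slice header syntax elements
 * that the spec requires to be identical in every slice of a picture.
 * This is NOT the layout of any real V4L2 control.
 */
struct pic_invariants {
	uint8_t  pic_parameter_set_id;
	uint16_t frame_num;
	bool     field_pic_flag;
	bool     bottom_field_flag;
	uint16_t idr_pic_id;
	uint16_t pic_order_cnt_lsb;
};

/* Compare two sets of per-picture fields, member by member. */
static bool pic_invariants_equal(const struct pic_invariants *a,
				 const struct pic_invariants *b)
{
	return a->pic_parameter_set_id == b->pic_parameter_set_id &&
	       a->frame_num == b->frame_num &&
	       a->field_pic_flag == b->field_pic_flag &&
	       a->bottom_field_flag == b->bottom_field_flag &&
	       a->idr_pic_id == b->idr_pic_id &&
	       a->pic_order_cnt_lsb == b->pic_order_cnt_lsb;
}

/*
 * Take the per-frame values from the first slice and verify that every
 * other slice agrees. Returns 0 on success, -1 if there are no slices
 * or if any slice disagrees with the first one.
 */
static int collect_frame_params(const struct pic_invariants *slices,
				size_t num_slices,
				struct pic_invariants *frame)
{
	size_t i;

	if (num_slices == 0)
		return -1;

	for (i = 1; i < num_slices; i++)
		if (!pic_invariants_equal(&slices[0], &slices[i]))
			return -1;

	*frame = slices[0];
	return 0;
}
```

Whether such a check belongs in user-space, in the driver, or nowhere at all is exactly the open question in this thread; the sketch only shows that a mismatch is cheap to detect.
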
On Mon, Jul 27, 2020 at 4:39 PM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>
> Hi Alexandre,
>
> Thanks a lot for the review.
>
> On Sat, 2020-07-25 at 23:34 +0900, Alexandre Courbot wrote:
> > On Thu, Jul 16, 2020 at 5:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > The H.264 specification requires in its "Slice header semantics"
> > > section that the following values shall be the same in all slice headers:
> > >
> > > pic_parameter_set_id
> > > frame_num
> > > field_pic_flag
> > > bottom_field_flag
> > > idr_pic_id
> > > pic_order_cnt_lsb
> > > delta_pic_order_cnt_bottom
> > > delta_pic_order_cnt[ 0 ]
> > > delta_pic_order_cnt[ 1 ]
> > > sp_for_switch_flag
> > > slice_group_change_cycle
> > >
> > > and can therefore be moved to the per-frame decode parameters control.
> >
> > I am really not a H.264 expert, so this question may not be relevant,
>
> All questions are welcome. I'm more than happy to discuss this patchset.
>
> > but are these values specified for every slice header in the
> > bitstream, or are they specified only once per frame?
> >
> > I am asking this because it would certainly make user-space code
> > simpler if we could remain as close to the bitstream as possible. If
> > these values are specified once per slice, then factorizing them would
> > leave user-space with the burden of deciding what to do if they change
> > across slices.
> >
> > Note that this is a double-edged sword, because it is not necessarily
> > better to leave the firmware in charge of deciding what to do in such
> > a case. :) So hopefully these are only specified once per frame in the
> > bitstream, in which case your proposal makes complete sense.
>
> Frame-based hardwares accelerators such as Hantro and Rockchip VDEC
> are doing the slice header parsing themselves. Therefore, the
> driver is not really parsing these fields on each slice header.
>
> Currently, we are already using only the first slice in a frame,
> as you can see from:
>
>     if (slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
>         reg |= G1_REG_DEC_CTRL0_PIC_FIELDMODE_E;
>
> Even if these fields are transported in the slice header,
> I think it makes sense for us to split them into the decode params
> (per-frame) control.
>
> They are really specified to be the same across all slices,
> so even I'd say if a bitstream violates this, it's likely
> either a corrupted bitstream or an encoder bug.
>
> OTOH, one thing this makes me realize is that the slice params control
> is wrongly specified as an array.

It is _not_.

> Namely, this text
> should be removed:
>
>   This structure is expected to be passed as an array, with one
>   entry for each slice included in the bitstream buffer.
>
> As the API is really not defined that way.
>
> I'll remove that on next iteration.

The v4l2_ctrl_h264_slice_params struct has more data than those that
are deemed to be the same across all the slices. Notable examples
are the size and start_byte_offset fields.

On Mon, 2020-07-27 at 16:52 +0200, Tomasz Figa wrote:
> On Mon, Jul 27, 2020 at 4:39 PM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > Hi Alexandre,
> >
> > Thanks a lot for the review.
> >
> > On Sat, 2020-07-25 at 23:34 +0900, Alexandre Courbot wrote:
> > > On Thu, Jul 16, 2020 at 5:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > The H.264 specification requires in its "Slice header semantics"
> > > > section that the following values shall be the same in all slice headers:
> > > >
> > > > pic_parameter_set_id
> > > > frame_num
> > > > field_pic_flag
> > > > bottom_field_flag
> > > > idr_pic_id
> > > > pic_order_cnt_lsb
> > > > delta_pic_order_cnt_bottom
> > > > delta_pic_order_cnt[ 0 ]
> > > > delta_pic_order_cnt[ 1 ]
> > > > sp_for_switch_flag
> > > > slice_group_change_cycle
> > > >
> > > > and can therefore be moved to the per-frame decode parameters control.
> > >
> > > I am really not a H.264 expert, so this question may not be relevant,
> >
> > All questions are welcome. I'm more than happy to discuss this patchset.
> >
> > > but are these values specified for every slice header in the
> > > bitstream, or are they specified only once per frame?
> > >
> > > I am asking this because it would certainly make user-space code
> > > simpler if we could remain as close to the bitstream as possible. If
> > > these values are specified once per slice, then factorizing them would
> > > leave user-space with the burden of deciding what to do if they change
> > > across slices.
> > >
> > > Note that this is a double-edged sword, because it is not necessarily
> > > better to leave the firmware in charge of deciding what to do in such
> > > a case. :) So hopefully these are only specified once per frame in the
> > > bitstream, in which case your proposal makes complete sense.
> >
> > Frame-based hardwares accelerators such as Hantro and Rockchip VDEC
> > are doing the slice header parsing themselves. Therefore, the
> > driver is not really parsing these fields on each slice header.
> >
> > Currently, we are already using only the first slice in a frame,
> > as you can see from:
> >
> >     if (slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> >         reg |= G1_REG_DEC_CTRL0_PIC_FIELDMODE_E;
> >
> > Even if these fields are transported in the slice header,
> > I think it makes sense for us to split them into the decode params
> > (per-frame) control.
> >
> > They are really specified to be the same across all slices,
> > so even I'd say if a bitstream violates this, it's likely
> > either a corrupted bitstream or an encoder bug.
> >
> > OTOH, one thing this makes me realize is that the slice params control
> > is wrongly specified as an array.
>
> It is _not_.
>

We introduced the hold capture buffer specifically to support
this without having a slice array.

I don't think we have a plan to support this control properly
as an array.

If we decide to support the slice control as an array,
we would have to implement a mechanism to specify the array size,
which we currently don't have AFAIK.

> > Namely, this text
> > should be removed:
> >
> >   This structure is expected to be passed as an array, with one
> >   entry for each slice included in the bitstream buffer.
> >
> > As the API is really not defined that way.
> >
> > I'll remove that on next iteration.
>
> The v4l2_ctrl_h264_slice_params struct has more data than those that
> are deemed to be the same across all the slices. A remarkable example
> are the size and start_byte_offset fields.

Not sure how this applies to this discussion.

Thanks!
Ezequiel

On Mon, Jul 27, 2020 at 6:18 PM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>
> On Mon, 2020-07-27 at 16:52 +0200, Tomasz Figa wrote:
> > On Mon, Jul 27, 2020 at 4:39 PM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > Hi Alexandre,
> > >
> > > Thanks a lot for the review.
> > >
> > > On Sat, 2020-07-25 at 23:34 +0900, Alexandre Courbot wrote:
> > > > On Thu, Jul 16, 2020 at 5:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > > The H.264 specification requires in its "Slice header semantics"
> > > > > section that the following values shall be the same in all slice headers:
> > > > >
> > > > > pic_parameter_set_id
> > > > > frame_num
> > > > > field_pic_flag
> > > > > bottom_field_flag
> > > > > idr_pic_id
> > > > > pic_order_cnt_lsb
> > > > > delta_pic_order_cnt_bottom
> > > > > delta_pic_order_cnt[ 0 ]
> > > > > delta_pic_order_cnt[ 1 ]
> > > > > sp_for_switch_flag
> > > > > slice_group_change_cycle
> > > > >
> > > > > and can therefore be moved to the per-frame decode parameters control.
> > > >
> > > > I am really not a H.264 expert, so this question may not be relevant,
> > >
> > > All questions are welcome. I'm more than happy to discuss this patchset.
> > >
> > > > but are these values specified for every slice header in the
> > > > bitstream, or are they specified only once per frame?
> > > >
> > > > I am asking this because it would certainly make user-space code
> > > > simpler if we could remain as close to the bitstream as possible. If
> > > > these values are specified once per slice, then factorizing them would
> > > > leave user-space with the burden of deciding what to do if they change
> > > > across slices.
> > > >
> > > > Note that this is a double-edged sword, because it is not necessarily
> > > > better to leave the firmware in charge of deciding what to do in such
> > > > a case. :) So hopefully these are only specified once per frame in the
> > > > bitstream, in which case your proposal makes complete sense.
> > >
> > > Frame-based hardwares accelerators such as Hantro and Rockchip VDEC
> > > are doing the slice header parsing themselves. Therefore, the
> > > driver is not really parsing these fields on each slice header.
> > >
> > > Currently, we are already using only the first slice in a frame,
> > > as you can see from:
> > >
> > >     if (slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> > >         reg |= G1_REG_DEC_CTRL0_PIC_FIELDMODE_E;
> > >
> > > Even if these fields are transported in the slice header,
> > > I think it makes sense for us to split them into the decode params
> > > (per-frame) control.
> > >
> > > They are really specified to be the same across all slices,
> > > so even I'd say if a bitstream violates this, it's likely
> > > either a corrupted bitstream or an encoder bug.
> > >
> > > OTOH, one thing this makes me realize is that the slice params control
> > > is wrongly specified as an array.
> >
> > It is _not_.
> >
>
> We introduced the hold capture buffer specifically to support
> this without having a slice array.
>
> I don't think we have a plan to support this control properly
> as an array.
>
> If we decide to support the slice control as an array,
> we would have to implement a mechanism to specify the array size,
> which we currently don't have AFAIK.
>

That wasn't the conclusion when we discussed this last time on IRC.
+Nicolas Dufresne

Currently the 1-slice per buffer model is quite impractical:
1) the maximum number of buffers is 32, which for some streams can be
less than needed to queue a single frame,
2) even more system call overhead due to the need to repeat various
operations (e.g. qbuf/dqbuf) per-slice rather than per-frame,
3) no way to do hardware batching for hardware which supports queuing
multiple slices at a time,
4) waste of memory - one needs to allocate all the OUTPUT buffers
pessimistically to accommodate the biggest possible slice, while with
all-slices-per-frame 1 buffer could be just heuristically allocated to
be enough for the whole frame.

These need to be carefully evaluated, with some proper testing done to
confirm whether they are really a problem or not.

> > > Namely, this text
> > > should be removed:
> > >
> > >   This structure is expected to be passed as an array, with one
> > >   entry for each slice included in the bitstream buffer.
> > >
> > > As the API is really not defined that way.
> > >
> > > I'll remove that on next iteration.
> >
> > The v4l2_ctrl_h264_slice_params struct has more data than those that
> > are deemed to be the same across all the slices. A remarkable example
> > are the size and start_byte_offset fields.
>
> Not sure how this applies to this discussion.

These fields need to be specified for each slice in the buffer to make
it possible to handle multiple slices per buffer.

Best regards,
Tomasz

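As a rough illustration of why per-slice `size` and `start_byte_offset` values are needed once a single buffer carries several slices, the sketch below validates a list of slice entries against the length of the buffer that holds them. Only the two field names come from the thread; `struct slice_entry` and `slices_fit_buffer()` are hypothetical, and the sketch assumes that offsets are measured from the start of the buffer and that slices are stored in order without overlapping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-slice bookkeeping, borrowing the two field names
 * mentioned above. Not the real v4l2_ctrl_h264_slice_params layout.
 */
struct slice_entry {
	uint32_t size;              /* slice size in bytes */
	uint32_t start_byte_offset; /* assumed: offset from buffer start */
};

/*
 * Check that every slice lies inside a buffer of buf_len bytes and that
 * slices appear in increasing order without overlapping each other.
 */
static bool slices_fit_buffer(const struct slice_entry *s, size_t n,
			      uint32_t buf_len)
{
	uint64_t prev_end = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* 64-bit arithmetic so that offset + size cannot wrap. */
		uint64_t start = s[i].start_byte_offset;
		uint64_t end = start + s[i].size;

		if (s[i].size == 0 || start < prev_end || end > buf_len)
			return false;
		prev_end = end;
	}
	return true;
}
```

With one slice per buffer the same information is implied by the buffer itself, which is why the fields only become essential in the multi-slice case.
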
On Monday 27 July 2020 at 20:10 +0200, Tomasz Figa wrote:
> On Mon, Jul 27, 2020 at 6:18 PM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > On Mon, 2020-07-27 at 16:52 +0200, Tomasz Figa wrote:
> > > On Mon, Jul 27, 2020 at 4:39 PM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > Hi Alexandre,
> > > >
> > > > Thanks a lot for the review.
> > > >
> > > > On Sat, 2020-07-25 at 23:34 +0900, Alexandre Courbot wrote:
> > > > > On Thu, Jul 16, 2020 at 5:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > > > The H.264 specification requires in its "Slice header semantics"
> > > > > > section that the following values shall be the same in all slice headers:
> > > > > >
> > > > > > pic_parameter_set_id
> > > > > > frame_num
> > > > > > field_pic_flag
> > > > > > bottom_field_flag
> > > > > > idr_pic_id
> > > > > > pic_order_cnt_lsb
> > > > > > delta_pic_order_cnt_bottom
> > > > > > delta_pic_order_cnt[ 0 ]
> > > > > > delta_pic_order_cnt[ 1 ]
> > > > > > sp_for_switch_flag
> > > > > > slice_group_change_cycle
> > > > > >
> > > > > > and can therefore be moved to the per-frame decode parameters control.
> > > > >
> > > > > I am really not a H.264 expert, so this question may not be relevant,
> > > >
> > > > All questions are welcome. I'm more than happy to discuss this patchset.
> > > >
> > > > > but are these values specified for every slice header in the
> > > > > bitstream, or are they specified only once per frame?
> > > > >
> > > > > I am asking this because it would certainly make user-space code
> > > > > simpler if we could remain as close to the bitstream as possible. If
> > > > > these values are specified once per slice, then factorizing them would
> > > > > leave user-space with the burden of deciding what to do if they change
> > > > > across slices.
> > > > >
> > > > > Note that this is a double-edged sword, because it is not necessarily
> > > > > better to leave the firmware in charge of deciding what to do in such
> > > > > a case. :) So hopefully these are only specified once per frame in the
> > > > > bitstream, in which case your proposal makes complete sense.
> > > >
> > > > Frame-based hardwares accelerators such as Hantro and Rockchip VDEC
> > > > are doing the slice header parsing themselves. Therefore, the
> > > > driver is not really parsing these fields on each slice header.
> > > >
> > > > Currently, we are already using only the first slice in a frame,
> > > > as you can see from:
> > > >
> > > >     if (slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> > > >         reg |= G1_REG_DEC_CTRL0_PIC_FIELDMODE_E;
> > > >
> > > > Even if these fields are transported in the slice header,
> > > > I think it makes sense for us to split them into the decode params
> > > > (per-frame) control.
> > > >
> > > > They are really specified to be the same across all slices,
> > > > so even I'd say if a bitstream violates this, it's likely
> > > > either a corrupted bitstream or an encoder bug.
> > > >
> > > > OTOH, one thing this makes me realize is that the slice params control
> > > > is wrongly specified as an array.
> > >
> > > It is _not_.
> > >
> >
> > We introduced the hold capture buffer specifically to support
> > this without having a slice array.
> >
> > I don't think we have a plan to support this control properly
> > as an array.
> >
> > If we decide to support the slice control as an array,
> > we would have to implement a mechanism to specify the array size,
> > which we currently don't have AFAIK.
> >
>
> That wasn't the conclusion when we discussed this last time on IRC.
> +Nicolas Dufresne
>
> Currently the 1-slice per buffer model is quite impractical:
> 1) the maximum number of buffers is 32, which for some streams can be
> less than needed to queue a single frame,

To give more context, it seems the discussion was about being able to
use a slice decoder with a 1 poll() per frame model. Of course this will
never be as efficient as when using a frame-based decoder, but with the
current design, you can keep a list of pending requests (each request is
1 slice/buffer), and simply use memory pressure to poll a midpoint and
dequeue the remaining. For example, you have 8 pending requests and reach
your memory limit:

  [R1, R2, R3, R4, R5, R6, R7, R8]

As requests are in order and behave like memory fences, you can pick
R6 and poll() that one. When R6 is ready, you can then dequeue R1 to
R6 without blocking. In this context, a limit of 16 or 32 buffers seems
fair; the optimization we can do in userspace is likely sufficient. So
I'd like to drop problem 1) from our list.

> 2) even more system call overhead due to the need to repeat various
> operations (e.g. qbuf/dqbuf) per-slice rather than per-frame,
> 3) no way to do hardware batching for hardware which supports queuing
> multiple slices at a time,
> 4) waste of memory - one needs to allocate all the OUTPUT buffers
> pessimistically to accommodate the biggest possible slice, while with
> all-slices-per-frame 1 buffer could be just heuristically allocated to
> be enough for the whole frame.
>
> These need to be carefully evaluated, with some proper testing done to
> confirm whether they are really a problem or not.

2, 3 and 4 seem to match what the currently unimplemented API proposes.
You can mitigate 2) by having multiple slices per buffer. That came with
a byte offset so we can program the HW as if it was separate slice
buffers. But it was limited to 16 buffers, likely a fair compromise.

3) is about batching. In the only use case we know, the batching
acceleration consists of programming the next operation on the
completion IRQ. I already looked with the Cedrus developers at if and
how that was feasible, but we don't have a PoC yet. The optimization is
about removing context switches between operations, which could prevent
fully using the HW.

4) is also well covered by being able to multiplex 1 buffer with
multiple slices.

To be fair, I understand why we'd like to drop this API, as none of the
active slice decoder (cedrus) developers here have time to engage in
supporting this untested "optimization". It's not only about kernel
support, but also requires userspace work. I also agree that it could be
added later, as an extension. It could be done with 3 new controls (an
array of slice_params, an array of slice start offsets, and the number
of slices), or with just one: a new structure that has a slice_params
structure embedded, num_slices, and an array of slice_start_offset. I
don't have a preference myself, but I'm just illustrating that yes, we
could drop the slice batching to avoid pushing untested APIs without
sacrificing our ability to decode a valid stream.

>
> > > > Namely, this text
> > > > should be removed:
> > > >
> > > >   This structure is expected to be passed as an array, with one
> > > >   entry for each slice included in the bitstream buffer.
> > > >
> > > > As the API is really not defined that way.
> > > >
> > > > I'll remove that on next iteration.
> > >
> > > The v4l2_ctrl_h264_slice_params struct has more data than those that
> > > are deemed to be the same across all the slices. A remarkable example
> > > are the size and start_byte_offset fields.
> >
> > Not sure how this applies to this discussion.
>
> These fields need to be specified for each slice in the buffer to make
> it possible to handle multiple slices per buffer.
>
> Best regards,
> Tomasz

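The "poll a midpoint" idea described in this message can be sketched roughly as follows, assuming each pending request is represented by its media request file descriptor and that completion is reported through `POLLPRI` on that descriptor, as the Media Request API documents. The helper name `wait_for_midpoint()` and its return convention are hypothetical; the in-order completion it relies on is the property stated in the message, not something the code verifies.

```c
#include <poll.h>
#include <stddef.h>

/*
 * Wait for the pending request at index `mid` (0-based, oldest first)
 * and report how many requests, counted from the oldest one, can then
 * be dequeued without blocking. This relies on requests completing in
 * order. Returns -1 on bad arguments, poll() failure or timeout.
 */
static int wait_for_midpoint(const int *request_fds, size_t num_pending,
			     size_t mid, int timeout_ms)
{
	struct pollfd pfd;
	int ret;

	if (!request_fds || mid >= num_pending)
		return -1;

	pfd.fd = request_fds[mid];
	pfd.events = POLLPRI; /* request completion is signalled as POLLPRI */
	pfd.revents = 0;

	ret = poll(&pfd, 1, timeout_ms);
	if (ret <= 0 || !(pfd.revents & POLLPRI))
		return -1;

	/* Requests 0..mid are done: mid + 1 of them can be dequeued. */
	return (int)(mid + 1);
}
```

For the eight-request example above, calling this with `mid = 5` (that is, R6) would return 6 once R6 completes, meaning R1 to R6 can be dequeued.
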
Hi,

On Mon, Jul 27, 2020 at 11:39:12AM -0300, Ezequiel Garcia wrote:
> On Sat, 2020-07-25 at 23:34 +0900, Alexandre Courbot wrote:
> > On Thu, Jul 16, 2020 at 5:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > The H.264 specification requires in its "Slice header semantics"
> > > section that the following values shall be the same in all slice headers:
> > >
> > > pic_parameter_set_id
> > > frame_num
> > > field_pic_flag
> > > bottom_field_flag
> > > idr_pic_id
> > > pic_order_cnt_lsb
> > > delta_pic_order_cnt_bottom
> > > delta_pic_order_cnt[ 0 ]
> > > delta_pic_order_cnt[ 1 ]
> > > sp_for_switch_flag
> > > slice_group_change_cycle
> > >
> > > and can therefore be moved to the per-frame decode parameters control.
> >
> > I am really not a H.264 expert, so this question may not be relevant,
>
> All questions are welcome. I'm more than happy to discuss this patchset.
>
> > but are these values specified for every slice header in the
> > bitstream, or are they specified only once per frame?
> >
> > I am asking this because it would certainly make user-space code
> > simpler if we could remain as close to the bitstream as possible. If
> > these values are specified once per slice, then factorizing them would
> > leave user-space with the burden of deciding what to do if they change
> > across slices.
> >
> > Note that this is a double-edged sword, because it is not necessarily
> > better to leave the firmware in charge of deciding what to do in such
> > a case. :) So hopefully these are only specified once per frame in the
> > bitstream, in which case your proposal makes complete sense.
>
> Frame-based hardwares accelerators such as Hantro and Rockchip VDEC
> are doing the slice header parsing themselves. Therefore, the
> driver is not really parsing these fields on each slice header.
>
> Currently, we are already using only the first slice in a frame,
> as you can see from:
>
>     if (slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
>         reg |= G1_REG_DEC_CTRL0_PIC_FIELDMODE_E;
>
> Even if these fields are transported in the slice header,
> I think it makes sense for us to split them into the decode params
> (per-frame) control.

I don't really get it though. The initial plan that was asked was to
mimic as much as possible the bitstream and that's what we did.

But that requirement seems to have changed now?

Even if it did change though, if this is defined as a slice parameter in
the spec, why would we want to move it to some other control entirely if
we are to keep the slice parameters control?

Especially if the reason is to make life easier on a couple of
drivers, that's really not the point of a userspace API, and we can
always add a helper if it really shows up as a pattern.

> They are really specified to be the same across all slices,
> so even I'd say if a bitstream violates this, it's likely
> either a corrupted bitstream or an encoder bug.

That doesn't look like something we should worry about though. Or at
least, this is true for pretty much any other field in the bitstream and
we won't change those.

Maxime

On Tuesday 28 July 2020 at 14:44 +0200, Maxime Ripard wrote:
> Hi,
>
> On Mon, Jul 27, 2020 at 11:39:12AM -0300, Ezequiel Garcia wrote:
> > On Sat, 2020-07-25 at 23:34 +0900, Alexandre Courbot wrote:
> > > On Thu, Jul 16, 2020 at 5:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > The H.264 specification requires in its "Slice header semantics"
> > > > section that the following values shall be the same in all slice headers:
> > > >
> > > > pic_parameter_set_id
> > > > frame_num
> > > > field_pic_flag
> > > > bottom_field_flag
> > > > idr_pic_id
> > > > pic_order_cnt_lsb
> > > > delta_pic_order_cnt_bottom
> > > > delta_pic_order_cnt[ 0 ]
> > > > delta_pic_order_cnt[ 1 ]
> > > > sp_for_switch_flag
> > > > slice_group_change_cycle
> > > >
> > > > and can therefore be moved to the per-frame decode parameters control.
> > >
> > > I am really not a H.264 expert, so this question may not be relevant,
> >
> > All questions are welcome. I'm more than happy to discuss this patchset.
> >
> > > but are these values specified for every slice header in the
> > > bitstream, or are they specified only once per frame?
> > >
> > > I am asking this because it would certainly make user-space code
> > > simpler if we could remain as close to the bitstream as possible. If
> > > these values are specified once per slice, then factorizing them would
> > > leave user-space with the burden of deciding what to do if they change
> > > across slices.
> > >
> > > Note that this is a double-edged sword, because it is not necessarily
> > > better to leave the firmware in charge of deciding what to do in such
> > > a case. :) So hopefully these are only specified once per frame in the
> > > bitstream, in which case your proposal makes complete sense.
> >
> > Frame-based hardwares accelerators such as Hantro and Rockchip VDEC
> > are doing the slice header parsing themselves. Therefore, the
> > driver is not really parsing these fields on each slice header.
> >
> > Currently, we are already using only the first slice in a frame,
> > as you can see from:
> >
> >     if (slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> >         reg |= G1_REG_DEC_CTRL0_PIC_FIELDMODE_E;
> >
> > Even if these fields are transported in the slice header,
> > I think it makes sense for us to split them into the decode params
> > (per-frame) control.
>
> I don't really get it though. The initial plan that was asked was to
> mimic as much as possible the bitstream and that's what we did.
>
> But that requirement seems to have changed now?

Indeed, that has changed and has been discussed multiple times already.
But in general what you need to realize is that the bitstream is made of
redundancy in order to be robust against data loss. Most importantly, it
is by design that you can lose a slice but still decode the other
slices.

Carrying this redundancy over the kernel API, though, didn't make much
sense. It only made the amount of data we have to pass (and copy) for
each frame bigger. The goal of this change is to reduce the amount of
data you actually need to pass, both for frame and slice decoding.

For frame decoding, all the invariants from the slice header (notice the
spec name here) have been moved from slice_params (not a spec term) into
the decode_params (also not a spec term). This way, the slice_params no
longer need to be passed to the driver at all.

As for slice decoding, assuming you care about making use of the VB2
control caching, you can pass the decode_params on the first slice only,
and only pass the slice_params, which is now slimmer, for the following
slices. Remember that Paul initially believed that the V4L2 stateless
decoder was a stateless API, while in fact the HW is stateless, but V4L2
APIs are stateful by nature.

P.S. It was also requested to use a raster-scan scaling matrix, which
isn't the order found in the bitstream. That was also discussed, and the
reason is that other existing interfaces use raster order, so not using
that is error-prone for both kernel and userspace coders. Cedrus has
been in raster scan order since the start, ignoring that rule already.

>
> Even if it did change though, if this is defined as a slice parameter in
> the spec, why would we want to move it to some other control entirely if
> we are to keep the slice parameters control?

You are confusing a term we invented, slice_params, with a specified
term, slice header. The decode_params are, as documented, information
per frame/picture, while slice_params is information needed per slice,
nothing more.

>
> Especially if the reason is to make the life easier on a couple of
> drivers, that's really not the point of a userspace API, and we can
> always add an helper if it really shows up as a pattern.

We have made a userspace implementation of this, GStreamer for now, so I
can only disagree with you. This does not make userspace much more
complex and, on top of that, it allows reducing some overhead, as prior
to that we were computing reference lists for frame-based decoders that
already compute this in HW. I also find that it makes debugging easier,
as when we trace we don't end up looking at identical values over and
over.

>
> > They are really specified to be the same across all slices,
> > so even I'd say if a bitstream violates this, it's likely
> > either a corrupted bitstream or an encoder bug.
>
> That doesn't look like something we should worry about though. Or at
> least, this is true for pretty much any other field in the bitstream and
> we won't change those.

Sorry, I'm not clear on what you refer to: not worrying about the fact
that they are invariant, or that the user may pass invalid data? For the
first, we believe it matters; that was also motivated by some of
Stanimir's research showing that controls are not as free as we'd like
to think. Again, all this has been discussed for quite some time now and
the participants seem to have reached an agreement on this direction.

>
> Maxime

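The per-slice flow described in this message (decode_params once, slice_params for every slice) can be modelled with a very small sketch. The enum labels and the `controls_for_slice()` helper are hypothetical stand-ins and do not correspond to real V4L2 control IDs; the sketch also simply assumes, as the message does, that the per-frame values set with the first slice remain in effect for the following slices of the same frame.

```c
#include <stddef.h>

/* Hypothetical labels for the two kinds of control payload discussed. */
enum ctrl_kind {
	CTRL_DECODE_PARAMS, /* per-frame data */
	CTRL_SLICE_PARAMS,  /* per-slice data */
};

/*
 * Fill `out` with the controls to attach to the request for slice
 * `slice_index` of a frame and return how many entries were written.
 * `out` must have room for two entries.
 */
static size_t controls_for_slice(size_t slice_index, enum ctrl_kind out[2])
{
	size_t n = 0;

	if (slice_index == 0)
		out[n++] = CTRL_DECODE_PARAMS; /* sent once per frame */
	out[n++] = CTRL_SLICE_PARAMS;          /* sent for every slice */
	return n;
}
```

For a frame made of N slices this amounts to one per-frame payload plus N (now smaller) per-slice payloads, instead of N copies of the same invariant fields.
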
On Mon, Jul 27, 2020 at 9:43 PM Nicolas Dufresne <nicolas.dufresne@collabora.com> wrote:
>
> On Monday 27 July 2020 at 20:10 +0200, Tomasz Figa wrote:
> > On Mon, Jul 27, 2020 at 6:18 PM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > On Mon, 2020-07-27 at 16:52 +0200, Tomasz Figa wrote:
> > > > On Mon, Jul 27, 2020 at 4:39 PM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > > Hi Alexandre,
> > > > >
> > > > > Thanks a lot for the review.
> > > > >
> > > > > On Sat, 2020-07-25 at 23:34 +0900, Alexandre Courbot wrote:
> > > > > > On Thu, Jul 16, 2020 at 5:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
> > > > > > > The H.264 specification requires in its "Slice header semantics"
> > > > > > > section that the following values shall be the same in all slice headers:
> > > > > > >
> > > > > > > pic_parameter_set_id
> > > > > > > frame_num
> > > > > > > field_pic_flag
> > > > > > > bottom_field_flag
> > > > > > > idr_pic_id
> > > > > > > pic_order_cnt_lsb
> > > > > > > delta_pic_order_cnt_bottom
> > > > > > > delta_pic_order_cnt[ 0 ]
> > > > > > > delta_pic_order_cnt[ 1 ]
> > > > > > > sp_for_switch_flag
> > > > > > > slice_group_change_cycle
> > > > > > >
> > > > > > > and can therefore be moved to the per-frame decode parameters control.
> > > > > >
> > > > > > I am really not a H.264 expert, so this question may not be relevant,
> > > > >
> > > > > All questions are welcome. I'm more than happy to discuss this patchset.
> > > > >
> > > > > > but are these values specified for every slice header in the
> > > > > > bitstream, or are they specified only once per frame?
> > > > > >
> > > > > > I am asking this because it would certainly make user-space code
> > > > > > simpler if we could remain as close to the bitstream as possible. If
> > > > > > these values are specified once per slice, then factorizing them would
> > > > > > leave user-space with the burden of deciding what to do if they change
> > > > > > across slices.
> > > > > >
> > > > > > Note that this is a double-edged sword, because it is not necessarily
> > > > > > better to leave the firmware in charge of deciding what to do in such
> > > > > > a case. :) So hopefully these are only specified once per frame in the
> > > > > > bitstream, in which case your proposal makes complete sense.
> > > > >
> > > > > Frame-based hardwares accelerators such as Hantro and Rockchip VDEC
> > > > > are doing the slice header parsing themselves. Therefore, the
> > > > > driver is not really parsing these fields on each slice header.
> > > > >
> > > > > Currently, we are already using only the first slice in a frame,
> > > > > as you can see from:
> > > > >
> > > > >     if (slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
> > > > >         reg |= G1_REG_DEC_CTRL0_PIC_FIELDMODE_E;
> > > > >
> > > > > Even if these fields are transported in the slice header,
> > > > > I think it makes sense for us to split them into the decode params
> > > > > (per-frame) control.
> > > > >
> > > > > They are really specified to be the same across all slices,
> > > > > so even I'd say if a bitstream violates this, it's likely
> > > > > either a corrupted bitstream or an encoder bug.
> > > > >
> > > > > OTOH, one thing this makes me realize is that the slice params control
> > > > > is wrongly specified as an array.
> > > >
> > > > It is _not_.
> > > >
> > >
> > > We introduced the hold capture buffer specifically to support
> > > this without having a slice array.
> > >
> > > I don't think we have a plan to support this control properly
> > > as an array.
> > >
> > > If we decide to support the slice control as an array,
> > > we would have to implement a mechanism to specify the array size,
> > > which we currently don't have AFAIK.
> > >
> >
> > That wasn't the conclusion when we discussed this last time on IRC.
> > +Nicolas Dufresne
> >
> > Currently the 1-slice per buffer model is quite impractical:
> > 1) the maximum number of buffers is 32, which for some streams can be
> > less than needed to queue a single frame,
>
> To give more context, it seems the discussion was about being able to
> use slice decoder with a 1 poll() per frame model. Of course this will
> never be as efficient as when using a frame base decoder, but as
> current design, you can keep a list of pending request (each request is
> 1 slice/buffer), and simply use memory pressure to poll a mid point and
> dequeue the remaining. An example, yo have 8 pending request, and reach
> your memory limit:
>
>   [R1, R2, R3, R4, R5, R6, R7, R8]
>
> As requests are in order and behaves like memory fences, you can pick
> R6, and poll() that one. When R6 is ready, you can then dequeue R1 to
> R6 without blocking. In this context, a limit of 16 or 32 buffers seems
> fair, the optimization we can do in userspace is likely sufficient. So
> I'd like to drop problem 1) from our list.
>

Okay, I forgot about the ability to poll the requests. I guess this
solves a part of the problem indeed.

> > 2) even more system call overhead due to the need to repeat various
> > operations (e.g. qbuf/dqbuf) per-slice rather than per-frame,
> > 3) no way to do hardware batching for hardware which supports queuing
> > multiple slices at a time,
> > 4) waste of memory - one needs to allocate all the OUTPUT buffers
> > pessimistically to accommodate the biggest possible slice, while with
> > all-slices-per-frame 1 buffer could be just heuristically allocated to
> > be enough for the whole frame.
> >
> > These need to be carefully evaluated, with some proper testing done to
> > confirm whether they are really a problem or not.
>
> 2, 3 and 4 seems to match what the currently unimplemented API propose.
> You can mitigate 2) but having multiple slices per buffers.
That came > with a byte offset to we can program the HW as if it was separate slice > buffers. But was limited to 16 buffers, likely a fair compromise. > Do we have some ideas on how these problems could be addressed in the future? It would be unfortunate to freeze the current API just to realize that it can't be made efficient anymore and yet another API would have to be invented to redo things in an efficient way. With the request polling method, I guess we could solve 2) by making it possible to dequeue and enqueue multiple buffers at a time. It could be achieved by introducing DQBUF/QBUF variants which operate on an array of buffer indexes. > 3) is about batching, in the only use case we know, the batching > acceleration consist of programming the next operation on the > completion IRQ. I already looked with the Cedrus developers if and how > that was feasible, but we don't have a PoC yet. The optimization is > about removing context switches between operations, which could prevent > fully using the HW. Right, but still, we have to check whether the API we're going to stabilize wouldn't prevent implementing it in the future. One idea is to solve it opportunistically. If there are already some slices queued and not being processed by the hardware yet, queuing more would just join them to the existing (staging) batch. When the hardware finishes its current batch, the staging batch would be closed and handed to the hardware for decoding. > > 4) is also well covered with being able to multiplex 1 buffer with > multiple slices. Note that with MMAP memory type it's not possible, because 1 buffer can be only queued once. However, I guess that with DMABUF, one can just allocate one large buffer and queue it at different V4L2 buffer indexes with different .data_offset (or whatever we introduce for proper, well-defined offset handling). 
> > To be fair, I understand why we'd like to drop this API, as none of the > active developers here of slice decoder (cedrus) have time to engage in > supporting this untested "optimization". It's not only about kernel > support, but also requires userspace work. I also agree that it could > be added later, as an extension. It could be done with 3 new controls, > an array of slice_params and an array of slice start offset and the > number of slices, or just one, introduce a new structure that have a > slice_params structure embedded, num_slices and an array of > slice_start_offset. I don't have preference myself, but I'm just > illustrating that yes, we could drop the slice batching to avoid > pushing untested APIs without scarifying our ability to decode a valid > stream. Sure, that makes sense, but as I mentioned above, there are problems with the existing API and if we don't want to solve them right now, we at least have to make sure that the problems can be solved later after stabilizing it. Best regards, Tomasz
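The opportunistic batching idea described in this message can be sketched as a tiny state model. This is only an illustration under stated assumptions: `struct slice_batcher`, `batcher_queue_slice()` and `batcher_hw_done()` are hypothetical names, not part of V4L2 or of any driver, and a real driver would track requests and buffers rather than plain counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model only: nothing here is a V4L2 or Cedrus API. */
struct slice_batcher {
	size_t active;  /* slices in the batch the hardware is decoding */
	size_t staging; /* slices queued while the hardware was busy */
};

/* Queue one slice: start it right away if idle, otherwise stage it. */
static void batcher_queue_slice(struct slice_batcher *b)
{
	assert(b);
	if (b->active == 0)
		b->active = 1;
	else
		b->staging++;
}

/*
 * Completion interrupt: the current batch is finished, so the staging
 * batch is closed and handed to the hardware. Returns the size of the
 * batch that starts now (0 means the hardware goes idle).
 */
static size_t batcher_hw_done(struct slice_batcher *b)
{
	assert(b);
	b->active = b->staging;
	b->staging = 0;
	return b->active;
}
```

In this model, user-space keeps queuing one slice per request, and any batching happens inside the driver, so the uAPI would not need to change for the optimization to be added later.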
Hi Tomasz, On Tue, 2020-08-04 at 15:35 +0200, Tomasz Figa wrote: > On Mon, Jul 27, 2020 at 9:43 PM Nicolas Dufresne > <nicolas.dufresne@collabora.com> wrote: > > Le lundi 27 juillet 2020 à 20:10 +0200, Tomasz Figa a écrit : > > > On Mon, Jul 27, 2020 at 6:18 PM Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > On Mon, 2020-07-27 at 16:52 +0200, Tomasz Figa wrote: > > > > > On Mon, Jul 27, 2020 at 4:39 PM Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > > > Hi Alexandre, > > > > > > > > > > > > Thanks a lot for the review. > > > > > > > > > > > > On Sat, 2020-07-25 at 23:34 +0900, Alexandre Courbot wrote: > > > > > > > On Thu, Jul 16, 2020 at 5:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > > > > > The H.264 specification requires in its "Slice header semantics" > > > > > > > > section that the following values shall be the same in all slice headers: > > > > > > > > > > > > > > > > pic_parameter_set_id > > > > > > > > frame_num > > > > > > > > field_pic_flag > > > > > > > > bottom_field_flag > > > > > > > > idr_pic_id > > > > > > > > pic_order_cnt_lsb > > > > > > > > delta_pic_order_cnt_bottom > > > > > > > > delta_pic_order_cnt[ 0 ] > > > > > > > > delta_pic_order_cnt[ 1 ] > > > > > > > > sp_for_switch_flag > > > > > > > > slice_group_change_cycle > > > > > > > > > > > > > > > > and can therefore be moved to the per-frame decode parameters control. > > > > > > > > > > > > > > I am really not a H.264 expert, so this question may not be relevant, > > > > > > > > > > > > All questions are welcome. I'm more than happy to discuss this patchset. > > > > > > > > > > > > > but are these values specified for every slice header in the > > > > > > > bitstream, or are they specified only once per frame? > > > > > > > > > > > > > > I am asking this because it would certainly make user-space code > > > > > > > simpler if we could remain as close to the bitstream as possible. 
If > > > > > > > these values are specified once per slice, then factorizing them would > > > > > > > leave user-space with the burden of deciding what to do if they change > > > > > > > across slices. > > > > > > > > > > > > > > Note that this is a double-edged sword, because it is not necessarily > > > > > > > better to leave the firmware in charge of deciding what to do in such > > > > > > > a case. :) So hopefully these are only specified once per frame in the > > > > > > > bitstream, in which case your proposal makes complete sense. > > > > > > > > > > > > Frame-based hardwares accelerators such as Hantro and Rockchip VDEC > > > > > > are doing the slice header parsing themselves. Therefore, the > > > > > > driver is not really parsing these fields on each slice header. > > > > > > > > > > > > Currently, we are already using only the first slice in a frame, > > > > > > as you can see from: > > > > > > > > > > > > if (slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC) > > > > > > reg |= G1_REG_DEC_CTRL0_PIC_FIELDMODE_E; > > > > > > > > > > > > Even if these fields are transported in the slice header, > > > > > > I think it makes sense for us to split them into the decode params > > > > > > (per-frame) control. > > > > > > > > > > > > They are really specified to be the same across all slices, > > > > > > so even I'd say if a bitstream violates this, it's likely > > > > > > either a corrupted bitstream or an encoder bug. > > > > > > > > > > > > OTOH, one thing this makes me realize is that the slice params control > > > > > > is wrongly specified as an array. > > > > > > > > > > It is _not_. > > > > > > > > > > > > > We introduced the hold capture buffer specifically to support > > > > this without having a slice array. > > > > > > > > I don't think we have a plan to support this control properly > > > > as an array. 
> > > > > > > > If we decide to support the slice control as an array, > > > > we would have to implement a mechanism to specify the array size, > > > > which we currently don't have AFAIK. > > > > > > > > > > That wasn't the conclusion when we discussed this last time on IRC. > > > +Nicolas Dufresne > > > > > > Currently the 1-slice per buffer model is quite impractical: > > > 1) the maximum number of buffers is 32, which for some streams can be > > > less than needed to queue a single frame, > > > > To give more context, it seems the discussion was about being able to > > use slice decoder with a 1 poll() per frame model. Of course this will > > never be as efficient as when using a frame base decoder, but as > > current design, you can keep a list of pending request (each request is > > 1 slice/buffer), and simply use memory pressure to poll a mid point and > > dequeue the remaining. An example, yo have 8 pending request, and reach > > your memory limit: > > > > [R1, R2, R3, R4, R5, R6, R7, R8] > > > > As requests are in order and behaves like memory fences, you can pick > > R6, and poll() that one. When R6 is ready, you can then dequeue R1 to > > R6 without blocking. In this context, a limit of 16 or 32 buffers seems > > fair, the optimization we can do in userspace is likely sufficient. So > > I'd like to drop problem 1) from our list. > > > > Okay, I forgot about the ability to poll the requests. I guess this > solves a part of the problem indeed. > > > > 2) even more system call overhead due to the need to repeat various > > > operations (e.g. qbuf/dqbuf) per-slice rather than per-frame, > > > 3) no way to do hardware batching for hardware which supports queuing > > > multiple slices at a time, > > > 4) waste of memory - one needs to allocate all the OUTPUT buffers > > > pessimistically to accommodate the biggest possible slice, while with > > > all-slices-per-frame 1 buffer could be just heuristically allocated to > > > be enough for the whole frame. 
> > > > > > These need to be carefully evaluated, with some proper testing done to > > > confirm whether they are really a problem or not. > > > > 2, 3 and 4 seems to match what the currently unimplemented API propose. > > You can mitigate 2) but having multiple slices per buffers. That came > > with a byte offset to we can program the HW as if it was separate slice > > buffers. But was limited to 16 buffers, likely a fair compromise. > > > > Do we have some ideas on how these problems could be addressed in the > future? It would be unfortunate to freeze the current API just to > realize that it can't be made efficient anymore and yet another API > would have to be invented to redo things in an efficient way. > > With the request polling method, I guess we could solve 2) by making > it possible to dequeue and enqueue multiple buffers at a time. It > could be achieved by introducing DQBUF/QBUF variants which operate on > an array of buffer indexes. > > > 3) is about batching, in the only use case we know, the batching > > acceleration consist of programming the next operation on the > > completion IRQ. I already looked with the Cedrus developers if and how > > that was feasible, but we don't have a PoC yet. The optimization is > > about removing context switches between operations, which could prevent > > fully using the HW. > > Right, but still, we have to check whether the API we're going to > stabilize wouldn't prevent implementing it in the future. > > One idea is to solve it opportunistically. If there are already some > slices queued and not being processed by the hardware yet, queuing > more would just join them to the existing (staging) batch. When the > hardware finishes its current batch, the staging batch would be closed > and handed to the hardware for decoding. > > > 4) is also well covered with being able to multiplex 1 buffer with > > multiple slices. > > Note that with MMAP memory type it's not possible, because 1 buffer > can be only queued once. 
However, I guess that with DMABUF, one can > just allocate one large buffer and queue it at different V4L2 buffer > indexes with different .data_offset (or whatever we introduce for > proper, well-defined offset handling). > > > To be fair, I understand why we'd like to drop this API, as none of the > > active developers here of slice decoder (cedrus) have time to engage in > > supporting this untested "optimization". It's not only about kernel > > support, but also requires userspace work. I also agree that it could > > be added later, as an extension. It could be done with 3 new controls, > > an array of slice_params and an array of slice start offset and the > > number of slices, or just one, introduce a new structure that have a > > slice_params structure embedded, num_slices and an array of > > slice_start_offset. I don't have preference myself, but I'm just > > illustrating that yes, we could drop the slice batching to avoid > > pushing untested APIs without scarifying our ability to decode a valid > > stream. > > Sure, that makes sense, but as I mentioned above, there are problems > with the existing API and if we don't want to solve them right now, we > at least have to make sure that the problems can be solved later after > stabilizing it. > I think the plan is to support 1-slice per request using the current SLICE_PARAMS control. So, the next iteration of this series should clarify that SLICE_PARAMS is not intended as an array, and clean up the fields that were added when the control was first envisioned as part of an array. As you have pointed out, for slice-based hardware, pushing 1-slice per request can be suboptimal. However, this is the mode currently supported by Cedrus, and it is good enough for Cedrus users. I believe there are enough reasons to stabilize this SLICE_PARAMS control and clarify that its use case is 1-slice per request. Nothing prevents us from introducing a new control to support another mode, though.
So, going forward, if we see slice-based hardware that can handle multiple slices per request, or if we want to pass an entire frame to Cedrus to save the syscall overhead, we will have to introduce a new (array-like) control. I believe a decent plan is along the lines of Nicolas' suggestions. I'd like to quote them because they totally match what I had in mind: """ It could be done with 3 new controls, an array of slice_params and an array of slice start offset and the number of slices, or just one, introduce a new structure that have a slice_params structure embedded, num_slices and an array of slice_start_offset. """ Currently, the length of a control array is fixed, and the kernel rejects an application that passes a size different from the size of the array. Before proposing N-slices per request, we would have to introduce new V4L2 control semantics to be able to pass a dynamic array of controls. This has been on our roadmap, so it'll happen sooner or later. Thanks! Ezequiel
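To make the "just one" variant from the quoted suggestion more concrete, here is a rough sketch of what such a structure and a basic sanity check could look like. Everything in it is an assumption made for illustration: the `example_` names, the fixed cap of 16 slices, the reduced stand-in for the slice parameters, and the choice to embed one `slice_params` entry per slice. No such control exists in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_MAX_SLICES 16 /* hypothetical cap, chosen arbitrarily */

/* Reduced stand-in for struct v4l2_ctrl_h264_slice_params. */
struct example_slice_params {
	uint32_t first_mb_in_slice;
	uint8_t slice_type;
};

/* Hypothetical single control carrying several slices of one buffer. */
struct example_slice_batch {
	uint32_t num_slices;
	/* Byte offset of each slice from the start of the OUTPUT buffer. */
	uint32_t slice_start_offset[EXAMPLE_MAX_SLICES];
	struct example_slice_params slice_params[EXAMPLE_MAX_SLICES];
};

/*
 * Check that the batch is usable for a buffer holding bytesused bytes:
 * at least one slice, no more than the cap, and strictly increasing
 * offsets that all fall inside the buffer.
 */
static bool example_slice_batch_is_valid(const struct example_slice_batch *b,
					 uint32_t bytesused)
{
	uint32_t i;

	assert(b);
	if (b->num_slices == 0 || b->num_slices > EXAMPLE_MAX_SLICES)
		return false;

	for (i = 0; i < b->num_slices; i++) {
		if (b->slice_start_offset[i] >= bytesused)
			return false;
		if (i > 0 &&
		    b->slice_start_offset[i] <= b->slice_start_offset[i - 1])
			return false;
	}
	return true;
}
```

A fixed-size array like this sidesteps the missing dynamic-array control semantics mentioned above, at the cost of always transferring the worst-case structure size.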
diff --git a/Documentation/userspace-api/media/v4l/ext-ctrls-codec.rst b/Documentation/userspace-api/media/v4l/ext-ctrls-codec.rst
index fc8ca4f8be25..e3c9bfaeff6f 100644
--- a/Documentation/userspace-api/media/v4l/ext-ctrls-codec.rst
+++ b/Documentation/userspace-api/media/v4l/ext-ctrls-codec.rst
@@ -1780,44 +1780,12 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
     * - __u8
       - ``slice_type``
       -
-    * - __u8
-      - ``pic_parameter_set_id``
-      -
     * - __u8
       - ``colour_plane_id``
       -
     * - __u8
       - ``redundant_pic_cnt``
       -
-    * - __u16
-      - ``frame_num``
-      -
-    * - __u16
-      - ``idr_pic_id``
-      -
-    * - __u16
-      - ``pic_order_cnt_lsb``
-      -
-    * - __s32
-      - ``delta_pic_order_cnt_bottom``
-      -
-    * - __s32
-      - ``delta_pic_order_cnt0``
-      -
-    * - __s32
-      - ``delta_pic_order_cnt1``
-      -
-    * - struct :c:type:`v4l2_h264_pred_weight_table`
-      - ``pred_weight_table``
-      -
-    * - __u32
-      - ``dec_ref_pic_marking_bit_size``
-      - Size in bits of the dec_ref_pic_marking() syntax element.
-    * - __u32
-      - ``pic_order_cnt_bit_size``
-      - Combined size in bits of the picture order count related syntax
-        elements: pic_order_cnt_lsb, delta_pic_order_cnt_bottom,
-        delta_pic_order_cnt0, and delta_pic_order_cnt1.
     * - __u8
       - ``cabac_init_idc``
       -
@@ -1844,9 +1812,9 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
       - ``num_ref_idx_l1_active_minus1``
       - If num_ref_idx_active_override_flag is not set, this field must be
         set to the value of num_ref_idx_l1_default_active_minus1.
-    * - __u32
-      - ``slice_group_change_cycle``
-      -
+    * - __u8
+      - ``reserved``
+      - Applications and drivers must set this to zero.
     * - struct :c:type:`v4l2_h264_reference`
       - ``ref_pic_list0[32]``
       - Reference picture list after applying the per-slice modifications
@@ -1868,17 +1836,11 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
     :stub-columns: 0
     :widths:       1 1 2

-    * - ``V4L2_H264_SLICE_FLAG_FIELD_PIC``
-      - 0x00000001
-      -
-    * - ``V4L2_H264_SLICE_FLAG_BOTTOM_FIELD``
-      - 0x00000002
-      -
     * - ``V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED``
-      - 0x00000004
+      - 0x00000001
       -
     * - ``V4L2_H264_SLICE_FLAG_SP_FOR_SWITCH``
-      - 0x00000008
+      - 0x00000002
       -

 ``V4L2_CID_MPEG_VIDEO_H264_PRED_WEIGHT (struct)``
@@ -2011,6 +1973,35 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
     * - __s32
       - ``bottom_field_order_cnt``
       - Picture Order Count for the coded bottom field
+    * - __u16
+      - ``frame_num``
+      -
+    * - __u16
+      - ``idr_pic_id``
+      -
+    * - __u16
+      - ``pic_order_cnt_lsb``
+      -
+    * - __s32
+      - ``delta_pic_order_cnt_bottom``
+      -
+    * - __s32
+      - ``delta_pic_order_cnt0``
+      -
+    * - __s32
+      - ``delta_pic_order_cnt1``
+      -
+    * - __u32
+      - ``dec_ref_pic_marking_bit_size``
+      - Size in bits of the dec_ref_pic_marking() syntax element.
+    * - __u32
+      - ``pic_order_cnt_bit_size``
+      - Combined size in bits of the picture order count related syntax
+        elements: pic_order_cnt_lsb, delta_pic_order_cnt_bottom,
+        delta_pic_order_cnt0, and delta_pic_order_cnt1.
+    * - __u32
+      - ``slice_group_change_cycle``
+      -
     * - __u32
       - ``flags``
       - See :ref:`Decode Parameters Flags <h264_decode_params_flags>`
@@ -2029,6 +2020,12 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type -
     * - ``V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC``
       - 0x00000001
       - That picture is an IDR picture
+    * - ``V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC``
+      - 0x00000002
+      -
+    * - ``V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD``
+      - 0x00000004
+      -

 .. c:type:: v4l2_h264_dpb_entry

diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c b/drivers/media/v4l2-core/v4l2-ctrls.c
index 0c13a7e0e63c..c6d82534f1fa 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls.c
@@ -1736,6 +1736,7 @@ static int std_validate_compound(const struct v4l2_ctrl *ctrl, u32 idx,
 {
 	struct v4l2_ctrl_mpeg2_slice_params *p_mpeg2_slice_params;
 	struct v4l2_ctrl_vp8_frame_header *p_vp8_frame_header;
+	struct v4l2_ctrl_h264_slice_params *p_h264_slice_params;
 	struct v4l2_ctrl_h264_decode_params *p_h264_dec_params;
 	struct v4l2_ctrl_hevc_sps *p_hevc_sps;
 	struct v4l2_ctrl_hevc_pps *p_hevc_pps;
@@ -1796,7 +1797,12 @@ static int std_validate_compound(const struct v4l2_ctrl *ctrl, u32 idx,
 	case V4L2_CTRL_TYPE_H264_SPS:
 	case V4L2_CTRL_TYPE_H264_PPS:
 	case V4L2_CTRL_TYPE_H264_SCALING_MATRIX:
+		break;
+
 	case V4L2_CTRL_TYPE_H264_SLICE_PARAMS:
+		p_h264_slice_params = p;
+
+		zero_reserved(*p_h264_slice_params);
 		break;

 	case V4L2_CTRL_TYPE_H264_DECODE_PARAMS:
diff --git a/drivers/media/v4l2-core/v4l2-h264.c b/drivers/media/v4l2-core/v4l2-h264.c
index 306a51683606..1a9dcbbba06c 100644
--- a/drivers/media/v4l2-core/v4l2-h264.c
+++ b/drivers/media/v4l2-core/v4l2-h264.c
@@ -18,14 +18,12 @@
  *
  * @b: the builder context to initialize
  * @dec_params: decode parameters control
- * @slice_params: first slice parameters control
  * @sps: SPS control
  * @dpb: DPB to use when creating the reference list
  */
 void
 v4l2_h264_init_reflist_builder(struct v4l2_h264_reflist_builder *b,
 		const struct v4l2_ctrl_h264_decode_params *dec_params,
-		const struct v4l2_ctrl_h264_slice_params *slice_params,
 		const struct v4l2_ctrl_h264_sps *sps,
 		const struct v4l2_h264_dpb_entry dpb[V4L2_H264_NUM_DPB_ENTRIES])
 {
@@ -33,13 +31,13 @@ v4l2_h264_init_reflist_builder(struct v4l2_h264_reflist_builder *b,
 	unsigned int i;

 	max_frame_num = 1 << (sps->log2_max_frame_num_minus4 + 4);
-	cur_frame_num = slice_params->frame_num;
+	cur_frame_num = dec_params->frame_num;

 	memset(b, 0, sizeof(*b));
-	if (!(slice_params->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC))
+	if (!(dec_params->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC))
 		b->cur_pic_order_count = min(dec_params->bottom_field_order_cnt,
 					     dec_params->top_field_order_cnt);
-	else if (slice_params->flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
+	else if (dec_params->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD)
 		b->cur_pic_order_count = dec_params->bottom_field_order_cnt;
 	else
 		b->cur_pic_order_count = dec_params->top_field_order_cnt;
diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
index 424c648ce9fc..f9839e9c6da5 100644
--- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c
+++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c
@@ -23,7 +23,6 @@ static void set_params(struct hantro_ctx *ctx)
 {
 	const struct hantro_h264_dec_ctrls *ctrls = &ctx->h264_dec.ctrls;
 	const struct v4l2_ctrl_h264_decode_params *dec_param = ctrls->decode;
-	const struct v4l2_ctrl_h264_slice_params *slices = ctrls->slices;
 	const struct v4l2_ctrl_h264_sps *sps = ctrls->sps;
 	const struct v4l2_ctrl_h264_pps *pps = ctrls->pps;
 	struct vb2_v4l2_buffer *src_buf = hantro_get_src_buf(ctx);
@@ -42,11 +41,11 @@ static void set_params(struct hantro_ctx *ctx)

 	if (!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY) &&
 	    (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD ||
-	     slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC))
+	     dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC))
 		reg |= G1_REG_DEC_CTRL0_PIC_INTERLACE_E;
-	if (slices[0].flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
+	if (dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC)
 		reg |= G1_REG_DEC_CTRL0_PIC_FIELDMODE_E;
-	if (!(slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD))
+	if (!(dec_param->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD))
 		reg |= G1_REG_DEC_CTRL0_PIC_TOPFIELD_E;
 	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL0);

@@ -75,7 +74,7 @@ static void set_params(struct hantro_ctx *ctx)

 	/* Decoder control register 4. */
 	reg = G1_REG_DEC_CTRL4_FRAMENUM_LEN(sps->log2_max_frame_num_minus4 + 4) |
-	      G1_REG_DEC_CTRL4_FRAMENUM(slices[0].frame_num) |
+	      G1_REG_DEC_CTRL4_FRAMENUM(dec_param->frame_num) |
 	      G1_REG_DEC_CTRL4_WEIGHT_BIPR_IDC(pps->weighted_bipred_idc);
 	if (pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE)
 		reg |= G1_REG_DEC_CTRL4_CABAC_E;
@@ -88,8 +87,8 @@ static void set_params(struct hantro_ctx *ctx)
 	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL4);

 	/* Decoder control register 5. */
-	reg = G1_REG_DEC_CTRL5_REFPIC_MK_LEN(slices[0].dec_ref_pic_marking_bit_size) |
-	      G1_REG_DEC_CTRL5_IDR_PIC_ID(slices[0].idr_pic_id);
+	reg = G1_REG_DEC_CTRL5_REFPIC_MK_LEN(dec_param->dec_ref_pic_marking_bit_size) |
+	      G1_REG_DEC_CTRL5_IDR_PIC_ID(dec_param->idr_pic_id);
 	if (pps->flags & V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED)
 		reg |= G1_REG_DEC_CTRL5_CONST_INTRA_E;
 	if (pps->flags & V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT)
@@ -103,10 +102,10 @@ static void set_params(struct hantro_ctx *ctx)
 	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL5);

 	/* Decoder control register 6. */
-	reg = G1_REG_DEC_CTRL6_PPS_ID(slices[0].pic_parameter_set_id) |
+	reg = G1_REG_DEC_CTRL6_PPS_ID(pps->pic_parameter_set_id) |
 	      G1_REG_DEC_CTRL6_REFIDX0_ACTIVE(pps->num_ref_idx_l0_default_active_minus1 + 1) |
 	      G1_REG_DEC_CTRL6_REFIDX1_ACTIVE(pps->num_ref_idx_l1_default_active_minus1 + 1) |
-	      G1_REG_DEC_CTRL6_POC_LENGTH(slices[0].pic_order_cnt_bit_size);
+	      G1_REG_DEC_CTRL6_POC_LENGTH(dec_param->pic_order_cnt_bit_size);
 	vdpu_write_relaxed(vpu, reg, G1_REG_DEC_CTRL6);

 	/* Error concealment register. */
@@ -246,7 +245,7 @@ static void set_buffers(struct hantro_ctx *ctx)
 	/* Destination (decoded frame) buffer. */
 	dst_dma = hantro_get_dec_buf_addr(ctx, &dst_buf->vb2_buf);
 	/* Adjust dma addr to start at second line for bottom field */
-	if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
+	if (ctrls->decode->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD)
 		offset = ALIGN(ctx->src_fmt.width, MB_DIM);
 	vdpu_write_relaxed(vpu, dst_dma + offset, G1_REG_ADDR_DST);

@@ -265,7 +264,7 @@ static void set_buffers(struct hantro_ctx *ctx)
 		 * DMV buffer is split in two for field encoded frames,
 		 * adjust offset for bottom field
 		 */
-		if (ctrls->slices[0].flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
+		if (ctrls->decode->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD)
 			offset += 32 * MB_WIDTH(ctx->src_fmt.width) *
 				  MB_HEIGHT(ctx->src_fmt.height);
 		vdpu_write_relaxed(vpu, dst_dma + offset, G1_REG_ADDR_DIR_MV);
diff --git a/drivers/staging/media/hantro/hantro_h264.c b/drivers/staging/media/hantro/hantro_h264.c
index 194d05848077..0cbe514dc79a 100644
--- a/drivers/staging/media/hantro/hantro_h264.c
+++ b/drivers/staging/media/hantro/hantro_h264.c
@@ -372,8 +372,7 @@ int hantro_h264_dec_prepare_run(struct hantro_ctx *ctx)

 	/* Build the P/B{0,1} ref lists. */
 	v4l2_h264_init_reflist_builder(&reflist_builder, ctrls->decode,
-				       &ctrls->slices[0], ctrls->sps,
-				       ctx->h264_dec.dpb);
+				       ctrls->sps, ctx->h264_dec.dpb);
 	v4l2_h264_build_p_ref_list(&reflist_builder, h264_ctx->reflists.p);
 	v4l2_h264_build_b_ref_lists(&reflist_builder, h264_ctx->reflists.b0,
 				    h264_ctx->reflists.b1);
diff --git a/drivers/staging/media/rkvdec/rkvdec-h264.c b/drivers/staging/media/rkvdec/rkvdec-h264.c
index 57539c630422..57c084910b3b 100644
--- a/drivers/staging/media/rkvdec/rkvdec-h264.c
+++ b/drivers/staging/media/rkvdec/rkvdec-h264.c
@@ -730,7 +730,6 @@ static void assemble_hw_rps(struct rkvdec_ctx *ctx,
 			    struct rkvdec_h264_run *run)
 {
 	const struct v4l2_ctrl_h264_decode_params *dec_params = run->decode_params;
-	const struct v4l2_ctrl_h264_slice_params *sl_params = &run->slices_params[0];
 	const struct v4l2_h264_dpb_entry *dpb = dec_params->dpb;
 	struct rkvdec_h264_ctx *h264_ctx = ctx->priv;
 	const struct v4l2_ctrl_h264_sps *sps = run->sps;
@@ -754,7 +753,7 @@ static void assemble_hw_rps(struct rkvdec_ctx *ctx,
 			continue;

 		if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM ||
-		    dpb[i].frame_num < sl_params->frame_num) {
+		    dpb[i].frame_num < dec_params->frame_num) {
 			p[i] = dpb[i].frame_num;
 			continue;
 		}
@@ -1093,8 +1092,7 @@ static int rkvdec_h264_run(struct rkvdec_ctx *ctx)

 	/* Build the P/B{0,1} ref lists. */
 	v4l2_h264_init_reflist_builder(&reflist_builder, run.decode_params,
-				       &run.slices_params[0], run.sps,
-				       run.decode_params->dpb);
+				       run.sps, run.decode_params->dpb);
 	h264_ctx->reflists.num_valid = reflist_builder.num_valid;
 	v4l2_h264_build_p_ref_list(&reflist_builder, h264_ctx->reflists.p);
 	v4l2_h264_build_b_ref_lists(&reflist_builder, h264_ctx->reflists.b0,
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
index 614b1b496e40..2a00b2175ca1 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
@@ -95,7 +95,6 @@ static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
 {
 	struct cedrus_h264_sram_ref_pic pic_list[CEDRUS_H264_FRAME_NUM];
 	const struct v4l2_ctrl_h264_decode_params *decode = run->h264.decode_params;
-	const struct v4l2_ctrl_h264_slice_params *slice = run->h264.slice_params;
 	const struct v4l2_ctrl_h264_sps *sps = run->h264.sps;
 	struct vb2_queue *cap_q;
 	struct cedrus_buffer *output_buf;
@@ -144,7 +143,7 @@ static void cedrus_write_frame_list(struct cedrus_ctx *ctx,
 	output_buf = vb2_to_cedrus_buffer(&run->dst->vb2_buf);
 	output_buf->codec.h264.position = position;

-	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
+	if (decode->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC)
 		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_FIELD;
 	else if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD)
 		output_buf->codec.h264.pic_type = CEDRUS_H264_PIC_TYPE_MBAFF;
@@ -412,7 +411,7 @@ static void cedrus_set_params(struct cedrus_ctx *ctx,
 		reg |= VE_H264_SPS_DIRECT_8X8_INFERENCE;
 	cedrus_write(dev, VE_H264_SPS, reg);

-	mbaff_pic = !(slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC) &&
+	mbaff_pic = !(decode->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC) &&
 		    (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD);
 	pic_width_in_mbs = sps->pic_width_in_mbs_minus1 + 1;

@@ -426,9 +425,9 @@ static void cedrus_set_params(struct cedrus_ctx *ctx,
 	reg |= slice->cabac_init_idc & 0x3;
 	if (ctx->fh.m2m_ctx->new_frame)
 		reg |= VE_H264_SHS_FIRST_SLICE_IN_PIC;
-	if (slice->flags & V4L2_H264_SLICE_FLAG_FIELD_PIC)
+	if (decode->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC)
 		reg |= VE_H264_SHS_FIELD_PIC;
-	if (slice->flags & V4L2_H264_SLICE_FLAG_BOTTOM_FIELD)
+	if (decode->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD)
 		reg |= VE_H264_SHS_BOTTOM_FIELD;
 	if (slice->flags & V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED)
 		reg |= VE_H264_SHS_DIRECT_SPATIAL_MV_PRED;
diff --git a/include/media/h264-ctrls.h b/include/media/h264-ctrls.h
index f90fe96f0a59..521ffd8f7b34 100644
--- a/include/media/h264-ctrls.h
+++ b/include/media/h264-ctrls.h
@@ -139,10 +139,8 @@ struct v4l2_ctrl_h264_pred_weight {
 #define V4L2_H264_SLICE_TYPE_SP				3
 #define V4L2_H264_SLICE_TYPE_SI				4

-#define V4L2_H264_SLICE_FLAG_FIELD_PIC			0x01
-#define V4L2_H264_SLICE_FLAG_BOTTOM_FIELD		0x02
-#define V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED	0x04
-#define V4L2_H264_SLICE_FLAG_SP_FOR_SWITCH		0x08
+#define V4L2_H264_SLICE_FLAG_DIRECT_SPATIAL_MV_PRED	0x01
+#define V4L2_H264_SLICE_FLAG_SP_FOR_SWITCH		0x02

 #define V4L2_H264_REFERENCE_FLAG_TOP_FIELD		0x01
 #define V4L2_H264_REFERENCE_FLAG_BOTTOM_FIELD		0x02
@@ -167,21 +165,8 @@ struct v4l2_ctrl_h264_slice_params {

 	__u32 first_mb_in_slice;
 	__u8 slice_type;
-	__u8 pic_parameter_set_id;
 	__u8 colour_plane_id;
 	__u8 redundant_pic_cnt;
-	__u16 frame_num;
-	__u16 idr_pic_id;
-	__u16 pic_order_cnt_lsb;
-	__s32 delta_pic_order_cnt_bottom;
-	__s32 delta_pic_order_cnt0;
-	__s32 delta_pic_order_cnt1;
-
-	/* Size in bits of dec_ref_pic_marking() syntax element. */
-	__u32 dec_ref_pic_marking_bit_size;
-	/* Size in bits of pic order count syntax. */
-	__u32 pic_order_cnt_bit_size;
-
 	__u8 cabac_init_idc;
 	__s8 slice_qp_delta;
 	__s8 slice_qs_delta;
@@ -190,7 +175,8 @@ struct v4l2_ctrl_h264_slice_params {
 	__s8 slice_beta_offset_div2;
 	__u8 num_ref_idx_l0_active_minus1;
 	__u8 num_ref_idx_l1_active_minus1;
-	__u32 slice_group_change_cycle;
+
+	__u8 reserved;

 	struct v4l2_h264_reference ref_pic_list0[V4L2_H264_REF_LIST_LEN];
 	struct v4l2_h264_reference ref_pic_list1[V4L2_H264_REF_LIST_LEN];
@@ -221,7 +207,9 @@ struct v4l2_h264_dpb_entry {
 	__u32 flags; /* V4L2_H264_DPB_ENTRY_FLAG_* */
 };

-#define V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC	0x01
+#define V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC		0x01
+#define V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC		0x02
+#define V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD	0x04

 struct v4l2_ctrl_h264_decode_params {
 	struct v4l2_h264_dpb_entry dpb[V4L2_H264_NUM_DPB_ENTRIES];
@@ -229,6 +217,21 @@ struct v4l2_ctrl_h264_decode_params {
 	__u16 nal_ref_idc;
 	__s32 top_field_order_cnt;
 	__s32 bottom_field_order_cnt;
+
+	__u16 frame_num;
+	__u16 idr_pic_id;
+	__u16 reserved;
+
+	__u16 pic_order_cnt_lsb;
+	__s32 delta_pic_order_cnt_bottom;
+	__s32 delta_pic_order_cnt0;
+	__s32 delta_pic_order_cnt1;
+	/* Size in bits of dec_ref_pic_marking() syntax element. */
+	__u32 dec_ref_pic_marking_bit_size;
+	/* Size in bits of pic order count syntax. */
+	__u32 pic_order_cnt_bit_size;
+	__u32 slice_group_change_cycle;
+
 	__u32 flags; /* V4L2_H264_DECODE_PARAM_FLAG_* */
 };

diff --git a/include/media/v4l2-h264.h b/include/media/v4l2-h264.h
index 1a5f26fc2a9a..f08ba181263d 100644
--- a/include/media/v4l2-h264.h
+++ b/include/media/v4l2-h264.h
@@ -44,7 +44,6 @@ struct v4l2_h264_reflist_builder {
 void
 v4l2_h264_init_reflist_builder(struct v4l2_h264_reflist_builder *b,
 		const struct v4l2_ctrl_h264_decode_params *dec_params,
-		const struct v4l2_ctrl_h264_slice_params *slice_params,
 		const struct v4l2_ctrl_h264_sps *sps,
 		const struct v4l2_h264_dpb_entry dpb[V4L2_H264_NUM_DPB_ENTRIES]);
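As a reading aid for the `v4l2_h264_init_reflist_builder()` change, the picture-order-count selection can be reproduced in a self-contained form. The flag values below are the ones the patch defines in `include/media/h264-ctrls.h`. The reduced `struct example_decode_params` and the function name `example_cur_pic_order_count()` are stand-ins invented for this sketch, since the real control structure carries many more fields.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as introduced by the patch for the decode params control. */
#define V4L2_H264_DECODE_PARAM_FLAG_IDR_PIC		0x01
#define V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC		0x02
#define V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD	0x04

/* Reduced stand-in for struct v4l2_ctrl_h264_decode_params. */
struct example_decode_params {
	int32_t top_field_order_cnt;
	int32_t bottom_field_order_cnt;
	uint32_t flags;
};

/*
 * Same decision as the patched builder: a frame uses the smaller of the
 * two field order counts, a field picture uses the count of its parity.
 */
static int32_t
example_cur_pic_order_count(const struct example_decode_params *p)
{
	assert(p);
	if (!(p->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC))
		return p->bottom_field_order_cnt < p->top_field_order_cnt ?
		       p->bottom_field_order_cnt : p->top_field_order_cnt;
	if (p->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD)
		return p->bottom_field_order_cnt;
	return p->top_field_order_cnt;
}
```

Because both flags now live in the per-frame control, this decision no longer needs any slice parameters, which is what lets the `slice_params` argument be dropped from the builder.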