The 32x32 forward transform can potentially reach a peak coefficient
value close to 32700, while the rounding factor can go up to 610.
Their sum can exceed the 16-bit signed maximum of 32767, causing an
overflow in the SSSE3 implementation of the 32x32 quantization process.
This commit resolves this issue by replacing the addition operations
with saturated addition operations in 32x32 block quantization.
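A scalar sketch of the difference, using the figures above: adding 610 to 32700 in plain 16-bit arithmetic wraps to a large negative value, while a saturating add clamps at 32767. The helper names are hypothetical. In SSE2/SSSE3 code the per-lane equivalent of the saturating add is the `paddsw` instruction (`_mm_adds_epi16` intrinsic), but the exact instructions used by this commit are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical helper: the result a plain 16-bit add (e.g. paddw) produces
 * when the true sum leaves the int16_t range. Computed in 32 bits so it does
 * not rely on implementation-defined narrowing conversions. */
static int32_t wrap_add_s16(int16_t a, int16_t b) {
  int32_t sum = (int32_t)a + (int32_t)b;
  if (sum > INT16_MAX)
    sum -= 65536;
  else if (sum < INT16_MIN)
    sum += 65536;
  return sum;
}

/* Hypothetical helper: saturating 16-bit add, the scalar analogue of paddsw. */
static int16_t sat_add_s16(int16_t a, int16_t b) {
  int32_t sum = (int32_t)a + (int32_t)b;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}
```

With these helpers, `wrap_add_s16(32700, 610)` yields -32226 (a sign flip that corrupts the quantized coefficient), whereas `sat_add_s16(32700, 610)` yields 32767.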
Change-Id: Id6b98996458e16c5b6241338ca113c332bef6e70
Moves counting of mv branches to where we have a new mv, instead of after
the whole frame is summed.
Change-Id: I945d9f6d9199ba2443fe816c92d5849340d17bbd
This commit fixes a potential overflow in the SSE2
implementation of the 32x32 forward DCT. It resolves the corrupted
coded frames at the borders of scenes.
Change-Id: If87eef2d46209269f74ef27e7295b6707fbf56f9
- The intermediate height was incorrect, e.g. when the block size is 4 and
y_step_q4 is 6. In this case the intermediate height was
(4*6) >> 4 = 1, but vertical interpolation needs two source pixels
plus 7 extra pixels for the filter taps.
- Also, if the current output block is 16x16 and we are using 4x upscaling,
we need only 12 rows after horizontal filtering instead of 16.
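One way to size the intermediate buffer conservatively, shown as a sketch: round the source position of the last output row up to a whole row, then add the filter length. The helper name and exact expression are assumptions, not necessarily what this patch uses. With 8 taps it gives the 12 rows quoted above for a 16x16 block with y_step_q4 = 4, and 10 rows for the 4-row, y_step_q4 = 6 case. That is one row more than the 2 + 7 = 9 rows actually read, because rounding up is conservative.

```c
/* Hypothetical helper: rows of horizontally filtered data needed before the
 * vertical pass of a two-pass sub-pixel convolution.
 *   h         - output block height in rows
 *   y_step_q4 - vertical step between output rows, in 1/16-pel units
 *               (16 = no scaling, 4 = 4x upscaling)
 *   taps      - filter length (8 here) */
static int intermediate_height(int h, int y_step_q4, int taps) {
  /* Source position of the last output row, rounded up to a whole row. */
  const int last_row = ((h - 1) * y_step_q4 + 15) >> 4;
  /* Rows 0..last_row inclusive, plus taps - 1 rows of filter support. */
  return last_row + 1 + (taps - 1);
}
```

Without scaling (y_step_q4 = 16) a 16-row block needs 16 + 7 = 23 rows, which this expression reproduces exactly.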
Patch Set 2: Intermediate_height updated after CL 66723
"Fix bug in convolution functions (filter selection)"
Change-Id: I5a1a1bc2ac9d5edb3a6e0818de618bf318fdd589
The 32x32 quantization process can produce intermediate values
that exceed the 16-bit range, causing an enc/dec mismatch. This commit
fixes this overflow in the SSSE3 implementation, as well as in the
prototype, of 32x32 quantization.
This fixes issue 607 from webm@googlecode.
Change-Id: I85635e6ca236b90c3dcfc40d449215c7b9caa806
This patch is a reformatted version of optimizations done by
engineers at Intel (Erik/Tamar) who have been providing
performance feedback for VP9. For the test clips used (720p, 1080p),
a performance improvement of up to 1.2% was seen.
Change-Id: Ic1a7149098740079d5453b564da6fbfdd0b2f3d2
Make the current head work properly while a fix for an issue in the
SSSE3 implementation of 32x32 quantization is being worked on.
Change-Id: Ic029da3fd7f1f5e58bc641341cbd226ec49a16bc
The previous change c4048dbd limited the mv search range assuming a maximum
block size of 64x64; this commit changes the search range to use the actual
block size instead.
Change-Id: Ibe07ab02b62bf64bd9f8675d2b997af20a2c7e11
Making code more compact, adding consts, removing redundant arguments,
adding do/while(0) for macros.
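A minimal illustration of why multi-statement macros are wrapped in do/while(0): the wrapper makes the macro expand to a single statement that still takes a trailing semicolon, so it can sit in an unbraced if/else branch. With a bare `{ ... }` body, the `;` after the macro call would end the `if` and leave the `else` without a matching `if`, which is a syntax error. `SWAP_INT` and `order_pair` are hypothetical examples, not macros from this change.

```c
/* Hypothetical macro: swaps two int lvalues. The do { ... } while (0)
 * wrapper turns the three statements into one statement. */
#define SWAP_INT(a, b) \
  do {                 \
    int tmp_ = (a);    \
    (a) = (b);         \
    (b) = tmp_;        \
  } while (0)

/* Puts *lo <= *hi; returns 1 if a swap happened, 0 otherwise. */
static int order_pair(int *lo, int *hi) {
  if (*lo > *hi)
    SWAP_INT(*lo, *hi); /* safe in an unbraced branch thanks to do/while(0) */
  else
    return 0;
  return 1;
}
```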
Change-Id: Ic9ec0bc58cee0910a5450b7fb8cfbf35fa9d0d16
(In response to Issue 604:
https://code.google.com/p/webm/issues/detail?id=604)
There were bugs in the convolution code for two cases:
1. Where the filter table was assumed to be aligned to a
256-byte boundary: the offset of the pixel in the
source buffer was computed incorrectly.
2. Where no such alignment assumption was made: an
incorrect address for the filter table base was used.
To fix both problems, I now assume that the filter table is
256-byte aligned and modify the pixel offset calculation to
match.
A later patch should remove the restriction that the filter
table is aligned to a 256-byte boundary.
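A sketch of the 256-byte-alignment trick described above. It assumes, for illustration, a table of 16 sub-pixel phases with 8 int16_t taps each, which is exactly 256 bytes. When the table starts on a 256-byte boundary, clearing the low 8 bits of any kernel pointer gives the table base, and the distance from the base gives the phase. The table contents and helper names are hypothetical. Converting pointers through uintptr_t like this is implementation-defined in C, though it behaves as expected on common platforms.

```c
#include <stdint.h>

#define NUM_PHASES 16 /* assumed sub-pixel phases */
#define NUM_TAPS 8    /* assumed taps per kernel */

/* Hypothetical table: 16 x 8 x sizeof(int16_t) = 256 bytes, 256-byte aligned
 * (C11 _Alignas). Only phase 0 is filled in for this sketch. */
static _Alignas(256) const int16_t kFilters[NUM_PHASES][NUM_TAPS] = {
  { 0, 0, 0, 128, 0, 0, 0, 0 }, /* phase 0: integer position */
};

/* Recover the table base from a pointer to one kernel by clearing the low
 * 8 address bits. Only valid because the table is 256-byte aligned. */
static const int16_t *filter_base(const int16_t *kernel) {
  return (const int16_t *)((uintptr_t)kernel & ~(uintptr_t)0xff);
}

/* Sub-pixel phase (0..15) selected by the kernel pointer. */
static int filter_phase(const int16_t *kernel) {
  return (int)(kernel - filter_base(kernel)) / NUM_TAPS;
}
```

If the table were not aligned, masking the pointer would land somewhere before the real base, which is the kind of wrong base address the second bug describes.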
There was also a bug in the ConvolveTest unit test
(convolve_test.cc).
(Bug & initial fix suggestion submitted by Tero Rintaluoma
and Sami Pietilä).
Change-Id: I71985551e62846e55e40de9e7e3959d4805baa82
vp9_short_idct10_16x16_add handles blocks whose only nonzero coefficients
lie in the top-left 4x4 region; all other coefficients are 0. This allows
many unnecessary calculations to be skipped, saving instructions.
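A sketch of why the shortcut is exact, using a generic separable 2-D transform with N = 16 and only the top-left 4x4 inputs nonzero. The `basis()` matrix is a made-up integer stand-in, not the VP9 IDCT, and the function names are hypothetical. The real function's arithmetic and pass order are not reproduced. Rows 4..15 of the input are zero, so their first-pass outputs are zero and can be skipped, and each remaining row needs only its first 4 inputs. The second pass then reads only 4 intermediate rows. The same reasoning applies to the 8x8 variant with N = 8.

```c
#include <stdint.h>

#define N 16 /* transform size */
#define K 4  /* only the top-left K x K inputs may be nonzero */

/* Hypothetical integer basis standing in for a real 1-D inverse transform. */
static int32_t basis(int i, int j) { return (i * 7 + j * 3) % 11 - 5; }

/* Full separable transform: row pass into tmp, then column pass. */
static void transform_full(int32_t in[N][N], int32_t out[N][N]) {
  int32_t tmp[N][N];
  for (int r = 0; r < N; ++r)
    for (int i = 0; i < N; ++i) {
      int32_t s = 0;
      for (int j = 0; j < N; ++j) s += basis(i, j) * in[r][j];
      tmp[r][i] = s;
    }
  for (int c = 0; c < N; ++c)
    for (int i = 0; i < N; ++i) {
      int32_t s = 0;
      for (int j = 0; j < N; ++j) s += basis(i, j) * tmp[j][c];
      out[i][c] = s;
    }
}

/* Reduced version: only the first K rows are transformed, each reading only
 * its first K inputs; the column pass reads only those K intermediate rows,
 * because the skipped rows would have been all zero. */
static void transform_top_left(int32_t in[N][N], int32_t out[N][N]) {
  int32_t tmp[K][N];
  for (int r = 0; r < K; ++r)
    for (int i = 0; i < N; ++i) {
      int32_t s = 0;
      for (int j = 0; j < K; ++j) s += basis(i, j) * in[r][j];
      tmp[r][i] = s;
    }
  for (int c = 0; c < N; ++c)
    for (int i = 0; i < N; ++i) {
      int32_t s = 0;
      for (int j = 0; j < K; ++j) s += basis(i, j) * tmp[j][c];
      out[i][c] = s;
    }
}
```

The inner loops shrink from N to K terms in both passes, and the first pass runs over K rows instead of N, which is where the instruction savings come from.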
Change-Id: I6e30a3fee1ece5af7f258532416d0bfddd1143f0
It is possible to have invalid scale factors that are never accessed
during decoding. An error is now reported only if the decoder actually
tries to use invalid scale factors.
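A sketch of the deferred-error pattern described here: scale factors that cannot be set up are marked with a sentinel instead of failing immediately, and the error surfaces only when a block actually predicts from that reference. The struct, sentinel, helper names, Q14 fixed-point format, and the validity rule (reference at most 2x larger or 16x smaller than the current frame) are all assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical sentinel marking scale factors that could not be set up. */
#define INVALID_SCALE (-1)

typedef struct {
  int x_scale_q14; /* hypothetical fixed-point ratio ref_w / cur_w */
  int y_scale_q14; /* hypothetical fixed-point ratio ref_h / cur_h */
} ScaleFactors;

/* Sets up scale factors without reporting an error; invalid sizes are only
 * marked. The validity rule below is an assumption for this sketch. */
static void setup_scale(ScaleFactors *sf, int ref_w, int ref_h,
                        int cur_w, int cur_h) {
  const bool valid = 2 * cur_w >= ref_w && 2 * cur_h >= ref_h &&
                     cur_w <= 16 * ref_w && cur_h <= 16 * ref_h;
  if (!valid) {
    sf->x_scale_q14 = sf->y_scale_q14 = INVALID_SCALE;
    return;
  }
  sf->x_scale_q14 = (ref_w << 14) / cur_w;
  sf->y_scale_q14 = (ref_h << 14) / cur_h;
}

/* Called only when a block really predicts from this reference;
 * returns -1 (the error is reported here) if the factors are invalid. */
static int use_scale(const ScaleFactors *sf) {
  if (sf->x_scale_q14 == INVALID_SCALE || sf->y_scale_q14 == INVALID_SCALE)
    return -1;
  return 0;
}
```

A reference slot whose size is unusable therefore costs nothing unless the bitstream actually selects it.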
Change-Id: Ie532d3ea7325ee0c7a6ada08269f804350c80fdf
Adding a set_contexts function and calling it instead of
set_contexts_on_border. Calling txfrm_block_to_raster_xy to get aoff and
loff.
Change-Id: I41897e344afd2cae1f923f4fdbe63daccf6fe80e
Moving foreach_predicted_block_in_plane function to vp9_reconinter.c
because there is only one usage.
Change-Id: I9852feae43fc3cf809b817fc541d043bc5496209
Updating the implementation of vp9_get_pred_context_single_ref_p2 to use
the has_second_ref function, making the code easier to read.
Change-Id: I5ba642712f59861a48aab974e73aa01640d086fe
vp9_short_idct10_8x8_add handles blocks whose only nonzero coefficients
lie in the top-left 4x4 region; all other coefficients are 0. This allows
several unnecessary calculations to be skipped, saving instructions.
Change-Id: I34fda95e29082b789aded97c2df193991c2d9195
Updating the implementation of vp9_get_pred_context_single_ref_p1 to use
the has_second_ref function, making the code easier to read.
Change-Id: Ie8f60403a7195117ceb2c6c43176ca9a9e70b909
Because pixel values beyond the image border are duplicates of the
edge pixels, this change limits the mv search range: any mv beyond
the limits no longer produces new or different prediction values,
since the entire block, including the pixels used for subpel
interpolation, lies outside the image border.
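A sketch of the limit for one axis, in full-pel units. It assumes the 8-tap interpolation reads 3 pixels before and 4 pixels after each position. Once every pixel read lies beyond the border, the prediction consists only of replicated edge pixels, and any larger mv repeats it. The helper names are hypothetical. The lower limit depends on the block width bw, which is why using the actual block size rather than assuming 64x64 gives a tighter range for smaller blocks. Vertical limits follow the same pattern with the block's row and height.

```c
/* Hypothetical range of column mvs that can still change the prediction. */
typedef struct {
  int min;
  int max;
} MvRange;

/* x: block column in the frame, bw: block width, frame_w: frame width. */
static MvRange useful_mv_col_range(int x, int bw, int frame_w) {
  MvRange r;
  /* Rightmost column read is x + mv + (bw - 1) + 4. Once it is <= 0, every
   * read clamps to column 0, so mvs further left repeat this prediction. */
  r.min = -(x + bw + 3);
  /* Leftmost column read is x + mv - 3. Once it is >= frame_w - 1, every
   * read clamps to the last column. */
  r.max = frame_w - 1 - x + 3;
  return r;
}

/* Clamp a candidate column mv into the useful range. */
static int clamp_mv_col(int mv, MvRange r) {
  return mv < r.min ? r.min : mv > r.max ? r.max : mv;
}
```

For example, an 8-pixel-wide block at x = 0 in a 64-pixel-wide frame gets the range [-11, 66], while assuming a 64-pixel block would have widened the lower limit to -67.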
Change-Id: I4c6fdf06e33c1cef1489f5470ce0fb4e5e01fb79
Updating all foreach_transformed_block_visitor functions to work with
the plane block size instead of the general block size. Removing a lot of
duplicated code.
Change-Id: I6a9069e27528c611f5a648e1da0c5a5fd17f1bb4
This change set is an intermediate step. The next one will remove all the
repeated plane_bsize calculations, because plane_bsize will be passed as an
argument to foreach_transformed_block_visitor.
Change-Id: Ifc12e0b330e017c6851a28746b3a5460b9bf7f0b
The intent was to initialize the deltas for the
segment to the computed value, irrespective of mode
and reference frame, when mode_ref_delta_enabled == 0.
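A sketch of the intended behavior, assuming (as in VP9's loop filter) that levels are stored per segment, reference frame, and mode. When mode_ref_delta_enabled == 0, every (reference frame, mode) entry for the segment receives the segment's level; otherwise per-reference and per-mode deltas are applied. The struct, array dimensions, and simple additive delta rule are hypothetical simplifications. The clamp to 0..63 matches VP9's 6-bit loop filter level.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical dimensions for this sketch. */
#define NUM_REFS 4
#define NUM_MODES 2

typedef struct {
  uint8_t lvl[NUM_REFS][NUM_MODES]; /* level per (reference frame, mode) */
} SegmentLevels;

static int clamp_level(int v) { return v < 0 ? 0 : v > 63 ? 63 : v; }

static void init_segment_levels(SegmentLevels *s, int lvl_seg,
                                int mode_ref_delta_enabled,
                                const int ref_deltas[NUM_REFS],
                                const int mode_deltas[NUM_MODES]) {
  if (!mode_ref_delta_enabled) {
    /* Same level for every entry, irrespective of mode and reference frame;
     * sizeof(s->lvl) covers the whole 2-D array. */
    memset(s->lvl, clamp_level(lvl_seg), sizeof(s->lvl));
    return;
  }
  /* Simplified, hypothetical delta rule: level + ref delta + mode delta. */
  for (int ref = 0; ref < NUM_REFS; ++ref)
    for (int mode = 0; mode < NUM_MODES; ++mode)
      s->lvl[ref][mode] =
          (uint8_t)clamp_level(lvl_seg + ref_deltas[ref] + mode_deltas[mode]);
}
```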
(In response to bug posted by Manjit Hota to codec-devel
and webm-discuss lists)
Change-Id: I10435cb63d0f88359bb4c14f22181878a1988e72