6997 Commits

Author SHA1 Message Date
Jingning Han
161f636809 Merge "Remove unused rd cost calculation from nonrd_use_partition" 2014-12-10 09:05:18 -08:00
James Yu
01fc6f51e0 VP9 common for ARMv8 by using NEON intrinsics 07
Add vp9_convolve8_neon.c
- vp9_convolve8_horiz_neon
- vp9_convolve8_vert_neon

Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:03:07 -08:00
James Yu
893534a996 VP9 common for ARMv8 by using NEON intrinsics 04
Add vp9_convolve8_avg_neon.c
- vp9_convolve8_avg_horiz_neon
- vp9_convolve8_avg_vert_neon

Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:03:07 -08:00
James Yu
d12757f5c6 VP9 common for ARMv8 by using NEON intrinsics 03
Add vp9_copy_neon.c
- vp9_convolve_copy_neon

Change-Id: I291fc5423d06240876411bbceab03eae5ef585be
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:02:46 -08:00
Scott LaVarnway
617382a2e3 VP9 common for ARMv8 by using NEON intrinsics 02
Add vp9_avg_neon.c
- vp9_convolve_avg_neon

Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 19:00:21 -08:00
Jingning Han
0cac834b5a Use use_prev_frame_mvs flag for ref mv search branch
Replace error_resilient flag with use_prev_frame_mvs in
vp9_pick_inter_mode reference motion vector search selection.
This effectively turns off the simplified ref mv search in the
settings of frame resizing, even if error-resilient mode is off.

Change-Id: I7fed814ee7bc0cb419a03b846e0fc2de46ba7686
2014-12-09 18:18:40 -08:00
Jingning Han
e728678c50 Refactor update_state_rt
Update the frame motion vector only if previous frame motion vector
is needed for next frame reference motion vector.

Change-Id: Ica50f9d7b46ad4f815bba0d9e30f5546df29546f
2014-12-09 15:35:49 -08:00
hkuang
4eee74d6ed Fix clang ioc warning due to NULL src_mi pointer.
The warning only happens in VP9 encoder's first pass due to src_mi
is not set up yet. But it will not fail the encoder as left_mi and
above_mi are not used in the first_pass and they will be set up again
in the second pass.

Change-Id: I12dffcd5fb1002b2b2dabb083c8726650e4b5f08
2014-12-09 14:32:48 -08:00
Johann
5810f1b4cd Merge "VP9 common for ARMv8 by using NEON intrinsics 01" 2014-12-09 13:41:49 -08:00
James Yu
5b098b1825 VP9 common for ARMv8 by using NEON intrinsics 01
Add vp9_loopfilter_neon.c
- vp9_lpf_horizontal_4_neon
- vp9_lpf_vertical_4_neon
- vp9_lpf_horizontal_8_neon
- vp9_lpf_vertical_8_neon

Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 12:26:56 -08:00
Jingning Han
225cdef665 Make RTC coding flow support sub8x8 in key frame coding
This commit enables the use of sub8x8 blocks in RTC key frame
encoding. It requires the block size to be preset and will decide
the coding mode and encode the bit-stream.

Change-Id: I35aaf8ee2d4d6085432410c7963f339f85a2c19b
2014-12-09 11:34:58 -08:00
Jingning Han
4bacaab46d Cosmetic naming change
Rename set_modeinfo_offsets as set_mode_info_offsets, to be more
consistent with naming convention.

Change-Id: I68ca1f36c4a78127d9439a50c1506a2afd07927d
2014-12-09 10:32:04 -08:00
Jingning Han
f051a7beab Take out redundant setting of mode_info from set_block_size
The later encoding process will take the top-left block's
mode_info for pre-determined block size.

Change-Id: I76a90f9ce7f3b2dbc2975b52442114e461c465b5
2014-12-09 10:27:18 -08:00
hkuang
3dfdfd5c86 Merge "Clean up the logic of handling corrupted frame." 2014-12-09 10:23:18 -08:00
Paul Wilkins
e68c8dcfd2 Substantial restructuring of AQ mode 2.
The restructure moves the decision into the rd pick
modes loop and makes a decision based at the 16x16
block level instead of only the 64x64 level.

This gives finer granularity and better visual results
on the clips I have tested. Metrics results are worse
than the old AQ2 especially for PSNR and this mode
now falls between AQ0 and AQ1 in terms of visual
impact and metrics results.

Further tuning of this to follow.

It should be noted that if there are multiple iterations
of the recode loop the segment for a MB could change
in each loop if the previous loop causes a change in the
complexity / variance bin of the block. Also where a block
gets a delta Q this will alter the rd multiplier for this block
in subsequent recode iterations and frames where the
segmentation is applied.

Change-Id: I20256c125daa14734c16f7cc9aefab656ab808f7
2014-12-09 15:10:52 +00:00
Jingning Han
1395ded2a7 Remove unused rd cost calculation from nonrd_use_partition
The per block rd cost calculation is not needed when partition
size is preset.

Change-Id: Ie5575248bbffb584e908aa13097f697ace6ec747
2014-12-08 18:45:19 -08:00
Yunqing Wang
cddbdeabd0 Merge "SSSE3 Optimization for Atom processors using new instruction selection and ordering" 2014-12-08 13:34:54 -08:00
James Zern
c38d0490b3 Merge "Changes to assembler for NASM on mac." 2014-12-08 12:55:06 -08:00
levytamar82
8f9d94ec17 SSSE3 Optimization for Atom processors using new instruction selection and ordering
The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction which has a 3 cycle latency and slows execution when done in blocks of 5 or more on Atom processors.
By replacing the PSHUFB instructions with other more efficient single cycle instructions (PUNPCKLBW + PUNPCHBW + PALIGNR) performance can be improved.
In the original code, the PSHUBF uses every byte and is consecutively copied.
This is done more efficiently by PUNPCKLBW and PUNPCHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result.

For example:
filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7
REG2 = PUNPCHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15
PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8

This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors.
There was no observed performance impact on Core processors (expected).

Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0
2014-12-08 13:11:01 -07:00
hkuang
f925e5ce0f Merge "Improve the performance by caching the left_mi and right_mi in macroblockd." 2014-12-08 10:24:17 -08:00
Paul Wilkins
127f65531b Merge "Use average mb energy from first pass in AQ2 test." 2014-12-08 09:01:39 -08:00
Frank Galligan
0f8e8330eb Merge "Fix potential integer overflow." 2014-12-07 21:37:39 -08:00
James Zern
da464c483f Merge "vp9 asserts: fix compile warning" 2014-12-05 21:09:42 -08:00
James Zern
3db785facc Merge "vp9: fix frame-parallel encoding" 2014-12-05 19:00:48 -08:00
Deb Mukherjee
0d367474d0 Merge "Some internal-stats, vp9-highbitdepth bug fixes" 2014-12-05 17:49:52 -08:00
James Zern
6db81fd629 vp9: fix frame-parallel encoding
the flag in the header wasn't being set based on the encoder
configuration in non-intra only mode

broken since:
fbc2fbf Adding oxcf temp variable.

Change-Id: Ib4cff9901889824bc4e68d7f0f6deb1e41df2f53
2014-12-05 17:44:46 -08:00
Jingning Han
bd6bfb93b0 Merge "Remove redundant rdcost reset" 2014-12-05 17:35:07 -08:00
Jingning Han
296afb9440 Merge "Fix a motion search skip condition in vp9_pick_inter_mode" 2014-12-05 17:35:04 -08:00
Jingning Han
3d8d1e374e Merge "Remove redundant MB_MODE_INFO reset from vp9_pick_mode_inter" 2014-12-05 16:59:50 -08:00
hkuang
382f86f945 Improve the performance by caching the left_mi and right_mi in macroblockd.
This improve the deocde performance by ~2% on Nexus 7 2013.

Change-Id: Ie9c4ba0371a149eb7fddc687a6a291c17298d6c3
2014-12-05 16:25:42 -08:00
James Zern
616b3a810f vp9 asserts: fix compile warning
string literal to int within an assert

Change-Id: I76a173f96b9add5bf27c3f5ad5d72c6f30e51629
2014-12-05 16:20:42 -08:00
Jingning Han
17bedc54f5 Remove redundant rdcost reset
The initial reset of this_rdc in vp9_pick_inter_mode is not needed,
since it will be re-assign when used.

Change-Id: Ic0e12d741cbab292fc214c1eabb48b129af7839b
2014-12-05 16:06:17 -08:00
Jingning Han
eadffb2d6e Fix a motion search skip condition in vp9_pick_inter_mode
Compare the current best mode rate-distortion cost with the skip
threshold to decide if performing motion search.

Change-Id: Ia071824f8dd3b7db485f424692a485a2da6a1a9f
2014-12-05 15:58:36 -08:00
Jingning Han
732d57c2b5 Remove redundant MB_MODE_INFO reset from vp9_pick_mode_inter
Change-Id: I0222f7abc61202f4a83b117bbfb042ada6304562
2014-12-05 15:51:11 -08:00
hkuang
eaa6deee5b Merge "Merge set_prev_mi function into encoder function." 2014-12-05 15:12:50 -08:00
Deb Mukherjee
37448d3e1f Some internal-stats, vp9-highbitdepth bug fixes
Change-Id: I0363d98f6f6558a43276aec48f27dca37c93f5ad
2014-12-05 13:40:50 -08:00
Jingning Han
6ae829088f Merge "Remove redundant vp9_zero in choose_partitioning" 2014-12-05 11:47:58 -08:00
Jingning Han
69a9dc5cd3 Merge "Enable conditional skip path in rd_pick_intra_sby_mode" 2014-12-05 11:25:30 -08:00
Jingning Han
62c7356098 Merge "Use hybrid RD and non-RD coding flow for key frame coding" 2014-12-05 11:25:19 -08:00
Jingning Han
9d88b30854 Remove redundant vp9_zero in choose_partitioning
It makes the overall speed -6 about 2% faster with no compression
performance change.

Change-Id: I680a967b421caa2c5a5cdb821311c4726a2df45a
2014-12-05 10:39:39 -08:00
Jingning Han
74ded4863e Enable conditional skip path in rd_pick_intra_sby_mode
These speed-up features for key frame coding are only turned on
in the settings of hybrid non-RD and RD mode decision. It provides
about 20% speed-up to the hybrid key frame coding at the expense
of certain compression performance loss. For vidyo1, the key frame
coding statistics are changed
9838F, 35.020 dB, 61677 us -> 9920F, 34.834 dB, 47556 us

Overall rtc set compression performance is down by -0.257%.

Change-Id: I0025447fda26bb7855e982955642b5f55d71b51f
2014-12-05 09:36:09 -08:00
Jingning Han
07711e9b27 Use hybrid RD and non-RD coding flow for key frame coding
When block size is below 16x16, the encoder swap from non-RD to
RD mode for key frame coding. This largely brough back the key
frame compression performance. For vidyo1 at 1000 kbps, the key
frame coding statistics are changed

9978F, 34.183 dB, 36807 us -> 9838F, 35.020 dB, 61677 us

As compared to the full RD case
7187F, 34.930 dB, 214470 us

The overall rtc set coding performance (single key frame setting)
is improved by 1.5%.

Change-Id: I78a4ecf025d7b24ec911e85be94e01da05e77878
2014-12-05 09:35:27 -08:00
Yunqing Wang
a3a4a34c60 Merge "vp9_ethread: the tile-based multi-threaded encoder" 2014-12-05 08:23:49 -08:00
Frank Galligan
4c4d7261e4 Fix potential integer overflow.
ioc found a potential integer overflow in the rate control.

This is related to https://code.google.com/p/webm/issues/detail?id=821

Change-Id: Ib6c4acd6e964972f932fce7490592eb134f2b7ea
2014-12-05 08:02:12 -08:00
Paul Wilkins
bb6e47c1c9 Merge "Increase strength of AQ1." 2014-12-05 04:11:43 -08:00
Debargha Mukherjee
15cf55b3ca Merge "Use the RTC optimizations when in high bitdepth mode." 2014-12-04 19:22:27 -08:00
James Zern
b43c27ab6e Merge "vp9_reader: reorder struct members" 2014-12-04 16:08:08 -08:00
Debargha Mukherjee
4bfde1071e Merge "Corrected the renaming of CONFIG_VP9_HIGH ro CONFIG_VP9_HIGHBITDEPTH." 2014-12-04 15:52:35 -08:00
Peter de Rivaz
a306bd8274 Use the RTC optimizations when in high bitdepth mode.
Change 72193 made the encoder behave differently
when configured with and without high bitdepth.
This change means the same algorithm is used for both.

Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5
2014-12-04 15:48:42 -08:00
hkuang
dde819599b Clean up the logic of handling corrupted frame.
No more checking of corrupted reference frame as we skip
decoding any non-intra frame in case of frame corrupted.

Change-Id: I77d41bbb02fc5f61972740e2d411441eb6a17073
2014-12-04 15:07:59 -08:00