3710 Commits

Author SHA1 Message Date
Yunqing Wang
ed36720b66 Do vertical loopfiltering in parallel
This patch followed "Add filter_selectively_vert_row2 to enable
parallel loopfiltering" commit, and added x86 SSE2 optimization
to do 16-pixel filtering in parallel. For other optimizations
(neon and dspr2), current 16-pixel functions were done by calling
8-pixel functions twice, and real 16-pixel functions could be added
later.

Decoder speedup:
tulip clip:     2% speed gain;
old_town_cross: 1.2% speed gain;
bus:            2% speed gain.

Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
2013-11-22 10:04:51 -08:00
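
Note on ed36720b66: the fallback described for neon and dspr2, where a 16-pixel function is built by calling an 8-pixel function twice, can be sketched as below. This is a toy illustration only. The function names, the averaging "filter", and the buffer layout are assumptions made for the example and are not the libvpx loop filter.

```c
#include <stdint.h>

/* Toy stand-in for an 8-pixel vertical-edge filter (hypothetical, not libvpx).
 * s points at the first pixel to the right of the edge in the first row;
 * pitch is the distance in bytes between rows. For each of 8 rows, the two
 * pixels touching the edge are replaced by their rounded average. */
static void toy_vert_edge_8(uint8_t *s, int pitch) {
  for (int r = 0; r < 8; ++r) {
    uint8_t *row = s + r * pitch;
    const uint8_t avg = (uint8_t)((row[-1] + row[0] + 1) >> 1);
    row[-1] = avg;
    row[0] = avg;
  }
}

/* 16-pixel version built from two 8-pixel calls: rows 0..7, then rows 8..15.
 * A dedicated SIMD version would process all 16 rows in one pass instead. */
static void toy_vert_edge_16(uint8_t *s, int pitch) {
  toy_vert_edge_8(s, pitch);
  toy_vert_edge_8(s + 8 * pitch, pitch);
}
```

The wrapper keeps the 16-pixel entry point available on every platform, so a true 16-pixel implementation can replace it later without changing callers.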
Jim Bankoski
a64a192c90 Merge changes Id1698a35,Idcabd0b9
* changes:
  detokenization speedups
  Don't write 0's to token_cache
2013-11-22 08:16:17 -08:00
Deb Mukherjee
5576a4e1cb Merge "Refactoring of rate control - part 1" 2013-11-22 08:06:48 -08:00
Deb Mukherjee
f1781e86b7 Refactoring of rate control - part 1
Moves all rate control variables to a separate structure,
removes some currently unused variables,
moves some rate control functions to vp9_ratectrl.c,
and splits the encode_frame_to_data_rate function.

Change-Id: I4ed54c24764b3b6de2dd676484f01473724ab52b
2013-11-22 07:07:24 -08:00
Dmitry Kovalev
7c8cac3c21 Removing txfrm_block_to_raster_xy() call from extend_for_intra().
Change-Id: I6a48d1f35ed5fe7a2c7499675b339994c9c3bdf2
2013-11-21 19:30:58 -08:00
Jim Bankoski
70ffd5d055 detokenization speedups
Removed unnecessary ifs and branches.

Change-Id: Id1698a35292659388f48926791024d1400f2cea9
2013-11-21 16:55:22 -08:00
Dmitry Kovalev
ad3333e2cd Merge "Removing plane_block_{width, height} functions." 2013-11-21 16:37:27 -08:00
Dmitry Kovalev
6042f781f7 Merge "Using txfrm_block_to_raster_xy() in encoder." 2013-11-21 16:24:22 -08:00
Dmitry Kovalev
485682c30a Adding select_tx_size() function.
Change-Id: I9d18b31661a2ccdcd4e25956882c7fc2d4b7002e
2013-11-21 15:55:40 -08:00
Frank Galligan
fe847e7660 Merge "Revert "Add 16 wide neon horz loopfilter."" 2013-11-21 15:06:17 -08:00
levytamar82
8def766de2 vp9_short_fdct32x32_rd vp9_short_fdct32x32 optimized for AVX2
Change-Id: I6366e84490883b72362f762369d7e5bccb64f02f
2013-11-21 14:19:49 -08:00
Frank Galligan
97d1258375 Revert "Add 16 wide neon horz loopfilter."
The change caused mismatches with some test vectors on neon.

Original CL: https://gerrit.chromium.org/gerrit/#/c/67863/

Change-Id: I913891636d53783e93cb1865ca78ded1821dc4b0
2013-11-21 14:01:33 -08:00
Jim Bankoski
b38e42fe9d Don't write 0's to token_cache
This code only updates the token_cache if the result is nonzero.

Change-Id: Idcabd0b993a926fea9c29dbec134b9c5c4859b40
2013-11-21 12:52:15 -08:00
Dmitry Kovalev
864e7c51b6 Syncing update_coef_probs() implementation with decoder.
Using for loop based on max_tx_size instead of separate checks. Combining
build_coeff_contexts() with update_coef_probs().

Change-Id: Ie335a7db29830677fbc14478a9c190d3c1068665
2013-11-21 12:36:02 -08:00
Abo Talib Mahfoodh
ec2dbdd107 Improve vp9_fdct4x4_sse2 (x1.2)
Modifications were made to reduce the total number of clock cycles.
Speedup: 1.2

Tested with: park_joy_420_720p50.y4m

Change-Id: Ia36b87e62e2f80a5fadaf5628729aedc80f38f3f
2013-11-21 15:04:35 -05:00
Dmitry Kovalev
4896d5c7ef Moving {left, right}_block_mode to vp9_blockd.h.
Both functions have no relation to motion vectors, so moving them from
vp9_findnearmv.h to vp9_blockd.h.

Change-Id: I74f524267886ab0fff4a2da793a10c906ed0f43a
2013-11-21 11:43:53 -08:00
Yunqing Wang
e002bb99a8 Merge "Add filter_selectively_vert_row2 to enable parallel loopfiltering" 2013-11-21 11:25:55 -08:00
hkuang
370bf116a2 Merge "Remove unnecessary eob checking." 2013-11-21 11:24:02 -08:00
Frank Galligan
2dd77580c0 Merge "Add 16 wide neon horz loopfilter." 2013-11-21 10:29:30 -08:00
Yunqing Wang
b5e6d6cccf Add filter_selectively_vert_row2 to enable parallel loopfiltering
Added filter_selectively_vert_row2 to be ready for parallel
loopfiltering in vertical direction. This change did 2-row
filtering at a time. If 2 vertically adjacent 8x8 blocks do the same
type of filtering, we can do 16-pixel filtering in parallel.

Next, we need to provide 16-pixel loopfiltering functions in C
and optimized versions for codec speedup.

Change-Id: Idf97bbdd70566e55bd30e1fd25cb8544e33291be
2013-11-21 09:53:15 -08:00
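
Note on b5e6d6cccf: the pairing idea in this message (two vertically adjacent 8x8 blocks that need the same filtering can share one 16-pixel call) can be sketched with per-row bit masks. The struct, the function name, and the one-bit-per-8x8-column mask layout are assumptions made for illustration; the actual filter_selectively_vert_row2 is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical plan for one pair of rows. Bit i of each mask refers to the
 * i-th 8x8 column. */
typedef struct {
  unsigned both;        /* both rows need the filter: one 16-pixel call */
  unsigned top_only;    /* only the top row needs it: 8-pixel call */
  unsigned bottom_only; /* only the bottom row needs it: 8-pixel call */
} RowPairPlan;

/* Split the two row masks into columns that can be filtered 16 pixels at a
 * time and columns that still need a single 8-pixel call. */
static RowPairPlan plan_row_pair(unsigned mask_top, unsigned mask_bottom) {
  RowPairPlan p;
  p.both = mask_top & mask_bottom;
  p.top_only = mask_top & ~mask_bottom;
  p.bottom_only = mask_bottom & ~mask_top;
  return p;
}
```

For example, with a top mask of 0x0B and a bottom mask of 0x0D, columns 0 and 3 would use the 16-pixel path, column 1 the top-only path, and column 2 the bottom-only path.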
Yunqing Wang
6c4964602a Merge "Correct ssse3 8/16-pixel wide sub-pixel filter calculation" 2013-11-21 09:40:02 -08:00
Frank Galligan
98de15137e Add 16 wide neon horz loopfilter.
Add support to do 16 pixel horizontal filtering in Neon.
Nexus devices saw about 0.5% decode speed increase.

Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
2013-11-21 09:39:36 -08:00
Dmitry Kovalev
c90b6bb101 Removing redundant call of vp9_init_mbmode_probs().
This function is called from vp9_setup_past_independence() which is called
before the modified piece of code. Moving reset of inter_mode_probs into
vp9_init_mbmode_probs() for consistency.

Change-Id: Ib188e8798e1fbe15407fd501406761b746fdda95
2013-11-20 21:56:38 -08:00
Dmitry Kovalev
77a865d970 Merge "Removing old code." 2013-11-20 14:43:03 -08:00
Dmitry Kovalev
a218a96784 Merge "Adding MV_FP_SIZE constant." 2013-11-20 14:39:58 -08:00
Dmitry Kovalev
d54893da1d Merge "Using is_inter_block() and has_second_ref() functions." 2013-11-20 14:39:50 -08:00
Dmitry Kovalev
87ff7f2af3 Removing old code.
Change-Id: I67d1681c7b17661deb792c5e6a9e2014a73ff9b7
2013-11-20 14:05:21 -08:00
Dmitry Kovalev
a84c5f0f64 Using txfrm_block_to_raster_xy() in encoder.
Change-Id: Ibe847000467fe46bf8ce87d8f1ef8f2d5ad1eaf4
2013-11-20 13:58:21 -08:00
Yunqing Wang
256cf7ee7d Correct ssse3 8/16-pixel wide sub-pixel filter calculation
Although no mismatch was indicated for 8/16 wide sub-pixel filters
in issue 661, they had similar problems that could potentially cause
mismatches. This patch fixed calculations in HORIZx8/16
and VERTx8/16.

Change-Id: I169961c9d40a20340995b7d22aafc89ccf30bfca
2013-11-20 12:52:56 -08:00
Dmitry Kovalev
79b5a2b142 Removing plane_block_{width, height} functions.
Change-Id: I29c0dfcf41a1253d5e2a0d2ff740c0c38ebaa5a2
2013-11-20 12:39:29 -08:00
Jim Bankoski
302c33e49f Merge "Clean up removal of vp9_pareto8 table." 2013-11-20 12:30:03 -08:00
Dmitry Kovalev
1a69eed2c4 Using is_inter_block() and has_second_ref() functions.
Change-Id: Iadd771a33c8874f3b774923bca4da3c8fe5429ee
2013-11-20 12:08:10 -08:00
Dmitry Kovalev
4956fcd31b Adding MV_FP_SIZE constant.
Change-Id: I98d750ee92ff51fb714980418ea28be3b1d0f3c6
2013-11-20 12:07:57 -08:00
hkuang
6debc446e0 Remove unnecessary eob checking.
Change-Id: Ia568f70bddc1a2b62141a0197459119ca74c22b5
2013-11-20 11:58:11 -08:00
Jim Bankoski
25aae73a30 Merge "remove the model and copy in pack_mb_tokens" 2013-11-20 11:34:30 -08:00
Jim Bankoski
5bbb0c6295 Clean up removal of vp9_pareto8 table.
Change-Id: I5556e8d1fc150be8a3e93af21900829b59a500dc
2013-11-20 11:17:26 -08:00
Jingning Han
81b9fd4310 Merge "Take out assertion from inverse transforms" 2013-11-20 10:55:27 -08:00
Jim Bankoski
03276bf6e6 remove the model and copy in pack_mb_tokens
Change-Id: I00a5203c8ed76c184d936fccf93d76e7c06773d3
2013-11-20 10:06:04 -08:00
Yunqing Wang
0ef63f596d Fix stack pointer in sub-pixel filters
In commit "3d50da5397d20abc932d81453b26cde758293a40", the stack
pointer was modified while aligning the stack, and it needed to
be popped at the end.

Change-Id: I062971e195f1f2ab9d0ab5fb84dcf215a0fcaa67
2013-11-20 09:42:44 -08:00
Guillaume Martres
b00057c88a Merge "vpxenc: add --aq-mode flag to control adaptive quantization" 2013-11-20 08:13:28 -08:00
Dmitry Kovalev
c511f560bf Cleaning up entropy probability update in encoder.
Change-Id: I94cb9e3d910dff74bf90906dd96e3a4e06ebdbe6
2013-11-19 19:49:56 -08:00
Jim Bankoski
7a8a68e2bd Merge "scan order table lookup same for encoder and decoder" 2013-11-19 16:22:48 -08:00
Yunqing Wang
e8f8e77642 Merge "Fix decoder mismatch with ssse3 enabled" 2013-11-19 16:19:32 -08:00
Jingning Han
75673cfc3d Merge "Use restore_dst_buf in handle_inter_mode" 2013-11-19 16:19:04 -08:00
Dmitry Kovalev
e8346f8cf7 Merge "Cleaning up probability/cost functions." 2013-11-19 16:08:16 -08:00
Yaowu Xu
dd04ff506b Merge "Move vp9_setup_interp_filter() to encoder" 2013-11-19 16:01:19 -08:00
Jingning Han
82c32fe1b5 Use restore_dst_buf in handle_inter_mode
There are many places in handle_inter_mode that need to restore the
dst buffer pointers, due to buffer pointer swap and early rd search
breakout. This commit wraps these operations into an inline function
for clean-up.

Change-Id: I0462e8c41c8bc3cd8db07395489cac03d8e5be54
2013-11-19 15:33:16 -08:00
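
Note on 82c32fe1b5: wrapping the repeated "restore the dst buffer pointers" step into one inline function might look roughly like the sketch below. The types are simplified stand-ins invented for this example (ToyPlane, ToyBlock, TOY_MAX_PLANES); they are assumptions and do not match the real libvpx structures or the real restore_dst_buf signature.

```c
#include <stdint.h>

#define TOY_MAX_PLANES 3 /* assumed plane count for the sketch (Y, U, V) */

/* Simplified, hypothetical stand-ins for the encoder's per-plane state. */
typedef struct {
  uint8_t *dst_buf;
  int dst_stride;
} ToyPlane;

typedef struct {
  ToyPlane plane[TOY_MAX_PLANES];
} ToyBlock;

/* Put the original destination pointers and strides back in one place, so
 * every early-exit path can call this instead of repeating the loop. */
static inline void toy_restore_dst_buf(ToyBlock *b,
                                       uint8_t *const orig_dst[TOY_MAX_PLANES],
                                       const int orig_stride[TOY_MAX_PLANES]) {
  for (int i = 0; i < TOY_MAX_PLANES; ++i) {
    b->plane[i].dst_buf = orig_dst[i];
    b->plane[i].dst_stride = orig_stride[i];
  }
}
```

A caller would save the pointers before swapping in temporary buffers and call the helper on each breakout path.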
Jim Bankoski
d6667dd54f scan order table lookup same for encoder and decoder
Change-Id: I473947b5ca70b7a81151926284bff86f8555492a
2013-11-19 15:31:43 -08:00
Yunqing Wang
3d50da5397 Fix decoder mismatch with ssse3 enabled
This patch fixed issue 661: "Decoder produces mismatched outputs
with ssse3 enabled and disabled." In sub-pixel filters, a pixel
value was multiplied by a filter coefficient, and the results
were added up. The order of adding up these multiplications had to
be arranged carefully to prevent incorrect overflowing.

Change-Id: Id08af4200fea9e1b896fc40157b8651c2c7e80f2
2013-11-19 15:10:04 -08:00
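
Note on 3d50da5397: the reason the order of additions can matter is that saturating 16-bit adds are not associative. The sketch below models a saturating add in plain C and sums the same three terms in two orders. The values are chosen for illustration and are not actual filter products from issue 661; the helper names are hypothetical.

```c
#include <stdint.h>

/* Signed 16-bit add that clamps instead of wrapping, modelling what a
 * saturating SIMD add does in each lane. */
static int16_t sat_add16(int16_t a, int16_t b) {
  const int32_t s = (int32_t)a + (int32_t)b;
  if (s > INT16_MAX) return INT16_MAX;
  if (s < INT16_MIN) return INT16_MIN;
  return (int16_t)s;
}

/* Accumulate the terms left to right using saturating adds. */
static int16_t sum_in_order(const int16_t *terms, int n) {
  int16_t acc = 0;
  for (int i = 0; i < n; ++i) acc = sat_add16(acc, terms[i]);
  return acc;
}
```

With the terms 20000, 20000, -15000 the running sum clamps at 32767 after the second add and ends at 17767, while the order 20000, -15000, 20000 never leaves the 16-bit range and yields the exact total 25000. Ordering the adds so that intermediate sums stay in range avoids the clamped result.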
Dmitry Kovalev
65cee2f01a Merge "Simplifying partition context calculation." 2013-11-19 15:09:01 -08:00