Commit Graph

1889 Commits

Author SHA1 Message Date
Yunqing Wang
256cf7ee7d Correct ssse3 8/16-pixel wide sub-pixel filter calculation
Although no mismatch was indicated for 8/16 wide sub-pixel filters
in issue 661, they had similar problems that could cause mismatch
potentially. This patch fixed calculations in HORIZx8/16
and VERTx8/16.

Change-Id: I169961c9d40a20340995b7d22aafc89ccf30bfca
2013-11-20 12:52:56 -08:00
Dmitry Kovalev
79b5a2b142 Removing plane_block_{width, height} functions.
Change-Id: I29c0dfcf41a1253d5e2a0d2ff740c0c38ebaa5a2
2013-11-20 12:39:29 -08:00
Jim Bankoski
302c33e49f Merge "Clean up removal of vp9_pareto8 table." 2013-11-20 12:30:03 -08:00
Dmitry Kovalev
4956fcd31b Adding MV_FP_SIZE constant.
Change-Id: I98d750ee92ff51fb714980418ea28be3b1d0f3c6
2013-11-20 12:07:57 -08:00
hkuang
6debc446e0 Remove unnecessary eob checking.
Change-Id: Ia568f70bddc1a2b62141a0197459119ca74c22b5
2013-11-20 11:58:11 -08:00
Jim Bankoski
25aae73a30 Merge "remove the model and copy in pack_mb_tokens" 2013-11-20 11:34:30 -08:00
Jim Bankoski
5bbb0c6295 Clean up removal of vp9_pareto8 table.
Change-Id: I5556e8d1fc150be8a3e93af21900829b59a500dc
2013-11-20 11:17:26 -08:00
Jingning Han
81b9fd4310 Merge "Take out assertion from inverse transforms" 2013-11-20 10:55:27 -08:00
Jim Bankoski
03276bf6e6 remove the model and copy in pack_mb_tokens
Change-Id: I00a5203c8ed76c184d936fccf93d76e7c06773d3
2013-11-20 10:06:04 -08:00
Yunqing Wang
0ef63f596d Fix stack pointer in sub-pixel filters
In commit "3d50da5397d20abc932d81453b26cde758293a40", the stack
pointer was modified while aligning the stack, and it needed to
be pop out at the end.

Change-Id: I062971e195f1f2ab9d0ab5fb84dcf215a0fcaa67
2013-11-20 09:42:44 -08:00
Guillaume Martres
b00057c88a Merge "vpxenc: add --aq-mode flag to control adaptive quantization" 2013-11-20 08:13:28 -08:00
Jim Bankoski
7a8a68e2bd Merge "scan order table lookup same for encoder and decoder" 2013-11-19 16:22:48 -08:00
Yunqing Wang
e8f8e77642 Merge "Fix decoder mismatch with ssse3 enabled" 2013-11-19 16:19:32 -08:00
Yaowu Xu
dd04ff506b Merge "Move vp9_setup_interp_filter() to encoder" 2013-11-19 16:01:19 -08:00
Jim Bankoski
d6667dd54f scan order table lookup same for encoder and decoder
Change-Id: I473947b5ca70b7a81151926284bff86f8555492a
2013-11-19 15:31:43 -08:00
Yunqing Wang
3d50da5397 Fix decoder mismatch with ssse3 enabled
This patch fixed issue 661: "Decoder produces mismatched outputs
with ssse3 enabled and disabled." In sub-pixel filters, a pixel
value was multiplied by a filter coefficient, and the results
were added up. The order of adding up these multiplications had to
be arranged carefully to prevent incorrect overflowing.

Change-Id: Id08af4200fea9e1b896fc40157b8651c2c7e80f2
2013-11-19 15:10:04 -08:00
Dmitry Kovalev
65cee2f01a Merge "Simplifying partition context calculation." 2013-11-19 15:09:01 -08:00
Jim Bankoski
60aba6558f Merge "entropy code speedup" 2013-11-19 14:58:44 -08:00
Yaowu Xu
df78fea166 Move vp9_setup_interp_filter() to encoder
As it is used in encoder only.

Change-Id: I5f2a8abbe72bb18cbf6ce36a3dc7e132aeae8ec2
2013-11-19 14:57:58 -08:00
Yaowu Xu
f92cfa1ca6 Merge "Move vp9_sadmxn.h from common to encoder" 2013-11-19 14:41:33 -08:00
Jim Bankoski
8cf352abac entropy code speedup
Change-Id: Ic316d3374ff9a2b43897272260947d56765a0fdd
2013-11-19 14:31:38 -08:00
Jim Bankoski
ff4f1c4b76 scan order / neighbors converted to lookup
Change-Id: I64b189dfeee1cf3e90134a1a93497072f3361e5e
2013-11-19 12:55:44 -08:00
Yaowu Xu
30b03050a2 Move vp9_sadmxn.h from common to encoder
Change-Id: I6f6ba91b1b8b280902b171472314d665aa0baf0b
2013-11-19 12:46:08 -08:00
Dmitry Kovalev
f6ec323906 Simplifying partition context calculation.
Reversing bit order of partition_context_lookup, and modifying accordingly
update_partition_context() and partition_plane_context().

Change-Id: I64a11f1a94962a3bf217de2f50698cb781db71a5
2013-11-19 11:17:30 -08:00
Yunqing Wang
f16fb829e6 Merge "Improve vp9_iht4x4_16_add_sse2 (x1.341)" 2013-11-19 11:11:47 -08:00
Dmitry Kovalev
953b1e9683 Removing raster_block_offset_uint8() function.
There is no need to use that function, it is much clear to pass offset
directly to the buffer.

Change-Id: I9026cb0c5094c46f97df5d7f7daeb952f2843b24
2013-11-18 19:00:49 -08:00
Dmitry Kovalev
9e1e7bee48 Merge "Finally removing txfrm_block_to_raster_block() function." 2013-11-18 18:43:16 -08:00
Dmitry Kovalev
220af9ac2c Merge "Cleaning up vp9_entropy.c file." 2013-11-18 18:04:56 -08:00
Abo Talib Mahfoodh
613e2d2e90 Improve vp9_iht4x4_16_add_sse2 (x1.341)
This rebase is a better implementation of the previous ones.

Modifications are done to reduce the total clock cycle.
Speedup: 1.341
Compiled with -O3
Tested with: park_joy_420_720p50.y4m

Change-Id: I940eaf283f60597ca0d9d2e13d518878d55ff02d
2013-11-18 20:53:13 -05:00
Dmitry Kovalev
d8c06d23da Cleaning up vp9_entropy.c file.
Change-Id: I568f5e2d4ef2f2affe013ba1691ffb546f1fe8c6
2013-11-18 17:18:14 -08:00
Yaowu Xu
a42ab027fd Merge "Move vp9_extend.{h,c} from common to encoder" 2013-11-18 15:43:32 -08:00
Yaowu Xu
1c61e1960d Move vp9_extend.{h,c} from common to encoder
Since they used in encoder only. This commit also re-order includes
for the files that include vp9_extend.h

Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
2013-11-18 12:43:36 -08:00
Yunqing Wang
e3168b0c54 Merge "Do horizontal loopfiltering in parallel" 2013-11-18 10:03:41 -08:00
Jim Bankoski
83eb1975df partition context update speedup
This removes a lot of operations in setting partition context...

Change-Id: I365e6f5607ece85190cb21443988816dfa510ce3
2013-11-17 06:58:08 -08:00
Yunqing Wang
64f728caef Do horizontal loopfiltering in parallel
This patch followed "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit, and added x86 SSE2 optimization to do
16-pixel filtering in parallel. Also, corrected the declaration
of aligned arrays. For 8-pixel-in-parallel case, improved the
calculation of the masks and filters. Updated the threshold loading
since the thresholds were already duplicated. Updated neon C functions
to call neon loopfilters twice.

Using tulip clip, tests showed it gave a ~1.5% decoder speed gain.

Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
2013-11-15 16:18:43 -08:00
Jingning Han
bdc4371174 Take out assertion from inverse transforms
Separate the rounding and right shift operations of forward transform
from those of inverse transform. Take out the assertion check from
inverse transforms. If the transform coefficients were constructed to
cause intermediate steps of inverse transform overflow, the codec will
just let it overflow without breaking the decoding flow.

Change-Id: I73cfc3706c4e840fc543a77cbc4cdb0b05d07730
2013-11-15 15:30:47 -08:00
hkuang
7424492a0b Let the idct vp9_idct32x32_34_add = vp9_idct32x32_1024_add
on arm until we implenment real vp9_idct32x32_34_add_neon.

This issue is due to commit 47665452f0
Merge "Add 32x32 idct function for eob<=34 case".

Change-Id: I56b5f0abc20e7dd1bba521f78a995e85d65ea296
2013-11-15 14:59:16 -08:00
Guillaume Martres
17084657e6 vpxenc: add --aq-mode flag to control adaptive quantization
Change-Id: I57e1ad4bed3487df12893ced77c49093f8755706
2013-11-15 19:42:20 +01:00
Dmitry Kovalev
8d7bd4d126 Merge "Cleaning up vp9_loopfilter.c file." 2013-11-15 10:10:59 -08:00
Jingning Han
a9b9f22bcd Merge "Fix coding format in vp9_idct" 2013-11-15 08:59:14 -08:00
Jim Bankoski
e1b6c42eed partition plane context speed up
Removes silly operations inside loop.

Change-Id: I9eeab1e914e715a887f86cf1089de508e2364165
2013-11-15 08:00:43 -08:00
Jim Bankoski
ffb17e2c09 Merge "loop filter assert cleanout" 2013-11-15 07:48:36 -08:00
Dmitry Kovalev
38e6cb8c7b Merge "Cleaning up vp9_tile_common.{h, c} files." 2013-11-14 20:55:01 -08:00
Jingning Han
7637387cf1 Fix coding format in vp9_idct
Change-Id: If97ae16a4478717933345b6b9d5bc1b417b8dd84
2013-11-14 16:05:22 -08:00
Adrian Grange
38144ed8b2 fix scalling bug by buffer auto-reallocation
Change-Id: Ib748eb287520c794631697204da6ebe19523ce95
2013-11-14 15:53:09 -08:00
Dmitry Kovalev
3f9fc6f6f8 Cleaning up vp9_loopfilter.c file.
Change-Id: Ic6770072f80dfb54d2725ed96370d4f243a9f474
2013-11-14 15:04:14 -08:00
Dmitry Kovalev
49fbbf72fa Finally removing txfrm_block_to_raster_block() function.
We only use txfrm_block_to_raster_xy() now.

Change-Id: I4242cd592da99e761041acf9fef1bac3d55a48e1
2013-11-14 13:45:51 -08:00
Dmitry Kovalev
f91ac9b436 Cleaning up vp9_tile_common.{h, c} files.
Change-Id: I9d18f351abe7614107f34f47eeb38a234a9937c9
2013-11-14 13:40:56 -08:00
Jim Bankoski
ef99b7b884 loop filter assert cleanout
Change-Id: I4e2ad4b7342681e6ac236356ef3a4927a54f105b
2013-11-14 12:25:32 -08:00
Deb Mukherjee
cfcd5c4f61 Simplifies band-getting with a static array
Simplifies the code by implementing band mapping with static arrays.
A lot of the code complexity introduced in a previous patch
disappears.

Change-Id: Ia3fac36e594fb5ad2d55ae141c58bba4c55c2d28
2013-11-13 22:15:16 -08:00
Dmitry Kovalev
26a1ad604f Merge "Removing function pointers from inter prediction." 2013-11-13 13:54:15 -08:00
Dmitry Kovalev
60d1a52995 Merge "Optimizing set_contexts() function." 2013-11-13 10:01:05 -08:00
Yunqing Wang
8ce0967df8 Merge "Use 1D array to store super block filter levels" 2013-11-13 09:40:14 -08:00
Johann
4da2a8b718 Merge "mips dsp-ase r2 vp9 decoder intra module optimizations (rebase)" 2013-11-13 09:00:09 -08:00
Parag Salasakar
1530a6b77f mips dsp-ase r2 vp9 decoder intra module optimizations (rebase)
Change-Id: Ib27fc4f3dbe01fe8adfa04a61aaba21b3480e75c
2013-11-13 11:17:14 +05:30
Parag Salasakar
248cf6f69f mips dsp-ase r2 vp9 decoder loopfilter module optimizations (rebase)
Change-Id: Ia7f640ca395e8deaac5986f19d11ab18d85eec2d
2013-11-13 10:53:16 +05:30
Dmitry Kovalev
3f3d14e1d3 Moving q_index from MACROBLOCKD to MACROBLOCK.
Moving because q_index is used only by encoder.

Change-Id: I0b96175614ed4fd3d76ee56a0ba36258e1e896f6
2013-11-12 18:13:19 -08:00
Dmitry Kovalev
73a5cbeba4 Merge "Using max_tx_size instead of bsize when possible." 2013-11-12 16:54:30 -08:00
Dmitry Kovalev
3a2ea76469 Merge "Moving {sb, mb, b, ab}_index from MACROBLOCKD to MACROBLOCK." 2013-11-12 15:59:28 -08:00
Dmitry Kovalev
58b004ff64 Merge "Adding const to tree pointer inside vp9_extra_bit struct." 2013-11-12 15:48:07 -08:00
Johann
8dd3905163 Merge "Added optimized vp9_idct32x32_34_add_dspr2" 2013-11-12 15:30:00 -08:00
Dmitry Kovalev
20f34ff0db Adding const to tree pointer inside vp9_extra_bit struct.
Change-Id: I60e02fa3de930ff1f969687ab5af93dee40d86ad
2013-11-12 14:21:15 -08:00
Yunqing Wang
ce89309b45 Use 1D array to store super block filter levels
As Jim suggested, 1D array was used to store filter levels instead
of 2D array. This used shift_y in setup_mask directly, and saved
few cycles.

Change-Id: If61ab298784861f1806b1cd396d4e4e2e0f097b9
2013-11-12 12:07:57 -08:00
Deb Mukherjee
a33a84b11a Merge "Removes conditional statements from band getting" 2013-11-12 11:22:21 -08:00
Johann
e72d49a97a Use lowercase 'b' to branch
iOS doesn't recognize B:
bad instruction `B idct32_pass_loop'

Change-Id: I3cf6aede4639f1d9efa97f7962fa287ba6feaaef
2013-11-12 10:41:06 -08:00
Yunqing Wang
17322275dd Merge "Rewrite filter_selectively_horiz for parallel loopfiltering" 2013-11-12 10:20:49 -08:00
Yunqing Wang
7989768766 Merge "Improve loopfilter function" 2013-11-12 10:19:56 -08:00
Deb Mukherjee
5ade423774 Removes conditional statements from band getting
Implements scan order to band map with arrays in both the encoder
and decoder to remove conditional statements.

Encoding seems to be about 1% faster at speed 0, tested on football.
Decoding seems to be about 0.5-1% faster on a set of 25 videos.

Change-Id: Idb233ca0b9e0efd790e30880642e8717e1c5c8dd
2013-11-12 10:13:27 -08:00
Dmitry Kovalev
50f97cf7fb Removing function pointers from inter prediction.
Removing foreach_predicted_block_visitor and calling build_inter_predictors
directly.

Change-Id: I11bb3c872b99b47c2680b01b0dbcc01c558c4a2b
2013-11-11 18:37:00 -08:00
Yunqing Wang
b45438181c Rewrite filter_selectively_horiz for parallel loopfiltering
Added loop filter mask checking, and made the caller function
ready for implementation of parallel loopfiltering in horizontal
direction.

Next, we need to go through the loopfilter functions (both c and
optimized versions), and provide 16-byte wide loopfiltering for
each filter type.

Change-Id: Ifef47e7ef9086ebc2fd6ca7ede8f27c9bbf79e66
2013-11-11 17:06:01 -08:00
Dmitry Kovalev
3551e25099 Moving {sb, mb, b, ab}_index from MACROBLOCKD to MACROBLOCK.
We use {sb, mb, b, ab}_index only inside encoder, so moving them into
appropriate data structure.

Change-Id: Ib5c1036716354d9d321e11a60c1634c1cb8f9716
2013-11-11 15:58:57 -08:00
Jingning Han
d8b4c79270 Decouple macroblockd_plane buffer usage
Make the macroblockd_plane contain dynamic buffer pointers instead
static pointers to the memory space allocated therein. The decoder
uses the buffer allocated in pbi, while encoder will use a dual
buffer approach for rate-distortion optimization search.

Change-Id: Ie6f24be2dcda35df7c15b4014e5ccf236fb3f76c
2013-11-11 15:26:10 -08:00
hkuang
c689a126ed Fix a bug in the assembly code.
Change-Id: Ic416e3f8a11e82ee298e6f709b2119a9ddf1e2f8
2013-11-11 12:49:12 -08:00
Dmitry Kovalev
c53a9c70fb Merge "Localizing NEARESTMV special cases in the code." 2013-11-11 11:12:06 -08:00
Dmitry Kovalev
22a001988b Optimizing set_contexts() function.
Inlining set_contexts_on_border() into set_contexts(). The only difference
is the additional check that "has_eob != 0" in addition to
"xd->mb_to_right_edge < 0" and "xd->mb_to_right_edge < 0". If has_eob == 0
then memset does the right thing and works faster.

Change-Id: I5206f767d729f758b14c667592b7034df4837d0e
2013-11-08 12:44:56 -08:00
Yunqing Wang
e731b2ba2c Merge "Improve vp9_idct4x4_1_add_sse2" 2013-11-08 12:00:36 -08:00
Yunqing Wang
49cf335e7f Improve loopfilter function
This patch continued the work done in "Rewrite loop_filter_info_n
struct"(commit:00dbd369c70270428d56da6d15ea5486fc821c52) to further
improve loopfilter function.

1. Instead of storing pointers to thresholds, store loopfilter
levels within 64x64 SB;
2. Since loopfilter levels are already calculated in setup_mask,
we don't need call build_lfi to look up them again. Just save
loopfilter levels in setup_mask.
3. Reorganized and simplified filter_block_plane().

Tests showed a ~0.8% decoder speedup.

Change-Id: I723c7779738bbc2afcb9afa2c6f78580ee6c3af7
2013-11-08 11:48:31 -08:00
hkuang
a6462990e6 Merge "Add back vp9_short_idct32x32_1_add_neon which is deleted in cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3." 2013-11-07 14:42:29 -08:00
Ivan Maltz
741c14fcf0 Merge "Move SVC per-frame loop from sample app into libvpx proper" 2013-11-06 17:24:05 -08:00
Ivan Maltz
1ed0e1beb5 Move SVC per-frame loop from sample app into libvpx proper
SVC multiple layer per frame encoding is invoked with vpx_svc_init and
vpx_svc_encode. These interfaces are designed to be invoked from ffmpeg.
Additional improvements:
- make dummy frame handling a bit more explicit
- fixed bug with single layer encodes
- track individual frame sizes and psnrs instead of averages
- parameterized quantizer, 16th scalefactors, more logging,
- enabled single layer encodes to generate baseline
- include new mode for 3 layer I frame with 5 total layers

Change-Id: I46cfa600d102e208c6af8acd6132e0cc25cda8d4
2013-11-06 14:49:27 -08:00
Dmitry Kovalev
7b011c5467 Replacing mi_{width,height}_log2 with num_8x8_blocks_{wide,high}_lookup.
Change-Id: I04c55daef89bca2b85cb7db0850f9b052abc5a7c
2013-11-06 13:34:23 -08:00
Yaowu Xu
2f4bade348 Merge "Missing _ means no sse3 for vp9_h_predictor_32x32." 2013-11-06 13:04:28 -08:00
Paul Wilkins
0c39318a8b Missing _ means no sse3 for vp9_h_predictor_32x32.
Error in script means vp9_h_predictor_32x32 sse3 version
is not enabled.

Change-Id: Ia43672740da1ecdfb7fcd420490ef424b04accc4
2013-11-06 13:57:55 +00:00
Dmitry Kovalev
4a96e64dc2 Using max_tx_size instead of bsize when possible.
Change-Id: I246364bc4270ca13aefb4bc3445bcf102b3170dc
2013-11-05 17:36:43 -08:00
hkuang
6b16f63332 Add back vp9_short_idct32x32_1_add_neon which is deleted in
cleanup I63df79a13cf62aa2c9360a7a26933c100f9ebda3.

Change-Id: I034848cf05031618818f7df2e7f9c35102686948
2013-11-05 14:57:32 -08:00
Dmitry Kovalev
815189613b Localizing NEARESTMV special cases in the code.
Removing special case handling from vp9_tree_probs_from_distribution(),
tree_merge_probs(), and vp9_tokens_from_tree_offset() functions. Replacing
inter_mode_offset() function with macro INTER_OFFSET which is used now for
vp9_inter_mode_tree definition.

Change-Id: Iff75a1499d460beb949ece543389c8754deaf178
2013-11-05 11:58:57 -08:00
Dmitry Kovalev
c622e1d18f Unified approach for backward probability update.
Replacing update_mode_probs() and adapt_probs() with tree_merge_probs().

Change-Id: I50b2c968d67c9265f5216c700cbeba25fb014654
2013-11-04 16:12:29 -08:00
Dmitry Kovalev
dde8069e57 Splitting partition_probs array into two arrays.
We only update partition_probs for inter frames but they are constant
for key frames. It is not necessary to have constants inside frame
context and copy them every time. This change reduces FRAME_CONTEXT size
by at least 48 bytes.


Change-Id: If70a53be51043f37fe7d113853217937710932a7
2013-11-04 14:26:16 -08:00
Dmitry Kovalev
dd209fae3a Merge "Removing 'new' probability calculation from convert_distribution()." 2013-11-04 11:14:58 -08:00
James Zern
152181b25c Merge "vp9 ssse3 d207_predictor_32x32: add missing GLOBAL()" 2013-11-02 12:25:47 -07:00
James Zern
2d980b803a vp9 ssse3 d207_predictor_32x32: add missing GLOBAL()
removes a textrel for sh_b23456789abcdefff

Change-Id: I80cb9dfd8e49a0fe884c8ff76472275b3a00cb57
2013-11-01 20:33:22 -07:00
Dmitry Kovalev
df19c6b64c Removing 'new' probability calculation from convert_distribution().
We don't have to calculate 'new' probability in convert_distribution()
because it is enough to calculate only 'new' counters which could be used
to calculate probability if necessary. That's why removing a lot of unused
temporary probability arrays and reducing number of get_binary_prob()
calls.

Change-Id: I4e14eb7203d1ace61bbddefd6b9b6326be83ba63
2013-11-01 15:09:43 -07:00
Yaowu Xu
333345cd26 Merge "Convert filter kernel choice to lookup" 2013-11-01 13:43:09 -07:00
Yaowu Xu
0f76ba5523 Convert filter kernel choice to lookup
Also removed unused declaration related 6 tap filter

Change-Id: Ic17f516141d885157918505f4204081e4c951fad
2013-11-01 13:03:18 -07:00
Dmitry Kovalev
340b2b076e Merge "Cleanup. Adding const to function pointer arguments." 2013-11-01 10:57:03 -07:00
Dmitry Kovalev
0e1756330b Merge "Removing is_intra_mode() function." 2013-10-31 18:06:53 -07:00
Dmitry Kovalev
7c524bbef4 Cleanup. Adding const to function pointer arguments.
Change-Id: I12c67c8c0fa1aa7fb3f7d6cc2ef65be29c4ea292
2013-10-31 14:34:21 -07:00
Yaowu Xu
d515716140 Merge "mb_lpf_horizontal_edge AVX2 optimization" 2013-10-31 10:43:57 -07:00
Yunqing Wang
d03b3cbdd7 Merge "Fix x_offset_q4/y_offset_q4 calculation" 2013-10-31 09:47:54 -07:00
Tamar Levy
54f9205653 mb_lpf_horizontal_edge AVX2 optimization
This CL contains two AVX2 optimized loop filter functions,
mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16.

Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
2013-10-31 10:26:15 -06:00