Commit Graph

1481 Commits

Author SHA1 Message Date
James Zern
3c8cce353f vp9: make some static tables const
Change-Id: I8bcae51271673da8755c66a51aea005dfe6a3739
2013-07-22 19:19:13 -07:00
Frank Galligan
e88db77892 Merge "Speedup loopfilter neon code." 2013-07-22 17:39:42 -07:00
Dmitry Kovalev
0ad079e583 Cleanup inside vp9_get_pred_context_tx_size.
Using max_txsize_lookup to get max transform size.

Change-Id: If4b39beba3c06a581effd8cab698ea90727dc2c9
2013-07-22 17:18:11 -07:00
James Zern
ab139094ed Merge "VP9_COMMON: drop cur_tile_{row,col}_idx" 2013-07-22 17:12:39 -07:00
Frank Galligan
5af6bf6c43 Speedup loopfilter neon code.
Try to cut down the cycle count by rearranging the instructions
so there are fewer stalls.

Change-Id: Ic1383335ee0f05e656477d9ee9c179ec231285d5
2013-07-22 17:00:01 -07:00
Ronald S. Bultje
e20fcd9585 More optimizations for cost_coeffs().
4x4:    163 ->  123 cycles (33% faster)
8x8:    491 ->  399 cycles (23% faster)
16x16: 1889 -> 1763 cycles (7% faster)
32x32: 8311 -> 8180 cycles (1.6% faster)

Overall encoding time of first 50 frames of bus (speed 0) @ 1500kbps
goes from 1min4.33 to 1min3.00, i.e. 2.11% faster.

Change-Id: Ib52d1dbb5649b14de769d3e7a74af67440b5284f
2013-07-22 16:09:09 -07:00
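
The percentages quoted in the cost_coeffs() commit above appear consistent with the ratio before / after - 1. A minimal sketch of that arithmetic, assuming this is how the figures were derived (the helper name speedup_percent is hypothetical and not part of the codebase):

```c
/* Speedup in percent when a cost drops from `before` to `after`,
 * computed as (before / after - 1) * 100.
 * Hypothetical helper for illustration only. */
double speedup_percent(double before, double after) {
  return (before / after - 1.0) * 100.0;
}
```

For example, speedup_percent(163, 123) is roughly 32.5, which rounds to the quoted 33%, and speedup_percent(64.33, 63.00) is roughly 2.11, matching the overall encoding-time figure.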
James Zern
38a4412e1b vp9: apply loopfilter inline if possible
excludes tiled content currently

Change-Id: I44155253e8d6771e5e039d663be5f21cc9d0355d
2013-07-22 15:52:10 -07:00
Dmitry Kovalev
b2fc6fa969 Adding update_tx_counts function.
Moving common encoder/decoder code to update_tx_counts. Also renaming
vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
(twice before).

Change-Id: Ia50247f3893de88ef8e9041b0d44be44a40aaa4d
2013-07-22 14:57:43 -07:00
James Zern
746154d905 Merge "filter_block_plane: remove MACROBLOCKD param" 2013-07-22 13:43:34 -07:00
James Zern
0a58f462b8 VP9_COMMON: remove unused temp_scale_frame
Change-Id: I696a0dca1d02d365e283029d1d077710bd5680e0
2013-07-22 13:42:11 -07:00
Dmitry Kovalev
0c5a383b2a Merge "Using update_ct and update_ct2 functions for probability update." 2013-07-22 13:34:30 -07:00
James Zern
ccf6710dc2 VP9_COMMON: drop cur_tile_{row,col}_idx
these were only being written in one location and never read.

Change-Id: If59f3c09aa1485cf89bac0099a8a79e99688b5d1
2013-07-22 13:23:33 -07:00
James Zern
76db4d599a Merge "VP[89]_COMMON: remove golden/altref frame counts" 2013-07-22 12:55:07 -07:00
Jingning Han
a5a9f5f7f3 Merge "Optimize operation flow in sub8x8 rd loop" 2013-07-22 12:08:15 -07:00
Dmitry Kovalev
8c5ca9ff14 Using update_ct and update_ct2 functions for probability update.
Update logic for both mode and mvref was the same, so using MODE_COUNT_SAT,
MODE_MAX_UPDATE_FACTOR, update_ct, update_ct2 for both cases. Removing
function update_tx_ct because it was identical to update_mode_ct2.

Change-Id: Iff566be27dbd6cde4c2ec04e8d988f207046b8f0
2013-07-22 12:06:43 -07:00
Jingning Han
409e77f2d4 Optimize operation flow in sub8x8 rd loop
Stack the rate-distortion statistics in the sub8x8 rd loop. This allows
the encoder to skip the forward transform, quantization, and coeff cost
estimation, in the sub8x8 rd optimization search, if the motion
vector(s) are of integer pixel value, and have been tested in the
previous prediction filter type rd loops of the same block.

This gives about 2% speed-up for bus_cif at 2000 kbps, for speed 0.
Its efficacy depends on how frequently the motion search will select an
integer motion vector.

Change-Id: Iee15d4283ad4adea05522c1d40b198b127e6dd97
2013-07-22 10:40:33 -07:00
Dmitry Kovalev
ee1fe2f750 Merge "Removing pre probabilities from FRAME_CONTEXT." 2013-07-20 22:50:32 -07:00
Dmitry Kovalev
8962d975b2 Merge "Moving all loop filter related variables into new struct." 2013-07-20 22:45:24 -07:00
Dmitry Kovalev
39342db138 Merge "Consistent names for inter mode probabilities and encodings." 2013-07-20 22:40:51 -07:00
Dmitry Kovalev
f66821afbb Merge "Removing frame_type field from MACROBLOCKD struct." 2013-07-20 22:40:06 -07:00
Dmitry Kovalev
7e703de729 Removing pre probabilities from FRAME_CONTEXT.
Using cm->frame_contexts[cm->frame_context_idx] as source of previous
probabilities.

Change-Id: Ie03778acf0e7bebdc3a1f6a51854d4a0712f24a1
2013-07-19 17:33:10 -07:00
Dmitry Kovalev
ee1771ebaa Moving all loop filter related variables into new struct.
Adding loopfilter struct with fields from MACROBLOCKD and VP9Common.
Eventually it will be moved to vp9_loopfilter.h for better code structure.

Change-Id: Iaf5fb71c33719cdfa1b991f671caf071be9ea035
2013-07-19 16:19:10 -07:00
Dmitry Kovalev
f00a237a43 Merge "Fixing problem introduced in one of my previous commits." 2013-07-19 16:14:21 -07:00
Dmitry Kovalev
c3a56ee583 Merge "Moving Scale2Ratio function from vp9_onyx.h to vp9_onyx_if.c." 2013-07-19 15:27:24 -07:00
Dmitry Kovalev
2fc927c66a Fixing problem introduced in one of my previous commits.
Changing fc->tx_probs back to fc->pre_tx_probs. This change actually
affects the bitstream but current test vectors work. Chrome branch is not
affected at all. Broken since:

cc662dd Adding struct tx_probs and struct tx_counts to cleanup the code.

Change-Id: I36dd4b3678e902e10aba8dd49b0012eb558c209d
2013-07-19 15:18:43 -07:00
James Zern
de012cec4f filter_block_plane: remove MACROBLOCKD param
replace with direct use of the plane and MODE_INFO

Change-Id: Icce57bc398a6e3607aedde0573d977e192040696
2013-07-19 14:19:55 -07:00
Dmitry Kovalev
e71a4a77bb Merge "Renaming TXFM_MODE to TX_MODE (like TX_SIZE, TX_TYPE)." 2013-07-19 12:14:32 -07:00
Dmitry Kovalev
97e96bc4e9 Removing frame_type field from MACROBLOCKD struct.
Change-Id: Ia4e83913251c1cdc7aa2abd64bf01ecb1a962119
2013-07-19 11:55:36 -07:00
Dmitry Kovalev
c0eb57406c Renaming TXFM_MODE to TX_MODE (like TX_SIZE, TX_TYPE).
Moving TX_MODE enum to vp9_enums.h. Renaming txfm_mode variables to
tx_mode.

Change-Id: I459d1af6dd928ce7fccdf8ce30b6f1ca057bef92
2013-07-19 11:37:13 -07:00
Dmitry Kovalev
afe43d4089 Removing redundant VP9_COMMON* from function signatures.
Functions: vp9_get_pred_context_switchable_interp,
           vp9_get_pred_context_intra_inter,
           vp9_get_pred_context_single_ref_p1,
           vp9_get_pred_context_single_ref_p2.

Change-Id: I3d6fb8aee23c9062270768e1e6da416dd9bb8f96
2013-07-19 11:20:49 -07:00
Dmitry Kovalev
bc7acb134b Consistent names for inter mode probabilities and encodings.
Renaming vp9_sb_mv_ref_tree to vp9_inter_mode_tree, and
vp9_sb_mv_ref_encoding_array to vp9_inter_mode_encodings.

Change-Id: I0e91fbf81350d3ec5a2599064c74089b5d06133a
2013-07-19 10:40:04 -07:00
Paul Wilkins
b2b5836a16 Merge "Block index variables in MACROBLOCKD reduced to chars." 2013-07-19 10:14:52 -07:00
hkuang
97dbee00dd Merge "Add neon optimize vp9_short_idct8x8_add." 2013-07-19 08:28:39 -07:00
Paul Wilkins
710d10c521 Block index variables in MACROBLOCKD reduced to chars.
Change-Id: I9a4df095732d561807de01a41dcb1a1960726a3c
2013-07-19 11:32:51 +01:00
Dmitry Kovalev
13253d6121 Merge "Removing kf_{y, uv}_mode_prob arrays from VP9Common." 2013-07-19 01:00:46 -07:00
Dmitry Kovalev
b829a9d63c Merge "Removing unused int_mv32 union." 2013-07-18 17:56:11 -07:00
Yaowu Xu
67fb0679ee Merge "Merge scale_factors and scale_factors_uv." 2013-07-18 17:50:34 -07:00
hkuang
d757de744c Add neon optimize vp9_short_idct8x8_add.
Change-Id: Ic32acf3e2939c6d12d9c2bf192a5f5da59705fda
2013-07-18 16:40:41 -07:00
Dmitry Kovalev
0b562b2d3d Using VP9_REF_NO_SCALE instead of (1 << VP9_REF_SCALE_SHIFT).
Change-Id: Ide58a74d31ff948319445a6337d2c05e98720e34
2013-07-18 15:12:46 -07:00
Ronald S. Bultje
5ebe503f04 Merge scale_factors and scale_factors_uv.
This prevents a duplicate memcpy of a 128-byte struct every time
set_scale_factors() is called (which is a lot), thus leading to a
decrease from 3.7 MB to 1.85 MB of struct copying per 64x64 block
RD/partition loop.

Overall, this decreases encoding time of the first 50 frames of bus
@ 1500kbps (speed 0) from 1min5.9 to 1min4.9, i.e. about a 1.5%
overall speedup. We can likely get more gains by removing the copy
of the other struct (and replacing it with an indexing) as well.

Change-Id: I3dceb7e79f71e6fe911b11cc994cf89a869dde7a
2013-07-18 14:10:56 -07:00
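
The saving described in the scale_factors commit above comes from not copying a 128-byte struct on every call. A rough sketch of the difference between selecting by copy and selecting by index, as the last sentence of that message suggests for the remaining copy. All type and function names here are hypothetical stand-ins, not the actual libvpx definitions:

```c
#include <string.h>

/* Hypothetical 128-byte stand-in for the scale-factor struct. */
struct sf_sketch {
  unsigned char payload[128];
};

/* Hypothetical context holding one entry per reference frame. */
struct ctx_sketch {
  struct sf_sketch per_ref[3];
  struct sf_sketch active_copy;       /* filled by copy-based selection */
  const struct sf_sketch *active_ptr; /* set by index-based selection */
};

/* Copy-based selection: moves 128 bytes on every call. */
void select_by_copy(struct ctx_sketch *c, int ref) {
  memcpy(&c->active_copy, &c->per_ref[ref], sizeof(c->active_copy));
}

/* Index-based selection: only stores a pointer. */
void select_by_index(struct ctx_sketch *c, int ref) {
  c->active_ptr = &c->per_ref[ref];
}
```

Halving the number of copies per call is consistent with the reported drop from 3.7 MB to 1.85 MB of struct copying.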
James Zern
5f30a0c687 VP[89]_COMMON: remove golden/altref frame counts
these are only used in the encoder.
frames_since_golden / frames_till_alt_ref_frame -> VP[89]_COMP

Change-Id: Ie14a6f46987bced685ddb449b85dc261caba6dfe
2013-07-18 14:09:21 -07:00
Dmitry Kovalev
9f3c0e34a9 Moving Scale2Ratio function from vp9_onyx.h to vp9_onyx_if.c.
Change-Id: Idfe2a850f72b38f519aea1aac1266d8c3aa813ee
2013-07-18 14:05:06 -07:00
Dmitry Kovalev
b3c0a5fddb Removing unused mv_bias and check_mv_bounds functions.
Change-Id: I1558fd969d9ad112bf6480bdd16ef87edd396ab5
2013-07-18 11:20:48 -07:00
Frank Galligan
7fd5d8e6a4 Fix horz loopfilter loops
If count was greater than 1 the src pointer would be off on
the second loop.

Change-Id: I8e09037e68dc4ae92076a8067f7b6dacbbef8263
2013-07-18 09:44:15 -07:00
Dmitry Kovalev
7363e668d5 Removing unused int_mv32 union.
Change-Id: Ie692ed6e5fa1d2122e3a03573914d0fcce842f9e
2013-07-17 17:02:44 -07:00
Dmitry Kovalev
f9f453ec8d Removing kf_{y, uv}_mode_prob arrays from VP9Common.
These arrays have constant values (they are never updated). Removing two
corresponding memcpy calls. Making a little cleanup in vp9_entropymode.h
as well: removing redundant 'extern' keyword and moving all function
declarations to the end.

Change-Id: Ia16b38b46aec2e2500f5df29c40a297ae241dede
2013-07-17 16:50:52 -07:00
hkuang
7b9a652813 Merge "Remove unnecessary buffer copy in idct4x4." 2013-07-17 14:51:53 -07:00
Dmitry Kovalev
a7a1e96136 Merge changes Ieffea49e,Idf610746
* changes:
  Removing two unused arguments from vp9_inc_mv signature.
  Changing signature of vp9_get_pred_probs_tx_size.
2013-07-17 14:44:20 -07:00
Dmitry Kovalev
b775081283 Merge "Removing experimental code from vp9_entropymv.c." 2013-07-17 14:43:45 -07:00
hkuang
bd6ce7128c Remove unnecessary buffer copy in idct4x4.
Change-Id: I386066b9bcfb4bffb582e6827af36ca0181f6a83
2013-07-17 14:20:56 -07:00
Dmitry Kovalev
8452c34551 Removing experimental code from vp9_entropymv.c.
Change-Id: I340d06e3bc32c78358654496503cccd4196cbe2e
2013-07-17 10:25:09 -07:00
Johann
9ca66ec050 Merge "vp9_convolve8_neon placeholder" 2013-07-17 10:09:00 -07:00
Johann
59dc4e9cdd vp9_convolve8_neon placeholder
Call the individually optimized horizontal and vertical functions. This
implementation abuses the temp buffer.

This will be replaced with a custom optimized function.

Over 2x speedup.

Change-Id: I5b908d2a73d264e9810d6022bbff73207a3055dd
2013-07-17 08:39:27 -07:00
Paul Wilkins
5f4722c75f Merge "Minor cleanup in code to find uv tx_size." 2013-07-17 02:50:09 -07:00
Dmitry Kovalev
6638b6f63f Merge "Removing MV_GROUP_UPDATE define and corresponding code." 2013-07-16 21:09:00 -07:00
Dmitry Kovalev
41ae3d02d4 Removing two unused arguments from vp9_inc_mv signature.
Change-Id: Ieffea49eb7a5e5092f21f8694c546aff69b07c6d
2013-07-16 17:01:08 -07:00
Dmitry Kovalev
5b65a71cdc Changing signature of vp9_get_pred_probs_tx_size.
Removing VP9_COMMON* argument and adding struct tx_probs* instead of
MACROBLOCKD*.

Change-Id: Idf61074631a90ec51eac22c8dcd977f44ac0757c
2013-07-16 16:34:54 -07:00
Dmitry Kovalev
f53d007b9e Merge "Loop filter code cleanup." 2013-07-16 15:55:17 -07:00
Dmitry Kovalev
3997da0d35 Removing MV_GROUP_UPDATE define and corresponding code.
Change-Id: I4884cdc2557d25d50c7c4f7e19b1ad8bdb93cd63
2013-07-16 15:03:00 -07:00
Dmitry Kovalev
9482a0bf10 Cleaning up tile code.
Removing tile_rows and tile_columns from VP9Common, removing redundant
constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of
vp9_get_tile_n_bits.

Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267
2013-07-16 14:47:15 -07:00
Dmitry Kovalev
2de3c8d29b Loop filter code cleanup.
Cosmetic code changes, renaming 'flat' local var to 'mask', removing
unused field 'blim' from loopfilter_info_n and loop_filter_info structs.

Change-Id: I51e6ccf727fe361ad9a08e29e1201aa7abd4987f
2013-07-16 14:39:31 -07:00
James Zern
98e132bde0 Merge changes I40454d26,I892e76d5,I865ab3f9,I4a4bec17,I61c4351e,I37eb3559,I1031c556,I8c8f1f42
* changes:
  delete vp9_loopfilter_sse2.asm
  vp9_loopfilter_intrin_sse2: cosmetics: fix indent
  delete x86/vp9_loopfilter_x86.h
  vp9_loopfilter_intrin_sse2: make some funcs static
  vp9_loopfilter_intrin_sse2: remove unused uv funcs
  vp9_loopfilter: remove uv function typedef
  filter_block_plane: reuse some constants
  vp9_loopfilter.c: make some functions static
2013-07-16 14:25:32 -07:00
James Zern
39ce4b13d5 Merge "use consistent framerate naming" 2013-07-16 14:22:52 -07:00
James Zern
9581eb6e8a use consistent framerate naming
s/frame_rate/framerate/g

Change-Id: I6fc3e088e419c5f46e3a9390dd8a2cad2677a2fc
2013-07-16 14:12:47 -07:00
Jingning Han
5e8e2bf48e Merge "SSE2 16x16 inverse ADST/DCT hybrid transform" 2013-07-16 14:04:04 -07:00
Dmitry Kovalev
5de96b3ce6 Merge "Rewriting vp9_set_pred_flag_{seg_id, mbskip}." 2013-07-16 13:34:42 -07:00
Dmitry Kovalev
85a0d8e85c Merge "Moving vp9_kf_default_bmode_probs to vp9_entropymode.c." 2013-07-16 13:26:53 -07:00
James Zern
50015f6eba delete vp9_loopfilter_sse2.asm
sse2 functions are provided by vp9_loopfilter_intrin_sse2.c

Change-Id: I40454d26034e3ef915eeaf889937fe7d1b519b9b
2013-07-16 13:09:16 -07:00
James Zern
8f4787a383 vp9_loopfilter_intrin_sse2: cosmetics: fix indent
Change-Id: I892e76d5ad1443b2ea0d1a7839fe26afe9c68ffb
2013-07-16 13:09:16 -07:00
James Zern
af58254267 delete x86/vp9_loopfilter_x86.h
also remove prototype_loopfilter{,_block} defines from vp9_loopfilter.h

Change-Id: I865ab3f9436c7b1ca166f76630328abf01389405
2013-07-16 13:09:05 -07:00
James Zern
5baa416b6c Merge "vp9: remove frames_{since,till}.. from MACROBLOCKD" 2013-07-16 13:00:14 -07:00
Jingning Han
d05f66aa10 SSE2 16x16 inverse ADST/DCT hybrid transform
This commit enables SSE2 implementation of 16x16 inverse ADST/DCT
hybrid transform. The runtime goes from 5742 cycles -> 1821 cycles.
This provides about 1% encoding speed-up at speed 0.

Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b
2013-07-16 12:51:42 -07:00
James Zern
c0562d08f6 Merge "VP[89]_COMMON: remove unused near_boffset" 2013-07-16 12:17:04 -07:00
James Zern
63e914bde4 Merge "VP9_COMMON: remove unused framerate/bitrate" 2013-07-16 12:16:37 -07:00
James Zern
3a7c2665d0 Merge "yv12config: remove YUV_TYPE" 2013-07-16 12:16:04 -07:00
Ronald S. Bultje
58a2005367 Merge "Replace generated quant tables with static lookup tables." 2013-07-16 12:07:17 -07:00
Ronald S. Bultje
e965cccce5 Replace generated quant tables with static lookup tables.
This prevents possible float rounding issues between architectures.

Change-Id: I6ed260aebd49feb4cfb5596a5370c44be5f72167
2013-07-16 12:06:26 -07:00
John Koleszar
cc1aac1b3c Merge "Fix above context pointers" 2013-07-16 11:23:38 -07:00
Jingning Han
5851904744 Merge "SSE2 8x8 inverse ADST/DCT transform" 2013-07-16 11:00:11 -07:00
Dmitry Kovalev
baf0c959c7 Moving vp9_kf_default_bmode_probs to vp9_entropymode.c.
Removing vp9_modelcontext.c.

Change-Id: If2316c58dead2708d9f95b52d9494ba4c1dd7427
2013-07-16 10:54:34 -07:00
Dmitry Kovalev
863138a2ad Rewriting vp9_set_pred_flag_{seg_id, mbskip}.
Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent
with vp9_get_segment_id without using confusing sub(a, b) macro. Passing
mi_row and mi_col to functions explicitly instead of relying on
mb_to_right_edge and mb_to_bottom_edge.

Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435
2013-07-16 10:44:48 -07:00
Paul Wilkins
30d2ea45ce Minor cleanup in code to find uv tx_size.
Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e
2013-07-16 18:27:33 +01:00
John Koleszar
5efd9609e3 Fix above context pointers
In the prior code, the above context pointers used for entropy
decoding were initialized on the first frame, and not updated when
the frame size changed. The per-frame code which initializes the
contexts assumes that the contexts are contiguous, leading to an
incomplete initialization when the frame is smaller. This commit
updates the pointers so that the context is contiguous whenever
the frame size changes.

Change-Id: I08b53e3a30c8289491212311682ff1b8028cff6c
2013-07-16 10:26:56 -07:00
Johann
90ebfe621f Merge "vp9_convolve8_[horiz|vert]_avg" 2013-07-16 09:42:52 -07:00
Dmitry Kovalev
e8e7620a1f Merge "Removing and moving around constant definitions." 2013-07-16 00:52:53 -07:00
Yaowu Xu
c5b0cd8405 Merge "Change to extend full border only when needed" 2013-07-15 21:35:32 -07:00
Yaowu Xu
5b915ebd92 Change to extend full border only when needed
This is a short term optimization till we work out a decoder
implementation requiring no frame border extension.

Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f
2013-07-15 20:52:13 -07:00
Dmitry Kovalev
ca75f1255f Removing and moving around constant definitions.
Removing unused and duplicated constants, moving them from *.h to *.c
if possible.

Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f
2013-07-15 19:26:30 -07:00
Dmitry Kovalev
65762849d1 Merge "Consistent naming for loop-filter filters." 2013-07-15 19:21:32 -07:00
Frank Galligan
ce1d69aed9 Merge "Neon: Update mbfilter if all vectors follow one branch." 2013-07-15 17:11:55 -07:00
Dmitry Kovalev
e973b4e2d9 Consistent naming for loop-filter filters.
Renaming flatmask4 to flat_mask4, flatmask5 to flat_mask5, hevmask to
hev_mask, filter to filter4, mbfilter to filter8, wide_mbfilter to
filter16.

Change-Id: Ic61c73e59c2eee505257584867aafac99833cea1
2013-07-15 16:01:31 -07:00
Frank Galligan
f4f60f6005 Neon: Update mbfilter if all vectors follow one branch.
Change the mbfilter Neon code from executing both branches if all
vectors follow only one branch.

The code is about 5% faster when executing only one branch and about
1% slower when executing both branches.

-PS5: Remove local stack space from mbfilter.

Change-Id: I6a23f9b318a9f4568a2718b4c9348db988fe2182
2013-07-15 13:08:28 -07:00
Dmitry Kovalev
1f14bbb624 Merge "Fixing vp9_get_pred_context_comp_ref_p function." 2013-07-15 10:51:42 -07:00
James Zern
04606d7258 vp9_loopfilter_intrin_sse2: make some funcs static
+ drop 'vp9_'

Change-Id: I4a4bec175316aab8f65c3a23bacc8362399a1357
2013-07-13 18:48:00 -07:00
James Zern
dc968d3d45 vp9_loopfilter_intrin_sse2: remove unused uv funcs
vp9_mbloop_filter_horizontal_edge_sse2 /
vp9_mbloop_filter_vertical_edge_uv_sse2

Change-Id: I61c4351ef0cce79fa4156a47ddace781f1566869
2013-07-13 18:44:32 -07:00
James Zern
bd6b79c44d vp9_loopfilter: remove uv function typedef
loop_filter_uvfunction is unused

Change-Id: I37eb3559e9eb2808f1f29dfea429441c94c9df2a
2013-07-13 18:38:28 -07:00
James Zern
9a4e175a64 filter_block_plane: reuse some constants
+ light const application
+ limit scope of params to build_lfi

Change-Id: I1031c556aec160a690921dc10e7aa8a707f43ecd
2013-07-13 18:21:05 -07:00
James Zern
b09d37af0c vp9_loopfilter.c: make some functions static
+ drop 'vp9_'

Change-Id: I8c8f1f421f7fc84d2efb80349cd725de3c9bf6bd
2013-07-13 18:14:03 -07:00
James Zern
dc1d2331f6 vp9: remove frames_{since,till}.. from MACROBLOCKD
frames_since_golden / frames_till_alt_ref_frame are unused.

Change-Id: I348e7689d4d75412cf4de7703d885be942e4a26b
2013-07-13 18:02:11 -07:00
James Zern
04092764f7 VP9_COMMON: remove unused framerate/bitrate
+ VP8_COMMON: place them under CONFIG_POSTPROC_VISUALIZER

Change-Id: I2702d5a3e1134b9c5f7ddc14b4173955a400f2cf
2013-07-12 21:43:23 -07:00
Jingning Han
91365addf8 SSE2 8x8 inverse ADST/DCT transform
This commit enables SSE2 implementation of 8x8 inverse ADST/DCT
transform. The runtime goes from 1216 cycles -> 266 cycles.
For bus_cif at 2000 kbps, the overall runtime reduces from
253707ms -> 248430ms, i.e., 2% speed-up at speed 0.

Change-Id: Ib0372e17e9162d7b11a10d653b1c8be547c878fb
2013-07-12 21:03:16 -07:00
James Zern
ce0324d8dd VP[89]_COMMON: remove unused near_boffset
Change-Id: If9b9ca703b997312df85241a0758d414cfdc5228
2013-07-12 19:41:27 -07:00
Dmitry Kovalev
429070987a Using vp9_copy and vp9_zero instead of custom code.
Change-Id: Id9b6ceeddca3f9b34bfada5c499b1e7a2f42c30b
2013-07-12 18:07:43 -07:00
Dmitry Kovalev
31a68bcdff Fixing vp9_get_pred_context_comp_ref_p function.
Adding missing parentheses around boolean expressions. Bitstream is changed.
Regenerating test vectors.

Change-Id: I4cc00b761e9473f92f180a9fc3a0c607f0aaae56
2013-07-12 17:46:02 -07:00
Johann
a15bebfc0a vp9_convolve8_[horiz|vert]_avg
Super basic conversion from the other implementations. Any changes to
one should be trivial to copy over to keep in sync.

Change-Id: I1720b4128e0aba4b2779e3761f6494f8a09d3ea8
2013-07-12 16:21:33 -07:00
Dmitry Kovalev
aa518af8c7 Merge "Adding struct tx_probs and struct tx_counts to cleanup the code." 2013-07-12 16:02:09 -07:00
James Zern
c9a2a06c20 Merge "vp9_postproc: remove useless self-assign" 2013-07-12 15:41:41 -07:00
James Zern
4fc6c88e9c yv12config: remove YUV_TYPE
this was never fleshed out in the context of VP8, for which it was
added. for VP9 it has no meaning.

Change-Id: Iba2ecc026d9e947067b96690245d337e51e26eff
2013-07-12 15:25:48 -07:00
Dmitry Kovalev
cc662dd768 Adding struct tx_probs and struct tx_counts to cleanup the code.
Also removing unused declarations from vp9_entropymode.h file.

Change-Id: Ib9c5826db3584a32f6bb3297a76c522b99d83402
2013-07-12 15:22:38 -07:00
Dmitry Kovalev
60969da5cb Merge "Code cleanup in vp9_pred_common.c" 2013-07-12 15:04:07 -07:00
James Zern
cca973a1ab vp9_postproc: remove useless self-assign
Change-Id: I0bc5d2d8c9fec8be18263b0dc2528886bb5b7b61
2013-07-12 14:17:15 -07:00
Dmitry Kovalev
3ab86adb1e Code cleanup in vp9_pred_common.c
No bitstream changes. Using MB_MODE_INFO temp variables instead of
MODE_INFO variables. Removing redundant curly braces.

Change-Id: Ib9d1bedfbd8af97ecc722ccf697ea8177bbe287c
2013-07-12 14:11:48 -07:00
James Zern
0195fb53cb vp9: consistent 'log2' variable naming
lg2 -> log2

Change-Id: I0602ddff49e42c9c40c29c084d04b7592b9f8edf
2013-07-12 11:37:43 -07:00
Deb Mukherjee
94c481f9f1 Some minor cleanups for efficiency
Implements some of the helper functions more efficiently with
lookups rather than branches. Modeling function is consolidated
to reduce some computations.

Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
one because there is no need to keep them separate (even though
the semantics are a little different).

No bitstream or output change.

About 0.5% speedup

Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f
2013-07-12 10:22:56 -07:00
Dmitry Kovalev
dd150e8ea9 Removing redundant code mostly from vp9_pred_common.{h, c}.
Removing redundant function arguments and curly braces.

Change-Id: I46e02561f33fe02e84a3b19756f03b9504bd6a1b
2013-07-11 18:39:10 -07:00
Jingning Han
dac5891a1a Merge "SSE2 4x4 inverse ADST/DCT transform" 2013-07-11 14:17:23 -07:00
Dmitry Kovalev
b55ecafda8 Merge "Making vp9_default_nmv_context static." 2013-07-11 13:58:34 -07:00
Dmitry Kovalev
c4ad3273c7 Moving segmentation related vars into separate struct.
Adding segmentation struct to vp9_seg_common.h. Struct members are from
macroblockd and VP9Common structs. Moving segmentation related constants
and enums to vp9_seg_common.h.

Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
2013-07-11 11:57:57 -07:00
Johann
158c80cbb0 convolve8 optimizations for neon
Independent horizontal and vertical implementations.

Requires that blocks be built from 4x4 and [xy]_step_q4 == 16

6-10% improvement. CIF improved the least.

Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda
2013-07-11 11:08:19 -07:00
hkuang
c9b25dcae4 Add neon optimize vp9_dc_only_idct_add.
Change-Id: Iae84ab945cc9662a0ddd839aa2b9ca59f2ae5423
2013-07-11 10:30:47 -07:00
Jim Bankoski
5000cdf0ff Merge "Wide loopfilter 16 pix at a time" 2013-07-11 06:44:02 -07:00
Jingning Han
49b6302044 SSE2 4x4 inverse ADST/DCT transform
Enable SSE2 4x4 inverse ADST/DCT transform. The runtime goes from
292 cycles down to 89 cycles. Running bus_cif at 2000 kbps, the
overall runtime of speed 0 goes from 301s to 295s (2% speed-up).

Change-Id: I24098136e7fee7ab2fbf1c11755bdf2ca37f3628
2013-07-10 20:16:02 -07:00
Ronald S. Bultje
decead7336 Replace copy_memNxM functions with a generic copy/avg function.
Change-Id: I3ce849452ed4f08527de9565a9914d5ee36170aa
2013-07-10 18:27:24 -07:00
Dmitry Kovalev
ac72ad071d Making vp9_default_nmv_context static.
Change-Id: Ia3d5bd45adf288de11ab59c4728266c93c17e275
2013-07-10 17:44:45 -07:00
Ronald S. Bultje
46997bde88 Merge "Remove unused iwalsh4x4 MMX/SSE2 functions." 2013-07-10 17:08:46 -07:00
Ronald S. Bultje
a7ef456453 Merge "Remove unused 16x3/3x16 sad SSE2 functions." 2013-07-10 17:08:43 -07:00
John Koleszar
64f7a4d8cb Wide loopfilter 16 pix at a time
Where possible, do the 16 pixel wide filter while doing the horizontal
filtering pass. The same approach can be taken for the mbloop_filter
when that's implemented. Doing so on the vertical pass is a little more
involved, but possible.

Change-Id: I010cb505e623464247ae8f67fa25a0cdac091320
2013-07-10 16:32:44 -07:00
Deb Mukherjee
7494bba66b Merge "Prunes out full-rd computation based on modeled rd" 2013-07-10 15:37:11 -07:00
Ronald S. Bultje
3f210f10eb Remove unused iwalsh4x4 MMX/SSE2 functions.
Change-Id: I2d22577911a37ed7d8c7e08cac20764842267652
2013-07-10 14:52:47 -07:00
Ronald S. Bultje
48c53233fd Remove unused 16x3/3x16 sad SSE2 functions.
Change-Id: I30a597c0cc366e34c9a3e2afe32d70e044f95ca4
2013-07-10 14:52:47 -07:00
Ronald S. Bultje
e6f955251f Merge "SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction." 2013-07-10 14:52:23 -07:00
Ronald S. Bultje
6a60249071 Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction." 2013-07-10 14:52:19 -07:00
Jim Bankoski
865ca76604 Merge "remove warnings when NDEBUG is set" 2013-07-10 14:39:39 -07:00
Jim Bankoski
6591cf2f7e remove warnings when NDEBUG is set
Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
2013-07-10 14:27:20 -07:00
Deb Mukherjee
53ff43adc3 Prunes out full-rd computation based on modeled rd
Adds a speed feature to eliminate full-rd computation if the modeled
rd or rd based on a different parameter in the same mode is already
a lot larger than the best rd yet.

Specifically, only search the sharp and smooth filters if the modeled
rd cost based on the regular filter is within a certain factor of the
best rd cost so far. Also, skip full-rd computation of non splitmv
inter modes if the modeled rd cost based on pred error is within the
same factor of the best rd cost so far.

Also adds some enhancements in the rd search for splitmv mode to
speed things up by early breakouts. Negligible impact on performance.

Results on derfraw300:
psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
         breakout feature on.
speedup: 6% with splitmv enhancements, 20% with also residual breakout
         (tested on football sequence at 600 Kbps)

Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
2013-07-10 13:49:49 -07:00
Jingning Han
114423538f SSE2 16x16 ADST/DCT hybrid transform
This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2
operations. It reduces the runtime from 5433 cycles to 1621 cycles, at
no compression performance loss.

Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230
2013-07-10 12:14:53 -07:00
John Koleszar
d1f8dd518c Merge "Fix intermediate height in convolve" 2013-07-10 11:04:40 -07:00
Ronald S. Bultje
44b29a769c Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction." 2013-07-10 10:24:16 -07:00
Ronald S. Bultje
89810bfd71 Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction." 2013-07-10 10:13:16 -07:00
Dmitry Kovalev
20986c81b3 Merge "Removing vp9_maskingmv.c and corresponding assembly file." 2013-07-10 10:05:06 -07:00
Ronald S. Bultje
7fd643264a SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction.
Change-Id: Iad70966b986f65259329070e258f76ef0af816b4
2013-07-10 09:28:03 -07:00
Ronald S. Bultje
8dade638a1 SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction.
Change-Id: I3441c059214c2956e8261331bbf521525a617a86
2013-07-10 09:28:03 -07:00
Ronald S. Bultje
75b33c68c7 SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction.
Change-Id: I55a6cfa2daba738cbc0c4a02f806893f7e556997
2013-07-10 09:28:03 -07:00
Ronald S. Bultje
92c5d3665d SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction.
Change-Id: Ibe1690afc5459f3b3beca401e7734fcd03da6dd0
2013-07-10 09:28:03 -07:00
Jim Bankoski
863204e64d mi_width_log2 & mi_height_log2
converted to lookup to avoid unnecessary code

Change-Id: I2ee6a01f06984cc2c4ba74b3fffd215318f749d2
2013-07-10 07:26:08 -07:00
Jim Bankoski
6c8170af52 b_width_log2 and b_height_log2 lookups
Replace case statement with lookup.
    Small speed gain at low speed settings but at speed 2+ where the
    number of motion searches etc. falls the impact rises to ~3-4%.

    Change-Id: Idff639b7b302ee65e042b7bf836943ac0a06fad8

Change-Id: I5940719a4a161f8c26ac9a6753f1678494cec644
2013-07-10 07:19:09 -07:00
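
The "replace case statement with lookup" change in the two commits above follows a common pattern. A minimal sketch of it, assuming a simplified block-size enum (the enum, table, and function names are hypothetical and do not reproduce the actual libvpx definitions):

```c
/* Hypothetical, simplified block-size enum for illustration only. */
typedef enum {
  SKETCH_4X4,
  SKETCH_8X8,
  SKETCH_16X16,
  SKETCH_32X32,
  SKETCH_64X64,
  SKETCH_BLOCK_COUNT
} sketch_block_size;

/* Branching version: log2 of the block width in units of 4 pixels. */
int width_log2_switch(sketch_block_size b) {
  switch (b) {
    case SKETCH_4X4: return 0;
    case SKETCH_8X8: return 1;
    case SKETCH_16X16: return 2;
    case SKETCH_32X32: return 3;
    case SKETCH_64X64: return 4;
    default: return -1;
  }
}

/* Lookup version: same values, indexed directly by the enum. */
static const signed char width_log2_table[SKETCH_BLOCK_COUNT] = {
  0, 1, 2, 3, 4
};

int width_log2_lookup(sketch_block_size b) {
  return width_log2_table[b];
}
```

The lookup trades a chain of compares and branches for a single indexed load, which matters most in code called once per block, such as motion search.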
Jim Bankoski
fb027a7658 removing case statements around prediction entropy coding
Removes SEG_ID
Removes MBSKIP
Removes SWITCHABLE_INTERP
Removes INTRA_INTER
Removes COMP_INTER_INTER
Removes COMP_REF_P
Removes SINGLE_REF_P1
Removes SINGLE_REF_P2
Removes TX_SIZE

Change-Id: Ie4520ae1f65c8cac312432c0616cc80dea5bf34b
2013-07-09 20:10:16 -07:00
James Zern
dac57fece6 Merge "Remove all asm offset files from VP9" 2013-07-09 19:13:37 -07:00
Dmitry Kovalev
2824048a56 Merge "Loop filter code cleanup." 2013-07-09 18:56:19 -07:00
Frank Galligan
53971d86ea Merge "Add Neon horizontal and vertical vp9_mbloop_filter" 2013-07-09 15:38:44 -07:00
John Koleszar
f0d9f10d24 Remove all asm offset files from VP9
The files are empty and unused.

Change-Id: Ieb4242d14273efdf24149bda33f9591540bba06a
2013-07-09 14:26:53 -07:00
Frank Galligan
198fa6d0a0 Add Neon horizontal and vertical vp9_mbloop_filter
- The vp9 mbfilter C code will branch on flat and mask. This CL
  will perform both branches and combine the data. A later CL will
  perform a check to see if all patch will take one branch.
- These functions are about 1.75 times faster than the C code on
  Nexus 7.

PS #3
- Changed all functions to dup limit, blimit, and thresh from
  vld {dx[]}, freeing up r4-r6.
- Changed code to use vbif to reduce one instruction and free
  up a d register.

Change-Id: I028dae0e434dc9891c3677bdb182e201ffb04777
2013-07-09 12:40:05 -07:00
Dmitry Kovalev
ec68d25521 Merge "Adding update_tx_ct function, removing duplicated code." 2013-07-09 12:26:11 -07:00
Dmitry Kovalev
aeed28f143 Removing vp9_maskingmv.c and corresponding assembly file.
Change-Id: I9842d02d61d78d17dc3449bae8ffbe60f4b3ecb3
2013-07-09 11:22:56 -07:00
Dmitry Kovalev
92a9eaef50 Loop filter code cleanup.
Using MAX_LOOP_FILTER constant instead of number 63.

Change-Id: If91e0c198331b3041e7cd0707a5948479e9209d8
2013-07-09 11:18:09 -07:00
Ronald S. Bultje
d8fa5d45cc Merge "Make intra prediction pointers RTCD-based." 2013-07-09 09:54:43 -07:00
Yaowu Xu
df5731273f Merge "Fix loopfilter bug" 2013-07-09 01:34:25 -07:00
Dmitry Kovalev
c6c279aff0 Merge "Using mi_cols instead of mb_cols." 2013-07-08 20:09:19 -07:00
Dmitry Kovalev
1c65c580d6 Merge "Refactoring setup_pre_planes function." 2013-07-08 20:08:05 -07:00
Dmitry Kovalev
6254c8d780 Merge "Calling set_partition_seg_context() instead of code duplication." 2013-07-08 20:07:06 -07:00
Ronald S. Bultje
8350e7fe38 Make intra prediction pointers RTCD-based.
This probably has a mildly negative impact on performance, but will
(in future commits - or possibly merged with this one) allow SIMD
implementations of individual intra prediction functions. We may
perhaps want to consider having separate functions per txfm-size
also (i.e. 4x4, 8x8, 16x16 and 32x32 intra prediction functions for
each intra prediction mode), but I haven't played much with that
yet.

Change-Id: Ie739985eee0a3fcbb7aed29ee6910fdb653ea269
2013-07-08 17:25:51 -07:00
John Koleszar
527fc5caf6 Fix loopfilter bug
In the rare case where 4x4 interior filtering was called for but no
8x8 or larger filtering takes place, the previous code was skipping
the filtering. This patch fixes the issue by including the interior
mask in the overall mask for the filter application loops.

Change-Id: I4a0b65056c64f97478827c2ff41e0914fc7779d0
2013-07-08 16:49:57 -07:00
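Note on 527fc5caf6 (Fix loopfilter bug): a minimal sketch of the idea, not the actual libvpx code. It assumes one bit per column in each mask and a loop that stops once the combined mask is empty; the function name and the `include_interior` switch are hypothetical and exist only to contrast the old and new behaviour.

```c
/* Counts the columns that get filtered. Each mask has one bit per column;
 * the loop ends as soon as the combined loop mask has no bits left.
 * include_interior == 0 models the old behaviour, 1 models the fix. */
static int columns_filtered(unsigned int mask_16x16, unsigned int mask_8x8,
                            unsigned int mask_4x4, unsigned int mask_4x4_int,
                            int include_interior) {
  unsigned int mask = mask_16x16 | mask_8x8 | mask_4x4;
  int filtered = 0;

  /* The fix: the interior 4x4 mask also keeps the loop alive. */
  if (include_interior) mask |= mask_4x4_int;

  for (; mask; mask >>= 1, mask_16x16 >>= 1, mask_8x8 >>= 1,
               mask_4x4 >>= 1, mask_4x4_int >>= 1) {
    if ((mask_16x16 | mask_8x8 | mask_4x4 | mask_4x4_int) & 1) ++filtered;
  }
  return filtered;
}
```

With only an interior bit set, the old loop mask is zero and nothing is filtered; with the interior mask included, that column is reached.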
Ronald S. Bultje
bd867f1619 Inline vp9_get_mv_joint().
Encode time for first 50 frames of bus (speed 0) @ 1500kbps goes from
2min10.9 to 2min10.5, i.e. 0.3% faster overall, basically because we
prevent the call overhead.

Change-Id: I1eab1a95dd3eae282f9b866f1f0b3dcadff073d5
2013-07-08 16:22:39 -07:00
Dmitry Kovalev
b7559258a4 Using mi_cols instead of mb_cols.
Eliminating usage of mb-units, switching to mi-units. Adding
ALIGN_POWER_OF_TWO macro.

Change-Id: I2491c969f713207c062011878b57e4e531818607
2013-07-08 14:54:04 -07:00
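Note on b7559258a4 (Using mi_cols instead of mb_cols): the log names an ALIGN_POWER_OF_TWO macro but does not show its definition. The sketch below is one conventional way to write such a macro; the exact form, the `MI_SIZE_LOG2` value of 3 (8-pixel mi units), and the `pixels_to_mi` helper are assumptions for illustration.

```c
/* Round value up to the next multiple of 2^n (assumed definition). */
#define ALIGN_POWER_OF_TWO(value, n) \
  (((value) + ((1 << (n)) - 1)) & ~((1 << (n)) - 1))

/* Assumption: one mi unit covers 8 pixels, i.e. 2^3. */
#define MI_SIZE_LOG2 3

/* Hypothetical helper: frame dimension in pixels to a count of mi units. */
static int pixels_to_mi(int pixels) {
  return ALIGN_POWER_OF_TWO(pixels, MI_SIZE_LOG2) >> MI_SIZE_LOG2;
}
```

Under these assumptions a 1920x1080 frame maps to 240x135 mi units.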
Tero Rintaluoma
18303b1263 Fix intermediate height in convolve
intermediate_height for horizontal filtering must be at least 8
pixels to be able to do vertical filtering correctly. Currently
it can be less for small block and y_step_q4 sizes.

Change-Id: I2ee28b0591b2041c2fa9844d0ae2ff8a1a59cc21
2013-07-05 14:58:25 +03:00
Dmitry Kovalev
bfcef95c45 Adding update_tx_ct function, removing duplicated code.
Change-Id: I8882fe3cd247a5a8304ab8ab2ee9abdb92830133
2013-07-03 18:24:13 -07:00
Dmitry Kovalev
f72e072555 Refactoring setup_pre_planes function.
Removing set_refs, adding set_ref function.

Change-Id: I5635c478b106ae4e57d317f1c83d929644307e63
2013-07-03 17:42:01 -07:00
Dmitry Kovalev
430bd0c94a Merge "Replacing 64 / MI_SIZE with MI_BLOCK_SIZE." 2013-07-03 14:16:02 -07:00
Dmitry Kovalev
2ad62c9312 Calling set_partition_seg_context() instead of code duplication.
Change-Id: I65be6acc54c99688fd1f0c946cec3511514b8555
2013-07-03 11:15:58 -07:00
Dmitry Kovalev
5a21de8418 Replacing 64 / MI_SIZE with MI_BLOCK_SIZE.
Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c
2013-07-03 10:54:50 -07:00
Yaowu Xu
0f02dc2709 Inline a few intra predictors
Change-Id: Ib41f0643fdcc088500e7420708f4e72f1f64c710
2013-07-03 10:20:41 -07:00
Ronald S. Bultje
98c493a1c0 Merge "Remove unused function vp9_build_inter4x4_predictors_mbuv()." 2013-07-03 09:05:20 -07:00
Dmitry Kovalev
be77f6bbbf Removing redundant struct from union b_mode_info.
Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
2013-07-02 16:51:57 -07:00
Ronald S. Bultje
5b87240230 Remove unused function vp9_build_inter4x4_predictors_mbuv().
Change-Id: Ibfd2def2c088f4bc541a1de25990d73480b53d4b
2013-07-02 16:34:24 -07:00
Deb Mukherjee
8d3d2b76f3 Tx size selection enhancements
(1) Refines the modeling function and uses that to add some speed
features. Specifically, instead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)

(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However the recommendation is to remove it eventually since the
benefit is limited, and will remove a lot of complications in
the code.

(3) Finally a bug is fixed with the existing use_largest_txfm speed feature
that causes mismatches when the lossless mode and 4x4 WH transform is
forced.

Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder).
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).

Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
2013-07-02 13:54:00 -07:00
Dmitry Kovalev
904070ca64 Merge "Removing unused implicit segmentation code." 2013-07-02 11:58:48 -07:00
Ronald S. Bultje
3cc6eb7c00 Merge "Make get_coef_context() branchless." 2013-07-02 11:48:15 -07:00
Dmitry Kovalev
3140c443e4 Merge "Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h." 2013-07-02 11:31:35 -07:00
Dmitry Kovalev
a3d2e6c98b Removing unused implicit segmentation code.
Change-Id: I8a2983fb14274a6ac53681fa4cd5d4209cbd2905
2013-07-02 11:16:42 -07:00
Ronald S. Bultje
9df24b41ca Merge "Update quantize SSSE3 SIMD to cover 32x32 transform case also." 2013-07-02 09:38:08 -07:00
Dmitry Kovalev
1ac0540296 Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h.
Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c
2013-07-01 17:28:08 -07:00
Ronald S. Bultje
26b6318de8 Make get_coef_context() branchless.
This should significantly speedup cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.

Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.

Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
2013-07-01 16:34:10 -07:00
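Note on 26b6318de8 (Make get_coef_context() branchless): a sketch of what a branch-free context lookup can look like once every position carries exactly two neighbour entries (edge positions repeat the same neighbour twice). The function name, `MAX_NEIGHBORS`, and the rounding-average formula are assumptions; the log describes the array layout but not the arithmetic.

```c
#include <stdint.h>

/* Assumption: two neighbour indices are stored per coefficient position. */
#define MAX_NEIGHBORS 2

/* Context = rounded average of the two neighbours' cached token values.
 * No eob check and no single/double-neighbour branch is needed, because
 * the tables are padded and edge entries list the same neighbour twice. */
static int get_coef_context_sketch(const int16_t *neighbors,
                                   const uint8_t *token_cache, int c) {
  return (1 + token_cache[neighbors[MAX_NEIGHBORS * c + 0]] +
              token_cache[neighbors[MAX_NEIGHBORS * c + 1]]) >> 1;
}
```

For an edge position whose single neighbour has token value 2, both entries point at it and the result is (1 + 2 + 2) >> 1 = 2, the same value a single-neighbour branch would have produced.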
Yaowu Xu
ba3b2604f0 Merge "Quantize (64-bit only, for now) SSSE3 SIMD." 2013-07-01 15:58:57 -07:00
Ronald S. Bultje
c8defcfdee Update quantize SSSE3 SIMD to cover 32x32 transform case also.
Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to
2min10.1, i.e. a 2.3% overall speed increase.

Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
2013-07-01 11:36:33 -07:00
Ronald S. Bultje
7353ceab9d Quantize (64-bit only, for now) SSSE3 SIMD.
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-07-01 11:36:07 -07:00
Dmitry Kovalev
2ab3bc8871 Removing vp9_modecont.{h, c}.
Moving vp9_default_inter_mode_probs array to vp9_entropymode.c.

Change-Id: I88ebda86ccc07f2a43c6c01d4b37898214cfb6de
2013-07-01 10:17:15 -07:00
Jingning Han
993942ce0c Merge "Enable SSE2 4x4 ADST/DCT transform" 2013-06-29 15:57:04 -07:00
Christian Duvivier
466e0cf303 SSE2 version of vp9_short_fdct32x32_rd.
43,000 -> 5,750 cycles, about 7.5x faster.

Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0
2013-06-29 13:53:00 -07:00
Johann
6098e359f4 Merge "add Neon optimized add constant residual functions" 2013-06-28 19:50:38 -07:00
James Zern
84d08fa9c4 Merge "fix test compile error" 2013-06-28 19:48:05 -07:00
Ronald S. Bultje
a487af8d35 Merge "Inline vp9_get_coef_context() (and remove vp9_ prefix)." 2013-06-28 19:37:11 -07:00
chm
a83cfd4da1 add Neon optimized add constant residual functions
- Add add_constant_residual_8x8 16x16 32x32 functions
- Tested under RealView debugger environment

Change-Id: I5c3a432f651b49bf375de6496353706a33e3e68e
2013-06-28 19:06:51 -07:00
James Zern
a63e31e81e fix test compile error
since:
92479d9 Make update_partition_context faster

fixes:
vp9/common/vp9_blockd.h:408:22: error:
non-constant-expression cannot be narrowed from type 'int' to 'char' in
initializer list [-Wc++11-narrowing]
  char pcvalue[2] = {~(0xe << boffset), ~(0xf <<boffset)};
                     ^~~~~~~~~~~~~~~~~

Change-Id: Id5b00b9a72d00a2b314081a23879bd1fa3ce983b
2013-06-28 18:07:37 -07:00
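Note on a63e31e81e (fix test compile error): the log quotes the diagnostic but not the change that resolved it. One common way to silence a narrowing error of this kind is an explicit cast, sketched below; the helper name is hypothetical and the actual fix may differ.

```c
/* The expressions ~(0xe << boffset) and ~(0xf << boffset) have type int.
 * Casting explicitly states that truncation to char is intended, which is
 * what a C++11 initializer list requires for a non-constant expression. */
static void partition_context_values(int boffset, char pcvalue[2]) {
  pcvalue[0] = (char)~(0xe << boffset);
  pcvalue[1] = (char)~(0xf << boffset);
}
```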
Jingning Han
1109b6b888 Enable SSE2 4x4 ADST/DCT transform
This commit enables SSE2 4x4 forward hybrid transform. The runtime
goes from 249 cycles down to 74 cycles. Overall around 2% speed-up
at no compression performance change.

Change-Id: Iad4d526346e05c7be896466c05500711bb763660
2013-06-28 17:24:43 -07:00
Dmitry Kovalev
228b8232d3 Cosmetic reordering of FRAME_CONTEXT members.
Change-Id: Id641e5188adf55e53e606e5813ae45feaf7abbd2
2013-06-28 16:16:03 -07:00
Dmitry Kovalev
59070f6e3c Merge "Removing CONFIG_DEBUG checks on assertions." 2013-06-28 14:03:28 -07:00
Ronald S. Bultje
ec5d09b950 Merge "Make coefficient skip condition an explicit RD choice." 2013-06-28 11:54:28 -07:00
Ronald S. Bultje
d00b8e5f82 Inline vp9_get_coef_context() (and remove vp9_ prefix).
Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles

Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.

Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
2013-06-28 10:40:21 -07:00
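Note on d00b8e5f82 (Inline vp9_get_coef_context()): the quoted 4.7% is reproduced when the saving is measured against the new encode time. That reading is an inference from the numbers, not something the log states, and both helper names below are made up for the check.

```c
/* Convert a "XminY.Z" timing into seconds. */
static double to_seconds(int minutes, double seconds) {
  return minutes * 60.0 + seconds;
}

/* Speedup in percent, relative to the new (faster) time. */
static double speedup_percent(double old_seconds, double new_seconds) {
  return (old_seconds / new_seconds - 1.0) * 100.0;
}
```

For 2min51.6 to 2min43.9 this is 171.6 / 163.9 - 1, which rounds to 4.7%.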
Dmitry Kovalev
0345fc3ad9 Merge "Decoder's code cleanup." 2013-06-28 10:38:54 -07:00
Dmitry Kovalev
8e6ce6bb9e Removing CONFIG_DEBUG checks on assertions.
Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
ones from vp9_onyx_int.h and vp9_onyxd_int.h.

Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
2013-06-28 10:36:20 -07:00
Ronald S. Bultje
af660715c0 Make coefficient skip condition an explicit RD choice.
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.

The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.

Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.

Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
2013-06-28 10:28:49 -07:00
Dmitry Kovalev
3231da0a9e Decoder's code cleanup.
Using vp9_set_pred_flag function instead of custom code, adding
decode_tokens function which is now called from decode_atom,
decode_sb_intra, and decode_sb.

Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8
2013-06-27 16:15:43 -07:00
Frank Galligan
1d6dc1b702 Add Neon optimized loop filter functions.
- Added vp9_loop_filter_horizontal_edge_neon and
  vp9_loop_filter_vertical_edge_neon.
- The functions are based off the vp8 loopfilter
  functions.
- Matches x86 md5 checksum.

Change-Id: Id1c4dddb03584227e5ecd29f574a6ac27738fdd0
2013-06-27 16:14:45 -07:00
Dmitry Kovalev
a3664258c5 Merge "General cleanup in segmentation-related code." 2013-06-27 14:57:07 -07:00
Dmitry Kovalev
be83ef3104 Merge "Moving subexp encoding functions in separate vp9_dsubexp.c file." 2013-06-27 14:55:18 -07:00
Jingning Han
fc1cfd8e32 Merge "Make intra predictor reference buffer configurable" 2013-06-26 19:02:02 -07:00
Jingning Han
4c10515f89 Merge "Make update_partition_context faster" 2013-06-26 19:01:45 -07:00
Yaowu Xu
896dc47cac Merge "Change to use LUT for mode-to-txfm conversion" 2013-06-26 17:19:47 -07:00
Jingning Han
861cb06c67 Make intra predictor reference buffer configurable
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.

Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
2013-06-26 17:17:21 -07:00
Jingning Han
92479d9526 Make update_partition_context faster
Use vpx_memset for updating the partition contexts. Thanks to Noah
for pointing out the need of refactoring in this part.

Change-Id: I67fb78429d632298f1cd8a0be346cc76f79392a6
2013-06-26 17:05:51 -07:00
Yaowu Xu
25fe05fd92 Change to use LUT for mode-to-txfm conversion
Change-Id: Ieb989830f49e6708ee7728eddebf7a2144c37c6f
2013-06-26 14:10:43 -07:00
Dmitry Kovalev
be07485e9a General cleanup in segmentation-related code.
Using consistent function and variable names.

Change-Id: I2deb3fded8797453a2081836c9ce2e79ade06eb7
2013-06-26 10:27:28 -07:00
John Koleszar
8137e24f3d Merge "Move vp9_counts_to_nmv_context to encoder" 2013-06-25 22:44:21 -07:00
John Koleszar
7bbb0633cd Merge "Move vp9_full_to_model_counts to encoder" 2013-06-25 22:44:16 -07:00
Jingning Han
3cc8c8c3a0 Merge "Refactor intra predictor block" 2013-06-25 19:46:55 -07:00
Jingning Han
d19ea3861d Refactor intra predictor block
Remove vp9_intra4x4_predict(). Use the common intra prediction
function for all block sizes.

Change-Id: Ibd19d51dfa3da8bbdfb79ddeb81530b2e2089560
2013-06-25 16:33:13 -07:00
Dmitry Kovalev
6fb10f2de4 Renaming "nmv" to "mv".
Change-Id: I8299f55c3b930221e52c2237f2ddea65b94fd33b
2013-06-25 15:19:18 -07:00
Ronald S. Bultje
c24d922396 Add averaging-SAD functions for 8-point comp-inter motion search.
Makes first 50 frames of bus @ 1500kbps encode from 3min22.7 to 3min18.2,
i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc
the variance of the averaging predictor. This is slightly suboptimal
because the function is subpixel-position-aware, but it will (at least
for the SSE2 version) not actually use a bilinear filter for a full-pixel
position, thus leading to approximately the same performance compared to
if we implemented an actual average-aware full-pixel variance function.
That gains another 0.3 seconds (i.e. encode time goes to 3min17.4), thus
leading to a total gain of 2.7%.

Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
2013-06-25 12:57:28 -07:00
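Note on c24d922396 (Add averaging-SAD functions): a plain C sketch of an averaging SAD, i.e. the sum of absolute differences between the source and the rounded average of two predictors. The function name, the packed layout of `second_pred` (stride equal to width), and the round-to-nearest average are assumptions; the log only says such functions were added for compound-inter motion search.

```c
#include <stdint.h>
#include <stdlib.h>

/* SAD between src and the average of ref and second_pred.
 * second_pred is assumed to be stored packed, one row of `width` bytes
 * after another. */
static unsigned int sad_avg_sketch(const uint8_t *src, int src_stride,
                                   const uint8_t *ref, int ref_stride,
                                   const uint8_t *second_pred,
                                   int width, int height) {
  unsigned int sad = 0;
  int x, y;

  for (y = 0; y < height; ++y) {
    for (x = 0; x < width; ++x) {
      /* Rounded average of the two predictors. */
      const int avg = (ref[x] + second_pred[x] + 1) >> 1;
      sad += (unsigned int)abs(src[x] - avg);
    }
    src += src_stride;
    ref += ref_stride;
    second_pred += width;
  }
  return sad;
}
```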
Dmitry Kovalev
9467571777 Moving subexp encoding functions in separate vp9_dsubexp.c file.
Change-Id: Idbb2ea80f764fa830fe2ddcfc54ef7fe232f05a8
2013-06-25 11:53:17 -07:00
Dmitry Kovalev
5ae096778e Merge "Removing unused code." 2013-06-25 11:50:55 -07:00
Yaowu Xu
c2e3ee13e7 Merge "Changed size of mb_mode_context to 8 bits" 2013-06-25 10:44:47 -07:00
Scott LaVarnway
855e23ce8c Merge "Small mode_info_context cleanup in filter_block_plane" 2013-06-25 10:34:19 -07:00
Dmitry Kovalev
87ee34aacb Removing unused code.
Removing block index (ib) parameter from get_tx_type_{8x8, 16x16}
functions.

Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1
2013-06-25 10:17:19 -07:00
Dmitry Kovalev
70e9622185 Merge "Removing find_seg_id and using vp9_get_pred_mi_segid instead." 2013-06-25 10:16:06 -07:00
Dmitry Kovalev
529679bd52 Merge "Transforming scale_mv_component_q4 into scale_mv_q4 function." 2013-06-25 10:15:33 -07:00
Scott LaVarnway
c787f40bc4 Small mode_info_context cleanup in filter_block_plane
Removes unnecessary updates to xd->mode_info_context.

Change-Id: I36d2d68ca48366f727548526726b1b5437f62968
2013-06-25 12:28:50 -04:00
Yaowu Xu
b9c934df8e Merge "Enable sse2 implementation of 8x8 ADST/DCT" 2013-06-25 09:13:22 -07:00
Jingning Han
a32a086d23 Enable sse2 implementation of 8x8 ADST/DCT
This commit makes use of the butterfly structure to enable the sse2
version implementation of 8x8 ADST/DCT hybrid transform coding.

The runtime of hybrid transform module goes down from 1170 cycles
to 245 cycles. Overall speed-up around 1.5%.

Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
2013-06-24 18:41:33 -07:00
John Koleszar
4ecd6dbead Move vp9_counts_to_nmv_context to encoder
This function is only used from within vp9_encodemv.c.

Change-Id: Ib3fc7c30b1e2d27321397ac474cbc8976bc1f4b1
2013-06-24 15:58:18 -07:00
John Koleszar
08b1798ae7 Move vp9_full_to_model_counts to encoder
This function is not called from the decoder, so it doesn't need to be
in common/.

Change-Id: I6977dd462a25b4ff39c9c7e1b0b5b16aa58ee733
2013-06-24 15:46:15 -07:00
John Koleszar
ece724ae16 Merge "Remove unused vp9_build_intra_predictors_sb{y,uv}_s" 2013-06-24 15:08:58 -07:00
John Koleszar
ee4a7e4e46 Merge "Remove unused vp9_model_to_full_probs_sb()" 2013-06-24 15:08:54 -07:00
Scott LaVarnway
dfa2ecc3f1 Changed size of mb_mode_context to 8 bits
This reduced the size of the MODE_INFO array (mip and prev_mip)
by 425,568 bytes each for 1080p resolutions.

Change-Id: Ifa513ec2d0a49e8ec0867ec90620762fb7f1261d
2013-06-24 17:11:16 -04:00
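Note on dfa2ecc3f1 (Changed size of mb_mode_context to 8 bits): the 425,568-byte figure can be reproduced with a small calculation, given two assumptions the log does not state: the MODE_INFO array holds (mi_cols + 8) x (mi_rows + 8) entries of 8x8 units, and each entry shrinks by 12 bytes (for example four context values going from 4 bytes to 1 byte each). These values were picked because they match the quoted number; treat them as a plausible reading, not a confirmed one.

```c
#include <stddef.h>

/* Bytes saved across one MODE_INFO array.
 * Assumes 8x8 mi units and an 8-unit border in each dimension. */
static size_t mode_info_savings(int width, int height,
                                size_t bytes_saved_per_entry) {
  const int mi_cols = (width + 7) >> 3;
  const int mi_rows = (height + 7) >> 3;
  const int border = 8; /* assumed: 64 / 8 */
  return (size_t)(mi_cols + border) * (size_t)(mi_rows + border) *
         bytes_saved_per_entry;
}
```

For 1920x1080 this gives 248 x 143 = 35,464 entries, and 35,464 x 12 = 425,568 bytes.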
John Koleszar
858475a03a Fix loopfilter of leftmost 4x4 edges in SB
For cases where there's no transform set in bit 0 (the left edge of
the SB) but bit 0 of mask_4x4_int is set (the edge 4 pixels from the
left edge needs filtering), it was incorrectly being skipped before.
This situation only happens on the leftmost edge of the image, as
the edge at column 0 is intentionally skipped since there aren't
pixels to the left to read.

Change-Id: Ib2fbbcb40166e90af31b1a0e13b85b68c226cbd3
2013-06-24 08:26:00 -07:00
John Koleszar
9e7019f7df Remove unused vp9_build_intra_predictors_sb{y,uv}_s
These functions are no longer referenced.

Change-Id: If2705dfbc607f79ec8ec2242d5e03bec27a35aaf
2013-06-21 16:10:05 -07:00
John Koleszar
5c32215e27 Remove unused vp9_model_to_full_probs_sb()
This function is never referenced.

Change-Id: I1c42cd355bfa88e17d169f7335a44be682af58cc
2013-06-21 15:38:55 -07:00
Dmitry Kovalev
f27f76dfb3 Transforming scale_mv_component_q4 into scale_mv_q4 function.
Using MV instead of int_mv for function arguments.

Change-Id: Ic25e13dccbc98fac1fa1b3255127e00cca2a57f6
2013-06-21 15:34:29 -07:00
Dmitry Kovalev
40141681c0 Removing find_seg_id and using vp9_get_pred_mi_segid instead.
Change-Id: Ia40229903c08f14020e90e94cfdf494aba1be827
2013-06-21 13:05:10 -07:00
Ronald S. Bultje
54b2a59623 Implement SSE2 block_error.
Change vp9_block_error() to return a 64bit error variable, change all
callers to expect a 64bit return value (this will prevent overflows,
which we basically don't check for at all right now). Remove duplicate
block_error() function, which fixed that through truncation. Remove
old (incompatible) mmx/sse2 block_error SIMD versions and replace with
a new one that returns a 64bit value.

Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to
3min23, i.e. a 3% overall speedup.

Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68
2013-06-21 12:54:52 -07:00
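Note on 54b2a59623 (Implement SSE2 block_error): a scalar sketch of why a 64-bit return value matters. The function name and signature are assumptions modelled on the description; this is not the SIMD code the commit adds.

```c
#include <stdint.h>

/* Sum of squared differences between original and dequantized coefficients.
 * The product is widened to 64 bits before accumulation, so neither a
 * single term nor the running total can overflow. */
static int64_t block_error_sketch(const int16_t *coeff,
                                  const int16_t *dqcoeff, int block_size) {
  int64_t error = 0;
  int i;

  for (i = 0; i < block_size; ++i) {
    const int diff = coeff[i] - dqcoeff[i];
    error += (int64_t)diff * diff;
  }
  return error;
}
```

With 1024 coefficients at the extreme 16-bit difference, the total is far above what a 32-bit int can hold, which is the overflow the commit message refers to.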
Ronald S. Bultje
7756e9892b Merge "Add subtract_block SSE2 version and unit test." 2013-06-21 12:49:50 -07:00
Ronald S. Bultje
9a480482cb Merge "SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance()." 2013-06-21 12:49:43 -07:00
Ronald S. Bultje
25c588b1e4 Add subtract_block SSE2 version and unit test.
3% faster overall (3min35.0 to 3min28.5).

Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e
2013-06-21 09:35:37 -07:00
Yaowu Xu
e6cd5ed307 Merge "Implement sse2 and ssse3 versions for all sub_pixel_variance sizes." 2013-06-20 17:42:50 -07:00
Ronald S. Bultje
1e6a32f1af SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance().
Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to
3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions
which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't
perfectly interleaved, and can probably be improved further in the
future. I've marked this with a few TODOs/FIXMEs in the code.

Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9
2013-06-20 15:59:48 -07:00
Frank Galligan
c259af4f73 Fix win64 warning.
- size_t vs int.

Change-Id: Ib47ebd932a4b69db9f52a43000bb69d0a96b9134
2013-06-20 14:07:11 -07:00
Dmitry Kovalev
8283d893eb Merge "Renaming 'nmv' to 'mv' for several functions." 2013-06-20 10:17:12 -07:00
Ronald S. Bultje
8fb6c58191 Implement sse2 and ssse3 versions for all sub_pixel_variance sizes.
Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
3min58). Specific changes to timings for each function compared to
original assembly-optimized versions (or just new version timings if
no previous assembly-optimized version was available):

sse2   4x4:    99 ->   82 cycles
sse2   4x8:           128 cycles
sse2   8x4:           121 cycles
sse2   8x8:   149 ->  129 cycles
sse2   8x16:  235 ->  245 cycles (?)
sse2  16x8:   269 ->  203 cycles
sse2  16x16:  441 ->  349 cycles
sse2  16x32:          641 cycles
sse2  32x16:          643 cycles
sse2  32x32: 1733 -> 1154 cycles
sse2  32x64:         2247 cycles
sse2  64x32:         2323 cycles
sse2  64x64: 6984 -> 4442 cycles

ssse3  4x4:           100 cycles (?)
ssse3  4x8:           103 cycles
ssse3  8x4:            71 cycles
ssse3  8x8:           147 cycles
ssse3  8x16:          158 cycles
ssse3 16x8:   188 ->  162 cycles
ssse3 16x16:  316 ->  273 cycles
ssse3 16x32:          535 cycles
ssse3 32x16:          564 cycles
ssse3 32x32:          973 cycles
ssse3 32x64:         1930 cycles
ssse3 64x32:         1922 cycles
ssse3 64x64:         3760 cycles

Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
2013-06-20 09:34:25 -07:00
Jim Bankoski
2c6bdbbc78 new debug modes code
The new print out includes skips and has prefixed sections so you can
grep to find things like transforms chosen on each frame.

Change-Id: I195043424647d9514cfc3ff6720a5b20d010fa1b
2013-06-20 09:33:11 -07:00
Yaowu Xu
12180c8329 Remove unnecessary copying of probs.
Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c
2013-06-18 23:02:27 -07:00
Dmitry Kovalev
87e1fa7627 Renaming 'nmv' to 'mv' for several functions.
Change-Id: I183a38997a9d01e4a1b869e92509f6915216fa09
2013-06-18 18:28:10 -07:00
Jingning Han
7088426976 Merge "Make fdct32 computation flow within 16bit range" 2013-06-18 11:40:14 -07:00
Dmitry Kovalev
dfc0385291 Merge "Removing vp9_invtrans.{c, h} files." 2013-06-18 10:16:25 -07:00
Jingning Han
a41a4860c0 Make fdct32 computation flow within 16bit range
This commit makes use of dual fdct32x32 versions for rate-distortion
optimization loop and encoding process, respectively. The one for
rd loop requires only 16 bits precision for intermediate steps.
The original fdct32x32 that allows higher intermediate precision (18
bits) was retained for the encoding process only.

This allows speed-up for fdct32x32 in the rd loop. No performance
loss observed.

Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3
2013-06-18 09:46:24 -07:00
Ronald S. Bultje
d9fc451666 Move subpixel variance function from common/ to encoder/.
This seems to only be used in the encoder. Also remove an empty wrapper
file that contained forward declarations for this function, but didn't
actually define any actual functions.

Change-Id: Ifc561eef7ebe374a7d03698055e51e105f6d614b
2013-06-17 16:54:09 -07:00
Dmitry Kovalev
686b99741c Removing vp9_invtrans.{c, h} files.
Moving single function from vp9_invtrans.c to vp9_encodemb.c.

Change-Id: I26bf6bb90de342a3036c0dbfba78a7dd75a61fe7
2013-06-17 16:09:03 -07:00
John Koleszar
61ecc282b5 Merge "Remove unused need_to_clamp_mvs" 2013-06-17 10:31:58 -07:00
John Koleszar
141ab2d5d0 Merge "Fix type mismatch in array definition" 2013-06-14 17:07:22 -07:00
John Koleszar
c2da365484 Merge "Remove constant vp9_coef_update_prob table" 2013-06-14 17:07:19 -07:00
John Koleszar
a9415d2e4c Fix type mismatch in array definition
vp9_default_inter_mode_probs was being accessed with a different type
than it was defined with. Ensure that its declaration is included
prior to its definition.

Change-Id: I2f963f513ab2f4e339f8a3c17e3d0f03749eba16
2013-06-14 16:38:42 -07:00
John Koleszar
0f7a66e962 Remove constant vp9_coef_update_prob table
All elements of this table are equal to 252, so replace it with a
single constant VP9_COEF_UPDATE_PROB.

Change-Id: I1e2d1d284326ce6df9899a740c2fc344b3ec81c9
2013-06-14 15:12:31 -07:00
Jingning Han
0b7910b9ff Merge "Enable sse2 version of sad8x4/4x8" 2013-06-14 13:15:49 -07:00
Jingning Han
c43af9a8a3 Enable sse2 version of sad8x4/4x8
The encoding time for bus at CIF goes from 661s to 625s. This commit
also enabled unit test of sad8x4/4x8 in sad_test.cc.

Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1
2013-06-14 09:19:28 -07:00
Jingning Han
15f50e7b42 Enable sse2 version of sad8x4/4x8
The encoding time for bus at CIF goes from 661s to 625s. This commit
also enabled unit test of sad8x4/4x8 in sad_test.cc.

Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1
2013-06-13 16:18:18 -07:00
John Koleszar
8e47093c9e Remove unused need_to_clamp_mvs
This flag is no longer needed.

Change-Id: If13482015ddb92d225792ea5c0ee455d2285d1f6
2013-06-12 16:50:14 -07:00
Scott LaVarnway
a81bd12a2e Quick modifications to mb loopfilter intrinsic functions
Modified to work with 8x8 blocks of memory.  Will revisit
later for further optimizations.  For the HD clip used, the
decoder improved by almost 20%.

Change-Id: Iaa4785be293a32a42e8db07141bd699f504b8c67
2013-06-12 19:23:03 -04:00
Yaowu Xu
d682243012 Merge "Quick modifications to wide loopfilter intrinsic functions" 2013-06-12 15:16:11 -07:00
Ronald S. Bultje
fa96eeb835 Implement SSE version for sad4x8x4d and SSE2 version for sad8x4x4d.
Encoding time of crew (CIF, first 50 frames) @ 1500kbps goes from 4min56
to 4min42.

Change-Id: I92c0c8b32980d2ae7c6dafc8b883a2c7fcd14a9f
2013-06-12 17:40:01 -04:00
Scott LaVarnway
26496c52bf Quick modifications to wide loopfilter intrinsic functions
Modified to work with 8x8 blocks of memory.  Will revisit
later for further optimizations.  For the HD clip used, the
decoder improved by 20%.

Change-Id: Ia0057f55d66d1445882351ea6c43b595a5a980e5
2013-06-12 16:49:08 -04:00
John Koleszar
1fa04e1a03 Merge changes I86fe51b0,I4c9a9e0f
* changes:
  Remove unused vp9_idct_add_{y,uv}_block
  Remove some unused loopfilter code
2013-06-12 13:43:30 -07:00
Johann
bbd5cb2bd4 Merge "Fix compile warnings on windows." 2013-06-12 13:36:50 -07:00
John Koleszar
495ff8e0c7 Merge "Enable mmx loop filter routines" 2013-06-12 12:52:04 -07:00
John Koleszar
ceee4563d6 Remove unused vp9_idct_add_{y,uv}_block
These functions are not used, and appear to have been superseded.

Change-Id: I86fe51b088264f6b1b8d4d232bba97b371b98120
2013-06-12 12:24:22 -07:00
Jingning Han
1a5bb3cc76 Fix the comments in boundary block partition check
Change-Id: Ic6b2881d8d495269edbc514b33376ca963798b45
2013-06-12 12:05:06 -07:00
John Koleszar
8933a652fc Remove some unused loopfilter code
This code is unreachable, and not useful for later reference.

Change-Id: I4c9a9e0fbf859c1081bbcfbcda9710afb4b4741f
2013-06-12 11:36:00 -07:00
Frank Galligan
4524548f80 Fix compile warnings on windows.
Change-Id: If74bc6110016bc75ea3883ab136fbbac88f6a913
2013-06-12 11:34:15 -07:00
John Koleszar
0e1e16db90 Enable mmx loop filter routines
The mmx routines work as expected for the loop filter, so enable them.

Change-Id: I2bbd9b99a4445fcba17bb95002f1fb6e01fe8f85
2013-06-12 11:28:21 -07:00
Yaowu Xu
efe05b7437 fix a misuse of ref_frame
Change-Id: I9aac140d775b7b4a8727494d15b185b75501a546
2013-06-12 10:32:38 -07:00
Frank Galligan
15f9077ee2 Fix duplicate const.
Change-Id: I86be1f7421ed49d577cacf405f6e4b0daa85cfdc
2013-06-12 08:52:34 -07:00
John Koleszar
9831f20594 Disallow wide loopfilter on some chroma borders
Don't do the 15 tap filter if there aren't 8 pixels below/right of the
edge.

Change-Id: I62f16437c1d9ba59b6901a5fe71ddb2f472da344
2013-06-11 11:28:38 -07:00
Jingning Han
551f37d63d Fix partition coding of corner block
This commit fixed the allowable partition types for bottom-right
corner blocks.

When a block has over half of its pixels as valid content in both
vertical and horizontal directions, allow all the four partition
types in the bit-stream. Otherwise, apply partition type constraints.

Change-Id: I2252e2de7125a8bfb1c824bf34299a13c81102e3
2013-06-10 21:43:17 -07:00
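Note on 551f37d63d (Fix partition coding of corner block): a sketch of the rule in the message, expressed as a bitmask of allowed partition types. The enum values, the function name, and the exact constraints applied when a block is mostly outside the frame are assumptions; the log only says that constraints apply in that case.

```c
enum { PARTITION_NONE, PARTITION_HORZ, PARTITION_VERT, PARTITION_SPLIT };

/* Returns a bitmask with bit (1 << type) set for each allowed type.
 * half_block_mi is half the block size in mi units. A block has "over half"
 * of its content valid in a direction when its second half still starts
 * inside the frame. */
static unsigned int allowed_partitions(int mi_row, int mi_col,
                                       int half_block_mi,
                                       int mi_rows, int mi_cols) {
  const int has_rows = (mi_row + half_block_mi) < mi_rows;
  const int has_cols = (mi_col + half_block_mi) < mi_cols;

  if (has_rows && has_cols) return 0xFu; /* all four types */
  if (has_cols)                          /* bottom half is outside */
    return (1u << PARTITION_HORZ) | (1u << PARTITION_SPLIT);
  if (has_rows)                          /* right half is outside */
    return (1u << PARTITION_VERT) | (1u << PARTITION_SPLIT);
  return 1u << PARTITION_SPLIT;          /* both halves are outside */
}
```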
Deb Mukherjee
51a7c7631d Merge "New probs for filters/tx_size and a few others" into experimental 2013-06-10 16:39:43 -07:00
Deb Mukherjee
a43ff15399 New probs for filters/tx_size and a few others
* New probs for subpel filters/tx_count
* Makes a change to not reset to defaults for the tx_size
probs if an intermediate frame reverts to using a fixed tx_size.
* A few updates to the parameters for backward adaptation for mode/mv
* some cosmetic cleanups

derf300: +0.06%

Change-Id: I22994d659bc31ca7a4fc8820fde24001e64a2920
2013-06-10 16:38:47 -07:00
John Koleszar
091e23c3e6 Merge "Remove remnants of VP8 profiles/versions" into experimental 2013-06-10 16:16:17 -07:00
John Koleszar
0fcb625e35 Remove remnants of VP8 profiles/versions
Remove the bilinear filter mode, and the no-loopfilter mode, and the
related vp9_setup_version() function.

Change-Id: I32311367812faf37863131df3af37d63d03973d7
2013-06-10 15:55:03 -07:00
Jim Bankoski
ba2af976cb print debugging info from mode info struct
This commit has no functional impact; it only helps us debug issues. To use it,
call it like this:

  vp9_print_modes_and_motion_vectors(cpi->common.mi, cpi->common.mi_rows,
                                     cpi->common.mi_cols,
                                     cpi->common.current_video_frame,
                                     "decode_mi.stt");

Change-Id: I89e27725dae351370eb7f311a20a145ed4f1d041
2013-06-10 14:03:17 -07:00
John Koleszar
44db42c114 Merge the new loopfilter experiment
Change-Id: I524ba98841f2e1850e3276ac365c501cea31546d
2013-06-10 12:30:12 -07:00
John Koleszar
c37a1e5ef2 Merge "Loopfilter: Fix chroma edge selection" into experimental 2013-06-10 12:17:24 -07:00
John Koleszar
2f3cbfdde1 Merge "Fix use of get_uv_tx_size in loopfilter" into experimental 2013-06-10 12:17:11 -07:00
Adrian Grange
c4e5b77d74 Merge "Implement intra-coded frames" into experimental 2013-06-10 12:08:09 -07:00
Deb Mukherjee
995ce523eb Cosmetic cleanups of filters
No bitstream change.

Removes unused filters and the code for the case of 2 switchable filters;
also changes the 8tap-smooth filter coefficients for integer shifts to be
interpolating to be consistent with the way it is implemented currently.

Change-Id: I96c542fd8c06f4e0df507a645976f58e6de92aae
2013-06-10 12:06:36 -07:00
Adrian Grange
eac344ef10 Implement intra-coded frames
Implements ability to signal and decode frames that are
encoded using only intra coding modes. Only the decode
side has been implemented here.

Change-Id: I53ac6a8d90422cd08ba389e5236e15b45f9e93de
2013-06-10 11:43:16 -07:00
John Koleszar
48b7cbcac5 Loopfilter: Fix chroma edge selection
A 32x32 transform should have no internal filtering (check c==4)

Change-Id: I7414cf4748ed053208217692ef00cd8b20d49a91
2013-06-10 11:40:57 -07:00
John Koleszar
717d744a01 Fix use of get_uv_tx_size in loopfilter
Change the argument of get_uv_tx_size() to be an MBMI pointer, so that the
correct column's MBMI can be passed to the function.

Change-Id: Ied6b8ec33b77cdd353119e8fd2d157811815fc98
2013-06-10 11:40:57 -07:00
John Koleszar
ec38b6150d Merge "Fixed point reference picture scaling" into experimental 2013-06-10 09:45:34 -07:00
Ronald S. Bultje
549258b1c2 Merge "border mvref issue" into experimental 2013-06-10 09:22:49 -07:00
Jim Bankoski
75459d65df border mvref issue
Fixes mvref issue.

Change-Id: I07dc1b0682845bc18fe0efa6af5e4f4da3abfa3a
2013-06-10 09:21:11 -07:00
Yaowu Xu
7f99844e91 Merge "Loopfilter: bug fix in sb_type usage" into experimental 2013-06-10 08:56:38 -07:00
Tero Rintaluoma
86bb6df005 Fixed point reference picture scaling
Fixed point scaling factors are calculated once for each
reference frame by using integer division. Otherwise fixed point
scaling routines are used in all scaling calculations. This makes it
possible to calculate fixed point scaling factors on device driver
software and pass them to hardware and thus avoid division on hardware.

TODO:
 - Missing check for maximum frame dimensions
   (currently scaling uses 14 bits)
 - Missing check for maximum scaling ratio
   (upscaling 16:1, downscaling 2:1)

Problems:
 - Straightforward fixed point implementation can cause error +-1
   compared to integer division (i.e. in x_step_q4). Should only
   be an issue for frames larger than 16k.

Change-Id: I3cf4dabd610a4dc18da3bdb31ae244ebaf5d579c
2013-06-10 08:07:55 -07:00
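
The fixed point scheme described in the commit above can be sketched in C as follows. This is a minimal illustration, not the code from the patch: the names REF_SCALE_SHIFT, get_fixed_point_scale_factor, scale_fixed and scale_div are assumptions made for this sketch, and the 14-bit shift is taken from the "currently scaling uses 14 bits" note. The factor is computed once with a single integer division (which could be done in driver software), and every later scaling step uses only a multiply and a shift.

```c
#include <stdint.h>

/* Assumed for this sketch: 14 fractional bits, per the commit's TODO note. */
#define REF_SCALE_SHIFT 14

/* Computed once per reference frame; the only division in the scheme.
 * ref_size must stay below 2^17 here so the shift fits in a 32-bit int. */
static int get_fixed_point_scale_factor(int ref_size, int cur_size) {
  return (ref_size << REF_SCALE_SHIFT) / cur_size;
}

/* Scale a coordinate using only a multiply and a shift (no division). */
static int scale_fixed(int val, int scale_fp) {
  return (int)(((int64_t)val * scale_fp) >> REF_SCALE_SHIFT);
}

/* Reference version: scale with a direct integer division. */
static int scale_div(int val, int ref_size, int cur_size) {
  return (int)(((int64_t)val * ref_size) / cur_size);
}
```

With a 1920-wide reference and a 960-wide current frame the factor is exactly 2.0 in fixed point (32768), so both versions agree. With 640 and 480 the factor is truncated to 21845, and in this sketch scale_fixed(480, 21845) gives 639 where scale_div(480, 640, 480) gives 640, an instance of the off-by-one difference the commit message mentions.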
Janne Salonen
548f90d2ce Loopfilter: bug fix in sb_type usage
Was always using sb_type of first column in a row of 8x8 units when
determining decoded block edges as a subcondition for loop filter
skipping.

Change-Id: Ib17554633a63a90b70cdaa7bed65db035a8ad9d8
2013-06-10 06:40:05 -07:00
Yaowu Xu
4852a8023d Merge "Loopfilter: Always filter intra edges" into experimental 2013-06-09 21:18:00 -07:00
Yaowu Xu
9c44ce9f4b Merge "Loopfilter: use the current block only for skip" into experimental 2013-06-09 21:17:54 -07:00
Yaowu Xu
2e1fd0a497 Merge "Modified loop filter edge skipping" into experimental 2013-06-09 21:17:47 -07:00
John Koleszar
140ac34e57 Loopfilter: Always filter intra edges
Change-Id: Ifb1ce2bd52147981ca1aec9ec6cfea8738a23e45
2013-06-09 09:02:47 -07:00
Ronald S. Bultje
c3f9b070ca Merge "New comp_inter defaults." into experimental 2013-06-09 06:40:02 -07:00
Ronald S. Bultje
3993d30922 Merge "Fix firstpass if framesize is not a multiple of 16." into experimental 2013-06-08 17:40:17 -07:00
Ronald S. Bultje
d30968c32a Merge "New default tables" into experimental 2013-06-08 17:39:50 -07:00
Ronald S. Bultje
20760254f6 Merge "Align frame size to 8 instead of 16." into experimental 2013-06-08 17:39:41 -07:00
Ronald S. Bultje
99e10253b0 New comp_inter defaults.
It seems like I inverted the meaning of the contexts by accident?

Change-Id: Iafb2346d9933930949578342b84519b719dd5dd3
2013-06-08 15:13:57 -07:00
Ronald S. Bultje
073c7d5eec Fix firstpass if framesize is not a multiple of 16.
Change-Id: Iec41736c2b6140715f90f40de5ae6cf52497a9b8
2013-06-08 13:32:05 -07:00
Ronald S. Bultje
b64be43998 New default tables
Change-Id: Ice8c73a2a843113877b8f8ed78737a1442c25ced
2013-06-08 13:29:14 -07:00
Deb Mukherjee
17da2cab78 TX_SIZE contexts simplification.
Reduces TX_SIZE contexts to 2 for each kind. The code is
cleaner and there is hardly any performance difference with
more than two contexts.

Results: almost neutral

Change-Id: I17656bd6db76224ae2856adf882504560e7dbaa4
2013-06-08 12:32:26 -07:00
Deb Mukherjee
67cb1f093c Minor fix in TX_SIZE contexts
Change-Id: I9e81f84877e18ba7e55d66389ed60e64a5b7abcc
2013-06-08 07:14:58 -07:00
Yaowu Xu
b7da6d0c5a Merge "Handle partition type coding of boundary blocks" into experimental 2013-06-07 18:16:16 -07:00
John Koleszar
f7e4b72df8 Loopfilter: use the current block only for skip
Use the current block's skip flag to determine edge skipping.

Change-Id: I4ba81f899286afbc3f6bb83eba2ef146a01b6fa4
2013-06-07 17:48:57 -07:00
Ronald S. Bultje
71701f3d40 Align frame size to 8 instead of 16.
Change-Id: Ic606ef1b31e49963a779455a1e010a9ebb0f3f1f
2013-06-07 17:20:50 -07:00
Adrian Grange
07a5777bde Frame header changes to support intra_only frames
Made changes to the frame header to write the sync
code in the frame header for a non-displayable,
intra-only frame.

Extended reset_frame_context to 2-bits.

(Submitting on behalf of Dmitri)

Change-Id: Ie836ae0df9ed572fb4f08aabe9351a555c4f3b96
2013-06-07 16:19:34 -07:00
Deb Mukherjee
21401942b0 Coding tx-size selection by use of spatial context
Adds coding of transform size within a frame by use of context
of transform sizes selected in left and above blocks.

Also incorporates code for generating stats.

TODO: generate and incorporate new default stats

Change-Id: I6a7af099f6ad61d448521d9a51167aedaf638ed6
2013-06-07 16:07:58 -07:00
Deb Mukherjee
869a39ba60 Cleans up mbskip encoding
Refactors mbskip coding to be compatible with coding of the rest of
the symbols. Adds forward/backward adaptation and removes a lot of
the legacy code.

Results:
fast50: +1.6%
derfraw300: +0.317%

Change-Id: I395a2976d15af044d3b8ded5acfa45f6f065f980
2013-06-07 16:00:26 -07:00
Jingning Han
78b8190cc7 Handle partition type coding of boundary blocks
The partition types of blocks sitting on the frame boundary are
constrained by the block size and the position of each sub-block
relative to the frame. Hence we use truncated probability models
to handle the coding of such information.

100 frames run:
yt 0.138%

Change-Id: I85d9b45665c15280069c0234ea6f778af586d87d
2013-06-07 14:19:40 -07:00
Ronald S. Bultje
28164eb962 Fix segment feature data size.
Change-Id: I4331cfd99a717938f4f970cad81c468cbf287b00
2013-06-07 13:57:28 -07:00
Ronald S. Bultje
fb1f6f1db4 Fix segment feature data type.
It has a range of [-255, 255], so it should be int16_t, not int8_t.

Change-Id: I5ef4b6aefb6212b0f35f4754f3c4d73fddbc52a0
2013-06-07 13:57:27 -07:00
Ronald S. Bultje
363dc6ceda Don't crash if motion vector ref points to out-of-bounds area.
This can only happen if partition is partly out-of-frame, in which
case the referenced mv is either out-of-frame also (and thus has the
same value as an already-read one), or it is actually uninitialized,
in which case we don't want to use it.

Change-Id: Icf39fa4d987c7abcbebb9bbdcdd6311e8fb9d3c9
2013-06-07 13:57:27 -07:00
Paul Wilkins
340c7a48e6 Change to segment ref frame feature.
Simplify feature to only support a single reference frame
instead of a mask.

Change-Id: I5dd3a98c7a224aafb35708850ab82e2f220e68fb
2013-06-07 21:42:22 +01:00
Yaowu Xu
0bb6da3668 Merge "Remove two un-used entries in mode_lf_delta[]" into experimental 2013-06-07 10:10:45 -07:00
Yaowu Xu
254f46bc5b Merge "Specify mv neighborhood for block larger than 8x8" into experimental 2013-06-07 10:09:35 -07:00
Yaowu Xu
b097a3ba82 Remove two un-used entries in mode_lf_delta[]
With the removal of i4X4 and SPLIT_MV modes, the two entries for the
modes are no longer used. This patch removes the coding of the deltas.

Change-Id: Iea4eb500404ebe9706159380a03b8eca542fb4c3
2013-06-07 09:24:09 -07:00
Deb Mukherjee
78fbaf4d84 Merge "Coding updates for tx-size selection" into experimental 2013-06-07 09:19:36 -07:00
Ronald S. Bultje
def6bc765c Merge "Revert "Align frame size to 8 instead of 16."" into experimental 2013-06-07 09:01:33 -07:00
Yaowu Xu
8b3ad75266 Specify mv neighborhood for block larger than 8x8
The new neighborhood adapts to the shape and size of the block type
cif +.16%
stdhd +.13%

Change-Id: I978db58278e9ae3fbd6726ef831bdfc5f5f37d02
2013-06-07 08:59:48 -07:00
Ronald S. Bultje
e7d306aae6 Revert "Align frame size to 8 instead of 16."
This reverts commit c2574414d4

Change-Id: Ie9013cb0bb43e639e01b4588f630b1da59295d38
2013-06-07 08:59:27 -07:00
Deb Mukherjee
3ee1a21a42 Coding updates for tx-size selection
Changes to the coding of transform sizes, along with forward
and backward probability updates.

Results:
derf300: +0.241%

Context based coding of transform sizes will be in a separate
patch.

Change-Id: I97241d60a926f014fee2de21fa4446ca56495756
2013-06-07 08:54:00 -07:00
Janne Salonen
5c5223860a Modified loop filter edge skipping
Added a condition not to skip filtering of transform block edges when
the edge is also a decoding block edge.

Change-Id: Iaccb6206c4202b78e5dca3b89379556e0f4aba0c
2013-06-07 06:36:22 -07:00
Paul Wilkins
576c2bb021 Fix bug in segment skip.
Wrong max data size (skip has no data) and use of vp9_get_segdata()
when it should be vp9_segfeature_active().

Change-Id: I1eb97d33df6e2a42cc589049f704266fe3639902
2013-06-07 13:27:08 +01:00
Yaowu Xu
4df9e7883c Merge "Removed rectangular intra prediction code" into experimental 2013-06-06 22:58:07 -07:00
Yaowu Xu
472669befb Fix a merge conflict
ref_frame in MB_Mode_Info was changed in the ref frame coding patch
to be an array to handle first and second reference frames. This patch
fixes the loop filter code that used the pointer directly as the reference
frame.

Change-Id: I71afa5a49deb50c1bc38029fd07470b984c6dfe9
2013-06-06 22:10:07 -07:00
Yaowu Xu
9470c1a2a1 Removed rectangular intra prediction code
All intra predictions now happen on square transform blocks.

Change-Id: I7ec91e3f0ad01383a03d2bd3099bbf32e87e3466
2013-06-06 21:35:10 -07:00
Jim Bankoski
fa9db8da15 Merge "Fix FIXME." into experimental 2013-06-06 20:50:51 -07:00
Jim Bankoski
686f437264 Merge "Align frame size to 8 instead of 16." into experimental 2013-06-06 20:49:59 -07:00
John Koleszar
736c7b804a Merge "Reimplementation of loop filter" into experimental 2013-06-06 17:34:26 -07:00
Ronald S. Bultje
c2574414d4 Align frame size to 8 instead of 16.
Change-Id: Ic22f416a33de558519d5c30a929f6a954546ade9
2013-06-06 17:28:11 -07:00
Ronald S. Bultje
bc41af00cf Fix FIXME.
Change-Id: I47a9857d35da1bff6153f8090c6b98b689b31a61
2013-06-06 17:28:11 -07:00
Ronald S. Bultje
6ef805eb9d Change ref frame coding.
Code intra/inter, then comp/single, then the ref frame selection.
Use contextualization for all steps. Don't code two past frames
in comp pred mode.

Change-Id: I4639a78cd5cccb283023265dbcc07898c3e7cf95
2013-06-06 17:28:09 -07:00
Ronald S. Bultje
ad34368786 New intra mode and partitioning probabilities.
Split partition probabilities between keyframes and non-keyframes,
since they are fairly different. Also have per-blocksize interframe
y intramode probabilities, since these vary heavily between different
blocksizes.

Lastly, replace default probabilities for partitioning and intra modes
with new ones generated from current codec. Replace counts with actual
probabilities also.

Change-Id: I77ca996e25e4a28e03bdbc542f27a3e64ca1234f
2013-06-06 10:45:30 -07:00
John Koleszar
043d348aae Reimplementation of loop filter
This version of the loop filter supports non-4:2:0 subsampling and
a fourth plane, as well as changing the filtering order to be more
friendly to hardware implementations.

The filters are applied first to all vertical edges within the
64x64 SB, followed by the top horizontal edge and any internal
horizontal edges. Since filtering is applied on each 4x4 edge
serially, a dependency is created from filtering one block edge
to the next. It would be possible to remove this dependency by
building all filtering decisions from the unfiltered
reconstruction data.

Change-Id: I08f3e9683eb7bded8a76651cbc50fc0dfdd05fa7
2013-06-06 08:45:45 -07:00
Jim Bankoski
5a88271b09 don't tokenize & encode tokens for blocks in UMV
This avoids encoding tokens for blocks that are entirely
in the UMV border. This changes the bitstream.

Change-Id: I32b4df46ac8a990d0c37cee92fd34f8ddd4fb6c9
2013-06-06 06:10:25 -07:00
Dmitry Kovalev
28d31aed7f Merge "Moving bits from compressed header to uncompressed one." into experimental 2013-06-06 01:15:44 -07:00
Jingning Han
61e6586230 Merge "Fix UV intra coding rd loop" into experimental 2013-06-05 21:47:00 -07:00
Jingning Han
f04b15486a Fix UV intra coding rd loop
This commit makes the coding/reconstruction operations of intra
coding rate-distortion loop for UV components consistent with those
of the encoding process.

key frame coding gains:
derf:   0.11%
stdhd:  0.42%

Change-Id: I8d49f83924a320e3689ef2d60096c49d7f0c7a40
2013-06-05 21:18:02 -07:00
Dmitry Kovalev
12345cb391 Moving bits from compressed header to uncompressed one.
Bits moved: refresh_frame_flags, active_ref_idx[], ref_frame_sign_bias[],
allow_high_precision_mv, mcomp_filter_type, ref_pred_probs[].

Derf results: +0.040%

Change-Id: I011f43c7eac0371d533b255fd99aee5ed75b85a5
2013-06-05 20:56:37 -07:00
Deb Mukherjee
30226a658f Cosmetic renaming VP9_MVREFS to VP9_INTER_MODES
NO bitstream change

Change-Id: I79f6146dac5fdd157051b6f8dc611c0b7b5e5f7f
2013-06-05 11:24:01 -07:00