Commit Graph

2230 Commits

Author SHA1 Message Date
Yunqing Wang
b409c1d91e Merge "Remove unnecessary calling of vp9_init_quantizer()" 2013-07-17 15:47:17 -07:00
Yunqing Wang
3798db88e1 Remove unnecessary calling of vp9_init_quantizer()
vp9_init_quantizer() is called in vp9_create_compressor(), and
should not be called in vp9_set_speed_features().

Change-Id: Ic2f1f4b0531b9d46bb841d7e1d8da9812207dad6
2013-07-17 14:59:00 -07:00
hkuang
7b9a652813 Merge "Remove unnecessary buffer copy in idct4x4." 2013-07-17 14:51:53 -07:00
Yaowu Xu
6ac5b7db2c Merge "changed mode checking order" 2013-07-17 14:44:40 -07:00
Dmitry Kovalev
a7a1e96136 Merge changes Ieffea49e,Idf610746
* changes:
  Removing two unused arguments from vp9_inc_mv signature.
  Changing signature of vp9_get_pred_probs_tx_size.
2013-07-17 14:44:20 -07:00
Dmitry Kovalev
b775081283 Merge "Removing experimental code from vp9_entropymv.c." 2013-07-17 14:43:45 -07:00
Dmitry Kovalev
102632aab1 Merge "Adding read_comp_pred function." 2013-07-17 14:26:34 -07:00
Ronald S. Bultje
c6917528a5 Add a best_yrd shortcut in splitmv mode search.
Encoding of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min6.2 to 1min5.9, i.e. 0.5% faster overall.

Change-Id: I59d8a3b2f0a75010fa041d5e2646c8caac5bd683
2013-07-17 14:21:57 -07:00
hkuang
bd6ce7128c Remove unnecessary buffer copy in idct4x4.
Change-Id: I386066b9bcfb4bffb582e6827af36ca0181f6a83
2013-07-17 14:20:56 -07:00
Ronald S. Bultje
161c995658 Skip redundant nearest/near/zero encodes in splitmv.
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from
1min7.3 to 1min6.2, i.e. 1.7% faster overall.

Change-Id: I19d2deacfbffadd61d32551cee9586757ab4a987
2013-07-17 13:53:48 -07:00
Yaowu Xu
42facc292d changed mode checking order
Change-Id: Ic4c4b363ed840935e42f495f13ea5e601a56f1b2
2013-07-17 13:43:50 -07:00
Ronald S. Bultje
8fea880b6f Skip nearest/near/zero redundant encodes.
Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min12.8
to 1min7.3, i.e. 8% faster.

Change-Id: Ia22d1c7b687316c553cc60eacae988b24e175b62
2013-07-17 11:33:15 -07:00
Yunqing Wang
10e83b0717 Enable disable_splitmv feature for other speeds
Added disable_splitmv feature at other speed levels. For speed 3 or
above, always turn it on.

Change-Id: Ibb36f0a7ef12a34b4f8d0f9cb6193eab43b34360
2013-07-17 10:25:49 -07:00
Dmitry Kovalev
8452c34551 Removing experimental code from vp9_entropymv.c.
Change-Id: I340d06e3bc32c78358654496503cccd4196cbe2e
2013-07-17 10:25:09 -07:00
Johann
9ca66ec050 Merge "vp9_convolve8_neon placeholder" 2013-07-17 10:09:00 -07:00
Ronald S. Bultje
9f427bfe98 Best_rd breakout in rd partition search.
About 15% faster for bus (speed 0) first 50 frames @ 1500kbps, which
goes from 1min36 to 1min24. Results become slightly better (+0.2% on
derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
super_block_yrd(). Overall speed change (on derfraw300) is roughly
-13%. This can probably be improved further by caching best_yrd
between partition searches. Also, we might be able to get more
speedups by always doing PARTITION_NONE before PARTITIONS_SPLIT, not
just at the sb8x8 level.

Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b
2013-07-17 09:56:46 -07:00
Ronald S. Bultje
83c7e13a6b Do a skip-block check for sub8x8 partitions also.
+0.2% SSIM and glbPSNR on derfraw300.

Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d
2013-07-17 09:46:47 -07:00
Yunqing Wang
df90d58f4f Speed up motion estimation using small partitions' result(experiment)
Current partition checking starts from small sizes, and then goes up
to large sizes. This experiment uses the small partitions' motion
estimation result, which is already available, to speed up the
large partition's motion estimation. We can decide to skip some
patition checkings if they are unlikely choices. We could use the
motion vector(MV) result as current partition's prediction MV, limit
the search range and reference frame.

Current result at speed 1:
psnr loss: 1.19% for stdhd, 0.287% for derf.
speed gain: 14% for sunflower(hd), 11% for akiyo.

Further improvement will be done later.

Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
2013-07-17 09:11:47 -07:00
Johann
59dc4e9cdd vp9_convolve8_neon placeholder
Call the individually optimized horizontal and vertical functions. This
implementation abuses the temp buffer.

This will be replaced with a custom optimized function.

Over 2x speedup.

Change-Id: I5b908d2a73d264e9810d6022bbff73207a3055dd
2013-07-17 08:39:27 -07:00
Paul Wilkins
d66eab15dd Merge "Move uv intra mode selection in rd loop." 2013-07-17 05:19:26 -07:00
Paul Wilkins
154c34a3ee Merge "Limit transform sizes searched for uv intra." 2013-07-17 03:40:11 -07:00
Paul Wilkins
2ee338ce3b Move uv intra mode selection in rd loop.
Use an estimate based on DC_PRED for intra uv cost
within the rd loop then only do a full uv mode analysis
if an intra mode is chosen.

Significant speed gains in some cases. Currently only
enabled for speed 2 pending speed/quality tests.

Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
2013-07-17 11:11:21 +01:00
Paul Wilkins
6c667f0ffe Limit transform sizes searched for uv intra.
Apply limit if search_method == USE_LARGESTALL
to the range of UV tx sizes searched.

Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c
2013-07-17 11:08:55 +01:00
Paul Wilkins
5f4722c75f Merge "Minor cleanup in code to fine uv tx_size." 2013-07-17 02:50:09 -07:00
Dmitry Kovalev
6638b6f63f Merge "Removing MV_GROUP_UPDATE define and corresponding code." 2013-07-16 21:09:00 -07:00
Jingning Han
0b58fa80a0 Merge "Skip redundant motion search in 4x4 level rd loop" 2013-07-16 20:54:25 -07:00
Dmitry Kovalev
851a911158 Adding read_comp_pred function.
Removing old debug code from vp9_decodemv.c.

Change-Id: I51a6d5fe6a2f6583a1555e692bb1ee5a5b315d6c
2013-07-16 20:20:25 -07:00
Jingning Han
a142d6fc93 Skip redundant motion search in 4x4 level rd loop
This commit makes the encoder to perform motion search only once
per reference frame type for each 4x4/4x8/8x4 block. For bus_cif
at 2000 kbps, the runtime goes from 253812ms -> 217817ms
(14% speed-up) for speed 0.

Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92
2013-07-16 17:21:11 -07:00
Dmitry Kovalev
41ae3d02d4 Removing two unused arguments from vp9_inc_mv signature.
Change-Id: Ieffea49eb7a5e5092f21f8694c546aff69b07c6d
2013-07-16 17:01:08 -07:00
Dmitry Kovalev
5b65a71cdc Changing signature of vp9_get_pred_probs_tx_size.
Removing VP9_COMMON* argument and adding struct tx_probs* instead of
MACROBLOCKD*.

Change-Id: Idf61074631a90ec51eac22c8dcd977f44ac0757c
2013-07-16 16:34:54 -07:00
Dmitry Kovalev
f53d007b9e Merge "Loop filter code cleanup." 2013-07-16 15:55:17 -07:00
Dmitry Kovalev
3997da0d35 Removing MV_GROUP_UPDATE define and corresponding code.
Change-Id: I4884cdc2557d25d50c7c4f7e19b1ad8bdb93cd63
2013-07-16 15:03:00 -07:00
Dmitry Kovalev
9482a0bf10 Cleaning up tile code.
Removing tile_rows and tile_columns from VP9Common, removing redundant
constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of
vp9_get_tile_n_bits.

Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267
2013-07-16 14:47:15 -07:00
Dmitry Kovalev
2de3c8d29b Loop filter code cleanup.
Cosmetic code changes, renaming 'flat' local var to 'mask', removing
unused field 'blim' from loopfilter_info_n and loop_filter_info structs.

Change-Id: I51e6ccf727fe361ad9a08e29e1201aa7abd4987f
2013-07-16 14:39:31 -07:00
James Zern
98e132bde0 Merge changes I40454d26,I892e76d5,I865ab3f9,I4a4bec17,I61c4351e,I37eb3559,I1031c556,I8c8f1f42
* changes:
  delete vp9_loopfilter_sse2.asm
  vp9_loopfilter_intrin_sse2: cosmetics: fix indent
  delete x86/vp9_loopfilter_x86.h
  vp9_loopfilter_intrin_sse2: make some funcs static
  vp9_loopfilter_intrin_sse2: remove unused uv funcs
  vp9_loopfilter: remove uv function typedef
  filter_block_plane: reuse some constants
  vp9_loopfilter.c: make some functions static
2013-07-16 14:25:32 -07:00
James Zern
39ce4b13d5 Merge "use consistent framerate naming" 2013-07-16 14:22:52 -07:00
James Zern
9581eb6e8a use consistent framerate naming
s/frame_rate/framerate/g

Change-Id: I6fc3e088e419c5f46e3a9390dd8a2cad2677a2fc
2013-07-16 14:12:47 -07:00
Jingning Han
5e8e2bf48e Merge "SSE2 16x16 inverse ADST/DCT hybrid transform" 2013-07-16 14:04:04 -07:00
Dmitry Kovalev
5de96b3ce6 Merge "Rewriting vp9_set_pred_flag_{seg_id, mbskip}." 2013-07-16 13:34:42 -07:00
Dmitry Kovalev
85a0d8e85c Merge "Moving vp9_kf_default_bmode_probs to vp9_entropymode.c." 2013-07-16 13:26:53 -07:00
James Zern
50015f6eba delete vp9_loopfilter_sse2.asm
sse2 functions are provided by vp9_loopfilter_intrin_sse2.c

Change-Id: I40454d26034e3ef915eeaf889937fe7d1b519b9b
2013-07-16 13:09:16 -07:00
James Zern
8f4787a383 vp9_loopfilter_intrin_sse2: cosmetics: fix indent
Change-Id: I892e76d5ad1443b2ea0d1a7839fe26afe9c68ffb
2013-07-16 13:09:16 -07:00
James Zern
af58254267 delete x86/vp9_loopfilter_x86.h
also remove prototype_loopfilter{,_block} defines from vp9_loopfilter.h

Change-Id: I865ab3f9436c7b1ca166f76630328abf01389405
2013-07-16 13:09:05 -07:00
James Zern
5baa416b6c Merge "vp9: remove frames_{since,till}.. from MACROBLOCKD" 2013-07-16 13:00:14 -07:00
Jingning Han
d05f66aa10 SSE2 16x16 inverse ADST/DCT hybrid transform
This commit enables SSE2 implementation of 16x16 inverse ADST/DCT
hybrid transform. The runtime goes from 5742 cycles -> 1821 cycles.
This provides about 1% encoding speed-up at speed 0.

Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b
2013-07-16 12:51:42 -07:00
James Zern
c0562d08f6 Merge "VP[89]_COMMON: remove unused near_boffset" 2013-07-16 12:17:04 -07:00
James Zern
63e914bde4 Merge "VP9_COMMON: remove unused framerate/bitrate" 2013-07-16 12:16:37 -07:00
James Zern
3a7c2665d0 Merge "yv12config: remove YUV_TYPE" 2013-07-16 12:16:04 -07:00
Ronald S. Bultje
58a2005367 Merge "Replace generated quant tables with static lookup tables." 2013-07-16 12:07:17 -07:00
Ronald S. Bultje
e965cccce5 Replace generated quant tables with static lookup tables.
This prevents possible float rounding issues between architectures.

Change-Id: I6ed260aebd49feb4cfb5596a5370c44be5f72167
2013-07-16 12:06:26 -07:00
John Koleszar
cc1aac1b3c Merge "Fix above context pointers" 2013-07-16 11:23:38 -07:00
Jingning Han
5851904744 Merge "SSE2 8x8 inverse ADST/DCT transform" 2013-07-16 11:00:11 -07:00
Dmitry Kovalev
baf0c959c7 Moving vp9_kf_default_bmode_probs to vp9_entropymode.c.
Removing vp9_modelcontext.c.

Change-Id: If2316c58dead2708d9f95b52d9494ba4c1dd7427
2013-07-16 10:54:34 -07:00
Dmitry Kovalev
863138a2ad Rewriting vp9_set_pred_flag_{seg_id, mbskip}.
Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent
with vp9_get_segment_id without using confusing sub(a, b) macro. Passing
mi_row and mi_col to functions explicitly instead of replying on
mb_to_right_edge and mb_to_bottom_edge.

Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435
2013-07-16 10:44:48 -07:00
Paul Wilkins
30d2ea45ce Minor cleanup in code to fine uv tx_size.
Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e
2013-07-16 18:27:33 +01:00
John Koleszar
5efd9609e3 Fix above context pointers
In the prior code, the above context pointers used for entropy
decoding were initialized on the first frame, and not updated when
the frame size changed. The per-frame code which initializes the
contexts assumes that the contexts are contiguous, leading to an
incomplete initialization when the frame is smaller. This commit
updates the pointers so that the context is contigous whenever
the frame size changes.

Change-Id: I08b53e3a30c8289491212311682ff1b8028cff6c
2013-07-16 10:26:56 -07:00
Johann
90ebfe621f Merge "vp9_convolve8_[horiz|vert]_avg" 2013-07-16 09:42:52 -07:00
Jingning Han
dd97c62ab8 Merge "Skip inter-coded block reconstruction in rd loop" 2013-07-16 09:03:38 -07:00
Dmitry Kovalev
e8e7620a1f Merge "Removing and moving around constant definitions." 2013-07-16 00:52:53 -07:00
Yaowu Xu
c5b0cd8405 Merge "Change to extend full border only when needed" 2013-07-15 21:35:32 -07:00
Yaowu Xu
5b915ebd92 Change to extend full border only when needed
This is a short term optimization till we work out a decoder
implementation requiring no frame border extension.

Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f
2013-07-15 20:52:13 -07:00
Dmitry Kovalev
ca75f1255f Removing and moving around constant definitions.
Removing unused and duplicated constants, moving them from *.h to *.c
if possible.

Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f
2013-07-15 19:26:30 -07:00
Dmitry Kovalev
65762849d1 Merge "Consistent naming for loop-filter filters." 2013-07-15 19:21:32 -07:00
Johann
6eae37f45c Merge "Remove print_nmvcounts" 2013-07-15 18:43:41 -07:00
Ronald S. Bultje
1ff94fea56 Inline vp9_quantize() in xform_quant().
Cycle times:
4x4:    151 to  131 cycles (15% faster)
8x8:    334 to  306 cycles (9% faster)
16x16: 1401 to 1368 cycles (2.5% faster)
32x32: 7403 to 7367 cycles (0.5% faster)

Total encode time of first 50 frames of bus @ 1500kbps (speed 0)
goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup.

Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
2013-07-15 17:30:57 -07:00
Ronald S. Bultje
7e684e2009 Merge "Inline xform_quant() in encode_block_intra()." 2013-07-15 17:29:39 -07:00
Frank Galligan
ce1d69aed9 Merge "Neon: Update mbfilter if all vectors follow one branch." 2013-07-15 17:11:55 -07:00
Dmitry Kovalev
e973b4e2d9 Consistent naming for loop-filter filters.
Renaming flatmask4 to flat_mask4, flatmask5 to flat_mask5, hevmask to
hev_mask, filter to filter4, mbfilter to filter8, wide_mbfilter to
filter16.

Change-Id: Ic61c73e59c2eee505257584867aafac99833cea1
2013-07-15 16:01:31 -07:00
Ronald S. Bultje
6fb418741f Inline xform_quant() in encode_block_intra().
Also inline some of the block calculations to assist the compiler to
not do silly things like calculating the same offset (or converting
between raster/transform block offset or block, mi and pixel unit)
many, many, many times.

Cycle times:
4x4:     584 ->   505 cycles (16% faster)
8x8:    1651 ->  1560 cycles (6% faster)
16x16:  7897 ->  7704 cycles (2.5% faster)
32x32: 16096 -> 15852 cycles (1.5% faster)

Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the
first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall.

Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80
2013-07-15 16:00:42 -07:00
Dmitry Kovalev
2c31729839 Code cleanup inside vp9_decodeframe.c.
Removing unused DEC_DEBUG define and dec_debug variable. Changing function
signatures to eliminate code duplication, renaming function
mb_init_dequantizer to init_dequantizer. Also removing redundant curly
braces, and comments.


Change-Id: Ia56ee1b0be5f24abb0e878581845be8a4773c298
2013-07-15 14:47:25 -07:00
Frank Galligan
f4f60f6005 Neon: Update mbfilter if all vectors follow one branch.
Change the mbfilter Neon code from executing both branches if all
vectors follow only one branch.

The code is about 5% faster when executing only one branch and about
1% slower when executing both branches.

-PS5: Remove local stack space from mbfilter.

Change-Id: I6a23f9b318a9f4568a2718b4c9348db988fe2182
2013-07-15 13:08:28 -07:00
Jingning Han
043e0f9dad Skip inter-coded block reconstruction in rd loop
Skip the inverse transform and reconstruction of inter-mode coded
blocks in the rate-distortion optimization loop, when skip_encode_sb
feature is turned on. This provides about 1% speed-up at speed 0,
and 1.5% speed-up at speed 1. No performance change in both settings.

Change-Id: I2932718bf4d007163702b61b16b6ff100cf9d007
2013-07-15 11:32:14 -07:00
Jingning Han
faff6ed0fb Skip duplicate block encoding in the rd loop
This speed feature allows the encoder to largely remove the spatial
dependency between blocks inside a 64x64 superblock, thereby removing
the need to repeatedly encode superblocks per partition type in the
rate-distortion optimization loop.

A major challenge lies in the intra modes tested in the rate-distortion
optimization loop. The subsequent blocks do not have access to the
reconstructed boundary pixels without the intermediate coding steps.
This was resolved by using the original pixels for intra prediction
in the rd loop, followed by an appropriately designed distortion
modeling on the quantization parameters. Experiments also suggested
that the performance impact is more discernible at lower bit-rate/psnr
settings. Hence a quantizer dependent threshold is applied to deactivate
skip of block coding.

For bus_cif at 2000 kbps,
speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
         performance loss.

speed 1: runtime 65312ms  -> 61536ms, (7% speed-up) at 0.04dB
         performance loss.

This operation is currently turned on in settings of speed 1.

Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec
2013-07-15 11:08:58 -07:00
Dmitry Kovalev
1f14bbb624 Merge "Fixing vp9_get_pred_context_comp_ref_p function." 2013-07-15 10:51:42 -07:00
James Zern
04606d7258 vp9_loopfilter_intrin_sse2: make some funcs static
+ drop 'vp9_'

Change-Id: I4a4bec175316aab8f65c3a23bacc8362399a1357
2013-07-13 18:48:00 -07:00
James Zern
dc968d3d45 vp9_loopfilter_intrin_sse2: remove unused uv funcs
vp9_mbloop_filter_horizontal_edge_sse2 /
vp9_mbloop_filter_vertical_edge_uv_sse2

Change-Id: I61c4351ef0cce79fa4156a47ddace781f1566869
2013-07-13 18:44:32 -07:00
James Zern
bd6b79c44d vp9_loopfilter: remove uv function typedef
loop_filter_uvfunction is unused

Change-Id: I37eb3559e9eb2808f1f29dfea429441c94c9df2a
2013-07-13 18:38:28 -07:00
James Zern
9a4e175a64 filter_block_plane: reuse some constants
+ light const application
+ limit scope of params to build_lfi

Change-Id: I1031c556aec160a690921dc10e7aa8a707f43ecd
2013-07-13 18:21:05 -07:00
James Zern
b09d37af0c vp9_loopfilter.c: make some functions static
+ drop 'vp9_'

Change-Id: I8c8f1f421f7fc84d2efb80349cd725de3c9bf6bd
2013-07-13 18:14:03 -07:00
James Zern
dc1d2331f6 vp9: remove frames_{since,till}.. from MACROBLOCKD
frames_since_golden / frames_till_alt_ref_frame are unused.

Change-Id: I348e7689d4d75412cf4de7703d885be942e4a26b
2013-07-13 18:02:11 -07:00
James Zern
04092764f7 VP9_COMMON: remove unused framerate/bitrate
+ VP8_COMMON: place them under CONFIG_POSTPROC_VISUALIZER

Change-Id: I2702d5a3e1134b9c5f7ddc14b4173955a400f2cf
2013-07-12 21:43:23 -07:00
Jingning Han
91365addf8 SSE2 8x8 inverse ADST/DCT transform
This commit enables SSE2 implementation of 8x8 inverse ADST/DCT
transform. The runtime goes from 1216 cycles -> 266 cycles.
For bus_cif at 2000 kbps, the overall runtime reduces from
253707ms -> 248430ms, i.e., 2% speed-up at speed 0.

Change-Id: Ib0372e17e9162d7b11a10d653b1c8be547c878fb
2013-07-12 21:03:16 -07:00
James Zern
ce0324d8dd VP[89]_COMMON: remove unused near_boffset
Change-Id: If9b9ca703b997312df85241a0758d414cfdc5228
2013-07-12 19:41:27 -07:00
Dmitry Kovalev
429070987a Using vp9_copy and vp9_zero instead of custom code.
Change-Id: Id9b6ceeddca3f9b34bfada5c499b1e7a2f42c30b
2013-07-12 18:07:43 -07:00
Dmitry Kovalev
31a68bcdff Fixing vp9_get_pred_context_comp_ref_p function.
Adding missed parenthesis around boolean expressions. Bitstream is changed.
Regenerating test vectors.

Change-Id: I4cc00b761e9473f92f180a9fc3a0c607f0aaae56
2013-07-12 17:46:02 -07:00
Dmitry Kovalev
31403080ff Merge "Removing redundant call to set_mi_row_col." 2013-07-12 17:08:23 -07:00
Dmitry Kovalev
3c94fffdb0 Removing redundant call to set_mi_row_col.
This function is actually called from set_offsets which is called right
before vp9_read_mode_info.

Change-Id: Ibb9d5ad606194bc80eab264fad85b31c9dfd8f77
2013-07-12 16:25:23 -07:00
Johann
a15bebfc0a vp9_convolve8_[horiz|vert]_avg
Super basic conversion from the other implementations. Any changes to
one should be trivial to copy over keep in sync.

Change-Id: I1720b4128e0aba4b2779e3761f6494f8a09d3ea8
2013-07-12 16:21:33 -07:00
Yaowu Xu
cdea4a7c66 Merge "Fix a build issue" 2013-07-12 16:17:22 -07:00
Dmitry Kovalev
aa518af8c7 Merge "Adding struct tx_probs and struct tx_counts to cleanup the code." 2013-07-12 16:02:09 -07:00
Dmitry Kovalev
444c8d4c53 Merge "Making functions read_{inter, intra}_segment_id more similar." 2013-07-12 15:50:02 -07:00
James Zern
c9a2a06c20 Merge "vp9_postproc: remove useless self-assign" 2013-07-12 15:41:41 -07:00
James Zern
4fc6c88e9c yv12config: remove YUV_TYPE
this was never fleshed out in the context of VP8, for which it was
added. for VP9 it has no meaning.

Change-Id: Iba2ecc026d9e947067b96690245d337e51e26eff
2013-07-12 15:25:48 -07:00
Dmitry Kovalev
cc662dd768 Adding struct tx_probs and struct tx_counts to cleanup the code.
Also removing unused declarations from vp9_entropymode.h file.

Change-Id: Ib9c5826db3584a32f6bb3297a76c522b99d83402
2013-07-12 15:22:38 -07:00
Dmitry Kovalev
60969da5cb Merge "Code cleanup in vp9_pred_common.c" 2013-07-12 15:04:07 -07:00
Dmitry Kovalev
db0d603b1c Making functions read_{inter, intra}_segment_id more similar.
Change-Id: I51f9ac910834f2d7aba2be4f7ffbce597e61a144
2013-07-12 14:50:33 -07:00
James Zern
cca973a1ab vp9_postproc: remove useless self-assign
Change-Id: I0bc5d2d8c9fec8be18263b0dc2528886bb5b7b61
2013-07-12 14:17:15 -07:00
Dmitry Kovalev
3ab86adb1e Code cleanup in vp9_pred_common.c
No bitstream changes. Using MB_MODE_INFO temp variables instead of
MODE_INFO variables. Removing redundant curly braces.

Change-Id: Ib9d1bedfbd8af97ecc722ccf697ea8177bbe287c
2013-07-12 14:11:48 -07:00
Yaowu Xu
fb754b182f Fix a build issue
Change-Id: I23a75c495ed7ea917d7f312bef0990e20a6b53d9
2013-07-12 11:38:44 -07:00
James Zern
0195fb53cb vp9: consistent 'log2' variable naming
lg2 -> log2

Change-Id: I0602ddff49e42c9c40c29c084d04b7592b9f8edf
2013-07-12 11:37:43 -07:00
James Zern
37c0a1a8d0 Merge changes I33e76c42,I24aeac1e,If4192b40
* changes:
  vp9_dx_iface: s/vp8/vp9/ where possible
  vp[89]_dx_iface: delete unused function
  vp[89]_dx_iface: factorize vp8_mmap_*()
2013-07-12 11:10:18 -07:00
James Zern
563b4b20a8 vp9_dx_iface: s/vp8/vp9/ where possible
drop 'vp9_' from most static functions unrelated to the codec interface
itself.

Change-Id: I33e76c425bb7373570a57a61662a56d65ab4bdf3
2013-07-12 11:05:39 -07:00
Deb Mukherjee
94c481f9f1 Some minor cleanups for efficiency
Implements some of the helper functions more efficiently with
lookups rathers than branches. Modeling function is consolidated
to reduce some computations.

Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
one because there is no need to keep them separate (even though
the semantics are a little different).

No bitstream or output change.

About 0.5% speedup

Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f
2013-07-12 10:22:56 -07:00
Dmitry Kovalev
727631873d Merge "Removing redundant code mostly from vp9_pred_common.{h, c}." 2013-07-12 10:22:30 -07:00
Paul Wilkins
b8ddc9f0d3 Merge "Speed 2 feature adjustment." 2013-07-12 02:14:01 -07:00
James Zern
e202a2be03 vp[89]_dx_iface: delete unused function
static mmap_lkup

Change-Id: I24aeac1eca8453e28d58bc06925e58efc228a0a6
2013-07-11 23:03:24 -07:00
James Zern
b088998e5d vp[89]_dx_iface: factorize vp8_mmap_*()
s/vp8/vpx/ -> vpx_codec_internal.h / vpx_codec.c

Change-Id: If4192b40206276a761b01d44e334fe15bcb81128
2013-07-11 23:01:26 -07:00
Jingning Han
84c3ac0476 Merge "Remove unnecessary tx_type branch in encode_block" 2013-07-11 21:52:27 -07:00
Dmitry Kovalev
dd150e8ea9 Removing redundant code mostly from vp9_pred_common.{h, c}.
Removing redundant function arguments and curly braces.

Change-Id: I46e02561f33fe02e84a3b19756f03b9504bd6a1b
2013-07-11 18:39:10 -07:00
Johann
e6ab476dd4 Remove print_nmvcounts
For some reason iOS builds take a really long time to sort this
function out.

It's not used anywhere so remove it.

Change-Id: Ia5c8513a0d9c7eb32641cca58ca1c1113e2dd9f4
2013-07-11 17:22:03 -07:00
Ronald S. Bultje
ee09dd9949 Remove unused function block_error().
Change-Id: I78a79fc51c2d7cc3c261f35b569155397f3dc0c4
2013-07-11 17:14:03 -07:00
James Zern
30bac896f9 Merge "vp9: fix peek_si for version==0" 2013-07-11 15:51:39 -07:00
Dmitry Kovalev
cae3fb7267 Merge "Calling is_inter_mode() instead of custom code." 2013-07-11 15:20:14 -07:00
Jingning Han
dac5891a1a Merge "SSE2 4x4 invserse ADST/DCT transform" 2013-07-11 14:17:23 -07:00
Dmitry Kovalev
8c05e59065 Calling is_inter_mode() instead of custom code.
Change-Id: Iccd4ab95ea51a6d57ed43947f2fd7ad92e8979cf
2013-07-11 14:14:47 -07:00
Dmitry Kovalev
b55ecafda8 Merge "Making vp9_default_nmv_context static." 2013-07-11 13:58:34 -07:00
James Zern
7645c9ab34 vp9: fix peek_si for version==0
Change-Id: I6bfec4fa50dfc1a953edb1a2aa8e97e6e896bed6
2013-07-11 12:22:39 -07:00
Dmitry Kovalev
c4ad3273c7 Moving segmentation related vars into separate struct.
Adding segmentation struct to vp9_seg_common.h. Struct members are from
macroblockd and VP9Common structs. Moving segmentation related constants
and enums to vp9_seg_common.h.

Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
2013-07-11 11:57:57 -07:00
Dmitry Kovalev
f70c021d36 Merge "Adding write_compressed_header function." 2013-07-11 11:57:17 -07:00
Dmitry Kovalev
802e57535a Merge "Removing unused TOKENEXTRA arg from pick_sb_modes function." 2013-07-11 11:46:06 -07:00
Johann
158c80cbb0 convolve8 optimizations for neon
Independent horizontal and vertical implementations.

Requires that blocks be built from 4x4 and [xy]_step_q4 == 16

6-10% improvement. CIF improved the least.

Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda
2013-07-11 11:08:19 -07:00
hkuang
c9b25dcae4 Add neon optimize vp9_dc_only_idct_add.
Change-Id: Iae84ab945cc9662a0ddd839aa2b9ca59f2ae5423
2013-07-11 10:30:47 -07:00
Jingning Han
b9381b6faf Remove unnecessary tx_type branch in encode_block
The function encode_block is called only by inter-prediction modes,
hence removing the transform type branching there.

Change-Id: I34a3172e28ce2388835efd0f8781922211bff857
2013-07-11 09:11:35 -07:00
Scott LaVarnway
f2a6bcfb18 Eliminated prev_mip memsets/memcpys in encoder
This patch is in experimental but was not merged into master.

This patch swaps ptrs instead of copying and uses the
last show_frame flag instead of setting the entire buffer
to zero.

Change-Id: Ia0950466c8ba301a2a5bf917ff3d07bc1a2c2311
2013-07-11 10:47:28 -04:00
Jim Bankoski
5000cdf0ff Merge "Wide loopfilter 16 pix at a time" 2013-07-11 06:44:02 -07:00
Paul Wilkins
5290eeab88 Speed 2 feature adjustment.
With sf->auto_mv_step_size on it is questionable
whether sf->reduce_first_step_size is worthwhile.
At speed 2 it was not having a big impact.

Even at speed 2 sf->optimize_coefficients = 0 is not
having a big speed imapct so for now I have moved it
down into a higher speed setting.

Change-Id: I8a54de76d486ad37aabce76474889da2768b14c1
2013-07-11 13:59:12 +01:00
Jingning Han
49b6302044 SSE2 4x4 invserse ADST/DCT transform
Enable SSE2 4x4 inverse ADST/DCT transform. The runtime goes from
292 cycles down to 89 cycles. Running bus_cif at 2000 kbps, the
overall runtime of speed 0 goes from 301s to 295s (2% speed-up).

Change-Id: I24098136e7fee7ab2fbf1c11755bdf2ca37f3628
2013-07-10 20:16:02 -07:00
Jingning Han
aedc7c59b1 Merge "Fix tx_type bug in intra4x4 rd loop" 2013-07-10 20:13:25 -07:00
Ronald S. Bultje
decead7336 Replace copy_memNxM functions with a generic copy/avg function.
Change-Id: I3ce849452ed4f08527de9565a9914d5ee36170aa
2013-07-10 18:27:24 -07:00
Ronald S. Bultje
c13e0bcb52 Remove unused fwalsh/fdct x86 SIMD implementations.
Change-Id: Ia942e56cf322821d42ba06178672791eeee2847e
2013-07-10 18:22:51 -07:00
Dmitry Kovalev
ac72ad071d Making vp9_default_nmv_context static.
Change-Id: Ia3d5bd45adf288de11ab59c4728266c93c17e275
2013-07-10 17:44:45 -07:00
Ronald S. Bultje
46997bde88 Merge "Remove unused iwalsh4x4 MMX/SSE2 functions." 2013-07-10 17:08:46 -07:00
Ronald S. Bultje
a7ef456453 Merge "Remove unused 16x3/3x16 sad SSE2 functions." 2013-07-10 17:08:43 -07:00
John Koleszar
64f7a4d8cb Wide loopfilter 16 pix at a time
Where possible, do the 16 pixel wide filter while doing the horizontal
filtering pass. The same approach can be taken for the mbloop_filter
when that's implemented. Doing so on the vertical pass is a little more
involved, but possible.

Change-Id: I010cb505e623464247ae8f67fa25a0cdac091320
2013-07-10 16:32:44 -07:00
Dmitry Kovalev
544d8c3316 Removing unused TOKENEXTRA arg from pick_sb_modes function.
Change-Id: I0543e72fa092eef3976b65e16bb597197c364873
2013-07-10 15:57:28 -07:00
Jingning Han
18803f9cc4 Fix tx_type bug in intra4x4 rd loop
This commit fixed the mis-use of the tx_type for inverse transform
in intra4x4 rate-distortion optimization loop. It improves the
overall coding performance.

Change-Id: I7fe9953175b74890357dbcee33c138573766e980
2013-07-10 15:49:49 -07:00
Deb Mukherjee
7494bba66b Merge "Prunes out full-rd computation based on modeled rd" 2013-07-10 15:37:11 -07:00
Dmitry Kovalev
140447db5a Merge "Adding read_compressed_header function." 2013-07-10 15:11:08 -07:00
Dmitry Kovalev
0ac5e4dd58 Adding write_compressed_header function.
Change-Id: Ic5257fa8278e9b6297de230e4fd26a1e23ad2bb7
2013-07-10 15:08:34 -07:00
Jim Bankoski
68ef7a6b8a configure with internal stats not working
Change-Id: I5dea4570cb05df27a522abf6e7b695998654284a
2013-07-10 15:07:53 -07:00
Ronald S. Bultje
3f210f10eb Remove unused iwalsh4x4 MMX/SSE2 functions.
Change-Id: I2d22577911a37ed7d8c7e08cac20764842267652
2013-07-10 14:52:47 -07:00
Ronald S. Bultje
48c53233fd Remove unused 16x3/3x16 sad SSE2 functions.
Change-Id: I30a597c0cc366e34c9a3e2afe32d70e044f95ca4
2013-07-10 14:52:47 -07:00
Ronald S. Bultje
e6f955251f Merge "SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction." 2013-07-10 14:52:23 -07:00
Ronald S. Bultje
6a60249071 Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction." 2013-07-10 14:52:19 -07:00
Jim Bankoski
865ca76604 Merge "remove warnings when NDEBUG is set" 2013-07-10 14:39:39 -07:00
Jim Bankoski
6591cf2f7e remove warnings when NDEBUG is set
Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
2013-07-10 14:27:20 -07:00
Deb Mukherjee
53ff43adc3 Prunes out full-rd computation based on modeled rd
Adds a speed feature to eliminate full-rd computation if the modeled
rd or rd based on a different parameter in the same mode is already
a lot larger than the best rd yet.

Specifically, only search the sharp and smooth filters if the modeled
rd cost based on the  regular filter is within a certain factor of the
best rd cost so far. Also, skip full-rd computation of non splitmv
inter modes if the modeled rd cost based on pred error is within the
same factor of the best rd cost so far.

Also adds some enhancements in the rd search for splitmv mode to
speed things up by early breakouts. Negligible impact on performance.

Resuts on derfraw300:
psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
         breakout feature on.
speedup: 6% with splitmv enhancements, 20% with also residual breakout
         (tested on football sequence at 600 Kbps)

Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
2013-07-10 13:49:49 -07:00
Jingning Han
114423538f SSE2 16x16 ADST/DCT hybrid transform
This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2
operations. It reduces the runtime from 5433 cycles to 1621 cycles, at
no compression performance loss.

Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230
2013-07-10 12:14:53 -07:00
Dmitry Kovalev
417df1d42e Merge "Adding encode_tiles function to vp9_bitstream.c." 2013-07-10 11:43:50 -07:00
Yaowu Xu
e52eec490c Merge "Add a feature to reduce chrome intra mode search" 2013-07-10 11:35:47 -07:00
Scott LaVarnway
25b4909076 Merge "Bug fix: set frame_parallel_decoding_mode" 2013-07-10 11:09:30 -07:00
John Koleszar
d1f8dd518c Merge "Fix intermediate height in convolve" 2013-07-10 11:04:40 -07:00
Dmitry Kovalev
704afd0c7a Adding read_compressed_header function.
Splitting setup_txfm_mode into read_tx_mode and read_tx_probs.

Change-Id: I5b4fe48698d56490857d32eafcaeb4291f208479
2013-07-10 10:27:50 -07:00
Ronald S. Bultje
44b29a769c Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction." 2013-07-10 10:24:16 -07:00
Ronald S. Bultje
89810bfd71 Merge "SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction." 2013-07-10 10:13:16 -07:00
Ronald S. Bultje
4823307720 Merge "Remove memcpy() in handle_inter_mode() filter selection." 2013-07-10 10:13:07 -07:00
Dmitry Kovalev
20986c81b3 Merge "Removing vp9_maskingmv.c and corresponding assembly file." 2013-07-10 10:05:06 -07:00
Ronald S. Bultje
7fd643264a SSSE3 assembly for 4x4/8x8/16x16/32x32 H intra prediction.
Change-Id: Iad70966b986f65259329070e258f76ef0af816b4
2013-07-10 09:28:03 -07:00
Ronald S. Bultje
8dade638a1 SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 TM intra prediction.
Change-Id: I3441c059214c2956e8261331bbf521525a617a86
2013-07-10 09:28:03 -07:00
Ronald S. Bultje
75b33c68c7 SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 V intra prediction.
Change-Id: I55a6cfa2daba738cbc0c4a02f806893f7e556997
2013-07-10 09:28:03 -07:00
Ronald S. Bultje
92c5d3665d SSE/SSE2 assembly for 4x4/8x8/16x16/32x32 DC intra prediction.
Change-Id: Ibe1690afc5459f3b3beca401e7734fcd03da6dd0
2013-07-10 09:28:03 -07:00
Ronald S. Bultje
b1df674a99 Remove memcpy() in handle_inter_mode() filter selection.
Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83
2013-07-10 09:27:56 -07:00
Yaowu Xu
bed27a960a Add a feature to reduce chrome intra mode search
Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8
2013-07-10 08:59:18 -07:00
Jim Bankoski
863204e64d mi_width_log2 & mi_height_log2
converted to lookup to avoid unnecessary code

Change-Id: I2ee6a01f06984cc2c4ba74b3fffd215318f749d2
2013-07-10 07:26:08 -07:00
Jim Bankoski
6c8170af52 b_width_log2 and b_height_log2 lookups
Replace case statement with lookup.
    Small speed gain at low speed settings but at speed 2+ where the
    number of motion searches etc. falls the impact rises to ~3-4%.

    Change-Id: Idff639b7b302ee65e042b7bf836943ac0a06fad8

Change-Id: I5940719a4a161f8c26ac9a6753f1678494cec644
2013-07-10 07:19:09 -07:00
Jim Bankoski
fb027a7658 removing case statements around prediction entropy coding
Removes SEG_ID
Removes MBSKIP
Removes SWITCHABLE_INTERP
Removes INTRA_INTER
Removes COMP_INTER_INTER
Removes COMP_REF_P
Removes SINGLE_REF_P1
Removes SINGLE_REF_P2
Removes TX_SIZE

Change-Id: Ie4520ae1f65c8cac312432c0616cc80dea5bf34b
2013-07-09 20:10:16 -07:00
Yaowu Xu
059f2929e9 Merge "Revert "Remove memcpy() in handle_inter_mode() filter selection."" 2013-07-09 20:10:06 -07:00
James Zern
dac57fece6 Merge "Remove all asm offset files from VP9" 2013-07-09 19:13:37 -07:00
Dmitry Kovalev
2824048a56 Merge "Loop filter code cleanup." 2013-07-09 18:56:19 -07:00
Yaowu Xu
205efbc153 Revert "Remove memcpy() in handle_inter_mode() filter selection."
This reverts commit fcf7998a47.

Change-Id: Ic6532223faec9f1483b78adb2e37b79c7b1a0efb
2013-07-09 17:42:10 -07:00
Dmitry Kovalev
d82f459d1a Adding encode_tiles function to vp9_bitstream.c.
Change-Id: Ie44824ec25fd8fdb25d7c8124a9b28c26d802029
2013-07-09 15:59:19 -07:00
Frank Galligan
53971d86ea Merge "Add Neon horizontal and vertical vp9_mbloop_filter" 2013-07-09 15:38:44 -07:00
John Koleszar
f0d9f10d24 Remove all asm offset files from VP9
The files are empty and unused.

Change-Id: Ieb4242d14273efdf24149bda33f9591540bba06a
2013-07-09 14:26:53 -07:00
Scott LaVarnway
5900d13183 Merge "Removed unnecessary xd->mode_info_context assignment" 2013-07-09 12:45:32 -07:00
Frank Galligan
198fa6d0a0 Add Neon horizontal and vertical vp9_mbloop_filter
- The vp9 mbfilter C code will branch on flat and mask. This CL
  will perform both branches and combine the data. A later CL will
  perform a check to see if all patch will take one branch.
- These functions are about 1.75 times faster than the C code on
  Nexus 7.

PS #3
- Changed all functions to dub limit, blimit, and thresh from
  vld {dx[]}, freeing up r4-r6.
- Changed code to use vbif to reduce one instruction and free
  up a d register.

Change-Id: I028dae0e434dc9891c3677bdb182e201ffb04777
2013-07-09 12:40:05 -07:00
Dmitry Kovalev
ec68d25521 Merge "Adding update_tx_ct function, removing duplicated code." 2013-07-09 12:26:11 -07:00
Dmitry Kovalev
aeed28f143 Removing vp9_maskingmv.c and corresponding assembly file.
Change-Id: I9842d02d61d78d17dc3449bae8ffbe60f4b3ecb3
2013-07-09 11:22:56 -07:00
Dmitry Kovalev
92a9eaef50 Loop filter code cleanup.
Using MAX_LOOP_FILTER constant instead of number 63.

Change-Id: If91e0c198331b3041e7cd0707a5948479e9209d8
2013-07-09 11:18:09 -07:00
Scott LaVarnway
69d1d1d865 Removed unnecessary xd->mode_info_context assignment
mi is xd->mode_info_context

Change-Id: Ib101be922b695205ec57b5ce1828ba19bde5b41c
2013-07-09 13:41:34 -04:00
Ronald S. Bultje
204d1b7058 Merge "Unbreak lossless." 2013-07-09 09:54:48 -07:00
Ronald S. Bultje
d8fa5d45cc Merge "Make intra prediction pointers RTCD-based." 2013-07-09 09:54:43 -07:00
Ronald S. Bultje
059c0ba5d4 Unbreak lossless.
Change-Id: I8130ec9b5371c65e885f245a5ac73840c23cb4a1
2013-07-09 09:46:37 -07:00
Jim Bankoski
7f960223ea cleanup read_mode_info if (1)
Change-Id: I851af23c787a2d3637d84244b9f75063cbf782f1
2013-07-09 09:04:45 -07:00
Jim Bankoski
c36d502e92 decoder speedup - get-segment-id only if segmentation enabled
Change-Id: I9355f8446660aeb7dfdbc5ee56635c791ac35e95
2013-07-09 08:52:30 -07:00
Yaowu Xu
df5731273f Merge "Fix loopfilter bug" 2013-07-09 01:34:25 -07:00
Dmitry Kovalev
c6c279aff0 Merge "Using mi_cols instead of mb_cols." 2013-07-08 20:09:19 -07:00
Dmitry Kovalev
1c65c580d6 Merge "Refactoring setup_pre_planes function." 2013-07-08 20:08:05 -07:00
Dmitry Kovalev
6254c8d780 Merge "Calling set_partition_seg_context() instead of code duplication." 2013-07-08 20:07:06 -07:00
Ronald S. Bultje
8350e7fe38 Make intra prediction pointers RTCD-based.
This probably has a mildly negative impact on performance, but will
(in future commits - or possibly merged with this one) allow SIMD
implementations of individual intra prediction functions. We may
perhaps want to consider having separate functions per txfm-size
also (i.e. 4x4, 8x8, 16x16 and 32x32 intra prediction functions for
each intra prediction mode), but I haven't played much with that
yet.

Change-Id: Ie739985eee0a3fcbb7aed29ee6910fdb653ea269
2013-07-08 17:25:51 -07:00
John Koleszar
527fc5caf6 Fix loopfilter bug
In the rare case were 4x4 interior filtering was called for but no
8x8 or larger filtering takes place, the previous code was skipping
the filtering. This patch fixes the issue by including the interior
mask in the overall mask for the filter application loops.

Change-Id: I4a0b65056c64f97478827c2ff41e0914fc7779d0
2013-07-08 16:49:57 -07:00
Ronald S. Bultje
a5062cc635 Don't call encode_sb() for the final of 4-split subpartitions.
The resulting reconstruction is never used, thus it just wastes CPU
cycles. Reduces encode time of first 50 frames of bus (speed 0) @
1500kbps from 2min2.0 to 2min1.2, i.e. a 0.65% overall speedup.

Change-Id: I74755ca3aadc21e2be220f486259060bd4088c45
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
bd867f1619 Inline vp9_get_mv_joint().
Encode time for first 50 frames of bus (speed 0) @ 1500kbps goes from
2min10.9 to 2min10.5, i.e. 0.3% faster overall, basically because we
prevent the call overhead.

Change-Id: I1eab1a95dd3eae282f9b866f1f0b3dcadff073d5
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
8fde07a3ae Don't recalculate mv_ref costs for each block/partition.
Changes cost_mv_ref() into doing a LUT into pre-calculated cost
arrays instead. Encode time of first 50 frames of bus (speed 0)
@ 1500kbps goes from 2min11.6 to 2min10.9, i.e. 0.5% faster overall.

Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
5a73254918 Remove unnecessary memset(best_index, 0) from trellis/optimize.
First 50 frames of bus @ 1500kbps (speed 0) goes from 2min12.6 to
2min11.6, i.e. 0.75% overall speedup.

Change-Id: I67054f8146e82a02b6457c51a1c8627a937e5e1e
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
fcf7998a47 Remove memcpy() in handle_inter_mode() filter selection.
Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0
2013-07-08 16:22:39 -07:00
Ronald S. Bultje
ed995afba1 Make frame-wide filter-type decision fully RD-based.
Overall, on all test sets, this gains about +0.2% on all metrics.
City is a clip where this really hurts (-1.0% on all metrics), I'm
not quite sure why yet. Maybe interesting to look into in the future.

Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
2013-07-08 16:22:37 -07:00
Dmitry Kovalev
b7559258a4 Using mi_cols instead of mb_cols.
Eliminating usage of mb-units, switching to mi-units. Adding
ALIGN_POWER_OF_TWO macro.

Change-Id: I2491c969f713207c062011878b57e4e531818607
2013-07-08 14:54:04 -07:00
Deb Mukherjee
d9b62160a0 Implements several heuristics to prune mode search
Skips mode searches for intra and compound inter modes depending
on the best mode so far and the reference frames. The various
heuristics to be used are selected by bits from a flag. The
previous direction based intra mode search pruning is also absorbed
in this framework.

Specifically the flags and their impact are:

1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
directional modes and TM_PRED if the best so far is
an inter mode)
derfraw300: -0.15%, 10% speedup

2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
mode search if the best so far is not one of the closest
hor/vert/diagonal directions.
derfraw300: -0.05%, about 9% speedup

3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
search if the best so far is an intra mode)
derfraw300: -0.06%, about 7-8% speedup

4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
if the best single ref inter mode does not have the same ref
as one of the two references being tested in the compound mode)
derfraw300: -0.56%, about 10% speedup

Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
2013-07-08 12:17:12 -07:00
Jingning Han
a38cf2658a Merge "Refactor SSE2 8x8 functional units" 2013-07-05 11:18:18 -07:00
Tero Rintaluoma
18303b1263 Fix intermediate height in convolve
intermediate_height for horizontal filtering must be at least 8
pixels to be able to do vertical filtering correctly. Currently
it can be less for small block and y_step_q4 sizes.

Change-Id: I2ee28b0591b2041c2fa9844d0ae2ff8a1a59cc21
2013-07-05 14:58:25 +03:00
Paul Wilkins
ef0ca2deaa Merge "Fix to comp_inter_joint_search_thresh feature." 2013-07-04 03:27:00 -07:00
Dmitry Kovalev
bfcef95c45 Adding update_tx_ct function, removing duplicated code.
Change-Id: I8882fe3cd247a5a8304ab8ab2ee9abdb92830133
2013-07-03 18:24:13 -07:00
Dmitry Kovalev
f72e072555 Refactoring setup_pre_planes function.
Removing set_refs, adding set_ref function.

Change-Id: I5635c478b106ae4e57d317f1c83d929644307e63
2013-07-03 17:42:01 -07:00
Dmitry Kovalev
2ce6b23473 Merge "Adding write_skip_coeff function." 2013-07-03 16:33:58 -07:00
Jingning Han
68172dbede Merge "Enable early termination in rd search" 2013-07-03 14:20:41 -07:00
Dmitry Kovalev
430bd0c94a Merge "Replacing 64 / MI_SIZE with MI_BLOCK_SIZE." 2013-07-03 14:16:02 -07:00
Dmitry Kovalev
dda1835dc6 Adding write_skip_coeff function.
Change-Id: I221126f22ab9067348eb0efb8a73b15a8f49c3fd
2013-07-03 13:23:47 -07:00
Yaowu Xu
c86b18894e Merge "Inline a few intra predictors" 2013-07-03 13:21:22 -07:00
Jingning Han
2bd6fe08f8 Enable early termination in rd search
This commit allows encoder to detect the cumulative rate-distortion
cost per transformed block inside a partition. If the cumulative
rd cost is already above the best rd value, it terminates the rest
operations and continue to next prediction mode test.

It reduces the runtime of bus at target bit-rate 2000 from 308 second
to 266 second, i.e., about 13% speed-up at no performance penalty.

Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a
2013-07-03 12:54:18 -07:00
Dmitry Kovalev
2ad62c9312 Calling set_partition_seg_context() instead of code duplication.
Change-Id: I65be6acc54c99688fd1f0c946cec3511514b8555
2013-07-03 11:15:58 -07:00
Dmitry Kovalev
5a21de8418 Replacing 64 / MI_SIZE with MI_BLOCK_SIZE.
Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c
2013-07-03 10:54:50 -07:00
Dmitry Kovalev
60198a595d Merge "Adding write_selected_txfm_size function." 2013-07-03 10:33:55 -07:00
Yaowu Xu
0f02dc2709 Inline a few intra predictors
Change-Id: Ib41f0643fdcc088500e7420708f4e72f1f64c710
2013-07-03 10:20:41 -07:00
Jingning Han
2cb75c9607 Refactor SSE2 8x8 functional units
These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT
hybrid transform coding.

Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d
2013-07-03 10:11:59 -07:00
Ronald S. Bultje
61fe678f36 Merge "Use pmovmskb to skip quantize loops over empty coefficients." 2013-07-03 09:05:48 -07:00
Ronald S. Bultje
98c493a1c0 Merge "Remove unused function vp9_build_inter4x4_predictors_mbuv()." 2013-07-03 09:05:20 -07:00
Paul Wilkins
f58b44ad62 Fix to comp_inter_joint_search_thresh feature.
When this is 0 (BLOCK_SIZE_AB4X4) we want to do
the inter joint search for all sizes.

Change-Id: Id40cd6fe7790e7e1165352b9cef5e12fa8c0bc88
2013-07-03 16:58:34 +01:00
Paul Wilkins
72c5778ec5 Added two new skip experiments.
sf->unused_mode_skip_lvl. Tests modes as normal for all
sizes at or below the given level. At larger sizes it skips
all modes that were not chosen at any smaller size.
Hence setting BLOCK_SIZE_SB64X64 is in effect off.
Setting BLOCK_SIZE_AB4X4 will only consider modes that
were chosen for one or more 4x4 blocks at larger sizes.

sf->reference_masking.
Do a test encode of the NONE partition at one size and create
a reference frame mask based on the best rd choice. In the
full search only allow this reference frame.
Currently it is testing 64x64 and repeats this in the full search.
This does not work well with Jim's Partition code just now and
is disabled by default.

Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd
2013-07-03 16:56:06 +01:00
Scott LaVarnway
5e6b333902 Bug fix: set frame_parallel_decoding_mode
This patch allows the frame_parallel_decoding_mode flag
to be set instead of returning a codec error.

Change-Id: I4a1631c625723ac8873290d0fd0211074a87d112
2013-07-03 10:25:29 -04:00
Paul Wilkins
b0a2871c35 Merge "Adjust Speed 0 settings." 2013-07-03 02:47:18 -07:00
Dmitry Kovalev
1f6e95e76a Merge "Removing redundant struct from union b_mode_info." 2013-07-02 18:09:31 -07:00
Dmitry Kovalev
be77f6bbbf Removing redundant struct from union b_mode_info.
Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
2013-07-02 16:51:57 -07:00
Dmitry Kovalev
edb060a77c Adding write_selected_txfm_size function.
Change-Id: I143b430b7c24a964ccd0ebb75944cf317a072214
2013-07-02 16:41:22 -07:00
Yaowu Xu
0d7b7c09cb Added a speed feature use_square_partition_only
This commit adds a speed feature where only squared partition are
evaluated in partition picking. Enable this feature in cpu-used 2
reduces encoding time by ~30%.

loss of compression:
-0.9% on cif set
-1.23% on stdhd

Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9
2013-07-02 16:40:15 -07:00
Ronald S. Bultje
e5fb4b61b6 Use pmovmskb to skip quantize loops over empty coefficients.
If none of the 16 coefficients that we quantize per loop iteration
are larger than the zbin, directly skip to the next round of coeffs,
rather than doing a full quantize loop that will eventually result
in 16 zeroes. This incurs a jump cost, but saves a lot of other work.
32x32 quant goes from 1349 -> 1184 cycles. The same approach yielded
no significantly positive results for smaller transforms, so is not
used there (8x8: 103 -> 101 cycles; 16x16: 302 -> 306 cycles).

Change-Id: I8fca17dc2543fc8eed1dbcd5100145e3c3a9b647
2013-07-02 16:34:24 -07:00
Ronald S. Bultje
5b87240230 Remove unused function vp9_build_inter4x4_predictors_mbuv().
Change-Id: Ibfd2def2c088f4bc541a1de25990d73480b53d4b
2013-07-02 16:34:24 -07:00
Deb Mukherjee
37501d687c Speed feature to binary search dir intramodes
This speed feature will skip searching the directional intra prediction
modes D63, D117, D27, D153 if the best intra mode so far is not one of
the diagonal, horizontal or vertical directions closest to the respective
directions being tested. In other words, this implements a sort of
binary search in the angular domain.

Speedup: about 9-10%
Results: -0.05% only on derfraw300.

Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
2013-07-02 14:07:19 -07:00
Deb Mukherjee
66324d501a Merge "Clean-up in forward update to use mapping tables" 2013-07-02 14:02:53 -07:00
Deb Mukherjee
8d3d2b76f3 Tx size selection enhancements
(1) Refines the modeling function and uses that to add some speed
features. Specifically, intead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)

(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However the recommendation is to remove it eventually since the
benefit is limited, and will remove a lot of complications in
the code

(3) Finally a bug is fixed with the existing use_largest_txfm speed feature
that causes mismatches when the lossless mode and 4x4 WH transform is
forced.

Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder) .
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).

Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
2013-07-02 13:54:00 -07:00
Deb Mukherjee
9c20cedd93 Clean-up in forward update to use mapping tables
Uses mapping tables instead of complicated modulo/division
operations for prob mapping for forward updates.

No bit-stream or output change.

Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546
2013-07-02 12:48:20 -07:00
Dmitry Kovalev
904070ca64 Merge "Removing unused implicit segmentation code." 2013-07-02 11:58:48 -07:00
Ronald S. Bultje
3cc6eb7c00 Merge "Make get_coef_context() branchless." 2013-07-02 11:48:15 -07:00
Dmitry Kovalev
3140c443e4 Merge "Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h." 2013-07-02 11:31:35 -07:00
Dmitry Kovalev
18fd43601c Merge "Additional vp9_decodemv.c cleanup." 2013-07-02 11:31:23 -07:00
Dmitry Kovalev
a3d2e6c98b Removing unused implicit segmentation code.
Change-Id: I8a2983fb14274a6ac53681fa4cd5d4209cbd2905
2013-07-02 11:16:42 -07:00
Yunqing Wang
f4bee75c2b Merge "Add speed feature to disable splitmv" 2013-07-02 10:54:22 -07:00
Yunqing Wang
b12e060b55 Add speed feature to disable splitmv
Added a speed feature in speed 1 to disable splitmv for HD (>=720)
clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim
loss. Encoding speedup is 36%.

(For reference: The test result on derf set showed 2% psnr loss
and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be
enabled for small resolution videos.)

Change-Id: I54f72b94f506c6d404b47c42e71acaa5374d6ee6
2013-07-02 10:02:34 -07:00
Jingning Han
b91a1586a3 Calculate rd cost per transformed block
Compute the rate-distortion cost per transformed block, and cumulate
the cost through all blocks inside a partition. This allows encoder
to detect if the cumulative rd cost is already above the best rd cost,
thereby enabling early termination in the rate-distortion optimization
search.

Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac
2013-07-02 09:58:46 -07:00
Ronald S. Bultje
9df24b41ca Merge "Update quantize SSSE3 SIMD to cover 32x32 transform case also." 2013-07-02 09:38:08 -07:00
Paul Wilkins
1319d9c077 Adjust Speed 0 settings.
Remove the use of sf->comp_inter_joint_search_thresh
from the baseline speed 0. Approx +0.4% on derf.

Change-Id: Icc14db98909830f40e5ac66130d40e78d2e55c71
2013-07-02 15:42:14 +01:00
Paul Wilkins
b7cd01ed73 Revert "New motion threshold factor - speed feature."
This reverts commit 1377278180.
Also fixes a spelling mistake.

Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f
2013-07-02 15:06:40 +01:00
Yaowu Xu
9e408e3504 fix the mismatch again in cpu_used 2
Change-Id: Icc4f70f0b0f91c9e7d5d00eedd67841afe2f2679
2013-07-01 19:13:18 -07:00
Jim Bankoski
d4158283e7 use partitioning from last frame
This cl converts use partition from last frame to do the following:

if part is none,horz, vert -> try split
if part != none and one of the children is not split - try none


Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87
Signed-off-by: Jim Bankoski <jimbankoski@google.com>
2013-07-01 18:18:50 -07:00
Dmitry Kovalev
1ac0540296 Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h.
Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c
2013-07-01 17:28:08 -07:00
Ronald S. Bultje
26b6318de8 Make get_coef_context() branchless.
This should significantly speedup cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.

Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.

Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
2013-07-01 16:34:10 -07:00
Dmitry Kovalev
7ed754995d Additional vp9_decodemv.c cleanup.
Change-Id: I5b413bc0884af0bda38c05332d86490103905b3b
2013-07-01 16:14:13 -07:00
Yaowu Xu
ba3b2604f0 Merge "Quantize (64-bit only, for now) SSSE3 SIMD." 2013-07-01 15:58:57 -07:00
Dmitry Kovalev
6411228aca Merge "Removing vp9_modecont.{h, c}." 2013-07-01 14:58:48 -07:00
Dmitry Kovalev
d9db0d96ec Merge "Moving encoder subexp encoding functions to subexp.{h, c}." 2013-07-01 14:58:36 -07:00
Dmitry Kovalev
a4e14d7f9f Merge "Adding vp9_rb_read_signed_literal function." 2013-07-01 14:58:20 -07:00