1013 Commits

Author SHA1 Message Date
Dmitry Kovalev
430bd0c94a Merge "Replacing 64 / MI_SIZE with MI_BLOCK_SIZE." 2013-07-03 14:16:02 -07:00
Dmitry Kovalev
5a21de8418 Replacing 64 / MI_SIZE with MI_BLOCK_SIZE.
Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c
2013-07-03 10:54:50 -07:00
Yaowu Xu
0f02dc2709 Inline a few intra predictors
Change-Id: Ib41f0643fdcc088500e7420708f4e72f1f64c710
2013-07-03 10:20:41 -07:00
Ronald S. Bultje
98c493a1c0 Merge "Remove unused function vp9_build_inter4x4_predictors_mbuv()." 2013-07-03 09:05:20 -07:00
Dmitry Kovalev
be77f6bbbf Removing redundant struct from union b_mode_info.
Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
2013-07-02 16:51:57 -07:00
Ronald S. Bultje
5b87240230 Remove unused function vp9_build_inter4x4_predictors_mbuv().
Change-Id: Ibfd2def2c088f4bc541a1de25990d73480b53d4b
2013-07-02 16:34:24 -07:00
Deb Mukherjee
8d3d2b76f3 Tx size selection enhancements
(1) Refines the modeling function and uses that to add some speed
features. Specifically, intead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)

(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However the recommendation is to remove it eventually since the
benefit is limited, and will remove a lot of complications in
the code

(3) Finally a bug is fixed with the existing use_largest_txfm speed feature
that causes mismatches when the lossless mode and 4x4 WH transform is
forced.

Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder) .
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).

Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
2013-07-02 13:54:00 -07:00
Dmitry Kovalev
904070ca64 Merge "Removing unused implicit segmentation code." 2013-07-02 11:58:48 -07:00
Ronald S. Bultje
3cc6eb7c00 Merge "Make get_coef_context() branchless." 2013-07-02 11:48:15 -07:00
Dmitry Kovalev
3140c443e4 Merge "Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h." 2013-07-02 11:31:35 -07:00
Dmitry Kovalev
a3d2e6c98b Removing unused implicit segmentation code.
Change-Id: I8a2983fb14274a6ac53681fa4cd5d4209cbd2905
2013-07-02 11:16:42 -07:00
Ronald S. Bultje
9df24b41ca Merge "Update quantize SSSE3 SIMD to cover 32x32 transform case also." 2013-07-02 09:38:08 -07:00
Dmitry Kovalev
1ac0540296 Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h.
Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c
2013-07-01 17:28:08 -07:00
Ronald S. Bultje
26b6318de8 Make get_coef_context() branchless.
This should significantly speedup cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.

Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.

Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
2013-07-01 16:34:10 -07:00
Yaowu Xu
ba3b2604f0 Merge "Quantize (64-bit only, for now) SSSE3 SIMD." 2013-07-01 15:58:57 -07:00
Ronald S. Bultje
c8defcfdee Update quantize SSSE3 SIMD to cover 32x32 transform case also.
Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to
2min10.1, i.e. a 2.3% overall speed increase.

Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
2013-07-01 11:36:33 -07:00
Ronald S. Bultje
7353ceab9d Quantize (64-bit only, for now) SSSE3 SIMD.
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
2013-07-01 11:36:07 -07:00
Dmitry Kovalev
2ab3bc8871 Removing vp9_modecont.{h, c}.
Moving vp9_default_inter_mode_probs array to vp9_entropymode.c.

Change-Id: I88ebda86ccc07f2a43c6c01d4b37898214cfb6de
2013-07-01 10:17:15 -07:00
Jingning Han
993942ce0c Merge "Enable SSE2 4x4 ADST/DCT transform" 2013-06-29 15:57:04 -07:00
Christian Duvivier
466e0cf303 SSE2 version of vp9_short_fdct32x32_rd.
43,000 -> 5,750 cycles, about 7.5x faster.

Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0
2013-06-29 13:53:00 -07:00
Johann
6098e359f4 Merge "add Neon optimized add constant residual functions" 2013-06-28 19:50:38 -07:00
James Zern
84d08fa9c4 Merge "fix test compile error" 2013-06-28 19:48:05 -07:00
Ronald S. Bultje
a487af8d35 Merge "Inline vp9_get_coef_context() (and remove vp9_ prefix)." 2013-06-28 19:37:11 -07:00
chm
a83cfd4da1 add Neon optimized add constant residual functions
- Add add_constant_residual_8x8 16x16 32x32 functions
- Tested under RealView debugger enviroment

Change-Id: I5c3a432f651b49bf375de6496353706a33e3e68e
2013-06-28 19:06:51 -07:00
James Zern
a63e31e81e fix test compile error
since:
92479d9 Make update_partition_context faster

fixes:
vp9/common/vp9_blockd.h:408:22: error:
non-constant-expression cannot be narrowed from type 'int' to 'char' in
initializer list [-Wc++11-narrowing]
  char pcvalue[2] = {~(0xe << boffset), ~(0xf <<boffset)};
                     ^~~~~~~~~~~~~~~~~

Change-Id: Id5b00b9a72d00a2b314081a23879bd1fa3ce983b
2013-06-28 18:07:37 -07:00
Jingning Han
1109b6b888 Enable SSE2 4x4 ADST/DCT transform
This commit enables SSE2 4x4 foward hybrid transform. The runtime
goes from 249 cycles down to 74 cycles. Overall around 2% speed-up
at no compression performance change.

Change-Id: Iad4d526346e05c7be896466c05500711bb763660
2013-06-28 17:24:43 -07:00
Dmitry Kovalev
228b8232d3 Cosmetic reordering of FRAME_CONTEXT members.
Change-Id: Id641e5188adf55e53e606e5813ae45feaf7abbd2
2013-06-28 16:16:03 -07:00
Dmitry Kovalev
59070f6e3c Merge "Removing CONFIG_DEBUG checks on assertions." 2013-06-28 14:03:28 -07:00
Ronald S. Bultje
ec5d09b950 Merge "Make coefficient skip condition an explicit RD choice." 2013-06-28 11:54:28 -07:00
Ronald S. Bultje
d00b8e5f82 Inline vp9_get_coef_context() (and remove vp9_ prefix).
Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles

Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.

Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
2013-06-28 10:40:21 -07:00
Dmitry Kovalev
0345fc3ad9 Merge "Decoder's code cleanup." 2013-06-28 10:38:54 -07:00
Dmitry Kovalev
8e6ce6bb9e Removing CONFIG_DEBUG checks on assertions.
Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
ones from vp9_onyx_int.h and vp9_onyxd_int.h.

Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
2013-06-28 10:36:20 -07:00
Ronald S. Bultje
af660715c0 Make coefficient skip condition an explicit RD choice.
This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.

The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.

Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.

Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
2013-06-28 10:28:49 -07:00
Dmitry Kovalev
3231da0a9e Decoder's code cleanup.
Using vp9_set_pred_flag function instead of custom code, adding
decode_tokens function which is now called from decode_atom,
decode_sb_intra, and decode_sb.

Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8
2013-06-27 16:15:43 -07:00
Frank Galligan
1d6dc1b702 Add Neon optimized loop filter functions.
- Added vp9_loop_filter_horizontal_edge_neon and
  vp9_loop_filter_vertical_edge_neon.
- The functions are based off the vp8 loopfilter
  functions.
- Matches x86 md5 checksum.

Change-Id: Id1c4dddb03584227e5ecd29f574a6ac27738fdd0
2013-06-27 16:14:45 -07:00
Dmitry Kovalev
a3664258c5 Merge "General cleanup in segmentation-related code." 2013-06-27 14:57:07 -07:00
Dmitry Kovalev
be83ef3104 Merge "Moving subexp encoding functions in separate vp9_dsubexp.c file." 2013-06-27 14:55:18 -07:00
Jingning Han
fc1cfd8e32 Merge "Make intra predictor reference buffer configurable" 2013-06-26 19:02:02 -07:00
Jingning Han
4c10515f89 Merge "Make update_partition_context faster" 2013-06-26 19:01:45 -07:00
Yaowu Xu
896dc47cac Merge "Change to use LUT for mode-to-txfm conversion" 2013-06-26 17:19:47 -07:00
Jingning Han
861cb06c67 Make intra predictor reference buffer configurable
This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.

Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
2013-06-26 17:17:21 -07:00
Jingning Han
92479d9526 Make update_partition_context faster
Use vpx_memset for updating the partition contexts. Thanks to Noah
for pointing out the need of refactoring in this part.

Change-Id: I67fb78429d632298f1cd8a0be346cc76f79392a6
2013-06-26 17:05:51 -07:00
Yaowu Xu
25fe05fd92 Change to use LUT for mode-to-txfm conversion
Change-Id: Ieb989830f49e6708ee7728eddebf7a2144c37c6f
2013-06-26 14:10:43 -07:00
Dmitry Kovalev
be07485e9a General cleanup in segmentation-related code.
Using consistent function and variable names.

Change-Id: I2deb3fded8797453a2081836c9ce2e79ade06eb7
2013-06-26 10:27:28 -07:00
John Koleszar
8137e24f3d Merge "Move vp9_counts_to_nmv_context to encoder" 2013-06-25 22:44:21 -07:00
John Koleszar
7bbb0633cd Merge "Move vp9_full_to_model_counts to encoder" 2013-06-25 22:44:16 -07:00
Jingning Han
3cc8c8c3a0 Merge "Refactor intra predictor block" 2013-06-25 19:46:55 -07:00
Jingning Han
d19ea3861d Refactor intra predictor block
Remove vp9_intra4x4_predict(). Use the common intra prediction
function for all block sizes.

Change-Id: Ibd19d51dfa3da8bbdfb79ddeb81530b2e2089560
2013-06-25 16:33:13 -07:00
Dmitry Kovalev
6fb10f2de4 Renaming "nmv" to "mv".
Change-Id: I8299f55c3b930221e52c2237f2ddea65b94fd33b
2013-06-25 15:19:18 -07:00
Ronald S. Bultje
c24d922396 Add averaging-SAD functions for 8-point comp-inter motion search.
Makes first 50 frames of bus @ 1500kbps encode from 3min22.7 to 3min18.2,
i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc
the variance of the averaging predictor. This is slightly suboptimal
because the function is subpixel-position-aware, but it will (at least
for the SSE2 version) not actually use a bilinear filter for a full-pixel
position, thus leading to approximately the same performance compared to
if we implemented an actual average-aware full-pixel variance function.
That gains another 0.3 seconds (i.e. encode time goes to 3min17.4), thus
leading to a total gain of 2.7%.

Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
2013-06-25 12:57:28 -07:00