- For experiment EXT_INTERP under high bit depth.
- Add unit test to verify bit-exact results.
- Speed performance improvement:
On Xeon E5-2680, park_joy_1080p_12.y4m, 50 frames, encoding time
drops from 6682503 ms to 5390270 ms.
Change-Id: Iea4debf5414f3accf1eb5672abeab56a0539ac77
- Fix the over-writing bug in horizontal filtering when width = 2.
- Fix 10-tap vertical filtering so that it no longer reads one row of
pixels above the block.
- Fix 10-tap filter zero padding.
- Encoder speed slows down ~4.0% compared to
81ad953 Convolution vertical filter SSSE3 optimization.
Change-Id: I9bb294a4529300081c29bf284e6bc6eb081cc536
- Apply 8-pixel vertical filtering direction parallelism.
- Add unit tests to verify bit-exact results.
- Encoder speed improves ~29% (with EXT_INTERP enabled) on Xeon E5-2680.
- Combinational cycle count of vp10_convolve() drops from 26.06%
to 6.73%.
Change-Id: Ic1ae48f8fb1909991577947a8c00d07832737e57
- Apply signal direction/4-pixel vertical/8-pixel vertical
parallelism.
- Add unit test to verify the bit-exact result.
- Overall encoding time improves ~24% on Xeon E5-2680 CPU.
Change-Id: I104dcbfd43451476fee1f94cd16ca5f965878e59
- Confirm input coeff buffer is 16-byte aligned.
- Prefer sizeof(variable) over sizeof(type).
- Fix function names to use Pascal case (capitalized first letter).
- Put a long base class name on a new line (starting with the colon,
indented 4 spaces).
- Remove an unnecessary reference function variable.
- Declare methods before member variables in the class definition.
Change-Id: I317f7e679926b5219f58c5f7d14512e94985e7fe
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoder overall instruction count drops 2.91%.
- Decoder overall instruction count drops 1.01%.
- Add unit test to verify bit-exact results against C.
Change-Id: I908c9e0e5106c58f67dd72d28760e6c9ce54278e
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Update vp10_fht16x16_test.cc to do bit-exact test against
latest C version.
- HBD encoder speed improves ~1.8%.
Change-Id: Icfc799a212e5289bcf6cedcae3722032133a2bc6
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Update bit-exact unit test against current C version.
- HBD encoder speed improves ~3.8%.
Change-Id: Ie13925ba11214eef2b5326814940638507bf68ec
- Optimization on tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Overall encoder speed improves ~4.5%-6%.
- Update bit-exact unit test against current C version.
Change-Id: If751c030612245b1c2470200c9570cf40d655504
Unit test shows the manually developed SSE4.1 code would perform ~30%
better if the TXFM_2D_CFG configuration is set at a lower level. This
change only updates the function signature; there is no performance
impact.
Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b
When using FLIPADST, the vp10_inv_txfm_add functions used to flip
the destination array, add the result of the inverse transform to it,
and then flip the destination back. This has been replaced by
flipping the result of the inverse transform before adding it to the
destination. Up-down flipping is done by negating the destination
stride and starting from the bottom row, so it should now be free.
Left-right flipping is done with the usual SSE2 instructions in the
optimized code.
The C functions are expected to match the SSE2 functions, so the C
functions now also do the flipping when required. Adding this cleanly
required some refactoring of the C functions, but there is no measurable
performance impact when ext-tx is not enabled.
Encode speedup with ext-tx enabled is about 3%.
Change-Id: I5b04e5d720f0b9f0d54fd8607a8764f2314c7234
1) copy the following files from vpx_dsp/ to vp10/common/:
vp10_inv_txfm.c
vp10_inv_txfm.h
vp10_inv_txfm_sse2.c
vp10_inv_txfm_sse2.h
2) change the function prefix "vpx_" to "vp10_" in the above files
3) add unit test in vp10_inv_txfm_test.cc
Change-Id: I206f10f60c8b27d872c84b7482c3bb1d1cb4b913