Improve the readability of the rate-distortion optimization search
control function for sub8x8 blocks.
Change-Id: I7f7456bf40a98aa5146abfe0488cda745b84d899
This commit makes sub8x8 blocks use their nearest neighbor's
motion vector as the predicted motion vector for NEWMV mode. It
improves the coding performance by 0.12%.
Change-Id: I99e56715b327573ce7e8a26e3515a4984dadfd98
Moved the API patch from NextGen to NextGenv2 and also added this
API to VP10. An example is included. To try it, run the following
command:
$ examples/vpx_cx_set_ref vp10 352 288 in.yuv out.ivf 4 30
Change-Id: Ib56bc3d365e530cfc8d859a13ddbf4c007907b81
This patch fixes 2 issues in Palette mode:
1. More memory is needed in PALETTE_BUFFER for 444 video format.
2. A merge issue caused by
https://chromium-review.googlesource.com/#/c/333940/7
Change-Id: I2aedc7dfdfb6b66fbd600189ec6e1e2cc6120d40
- Wrote functions fidtx8_sse2() and fidtx16_sse2().
- Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for new types.
- Updated 8x8/16x16 unit tests for accuracy/speed.
- Running 20K times with random numbers, cycling through tx
types from V_DCT to H_FLIPADST, SSE2 speed improvement:
8x8: ~131%
16x16: ~66%
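A speed comparison of this kind can be approximated with a small harness. The following is a minimal Python sketch, not the project's unit test: `benchmark` and `improvement_percent` are hypothetical helpers, and the formula `(t_ref / t_opt - 1) * 100` is an assumed definition of "speed improvement" that the commit itself does not state.

```python
import random
import time


def improvement_percent(t_ref, t_opt):
    """Percent by which the optimized routine is faster than the reference.

    Assumed definition (not stated in the commit): (t_ref / t_opt - 1) * 100.
    """
    return (t_ref / t_opt - 1.0) * 100.0


def benchmark(func, size, runs=20000, seed=0):
    """Return the wall-clock time to run `func` on `runs` random blocks.

    Each block is a flat list of size * size random integers, generated
    before the timer starts so only the calls to `func` are measured.
    """
    rng = random.Random(seed)
    blocks = [[rng.randint(-255, 255) for _ in range(size * size)]
              for _ in range(runs)]
    start = time.perf_counter()
    for block in blocks:
        func(block)
    return time.perf_counter() - start
```

Under this assumed definition, an improvement of ~131% would mean the reference takes about 2.31 times as long as the optimized routine.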
Change-Id: Ibbb707e932a08fec3b1f423a7dab280a1d696c9a
Skip the OBMC check when the regular inter predictor performs poorly
(that is, the rd-cost for its Y residual is greater than the total rd
of the best mode so far).
Performance change compared to full rd search:
+0.006% lowres, -0.056% midres
Encoding time:
1.14X baseline (was 1.42X)
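The early-termination rule described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the encoder's code: `should_check_obmc`, `search_modes`, and the `evaluate_obmc` callback are hypothetical names, and the rd values are plain numbers.

```python
def should_check_obmc(y_residual_rd, best_rd_so_far):
    """Return True only if the regular inter predictor is still competitive.

    OBMC is skipped when the Y-residual rd-cost alone already exceeds the
    total rd of the best mode found so far.
    """
    return y_residual_rd <= best_rd_so_far


def search_modes(candidates, evaluate_obmc):
    """Run a toy mode search and count how often OBMC is evaluated.

    candidates: list of (y_residual_rd, total_rd) for the regular predictor.
    evaluate_obmc: hypothetical callback mapping a candidate index to the
        total rd obtained with OBMC.
    """
    best_rd = float("inf")
    obmc_checks = 0
    for index, (y_rd, total_rd) in enumerate(candidates):
        if should_check_obmc(y_rd, best_rd):
            obmc_checks += 1
            total_rd = min(total_rd, evaluate_obmc(index))
        best_rd = min(best_rd, total_rd)
    return best_rd, obmc_checks
```

The saving comes from the skipped `evaluate_obmc` calls; the small quality change reported above is the cost of occasionally skipping a candidate that OBMC would have improved.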
Change-Id: I11350f955a20e1a2331be458537a915e09fbedf3
After porting tile coding from VP9 to VP10, some performance
degradation was seen because of the difference between the VP9 and
VP10 baselines. This patch disables some features in VP10 while
tile coding is turned on. Also, an encoder control API is added
back for this use case.
Change-Id: I8f736db8388408a8cc35320a2f80abb02906571c
Skip the filtered intra mode search in inter frames when DC mode is
worse than the best mode so far.
With ext-intra enabled, the overall speed is increased by 20-40%;
the performance drop is 0.03% on lowres and 0.05% on midres.
Change-Id: I75d2503b067cf5e46e3533b97fb01497e125baa7
- Added function fidtx4_sse2().
- Turned on vp10_fht4x4_sse2() for these tx types.
- Updated 4x4 unit test for speed/accuracy.
- 4x4 Unit test passed.
- Running 20K times with random numbers for tx types from
V_DCT to H_FLIPADST, SSE2 against C, speed improves by ~46%.
Change-Id: I828088b7f98dc0f5939a72e3fcd6cb0b8d8dd8bf
If configured with --enable-ext-tile, the codec uses an alternative
tile coding syntax in the bitstream. Changes include:
- The maximum number of tile rows and columns is extended to 1024
each.
- The minimum tile width/height is 64 pixels (1 superblock).
- A tile copy mode is added where a tile directly reuses the coded
data of a previous tile.
- The tile-columns and tile-rows codec parameters are overloaded
to mean tile-width and tile-height in units of 64 pixels.
- All tiles should now be independent, including rows within the
same column, so large-scale parallel or independent decoding is
possible.
- vpxdec also gains options to decode only a particular tile,
tile row, or tile column.
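The overloaded tile parameters above can be illustrated with a short calculation. This is a hedged Python sketch, not the codec's implementation: `tile_grid` is a hypothetical helper, and it assumes that a partial tile at the right or bottom edge counts as a tile and that exceeding the 1024 limit is an error.

```python
SUPERBLOCK_SIZE = 64  # pixels; the minimum tile width/height
MAX_TILES = 1024      # maximum number of tile rows and of tile columns


def tile_grid(frame_width, frame_height, tile_width_sb, tile_height_sb):
    """Return (tile_columns, tile_rows) for a frame.

    tile_width_sb and tile_height_sb are the tile dimensions in units of
    64 pixels, which is how the tile-columns and tile-rows parameters are
    interpreted with --enable-ext-tile.
    """
    if tile_width_sb < 1 or tile_height_sb < 1:
        raise ValueError("tile size must be at least 1 superblock")
    tile_width = tile_width_sb * SUPERBLOCK_SIZE
    tile_height = tile_height_sb * SUPERBLOCK_SIZE
    # Ceiling division: a partial edge tile still counts (assumption).
    cols = -(-frame_width // tile_width)
    rows = -(-frame_height // tile_height)
    if cols > MAX_TILES or rows > MAX_TILES:
        raise ValueError("too many tile rows or columns")
    return cols, rows
```

For a 1920x1080 frame with 1-superblock tiles, this sketch yields 30 tile columns and 17 tile rows.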
Changes without --enable-ext-tile:
- All tiles should now be independent, including rows within the
same column, so large-scale parallel or independent decoding is
possible.
- vpxenc default tile configuration changed to use 1 tile column.
Change-Id: I0cd08ad550967ac18622dae5e98ad23d581cb33e
- Use Makefile to control the build for highbd_fwd_txfm_sse4.c.
- Fixed hybrid transform (HT) types following a recent update.
- Added new unit test cases for highbd HT.
Change-Id: Ifd768a9b429a8c21ed40c1de8152fb5ac71e2f90
This commit separates the predicted motion vector from the nearestmv
motion vector in the coding process for both regular and sub8x8
block sizes.
Change-Id: I703490513b0194e6669ebf719352db015facb3e1
- Set up function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
intrinsics optimization.
- Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
and fdct4x4_sse4_1().
- Used logical right shift to avoid coeff memory write/read.
- Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
- Improved overall encoding performance by >2.3% on a 50-frame
sequence, park_joy_1080p_12.y4m, encoded with --input-bit-depth=12
and --bit-depth=12.
- Unit test passed.
Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004
This has been ported under ext_partition_types because it is due
to be combined with the coding_unit_size experiment which is
already being ported under ext_partition.
Change-Id: I47af869ae123ddf0aa99160dac644059d14266ee