Commit Graph

2735 Commits

Author SHA1 Message Date
James Zern
b644384bb5 Merge "vp9: fix high-bitdepth NEON build" 2015-04-01 23:36:17 -07:00
Yaowu Xu
54210f706c Merge "use MAX_MB_PLANE consistently" 2015-04-01 18:24:39 -07:00
Yaowu Xu
f26b8c84f8 use MAX_MB_PLANE consistently
Change-Id: Ic416a7f145001a88f5a7f70dde9b1edbc1b69381
2015-04-01 15:21:20 -07:00
Jingning Han
1470529f62 Refactor block_yrd function for RTC coding mode
This commit separates Hadamard transform/quantization operations
from rate and distortion computation in block_yrd. This allows one
to skip SATD computation when all transform blocks are quantized
to zero. It also uses a new block error function that skips
repeated computation of sum of squared residuals. It reduces the
CPU cycles spent on block error calculation in block_yrd by 40%.

Change-Id: I726acb2454b44af1c3bd95385abecac209959b10
2015-04-01 12:00:43 -07:00
James Zern
8845334097 vp9: fix high-bitdepth NEON build
remove incorrect specializations in rtcd and update a configuration
check in partial_idct_test.cc

Change-Id: I20f551f38ce502092b476fb16d3ca0969dba56f0
2015-03-31 17:45:25 -07:00
hui su
d4f2f1dd5b Merge "Move vp9_coef_con_tree to common/" 2015-03-31 10:51:10 -07:00
Jingning Han
db5ec37edc Merge "Enable 16x16 Hadamard transform in SATD based mode decision" 2015-03-31 09:55:41 -07:00
hui su
302e24cb3e Move vp9_coef_con_tree to common/
This tree should be defined in common/, as it is needed for
both encoder and decoder.

Change-Id: I4f5cbc80025cf2ced14182c98f7c82dc7d0f87db
2015-03-31 09:20:46 -07:00
Jingning Han
26d3d3af6a Enable 16x16 Hadamard transform in SATD based mode decision
This commit replaces the 16x16 2D-DCT transform with Hadamard
transform for RTC coding mode. It reduces the CPU cycles cost
on 16x16 transform by 5X. Overall it makes the speed -6 encoding
speed 1.5% faster without compromise on compression performance.

Change-Id: If6c993831dc4c678d841edc804ff395ed37f2a1b
2015-03-30 15:43:31 -07:00
Jingning Han
f0ac5aaa08 Merge "Hadamard transform based coding mode decision process" 2015-03-30 15:43:15 -07:00
Jingning Han
8c411f74e0 Hadamard transform based coding mode decision process
This commit uses Hadamard transform based rate-distortion cost
estimate for rtc coding mode decision. It improves the compression
performance of speed -6 for many hard clips at lower bit-rates.
For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for
niklas720p. This will introduce extra encoding cycle costs at
this point.

Change-Id: Iaf70634fa2417a705ee29f2456175b981db3d375
2015-03-30 14:46:05 -07:00
jackychen
68610ae568 vp9_postproc.c: eliminate -Wshadow build warnings.
Change-Id: I6df525a9ad1ae3cfbba8710d21db8fee76e64dbb
2015-03-27 20:27:30 -07:00
Alex Converse
a1e20ec58f Refactor fast loop filter code to handle 444.
Change-Id: I921b1ebabdf617049f8fa26fbe462c3ff115c1ce
2015-03-24 11:17:50 -07:00
hkuang
9f4f98fdbd Merge "Optimize the intra frame decode to skip some unnecessary copy." 2015-03-23 16:50:37 -07:00
hkuang
85107641a4 Optimize the intra frame decode to skip some unnecessary copy.
This speeds up a normal YT style 1080P clip decode by ~1% on nexus 7.

Change-Id: Ied7fa0d8bc941b2adb4db9382f549ee4d5654f3a
2015-03-23 10:11:49 -07:00
hkuang
b88dac8938 Safely free all the frame buffers after all the workers finish the work.
Issue: 978

Change-Id: Ia7aa809095008f6819a44d7ecb0329def79b1117
2015-03-19 12:21:00 -07:00
Yaowu Xu
73508be364 Fix a typo introduced in #94401aff
This fixes all test vector failures

Change-Id: Ie1a9fe0f023f7a0c7e89eb55df1b40ff65302adc
2015-03-12 08:01:08 -07:00
hkuang
4a691aa209 Merge "Refactor the block decode code to make it simpler." 2015-03-11 16:19:14 -07:00
hkuang
94401aff5c Refactor the block decode code to make it simpler.
Change-Id: I0f983cb821ad7ec6fbefe7895cb8124a8fa39df6
2015-03-11 11:37:16 -07:00
Yunqing Wang
f0cf9719d0 Accumulate tx_totals counters in multi-threaded encoder
Tx_totals counters weren't handled correctly in multi-thread
case, which caused the mismatch while encoding using threads > 1.
This patch fixed that.

Change-Id: Ice9b0386f57175fb92a0bdcd5042686a3106246a
2015-03-10 10:02:49 -07:00
Hangyu Kuang
a1ef75bb63 Merge "Only wait for previous frame's motion vector if needed." 2015-03-06 10:27:26 -08:00
Hangyu Kuang
d5fa786b4f Only wait for previous frame's motion vector if needed.
Change-Id: Iecce685a33b64844446c0009f21bc85566d7469f
2015-03-05 16:09:44 -08:00
Johann
42eb97eb91 Declare function used by 'once' with 'void' parameters
Visual Studio is exceptionally picky about this:
vp9_reconintra.c(900): warning C4113: 'void (__cdecl *)()' differs in
parameter lists from 'void (__cdecl *)(void)'
[.build-x86_64-win64-vs10\vpx.vcxproj]

Change-Id: I564c7415f4608fd962be8c699d6133a996b545f7
2015-03-04 15:34:55 -08:00
Adrian Grange
3807dd82ab Make encoder buffer allocation dynamic
Frame buffers are now allocated dynamically on-demand.

Entries in the reference frame map, cm->ref_frame_map,
may now be set to -1 (INVALID_IDX) to indicate that
there is not a valid reference buffer in that "slot".

All slots in the reference frame map are now initialized
to the empty state (-1) and each buffer is initialized
to have a reference count of 0.

Change-Id: Id1afe98de98db4ae8b2dfefed7889c3b28c68582
2015-03-04 07:58:32 -08:00
Yunqing Wang
55639c383b fix a race condition caused by intra function pointer initialization
This patch fixed webm issue 962.
(https://code.google.com/p/webm/issues/detail?id=962)
The data races occurred when an encoder and a decoder were created
at the same time, and the function pointers were initialized twice.

Change-Id: I8851b753c4b4ad4767d6eea781b61f0ac9abb44b
2015-03-03 09:58:37 -08:00
Jingning Han
1790d45252 Use variance metric for integral projection vector match
This commit replaces the SAD with variance as metric for the
integral projection vector match. It improves the search accuracy
in the presence of slight light change. The average speed -6
compression performance for rtc set is improved by 1.7%. No speed
changes are observed for the test clips.

Change-Id: I71c1d27e42de2aa429fb3564e6549bba1c7d6d4d
2015-03-01 10:42:56 -08:00
Jingning Han
c4cb8059ff Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 4" 2015-02-27 09:49:10 -08:00
Jingning Han
43bb97f7d0 Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 3" 2015-02-27 09:49:00 -08:00
Jingning Han
4800b0e80d Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 2" 2015-02-27 09:48:51 -08:00
Jingning Han
8ec22296b3 Fix high bit-depth loop-filter sse2 compiling issue - part 3
Change-Id: Idb14b9a285f8098126f967c5e2750221d6a58f69
2015-02-26 15:21:22 -08:00
Jingning Han
14ff1cb74a Fix high bit-depth loop-filter sse2 compiling issue - part 2
Change-Id: I6728b69bb3dff1daa64ff7142f691e80a089f1c4
2015-02-26 12:41:19 -08:00
Jingning Han
2080e4b206 Fix high bit-depth loop-filter sse2 compiling issue - part 1
The intrinsic statement _mm_subs_epi16() should take immediate.
Feeding variable as its input argument will cause compile failure
in older version gcc.

Change-Id: I6a71efcc8d3b16b84715e0a9bcfa818494eea3f4
2015-02-25 09:59:50 -08:00
James Zern
044bfa3949 Merge "vp9_loopfilter: quiet integer constant size warnings" 2015-02-24 19:09:32 -08:00
Jingning Han
5b87f1bb5a Fix high bit-depth loop-filter sse2 compiling issue - part 4
Change-Id: I39f56f60425836f2e1ec07da71edd4810a4c78bb
2015-02-24 14:50:30 -08:00
James Zern
279d350f0b vp9_loopfilter: quiet integer constant size warnings
mark uint64_t constants with 'ULL'

Change-Id: I7648e161b4004fba35e1fa7ab79e34cc19e39716
2015-02-24 11:13:16 -08:00
Hangyu Kuang
8724d31d12 Move dequant table from VP9_COMMON to VP9_COMP as decoder
does not need it any more.

This reduces VP9_COMMON size from 25776 bytes to 17584 bytes(~31%).

Change-Id: Ic5daea732ccefb6d512b048af7983f0efe08589b
2015-02-20 11:12:42 -08:00
Jingning Han
216b171d63 Merge "Integral projection based motion estimation" 2015-02-19 15:08:11 -08:00
Jingning Han
ed2dc59c1b Integral projection based motion estimation
This commit introduces a new block match motion estimation
using integral projection measurement. The 2-D block and the nearby
region is projected onto the horizontal and vertical 1-D vectors,
respectively. It then runs vector match, instead of block match,
over the two separate 1-D vectors to locate the motion compensated
reference block.

This process is run per 64x64 block to align the reference before
choosing partitioning in speed 6. The overall CPU cycle cost due
to this additional 64x64 block match (SSE2 version) takes around 2%
at low bit-rate rtc speed 6. When strong motion activities exist in
the video sequence, it substantially improves the partition
selection accuracy, thereby achieving better compression performance
and lower CPU cycles.

The experiments were tested in RTC speed -6 setting:
cloud 1080p 500 kbps
17006 b/f, 37.086 dB, 5386 ms ->
16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster)

pedestrian_area 1080p 500 kbps
53537 b/f, 36.771 dB, 18706 ms ->
51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings)

blue_sky 1080p 500 kbps
70214 b/f, 33.600 dB, 13979 ms ->
53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster)

jimred 400 kbps
13380 b/f, 36.014 dB, 5723 ms ->
13377 b/f, 36.087 dB, 5831 ms  (2% bit-rate savings, 2% slower)

Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c
2015-02-19 13:47:19 -08:00
James Zern
0dd591bedd loop_filter_rows_mt: remove dependency on 'last_height'
using this to control reallocation would miss a change if the function
were not called for every frame.
fixes potential memory corruption by the subsequent memset

Change-Id: I4c6bb6ab68803104fc824c7e27cc2f9b2cf53e33
2015-02-13 19:11:23 -08:00
Yunqing Wang
238707ab4c Merge "Make vp9_print_modes_and_motion_vectors() work" 2015-02-11 16:58:52 -08:00
James Zern
d8ed558c99 Merge "vp9_thread: prefer pthread.h if available" 2015-02-11 16:50:07 -08:00
James Zern
923cc0bf51 vp9_highbd_tm_predictor_16x16: fix win64
by saving xmm8; cglobal's xmm reg arg is 0-based

Change-Id: Ic8426ec9ac59ab4478716aa812452a6406794dcb
2015-02-10 19:34:12 -08:00
Yunqing Wang
f37788eaf6 Make vp9_print_modes_and_motion_vectors() work
MODE_INFO struct was modified, and vp9_print_modes_and_motion_vectors()
didn't work anymore. This patch modified vp9_debugmodes.c so that
this function works again for debug usage.

Change-Id: I293fae0295235deb2529a460a274caf7c045ac1a
2015-02-10 16:37:02 -08:00
James Zern
d167a1aeee vp9_thread: prefer pthread.h if available
this avoids conflicts with recent versions of mingw-w64 (tested g++
4.8.2) and the unit tests

Change-Id: Ic41ea31eebe0e3e712ed5e657f37d8cad6712088
2015-02-10 12:47:14 -08:00
Yunqing Wang
84b813aa42 Merge "Make encoder and decoder share common thread function" 2015-02-10 09:06:41 -08:00
Yunqing Wang
d3a37731c2 Merge "Rename loopfilter_thread files to thread_common files" 2015-02-10 09:06:23 -08:00
hkuang
dd88f48296 Set the maximum decode threads to be 8.
This will fix the frame parallel decode hang on windows
due to not enough semaphores.

This will also make the frame parallel decode safer as
the number of frame buffers could only support maximum
8 threads.

Change-Id: Id9ef50692819dcbebbd74a0aabffbfb3f39a4309
2015-02-09 10:38:41 -08:00
Yunqing Wang
4ae092c660 Make encoder and decoder share common thread function
Moved vp9_accumulate_frame_counts to vp9_thread_common.c to
eliminate the duplicate code.

Change-Id: I9cf506d729603c8bf1494b4c86a3b7d47af1917a
2015-02-06 11:45:51 -08:00
Yunqing Wang
41063137c3 Rename loopfilter_thread files to thread_common files
Renames the files to allow more common thread code to
be moved to vp9/common.

Change-Id: I7386e64e221086e3cdc087e79812f993c423413b
2015-02-06 10:03:31 -08:00
Jingning Han
0c6d3a03e1 Account for chroma component costs in RTC mode decision
This commit allows the encoder to account for additional chroma
plane costs in the mode decision process, if the current block
potentially contains significant color change. It improves the
visual quality at very low bit-rates.

The compression performance of dark720p is improved by 12.39% in
speed 6. For jimred at 150 kbps, the PSNR of V component (red)
increased by 0.2 dB, at the expense of about 5% increase in
encoding time. Note that for sequences where the chroma components
are fairly consistent, the encoding time increase is negligible.

On average the rtc set compression performance is improved by
1.172% in PSNR and 1.920% in SSIM.

Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c
2015-02-04 09:45:14 -08:00