7218 Commits

Author SHA1 Message Date
Jim Bankoski
d7783cae95 Merge "make low bitrates a lot less blocky" 2015-02-03 13:25:06 -08:00
Jingning Han
894f0fbd3b Merge "Assign 2nd ref frame in choose_partitioning" 2015-02-03 12:25:18 -08:00
Jingning Han
ca9c352fc3 Assign 2nd ref frame in choose_partitioning
Avoid the use of uninitialized second reference frame for fetching
reference block.

Change-Id: I9983a0daea829700b3270dc8bf2bcc6d6ea36652
2015-02-03 11:17:51 -08:00
Yunqing Wang
f5b3631621 Merge "vp9_dthread: pass frame counts to decoder functions" 2015-02-03 10:52:02 -08:00
Yaowu Xu
a6b3e01a27 Add mutex initialization in encoder
This resolves the encoder crashes on windows.

Change-Id: I159d79014cf9279751e403936ce1f84482ae82da
2015-02-03 09:53:08 -08:00
Yunqing Wang
85a9bc04d4 vp9_dthread: pass frame counts to decoder functions
The current multi-threaded tile decoder requires that the videoes
are encoded with frame_parallel_decoding_mode = 1. This requirement
is not necessary, and is better to be removed. This patch includes
the first part of the work.

Change-Id: Ic7695fb3cfe13f9022582c9f0edd2aa6e2e36d28
2015-02-03 09:39:15 -08:00
Jim Bankoski
9f1cf2c8cf make low bitrates a lot less blocky
Remove loop filter skip at speed 7+ because of bad visual artifacts and
up the postprocessing.

Change-Id: Ibdd0bac71aaee232d2bb2e14462733c51517768d
2015-02-03 06:45:56 -08:00
hkuang
4ed539f22e Merge "Fix a bug from merging frame parallel branch into master." 2015-02-02 17:08:42 -08:00
hkuang
94a459522e Fix a bug from merging frame parallel branch into master.
The merge did not merge the fix for issue #850.

Change-Id: I0dc1377dbfcb9497fb01a13d4f78ac65bff5eb33
2015-02-02 16:01:17 -08:00
Alex Converse
a79db92c07 Merge "Allow larger encoder configurations." 2015-02-02 12:05:56 -08:00
Yaowu Xu
80e729f601 Merge "Optimize coef update" 2015-02-01 20:08:29 -08:00
hkuang
be6aeadaf4 Try again to merge branch 'frame-parallel' into master branch.
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.

Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.

There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.

* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Seperate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
 Conflicts:
       test/codec_factory.h
       test/decode_test_driver.cc
       test/decode_test_driver.h
       test/invalid_file_test.cc
       test/test-data.sha1
       test/test.mk
       test/test_vectors.cc
       vp8/vp8_dx_iface.c
       vp9/common/vp9_alloccommon.c
       vp9/common/vp9_entropymode.c
       vp9/common/vp9_loopfilter_thread.c
       vp9/common/vp9_loopfilter_thread.h
       vp9/common/vp9_mvref_common.c
       vp9/common/vp9_onyxc_int.h
       vp9/common/vp9_reconinter.c
       vp9/decoder/vp9_decodeframe.c
       vp9/decoder/vp9_decodeframe.h
       vp9/decoder/vp9_decodemv.c
       vp9/decoder/vp9_decoder.c
       vp9/decoder/vp9_decoder.h
       vp9/encoder/vp9_encoder.c
       vp9/encoder/vp9_pickmode.c
       vp9/encoder/vp9_rdopt.c
       vp9/vp9_cx_iface.c
       vp9/vp9_dx_iface.c

This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5.

Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
2015-01-30 21:00:13 -08:00
James Zern
f6c2a6c5d6 vp9: rename 'near' parameters
+ nearest for consistency

near is a reserved word in windows builds so using it as a parameter
name may cause build failures with some configurations

Change-Id: Iddf1d4ecdb39843f14e95dbfd9dca55f07f81403
2015-01-30 15:52:24 -08:00
Jingning Han
f1ab5c1021 Merge "Format fixes in vp9_rd_pick_inter_mode_sb/sub8x8" 2015-01-30 15:49:14 -08:00
Yaowu Xu
45971abd1d Optimize coef update
1. move the check of search method of USE_TX_8X8 up one level to
avoid operations of build_tree_distributions()
2. count tx used and avoid computaton for coef udpate when one size
is not used at all.

Change-Id: Ia3e54a2588aa531c41377a1bfaa64385d04a592c
2015-01-30 10:16:40 -08:00
Yunqing Wang
3b3e299650 Merge "Fix issues in 32bit PIC enabled build" 2015-01-29 16:41:25 -08:00
Alex Converse
797a2556eb Allow larger encoder configurations.
Allow changing colorspace in the encoder and increasing frame size.

Change-Id: I8e7c3b891af29ce420a15beb4f6f9c250245b2bb
2015-01-29 15:07:40 -08:00
Paul Wilkins
68340a3470 Merge "Change to update of rate control factors." 2015-01-29 13:50:52 -08:00
Marco
a80dd52b6e Merge "Fix to vp9 denoiser." 2015-01-29 09:10:30 -08:00
Paul Wilkins
f752da8ce2 Change to update of rate control factors.
Remove damping parameter and use the damping
formula introduced by Yaowu Xu in all cases.

Change-Id: I18db7e0d0f262d5140102f259ab07821d374d285
2015-01-28 15:44:53 -08:00
Yaowu Xu
ff99a3c750 Simplify update_coef_probs()
1. reduce the size of temporaray arrays on stack
2. avoid build_tree_distribution for tx size that is not used at all.

Change-Id: I0f8d7124e16a3789d3c15ad24cf02c1c12789e2c
2015-01-28 15:12:42 -08:00
Marco
c0923d4d3a Fix to vp9 denoiser.
Prevent from using wrong mv for denoiser motion compensation.

Change-Id: Ifa0f9daabdbdab0900d3c17304059fe0d15de914
2015-01-28 12:07:27 -08:00
Frank Galligan
d1e6b8231a Merge "Add vp9_sad32x32x4d_neon Neon intrinsic function." 2015-01-28 10:35:50 -08:00
Frank Galligan
eb12d880ab Merge "Add vp9_sad16x16x4d_neon Neon intrinsic function." 2015-01-27 23:01:44 -08:00
Frank Galligan
80a3a07929 Merge "Add vp9_sad64x64x4d_neon Neon intrinsic function." 2015-01-27 23:01:15 -08:00
Yunqing Wang
10d5e09c87 Fix issues in 32bit PIC enabled build
This patch was to fix issue 924:
https://code.google.com/p/webm/issues/detail?id=924

The SECTION_RODATA macro was modified to support macho32 format.
The sub-pixel functions were modified to pass in 2 more parameters
to handle the global offsets for PIC build.

Change-Id: I3bfcd336bcae945edf300bca4ab40376a2628cd4
2015-01-27 22:20:21 -08:00
Yaowu Xu
fe2439703d Merge "move clear_system_state() call before using double" 2015-01-27 12:42:13 -08:00
Frank Galligan
e3167f7fbf Add vp9_sad32x32x4d_neon Neon intrinsic function.
On Nexus 7 speed -6 saw ~18% increase in perf.

Tested on Nexus 7, built with ndk r10d, gcc 4.9.

BUG=https://code.google.com/p/webm/issues/detail?id=908

Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c
2015-01-27 08:54:00 -08:00
Frank Galligan
9f574d0316 Add vp9_sad16x16x4d_neon Neon intrinsic function.
On Nexus 7 speed -6 saw ~15% increase in perf.

Tested on Nexus 7, built with ndk r10d, gcc 4.9.

BUG=https://code.google.com/p/webm/issues/detail?id=908

Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9
2015-01-27 08:42:17 -08:00
Frank Galligan
54fa956715 Add vp9_sad64x64x4d_neon Neon intrinsic function.
On Nexus 7 speed -6 saw ~30% increase in perf.

Tested on Nexus 7, built with ndk r10d, gcc 4.9.

BUG=https://code.google.com/p/webm/issues/detail?id=908

Change-Id: Id12af7d1883243c23e6692e898aea82299633d58
2015-01-27 08:33:40 -08:00
Marco
1c4a84c6e9 Merge "aq-mode=3: Update to allow for refresh on modes other than zero-mv." 2015-01-26 19:47:13 -08:00
Yaowu Xu
645b7cdf03 move clear_system_state() call before using double
Floating point is used in vp9_convert_qindex_to_q(), so sometime unit
test ActiveMapTest would cause run time error without properly call
to clear_system_state to reset register status.

Change-Id: I181e9395148c44a6ca8b97d6e109bd4a152143c6
2015-01-26 18:41:50 -08:00
Paul Wilkins
d231ce4fde Merge "Adjust active maxq for GF groups." 2015-01-26 18:19:09 -08:00
Yaowu Xu
d987dc4fdb Merge "Fix MSVC warnings on conversion from int64 to int" 2015-01-26 16:52:30 -08:00
Marco
3f1af6e85e aq-mode=3: Update to allow for refresh on modes other than zero-mv.
Add distortion threshold condition to refresh state of a coding block,
and allow for qp adjustment also for some intra modes and non-zero motion modes.

Also some code cleanup (remove unused variables/code).

Change-Id: I735fa2b28bc64f60e0323976b82510577b074203
2015-01-26 16:44:25 -08:00
Paul Wilkins
fd070220ff Adjust active maxq for GF groups.
Currently disabled by default: enabled using
#define GROUP_ADAPTIVE_MAXQ

In this patch the active max Q is adjusted for each GF
group based on the vbr bit allocation and raw first pass
group error.

This will tend to give a lower q for easy sections
and a higher value for very hard sections. As such it is
expected to improve quality in some of the easier
sections where quality issues have been reported.

This change tends to hurt overall psnr but help
average psnr. SSIM also shows a small gain.

Average results for derf, yt, std-hd and yt-hd test sets were
as follows (%change for average psnr, overal psnr and ssim):-

derf +0.291, - 0.252, -0.021
yt +6.466, -1.436, +0.552
std-hd +0.490, +0.014, +0.380
yt-hd +5.565, - 1.573, +0.099

Change-Id: Icc015499cebbf2a45054a05e8e31f3dfb43f944a
2015-01-26 14:55:36 -08:00
Yaowu Xu
6d16f6c14c Fix MSVC warnings on conversion from int64 to int
Change-Id: I7e96509ffa36899fcd2935749927a1e8aac8d025
2015-01-26 10:54:06 -08:00
Frank Galligan
9f6eba419a Add Neon intrinsic vp9_fdct8x8_quant_neon
On Nexus 7 speed -5 got ~2%, -6 got ~15%, -7 and -8 got ~30%
increase in perf.

Tested on Nexus 7, built with ndk r10d, gcc 4.9.

Change-Id: I83246d63b96674d170098a572fa4fe28a05aaf51
2015-01-24 22:49:50 -08:00
Yaowu Xu
643c75d90b Merge "Replace divide with look-up" 2015-01-23 21:12:18 -08:00
Jingning Han
9bdc0ae2b2 Format fixes in vp9_rd_pick_inter_mode_sb/sub8x8
Add parentheses to bit operations.

Change-Id: I095d601f0631d055adc4b3a8fde70c9cbae9e749
2015-01-23 11:48:58 -08:00
JackyChen
65f60f8e8c Merge "SSE2 code for the filter in MFQE." 2015-01-23 11:08:16 -08:00
Adrian Grange
0e2e2c2652 Merge "Remove elevate_newmv_thresh from SPEED_FEATURES (unused)" 2015-01-23 09:57:03 -08:00
Yaowu Xu
eda179764f Replace divide with look-up
This commit replaces an integer divide with a table-lookup. It is
to improve decoding speed, and at the same time, to reduce possible
complications with a bug in AMD Family 12h processors:

"665 Integer Divide Instruction May Cause Unpredictable Behavior"

Change-Id: I678b707a538798a923850bac467e66e847e6def7
2015-01-23 09:02:07 -08:00
Johann
a18da9760a Revert "Merge branch 'frame-parallel' to enable frame parallel decode in master branch."
This reverts commit bde04ce5039cbcf86c8b34bdb4127e18d7e1d0c7

Change-Id: I053dae04c761b04a36dc239558503905a14d2470
2015-01-23 08:42:02 -08:00
hkuang
bde04ce503 Merge branch 'frame-parallel' to enable frame parallel decode in master branch.
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading which will makes devices play 1080P
VP9 videos more easily.

* frame-parallel:
  Add error handling for frame parallel decode and unit test for that.
  Fix a bug in frame parallel decode and add a unit test for that.
  Add two test vectors to test frame parallel decode.
  Add key frame seeking to webmdec and webm_video_source.
  Implement frame parallel decode for VP9.
  Increase the thread test range to cover 5, 6, 7, 8 threads.
  Fix a bug in adding frame parallel unit test.
  Add VP9 frame-parallel unit test.
  Manually pick "Make the api behavior conform to api spec." from master branch.
  Move vp9_dec_build_inter_predictors_* to decoder folder.
  Add segmentation map array for current and last frame segmentation.
  Include the right header for VP9 worker thread.
  Move vp9_thread.* to common.
  ctrl_get_reference does not need user_priv.
  Seperate the frame buffers from VP9 encoder/decoder structure.
  Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""

 Conflicts:
       test/codec_factory.h
       test/decode_test_driver.cc
       test/decode_test_driver.h
       test/invalid_file_test.cc
       test/test-data.sha1
       test/test.mk
       test/test_vectors.cc
       vp8/vp8_dx_iface.c
       vp9/common/vp9_alloccommon.c
       vp9/common/vp9_entropymode.c
       vp9/common/vp9_loopfilter_thread.c
       vp9/common/vp9_loopfilter_thread.h
       vp9/common/vp9_mvref_common.c
       vp9/common/vp9_onyxc_int.h
       vp9/common/vp9_reconinter.c
       vp9/decoder/vp9_decodeframe.c
       vp9/decoder/vp9_decodeframe.h
       vp9/decoder/vp9_decodemv.c
       vp9/decoder/vp9_decoder.c
       vp9/decoder/vp9_decoder.h
       vp9/encoder/vp9_encoder.c
       vp9/encoder/vp9_pickmode.c
       vp9/encoder/vp9_rdopt.c
       vp9/vp9_cx_iface.c
       vp9/vp9_dx_iface.c

Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
2015-01-22 18:18:53 -08:00
Adrian Grange
527e073163 Remove elevate_newmv_thresh from SPEED_FEATURES (unused)
Change-Id: I78ef7f89586a329787f6bc4c58ec83af210989a3
2015-01-22 16:12:50 -08:00
Marco
0dccb6277c Modify variance partition selection for low resolutions.
For low spatial resolutions: bias partittion selection to smaller block sizes,
and base the variance computation on 4x4 down-sampling.

Also move the threshold computations into the choose_partitioning,
so they are computed once for each sb block.

On low-res clips (RTC_derf) PSNR/SSIMetrics increase by about 4-5%.
No change for resolutions above CIF.

Change-Id: I93f8ff742c8044786977bb6e31dcf8efda6dd1b0
2015-01-22 15:16:55 -08:00
Paul Wilkins
cf3202132f Merge "Bug when last group before forced key frame is short." 2015-01-22 08:28:19 -08:00
Paul Wilkins
0bff1efc2b Bug when last group before forced key frame is short.
Just before a forced key frame we often get a foreshortened
arf/gf group. In such a case, we do not want to update
rc->last_boosted_qindex, which is used to define the Q range
for the forced key frame itself.

This gives a small average metrics gain for the YT and YT-HD sets
(eg. YT SSIM +0.141%).

Change-Id: Ie06698bc4f249e87183b8f8fb27ff8f3fde216d9
2015-01-21 15:25:57 -08:00
JackyChen
cd0830f452 Merge "Fix compile error in Chromium building." 2015-01-21 14:52:32 -08:00