1473 Commits

Author SHA1 Message Date
Hui Su
096d8ace8e Merge "Extra round of subpel MV search around second best full-pixel MV" into nextgenv2 2016-07-19 16:55:11 +00:00
Wei-ting Lin
3c13124e9b Merge "Allow OVERLAY frames to use the show_existing_frame flag" into nextgenv2 2016-07-19 04:28:08 +00:00
Sarah Parker
5fa46c0b60 Add global motion parameters to compressed header
Currently nothing is implemented to compute GM parameters; this
just adds the capability to send them in the bitstream if they
were computed. The reconstruction based on the parameters still
needs to be implemented in reconinter.

Change-Id: I72aea3c6a9de9f5a40f96da76c82b54a52781fe2
2016-07-18 17:24:07 -07:00
Wei-ting Lin
ccc9e7cfc6 Allow OVERLAY frames to use the show_existing_frame flag
ARF with zero strength temporal filter can be reused by setting the
show_existing_frame = 1, and in this case, there is no need to
refresh the reference frame buffer. However, we used the flag
"refresh_golden_frame" as the identifier for the starting point of a gf
group.

A new flag "is_arf_filter_off" is used to record whether the filter
with strength zero is used.

Change-Id: I25971a760f6e1638d5147fe30488c48125512b1a
2016-07-18 17:15:33 -07:00
Yaowu Xu
681ba36414 Merge "Merge changes from libvpx/master by cherry-pick" into nextgenv2 2016-07-18 22:43:40 +00:00
Sarah Parker
e03af51203 Merge "Add buf0, width, height fields to buf_2d" into nextgenv2 2016-07-18 22:40:37 +00:00
hui su
9a4702417a Extra round of subpel MV search around second best full-pixel MV
Keep track of the best and second best full pixel motion vector
candidates, and do subpel search around both of them.

Compression improvement:
lowres 0.22%   midres 0.23%   hdres 0.18%

No noticeable encoding speed changes observed on lowres test clips.

Change-Id: I5f4df2a03d1db061cfdfdba6138b27e9ea91f089
2016-07-18 12:25:24 -07:00
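
The idea in the commit above (keep the best and second best full-pixel candidates, then run the subpel search around both) can be illustrated with a toy model. This is a minimal Python sketch, not the encoder's C code: the names `full_pixel_search`, `subpel_refine`, `search_with_second_best`, `toy_cost` and `GRID` are hypothetical, motion vectors are assumed to be in 1/8-pel units, and the subpel step is an exhaustive scan chosen for clarity rather than the iterative pattern a real encoder would use.

```python
from itertools import product


def full_pixel_search(cost, candidates):
    """Return the two lowest-cost (cost, mv) pairs among full-pixel candidates."""
    best = second = None
    for mv in candidates:
        entry = (cost(mv), mv)
        if best is None or entry[0] < best[0]:
            # New best found: the previous best becomes the second best.
            best, second = entry, best
        elif second is None or entry[0] < second[0]:
            second = entry
    return best, second


def subpel_refine(cost, center, radius=7):
    """Exhaustively scan sub-pixel offsets around one full-pixel center."""
    cx, cy = center
    offsets = product(range(-radius, radius + 1), repeat=2)
    mv = min(((cx + dx, cy + dy) for dx, dy in offsets), key=cost)
    return cost(mv), mv


def search_with_second_best(cost, candidates):
    """Refine around both the best and second best full-pixel candidates."""
    best, second = full_pixel_search(cost, candidates)
    results = [subpel_refine(cost, best[1])]
    if second is not None:
        results.append(subpel_refine(cost, second[1]))
    return min(results)


def toy_cost(mv):
    """Hypothetical cost with a shallow basin at (0, 0) and a deep one at (11, 0)."""
    x, y = mv
    return min(10 + abs(x) + abs(y), 2 + 3 * (abs(x - 11) + abs(y)))


# Full-pixel candidates: multiples of 8 in 1/8-pel units.
GRID = list(product(range(-16, 17, 8), repeat=2))
```

With this toy cost, the best full-pixel candidate is (0, 0) and the second best is (8, 0); refining only around the best candidate stays in the shallow basin, while refining around both reaches the deeper minimum at (11, 0).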
Sarah Parker
166c3250a3 Add buf0, width, height fields to buf_2d
These are needed for the warping function in the global motion
experiment.

Change-Id: Iaab176d0c0b90f6b938e2bac48b24c07e87e3cd9
2016-07-18 11:04:56 -07:00
Johann
2967bf355e Merge changes from libvpx/master by cherry-pick
This commit brings in all up-to-date changes from master that are
applicable to nextgenv2. Because the VP10 code has been removed from
master, we had to cherry-pick the following commits to get those changes:

Add default flags for arm64/armv8 builds

Allows building simple targets with sane default flags.

For example, using the Android arm64 toolchain from the NDK:
https://developer.android.com/ndk/guides/standalone_toolchain.html
./build/tools/make-standalone-toolchain.sh --arch=arm64 \
  --platform=android-24 --install-dir=/tmp/arm64
CROSS=/tmp/arm64/bin/aarch64-linux-android- \
  ~/libvpx/configure --target=arm64-linux-gcc --disable-multithread

BUG=webm:1143

vpx_lpf_horizontal_4_sse2: Remove dead load.

Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e

Fail early when android target does not include --sdk-path

Change-Id: I07e7e63476a2e32e3aae123abdee8b7bbbdc6a8c

configure: clean up var style and set_all usage

Use quotes whenever possible and {} always for variables.

Replace multiple set_all calls with *able_feature().

Conflicts:
	build/make/configure.sh

vp9-svc: Remove some unneeded code/comment.

datarate_test,DatarateTestLarge: normalize bits type

quiets a msvc warning:
conversion from 'const int64_t' to 'size_t', possible loss of data

mips added p6600 cpu support

Removed -funroll-loops

psnr.c: use int64_t for sum of differences

Since the values can be negative.

*.asm: normalize label format

add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.

BUG=b/29583530

Prevent negative variance

Due to rounding, hbd variance may become negative. This commit adds a
check and clamps negative values to 0.

configure: remove old visual studio support (<2010)

BUG=b/29583530

Conflicts:
	configure

configure: restore vs_version variable

inadvertently lost in the final patchset of:
078dff7 configure: remove old visual studio support (<2010)

this prevents an empty CONFIG_VS_VERSION and avoids make failure

Require x86inc.asm

Force enable x86inc.asm when building for x86. Previously there were
compatibility issues so a flag was added to simplify disabling this
code.

The known issues have been resolved and x86inc.asm is the preferred
abstraction layer (over x86_abi_support.asm).

BUG=b:29583530

convolve_test: fix byte offsets in hbd build

CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. offsets should occur
after converting the pointer to the desired type.

+ factorized some common expressions

Conflicts:
	test/convolve_test.cc

vpx_dsp: remove x86inc.asm distinction

BUG=b:29583530

Conflicts:
	vpx_dsp/vpx_dsp.mk
	vpx_dsp/vpx_dsp_rtcd_defs.pl
	vpx_dsp/x86/highbd_variance_sse2.c
	vpx_dsp/x86/variance_sse2.c

test: remove x86inc.asm distinction

BUG=b:29583530

Conflicts:
	test/vp9_subtract_test.cc

configure: remove x86inc.asm distinction

BUG=b:29583530

Change-Id: I59a1192142e89a6a36b906f65a491a734e603617

Update vpx subpixel 1d filter ssse3 asm

Speed tests show the new vertical filters are slower on a Celeron
Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
which vertical filter code is activated. For now, simply activate the
code that shows no degradation on Celeron. Later there should be 2 sets
of vertical filter ssse3 functions, with a jump table choosing between
them based on CPU type.

Improve vpx_filter_block1d* by replacing paddsw+psrlw with pmulhrsw.

Make set_reference control API work in VP9

Moved the API patch from NextGenv2. An example was included.
To try it, for example, run the following command:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30

Conflicts:
	examples.mk
	examples/vpx_cx_set_ref.c
	test/cx_set_ref.sh
	vp9/decoder/vp9_decoder.c

deblock filter : moved from vp8 code branch

The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.

vpx_thread.[hc]: update webp source reference

+ drop the blob hash, the reference will be updated in the
commit message

BUG=b/29583578

vpx_thread: use native windows cond var if available

BUG=b/29583578

original webp change:

commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 19:49:58 2015 -0800

    thread: use native windows cond var if available

    Vista / Server 2008 and up. no speed difference observed.

100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use InitializeCriticalSectionEx if available

BUG=b/29583578

original webp change:

commit 63fadc9ffacc77d4617526a50c696d21d558a70b
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:38:46 2015 -0800

    thread: use InitializeCriticalSectionEx if available

    Windows Vista / Server 2008 and up

100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use WaitForSingleObjectEx if available

BUG=b/29583578

original webp change:

commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:40:26 2015 -0800

    thread: use WaitForSingleObjectEx if available

    Windows XP and up

100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use CreateThread for windows phone

BUG=b/29583578

original webp change:

commit d2afe974f9d751de144ef09d31255aea13b442c0
Author: James Zern <jzern@google.com>
Date:   Mon Nov 23 20:41:26 2015 -0800

    thread: use CreateThread for windows phone

    _beginthreadex is unavailable for winrt/uwp

    Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d

100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vp9_postproc.c missing extern.

BUG=webm:1256

deblock: missing const on extern const.

postproc - move filling of noise buffer to vpx_dsp.

Fix encoder crashes for odd size input

clean-up vp9_intrapred_test

remove tuple and overkill VP9IntraPredBase class.

postproc: noise style fixes.

gtest-all.cc: quiet an unused variable warning

under windows / mingw builds

vp9_intrapred_test: follow-up cleanup

address a few comments from ce050afaf3e288895c3bee4160336e2d2133b6ea

Change-Id: I3eece7efa9335f4210303993ef6c1857ad5c29c8
2016-07-18 10:31:10 -07:00
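
The "Prevent negative variance" change picked up in the commit above can be illustrated numerically. This is a minimal Python sketch under stated assumptions, not the library's code: it assumes the high-bit-depth path scales the sum of squared differences down by 4 bits and the sum of differences by 2 bits, each with its own rounding, before forming the variance. The function names and the 4x4 example block are hypothetical, and the example uses non-negative differences only.

```python
def round_power_of_two(value, n):
    """Divide by 2**n with rounding to nearest (non-negative inputs)."""
    return (value + (1 << (n - 1))) >> n


def hbd10_variance(diffs, clamp=True):
    """Variance of a block of pixel differences with separately rounded sums.

    Assumption: SSE is scaled by 4 bits and the sum by 2 bits, as a 10-bit
    path normalizing to an 8-bit range might do.
    """
    n = len(diffs)
    sse = round_power_of_two(sum(d * d for d in diffs), 4)
    total = round_power_of_two(sum(diffs), 2)
    var = sse - (total * total) // n
    # The clamp is the fix: rounding can push var slightly below zero.
    return max(var, 0) if clamp else var


# Hypothetical 4x4 block of differences: fourteen 5s and two 6s.
DIFFS_4X4 = [5] * 14 + [6] * 2
```

For this block the rounded SSE is 26 and the rounded sum is 21, so the unclamped result is 26 - 441 // 16 = -1, although the true variance is positive. With the clamp the function returns 0.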
Jingning Han
2ad40b89b3 Align the quantizers for inter/intra modes in the first pass coding
Use regular extended zero bin quantizer for both inter and intra
modes in the first pass. This doesn't affect lowres and midres
significantly, but would bring back 0.9% coding gains for hdres.

Change-Id: Ifa5977fa7b141fc5be595c0f3a4fc81a93f6606f
2016-07-18 10:16:03 -07:00
skal
87c2db8296 fix vp10_convolve() signatures
fortunately, the call site was calling the function with
the correct parameter order.

Change-Id: Ia48099c18288a2416c8b9a7062d2b8d417fd07df
2016-07-18 15:35:18 +00:00
Yaowu Xu
06c297bd1c Merge "Merge branch 'master' into nextgenv2" into nextgenv2 2016-07-15 04:45:53 +00:00
Yaowu Xu
6fe07a207b Merge branch 'master' into nextgenv2
Change-Id: Ia3c0f2103fd997613d9f16156795028f89f63265
2016-07-14 16:05:48 -07:00
Sarah Parker
010d4a8a93 Merge "Add new_quant quantization in rdopt for 4x4 blocks and intra" into nextgenv2 2016-07-14 22:15:33 +00:00
Debargha Mukherjee
5f8ea94c1f Remove unused zcoeff_blk
from PICK_MODE_CONTEXT and MACROBLOCK

Change-Id: I42f98ce51871948244bdcaaaeb3d0191622116ae
2016-07-14 12:36:03 -07:00
Sarah Parker
a6aed6e4b3 Add new_quant quantization in rdopt for 4x4 blocks and intra
Originally the uniform quantization function was not being
replaced with the new_quant version in rdopt when new_quant
is turned on. This fixes the bug.

Change-Id: I593793bb909e1e1a6f89544eeca6783fe0576f25
2016-07-14 11:25:13 -07:00
Hui Su
0c68db43ea Merge "Refactor codes about motion search" into nextgenv2 2016-07-14 00:13:47 +00:00
Jingning Han
75b3224a42 Merge "Fix highbd inter prediction filter sse4 overwriting issue" into nextgenv2 2016-07-13 21:35:29 +00:00
Jingning Han
edbbce8e61 Fix highbd inter prediction filter sse4 overwriting issue
Properly handle the case where the height is an integer multiple
of 4.

Change-Id: I11ac188c13f78db20902e2e333c60ce76ce837c5
2016-07-13 12:51:02 -07:00
Yue Chen
f2b34c3ad8 Merge "Optimize and cleanup obmc predictor and rd search." into nextgenv2 2016-07-13 18:40:49 +00:00
hui su
581636d767 Refactor codes about motion search
1. Add "best_mv" in MACROBLOCK to store the best motion vector
during motion search, so that we don't need to pass its pointer
to various motion search functions.

2. Declare some functions as static when possible.

3. Fix some indents.

Change-Id: I0778146c0866cbc55e245988c59222577ea8260e
2016-07-13 10:12:37 -07:00
Geza Lore
4c4f04ac11 Optimize and cleanup obmc predictor and rd search.
Use vpx_blend_a64_hmask and vpx_blend_a64_vmask to speed up
computing the obmc predictor. Clean up calc_target_weighted_pred.

Encoder speedup: 1.3%
Decoder speedup: 6.5%

Change-Id: I0c774fe53d22399e92a10d1daf3af0010d88d2c5
2016-07-13 16:54:20 +00:00
Aamir Anis
15aaa601bd Merge "Fix for loop filter selection procedure" into nextgenv2 2016-07-12 23:56:37 +00:00
Aamir Anis
8575709f97 Fix for loop filter selection procedure
Fixed the best error reported by loop filter selection; this value is
used during loop restoration to pick the best mode. The baseline remains
unchanged. Change in BDRate for the loop restoration experiment:
-0.628 -> -0.625 for lowres,
-1.262 -> -1.283 for highres.

Change-Id: I69ef1608bc232b250ac46f59e31fdbed1a999dcd
2016-07-12 15:01:07 -07:00
Yi Luo
fde48c980a Merge "HBD convolution filtering (10/12 taps) SSE4.1 optimization" into nextgenv2 2016-07-12 19:28:48 +00:00
Yi Luo
8cacca73bf HBD convolution filtering (10/12 taps) SSE4.1 optimization
- For experiment EXT_INTERP under high bit depth.
- Add unit test to verify bit-exactness.
- Speed performance improvement:
  On Xeon E5-2680, park_joy_1080p_12.y4m, 50 frames, encoding time
  drops from 6682503 ms to 5390270 ms.

Change-Id: Iea4debf5414f3accf1eb5672abeab56a0539ac77
2016-07-12 10:13:30 -07:00
James Zern
b8a28fbb3a Merge changes from topic 'missing-proto' into nextgenv2
* changes:
  vp10/encoder/rdopt.c: make a function static
  vp10/encoder/rd.c: make a function static
  vp10_convolve_ssse3.c: make some functions static
  vp10/encoder/bitstream.[hc]: correct a prototype
  vp10/common/idct.h: add some missing prototypes
  highbd_quantize_intrin_sse2.c: add missing rtcd include
  vp10: add some missing includes
2016-07-12 02:39:24 +00:00
Yue Chen
4ff6d13771 Merge "Cosmetics for vp10/common/vp10_rtcd_defs.pl" into nextgenv2 2016-07-12 01:21:33 +00:00
James Zern
849e990779 vp10/encoder/rdopt.c: make a function static
+ remove vp10_ prefix

quiets a -Wmissing-prototypes warning

BUG=b/29584271

Change-Id: I8821c38009b90296280f9b14233e73c92076e81f
2016-07-11 16:52:11 -07:00
James Zern
0baa08336a vp10/encoder/rd.c: make a function static
+ remove vp10_ prefix

quiets a -Wmissing-prototypes warning

BUG=b/29584271

Change-Id: I6b5d71f8120a6d1fee4c782beb4c6d6eef980f65
2016-07-11 16:52:10 -07:00
James Zern
08bd57ef0d vp10_convolve_ssse3.c: make some functions static
quiets -Wmissing-prototypes warnings

BUG=b/29584271

Change-Id: I4d2eb7f4b45d7b829421976641b3212bcf29e7dd
2016-07-11 16:52:10 -07:00
James Zern
3c127f2e36 vp10/encoder/bitstream.[hc]: correct a prototype
quiets a -Wmissing-prototypes warning

BUG=b/29584271

Change-Id: I91aba2a75dccd6752bdf91837564c2aa45817c09
2016-07-11 16:52:09 -07:00
James Zern
9bf5a1ab46 vp10/common/idct.h: add some missing prototypes
quiets the warning of the same name

BUG=b/29584271

Change-Id: I220cd58e1060f77e3910472fed1b167add3a08f8
2016-07-11 16:52:08 -07:00
James Zern
bc4341fd94 vp10: add some missing includes
quiets some -Wmissing-prototypes warnings

BUG=b/29584271

Change-Id: I9174728459fcabb6d9ac0028ae58029e52c0da92
2016-07-11 16:52:07 -07:00
Yue Chen
68e19472c1 Cosmetics for vp10/common/vp10_rtcd_defs.pl
Change-Id: Iaf8c6f0b1e340f0406df2871a3dc2ded19b7009a
2016-07-11 23:41:30 +00:00
Debargha Mukherjee
5041ff4921 Merge "Add a few branch hints to vp10_optimize_b." into nextgenv2 2016-07-11 22:30:33 +00:00
Debargha Mukherjee
6770c7361e Merge "Optimize and cleanup supertx predictor." into nextgenv2 2016-07-11 22:30:16 +00:00
Debargha Mukherjee
6bbadfb303 Merge "Improve vpx_blend_* functions." into nextgenv2 2016-07-11 19:30:04 +00:00
Geza Lore
cd489264e1 Optimize and cleanup supertx predictor.
Use vpx_blend_a64_hmask and vpx_blend_a64_vmask to speed up
computing the supertx predictor.

Decoder speedup of up to 4% has been observed.

Change-Id: I255a5ba4cc24f78dc905d25b6e2f7fbafac13253
2016-07-11 18:14:21 +00:00
Geza Lore
bfa59b4a5f Improve vpx_blend_* functions.
- Made source buffers pointers to const.
- Renamed vpx_blend_mask6b to vpx_blend_a64_mask. This is more
  indicative that the function does alpha blending. The 6, or 6b
  suffix was misleading, as the max mask value (64) does not fit into
  6 bits.
- Added VPX_BLEND_* macros to use when needing to blend scalars.
- Use VPX_BLEND_A256 in combine_interintra to be more explicit about
  the operation being done.
- Added versions of vpx_blend_a64_* which take 1D horizontal/vertical
  masks directly and apply them to all rows/columns
  (vpx_blend_a64_hmask and vpx_blend_a64_vmask). The SSE4.1 optimized
  horizontal version now falls back on the 2D version. This can be
  improved upon if it shows up high enough in a profile.
- All vpx_blend_a64_* functions now support block sizes down to 1x1
  (ie: a single pixel). This is for usage convenience. The SSE4.1
  optimized versions fall back on the C implementation if
  w <= 2 or h <= 2. This can again be improved if it becomes hot code.

Change-Id: I13ab3835146ffafe3e1d74d8e9cf64a5abe4144d
2016-07-11 19:05:17 +01:00
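
The alpha blending described in the commit above can be sketched as follows. This is an illustrative Python model, not the library's C or SSE4.1 code: it assumes the blend is a rounded weighted average with a maximum alpha of 64 and a 6-bit rounding shift, and the Python function names and list-of-rows buffer layout are hypothetical stand-ins for the vpx_blend_a64_* functions.

```python
A64_MAX_ALPHA = 64   # maximum mask value; note that it needs 7 bits
A64_ROUND_BITS = 6   # log2(A64_MAX_ALPHA)


def round_power_of_two(value, n):
    """Divide by 2**n with rounding to nearest."""
    return (value + (1 << (n - 1))) >> n


def blend_a64(alpha, v0, v1):
    """Blend two scalars: alpha/64 of v0 plus (64 - alpha)/64 of v1."""
    return round_power_of_two(
        alpha * v0 + (A64_MAX_ALPHA - alpha) * v1, A64_ROUND_BITS)


def blend_a64_hmask(src0, src1, mask):
    """Apply a 1D horizontal mask (one weight per column) to every row."""
    return [[blend_a64(m, a, b) for m, a, b in zip(mask, row0, row1)]
            for row0, row1 in zip(src0, src1)]


def blend_a64_vmask(src0, src1, mask):
    """Apply a 1D vertical mask (one weight per row) to every column."""
    return [[blend_a64(m, a, b) for a, b in zip(row0, row1)]
            for m, row0, row1 in zip(mask, src0, src1)]
```

Because the mask can take the value 64 itself, the weights span 65 levels, which is why the "6b" suffix was misleading. The 1D variants work for any block size down to a single pixel, since they simply iterate over whatever rows and columns are given.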
Pascal Massimino
e5fb2d4e93 remove ROUNDZ_* macros in favor of just ROUND_* ones
Change-Id: I263088be8d71018deb9cc6a9d2c66307770b824d
2016-07-11 06:27:41 -07:00
Debargha Mukherjee
5d28183fcf Merge "Refactor and clean up on blend_mask6" into nextgenv2 2016-07-09 06:50:32 +00:00
Yue Chen
5b25323c25 Merge "Fix assertion failures in mips+msa setting" into nextgenv2 2016-07-09 01:07:27 +00:00
Yue Chen
4ab19eac62 Fix assertion failures in mips+msa setting
Directly call the C functions. Otherwise, when EXT_TX is enabled, hybrid
transforms other than combinations of DCT/ADST have not been implemented,
which causes assertion failures in the switch statements in
vp10_fhtnxn_msa() and vp10_ihtnxn_nxn_add_msa().

BUG=webm:1239

Change-Id: I2379a07e5406f9489edcd2f3205682f679c9b091
2016-07-08 17:13:52 -07:00
Jingning Han
9c4b041a80 Merge "Properly reset rate and distortion value for zero pred residual case" into nextgenv2 2016-07-08 22:21:27 +00:00
Debargha Mukherjee
72ef6d7704 Refactor and clean up on blend_mask6
Change-Id: Ie9188471e7dc07ab9c95b22f258b1662e895c533
2016-07-08 15:02:57 -07:00
Jingning Han
985dd03ff7 Merge "Integrate ext-interp into dual filter framework" into nextgenv2 2016-07-08 18:25:14 +00:00
Geza Lore
0b9b3d8643 Add a few branch hints to vp10_optimize_b.
vp10_optimize_b now takes between 40% and 60% of the TOTAL runtime
of the encoder, depending on bit-rate. It also contains 2/3 to 3/4
of the mispredicted branch instructions in the whole program.

Adding a few branch hints makes vp10_optimize_b around 2-5% faster
(depending on bit-rate) when compiled with gcc/clang.

Change-Id: I1572733e18b4166bc10591b958c5018a9561fa2b
2016-07-08 19:20:35 +01:00
Sarah Parker
6c56def33e Merge "Make new_quant bin widths to be uniform" into nextgenv2 2016-07-08 17:40:55 +00:00
Jingning Han
e3a2aeb05d Integrate ext-interp into dual filter framework
The combination of the two experiments improves the compression
performance gains:

lowres 2.5%
midres 2.1%

Change-Id: Id26c0a9474ce08893aa1d946365c7ff850fab57a
2016-07-08 16:38:59 +00:00