Commit Graph

3085 Commits

Author SHA1 Message Date
hui su
5afc4e4c77 Fix some typos.
Change-Id: I32aacd014df6c927cf2893dc096cbe6ec7604b9b
2016-01-27 16:12:49 -08:00
Scott LaVarnway
5232326716 VP9: Eliminate MB_MODE_INFO
Change-Id: Ifa607dd2bb366ce09fa16dfcad3cc45a2440c185
2016-01-19 16:40:20 -08:00
Scott LaVarnway
d4bc17d696 Merge "VP9: inline vp9_use_mv_hp()" 2016-01-14 13:36:40 +00:00
Scott LaVarnway
a85e552d95 VP9: Remove decoder args from find_mv_refs_idx()
The decoder does not use this function.

Change-Id: Ie67f909c0f4108ef286789c70df867d4b960a780
2016-01-13 13:30:40 -08:00
Scott LaVarnway
de993a847f VP9: inline vp9_use_mv_hp()
Change-Id: Ib275bfc4c29c572d6c70e5ec6dbfc241590d3e3e
2016-01-13 08:02:05 -08:00
Scott LaVarnway
15939cb2d7 Merge "VP9: Eliminate unnecessary nearest/near searches" 2016-01-12 20:00:59 +00:00
Scott LaVarnway
d8aa40634a VP9: Eliminate unnecessary nearest/near searches
Prior to this patch, read_inter_block_mode_info() would
find the nearmv and nearestmv for all modes.  Now it does not
search for ZEROMV modes and breaks out early for NEARMV and
NEWMV modes.

Change-Id: Ifa7b1eaf58bb03b9c7792ea5012fef477527d0fd
2016-01-12 05:09:06 -08:00
Yaowu Xu
2bd4f44409 Assert no mv clamping for scaled references
Under --enable-better-hw-compatibility, this commit adds asserts
that no mv clamping is applied for scaled references, so when built
with this configure option, the decoder will assert if an input bitstream
triggers mv clamping for scaled reference frames.
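Below is a minimal sketch of this kind of guard. It assumes the configure option surfaces as a CONFIG_BETTER_HW_COMPATIBILITY preprocessor macro, as libvpx configure options usually do. The mv type, the clamp window and all function names are hypothetical, and the decoder's real clamping code is not reproduced.

```c
#include <assert.h>

/* Hypothetical stand-in for a motion vector (units are not modeled). */
typedef struct {
  int row;
  int col;
} sketch_mv;

/* Hypothetical clamp window, e.g. how far a vector may point past the frame. */
typedef struct {
  int min_row, max_row, min_col, max_col;
} sketch_mv_bounds;

static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Nonzero if clamping would change the vector. */
static int mv_needs_clamping(sketch_mv mv, sketch_mv_bounds b) {
  return clamp_int(mv.row, b.min_row, b.max_row) != mv.row ||
         clamp_int(mv.col, b.min_col, b.max_col) != mv.col;
}

/* Clamps mv into the window. In a hardware-compatibility build, asserts
 * that a vector pointing into a scaled reference never needs clamping. */
static sketch_mv clamp_mv_checked(sketch_mv mv, sketch_mv_bounds b,
                                  int ref_is_scaled) {
  sketch_mv out;
#if CONFIG_BETTER_HW_COMPATIBILITY
  assert(!(ref_is_scaled && mv_needs_clamping(mv, b)));
#else
  (void)ref_is_scaled;
#endif
  out.row = clamp_int(mv.row, b.min_row, b.max_row);
  out.col = clamp_int(mv.col, b.min_col, b.max_col);
  return out;
}
```

In a normal build the assert compiles away and only the clamp remains, so the check costs nothing unless the option is enabled.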

Change-Id: I786e86a2bbbfb5bc2d2b706a31b0ffa8fe2eb0cb
2016-01-05 14:55:05 -08:00
Yaowu Xu
ce6d3f1de4 Merge "Assert no 8x4/4x8 partition for scaled references" 2016-01-05 20:35:46 +00:00
Yaowu Xu
03a021a6fc Assert no 8x4/4x8 partition for scaled references
This commit adds a new configure option:

--enable-better-hw-compatibility

The purpose of the configure option is to provide information on known
bugs in hardware decoder implementations, so that encoder implementers
can choose to implement their encoders in a way that avoids triggering
these decoder bugs.

The WebM team was made aware that a number of hardware decoders
have trouble handling the combination of scaled reference frames
and 8x4 or 4x8 partitions. This commit adds asserts to the vp9
decoder, so when built with the above configure option, the decoder
asserts if an input bitstream triggers such a decoder bug.
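As a hedged illustration of the check described above, the sketch below assumes the option becomes a CONFIG_BETTER_HW_COMPATIBILITY macro. The block-size enum is a hypothetical subset, and the helper names are not the decoder's.

```c
#include <assert.h>

/* Hypothetical subset of block sizes. */
typedef enum {
  SK_BLOCK_4X4,
  SK_BLOCK_4X8,
  SK_BLOCK_8X4,
  SK_BLOCK_8X8
} sk_block_size;

/* Nonzero for the combination some hardware decoders reportedly mishandle:
 * a scaled reference frame used with a 4x8 or 8x4 partition. */
static int hits_known_hw_bug(int ref_is_scaled, sk_block_size bsize) {
  return ref_is_scaled && (bsize == SK_BLOCK_4X8 || bsize == SK_BLOCK_8X4);
}

/* Only asserts when built with the hardware-compatibility option. */
static void check_block_for_hw(int ref_is_scaled, sk_block_size bsize) {
#if CONFIG_BETTER_HW_COMPATIBILITY
  assert(!hits_known_hw_bug(ref_is_scaled, bsize));
#else
  (void)ref_is_scaled;
  (void)bsize;
#endif
}
```

Keeping the predicate separate from the assert lets an encoder reuse the same test to steer its partition search away from the problematic combination.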

Change-Id: I386204cfa80ed16b50ebde57f886121ed76200bf
2016-01-04 18:33:37 -08:00
James Zern
d36659cec7 move vp9_avg to vpx_dsp
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-12-14 14:42:12 -08:00
Jacky Chen
d9bba21306 Merge "Add vp9_avg_4x4_neon and the unit test." 2015-12-09 06:09:33 +00:00
jackychen
303f144eef Add vp9_avg_4x4_neon and the unit test.
Change-Id: I3ef9a9648841374ed3cc865a02053c14ad821a20
2015-12-08 17:23:36 -08:00
Scott LaVarnway
f0b0b1fe62 VP9: Add ssse3 version of vpx_idct32x32_135_add()
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
2015-12-02 04:50:46 -08:00
James Zern
fd51d90159 Merge changes Iaf8cbe95,I6748183d,I2a49811d
* changes:
  add vp9_satd_neon
  fix vp9_satd_sse2
  vp9_satd: return an int
2015-11-25 01:48:53 +00:00
James Zern
eb1d0f8d60 add vp9_satd_neon
~60-65% faster at the function level across block sizes

Change-Id: Iaf8cbe95731c43fdcbf68256e44284ba51a93893
2015-11-24 16:09:10 -08:00
Alex Converse
4b038ad2ef Merge "Deduplicate some high bit depth tables" 2015-11-24 18:24:32 +00:00
James Zern
60760f710f fix vp9_satd_sse2
accumulate satd in 32 bits
+ add unit test

Change-Id: I6748183df3662ddb9d635f9641f9586f2fd38ad5
2015-11-20 14:35:46 -08:00
James Zern
3e0138edb7 vp9_satd: return an int
the final sum may use up to 26 bits

+ add a unit test
+ disable the sse2 version, as its result would roll over; this will be
fixed in a future commit
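The 26-bit figure can be checked with a short scalar sketch. It assumes 16-bit coefficients and at most 1024 of them per block (a 32x32 transform), so the worst case is 1024 * 32768 = 2^25, which needs 26 bits once a sign bit is included. Both function names are hypothetical. The second function shows how a 16-bit accumulator rolls over on that input.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute transformed coefficients, accumulated in a 32-bit int. */
static int satd_sketch(const int16_t *coeff, int length) {
  int satd = 0;
  for (int i = 0; i < length; ++i) satd += abs(coeff[i]);
  return satd;
}

/* Same sum with a 16-bit accumulator, to show the rollover. */
static uint16_t satd_16bit_sketch(const int16_t *coeff, int length) {
  uint16_t satd = 0;
  for (int i = 0; i < length; ++i) satd = (uint16_t)(satd + abs(coeff[i]));
  return satd;
}
```

With 1024 coefficients of INT16_MIN, the 32-bit sum is exactly 1 << 25, while the 16-bit sum wraps all the way back to 0.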

Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
2015-11-20 14:35:38 -08:00
paulwilkins
0149fb3d6b Changes to exhaustive motion search.
This change alters the nature and use of exhaustive motion search.

First, any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.

Second, the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.

For example:
  stage 1: Range +/- 64 interval 4
  stage 2: Range +/- 32 interval 2
  stage 3: Range +/- 15 interval 1
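The staged search above can be sketched as follows. This is a simplified model, not the encoder's code. toy_cost is a hypothetical convex distortion standing in for a real block-matching error such as SAD, the starting point stands in for the result of the preceding step search, and the threshold gate models the "only if distortion is above a threshold" rule.

```c
#include <assert.h>

/* Hypothetical distortion surface with its minimum at (13, -27). */
static long toy_cost(int row, int col) {
  const long dr = row - 13, dc = col + 27;
  return dr * dr + dc * dc;
}

typedef struct {
  int range;    /* candidates lie within +/- range of the stage center */
  int interval; /* spacing between candidates */
} mesh_stage;

/* Each stage scans a grid centered on the best position found so far. */
static void mesh_search(const mesh_stage *stages, int num_stages,
                        int *best_row, int *best_col) {
  long best = toy_cost(*best_row, *best_col);
  for (int s = 0; s < num_stages; ++s) {
    const int r = stages[s].range, step = stages[s].interval;
    const int cr = *best_row, cc = *best_col;
    for (int dr = -r; dr <= r; dr += step) {
      for (int dc = -r; dc <= r; dc += step) {
        const long c = toy_cost(cr + dr, cc + dc);
        if (c < best) {
          best = c;
          *best_row = cr + dr;
          *best_col = cc + dc;
        }
      }
    }
  }
}

/* Runs the mesh only when the step-search result is not good enough. */
static void maybe_mesh_search(const mesh_stage *stages, int num_stages,
                              long threshold, int *row, int *col) {
  if (toy_cost(*row, *col) <= threshold) return;
  mesh_search(stages, num_stages, row, col);
}
```

With the three stages from the example, a search starting at (0, 0) lands on (12, -28) after stage 1, stays there through stage 2, and reaches (13, -27) in stage 3. A convex bowl is the easy case; on a real distortion surface a coarse first stage can step over a narrow minimum.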

This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.

This patch also removes a double exhaustive search for sub 8x8 blocks,
which also contained a bug (the two searches used different distortion
metrics).

For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.

Restricted use in good quality speeds 0-2 yields significant quality gains
of 0.2 - 0.5 dB on the animation test with only a small impact on encode
speed. On most clips, though, the quality gain and speed impact are small.

Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
2015-11-13 10:16:31 +00:00
Geza Lore
5eefd3ebfd Add AVX vectorized vp9_diamond_search_sad
This function now has an AVX intrinsics version which is about 80%
faster than the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
relies on 3 properties of the cost function lookup tables constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
  - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
  - For all i: mvsadcost[0][i] == mvsadcost[1][i]
    (equal per-component cost)
  - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
    (cost function is even)
These must hold; otherwise the AVX version of the function cannot be used.
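One way a caller might confirm these properties before choosing the vectorized path is shown below. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes each component table is addressed through a pointer to its center element, valid for indices in [-max_mv, max_mv], which is why the negative index in the third property is legal.

```c
#include <assert.h>

/* Returns 1 if the joint and component SAD cost tables satisfy the three
 * properties listed above, 0 otherwise. comp0/comp1 point at the center
 * of arrays valid for indices [-max_mv, max_mv]. */
static int sad_cost_tables_ok(const int joint[4], const int *comp0,
                              const int *comp1, int max_mv) {
  if (joint[1] != joint[2] || joint[2] != joint[3]) return 0;
  for (int i = -max_mv; i <= max_mv; ++i) {
    if (comp0[i] != comp1[i]) return 0;  /* equal per-component cost */
    if (comp0[i] != comp0[-i]) return 0; /* cost function is even */
  }
  return 1;
}
```

If the check fails, the caller would fall back to the C implementation rather than use the vectorized function.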

Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
2015-11-11 14:03:47 +00:00
James Zern
30466f26b4 Revert "Add AVX vectorized vp9_diamond_search_sad"
This reverts commit f1342a7b07.

This breaks 32-bit builds:
 runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment

+ _mm_set1_epi64x is incompatible with some versions of Visual Studio

Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
2015-11-06 13:15:01 -08:00
Yunqing Wang
57cae22c1e Merge "Add AVX vectorized vp9_diamond_search_sad" 2015-11-05 20:17:13 +00:00
Geza Lore
f1342a7b07 Add AVX vectorized vp9_diamond_search_sad
This function now has an AVX intrinsics version which is about 80%
faster than the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
relies on 3 properties of the cost function lookup tables constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
  - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
  - For all i: mvsadcost[0][i] == mvsadcost[1][i]
    (equal per-component cost)
  - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
    (cost function is even)
These must hold; otherwise the AVX version of the function cannot be used.

Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
2015-11-05 10:02:17 +00:00
Alex Converse
246e0eaa71 Deduplicate some high bit depth tables
Change-Id: I6977f7d155cc1e81ae2393933893caac6770821f
2015-11-03 15:40:44 -08:00
hui su
e085fb643f Generate intra prediction reference values only when necessary
This can help increase encoding speed substantially.

Change-Id: Id0c009146e6e74d9365add71c7b10b9a57a84676
2015-11-02 10:26:50 -08:00
Alex Converse
989193c797 Make the zero handling in extend_to_full_distribution more explicit.
The old workaround "p = 0 ? 0 : p -1" is misleading.

?: binds tighter than =, so the expression parses as p = (0 ? 0 : p - 1).
The condition is the constant 0, so it always yields p - 1, and
assigning back to p truncates the result to one byte.

Therefore it is equivalent to (p - 1) & 0xFF. The check exists only
to work around a first pass bug, so let's make the workaround more
clear.
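The equivalence can be verified exhaustively. This sketch assumes p is an 8-bit unsigned type, which is what "truncates to one byte" implies. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The original expression. It parses as p = (0 ? 0 : p - 1). The condition
 * is the constant 0, so the result is always p - 1 (computed as int), and
 * the assignment converts it back to 8 bits. */
static uint8_t old_workaround(uint8_t p) {
  p = 0 ? 0 : p - 1;
  return p;
}

/* The explicit form the message says it is equivalent to. */
static uint8_t explicit_equivalent(uint8_t p) {
  return (uint8_t)((p - 1) & 0xFF);
}
```

For every input from 0 to 255 the two functions agree. In particular, p == 0 maps to 255, not to 0 as a reader of the original expression might expect.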

https://bugs.chromium.org/p/webm/issues/detail?id=1089

Change-Id: I587c44dd61c1f3767543c0126376f881889935af
2015-10-29 14:46:55 -07:00
Alex Converse
663960e757 Revert "Replace the zero handling in extend_to_full_distribution."
This reverts commit 7f56cb2978.

It causes uninitialized reads in the first pass setting up later cost tables.

Change-Id: I2df498df3f5c03eff359f79edf045aed0c618dc9
2015-10-28 11:51:40 -07:00
Alex Converse
7f56cb2978 Replace the zero handling in extend_to_full_distribution.
The old workaround "p = 0 ? 0 : p -1" is misleading.

?: binds tighter than =, so the expression parses as p = (0 ? 0 : p - 1).
The condition is the constant 0, so it always yields p - 1, and
assigning back to p truncates the result to one byte.

Therefore it is equivalent to (p - 1) & 0xFF. The check exists only
to work around a first pass bug, so let's make the workaround more
clear.

https://code.google.com/p/webm/issues/detail?id=1089

Change-Id: Ia6dcc8922e1acbac0eeca23a4d564a355c489572
2015-10-26 11:29:46 -07:00
Geza Lore
aa8f85223b Optimize vp9_highbd_block_error_8bit assembly.
A new version of vp9_highbd_block_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us much, but
the non-destructive 3-operand encoding of the 128-bit SSEn integer
instructions helps eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX helps on these machines.

Two further optimizations are applied:

1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case, however, is extremely rare in real use, so the net gain is
large.

The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.
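The speculative 32-bit scheme in point 2 can be illustrated with a scalar sketch. This is not the AVX assembly. The carry-tracking overflow test is an assumed way to detect the rare case, and the real code may detect it differently. The 4x4 special case is not modeled. The sign and range assumptions from the paragraph above are what keep each squared difference below 2^30. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Exact reference: every sum is kept in 64 bits. */
static int64_t block_error_64(const int32_t *coeff, const int32_t *dqcoeff,
                              int n, int64_t *ssz) {
  int64_t err = 0, sq = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t d = (int64_t)coeff[i] - dqcoeff[i];
    err += d * d;
    sq += (int64_t)coeff[i] * coeff[i];
  }
  *ssz = sq;
  return err;
}

/* Speculative version: sums in 32-bit unsigned accumulators, records whether
 * any addition carried out of 32 bits, and redoes the work in 64 bits if so. */
static int64_t block_error_spec32(const int32_t *coeff, const int32_t *dqcoeff,
                                  int n, int64_t *ssz) {
  uint32_t err = 0, sq = 0;
  int wrapped = 0;
  for (int i = 0; i < n; ++i) {
    const int32_t d = coeff[i] - dqcoeff[i];
    /* Same-sign inputs within +/-(2^15 - 1) keep |d| within 2^15 - 1. */
    assert(d >= -32767 && d <= 32767);
    const uint32_t d2 = (uint32_t)(d * d);
    const uint32_t c2 = (uint32_t)(coeff[i] * coeff[i]);
    wrapped |= (uint32_t)(err + d2) < err; /* carry out of bit 31 */
    wrapped |= (uint32_t)(sq + c2) < sq;
    err += d2;
    sq += c2;
  }
  if (wrapped) return block_error_64(coeff, dqcoeff, n, ssz); /* rare path */
  *ssz = sq;
  return err;
}
```

On typical residuals the 32-bit loop finishes without wrapping, and the 64-bit recomputation runs only for extreme inputs such as 64 coefficients near 2^15 quantized to zero.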

Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
2015-10-21 12:30:40 +01:00
Yaowu Xu
568429512e Add a new enum type vpx_color_range_t
to make the meaning of color_range obvious.
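The sketch below shows why naming the range matters. The enum identifiers are illustrative only and may not match the real vpx_color_range_t values in vpx/vpx_image.h. The helper expands an 8-bit studio-range luma sample (16 to 235) to full range (0 to 255) and passes full-range input through unchanged.

```c
#include <assert.h>

/* Illustrative enum naming the two ranges; identifiers are hypothetical. */
typedef enum {
  SKETCH_CR_STUDIO_RANGE = 0, /* 8-bit Y in [16, 235], U/V in [16, 240] */
  SKETCH_CR_FULL_RANGE = 1    /* 8-bit Y, U and V use all of [0, 255] */
} sketch_color_range_t;

/* Expands an 8-bit luma sample to full range. Non-negative intermediate
 * values are rounded to nearest; out-of-range results are clamped. */
static int luma_to_full_range(int y, sketch_color_range_t range) {
  int v;
  if (range == SKETCH_CR_FULL_RANGE) return y; /* already full range */
  v = ((y - 16) * 255 + 109) / 219;            /* 219 = 235 - 16 */
  if (v < 0) v = 0;
  if (v > 255) v = 255;
  return v;
}
```

With a bare integer flag it is easy to invert this test by accident. An enum makes the intended interpretation explicit at every call site.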

Change-Id: I303582e448b82b3203b497e27b22601cc718dfff
2015-10-16 16:27:18 -07:00
Geza Lore
0134764fa6 Optimization of 8bit block error for high bitdepth
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back to the optimized SSE2 assembly to compute the
block errors, as it does when high bit depth is not enabled.
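The fallback amounts to choosing a kernel based on the stream's bit depth. Profile 0 streams are 8-bit, so even a high bit depth build can use the 8-bit kernel for them. In the sketch below, both kernels are hypothetical stand-ins that compute the same sum. In the real encoder, the 8-bit one would be the SSE2 assembly.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t (*block_error_fn)(const int32_t *coeff,
                                  const int32_t *dqcoeff, int n);

/* Hypothetical generic path that handles any bit depth. */
static int64_t block_error_generic(const int32_t *coeff,
                                   const int32_t *dqcoeff, int n) {
  int64_t err = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t d = (int64_t)coeff[i] - dqcoeff[i];
    err += d * d;
  }
  return err;
}

/* Hypothetical stand-in for the optimized 8-bit kernel. */
static int64_t block_error_8bit_fast(const int32_t *coeff,
                                     const int32_t *dqcoeff, int n) {
  return block_error_generic(coeff, dqcoeff, n);
}

/* Selects the 8-bit kernel for 8-bit input, the generic one otherwise. */
static block_error_fn select_block_error(int bit_depth) {
  return bit_depth == 8 ? block_error_8bit_fast : block_error_generic;
}
```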

Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
2015-10-08 14:05:25 -07:00
Alex Converse
2f7f482c77 vp9: simplify extrabits encoding
Change-Id: I5a2abd35cb303d8f6354b3119ab95acf90405116
2015-10-06 16:26:08 -07:00
Debargha Mukherjee
cb5c47f20d Merge "Accelerated transform in high bit depth" 2015-10-02 06:55:55 +00:00
hui su
06bdc7f6db Small cleanup
Change-Id: I5aeaa94b743f84738d288f8b027fec4c164f2ec3
2015-10-01 11:19:13 -07:00
Scott LaVarnway
2f8625d824 VP9: remove plane_type from macroblockd_plane
Change-Id: Ia5072a3a92212d8565f33359f6c146469bdfbbec
2015-09-30 15:15:11 -07:00
Scott LaVarnway
13888e0eef Merge "VP9: remove plane_type checks in loopfilter functions" 2015-09-30 22:11:21 +00:00
James Zern
a18cc591a5 vp9_loopfilter: remove unnecessary masks
Change-Id: I264e75bf3ddd083ee5311c50a37fb18fe634ddc3
2015-09-30 12:12:53 -07:00
James Zern
a1914dbb31 vp9_reset_lfm: harmonize function signature
Change-Id: Ifb0f41fb43564a777be29b4c66443b366fa146a3
2015-09-29 20:46:37 -07:00
Scott LaVarnway
18373264d9 VP9: remove plane_type checks in loopfilter functions
vp9_filter_block_plane_ss11() and vp9_filter_block_plane_non420()
are only called for the uv planes.

Change-Id: Iacd3b3242c8ce581edd37c8f06d95efc8a0f88a3
2015-09-29 15:54:33 -07:00
Scott LaVarnway
66de2b710f Merge "VP9: move loopfilter build masks to decode loop" 2015-09-29 21:40:48 +00:00
Yaowu Xu
45948a03c0 Fix a macro definition
to be consistent with the header file name.

Change-Id: I9634332a2b3fac7e7f3b7ef58821ea7c81c5c813
2015-09-29 09:34:42 -07:00
Scott LaVarnway
7718117104 VP9: move loopfilter build masks to decode loop
The loopfilter masks are now built in the decode loop.
This is done so we can eventually reduce the number of
MODE_INFO structs required by the decoder.

The encoder builds the masks for the entire frame prior
to calling the loopfilter.

Change-Id: Ia2146b07e0acb8c50203e586dfae0c4c5b316f11
2015-09-29 05:20:49 -07:00
Julia Robson
406030d1b0 Accelerated transform in high bit depth
When high bitdepth was enabled at configure time, the 8-bit transform
stopped using optimised code. This made 8-bit content decode slowly.

Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
2015-09-28 21:09:16 -07:00
Ronald S. Bultje
36ffe64498 Rename display_{size,width,height} to render_*.
The name "display_*" (or "d_*") is used for non-compatible information
(that is, the cropped frame dimensions in pixels, as opposed to the
intended screen rendering surface size). Therefore, continuing to use
display_* would be confusing to end users. Instead, rename the field
to render_*, so that struct vpx_image can include it.

Change-Id: Iab8d2eae96492b71c4ea60c4bce8121cb2a1fe2d
2015-09-25 21:34:29 -04:00
Scott LaVarnway
5404978825 VP9: Remove frame_parallel_decoding_mode from macroblockd
Not used.

Change-Id: I71527d0ee43a5730f1a2527e7ab687a77a137db4
2015-09-23 16:06:46 -07:00
James Zern
9d8decc162 Merge changes from topic 'tile-thread-cleanup'
* changes:
  vp9/decode_tiles_mt: move frame count accum from loop
  VP9Decoder: remove duplicate tile_worker_info
  vp9/decode_tiles_mt: move some inits from inner loop
  vp9_accumulate_frame_counts: pass counts directly
2015-09-17 22:00:23 +00:00
Ronald S. Bultje
eeb5ef0a24 Add support for color-range.
In the decoder, export it (eventually) into the vpx_image_t.range field.
In the encoder, use oxcf->color_range to set it (the same way as for
color_space).

See issue 1059.

Change-Id: Ieabbb2a785fa58cc4044bd54eee66f328f3906ce
2015-09-16 06:41:46 -04:00
James Zern
b09aa3ac54 vp9: add extern "C" to headers
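The usual pattern for a C header that may also be included from C++ is shown below. Without the extern "C" block, a C++ translation unit would look for a C++-mangled symbol and fail to link against the C object file. The header name, guard macro and function are hypothetical. The guard macro is derived from the file name, which is the kind of consistency the "Fix a macro definition" commit addresses.

```c
/* ---- vp9_example.h (hypothetical header) ---- */
#ifndef VP9_EXAMPLE_H_
#define VP9_EXAMPLE_H_

#ifdef __cplusplus
extern "C" {
#endif

/* Clamps value into [low, high]; requires low <= high. */
int vp9_example_clamp(int value, int low, int high);

#ifdef __cplusplus
}  /* extern "C" */
#endif

#endif  /* VP9_EXAMPLE_H_ */

/* ---- vp9_example.c (hypothetical implementation, compiled as C) ---- */
#include <assert.h>

int vp9_example_clamp(int value, int low, int high) {
  assert(low <= high);
  return value < low ? low : (value > high ? high : value);
}
```

The __cplusplus guards keep the header valid C, since a C compiler would reject extern "C".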
Change-Id: I1b6927ad820f99340985b094d415aaab14defaf4
2015-09-09 23:15:59 -07:00
Jingning Han
42b0560319 Fix the sub8x8 block inter prediction with scaled reference frame
Sync the encoder's buffer offset calculation for sub8x8 block motion
compensated prediction with scaled reference frame to match the
decoder's behavior. This resolves an enc/dec mismatch issue when
sub8x8 inter mode with scaled reference frames is turned on.

Change-Id: I4bab3672b007a5ae0c992f8a701341892d2458b0
2015-09-08 11:09:30 -07:00