Commit Graph

3283 Commits

Author SHA1 Message Date
Scott LaVarnway
d4bc17d696 Merge "VP9: inline vp9_use_mv_hp()" 2016-01-14 13:36:40 +00:00
Scott LaVarnway
a85e552d95 VP9: Remove decoder args from find_mv_refs_idx()
The decoder does not use this function.

Change-Id: Ie67f909c0f4108ef286789c70df867d4b960a780
2016-01-13 13:30:40 -08:00
Scott LaVarnway
de993a847f VP9: inline vp9_use_mv_hp()
Change-Id: Ib275bfc4c29c572d6c70e5ec6dbfc241590d3e3e
2016-01-13 08:02:05 -08:00
Scott LaVarnway
15939cb2d7 Merge "VP9: Eliminate unnecessary nearest/near searches" 2016-01-12 20:00:59 +00:00
Scott LaVarnway
d8aa40634a VP9: Eliminate unnecessary nearest/near searches
Prior to this patch, read_inter_block_mode_info() would
find the nearmv and nearestmv for all modes.  Now it does not
search for ZEROMV modes and breaks out early for NEARMV and
NEWMV modes.

Change-Id: Ifa7b1eaf58bb03b9c7792ea5012fef477527d0fd
2016-01-12 05:09:06 -08:00
Yaowu Xu
2bd4f44409 Assert no mv clamping for scaled references
Under --enable-better-hw-compatibility, this commit adds asserts that
no mv clamping is applied for scaled references, so when built with
this configure option, the decoder will assert if an input bitstream
triggers mv clamping for scaled reference frames.

Change-Id: I786e86a2bbbfb5bc2d2b706a31b0ffa8fe2eb0cb
2016-01-05 14:55:05 -08:00
Yaowu Xu
ce6d3f1de4 Merge "Assert no 8x4/4x8 partition for scaled references" 2016-01-05 20:35:46 +00:00
Yaowu Xu
03a021a6fc Assert no 8x4/4x8 partition for scaled references
This commit adds a new configure option:

--enable-better-hw-compatibility

The purpose of the configure option is to provide information on known
hardware decoder implementation bugs, so encoder implementers may
choose to implement their encoders in a way to avoid triggering these
decoder bugs.

The WebM team was made aware that a number of hardware decoders have
trouble handling the combination of a scaled reference frame and 8x4
or 4x8 partitions. This commit adds asserts to the vp9 decoder, so
when built with the above configure option, the decoder asserts if an
input bitstream triggers such a decoder bug.

Change-Id: I386204cfa80ed16b50ebde57f886121ed76200bf
2016-01-04 18:33:37 -08:00
James Zern
d36659cec7 move vp9_avg to vpx_dsp
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-12-14 14:42:12 -08:00
Jacky Chen
d9bba21306 Merge "Add vp9_avg_4x4_neon and the unit test." 2015-12-09 06:09:33 +00:00
jackychen
303f144eef Add vp9_avg_4x4_neon and the unit test.
Change-Id: I3ef9a9648841374ed3cc865a02053c14ad821a20
2015-12-08 17:23:36 -08:00
Scott LaVarnway
f0b0b1fe62 VP9: Add ssse3 version of vpx_idct32x32_135_add()
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
2015-12-02 04:50:46 -08:00
James Zern
fd51d90159 Merge changes Iaf8cbe95,I6748183d,I2a49811d
* changes:
  add vp9_satd_neon
  fix vp9_satd_sse2
  vp9_satd: return an int
2015-11-25 01:48:53 +00:00
James Zern
eb1d0f8d60 add vp9_satd_neon
~60-65% faster at the function level across block sizes

Change-Id: Iaf8cbe95731c43fdcbf68256e44284ba51a93893
2015-11-24 16:09:10 -08:00
Alex Converse
4b038ad2ef Merge "Deduplicate some high bit depth tables" 2015-11-24 18:24:32 +00:00
James Zern
60760f710f fix vp9_satd_sse2
accumulate satd in 32-bits
+ add unit test

Change-Id: I6748183df3662ddb9d635f9641f9586f2fd38ad5
2015-11-20 14:35:46 -08:00
James Zern
3e0138edb7 vp9_satd: return an int
the final sum may use up to 26 bits

+ add a unit test
+ disable the sse2 as the result will rollover; this will be fixed in a
future commit
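
The 26-bit figure can be checked with a small sketch. It assumes 16-bit
coefficients and blocks of at most 1024 coefficients; satd_sum and
bits_needed are hypothetical names used for illustration, not the libvpx
functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a SATD accumulator: sums |coeff[i]| into an int.
 * Assumption: coefficients are 16-bit and length is at most 1024. */
static int satd_sum(const int16_t *coeff, int length) {
  int sum = 0;
  assert(length >= 0 && length <= 1024);
  for (int i = 0; i < length; ++i) sum += abs(coeff[i]);
  return sum;
}

/* Number of bits needed to represent a non-negative value. */
static int bits_needed(unsigned int v) {
  int n = 0;
  while (v != 0) {
    ++n;
    v >>= 1;
  }
  return n;
}
```

With 1024 coefficients all equal to -32768 the sum is 1024 * 32768 = 2^25,
which needs 26 bits, so a 16-bit return value would roll over.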

Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
2015-11-20 14:35:38 -08:00
paulwilkins
0149fb3d6b Changes to exhaustive motion search.
This change alters the nature and use of exhaustive motion search.

Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.

Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.

For example:
  stage 1: Range +/- 64 interval 4
  stage 2: Range +/- 32 interval 2
  stage 3: Range +/- 15 interval 1

This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.
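
The multi stage mesh described above can be sketched as follows. This is
a minimal illustration, not the encoder's implementation: Pos, MeshStage,
CostFn, mesh_search and sq_dist_cost are hypothetical names, the cost
function is a toy, and the preceding step search and distortion threshold
are left out.

```c
#include <assert.h>

typedef struct { int row, col; } Pos;
typedef struct { int range, interval; } MeshStage;
typedef unsigned int (*CostFn)(Pos p, const void *ctx);

/* Stage table mirroring the example in the commit message. */
static const MeshStage kExampleStages[3] = { { 64, 4 }, { 32, 2 }, { 15, 1 } };

/* Each stage scans a square grid around the best position found so far,
 * using that stage's range and interval. */
static Pos mesh_search(Pos start, const MeshStage *stages, int num_stages,
                       CostFn cost, const void *ctx) {
  Pos best = start;
  unsigned int best_cost = cost(start, ctx);
  for (int s = 0; s < num_stages; ++s) {
    const Pos center = best; /* fixed for the whole stage */
    const int range = stages[s].range;
    const int step = stages[s].interval;
    assert(range >= 0 && step > 0);
    for (int r = -range; r <= range; r += step) {
      for (int c = -range; c <= range; c += step) {
        const Pos cand = { center.row + r, center.col + c };
        const unsigned int cand_cost = cost(cand, ctx);
        if (cand_cost < best_cost) {
          best_cost = cand_cost;
          best = cand;
        }
      }
    }
  }
  return best;
}

/* Toy distortion: squared distance to a target position passed via ctx. */
static unsigned int sq_dist_cost(Pos p, const void *ctx) {
  const Pos *t = (const Pos *)ctx;
  const int dr = p.row - t->row;
  const int dc = p.col - t->col;
  return (unsigned int)(dr * dr + dc * dc);
}
```

Calling mesh_search with kExampleStages and a toy target narrows the
position stage by stage: the coarse grid lands near the target and the
final interval 1 stage refines it.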

This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained a bug (the two searches used different distortion
metrics).

For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.

Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 dB with only a small impact on encode
speed. On most clips though the quality gain and speed impact are small.

Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
2015-11-13 10:16:31 +00:00
Geza Lore
5eefd3ebfd Add AVX vectorized vp9_diamond_search_sad
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
  - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
  - For all i: mvsadcost[0][i] == mvsadcost[1][i]
        (equal per component cost)
  - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
        (Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
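
The three table properties listed above could be checked with a helper
along these lines. This is a sketch under assumptions:
cost_tables_allow_simd is a hypothetical name, and the component tables
are assumed to be passed as pointers to their centre entry so that
negative indices are valid.

```c
#include <assert.h>

/* Returns 1 when the properties hold, 0 otherwise.
 * joint: the four joint costs.
 * comp0, comp1: per-component cost tables, each pointing at index 0 of a
 *               table that covers [-max_mag, max_mag] (assumption). */
static int cost_tables_allow_simd(const int joint[4], const int *comp0,
                                  const int *comp1, int max_mag) {
  assert(max_mag >= 0);
  /* joint[1] == joint[2] == joint[3] */
  if (joint[1] != joint[2] || joint[2] != joint[3]) return 0;
  for (int i = -max_mag; i <= max_mag; ++i) {
    if (comp0[i] != comp1[i]) return 0;  /* equal per component cost */
    if (comp0[i] != comp0[-i]) return 0; /* cost function is even */
  }
  return 1;
}
```

A caller would fall back to the C implementation whenever the helper
returns 0.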

Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
2015-11-11 14:03:47 +00:00
James Zern
30466f26b4 Revert "Add AVX vectorized vp9_diamond_search_sad"
This reverts commit f1342a7b07.

This breaks 32-bit builds:
 runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment

+ _mm_set1_epi64x is incompatible with some versions of visual studio

Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
2015-11-06 13:15:01 -08:00
Yunqing Wang
57cae22c1e Merge "Add AVX vectorized vp9_diamond_search_sad" 2015-11-05 20:17:13 +00:00
Geza Lore
f1342a7b07 Add AVX vectorized vp9_diamond_search_sad
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
  - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
  - For all i: mvsadcost[0][i] == mvsadcost[1][i]
        (equal per component cost)
  - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
        (Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.

Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
2015-11-05 10:02:17 +00:00
Alex Converse
246e0eaa71 Deduplicate some high bit depth tables
Change-Id: I6977f7d155cc1e81ae2393933893caac6770821f
2015-11-03 15:40:44 -08:00
hui su
e085fb643f Generate intra prediction reference values only when necessary
This can help increase encoding speed substantially.

Change-Id: Id0c009146e6e74d9365add71c7b10b9a57a84676
2015-11-02 10:26:50 -08:00
Alex Converse
989193c797 Make the zero handling in extend_to_full_distribution more explicit.
The old workaround "p = 0 ? 0 : p -1" is misleading.

?: happens before =
assigning back to p truncates to one byte.

Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.
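
The precedence point can be reproduced in isolation. The sketch below
assumes p is an 8-bit unsigned value, as the message implies;
old_workaround and explicit_form are hypothetical names, not functions
from the codebase.

```c
#include <assert.h>
#include <stdint.h>

/* The quoted expression: ?: binds tighter than =, so this parses as
 * p = (0 ? 0 : p - 1). The condition is the constant 0, so the result is
 * always p - 1, truncated to one byte on assignment. */
static uint8_t old_workaround(uint8_t p) {
  p = 0 ? 0 : p - 1;
  return p;
}

/* The equivalent form stated in the message. */
static uint8_t explicit_form(uint8_t p) {
  return (uint8_t)((p - 1) & 0xFF);
}
```

Both functions agree for every 8-bit input, and p == 0 wraps to 255
rather than staying at 0.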

https://bugs.chromium.org/p/webm/issues/detail?id=1089

Change-Id: I587c44dd61c1f3767543c0126376f881889935af
2015-10-29 14:46:55 -07:00
Alex Converse
663960e757 Revert "Replace the zero handling in extend_to_full_distribution."
This reverts commit 7f56cb2978.

It causes uninitialized reads in the first pass setting up later cost tables.

Change-Id: I2df498df3f5c03eff359f79edf045aed0c618dc9
2015-10-28 11:51:40 -07:00
Alex Converse
7f56cb2978 Replace the zero handling in extend_to_full_distribution.
The old workaround "p = 0 ? 0 : p -1" is misleading.

?: happens before =
assigning back to p truncates to one byte.

Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.

https://code.google.com/p/webm/issues/detail?id=1089

Change-Id: Ia6dcc8922e1acbac0eeca23a4d564a355c489572
2015-10-26 11:29:46 -07:00
Geza Lore
aa8f85223b Optimize vp9_highbd_block_error_8bit assembly.
A new version of vp9_highbd_block_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive 3 operand format encoding of the 128bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.

Two further optimizations are applied:

1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case however is extremely rare in real use, so we can achieve a
large net gain here.
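
The speculative scheme in item 2 can be sketched in scalar C. This is an
illustration under assumptions, not the AVX assembly: the function names
are hypothetical, and overflow is tracked on every addition rather than
detected once at the end of the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Fast path: accumulate squared differences in 32 bits and record whether
 * the accumulator wrapped. Unsigned arithmetic makes the wrap well defined. */
static uint32_t block_error_fast32(const int16_t *coeff,
                                   const int16_t *dqcoeff, int n,
                                   int *overflowed) {
  uint32_t acc = 0;
  *overflowed = 0;
  for (int i = 0; i < n; ++i) {
    const uint32_t d = (uint32_t)(coeff[i] - dqcoeff[i]);
    const uint32_t sq = d * d; /* exact: |diff| <= 65535, so diff^2 < 2^32 */
    const uint32_t next = acc + sq;
    if (next < acc) *overflowed = 1; /* the sum wrapped past 2^32 */
    acc = next;
  }
  return acc;
}

/* Slow path: the same sum in 64-bit arithmetic. */
static int64_t block_error_wide(const int16_t *coeff, const int16_t *dqcoeff,
                                int n) {
  int64_t acc = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t d = (int64_t)coeff[i] - dqcoeff[i];
    acc += d * d;
  }
  return acc;
}

/* Try the 32-bit path first; redo the whole computation only on overflow. */
static int64_t block_error(const int16_t *coeff, const int16_t *dqcoeff,
                           int n) {
  int overflowed;
  const uint32_t fast = block_error_fast32(coeff, dqcoeff, n, &overflowed);
  return overflowed ? block_error_wide(coeff, dqcoeff, n) : (int64_t)fast;
}
```

Since the wide path only runs when the flag is set, typical inputs pay
only for the 32-bit loop.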

The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.

Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
2015-10-21 12:30:40 +01:00
Yaowu Xu
568429512e Add a new enum type vpx_color_range_t
to make meaning of color_range obvious.

Change-Id: I303582e448b82b3203b497e27b22601cc718dfff
2015-10-16 16:27:18 -07:00
Geza Lore
0134764fa6 Optimization of 8bit block error for high bitdepth
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.

Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
2015-10-08 14:05:25 -07:00
Alex Converse
2f7f482c77 vp9: simplify extrabits encoding
Change-Id: I5a2abd35cb303d8f6354b3119ab95acf90405116
2015-10-06 16:26:08 -07:00
Debargha Mukherjee
cb5c47f20d Merge "Accelerated transform in high bit depth" 2015-10-02 06:55:55 +00:00
hui su
06bdc7f6db Small cleanup
Change-Id: I5aeaa94b743f84738d288f8b027fec4c164f2ec3
2015-10-01 11:19:13 -07:00
Scott LaVarnway
2f8625d824 VP9: remove plane_type from macroblockd_plane
Change-Id: Ia5072a3a92212d8565f33359f6c146469bdfbbec
2015-09-30 15:15:11 -07:00
Scott LaVarnway
13888e0eef Merge "VP9: remove plane_type checks in loopfilter functions" 2015-09-30 22:11:21 +00:00
James Zern
a18cc591a5 vp9_loopfilter: remove unnecessary masks
Change-Id: I264e75bf3ddd083ee5311c50a37fb18fe634ddc3
2015-09-30 12:12:53 -07:00
James Zern
a1914dbb31 vp9_reset_lfm: harmonize function signature
Change-Id: Ifb0f41fb43564a777be29b4c66443b366fa146a3
2015-09-29 20:46:37 -07:00
Scott LaVarnway
18373264d9 VP9: remove plane_type checks in loopfilter functions
vp9_filter_block_plane_ss11() and vp9_filter_block_plane_non420()
are only called for the uv planes.

Change-Id: Iacd3b3242c8ce581edd37c8f06d95efc8a0f88a3
2015-09-29 15:54:33 -07:00
Scott LaVarnway
66de2b710f Merge "VP9: move loopfilter build masks to decode loop" 2015-09-29 21:40:48 +00:00
Yaowu Xu
45948a03c0 Fix a macro definition
to be consistent with the header file name.

Change-Id: I9634332a2b3fac7e7f3b7ef58821ea7c81c5c813
2015-09-29 09:34:42 -07:00
Scott LaVarnway
7718117104 VP9: move loopfilter build masks to decode loop
The loopfilter masks are now built in the decode loop.
This is done so we can eventually reduce the number of
MODE_INFO structs required by the decoder.

The encoder builds the masks for the entire frame prior
to calling the loopfilter.

Change-Id: Ia2146b07e0acb8c50203e586dfae0c4c5b316f11
2015-09-29 05:20:49 -07:00
Julia Robson
406030d1b0 Accelerated transform in high bit depth
When configured with high bitdepth enabled, the 8bit transform
stopped using optimised code. This made 8bit content decode slowly.

Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
2015-09-28 21:09:16 -07:00
Ronald S. Bultje
36ffe64498 Rename display_{size,width,height} to render_*.
The name "display_*" (or "d_*") is used for non-compatible information
(that is, the cropped frame dimensions in pixels, as opposed to the
intended screen rendering surface size). Therefore, continuing to use
display_* would be confusing to end users. Instead, rename the field
to render_*, so that struct vpx_image can include it.

Change-Id: Iab8d2eae96492b71c4ea60c4bce8121cb2a1fe2d
2015-09-25 21:34:29 -04:00
Scott LaVarnway
5404978825 VP9: Remove frame_parallel_decoding_mode from macroblockd
Not used.

Change-Id: I71527d0ee43a5730f1a2527e7ab687a77a137db4
2015-09-23 16:06:46 -07:00
James Zern
9d8decc162 Merge changes from topic 'tile-thread-cleanup'
* changes:
  vp9/decode_tiles_mt: move frame count accum from loop
  VP9Decoder: remove duplicate tile_worker_info
  vp9/decode_tiles_mt: move some inits from inner loop
  vp9_accumulate_frame_counts: pass counts directly
2015-09-17 22:00:23 +00:00
Ronald S. Bultje
eeb5ef0a24 Add support for color-range.
In decoder, export (eventually) into vpx_image_t.range field. In
encoder, use oxcf->color_range to set it (same way as for
color_space).

See issue 1059.

Change-Id: Ieabbb2a785fa58cc4044bd54eee66f328f3906ce
2015-09-16 06:41:46 -04:00
James Zern
b09aa3ac54 vp9: add extern "C" to headers
Change-Id: I1b6927ad820f99340985b094d415aaab14defaf4
2015-09-09 23:15:59 -07:00
Jingning Han
42b0560319 Fix the sub8x8 block inter prediction with scaled reference frame
Sync the encoder's buffer offset calculation for sub8x8 block motion
compensated prediction with scaled reference frame to match the
decoder's behavior. This resolves an enc/dec mismatch issue when
sub8x8 inter mode with scaled reference frames is turned on.

Change-Id: I4bab3672b007a5ae0c992f8a701341892d2458b0
2015-09-08 11:09:30 -07:00
James Zern
0548046ae3 vp9_accumulate_frame_counts: pass counts directly
Change-Id: Ic3c6cfba5b1867c335f2834da936e20caec8597a
2015-09-04 19:47:33 -07:00
Johann
c5f11912ae Include vpx_dsp_common.h when using VPXMIN/MAX
Change-Id: I2e387a06484a06301f3cd6600c4ba2f4335b61ee
2015-08-31 14:36:35 -07:00
James Zern
5e16d397bd vpx_dsp_common: add VPX prefix to MIN/MAX
prevents redeclaration warnings;
vp8 has its own define which will be resolved in a future commit

Change-Id: Ic941fef3dd4262fcdce48b73075fe6b375f11c9c
2015-08-26 20:11:32 -07:00
Jingning Han
89af744ba6 Change vp9_ prefix function names in vpx_scale to vpx_
Change-Id: Iac85902cbbb3e752801dc85de9a3c778e47304aa
2015-08-14 15:27:43 -07:00
hui su
088b05fd99 Use sizeof(variable) instead of sizeof(type)
Change-Id: Ia069da11eebb271063e9eb837bdb3e7175ecce13
2015-08-12 11:25:38 -07:00
Scott LaVarnway
4ef08dcec8 Merge "VPX: Add rtcd support for scaling." 2015-08-11 13:19:00 +00:00
Alex Converse
a8a08ce57e Move vp9_systemdependent.h to vpx_ports bitops.h and system_state.h
Use system_state.h in vpx_dsp and remove unneeded includes of
vp9_systemdependent.h.

Change-Id: I92557ec6dd5aa790160b4f31fe7967db0d7ec3c4
2015-08-10 15:37:14 -07:00
Jingning Han
244912d506 Make build_inter_predictors static function
Remove the function declaration from vp9_reconinter.h file.

Change-Id: I193562151b69ece19b9ee2efa1a791fe2522cca0
2015-08-10 15:51:13 +00:00
Alex Converse
f2e44aa664 Move the msvc round() replacement to msvc.h
Change-Id: If470411c3c62a27f52261f4ece2c5054b71789c7
2015-08-07 18:27:48 -07:00
Alex Converse
610e258cc5 Make the round() replacement match C99 and POSIX.
http://pubs.opengroup.org/onlinepubs/009695399/functions/round.html
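
C99 and POSIX specify that round() rounds halfway cases away from zero.
The sketch below illustrates that rule only; round_half_away is a
hypothetical helper, not the replacement from this commit, and it is
valid only when the shifted value fits in a long long. It also ignores
floating-point edge cases just below a half.

```c
/* Round to nearest, halfway cases away from zero.
 * The cast truncates toward zero, so adding a half with the sign of x
 * first gives the C99 behaviour for ordinary values.
 * Assumption: x +/- 0.5 fits in long long. */
static double round_half_away(double x) {
  return (double)(long long)(x < 0.0 ? x - 0.5 : x + 0.5);
}
```

For example, 2.5 rounds to 3.0 and -2.5 rounds to -3.0, where a plain
floor(x + 0.5) would give -2.0 for the negative case.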

Change-Id: Idf387d944d36bf593f8797db9053e11e5c9b9b39
2015-08-07 18:24:21 -07:00
Jingning Han
a9aa29d901 Merge "Add static syntax to copy_mem64x64" 2015-08-07 21:41:32 +00:00
Jingning Han
1057ee4847 Add static syntax to copy_mem64x64
Change-Id: Iee4c853ea4a44ae9f5de60c09e5a7b810f15d2dd
2015-08-07 10:16:27 -07:00
Aℓex Converse
eaa8043a31 Merge "Move VP9 SSIM metrics to vpx_dsp." 2015-08-07 16:43:28 +00:00
Alex Converse
c7b7011b9b Move VP9 SSIM metrics to vpx_dsp.
Change-Id: I20c7b42631b579fade6cf7ebf6d4c69b2fcb5e5e
2015-08-06 18:25:25 -07:00
Jingning Han
b4f2c567c8 Cosmetic - align format in vp9
Change-Id: I83ed3422f1f4009675ad2f5c4b7236bc7b83b30e
2015-08-06 15:56:11 -07:00
Jingning Han
3ad75fc623 Merge "Replace vp9_ prefix with vpx_ prefix in vpx_dsp function names" 2015-08-04 22:30:36 +00:00
Jingning Han
08a453b9de Replace vp9_ prefix with vpx_ prefix in vpx_dsp function names
This commit clears the function naming convention in vpx_dsp. It
replaces vp9_ prefix of global functions with vpx_ prefix. It also
removes the vp9_ prefix from static functions.

Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf
2015-08-04 13:46:11 -07:00
Jingning Han
457a87d986 Merge "Move inverse transform dspr2 functions from vp9 to vpx_dsp" 2015-08-04 04:16:22 +00:00
James Zern
a0fd7a9831 Merge "add vp9_vector_var_neon" 2015-08-04 02:30:41 +00:00
Jingning Han
bfad9d2fe6 Move inverse transform dspr2 functions from vp9 to vpx_dsp
Change-Id: Ia9cf7c31cab4ba3dd6b9bb668c4b3e84bd55cf69
2015-08-03 11:59:50 -07:00
Jingning Han
92b08f516a Add common_dspr2.c file to vpx_dsp/mips
Move the declaration of commonly referenced variable to
vpx_dsp/mips/common_dspr2.c.

Change-Id: Ia51287b02e2ac5cfae0fba98c721f0810618f28e
2015-08-03 10:53:47 -07:00
Scott LaVarnway
8f6b943100 VPX: Add rtcd support for scaling.
Change-Id: If34bfb0d918967445aea7dc30cd7b55ebfedb1f2
2015-08-03 09:43:34 -07:00
Jingning Han
0b0eba728d Add _dspr2 to local function names
It avoids symbol conflicts between function names of various
implementation versions.

Change-Id: Iad79ebcb8e289457801812a7745c8380b5b06a46
2015-08-02 20:21:59 -07:00
Jingning Han
44849516d4 Factor out mips/msa inverse transform implementations
Move mips/msa inverse transform implementations from vp9 folder to
vpx_dsp.

Change-Id: Ic4cf3f05247c3c63db7b532a0e5000017a962391
2015-08-01 09:25:12 -07:00
Jingning Han
b37494cfb5 Merge "Use precise header files in inverse transform msa implementations" 2015-08-01 16:20:43 +00:00
James Zern
7dc5a689b4 add vp9_vector_var_neon
~50-60% faster depending on the width

Change-Id: I9d007cfa10b9aaa2169c8c009d95522df6123a92
2015-07-31 17:31:58 -07:00
Jingning Han
56c2cb7553 Use precise header files in inverse transform msa implementations
Change-Id: Ie8a79d9e2837842c3f60776b661cd42782b108d5
2015-07-31 23:24:54 +00:00
Jingning Han
e8b133c79c Factor inverse transform functions into vpx_dsp
This commit moves the module inverse transform functions from vp9
to vpx_dsp folder. The hybrid transform wrapper functions stay in
the vp9 folder, since it involves codec-specific data structures.

Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
2015-07-31 16:21:00 -07:00
Zoe Liu
7cfdc00337 Refactor mips/dspr2 on convolution.
Change-Id: If59a39d5a92c261537342726f94bb7f7f26dfff3
2015-07-31 10:27:42 -07:00
Zoe Liu
7186a2dd86 Code refactor on InterpKernel
It in essence refactors the code for both the interpolation
filtering and the convolution. This change includes the moving
of all the files as well as the changing of the code from vp9_
prefix to vpx_ prefix accordingly, for underneath architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/dspr2 will be done in a separate change list.

Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
2015-07-31 10:27:33 -07:00
James Zern
f42012e526 Merge "add vp9_block_error_fp_neon" 2015-07-29 00:47:09 +00:00
Hui Su
4cbf36b105 Merge "Replace prefix vp9_ with vpx_ for intra prediction functions" 2015-07-29 00:38:48 +00:00
Jingning Han
fc18cf7a11 Merge "Move DC only forward 2D-DCT functions to vpx_dsp" 2015-07-29 00:06:37 +00:00
Aℓex Converse
08d5cf226e Merge "Remove branch in inner loop of foreach_transformed_block_in_plane()" 2015-07-28 21:59:33 +00:00
Jingning Han
d19033fa4e Move DC only forward 2D-DCT functions to vpx_dsp
This completes the forward transform functions layout refactoring.

Change-Id: I996fb0fb795f41e2040f7b21db985774098aedbd
2015-07-28 14:52:30 -07:00
Hui Su
fe7cabe8b6 Merge "Move intra prediction functions from vp9/common/ to vpx_dsp/" 2015-07-28 20:41:01 +00:00
Jingning Han
a73f0f4170 Merge "Factor 32x32 fwd DCT to vpx_dsp folder" 2015-07-28 20:36:59 +00:00
Jingning Han
a6a4659bea Factor 32x32 fwd DCT to vpx_dsp folder
Move the 32x32 2D-DCT implementations from vp9/ to vpx_dsp/.

Change-Id: Id3980696f8b69906ff7a59ff9fb2b9013d60047d
2015-07-28 11:13:41 -07:00
Frank Galligan
b1fb6e0365 Fix dspr2 build.
Change-Id: I18895c29d6db872d033b3874de9dcd9501d0c10e
2015-07-28 09:05:41 -07:00
James Zern
ea990af7f5 add vp9_block_error_fp_neon
~60-70% faster depending on the block size

Change-Id: Icdbaa9977a91a63cbcc6ead0cf19d5a2af7f27e1
2015-07-27 19:59:50 -07:00
hui su
4013645353 Replace prefix vp9_ with vpx_ for intra prediction functions
Change-Id: I8ae6fb586f8d5d018ace228df11714f82b085076
2015-07-27 13:42:06 -07:00
hui su
7971846a5e Move intra prediction functions from vp9/common/ to vpx_dsp/
Change-Id: I64edc26cf4aab050c83f2d393df6250628ad43b8
2015-07-27 13:38:16 -07:00
Jingning Han
5f214d6bca Use common coefficient definition in neon idct implementations
Replace the duplicate coefficient definition in neon implementations
of inverse transform with those from vpx_dsp/txfm_common.h

Change-Id: I4cd9bd9569ab1793dfdbb6f16d80bcb581599f0d
2015-07-27 12:12:31 -07:00
Jingning Han
a9a1d4e8e5 Replace vp9_idct.h for precise dependency
This commit replaces vp9_idct.h with txfm_common.h in many SIMD
implementation files for precise file dependency.

Change-Id: If73dd726bb16537e7494f28538b0a169810f9756
2015-07-27 11:55:31 -07:00
Jingning Han
5ebc8febdc Refactor vp9_idct.h file
Separate the common coefficient constant into vpx_dsp/txfm_common.h.
Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
This removes the use of vp9_idct.h in the vpx_dsp folder.

Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
2015-07-26 08:26:32 -07:00
Alex Converse
742021f026 Remove branch in inner loop of foreach_transformed_block_in_plane()
Change-Id: Ib14d09376a9ce4fa5f541264e5c335aceb71380a
2015-07-24 11:14:33 -07:00
Jingning Han
d341f843e2 Refactor forward/inverse transform msa implementations
This commit factors out common macro definitions from the forward
and inverse transform implementations into vpx_dsp. It removes
the duplicate macro definitions from encoder and decoder folders.

Change-Id: I92301acbd3317075e9c5f03328a25abb123bca78
2015-07-23 11:20:30 -07:00
Jingning Han
b67821f37b Factor forward 2D-DCT transforms into vpx_dsp
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward
transform operations into vpx_dsp folder.

Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
2015-07-22 15:48:17 -07:00
Yaowu Xu
70ad668056 vpx_dsp/prob.h: vp9_ -> vpx_
change prefix vp9_ to vpx_ for non codec specific functions and data
structures.

Change-Id: I97c7e6422eceea99212b93f4942bc2187763a07c
2015-07-20 18:13:04 -07:00
Yaowu Xu
bf82514b54 vpx_dsp/bitreader.h: vp9_->vpx_
Replace vp9_ in names with vpx_ as they are not codec specific.

Change-Id: I2e583aa63dee769353ada4b42417aa15c4074ebb
2015-07-20 18:06:31 -07:00
Jingning Han
e253eaa036 Unify the high bit-depth forward hybrid transforms
The SSE2 version high bit-depth forward hybrid transforms are
essentially using the C functions via cross referencing to 1-D
functions in vp9_dct.c. This commit unifies the two versions and
removes the unnecessary dependency.

Change-Id: Ib4d0702a138f8daf7d0bd97c141ee7088f293765
2015-07-20 11:17:49 -07:00
Yaowu Xu
345ff1a2f2 Merge "Removed vp9_ prefix from vpx_dsp/bitreader file names" 2015-07-20 17:12:08 +00:00
Yunqing Wang
f65473c036 Merge "Migrate quantization functions from vp9/ to vpx_dsp/" 2015-07-20 16:20:07 +00:00
Yaowu Xu
87d2c3c063 Removed vp9_ prefix from vpx_dsp/bitreader file names
Change-Id: I0426126d0a65f13f9250983e44cc366b1b1a9c4a
2015-07-20 08:57:35 -07:00
Yaowu Xu
b0e6811ace Merge "Move bit reader files to vpx_dsp" 2015-07-20 14:52:50 +00:00
Yunqing Wang
38f1fbbb75 Migrate quantization functions from vp9/ to vpx_dsp/
The following quantization functions were moved:
vp9_quantize_b
vp9_quantize_b_32x32
vp9_highbd_quantize_b
vp9_highbd_quantize_b_32x32

vp9_quantize_dc
vp9_quantize_dc_32x32
vp9_highbd_quantize_dc
vp9_highbd_quantize_dc_32x32

The purpose of doing that was to allow these functions to be shared
by multiple codecs.

Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
2015-07-17 16:38:14 -07:00
Jingning Han
2992739b5d Rename loop filter function from vp9_ to vpx_
Change-Id: I6f424bb8daec26bf8482b5d75dd9b0e45c11a665
2015-07-17 15:55:02 -07:00
Yaowu Xu
97279ed2e2 Move bit reader files to vpx_dsp
Change-Id: Ib1cb1fbe92a39ff5312cee069559be6d3ea458d0
2015-07-17 15:38:40 -07:00
Jingning Han
4735edd00f Migrate mips dspr2 loop filter implementation from vp9 to vpx
This commit moves the loop filter dspr2 implementation from vp9 to
vpx_dsp directory. It also fixes header file format issues.

Change-Id: I09203ed4bd267d7fd76bb79a6ee84a37646206b2
2015-07-17 11:51:05 -07:00
Jingning Han
d0750d287f Resolve dspr2 loop filter dependency complexity
Narrow the scope of dependency required by the dspr2 implementation
of loop filters.

Change-Id: Ib8d99dc7d9c231f69dd31d02e0a89e5bd0545a28
2015-07-17 10:38:35 -07:00
Jingning Han
55e80a3cc6 Replace vp9_common_dspr2.h with common_dspr2.h
Narrow the scope of dependency in dspr2 loop filter implementation.

Change-Id: I30426d7e4d41575a82286f1d3c5881aeb99a3250
2015-07-17 10:31:38 -07:00
Jingning Han
b8ff84b7f8 Create common dspr2 header file in vpx_dsp
Move the common prefetch_load/store in dspr2 to header file in
vpx_dsp/mips.

Change-Id: I8acc22970f2a0ef97d73061e39a3ae65c6955eac
2015-07-17 09:54:02 -07:00
Jingning Han
3590a4b437 Merge "Simplify dependencies in dspr2 related codes" 2015-07-17 16:12:52 +00:00
Jingning Han
845aad42b8 Merge "Migrate loop filter functions from vp9/ to vpx_dsp/" 2015-07-17 16:12:01 +00:00
Jingning Han
d190ad228f Simplify dependencies in dspr2 related codes
The common_dspr2.h should be independent of codec-specific data
structures.

Change-Id: I34ee1f9552c2d2d205fd7f1813cdf312c7ff5d2b
2015-07-16 18:22:48 -07:00
Jingning Han
50adfdf5ba Migrate loop filter functions from vp9/ to vpx_dsp/
The various tap loop filter operations are common across codecs.
This commit moves them along with SIMD optimizations to the vpx_dsp
folder.

Change-Id: Ia5fa0b2e5289cdb98467502a549c380b9c60e92c
2015-07-16 16:40:47 -07:00
Frank Galligan
8be1dcb4cb Merge "Add vp9_int_pro_col_neon." 2015-07-16 05:45:17 +00:00
Jingning Han
db8e731b8d Add vpx_dsp_common.h file
Move the clamp functions to vpx_dsp_common.h file. Clear out the
dependency of vp9_loopfilter_filters.c on vp9_common.h file.

Change-Id: I9c4b928bcd7f597106b5aa96354356d3775a3431
2015-07-15 13:03:23 -07:00
Jingning Han
3fe83cdf81 Remove redundant header files in vp9_loopfilter_filters.c
This cleans out the unnecessary dependency on vp9 codec-specific
data structures.

Change-Id: Iadbe431174a0f9bf9423f39ab854fc18be554bea
2015-07-15 12:44:47 -07:00
Frank Galligan
1c39998e39 Add vp9_int_pro_col_neon.
BUG=https://code.google.com/p/webm/issues/detail?id=1023

Change-Id: I212a1d67b23ce3b5ce08800de369b25b9e375e7d
2015-07-15 09:04:28 -07:00
Alex Converse
fa94dbda81 Merge "Add an SSE2 version of vp9_iwht4x4_16_add" 2015-07-14 22:11:47 +00:00
Alex Converse
d8426d6f12 Add an SSE2 version of vp9_iwht4x4_16_add
Roughly half as many cycles as plain C.

Change-Id: I8c16c29940b76d54ee7e4fb874c328ce90bff5d4
2015-07-14 14:23:32 -07:00
Jingning Han
81452cf0b7 Refactor intra block prediction function
This commit simplifies the intra block boundary condition logic.
It removes the block index from the argument set.

Change-Id: If00142512eb88992613d6609356dfd73ba390138
2015-07-13 15:20:47 -07:00
Yaowu Xu
0cdc85d8cf Merge "Revert "Add an SSE2 version of vp9_iwht4x4_16_add."" 2015-07-13 16:27:10 +00:00
Yaowu Xu
ae5394b9e2 Revert "Add an SSE2 version of vp9_iwht4x4_16_add."
This reverts commit f8d3501640.

Change-Id: If8c7af403c091b7fb447a6f0c73fecdbccbc51b3
2015-07-13 16:26:27 +00:00
Yaowu Xu
f70c80289c Merge "Clean out more MSVC warnings" 2015-07-09 17:49:08 +00:00
Scott LaVarnway
e8103f3676 Merge "Eliminate num_8x8 and num_4x4 width/height lookups" 2015-07-09 17:16:22 +00:00
Scott LaVarnway
13a4f14710 Eliminate num_8x8 and num_4x4 width/height lookups
Also some log2 lookups.

Pass in 8x8 block width/height and log2 num4x4s instead.

Change-Id: I8ea9a1ec1e0bbab23f8ba556954a1b5433f4d613
2015-07-09 05:30:46 -07:00
Yaowu Xu
c369daf3ea Clean out more MSVC warnings
Change-Id: I1bab0c104df2ec4825d050cd516e26ab635a7b3e
2015-07-08 15:09:20 -07:00
Alex Converse
f8d3501640 Add an SSE2 version of vp9_iwht4x4_16_add.
80% fewer cycles than C

Change-Id: I841bde1e268ddd33ae2ee75eee94737a400e2cde
2015-07-08 15:00:51 -07:00
Alex Converse
8bf791e7ef Merge "Don't allocate dqcoeff in MACROBLOCKD." 2015-07-08 20:42:36 +00:00
Alex Converse
89090d8046 Don't allocate dqcoeff in MACROBLOCKD.
The encoder gets its dqcoeff from the context tree. In the decoder move
it to directly after MACROBLOCKD.

Change-Id: I46c9b76f26956a360d17de0b26ecb994dae34ecb
2015-07-08 12:37:55 -07:00
Frank Galligan
b770def572 Merge "VP9_LPF_VERTICAL_16_DUAL_SSE2 optimization" 2015-07-08 18:15:39 +00:00
James Zern
892128f6ca Merge "vp9_entropymv: remove vp9_get_mv_mag()" 2015-07-08 01:27:13 +00:00
Frank Galligan
5327fcf857 Merge "Add vp9_int_pro_row_neon." 2015-07-08 00:16:03 +00:00
Johann
6a82f0d7fb Move sub pixel variance to vpx_dsp
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
2015-07-07 15:51:04 -07:00
Jingning Han
a652048efd Add vp9_ prefix to init_macroblockd
Change-Id: I202d4924e627eec94838741df004ed9259d38b88
2015-07-07 12:00:01 -07:00
Jingning Han
cccad1c5de Reduce dqcoeff array size in decoder
The decoding process handles detokenization and reconstruction per
transform block sequentially. There is no need to offset the dqcoeff
buffer according to the transform block index. This reduces memory
spill and improves cache performance.

Change-Id: Ibb8bfe532a7a08fcabaf6d42cbec1e986901d32d
2015-07-07 11:36:05 -07:00
James Zern
c6d90f0535 vp9_entropymv: remove vp9_get_mv_mag()
inline the code directly in read_mv_component(), the only place where it
was being used; this removes a function call in a hot function

Change-Id: I66f99c0c9ce3bc310101dbca4a470f023cc6fb55
2015-07-06 22:30:21 -07:00
James Zern
1696114587 Merge "mips msa vp9 subpel variance optimization" 2015-07-06 22:43:01 +00:00
Jingning Han
fcb5a8692a Merge "Move subtract functions from vp9 to vpx_dsp" 2015-07-06 22:39:26 +00:00
Parag Salasakar
fbe67d307a mips msa vp9 subpel variance optimization
Change-Id: If88401bf8c5d8ee58200278734d7a5058d1585d0
2015-07-06 14:59:01 -07:00
James Zern
91c412b6db Merge "remove vp9_get_interp_kernel()" 2015-07-06 21:36:37 +00:00
James Zern
017253b7a3 remove vp9_get_interp_kernel()
expose filter_kernels[] and do the table lookup directly

Change-Id: I0b10bff0327c3e01a723736141a9ffd377cd3d20
2015-07-06 13:04:05 -07:00
Jingning Han
432cd4bfb7 Move subtract functions from vp9 to vpx_dsp
Factor out the subtraction operator as common function.

Change-Id: I526e703477c6a290e0e3e3c8898f8bb1ca82779b
2015-07-06 12:22:47 -07:00
Jingning Han
39f03bf9c6 Merge "Rename vpx_thread to vpx_util" 2015-07-06 17:01:30 +00:00
James Zern
3d4526322b Merge "Revert "mips msa vp9 subpel variance optimization"" 2015-07-02 21:07:32 +00:00
James Zern
4c5ac477cb Merge "Revert "mips msa vp9 avg subpel variance optimization"" 2015-07-02 21:07:24 +00:00
James Zern
97946622c0 Revert "mips msa vp9 subpel variance optimization"
This reverts commit a42df86c03.

this change causes MSA/VP9SubpelVarianceTest.Ref and
MSA/VP9SubpelVarianceTest.ExtremeRef failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu

Change-Id: I40b71a0b774eaeb31f66f795733f95cf360909f7
2015-07-02 12:06:51 -07:00
James Zern
ced982640b Revert "mips msa vp9 avg subpel variance optimization"
This reverts commit 61774ad1c4.

this change causes MSA/VP9SubpelAvgVarianceTest.Ref failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu

Change-Id: I7fb520c12b2a3b212d5e84b7619a380a48e49bb0
2015-07-02 12:06:29 -07:00
levytamar82
3c5256d572 VP9_LPF_VERTICAL_16_DUAL_SSE2 optimization
Optimize the vp9_lpf_vertical_16_dual function for the x86 32-bit
target. The hot code in that function was the call to transpose8x16.
The gcc-generated assembly created unneeded fills and spills to the
stack. By interleaving 2 loads and unpack instructions, and by hoisting
the consumer instructions closer to the producer instructions, we
eliminated most of the fills and spills and improved the function-level
performance by 17%.
Credit for writing the function as well as finding the root cause goes
to Erik Niemeyer (erik.a.niemeyer@intel.com)

Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46
2015-07-02 11:56:11 -07:00
Jingning Han
d1b30ceaa3 Rename vpx_thread to vpx_util
Rename the directory so that it can hold additional utility tools.

Change-Id: Id5b16062803ce5eed872fe2edb36d7e56b32eed8
2015-07-02 10:02:37 -07:00
Jingning Han
8565a1c99a Merge "Use vpx prefix for codec independent threading functions" 2015-07-02 04:24:54 +00:00
Jingning Han
66cf8098e6 Merge "Move multi-threading module functions into vpx_thread folder" 2015-07-02 04:24:37 +00:00
James Zern
e757808429 Merge "vp9_pred_common: inline vp9_get_tx_size_context" 2015-07-02 01:52:40 +00:00
James Zern
0ea304620c Merge "vp9_pred_common: inline vp9_get_segment_id" 2015-07-02 01:52:21 +00:00
Jingning Han
04d2e57425 Use vpx prefix for codec independent threading functions
Replace vp9_ prefix with vpx_ for common multi-threading functions.

Change-Id: I941a5ead9bfe8213fdad345511d2061b07797b55
2015-07-02 00:47:54 +00:00
Jingning Han
3a3b0be09a Move multi-threading module functions into vpx_thread folder
This commit moves the primitive multi-threading files from the vp9
folder to vpx_thread, where they are accessible to all vpx codecs.

Change-Id: Ib51e66e9c69801c10631fab56d35a0c0aaed5883
2015-07-01 17:45:49 -07:00
Johann
79fcc56781 Merge "Fix --disable-use-x86inc when used with --enable-vp9-highbitdepth" 2015-07-01 21:14:41 +00:00
Johann
8d5389171f Merge "Fix --disable-use-x86inc" 2015-07-01 21:14:17 +00:00
Johann
1c967f17bd Fix --disable-use-x86inc when used with --enable-vp9-highbitdepth
Change-Id: I0ed6de72dc0bb99fc9c5b1f6500399b16754ffb3
2015-07-01 13:17:01 -07:00
Johann
ff8505a54d Fix --disable-use-x86inc
Change-Id: I374fcd8fb45a6893dcdeac6896671be142a99f06
2015-07-01 13:15:51 -07:00
James Zern
4f7e7c4d49 Merge "mips msa vp9 avg subpel variance optimization" 2015-07-01 20:05:50 +00:00
Scott LaVarnway
dc6d954bd2 Merge "Move inter_predictor to vp9_reconinter.h" 2015-07-01 20:01:53 +00:00
Scott LaVarnway
d157742788 Merge "VP9: Move ref_mvs[][] and mode_context[] from MB_MODE_INFO" 2015-07-01 12:52:21 +00:00
Parag Salasakar
61774ad1c4 mips msa vp9 avg subpel variance optimization
average improvement ~3x-5x

Change-Id: Iefbcafc05daab77b38a4e63b551e427867a501a4
2015-07-01 13:46:41 +05:30
Parag Salasakar
a42df86c03 mips msa vp9 subpel variance optimization
average improvement ~3x-5x

Change-Id: I4cbba2711467b0e205904769ebbb4a1fcbb1a311
2015-07-01 07:51:34 +05:30
James Zern
fc5f3b8f4f Merge "vp9_common_data: right-size tables" 2015-06-30 21:12:54 +00:00
Parag Salasakar
fc3c456053 Merge "mips msa vp9 common macro comments updated" 2015-06-30 06:25:31 +00:00
Scott LaVarnway
c06d56cc7d VP9: Move ref_mvs[][] and mode_context[] from MB_MODE_INFO
to MB_MODE_INFO_EXT.  This saves 36 bytes per 8x8 area for
both the decoder and encoder. (The encoder has two MODE_INFO
buffers.)

Change-Id: If006abb2224acaf326df3c2be09e77e967662107
2015-06-29 12:46:47 -07:00
Scott LaVarnway
437d033dbb Merge "Remove tile param" 2015-06-29 18:04:56 +00:00
Parag Salasakar
3c353e58c0 mips msa vp9 common macro comments updated
Cosmetic/Grammatical corrections in vp9 macro comments

Change-Id: I774b983aff854feb69c7e4442e8731ce4c995645
2015-06-29 11:52:28 +05:30
Parag Salasakar
b92cc27b76 mips msa vp9 temporal filter optimization
average improvement ~4x-5x

Change-Id: Iad9c0a296dbc2ea96d000bd009077999ed58a3c5
2015-06-26 12:00:24 +05:30
Parag Salasakar
c040f96e4b mips msa vp9 subtract block optimization
average improvement ~3x-4x

Change-Id: Idbe4d13a00d05ff8be6559b116f416e42c3b4097
2015-06-26 09:23:56 +05:30
Parag Salasakar
d017f5ba38 Merge "mips msa vp9 block error optimization" 2015-06-26 03:42:31 +00:00
Parag Salasakar
1543f2b60e mips msa vp9 block error optimization
average improvement ~3x-4x

Change-Id: If0fdcc34b17437a7e3e7fb4caaf1067bc175f291
2015-06-26 09:04:00 +05:30
James Zern
28a8226350 vp9_common_data: right-size tables
Change-Id: I2206ee148a46b234df58f2b623e9f32f26033e04
2015-06-25 20:20:40 -07:00
James Zern
d219f2b9d2 Merge "vp9_reconintra_neon: add d45 16x16" 2015-06-24 21:23:15 +00:00
Frank Galligan
944ad6cac9 Add vp9_int_pro_row_neon.
BUG=https://code.google.com/p/webm/issues/detail?id=1022

Change-Id: I510c3b0a70158fa2e4da554f7c5d7558021a6ddf
2015-06-23 11:53:49 -07:00
James Zern
9db1f24c47 vp9_reconintra_neon: add d45 16x16
~90% faster over 20M pixels

Change-Id: I92d80f66e91e0a870a672cfb5dd29bf1a17cb11a
2015-06-22 21:00:07 -07:00
Parag Salasakar
7555e2b822 mips msa vp9 avg optimization
average improvement ~2x-3x

Change-Id: I76f7fc00c0ffdf2b4ba41bf3819f3b6044bcdeff
2015-06-23 07:32:25 +05:30
Parag Salasakar
7b71cdb0b4 Merge "mips msa vp9 fdct 4x4 optimization" 2015-06-23 01:46:54 +00:00
James Zern
c8b9658ecc Merge "vp9_reconintra_neon: add d45 8x8" 2015-06-22 22:27:57 +00:00
Scott LaVarnway
86f4a3d8af Remove tile param
and add it to MACROBLOCKD.

Change-Id: I0e60aaa9f84bcc9f2376d71bd934f251baee38db
2015-06-22 06:09:38 -07:00
Parag Salasakar
bc94999148 mips msa vp9 fdct 4x4 optimization
average improvement ~2x-3x

Change-Id: Idf8be780b8b4228fc91f110a94e4ee1fd9af0163
2015-06-22 14:30:24 +05:30
Parag Salasakar
b6131a733d Merge "mips msa vp9 fdct 8x8 optimization" 2015-06-20 02:58:10 +00:00
James Zern
12c6688e31 vp9_reconintra_neon: add d45 8x8
based on ssse3 implementation

~91% faster over 20M pixels

Change-Id: I6d743a53352c2d6de0efe7899d7996e8b0f7fa29
2015-06-19 19:19:22 -07:00
Parag Salasakar
7ca84888c2 mips msa vp9 fdct 8x8 optimization
average improvement ~4x-5x

Change-Id: I37582efc2622bc20b2bf99617a76110ab24e9f6a
2015-06-20 07:48:35 +05:30
James Zern
714a46a63c Merge "vp9_filter: make all filter tables static" 2015-06-19 03:32:24 +00:00
James Zern
a2c69af50e Merge "vp9_reconintra_neon: add d45 4x4" 2015-06-19 03:27:23 +00:00
James Zern
5d1d72df16 Merge changes from topic 'vp9-intra-pred'
* changes:
  vp9_reconintra_neon: add d135 4x4
  vp9_reconintra: correct d135 4x4 signature
2015-06-19 03:24:58 +00:00
James Zern
ce88d74d34 vp9_reconintra_neon: add d45 4x4
based on webp's LD4()

~59% faster over 20M pixels

Change-Id: I371eaed9ce8f470451046997e130b0ba1a2f7a9c
2015-06-18 15:25:07 -07:00
James Zern
337b221e00 vp9_reconintra_neon: add d135 4x4
based on webp's RD4()

~50% faster over 20M pixels

Change-Id: Ifcb7bf7f7fc8eabf79d9e3b219ce1be67abc524a
2015-06-18 15:25:06 -07:00
James Zern
e8e3583fc7 vp9_reconintra: correct d135 4x4 signature
add missing '_c' suffix

Change-Id: I928d6cf8f90db0b8ca0b1f3bbf10b3d792062cec
2015-06-18 15:25:06 -07:00
James Zern
41d8545ab6 Merge "vp9_reconintra_neon: add DC 4x4 predictors" 2015-06-18 22:24:55 +00:00
James Zern
6e44bf20f7 vp9_reconintra_neon: add DC 4x4 predictors
~85-89% faster over 20M pixels

Change-Id: I3812e8adfffe5255034da88dfe6546e12f4d10ee
2015-06-18 15:22:43 -07:00
James Zern
e77f859d72 Merge "vp9_reconintra_neon: add DC 32x32 predictors" 2015-06-18 22:17:51 +00:00
Parag Salasakar
d9fedf7832 mips msa vp9 fdct 32x32 optimization
average improvement ~4x-6x

Change-Id: Ibcac3ef8ed5e207cf8c121e696570e6b63d3c0f4
2015-06-17 07:58:34 +05:30
Parag Salasakar
fa53008fb7 Merge "mips msa vp9 fdct 16x16 optimization" 2015-06-17 01:21:59 +00:00
Scott LaVarnway
5fe0e55ca4 Merge "Eliminated frame_type check in get_partition_probs()" 2015-06-16 13:40:23 +00:00
Scott LaVarnway
b2658ec321 Eliminated frame_type check in get_partition_probs()
Moved the frame_type check to the tile level and stored
the prob ptr in MACROBLOCKD.

Change-Id: I10b5a4abd58213dc7610e3ade1a1583c01526842
2015-06-16 05:37:54 -07:00
Scott LaVarnway
a41fe749a8 Merge "Update use_prev_frame_mvs flag in decoder." 2015-06-16 12:28:46 +00:00
Parag Salasakar
89b4b315aa mips msa vp9 fdct 16x16 optimization
average improvement ~4x-6x

Change-Id: Id3b2243e5b3c7844c90c4231a5e75fa69911362c
2015-06-16 12:49:34 +05:30
James Zern
79fb3a013e vp9_reconintra_neon: add DC 32x32 predictors
~84-85% faster over 20M pixels

Change-Id: Ia67a7f4a342bf7b0a9280e05c25d81a774d90469
2015-06-15 20:57:28 -07:00
James Zern
3edd293dae vp9_pred_common: inline vp9_get_tx_size_context
+ drop 'vp9_' prefix

Change-Id: If3f3ec32d03026af78b8fcd82749e587a3f43059
2015-06-15 18:41:22 -07:00
James Zern
e6add6499f vp9_pred_common: inline vp9_get_segment_id
+ drop 'vp9_' prefix

Change-Id: Id5a3c8d416dbdf93d9f4f1bde662f7b2c2290168
2015-06-15 18:41:14 -07:00
James Zern
17c9678a3c Merge "vp9_entropy: delete vp9_coefmodel_tree[]" 2015-06-15 23:02:42 +00:00
James Zern
e8d3491ec2 Merge "vp9_entropymode: make vp9_init_mode_probs private" 2015-06-15 23:02:36 +00:00
James Zern
98f0178611 enable vp9_d153_predictor_32x32_ssse3
unused since its initial commit
~91% faster over 20M pixels

Change-Id: Ic8b5b3246bc97c8406be8bc4496601370403b70a
2015-06-12 19:48:22 -07:00
James Zern
ef75416ab7 vp9_entropy: delete vp9_coefmodel_tree[]
it's been unused since:
4ac6a25 Moving vp9_tree_probs_from_distribution() to encoder.

Change-Id: Ieae65864277fc3dbe993c5c08d75c6c5fcaa3a2d
2015-06-12 18:43:37 -07:00
James Zern
53b7f33f2d vp9_entropymode: make vp9_init_mode_probs private
rename to init_mode_probs

Change-Id: Id451d7763b784ed37e43f2c35073a778078d3d0f
2015-06-12 18:25:23 -07:00
Parag Salasakar
ecbbef6b67 Merge "mips msa vp9 filter by weight optimization" 2015-06-12 18:30:11 +00:00
Parag Salasakar
fbac961b47 mips msa vp9 filter by weight optimization
filter by weight - average improvement ~2x-3x

Change-Id: I4832033335d339cdafdce697f07ce3e643920057
2015-06-12 12:06:42 +05:30
James Zern
e2b52f6f01 vp9_filter: make all filter tables static
these are returned via vp9_get_interp_kernel()

Change-Id: I45ed75e5b1515c4f5be9212759dcb50a456b5548
2015-06-11 15:15:52 -07:00
James Zern
33b3953c54 vp9_filter: restore vp9_bilinear_filters alignment
the declaration containing the alignment in vp9_filter.h was removed in:
eb88b17 Make vp9 subpixel match vp8

fixes a crash in 32-bit builds

Change-Id: I9a97e6b4e8e94698e43ff79d0d8bb85043b73c61
2015-06-11 15:15:25 -07:00
Scott LaVarnway
cca866f578 inline vp9_get_segdata()
and change name.

Change-Id: I706645cf9d9dc04f1b3b6ac80df80edb7f101854
2015-06-11 09:52:00 -07:00
Scott LaVarnway
a49c701529 Merge "inline vp9_segfeature_active()" 2015-06-11 12:29:45 +00:00
Scott LaVarnway
42c0b1b1f1 inline vp9_segfeature_active()
and changed name.

Change-Id: Ie023ca66cc2c823032f58d4faeb53fd1863c94f3
2015-06-11 04:20:55 -07:00
Parag Salasakar
c7489f4815 Merge "mips msa vp9 intra-pred optimization" 2015-06-11 03:31:49 +00:00
James Zern
44afbbb72d Merge "vp9_reconintra/d45_predictor: remove temp storage" 2015-06-10 19:23:57 +00:00
Scott LaVarnway
97880c3324 Merge "Reducing size of MODE_INFO struct" 2015-06-10 13:15:19 +00:00
Scott LaVarnway
c9976b32b4 Update use_prev_frame_mvs flag in decoder.
Added check to see if last frame was all intra.  This will
eliminate two checks in find_mv_refs_idx().  Also, do not
update the frame mvs if the current frame is all intra.

This improved performance on material with frequent
intra-only frames.

Change-Id: I44a4042c3670ab0d38439d565062a0e2a1ba9d1e
2015-06-08 03:38:13 -07:00
Parag Salasakar
a2288d274c mips msa vp9 intra-pred optimization
intra pred - average improvement ~2x-3x

Change-Id: Ie3f7d6eded5ecb7ed7ee506ba8e4d98f93803b09
2015-06-06 22:29:32 +05:30
James Zern
9c6eea35b6 Merge "vp9_reconintra: simplify d63_predictor" 2015-06-05 21:49:13 +00:00
Frank Galligan
bfb6d48812 Add control to skip loop filter in VP9 decoder.
This control allows the application to skip the loop filter in the
decoder. This is an advanced control that should only be used in
extreme circumstances as it may introduce and accumulate decode
artifacts.

Change-Id: I278c65c60826f84c9141ebe06c6eeed3c2335fa8
2015-06-05 10:07:09 -07:00
Parag Salasakar
d43fd99822 mips msa vp9 loopfilter 4, 8 optimization
average improvement ~3x-4x

Change-Id: I59279293ce4b2a1e99bd10579ac97740e943643f
2015-06-05 09:56:08 +05:30
James Zern
60d0b3364c vp9_reconintra/d45_predictor: remove temp storage
dst row 0 can be reused in the same way

Change-Id: Id977da62545dcc4a89cebbcbad90ba84f8ff5d6b
2015-06-04 20:11:53 -07:00
James Zern
7012ba6395 vp9_reconintra: simplify d63_predictor
calculate the averages needed for even and odd rows once; this removes a
conditional from the inner loop
the final average calculated currently relies on above[] being extended;
it could be reduced to use
above[block_size - 2] + 3 * above[block_size - 1]

Change-Id: I70f5eac8d8a2a959c7114844a95826f445c3dd4d
2015-06-04 19:21:05 -07:00
Parag Salasakar
dc07cc6fed Merge "mips msa vp9 loopfilter 16 optimization" 2015-06-05 02:15:26 +00:00
James Zern
c2cf347fe2 Merge "vp9_reconintra: use AVG[23] consistently" 2015-06-05 02:15:22 +00:00
James Zern
2b6d62140e Merge "vp9_reconintra_neon_asm/tm4x4: simplify left load" 2015-06-05 01:46:39 +00:00
James Zern
6c3b691c49 Merge "vp9_reconintra: fix d45/d63 discrepancies" 2015-06-04 22:56:43 +00:00
James Zern
faea038f4f vp9_reconintra: fix d45/d63 discrepancies
the final index in rows 2, 3 differs from vp8

Change-Id: I0fcea907b4ab44e266c0f1fd77b290d2236b280a
2015-06-04 14:49:56 -07:00
Scott LaVarnway
baaaa57533 Reducing size of MODE_INFO struct
Reduced size from 124 bytes to 104 bytes.  For decode only builds,
it is reduced to 68 bytes.

Change-Id: If9e6b92285459425fa086ab5a743d0a598a69de3
2015-06-04 07:32:16 -07:00
Scott LaVarnway
8bb37dd069 Remove cm parameter from vp9_decode_block_tokens() part 2
Change-Id: Iee24b6bb095f748333223e6036fc5c9d9e7e5f1c
2015-06-04 07:13:19 -07:00
Scott LaVarnway
877fac122b Merge "Remove counts param" 2015-06-04 13:46:42 +00:00
Parag Salasakar
914f8f9ee0 mips msa vp9 loopfilter 16 optimization
average improvement ~3x-4x

Change-Id: I8ef263da6ebcf8f20aabaefeccf25a84640ba048
2015-06-04 11:50:41 +05:30
Johann Koenig
c005792951 Merge "Make vp9 subpixel match vp8" 2015-06-04 06:16:13 +00:00
Parag Salasakar
fd891a9655 Merge "mips msa vp9 convolve8 avg hv optimization" 2015-06-04 05:44:24 +00:00
Johann
eb88b172fe Make vp9 subpixel match vp8
The only difference between the two was that the vp9 function allowed
for every step in the bilinear filter (16 steps) while vp8 only allowed
for half of those. Since every call site in vp9 shifts the input left
by one (<< 1), vp9 only ever used the same steps as vp8.

This will allow moving the subpel variance to vpx_dsp with the rest of
the variance functions.

Change-Id: I6fa2509350a2dc610c46b3e15bde98a15a084b75
2015-06-03 22:10:51 -07:00
hkuang
ce5e17072d Merge "Optimize the idct assembly code." 2015-06-04 04:32:11 +00:00
James Zern
4fcabf5169 vp9_reconintra: use AVG[23] consistently
Change-Id: Iab7215f82be0c0c831cd81b6f8091afc3710dd54
2015-06-03 19:52:46 -07:00
Parag Salasakar
bdfbc3e876 mips msa vp9 convolve8 avg hv optimization
average improvement ~4x-6x

Change-Id: I7c8b4f2334491be8a859592606e568bc95d019aa
2015-06-04 08:11:01 +05:30
James Zern
2da8d24e8f Merge "vp9_reconintra: simplify d45_predictor" 2015-06-04 01:59:10 +00:00
James Zern
a9f55e8324 Merge changes from topic 'vp9-intra-pred'
* changes:
  vp9_reconintra: specialize d135 4x4
  vp9_reconintra: specialize d117 4x4
  vp9_reconintra: specialize d207 4x4
  vp9_reconintra: specialize d153 4x4
  vp9_reconintra: specialize d63 4x4
  vp9_reconintra: specialize d45 4x4
2015-06-04 01:58:28 +00:00
James Zern
65d9599807 vp9_reconintra_neon_asm/tm4x4: simplify left load
use vld1.8 {d0[]}, [r0] rather than ldrb+vdup; mildly faster

Change-Id: Ia5ffc736bcb0f5497b7d9e55a93bf5a5f5f6928c
2015-06-03 18:51:13 -07:00
hkuang
98e88e6ad8 Optimize the idct assembly code.
Change-Id: Ia0ff859ff1c813dbe100e2f27b1ef78167483f4e
2015-06-03 17:20:35 -07:00
Parag Salasakar
b8c1cdcd12 mips msa vp9 convolve8 avg horiz optimization
average improvement ~5x-8x

Change-Id: I179a69ec620fbd69979bd128f05d18113618aab4
2015-06-03 11:33:42 +05:30
Parag Salasakar
c543d38ac7 mips msa vp9 convolve8 avg vert optimization
average improvement ~4x-6x

Change-Id: Ia2e6f770da46416ebec31fdcea5cc7878879a9d9
2015-06-03 09:55:25 +05:30
Scott LaVarnway
f779dba405 Remove counts param
Moved to MACROBLOCKD.

Change-Id: Icce765b334f2755f4fe2a4c39fb2ae2d7660d004
2015-06-02 09:06:00 -07:00
Parag Salasakar
54a6f73958 mips msa vp9 idct4x4 and iwht4x4 optimization
average improvement ~3x-4x
moved assert to respective files

Change-Id: I6c915059d456a00bdd76fab0dd2eede8b6c6ea58
2015-06-02 12:16:28 +05:30
Parag Salasakar
ebf7466cd8 mips msa vp9 updated convolve horiz, vert, hv, copy, avg module
Updated sources according to improved version of common MSA macros.
Enabled respective convolve MSA hooks and tests.
Overall, this is just upgrading the code with styling changes.

Change-Id: If5ad6ef8ea7ca47feed6d2fc9f34f0f0e8b6694d
2015-06-02 12:03:51 +05:30