Commit Graph

13933 Commits

Author SHA1 Message Date
Parag Salasakar
9b375871db mips msa vpx subpel variance optimization
Removed redundant clip/saturate code from 2tap filter functions
average improvement 20%-40%

Change-Id: I362540b0c7d5d3d69932c39d61b7d2a44da533d2
2015-08-03 13:00:55 +05:30
Jingning Han
0b0eba728d Add _dspr2 to local function names
It avoids symbol conflicts between function names of various
implementation versions.

Change-Id: Iad79ebcb8e289457801812a7745c8380b5b06a46
2015-08-02 20:21:59 -07:00
Jingning Han
da7dc59837 Merge "Factor out mips/msa inverse transform implementations" 2015-08-03 03:18:39 +00:00
Jingning Han
0fcfc613c6 Merge "Add x86inc flag guard to inv_txfm_sse2.asm" 2015-08-02 21:56:09 +00:00
Jingning Han
4f7a7d29fa Add x86inc flag guard to inv_txfm_sse2.asm
Fix the VS build failure.

Change-Id: I4fb9d1c83980c4b52d5a848a9cb02ec72493dccb
2015-08-02 08:43:51 -07:00
James Zern
22a8474fe7 vpx_convolve_copy_sse2: fix win64
xmm6-7 need to be stored

Change-Id: I6c51559598d335946ec91be6246b49589c63b724
2015-08-01 11:45:49 -07:00
Jingning Han
44849516d4 Factor out mips/msa inverse transform implementations
Move mips/msa inverse transform implementations from vp9 folder to
vpx_dsp.

Change-Id: Ic4cf3f05247c3c63db7b532a0e5000017a962391
2015-08-01 09:25:12 -07:00
Jingning Han
b37494cfb5 Merge "Use precise header files in inverse transform msa implementations" 2015-08-01 16:20:43 +00:00
Jingning Han
b4c7d0523a Merge "Factor inverse transform functions into vpx_dsp" 2015-08-01 16:20:24 +00:00
Parag Salasakar
c1b233dd43 Merge "mips msa vp8 temporal filter optimization" 2015-08-01 02:12:20 +00:00
Jingning Han
4dc390b15d Merge "Add dynamic range notes to vp9_vector_var_c" 2015-08-01 01:01:37 +00:00
Aℓex Converse
fd22c492f7 Merge "Turn off simple_model_rd_from_var at speed 4." 2015-07-31 23:51:01 +00:00
Jingning Han
36a9a33b90 Add dynamic range notes to vp9_vector_var_c
Change-Id: If536ad31046ecd9e2ecd9c21f52f8192c8153ad7
2015-07-31 16:42:09 -07:00
Jingning Han
56c2cb7553 Use precise header files in inverse transform msa implementations
Change-Id: Ie8a79d9e2837842c3f60776b661cd42782b108d5
2015-07-31 23:24:54 +00:00
James Zern
d8642d831f Merge "VP9_COPY_CONVOLVE_SSE2 optimization" 2015-07-31 23:22:34 +00:00
Jingning Han
e8b133c79c Factor inverse transform functions into vpx_dsp
This commit moves the inverse transform functions from the vp9
folder to the vpx_dsp folder. The hybrid transform wrapper functions
stay in the vp9 folder, since they involve codec-specific data structures.

Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
2015-07-31 16:21:00 -07:00
Alex Converse
af6d2c7d42 Turn off simple_model_rd_from_var at speed 4.
This got erroneously changed during the refactor. This fixes
SvcTest.TwoPassEncode2TemporalLayersWithMultipleFrameContextsAndTiles.

Change-Id: Ifa5ab0e098396c5e2d10478db87df256eadfa4c7
2015-07-31 15:50:17 -07:00
James Zern
e184b613b9 Merge changes Iecdbbc34,I8b4db93f
* changes:
  Android.mk: fix *_rtcd.h deps for armeabi-v7a
  Android.mk: add a dep on vpx_config.asm for x86_64
2015-07-31 22:22:48 +00:00
Scott LaVarnway
a5e97d874b VP9_COPY_CONVOLVE_SSE2 optimization
This function suffers from a couple of problems on small cores (tablets):
- The load of the next iteration is blocked by the store of the previous iteration
- 4k aliasing (between future stores and older loads)
- Current small-core machines are in-order, so the store will spin the rehabQ until the load is finished
Fixed by:
- prefetching 2 lines ahead
- unrolling the copy of 2 rows of the block
- pre-loading all xmm registers before the loop, with the final stores after the loop
The function is optimized by:
copy_convolve_sse2 64x64 - 16%
copy_convolve_sse2 32x32 - 52%
copy_convolve_sse2 16x16 - 6%
copy_convolve_sse2 8x8 - 2.5%
copy_convolve_sse2 4x4 - 2.7%
Credit goes to Tom Craver (tom.r.craver@intel.com) and Ilya Albrekht (ilya.albrekht@intel.com)

Change-Id: I63d3428799c50b2bf7b5677c8268bacb9fc29671
2015-07-31 14:51:51 -07:00
Jingning Han
6025c6d65b Merge "Fix compiler warning in mips/dspr2" 2015-07-31 21:29:50 +00:00
Aℓex Converse
dd4b416412 Merge "Compute skippable inside the block_rd_txfm loop." 2015-07-31 21:19:11 +00:00
Jingning Han
135b43ccf3 Fix compiler warning in mips/dspr2
This commit fixes the mixed declaration and definition warning when
mips/dspr2 is turned on.

Change-Id: I633d6fe42368b9ac35b106786ebac6969ad53552
2015-07-31 12:34:34 -07:00
Aℓex Converse
90e563d91f Merge changes Ic1ce346a,Ic0b4e92c
* changes:
  Simplify model_rd_for_sb HBD ifdefs
  Simplify dist_block HBD ifdefs
2015-07-31 19:05:54 +00:00
Alex Converse
ab20c98e84 Compute skippable inside the block_rd_txfm loop.
Change-Id: Iaa43aeeb7a2074495e00cdb83bb551c3f13d3ed2
2015-07-31 11:45:59 -07:00
Zoe Liu
7f8dd35329 Merge "Refactor mips/dspr2 on convolution." 2015-07-31 18:23:19 +00:00
Zoe Liu
873a158f14 Merge "Code refactor on InterpKernel" 2015-07-31 18:20:14 +00:00
Alex Converse
c62228f273 Simplify model_rd_for_sb HBD ifdefs
Change-Id: Ic1ce346a053800ae3b2d77178f46e6a388357f6d
2015-07-31 11:16:59 -07:00
Alex Converse
da9c73c293 Simplify dist_block HBD ifdefs
Change-Id: Ic0b4e92cbaf813bcca8a8e9052c936c2e025e114
2015-07-31 11:04:01 -07:00
Aℓex Converse
8abd0c2a12 Merge "Short circuit rate_block in block_rd_txfm." 2015-07-31 17:59:22 +00:00
Zoe Liu
7cfdc00337 Refactor mips/dspr2 on convolution.
Change-Id: If59a39d5a92c261537342726f94bb7f7f26dfff3
2015-07-31 10:27:42 -07:00
Zoe Liu
7186a2dd86 Code refactor on InterpKernel
In essence, this refactors the code for both the interpolation
filtering and the convolution. The change moves all the files and
renames the code from the vp9_ prefix to the vpx_ prefix accordingly,
for the underlying architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/dspr2 will be done in a separate change list.

Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
2015-07-31 10:27:33 -07:00
Alex Converse
4ac5058afc Give skip_txfm constants names.
This is using a define instead of an enum to keep byte packing.

Change-Id: I3abb07c8bfe377e19be4531b624af7b7b4207792
2015-07-31 10:08:08 -07:00
Alex Converse
73422d3b2d Short circuit rate_block in block_rd_txfm.
Don't run rate_block (cost_coeffs) if distortion alone is enough to
surpass best_rd.

This decreases 2nd pass runtime on HD at speed 2 by about 2%. There is
zero effect on output if tx_cache is removed.

Change-Id: Ia3b1cc77bfbe6ee988c395fde06c0eb92940b784
2015-07-31 10:05:51 -07:00
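
The early exit described in the commit above can be illustrated with a minimal sketch. This is not the libvpx code: block_rd, estimate_rate, rd_cost and the lambda * rate + distortion cost model are hypothetical stand-ins chosen for illustration. The only idea taken from the commit message is that the rate estimate is skipped when distortion alone already surpasses best_rd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts how often the (hypothetical) expensive rate estimate runs. */
static int rate_calls = 0;

/* Hypothetical stand-in for an expensive coefficient-cost routine. */
static int estimate_rate(int block) {
  ++rate_calls;
  return 100 + block;
}

/* Simplified cost model (an assumption): lambda * rate + distortion. */
static int64_t rd_cost(int lambda, int rate, int64_t dist) {
  return (int64_t)lambda * rate + dist;
}

/* Returns 1 and writes the full cost if the block can still beat best_rd.
 * Returns 0, without estimating the rate, if distortion alone is too high. */
static int block_rd(int block, int64_t dist, int lambda, int64_t best_rd,
                    int64_t *out_rd) {
  assert(out_rd != NULL);
  /* Rate is never negative, so the zero-rate cost is a lower bound. */
  if (rd_cost(lambda, 0, dist) > best_rd) return 0;
  *out_rd = rd_cost(lambda, estimate_rate(block), dist);
  return 1;
}
```

Because the zero-rate cost is a lower bound on the full cost under this model, skipping the rate estimate cannot change which candidate wins; it only avoids work on candidates that have already lost.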
Parag Salasakar
8fbc641540 mips msa vp8 temporal filter optimization
average improvement ~2x-3x

Change-Id: I05593bed583234dc7809aaec6cab82773a29505d
2015-07-31 12:03:19 +05:30
Parag Salasakar
0e3f494b21 mips msa vp8 block subtract optimization
average improvement ~2x-3x

Change-Id: I30abf4c92cddcc9e87b7a40d4106076e1ec701c2
2015-07-31 09:29:10 +05:30
Parag Salasakar
e3ee8c292b Merge "mips msa vp8 quantize optimization" 2015-07-31 03:44:03 +00:00
Yunqing Wang
3b2e73b9a4 Remove tx cache and speed up tx size selection
1. The RD scores obtained during the tx size selection were stored in the
tx cache, and used to help make the tx decision for the following frames.
This is no longer used in the VP9 encoder. The related decision-making
code from 1.5+ years ago was recovered, and borg tests didn't show any
quality gain. This patch removes it to lower the complexity.

2. An optimization was done after the above refactoring. If the tx_mode
is not TX_MODE_SELECT, we only need to test the chosen tx size instead
of all possible tx sizes. This gave a 1.5% average speed gain at speed 2,
and a 1% average speed gain at speed 3.

Change-Id: Id8cd650e066a8cef33829d8c15388a8138adc78c
2015-07-30 18:53:40 -07:00
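
The second point of the commit above can be sketched as follows. This is a hedged illustration rather than the encoder's implementation: pick_tx_size, demo_rd and the enum layouts are assumptions made for the example, with the fixed modes numbered so that each one maps directly onto its largest allowed transform size.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative enums (assumed layout): fixed modes 0..3 map to sizes 0..3. */
enum { TX_4X4, TX_8X8, TX_16X16, TX_32X32 };
enum { ONLY_4X4, ALLOW_8X8, ALLOW_16X16, ALLOW_32X32, TX_MODE_SELECT };

/* Hypothetical per-size RD cost used only for this demonstration. */
static int64_t demo_rd(int tx_size) {
  static const int64_t cost[4] = { 40, 25, 30, 50 };
  return cost[tx_size];
}

/* Returns the transform size with the lowest cost and reports, through
 * num_evals, how many sizes were actually evaluated. */
static int pick_tx_size(int tx_mode, int max_tx_size,
                        int64_t (*eval_rd)(int tx_size), int *num_evals) {
  const int select = (tx_mode == TX_MODE_SELECT);
  /* Largest size to try: block limit, further capped by a fixed mode. */
  const int hi =
      select ? max_tx_size : (tx_mode < max_tx_size ? tx_mode : max_tx_size);
  /* A fixed mode tests exactly one size; select mode searches down to 4x4. */
  const int lo = select ? TX_4X4 : hi;
  int64_t best_rd = INT64_MAX;
  int best = hi;
  int n;

  assert(num_evals != 0);
  *num_evals = 0;
  for (n = hi; n >= lo; --n) {
    const int64_t rd = eval_rd(n);
    ++*num_evals;
    if (rd < best_rd) {
      best_rd = rd;
      best = n;
    }
  }
  return best;
}
```

With a fixed mode the loop body runs once, which is where the saving would come from in this sketch; only TX_MODE_SELECT pays for the full search.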
Aℓex Converse
eb6b443bd2 Merge "Convert simple_model_rd_from_var from a speed check to a speed feature." 2015-07-30 23:04:28 +00:00
Hui Su
a71c5c0ee9 Merge "Exclude vpx intra prediction functions in vp8-only build" 2015-07-30 22:29:35 +00:00
Alex Converse
c827c59eaf Convert simple_model_rd_from_var from a speed check to a speed feature.
Change-Id: I8877025e172fff29bc4e270790211463b676b4d7
2015-07-30 13:53:26 -07:00
hui su
5fddefbced Exclude vpx intra prediction functions in vp8-only build
Currently vp8 is not using the intra prediction functions in vpx_dsp.

Change-Id: I1522b5f5cb12a81999fb126cf7c62c70259e7a52
2015-07-30 13:49:47 -07:00
James Zern
21da45e570 Android.mk: fix *_rtcd.h deps for armeabi-v7a
strip '.neon' so *_rtcd.h depends on the correct file

Change-Id: Iecdbbc34c9ce5c6d0a4b466332d52f4e6a0cb128
2015-07-30 13:27:30 -07:00
Parag Salasakar
56aa0da405 mips msa vp8 quantize optimization
average improvement ~2x-3x

Change-Id: I6fc37191bf9cb5a67e1af9787d0d27659c17bdba
2015-07-30 12:56:57 -07:00
Alex Converse
b7f441a0bc Cleanup rdcost_block_args
Change-Id: I9d613cbe9e76b5dd15e935878ef9fd04521690ba
2015-07-30 12:55:51 -07:00
Aℓex Converse
c0f0245e8a Merge "Clean up some casts." 2015-07-30 19:37:28 +00:00
Jingning Han
91feec1452 Merge "Cosmetics - Fix header file order in unit tests" 2015-07-30 05:37:53 +00:00
Jingning Han
097d59c28c Cosmetics - Fix header file order in unit tests
Change-Id: I9582a8d74990125b71e8fe620f7f3f2585a30798
2015-07-29 20:48:25 -07:00
Parag Salasakar
0c2a14f9e2 mips msa vp8 fdct optimization
average improvement ~2x-4x

Change-Id: Id0bc600440f7ef53348f585ebadb1ac6869e9a00
2015-07-30 08:14:42 +05:30
Parag Salasakar
7c6ae373ac Merge "mips msa vp8 post proc optimization" 2015-07-30 02:34:06 +00:00
Aℓex Converse
583c205270 Merge "Comment zcoeff_blk." 2015-07-30 01:06:08 +00:00