Commit Graph

377 Commits

Author SHA1 Message Date
Johann
2d6b5df657 neon: vpx_quantize_b
With skip block or coeff < zbin it is about twice as fast as C.

If most coeff values are > zbin it is about 10-15x as fast as C.

BUG=webm:1426

Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
2017-07-31 10:38:46 -07:00
Linfeng Zhang
7f4acf8700 Add vpx_idct16x16_38_add_sse2()
Change-Id: I28150789feadc0b63d2fadc707e48971b41f9898
2017-07-27 18:02:43 -07:00
Alexandra Hájková
666c543f7b ppc: Add vpx_idct16x16_256_add_vsx
Change-Id: Ibc3f7965423fd91179f8d8e77c7ae3e6d7f80572
2017-07-25 12:34:15 +00:00
Johann
e381753926 sad4d neon: 64x[32,64]
Rewrite 64x64.

BUG=webm:1425

Change-Id: I336bf5a3aa4b783389c10b16a50f0f559346ecbf
2017-07-12 13:26:39 +00:00
Johann
e1bde306c8 sad4d neon: 32x[16,32,64]
Rewrite 32x32. Use half the accumulator registers.

BUG=webm:1425

Change-Id: Ibf5e61dc4ba15056102aef8495f4a02c668c5d13
2017-07-12 13:25:18 +00:00
Johann
807ce8fb1e sad4d neon: 16x[8,16,32]
Rewrite 16x16. Use half the accumulator registers.

BUG=webm:1425

Change-Id: I44b48512b1e3629505d83c2645e800f53878ccc2
2017-07-12 13:25:11 +00:00
Johann
8152b0904d sad4d neon: 8x[4,8,16]
BUG=webm:1425

Change-Id: I7de2500cca4b621f21478c4b0333c56d76dbc9a4
2017-07-12 13:25:03 +00:00
Johann
dd4347e9ec sad4d neon: 4x4, 4x8
BUG=webm:1425

Change-Id: I5081b5ce131821d590c53ac1206a94f50cb8b468
2017-07-12 03:38:03 +00:00
Johann Koenig
4b78c6e6f7 Merge "remove vp9_full_sad_search" 2017-07-10 20:42:40 +00:00
Johann
109faffe9b remove vp9_full_sad_search
This code is unused in vp9. Only vp8 still contains references to
vpx_sad_NxMx[3|8] and only for sizes 16x16, 16x8, 8x16, 8x8 and 4x4.

Remove the remaining sizes and all the highbitdepth versions.

BUG=webm:1425

Change-Id: If6a253977c8e0c04599e25cbeb45f71a94f563e8
2017-07-10 11:20:35 -07:00
Johann Koenig
4e16f70703 Merge changes Id84d9780,Iaa6ea75b,I3362e0dd,I0020a49e,Ia42e4f36, ...
* changes:
  sad neon: avg for 64x[32,64]
  sad neon: macroize 64xN definitions
  sad neon: avg for 32x[16,32,64]
  sad neon: macroize 32xN definitions
  sad neon: avg for 16x[8,16,32]
  sad neon: macroize 16xN definitions
2017-07-07 21:01:23 +00:00
Johann Koenig
6c375b9cd0 Merge "fdct neon: 32x32_rd" 2017-07-07 14:05:51 +00:00
Johann
e4e08556db sad neon: avg for 64x[32,64]
BUG=webm:1425

Change-Id: Id84d97807a6a0fbcc889c4dfe11929d54f85493d
2017-07-07 07:04:04 -07:00
Johann
67cffc1ef6 sad neon: avg for 32x[16,32,64]
BUG=webm:1425

Change-Id: I3362e0dded3b46ca032caa7f44db42f324bc596d
2017-07-07 07:04:04 -07:00
Johann
527e0c9b1c sad neon: avg for 16x[8,16,32]
BUG=webm:1425

Change-Id: Ia42e4f36547c5fe12114fb58379e34bce82eb2f2
2017-07-07 07:04:04 -07:00
Johann
63bdc574e5 sad neon: avg for 8x[4,8,16]
BUG=webm:1425

Change-Id: If2ab51e3050e078b0011b174efe41fcb65a15f44
2017-07-06 07:43:09 -07:00
Johann
6bac3f80ee sad neon: avg for 4x4 and 4x8
BUG=webm:1425

Change-Id: Ifc685a96cb34f7fd9243b4c674027480564b84fb
2017-07-06 07:12:47 -07:00
Johann
75b00592c7 fdct neon: 32x32_rd
About 40% faster than the non-rd version.

BUG=webm:1424

Change-Id: Ia99d14eb9532302eeaab8cd3e503395b0374b5a2
2017-07-06 06:30:50 -07:00
Alexandra Hájková
c757d6dde4 ppc: Add vpx_idct8x8_64_add_vsx
Change-Id: I4ed1312f365509e0595dcc09890ecb050f6f2069
2017-07-01 12:55:47 -07:00
Alexandra Hájková
d8c277030c ppc: Add vpx_idct4x4_16_add_vsx
Change-Id: Id2673eece32027fb245919c7a5c81994a4a19fd8
2017-07-01 12:32:18 -07:00
Linfeng Zhang
1e3a93e72e Merge changes I5d038b4f,I9d00d1dd,I0722841d,I1f640db7
* changes:
  Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
  sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4()
  Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
  Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
2017-06-30 20:49:19 +00:00
Linfeng Zhang
c338f3635e Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
BUG=webm:1412

Change-Id: I5d038b4fa842ce2f6b9bd5c8c44c70647bda9591
2017-06-29 17:19:34 -07:00
Johann
9fe510c12a partial fdct neon: add 32x32_1
Always return an int32_t. Since it needs to be moved to a register for
shifting, this doesn't really penalize the smaller transforms.

The values could potentially be summed and shifted in place.

BUG=webm:1424

Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972
2017-06-28 15:37:44 -07:00
Johann
f310ddc470 partial fdct neon: add 16x16_1
For the 8x8_1, the highbd output fit nicely in the existing function. 12
bit input will overflow this implementation of 16x16_1.

BUG=webm:1424

Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec
2017-06-28 15:37:44 -07:00
Johann
4959dd3eb3 partial fdct neon: add 4x4_1
BUG=webm:1424

Change-Id: Ib0f3cfd6116fc1f5a99acb8bfd76e25b90177ffc
2017-06-28 15:37:44 -07:00
Johann
cf75ab6ccd partial fdct neon: move 8x8_1 and enable hbd tests
The function was originally written with HBD in mind. Enable it and
configure the tests.

BUG=webm:1424

Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1
2017-06-28 15:37:43 -07:00
Johann Koenig
81e25512c3 Merge changes Ib454762d,I966650df,Ie126553e,I068f06c6,Icb72a94e
* changes:
  sad neon: rewrite 64x64 and add 64x32
  sad neon: rewrite 32x32, add 32x16 and 32x64
  sad neon: rewrite 16x8, 16x16, add 16x32
  sad neon: rewrite 8x8 and 8x16
  sad neon: rewrite 4x4 and add 4x8
2017-06-28 22:37:00 +00:00
Johann Koenig
35f8515c3f Merge "partial fdct test" 2017-06-28 22:34:53 +00:00
Johann
5ac88162b9 partial fdct test
Test the _1 variant of the fdct, which simply sums the block and applies
a modifying shift based on the block size.

BUG=webm:1424

Change-Id: Ic80d6008abba0c596b575fa0484d5b5855321468
2017-06-28 20:32:20 +00:00
Johann
ad011aaab8 sad neon: rewrite 64x64 and add 64x32
BUG=webm:1425

Change-Id: Ib454762d1c61b05a98324fe81ad58c9e09784717
2017-06-28 12:21:34 -07:00
Johann
77a648885c sad neon: rewrite 32x32, add 32x16 and 32x64
BUG=webm:1425

Change-Id: I966650df7e3face93e1e771634d1cc5458a35f85
2017-06-28 12:20:27 -07:00
Johann
469643757f sad neon: rewrite 16x8, 16x16, add 16x32
BUG=webm:1425

Change-Id: Ie126553e5fffcdfaf3d82a85b368ac10ce9ab082
2017-06-28 12:16:00 -07:00
Johann
e40e78be24 sad neon: rewrite 8x8 and 8x16
BUG=webm:1425

Change-Id: I068f06c67b841f09ea07c04ada0c2f1706102138
2017-06-28 12:15:57 -07:00
Johann
46d8660ce3 sad neon: rewrite 4x4 and add 4x8
The previous implementation loaded 8 values (discarding half)

BUG=webm:1425

Change-Id: Icb72a94e2557a4ee2db7091266ab58fd92f72158
2017-06-28 11:14:59 -07:00
Linfeng Zhang
8253a27904 Add vpx_highbd_idct4x4_16_add_sse4_1()
BUG=webm:1412

Change-Id: Ie33482409351a01be4e89466b0441834eb1e905a
2017-06-23 14:30:12 -07:00
Johann Koenig
794a5ad713 Merge "fdct32x32 neon implementation" 2017-06-23 01:58:00 +00:00
Johann
e67660cf37 fdct32x32 neon implementation
Almost 3x faster in constrained loop testing. Over 10x faster in HBD
builds.

BUG=webm:1424

Change-Id: I2b7f8453e1d4ada63cde729d8115d684c4a71ff9
2017-06-22 06:40:17 -07:00
Linfeng Zhang
2b43a1ee18 Clean 32x32 full idct sse2 and ssse3 code
vpx_idct32x32_1024_add_ssse3() is actually a sse2 function and faster
than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All are
code relocations, no new code.

Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
2017-06-21 13:46:49 -07:00
Linfeng Zhang
98967645a1 Remove vpx_idct8x8_64_add_ssse3()
It's almost identical with vpx_idct8x8_64_add_sse2(), except little
difference in instructions order.

Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
2017-06-15 14:09:33 -07:00
Johann Koenig
903375a48a Merge "fdct16x16 neon optimization" 2017-06-08 15:19:36 +00:00
Johann
eae7cf2368 fdct16x16 neon optimization
Roughly 2x speedup. Since the only change for HBD is to store(), the
improvement appears to hold there as well.

BUG=webm:1424

Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
2017-06-07 14:59:55 -07:00
James Zern
ff42e04f9c Merge "ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}" 2017-06-06 23:52:39 +00:00
James Zern
4753c23983 Merge "ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx" 2017-06-06 02:19:41 +00:00
Johann Koenig
755b3daf90 Merge "comp_avg_pred neon: used by sub pixel avg variance" 2017-05-31 18:17:28 +00:00
Johann
f695b30ac2 comp_avg_pred neon: used by sub pixel avg variance
BUG=webm:1423

Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804
2017-05-30 22:47:34 +00:00
Alexandra Hájková
8bf6eaf433 ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}
Change-Id: I547d0099e15591655eae954e3ce65fdf3b003123
2017-05-24 13:27:09 +00:00
Johann
f6fcd3410d sub pel avg variance neon: 4x block sizes
BUG=webm:1423

Change-Id: Iaab2b9a183fdb54aae5f717aba95d90dc36a9e3b
2017-05-22 14:40:05 -07:00
Johann
188d58eaa9 sub pel variance neon: 4x block sizes
Add optimizations for blocks of width 4

BUG=webm:1423

Change-Id: Idfb458d36db3014d48fbfbe7f5462aa6eb249938
2017-05-22 14:40:01 -07:00
Johann
9b0d306a2f sub pel avg variance neon: add neon optimizations
These are missing an optimized version of vpx_comp_avg_pred

BUG=webm:1423

Change-Id: I31fa6ef842e98f7ff3ea079ffed51ae33178e2ed
2017-05-22 13:58:43 -07:00
Linfeng Zhang
c167345ffb Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2
BUG=webm:1412

Change-Id: Ia338a6057d36f9ed7eaa9cbd4dfbf0c3cbdc6468
2017-05-22 11:24:21 -07:00
Johann Koenig
e7cac13016 Merge changes Ib8dd96f7,Ie9854b77
* changes:
  neon variance: process 4x blocks
  use memcpy for unaligned neon stores
2017-05-22 17:48:33 +00:00
Johann
7b742da63e neon variance: process 4x blocks
Continue processing sets of 16 values. Plenty of improvement for 4x8
(doubles the speed) but only about 30% for 4x4.

BUG=webm:1422

Change-Id: Ib8dd96f75d474f0348800271d11e58356b620905
2017-05-17 17:35:01 -07:00
Johann
105503b839 neon fdct: 4x4 implementation
Approximately twice as fast as C implementation.

BUG=webm:1424

Change-Id: I3c0307fb08ddc23df42545cd089a78e2ed5c9d3f
2017-05-17 07:38:18 -07:00
Alexandra Hájková
bcbc3929ae ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx
Change-Id: Ic9639b1331d8c5cbc207c2a036891ff0137fc56f
2017-05-13 13:13:15 +00:00
James Zern
ac8f58f6ab Merge changes I1b54a7a5,I3028bdad,I59788cd9
* changes:
  ppc: Add get_mb_ss_vsx
  ppc: Add get4x4sse_cs_vsx
  ppc: Add comp_avg_pred_vsx
2017-05-12 15:24:59 +00:00
Luca Barbato
143b21e362 ppc: Add get_mb_ss_vsx
Change-Id: I1b54a7a5bb642e4b836d786ea1ae506eed025e3f
2017-05-12 17:23:00 +02:00
Luca Barbato
6d225eb5f9 ppc: Add get4x4sse_cs_vsx
Change-Id: I3028bdadf653665d18e781d28e9625f62804b3d8
2017-05-12 17:23:00 +02:00
Luca Barbato
a7f8bd451b ppc: Add comp_avg_pred_vsx
Change-Id: I59788cd98231e707239c2ad95ae54f67cfe24e10
2017-05-12 17:22:55 +02:00
Alexandra Hájková
f48532e271 ppc: Add vpx_sad64x32/64_vsx
Change-Id: I84e3705fa52f75cb91b2bab4abf5cc77585ee3e2
2017-05-12 16:10:16 +02:00
Alexandra Hájková
0b15bf1e54 ppc Add vpx_sad32x16/32/64_vsx
Change-Id: I3c4f9d595275669580413a71b3c3c810e7ddcacd
2017-05-12 16:10:11 +02:00
James Zern
a12ea1d5e9 Merge "ppc: Add vpx_sad16x8/16/32_vsx" 2017-05-12 13:33:51 +00:00
Alexandra Hájková
cc7f0c0f3e ppc: Add vpx_sad16x8/16/32_vsx
Change-Id: I60619d28fffd9809f93b1af510a50e1aa02519a9
2017-05-10 19:57:30 +00:00
Linfeng Zhang
764b3b8090 Update specializations of idct functions
Introduced append situation in Commit 0178d97 which could be
confusing. Clean a little bit and add some comments.

Change-Id: I69ad336f805aca7ce9d45515b8cd237423fadbb2
2017-05-10 12:51:18 -07:00
Johann Koenig
d713ec3c46 Merge changes I92eb4312,Ibb2afe4e
* changes:
  subpel variance neon: add mixed sizes
  sub pixel variance neon: use generic variance
2017-05-10 18:19:52 +00:00
Johann Koenig
1814463864 Merge changes Id602909a,Ib0e85608
* changes:
  neon variance: process two rows of 8 at a time
  neon variance: add small missing sizes
2017-05-08 17:34:20 +00:00
Linfeng Zhang
2c3a2ad6f1 Merge changes I0cfe4117,I3581d80d,Ida62c941
* changes:
  Split dsp/x86/inv_txfm_sse2.c
  Update highbd idct functions arguments to use uint16_t dst
  Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct
2017-05-08 16:15:57 +00:00
Johann
2346a6da4a subpel variance neon: add mixed sizes
Add support for everything except block sizes of 4.

Performance is better but numbers will improve again when the variance
optimizations land.

BUG=webm:1423

Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80
2017-05-04 15:30:01 -07:00
Johann
cb9133c72f neon variance: add small missing sizes
Some of the mixed sizes were missing. They can be implemented trivially
using the existing helper function.

When comparing the previous 16x8 and 8x16 implementations, the helper
function is about 10% faster than the 16x8 version. The 8x16 is very
close, but the existing version appears to be faster.

BUG=webm:1422

Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004
2017-05-04 08:59:42 -07:00
Linfeng Zhang
d5de63d2be Update highbd idct functions arguments to use uint16_t dst
BUG=webm:1388

Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5
2017-05-03 13:59:16 -07:00
Yi Luo
a3452996a1 High bit depth inter prediction horizontal/vertical filters AVX2
User level speed improvement on i7-6700, cpu-used=1,
  x86_64 Linux, bitrate, 1080p, 8Mbps, 4K, 16Mbps:
- Decoder:
  1080p: ~4%
  4K: ~5%
- Encoder:
  1080p: ~1%
  4K: ~3%

Change-Id: I51b48f9c5de0d62487d5a11aa579c97bd03dd640
2017-05-03 12:18:01 -07:00
Linfeng Zhang
a10a5cb356 Merge changes I8bb660de,Ica51d780,I6037525d
* changes:
  Clean specializes of idct functions
  Clean add_protos of highbd idct functions
  Clean add_protos of idct functions
2017-05-03 19:17:55 +00:00
Luca Barbato
e2ad89092d ppc: Add convolve8_vsx and convolve8_avg_vsx
Change-Id: Ia5293d948003a7fff5a7cbad6e83d8a72717c857
2017-05-02 20:27:47 -07:00
Luca Barbato
e6ca81ee67 ppc: Add convolve8_avg_vert_vsx
Only the generic one again, speedups for 8x8 and larger blocks to
come later.

Change-Id: I90d481d3a602d1e277ead8f3934eca126b86b72d
2017-05-02 20:27:42 -07:00
Luca Barbato
a65f1771ad ppc: Add convolve8_vert
Only the generic one again, speedups for 8x8 and larger blocks
to come later.

Change-Id: Ia509d6225984b4930ec03928c9bcbf51486da99f
2017-05-02 20:27:33 -07:00
Luca Barbato
77772350f3 ppc: Add convolve8_horiz_avg
The 8x8 and larger blocks cases can be sped up further.

Change-Id: I54549b03ac6c7a4e3f485738b100c3cac7ac2e15
2017-05-02 20:27:28 -07:00
Luca Barbato
08edb85bd0 ppc: Add convolve8_horiz
The 8x8 and larger blocks cases can be sped up further.

Change-Id: I89b635d6b01c59f523f2d54b1284ed32916c5046
2017-05-02 20:27:16 -07:00
Linfeng Zhang
0178d974e5 Clean specializes of idct functions
Change-Id: I8bb660de47b5f97263ec381dc428db96e9c9a4b2
2017-05-02 18:01:19 -07:00
Linfeng Zhang
4412996d59 Clean add_protos of highbd idct functions
Change-Id: Ica51d780b92b316ce9112740c56cdf7670816371
2017-05-02 17:59:38 -07:00
Linfeng Zhang
a7a57d9756 Clean add_protos of idct functions
Change-Id: I6037525d92ec172810edab720389eb1865ed3b1a
2017-05-02 17:58:40 -07:00
Luca Barbato
d51d3934f5 ppc: Add convolve_avg
Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014
2017-04-29 15:47:25 +02:00
Luca Barbato
63860ba7b8 ppc: Add convolve_copy
Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405
2017-04-29 15:47:25 +02:00
Linfeng Zhang
51dc998f3a Update highbd convolve functions arguments to use uint16_t src/dst
BUG=webm:1388

Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
2017-04-25 14:22:19 -07:00
Luca Barbato
914b160fb5 ppc: h predictor 8x8
Slightly faster with the current compiler.

Change-Id: Iae225fac08395eb430c97a2abec69c60f5cf5c47
2017-04-19 19:57:51 -07:00
Luca Barbato
0b9be93205 ppc: d63 predictor 8x8
10x faster.

Change-Id: I7cedbf4df2ce7df5b6f1108b11815d088fdb9ba8
2017-04-19 19:57:51 -07:00
Luca Barbato
ee9325b0bd ppc: tm predictor 4x4
Slightly faster.

Change-Id: I0ca43f309b3d9b50435d69bd5be64b53a99bd191
2017-04-19 19:57:51 -07:00
Luca Barbato
2904eb5800 ppc: h predictor 4x4
2x faster.

Change-Id: I0583dec353299c6797401b646099f18db4e0420d
2017-04-19 19:57:51 -07:00
Luca Barbato
58245d7050 ppc: dc predictor 8x8
Slightly faster, the other dc predictors cannot be faster since
the computation speedup is overwhelmed by the time spent reading
dst to write just the 8x8 part.

Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66
2017-04-19 19:57:51 -07:00
Luca Barbato
6b4a65e8b1 ppc: d45 predictor 8x8
11x faster.

Change-Id: I5b8f39213ee1f5260724fc254e3fb5c462435798
2017-04-19 19:57:51 -07:00
Luca Barbato
92e33c7b31 ppc: d63 predictor 32x32
About 10x faster.

Change-Id: If7d0645f75c5d7deb9751edd0bf47e2f9068e9e7
2017-04-19 19:57:51 -07:00
Luca Barbato
a5469a00a8 ppc: d63 predictor 16x16
About 18x faster.

Change-Id: Id043bf76c011e03e992085bb5e20f330d3e98cd4
2017-04-19 19:57:51 -07:00
Luca Barbato
cc868da526 ppc: d45 predictor 32x32
About 12x faster.

Change-Id: I22c150256aefb4941861ab1f6c17d554fb694bed
2017-04-19 19:57:51 -07:00
Luca Barbato
7a7dc9e624 ppc: d45 predictor 16x16
About 16x faster.

Change-Id: Ie5469fb32d5fd11bb6cb06318cea475d8a5b00b9
2017-04-19 19:57:51 -07:00
Luca Barbato
c08baa2900 ppc: dc predictor 32x32
10x and 5x faster.

Change-Id: I7913c58c768334d818f541a5e219f1035791eeaf
2017-04-19 19:57:47 -07:00
Luca Barbato
22ca468c7c ppc: dc top and left predictor 32x32
6x faster.

Change-Id: I717995b4056e5579c68191d11b495372971fe1ae
2017-04-19 19:49:31 -07:00
Luca Barbato
ad9dea1f6d ppc: dc top and left predictor 16x16
13x faster.

Change-Id: I1771ac39fda599153f933cb3f0506c9f97a6cbe6
2017-04-19 19:49:31 -07:00
Luca Barbato
d68d37872c ppc: dc_128 predictor 32x32
6x faster.

Change-Id: I1da8f51b4262871cb98f0aa03ccda41b0ac2b08b
2017-04-19 19:49:31 -07:00
Luca Barbato
f9d20e6df2 ppc: dc_128 predictor 16x16
20x faster.

Change-Id: I05f0deb2d38ae7966eae6b71fbc0aa51880e5709
2017-04-19 19:49:31 -07:00
Luca Barbato
0d9417de4a ppc: tm predictor 32x32
About 8x faster.

Change-Id: I9bad827ccbdf47ec95406e961c74ac2ff45f80cf
2017-04-19 19:49:26 -07:00
Luca Barbato
479443a570 ppc: tm predictor 16x16
About 10x faster.

Change-Id: I1f5a3752d346459df3b45f92963208bf3e520f06
2017-04-19 01:48:10 +02:00
Luca Barbato
c8f5a55df4 ppc: tm predictor 8x8
About 5x faster.

Change-Id: I951230517f49c0dca9ac9eac2efa8916a303b85a
2017-04-19 01:48:09 +02:00