generic-library/vpx

Author	SHA1	Message	Date
Johann	1088b4f87c	move neon load/stores to a new file. Move the tran_low_t helper functions to a new file. Additional load/store functions will be added here. Change-Id: I52bf652c344c585ea2f3e1230886be93f5caefc3	2017-05-15 08:29:43 -07:00
Linfeng Zhang	6fc2e57c2c	Update 32x32 high bitdepth idct NEON optimization. Preparation of CONVERT_TO_BYTEPTR/SHORTPTR clean up. BUG=webm:1388 Change-Id: I928d30a5698023bb90888d783cf81c51ec183760	2017-04-05 15:28:11 -07:00
James Zern	f91c3bb3ab	idct_neon: prefix non-static functions w/'vpx_' Change-Id: I94fcdeae18468e6ef0cb7119b8142d982a048031	2017-03-22 11:49:23 -07:00
Linfeng Zhang	50b13f75b8	Add vpx_highbd_idct32x32_34_add_neon() BUG=webm:1301 Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06	2017-03-17 00:27:46 -07:00
Linfeng Zhang	c756eb01c8	Fix overflow issue in 32x32 idct NEON intrinsics. Similar issue as Change `bc1c18e`. The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon() in high bit-depth mode exposes 16-bit overflow in final stage of pass 2, when changing the test number from 1,000 to 1,000,000. Change to use saturating add/sub for vpx_idct32x32_34_add_neon(), vpx_idct32x32_135_add_neon() and vpx_idct32x32_1024_add_neon() in high bit-depth mode. Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f	2017-03-14 16:59:14 -07:00
Linfeng Zhang	c4e5c54d69	cosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon(). No speed changes and disassembly is almost identical. Change-Id: Id07996237d2607ca6004da5906b7d288b8307e1f	2017-03-08 08:58:32 -08:00
Johann	2c24f7178d	Move load_and_transpose to transpose_neon.h. Allows for use outside the idcts without pulling in idct_neon.h. Change-Id: I4a94c1af3dac3e1b5bc8296ec9eab0ddcc8cfecf	2016-12-09 12:54:55 -08:00
James Zern	568d4b1d63	idct_neon: rename load_tran_low_to_s16 -> ...s16q BUG=webm:1294 Change-Id: I164cfcbe9bc4511d1d04af9206cf351a0ec2957b	2016-11-23 19:57:48 -08:00
James Zern	738c8f23c6	enable vpx_idct32x32_34_add_neon in hbd builds. Replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate load_tran_low_to_s16() and transpose_s16_8x8(). The combined function is used in idct32_8_neon() where the input is the correctly sized output from the earlier stage. BUG=webm:1294 Change-Id: I4257c4b3a421b2cf5d13651f966eee0680ef98a9	2016-11-08 17:03:36 -08:00
Johann	bf8ab194ee	arm idct: move to-be-shared code to header Change-Id: I67458cd358b4dc4434bbdbfcdd571769561b619e	2016-11-01 15:43:56 -07:00
Johann	9720b58aac	Optimize idct32x32_34_add for NEON. Approximately 3 times faster than the 1024 version which was used previously. BUG=webm:1295 Change-Id: Id15fb3d096029ec38ef01c53e5f6eb08254347c9	2016-10-25 15:43:58 -07:00

11 Commits