1815 Commits

Author SHA1 Message Date
Martin Storsjö
23f57adaea Do full register loads instead of single-lane loads in DeblockLumaEq4H_neon
Instead of loading the registers one lane at a time, load full
registers and then transpose them.

This is faster, reducing the runtime for the function from about
506 cycles to 434 cycles (tested on a Cortex A8).

This also avoids an issue which seems like a cpu bug, present
on Sony Xperia T (cpu implementer 0x51 architecture 7 variant 0x1
part 0x04d). On such a device, it seemed like the "vswp q9, q10"
could start executing before the previous
vld4.u8 {d20[x],d21[x],d22[x],d23[x]}, [r3], r1
had finished and written back their result. Changing the
"vswp q9, q10" into "vswp q10, q9", or into separate
"vswp d18, d20; vswp d19, d21" (or the other way around) seemed to
avoid the issue. This happened occasionally (a couple times per
100000 invocations or so).
2014-04-28 10:12:16 +03:00
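
The data movement described in the commit above can be modelled in scalar C. This is only a sketch: the 16x8 block size and the function names `load_columns_by_lane` and `load_rows_then_transpose` are assumptions made for illustration, and the actual NEON instruction sequence in DeblockLumaEq4H_neon is not reproduced here. It shows that loading whole rows and transposing them yields the same column layout as writing one lane at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed block size for illustration: 16 rows of 8 pixels each. */
enum { ROWS = 16, COLS = 8 };

/* Lane-at-a-time model: every pixel is written straight into its
 * lane of the destination column, one element per step. */
void load_columns_by_lane(const uint8_t *pix, ptrdiff_t stride,
                          uint8_t cols[COLS][ROWS]) {
    assert(pix != NULL);
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            cols[c][r] = pix[r * stride + c];
}

/* Full-load model: fetch each row in one contiguous copy, then
 * transpose the temporary block so that each output row holds one
 * column of pixels. */
void load_rows_then_transpose(const uint8_t *pix, ptrdiff_t stride,
                              uint8_t cols[COLS][ROWS]) {
    uint8_t rows[ROWS][COLS];

    assert(pix != NULL);
    for (int r = 0; r < ROWS; r++)
        memcpy(rows[r], pix + r * stride, COLS);  /* one full-width load */

    for (int c = 0; c < COLS; c++)                /* transpose step */
        for (int r = 0; r < ROWS; r++)
            cols[c][r] = rows[r][c];
}
```

Both functions fill `cols` identically for the same input; the cycle counts quoted in the commit message apply to the NEON code, not to this scalar model.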
huili2
9d1af8c378 Merge pull request #751 from huili2/NewSeq_replace_I
use new seq instead of I slice
2014-04-28 13:40:15 +08:00
Licai Guo
669d704fac refine pNzc set access 2014-04-26 16:36:16 -07:00
volvet
c5f04cfbd4 Merge pull request #750 from mstorsjo/deblocking-neon-cpu-features
Check for WELS_CPU_NEON before calling DeblockingBSCalcEnc_neon
2014-04-25 19:05:12 +08:00
volvet
84ff16c015 Merge pull request #749 from mstorsjo/dos-newlines
Remove dos newlines in the android java code
2014-04-25 18:18:11 +08:00
Martin Storsjö
00a724076b Check for WELS_CPU_NEON before calling DeblockingBSCalcEnc_neon
Checking HAVE_NEON is not enough; e.g. Android devices with
armeabi-v7a are not required to have NEON, so every use of such
functions should check WELS_CPU_NEON in the cpu features
as well.
2014-04-25 13:02:22 +03:00
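
The two-level check described in the commit above can be sketched as follows. HAVE_NEON and WELS_CPU_NEON are the names used in the commit message; everything else (the bit value given to WELS_CPU_NEON, the `BsCalcFunc` type, the stub functions and `select_bs_calc`) is hypothetical and only illustrates the pattern of combining a compile-time guard with a runtime CPU feature test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the build system normally defines HAVE_NEON when NEON
 * assembly is compiled in. Defined here so the sketch is self-contained. */
#ifndef HAVE_NEON
#define HAVE_NEON 1
#endif

/* Assumption: placeholder bit value, not taken from the real headers. */
#ifndef WELS_CPU_NEON
#define WELS_CPU_NEON 0x000004
#endif

/* Hypothetical, simplified signature for a boundary-strength routine. */
typedef void (*BsCalcFunc)(int8_t *bs);

/* Stub standing in for the portable C implementation. */
void bs_calc_c(int8_t *bs) {
    assert(bs != NULL);
    *bs = 0;
}

/* Stub standing in for the NEON implementation. */
void bs_calc_neon(int8_t *bs) {
    assert(bs != NULL);
    *bs = 1;
}

/* Pick the NEON routine only if it was compiled in (HAVE_NEON) AND the
 * CPU running the code reports NEON support (WELS_CPU_NEON bit set). */
BsCalcFunc select_bs_calc(uint32_t cpu_flags) {
    BsCalcFunc func = bs_calc_c;
#if defined(HAVE_NEON)
    if (cpu_flags & WELS_CPU_NEON)
        func = bs_calc_neon;
#endif
    return func;
}
```

With this pattern, a binary built with NEON support still falls back to the C routine on an armeabi-v7a device whose detected feature flags lack the NEON bit.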
sijchen
bd8d97dddb Merge pull request #748 from huili2/newSeq_bugfix
fix bug of new seq check
2014-04-25 17:54:33 +08:00
huili2
0c544962d8 use new seq instead of I slice 2014-04-25 01:46:00 -07:00
Martin Storsjö
655d9c5dbf Remove dos newlines in the android java code 2014-04-25 11:03:03 +03:00
huili2
c0d21a23f3 Merge pull request #745 from lyao2/scrollingUT
add scroll detection UT
2014-04-25 15:07:15 +08:00
huili2
c65d250817 fix bug of new seq check 2014-04-24 22:11:42 -07:00
volvet
c65e286036 Merge pull request #738 from mstorsjo/gnu-aarch64
Fix building the aarch64 assembly using gnu binutils
2014-04-25 09:07:43 +08:00
ruil2
f57bb5042a Merge pull request #747 from ganyangbbl/resolution_issue
fix resolution setting issue
2014-04-25 09:03:01 +08:00
ganyang
8ee85918c8 fix resolution setting issue 2014-04-24 17:34:31 +08:00
Licai Guo
bcb76d383b Merge pull request #746 from huili2/IDR_bugfix
fix typo of IDR loss
2014-04-24 17:30:04 +08:00
huili2
593e291d19 fix typo of IDR loss 2014-04-24 02:26:40 -07:00
Licai Guo
d905d6bdfd Merge pull request #742 from huili2/ec_ut_copy
add slice/frame copy UT for EC
2014-04-24 17:08:41 +08:00
huili2
c4ab780d21 Merge pull request #732 from licaiguo/forJS
specify accurate align information for ST32
2014-04-24 16:50:00 +08:00
Licai Guo
fe23d53acc Merge pull request #744 from huili2/ec_IdrLoss
enable the case for IDR loss
2014-04-24 16:47:36 +08:00
Licai Guo
41698901c1 Merge pull request #743 from huili2/ec_refidx_return
prevent from return if ref_idx is error
2014-04-24 16:47:20 +08:00
lyao2
34ad719cf2 Squashed commit of the following:
commit f73d6cf0fcae5f401fc2817ab736af996113ca09
Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
Date:   Thu Apr 24 15:02:21 2014 +0800

    remove comments

commit 75416c2cf6c1ebb7aabf9e8c52d8c7163a8009b7
Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
Date:   Thu Apr 24 14:52:09 2014 +0800

    for test

commit 7dfb65ce514edcff892bfb3919921cadcce1d055
Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
Date:   Thu Apr 24 14:12:31 2014 +0800

    for test

commit eff771645e8c349dc4e454ab1751530b3cef18ed
Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
Date:   Thu Apr 24 10:51:34 2014 +0800

    for test

commit 9c42b9a7a04068e70be94529941f549b58e63780
Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
Date:   Wed Apr 23 17:46:59 2014 +0800

    update cpu_flag

commit cce3fccc0a4249b82ab2e0e92fe53579ef942799
Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
Date:   Wed Apr 23 17:26:56 2014 +0800

    for test

commit 3d292995b3c4437a2674a687cc4e8da1b5fb83f5
Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
Date:   Wed Apr 23 16:45:57 2014 +0800

    remove space

commit c608c2ba7cf010f1dcf8c0344f68536c48e181cb
Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
Date:   Wed Apr 23 16:42:43 2014 +0800

    remove tabs

commit 3b769342a06e25ad23a2c86f23a94d0d7ca1a4c8
Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
Date:   Wed Apr 23 16:33:55 2014 +0800

    refine UT case

commit 89b869f0c8f8c9bbd61e9de32caa77877aeae064
Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
Date:   Tue Apr 22 13:40:50 2014 +0800

    Squashed commit of the following:

    commit abe55494134ef8342ffe9566df4e1b3265fe21b6
    Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
    Date:   Tue Apr 22 10:50:07 2014 +0800

        set MV range

    commit 8c7f70c351e50d945c29118bed8b3781c22b7dbc
    Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
    Date:   Mon Apr 21 16:53:10 2014 +0800

        refinement

    commit bf35f19a7dc88743aacf8e89e681e0ef3302d40a
    Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
    Date:   Fri Apr 18 17:24:31 2014 +0800

        correct tabs

    commit 130b7f895d7020bfc571d910966891da93150242
    Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
    Date:   Fri Apr 18 17:17:06 2014 +0800

        correct format

    commit 0429703b0844363559dd2b3d44e45034232a9d8f
    Author: lyao2 <lyao2@LYAO2-WS01.cisco.com>
    Date:   Fri Apr 18 15:12:44 2014 +0800

        add scroll UT
2014-04-24 15:12:49 +08:00
huili2
314005435e enable the case for IDR loss 2014-04-23 23:13:12 -07:00
ruil2
ba8e3f2967 Merge pull request #741 from varunbpatil/ref_frames
Fix calculation of num ref frames
2014-04-24 14:06:46 +08:00
Varun B Patil
422d1d1569 Fix calculation of num ref frames 2014-04-24 10:42:15 +05:30
huili2
4a6259cf74 add slice/frame copy UT for EC 2014-04-23 21:45:17 -07:00
huili2
9d5bc6fd74 Merge pull request #740 from licaiguo/test
fix bNewSeqBegin logic
2014-04-24 11:02:26 +08:00
Licai Guo
bada2d35bf fix bNewSeqBegin logic 2014-04-23 18:50:11 -07:00
Licai Guo
f00d3ac15f Merge pull request #737 from mstorsjo/make-aarch64
Add support for building the arm64 assembly with the make build system
2014-04-24 06:40:07 +08:00
Martin Storsjö
66f58e8357 Add macros for the non-standard mov.16b/mov.8b/ext.16b/ext.8b
This fixes building with gnu binutils, which don't support this
nonstandard form of the instructions.

Once Apple's tools support the proper standard form of the
instructions, the code should be updated to use that everywhere
instead, and these macros should be removed.
2014-04-23 11:47:12 +03:00
Martin Storsjö
7cd175d097 Use the correct ext syntax in the gnu version of macros 2014-04-23 11:47:12 +03:00
Martin Storsjö
b13a399ab5 Use a plain "ret" instead of "ret lr"
This fixes an issue with assembling with gnu binutils.
2014-04-23 11:47:12 +03:00
Martin Storsjö
f2642b308a Add correct arguments to the gnu version of UNPACK_FILTER_SINGLE_TAG_16BITS 2014-04-23 11:47:12 +03:00
Martin Storsjö
90fad9fd98 Add \() to macro arguments to separate the argument from the following .8h or similar 2014-04-23 11:47:12 +03:00
Martin Storsjö
80bd541cbe Remove .syntax unified from the aarch64 common header
This directive isn't available in aarch64 code, only in arm code.
2014-04-23 11:47:12 +03:00
Martin Storsjö
f1b2d51d86 Add support for building the arm64 assembly in platform-arch.mk 2014-04-23 11:44:47 +03:00
Martin Storsjö
3c2e9cd7bf Regenerate makefiles to include the new arm64 assembly files 2014-04-23 11:44:47 +03:00
Martin Storsjö
84ff82ee24 Exclude the new arm64 include file 2014-04-23 11:44:47 +03:00
Martin Storsjö
c8901c7dcd Add support for arm64 assembly source files in mktargets.py
Disambiguate between arm and arm64 sources by checking the directory
names.

The arm assembly sources can be assembled on arm64 and vice versa
without any effect since all of the implementation is hidden behind
the HAVE_NEON and HAVE_NEON_AARCH64 defines, but it still is cleaner
to not build extra empty object files than to build all *.S files
on all arm variants. (The iOS project files build all of the arm
assembly files, regardless of the target architecture, since
individual files can't easily be excluded based on the target
architecture there.)
2014-04-23 11:41:17 +03:00
Martin Storsjö
764f787dcb Rename the makefile variable for arm assembly sources
This is in preparation for adding support for the aarch64 assembly
files as well.
2014-04-23 10:55:30 +03:00
Martin Storsjö
788b67cbde Fix the indentation of a line in targets.mk
This would be avoided if the targets.mk files are updated by
rerunning mktargets.sh instead of manually updating them.
2014-04-23 10:55:30 +03:00
volvet
f1737cbec6 Merge pull request #736 from licaiguo/update-gitignore
update .gitignore to ignore *.orig
2014-04-23 15:40:56 +08:00
Licai Guo
5507c76e5d update .gitignore to ignore *.orig 2014-04-23 00:20:08 -07:00
Licai Guo
021fff491b Merge pull request #735 from mstorsjo/cleanup-mess
Clean up the mess left by merging the motion compensation arm64 neon code
2014-04-23 15:19:15 +08:00
Martin Storsjö
a842f14a3c Remove .orig files left over from running astyle 2014-04-23 09:24:23 +03:00
Martin Storsjö
45aef90d26 Remove the executable bit from source files 2014-04-23 09:23:56 +03:00
Licai Guo
b6a765ad71 Merge pull request #734 from dongzha/MC_ARM64
Add Motion Compensation ARM64 Neon Code
2014-04-23 13:58:26 +08:00
huili2
13db7fea7b prevent from return if ref_idx is error 2014-04-22 22:43:40 -07:00
dongzhang
ad9e2dab4f Add Motion Compensation ARM64 Neon Code 2014-04-23 13:26:28 +08:00
Licai Guo
b47606a4ff Merge pull request #733 from dongzha/ExpandPic_ARM64
Add expand picture support for ARM64 NEON
2014-04-23 09:57:39 +08:00
dongzhang
2444327a6c Add expand picture support for ARM64 NEON
Remove duplicate MACROS
2014-04-23 09:14:32 +08:00