Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.
Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
MMX variance code in vp8 was reading out of bounds..
TODO(JBB): The best fix would involve removing duplicate library
functions between vp8 and vp9...
Change-Id: I5722853a6a58d3b55257ff385fa54c773bf98ded
This patch fixed errors reported in Issue 746: "dr memory VP8
encode errors" and Issue 745: "dr memory VP8 decode errors".
The "UNINITIALIZED READ" errors were fixed in x86 assembly
code. The list of files fixed is
vp8_intra_pred_uv_tm_sse2
vp8_intra_pred_uv_tm_ssse3
vp8_intra_pred_uv_ho_mmx2
vp8_intra_pred_uv_ho_ssse3
vp8_intra_pred_y_tm_sse2
vp8_intra_pred_y_tm_ssse3
vp8_intra_pred_y_ho_sse2
Change-Id: Ib6df7bf1d442077fe534edfd90e50ad16fadacdd
This patch fixed WebRTC Issue 3020: "Uninit error at
vp8_mbpost_proc_down_xmm". The first 8 values in d were not initialized,
but was accessed. This patch fixed c code as well as mmx and sse2 code.
Change-Id: Iaa5b41a4ed3bea971b15fb826ce34b7ab4e36fb1
The declaration of the bilinear filters specified an alignment clause
in the implementation file but not in the header. This turned out
to be harmless, but it did cause linker warnings to be emitted when
building on Windows.
The (extern) declaration in the header was changed, to match the
declaration in the implementation.
Change-Id: I44be89b1572fe9a50fa47a42e4db9128c4897b04
Jenkins warns on left shift of negative numbers and non-aligned read
of int. This commit fixed the two issues.
Change-Id: I389a7fb6a572c643902e40a4c10fefef94500d2c
xmm[6-11] should be saved and restored for Windows x64; prevents an
encoder mismatch and some datarate issues.
Change-Id: I03c38eb18ec20c6c441cae19416393058baad1ee
xmm6/xmm7 should be saved and restored for Windows x64; prevents an
encoder mismatch and some datarate issues.
Change-Id: Ifa1a82ab25fbdc5112d66f5332e14b16e69ac164
1. Algorithm modification:
Instead of having same filter threshold for a whole frame, now we
allow the thresholds to be adjusted for each macroblock. In current
implementation, to avoid excessive blur on background as reported
in issue480(http://code.google.com/p/webm/issues/detail?id=480), we
reduce the thresholds for skipped macroblocks.
2. SSE2 optimization:
As started in issue479(http://code.google.com/p/webm/issues/detail?id=479),
the filter calculation was adjusted for better performance. The c
code was also modified accordingly. This made the deblock filter
2x faster, and the decoder was 1.2x faster overall.
Next, the demacroblock filter will be modified similarly.
Change-Id: I05e54c3f580ccd427487d085096b3174f2ab7e86
SAD returns unsigned values. Make all the declarations the same.
Remove bestsad initialization and check. It is always set to the
result of a SAD call so it will never remain UINT_MAX
Use ja instead of jg to test unsigned comparison instead of signed.
Update test.
Change-Id: I46336ab45f4e60fc37caf20bd36bc5782079c7a5
This patch fixed issue 458 by calling copy function when both
offsets are 0, which guarantees the SSSE3 functions output
same result as the c function for all possible offsets.
Change-Id: I209aec7a4c6b3362db2646a8887c1038493b6496
Allows building the library with the gcc -pedantic option, for improved
portabilty. In particular, this commit removes usage of C99/C++ style
single-line comments and dynamic struct initializers. This is a
continuation of the work done in commit 97b766a46, which removed most
of these warnings for decode only builds.
Change-Id: Id453d9c1d9f44cc0381b10c3869fabb0184d5966
Add PRIVATE macro for adding private_extern directive for yasm
to hide global symbols. This is only enabled if -DCHROMIUM is used
with YASM.
Also fixed a small problem with rtcd_defs.sh to guard TEMPORAL_DENOISING.
Change-Id: I9027fce3ebddcf20078293e4b86b396f21da7857
Rework unit tests to have a single executable rather than many, which
should avoid pollution of the visual studio project namespace, improve
build times, and make it easier to use the gtest test sharding system
when we get these going on the continuous build cluster.
Change-Id: If4c3e5d4b3515522869de6c89455c2a64697cca6
Local variable offsets are now consistent for the functions,
removed unused parameters, reworked the assembly to eliminate
stalls/instructions.
Change-Id: Iaa37668f8a9bb8754df435f6a51c3a08d547f879
Break MFQE code into it's own file.
It is currently only valid for 16x16 and 8x8 Y blocks. It also filters
4x4 U/V blocks.
Refactor filtering and add associated assembly. Limited test cases show
--mfqe introduces a penalty of ~20% with HD content. The assembly
reduces the penalty to ~15%
Change-Id: I4b8de6b5cdff5413037de5b6c42f437033ee55bf
On Android NDK, rand() is inlined function. But, on our SSE optimization,
we need symbol for rand()
Change-Id: I42ab00e3255208ba95d7f9b9a8a3605ff58da8e1
This is a proof of concept RTCD implementation to replace the current
system of nested includes, prototypes, INVOKE macros, etc. Currently
only the decoder specific functions are implemented in the new system.
Additional functions will be added in subsequent commits.
Overview:
RTCD "functions" are implemented as either a global function pointer
or a macro (when only one eligible specialization available).
Functions which have RTCD specializations are listed using a simple
DSL identifying the function's base name, its prototype, and the
architecture extensions that specializations are available for.
Advantages over the old system:
- No INVOKE macros. A call to an RTCD function looks like an ordinary
function call.
- No need to pass vtables around.
- If there is only one eligible function to call, the function is
called directly, rather than indirecting through a function pointer.
- Supports the notion of "required" extensions, so in combination with
the above, on x86_64 if the best function available is sse2 or lower
it will be called directly, since all x86_64 platforms implement
sse2.
- Elides all references to functions which will never be called, which
could reduce binary size. For example if sse2 is required and there
are both mmx and sse2 implementations of a certain function, the
code will have no link time references to the mmx code.
- Significantly easier to add a new function, just one file to edit.
Disadvantages:
- Requires global writable data (though this is not a new requirement)
- 1 new generated source file.
Change-Id: Iae6edab65315f79c168485c96872641c5aa09d55
This patch removes the local copies of the dequantize
constants and implements John's idea as described
in "Make a local copy of the dequantized data" commit.
Change-Id: Ic6b7d681f00bf63263f71ff1e39ab2f80729e8b2