Average vertically before horizontally; horizontal averaging is more
worksome. Doing the vertical averaging first reduces the number of
horizontal averages by half.
Use pmaddubsw and pavgw to do the horizontal averaging for a slight
performance improvement.
Minor tweaks.
Improve the SSSE3 dyadic downsample routines and drop the SSE4 routines.
The non-temporal loads used in the SSE4 routines do nothing for cache-
backed memory AFAIK.
Adjust tests because averaging vertically first gives slightly different
output.
~2.39x speedup for the widthx32 routine on Haswell when not memory-bound.
~2.20x speedup for the widthx16 routine on Haswell when not memory-bound.
Note that the widthx16 routine can be unrolled for further speedup.
This allows decoders to know that a baseline stream doesn't
contain any of the features that aren't in the constrained
baseline subset.
Set the flags based on what profile we target.
1, ref_list_mgr_svc.cpp, Ln314: fix a wrong DeleteLTRFromLongList which should not be used under screen strategy
2, ref_list_mgr_svc.cpp, Ln910: remove the frame which is furthest in distance to the current frame in ref list
3, wels_preprocess.cpp: use scene LTR ref list when the current frame is scene LTR
-, ref_list_mgr_svc.cpp, Ln811: add DEBUG trace
reviewed at: https://rbcommons.com/s/OpenH264/r/870/
Previously these lines were within an #ifdef MT_ENABLED block,
but now that threading is enabled by default, we should probably
only do it if threading has been requested.
1. add pNalLen in Structure SWelsEncoderOutput to store each nal length
2. rename iNalLengthInByte[MAX_NAL_UNITS_IN_LAYER] to pNalLengthInByte in Structure SLayerBSInfo, the pointer point to pNalLen, like pBSBuf point to pFrameBS.
Since this is on by default now, we don't need a separate test
that has it enabled. Thus we can now remove one of the encoder
tests.
By not overriding the deblocking setting at all, the hash
changes for the other tests that use SEncParamExt.