boost/libs/yap/doc/codegen.qbk
2021-10-05 21:37:46 +02:00

[section Object Code]
Let's look at some assembly. All assembly shown here was produced by Clang 4.0
at `-O3`. Given these definitions:
[arithmetic_perf_decls]
Here is a _yap_-based arithmetic function:
[arithmetic_perf_eval_as_yap_expr]
and the assembly it produces:

    arithmetic_perf[0x100001c00] <+0>: pushq %rbp
    arithmetic_perf[0x100001c01] <+1>: movq %rsp, %rbp
    arithmetic_perf[0x100001c04] <+4>: mulsd %xmm1, %xmm0
    arithmetic_perf[0x100001c08] <+8>: addsd %xmm2, %xmm0
    arithmetic_perf[0x100001c0c] <+12>: movapd %xmm0, %xmm1
    arithmetic_perf[0x100001c10] <+16>: mulsd %xmm1, %xmm1
    arithmetic_perf[0x100001c14] <+20>: addsd %xmm0, %xmm1
    arithmetic_perf[0x100001c18] <+24>: movapd %xmm1, %xmm0
    arithmetic_perf[0x100001c1c] <+28>: popq %rbp
    arithmetic_perf[0x100001c1d] <+29>: retq

And for the equivalent function using builtin expressions:
[arithmetic_perf_eval_as_cpp_expr]
the assembly is:

    arithmetic_perf[0x100001e10] <+0>: pushq %rbp
    arithmetic_perf[0x100001e11] <+1>: movq %rsp, %rbp
    arithmetic_perf[0x100001e14] <+4>: mulsd %xmm1, %xmm0
    arithmetic_perf[0x100001e18] <+8>: addsd %xmm2, %xmm0
    arithmetic_perf[0x100001e1c] <+12>: movapd %xmm0, %xmm1
    arithmetic_perf[0x100001e20] <+16>: mulsd %xmm1, %xmm1
    arithmetic_perf[0x100001e24] <+20>: addsd %xmm0, %xmm1
    arithmetic_perf[0x100001e28] <+24>: movapd %xmm1, %xmm0
    arithmetic_perf[0x100001e2c] <+28>: popq %rbp
    arithmetic_perf[0x100001e2d] <+29>: retq

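
The two listings are identical apart from their load addresses. Reading the instructions alone, the computation appears to have the shape `e * e + e`, where `e = a * x + y`. The sketch below is one plain-C++ function consistent with that shape. It is not the snippet used to produce the listing: the name `eval_as_cpp_expr_sketch`, the parameter names, and the use of bare `double` arguments are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the builtin-expression function. The name, the
// parameter names, and the use of plain double are assumptions.
constexpr double eval_as_cpp_expr_sketch(double a, double x, double y)
{
    // Assumed source-level form: three copies of the subexpression
    // e = a * x + y. The listing computes e once (mulsd, addsd) and then
    // e * e + e (mulsd, addsd).
    return (a * x + y) * (a * x + y) + (a * x + y);
}

// Compile-time check: a = 2, x = 3, y = 4 gives e = 10 and 10 * 10 + 10 = 110.
static_assert(eval_as_cpp_expr_sketch(2.0, 3.0, 4.0) == 110.0, "e * e + e");

// Run-time version of the same check.
inline void check_eval_sketch()
{
    assert(eval_as_cpp_expr_sketch(2.0, 3.0, 4.0) == 110.0);
}
```

If the source is indeed written with the subexpression repeated, the optimizer has folded the three copies into one, which is why only two multiplies and two adds appear in the listing.
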
If we increase the number of terminals by a factor of four:
[arithmetic_perf_eval_as_yap_expr_4x]
the results are the same: in this simple case, the _yap_ and builtin
expressions result in the same object code.
However, when the number of terminals is increased by an additional factor of
2.5 (for a total of 90 terminals), the inliner can no longer do as well for
_yap_ expressions as for builtin ones.
More complex nonarithmetic code produces more mixed results. For example, here
is a function using code from the Map Assign example:

    std::map<std::string, int> make_map_with_boost_yap ()
    {
        return map_list_of
            ("<", 1)
            ("<=",2)
            (">", 3)
            (">=",4)
            ("=", 5)
            ("<>",6)
            ;
    }

By contrast, here is the Boost.Assign version of the same function:

    std::map<std::string, int> make_map_with_boost_assign ()
    {
        return boost::assign::map_list_of
            ("<", 1)
            ("<=",2)
            (">", 3)
            (">=",4)
            ("=", 5)
            ("<>",6)
            ;
    }

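
Both functions above rely on an object whose call operator can be chained, with the whole chain converting to a `std::map` at the end. As a rough, library-free illustration of that idiom, here is a minimal sketch. It is neither the _yap_ Map Assign implementation nor the Boost.Assign one; `pair_list_sketch` and `make_map_with_sketch` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: accumulates key/value pairs through chained calls.
template <typename Key, typename Value>
struct pair_list_sketch
{
    std::map<Key, Value> pairs;

    // Each call records one pair and returns the object for the next call.
    template <typename K, typename V>
    pair_list_sketch & operator()(K && key, V && value)
    {
        pairs.emplace(std::forward<K>(key), std::forward<V>(value));
        return *this;
    }

    // Conversion that lets the chain appear directly in a return statement.
    operator std::map<Key, Value>() const { return pairs; }
};

std::map<std::string, int> make_map_with_sketch()
{
    pair_list_sketch<std::string, int> list;
    return list("<", 1)("<=", 2)(">", 3)(">=", 4)("=", 5)("<>", 6);
}

// Confirms that the chain produced all six entries.
inline void check_make_map_with_sketch()
{
    std::map<std::string, int> const m = make_map_with_sketch();
    assert(m.size() == 6);
    assert(m.at("<=") == 2);
}
```

This stand-in inserts eagerly on every call. An expression-template library is free to record the calls first and build the map when the expression is evaluated, which is where any abstraction penalty would come from.
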
Here is how you might do it "manually":

    std::map<std::string, int> make_map_manually ()
    {
        std::map<std::string, int> retval;
        retval.emplace("<", 1);
        retval.emplace("<=",2);
        retval.emplace(">", 3);
        retval.emplace(">=",4);
        retval.emplace("=", 5);
        retval.emplace("<>",6);
        return retval;
    }

Finally, here is the same map created from an initializer list:

    std::map<std::string, int> make_map_inializer_list ()
    {
        std::map<std::string, int> retval = {
            {"<", 1},
            {"<=",2},
            {">", 3},
            {">=",4},
            {"=", 5},
            {"<>",6}
        };
        return retval;
    }

All of these produce roughly the same number of assembly instructions.
Benchmarking these four functions with Google Benchmark yields these results:
[table Runtimes of Different Map Constructions
[[Function] [Time (ns)]]
[[make_map_with_boost_yap()] [1285]]
[[make_map_with_boost_assign()] [1459]]
[[make_map_manually()] [985]]
[[make_map_inializer_list()] [954]]
]
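
The benchmark harness behind the table is not shown. For readers who want a rough comparison without Google Benchmark, the sketch below times the two standard-library-only functions with `std::chrono`. It is a simplified substitute, not the setup that produced the numbers above: `average_ns_per_call` and `check_maps_agree` are hypothetical helpers, and a hand-rolled loop like this lacks the warm-up and statistical handling of a real benchmarking library.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

std::map<std::string, int> make_map_manually()
{
    std::map<std::string, int> retval;
    retval.emplace("<", 1);
    retval.emplace("<=", 2);
    retval.emplace(">", 3);
    retval.emplace(">=", 4);
    retval.emplace("=", 5);
    retval.emplace("<>", 6);
    return retval;
}

std::map<std::string, int> make_map_inializer_list()
{
    std::map<std::string, int> retval = {
        {"<", 1}, {"<=", 2}, {">", 3}, {">=", 4}, {"=", 5}, {"<>", 6}
    };
    return retval;
}

// Hypothetical helper: average wall-clock nanoseconds per call of f.
template <typename F>
double average_ns_per_call(F f, int iterations)
{
    // Volatile sink so the compiler cannot discard the constructed maps.
    volatile std::size_t sink = 0;
    auto const start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = sink + f().size();
    auto const stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> const elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Before comparing timings, confirm that both functions build the same map.
inline void check_maps_agree()
{
    assert(make_map_manually() == make_map_inializer_list());
}
```

A call such as `average_ns_per_call(make_map_manually, 100000)` returns a per-call figure that can be compared between the two functions. Absolute values depend on the machine, compiler, and standard library, so they should not be expected to match the table.
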
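The _yap_-based implementation finishes in the middle of the pack: it is about
12% faster than the Boost.Assign version, but roughly 30% to 35% slower than
the manual and initializer-list versions.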
In general, the expression trees produced by _yap_ are evaluated down to
something close to the hand-written equivalent. There is an abstraction
penalty, but it is small for reasonably sized expressions.
[endsect]