**Document number**: P0037R4
**Date**: 2017-10-16
**Reply-to**: John McFarlane, [mcfarlane.john+fixed-point@gmail.com](mailto:mcfarlane.john+fixed-point@gmail.com)
**Audience**: SG6, SG14, LEWG
# Fixed-Point Real Numbers
## I. Introduction
This proposal introduces a system for performing binary fixed-point
arithmetic using integral types.
## II. Motivation
Floating-point types are an exceedingly versatile and widely supported
method of expressing real numbers on modern architectures.
However, there are certain situations where fixed-point arithmetic is
preferable:
* Some systems lack native floating-point registers and must emulate them in software;
* many others are capable of performing some or all operations more efficiently using integer arithmetic;
* certain applications can suffer from the variability in precision which comes from a dynamic radix point
[\[pathengine\]](http://www.pathengine.com/Contents/Overview/FundamentalConcepts/WhyIntegerCoordinates/page.php);
* in situations where a variable exponent is not desired,
it takes valuable space away from the significand and reduces precision and
* not all hardware and compilers produce exactly the same results, leading to non-deterministic results.
Integer types provide the basis for an efficient
representation of binary fixed-point real numbers. However, laborious,
error-prone steps are required to normalize the results of certain
operations and to convert to and from fixed-point types.
A set of tools for defining and manipulating fixed-point types is
proposed. These tools are designed to make work easier for those who
traditionally use integers to perform low-level, high-performance
fixed-point computation.
They are composable such that a wide range of trade-offs between speed, accuracy and safety are supported.
## III. Impact On the Standard
This proposal is a pure library extension. It does not require
changes to any standard classes or functions.
It adds several new class and function templates
to new header file, ``.
## IV. Design Decisions
The design is driven by the following aims in roughly descending
order:
1. to automate the task of using integer types to perform low-level
binary fixed-point arithmetic;
2. to facilitate a style of code that is intuitive to anyone who is
comfortable with integer and floating-point arithmetic;
3. to treat fixed-point as a super-set of integer such that
a fixed-point type with an exponent of zero can provide
a drop-in replacement for its underlying integer type
4. to avoid incurring expense for unused features - including compilation time.
More generally, the aim of this proposal is to contain within a single API
all the tools necessary to perform binary fixed-point arithmetic.
The design facilitates a wide range of competing compile-time strategies for
avoiding overflow and precision loss, but implements only the simplest by default.
Similarly, orthogonal concerns such as run-time overflow detection and rounding modes
are deferred to the underlying integer types used as storage.
### Class Template
Fixed-point numbers are specializations of
template
class fixed_point;
where the template parameters are described as follows.
#### `Rep` Type Template Parameter
This parameter indicates the integral type used as storage.
Fundamental integral types other than `bool` are ideal choices
but any suitably integer-like type can be used.
Other than scale, the characteristics of `fixed_point`
are the characteristics of `Rep` including:
* signedness;
* number of digits;
* behavior or operators and
* alignment.
#### `Exponent` Non-Type Template Parameter
The exponent of a fixed-point type is the equivalent of the exponent
field in a floating-point type and shifts the stored value by the
requisite number of bits necessary to produce the desired range. The
default value of `Exponent` is zero, giving `fixed_point` the same
range as `T`.
By far the most common use of fixed-point is to store values with fractional digits.
Thus, the exponent is typically a negative value.
The resolution of a specialization of `fixed_point` is
pow(2, Exponent)
and the minimum and maximum values are
std::numeric_limits::min() * pow(2, Exponent)
and
std::numeric_limits::max() * pow(2, Exponent)
respectively.
Any usage that results in values of `Exponent` which lie outside the
range, (`INT_MIN / 2`, `INT_MAX / 2`), may result in undefined
behavior and/or overflow or underflow. This range of exponent values
is far in excess of the largest built-in floating-point type and should
be adequate for all intents and purposes.
### Conversion
While effort is made to ensure that significant digits are not lost
during conversion, no effort is made to avoid rounding errors.
Whatever would happen when converting to and from `Rep` largely
applies to `fixed_point` objects also. For example:
fixed_point{.499}==0
...equates to `true` and is considered an acceptable rounding error.
### Access to `Rep` Value
It is sometimes necessary to read from and write to
the `Rep` value contained in a `fixed_point` object.
This is supported through numeric traits, `to_rep` and `from_rep` respectively.
These traits are described in paper,
[\[P0675\]](https://github.com/johnmcfarlane/papers/blob/master/wg21/p0675r0.md).
constexpr auto a = from_rep>()(320);
static_assert(a == 1.25);
constexpr auto b = to_rep>()(a);
static_assert(b == 320); // 1.25*(1<<8)
### Class Template Deduction
The type of a `fixed_point` object can be deduced by an integer initializer:
auto a = fixed_point(0ul);
static_assert(is_same_v>);
It can also be deduced with an integral constant of type `constant`
(described in [\[P0827\]](https://github.com/johnmcfarlane/papers/blob/master/wg21/p0827r0.md)):
constexpr auto b = fixed_point(constant<0xFF00000000L>{});
static_assert(is_same_v>);
static_assert(to_rep>()(b) == 0xFF);
For `Exponent`, the highest value which does not incur data loss is used.
This minimizes the required range of the underlying integer value
which reduces the likelihood of out-of-range errors during arithmetic operations.
For `Rep`, a fundamental integer type of `int` width is preferred unless a wider type is required.
### Operators
Any arithmetic, comparison, logic and bitwise operators
that might be applied to integer types can also be
applied to fixed-point types. A guiding principle of operator
overloads is that they perform as little run-time computation as is
practically possible.
With the exception of shift and comparison operators, binary operators
can take any combination of:
* one or two fixed-point arguments and
* zero or one arguments of any arithmetic type, i.e. a type for which
`numeric_limits` is specialized.
Assuming a binary operation, `@`, in the form
auto R = S @ T;
where `S` is of type `fixed_point`
and `T` is a numeric type — possibly another `fixed_point` instantiation —
then result, `R`, of the operation is determined as follows.
1. If `T` is a floating-point type, `Float`, then `S` is cast to `Float`
and a floating-point operation takes place, e.g.:
auto a = fixed_point(3) + 4.f;
static_assert(is_same_v);
2. If `T` is a `constant` of integer type, `Integer`, then:
a. If the operator is bitwise left shift (`<<`),
then the result is `fixed_point`
with the shift operator applied.
b. If the operator is bitwise right shift (`>>`),
then the result is `fixed_point`
with the shift operator applied.
c. Otherwise, `T` is converted to `fixed_point(T{})`, e.g.
auto b = fixed_point(200U) - constant<100L>{};
static_assert(is_same_v(200) - fixed_point(100))>);
and proceeding rule #4 subsequently applies.
3. If `T` is an integer type, `Integer`, then:
a. If the operator is bitwise shift (`<<` or `>>`),
then the result is type `S` with the shift operator applied.
b. Otherwise, `T` is cast to `fixed_point`, e.g.
auto c = fixed_point<>(5) * 6ul;
static_assert(is_same_v(5) * fixed_point(6))>);
and proceeding rule #4 subsequently applies.
4. If `T` is type, `fixed_point`, then:
a. If the operator is multiplication (`*`),
then the result is `fixed_point`, e.g.:
constexpr auto d = fixed_point{1.25} * fixed_point{8};
static_assert(is_same_v>);
static_assert(d == 10);
b. If the operator is division (`/`),
then the result is `fixed_point`, e.g.:
constexpr auto e = fixed_point{1.5} / fixed_point{2.5};
static_assert(is_same_v>);
static_assert(e == .5);
c. If the operator is modulo (`%`),
then the result is `fixed_point`, e.g.:
constexpr auto f = fixed_point{1.5} % fixed_point{2.5};
static_assert(is_same_v>);
static_assert(f == .25);
d. If the operator is addition (`+`) or subtraction (`-`), then the operand
with the greater exponent is converted such that its exponent matches the
other operands' exponent. Then the result is
`fixed_point`, eg.:
constexpr auto g = fixed_point{12.5} - fixed_point{8};
static_assert(is_same_v>);
static_assert(g == 4.5);
e. If the operator is comparison (`==`, `!=`, `<`, `>`, `<=` or `>=`), then
the operand with the greater exponent is converted such that its exponent
matches the other operands' exponent. Then the result is
`decltype(RepS@RepT)`, eg.:
constexpr auto h = fixed_point{12.5} <= fixed_point{8};
static_assert(is_same_v);
static_assert(h == false);
(See section, **Extended Comparison Range**, for additional details.)
f. If the operator is bitwise or (`|`) or xor (`^`) then the same rules as
addition (`+`) are applied.
g. If the operator is bitwise and (`&`) then the same rules as bitwise or (`|`)
are applied except that the greater — not less — exponent is preferred.
Some details have been left out for brevity. Unary operators are supported.
Some minor variations occur when `S` is not `fixed_point` and `T` is `fixed_point`.
The complete set of rules may appear to be large and complex. However, this
mostly reflects the existing complexity in the behavior of arithmetic types.
Relatively few design principles govern these rules:
1. A `fixed_point` should follow the same behavior as `T` to the greatest
extend practical, reflecting the facts that: a) all integers are fixed-point
— rather than floating-point types and b) integer arithmetic generally
provides the best efficiency and performance characteristics.
2. In situations where a trade-off between overflow and underflow must be made,
the design guards against underflow. This follows from principle #1.
Far more operations can cause overflow and users are generally more wart of
it. And detection/handling of overflow is an orthogonal concern which is best
implemented using a custom numeric type such as the `safe_integer` and
`elastic_integer` types discussed in
[\[P0554\]](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0554r0.html#componentization).
### Named Arithmetic Functions
The following named function templates can be used as alternatives to arithmetic operators, `*` and `/`.
template
constexpr auto multiply(const Lhs& lhs, const Rhs& rhs);
template
constexpr auto divide(const Lhs& lhs, const Rhs& rhs);
They eschew machine-friendly integer promotion
in favor of custom widening rules
that place conciseness and correctness before raw performance.
Multiplying two 8-bit values does not necessarily produce an int-sized result:
auto f = fixed_point{15.9375};
auto p = multiply(f, f);
// p === fixed_point{254.00390625}
The arithmetic operators shield the user from none of the pitfalls of fixed-point arithmetic.
For example, naive use of `*` can easily lead to surprising results.
auto f = fixed_point{15.9375};
auto p = f * f;
// p === fixed_point{0}
In contrast, the `multiply` function returns appropriately widened results.
It does a better job at avoiding unnecessary narrowing and catastrophic information loss.
auto f = fixed_point{15.9375};
auto p = multiply(f, f);
// p === fixed_point{254.00390625}
Function, `divide`, performance a similar job. It follows rules laid out in
[\[P0106\]](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0106r0.html).
In doing so, it performs division in a manner which may be less surprising to
users who are familiar with floating-point division.
constexpr auto n = fixed_point{1};
constexpr auto d = fixed_point{2};
// information appears to be lost by division operator
constexpr auto q1 = n / d;
static_assert(is_same_v>);
static_assert(q1 == 0);
// but divide preserves accurate results where practical
constexpr auto q2 = divide(n, d);
static_assert(is_same_v>);
static_assert(q2 == 0.5);
### Alternative Types for `Rep`
Using built-in integral types as the default underlying representation
minimizes certain costs:
* many fixed-point operations are as efficient as their integral equivalents;
* compile-time complexity is kept relatively low and
* the behavior of fixed-point types should cause few surprises.
However, this choice also brings with it many of the deficiencies of built-in types.
For example:
* the typical rounding behavior is distinct for:
* conversion from floating-point types;
* right shift and
* divide operations;
* all of these rounding behaviors cause drift and propagate error;
* overflow, underflow and flush are handled silently with wrap-around or undefined behavior;
* divide-by-zero similarly results in undefined behavior and
* the range of values is limited by the largest type: `long long int`.
The effort involved in addressing these deficiencies is non-trivial
and on-going (for example [\[P0105\]](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0105r0.html)).
As solutions are made available, it should become easier
to define custom integral types which address concerns surrounding robustness and correctness.
How to combine such numeric types is the topic of
[\[P0554\]](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0554r0.html),
**Composition of Arithmetic Types**.
#### Required Specializations
For a type to be suitable as parameter, `Rep`, of `fixed_point`,
it must meet the following requirements:
* it must have specialized the following existing standard library types:
* `numeric_limits`
* `make_signed` and `make_unsigned`
* it must have specialized the following proposed standard library types as described
in [\[P0675\]](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0675r0.html):
* `num_digits` and `set_num_digits`,
* `to_rep`, `from_rep` and `from_value`.
Note that `make_signed` and `make_unsigned` must not be specialized for custom types.
Unless this rule can be relaxed, some equivalent mechanism must be introduced
in order for custom types to be used with `fixed_point<>`.
One possibility is the addition of `numeric_limits<>::signed` and `numeric_limits<>::unsigned` type aliases.
### Example
The following function, `magnitude`, calculates the magnitude of a 3-dimensional vector.
template
constexpr auto magnitude(Fp x, Fp y, Fp z)
{
return sqrt(x*x+y*y+z*z);
}
And here is a call to `magnitude`.
auto m = magnitude(
fixed_point(1),
fixed_point(4),
fixed_point(9));
// m === fixed_point{9.8994948863983154}
## V. Technical Specification
### Header \ Synopsis
namespace std {
template class fixed_point;
// for each unary arithmetic, comparison, logic and bitwise operator, @
template
constexpr auto operator@(
const fixed_point & rhs);
// for each binary arithmetic, comparison, logic and bitwise operator, @
// (and for named functions, multiply and divide)
template
constexpr auto operator@(
const fixed_point & lhs,
const fixed_point & rhs);
template ::is_iec559>>
constexpr auto operator@(
const fixed_point & lhs,
const RhsFloat & rhs);
template ::is_iec559>>
constexpr auto operator@(
const LhsFloat & lhs,
const fixed_point & rhs);
template ::is_integer>>
constexpr auto operator@(
const fixed_point & lhs,
const RhsInteger & rhs);
template ::is_integer>>
constexpr auto operator@(
const LhsInteger & lhs,
const fixed_point & rhs);
template
constexpr auto operator@(
const fixed_point & lhs,
constant);
template
constexpr auto operator@(
constant,
const fixed_point & rhs);
// for each arithmetic, comparison, logic and bitwise compound assignment operator, @=
template
constexpr auto operator@=(
fixed_point & lhs,
const fixed_point & rhs);
template ::is_iec559>>
constexpr auto operator@=(
fixed_point & lhs,
const RhsFloat & rhs);
template ::is_iec559>>
constexpr auto operator@=(
LhsFloat & lhs,
const fixed_point & rhs);
template ::is_integer>>
constexpr auto operator@=(
fixed_point & lhs,
const RhsInteger & rhs);
template ::is_integer>>
constexpr auto operator@=(
LhsInteger & lhs,
const fixed_point & rhs);
template
constexpr auto operator@=(
fixed_point & lhs,
constant);
}
#### `fixed_point<>` Class Template
``` c++
template
class fixed_point
{
public:
using rep = Rep;
constexpr static int exponent;
constexpr fixed_point();
template
constexpr fixed_point(fixed_point const&);
template
constexpr fixed_point(constant);
template::is_integer, int> Dummy = 0>
constexpr fixed_point(S const&);
template::is_iec559, int> Dummy = 0>
constexpr fixed_point(S);
template::is_integer, int> Dummy = 0>
constexpr fixed_point& operator=(S);
template::is_iec559, int> Dummy = 0>
constexpr fixed_point& operator=(S);
template
constexpr fixed_point& operator=(fixed_point const&);
template::is_integer, int> Dummy = 0>
explicit constexpr operator S() const;
template::is_iec559, int> Dummy = 0>
explicit constexpr operator S() const;
};
```
## VI. Open Issues
### Library Support
Because the aim is to provide an alternative to existing arithmetic
types which are supported by the standard library, it is conceivable
that a future proposal might specialize existing class templates and
overload existing functions.
Possible candidates for overloading include the functions defined in
\ and a templated specialization of `numeric_limits`. A new type
trait, `is_fixed_point`, would also be useful.
While `fixed_point` is intended to provide drop-in replacements to
existing built-ins, it may be preferable to deviate slightly from the
behavior of certain standard functions. For example, overloads of
functions from \ will be considerably less concise, efficient
and versatile if they obey rules surrounding error cases. In
particular, the guarantee of setting `errno` in the case of an error
prevents a function from being defined as pure. This highlights a
wider issue surrounding the adoption of the functional approach and
compile-time computation that is beyond the scope of this document.
One suggested addition is a specialization of `std::complex`.
This would take the form:
template
class complex>;
This type's arithmetic operators would differ from existing specializations
because `fixed_point<>` operators often return results
of a different type to their operands. Hence signatures such as
```c+
template
complex operator*( const complex& lhs, const complex& rhs);
```
would need to be replaced with:
```c+
template
auto operator*( const complex& lhs, const complex& rhs);
```
### Different Division Strategies
Integer and floating-point types solve division in fundamentally different ways.
This poses a problem for the unwary user of `fixed_point` who may see the radix
point and assume that `fixed_point` division will behave the same as
floating-point. For instance, almost 50% of all integer and `fixed_point`
division operations contain a numerator greater than the denominator, resulting
in a value of zero. And where both operands have the same exponent, the result
will have an exponent of zero.
The `divide` function answers the needs of users expecting results to contain a
number of integer and fractional digits complementing to those of the operands.
However, this does not stop the behavior of the division operator (`/`) from
being a source of surprise.
One reason for not giving the division operator the same behavior as `divide`
is that results will vary between different platforms depending on the width
of types such as `int` and `long`. This is not a problem for types such as `elastic_integer`
([\[P0554\]](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0554r0.html#componentization))
which explicitly represents the number of bits in its type.
It is proposed that specialization, `fixed_point, E>`,
have its own division operator whose behavior closely resembles that of `divide`.
### Template Parameter Order
It is quite possible that the order of the two template parameters of
`fixed_point` are the wrong way around and should be:
template
class fixed_point;
While being able to express `fixed_point` reinforces the fact that
integers are merely fixed-point types with radix point fixed at zero,
the second parameter, `Exponent` is perhaps more important to specify.
### Non-Binary Radixes
Interest in decimal fixed-point arithmetic has been observed.
It seems plausible that a general-purpose type could support both
binary and decimal fixed-point types as follows:
template
class basic_fixed_point;
template
using fixed_point = basic_fixed_point;
template
using decimal_fixed_point = basic_fixed_point;
This naming scheme imitates `basic_string`
in order to illustrate a similar pattern.
Further investigation needs to be conducted
in order to ascertain whether this break-down
can maintain the same level of efficiency.
### Extended Comparison Range
Comparison operations between two `fixed_point` operands require that they both
have the same exponent. When they do not, conversion takes place to ensure they
do. Unfortunately, if the difference in exponents is too great, the conversion
may cause an out-of-bounds condition.
However, where two operands have bits whose values are in ranges that do not
overlap, it may not be necessary to perform a conversion which results in
out-of-range results: a result that ensures they continue to not overlap may be
sufficient. For example,
static_assert(fixed_point{0} < fixed_point{4.e38});
requires that the right-hand operand be converted to `fixed_point`.
This will result in the underlying integer being scaled up by 1000 bits,
resulting in undefined behavior and/or a flushed value. But in this case,
it only needed to be scaled by 8 bits in order for none of its bit values to
overlap with those of the left-hand operand.
### Extensions to Exponent Parameter
The possibility of scaling by other than an exponent has been discussed previously
[\[future\]](https://groups.google.com/a/isocpp.org/forum/#!search/fixed_point/std-proposals/nK2EsdEf20A/LWmGn-GhAwAJ).
It is likely that `fixed_point` is just one example of a more general numeric
type, `scaled`:
// a numeric type
template
class scaled;
// a member of concept, "scale"
template
class power;
// fixed_point as a numeric type scaled by powers of 2
template
using fixed_point = scaled>;
This approach separates two concerns currently combined in `fixed_point`:
1. using an existing number type to represent a new number type and
2. applying a scaling factor to the existing number.
In this dichotomy, `scaled` holds onto a member variable of type, `Rep`,
while `power` is stateless and provides an API which converts to and from
the `Rep` type through function parameters. `power` also provides
arithmetic support.
Alternatives to `power` would include:
* existing class, `ratio` (with additional arithmetic support)
* units (containing static dimension information)
* rounding and overflow-handling facilities
* bounded integer
* unit scalar (in the range [0, 1] or [-1, 1])
This design is a matter for future experimentation but it has
immediate consequences for the API of `fixed_point`.
### Disable Addition/Subtraction if Exponents are Different
When `fixed_point` operands of different exponent values are added and
subtracted, one of them must first be scaled. This may represent an
unexpected performance regression to some users. It might be clearer
if the user is force to align the operands explicitly:
auto l = fixed_point{0x55}, r = fixed_point{0xAA}
// stop this from compiling
auto sum = l + r;
// force user to pick a common exponent
auto sum = l + fixed_point{r};
### Named Functions
Functions, `multiply` and `divide` are of limited utility.
They can be implemented by the user using casts.
In the other direction, `add` and `subtract` could be useful additions.
They could ensure that operations between numbers with vastly different exponents
did not result in a flushed value.
Q: Are `multiply` and `divide` worthwhile?
Q: Should `add` and `subtract` be defined also?
## VII. Prior Art
Many examples of fixed-point support in C and C++ exist. While almost
all of them aim for low run-time cost and expressive alternatives to
raw integer manipulation, they vary greatly in detail and in terms of
their interface.
One especially interesting dichotomy is between solutions which offer
a discrete selection of fixed-point types and libraries which contain
a continuous range of exponents through type parameterization.
### N1169
One example of the former is found in proposal N1169
[\[N1169\]](http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1169.pdf),
the intent of which is to expose features found in certain embedded
hardware. It introduces a succinct set of language-level fixed-point
types and impose constraints on the number of integer or fractional
digits each can possess.
As with all examples of discrete-type fixed-point support, the limited
choice of exponents is a considerable restriction on the versatility
and expressiveness of the API.
Nevertheless, it may be possible to harness performance gains provided
by N1169 fixed-point types through explicit template specialization.
This is likely to be a valuable proposition to potential users of the
library who find themselves targeting platforms which support
fixed-point arithmetic at the hardware level.
### P0106
There are many other C++ libraries available which fall into the
latter category of continuous-range fixed-point arithmetic
[\[mizvekov\]](https://github.com/mizvekov/fp)
[\[schregle\]](http://www.codeproject.com/Articles/37636/Fixed-Point-Class)
[\[viboes\]](https://github.com/viboes/fixed_point). In particular, an
existing library proposal by Lawrence Crowl,
P0106 [\[P0106\]](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0106r0.html)
(formerly N3352),
aims to achieve very similar goals through similar means and warrants
closer comparison than N1169.
P0106 introduces four class templates covering the quadrant of signed
versus unsigned and fractional versus integer numeric types. It is
intended to replace built-in types in a wide variety of situations and
accordingly, is highly compile-time configurable in terms of how
rounding and overflow are handled. Parameters to these four class
templates include the storage in bits and - for fractional types - the
resolution.
The `fixed_point` class template could probably - with a few caveats -
be generated using the two fractional types, `nonnegative` and
`negatable`, replacing the `Rep` parameter with the integer bit
count of `Rep`, specifying `fastest` for the rounding mode and
specifying `undefined` as the overflow mode.
However, `fixed_point` more closely and concisely caters to the needs of
users who already use integer types and simply desire a more concise,
less error-prone form. It more closely follows the four design aims of
the library and - it can be argued - more closely follows the spirit
of the standard in its pursuit of zero-cost abstraction.
Some aspects of the design of the P0106 API which back up these
conclusion are that:
* the result of arithmetic operations closely resemble the `trunc_`
function templates and are potentially more costly at run-time;
* the nature of the range-specifying template parameters - through
careful framing in mathematical terms - abstracts away valuable
information regarding machine-critical type size information;
* the breaking up of duties amongst four separate class templates
introduces four new concepts and incurs additional mental load for
relatively little gain while further detaching the interface from
vital machine-level details and
* the absence of the most negative number from signed types reduces
the capacity of all types by one.
The added versatility that the P0106 API provides regarding rounding
and overflow handling are of relatively low priority to users who
already bear the scars of battles with raw integer types.
Nevertheless, providing them as options to be turned on or off at
compile time is an ideal way to leave the choice in the hands of the
user.
Many high-performance applications - in which fixed-point is of
potential value - favor run-time checks during development which are
subsequently deactivated in production builds.
The P0106 interface is highly conducive to this style of development.
The design proposed in this paper aims to achieve similar results
by composing fixed-point types from custom integral types.
## VIII. Acknowledgements
SG6: Lawrence Crowl
SG14: Guy Davidson, Michael Wong
Contributors: Ed Ainsley, Billy Baker, Lance Dyson, Marco Foco,
Mathias Gaunard, Clément Grégoire, Nicolas Guillemot, Kurt Guntheroth, Matt Kinzelman, Joël Lamotte,
Sean Middleditch, Paul Robinson, Patrice Roy, Peter Schregle, Ryhor Spivak
## IX. Revisions
This paper revises [P0037R3](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0037r3.html):
* removed mention of `width` and `set_width`
* rewritten description of `Rep` template parameter
* added sections, **Access to `Rep` Value** and **Class Template Deduction**
* removed `make_fixed` and `make_ufixed` function templates
* rewritten **Operator Overloads** section, renamed **Operators** and included `constant` operators
* removed sections, **Overflow** and **Underflow**
* removed `add` and `subtract` function templates
* reversed the roles of `operator/` and `divide`
* replaced section on composability with link to
[\[P0554\]](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0554r0.html)
* replaced reference to [\[P0381\]](http://johnmcfarlane.github.io/fixed_point/papers/p0381r1.html)
with reference to [\[P0675\]](https://github.com/johnmcfarlane/papers/blob/master/wg21/p0675r0.md)
* revised synopsis
* renamed section **Future Issues** to **Open Issues** and added sections,
* **Different Division Strategies**,
**Template Parameter Order**,
**Disable Addition/Subtraction if Exponents are Different**,
**Named Functions**,
**Extended Comparison Range**
and remove sections
* **Compile-Time Bit-Shift Operations**,
**Alternative Return Type Policies**,
* removed section, **References**, moving links into document body
* rewrote **Appendix 1: Reference Implementation** referencing
[\[github\]](https://github.com/johnmcfarlane/cnl)
* formatting changes intended to make markdown more readable as plain text
## XI. Appendix 1: Reference Implementation
An in-development implementation of the fixed_point class template and
its essential supporting functions and types is available
[\[github\]](https://github.com/johnmcfarlane/cnl).
## XII. Appendix 2: Performance
Despite a focus on usable interface and direct translation from
integer-based fixed-point operations, there is an overwhelming
expectation that the source code result in minimal instructions and
clock cycles. A few preliminary numbers are presented to give a very
early idea of how the API might perform.
Some notes:
* A few test functions were run, ranging from single arithmetic
operations to basic geometric functions, performed against integer,
floating-point and fixed-point types for comparison.
* Figures were taken from a single CPU, OS and compiler, namely:
* Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
* Target: x86_64-pc-linux-gnu
* Thread model: posix
* Fixed inputs were provided to each function, meaning that branch
prediction rarely fails. Results may also not represent the full
range of inputs.
* Details of the test harness used can be found in the source
project mentioned in Appendix 1;
* Times are in nanoseconds;
* Code has not yet been optimized for performance.
### Types
Where applicable various combinations of integer, floating-point and
fixed-point types were tested with the following identifiers:
* `uint8_t`, `int8_t`, `uint16_t`, `int16_t`, `uint32_t`, `int32_t`,
`uint64_t` and `int64_t` built-in integer types;
* `float`, `double` and `long double` built-in floating-point types;
* s3:4, u4:4, s7:8, u8:8, s15:16, u16:16, s31:32 and u32:32 format
fixed-point types.
### Basic Arithmetic
Plus, minus, multiplication and division were tested in isolation
using a number of different numeric types with the following results:
name cpu_time
add(float) 1.78011
add(double) 1.73966
add(long double) 3.46011
add(u4_4) 1.87726
add(s3_4) 1.85051
add(u8_8) 1.85417
add(s7_8) 1.82057
add(u16_16) 1.94194
add(s15_16) 1.93463
add(u32_32) 1.94674
add(s31_32) 1.94446
add(int8_t) 2.14857
add(uint8_t) 2.12571
add(int16_t) 1.9936
add(uint16_t) 1.88229
add(int32_t) 1.82126
add(uint32_t) 1.76
add(int64_t) 1.76
add(uint64_t) 1.83223
sub(float) 1.96617
sub(double) 1.98491
sub(long double) 3.55474
sub(u4_4) 1.77006
sub(s3_4) 1.72983
sub(u8_8) 1.72983
sub(s7_8) 1.72983
sub(u16_16) 1.73966
sub(s15_16) 1.85051
sub(u32_32) 1.88229
sub(s31_32) 1.87063
sub(int8_t) 1.76
sub(uint8_t) 1.74994
sub(int16_t) 1.82126
sub(uint16_t) 1.83794
sub(int32_t) 1.89074
sub(uint32_t) 1.85417
sub(int64_t) 1.83703
sub(uint64_t) 2.04914
mul(float) 1.9376
mul(double) 1.93097
mul(long double) 102.446
mul(u4_4) 2.46583
mul(s3_4) 2.09189
mul(u8_8) 2.08
mul(s7_8) 2.18697
mul(u16_16) 2.12571
mul(s15_16) 2.10789
mul(u32_32) 2.10789
mul(s31_32) 2.10789
mul(int8_t) 1.76
mul(uint8_t) 1.78011
mul(int16_t) 1.8432
mul(uint16_t) 1.76914
mul(int32_t) 1.78011
mul(uint32_t) 2.19086
mul(int64_t) 1.7696
mul(uint64_t) 1.79017
div(float) 5.12
div(double) 7.64343
div(long double) 8.304
div(u4_4) 3.82171
div(s3_4) 3.82171
div(u8_8) 3.84
div(s7_8) 3.8
div(u16_16) 9.152
div(s15_16) 11.232
div(u32_32) 30.8434
div(s31_32) 34
div(int8_t) 3.82171
div(uint8_t) 3.82171
div(int16_t) 3.8
div(uint16_t) 3.82171
div(int32_t) 3.82171
div(uint32_t) 3.81806
div(int64_t) 10.2286
div(uint64_t) 8.304
Among the slowest types are `long double`. It is likely that they are
emulated in software. The next slowest operations are fixed-point
multiply and divide operations - especially with 64-bit types. This is
because values need to be promoted temporarily to double-width types.
This is a known fixed-point technique which inevitably experiences
slowdown where a 128-bit type is required on a 64-bit system.
Here is a section of the disassembly of the s15:16 multiply call:
30: mov %r14,%rax
mov %r15,%rax
movslq -0x28(%rbp),%rax
movslq -0x30(%rbp),%rcx
imul %rax,%rcx
shr $0x10,%rcx
mov %ecx,-0x38(%rbp)
mov %r12,%rax
4c: movzbl (%rbx),%eax
cmp $0x1,%eax
↓ jne 68
54: mov 0x8(%rbx),%rax
lea 0x1(%rax),%rcx
mov %rcx,0x8(%rbx)
cmp 0x38(%rbx),%rax
↑ jb 30
The two 32-bit numbers are multiplied together and the result shifted
down - much as it would if raw `int` values were used. The efficiency
of this operation varies with the exponent. An exponent of zero should
mean no shift at all.
### 3-Dimensional Magnitude Squared
A fast `sqrt` implementation has not yet been tested with
`fixed_point`. (The naive implementation takes over 300ns.) For this
reason, a magnitude-squared function is measured, combining multiply
and add operations:
template
constexpr FP magnitude_squared(const FP & x, const FP & y, const FP & z)
{
return x * x + y * y + z * z;
}
Only real number formats are tested:
float 2.42606
double 2.08
long double 4.5056
s3_4 2.768
s7_8 2.77577
s15_16 2.752
s31_32 4.10331
Again, the size of the type seems to have the largest impact.
### Circle Intersection
A similar operation includes a comparison and branch:
template
bool circle_intersect_generic(Real x1, Real y1, Real r1, Real x2, Real y2, Real r2)
{
auto x_diff = x2 - x1;
auto y_diff = y2 - y1;
auto distance_squared = x_diff * x_diff + y_diff * y_diff;
auto touch_distance = r1 + r2;
auto touch_distance_squared = touch_distance * touch_distance;
return distance_squared <= touch_distance_squared;
}
float 3.46011
double 3.48
long double 6.4
s3_4 3.88
s7_8 4.5312
s15_16 3.82171
s31_32 5.92
Again, fixed-point and native performance are comparable.