Document number: J16/06-0046 = WG21 N1976
Date: 2006-04-20
Author: Benjamin Kosnik <bkoz@redhat.com>
Evolution Working Group, Modules and Linkage
Hosted implementations of C++ have long had the ability to collect object files from individual translation units into a single entity. Often, these collections of object files are called libraries, and simplify software creation and maintenance by clearly separating out dependencies, and providing interfaces between components.
For the purposes of this paper, there are two main types of libraries. The first, a static library, copies used elements from the library directly to the created executable. Thus, when an executable is created by statically linking against a library, the result is a system with two copies of the library. (The original library, and the newly-formed executable.)
The second type of library is a dynamic library, which avoids this duplication. When an executable is created by dynamically linking against a library, the implementation performs an implementation-specific step that allows the end result to reference the library directly, without a copy.
This paper will be concerned solely with the second type of library, and with dynamic shared objects in particular. The use of this kind of library in the C++ development community is widespread, and has been in existence for over ten years. Years of use have pointed out some of the pitfalls with parts of the C++ language and current implementations: it is the goal of this paper to survey current dynamic linking capabilities and techniques, and to explicitly identify known issues.
The C++ standard defines three distinct linkage types: internal, external, none. (See 3.5 - Program and linkage [basic.link])
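As a brief illustration of the three linkage kinds, consider the following sketch. The names are hypothetical and are not taken from the paper.

```cpp
// A name with internal linkage: usable only within this translation unit.
static int internal_counter = 1;

// A name with external linkage: other translation units may refer to it.
int external_counter = 2;

int sum_counters()
{
  // A block-scope variable has no linkage: it cannot be referred to
  // by name from any other scope.
  int local = 3;
  return internal_counter + external_counter + local;
}
```

Note that linkage is a language-level property of names; the visibility notion adopted below concerns whether the resulting symbol can be used from another load unit.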
Adopt the notion of load unit from Austern indicating a binding of individual translation units together to form a single group. (Via some undefined mechanism.)
Define symbol as a definition for a specific entity in a load unit.
Adopt visibility to determine if a symbol can be used outside of the load unit. In addition, adopt the following refinements:
An entity defined within a load unit with external visibility implies that other load units are able to use the defined symbol.
An entity defined within a load unit with internal visibility implies that other load units are not able to use the defined symbol.
Adopt the notion of load set from Austern indicating the closed set of all individual load units.
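The terminology above can be pictured with a small data model. This is only a sketch for reasoning about the definitions: the types `load_unit` and `load_set` and the function `resolve` are hypothetical, and no real linker or loader works from such a structure.

```cpp
#include <map>
#include <string>
#include <vector>

enum class visibility { internal, external };

// A load unit: a named group of translation units and the symbols it defines.
struct load_unit
{
  std::string name;
  std::map<std::string, visibility> symbols;
};

// A load set: the closed set of all load units.
typedef std::vector<load_unit> load_set;

// Returns the name of the first load unit, other than `user`, that defines
// `symbol` with external visibility, or an empty string if there is none.
std::string resolve(const load_set& set, const std::string& user,
                    const std::string& symbol)
{
  for (const load_unit& unit : set)
    {
      if (unit.name == user)
        continue;
      auto it = unit.symbols.find(symbol);
      if (it != unit.symbols.end() && it->second == visibility::external)
        return unit.name;
    }
  return std::string();
}
```

In this model a symbol with internal visibility is never returned to another load unit, which mirrors the two refinements stated above.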
Windows
Dynamic linking meta-picture:
.exe -> .lib -> .dll
Where .dll is the shared object, with symbol definitions. The .lib is a stubs library, which contains a list of symbols to be resolved in the .dll at runtime. The .exe is the final executable, and its dependencies are resolved against the .lib at link time.
Default visibility is internal. Annotations required both for marking a symbol with external visibility and for importing an external symbol.
Visibility control techniques include:
One: decoration via __declspec(dllexport) and __declspec(dllimport), applied either to a class or to individual member functions, but not both. Also allowed on template specializations.
Two: by a text file containing a list of symbols to export or by ordinal.
Dynamic loading via LoadLibrary, where you can set resolution to immediate or delayed.
Other notes: On class-scope visibility decorations, all member functions, static data members, virtual functions, typeinfo, etc. are visible, along with the same for any base classes. Operator new inter-position is known not to work. Versioning capabilities on a .dll include major, minor, and build date.
SVR4 (Sun/Linux)
Dynamic linking meta-picture:
.exe -> .so
Where .so is the shared object, with symbol definitions. The .exe is the final executable, and its dependencies are resolved against the .so at runtime.
Default visibility is external. Annotation or other method required for marking a symbol with internal visibility.
Visibility control techniques include:
One: decoration via __attribute__((visibility(option))) where option is one of: hidden, default, internal, protected.
Two: #pragma interface/#pragma implementation
Three: extern template, -fno-implicit-templates, and inlining declarations
Four: #pragma GCC visibility push(hidden) /#pragma GCC visibility pop in combination with -fvisibility,-fvisibility-inlines-hidden
Five: a text file containing a list of symbols to export which is used as an input file to the linker, with optional minor version refinement. Symbols with C++ linkage (ie mangled) can be exported in namespace globs in an un-mangled syntax.
Version control techniques include:
One: Versioning of libraries happens exclusively using the SONAME, a simple string that is part of the object file format (ELF). There can be exactly one library for a given SONAME.
Two: Versioning capability on a library include major, minor, minor with refinements, aliasing and renaming. These versioning details are explicitly specified in a text file that the linker uses during library creation.
Three: Explicit version number mangling on nested namespaces, and then injecting these versioned names into the enclosing namespace via the GNU compiler extensions described by namespace associations.
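The third technique relies on a GNU extension. C++11 later standardized inline namespaces, which give a comparable effect. The sketch below uses that standard feature rather than the extension the paper refers to, and the names `mylib`, `v1`, `v2`, and `get_value` are hypothetical.

```cpp
namespace mylib
{
  // Older interface, kept so that existing binaries continue to resolve
  // the symbol for mylib::v1::get_value().
  namespace v1
  {
    int get_value() { return 1; }
  }

  // Current interface. Because the namespace is inline, its members are
  // also found as mylib::get_value(), while the mangled symbol name still
  // contains "v2".
  inline namespace v2
  {
    int get_value() { return 2; }
  }
}
```

Source code written against `mylib::get_value()` picks up whichever version is inline at compile time, while the version remains encoded in the symbol name seen by the linker.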
Dynamic loading via dlopen/dlmopen/dlsym/dlclose, where you can set resolution to immediate or delayed.
Other notes: Operator new inter-position can be made to work. The -fvisibility and #pragma visibility options are known to not work and/or have serious flaws.
1: Sharing load unit by dynamic linking
2: Sharing load unit by dynamic loading
3: Load unit with restricted external visibility
4: Load unit with versioned external visibility
In the following examples, the following color key is used: annotations and sources in light blue are for SVR4 (Sun/Linux), sources and annotations in light red are for Windows, and platform-independent code is in gray, as in this neutral color.
The fundamental example for dynamic linking. A load unit (libfoo) that contains a function (get_city) is shared by an executable (one.exe) at runtime, by linking against it when the executable is built.
foo.h
// SVR4 (Sun/Linux)
extern const char* get_city();
// Windows
extern const char* __declspec(dllimport) get_city();
foo.cc
// SVR4 (Sun/Linux)
static const char* city = "mont tremblant";
const char*
get_city()
{ return city; }
// Windows
static const char* city = "mont tremblant";
const char* __declspec(dllexport)
get_city()
{ return city; }
one.cc
#include <cstring>
#include "foo.h"
int
check_city_one()
{ return std::strcmp(get_city(), "chicago"); }
int main()
{ return check_city_one(); }
On Linux, the situation outlined above is constructed as follows:
g++ -shared -fPIC -O2 -g foo.cc -o libfoo.so
g++ -g -O2 -L. one.cc -lfoo -o one.exe
The fundamental example for dynamic loading. A load unit (libfoo) that contains a function (get_city) is shared by an executable (two.exe) by loading it at runtime.
foo.h
// SVR4 (Sun/Linux)
extern const char* get_city();
// Windows
extern const char* __declspec(dllimport) get_city();
foo.cc
// SVR4 (Sun/Linux)
static const char* city = "mont tremblant";
const char*
get_city()
{ return city; }
// Windows
static const char* city = "mont tremblant";
const char* __declspec(dllexport)
get_city()
{ return city; }
two.cc
#include <dlfcn.h>
#include <cstring>
#include <stdexcept>
#include <string>
#include "foo.h"
const char*
mangle(const char* unmangled)
{
  // GNU
  const char* mangled = "_Z8get_cityv";
  return mangled;
}
void
dynamic_open(void*& h)
{
  dlerror();
  void* tmp = dlopen("./libfoo.so", RTLD_LAZY);
  if (!tmp)
    {
      // dlerror returns NULL when no error is pending, so test the
      // pointer before building a string from it.
      const char* error = dlerror();
      throw std::runtime_error(error ? error : "dlopen failed");
    }
  h = tmp;
}
void
get_and_execute_dynamic_symbol(void*& h)
{
  // Clear any stale error so the check after dlsym is meaningful.
  dlerror();
  // Matches the declaration of get_city in foo.h.
  typedef const char* (*function_type) (void);
  function_type fn;
  fn = reinterpret_cast<function_type>(dlsym(h, mangle("get_city")));
  if (const char* error = dlerror())
    throw std::runtime_error(error);
  fn();
}
void
dynamic_close(void*& h)
{
  if (dlclose(h) != 0)
    {
      const char* error = dlerror();
      throw std::runtime_error(error ? error : "dlclose failed");
    }
}
int main()
{
  void* h;
  dynamic_open(h);
  get_and_execute_dynamic_symbol(h);
  dynamic_close(h);
  return 0;
}
#include <windows.h>
#include <cstring>
#include <stdexcept>
#include "foo.h"
const char*
mangle(const char* unmangled)
{
  // Microsoft
  const char* mangled = "?get_city@@YAXXZ";
  return mangled;
}
void
dynamic_open(HINSTANCE& h)
{
  HINSTANCE tmp;
  tmp = LoadLibrary("./libfoo.so");
  if (!tmp)
    throw std::runtime_error("LoadLibrary failed");
  h = tmp;
}
void
get_and_execute_dynamic_symbol(HINSTANCE& h)
{
  typedef void (*function_type) (void);
  function_type fn;
  fn = reinterpret_cast<function_type>(GetProcAddress(h, mangle("get_city")));
  if (!fn)
    throw std::runtime_error("GetProcAddress failed");
  (fn)();
}
void
dynamic_close(HINSTANCE& h)
{
  // FreeLibrary returns zero on failure.
  if (FreeLibrary(h) == 0)
    throw std::runtime_error("FreeLibrary failed");
}
int main()
{
  HINSTANCE h;
  dynamic_open(h);
  get_and_execute_dynamic_symbol(h);
  dynamic_close(h);
  return 0;
}
On Linux, the situation outlined above is constructed as follows:
g++ -shared -fPIC -O2 -g foo.cc -o libfoo.so
g++ -g -O2 two.cc -ldl -o two.exe
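The open, look up, and close steps in the SVR4 listing above can also be tied to an object's lifetime, so that the handle is released on every path. The following is a sketch under that assumption; the class name `shared_object` is hypothetical and is not part of any library discussed here.

```cpp
#include <dlfcn.h>
#include <stdexcept>
#include <string>

// Owns a handle returned by dlopen and closes it on destruction.
class shared_object
{
  void* handle_;

  shared_object(const shared_object&);             // not copyable
  shared_object& operator=(const shared_object&);  // not assignable

public:
  // If `path` is null, dlopen returns a handle for the main program.
  explicit shared_object(const char* path, int flags = RTLD_LAZY)
  : handle_(dlopen(path, flags))
  {
    if (!handle_)
      {
        const char* error = dlerror();
        throw std::runtime_error(error ? error : "dlopen failed");
      }
  }

  ~shared_object()
  { dlclose(handle_); }

  // Returns the address of `name`, or throws if the lookup fails.
  void* symbol(const char* name) const
  {
    dlerror();  // clear any stale error
    void* address = dlsym(handle_, name);
    if (const char* error = dlerror())
      throw std::runtime_error(error);
    return address;
  }
};
```

With the example above, a caller would construct `shared_object lib("./libfoo.so")` and pass the mangled name `_Z8get_cityv` to `symbol`, casting the result to the function pointer type declared in foo.h.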
There are at least three common techniques available for specifying visibility of individual entities in a load unit. Three will be detailed here: the use of compiler-specific pragmas, the use of vendor-specific decorations on types and declarations, and finally the use of vendor-specific link maps.
3a: Pragmas.
container.cc
#pragma GCC visibility push(hidden)
//#pragma GCC visibility push(default)
typedef int value_type;
class foo
{
  value_type v;
public:
  foo();
  virtual ~foo() { }
  value_type&
  get_vector();
};
foo::foo()
{
  value_type apple;
  v = 1;
}
value_type&
foo::get_vector()
{ return v; }
void
swap_foo()
{
  value_type empty;
  foo f;
}
#pragma GCC visibility pop
On Linux, the situation outlined above is constructed as follows:
g++ -g -c container.cc
Regardless of any other compiler flags (i.e. -fvisibility=hidden), the defined symbols will have the visibility as noted in the pragma.
3b: Decoration.
container.cc
#ifdef VIS_EXTERNAL
#define VIS __attribute__ ((visibility("default")))
#else
#define VIS __attribute__ ((visibility("hidden")))
#endif
typedef int value_type;
class VIS foo
{
value_type v;
public:
foo();
virtual ~foo() { }
value_type&
get_vector();
};
foo::foo()
{
value_type apple;
v = 1;
}
value_type&
foo::get_vector()
{ return v; }
void VIS
swap_foo()
{
value_type empty;
foo f;
}
On Linux, the situation outlined above is constructed as follows:
g++ -g -c container.cc
Regardless of any other compiler flags (i.e. -fvisibility=hidden), the defined symbols will have the visibility as noted in the attribute decoration.
3c: Lists.
container.cc
typedef int value_type;
class foo
{
value_type v;
public:
foo();
virtual ~foo() { }
value_type&
get_vector();
};
foo::foo()
{
value_type apple;
v = 1;
}
value_type&
foo::get_vector()
{ return v; }
void
swap_foo()
{
value_type empty;
foo f;
}
container.ver
{
global:
_Z8swap_foov;
local: *;
};
On Linux, the situation outlined above is constructed as follows:
g++ -g -Wl,--version-script=container.ver -shared container.cc -o container.so
Other compiler flags (i.e. -fvisibility=hidden) will prevail over the visibility as noted in the link map.
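Link maps such as container.ver list mangled names like `_Z8swap_foov`. When reading or writing such lists it can help to translate between the two spellings. The sketch below assumes a GCC-compatible runtime, which exposes `abi::__cxa_demangle` in `<cxxabi.h>`; the wrapper name `demangle` is hypothetical.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Returns the demangled form of `mangled`, or the input unchanged if the
// runtime reports that it could not be demangled.
std::string demangle(const char* mangled)
{
  int status = 0;
  char* result = abi::__cxa_demangle(mangled, 0, 0, &status);
  if (status != 0 || !result)
    return mangled;
  std::string text(result);
  std::free(result);  // the buffer is allocated with malloc
  return text;
}
```

For instance, `demangle("_Z8swap_foov")` yields `swap_foo()`, the function exported by the link map above.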
Four: Load unit with versioned external visibility.
On occasion, it is possible to safely extend class declarations
and other entities over time. In order to have this work, there must
be a way to attach a version to symbols in a load unit with external
visibility.
For instance, this class:
container.h
struct foo
{
foo();
int
get_value();
private:
int v;
};
container.cc
#include "container.h"
foo::foo()
{ v = 1; }
int
foo::get_value()
{ return v; }
container.ver
VERSION_1.0
{
global:
_ZN3foo9get_valueEv;
local: *;
};
On Linux, the situation outlined above is constructed as follows:
g++ -g -Wl,--version-script=container.ver -shared container.cc -o container.so
At this point, the only externally-visible symbol is foo::get_value(), and this symbol is versioned with the tag VERSION_1.0.
After a period of use, a new feature is added and as part of this, a new member function is added to struct foo, say foo::get_second_value(), and this symbol is versioned with the tag VERSION_1.1. A new version of container.so is generated, with both the member functions defined with the corresponding version tag. This allows newer code to use the new member function, but allows a graceful system response if this newer code is run in an environment without the newer container.so.
container.ver
VERSION_1.0
{
global:
_ZN3foo9get_valueEv;
local: *;
};
VERSION_1.1
{
global:
_ZN3foo16get_second_valueEv;
local: *;
} VERSION_1.0;
This is constructed in the same manner as the first container.so file.
Known Problems, by example
1: Overriding global operator new
2: Order of initialization
3: Exceptions across load units
4: Vague linkage and duplicate symbol resolution
Problem One: Overriding global operator new.
The behavior that the standard defines for the global-scope operator new signatures allows users to provide custom definitions and override the default. What happens if there are two user-defined operator new definitions: which one is picked? This has been referred to as the operator new "inter-position" issue, but could be generalized to load-unit allocation/deallocation problems. It is important that allocation and deallocation mechanisms match across load unit boundaries.
There are other issues with memory management and multiple load
units. A big question is how to keep the allocator equality
requirements (ie, an instance of an allocator is equal to another
allocator iff one can free the other's allocation.)
foo.cc
#include <cstdio>
#include <new>
#include <string>
#include <tr1/array>
std::string*
get_string()
{ return new std::string("olive street beach"); }
void
dispose_string(std::string* s)
{ delete s; }
// Fixed external storage.
typedef std::tr1::array<char, 256> array_type;
static array_type _M_array;
void* operator new(std::size_t __n) throw (std::bad_alloc)
{
puts("operator new");
static std::size_t __array_used;
if (__array_used + __n > _M_array.size())
std::bad_alloc();
void* __ret = _M_array.begin() + __array_used;
__array_used += __n;
return __ret;
}
void operator delete(void*) throw()
{
// Does nothing.
puts("operator delete");
}
foo.ver
VERSION_1.0
{
global:
_Z10get_stringv;
_Z14dispose_stringPSs;
local: *;
};
test.cc
#include <cstdio>
#include <new>
#include <vector>
#include <string>
extern std::string* get_string();
extern void dispose_string(std::string*);
int main()
{
typedef std::vector<int> vector_type;
vector_type* v;
try
{
v = new vector_type(100);
}
catch (const std::exception& e)
{
puts(e.what());
throw;
}
catch (...)
{ throw; }
delete v;
std::string* s = get_string();
dispose_string(s);
return 0;
}
Construct the example as follows:
g++ -shared -fPIC -O0 -g foo.cc -o libfoo.so
g++ -g -O0 -L. test.cc -lfoo -o problem1.exe
And then run the resulting executable:
%./problem1.exe
operator new
operator new
operator delete
operator delete
operator new
operator new
operator delete
operator delete
As suspected, the operator new definitions in libfoo are used in
problem1.exe as well, overriding the default definitions in the
standard library. Although a trivial example, the problem is real: any
load unit that redefines operator new could, when added to a new load
set, change underlying allocations.
One way around this is to limit the visibility of the operator new
and operator delete definitions to within the libfoo.so load unit.
g++ -shared -fPIC -O0 -g foo.cc -Wl,--version-script=foo.ver -o libfoo.so
g++ -g -O0 -L. test.cc -lfoo -o problem1.exe
And then run the resulting executable:
%./problem1.exe
operator new
operator delete
By limiting the visibility, the operator new and delete
definitions can be bound to a specific load unit.
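The allocator equality requirement mentioned earlier in this problem can be made concrete with a small sketch. The allocator below records which load unit created it, and two instances compare equal only when they belong to the same unit. The name `unit_allocator` and the `unit_id` member are hypothetical, and the sketch uses malloc and free throughout so that it runs in a single program; `unit_id` merely stands in for genuinely distinct heaps.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Minimal allocator whose instances remember which (hypothetical) load
// unit created them.
template<typename T>
struct unit_allocator
{
  typedef T value_type;

  int unit_id;

  explicit unit_allocator(int id) : unit_id(id) { }

  template<typename U>
  unit_allocator(const unit_allocator<U>& other) : unit_id(other.unit_id) { }

  T* allocate(std::size_t n)
  {
    void* p = std::malloc(n * sizeof(T));
    if (!p)
      throw std::bad_alloc();
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t)
  { std::free(p); }
};

// Equal only if memory obtained from one may be released through the other.
template<typename T, typename U>
bool operator==(const unit_allocator<T>& a, const unit_allocator<U>& b)
{ return a.unit_id == b.unit_id; }

template<typename T, typename U>
bool operator!=(const unit_allocator<T>& a, const unit_allocator<U>& b)
{ return !(a == b); }
```

Code that receives memory across a load unit boundary could check `a == b` before releasing it through a different allocator instance.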
Problem Two: Order of initialization.
How are global objects supposed to be initialized and finalized in scenarios with multiple load units, including load units that can be opened and closed at will? In addition, using static local objects may run into issues with initialization.
foo.cc
#include <cstdio>
struct A
{
A()
{ puts("A ctor"); }
~A()
{ puts("A dtor"); }
};
void f()
{ static A obj; }
test.cc
#include <cstdio>
extern void f();
struct B
{
B()
{ puts("B ctor"); }
~B()
{ puts("B dtor"); }
};
static B foo;
int main()
{
f();
return 0;
}
On Linux, the situation outlined above is constructed as follows:
g++ -shared -fPIC -O2 -g foo.cc -o libfoo.so
g++ -g -O2 -L. test.cc -lfoo -o two.exe
And then run the resulting executable:
%./two.exe
B ctor
A ctor
A dtor
B dtor
This ordering is correct. However, on Windows:
B ctor
A ctor
B dtor
A dtor
... which demonstrates the issue of ordering objects across
different load units.
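Within a single program the construction half of this ordering can be observed directly. The sketch below is a single-file stand-in for the two-file example above: the names `tracer` and `events` are hypothetical, and it records only construction, since destruction happens after main returns.

```cpp
#include <string>
#include <vector>

// Shared event log. A function-local static is used so that the log is
// constructed on first use, whatever object happens to use it first.
std::vector<std::string>& events()
{
  static std::vector<std::string> log;
  return log;
}

struct tracer
{
  std::string name;

  explicit tracer(const std::string& n) : name(n)
  { events().push_back(name + " ctor"); }

  ~tracer()
  { events().push_back(name + " dtor"); }
};

// Constructed during static initialization, before main runs.
static tracer b("B");

// The local static is constructed the first time f is called.
void f()
{ static tracer a("A"); }
```

After a call to `f()`, the log holds "B ctor" followed by "A ctor". The standard requires destruction in the reverse order of construction, which corresponds to the Linux output shown above.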
Problem Three: Exceptions across load units.
Compiler-generated information for virtual functions and typeinfo
has vague linkage that is difficult for the programmer to control
given the language facilities available in standard C++. Because of this,
default visibility and the order of binding of symbols all impact the
ability to throw and catch exceptions across load units.
There is an underspecification of this necessary compiler-generated magic with respect to multiple load units. Symptoms of this include problems with the use of typeid (typeinfo) and throwing exceptions across load units, inlining template member functions defined in multiple load units (each with unique addresses), and the use of inheritance in multiple load units (do base class definitions and implicitly-generated data have to be visible across load units?)
error_handling.h
#include <stdexcept>
struct insert_error : public std::runtime_error
{
insert_error(const std::string&);
};
void check_insert();
error_handling.cc
#include "error_handling.h"
insert_error::insert_error(const std::string& s) : std::runtime_error(s) { };
void check_insert()
{
// Do something, assume it's wrong.
throw insert_error("check_insert: something happened");
}
error_handling.ver
VERSION_1.0
{
global:
_ZN12insert_errorC*;
# _ZTS12insert_error;
_Z12check_insertv;
local: *;
};
test.cc
#include "error_handling.h"
#include <iostream>
#include <typeinfo>
int main()
{
try
{
check_insert();
}
catch (const insert_error& e)
{
// 1: Expect catch here.
}
catch (const std::exception& e)
{
// 2: Visibility issues may lead to catch here.
std::cout << "caught object of type: " << typeid(e).name() << std::endl;
}
catch (...)
{
// 3: Catch all.
throw;
}
return 0;
}
Construct the example as follows:
g++ -shared -fPIC -O2 -g error_handling.cc -Wl,--version-script=error_handling.ver -o libfoo.so
g++ -g -O2 -L. test.cc -lfoo -o problem3.exe
And then run the resulting executable:
%./problem3.exe
caught object of type: 12insert_error
Without typeinfo name sharing between libfoo and
problem3.exe, the execution path is non-intuitive and in error. The
typeinfo information is generated with vague linkage in both libfoo
and problem3.exe. By allowing the export of _ZTS12insert_error from
error_handling.ver, both will end up using the vague typeinfo name
stored in the executable, and exception handling will work as
expected.
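Whether the first handler matches depends on how the runtime decides that two typeinfo objects denote the same type. A runtime that compares by identity needs a single shared copy per load set, while a comparison by name contents tolerates duplicates. The helper below sketches the second approach; the name `same_type_by_name` is hypothetical, and the sketch says nothing about what any particular runtime actually does.

```cpp
#include <cstring>
#include <typeinfo>

// Compares two types by the contents of their typeinfo name strings
// rather than by the identity of the type_info objects.
bool same_type_by_name(const std::type_info& a, const std::type_info& b)
{ return std::strcmp(a.name(), b.name()) == 0; }
```

A helper like this only affects explicit typeid comparisons in user code. Matching a thrown exception against a catch clause is done by the runtime itself, so the symbol export described above is still needed there.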
Problem Four: Vague linkage and duplicate symbol resolution.
Template classes with member functions can have instantiations in
multiple files. To prevent duplicate symbols, many compilers have
implemented vague linkage semantics that coalesce multiple,
equivalent symbol names across translation units into one
definition. Picking one version across multiple dynamic load units is
tricky. Depending on the order in which different load units are
initialized, the definition that is picked for all the other load
units in a given load set may change, and may end up being different
than expected or planned when the individual load units were
constructed. Symptomatic of this problem are multi-ABI binaries,
where different compilers define the same symbol in a given load set.
Also implicated are template designs that depend on macro defines to
change behavior.
The end result is similar, in both cases: load units that have
porous boundaries, that end up using symbols defined elsewhere in the
load set to resolve symbols within the original load unit.
container.h
template<typename T>
class container
{
T data;
void
do_private();
public:
container(T value = T()) : data(value) { }
void
do_public()
{ return do_private(); }
T
get_data() { return data; }
};
template<typename T>
void container<T>::do_private()
{
#ifdef OLD_VERSION
// Clear.
data = T();
#else
// Multiply.
data *= 2;
#endif
}
typedef container<int> container_type;
foo.cc
#include "container.h"
#include <iostream>
void foo()
{
container_type obj(4);
obj.do_public();
std::cout << obj.get_data() << std::endl;
}
test.cc
#include "container.h"
extern void foo();
int main()
{
container_type obj(2);
// Use do_public, get weak definition.
obj.do_public();
// Call external function on the rest.
foo();
return 0;
}
Construct the example as follows:
g++ -g -O2 -fPIC -shared foo.cc -o libfoo.so
g++ -DOLD_VERSION -g -O2 -L. test.cc -lfoo -o problem4.exe
And then run the resulting executable:
%./problem4.exe
0
Both libfoo.so and problem4.exe have vague linkage for container<int>::do_private(). (Other options include no definition, leading to undefined symbols, or both having definitions, leading to duplicate symbol errors.) As a result, it is system-defined and order-dependent which definition will be picked for both uses. On Linux, when problem4.exe is loaded, its symbol is used for all other uses, including the one in libfoo.so. This is probably not what was intended by the author of libfoo.so. In addition, the behavior of libfoo.so now depends on optimization options: without optimization, the un-inlined function will pick up the symbol from the executable, and with optimization that results in inlining (try -O3) the behavior of libfoo will change.
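One way to avoid this particular collision, offered here as an assumption rather than as a recommendation from the discussions this paper summarizes, is to make the configuration part of the type. The two variants then mangle to different symbol names and can no longer be merged. The policy names below are hypothetical.

```cpp
// Policies that take the place of the OLD_VERSION macro.
struct clear_policy
{
  template<typename T>
  static void apply(T& data) { data = T(); }
};

struct multiply_policy
{
  template<typename T>
  static void apply(T& data) { data *= 2; }
};

template<typename T, typename Policy>
class container
{
  T data;

  void
  do_private() { Policy::apply(data); }

public:
  container(T value = T()) : data(value) { }

  void
  do_public() { do_private(); }

  T
  get_data() { return data; }
};
```

With this design `container<int, clear_policy>::do_private()` and `container<int, multiply_policy>::do_private()` are distinct entities, so each load unit keeps the behavior it was compiled with.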
Impact on Existing Standard by Chapter
01. Runtime/execution model needs to be modified to address load unit.
02. Are new keywords needed to express the idea of controlling visibility of specific entities in a given load unit (i.e. visible or invisible, public or private, hidden or "exported")? Multiple layers of visibility (i.e. versioning)?
03. ODR, scope, object lifetime, storage duration, startup/termination (what order?), linkage.
05. Operators new/delete, typeid, dynamic_cast.
07. Static, extern vs. load units (linkage). Namespace-scope visibility?
09. Class linkage changes. What if not all members of a class are visible? Differing visibility between nested and enclosing classes, or local and enclosing classes. Will nested classes and nested namespaces have the same semantics?
10. Issues with vtable, typeinfo visibility across multiple load units. Do base class vtables have to be visible in order to use a derived class in a different load unit?
12. Tons of stuff, including constructors, destructors, operator new and delete.
15. Detail throw/catch exceptions across multiple load units.
18. Most everything.
19-27. How does this impact the C++ standard library?
Solution Space
Going from least to most ambitious.
Nothing, go with the status quo.
Attempt a TR, or "best practices" document with suggestions for clients and vendors.
Come up with standard terminology, select a common subset of what's possible and figure out syntax for expressing it portably. Suspect that many C++ features will fall by the wayside.
Come up with standard terminology, and figure out syntax for expressing it. Includes specification for C++-specific requirements like throwing exceptions, templates, template specializations, and vague linkage.
Come up with new C++ constructs, for example modules, annotation rules for namespaces, etc.
Acknowledgements
Contributors to the Modules and Linkage discussions at Mont Tremblant (2005) and Berlin (2006), not limited to: David Vandevoorde, Mat Marcus, Judy Ward, PremAnand Rao, Doug Harrison, Bronek Kozicki, Eugene Gershnik, and Thomas Witt.
References
Matthew Austern. Toward standardization of dynamic libraries. Technical Report N1400=02-0058, Sep 25, 2002.
Pete Becker. Draft Proposal for Dynamic Libraries in C++. Technical Report N1428=03-0010, March 3, 2003.
Daveed Vandevoorde. Modules in C++ (Revision 2). Technical Report N1778=05-0038, January 2005.
John R. Levine. Linkers and Loaders. Morgan Kaufmann, January 15, 2000.
Ulrich Drepper. How To Write Shared Libraries. January, 2005. Ulrich Drepper provided feedback and corrections on previous drafts of this document.
Doug Harrison. DllHelper_0.90.
Jonathan H. Lundquist. 270. Order of initialization of static data members of class templates. CWG Defects.
Mark Mitchell. 362. Order of initialization in instantiation units. CWG Defects.
Sun Microsystems. Linker and Library Guide. 2002.
Microsoft. DLLs.
Microsoft. Walkthrough: Creating and Using a Dynamic Link Library.
Apple. Overview of the C++ Runtime Environment.
ACE shared object wrappers, i.e. ACE_Shared_Object.
Mat Marcus. Typeinfo comparison code easily breaks shared libs. http://gcc.gnu.org/PR23628
Ralf W. Grosse-Kunstleve, David Abrahams, Jason Merrill. Minimal GCC/Linux shared lib + EH bug example. http://mail.python.org/pipermail/python-dev/2002-May/023988.html http://gcc.gnu.org/ml/gcc/2002-05/msg00882.html http://sources.redhat.com/ml/libc-alpha/2002-05/msg00222.html
Doug Harrison. Order in court! http://groups.google.com/group/microsoft.public.vc.mfc/msg/438bdfa1dc683ce1?hl=en&
Doug Harrison. com_ptr_t as static object in a DLL. http://groups.google.com/group/microsoft.public.vc.mfc/msg/21cfdeb16358e755?hl=en&
Static variable in template function across compilation units. http://groups.google.com/group/comp.lang.c++.moderated/msg/50091d17e98a36a0?hl=en