Modules, Header Files, and Miscellaneous Topics

This chapter starts with a detailed discussion on how modules allow you to write reusable components and contrasts this against old-style header files. It also explains what preprocessor directives are and gives some examples of why C-style preprocessor macros are dangerous. The chapter then explains the concept of linkage, which specifies where named entities can be accessed from, and explains the one definition rule. The final part of the chapter discusses the different uses of the static and extern keywords, as well as C-style variable-length argument lists.

MODULES

Modules are introduced in Chapter 1, “A Crash Course in C++ and the Standard Library,” and you have already authored and consumed your own simple modules in previous chapters. However, there are quite a few more things to say about modules. Before the introduction of modules in C++20, header files, discussed later in this chapter, were used to provide the interface to a reusable piece of code. Header files do have a number of problems, though, such as avoiding multiple includes of the same header file and making sure header files are included in the correct order. Additionally, simply #include’ing, for example, <iostream> adds tens of thousands of lines of code that the compiler has to crunch through. If several source files #include <iostream>, all of those translation units grow much bigger. And that is with an include of just a single header file. Imagine if you need <iostream>, <vector>, <format>, and more.

Modules solve all these issues, and more. The order in which modules are imported is not important. Modules are compiled once to a binary format, which the compiler can then use whenever a module is imported in another source file. This is in stark contrast with header files, which the compiler has to compile over and over again, every time it encounters an #include of that header file. Hence, modules can drastically improve compilation times. Incremental compilation times also improve, as certain modifications in modules, for example, modifying an exported function’s implementation in a module interface file, do not trigger recompilation of users of that module (discussed in more detail later in this chapter). Modules are not influenced by any externally defined macros, and any macros defined inside a module are never visible to any code outside the module; that is, modules are self-isolating. Hence, the recommendation is to use modules for new code whenever your compiler supports them.

If possible, legacy code can slowly be transitioned to modules as well. However, there is a lot of legacy code in the world, and a lot of third-party libraries don’t embrace modules yet, as not all compilers fully support modules at the time of this writing. For these reasons it is still important to know how legacy header files work. That’s why this chapter still includes discussions on header files.

Unmodularizing Code

If you want to compile code samples from this book with a compiler that does not yet fully support modules, you can unmodularize the code as follows:

Rename .cppm module interface files to .h header files.
Add a #pragma once at the top of each .h header file.
Remove export module xyz declarations.
Replace module xyz declarations with an #include to include the corresponding header file.
Replace import and export import declarations with proper #include directives. If the code is using import std;, then that declaration needs to be replaced with #include directives for all necessary individual header files. See Appendix C, “Standard Library Header Files,” for a list of all Standard Library headers and a brief description of their contents.
Remove any export keywords.
Remove all occurrences of module;, which denotes the start of a global module fragment.
If a function definition or variable definition appears in a .h header file, add the inline keyword in front of it.
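As a rough illustration of these steps, here is how the Person.cppm module interface file shown later in this chapter might look after being unmodularized. This is only a sketch: the file name Person.h, the choice of individual Standard Library headers, and the formatName() helper are assumptions made for this example and are not part of the original module.

```cpp
// Person.h (hypothetical name): an unmodularized sketch of Person.cppm.
#pragma once          // Added: protects against multiple inclusion.

#include <string>     // Replaces "import std;" with the headers actually needed.
#include <utility>    // For std::move().

class Person          // The export keyword has been removed.
{
    public:
        Person(std::string firstName, std::string lastName)
            : m_firstName { std::move(firstName) }
            , m_lastName { std::move(lastName) } { }
        // Member functions defined inside the class definition are
        // implicitly inline, so they need no extra keyword.
        const std::string& getFirstName() const { return m_firstName; }
        const std::string& getLastName() const { return m_lastName; }
    private:
        std::string m_firstName;
        std::string m_lastName;
};

// Hypothetical helper, added only to show the last step: a free function
// defined in a header file gets the inline keyword in front of it.
inline std::string formatName(const Person& person)
{
    return person.getLastName() + ", " + person.getFirstName();
}
```

Client code would then use #include "Person.h" instead of import person;.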

Standard Named Modules

As Chapter 1 explains, you get access to everything from the C++ Standard Library by importing the standard named module std. This named module makes the entire Standard Library available to you, including all C functionality, defined in such headers as <cstddef>. However, all C functionality is made available only through the std namespace. For legacy code, you can consider importing the std.compat named module instead, which imports everything std imports but makes all C functionality available both in the std namespace and the global namespace. The use of std.compat is not recommended in new code.

Module Interface Files

A module interface file defines the interface for the functionality provided by a module and usually has .cppm as a file extension. A module interface file starts with a declaration stating that the file is defining a module with a certain name. This is called the module declaration. A module’s name can be any valid C++ identifier. The name can include dots but cannot start or end with a dot and cannot contain multiple dots in a row. Examples of valid names are datamodel, mycompany.datamodel, mycompany.datamodel.core, datamodel_core, and so on.

A module needs to explicitly state what to export, i.e., what should be visible when client code imports the module. A module can export any declaration, such as variable declarations, function declarations, type declarations, using directives, and using declarations. Additionally, import declarations can be exported as well. Exporting entities from a module is done with the export keyword. Anything that is not exported from a module is visible only from within the module itself. The collection of all exported entities is called the module interface.

Here is an example of a module interface file called Person.cppm, defining a person module and exporting a Person class. Note that it imports the functionality provided by std.

export module person;  // Named module declaration

import std;            // Import declaration

export class Person    // Export declaration
{
    public:
        Person(std::string firstName, std::string lastName)
            : m_firstName { std::move(firstName) }
            , m_lastName { std::move(lastName) } { }
        const std::string& getFirstName() const { return m_firstName; }
        const std::string& getLastName() const { return m_lastName; }
    private:
        std::string m_firstName;
        std::string m_lastName;
};

In standardese terms, everything starting from a named module declaration (the first line in the previous code snippet) until the end of the file is called the module purview.

This Person class can be made available for use by importing the person module as follows (test.cpp):

import person;     // Import declaration for person module
import std;

using namespace std;

int main()
{
    Person person { "Kole", "Webb" };
    println("{}, {}", person.getLastName(), person.getFirstName());
}

Pretty much anything can be exported from a module, as long as it has a name. Examples are class definitions, function prototypes, class enumeration types, using declarations and directives, namespaces, and so on. If a namespace is explicitly exported with the export keyword, everything inside that namespace is automatically exported as well. For example, the following code snippet exports the entire DataModel namespace; hence, there is no need to explicitly export the individual classes and type alias:

export module datamodel;
import std;
export namespace DataModel
{
    class Person { /* … */ };
    class Address { /* … */ };
    using Persons = std::vector<Person>;
}

You can also export a whole block of declarations using an export block. Here’s an example:

export
{
    namespace DataModel
    {
        class Person { /* … */ };
        class Address { /* … */ };
        using Persons = std::vector<Person>;
    }
}

Module Implementation Files

A module can be split into a module interface file and one or more module implementation files. Module implementation files usually have .cpp as their extension. You are free to decide which implementations you move to module implementation files and which implementations you leave in the module interface file. One option is to move all function and member function implementations to a module implementation file and leave only the function prototypes, class definitions, and so on in the module interface file. Another option is to leave the implementation of small functions and member functions in the interface file, while moving the implementations of other functions and member functions to an implementation file. You have a lot of flexibility here.

A module implementation file again contains a named module declaration to specify which module the implementations are for, but without the export keyword. For example, the previous person module can be split into an interface and an implementation file as follows. Here is the module interface file:

export module person;  // Module declaration

import std;

export class Person
{
    public:
        Person(std::string firstName, std::string lastName);
        const std::string& getFirstName() const;
        const std::string& getLastName() const;
    private:
        std::string m_firstName;
        std::string m_lastName;
};

The implementations now go in a Person.cpp module implementation file:

module person;  // Module declaration, but without the export keyword

using namespace std;

Person::Person(string firstName, string lastName)
    : m_firstName { move(firstName) }, m_lastName { move(lastName) }
{
}

const string& Person::getFirstName() const { return m_firstName; }
const string& Person::getLastName() const { return m_lastName; }

Note that the implementation file does not have an import declaration for the person module. The module person declaration implicitly includes an import person declaration. Also note that the implementation file does not have any import declaration for std, even though it’s using std::string in the implementation of the member functions. Thanks to the implicit import person, and because this implementation file is part of the same person module, it implicitly inherits the std import declaration from the module interface file. In contrast, adding an import person declaration to the test.cpp file does not implicitly inherit the std import declaration because test.cpp is not part of the person module. There is more to be said about this, which is the topic of the “Visibility vs. Reachability” section later in this chapter.

Module implementation files cannot export anything; only module interface files can.

Splitting Interface from Implementation

When using header files, discussed later in this chapter, instead of modules, it is strongly recommended to put only declarations in your header file (.h) and move all implementations to a source file (.cpp). One of the reasons is to improve compilation times. If you were to put your implementations in the header file, any change, even just changing a comment, would require you to recompile all other source files that include that header. For certain header files, this could ripple through the entire code base, causing a full recompile of the program. By putting your implementations in a source file instead, making changes to those implementations without touching the header file means that only that single source file needs to be recompiled.

Modules work differently. A module interface consists only of class definitions, function prototypes, and so on, but does not include any function or member function implementations, even if those implementations are directly in the module interface file. This means that changing a function or member function implementation that is inside a module interface file does not require a recompilation of users of that module, as long as you do not touch the interface part, for example, the function header (= function name, parameter list, and return type). Two exceptions are functions marked with the inline keyword, and template definitions. For both of these, the compiler needs to know their complete implementations at the time client code using them is compiled. Hence, any change to inline functions or template definitions can trigger recompilation of client code.

Even though technically, it is not required anymore to split the interface from the implementation, in some cases I still recommend doing so. The main goal should be to have clean and easy-to-read interfaces. Implementations of functions can stay in the interface, as long as they don’t obscure the interface and make it harder for users to quickly grasp what the public interface provides. For example, if a module has a rather big public interface, it might be better not to obscure that interface with implementations, so the user can have a better overview of what’s being offered. Still, small getter and setter functions can stay in the interface, as they don’t really impact the readability of the interface.

Separating the interface from the implementation can be done in several ways. One option is to split a module into interface and implementation files, as discussed in the previous section. Another option is to separate the interface and the implementations within a single module interface file. For example, here is the Person class defined in a single module interface file (person.cppm), but with the implementations split from the interface:

export module person;
import std;
// Class definition
export class Person
{
    public:
        Person(std::string firstName, std::string lastName);
        const std::string& getFirstName() const;
        const std::string& getLastName() const;
    private:
        std::string m_firstName;
        std::string m_lastName;
};
// Implementations
Person::Person(std::string firstName, std::string lastName)
    : m_firstName { std::move(firstName) }, m_lastName { std::move(lastName) } { }
const std::string& Person::getFirstName() const { return m_firstName; }
const std::string& Person::getLastName() const { return m_lastName; }

Visibility vs. Reachability

As mentioned earlier, when you import the person module in another source file that is not part of the person module, for example in a test.cpp file, then you are not implicitly inheriting the std import declaration from the person module interface file. Without an explicit import for std in test.cpp, the std::string name, for example, is not visible, meaning that the line declaring str in the following code does not compile:

import person;
int main()
{
    std::string str;
}

Still, even without adding an explicit import for std to test.cpp, the following lines of code work just fine:

Person person { "Kole", "Webb" };
const auto& lastName { person.getLastName() };
auto length { lastName.length() };

Why is this working? There is a difference between visibility and reachability of entities in C++. By importing the person module, the functionality from std becomes reachable but not visible. Member functions of reachable classes automatically become visible. All this means that you can use certain functionality from std, such as storing the result of getLastName() in a variable by using auto type deduction and calling member functions on it such as length().

To make the std::string name properly visible in test.cpp, an explicit import of std or <string> is required.

Submodules

The C++ standard does not speak about submodules as such; however, it is allowed to use dots in a module’s name, and that makes it possible to structure your modules in any hierarchy you want. For example, earlier, the following example of a DataModel namespace was given:

export module datamodel;
import std;
export namespace DataModel
{
    class Person { /* … */ };
    class Address { /* … */ };
    using Persons = std::vector<Person>;
}

Both the Person and Address classes are inside the DataModel namespace and in the datamodel module. This can be restructured by defining two submodules: datamodel.person and datamodel.address. The module interface file for the datamodel.person submodule is as follows:

export module datamodel.person;  // datamodel.person submodule
export namespace DataModel { class Person { /* … */ }; }

Here is the module interface file for datamodel.address:

export module datamodel.address;  // datamodel.address submodule
export namespace DataModel { class Address { /* … */ }; }

Finally, a datamodel module is defined as follows. It imports and immediately exports both submodules.

export module datamodel;          // datamodel module
export import datamodel.person;   // Import and export person submodule
export import datamodel.address;  // Import and export address submodule
import std;
export namespace DataModel { using Persons = std::vector<Person>; }

Of course, the member function implementations of classes in submodules can also go into module implementation files. For example, suppose the Address class has a default constructor that just prints a statement to standard output. That implementation could be in a file called datamodel.address.cpp:

module datamodel.address;  // datamodel.address submodule
import std;
using namespace std;
DataModel::Address::Address() { println("Address::Address()"); }

A benefit of structuring your code with submodules is that clients can import either everything at once or only specific parts they want to use. For example, if client code needs access to everything in the datamodel module, then the following import declaration is the easiest:

import datamodel;

On the other hand, if client code is only interested in using the Address class, then the following import declaration suffices:

import datamodel.address;

Importing everything at once is more convenient than selectively importing what you need, especially for stable modules that rarely change. However, by using selective imports for less stable modules, it might be possible to improve build times if changes are made to the module. For example, if a change is made to the interface of the datamodel.address submodule, then only those files that import that submodule need to be recompiled.

Module Partitions

Another option to structure modules is to split them into separate partitions. The difference between submodules and partitions is that the submodule structuring is visible to users of the module, allowing users to selectively import only those submodules they want to use. Partitions, on the other hand, are used to structure a module internally. Partitions are not exposed to users of the module. All partitions declared in module interface partition files must ultimately be exported by the primary module interface file, either directly or indirectly. A module always has only one such primary module interface file, and that’s the interface file containing the export module name declaration.

A module partition is created by separating the name of the module and the name of the partition with a colon. The name of a partition can be any legal identifier. For example, the DataModel module from the previous section can be restructured using partitions instead of submodules. Here is the person partition in a datamodel.person.cppm module interface partition file:

export module datamodel:person;  // datamodel:person partition
export namespace DataModel { class Person { /* … */ }; }

Here is the address partition, including a default constructor:

export module datamodel:address; // datamodel:address partition
export namespace DataModel
{
    class Address
    {
    public:
        Address();
        /* … */
    };
}

Unfortunately, there is a caveat when using implementation files in combination with partitions: there can be only one file with a certain partition name. So, having an implementation file that starts with the following declaration is ill-formed:

module datamodel:address;

Instead, you can just put the address partition implementations in an implementation file for the datamodel module as follows:

module datamodel;  // Not datamodel:address!
import std;
using namespace std;
DataModel::Address::Address() { println("Address::Address()"); }

Multiple files cannot have the same partition name. Having multiple module interface partition files with the same partition name is illegal, and implementations for declarations in a module interface partition file cannot go in an implementation file with the same partition name. Instead, put those implementations in a module implementation file for the module itself.

An important point to remember when authoring modules structured in partitions is that each module interface partition must ultimately be exported by the primary module interface file, either directly or indirectly. To import a partition, you just specify the name of the partition prefixed with a colon, for example import :person. It’s illegal to say something like import datamodel:person. Remember, partitions are not exposed to users of a module; partitions only structure a module internally. Hence, users cannot import a specific partition; they must import the entire module. Partitions can be imported only within the module itself, so it’s redundant (and illegal) to specify the name of the module before the colon. Here is the primary module interface file for the datamodel module:

export module datamodel; // datamodel module (primary module interface file)
export import :person;   // Import and export person partition
export import :address;  // Import and export address partition
import std;
export namespace DataModel { using Persons = std::vector<Person>; }

This partition-structured datamodel module can be used as follows:

import datamodel;
int main() { DataModel::Address a; }

Earlier it is explained that a module name declaration implicitly includes an import name declaration. This is not the case for partitions.

For example, the datamodel:person partition does not have an implicit import datamodel declaration. In this example, it is not even allowed to add an explicit import datamodel to the datamodel:person interface partition file. Doing so would result in a circular dependency: the datamodel interface file contains an import :person declaration, while the datamodel:person interface partition file would contain an import datamodel declaration.

To break such circular dependencies, you can move the functionality that the datamodel:person partition needs from the datamodel interface file to another partition, which subsequently can be imported by both the datamodel:person interface partition file and the datamodel interface file.
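The following sketch shows one way such a restructuring could look. It is spread over three files and is not a complete program. The :types partition, its file name, and the PersonId alias are hypothetical names introduced purely for illustration; they stand for whatever the person partition would otherwise have needed from the primary module interface file.

```cpp
// --- datamodel.types.cppm (hypothetical extra partition) ---
export module datamodel:types;
export namespace DataModel
{
    using PersonId = int;   // Hypothetical shared type.
}

// --- datamodel.person.cppm ---
export module datamodel:person;
import :types;              // Depends on the new partition, not on datamodel.
export namespace DataModel
{
    class Person
    {
        public:
            PersonId id { 0 };
    };
}

// --- datamodel.cppm (primary module interface file) ---
export module datamodel;
export import :types;       // Every interface partition must be exported.
export import :person;
```

With this layout, both the primary module interface file and the person partition depend on :types, and neither needs to import the other's contents in a circle.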

Implementation Partitions

A partition does not need to be declared in a module interface partition file. It can also be declared in a module implementation partition file, a normal source code file with extension .cpp, in which case it’s an implementation partition, sometimes called an internal partition. Unlike module interface partitions, which must be exported by the primary module interface file, such partitions cannot be exported.

For example, suppose you have the following math primary module interface file (math.cppm):

export module math; // math module declaration
export namespace Math
{
    double superLog(double z, double b);
    double lerchZeta(double lambda, double alpha, double s);
}

Suppose further that the implementations of the math functions require some helper functions that must not be exported by the module. An implementation partition is the perfect place to put such helper functions. The following defines such an implementation partition in a file called math_helpers.cpp:

module math:details;  // math:details implementation partition
double someHelperFunction(double a) { return /* … */ ; }

Other math module implementation files can get access to these helper functions by importing this implementation partition. For example, a math module implementation file (math.cpp) could look like this:

module math;
import :details;
double Math::superLog(double z, double b) { return /* … */; }
double Math::lerchZeta(double lambda, double alpha, double s) { return /* … */; }

With the import :details; declaration, the superLog() and lerchZeta() functions can call someHelperFunction().

Of course, using such implementation partitions with helper functions makes sense only if multiple other source files use those helper functions.

Private Module Fragment

The primary module interface can include a private module fragment. This private module fragment starts with the following line:

module :private;

Everything after this line is part of the private module fragment. Anything that is defined in this private module fragment is not exported and thus not visible to consumers of the module.

Chapter 9, “Mastering Classes and Objects,” demonstrates the pimpl idiom, also known as the private implementation idiom. It hides all implementation details from consumers of a class. The solution in Chapter 9 requires two files: a primary module interface file and a module implementation file. Using a private module fragment, you can achieve this separation using a single file. Here is a concise example:

export module adder;
import std;
export class Adder
{
    public:
        Adder();
        virtual ~Adder();
        int add(int a, int b) const;
    private:
        class Impl;
        std::unique_ptr<Impl> m_impl;
};

module :private;

class Adder::Impl
{
    public:
        ~Impl() { std::println("Destructor of Adder::Impl"); }
        int add(int a, int b) const { return a + b;}
};

Adder::Adder() : m_impl { std::make_unique<Impl>() } { }
Adder::~Adder() {}
int Adder::add(int a, int b) const { return m_impl->add(a, b); }

This class can be tested as follows:

Adder adder;
println("Value: {}", adder.add(20, 22));

Now, to prove that everything in the private module fragment is truly hidden, let’s add a public member function getImplementation() at the end of the Adder class:

export class Adder
{
    /* … as before, omitted for brevity … */
    private:
        class Impl;
        std::unique_ptr<Impl> m_impl;
    public:
        Impl* getImplementation() { return m_impl.get(); }
};

The following compiles and works fine:

Adder adder;
auto impl { adder.getImplementation() };

From the point of view of consumers of the adder module, getImplementation() returns a pointer to an incomplete type. The code snippet stores that pointer in a variable called impl. Simply storing a pointer to an incomplete type is fine, as long as you use auto type deduction. However, you cannot do anything with that pointer. Calling add() through that pointer results in an error:

auto result { impl->add(20, 22) };  // Error!

The error is something like: use of undefined type Adder::Impl. The reason is that the Adder::Impl class is defined in the private module fragment and hence is not accessible to consumers of the adder module.

If you remove the module :private; line from the module interface file, then the previous code snippet compiles and runs fine. You might be surprised at first sight by this; after all, the Adder::Impl class is not explicitly exported. That’s correct—it’s not explicitly exported, but it is implicitly exported because the Adder class is exported and the Impl class is declared within the Adder class.

Header Units

When importing a module, you use an import declaration such as the following:

import person;

If you have legacy code, such as a person.h header file defining a Person class, then you can modularize it by converting it to a proper module, person.cppm, and use import declarations to make it available to client code. However, sometimes you cannot modularize such headers. Maybe your Person class should remain usable by compilers that do not yet have support for modules. Or maybe the person.h header is part of a third-party library that you cannot modify. In such cases, you can import your header file directly, as follows:

import "person.h";

With such a declaration, everything in the person.h header file becomes implicitly exported. Additionally, macros defined in the header become visible to client code, which is not the case for real modules, neither for your own modules nor for the named std and std.compat modules.

Such an import declaration can include relative or absolute paths to header files, and you can use < > instead of "" to search in the system include directories:

import "include/person.h"; // Can include a relative or absolute path.
import <person.h>;         // Search in system include directories.

Compared to using #include to add a header file, using import will improve build throughput, as the person.h header will implicitly be converted to a module and hence be compiled only once, instead of every time the header is included in a source file. As such, it can be used as a standardized way to support precompiled header files, instead of using compiler-dependent precompiled header file support.

For each import declaration naming a header file, the compiler creates a module with an exported interface similar to what the header file defines, i.e., it implicitly exports everything from the header file. This is called a header unit. The procedure for this is compiler dependent, so check the documentation of your compiler to learn how to work with header units.

Importable Standard Library Headers

All C++ headers, such as <iostream>, <vector>, <string>, and so on, are importable headers that can be imported with an import declaration. That means you can, for example, write the following:

import <vector>;

Of course, starting with C++23, it’s more convenient to simply import the named module called std, instead of manually importing those importable headers that you need. For example, the following makes everything in the Standard Library available for your use:

import std;

As you know by now, importable C++ Standard Library headers don’t have any .h extension, e.g., <vector>, and they define everything in the std namespace or a subnamespace of std.

In C, the names of Standard Library header files end with .h, such as <stdio.h>, and namespaces are not used.

Most of the Standard Library functionality from C is available in C++ but is provided through two different headers:

The recommended versions without the .h extension but with a c prefix, for example, <cstdio>. These put everything in the std namespace.
The C-style versions with the .h extension, for example, <stdio.h>. These do not use namespaces. Their use is discouraged, except when you are writing code that needs to be both valid C++ and valid C at the same time. This use case is not further discussed in this C++ book.

Technically, the old versions are allowed to put things in the std namespace as well, and the new versions are allowed to additionally put things in the global namespace. This behavior is not standardized, so you should not rely on it.
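To make the namespace difference concrete, the following minimal sketch uses the recommended <cmath> header and reaches sqrt() only through the std namespace. The hypotenuse() function is a hypothetical helper introduced for this example.

```cpp
#include <cmath>   // Recommended C++ counterpart of the C header <math.h>.

// Hypothetical helper: length of the hypotenuse of a right triangle.
double hypotenuse(double a, double b)
{
    // Always qualify with std::. Whether <cmath> additionally declares
    // ::sqrt() in the global namespace is not something to rely on.
    return std::sqrt(a * a + b * b);
}
```

Calling hypotenuse(3.0, 4.0) computes the square root of 25.0.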

As mentioned earlier, when using import std; you automatically get access to C-style functions, such as the mathematical functions defined in <cmath>. They will be in the std namespace, e.g., std::sqrt(). If you import std.compat; these C-style functions will additionally be available in the global namespace, e.g., ::sqrt().

However, if you cannot use the std or std.compat named modules, then keep in mind that the C Standard Library headers are not guaranteed to be importable with an import declaration. In that case, to be safe, use #include <cxyz> instead of import <cxyz>;.

Additionally, as mentioned in the previous section, importing a proper module, e.g., std or std.compat, won’t make any C-style macros defined in the module available to the importing code. This is especially important to remember when you want to use C-style macros from the C Standard Library. Luckily, there aren’t many! One of them is <cassert>, a C Standard Library header that defines the assert() macro, which is explained in more detail in Chapter 31, “Conquering Debugging.” Since the named std and std.compat modules won’t make the assert() macro available to importing code, and since <cassert> is a C Standard Library header and thus not guaranteed to be importable, you must use #include <cassert> to get access to assert().
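As a small sketch of this, the following code in an ordinary, non-module source file gets assert() through #include. The divide() function is a hypothetical helper that exists only to give assert() something to check.

```cpp
#include <cassert>  // Defines the assert() macro; included, not imported.

// Hypothetical helper used only to demonstrate assert().
int divide(int numerator, int denominator)
{
    // If the condition is false, the program is terminated with a
    // diagnostic message. When NDEBUG is defined, the check is removed.
    assert(denominator != 0);
    return numerator / denominator;
}
```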

If you do need to #include a header in a module interface or module implementation file, the #include directives should be placed in the global module fragment, which must come before any named module declaration and starts with a nameless module declaration. A global module fragment can only contain preprocessing directives such as #includes, #defines, and so on. Such a global module fragment and comments are the only things that are allowed to appear before a named module declaration. For example, if you need to use functionality from the <cassert> C header file, you can make that available as follows:

module;                // Start of the global module fragment
#include <cassert>     // Include legacy header files

export module person;  // Named module declaration
import std;
export class Person { /* … */ };

Place all #include directives in a module interface or module implementation file in the global module fragment.

PREPROCESSOR DIRECTIVES

Chapter 1 introduces the #include preprocessor directive to include the contents of a header file. There are a few more preprocessor directives available. The following table shows some of the most commonly used preprocessor directives:

PREPROCESSOR DIRECTIVE	FUNCTIONALITY	COMMON USES
`#include [file]`	The contents of the file with name `[file]` is inserted into the code at the location of the directive.	Almost always used to include header files so that code can make use of functionality defined elsewhere.
`#define [id] [value]`	Every occurrence of the identifier `[id]` is replaced with `[value]`.	Often used in C to define a constant value or a macro. C++ provides better mechanisms for constants and most types of macros. Macros can be dangerous, so use them cautiously. See the next section for some examples.
`#undef [id]`	Undefines the identifier `[id]` previously defined using `#define`.	Used if a defined identifier is only required within a limited scope of the code.
`#if [expression]` `#elif [expression]` `#else` `#endif`	Conditionally include a block of code based on the result of a given expression.	Often used to provide specific code for specific platforms.
`#ifdef [id]` `#endif` `#ifndef [id]` `#endif`	Conditionally include code based on whether the specified identifier has been defined with `#define`. `#ifdef [id]` is equivalent to `#if defined(id)` and `#ifndef [id]` is equivalent to `#if !defined(id)`.	Used most frequently to protect against circular includes. Each header file starts with an `#ifndef` checking the absence of an identifier, followed by a `#define` directive to define that identifier. The header file ends with an `#endif`. This prevents the file from being included multiple times; see the Header Files section later in this chapter.
`#elifdef [id]` `#elifndef [id]`	`#elifdef [id]` is equivalent to `#elif defined(id)` and `#elifndef [id]` is equivalent to `#elif !defined(id)`.	Shorthand notations for other functionality.
`#pragma [xyz]`	Controls compiler-specific behavior. `[xyz]` is compiler dependent. Most compilers support `once` to prevent a header file from being included multiple times.	See the Header Files section later in this chapter for an example.
`#error [message]`	Causes the compilation to stop with the given message.	Can be used to stop the compilation if the user tries to compile code on an unsupported platform.
`#warning [message]`	Causes the compiler to emit the given message as a warning, but compilation continues.	Used to display a warning to the user without affecting the compilation result.
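
To make a few of these directives concrete, here is a minimal sketch that combines #define, #if/#elif/#else, #undef, and #ifdef in a single file. The identifier LOG_LEVEL and the names logMode, logLevelStillDefined, and checkDirectives() are hypothetical and exist only for this illustration. The sketch uses #include instead of import std; so that it also compiles with toolchains that lack module support.

```cpp
#include <cassert>
#include <string_view>

// LOG_LEVEL is a hypothetical identifier used only for this illustration.
#define LOG_LEVEL 2

// Exactly one of the following three definitions survives preprocessing.
#if LOG_LEVEL >= 2
constexpr std::string_view logMode { "verbose" };
#elif LOG_LEVEL == 1
constexpr std::string_view logMode { "normal" };
#else
constexpr std::string_view logMode { "quiet" };
#endif

// The identifier is no longer needed, so undefine it.
#undef LOG_LEVEL

// After the #undef, the #ifdef branch is skipped.
#ifdef LOG_LEVEL
constexpr bool logLevelStillDefined { true };
#else
constexpr bool logLevelStillDefined { false };
#endif

// The assertions document what the preprocessor selected.
inline void checkDirectives()
{
    assert(logMode == "verbose");
    assert(!logLevelStillDefined);
}
```

Calling checkDirectives() from main() passes silently. Changing the value given to LOG_LEVEL selects a different branch at preprocessing time, before the compiler proper ever sees the code.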

Preprocessor Macros

You can use the C++ preprocessor to write macros, which are like little functions. Here is an example:

#define SQUARE(x) ((x) * (x)) // No semicolon after the macro definition!

int main()
{
    println("{}", SQUARE(5));
}

Macros are a remnant from C that are quite similar to inline functions, except that they are not type-checked, and the preprocessor dumbly replaces any calls to them with their expansions. The preprocessor does not apply true function-call semantics. This behavior can cause unexpected results. For example, consider what would happen if you called the SQUARE macro with 2+3 instead of 5, like this:

println("{}", SQUARE(2 + 3));

You expect SQUARE to calculate 25, which it does. However, what if you left out some parentheses from the macro definition, so that it looks like this?

#define SQUARE(x) (x * x)

Now the call to SQUARE(2+3) generates 11, not 25! Remember that the macro is dumbly expanded without regard to function-call semantics. This means that any x in the macro body is replaced by 2 + 3, leading to this expansion:

println("{}", (2 + 3 * 2 + 3));

Following proper order of operations, this line performs the multiplication first, followed by the additions, generating 11 instead of 25!

Macros can also have a performance impact. Suppose you call the SQUARE macro as follows:

println("{}", SQUARE(veryExpensiveFunctionCallToComputeNumber()));

The preprocessor replaces this with the following:

println("{}", ((veryExpensiveFunctionCallToComputeNumber()) *
         (veryExpensiveFunctionCallToComputeNumber())));

Now you are calling the expensive function twice—another reason to avoid macros.

Macros also cause problems for debugging because the code you write is not the code that the compiler sees or that shows up in your debugger (because of the search-and-replace behavior of the preprocessor). For these reasons, you should avoid macros entirely in favor of inline functions. The details are shown here only because quite a bit of C++ code out there still employs macros. You need to understand them to read and maintain such code.
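
As a rough sketch of the recommended alternative, the following code places the SQUARE macro from this section next to a function template. The names square(), callCount, computeNumber(), and compareMacroAndFunction() are hypothetical. computeNumber() stands in for an expensive call and merely counts how often it is invoked.

```cpp
#include <cassert>

#define SQUARE(x) ((x) * (x))   // The macro from this section.

// Hypothetical type-checked alternative to the macro.
template <typename T>
constexpr T square(T x) { return x * x; }

// Stand-in for an expensive call; counts how often it is invoked.
inline int callCount { 0 };
inline int computeNumber() { ++callCount; return 5; }

inline void compareMacroAndFunction()
{
    callCount = 0;
    int viaMacro { SQUARE(computeNumber()) };     // Argument is evaluated twice.
    assert(viaMacro == 25 && callCount == 2);

    callCount = 0;
    int viaFunction { square(computeNumber()) };  // Argument is evaluated once.
    assert(viaFunction == 25 && callCount == 1);
}
```

Because square() follows true function-call semantics, its argument is evaluated exactly once, and an expression such as square(2 + 3) yields 25 regardless of how the function body is parenthesized.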

LINKAGE

This section describes the concept of linkage in C++. As Chapter 1 explains, C++ source files are first processed by the preprocessor, which processes all preprocessor directives, resulting in translation units. All translation units are then compiled independently into object files, which contain the machine executable code but in which references to functions and so on are not yet defined. Resolving those references is done by the final phase, the linker, which links all object files together into the final executable. Technically, there are a few more phases in the compilation process, but for this discussion, this simplified view is sufficient.

Each name in a C++ translation unit, including functions and global variables, either has linkage or has no linkage, and this specifies where that name can be defined and from where it can be accessed. There are four types of linkage:

No linkage: The name is accessible only from the scope in which it is defined.
External linkage: The name is accessible from any translation unit.
Internal linkage (also called static linkage): The name is accessible only from the current translation unit, but not from other translation units.
Module linkage: The name is accessible from any translation unit from the same module.

Internal Linkage

By default, functions and global variables have external linkage. However, you can specify internal (or static) linkage by employing anonymous namespaces. For example, suppose you have two source files: FirstFile.cpp and AnotherFile.cpp. Here is FirstFile.cpp:

void f();

int main()
{
    f();
}

Note that this file provides a prototype for f() but doesn’t show the definition. Here is AnotherFile.cpp:

import std;

void f();

void f()
{
    std::println("f");
}

This file provides both a prototype and a definition for f(). Note that it is legal to write prototypes for the same function in two different files. That’s precisely what the preprocessor does for you if you put the prototypes in a header file that you #include in each of the source files. For this example, I don’t use a header file. The reason to use header files used to be that it was easier to maintain (and keep synchronized) one copy of the prototype, but now that C++ has support for modules, using modules is recommended over using header files.

Each of these source files compiles without error, and the program links fine: because f() has external linkage, main() can call it from a different file.

However, suppose you wrap the f() function in AnotherFile.cpp in an anonymous namespace to give it internal linkage as follows:

import std;

namespace
{
    void f();

    void f()
    {
        std::println("f");
    }
}

Entities in an anonymous namespace have internal linkage and thus can be accessed anywhere following their declaration in the same translation unit, but cannot be accessed from other translation units. With this change, each of the source files still compiles without error, but the linker step fails because f() has internal linkage, making it unavailable from FirstFile.cpp.

An alternative to using anonymous namespaces to give a name internal linkage is to prefix the declaration with the keyword static. The earlier anonymous namespace example can be written as follows. Note that you don’t need to repeat the static keyword in front of the definition of f(). As long as it precedes the first instance of the function name, there is no need to repeat it.

import std;

static void f();

void f()
{
    std::println("f");
}

The semantics of this version of the code are exactly the same as the one using an anonymous namespace.

If a translation unit needs a helper entity that is only required within that translation unit, wrap it in an anonymous namespace to give it internal linkage. Using the static keyword for this is discouraged.
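
The following sketch shows what this recommendation might look like in practice. The names addBrackets(), formatLabel(), and checkFormatLabel() are hypothetical. Only formatLabel() is meant to be called from other translation units; the helper stays private to this one.

```cpp
#include <cassert>
#include <string>

namespace
{
    // Internal linkage: a helper visible only in this translation unit.
    std::string addBrackets(const std::string& text)
    {
        return "[" + text + "]";
    }
}

// External linkage: other translation units can declare and call this.
std::string formatLabel(const std::string& text)
{
    return addBrackets(text);
}

inline void checkFormatLabel()
{
    assert(formatLabel("f") == "[f]");
}
```

Another source file that declares std::string formatLabel(const std::string&); links fine, while an attempt to declare and call addBrackets() from that file fails at the link step.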

The extern Keyword

A related keyword, extern, seems like it should be the opposite of static, specifying external linkage for the names it precedes, and it can be used that way in certain cases. For example, consts and typedefs have internal linkage by default. You can use extern to give them external linkage. However, extern has some complications. When you specify a name as extern, the compiler treats it as a declaration, not a definition. For variables, this means the compiler doesn’t allocate space for the variable. You must provide a separate definition for the variable without the extern keyword. For example, here is the content of AnotherFile.cpp:

extern int x;
int x { 3 };

Alternatively, you can initialize x in the extern statement, which then serves as the declaration and the definition:

extern int x { 3 };

The extern in this case is not very useful, because x has external linkage by default anyway. The real use of extern is when you want to use x from another source file, FirstFile.cpp:

import std;

extern int x;

int main()
{
    std::println("{}", x);
}

Here, FirstFile.cpp uses an extern declaration so that it can use x. The compiler needs a declaration of x to use it in main(). If you declared x without the extern keyword, the compiler would think it’s a definition and would allocate space for x, causing the linkage step to fail (because there are then two x variables in the global scope). With extern, you can make variables globally accessible from multiple source files.
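
The difference between an extern declaration and a definition can also be sketched in a single file. In the following code, readX(), maxUsers, and checkExtern() are hypothetical names. The sketch additionally assumes the common idiom of first declaring a const with extern, which gives the subsequent definition external linkage.

```cpp
#include <cassert>

extern int x;               // Declaration only: no storage is allocated here.

int readX() { return x; }   // Uses x before its definition appears.

int x { 3 };                // The single definition of x.

// A const at namespace scope has internal linkage by default. Because of the
// preceding extern declaration, this definition has external linkage.
extern const int maxUsers;
const int maxUsers { 10 };

inline void checkExtern()
{
    assert(readX() == 3);
    assert(maxUsers == 10);
}
```

In a real program, the extern declarations would typically live in the files that use the variables, and the definitions in exactly one source file.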

Global variables are generally not recommended. They are confusing and error-prone, especially in large programs. If you do need them, use them judiciously!

HEADER FILES

Before the introduction of C++20’s modules, header files, also called headers, were used as a mechanism for providing the interface to a subsystem or piece of code. The most common use of headers is to declare functions that will be defined elsewhere. A declaration tells the compiler that an entity (function, variable, etc.) with a certain name exists. For functions, a declaration specifies how a function is called, declaring the number and types of parameters and the function’s return type. A definition also tells the compiler that an entity with a certain name exists, but also defines the entity itself. For functions, a definition contains the actual code for the function. All definitions are declarations, but not all declarations are definitions. Declarations, including class definitions (which are also declarations; see Chapter 8, “Gaining Proficiency with Classes and Objects”), usually go into header files, typically with extension .h. Definitions, including definitions of non-inline class members, usually go into source files, typically with extension .cpp. This book uses modules everywhere, but this section briefly discusses a few trickier aspects of using header files, such as avoiding duplicate definitions and circular dependencies, because you will encounter these in legacy code bases.

One Definition Rule (ODR)

A single translation unit can have exactly one definition of a variable, function, class type, enumeration type, concept, or template. For some types, multiple declarations are allowed, but not multiple definitions. Furthermore, exactly one definition of non-inline functions and non-inline variables is allowed in the entire program.

With header files, it’s easy to violate the one definition rule, resulting in duplicate definitions. The next section discusses how such duplicate definitions through header files can be avoided.

Between modules, it’s harder to violate the one definition rule, as each module is much better isolated from other modules. A major reason for this is that an entity in a module that is not exported from that module has module linkage and thus is inaccessible from code in other modules. That is, multiple modules can define their own local non-exported entities with the same name without any problem. On the other hand, in non-modular source files, local entities have external linkage by default. Of course, within a module itself, you still need to make sure you don’t violate the one definition rule.

Duplicate Definitions

Suppose A.h includes Logger.h, defining a Logger class, and B.h also includes Logger.h. If you have a source file called App.cpp, which includes both A.h and B.h, you end up with duplicate definitions of the Logger class because the Logger.h header is included through A.h and B.h.

This problem of duplicate definitions can be avoided with a mechanism known as include guards, also known as header guards. The following code snippet shows the Logger.h header with include guards. At the beginning of each header file, the #ifndef directive checks whether a certain key has not been defined. If the key has been defined, the compiler skips to the matching #endif, which is usually placed at the end of the file. If the key has not been defined, the file proceeds to define the key so that a subsequent include of the same file will be skipped.

#ifndef LOGGER_H
#define LOGGER_H

class Logger { /* … */ };

#endif // LOGGER_H

Alternatively, nearly all compilers these days support the #pragma once directive, which replaces include guards. Placing a #pragma once at the beginning of a header file makes sure it’ll be included only once and hence avoids duplicate definitions resulting from including the header multiple times. Here’s an example:

#pragma once

class Logger { /* … */ };
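
To see why include guards work, it can help to look at what the preprocessor effectively produces when a guarded header is included twice. The following single-file sketch simulates that situation by repeating the guarded contents of a hypothetical Logger.h. The getName() member function and checkLogger() are assumptions added purely to make the example testable.

```cpp
#include <cassert>
#include <string_view>

// First inclusion of the hypothetical Logger.h (for example, through A.h).
#ifndef LOGGER_H
#define LOGGER_H
class Logger
{
    public:
        std::string_view getName() const { return "Logger"; }
};
#endif // LOGGER_H

// Second inclusion (for example, through B.h): LOGGER_H is already defined,
// so the preprocessor skips this copy and Logger is not redefined.
#ifndef LOGGER_H
#define LOGGER_H
class Logger
{
    public:
        std::string_view getName() const { return "Logger"; }
};
#endif // LOGGER_H

inline void checkLogger()
{
    Logger logger;
    assert(logger.getName() == "Logger");
}
```

Without the #ifndef/#define/#endif lines, the second copy would be a redefinition of the Logger class and the file would not compile.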

Circular Dependencies

Another tool for avoiding problems with header files is forward declarations. If you need to refer to a class but you cannot include its header file (for example, because it relies heavily on the class you are writing), you can tell the compiler that such a class exists without providing a formal definition through the #include mechanism. Of course, you cannot actually use the class in the code because the compiler knows nothing about it, except that the named class will exist after everything is linked together. However, you can still make use of pointers and references to forward-declared classes in your code. You can also declare functions that return such forward-declared classes by value or that have such forward-declared classes as pass-by-value function parameters. Of course, both the code defining the function and any code calling the function will need to include the right header files that properly define the forward-declared classes.

For example, assume that the Logger class uses another class called Preferences that keeps track of user settings. The Preferences class may in turn use the Logger class, so you have a circular dependency that cannot be resolved with include guards. You need to make use of forward declarations in such cases. In the following code, the Logger.h header file uses a forward declaration for the Preferences class and subsequently refers to the Preferences class without including its header file:

#pragma once

#include <string_view>

class Preferences;  // forward declaration

class Logger
{
    public:
        void setPreferences(const Preferences& preferences);
        void logError(std::string_view error);
};

It’s recommended to use forward declarations as much as possible in your header files instead of including other headers. This can reduce your compilation and recompilation times, because it breaks dependencies of your header file on other headers. Of course, your implementation file needs to include the correct headers for types that you’ve forward-declared; otherwise, it won’t compile.
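
The following single-file sketch plays through the whole scenario: the forward declaration, the class that relies on it, the full definition that a hypothetical Preferences.h would provide, and the member function definition that would normally live in Logger.cpp. The members getLogPrefix(), getPrefix(), and m_prefix, as well as checkForwardDeclaration(), are assumptions made for this illustration.

```cpp
#include <cassert>
#include <string>

class Preferences;  // Forward declaration: enough for references and pointers.

class Logger
{
    public:
        void setPreferences(const Preferences& preferences);
        const std::string& getPrefix() const { return m_prefix; }
    private:
        std::string m_prefix;
};

// What a hypothetical Preferences.h would provide.
class Preferences
{
    public:
        std::string getLogPrefix() const { return "[app] "; }
};

// What Logger.cpp would contain after including Preferences.h. Only here
// does the compiler need the complete definition of Preferences.
void Logger::setPreferences(const Preferences& preferences)
{
    m_prefix = preferences.getLogPrefix();
}

inline void checkForwardDeclaration()
{
    Preferences preferences;
    Logger logger;
    logger.setPreferences(preferences);
    assert(logger.getPrefix() == "[app] ");
}
```

Moving the definition of Logger::setPreferences() above the definition of Preferences results in a compilation error, because calling a member function requires a complete type.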

Querying Existence of Headers

To query whether a certain header file exists, use the __has_include("filename") or __has_include(<filename>) preprocessor constant expressions. These evaluate to 1 if the header file exists, 0 if it doesn’t exist. For example, before the <optional> header file was fully approved for C++17, some compilers already had a preliminary version in <experimental/optional>. You could use __has_include() to check which of the two header files was available on your system:

#if __has_include(<optional>)
    #include <optional>
#elif __has_include(<experimental/optional>)
    #include <experimental/optional>
#endif

Module Import Declarations

Header files should not contain any module import declarations. The standard mandates that module import declarations must be at the beginning of a file before any other declarations and must not be coming from header inclusions or preprocessor macro expansions. This makes it easier on build systems to discover module dependencies, which are then used to determine the order modules need to be built.

FEATURE-TEST MACROS FOR CORE LANGUAGE FEATURES

You can use feature-test macros to detect which core language features are supported by a compiler. All these macros start with either __cpp_ or __has_cpp_. The following are some examples. Consult your favorite C++ reference for a complete list of all possible core language feature-test macros.

__cpp_range_based_for
__cpp_binary_literals
__cpp_char8_t
__cpp_generic_lambdas
__cpp_consteval
__cpp_coroutines
…
__has_cpp_attribute([attribute_name])
…

The value of these macros is a number representing the month and year when a specific feature was added or updated. The date is formatted as YYYYMM. For example, the value of __cpp_binary_literals is 201304, i.e., April 2013, which is the date when binary literals were introduced. As another example, the value of __has_cpp_attribute(nodiscard) can be 201603, i.e., March 2016, which is the date when the [[nodiscard]] attribute was first introduced. Or it can be 201907, i.e., July 2019, which is the date when the attribute was updated to allow specifying a reason such as [[nodiscard("Reason")]].

All these core language feature-test macros are available without having to include any specific header. Here is an example use:

int main()
{
#ifdef __cpp_range_based_for
    println("Range-based for loops are supported!");
#else
    println("Bummer! Range-based for loops are NOT supported!");
#endif
}
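
Because the values are formatted as YYYYMM, you can also inspect them instead of only testing whether a macro is defined. The following is a small sketch with hypothetical helpers featureYear() and featureMonth(). The constant binaryLiterals falls back to 0 when the compiler does not define __cpp_binary_literals.

```cpp
#include <cassert>

// Hypothetical helpers that split a YYYYMM feature-test value.
constexpr int featureYear(long value)  { return static_cast<int>(value / 100); }
constexpr int featureMonth(long value) { return static_cast<int>(value % 100); }

#ifdef __cpp_binary_literals
constexpr long binaryLiterals { __cpp_binary_literals };
#else
constexpr long binaryLiterals { 0 };   // Feature not reported by this compiler.
#endif

inline void checkFeatureValues()
{
    // 201304 is the value mentioned in the text for binary literals.
    assert(featureYear(201304) == 2013);
    assert(featureMonth(201304) == 4);
    // Either the feature is not reported, or it dates from April 2013 or later.
    assert(binaryLiterals == 0 || binaryLiterals >= 201304);
}
```

Comparing against a specific value, for example with #if __has_cpp_attribute(nodiscard) >= 201907, allows code to require a particular revision of a feature rather than just its presence.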

Chapter 16, “Overview of the C++ Standard Library,” explains that there are similar feature-test macros for Standard Library features.

THE STATIC KEYWORD

There are several uses of the keyword static in C++, all seemingly unrelated. Part of the motivation for “overloading” the keyword was attempting to avoid having to introduce new keywords into the language. One use of the keyword is discussed earlier in this chapter in the context of linkage. Other uses are discussed in this section.

static Data Members and Member Functions

You can declare static data members and member functions of classes. static data members, unlike non-static data members, are not part of each object. Instead, there is only one copy of the data member, which exists outside any objects of that class.

static member functions are similarly at the class level instead of the object level. A static member function does not execute in the context of a specific object; hence, it does not have an implicit this pointer. This also means that static member functions cannot be marked as const.

Chapter 9 provides examples of both static data members and member functions.

static Variables in Functions

Another use of the static keyword in C++ is to create variables that retain their values between exits and entrances to their scope. For example, a static local variable inside a function is like a global variable that is accessible only from within that function. One common use of static variables is to “remember” whether a particular initialization has been performed for a certain function. For example, code that employs this technique might look something like this:

void performTask()
{
    static bool initialized { false };
    if (!initialized) {
        println("initializing");
        // Perform initialization.
        initialized = true;
    }
    // Perform the desired task.
}

However, static variables can be confusing, and there are usually better ways to structure your code so that you can avoid them. In this case, you might want to write a class in which the constructor performs the required initialization.

Sometimes, however, they can be useful. One example is for implementing the Meyers’ singleton design pattern, as explained in Chapter 33, “Applying Design Patterns.”
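
To illustrate how a static local variable retains its value between calls, here is a minimal sketch. The function names countCalls() and checkCountCalls() are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: returns how many times it has been called so far.
int countCalls()
{
    static int calls { 0 };  // Initialized once; keeps its value between calls.
    return ++calls;
}

inline void checkCountCalls()
{
    int first { countCalls() };
    int second { countCalls() };
    assert(second == first + 1);  // The counter survived the first call.
}
```

Without the static keyword, calls would be re-created and reset to 0 on every call, and the function would always return 1.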

Order of Initialization of Nonlocal Variables

Before leaving the topic of static variables, consider the order of initialization of such variables. All nonlocal variables in a program, that is, global variables and static data members, are initialized before main() begins. The variables in a given source file are initialized in the order they appear in the source file. For example, in the following file, Demo::x is guaranteed to be initialized before y:

class Demo
{
    public:
        static int x;
};
int Demo::x { 3 };
int y { 4 };

However, C++ provides no specifications or guarantees about the initialization ordering of nonlocal variables in different source files. If you have a global variable x in one source file and a global variable y in another, you have no way of knowing which will be initialized first. Normally, this lack of specification isn’t cause for concern. However, it can be problematic if one global or static variable depends on another. Recall that initialization of objects implies running their constructors. The constructor of one global object might access another global object, assuming that it is already constructed. If these two global objects are declared in two different source files, you cannot count on one being constructed before the other, and you cannot control the order of initialization. This order might not be the same for different compilers or even different versions of the same compiler, and the order might even change when you simply add another file to your project.

Initialization order of nonlocal variables in different source files is undefined.
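
One commonly used way to sidestep this problem, not covered further in this chapter, is to wrap the object in a function with a static local variable, in the spirit of the Meyers’ singleton mentioned earlier. The following is only a sketch under that assumption; getAppName(), appNameLength, and checkInitializationOrder() are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical accessor: the static local is initialized the first time the
// function is called, so callers never see an unconstructed object.
inline std::string& getAppName()
{
    static std::string name { "demo" };
    return name;
}

// A nonlocal variable whose initializer depends on the object above. This
// works regardless of the order in which source files are initialized.
inline const auto appNameLength { getAppName().size() };

inline void checkInitializationOrder()
{
    assert(getAppName() == "demo");
    assert(appNameLength == 4);
}
```

The price of this technique is a function call for each access, and it does not address the order of destruction discussed next.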

Order of Destruction of Nonlocal Variables

Nonlocal variables are destroyed in the reverse order they were initialized. Nonlocal variables in different source files are initialized in an undefined order, which means that the order of destruction is also undefined.

C-STYLE VARIABLE-LENGTH ARGUMENT LISTS

In legacy code, you might come across the use of C-style variable-length argument lists. In new code, you should avoid using these and instead use variadic templates for type-safe variable-length argument lists, which are covered in Chapter 26, “Advanced Templates.”

So that you are aware of C-style variable-length argument lists, consider the C function printf() from <cstdio>. You can call it with any number of arguments:

printf("int %d\n", 5);
printf("String %s and int %d\n", "hello", 5);
printf("Many ints: %d, %d, %d, %d, %d\n", 1, 2, 3, 4, 5);

C/C++ provides the syntax and some utility macros for writing your own functions with a variable number of arguments. These functions usually look a lot like printf(). For example, suppose you want to write a quick-and-dirty debug function that prints strings to stderr if a debug flag is set but does nothing if the debug flag is not set. Just like printf(), this function should be able to print strings with an arbitrary number of arguments and arbitrary types of arguments. A simple implementation looks as follows:

import std;
#include <cstdarg>
#include <cstdio>

bool debug { false };

void debugOut(const char* str, ...)
{
    if (debug) {
        va_list ap;
        va_start(ap, str);
        vfprintf(stderr, str, ap);
        va_end(ap);
    }
}

The code uses the va_list type together with va_start() and va_end(), which are macros defined in <cstdarg> and thus require an explicit #include <cstdarg>, as import std; does not export any macros. Similarly, stderr is a macro defined in <cstdio> requiring an explicit #include <cstdio>.

The prototype for debugOut() contains one typed and named parameter str, followed by ... (an ellipsis), which stands for any number and type of arguments. To access these arguments, you must declare a variable of type va_list and initialize it with a call to va_start(). The second parameter to va_start() must be the rightmost named variable in the parameter list. All functions with variable-length argument lists require at least one named parameter. The debugOut() function simply passes this list to vfprintf() (a standard function in <cstdio>). After the call to vfprintf() returns, debugOut() calls va_end() to terminate the access of the variable argument list. You must always call va_end() after calling va_start() to ensure that the function ends with the stack in a consistent state.

You can use the function in the following way:

debug = true;
debugOut("int %d\n", 5);
debugOut("String %s and int %d\n", "hello", 5);
debugOut("Many ints: %d, %d, %d, %d, %d\n", 1, 2, 3, 4, 5);
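
Output written to stderr is awkward to verify. As a sketch, the following hypothetical variant, formatDebug(), forwards its va_list to std::vsnprintf() instead of vfprintf() and returns the formatted text. The fixed buffer size of 256 characters is an arbitrary assumption; longer output is truncated.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical variant of debugOut() that returns the formatted text.
std::string formatDebug(const char* str, ...)
{
    char buffer[256];  // Arbitrary size chosen for this sketch.
    va_list ap;
    va_start(ap, str);
    std::vsnprintf(buffer, sizeof(buffer), str, ap);  // Truncates long output.
    va_end(ap);
    return buffer;
}

inline void checkFormatDebug()
{
    assert(formatDebug("int %d", 5) == "int 5");
    assert(formatDebug("String %s and int %d", "hello", 5)
        == "String hello and int 5");
}
```

The structure is identical to debugOut(): declare a va_list, start it with the last named parameter, hand it to a v-prefixed function, and end it.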

Accessing the Arguments

If you want to access the actual arguments yourself, you can use va_arg() to do so. It accepts a va_list as first argument, and the type of the argument to interpret. Unfortunately, there is no way to know what the end of the argument list is unless you provide an explicit way of doing so. For example, you can make the first parameter a count of the number of parameters. Or, in the case where you have a set of pointers, you may require the last pointer to be nullptr. There are many ways, but they are all burdensome to the programmer.

The following example demonstrates the technique where the caller specifies in the first named parameter how many arguments are provided. The function accepts any number of ints and prints them out.

void printInts(unsigned num, ...)
{
    va_list ap;
    va_start(ap, num);
    for (unsigned i { 0 }; i < num; ++i) {
        int temp { va_arg(ap, int) };
        print("{} ", temp);
    }
    va_end(ap);
    println("");
}

You can call printInts() as follows. Note that the first parameter specifies how many integers will follow.

printInts(5, 5, 4, 3, 2, 1);
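
The same counting technique can be sketched with a function that returns a result instead of printing it, which makes the behavior easier to check. The names sumInts() and checkSumInts() are hypothetical.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical function: sums the num ints that follow the first argument.
int sumInts(unsigned num, ...)
{
    int sum { 0 };
    va_list ap;
    va_start(ap, num);
    for (unsigned i { 0 }; i < num; ++i) {
        sum += va_arg(ap, int);  // Interpret the next argument as an int.
    }
    va_end(ap);
    return sum;
}

inline void checkSumInts()
{
    assert(sumInts(5, 5, 4, 3, 2, 1) == 15);
    assert(sumInts(0) == 0);
}
```

Note that nothing stops a caller from writing sumInts(5, 1, 2), in which case va_arg() reads arguments that were never passed and the behavior is undefined.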

Why You Shouldn’t Use C-Style Variable-Length Argument Lists

Accessing C-style variable-length argument lists is not very safe. There are several risks, as you can see from the printInts() function.

You don’t know the number of parameters. In the case of printInts(), you must trust the caller to pass the right number of arguments as the first argument. In the case of debugOut(), you must trust the caller to pass the same number of arguments after the str string as there are conversion specifiers, such as %d, in the string.
You don’t know the types of the arguments. va_arg() takes a type, which it uses to interpret the value in its current spot. However, you can tell va_arg() to interpret the value as any type. There is no way for it to verify the correct type.

Avoid using C-style variable-length argument lists. It is preferable to pass in an std::array or vector of values, to use initializer lists described in Chapter 1, or to use variadic templates for type-safe variable-length argument lists, as described in Chapter 26.
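
As a rough sketch of two of these alternatives, the following code sums integers with an initializer list and with a variadic template. The names sumList(), sumAll(), and checkTypeSafeSums() are hypothetical, and the fold expression used in sumAll() is only a preview of material from Chapter 26.

```cpp
#include <cassert>
#include <initializer_list>

// The initializer list knows its own size and element type.
int sumList(std::initializer_list<int> values)
{
    int sum { 0 };
    for (int value : values) { sum += value; }
    return sum;
}

// Variadic template: the compiler knows the number and types of arguments.
template <typename... Ts>
constexpr auto sumAll(Ts... values)
{
    return (values + ... + 0);  // Fold expression over all arguments.
}

inline void checkTypeSafeSums()
{
    assert(sumList({ 5, 4, 3, 2, 1 }) == 15);
    assert(sumAll(5, 4, 3, 2, 1) == 15);
    assert(sumAll() == 0);
}
```

Unlike printInts(), neither version needs a separate count argument, and passing an argument of an unsuitable type, such as a string literal, is rejected at compile time.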

SUMMARY

This chapter started with details on authoring and consuming modules and discussed a few trickier aspects of using old-style header files. You also learned about preprocessor directives, preprocessor macros, details of linkage, the one definition rule, and the different uses of the static and extern keywords. The chapter finished with a discussion on how to write C-style variable-length argument lists.

Preprocessor directives and C-style variable-length argument lists are important to understand, because you might encounter them in legacy code bases. However, they should be avoided in any newly written code.

The next chapter starts a discussion on templates, which allow you to write generic code.

EXERCISES

By solving the following exercises, you can practice the material discussed in this chapter. Solutions to all exercises are available with the code download on the book’s website at www.wiley.com/go/proc++6e. However, if you are stuck on an exercise, first reread parts of this chapter to try to find an answer yourself before looking at the solution from the website.

Exercise 11-1: Write a single-file module called simulator containing two classes, CarSimulator and BikeSimulator, in a Simulator namespace. The content of the classes is not important for these exercises. Just provide a default constructor that prints a message to the standard output. Test your code in a main() function.
Exercise 11-2: Take your solution from Exercise 11-1 and split the module into several files: a primary module interface file without any implementations and two module implementation files, one for the CarSimulator and one for the BikeSimulator class.
Exercise 11-3: Take your solution from Exercise 11-2 and convert it to use one primary module interface file and two module interface partition files, one for the simulator:car partition containing the CarSimulator class, and one for the simulator:bike partition containing the BikeSimulator class.
Exercise 11-4: Take your solution from Exercise 11-3 and add an implementation partition called internals, containing a helper function called convertMilesToKm(double miles) in the Simulator namespace. One mile is 1.6 kilometers. Add a member function to both the CarSimulator and BikeSimulator classes called setOdometer(double miles), which uses the helper function to convert the given miles to kilometers and then prints it out to the standard output. Confirm in your main() function that the setOdometer() works on both classes. Also confirm that main() cannot call convertMilesToKm().
Exercise 11-5: Write a source file containing a preprocessor identifier with the value 0 or 1. Use preprocessor directives to check the value of this identifier. If the value is 1, make the compiler output a warning. If it’s 0, ignore it. If it’s any other value, make the compiler generate an error.