Developing Cross-Platform and Cross-Language Applications

C++ programs can be compiled to run on a variety of computing platforms, and the language has been rigorously defined to ensure that programming in C++ for one platform is similar to programming in C++ for another. Yet, despite the standardization of the language, platform differences eventually come into play when writing professional-quality programs in C++. Even when development is limited to a particular platform, small differences in compilers can elicit major headaches. This chapter examines the necessary complication of programming in a world with multiple platforms and multiple programming languages.

The first part of this chapter surveys the platform-related issues that C++ programmers encounter. A platform is the collection of all the details that make up your development and/or run-time system. For example, your platform may be the Microsoft Visual C++ 2022 compiler running on Windows 11 on an Intel Core i7 processor. Alternatively, your platform might be the GCC 13.2 compiler running on Linux on a PowerPC processor. Both of these platforms are able to compile and run C++ programs, but there are significant differences between them.

The second part of this chapter looks at how C++ can interact with other programming languages. While C++ is a general-purpose language, it may not always be the right tool for the job. Through a variety of mechanisms, you can integrate C++ code with other languages that may better serve your needs.

CROSS-PLATFORM DEVELOPMENT

There are several reasons why the C++ language encounters platform issues. C++ is a high-level language, and the standard does not specify certain low-level details. For example, the layout of an object in memory is unspecified by the standard and left to the compiler. Different compilers can use different memory layouts for objects.

C++ also faces the challenge of providing a standard language and a Standard Library without a standard implementation. Varying interpretations of the specification among C++ compiler and library vendors can lead to trouble when moving from one system to another.

Finally, C++ is selective in what the language provides as standard. Despite the existence of the Standard Library, programs often need functionality that is not provided by the language or the Standard Library. This functionality generally comes from third-party libraries or the platform itself, and can vary greatly.

Architecture Issues

The term architecture generally refers to the processor, or family of processors, on which a program runs. A standard PC running Windows or Linux generally runs on the x86 or x64 architecture, and older versions of macOS were usually found on the PowerPC architecture. As a high-level language, C++ shields you from the differences between these architectures. For example, a Core i7 processor may have a single instruction that performs the same functionality as six PowerPC instructions. As a C++ programmer, you don’t need to know what this difference is or even that it exists. One advantage to using a high-level language is that the compiler takes care of converting your code into the processor’s native assembly code format.

However, processor differences do sometimes rise up to the level of C++ code. The first one discussed, the size of integers, is important if you are writing cross-platform code. The others you won’t face often, unless you are doing particularly low-level work, but still, you should be aware that they exist.

Size of Integers

The C++ standard does not define the exact size of integer types. The standard just says the following in section [basic.fundamental]:

There are five standard signed integer types: signed char, short int, int, long int, and long long int. In this list, each type provides at least as much storage as those preceding it in the list.

The standard does give a few additional hints for the size of these types, but never an exact size. The actual size is compiler-dependent. Thus, if you want to write truly cross-platform code, you cannot rely on these types having any particular size. One of the exercises at the end of this chapter asks you to investigate this further.

Besides these core language integer types, the C++ Standard Library does define a number of types that have clearly specified sizes, all defined in the std namespace in <cstdint>, although some of the types are optional. Here is an overview:

TYPE	DESCRIPTION
int8_t, int16_t, int32_t, int64_t	Signed integers of which the size is exactly 8, 16, 32, or 64 bits. On exotic platforms, some of these types might be absent. For example, if your exotic platform doesn’t support an 8-bit type, int8_t will simply be absent.
int_fast8_t, int_fast16_t, int_fast32_t, int_fast64_t	Signed integers with sizes of at least 8, 16, 32, or 64 bits. For these, the compiler should use the fastest integer type it has that satisfies the requirements.
int_least8_t, int_least16_t, int_least32_t, int_least64_t	Signed integers with sizes of at least 8, 16, 32, or 64 bits (the smallest such types that exist). These types are guaranteed always to exist, even on exotic platforms. For example, a hypothetical platform with 24-bit bytes would alias both int_least8_t and int_least16_t to its 24-bit char type.
intmax_t	An integer type with the maximum size supported by the compiler.
intptr_t	An integer type big enough to store a pointer. This type is also optional, but most compilers support it.

There are also unsigned variants available, such as uint8_t, uint_fast8_t, and so on.
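As a minimal sketch of how these types can be put to work, the following snippet records its size assumptions with static_assert so that a port to an unusual platform fails at compile time instead of misbehaving at run time. The bitsOf() helper and the RecordHeader structure are hypothetical names introduced only for this illustration.

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::int32_t, std::uint16_t, std::int_least16_t, ...

// Hypothetical helper: the width of a type in bits on the current platform.
template <typename T>
constexpr int bitsOf()
{
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Hypothetical record: exact-width types keep the width of each field the
// same on every platform that provides them. Padding and byte order can
// still differ between platforms.
struct RecordHeader
{
    std::uint32_t id;
    std::uint16_t version;
    std::uint16_t flags;
};

// Document the assumptions the rest of the code relies on.
static_assert(bitsOf<std::int32_t>() == 32, "int32_t is exactly 32 bits wide");
static_assert(bitsOf<std::int_least16_t>() >= 16, "int_least16_t has at least 16 bits");
static_assert(bitsOf<long>() >= bitsOf<int>(), "long is at least as large as int");
```

If one of the exact-width types is absent on a platform, the code simply does not compile there, which is usually preferable to silently using a type of the wrong size.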

Binary Compatibility

As you probably already know, you cannot take a program compiled for a Core i7 computer and run it on a PowerPC-based Mac. These two platforms are not binary compatible because their processors do not support the same set of instructions. When you compile a C++ program, your source code is turned into binary instructions that the computer executes. That binary format is defined by the platform, not by the C++ language.

One solution to support platforms that are not binary compatible is to build each version separately with a compiler on each target platform.

Another solution is cross-compilation. When you are using platform X for your development, but you want your program to run on platforms Y and Z, you can use a cross-compiler on your platform X that generates binary code for platforms Y and Z.

You can also make your program open source. When you make your source code available to end users, they can compile it natively on their systems and build a version of the program that is in the correct binary format for their machines. As discussed in Chapter 4, “Designing Professional C++ Programs,” open-source software has become increasingly popular. One of the major reasons is that it allows programmers to collaboratively develop software and increase the number of platforms on which it can run.

Address Sizes

When someone describes an architecture as 64-bit, they most likely mean that the address size is 64 bits, or 8 bytes. In general, a system with a larger address size can handle more memory and might operate more quickly on complex programs.

Because pointers are memory addresses, they are inherently tied to address sizes. Sometimes, programmers are taught that pointers are always 8 bytes, but this is wrong. For example, consider the following code snippet, which outputs the size of a pointer:

int *ptr { nullptr };
println("ptr size is {} bytes", sizeof(ptr));

If this program is compiled and executed on a 32-bit x86 system, the output will be as follows:

ptr size is 4 bytes

If you compile and run it on an x86-64 system, the output will be as follows:

ptr size is 8 bytes

From a programmer’s point of view, the upshot of varying pointer sizes is that you cannot equate a pointer with either 4 or 8 bytes. More generally, you need to be aware that most sizes are not prescribed by the C++ standard. The standard only says that a long is at least as long as an int, which is at least as long as a short, and so on.

The size of a pointer is also not necessarily the same as the size of an integer. For example, on a 64-bit platform, pointers are 64-bit, but integers could be 32-bit. Casting a 64-bit pointer to a 32-bit integer results in losing 32 critical bits! The standard does define an std::uintptr_t type alias in <cstdint> which is an integer type at least big enough to hold a pointer. The definition of this type is optional according to the standard, but virtually all compilers support it.

Never assume that a pointer is 32 bits or 64 bits. Never cast a pointer to an integer, unless you use std::uintptr_t.
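The following sketch shows one way to follow this advice. The toInteger() and toPointer() helpers are hypothetical names used only for illustration; they convert a pointer to std::uintptr_t and back, and the static_assert makes the size assumption explicit.

```cpp
#include <cstdint>   // std::uintptr_t

// uintptr_t must be able to hold any object pointer without losing bits.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "uintptr_t must be at least as large as a pointer");

// Hypothetical helper: converts a pointer to an integer value.
std::uintptr_t toInteger(const void* pointer)
{
    return reinterpret_cast<std::uintptr_t>(pointer);
}

// Hypothetical helper: converts the integer back to the original pointer.
const void* toPointer(std::uintptr_t value)
{
    return reinterpret_cast<const void*>(value);
}
```

Converting a pointer to std::uintptr_t and back yields the original pointer. Doing the same through a 32-bit integer on a 64-bit platform would drop the upper half of the address.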

Byte Order

All modern computers store numbers in a binary representation, but the representation of the same number on two platforms may not be identical. This sounds contradictory, but as you’ll see, there are two approaches to representing numbers that both make sense.

Most computers these days are byte-addressable, meaning that every byte in memory has a unique memory address. Numeric types in C++ usually occupy multiple bytes. For example, a short may occupy 2 bytes. Imagine that your program contains the following line:

short myShort { 513 };

In binary, the number 513 is 0000 0010 0000 0001. This number contains 16 ones and zeros, or 16 bits. Because there are 8 bits in a byte, the computer needs 2 bytes to store the number. Because each individual memory address contains 1 byte, the computer needs to split the number up into multiple bytes. Assuming that a short is 2 bytes, the number is split into two parts. The higher part of the number is put into the high-order byte, and the lower part of the number is put into the low-order byte. In this case, the high-order byte is 0000 0010 and the low-order byte is 0000 0001.

Now that the number has been split up into memory-sized parts, the only question that remains is how to store them in memory. Two bytes are needed, but the order of the bytes is undetermined and, in fact, depends on the architecture of the system in question.

One way to represent the number is to put the high-order byte first in memory and the low-order byte next. This strategy is called big-endian ordering because the bigger part of the number comes first. PowerPC and SPARC processors use a big-endian approach. Some other processors, such as x86, arrange the bytes in the opposite order, putting the low-order byte first in memory. This approach is called little-endian ordering because the smaller part of the number comes first. An architecture may choose one approach or the other, usually based on backward compatibility. For the curious, the terms big-endian and little-endian predate modern computers by several hundred years. Jonathan Swift coined the terms in his eighteenth-century novel Gulliver’s Travels to describe the opposing camps of a debate about the proper end on which to break an egg.

Regardless of the endianness a particular architecture uses, your programs can continue to use numerical values without paying any attention to whether the machine uses big-endian ordering or little-endian ordering. That ordering only comes into play when data moves between architectures. For example, if you are sending binary data across a network, you may need to consider the endianness of the other system. One solution is to use the standard network byte ordering, which is always big-endian. So, before sending data across a network, you convert it to big-endian, and whenever you receive data from a network, you convert it from big-endian to the native endianness of your system.

Similarly, if you are writing binary data to a file, you may need to consider what happens when that file is opened on a system with opposite byte ordering.
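One way to make binary data independent of the native byte ordering is to build the byte sequence explicitly with shifts, as in the following sketch. The toBigEndian() and fromBigEndian() functions are hypothetical helpers written for this illustration, not part of any library. Because shifts operate on values rather than on memory, the same code produces the same bytes on big-endian and little-endian machines.

```cpp
#include <array>
#include <cstdint>

// Splits a 32-bit value into 4 bytes, high-order byte first (big-endian).
std::array<std::uint8_t, 4> toBigEndian(std::uint32_t value)
{
    return { static_cast<std::uint8_t>(value >> 24),
             static_cast<std::uint8_t>(value >> 16),
             static_cast<std::uint8_t>(value >> 8),
             static_cast<std::uint8_t>(value) };
}

// Reassembles a 32-bit value from 4 bytes stored high-order byte first.
std::uint32_t fromBigEndian(const std::array<std::uint8_t, 4>& bytes)
{
    return (static_cast<std::uint32_t>(bytes[0]) << 24)
         | (static_cast<std::uint32_t>(bytes[1]) << 16)
         | (static_cast<std::uint32_t>(bytes[2]) << 8)
         |  static_cast<std::uint32_t>(bytes[3]);
}
```

For the value 513 used earlier, toBigEndian() yields the bytes 0, 0, 2, 1, and fromBigEndian() turns those bytes back into 513 regardless of the endianness of the machine.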

The Standard Library includes the std::endian enumeration, defined in <bit>, which can be used to determine whether the current system is a big- or little-endian system. The following code snippet outputs the native byte ordering of your system:

switch (endian::native)
{
    case endian::little:
        println("Native ordering is little-endian.");
        break;
    case endian::big:
        println("Native ordering is big-endian.");
        break;
}

Implementation Issues

When a C++ compiler is written, it is designed by a human being who attempts to adhere to the C++ standard. Unfortunately, the C++ standard is more than 2,000 pages long and written in a combination of prose, pseudocode, and examples. Two human beings implementing a compiler according to such a standard are unlikely to interpret every piece of prescribed information in the same way or to catch every single edge case. As a result, compilers will have bugs.

Compiler Quirks and Extensions

There is no simple rule for finding or avoiding compiler bugs. The best you can do is to stay up to speed on compiler updates and perhaps subscribe to a mailing list or newsgroup for your compiler. If you suspect that you have encountered a compiler bug, a simple web search for the error message or condition you have witnessed could uncover a workaround or patch.

One area that compilers are notorious for having trouble with is language features that are added by recent updates to the standard. In recent years, though, vendors of major compilers have been quick to add support for the latest features.

Another issue to be aware of is that compilers often include their own language extensions without making it obvious to the programmer. For example, variable-length stack-based arrays (VLAs) are not part of the C++ language; however, they are part of the C language. Some compilers support both the C and C++ standards and can allow the use of VLAs in C++ code. One such compiler is g++. The following compiles and runs as expected with the g++ compiler:

int i { 4 };
char myStackArray[i];  // Not a standard language feature!

Some compiler extensions may be useful, but if there is a chance that you will switch compilers at some point, you should see if your compiler has a strict mode where it avoids using such extensions. For example, compiling the previous code with the -pedantic flag passed to g++ yields the following warning:

warning: ISO C++ forbids variable length array 'myStackArray' [-Wvla]
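A portable alternative to a variable-length array is a Standard Library container whose size is chosen at run time. The following is a minimal sketch; makeBuffer() is a hypothetical helper name, and the buffer lives on the free store instead of the stack.

```cpp
#include <cstddef>
#include <vector>

// Returns a zero-initialized character buffer whose size is only known
// at run time. This is standard C++ and compiles with any conforming compiler.
std::vector<char> makeBuffer(std::size_t size)
{
    return std::vector<char>(size);
}
```

A call such as makeBuffer(i) then takes the place of the non-standard array declaration.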

The C++ specification allows for a certain type of compiler-defined language extension through the #pragma mechanism. #pragma is a preprocessor directive whose behavior is defined by the implementation. If the implementation does not understand the directive, it ignores it. For example, some compilers allow the programmer to turn compiler warnings off temporarily with #pragma.
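As an illustration, the following sketch temporarily disables a deprecation warning around a single function. It assumes the pragma spellings used by Microsoft Visual C++ (#pragma warning) and by GCC and Clang (#pragma GCC diagnostic); other compilers may use different pragmas or none at all. The legacyAnswer() and callLegacyQuietly() functions are hypothetical.

```cpp
// Hypothetical legacy function; calling it normally triggers a
// deprecation warning.
[[deprecated("kept only for illustration")]]
int legacyAnswer() { return 42; }

// Save the current warning state and silence the deprecation warning.
#if defined(_MSC_VER)
    #pragma warning(push)
    #pragma warning(disable : 4996)   // Use of a deprecated declaration.
#elif defined(__GNUC__) || defined(__clang__)
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif

// The call to the deprecated function compiles without a warning here.
int callLegacyQuietly()
{
    return legacyAnswer();
}

// Restore the warning state for the rest of the file.
#if defined(_MSC_VER)
    #pragma warning(pop)
#elif defined(__GNUC__) || defined(__clang__)
    #pragma GCC diagnostic pop
#endif
```

Wrapping the pragmas in compiler checks keeps the intent explicit and avoids relying on unknown pragmas being silently ignored.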

Library Implementations

Most likely, your compiler includes an implementation of the C++ Standard Library. Because the Standard Library is written in C++, however, you aren’t required to use the implementation that came bundled with your compiler. You could use a third-party Standard Library that, for example, has been optimized for speed, or you could even write your own.

Of course, Standard Library implementers face the same problems that compiler writers face: the standard is subject to interpretation. In addition, certain implementations may make tradeoffs that are incompatible with your needs. For example, one implementation may optimize for speed, while another implementation may focus on optimizing for being able to catch misuses at run time.

When working with a Standard Library implementation, or indeed any third-party library, it is important to consider the tradeoffs that the designers made during the development. Chapter 4 contains a more detailed discussion of the issues involved in using libraries.

Handling Different Implementations

As discussed in the previous sections, not all compilers and Standard Library implementations behave exactly the same. This is something you need to keep in mind when doing cross-platform development. Concretely, as a developer, you are most likely using a single toolchain, that is, a single compiler with a single Standard Library implementation. It’s unlikely that you will personally verify all your code changes with all toolchains that your product must build with. The solution: continuous integration and automated testing.

You should set up a continuous integration environment that automatically builds all code changes on all toolchains that you need to support. The moment a build breaks on one of the toolchains, the developer who broke the build should automatically be informed.

Not all development environments use the same project files to describe all the source files, compiler switches, and so on. If you need to support multiple environments, manually maintaining separate project files for each is a maintenance nightmare. Instead, it’s better to use a single type of project file or single set of build scripts that can then automatically be transformed to concrete project files or concrete build scripts for specific toolchains. One such tool is called CMake. The collection of source files, compiler switches, libraries to link with, and so on, are described in CMake configuration files, which also have support for scripting. The CMake tool then automatically generates project files, for example for Visual C++ for development on Windows, or Xcode for development on macOS.

Once the continuous integration environment produces a build, automated testing should be triggered for that build. This should run a suite of test scripts on the produced executable to verify its correct behavior. Also in this step, if something goes wrong, the developer should automatically be informed.

Platform-Specific Features

C++ is a great general-purpose language. Thanks to the Standard Library, the language is packed with so many features that a casual programmer could happily code in C++ for years without going beyond what is built in. However, professional programs require facilities that C++ does not provide. This section lists several important features that are provided by the platform or third-party libraries, not by the C++ language or the C++ Standard Library:

Graphical user interfaces: Most commercial programs today run on an operating system that has a graphical user interface, containing such elements as clickable buttons, movable windows, and hierarchical menus. C++, like the C language, has no notion of these elements. To write a graphical application in C++, you can use platform-specific libraries that allow you to draw windows, accept input through the mouse, and perform other graphical tasks. A better option is to use a third-party library, such as wxWidgets (wxwidgets.org), Qt (qt.io), Uno (platform.uno), and many more that provide an abstraction layer for building graphical applications. These libraries often provide support for many different target platforms.
Networking: The Internet has changed the way we write applications. These days, most applications check for updates through the web, and games provide a networked multiplayer mode. C++ does not provide a mechanism for networking yet, though several well-established libraries exist. The most common means of writing networking software is through an abstraction called sockets. A socket library implementation can be found on most platforms, and it provides a simple procedure-oriented way to transfer data over a network. Some platforms support a stream-based networking system that operates like I/O streams in C++. There are also third-party networking libraries available that provide a networking abstraction layer. These libraries often support many different target platforms. A networking library that is independent of the IP version is a better choice than one that supports only IPv4, as IPv6 is already in use.
OS events and application interaction: In pure C++ code, there is little interaction with the surrounding operating system and other applications. The command-line arguments are about all you get in a standard C++ program without platform extensions. For example, operations such as copy and paste (which interact with the operating system’s “clipboard”) are not directly supported in C++. You can either use platform-provided libraries or use third-party libraries that support multiple platforms. For example, both wxWidgets and Qt are examples of libraries that abstract the clipboard operations and support multiple platforms.
Low-level files: Chapter 13, “Demystifying C++ I/O,” explains standard I/O in C++, including reading and writing files. Many operating systems provide their own file APIs, which are usually incompatible with the standard file classes in C++. These libraries often provide OS-specific file tools, such as a mechanism to get the home directory of the current user.
Threads: Concurrent threads of execution within a single program were not directly supported in C++03 or earlier. Since C++11, a threading support library has been included with the Standard Library, as explained in Chapter 27, “Multithreaded Programming with C++,” and C++17 has added parallel algorithms, as discussed in Chapter 20, “Mastering Standard Library Algorithms.” If you need more powerful threading functionality besides what the Standard Library provides, then you need to use third-party libraries. Two examples are Intel’s Threading Building Blocks (TBB) and the STE||AR Group’s High Performance ParalleX (HPX) library.
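Code that relies on platform-provided facilities is typically isolated behind a small function that hides the differences with conditional compilation. The following sketch looks up the home directory of the current user through environment variables. It assumes the common conventions that Windows sets USERPROFILE and that Unix-like systems set HOME; both helper functions are hypothetical, and a production version would more likely call the platform's own API.

```cpp
#include <cstdlib>    // std::getenv
#include <optional>
#include <string>

// Returns the name of the environment variable that conventionally holds
// the home directory on the platform being compiled for (an assumption).
const char* homeVariableName()
{
#if defined(_WIN32)
    return "USERPROFILE";
#else
    return "HOME";
#endif
}

// Returns the home directory of the current user, or an empty optional
// if the environment variable is not set.
std::optional<std::string> homeDirectory()
{
    if (const char* value = std::getenv(homeVariableName())) {
        return std::string { value };
    }
    return std::nullopt;
}
```

The rest of the program calls homeDirectory() and never needs to know which platform it runs on, so supporting another platform means changing only this one place.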

CROSS-LANGUAGE DEVELOPMENT

For certain types of programs, C++ may not be the best tool for the job. For example, if your Unix program needs to interact closely with the shell environment, you may be better off writing a shell script than a C++ program. If your program performs heavy text processing, you may decide that the Perl language is the way to go. If you need a lot of database interaction, then C# or Java might be a better choice. C# in combination with the WPF framework or the Uno platform might be better suited to write modern graphical user interface applications, and so on. Still, if you do decide to use another language, you sometimes might want to be able to call into C++ code, for example, to perform some computationally expensive operations; or the other way around, you might want to call non-C++ code from C++. Fortunately, there are some techniques you can use to get the best of both worlds: the specific specialty of another language combined with the power and flexibility of C++.

Mixing C and C++

As you already know, the C++ language is almost a superset of the C language. Most C code can easily be converted to C++, but there are a few things to keep in mind. A handful of C features are not supported by C++; for example, C supports variable-length arrays (VLAs), while C++ does not. Other things to keep in mind are the use of reserved words. In C, for example, the term class has no particular meaning. Thus, it could be used as a variable name, as in the following C code:

int class = 1; // Compiles in C, not C++
printf("class is %d\n", class);

This program compiles and runs in C but yields an error when compiled as C++ code. When you translate, or port, a program from C to C++, these are the types of errors you will face. Fortunately, the fixes are usually quite simple. In this case, rename the class variable to, e.g., classID and the code will compile.

On the other hand, most C++ toolchains also include a C compiler. There is no reason to compile “C as C++”; you can just compile “C as C”. If your project consists of a mixture of C and C++, you can simply link the C and C++ object files together into the final executable. This ease of incorporating C code in a C++ program comes in handy when you encounter a useful library or legacy code that was written in C. Functions and classes, as you’ve seen many times in this book, work just fine together. A class member function can call a function, and a function can make use of objects.

Shifting Paradigms

One of the dangers of mixing C and C++ is that your program may start to lose its object-oriented properties. For example, if your object-oriented web browser is implemented with a procedural networking library, the program will be mixing these two paradigms. Given the importance and quantity of networking tasks in such an application, you might consider writing an object-oriented wrapper around the procedural library. A typical design pattern that can be used for this is called the façade.

For example, imagine that you are writing a web browser in C++, but you are using a networking library that has a C-style API and contains the functions declared in the following code. Note that the HostHandle and ConnectionHandle data structures have been omitted for brevity.

#include "HostHandle.h"
#include "ConnectionHandle.h"

// Gets the host record for a particular Internet host given
// its hostname (e.g., www.host.com).
HostHandle* lookupHostByName(const char* hostName);
// Frees the given HostHandle.
void freeHostHandle(HostHandle* host);

// Connects to the given host.
ConnectionHandle* connectToHost(HostHandle* host);
// Closes the given connection.
void closeConnection(ConnectionHandle* connection);

// Retrieves a web page from an already-opened connection.
char* retrieveWebPage(ConnectionHandle* connection, const char* page);
// Frees the memory pointed to by page.
void freeWebPage(char* page);

The networklib.h interface is fairly simple and straightforward. However, it is not object-oriented, and a C++ programmer who uses such a library is bound to feel icky, to use a technical term. This library isn’t organized into a cohesive class. Of course, the authors of the library could have written a better interface, but as the user of a library, you have to accept what you are given. Writing a wrapper is your opportunity to customize the interface.

Before you build an object-oriented wrapper for this library, take a look at how it might be used as-is to gain an understanding of its actual usage. In the following program, the networklib library is used to retrieve the web page at www.example.com/index.html:

HostHandle* myHost { lookupHostByName("www.example.com") };
ConnectionHandle* myConnection { connectToHost(myHost) };
char* result { retrieveWebPage(myConnection, "/index.html") };

println("The result is:\n{}", result);

freeWebPage(result); result = nullptr;
closeConnection(myConnection); myConnection = nullptr;
freeHostHandle(myHost); myHost = nullptr;

A possible way to make the library more object-oriented is to provide a single abstraction that recognizes the commonality between looking up a host, connecting to the host, and retrieving a web page. A good object-oriented wrapper hides the needless complexity of the HostHandle and ConnectionHandle types.

This example follows the design principles described in Chapters 5, “Designing with Classes,” and 6, “Designing for Reuse”: the new class should capture the common use case for the library. The previous example shows the most frequently used pattern: first a host is looked up, then a connection is established, and finally a page is retrieved. It is also likely that subsequent pages will be retrieved from the same host, so a good design will accommodate that mode of use as well.

To start, the HostRecord class wraps the functionality of looking up a host. It’s an RAII class. Its constructor uses lookupHostByName() to perform the lookup. The unique_ptr data member uses a custom deleter to automatically free the retrieved HostHandle by calling freeHostHandle(). See Chapter 7, “Memory Management,” for a discussion of using custom deleters with unique_ptr. Here is the code:

export class HostRecord final
{
    public:
        // Looks up the host record for the given host.
        explicit HostRecord(const std::string& host)
            : m_hostHandle { lookupHostByName(host.c_str()), freeHostHandle }
        { }
        // Returns the underlying handle.
        HostHandle* get() const noexcept { return m_hostHandle.get(); }
    private:
        std::unique_ptr<HostHandle, decltype(&freeHostHandle)> m_hostHandle;
};

Next, a WebHost class is implemented that uses the HostRecord class. The WebHost class creates a connection to a given host and supports retrieving webpages. It’s also an RAII class. When the WebHost object is destroyed, it automatically closes the connection to the host. The getPage() member function calls retrieveWebPage() and immediately stores the result in a unique_ptr with a custom deleter, freeWebPage(). Here is the code:

export class WebHost final
{
    public:
        // Connects to the given host.
        explicit WebHost(const std::string& host);
        // Obtains the given page from this host.
        std::string getPage(const std::string& page);
    private:
        std::unique_ptr<ConnectionHandle, decltype(&closeConnection)> m_connection
            { nullptr, closeConnection };
};

WebHost::WebHost(const std::string& host)
{
    HostRecord hostRecord { host };
    if (hostRecord.get()) {
        m_connection = { connectToHost(hostRecord.get()), closeConnection };
    }
}

std::string WebHost::getPage(const std::string& page)
{
    std::string resultAsString;
    if (m_connection) {
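        std::unique_ptr<char[], decltype(&freeWebPage)> result {
            retrieveWebPage(m_connection.get(), page.c_str()),
            freeWebPage };
        // Assumption: retrieveWebPage() may return nullptr on failure.
        // Assigning a null pointer to an std::string is undefined behavior.
        if (result) {
            resultAsString = result.get();
        }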
    }
    return resultAsString;
}

The WebHost class effectively encapsulates the behavior of a host and provides useful functionality without unnecessary calls and data structures. The implementation of the WebHost class makes extensive use of the networklib library without exposing any of its workings to the user. The constructor of WebHost uses a HostRecord RAII object for the specified host. The resulting HostRecord is used to set up a connection to the host, which is stored in the m_connection data member for later use. The HostRecord RAII object is automatically destroyed at the end of the constructor. The WebHost destructor destroys m_connection which closes the connection. The getPage() member function uses retrieveWebPage() to retrieve a web page, converts it to an std::string, uses freeWebPage() to free memory, and returns the retrieved page as an std::string.

The WebHost class makes the common case easy for the client programmer. Here is an example:

WebHost myHost { "www.example.com" };
string result { myHost.getPage("/index.html") };
println("The result is:\n{}", result);

As you can see, the WebHost class provides an object-oriented wrapper around the C-style library. By providing an abstraction, you can change the underlying implementation without affecting client code, and you can provide additional features. These features can include connection reference counting, automatically closing connections after a specific time to adhere to the HTTP specification, automatically reopening the connection on the next getPage() call, and so on.

You’ll explore writing wrappers a bit more in one of the exercises at the end of this chapter.

Linking with C Code

The previous example assumed that you had the raw C code to work with. The example took advantage of the fact that most C code will successfully compile with a C++ compiler. If you only have compiled C code, perhaps in the form of a library, you can still use it in your C++ program, but you need to take a few extra steps.

Before you can start using compiled C code in your C++ programs, you first need to know about a concept called name mangling. To implement function overloading, the complex C++ namespace is “flattened.” For example, if you have a C++ program, it is legitimate to write the following:

void myFunc(double);
void myFunc(int);
void myFunc(int, int);

However, this would mean that the linker would see several different functions, all called myFunc, and would not know which one you want to call. Therefore, all C++ compilers perform an operation that is referred to as name mangling and is the logical equivalent of generating names, as follows:

myFunc_double
myFunc_int
myFunc_int_int

To avoid conflicts with other names you might have defined, the compiler might generate names that are reserved as identifiers, for example, names beginning with double underscores or names beginning with an underscore followed by an uppercase letter. Alternatively, some compilers generate names that have characters that are legal to the linker but not legal in C++ source code. For example, Microsoft VC++ generates names as follows:

?myFunc@@YAXN@Z
?myFunc@@YAXH@Z
?myFunc@@YAXHH@Z

This encoding is complex and often vendor specific. The C++ standard does not specify how function overloading should be implemented on a given platform, so there is no standard for name mangling algorithms.

In C, function overloading is not supported (the compiler will complain about duplicate definitions). So, names generated by the C compiler are quite simple, for example, _myFunc.

Now, if you compile a simple program with the C++ compiler, even if it has only one instance of the myFunc name, it still generates a request to link to a mangled name. However, when you link with the C library, it cannot find the desired mangled name, and the linker complains. Therefore, it is necessary to tell the C++ compiler to not mangle that name. This is done by using the extern "C" qualification both in the header file (to instruct the client code to create a name compatible with C) and, if your library source is in C++, at the definition site (to instruct the library code to generate a name compatible with C).

Here is the syntax of extern "C":

extern "C" declaration1();
extern "C" declaration2();

or:

extern "C" {
    declaration1();
    declaration2();
}

The C++ standard says that any language specification can be used, so in principle, the following could be supported by a compiler:

extern "C" void myFunc(int i);
extern "Fortran" Matrix* matrixInvert(Matrix* M);
extern "Pascal" void someLegacySubroutine(int n);
extern "Ada" bool aimMissileDefense(double angle);

In practice, many compilers only support "C". Each compiler vendor will inform you which language designators they support.

As an example, the following code specifies the function prototype for cFunction() as an external C function:

extern "C" {
    void cFunction(int i);
}

int main()
{
    cFunction(8); // Calls the C function.
}

The actual definition for cFunction() is provided in a compiled binary file attached in the link phase. The extern "C" specification informs the compiler that cFunction() has C linkage, so the call refers to the unmangled name produced by the C compiler.

A more common pattern for using extern "C" is at the header level. For example, if you are using a graphics library written in C, it probably came with a .h file for you to include. The author of this header file should condition it on whether it is being compiled for C or C++. A C++ compiler predefines the symbol __cplusplus if you are compiling for C++. The symbol is not defined for C compilations. This symbol can be used to condition a header file as follows:

#ifdef __cplusplus
    extern "C" {
#endif
        drawCircle();
        drawSquare();
#ifdef __cplusplus
    } // matches extern "C"
#endif

This means that drawCircle() and drawSquare() are functions that are in a library compiled by the C compiler. Using this technique, the same header file can be used in both C and C++ clients.
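A slightly fuller sketch of such a header might look as follows. The file name shapes.h, the include guard, the parameter lists, and the return values are assumptions made purely for illustration; the chapter does not show the real prototypes of any graphics library. The two definitions at the bottom only stand in for the compiled C library so that the sketch links on its own.

```cpp
// ---- shapes.h (hypothetical header shipped with a C library) ----
#ifndef SHAPES_H
#define SHAPES_H

#ifdef __cplusplus
extern "C" {            // C++ clients: refer to unmangled C names.
#endif

/* Each function returns the number of shapes drawn so far. */
int drawCircle(int radius);
int drawSquare(int side);

#ifdef __cplusplus
}                       // matches extern "C"
#endif

#endif /* SHAPES_H */

// ---- stand-in for the compiled C library (assumption) ----
// In a real project, these definitions live in the library binary.
namespace { int g_shapesDrawn { 0 }; }

extern "C" int drawCircle(int radius)
{
    if (radius > 0) { ++g_shapesDrawn; }   // Ignore invalid sizes.
    return g_shapesDrawn;
}

extern "C" int drawSquare(int side)
{
    if (side > 0) { ++g_shapesDrawn; }     // Ignore invalid sizes.
    return g_shapesDrawn;
}
```

A C client sees only the plain prototypes, while a C++ client sees them wrapped in extern "C", so both link against the same unmangled symbols.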

Whether you are including C code in your C++ program or linking against a compiled C library, remember that even though C++ is almost a superset of C, they are different languages with different design goals. Adapting C code to work in C++ is quite common, but providing an object-oriented C++ wrapper around procedural C code is often much better.

Calling C++ Code from C#

Even though this is a C++ book, I won’t pretend that there aren’t other languages out there. One example is C#. By using the Interop services from C#, it’s pretty easy to call C++ code from within your C# applications. An example scenario could be that you develop parts of your application, like the graphical user interface, in C#, but use C++ to implement certain performance-critical or computationally expensive components. To make Interop work, you need to write a library in C++ that can be called from C#. On Windows, the library will be in a .dll file. The following C++ example defines a functionInDLL() function that is compiled into a library. The function accepts a Unicode string and returns an integer. The implementation writes the received string to the console and returns the value 42 to the caller:

import std;
using namespace std;

extern "C"
{
    __declspec(dllexport) int functionInDLL(const wchar_t* p)
    {
        wcout << format(L"The following string was received by C++: '{}'", p)
              << endl;
        return 42;    // Return some value…
    }
}

Keep in mind that you are implementing a function in a library, not writing a program, so you do not need a main() function. How you compile this code depends on your development environment. If you are using Microsoft Visual C++, you need to go to the properties of your project and select Dynamic Library (.dll) as the configuration type. The example uses __declspec(dllexport) to tell the linker that this function should be made available to clients of the library. This is the way you do it with Microsoft Visual C++. Other linkers might use a different mechanism to export functions.
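As a rough sketch of what such a different mechanism can look like, the following variant hides the export syntax behind a macro. The macro name HELLO_API is hypothetical. The non-Windows branch assumes a GCC-compatible compiler (GCC or Clang), which accepts the visibility attribute; other toolchains may need something else entirely.

```cpp
#include <iostream>

// HELLO_API is a hypothetical macro name chosen for this sketch.
#if defined(_WIN32)
    #define HELLO_API __declspec(dllexport)                     // Windows toolchains
#elif defined(__GNUC__)
    #define HELLO_API __attribute__((visibility("default")))   // GCC and Clang
#else
    #define HELLO_API                                           // Assumption: exported by default
#endif

extern "C" HELLO_API int functionInDLL(const wchar_t* p)
{
    // Same behavior as the original example, using plain stream output.
    std::wcout << L"The following string was received by C++: '"
               << p << L"'" << std::endl;
    return 42;    // Return some value.
}
```

With this approach, the function body stays identical on every platform; only the macro definition changes.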

Once you have the library, you can call it from C# by using the Interop services. First, you need to include the Interop namespace:

using System.Runtime.InteropServices;

Next, you define the function prototype and tell C# where it can find the implementation of the function. This is done with the following statement, assuming you have compiled the library as HelloCpp.dll:

[DllImport("HelloCpp.dll", CharSet = CharSet.Unicode)]
public static extern int functionInDLL(String s);

The first line is saying that C# should import this function from a library called HelloCpp.dll and that it should use Unicode strings. The second line specifies the actual prototype of the function, which is a function accepting a string as parameter and returning an integer. The following code shows a complete example of how to use the C++ library from C#:

using System;
using System.Runtime.InteropServices;

namespace HelloCSharp
{
    class Program
    {
        [DllImport("HelloCpp.dll", CharSet = CharSet.Unicode)]
        public static extern int functionInDLL(String s);

        static void Main(string[] args)
        {
            Console.WriteLine("Written by C#.");
            int result = functionInDLL("Some string from C#.");
            Console.WriteLine("C++ returned the value " + result);
        }
    }
}

The output is as follows:

Written by C#.
The following string was received by C++: 'Some string from C#.'
C++ returned the value 42

The details of the C# code are outside the scope of this C++ book, but the general idea should be clear with this example.

This section only talked about calling C++ functions from C# and didn’t say anything about using C++ classes from C#. That will be remedied in the next section with the introduction of C++/CLI.

Use C# Code from C++ and C++ from C# with C++/CLI

To use C# code from C++, you can use C++/CLI. CLI stands for Common Language Infrastructure and is the backbone of all .NET languages such as C#, Visual Basic .NET, and so on. C++/CLI was created by Microsoft in 2005 to be a version of C++ that supports the CLI. In December 2005, C++/CLI was standardized as the ECMA-372 standard. You can write your C++ programs in C++/CLI and gain access to any other piece of functionality written in any other language that supports the CLI, such as C#. Keep in mind, though, that C++/CLI might lag behind the latest C++ standard, meaning that it does not necessarily support all the latest C++ features. Discussing the C++/CLI language in detail is outside the scope of this pure C++ book. Only a few small examples are given.

Suppose you have the following C# class in a C# library:

namespace MyLibrary
{
    public class MyClass
    {
        public double DoubleIt(double value) { return value * 2.0; }
    }
}

You can consume this C# library from your C++/CLI code as follows. The important bits are the gcnew expression and the MyClass^ handle. CLI objects are managed by a memory garbage collector that automatically cleans up memory when memory is not needed anymore. As such, you cannot just use the standard C++ new operator to create managed objects; you have to use gcnew, an abbreviation for “garbage collect new.” Instead of storing the resulting pointer in a normal C++ pointer variable such as MyClass* or in a smart pointer such as std::unique_ptr<MyClass>, you have to store it using a handle, MyClass^, usually pronounced as “MyClass hat.”

#include <iostream>

using namespace System;
using namespace MyLibrary;

int main(array<System::String^>^ args)
{
    MyClass^ instance { gcnew MyClass() };
    auto result { instance->DoubleIt(1.2) };
    std::cout << result << std::endl;
}

C++/CLI can also be used in the other direction; you can write managed C++ ref classes, which are then accessible by any other CLI language. Here is a simple example of a managed C++ ref class:

#pragma once
using namespace System;

namespace MyCppLibrary
{
    public ref class MyCppRefClass
    {
        public:
            double TripleIt(double value) { return value * 3.0; }
    };
}

This C++/CLI ref class can then be used from C# as follows:

using MyCppLibrary;

namespace MyLibrary
{
    public class MyClass
    {
        public double TripleIt(double value)
        {
            // Ask C++ to triple it.
            MyCppRefClass cppRefClass = new MyCppRefClass();
            return cppRefClass.TripleIt(value);
        }
    }
}

As you can see, the basics are not that complicated, but these examples all use primitive datatypes, such as double. It starts to become more complicated if you need to work with non-primitive datatypes such as strings, vectors, and so on, because then you need to start marshaling objects between C# and C++/CLI, and vice versa. However, this would take us too far for this brief introduction to C++/CLI.

Calling C++ Code from Java with JNI

The Java Native Interface (JNI) is part of the Java language that allows programmers to access functionality that was not written in Java. Because Java is a cross-platform language, the original intent was to make it possible for Java programs to interact with the operating system. JNI also allows programmers to make use of libraries written in other languages, such as C++. Access to C++ libraries may be useful to a Java programmer who has a performance-critical or computationally expensive piece of code or who needs to use legacy code.

JNI can also be used to execute Java code within a C++ program, but such a use is far less common. Because this is a C++ book, I do not include an introduction to the Java language. This section is recommended if you already know Java and want to incorporate C++ code into your Java code.

To begin your Java cross-language adventure, start with the Java program. For this example, the simplest of Java programs will suffice:

public class HelloCpp
{
    public static void main(String[] args)
    {
        System.out.println("Hello from Java!");
    }
}

Next, you need to declare a Java method that will be written in another language. To do this, you use the native keyword and leave out the implementation:

// This will be implemented in C++.
public static native void callCpp();

The C++ code will eventually be compiled into a shared library that gets dynamically loaded into the Java program. You can load this library inside a Java static block so that it is loaded when the Java program begins executing. The base name of the library can be whatever you want; System.loadLibrary() maps it to a platform-specific file name. For the base name hellocpp, that is libhellocpp.so on Linux systems and hellocpp.dll on Windows systems.

static { System.loadLibrary("hellocpp"); }

Finally, you need to actually call the C++ code from within the Java program. The callCpp() Java method serves as a placeholder for the not-yet-written C++ code. Here is the complete Java program:

public class HelloCpp
{
    static { System.loadLibrary("hellocpp"); }

    // This will be implemented in C++.
    public static native void callCpp();

    public static void main(String[] args)
    {
        System.out.println("Hello from Java!");
        callCpp();
    }
}

That’s all for the Java side. Now, just compile the Java program as you normally would:

javac HelloCpp.java

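Then use the javah program (I like to pronounce it as jav-AHH!) to create a header file for the native function. The javah tool was removed in JDK 10; with newer JDKs, running javac -h . HelloCpp.java generates the same header while compiling. With an older JDK, the command is: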

javah HelloCpp

After running javah, you will find a file named HelloCpp.h, which is a fully working (if somewhat ugly) C/C++ header file. Inside that header file is a C function declaration for a function called Java_HelloCpp_callCpp(). Your C++ program will need to implement this function. The full prototype is as follows:

JNIEXPORT void JNICALL Java_HelloCpp_callCpp(JNIEnv*, jclass);

Your C++ implementation of this function can make full use of the C++ language. This example just outputs some text from C++. First, you need to include the jni.h header file and the HelloCpp.h file that was created by javah. You also need to include any C++ headers that you intend to use.

#include <jni.h>
#include "HelloCpp.h"
#include <iostream>

The C++ function is written as normal. The parameters to the function allow interaction with the Java environment and the object that called the native code. They are beyond the scope of this example.

JNIEXPORT void JNICALL Java_HelloCpp_callCpp(JNIEnv*, jclass)
{
    std::cout << "Hello from C++!" << std::endl;
}

How to compile this code into a library depends on your environment, but you will most likely need to tweak your compiler’s settings to include the JNI headers. Using the GCC compiler on Linux, your compile command might look like this:

g++ -shared -fPIC -I/usr/java/jdk/include/ -I/usr/java/jdk/include/linux \
HelloCpp.cpp -o libhellocpp.so

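The output from the compiler is the library used by the Java program. As long as the shared library is somewhere on the Java library path (the java.library.path system property, which you can set with -Djava.library.path=<directory> on the java command line), you can execute the Java program as you normally would: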

java HelloCpp

You should see the following result:

Hello from Java!
Hello from C++!

Of course, this example just scratches the surface of what is possible through JNI. You could use JNI to interface with OS-specific features or hardware drivers. For complete coverage of JNI, you should consult a Java text.

Calling Scripts from C++ Code

The original Unix OS included a rather limited C library, which did not support certain common operations. Unix programmers therefore developed the habit of launching scripts from applications to accomplish tasks that should have had API or library support. Scripts can be written in languages such as Perl and Python, but they can also be shell scripts for executing in a shell such as Bash.

Today, many of these Unix programmers still insist on using scripts as a form of subroutine call. To enable this kind of interoperability, C++ provides the std::system() function defined in <cstdlib>. It requires only a single argument, a string representing the command you want to execute. Here are some examples:

system("python my_python_script.py");  // Launch a Python script.
system("perl my_perl_script.pl");      // Launch a Perl script.
system("my_shell_script.sh");          // Launch a Shell script.

However, there are significant risks to this approach. For example, if there is an error in the script, the caller may or may not get a detailed error indication. The system() call is also exceptionally heavy-duty, because it has to create an entire new process to execute the script. This may ultimately be a serious performance bottleneck in your application.
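To make the error-reporting concern concrete, here is a minimal sketch of a wrapper that at least distinguishes success from failure. The helper name runCommand is hypothetical. The sketch relies on the convention that a result of 0 means success; the exact meaning of any other value returned by std::system() is implementation-defined, so the helper deliberately reports nothing more detailed than a Boolean.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns true only if a command processor exists
// and the command reported success.
bool runCommand(const std::string& command)
{
    // std::system(nullptr) returns nonzero if a command processor is available.
    if (std::system(nullptr) == 0) {
        return false;
    }

    // The returned value is implementation-defined; by convention, 0 means
    // that the command ran and exited successfully.
    const int status { std::system(command.c_str()) };
    return status == 0;
}
```

Even with such a wrapper, the caller learns only that something went wrong, not what, which is one reason to prefer a library call over a script when one is available.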

Using system() to launch scripts is not further discussed in this text. In general, you should explore the features of C++ libraries to see if there are better ways to do something. There are platform-independent wrappers around many platform-specific libraries, for example, the Boost Asio library, which provides portable networking and other low-level I/O, including sockets, timers, serial ports, and so on. If you need to work with a filesystem, you can use the platform-independent <filesystem> API, available as part of the C++ Standard Library since C++17 and discussed in Chapter 13. Likewise, launching a Perl script with system() just to process some textual data may not be the best choice. The regular expressions library of C++, discussed in Chapter 21, “String Localization and Regular Expressions,” might be a better fit for your string processing needs.
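As a small sketch of the regular-expression alternative, the following helper extracts the text after a Password: prompt from a single line, entirely inside the C++ process. The function name extractPassword and the choice of std::optional for the return type are assumptions made for this illustration.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns the text following "Password: " if the
// line starts with that prompt, or std::nullopt otherwise.
std::optional<std::string> extractPassword(const std::string& line)
{
    // Compiled once; the first capture group holds the password text.
    static const std::regex pattern { "^Password: (.*)" };

    std::smatch match;
    if (std::regex_search(line, match, pattern)) {
        return match[1].str();
    }
    return std::nullopt;
}
```

No extra process is created, and a failed match is reported through the return type rather than through an exit status.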

Calling C++ Code from Scripts

C++ contains a built-in general-purpose mechanism to interface with other languages and environments. You’ve already used it many times, probably without paying much attention to it—it’s the arguments to and return value from the main() function.

C and C++ were designed with command-line interfaces in mind. The main() function receives the arguments from the command line and returns a status code that can be interpreted by the caller. In a scripting environment, arguments to and status codes from your program can be a powerful mechanism that allows you to interface with the environment.

A Practical Example: Encrypting Passwords

Assume that you have a system that writes everything a user sees and types to a file for auditing purposes. The file can be read only by the system administrator so that she can figure out who to blame if something goes wrong. An excerpt of such a file might look like this:

Login: bucky-bo
Password: feldspar

bucky-bo > mail
bucky-bo has no mail
bucky-bo > exit

While the system administrator may want to keep a log of all user activity, she may also want to obscure everybody’s passwords in case the file is somehow obtained by a hacker. She decides to write a script to parse the log files and to use C++ to perform the actual encryption. The script then calls out to a C++ program to perform the encryption.

The following script uses the Perl language, though almost any scripting language could accomplish this task. Note also that these days, there are libraries available for Perl that perform encryption, but, for the sake of this example, let’s assume the encryption is done in C++. If you don’t know Perl, you will still be able to follow along. The most important element of the Perl syntax for this example is the ` character. The ` character instructs the Perl script to shell out to an external command. In this case, the script will shell out to a C++ program called encryptString.

The strategy for the script is to loop over every line of a file, userlog.txt, looking for lines that contain a password prompt. The script writes a new file, userlog.out, which contains the same text as the source file, except that all passwords are encrypted. The first step is to open the input file for reading and the output file for writing. Then, the script needs to loop over all the lines in the file. Each line in turn is placed in a variable called $line.

open (INPUT, "userlog.txt") or die "Couldn't open input file!";
open (OUTPUT, ">userlog.out") or die "Couldn't open output file!";
while ($line = <INPUT>) {

Next, the current line is checked against a regular expression to see if this particular line contains the Password: prompt. If it does, Perl stores the password in the variable $1.

    if ($line =~ m/^Password: (.*)/) {

If a match is found, the script calls the encryptString program with the detected password to obtain an encrypted version of it. The output of the program is stored in the $result variable, and the result status code from the program is stored in the variable $?. The script checks $? and quits immediately if there is a problem. If everything is okay, the password line is written to the output file with the encrypted password instead of the original one.

        $result = `./encryptString $1`;
        if ($? != 0) { exit(-1); }
        print OUTPUT "Password: $result\n";
    } else {

If the current line is not a password prompt, the script writes the line as is to the output file. At the end of the loop, it closes both files and exits.

        print OUTPUT "$line";
    }
}
close (INPUT);
close (OUTPUT);

That’s it. The only other required piece is the actual C++ program. Implementation of a cryptographic algorithm is beyond the scope of this book. The important piece is the main() function because it accepts the string that should be encrypted as an argument.

Arguments are contained in the argv array of C-style strings. You should always check the argc parameter before accessing an element of argv. If argc is 1, there is one element in the argument list, and it is accessible as argv[0]. Actual command-line parameters begin at argv[1]. The zeroth element of the argv array is generally the name of the program, but because it is controlled by whoever spawned the current process, e.g., by the Linux execve() system call, it can technically hold any data at all. All you’re guaranteed is that each argument in argv[0] through argv[argc-1] is a null-terminated string, and argv[argc] itself is a null pointer.
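The rules above can be sketched as a small helper that copies the real command-line parameters into a vector of strings. The name collectArguments is hypothetical; the loop bounds follow directly from the guarantees just described.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies argv[1] through argv[argc-1] into a vector.
// argv[0] is skipped because it conventionally holds the program name.
std::vector<std::string> collectArguments(int argc, char** argv)
{
    std::vector<std::string> arguments;
    // Never read past argv[argc-1]; argv[argc] is a null pointer.
    for (int i { 1 }; i < argc; ++i) {
        arguments.emplace_back(argv[i]);
    }
    return arguments;
}
```

If the program is started without parameters, argc is 1 and the returned vector is empty, so callers can check its size instead of testing argc directly.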

The following is the main() function for a C++ program that encrypts the input string. Notice that the program returns 0 for success and non-0 for failure, as is standard in Linux.

int main(int argc, char** argv)
{
    if (argc < 2) {
        println(cerr, "Usage: {} string-to-be-encrypted", argv[0]);
        return 1;
    }
    print("{}", encrypt(argv[1]));
}

Now that you’ve seen how easily C++ programs can be incorporated into scripting languages, you can combine the strengths of the two languages for your own projects. You can use a scripting language to interact with the operating system and control the flow of the script, and a traditional programming language like C++ for the heavy lifting.

Calling Assembly Code from C++

C++ is considered a fast language, especially relative to other languages. Yet, in some rare cases, you might want to use raw assembly code when speed is absolutely critical. The compiler generates assembly code from your source files, and this generated assembly code is fast enough for virtually all purposes. Both the compiler and the linker (when it supports link time code generation) use optimization algorithms to make the generated assembly code as fast as possible. These optimizers are getting more and more powerful by using special processor instruction sets such as MMX, SSE, and AVX. These days, it’s hard to write your own assembly code that outperforms the code generated by the compiler, unless you know all the little details of these enhanced instruction sets.

However, in case you do need it, the keyword asm can be used by a C++ compiler to allow the programmer to insert raw assembly code. The keyword is part of the C++ standard, but its implementation is compiler-defined. In some compilers, you can use asm to drop from C++ down to the level of assembly right in the middle of your program. Sometimes, the support for the asm keyword depends on your target architecture, and sometimes a compiler uses a non-standard keyword instead of the asm keyword. For example, Microsoft Visual C++ 2022 does not support the asm keyword. Instead, it supports the __asm keyword when compiling in 32-bit mode, and does not support inline assembly at all when compiling in 64-bit mode.

Assembly code can be useful in some applications, but I don’t recommend it for most programs. There are several reasons to avoid assembly code:

Your code is no longer portable to another processor once you start including raw assembly code for your platform.
Most programmers don’t know assembly languages and won’t be able to modify or maintain your code.
Assembly code is not known for its readability. It can undermine the consistent coding style of the rest of your program.
Most of the time, it is not necessary. If your program is slow, look for algorithmic problems, or consult some of the other performance suggestions from Chapter 29, “Writing Efficient C++.”

When you encounter performance issues in your application, use a profiler to determine the real hotspot, and look into algorithmic speed-ups! Only start thinking about using assembly code if you have exhausted all other options, and even then, think about the disadvantages of assembly code.

Practically, if you have a computationally expensive block of code, you should move it to its own C++ function. If you determine, using performance profiling (see Chapter 29), that this function is a performance bottleneck, and there is no way to write the code smaller and faster, you might use raw assembly code to try to increase its performance.

In such a case, one of the first things you want to do is declare the function extern "C" so the C++ name mangling is suppressed. Then, you can write a separate module in assembly code that performs the function more efficiently. The advantage of a separate module is that there is both a “reference implementation” in C++ that is platform-independent, and also a platform-specific high-performance implementation in raw assembly code. The use of extern "C" means that the assembly code can use a simple naming convention (otherwise, you have to reverse-engineer your compiler’s name mangling algorithm). Then, you can link with either the C++ version or the assembly code version.
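A minimal sketch of this arrangement follows. The function name checksum and the build flag USE_ASM_CHECKSUM are hypothetical; the idea is only that the declaration is shared, while the portable C++ definition is compiled out when a separately assembled module provides the same unmangled symbol.

```cpp
#include <cstddef>
#include <cstdint>

// Shared declaration. C linkage gives the symbol a simple, predictable
// name that an assembly module can also define.
extern "C" std::uint32_t checksum(const std::uint8_t* data, std::size_t length);

#ifndef USE_ASM_CHECKSUM   // Hypothetical build flag.
// Portable reference implementation, used whenever no assembly
// version is linked in.
extern "C" std::uint32_t checksum(const std::uint8_t* data, std::size_t length)
{
    std::uint32_t sum { 0 };
    for (std::size_t i { 0 }; i < length; ++i) {
        sum += data[i];    // Simple additive checksum.
    }
    return sum;
}
#endif
```

Callers are identical in both configurations, and the reference implementation doubles as a test oracle for the assembly version.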

You would write this module in assembly code and run it through an assembler, rather than using the inline asm keyword in C++. This is particularly true in many of the popular x86-compatible 64-bit compilers, where the inline asm keyword is not supported.

Even though it is possible, you should use raw assembly code only if it yields significant performance improvements. Additionally, don’t forget Amdahl’s law. For example, a 10x speedup in your encryption routine sounds great, but if your program spends 90 percent of its time not doing encryption, that 10x speedup applies to only 10 percent of the running time. The total drops from 100 percent to 91 percent, which is just a 9 percent overall improvement!

SUMMARY

If you take away one point from this chapter, it should be that C++ is a flexible language. It exists in the sweet spot between languages that are too tied to a particular platform, and languages that are too high-level and generic. Rest assured that when you develop code in C++, you aren’t locking yourself into the language forever. C++ can be mixed with other technologies and has a solid history and code base that will help guarantee its relevance in the future.

In Part V of this book, I discussed software engineering methods, writing efficient C++, testing and debugging techniques, design techniques and patterns, and cross-platform and cross-language application development. This is a terrific way to end your journey through professional C++ programming because these topics help good C++ programmers become great C++ programmers. By thinking through your designs, experimenting with different approaches in object-oriented programming, selectively adding new techniques to your coding repertoire, and practicing testing and debugging techniques, you’ll be able to take your C++ skills to the professional level.

EXERCISES

By solving the following exercises, you can practice the material discussed in this chapter. Solutions to all exercises are available with the code download on the book’s website at www.wiley.com/go/proc++6e. However, if you are stuck on an exercise, first reread parts of this chapter to try to find an answer yourself before looking at the solution from the website.

Exercise 34-1: Write a program that outputs the sizes of all standard C++ integer types. If possible, try to compile and execute it with different compilers on different platforms.
Exercise 34-2: This chapter introduces the concept of big- and little-endian encoding of integer values. It also explains that over a network, it’s recommended to always use big-endian encoding and to convert as necessary. Write a program that can convert 16-bit unsigned integers between little- and big-endian encoding in both directions. Pay special attention to the data types you use. Write a main() function to test your function.
Bonus exercise: Can you do the same for 32-bit integers?
Exercise 34-3: The networking example in the Shifting Paradigms section showing how to use a C-style API with C++ might be a bit abstract. It doesn’t present an entire implementation, as that would require networking code, which is provided by neither the C nor the C++ Standard Library. In this exercise, let’s look at a much smaller C-style library that you might want to use in your C++ code. The C-style library basically consists of two functions. The first function, reverseString(), allocates a new string and initializes it with the reverse of a given source string. The second function, freeString(), frees the memory allocated by reverseString(). Here are their declarations with descriptive comments:
```
/// <summary>
/// Allocates a new string and initializes it with the reverse of a given string.
/// </summary>
/// <param name="string">The source string to reverse.</param>
/// <returns>A newly allocated buffer filled with the reverse of the
/// given string.
/// The returned memory needs to be freed with freeString().</returns>
char* reverseString(const char* string);

/// <summary>Frees the memory allocated for the given string.</summary>
/// <param name="string">The string to deallocate.</param>
void freeString(char* string);
```
How would you use this “library” from your C++ code?
Exercise 34-4: All examples about mixing C and C++ code in this chapter have been about calling C code from C++. Of course, the opposite is also possible when limiting yourself to data types known by C. In this exercise, you’ll combine both directions. Write a C function called writeTextFromC(const char*) that calls a C++ function called writeTextFromCpp(const char*) that uses std::println() to print out the given string to the standard output. To test your code, write a main() function in C++ that calls the C function writeTextFromC().

NOTE

Starting with C++23, the order of members is defined by the standard and must be in the same order as they are declared in the class definition; older standards allowed compilers to reorder members. However, other aspects of object layout, such as padding and alignment, are platform-specific.