Strings
A string isn't typically classified as a container, but it's close enough to one to put it in this chapter. We've seen a lot of these so far, so I'll spare you the lengthy explanations.
Most of the interface for std::string is pretty self-explanatory.
std::string name = "Why are all my string variables in this book called name?";
name.front(); // 'W'
name.back(); // '?'
name.at(3); // ' '
// at() performs bounds checking
name[5] = 'k';
// operator[] does not
name.size(); // get length of string
name.insert(0, 1, 'W'); // index, count, character
name.push_back('?');
name.pop_back();
name.append("Addendum");
name += "Hello"; // same as append()
name.clear(); // erase all characters
name = "Hello" + std::string(" World");
name.substr(1, 4); // "ello"
// takes start index, length
name.replace(name.begin(), name.begin() + 2, "Je"); // "Jello World"
// start iterator, end iterator
name.replace(4, 1, "y"); // "Jelly World"
// starting index, length
if (name.find("Joe") == std::string::npos) {
    // does not contain "Joe"
    // find returns an index or std::string::npos if it is not found
}
const auto startIndex = name.rfind('l');
// search for string or char starting from end of string
name.erase(name.begin() + startIndex); // "Jelly Word"
In std::string and other containers, there is a separate notion of size and capacity. size is the number of valid elements in the container; capacity is the number of elements that the currently allocated memory can hold, including uninitialized memory. size can be increased and decreased with the resize() method, while capacity can be increased using the reserve() method. The capacity is increased automatically when the size of the container goes beyond the current capacity, typically by a constant factor such as 1.5x or 2x the previous capacity (the exact growth rate is implementation-defined).
When the container needs to reallocate memory, it first allocates memory for the increased capacity, then moves the current elements into this new area of memory, and finally frees the old memory. Allocating memory and moving elements is costly, but because the capacity grows by a constant factor each time, reallocations become rarer as the container grows, and methods like push_back and append have an amortized time complexity of O(1).
std::string stores a pointer to a dynamically allocated buffer. The address held by this pointer may change for the aforementioned reason.
However, a std::string also typically contains a small buffer (commonly around 15 to 22 characters, depending on the implementation) that lives inside the string object itself rather than on the heap. This is known as the small string optimization or SSO and allows strings of a small length to get by without ever making a dynamic memory allocation. Dynamic allocations are very expensive compared to stack allocations, so this optimization helps improve performance even though it requires more data to copy during move operations.
Thus, you may find sizeof(std::string) to be quite a bit more than one might expect.
Concatenation with the plus operator creates a new string that is the combination of the two operands. This renders long chains of concatenations pretty inefficient.
Luckily we have std::stringstream to help us. We'll discuss streams later, but std::stringstream is used much like std::cout. We can use operator<< to append to the stream and then use the str() member function to convert everything to a string.
std::stringstream ss;
ss << "Hello World!" << greeting
<< "It's " << hour << " o'clock where I am and the weather is "
<< weatherDescription;
const auto st = ss.str();
std::string also contains a c_str() method to convert itself to a C string. A C string is a pointer to a contiguous buffer of characters ending in the null terminator ('\0'). This special byte serves as a sentinel that tells functions when they have reached the end of the string. String literals are all C strings. This is why I've never used the plus operator on two string literals and have always converted at least one to an std::string first: operator+(const char*, const char*) is not a defined function. By the way, you can actually concatenate two string literals by just putting them right next to each other.
Like function pointers, string literals don't have to be freed since they're never manually allocated. Actually, literals are stored directly in the compiled binary, and the pointer holds the address of where that literal is.
const char * concatLits = "Hello" " World";
std::string concatStr = "Hello" + std::string(" ") + "World";
auto rawStringLiteral = R"DELIM(
Raw string here
"" fjedjsd
new lines part of string too
\n will not be a new line but instead the literal character \n
)DELIM";
// DELIM can be whatever you want it to be
const wchar_t * wideStr = L"2 or 4 bytes per character, depending on the platform";
const auto utf8String = u8"UTF-8";
const auto utf16String = u"UTF-16";
// NOT necessarily the same thing as a wideStr
// wchar_t is typically 2 bytes on Windows and 4 bytes on Linux and macOS
// utf-16 is 2 bytes per code unit and 1 or 2 code units per code point,
// with at least 1 code point per character and at least 1 character per glyph
const auto utf32String = U"UTF-32";
Since a C string is not a class and has no size() method, we must use a function from the C standard library to get its length: strlen(). Unlike size(), this function is O(n), since it has to examine each byte starting at the pointer (which points to the first character) until it finds the null terminator, which has the value 0.
For non-owning references to strings, C++17 introduces std::string_view. A string_view has a very similar interface to std::string but one major difference: it does not own its data.
This means that the lifetime of a std::string_view must be totally within the lifetime of whatever string-like object it was constructed from.
A std::string_view can be constructed from an std::string, a C string, a pointer and a length, or (since C++20) a begin and end iterator.
A typical implementation of string_view stores only a pointer and a size. Thus, copying a std::string_view is likely more efficient than moving a std::string, which has to copy its SSO buffer.
But once again, a std::string_view cannot exist past the lifetime of whatever data it was constructed from. C++20 introduces std::span, which is to std::vector what std::string_view is to std::string.
One difference with std::string_view is that methods that would normally return a std::string, such as substr(), return a non-owning std::string_view instead.
std::string message = "wee woo wee woo";
std::string_view msgView(message.data(), message.rfind(" wee")); // "wee woo"
// construct from pointer and size
// rfind returns an index, which is basically the size of a substr from start of string
// to the argument passed to find/rfind
std::string_view wee = msgView.substr(0, msgView.find(' ')); // "wee"
message[1] = 'w';
// a string_view is read-only, so we modify the underlying string instead
std::cout << wee; // wwe (the view sees the change because it does not own a copy)
auto getStr() {
std::string msg = "Hello";
return std::string_view(msg);
// BAD: msg is a local variable and goes out of scope
}
auto msg2 = getStr();
// undefined behavior!
Finally, C++ strings are part of the standard template library, so it's not too much of a surprise to realize that std::string is an alias for std::basic_string<char, std::char_traits<char>, std::allocator<char>>. We'll explore the ramifications of this more later, but for now you should know that this means we can use std::basic_string for a string of char, wchar_t (2 or 4 byte characters, depending on the platform), char8_t (utf-8, since C++20), char16_t (utf-16), and char32_t (utf-32).
Since the second and third template parameters have default arguments, you can customize the type simply as follows:
std::basic_string<char8_t> utf8String;
However, since std::basic_string<char8_t> and std::string have different template arguments, they are different types and thus don't fit together as easily.
auto result = std::string("Hello") + std::basic_string<wchar_t>(L" World"); // error
Furthermore, members operate on their type parameter, not on abstract notions defined by multi-byte encodings. So for example, the size of a utf-8 string is its length in bytes, which is not necessarily how many characters it has, using characters in the sense that a human would likely use the term.