QStringView Diaries: Advances in QStringLiteral
This is the first in a series of blog posts on QStringView, the std::u16string_view equivalent for Qt. You can read about QStringView in my original post to the Qt development mailing-list, follow its status by tracking the “qstringview” topic on Gerrit and learn about string views in general in Marshall Clow’s CppCon 2015 talk, aptly named “string_view”.
But this post is not about QStringView — yet. It’s about QStringLiteral and its upcoming sister QStringViewLiteral.
If you have been to one of my Effective Qt talks, or seen them on-line, you know that I eschew QStringLiteral for several reasons.
What’s wrong with QStringLiteral?
First, it may fall back to QString::fromUtf8(). That makes it all but impossible to recommend QStringLiteral as a fast way of creating a QString: wherever the fall-back is in effect, creating the QString from a QLatin1String would be faster.
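Why is the Latin-1 route cheap? Every Latin-1 byte maps to exactly one UTF-16 code unit with the same numeric value, so conversion is a plain widening loop. UTF-8 input, by contrast, must be decoded and validated, because one character may occupy up to four bytes. The minimal sketch below illustrates only the Latin-1 side. It uses standard C++ with `std::u16string` standing in for QString, and the function `latin1_to_utf16()` is hypothetical. It is not Qt's implementation.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper, not Qt's code: Latin-1 covers U+0000..U+00FF, so each
// byte widens to a single UTF-16 code unit of the same value.
// No decoding, no validation, no multi-byte sequences.
std::u16string latin1_to_utf16(std::string_view latin1)
{
    std::u16string out;
    out.reserve(latin1.size());        // output length == input length
    for (unsigned char c : latin1)     // unsigned: no sign extension for bytes >= 0x80
        out.push_back(static_cast<char16_t>(c));
    return out;
}

// Usage: plain ASCII, and a non-ASCII Latin-1 byte (0xE9 maps to U+00E9).
void latin1_example()
{
    assert(latin1_to_utf16("abc") == u"abc");
    assert(latin1_to_utf16("\xE9") == u"\u00E9");
}
```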
Second, each use produces a new UTF-16 array that contains the string data. This duplicates the data as many times as you “call” QStringLiteral with the same argument. It does so even within a single translation unit. Common C string literals, on the other hand, are allowed to share a single memory location.
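The duplication can be made concrete with a minimal standard C++ sketch of the "static array per use" pattern. The macro `PER_USE_U16_LITERAL` is hypothetical and much simpler than Qt's real QStringLiteral. It does share the relevant property, though: each expansion contains its own lambda, and each lambda owns its own function-local static array.

```cpp
#include <cassert>

// Hypothetical macro (not Qt's code). Every expansion creates a new lambda,
// hence a new closure type with its own function-local static array.
#define PER_USE_U16_LITERAL(str)                    \
    ([]() -> const char16_t * {                     \
        static const char16_t data[] = u"" str;     \
        return data;                                \
    }())

// Two separate "calls" with the same argument...
const char16_t *first_use()  { return PER_USE_U16_LITERAL("hello"); }
const char16_t *second_use() { return PER_USE_U16_LITERAL("hello"); }

void duplication_example()
{
    // ...produce two distinct arrays with identical contents.
    assert(first_use() != second_use());
    assert(first_use()[0] == second_use()[0]);
    // Re-running one and the same expansion returns the same static array.
    assert(first_use() == first_use());
}
```

By contrast, the standard leaves it unspecified whether two plain `"hello"` literals are distinct objects. That latitude is what allows compilers to merge them.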
Third, since it returns an actual QString, its use clutters the executable with calls to the QString destructor. The destructor will be a no-op in every execution of the program, but the dead code still sits there, costing binary size and reducing the effective i-cache size.
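The cost comes down to whether the returned type has a non-trivial destructor. The sketch below uses `std::u16string` and `std::u16string_view` as stand-ins for QString and QStringView, an assumption made to keep the example free of Qt. Every object of an owning type with a non-trivial destructor obliges the compiler to emit a destructor call, even when that call would do nothing at run time, as with QStringLiteral. A trivially destructible view needs none. The helper functions are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// An owning string has a non-trivial destructor: each temporary or local
// of this type makes the compiler emit a call to it.
static_assert(!std::is_trivially_destructible<std::u16string>::value,
              "owning strings need a destructor call");

// A view's destructor is trivial, so there is nothing to emit.
static_assert(std::is_trivially_destructible<std::u16string_view>::value,
              "views need no destructor call");

// Hypothetical helpers: callers of make_owned() must destroy the result,
// callers of make_view() have nothing to clean up.
std::u16string make_owned() { return u"text"; }
std::u16string_view make_view() { return u"text"; } // refers to the literal's static storage

void destructor_example()
{
    assert(make_owned() == u"text");
    assert(make_view().size() == 4);
}
```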
What can we do about it?
At face value, not much, in Qt 5.
We can’t change QStringLiteral to return anything other than a full QString. That would break code such as QStringLiteral("...").append(...).
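This compatibility constraint can be checked mechanically. The minimal C++17 sketch below again uses `std::u16string` and `std::u16string_view` as assumed stand-ins for QString and for a hypothetical view-returning QStringLiteral. The detection trait `has_append` is a name made up for this example. It tests whether calling `append()` on a temporary of a given type compiles, which is exactly what `QStringLiteral("...").append(...)` does.

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical detection trait: true if calling append(u"x") on an rvalue
// (a temporary) of type T is well-formed.
template <typename T, typename = void>
struct has_append : std::false_type {};

template <typename T>
struct has_append<T, std::void_t<decltype(std::declval<T>().append(u"x"))>>
    : std::true_type {};

// Code that appends to the returned temporary compiles against an owning string...
static_assert(has_append<std::u16string>::value, "owning strings can be appended to");

// ...but would stop compiling if the same expression returned a read-only view.
static_assert(!has_append<std::u16string_view>::value, "views have no append()");
```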
For reasons that would go beyond the scope of this post, we also can’t enable string data sharing between QStringLiteral instances before Qt 6. The key point here is: QString::fromRawData() mustn’t allocate memory, which is not possible with the Qt 5 QString design.
But we increased the minimum compiler requirements in Qt 5.7. That means we can do something about the unfortunate QString::fromUtf8() fall-back: remove it.
Towards a noexcept QStringLiteral
For my work on QStringView, I recently analysed the #ifdef jungle in qstring.h and qcompilerdetection.h in detail. This revealed that only one supported platform, QNX 6.x, still uses the fromUtf8() fall-back. More importantly, I found that it shouldn’t.
To give you the gist of it: the compiler shipped with QNX 6 supports Unicode string literals such as u"string", whose type is an array of const char16_t. But it ships with a standard library that lacks support for char16_t, so that, say, std::u16string is not available. Qt’s C++ feature macros imply a certain level of standard library support in addition to the core language feature, so we did not enable the macro for Unicode string literals, Q_COMPILER_UNICODE_STRINGS, on that platform.
Now observe that, crucially, the QStringLiteral implementation only needs the core language feature: it needs to be able to prefix u to the C string literal you pass to QStringLiteral. That turns the C string literal into a UTF-16 sequence that it then stores in a static object. The implementation does not need std::u16string, or any other library support.
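The core-language mechanism is small enough to show directly. In this sketch the macro name `MY_UNICODE_LITERAL` is hypothetical. Pasting an empty `u""` literal in front of the argument works because adjacent string literals are concatenated, and an unprefixed literal takes on the prefix of the other one. The result is therefore a `const char16_t` array. `<type_traits>` is included only for the compile-time checks; the macro itself needs no library support.

```cpp
#include <type_traits> // only for the checks; the macro needs no library

// Hypothetical macro: u"" "abc" is concatenated into u"abc" by the compiler.
#define MY_UNICODE_LITERAL(str) u"" str

// A u-prefixed literal is an lvalue of type const char16_t[N], where N counts
// the terminating NUL: three code units plus one here.
static_assert(std::is_same<decltype(MY_UNICODE_LITERAL("abc")),
                           const char16_t (&)[4]>::value,
              "the result is an array of const char16_t");

// The UTF-16 data can be placed in a static object at compile time.
static constexpr char16_t kAbc[] = MY_UNICODE_LITERAL("abc");
static_assert(sizeof(kAbc) == 4 * sizeof(char16_t), "three code units plus NUL");
```

Concatenating with an empty prefixed literal, rather than token-pasting the prefix with `##`, also keeps the macro usable when its argument is itself a macro that expands to a string literal, since operands of `##` are not macro-expanded.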
That leaves one supported platform without support for Unicode string literals: MSVC 2013. That, however, has an existing fall-back in place: it uses wchar_t, which, on Windows, happens to be the same size as char16_t.
So I prepared a patch that removes the check for Unicode string literals and uses wchar_t on Windows and char16_t everywhere else. It removes the QString::fromUtf8() fall-back for good. I’m happy to report that it will be in Qt 5.9. With a bit more attention paid to performance, it could have been in 5.7 already…
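In outline, the selection the patch makes resembles the preprocessor switch below. All names are hypothetical (this is not Qt's qstring.h). The point is that both branches yield a compile-time array of 16-bit code units, and neither branch calls anything like QString::fromUtf8() at run time, so there is nothing that could allocate or throw.

```cpp
#include <cstddef>

// Hypothetical names, not Qt's: wchar_t is 16 bits wide on Windows, so the
// L prefix gives UTF-16 there; everywhere else the u prefix gives char16_t.
// There is deliberately no run-time conversion fall-back.
#if defined(_WIN32)
typedef wchar_t my_utf16_unit;
#  define MY_UTF16_LITERAL(str) L"" str
#else
typedef char16_t my_utf16_unit;
#  define MY_UTF16_LITERAL(str) u"" str
#endif

static_assert(sizeof(my_utf16_unit) == 2, "code units must be 16 bits wide");

// The data is a constant array with static storage duration: obtaining a
// pointer to it cannot allocate, so the accessor can be noexcept.
static constexpr my_utf16_unit kHello[] = MY_UTF16_LITERAL("Hello");
constexpr std::size_t kHelloLength = sizeof(kHello) / sizeof(kHello[0]) - 1;

inline const my_utf16_unit *hello_data() noexcept { return kHello; }
```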
Remember, that patch effectively only changes a single platform: QNX 6. But it means that programmers can now safely assume that QStringLiteral never allocates memory.
That said, if you find that the change breaks your platform, please file a bug so I can do something about it before 5.9.0 gets released.
Towards string sharing and less code bloat in QStringLiteral
The above does not address the problem of QStringLiteral data duplication (point two in the introduction). As I hinted above, that needs a different QString design, which can’t happen before Qt 6.
But if QStringLiteral allocates no memory anymore, it also means that references into the QStringLiteral never expire. We can therefore lift the machinery for QStringLiteral and use it to create a QStringViewLiteral. That simply prefixes either L or u to the string, depending on the platform. In any case, the result is implicitly convertible to QStringView, which will stay valid for as long as the program runs.
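Here is a minimal sketch of the idea. `std::u16string_view` (C++17) stands in for QStringView, and the hypothetical macro `MY_STRING_VIEW_LITERAL` stands in for QStringViewLiteral. For portability, the sketch always uses the u prefix and leaves out the L-prefixed Windows variant. The macro yields the literal itself, which converts implicitly to the view type. Because the literal has static storage duration, a view into it can be returned from a function or even built at compile time.

```cpp
#include <string_view>

// Hypothetical macro: just prefixes u; the result is a const char16_t array.
#define MY_STRING_VIEW_LITERAL(str) u"" str

// Implicit conversion from the array to the view type: no allocation, no
// destructor, and the view stays valid while the code containing the literal
// remains loaded.
constexpr std::u16string_view application_name() noexcept
{
    return MY_STRING_VIEW_LITERAL("Demo");
}

// Usable in constant expressions, too
// (std::char_traits<char16_t>::length is constexpr since C++17).
constexpr std::u16string_view kOrganization = MY_STRING_VIEW_LITERAL("Example Org");
```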
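There is still the problem with DLL unloading that plagues QStringLiteral, too: the string data lives in the library that contains the literal, so it goes away when that library is unloaded. But while this potentially affects every QString copied from a QStringLiteral, because such copies share the static data instead of duplicating it, no sane programmer would keep a QStringView around for long without copying its data into a QString.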
Advances in compiler support for C++11 enabled us to tighten the guarantees of QStringLiteral: From Qt 5.9 on, it never allocates memory, and references into it never expire.
We cannot do anything about QStringLiteral’s other drawbacks until Qt 6 allows us to change the QString layout. But QStringView, which will hopefully be introduced in Qt 5.10, allowed me to implement a QStringViewLiteral that has none of the drawbacks of QStringLiteral. However, it “only” returns a QStringView instead of a QString.
【 Quoting hgoldfish: 】
: So, to stay compatible with newer C++ compilers, older compilers get to run a bit slower?