Latest official blog posts

Passing values…

23rd June 2016

A few months ago I had some interesting performance problems with OpenGL on OSX. I identified the problem and made some workarounds so development could continue. This week I've properly fixed the issue, and I want to record it here so that I and others can avoid this mistake.

So here's a scene, rendering on OSX, at an abysmal frame rate of 14 frames per second on a MacBook Pro. That's right. 14. I've got the game paused so there isn't any time spent on updates; this is just drawing.


If I move the camera to a different location, the frame rate is 126. That's a difference of 63 or so milliseconds per frame. Ouch.
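For reference, frame time in milliseconds is just 1000 divided by the frame rate. A tiny C++ sketch of that arithmetic (the helper name and constant are made up for illustration):

```cpp
// Hypothetical helper: milliseconds spent on one frame at a given frame rate.
constexpr double FrameTimeMs(double framesPerSecond)
{
    return 1000.0 / framesPerSecond;
}

// Extra time per frame in the slow scene (14 fps) versus the fast one (126 fps).
constexpr double kSlowdownMs = FrameTimeMs(14.0) - FrameTimeMs(126.0);
```

At 14 frames per second a frame takes about 71.4 ms, and at 126 it takes about 7.9 ms, so the gap is roughly 63.5 ms per frame.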


So after much debugging I determined that rendering animated models was causing the slowdown. The image of just trees doesn't have any deer or people moving around. And if I remove the people from my original test scene, the frame rate is over 100.


Since rendering houses and trees differs only slightly from rendering animated models, I disabled the shader code that animates the models, and the frame rate went back up to normal. This looks funny, and runs fast.


So here's the basic code that handles animation in GLSL. It looks pretty standard and is simple code. This isn't the entire shader, just enough to get an idea of how the animation part works.

struct BoneConstants
{
    mat4x4 transforms[64];
};

uniform BoneConstants bc;

in vec3 inputPosition;
in vec4 inputWeight;
in ivec4 inputIndex;

// Note: the whole BoneConstants struct is passed by value.
vec3 SkinPosition(vec3 position, ivec4 index, vec4 weight, BoneConstants bones)
{
    return
        ((bones.transforms[index.x] * vec4(position, 1.0)) * weight.x +
         (bones.transforms[index.y] * vec4(position, 1.0)) * weight.y +
         (bones.transforms[index.z] * vec4(position, 1.0)) * weight.z +
         (bones.transforms[index.w] * vec4(position, 1.0)) * weight.w).xyz;
}

void main()
{
    // gc and tc are uniforms declared in the part of the shader not shown here.
    vec3 position = SkinPosition(inputPosition, inputIndex, inputWeight, bc);
    gl_Position = (gc.worldToProjection * (tc.transform * vec4(position, 1.0)));
}

What this code does is transform the position of a vertex by up to four bones in the model's structure. It then weights the results by how much influence each bone has on the vertex and adds them together.
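The same weighted sum can be sketched on the CPU in C++. This is only an illustration of the math, not the engine's code: the `Vec3`, `Mat4`, and `BoneArray` types and the helper functions are assumptions made up for this example, and matrices are stored row-major here.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Row-major 4x4 matrix: m[row][column].
struct Mat4 { float m[4][4]; };

Mat4 Identity()
{
    Mat4 r = {};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

Mat4 Translation(float x, float y, float z)
{
    Mat4 r = Identity();
    r.m[0][3] = x; r.m[1][3] = y; r.m[2][3] = z;
    return r;
}

// Transform a point (implicit w = 1) and drop the w component.
Vec3 TransformPoint(const Mat4& a, const Vec3& p)
{
    return {
        a.m[0][0] * p.x + a.m[0][1] * p.y + a.m[0][2] * p.z + a.m[0][3],
        a.m[1][0] * p.x + a.m[1][1] * p.y + a.m[1][2] * p.z + a.m[1][3],
        a.m[2][0] * p.x + a.m[2][1] * p.y + a.m[2][2] * p.z + a.m[2][3]
    };
}

using BoneArray = std::array<Mat4, 64>;

// Weighted sum of the position transformed by each of the four bones.
// The bone array is taken by const reference, so it is not copied.
Vec3 SkinPosition(const Vec3& position, const std::array<int, 4>& index,
                  const std::array<float, 4>& weight, const BoneArray& bones)
{
    Vec3 result = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; ++i)
    {
        const Vec3 t = TransformPoint(bones[index[i]], position);
        result.x += t.x * weight[i];
        result.y += t.y * weight[i];
        result.z += t.z * weight[i];
    }
    return result;
}
```

For example, a vertex at (1, 2, 3) influenced half by an identity bone and half by a bone that translates 2 units along x ends up at (2, 2, 3), halfway between the two transformed positions.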

I stared at this code for a while (more than a while actually), and after messing about a bit, it finally dawned on me what's wrong with it. Face Palm.

To fix it, instead of calling a function to animate the models, I manually inlined the code. And my frame rate returned to normal, with animated characters.

void main()
{
    vec3 position =
        ((bc.transforms[inputIndex.x] * vec4(inputPosition, 1.0)) * inputWeight.x +
         (bc.transforms[inputIndex.y] * vec4(inputPosition, 1.0)) * inputWeight.y +
         (bc.transforms[inputIndex.z] * vec4(inputPosition, 1.0)) * inputWeight.z +
         (bc.transforms[inputIndex.w] * vec4(inputPosition, 1.0)) * inputWeight.w).xyz;
    gl_Position = (gc.worldToProjection * (tc.transforms[gl_InstanceID] * vec4(position, 1.0)));
}

Wow. So what's going on there?

There are two ways to pass parameters to a function: by value or by reference.

When you pass a parameter by value, a copy of the variable is made so that any changes to the variable in the function don't affect its value in the calling function.

When you pass a parameter by reference, any modifications to the variable change it directly. No copy is made.

In my case with animation, the entire array of bone transformations is being copied, because it's being passed by value. My suspicion is that the program running on the GPU doesn't have enough registers to hold this copy, so the GLSL compiler generates code that copies the array bit by bit and then runs that code over and over to evaluate the final result. What should be just a few matrix multiplies, scales, and adds becomes many, many copies and conditionals. This possibly results in different execution paths per GPU thread, causing even more slowdown.
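The cost of a by-value copy is easy to see in a CPU-side C++ analogy. This is only a sketch: the copy counter and the two reader functions are invented for illustration, and it says nothing about what the OSX GLSL compiler really generates.

```cpp
#include <cstring>

int g_copyCount = 0;  // counts how often the bone array gets copied

struct BoneConstants
{
    float transforms[64][16];  // 64 bones, one 4x4 float matrix each

    BoneConstants() : transforms{} {}

    // Copying moves the entire array and records that it happened.
    BoneConstants(const BoneConstants& other)
    {
        std::memcpy(transforms, other.transforms, sizeof(transforms));
        ++g_copyCount;
    }
};

// By value: the caller's whole array is copied into 'bones' on every call.
float ReadByValue(BoneConstants bones, int bone)
{
    return bones.transforms[bone][0];
}

// By const reference: the function reads the caller's array in place.
float ReadByReference(const BoneConstants& bones, int bone)
{
    return bones.transforms[bone][0];
}
```

With 4-byte floats the array is 64 * 16 * 4 = 4096 bytes. Calling `ReadByValue` copies all of it to read a single matrix element, while `ReadByReference` copies nothing.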

My first attempt before manually inlining this code was actually to pass the array by reference, but the OpenGL compiler yelled at me that you can't pass a uniform by reference.

On Windows and Linux, I suspect the compiler is smart enough to see that the function doesn't modify the array, and optimizes the copy away. (Or my GTX 980 and 290X are just too fast for me to notice the slowdown...)

Most people reference the global list of uniform bone transformations directly and never run into this issue. But since my custom shader language that generates GLSL doesn't have a concept of globals, everything is passed to functions if it's needed. Arghghghg.

So what's the real fix?

I don't want to have to manually repeat code in shaders; that's just bad programming practice. Luckily, I control the compiler for my own shading language, so I can get it to generate different code.

So I just recently added an 'inline' keyword for functions. When the GLSL is generated, the code gets inlined automatically and any value passed by reference isn't copied.

Previously my skinning function looked (in SRSL, not GLSL) like this:

inline float3 SkinPosition(float3 position, int4 index, float4 weight, BoneConstants bc) {...}

And now it looks like this:

inline float3 SkinPosition(float3 position, inout int4 index, inout float4 weight, 
	inout BoneConstants bc) {...}

No more repeated skinning code everywhere.

Getting my compiler to inline the code is pretty easy. However, as most shader languages don't feature a goto or label statement to jump over remaining code, it's hard (if not impossible) to inline a certain class of functions. So my inline feature doesn't handle inlining when returning from complex flow control. This really isn't an issue for shaders, as the programs tend to be straightforward and not have many loops or conditionals.
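To show why an early return makes inlining awkward without goto, here is a small C++ sketch. The functions are hypothetical and unrelated to the SRSL compiler; they only show the kind of rewrite an inliner would have to do.

```cpp
// A function that returns from inside a loop.
int FirstNegative(const int* values, int count)
{
    for (int i = 0; i < count; ++i)
    {
        if (values[i] < 0)
            return i;  // early exit from nested flow control
    }
    return -1;
}

// A caller that uses a normal function call.
int DoubledIndexCall(const int* values, int count)
{
    const int index = FirstNegative(values, count);
    return index * 2;
}

// The same caller with the body pasted in by hand. Without goto, each
// 'return' has to become an assignment plus a flag that skips the rest
// of the inlined body.
int DoubledIndexInlined(const int* values, int count)
{
    int index = -1;
    bool returned = false;
    for (int i = 0; i < count && !returned; ++i)
    {
        if (values[i] < 0)
        {
            index = i;
            returned = true;
        }
    }
    return index * 2;
}
```

A function with a single return at the end needs none of this bookkeeping, which is why straight-line shader functions like the skinning code are easy to inline.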

So long story short, don't pass uniform arrays and large structs to a function by value in GLSL.

Quick Fixes

22nd May 2016

I just uploaded some quick fixes for bugs introduced with the last build. 1.0.6 is live on Steam. If you need to redownload it from Humble you can log in and grab it, or you can use this tool: https://www.humblebundle.com/resender. GOG.com should have an updated build shortly.

There's a new modkit, available here: BanishedKit_1.0.6.160521.zip, though there shouldn't be any changes to it from 1.0.5.

Changes in this build:

  • - Fixed a crash that occurred when clicking on the town hall if a translation mod was in use that was built with 1.0.4. Missing text data will now be blank.
  • - Fixed a bug that caused orchards and pastures to not drop items inside their boundaries as was intended.


19th May 2016

Today 1.0.5 has been released! If you play on Steam, you should get an auto update. On Humble Store, or if you bought direct, you can download the new version by logging into your Humble account, or using this tool: https://www.humblebundle.com/resender. If you bought on GOG.com, you'll have to log in and download it. GOG might take a few more hours to update to 1.0.5.

There's a new modkit, available here: BanishedKit_1.0.5.160505.zip

If you want, you can patch from 1.0.4 to 1.0.5 manually (the non-steam version). You can use this patch here: BanishedPatch_1.0.4_To_1.0.5.160505.zip. Just unzip it into the directory where Banished is installed.

If anything seems amiss with the new version, contact me.

Changes in 1.0.5:

  • - UTF8 is now used instead of UCS2.
  • - Resource files can be in UTF8, UCS2, UTF16, big and little endian. They'll be converted to UTF8 on load.
  • - Memory usage allowance has been increased to 1 gigabyte, which should allow for larger mods.
  • - All materials now use custom shading language SRSL instead of HLSL.

    • - Any mods with custom materials will need to be modified to point to the new shaders and/or use SRSL.

  • - Math library can now be compiled without the need for SIMD instructions.
  • - OpenGL is now supported (but isn't currently being released with the PC version)
  • - Data compilation is now in a separate DLL - CompileWin.dll - this can be swapped out for other platforms (consoles, mac, linux, etc)
  • - Shader compiler is now in its own DLL. Video DX9/DX11/GL dlls are no longer required for compiling shaders.
  • - Added safety code to check for invalid and dangling pointers - this should make catching hard to find and rare issues easier.
  • - Sped up mod details dialog for massive mods that have tens of thousands of files included. This should make looking at conflicts and uploading to Steam Workshop easier.
  • - Beta Mods and Mods newer than the currently released version can no longer be uploaded to Steam Workshop.
  • - Nvidia and AMD GPUs in laptops should now be auto selected for use, instead of an Intel Integrated card.
  • - Textile limit is now available for modders to use.
    • - Cropfields, Fishing, Forester, Hunters, Orchards, and Pastures now have a configurable resource limit.
    • - Livestock has a resource limit for the byproduct they make (eggs, wool, milk, etc). Note that if a byproduct isn't created because of the resource limit, the icon won't appear above the building.
    • - Added textile to the Status Bar, Resource Limit window, and Town Hall UI
    • - Added graphs for textiles to Town Hall UI

  • - Fixed a bug that caused fonts from 1.0.4 to not load in 1.0.5. A UCS2 - UTF8 conversion wasn't made properly.
  • - Fixed a bug that caused dropped resources (from citizen death/task cancelation) to drop in invalid places.
  • - Fixed a bug that caused orchards to cause invalid data access and/or data corruption if a citizen tried to harvest a tree, but the tree died before he got there.
  • - Fixed a bug that caused potential memory corruption when cutting down an orchard's trees.
  • - Fixed a bug that caused a crash if game startup failed before memory allocation was available or was corrupt. It now properly displays an error.
  • - Added better error message if the game runs out of memory due to too many mods loaded.
  • - Fixed a bug that caused a crash when loading old mods that had custom materials. The game will no longer crash, however objects with those materials will not display. To fix this issue, mods should be updated to the newest mod kit version and update the materials.
