Friday, 19 September 2014

A Low FPS Day

Be warned, programmer and back-end screenshots to follow. If you have a sensitive disposition, stop reading now.

My Friday began with the above shot, showing almost all the art hidden except for terrain and the new light mapped object collections.  I am currently struggling with an issue which mysteriously drains almost all performance from the engine when these objects gang up. The same number of regular entities with full shaders do not do this, but the new objects are playing silly buggers.

As part of my journey of experiments, and to find out where the bottleneck (or as I like to call it, squashed straw) is, I removed the camera cycle render loop and was pleased to see the FPS come in at just under 2000 frames per second :) Obviously very fast when you are not rendering any 3D, but I have my HUDs, text and the engine running in the background so it gave me hope that we could find some good gains when I start the performance work next week.

My plan is still to finish the lightmapper and 'glass terrain' shadows but I wanted my test level to run at a decent speed, and 10 fps does not cut it. Alas I spent most of the day skipping functions to isolate the code responsible for the massive slow down but to no avail. At 4PM I decided to throw in the towel on the old school method of finding hot spots and instead broke out my new edition of the Intel VTune Amplifier XE 2015. Although this premier tool is designed to find this kind of issue immediately, I am not too familiar with the latest version so my Friday evening has been spent learning how it works and how I can get it to work for me.

Once I can get The Escape running fast enough to run around and view lightmaps, I can revert to the Visual Camp and finish off the floor terrain shadows, small artifacts in the larger entities and squash the light mapping memory usage down as far as it will go.  Also had a few ideas how I can batch, group and consolidate the lightmap objects to reduce texture files, how to entirely skip using light map textures for very small polygon surfaces and even did some research into a way to create many instances of an object with a single draw call using a clever Vertex Shader 3.0 system of using multiple streams, one for vertex data and one for instance data.  Can't wait to get light mapping done so I can enjoy the theme park of performance optimization!
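To picture the two-stream instancing idea, here is a minimal sketch. It is not Direct3D code (on DirectX 9 hardware the streams would normally be configured on the device, for example through IDirect3DDevice9::SetStreamSourceFreq), just a CPU-side emulation of what the vertex fetch does when one stream holds the shared vertex data and a second holds per-instance data. The Vertex, Instance and expandInstances names are hypothetical, and a simple position offset stands in for whatever the instance stream would really carry.

```cpp
#include <cstddef>
#include <vector>

struct Vertex   { float x, y, z; };                     // stream 0: shared geometry
struct Instance { float offsetX, offsetY, offsetZ; };   // stream 1: one entry per copy

// Emulate a single instanced draw: every vertex of the shared mesh is paired
// with each instance's data, which is what the vertex shader would receive.
std::vector<Vertex> expandInstances(const std::vector<Vertex>& mesh,
                                    const std::vector<Instance>& instances) {
    std::vector<Vertex> out;
    out.reserve(mesh.size() * instances.size());
    for (const Instance& inst : instances) {   // stream 1 advances once per instance
        for (const Vertex& v : mesh) {         // stream 0 is replayed for each instance
            out.push_back({v.x + inst.offsetX,
                           v.y + inst.offsetY,
                           v.z + inst.offsetZ});
        }
    }
    return out;
}
```

On the GPU none of this expanded data is ever stored; the point is only that the mesh is submitted once and the per-instance stream supplies the differences.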

Thursday, 18 September 2014

Meeting Day

A full day of 'meet' today, covering all topics in and out of development, with some good short term goals set for the next three weeks. We also revisited the issue of lighting balance, which I am still at a loss to fully grasp.

As you can see above, this is a variant of the 30 ambient, 50 contrast setting from yesterday which is now adjusted to 40 ambient and 40 contrast to create more of a day-time lighting effect.

We have also tentatively planned to bring in an artist to mock-up a typical scene inside the 3D modeller, complete with desired lighting and a fly through so we can compare his 'correct' lighting system with my 'obviously incorrect' lighting system.  As I say, I am still coming to terms with exactly what is incorrect in the above shot aside from the obvious lack of ambient occlusion shading, more scenery, more action and perhaps a few atmospheric effects.  I have handed off the production of this fly-through so I do not get distracted from my immediate mission to finish the pre-bake process, save some precious memory and move swiftly onto performance.  Hopefully it will yield results from an additional perspective on the visual finish of the engine, and certainly create some new assets in the process.

Development work resumes Friday, which should hopefully see the light mapper producing a fast light mapped object set for The Escape demo, which is my current test level.  Once that is in and working, I will be switching over to what I am calling my 'glass terrain' trick which will lightmap a film of geometry stretched over the terrain world to catch the floor shadows of every object in the scene, and then render that super-fast. I can then remove the static rendering of the shadows and thus improve performance as I hand over this responsibility to pre-baked shadows.  Should not be too many distractions on my radar so should get some good coding done.

As an amusing aside, I just discovered a band called "The Ukulele Orchestra of Great Britain" which played a great rendition of The Good, The Bad & The Ugly and also Leaning On A Lamp Post.  Made me smile a lot :)

Also, in case you missed the news or lost touch, Alpha 6 of AGK V2 is now available to all AGK V2 pledgers, details here:

If you have not checked out AGK in a while, you have to check out this latest alpha, it's coming on in leaps and bounds!  If you don't know what AGK is, it's the easy to use programming tool behind the number one Driving Test mobile app in the UK. Check out the platforms supported:

I am personally looking forward to the new 3D and Shader command additions, and especially keen to see the performance achievable from the new generation of really fast Android devices!

Wednesday, 17 September 2014

A Much Better Day

It's always nice when you fix a bug, but when you fix a big, ugly, hidden, stealthy bug, now that's a day to be happy in.  For those readers from yesterday, you will have guessed by now that I have indeed found the source of the heap corruption, and for the tech heads out there I am going to explain some more.

It all came about when a rather LARGE vertex declaration was requested from DirectX (i.e. position, normal, two sets of UVs, tangents, bi-normals and bone data), and that request failed. Turns out my static shader for the lightmapper had a left-over matrix palette constant set-up which forced the shader effect handler to add bone data to my meshes, which in turn created a non-viable custom FVF. When the mesh change state exited early, it did not recreate the mesh buffer to match the already re-sized vertex data, and the memory copy did all the rest. Boom!  I am now able to lightmap and load the models, and apply the shaders without the heap corruption and no more crashing. I still have the problem of understanding why some of the entities succeed at the mesh change and some fail, which will no doubt lead to some conversion code, and I also need to expand my prototype to light map 'all' of The Escape level rather than a subset, but I am on the home stretch when it comes to getting this level lit properly so I can move onto finesse, video memory usage and finally performance.
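For the tech heads, the failure pattern can be sketched in a few lines. This is not the engine code, just a minimal illustration under my own assumptions: the Mesh structure and both function names are hypothetical, and std::vector stands in for the real vertex data and mesh buffer. The guard in copyToMeshBuffer is the step that was being skipped when the mesh change exited early.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical mesh: raw vertex data plus the buffer it gets copied into.
struct Mesh {
    std::vector<unsigned char> vertexData;  // re-sized when the vertex format changes
    std::vector<unsigned char> meshBuffer;  // must be recreated to match
};

// Change the per-vertex stride, e.g. when bone data is added to the format.
void resizeVertexData(Mesh& mesh, std::size_t vertexCount, std::size_t newStride) {
    mesh.vertexData.assign(vertexCount * newStride, 0);
}

// Copy the vertex data into the mesh buffer. If an earlier step bailed out
// before recreating the buffer, recreate it here rather than letting the
// copy run past the end of the old allocation.
// Returns true if the buffer had to be recreated.
bool copyToMeshBuffer(Mesh& mesh) {
    bool recreated = false;
    if (mesh.meshBuffer.size() != mesh.vertexData.size()) {
        mesh.meshBuffer.resize(mesh.vertexData.size());
        recreated = true;
    }
    if (!mesh.vertexData.empty()) {
        std::memcpy(mesh.meshBuffer.data(), mesh.vertexData.data(),
                    mesh.vertexData.size());
    }
    return recreated;
}
```

Without the size check, a memcpy of the larger vertex data into the old, smaller buffer writes over whatever sits next to it on the heap, and the crash shows up somewhere unrelated much later.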

Believe it or not but this was not the hog of the day!  For the first time in recorded TGC history, we had a 2 hour conference call covering the subject of lighting in the engine.  After much debate and many points of view (far more than the people in the call I might add), we arrived at settings which will be the new defaults for the next build.  In short, ambient to 30, brightness to 0, contrast to 50 and we are also replacing Veg Specular with a Global Specular slider which will allow all specular effects to be regulated across entity, terrain, characters and any other official shader effects.  Also, for end users who want to override a specific specular effect, I will be adding a new FPE field called specular which will allow the engine to select between the provided specular file, or to choose a pre-set none, low, medium and high specular on a per entity basis. It would also make a nice trick if you wanted to reduce the memory footprint of entities that have a consistent specular value (as the textures used will only be 1x1 in size and re-used).

As you can see above, this was our final agreed lighting between direct sunlight on the nearest building to low lighting on the building behind which is not directly facing the sun. Simon also discovered that all our lighting is based on a near sun-set style sun position, as demonstrated with his before and after shots.

To this end, we are adjusting the sky spec files to lift the sun higher in the sky to create a nicer overall blend of lighting between terrain and scenery objects. Seems in a single day I have the potential to add lightmapping and an improved overall colour balance to the engine.

In other news, we also had a chance to play the new multiplayer prototype, and Ravey has pulled it off once again with actual characters animating, running, shooting and generally behaving like the skeleton of a real death-match game. It was great to see, and the icing on the cake was that thanks to the Steam API, connection was a breeze. No router configuring, no firewall advice required, just go to Steam, click play, join lobby, game starts, run for your life, magic!  Next on his list are things like jumping, fragging, re-spawning and host migration. Nothing pretty to show yet, just raw functionality, but progress is going well on this front and we think you will approve.

Alas it is only 2:28 PM and I have a few good hours ahead of me, and thanks to the protracted call this morning the 4 PM call has been cancelled so it's plain sailing to tea time.  Just leaving my massive level to pre-bake while I add the global specular constant to all the shaders in anticipation of connecting the slider bar.  Happy days...

Tuesday, 16 September 2014

Slow Day For Lee

After the torrent of small victories made yesterday, Tuesday has been stuck in the proverbial mud.  It took me four hours to get to this point:

As you can see, I have narrowed it down to a single function call, but it is called across hundreds of different objects and only one of them may be corrupting the heap. The heap in question is the stack of memory which holds all the data for the engine, and while applying a shader to one of the light mapped imported objects, the vertex data copy operation is overwriting neighboring blocks of memory and causing a heap crash.

Finding it was a complete mare, but fixing it might be just as torturous. The good news is that I am tracking it down now and with a little luck I can solve it once and for all.

Fortunately this was not my only small win today, as I have also solved the issue of the light map image file saves crashing by moving the DirectX copy texture and save code from the threaded process to the main one. A rival solution was to activate the DirectX multi-thread mutex feature but that would have incurred a very small performance hit which I am no longer willing to accept.

It's about nine thirty PM now and still no joy in figuring out which object is messing up my heap so will do my usual back-ups and resume Wednesday with better eyes.  As un-glamorous as this type of code fixing is, it may solve several issues in one go as the unpredictable outcomes of silent heap corruption can be far and wide.  Thirty objects down, several hundred to go...

Monday, 15 September 2014

My Busman's Holiday To IDF 2014

Regular blog readers will know full well that I have been absent last week from my normal posting duties to chill out and attend the annual Intel developer forum in San Francisco.

Three floors of technology and innovation, interspersed with sessions covering everything from chips to robots. My own duties were incredibly light this year as I attended as a mere mortal with only two speaking engagements on the subject of RealSense (formerly known as Perceptual Computing). Also managed to snatch some time attending sessions on integrated graphics performance acceleration for my return to the universe of Reloaded.

While saying hello to a few friends in the 'Internet Of Things' lab, I happened across a project challenge to build a machine using the Galileo board. My creation was a pretty neat contraption which detected air motion, sampled the particles for ethanol and, if high levels were detected, sounded an alarm and incremented a sequence of LED lights by way of a detection alert. Although it was well received and in line to win a prize for my efforts, it transpired that my close involvement as an Intel Innovator meant I was not eligible for the prize. That's politics for you!

My interest in Galileo led me naturally to take an interest in its older brother, Edison, which is a more powerful circuit board powered by an Atom processor. Powerful enough in fact to run the brain of a 22 jointed robot called Jimmy, capable of walking and talking, and built from a simple metal frame and 3D printed body parts. I have always had a passion for robotics, and were it not for the fact I am a better software developer than an electrician, I would be designing them even now.

Another take-away, and one mentioned in the keynote, was the wireless power system, which allows a laptop or other chargeable device to take power from a remote device located under your desk or table. Eliminating the last cable in the office was a great thing to see, and we should see peripherals by the end of the year using this tech, and by the end of next year have this integrated into our Ultrabooks! It's currently rated to 20 watts, so not quite powerful enough to run your desktop or huge monitor, but it will power mostly everything else and it's a great start to a glowing wires free future!

Of course the main reason for my attendance was to recharge the old batteries after several months' solid work on FPS Creator Reloaded.  They say a change is as good as a rest, and with liberal quantities of Guinness and stuff that looked like it, my brain was happily sedated while my mouth rabbited on for queen and country.

We can find out about technology and gadgets from the internet, but there is no substitute for getting together and talking about it face to face, and IDF is one of my favorite times to escape the office and do this.

In my capacity as the only Welsh Intel Black Belt, one of my busman's holiday highlights was a trip to the Planetarium, set out like a cinema under a huge domed screen projecting a journey through the universe. Complete with welcome drinks, a gorgeous meal, equally gorgeous people, white crocodiles, uncut diamonds and a great talk by Genevieve Bell on the evolution of robots (and some great movie quotes). It remains a privilege to be invited back as a Black Belt developer, and a pleasure to continue to contribute my thoughts and deeds back into the developer community in the years ahead.

Alas I did not get to enjoy the last evening at my favorite Steak House and Irish Pub as the aircraft to take me home dragged me away in the middle of the last day of IDF.

As it turned out, despite the home-time traffic of San Francisco and threatened TSA security lines, I was sitting at the departure gate restaurant within two hours of leaving the hotel and recovering from a rather naughty pizza. The British Airways plane you see performing its reverse taxi trick was the sister flight to mine, scheduled three hours later.  Rest assured I had plenty of time to get through a few more chapters of Terry Pratchett's Raising Steam.

As I type, my inbox is mighty, my whole office is a dumping ground for miscellaneous tasks, both foreign and domestic, and my brain is still getting to grips with where it left off in the FPSC Reloaded universe.  Normal blogging will resume on Tuesday, just as soon as I figure out why all my characters have suddenly disappeared and what remained to be coded for the new Ambient Occlusion lightmapper.  Very pleased to see the progress made on the Multiplayer and Construction Kit, and hopefully it won't be long before we can show you some shots or even videos of these new components to the game engine.  I am not planning any more holidays or trips until Christmas now, so expect plenty of uninterrupted development for the next few months :)

Friday, 5 September 2014

Lightmapping Progress

Aside from some quick tweaks to the importers and zombies, most of the day has been given over to the work on the lightmapper which will provide the Ambient Occlusion textures required to make the Reloaded scenes look better and run faster. One of my early wins was the reduction of a structure called 'Lumel' from 12 bytes down to just one byte. The original lightmapper could handle multi-colored lightmaps for things like semi-transparent stained glass projections, but for here and right now we do not use them. Further, the data structure used a four byte float to store the accumulated light colour for each pixel, but as this float only converted to an unsigned char at the end of the day, I simply replaced the float with a byte, did the float conversion each time the pixel was added to, and simply passed out the final capped byte when the time came to create the light map texture.  This saving took my per-lightmap consumption from 13MB to around 1.5MB based on a 1024x1024 texture plate. This overhead can be reduced further if I replace the Lumel class with a raw array of bytes but that would mean extra coding and doubtlessly introduce a nice bag of new bugs.
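As a rough illustration of the Lumel saving, here is a minimal sketch. The real Lumel class is not shown in this post, so both structures and the plateBytes helper are hypothetical stand-ins: one with three float channels (12 bytes) and one holding a single capped byte. It also assumes light contributions arrive as floats in the 0 to 1 range.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the original 12-byte lumel: three float channels.
struct LumelRGB {
    float r, g, b;
};

// Hypothetical one-byte lumel: a single capped light intensity.
struct LumelLite {
    std::uint8_t light = 0;

    // Add a light contribution in the 0..1 range. The float-to-byte
    // conversion happens on every add and the running sum is capped at 255.
    void add(float contribution) {
        int sum = light + static_cast<int>(contribution * 255.0f + 0.5f);
        light = static_cast<std::uint8_t>(std::min(255, std::max(0, sum)));
    }
};

// Raw lumel storage needed for a square lightmap plate of the given side.
template <typename Lumel>
std::size_t plateBytes(std::size_t side) {
    return side * side * sizeof(Lumel);
}
```

For a 1024x1024 plate, plateBytes gives 12,582,912 bytes for the float version against 1,048,576 bytes for the byte version, which covers only the raw lumel storage and not any other per-lightmap overhead.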

Happy with my saving on the system memory front, I turned my attention to activating the multi-core feature of the present lightmapper. I tried a while back but for some reason it would not play ball so I stuck with the working single threaded approach. Now that I am doing test bakes of large levels such as 'The Escape', I cannot afford to sit around waiting 30 minutes for a crash, so I need to speed up this process to get more done.

Of course I had to turn my attention BACK to the LUMEL optimization when I found out that the same data structure was being used to store position and normal vectors in the former float members. This prompted the creation of a new LUMEL LITE data structure to separate the texture pixel work from the hijacked vector code.  By this time it was 3:43PM and day light was running out but it was good to see the lightmapper perform in an identical way except for using drastically less memory and, for some reason, running slightly faster.

After another 30 minutes, I was able to confirm that multi-core does indeed work a charm, as can be seen with this processor view showing 100% concurrency!

As baking The Escape level takes ages, only to be rewarded with a nice mystery crash, this will probably be my closing entry until I return a week from now.  It's rather fitting that my departure to attend IDF launches with my engine using all eight cores on my PC, something Intel like to see.

I won't be demonstrating much at the IDF show this year, just my ability to drink beer and talk nonsense, for which I am overqualified.  Until my blog returns on the 15th September, you can tune into my twitter feed at @leebambertgc which I occasionally post on when I have a spare five minutes alone with my mobile.  Have a great weekend!!

Thursday, 4 September 2014

How To Get GPU Video Memory 'In Use' In DirectX 9

It took two half days of research and experimentation, and finally a link from a Reloaded community member to come up with the solution. You can use the code (below) in any DirectX 9 application to get the 'currently used' bytes of your graphics card, ideal for monitoring your resources, debugging and even taking pre-emptive action when GPU video memory starts to get scarce. When I added this feature to the log report, and ran a simple 'gun and zombie' test, I saw this section of entries:

12748764 : gun 18:modern\colt1911\gunspec.txt                   S:0MB   V:0MB (123)     
12748771 : gun 19:modern\Magnum357\gunspec.txt                  S:0MB   V:0MB (123)     
12748778 : gun 20:modern\RPG\gunspec.txt                        S:0MB   V:0MB (123)     
12748785 : gun 21:modern\Shotgun\gunspec.txt                    S:0MB   V:0MB (123)     
12748792 : gun 22:modern\SniperM700\gunspec.txt                 S:0MB   V:0MB (123)     
12748799 : gun 23:modern\Uzi\gunspec.txt                        S:0MB   V:0MB (123)     
12748806 : total guns=23                                        S:0MB   V:0MB (123)     
12751435 : Load player config                                   S:1MB   V:109MB (232)   
12751453 : LOADING ENTITIES DATA                                S:0MB   V:64MB (296)    
12751481 : Loaded 1:_markers\player start.fpe                   S:0MB   V:-4MB (292)    
12752030 : Loaded 2:\Characters\zombies\Zombie Crawler.fpe      S:3MB   V:0MB (292)     
12752046 : LOADING WAYPOINTS DATA                               S:0MB   V:4MB (296)     

12752065 : LOADING TERRAIN DATA                                 S:0MB   V:0MB (296)     

As you can see, thanks to being able to link video memory usage with stages in the engine resource process, I notice there is something going on in 'Load Player Config' which is taking 109MB of video memory, and it will be interesting to discover what that might be. I write this blog in the afternoon so it could be I have saved mucho memory by the time you are reading this. Just wanted to get this documented and out into the world to emphasise the usefulness of monitoring video memory on the fly!
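Reading the entries above, the S: and V: columns look like the change in system and video memory since the previous entry, with the bracketed number being the running video memory total (123 plus 109 gives 232, and so on). A minimal sketch of a logger that produces that shape of line is shown here. The StageMemoryLog class is hypothetical and not the engine's log code, and the readings are passed in as plain megabyte values, which in practice would come from something like the DMEMAvailable function at the end of this post.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: remembers the previous readings so each log line can
// show how much memory the stage just completed has consumed.
class StageMemoryLog {
public:
    StageMemoryLog(int systemMB, int videoMB)
        : lastSystemMB_(systemMB), lastVideoMB_(videoMB) {}

    // Format one entry as "<label> S:<delta>MB   V:<delta>MB (<video total>)".
    std::string entry(const std::string& label, int systemMB, int videoMB) {
        char line[256];
        std::snprintf(line, sizeof(line), "%-40s S:%dMB   V:%dMB (%d)",
                      label.c_str(),
                      systemMB - lastSystemMB_,
                      videoMB - lastVideoMB_,
                      videoMB);
        lastSystemMB_ = systemMB;
        lastVideoMB_ = videoMB;
        return line;
    }

private:
    int lastSystemMB_;
    int lastVideoMB_;
};
```

Calling entry() once after each loading stage is enough to spot which stage is eating the budget, since a negative delta (memory handed back) shows up just as clearly as a large positive one.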

Also managed to crunch two bugs, one zombie related and one light map related, with more tweaks, twists and turns to follow. As yesterday's blog was image deprived, I have created a quick level with some of the new modern day assets that have been added to the library.

As an aside, this scene without terrain runs on my machine at about 190 fps, a substantial step up from when I first installed my GeForce 9600 GT card :)

REQUEST: As the community has been such a sterling help getting to the bottom of the video memory read issue, I wanted to put another 'home work' task out there. I am looking for a good DirectX 9 shader technique for very fast but realistic water that does NOT rely on reflection or refraction. Often seen used to render completely opaque water but the ripples and light reflections make it look the bomb!  If you can send me shots, links to code, etc. that would certainly help get that ball rolling.

Before I share the code, just wanted to provide an update that the next thing on my list for Friday is what is called wrap-up, which means preparing internal builds, finishing off and cleaning code, backing up and generally cleaning my desk. I fly out to San Francisco on Monday for a week, so I will need a tidy office and work-plate on my return from drinking all that Guinness.  For my immediate future, I will dive back into GPU video memory analysis and find out precisely who is spending all my VMEM budget!


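DARKSDK int DMEMAvailable(void)
{
    // Needs windows.h, d3d9.h and the D3DKMT_QUERYSTATISTICS declarations.
    // Braces, and every line marked "assumed", were missing from this listing.
    static int Memory = 0;
    HANDLE ProcessHandle = GetCurrentProcess();
    LONGLONG dedicatedBytesUsed = 0;
    LONGLONG sharedBytesUsed = 0;
    LONGLONG committedBytesUsed = 0;
    PFND3DKMT_QUERYSTATISTICS queryD3DKMTStatistics = NULL;  // assumed declaration
    D3DKMT_QUERYSTATISTICS queryStatistics;                   // assumed declaration
    HMODULE gdi32Handle = LoadLibrary(TEXT("gdi32.dll"));
    if (gdi32Handle)
        queryD3DKMTStatistics = (PFND3DKMT_QUERYSTATISTICS)GetProcAddress(gdi32Handle, "D3DKMTQueryStatistics");
    if (queryD3DKMTStatistics)
    {
        IDirect3D9Ex* pDX = NULL;
        LUID luid;                                             // assumed: adapter to query
        Direct3DCreate9Ex ( D3D_SDK_VERSION, &pDX );
        if ( pDX && SUCCEEDED ( pDX->GetAdapterLUID ( 0, &luid ) ) )
        {
            // System memory allocated on behalf of this process
            memset(&queryStatistics, 0, sizeof(D3DKMT_QUERYSTATISTICS));
            queryStatistics.Type = D3DKMT_QUERYSTATISTICS_PROCESS;  // assumed
            queryStatistics.AdapterLuid = luid;                     // assumed
            queryStatistics.hProcess = ProcessHandle;
            if (queryD3DKMTStatistics(&queryStatistics)==0)
                committedBytesUsed = queryStatistics.QueryResult.ProcessInformation.SystemMemory.BytesAllocated;

            // Number of memory segments exposed by the adapter
            memset(&queryStatistics, 0, sizeof(D3DKMT_QUERYSTATISTICS));
            queryStatistics.Type = D3DKMT_QUERYSTATISTICS_ADAPTER;  // assumed
            queryStatistics.AdapterLuid = luid;                     // assumed
            if (queryD3DKMTStatistics(&queryStatistics)==0)
            {
                ULONG segmentCount = queryStatistics.QueryResult.AdapterInformation.NbSegments;
                for (ULONG i = 0; i < segmentCount; i++)
                {
                    // Is this segment aperture (shared) or dedicated memory?
                    memset(&queryStatistics, 0, sizeof(D3DKMT_QUERYSTATISTICS));
                    queryStatistics.Type = D3DKMT_QUERYSTATISTICS_SEGMENT;  // assumed
                    queryStatistics.AdapterLuid = luid;                     // assumed
                    queryStatistics.QuerySegment.SegmentId = i;
                    if (queryD3DKMTStatistics(&queryStatistics)==0)
                    {
                        // Windows 7 (Windows 8 and above is aperture = queryStatistics.QueryResult.SegmentInformation.Aperture;)
                        bool aperture = queryStatistics.QueryResult.SegmentInformationV1.Aperture != 0;

                        // How much of this segment does our process use?
                        memset(&queryStatistics, 0, sizeof(D3DKMT_QUERYSTATISTICS));
                        queryStatistics.Type = D3DKMT_QUERYSTATISTICS_PROCESS_SEGMENT;  // assumed
                        queryStatistics.AdapterLuid = luid;                             // assumed
                        queryStatistics.hProcess = ProcessHandle;
                        queryStatistics.QueryProcessSegment.SegmentId = i;
                        if (queryD3DKMTStatistics(&queryStatistics)==0)
                        {
                            // NOTE: the next two statements are cut off after "QueryResult" in this
                            // listing and are left as found. The per-process segment result in the
                            // D3DKMT headers is ProcessSegmentInformation, whose BytesCommitted
                            // field is the likely candidate.
                            if (aperture)
                                sharedBytesUsed += queryStatistics.QueryResult
                            else
                                dedicatedBytesUsed += queryStatistics.QueryResult
                        }
                    }
                }
            }
        }

        // free DX9Ex when done
        if ( pDX ) pDX->Release();
    }

    // free GDI DLL
    if ( gdi32Handle ) FreeLibrary(gdi32Handle);

    // Pass dedicated memory used back to DBP
    Memory = (int)(dedicatedBytesUsed / 1024 / 1024);
    return Memory;
}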