Tuesday, 30 September 2014

CPU Profiling Totally Rocks

Having a great time profiling the FPSC Reloaded Engine with VTune at the moment. I started by switching off every component and running the profiler on a completely empty scene, no terrain, objects, sky, physics, anything. I then had a look at what might be hogging things.  Turns out quite a few things.

Seems the engine would be monitoring ALL the objects, even if they were invisible, for things like animation potential, mesh vertex update potential and other large loops. Completely redundant of course, and it hogged my CPU cycles. By changing them to use shortlists, I would only loop over the objects of interest, and the bottleneck completely disappeared.
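
For anyone curious what a shortlist looks like in practice, here is a minimal C++ sketch. It is an illustration only, not the engine's actual code: the SceneObject fields and the AnimationShortlist class are hypothetical names invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical object record; the engine's real structure is not shown in this post.
struct SceneObject {
    bool visible = false;
    bool animating = false;
    int framesAdvanced = 0;  // stands in for real animation work
};

// Holds the indices of the objects that actually need per-frame animation work.
class AnimationShortlist {
public:
    // Call when visibility or animation state changes, not every frame.
    // The stored indices are only valid until the object list is resized.
    void rebuild(const std::vector<SceneObject>& objects) {
        indices_.clear();
        for (std::size_t i = 0; i < objects.size(); ++i) {
            if (objects[i].visible && objects[i].animating) {
                indices_.push_back(i);
            }
        }
    }

    // Per-frame loop: touches only shortlisted objects and returns how many were visited.
    std::size_t update(std::vector<SceneObject>& objects) const {
        for (std::size_t i : indices_) {
            ++objects[i].framesAdvanced;
        }
        return indices_.size();
    }

private:
    std::vector<std::size_t> indices_;
};
```

The full scan still happens, but only inside rebuild(), so the per-frame cost depends on the number of interesting objects rather than the total object count.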

At first I also thought the huge amount of time spent in NVD3DUM.DLL was something I could optimize, but I think a good engine spends most of its time in here as that is where the CPU is constantly giving things to the GPU to process, which means more frames and faster games. My current guideline is to ensure that the engine always spends more time in this module than it does in the remaining modules, thus ensuring a fast throughput of polygons to the card and zero CPU stalling.

I still have an outstanding issue which is causing a DirectX Error crash due to skipping the texture sort each frame (a MASSIVE hog) but once I track down the specific object(s) responsible I can fix it properly and massage the texture sort system so I am not breaking something elsewhere.

As you can see, by ensuring the texture sort only happens when the overall number of objects in the engine changes (i.e. something got added or removed) I went from 143 to 208 by adding two extra lines of code and a new variable!
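
The "new variable plus a couple of lines" pattern can be sketched roughly like this in C++. This is a hedged illustration, not the engine's real code: DrawItem, TextureSorter and sortIfNeeded are hypothetical names, and sorting by an integer texture id is an assumption about what the sort key looks like.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw record; only the texture id matters for this sketch.
struct DrawItem {
    int textureId;
};

class TextureSorter {
public:
    // Sorts by texture only when the number of items has changed since the last sort.
    // Returns true if a sort was actually performed.
    bool sortIfNeeded(std::vector<DrawItem>& items) {
        if (items.size() == lastCount_) {
            return false;  // nothing added or removed, so skip the expensive sort
        }
        lastCount_ = items.size();
        std::sort(items.begin(), items.end(),
                  [](const DrawItem& a, const DrawItem& b) {
                      return a.textureId < b.textureId;
                  });
        return true;
    }

private:
    std::size_t lastCount_ = 0;  // the "new variable": item count at the last sort
};
```

One limitation of a count-only check is that it will not notice an object whose texture changes in place, or one object being swapped for another in the same frame, so a real implementation may also want a dirty flag for those cases.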

After I've solved the texture sort crash bug, I need to spend some time playing the game and using the editor and features of the engine to ensure I have not broken anything major. Better to fix those now, while I know what code I changed, than a week from now when I won't have a clue.

I won't tease you with my current frame rate gains as they are very subjective but I am happy to report that for every bottleneck I find, and eliminate, the bottom line FPS jumps up.  There is still the unavoidable issue that the engine drops back down to the 40 range when I try to draw a thousand shaded objects, but that is something I plan to tackle separately as it relates back to visuals and how the rendering order and quantity are handled.

Pretty happy with the performance work so far, and my hope is to bring you some solid news of the gains before the week is out.  Until then, watch this space and keep your fingers crossed.

Monday, 29 September 2014

Performance Week

At long last I have finally broken free of visual thoughts and hidden thread bugs to spend a whole week on performance. When not trying to find ways NOT to get distracted with 'other things' I will be using Intel VTune to find all CPU bottlenecks and squash (or redirect) them as much as possible.  I have some major areas I think need work, but I will let the profiler be my guide this week (and my own task list of course).

Before I started this week of speed, I had one last visuals related conference call with one of our former top artists. Taking five minutes away from his cut and thrust industry artist lifestyle, he helped us come up with a plan on how we might balance the lighting system in our engine and create a more impressive final render.  I will only be starting this work mid-October but it's great to have another pair of eyes, attached to a brain and mouth that can talk in the language of code and the language of art. We can only benefit from his help!

I have arrived at the visuals settings that 'I' like, and when we do the light balancing we can come up with something both 'generic and cool'. There was even talk of disabling some of the sliders by default (released in SETUP.INI) so you can be protected from messing up the lighting balance. For example, no good comes from increasing ambience if we can provide the ambience via a skybox cube map (spherical, hemispheric or cube mapping) for the perfect natural lighting (i.e. when you change the sky to blue or red, the ambient contribution is affected by the colors in the sky and is very effective).  More on this when I start my experiments next month.

Experiments for the rest of this month include hunting down and eliminating lazy, bloated or overworked algorithms and teaching them the true meaning of speed.  My personal goal is to go from 35 fps, which is what I am getting from my aged PC and graphics card, to well over 60 fps to achieve smooth and exciting gameplay.  My hope is that I will find a single line of code which is the cause of all the slowdown but I don't think I am going to be that lucky. It's more likely to come from some effective investigation, lateral problem solving and a few hard choices too.  Having been a victim of these slow patches, I am pretty excited to get stuck into the heart of this beast, so I am going to grab a bite to eat now and then set forth on a journey of discovery to slay a few code sloths!

Friday, 26 September 2014

Give Me Light!

Spent an hour and a half talking about the lighting question again, all very necessary stuff but it's slow progress making concrete decisions. You could not find a more subjective topic to talk about, and committing ideas to code requires a very clear understanding of what you want to do, versus what you do not want to happen. After the call I was thrust into a further three hours of experimentation and analysis to attempt to create a situation where the surfaces of objects can be overexposed without using ambient light to artificially brighten other areas of the scene, which had become undesirable. The solution was a new slider value called 'Surface Level' which, much like 'Ambient Level', controls the intensity of the multiplier within the shader, but this time for direct light.  I also added the remaining shader controls to the new static render effect to include shadow intensity and general ambient and surface colors.

While I was coding and testing this, I set my machine off pre-baking The Escape level with ambient occlusion but with a single threaded approach. I did some experimenting last night and discovered that if I completely eliminate the threading code, and run on the main processor thread, I can lightmap the whole scene without any corruption, freezing or crashing. A major clue, as I now know it has something to do with threads competing for use of the same data and getting it royally wrong.

Of course after 2 hours, when the pre-bake was finished, it turns out that by setting the ambient value to 0.0 instead of 0.6, the expensive occlusion effect was lost, as inside buildings there is no light to subtract from. Ah well. Returned it to 0.5 and started the build process again.  The reason I dropped it was to allow the real shaders to fully control ambience, but as I want to have ambient occlusion mapping even where there are no light sources, I have decided to use 0.5 as the base-line and then deduct this value in the shader so I can effectively have negative light to apply the ambient occlusion effect again.
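
To make the arithmetic concrete, here is one way the base-line trick could work, written as plain C++ rather than shader code. This is only a sketch of one plausible reading: the function names, the 0 to 1 occlusion convention, and the way the terms are summed are all assumptions made for illustration. Only the 0.5 base-line comes from the text above.

```cpp
#include <algorithm>

// Base-line ambient used at bake time (the 0.5 value mentioned above).
constexpr float kBakeAmbient = 0.5f;

// Hypothetical bake step: occlusion runs from 0 (fully occluded) to 1 (fully open),
// so the stored ambient value ends up in the range 0 to 0.5.
inline float bakeAmbient(float occlusion) {
    return kBakeAmbient * occlusion;
}

// Hypothetical shader-side step: deducting the base-line turns the baked value
// into a signed term, 0 in the open and down to -0.5 in deep creases.
inline float signedOcclusion(float baked) {
    return baked - kBakeAmbient;
}

// Final brightness: direct light plus the shader's own ambient, darkened by the
// negative occlusion term and clamped so it never goes below zero.
inline float shade(float direct, float shaderAmbient, float baked) {
    return std::max(0.0f, direct + shaderAmbient + signedOcclusion(baked));
}
```

With direct light at 0.25 and shader ambient at 0.25, an open surface comes out at 0.5, a half occluded one at 0.25, and a fully occluded crease at 0. The shader keeps full control of the ambient level while the bake only ever darkens.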

For today, in the spirit of getting things done, I have decided not to continue hunting for the threading bug and instead get the static shaders finished off and move onto in-game performance and continued visual touches. These items are now more important than making the light mapping process faster, but it's still high on the list, just slightly demoted while I get the engine into a state which can allow me to make some decent screenshots everyone is happy with.

Currently agonizing over creating a new shader (which will be almost identical to the entity_basic.fx) to allow normals, specular, fog, etc. but with the addition of an extra UV data chunk and a texture re-shuffle.  Ideally it could all be in one shader but then I would have redundant resources in there on both sides (i.e. secondary UV not required for dynamic entities, occlusion texture not required for static entities). I also like the freedom of being able to tailor the static shader for speed given its static state.  Despite the terrible 'code duplication' it will create I think I will opt for a specific static shader and just cut and paste 94% of the code from entity_basic.fx.

If anyone knows of a good technique to have 'common shader code' which can then be included into the HLSL file, it would make the above concern moot and would significantly clean up my shaders and also reduce the chance of errors creeping in such as typos.

For now I will proceed to create final static shaders, juggle the code to allow the extra textures in there and tie in the HIGHEST to LOWEST settings so they can change the static shader too.  That will then set me up nicely to produce some nice shots this evening, and have the engine ready to do some serious performance profiling with Intel VTune (the core duties of my tasks next week). Would have been nice to share a final render of the combined effects of this work, and maybe I will post one this evening if I am not too zonked, but for now here is me diving back into the land of shaders and putting some of the wires back in the box.

Thursday, 25 September 2014

Monster Crash Freeze

Every now and again, once every few years, you encounter a bug which dwarfs the daily bug, reminding the typical plodding coder that he still has a lot to learn. The bug in this case is one which flickers the whole screen, and then after around 5 seconds freezes everything: apps, processes, mouse, task manager, everything. The only escape is a hard reset and complete reboot of Windows. Now this type of super crash was quite common before processors got to GHz speeds, when you only had one core and a hacky way to do threading.  So much so in fact that when this new monster bug occurred, it was like a blast from the past.  Not a happy journey into nostalgia though, as it is standing squarely in the way of my creating nice looking scenes and fast games.

Actually stayed up until 2AM last night battling with it, to no avail. All my tricks to hunt it down resulted merely in the insight that it is definitely rogue memory writing that is to blame.  The corruption strikes in the object sort list, in the object data themselves and in the vertex buffer manager, so there is no specific target and the damage hits in a different place each execution. Even data based breakpoints failed as the area being monitored would be corrupt the first time but not the second.

Today I had the idea that I should amend the light mapper to use a single main thread and to create smaller light mapping batches with a smaller test level so I can concentrate the bug effect and have a chance of the program using the same address space on repeated runs.  I am now in the middle of this process and hopefully I can then use my data breakpoint trick to find the origin of the rogue who is randomly writing bytes into someone else's memory.  I've also been asked for some top notch light mapping screenshots so it's critical I get this fixed, new scenes designed and shots made before the newsletter comes out.  Wish me luck!

Wednesday, 24 September 2014

I Was Up, Then I Was Down

All the way up to 4PM I did some good work, creating a new task slicer for the lightmapper which will conserve the system memory even for massive levels. It does this by only lightmapping a chunk at a time, and then freeing up the used memory for the next job. I also found and fixed a few major memory leaks in the lightmapper too which helped hugely. Once I moved from my test scene to The Escape level, although it lightmapped fine as a single process, the sliced one now exhibits a very unique bug, which flashes the monitor (not the back-buffer of the app but the whole PC), and after ten seconds the whole machine freezes completely. Have to hard reset to get it back!  Not a fine ending to the day, but I will go through line by line and find the culprit, though it will have to be Thursday now.

I also managed to do an updated trace of all the CPU allocations when running the map editor, and also tacked on the lightmap resource usage from this morning.

Ignore the title, it should have read "CPU System Memory" but you can see the worst offenders are terrain, and the 21% is the memory created during the light mapping process itself (along with some other big LM chunks).  I am sticking with the lightmap resources for now as part of the ambient occlusion mapping work and then I will probably explore the terrain usage after I've done the priority performance tasks on my list.

It's a nice feeling when you finish a day and you've made progress and the software runs better than the day before. On this occasion, progress definitely made, but with a bug that literally forces a PC reset, I don't have that take-home feel good factor.  I may return this evening for a few more hours and at least find and fix the 'freeze my PC' feature, as I don't want that to greet me come Thursday!!

Tuesday, 23 September 2014

Insert Amusing Title Here

A good day of work today with more progress on the lighting system (see the second screenshot below).  Also continued having fun thinking up the new name for the product for our Steam launch.  You may have read the latest forum thread on this but the headline titles for this week have been Hyperion, Scorch, Titan, Breeze and Dark, with the addition of Viper during our call today. We've also been playing with dropping 'Game Creator' in favor of 'Maker', 'Kit' and 'Engine'. Give me coding any day, thinking up new names is HARD!  The hunt for the perfect name continues.

The above is what I started with this morning, with the obvious issue being the total shadow under the fallen fence gate. This was due to the light mapper not detecting semi-transparent textures in the process.

My task today was to get my shadows looking prettier, adding ambient occlusion back in, allowing semi-transparent textures to cast semi-transparent shadows, ensuring the whole Escape level can be processed, solving the edge artifacts and making it all blend together.  Apart from wasting four hours finding out why D3DXLoadSurfaceFromSurface was not working, everything else went smoothly. For DX coders out there, I must share the solution. It seems you cannot use D3DXLoadSurfaceFromSurface to copy a compressed video memory texture to an uncompressed system memory texture. Any combination of the above fails. What you have to do is create a system memory texture and directly load the compressed texture into it using D3DXLoadSurfaceFromFile. Hopefully this little pearl can save you four hours of your life one day!

As you can see, the new lighting system brings out the depth of the building features and adds subtle shadows where required. It only costs a few extra frames and replaces the very crude LOWEST shadows, but gives a much higher resolution shadow, and even here the resolution of the lightmaps has been limited to 512 pixels wide. This can be increased to 2048 or even 4096, that is, once I have solved the management of the memory used by the light mapper. You will find, though, that most games keep the pre-baked lightmapping relatively low resolution and subtle, mainly to conserve the aforementioned memory. I have not yet batched the static geometry, which might win those frames back and most likely gain some too.

Also bear in mind the above buildings and objects are not using any normals, specular or other per pixel refinement, just basic diffuse + lightmap. Hopefully when I add in specular and normals, the flattish surfaces will bump a little to create some more fidelity, or perhaps they will be too subtle and can be left out of all but the closest objects. We will see.

Next on my list is to squash the whole lightmapping process so it does not take up quite so much system memory.  It was originally written and tested against small objects, not whole levels and as a result the implementation proceeds to create allocations for the entire process at the start, AND creates more memory on the fly as it goes. Pretty hungry now, and it becomes positively ravenous when you increase the light mapping resolution quality.  My initial idea is to break the job up into 200MB or so of work, process that, then move onto the next 200MB using the previously freed memory. Might add a few seconds of set up to the whole thing, but allows light mapping to happen inside Test Game which is ideally where I want it.  Before that however, I shall spend some of Wednesday tracing through the 1338 static objects from The Escape level and investigate why they are collectively eating 500MB of system memory. It might be perfectly acceptable when you consider the addition of collision geometry for the ray caster, holding areas for the light accumulation buffers and the system memory copies of the transparent textures, but it's always worth checking out memory allocations of that magnitude.  It also means I am one step closer to starting my performance work, which I am very much looking forward to!
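
The 200MB batching idea above can be sketched as a simple greedy slicer in C++. This is an assumption-laden illustration, not the lightmapper's real code: LightmapJob, its estimatedBytes field and sliceJobs are hypothetical names, and it presumes a memory estimate is available for each object before processing starts.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical unit of work: one object to lightmap plus an estimate of the
// working memory it needs while being processed.
struct LightmapJob {
    std::size_t estimatedBytes;
};

// Greedily groups consecutive jobs into batches whose summed estimates stay
// within budgetBytes. A single job larger than the budget gets its own batch.
// Each batch is a list of indices into the jobs vector.
std::vector<std::vector<std::size_t>> sliceJobs(const std::vector<LightmapJob>& jobs,
                                                std::size_t budgetBytes) {
    std::vector<std::vector<std::size_t>> batches;
    std::size_t used = 0;
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        // Start a new batch when there is none yet or the current one would overflow.
        if (batches.empty() || used + jobs[i].estimatedBytes > budgetBytes) {
            batches.emplace_back();
            used = 0;
        }
        batches.back().push_back(i);
        used += jobs[i].estimatedBytes;
    }
    return batches;
}
```

The caller would then process one batch, free everything it allocated, and move on to the next, so peak memory stays near the budget instead of growing with the size of the level.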

Monday, 22 September 2014

Good Day

A pretty good start to the week, having solved the 10 fps issue that struck last week. Turns out it was a batch of animating objects being forced into being static for the lightmapper but still trying to update the vertex buffer each cycle which made the DX pipeline crawl. Added code to convert such objects and extract the animation properties of these newly created static objects and the speed returned.

My main job for today was to create terrain shadows as part of the light mapping process and as of 5PM I have achieved it. The shot above was an early shot of the prototype when I introduced the terrain geometry to the process and as you can see those shadow tears are a real eyesore. After much tweaking I was able to solve it, but for the sake of expediency I have switched off the ability of the terrain to cast its own shadows in the light map process. I am happy that they will do this eventually but my mission was to have the buildings cast shadows on the floor, and The Escape level now has floor shadows!

I still have a few things to do, such as re-activate the ambient occlusion mode and add transparent textures to the light mapping so that vegetation textures can let light through and exaggerate creases.  I also want to confirm I have solved the small artifacts that appeared on the first buildings I light mapped.

I also want to experiment with multiple colored static lights, especially for interior scenes as I think this is where the lightmapper will come into its own, and I look forward to bringing you some screen shots of inside the buildings when my experiments are complete.

Soon I will be straying into performance territory, so I just want to make sure the visuals are where I need them for the time being and then ramp up the work to ensure I can get The Escape running at well over 60 fps on my aging machine with its mid-range graphics card.  I also have Intel VTune set up and ready to roll now so I should have the tools I need when the time comes.