Media Molecule

Spending Your Cache Wisely

August 30th, 2011

Back in the day, computers were much simpler things. They had a CPU and they had some memory. Getting data to/from memory took a fairly reliable amount of time, and when optimising you’d just try to make sure that memory speed wasn’t the part of the pipeline that slowed your application down. Today, however, things have changed, primarily because CPUs have gained speed at a much faster rate than memory has. Where a CPU might get 10x faster over a few years, the memory it’s paired with might only get 3x faster, making memory a much bigger culprit for slowing games down…
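
(A quick illustration of the effect – a sketch, not from the post: both loops below do the same number of additions, but the second one only uses one int from each 64-byte cache line per pass, so it ends up dragging the whole array through memory 16 times instead of once.)

#include <cstdio>
#include <cstdlib>
#include <ctime>

int main() {
    const int N = 1 << 24;                          // 16M ints, ~64MB: much bigger than any cache
    int* data = (int*)calloc(N, sizeof(int));
    long long sum = 0;

    clock_t t0 = clock();
    for (int i = 0; i < N; ++i) sum += data[i];     // sequential: every byte of each cache line gets used

    clock_t t1 = clock();
    for (int i = 0; i < 16; ++i)                    // same number of additions in total...
        for (int j = i; j < N; j += 16)             // ...but each pass reads one int per cache line,
            sum += data[j];                         //    so the 64MB array is streamed 16 times

    clock_t t2 = clock();
    printf("sequential: %ldms, strided: %ldms (sum=%lld)\n",
           (long)((t1 - t0) * 1000 / CLOCKS_PER_SEC),
           (long)((t2 - t1) * 1000 / CLOCKS_PER_SEC), sum);
    free(data);
    return 0;
}
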


Posted by: ChrisC

Categories: Code

Alex recently voyaged over land and sea to attend Siggraph 2011 in Vancouver. There he gave a talk entitled ‘Two Uses of Voxels in LittleBigPlanet2’s Graphics Engine’ as part of the Advances in Real-Time Rendering in 3D Graphics and Games course.

Oooooh!

The slides from that talk are now available to interested people, along with some other talks on the same subject from Bungie, Crytek, EA and DICE!

Posted by: Spaff

Categories: Code

Making use of Amazon for our server hosting gives us access to a host of neat features that allow us to do good stuff with our server environments.

We make use of a feature in Amazon called EBS (Elastic Block Store), which you can think of as a hard drive that exists on the network. While it is true there has been a lot of doom and gloom surrounding EBS devices (http://aws.amazon.com/message/65648/), they can give you some advantages if you are careful how you use them.

One of those advantages is the ability to create snapshots and then create new EBS volumes based on those snapshots. Snapshots are quite quick to take, get saved in S3 (Amazon’s Simple Storage Service) and protect the data for long-term durability. Using these techniques you can quickly copy data from one environment to another, for example from your production environment to staging. And while snapshots are *not* a backup solution by themselves, you can snapshot a volume, create a new volume from the snapshot, attach it to a machine and take a backup from that volume. That lets you back up from a certain point in time without affecting the running of the main server (depending on your applications and technologies this may not be appropriate, and you might need to take extra steps to ensure you get a consistent data backup).


Posted by: Gavin

Categories: Code
Tags: amazon aws ebs

The other day I was writing some code that needed to build up a list of strings in a large file, and identify any duplicates that occurred. On average, the file would contain about 10000 unique strings, and I’d be looking at finding at least 500 duplicates per string. That means that throughout the parsing of my file, I’d have to add about 10000 strings to the list, and probably do about 5000000 lookups on them. This needed to be a fast process…

This blog entry is about an awesome trick that Mr Alex Evans taught me when I went and asked him for advice about this problem. It’s such a good lesson I thought I’d write it up and shove it on the blog. I’d come up with various ways I could approach this – hash tables, maps, binary trees, specialised string trees – and couldn’t pick how to attack it. His advice was that if an algorithm can do anything at all that you don’t need it to do, it’s probably slower than it needs to be.

A perfect solution to a problem will do exactly what it needs to – no more, no less. That way you know that even if the code isn’t perfectly optimised, it’s not spending time doing things it doesn’t need to. It occurred to me that even nature follows this approach – a perfectly evolved creature for some scenario will be exactly as good as it needs to be at things – anything extra is a waste of valuable resources. Cheetahs don’t run at 150mph because 70mph is enough! Clearly it’s a good way of thinking about things, so I set about applying this approach to my string searching.

First, my requirements:

  • I only ever add to the list – I never remove
  • I never need to extract any sorted data from it or anything fancy like that
  • I never need to iterate over the list
  • I’m going to be doing on average 500 times more lookups than I am adds, and the majority are likely to be successful (i.e. they will find an entry)
  • The strings themselves will always be in memory (as they’re loaded when I load the file), so they don’t need copying when adding
  • I have loads and loads of memory to burn!


I started off with a std::map, which could do:

  • Adds, gets, deletes
  • Any sort of types for the key and value, with any sort of comparison function
  • Easy iteration of the list
  • Easy extraction of sorted data from the list
  • Automatically manages memory allocation

Clearly the map does way too much – it supports lots of operations I don’t need, and can deal with any arbitrary types, whereas I just need a null-terminated string for a key and a single unsigned int for the value. To cut a long story short, after several rounds of following Alex’s advice, I found myself with the following (there’s a rough code sketch after the list):

  • A custom hash table – uses a very simple 64 bit hash algorithm on the string, then uses the bottom 20 bits as an index into a table
  • Each entry in the table is a list of strings, their full 64 bit hash, their length, and their corresponding unsigned int.
  • This means my lookups can identify mismatches very fast – only in the event of 2 strings having exactly the same hash and length will I need to do an actual strcmp to verify they’re identical. The odds are I’ll end up doing 1 strcmp per lookup
  • All memory in the table is allocated linearly in one big mega chunk (which gets bigger if it really has to). This avoids lots of dynamic allocation, but prevents me being able to delete things.
  • Specialised to deal with strings + unsigned ints – doesn’t handle arbitrary class types at all
  • It’s only 150 lines of code
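
Here's a rough sketch of the shape of the thing – not the actual code, just an illustration of the approach. FNV-1a stands in for "a very simple 64 bit hash", and a std::deque stands in for the big linear chunk (it keeps entry pointers stable as it grows):

#include <cstdint>
#include <cstring>
#include <deque>
#include <vector>

struct StringTable {
    struct Entry {
        uint64_t hash;       // full 64 bit hash, checked before any strcmp
        uint32_t length;
        const char* str;     // points at the string already sitting in the loaded file
        unsigned int value;
        Entry* next;         // chain for the rare bucket collision
    };

    enum { kBuckets = 1 << 20 };                 // bottom 20 bits of the hash index the table
    std::vector<Entry*> buckets;
    std::deque<Entry> pool;                      // grows without moving existing entries

    StringTable() : buckets(kBuckets, nullptr) {}

    static uint64_t Hash(const char* s, uint32_t len) {
        uint64_t h = 14695981039346656037ull;    // FNV-1a, 64 bit
        for (uint32_t i = 0; i < len; ++i) { h ^= (unsigned char)s[i]; h *= 1099511628211ull; }
        return h;
    }

    // Finds 'str', or adds it with 'value' if it isn't there; 'found' tells the caller which happened.
    Entry* FindOrAdd(const char* str, unsigned int value, bool& found) {
        uint32_t len = (uint32_t)strlen(str);
        uint64_t h = Hash(str, len);
        Entry*& head = buckets[h & (kBuckets - 1)];
        for (Entry* e = head; e; e = e->next) {
            // mismatches bail out on hash or length; strcmp only happens when both agree
            if (e->hash == h && e->length == len && memcmp(e->str, str, len) == 0) {
                found = true;
                return e;
            }
        }
        found = false;
        Entry entry = { h, len, str, value, head };
        pool.push_back(entry);
        head = &pool.back();
        return head;
    }
};

Parsing the file is then one FindOrAdd call per string, and 'found' tells you whether it was a duplicate.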

The result – parsing my file and doing all the string lookups goes from 22.5s to 1.8s.
Anyway, the moral of the story and the lesson this blog post is all about:

If an algorithm is capable of more than it needs to be, it’s probably doing more work than it needs to!

lately I’ve been experimenting with a trick to save space on 64 bit architectures: rather than using pointers (which take 8 bytes of memory! 8! bytes! omg!), I’m using 32 bit indices into a single, gigantic 32 gig block I allocate at startup. as in:

u64 *bigblock=(u64*)malloc(32ull*1024*1024*1024); // at startup. or choose a number smaller than 32 :)

struct foo { int x, y, whatever1, whatever2; };

// later…

u32 myfoo=MyAllocatorForBigBlock(sizeof(foo)); // using a custom allocator of your choice…

((foo*)(bigblock+myfoo))->x=100;

((foo*)(bigblock+myfoo))->y=200;

with some macro goodness, or in C++, a template class that looks a bit like a smart pointer but is in fact not smart, those casts I made explicit above go away and it can be made to look really neat:

P<foo> myfoo=P<foo>::Alloc(); // P<foo> is really a 32 bit index just like before.

myfoo->x=100;myfoo->y=200; // the overloaded -> operator does the bigalloc+index thing for us

The indices in my case assume an 8 byte stride, ie all allocations happen on 8 byte boundaries (that’s why bigblock is a u64*), so you can address 4*8=32 gigs with 32 bit indices. For an in-memory database (my use case, as it happens), that’s plenty. I don’t want it swapping anyway, and 32 gig machines are reasonable these days. The compiler seems to do the right thing and generate quite efficient code (eg it keeps bigblock in a register (ecx), puts indices in say edx and does things like mov [ecx+edx*8+4],200)
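
In case it's useful, here's roughly what that not-so-smart pointer class could look like – a sketch only, with a trivial bump allocator standing in for whatever real allocator you'd use (the names P, AllocSlots and nextfree are just for illustration; bigblock is the block allocated above):

#include <cstdint>
#include <cstddef>

typedef uint64_t u64;
typedef uint32_t u32;

extern u64* bigblock;      // the one giant block, malloc'd once at startup (see above)
static u32 nextfree = 1;   // bump-allocator cursor in 8-byte slots; slot 0 reserved as 'null'

static u32 AllocSlots(size_t bytes) {
    u32 at = nextfree;
    nextfree += (u32)((bytes + 7) / 8);          // round up to the 8 byte stride
    return at;
}

template <typename T>
struct P {
    u32 idx;                                     // 4 bytes instead of an 8 byte pointer

    static P Alloc() { P p; p.idx = AllocSlots(sizeof(T)); return p; }

    // operator-> does the bigblock+index arithmetic for us
    T* operator->() const { return (T*)(bigblock + idx); }
    T& operator*()  const { return *(T*)(bigblock + idx); }
};

The nice part is that P<foo> really is nothing but a 32 bit index with an operator-> bolted on, so structures full of them are half the size they'd be with real pointers.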

another cute side benefit is that all the pointers in your system, now that they’re indices from a base address, are sort of ‘position independent’… you can mmap() or fwrite() the entire bigblock to and from disk, and next time you boot, it doesn’t matter if the base address moves, it all just works. no need for pointer fixups or anything like that.  win!
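
Continuing the sketch above (error handling omitted), save and restore really is just:

#include <cstdio>

void Save(const char* path) {
    FILE* f = fopen(path, "wb");
    fwrite(&nextfree, sizeof(nextfree), 1, f);    // how many 8-byte slots are in use
    fwrite(bigblock, sizeof(u64), nextfree, f);   // the data itself, full of 32 bit 'pointers'
    fclose(f);
}

void Load(const char* path) {
    FILE* f = fopen(path, "rb");                  // bigblock may live at a new address now -
    fread(&nextfree, sizeof(nextfree), 1, f);     // doesn't matter, nothing needs fixing up,
    fread(bigblock, sizeof(u64), nextfree, f);    // every 'pointer' is just an offset from it
    fclose(f);
}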

kinda makes 64 bit linux/windows feel a bit more like a nice embedded system with a predictable memory map and nice small 4 byte pointers. lovely!

Posted by: Alex

Categories: Code
Tags: 32 bit 64 bit indices

Untitled Document Syndrome

January 28th, 2010

this post is pretty on the money - but it goes further. how many times have you been forced to name a project before you can start? too many. littlebigplanet, the multi-million-selling product, still has the project name ‘ps3test1’ - yes, it really can trace its roots back to the first code we ever wrote on ps3. at the time I named that, I had *no* idea that it would grow into the final thing. oh, and the executable filename? pc.exe (windows) and pc.self (ps3). confusing!

on the other hand, when I start projects and choose a cool sounding name, oh I dunno, like “RoboEngine”, you can pretty much guarantee that it will fail. it seems that the less well considered the filename, the more successful the content. at least for me…

Posted by: Alex

Categories: Code

At Siggraph 2009, Natasha Tatarchuk invited several games industry graphics types to give a course on ‘advances in real-time rendering in 3d graphics and games’. Follow the link for lots of lovely stuff, including a set of slides about graphics techniques used in LBP.

Posted by: Alex

Categories: Code
Tags: rendering siggraph

c trick

January 27th, 2010

I love indexing constant strings, I’m sad like that.

for example,

char c="0123456789abcdef"[b&15];

converts the low nybble of byte b into its hex character.

oldie, but goldie.

Posted by: Alex

Categories: Code
Tags: c tips

Quote of the moment

January 26th, 2010

“goto considered harmful? rubbish - it’s fine”

— Anton Kirczenow

Posted by: Alex

Categories: Code

Luscious luscious fluids

January 26th, 2010

MMM so tasty!

http://people.cs.ubc.ca/~rbridson/

Posted by: Anton

Categories: Code