(here’s part 2 and part 3 if you’d rather skip ahead)
(full sample code through part 3 is here)
First, before I even begin, a hat tip to Ed at roaminggamer.com who has already presented benchmarking code and results regarding loop performance. I’ll have to cover a bit of the same ground here just to be thorough, and as a point of reference so that we can explore further.
(Also note that there are several Lua-related sites out there with performance tips. Much is already known on the topic, so it's worth a quick search.)
I’m going to make this a quickie, just to lay the foundation, as it’s stuff we’re pretty sure we already “know”, but with performance you should never take anything for granted. Here’s the benchmarking code I’ll use:
```lua
local function bench(fn, reps)
  assert(type(fn)=="function", "bench() - function required")
  assert(type(reps)=="number" and reps>0, "bench() - reps > 0 required")
  -- for i=1,60 do fn() end -- see note below
  local timer = system.getTimer
  local started = timer()
  for i=1,reps do fn() end
  local finished = timer()
  return finished - started
end
```
The first thing we should do is “benchmark the benchmark”. That is: time a “do-nothing” loop to see what overhead is present in the benchmarking framework itself. This result can essentially be treated like a constant once determined for any given platform — it’s the time required to execute all the “invariant” portions of any future test.
```lua
-- how many times to repeat bench
-- should be relatively large to mask timer fluctuations
local NREPS = 10000
-- ideally this result will be small:
local baseline = bench(function() end, NREPS)
print("baseline", baseline)
```
That baseline number is hopefully small enough that you can just ignore it. If not, you should subtract it from any future results, as it is just “overhead” of the testing process.
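As a rough sketch of that subtraction: the helper below (`benchNet` is a made-up name, not part of the original code) times a function, removes the baseline, and also reports a per-call figure. It assumes a fallback to `os.clock()` when Corona's `system.getTimer` isn't available, purely so the snippet runs in a plain Lua interpreter too.

```lua
-- Timer in milliseconds: Corona's system.getTimer when present,
-- otherwise os.clock() (CPU seconds) scaled to ms. The fallback is an assumption.
local timer = (system and system.getTimer) or function() return os.clock() * 1000 end

-- compact version of bench(), without the argument checks
local function bench(fn, reps)
  local started = timer()
  for i = 1, reps do fn() end
  return timer() - started
end

-- hypothetical helper: net time in ms after removing loop overhead,
-- plus the net time per call
local function benchNet(fn, reps, baseline)
  local net = bench(fn, reps) - baseline
  if net < 0 then net = 0 end -- timer noise can push a result below the baseline
  return net, net / reps
end

local NREPS = 10000
local baseline = bench(function() end, NREPS)
local total, perCall = benchNet(function() local x = math.sqrt(2) end, NREPS, baseline)
print("net ms", total, "ms per call", perCall)
```

Clamping at zero is a judgment call; for very cheap functions the net figure is mostly noise anyway, which is a hint to raise the rep count.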
Note that a timer that reports in milliseconds need not actually resolve to a single millisecond. That's why tests should be repeated many times, to help "drown out" any such noise. (And, of course, you also need to repeat many times just to get a fast operation beyond the realm of microseconds.)
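If you're curious how coarse your timer really is, here's a minimal sketch that spins until the reading changes and keeps the smallest step it sees. `timerStep` is a hypothetical helper, and the `os.clock()` fallback is again only an assumption for running outside Corona. On a fine-grained timer the result mostly reflects the cost of calling the timer itself, so treat it as an upper bound rather than a spec.

```lua
-- Timer in milliseconds: Corona's system.getTimer when present,
-- otherwise os.clock() scaled to ms. The fallback is an assumption.
local timer = (system and system.getTimer) or function() return os.clock() * 1000 end

-- hypothetical helper: smallest difference seen between two distinct readings
local function timerStep(samples)
  local smallest = math.huge
  for i = 1, samples do
    local t0 = timer()
    local t1 = timer()
    while t1 == t0 do t1 = timer() end -- spin until the reading changes
    local step = t1 - t0
    if step < smallest then smallest = step end
  end
  return smallest
end

local step = timerStep(20)
print("smallest observed timer step (ms)", step)
```

Whatever number comes out, the total time of a benchmark run should be many multiples of it before the result means much.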
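Note that Corona runs the standard Lua 5.1 interpreter (JNLua is the Java bridge it uses on Android), not LuaJIT, so in theory we shouldn't need to prime the loop for a JIT compiler. That's why the priming line in bench() is commented out. But if you're reading this and using something other than Corona, and might be using LuaJIT, then you might want that priming loop in there. By default, LuaJIT treats a loop as "hot" and starts considering compilation after 56 iterations (its hotloop setting), so ~60 ought to do it. You wouldn't want a timing that mixes interpreted and compiled runs.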
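One way to avoid editing the benchmark per platform is to prime only when a JIT is actually present. The sketch below assumes LuaJIT's global `jit` table and its `jit.status()` call (whose first return value says whether the compiler is on). The `WARMUP` constant and the `os.clock()` fallback are my own assumptions, and whether ~60 calls is really enough will depend on the code under test.

```lua
-- Timer in milliseconds: Corona's system.getTimer when present,
-- otherwise os.clock() scaled to ms. The fallback is an assumption.
local timer = (system and system.getTimer) or function() return os.clock() * 1000 end

local WARMUP = 60 -- just past LuaJIT's default hotloop threshold of 56

-- true only under LuaJIT with the compiler switched on;
-- on plain Lua the global `jit` is nil and this returns false
local function jitActive()
  return type(jit) == "table"
    and type(jit.status) == "function"
    and jit.status() == true
end

local function bench(fn, reps)
  assert(type(fn) == "function", "bench() - function required")
  assert(type(reps) == "number" and reps > 0, "bench() - reps > 0 required")
  if jitActive() then
    for i = 1, WARMUP do fn() end -- prime before the clock starts
  end
  local started = timer()
  for i = 1, reps do fn() end
  return timer() - started
end

print("jit active", jitActive(), "baseline", bench(function() end, 10000))
```

On Corona this behaves exactly like the original bench(), since the priming branch is never taken.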
That’s all for now, just a set-up for the next part(s). By the end, I’ll demonstrate a performant method of looping over a table of objects that might not be obvious.