ROBOT-SB dev blog – get/set performance

Performance. Benchmarking. Blech! There is perhaps no more controversial subject. But I’ve got a little tidbit of info to share on the subject, so let’s dive right in, shall we?

Obligatory Disclaimer

There’s a wealth of information out there on the dangers of optimizing, and why you shouldn’t bother unless you’re sure it matters. In fact, you might even make things worse if you just operate on assumptions rather than actual benchmarking.

So my intent in sharing this little tip is NOT to suggest that anyone should just use it blindly. Rather, just file it away for possible consideration in case you’re faced with an actual scenario where benchmarking has pointed to this specific problem.

Also: This whole discussion only applies to the current build of Corona SDK as of this writing, its specific implementation of internal data structures, and its specific version of Lua (5.1.5 as of this writing). Any of that could change without warning, at which point this discussion may be rendered useless.

Get/Set Performance of Display Objects

It’s not widely documented or well explained, but accessing the properties of a Corona DisplayObject is a more intensive operation than accessing a simple Lua table element. For example, compare a get or set on a field of a plain Lua table with the same get or set on a property (such as x) of a DisplayObject.
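A minimal sketch of the two kinds of access being compared, assuming a rectangle created with `display.newRect`. The fallback to a plain table is only there so the snippet also runs outside Corona, where the `display` global does not exist.

```lua
-- Plain Lua table: a get and a set are ordinary hash-table operations.
local t = { x = 0 }
local tx = t.x        -- table get
t.x = tx + 1          -- table set

-- Corona DisplayObject: the same syntax, but each access goes through
-- the object's wrapper instead of a raw table lookup.
-- Assumption: outside Corona there is no `display` global, so a plain
-- table stands in here purely to keep the sketch runnable.
local rect = display and display.newRect(0, 0, 10, 10) or { x = 0 }
local rx = rect.x     -- DisplayObject get
rect.x = rx + 1       -- DisplayObject set
```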

But don’t just take my word for it, benchmark it for yourself, on your own devices.
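If you'd rather roll your own test than use the linked source, something along these lines would do as a starting point. It is only a sketch: `timeIt`, `N`, and the labels are names made up for illustration, and it assumes `os.clock` is an acceptable timer on your target.

```lua
local N = 100000  -- assumed iteration count; raise it until timings are stable

-- Hypothetical helper: runs fn(N) once and reports elapsed CPU time in ms.
local function timeIt(label, fn)
  local start = os.clock()
  fn(N)
  local ms = (os.clock() - start) * 1000
  print(string.format("%-12s: %.3f", label, ms))
  return ms
end

local t = { x = 0 }
-- Assumption: outside Corona a plain table stands in for the display object,
-- in which case both pairs of timings will of course look alike.
local obj = display and display.newRect(0, 0, 10, 10) or { x = 0 }

-- An empty loop first, so the cost of the loop itself is known.
local overhead = timeIt("OVERHEAD", function(n) for i = 1, n do end end)
local tableGet = timeIt("TABLE GET", function(n) local v; for i = 1, n do v = t.x end end)
local tableSet = timeIt("TABLE SET", function(n) for i = 1, n do t.x = i end end)
local objGet = timeIt("DISPOBJ GET", function(n) local v; for i = 1, n do v = obj.x end end)
local objSet = timeIt("DISPOBJ SET", function(n) for i = 1, n do obj.x = i end end)
```

Each loop lives inside the timed function so that no per-iteration function call is added on top of the access being measured.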

Get the entire source code here.

Results on a Nexus 7 2012:

I/Corona ( 540): Platform: Nexus 7 / ARM Neon / 5.1.1 / NVIDIA Tegra 3 / OpenGL ES 2.0 14.01003 / 2018.3205 / English | US | en_US | en
I/Corona ( 540): OVERHEAD : 29.62400000
I/Corona ( 540): TABLE GET : 38.23500000
I/Corona ( 540): TABLE SET : 31.54000000
I/Corona ( 540): TABLE GET/SET INC : 41.15600000
I/Corona ( 540): DISPOBJ GET : 156.66800000
I/Corona ( 540): DISPOBJ SET : 905.53800000
I/Corona ( 540): DISPOBJ GET/SET INC : 1193.00400000
I/Corona ( 540): DONE

Results on a Nexus 7 2013:

I/Corona (13231): Platform: Nexus 7 / ARM Neon / 5.0.2 / Adreno (TM) 320 / OpenGL ES 3.0 V@95.0 AU@ (GIT@Ia6306ec328) / 2018.3205 / English | US | en_US | en
I/Corona (13231): OVERHEAD : 15.04500000
I/Corona (13231): TABLE GET : 19.25700000
I/Corona (13231): TABLE SET : 20.99700000
I/Corona (13231): TABLE GET/SET INC : 25.84800000
I/Corona (13231): DISPOBJ GET : 156.92200000
I/Corona (13231): DISPOBJ SET : 1112.64000000
I/Corona (13231): DISPOBJ GET/SET INC : 1472.83900000
I/Corona (13231): DONE

Actual results will, of course, vary from device to device. Note, for example, that my poor Nexus 7 2013 has been so heavily used that its overall performance has deteriorated, to the point where my 2012 outperforms it on the DisplayObject tests here, at least.

Still, as a rough generalization, and apparently regardless of the specific device, a “get” on a DisplayObject takes about 5X longer than a simple table access, while a “set” on a DisplayObject takes about 30X longer than a simple table access.
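For anyone who wants to check the arithmetic, the ratios can be worked out straight from the logged timings. This sketch uses the raw figures from the two logs above, without subtracting the loop overhead; the table layout and field names are made up for illustration.

```lua
-- Raw timings copied from the two logs above (loop overhead not subtracted).
local runs = {
  { device = "Nexus 7 2012", tableGet = 38.235, tableSet = 31.540,
    objGet = 156.668, objSet = 905.538 },
  { device = "Nexus 7 2013", tableGet = 19.257, tableSet = 20.997,
    objGet = 156.922, objSet = 1112.640 },
}

for _, r in ipairs(runs) do
  r.getRatio = r.objGet / r.tableGet   -- DisplayObject get vs. table get
  r.setRatio = r.objSet / r.tableSet   -- DisplayObject set vs. table set
  print(string.format("%s: get %.1fX, set %.1fX", r.device, r.getRatio, r.setRatio))
end
-- prints:
-- Nexus 7 2012: get 4.1X, set 28.7X
-- Nexus 7 2013: get 8.1X, set 53.0X
```

On these figures the 2012 run lands near the quoted multiples, while the 2013 run comes out higher, so treat the multiples as ballpark numbers rather than constants.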

The root of the matter is that DisplayObjects are fundamentally different things than plain-old Lua tables. All the extra work necessary to drill through the DisplayObject table wrapper, through its metatable, into the proxied internal userdata representation of the display object, then push the result back onto the stack for Lua, adds up to a measurable performance difference.

(I don’t claim to know the exact internal implementation of Corona SDK, but it doesn’t matter – all that really matters is that there is a measurable difference between the two different types of access.)
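To get a feel for why a wrapped object costs more, here is a toy proxy in pure Lua. To be clear, this is NOT Corona's implementation, just an illustration of the general idea; `newProxy` and `hidden` are hypothetical names, and a plain table stands in for the internal userdata.

```lua
-- Hypothetical stand-in for a display object: the visible table is empty,
-- and every get/set is routed through metamethods to hidden storage.
local function newProxy(initial)
  local hidden = {}                       -- stands in for the internal userdata
  for k, v in pairs(initial) do hidden[k] = v end
  return setmetatable({}, {
    __index = function(_, key) return hidden[key] end,
    __newindex = function(_, key, value) hidden[key] = value end,
  })
end

local plain = { x = 0 }
local proxy = newProxy({ x = 0 })

plain.x = plain.x + 1   -- direct hash lookup and store
proxy.x = proxy.x + 1   -- one __index call, then one __newindex call
```

Both lines look identical at the call site, but the proxy version runs two extra function calls per statement, and a real engine has further work to do on top of that.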

Reducing Accesses to DisplayObject Properties

So, the first part of the tip is just to reduce access to a DisplayObject’s properties to the extent you can, using “obvious” techniques.

At this stage, however, we need something more “practical” to benchmark than just raw access as above. The results above are interesting as indicators of where performance might be gained, but are too simplistic to be of direct use.

In other words, it’s all well and good to claim that some micro-statement performs better than another, but how then to apply it practically? Thus, it’s useful when benchmarking to test something of just-enough complexity (without going overboard and confusing what’s being tested) to actually reveal whether one “whole solution” is better than another.

So, let’s consider the following code which is intended to “bounce” an object around the screen borders:
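A minimal sketch of what such a bounce update might look like, assuming per-object `dx`/`dy` velocity fields. The function name `bounceDirect` and the bounds constants are made up for illustration.

```lua
-- Hypothetical screen bounds; in Corona these could come from
-- display.contentWidth and display.contentHeight.
local MIN_X, MAX_X, MIN_Y, MAX_Y = 0, 320, 0, 480

-- Moves rect by its (dx, dy) velocity and reflects it off the borders,
-- reading and writing rect.x / rect.y directly at every step.
local function bounceDirect(rect)
  rect.x = rect.x + rect.dx
  rect.y = rect.y + rect.dy
  if rect.x < MIN_X then
    rect.x = MIN_X
    rect.dx = -rect.dx
  elseif rect.x > MAX_X then
    rect.x = MAX_X
    rect.dx = -rect.dx
  end
  if rect.y < MIN_Y then
    rect.y = MIN_Y
    rect.dy = -rect.dy
  elseif rect.y > MAX_Y then
    rect.y = MAX_Y
    rect.dy = -rect.dy
  end
end
```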

That’s a lot of get/set operations directly on rect.x and rect.y, which could be reduced by instead doing something like:
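A sketch of the aliased version, under the same assumptions (hypothetical bounds constants, `dx`/`dy` fields on the object, and a made-up function name `bounceAliased`). Each of `rect.x` and `rect.y` is read once and written once per update, with all the bounce logic done on locals in between.

```lua
-- Hypothetical screen bounds, as in the direct version.
local MIN_X, MAX_X, MIN_Y, MAX_Y = 0, 320, 0, 480

local function bounceAliased(rect)
  local dx, dy = rect.dx, rect.dy
  local x, y = rect.x + dx, rect.y + dy   -- the only gets of rect.x / rect.y
  if x < MIN_X then
    x = MIN_X
    dx = -dx
  elseif x > MAX_X then
    x = MAX_X
    dx = -dx
  end
  if y < MIN_Y then
    y = MIN_Y
    dy = -dy
  elseif y > MAX_Y then
    y = MAX_Y
    dy = -dy
  end
  rect.x, rect.y = x, y                   -- the only sets of rect.x / rect.y
  rect.dx, rect.dy = dx, dy
end
```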

Note that it isn’t strictly necessary to alias “rect.dx” here, as that access isn’t necessarily a problem – Lua’s hash-table access performance is really good. So, while repeated use of any table element might benefit from local aliasing (and this is a common performance technique in Lua), it is particularly true for DisplayObject properties (the topic herein).

Is It Worth The Trouble?

Well, that’s for benchmarking to decide, of course! If it’s just a single DisplayObject, then who cares? There’s not enough difference here to add up to anything measurable.

But if you have, say, a thousand such objects, each updating 60 times per second, and you could perhaps also use those already-aliased values for collision detection or something else, then the cumulative effect might add up to something worth caring about.

Further Reducing Accesses to DisplayObject Properties

Now we come to the “non-obvious” (perhaps) techniques.

It turns out that the difference between plain-old table access and DisplayObject property access is so large in relative terms (again, roughly 5X for get, 30X for set) that there exists an opportunity to “spoof” or “proxy” the DisplayObject’s properties with plain-old table elements, and still come out a winner.

That was perhaps confusing; let me try to explain with code instead. Taking the former “bounce” example, we can eliminate the “get” of rect.x|y entirely by creating and maintaining our own local copies of x|y instead (calling them “xp” and “yp”):
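A sketch of the “xp”/“yp” variant, assuming the two fields are seeded once from the real position before the first update. The function name `bounceSpoofed` and the bounds constants are made up for illustration.

```lua
-- Hypothetical screen bounds, as before.
local MIN_X, MAX_X, MIN_Y, MAX_Y = 0, 320, 0, 480

-- One-time setup elsewhere (assumed): rect.xp, rect.yp = rect.x, rect.y
local function bounceSpoofed(rect)
  local dx, dy = rect.dx, rect.dy
  local x, y = rect.xp + dx, rect.yp + dy  -- plain-field gets, no get of rect.x / rect.y
  if x < MIN_X then
    x = MIN_X
    dx = -dx
  elseif x > MAX_X then
    x = MAX_X
    dx = -dx
  end
  if y < MIN_Y then
    y = MIN_Y
    dy = -dy
  elseif y > MAX_Y then
    y = MAX_Y
    dy = -dy
  end
  rect.xp, rect.yp = x, y                  -- keep the spoofed copies current
  rect.x, rect.y = x, y                    -- the unavoidable DisplayObject sets
  rect.dx, rect.dy = dx, dy
end
```

One caveat with this approach: if anything else moves the object, the spoofed copies go stale until they are re-seeded from rect.x and rect.y.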

Get the entire source code here.

So, what did we do? We traded a get of rect.x for a get/set of rect.xp.

Unfortunately we can’t get rid of the set of rect.x (which is the larger of the two performance issues) because eventually we need to actually update the rect’s position. I wish Corona SDK offered something like a “moveTo” method – an absolute version of the existing relative “translate” method. That would be worth comparing!

But, it is still a net performance win because a get/set on a plain-old Lua table element takes less time than a get on a DisplayObject property.

Results on a Nexus 7 2012:

I/Corona (16543): Platform: Nexus 7 / ARM Neon / 5.1.1 / NVIDIA Tegra 3 / OpenGL ES 2.0 14.01003 / 2018.3205 / English | US | en_US | en
I/Corona (16543): OVERHEAD : 24.74000000
I/Corona (16543): DISP OBJ XY : 2430.85200000
I/Corona (16543): DISP OBJ XPYP : 1938.96100000
I/Corona (16543): DISP OBJ VIEW XPYP : 2020.21300000
I/Corona (16543): DONE

Results on a Nexus 7 2013:

I/Corona (23013): Platform: Nexus 7 / ARM Neon / 5.0.2 / Adreno (TM) 320 / OpenGL ES 3.0 V@95.0 AU@ (GIT@Ia6306ec328) / 2018.3205 / English | US | en_US | en
I/Corona (23013): OVERHEAD : 13.45800000
I/Corona (23013): DISP OBJ XY : 2872.28400000
I/Corona (23013): DISP OBJ XPYP : 2378.93700000
I/Corona (23013): DISP OBJ VIEW XPYP : 2417.45000000
I/Corona (23013): DONE

Further Exploration

I mentioned above my wish for something like a “moveTo” method on DisplayObjects, an absolute version of the existing relative “translate” method – essentially a “setXY(x,y)” method. There would likely be some performance tricks that could be wrangled out of such a method – it would at least be worth comparing to other techniques. Still, if your total usage is as simple as the “bounce” code above, it might be worth trying to substitute in a relative obj:translate(obj.dx, obj.dy) and benchmark it.

This is just offered as food for thought if you were serious about taking the “bounce” benchmarking to its logical conclusion. But it’s beyond the scope of what I intended to cover here. Essentially, you’d just want to compare:
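A rough sketch of the two candidates, with no claim about which is faster. `moveBySet`, `moveByTranslate`, and `newStandIn` are hypothetical names, and the stand-in object exists only so the snippet runs outside Corona, where a real DisplayObject would already provide `translate`.

```lua
-- Variant 1: two DisplayObject writes of precomputed absolute coordinates.
local function moveBySet(rect, x, y)
  rect.x, rect.y = x, y
end

-- Variant 2: two plain-field reads plus one method call with relative deltas.
local function moveByTranslate(rect)
  rect:translate(rect.dx, rect.dy)
end

-- Stand-in object so the sketch runs outside Corona (assumption: a real
-- DisplayObject supplies its own translate method).
local function newStandIn(x, y, dx, dy)
  return {
    x = x, y = y, dx = dx, dy = dy,
    translate = function(self, deltaX, deltaY)
      self.x = self.x + deltaX
      self.y = self.y + deltaY
    end,
  }
end

local a = newStandIn(10, 20, 1, -1)
local b = newStandIn(10, 20, 1, -1)
moveBySet(a, a.x + a.dx, a.y + a.dy)  -- a plain-table get here; the spoofed xp/yp would be used in practice
moveByTranslate(b)
```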

But I’ll leave the actual benchmarking as a homework assignment for the reader, if the reader is interested enough to try it. Basically: Are two reads and a function call better than two writes? Suffice it to say that “translate(1,1)” will handily beat “x=x+1; y=y+1”-type code, but then you’ve got the bounce logic to factor in – assuming you want to benchmark fairly and produce the same results as the other examples – and you’d potentially lose the ability to reuse those “spoof” values elsewhere.

Wrapping Up

Granted, we’re only talking about a ~15-20% difference in the second example. There is nothing earth-shattering here.

So, whether or not there’s any practical way to apply this tip would require benchmarking of specific situations.

The more you could potentially reuse and take advantage of these “spoofed”/“proxied” values, the more you could potentially gain over repeated DisplayObject property access.

Also note that this applies not just to x|y, but any DisplayObject userdata property. Say, for example, you needed (for whatever reason) repeated access to an object’s .width (and assuming that it is unchanging), then spoof it into rect.myWidth=rect.width once, then use rect.myWidth thereafter.
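As a small sketch of the width case, assuming the width never changes after creation. `rightEdge` is a hypothetical helper that assumes a centered anchor, and the plain-table fallback is only there so the snippet runs outside Corona.

```lua
-- Assumption: outside Corona a plain table stands in for the rect.
local rect = display and display.newRect(0, 0, 50, 20) or { x = 0, width = 50 }

rect.myWidth = rect.width   -- one DisplayObject get, paid once up front

-- Hypothetical helper that needs the width on every call; it reads the
-- spoofed plain field instead of the DisplayObject property.
local function rightEdge(r, x)
  return x + r.myWidth * 0.5
end
```

If the width ever does change, the copy has to be refreshed by hand, which is the price of this trick.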

But the tip from the first example – using aliases in the more “obvious” manner – remains a good general practice, regardless. (Though don’t overdo it: there’s no need to alias something that is only accessed once.)