Arc's Soy Machine: December 2010

Thursday, December 30, 2010

common Orc opcodes

I've been going through the liboil 0.3 source to rewrite the oil_yuv2rgbx_sub2_u8 function, which we use for Theora decoding, in Orc pseudo-assembly.

Because the Orc opcode documentation splits opcode descriptions and processor support between two tables, I wrote a quick Python script to build a single table of the Orc opcodes common to SSE (x86), Altivec (PPC/Cell), and NEON (ARM Cortex) processors.
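
The script itself isn't reproduced here, but the idea can be sketched roughly as follows. This is only an illustration: the SUPPORT and DESCRIPTIONS dictionaries, the placeholder opcode names, and the common_rows helper are all made up for the example and are not taken from the Orc documentation or from the original script.

```python
# Hypothetical input: which targets implement each opcode.
# These entries are illustrative placeholders, not copied from the Orc docs.
SUPPORT = {
    "addb": {"sse", "altivec", "neon"},
    "example_sse_only": {"sse"},
    "example_no_altivec": {"sse", "neon"},
}

# Hypothetical input: the description table, keyed by opcode.
# Fields are (dst, src1, src2, description, pseudo code).
DESCRIPTIONS = {
    "addb": ("1", "1", "1", "add", "a + b"),
    "example_sse_only": ("1", "1", "", "placeholder", "a"),
    "example_no_altivec": ("1", "1", "", "placeholder", "a"),
}

REQUIRED = {"sse", "altivec", "neon"}


def common_rows(descriptions, support, required):
    """Return tab-separated rows for opcodes available on every required target."""
    rows = []
    for opcode, fields in descriptions.items():
        # Keep the opcode only if its target set contains all required targets.
        if required <= support.get(opcode, set()):
            rows.append("\t".join((opcode,) + fields))
    return rows


for row in common_rows(DESCRIPTIONS, SUPPORT, REQUIRED):
    print(row)
```

The subset test (required <= have) is what does the work: an opcode makes it into the output only when every target of interest supports it.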

Here's that table for reference, at least until I put in the time to format it for a wiki. The dst, src1, and src2 columns give operand sizes in bytes:

opcode	dst	src1	src2	description	pseudo code
absb	1	1		absolute value	(a < 0) ? -a : a
addb	1	1	1	add	a + b
addssb	1	1	1	add with signed saturate	clamp(a + b)
addusb	1	1	1	add with unsigned saturate	clamp(a + b)
andb	1	1	1	bitwise AND	a & b
andnb	1	1	1	bitwise AND NOT	a & (~b)
avgsb	1	1	1	signed average	(a + b + 1)>>1
avgub	1	1	1	unsigned average	(a + b + 1)>>1
cmpeqb	1	1	1	compare equal	(a == b) ? (~0) : 0
cmpgtsb	1	1	1	compare greater than	(a > b) ? (~0) : 0
copyb	1	1		copy	a
loadb	1	1		load from memory	array[i]
loadpb	1	1		load parameter or constant	scalar
maxsb	1	1	1	signed maximum	(a > b) ? a : b
maxub	1	1	1	unsigned maximum	(a > b) ? a : b
minsb	1	1	1	signed minimum	(a < b) ? a : b
minub	1	1	1	unsigned minimum	(a < b) ? a : b
mullb	1	1	1	low bits of multiply	a * b
mulhsb	1	1	1	high bits of signed multiply	(a * b) >> 8
mulhub	1	1	1	high bits of unsigned multiply	(a * b) >> 8
orb	1	1	1	bitwise OR	a | b
shlb	1	1	1S	shift left	a << b
shrsb	1	1	1S	signed shift right	a >> b
shrub	1	1	1S	unsigned shift right	a >> b
signb	1	1		sign	sign(a)
storeb	1	1		store to memory	special
subb	1	1	1	subtract	a - b
subssb	1	1	1	subtract with signed saturate	clamp(a - b)
subusb	1	1	1	subtract with unsigned saturate	clamp(a - b)
xorb	1	1	1	bitwise XOR	a ^ b
absw	2	2		absolute value	(a < 0) ? -a : a
addw	2	2	2	add	a + b
addssw	2	2	2	add with signed saturate	clamp(a + b)
addusw	2	2	2	add with unsigned saturate	clamp(a + b)
andw	2	2	2	bitwise AND	a & b
andnw	2	2	2	bitwise AND NOT	a & (~b)
avgsw	2	2	2	signed average	(a + b + 1)>>1
avguw	2	2	2	unsigned average	(a + b + 1)>>1
cmpeqw	2	2	2	compare equal	(a == b) ? (~0) : 0
cmpgtsw	2	2	2	compare greater than	(a > b) ? (~0) : 0
copyw	2	2		copy	a
div255w	2	2		divide by 255	a/255
loadw	2	2		load from memory	array[i]
loadpw	2	2		load parameter or constant	scalar
maxsw	2	2	2	signed maximum	(a > b) ? a : b
maxuw	2	2	2	unsigned maximum	(a > b) ? a : b
minsw	2	2	2	signed minimum	(a < b) ? a : b
minuw	2	2	2	unsigned minimum	(a < b) ? a : b
mullw	2	2	2	low bits of multiply	a * b
mulhsw	2	2	2	high bits of signed multiply	(a * b) >> 16
mulhuw	2	2	2	high bits of unsigned multiply	(a * b) >> 16
orw	2	2	2	bitwise OR	a | b
shlw	2	2	2S	shift left	a << b
shrsw	2	2	2S	signed shift right	a >> b
shruw	2	2	2S	unsigned shift right	a >> b
signw	2	2		sign	sign(a)
storew	2	2		store to memory	special
subw	2	2	2	subtract	a - b
subssw	2	2	2	subtract with signed saturate	clamp(a - b)
subusw	2	2	2	subtract with unsigned saturate	clamp(a - b)
xorw	2	2	2	bitwise XOR	a ^ b
absl	4	4		absolute value	(a < 0) ? -a : a
addl	4	4	4	add	a + b
addssl	4	4	4	add with signed saturate	clamp(a + b)
addusl	4	4	4	add with unsigned saturate	clamp(a + b)
andl	4	4	4	bitwise AND	a & b
andnl	4	4	4	bitwise AND NOT	a & (~b)
avgsl	4	4	4	signed average	(a + b + 1)>>1
avgul	4	4	4	unsigned average	(a + b + 1)>>1
cmpeql	4	4	4	compare equal	(a == b) ? (~0) : 0
cmpgtsl	4	4	4	compare greater than	(a > b) ? (~0) : 0
copyl	4	4		copy	a
loadl	4	4		load from memory	array[i]
loadpl	4	4		load parameter or constant	scalar
maxsl	4	4	4	signed maximum	(a > b) ? a : b
maxul	4	4	4	unsigned maximum	(a > b) ? a : b
minsl	4	4	4	signed minimum	(a < b) ? a : b
minul	4	4	4	unsigned minimum	(a < b) ? a : b
orl	4	4	4	bitwise OR	a | b
shll	4	4	4S	shift left	a << b
shrsl	4	4	4S	signed shift right	a >> b
shrul	4	4	4S	unsigned shift right	a >> b
signl	4	4		sign	sign(a)
storel	4	4		store to memory	special
subl	4	4	4	subtract	a - b
subssl	4	4	4	subtract with signed saturate	clamp(a - b)
subusl	4	4	4	subtract with unsigned saturate	clamp(a - b)
xorl	4	4	4	bitwise XOR	a ^ b
loadq	8	8		load from memory	array[i]
storeq	8	8		store to memory	special
splatw3q	8	8		duplicates high 16-bits to lower 48 bits	special
convsbw	2	1		convert signed	a
convubw	2	1		convert unsigned	a
splatbw	2	1		duplicates 8 bits to both halves of 16 bits	special
splatbl	4	1		duplicates 8 bits to all parts of 32 bits	special
convswl	4	2		convert signed	a
convuwl	4	2		convert unsigned	a
convslq	8	4		signed convert	a
convulq	8	4		unsigned convert	a
convwb	1	2		convert	a
convhwb	1	2		shift and convert	a>>8
convssswb	1	2		convert signed to signed with saturation	clamp(a)
convsuswb	1	2		convert signed to unsigned with saturation	clamp(a)
convuuswb	1	2		convert unsigned to unsigned with saturation	clamp(a)
convlw	2	4		convert	a
convhlw	2	4		shift and convert	a>>16
convssslw	2	4		convert signed to signed with saturation	clamp(a)
convql	4	8		convert	a
mulsbw	2	1	1	multiply signed	a * b
mulubw	2	1	1	multiply unsigned	a * b
mulswl	4	2	2	multiply signed	a * b
muluwl	4	2	2	multiply unsigned	a * b
accl	4	4		accumulate	+= a
swapw	2	2		endianness swap	special
swapl	4	4		endianness swap	special
select0wb	1	2		select first half	special
select1wb	1	2		select second half	special
select0lw	2	4		select first half	special
select1lw	2	4		select second half	special
mergewl	4	2	2	merge halves	special
mergebw	2	1	1	merge halves	special
splitlw	2	4		split first/second words	special
splitwb	1	2		split first/second bytes	special
addf	4	4	4	add	a + b
subf	4	4	4	subtract	a - b
mulf	4	4	4	multiply	a * b
maxf	4	4	4	maximum	max(a,b)
minf	4	4	4	minimum	min(a,b)
cmpeqf	4	4	4	compare equal	(a == b) ? (~0) : 0
convfl	4	4		convert floating point to integer	a
convlf	4	4		convert integer to floating point	a
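
To make the pseudo code column a little more concrete, here is a rough Python model of a handful of the byte-sized opcodes. It is a sketch for reading the table, not Orc's own emulation code: it assumes clamp() saturates to the range of the destination type (-128..127 for signed bytes, 0..255 for unsigned bytes), and it works on plain Python integers rather than on arrays.

```python
def clamp(x, lo, hi):
    """Saturate x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def addssb(a, b):
    """Add with signed saturate, assuming a signed 8-bit destination."""
    return clamp(a + b, -128, 127)


def addusb(a, b):
    """Add with unsigned saturate, assuming an unsigned 8-bit destination."""
    return clamp(a + b, 0, 255)


def avgub(a, b):
    """Unsigned average, rounding up on ties."""
    return (a + b + 1) >> 1


def mulhsb(a, b):
    """High 8 bits of a signed 8x8 multiply."""
    return (a * b) >> 8


def convsuswb(a):
    """Convert a signed 16-bit value to an unsigned byte with saturation."""
    return clamp(a, 0, 255)


print(addssb(100, 100), addusb(200, 100), avgub(1, 2), convsuswb(-5))
```

The saturating forms are the ones that matter most for pixel math, since a plain addb would wrap around instead of pinning the result at the end of the range.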

Wednesday, December 29, 2010

stuck, back to PySoy

I've been working on a streaming XML parser for Python, but I need a break. At this point there's no way Concordance is getting out January 1st, but it will certainly be out by the end of winter.

Our migration of PySoy to libsoy got pretty far. We were migrating from Pyrex to Genie, essentially moving the core engine from PyObject to GObject to remove the Python dependency in game clients and enable further multicore processing on both clients and servers. Much of the rendering area of the engine has been migrated, but the process has been held up in two areas:

First, while libsoy is in pretty good shape, we still lack Python bindings, aka PySoy itself, which is what we intend games to be written and run with. Our original plan to use GObject Introspection failed in a horrible mess that I've documented in previous postings. We've also looked at using SWIG and even at building our own bindings generator, with little measurable success. To get us moving forward again, I'm going to just drop out some .c templates and write the custom wrapper classes by hand. The time it'd take to write and maintain these cannot possibly be greater than the time we've wasted talking about a more elegant solution that only exists conceptually.

When GObject Introspection reaches a state of even remote maturity, where it can offer a Pythonic API, we'll look at it again. We'd even help get it there if the current GIR developers would just document the .gir XML schema or typelib format so we wouldn't have to refer to their source code as the sole definition of these.

Second is our physics code. As I've posted, ODE worked for us in the past but has numerous issues: it is awkward to package for various Linux distros, has poor features, is slow, and is extremely difficult to port to mobile devices. We attempted to migrate to Bullet, but this burned us out; virtually no work has gone into that in the past 6 months. We're all pretty frustrated with Bullet's haphazard API (whereas ODE's is fairly clean), and its C++-only API doesn't play well with GObject (or anything other than C++, for that matter). Bullet's C API is minimal at best.

When it comes right down to it, the biggest barrier we face with physics is processing power on mobile devices, an issue that using Bullet would not solve. Most of the devices we're interested in include ARM6/7 processors from Qualcomm or TI. Many do not include an FPU (floating point unit), but they all seem to offer a fairly powerful DSP used extensively for processing multimedia. We do not, however, want to rewrite and maintain our physics processing for each platform.

A solution I've come up with is to write our physics, greatly simplified from even what ODE offers, using Orc. It's yet another metalanguage (first Pyrex, then Vala/Genie, now this), but it is the successor to liboil (which we and much of the GNOME community use) and already supports many interesting platforms.

My plan is to first migrate our liboil-based YUV-RGB conversion code to Orc to get my feet wet, then implement a greatly simplified collision system using it. I expect the next release (or two) to still use ODE for at least rigid body physics, with the plan to eventually replace even that with our own physics solver. It should be much faster, and the same Orc code we write now should be able to compile to DSP code for Android handsets and other mobile devices in the future.

Orc already supports ARM Cortex (NEON), so if we were to finish this work today we'd be able to run PySoy clients on more modern Android handsets without touching DSP code. DSP support in Orc would also be very useful for future hardware for PySoy game servers.

While we'd all really like to get the next PySoy release out ASAP, we'd also like to avoid rewriting the engine again down the road.

Wednesday, December 08, 2010

XML parsing in Python

It's been a couple of months, so I'm going to give a brief update on what I've been working on.

Concordance is getting close to release; I plan to have the first release (0.1) out January 1st. More on this toward the end of December.

One of the roadblocks I've hit (again and again) is the lack of a decent XML parsing package for Python. The standard library is a shame when it comes to XML: there are at least four different modules (expat, sax, dom, etree) to choose from, and none of them even supports XPath. The most popular option, etree (or ElementTree), cannot even process an XML file with the namespace prefix intact.
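
The namespace prefix problem is easy to see with a tiny example. This is just a sketch using the standard library's xml.etree.ElementTree; the element is modeled on an XMPP stream header, and the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# A stream-header-like element that uses a namespace prefix.
source = '<stream:stream xmlns:stream="http://etherx.jabber.org/streams"/>'

root = ET.fromstring(source)

# ElementTree expands the prefix into the namespace URI; the literal
# "stream:" prefix from the source text is not kept on the element.
tag = root.tag
print(tag)
```

The tag comes back in "{uri}localname" form, so code that needs to write the document back out with the original prefix has nothing on the element to recover it from.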

There's lxml, which offers an etree-compatible API and fixes many of ElementTree's major faults (namespace prefix preservation, XPath/XSLT support) but still cannot handle stream processing and, due to ElementTree's API, does not expose multiple text nodes broken up by a child element, such as "<div>first string <br/> second string</div>".
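
Here is what that looks like in practice, as a small sketch using the standard library's ElementTree (lxml's etree API follows the same text/tail model). The variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring("<div>first string <br/> second string</div>")
br = div[0]

# The text before the child element is stored on the parent...
first = div.text
# ...but the text after it is stored on the child, as its "tail",
# rather than as a second text node of the parent.
second = br.tail

print(repr(first), repr(second))
```

So there is no list of text nodes under div to iterate over; the second string has to be fished out of the br element instead.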

To support XMPP streams we need to use expat or sax to handle the stream event by event, since the full XML document is only available once the root element closes at the end of the stream, yet the direct children of the root element (what we call "stanzas" in XMPP) need to be processed as complete objects. While we may be able to hack something together using lxml, it would likely be less work to implement a new XML parsing package. As long as the resulting API doesn't diverge greatly from etree, the work necessary to switch should be minimal.
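
A rough sketch of the event-by-event approach, to show the shape of the problem rather than the parser I'm actually writing: it drives expat directly and uses ElementTree's TreeBuilder to assemble each direct child of the root into a complete element. The StanzaParser class and its feed method are hypothetical names made up for this example.

```python
import xml.parsers.expat
from xml.etree.ElementTree import TreeBuilder


class StanzaParser:
    """Feed chunks of a stream; collect each direct child of the root once it closes."""

    def __init__(self):
        self.stanzas = []      # completed stanzas, as Element objects
        self._depth = 0        # number of currently open elements
        self._builder = None   # TreeBuilder for the stanza in progress
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end
        self._parser.CharacterDataHandler = self._data

    def feed(self, chunk):
        # isfinal=False: more data may follow, the root is still open.
        self._parser.Parse(chunk, False)

    def _start(self, name, attrs):
        if self._depth == 1:
            # A direct child of the root is opening: start a new stanza.
            self._builder = TreeBuilder()
        if self._builder is not None:
            self._builder.start(name, attrs)
        self._depth += 1

    def _end(self, name):
        self._depth -= 1
        if self._builder is not None:
            self._builder.end(name)
            if self._depth == 1:
                # The stanza just closed: hand it over as a complete element.
                self.stanzas.append(self._builder.close())
                self._builder = None

    def _data(self, data):
        if self._builder is not None:
            self._builder.data(data)


parser = StanzaParser()
parser.feed("<stream><message to='a'><bo")
parser.feed("dy>hi</body></message><presence/>")
print([stanza.tag for stanza in parser.stanzas])
```

Chunks can be split anywhere, even in the middle of a tag, and each stanza becomes available as soon as its closing tag arrives, without waiting for the root element to close.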

Besides this, I've been working on a host of different packages around Concordance, from getting a JavaScript BOSH/XMPP library together to getting distutils2 ready for Python 3. I've even managed to ship a pitiful little serial library for Python 3, PyTTY, which we're using to interface with some Arduinos.
