Arc's Soy Machine: common Orc opcodes

I've been going through the liboil 0.3 source to port the oil_yuv2rgbx_sub2_u8 function we use for Theora decoding to Orc pseudo-assembly.

The Orc opcode documentation splits opcode descriptions and processor support between two tables, so I wrote a quick Python script to build a single table of the Orc opcodes common to SSE (x86), Altivec (PPC/Cell), and NEON (Arm Cortex) processors.
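
The script itself isn't reproduced in this post, but the idea is a join of the two documentation tables followed by a filter on processor support. A minimal sketch, assuming both tables have already been parsed into Python dictionaries; the sample rows, the made-up `exampleop` entry, and the `common_opcodes` and `format_table` helpers are illustrative only, not the original script or the real support data:

```python
# Hypothetical, hand-typed sample data standing in for the two Orc
# documentation tables (opcode descriptions and per-target support).
DESCRIPTIONS = {
    "addb": ("1", "1", "1", "add", "a + b"),
    "absb": ("1", "1", "", "absolute value", "(a < 0) ? -a : a"),
    "exampleop": ("4", "4", "4", "made-up opcode", "n/a"),
}
SUPPORT = {
    "addb": {"sse", "altivec", "neon"},
    "absb": {"sse", "altivec", "neon"},
    "exampleop": {"sse"},  # made-up entry: missing on two targets
}
TARGETS = {"sse", "altivec", "neon"}


def common_opcodes(descriptions, support, targets):
    """Return rows for opcodes supported by every target, in table order."""
    rows = []
    for opcode, fields in descriptions.items():
        # Keep the opcode only if `targets` is a subset of its support set.
        if targets <= support.get(opcode, set()):
            rows.append((opcode,) + fields)
    return rows


def format_table(rows):
    """Render rows as tab-separated lines under a header row."""
    header = ("opcode", "dst", "src1", "src2", "description", "pseudo code")
    return "\n".join("\t".join(row) for row in [header] + rows)


if __name__ == "__main__":
    print(format_table(common_opcodes(DESCRIPTIONS, SUPPORT, TARGETS)))
```

Using a set comparison keeps the filter to one line, and adding or dropping a target only means editing `TARGETS`.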

Here's that table, at least until I put in the time to format it for a wiki:

opcode	dst	src1	src2	description	pseudo code
absb	1	1		absolute value	(a < 0) ? -a : a
addb	1	1	1	add	a + b
addssb	1	1	1	add with signed saturate	clamp(a + b)
addusb	1	1	1	add with unsigned saturate	clamp(a + b)
andb	1	1	1	bitwise AND	a & b
andnb	1	1	1	bitwise AND NOT	a & (~b)
avgsb	1	1	1	signed average	(a + b + 1)>>1
avgub	1	1	1	unsigned average	(a + b + 1)>>1
cmpeqb	1	1	1	compare equal	(a == b) ? (~0) : 0
cmpgtsb	1	1	1	compare greater than	(a > b) ? (~0) : 0
copyb	1	1		copy	a
loadb	1	1		load from memory	array[i]
loadpb	1	1		load parameter or constant	scalar
maxsb	1	1	1	signed maximum	(a > b) ? a : b
maxub	1	1	1	unsigned maximum	(a > b) ? a : b
minsb	1	1	1	signed minimum	(a < b) ? a : b
minub	1	1	1	unsigned minimum	(a < b) ? a : b
mullb	1	1	1	low bits of multiply	a * b
mulhsb	1	1	1	high bits of signed multiply	(a * b) >> 8
mulhub	1	1	1	high bits of unsigned multiply	(a * b) >> 8
orb	1	1	1	bitwise OR	a | b
shlb	1	1	1S	shift left	a << b
shrsb	1	1	1S	signed shift right	a >> b
shrub	1	1	1S	unsigned shift right	a >> b
signb	1	1		sign	sign(a)
storeb	1	1		store to memory	special
subb	1	1	1	subtract	a - b
subssb	1	1	1	subtract with signed saturate	clamp(a - b)
subusb	1	1	1	subtract with unsigned saturate	clamp(a - b)
xorb	1	1	1	bitwise XOR	a ^ b
absw	2	2		absolute value	(a < 0) ? -a : a
addw	2	2	2	add	a + b
addssw	2	2	2	add with signed saturate	clamp(a + b)
addusw	2	2	2	add with unsigned saturate	clamp(a + b)
andw	2	2	2	bitwise AND	a & b
andnw	2	2	2	bitwise AND NOT	a & (~b)
avgsw	2	2	2	signed average	(a + b + 1)>>1
avguw	2	2	2	unsigned average	(a + b + 1)>>1
cmpeqw	2	2	2	compare equal	(a == b) ? (~0) : 0
cmpgtsw	2	2	2	compare greater than	(a > b) ? (~0) : 0
copyw	2	2		copy	a
div255w	2	2		divide by 255	a/255
loadw	2	2		load from memory	array[i]
loadpw	2	2		load parameter or constant	scalar
maxsw	2	2	2	signed maximum	(a > b) ? a : b
maxuw	2	2	2	unsigned maximum	(a > b) ? a : b
minsw	2	2	2	signed minimum	(a < b) ? a : b
minuw	2	2	2	unsigned minimum	(a < b) ? a : b
mullw	2	2	2	low bits of multiply	a * b
mulhsw	2	2	2	high bits of signed multiply	(a * b) >> 16
mulhuw	2	2	2	high bits of unsigned multiply	(a * b) >> 16
orw	2	2	2	bitwise OR	a | b
shlw	2	2	2S	shift left	a << b
shrsw	2	2	2S	signed shift right	a >> b
shruw	2	2	2S	unsigned shift right	a >> b
signw	2	2		sign	sign(a)
storew	2	2		store to memory	special
subw	2	2	2	subtract	a - b
subssw	2	2	2	subtract with signed saturate	clamp(a - b)
subusw	2	2	2	subtract with unsigned saturate	clamp(a - b)
xorw	2	2	2	bitwise XOR	a ^ b
absl	4	4		absolute value	(a < 0) ? -a : a
addl	4	4	4	add	a + b
addssl	4	4	4	add with signed saturate	clamp(a + b)
addusl	4	4	4	add with unsigned saturate	clamp(a + b)
andl	4	4	4	bitwise AND	a & b
andnl	4	4	4	bitwise AND NOT	a & (~b)
avgsl	4	4	4	signed average	(a + b + 1)>>1
avgul	4	4	4	unsigned average	(a + b + 1)>>1
cmpeql	4	4	4	compare equal	(a == b) ? (~0) : 0
cmpgtsl	4	4	4	compare greater than	(a > b) ? (~0) : 0
copyl	4	4		copy	a
loadl	4	4		load from memory	array[i]
loadpl	4	4		load parameter or constant	scalar
maxsl	4	4	4	signed maximum	(a > b) ? a : b
maxul	4	4	4	unsigned maximum	(a > b) ? a : b
minsl	4	4	4	signed minimum	(a < b) ? a : b
minul	4	4	4	unsigned minimum	(a < b) ? a : b
orl	4	4	4	bitwise OR	a | b
shll	4	4	4S	shift left	a << b
shrsl	4	4	4S	signed shift right	a >> b
shrul	4	4	4S	unsigned shift right	a >> b
signl	4	4		sign	sign(a)
storel	4	4		store to memory	special
subl	4	4	4	subtract	a - b
subssl	4	4	4	subtract with signed saturate	clamp(a - b)
subusl	4	4	4	subtract with unsigned saturate	clamp(a - b)
xorl	4	4	4	bitwise XOR	a ^ b
loadq	8	8		load from memory	array[i]
storeq	8	8		store to memory	special
splatw3q	8	8		duplicates high 16-bits to lower 48 bits	special
convsbw	2	1		convert signed	a
convubw	2	1		convert unsigned	a
splatbw	2	1		duplicates 8 bits to both halves of 16 bits	special
splatbl	4	1		duplicates 8 bits to all parts of 32 bits	special
convswl	4	2		convert signed	a
convuwl	4	2		convert unsigned	a
convslq	8	4		signed convert	a
convulq	8	4		unsigned convert	a
convwb	1	2		convert	a
convhwb	1	2		shift and convert	a>>8
convssswb	1	2		convert signed to signed with saturation	clamp(a)
convsuswb	1	2		convert signed to unsigned with saturation	clamp(a)
convuuswb	1	2		convert unsigned to unsigned with saturation	clamp(a)
convlw	2	4		convert	a
convhlw	2	4		shift and convert	a>>16
convssslw	2	4		convert signed to signed with saturation	clamp(a)
convql	4	8		convert	a
mulsbw	2	1	1	multiply signed	a * b
mulubw	2	1	1	multiply unsigned	a * b
mulswl	4	2	2	multiply signed	a * b
muluwl	4	2	2	multiply unsigned	a * b
accl	4	4		accumulate	+= a
swapw	2	2		endianness swap	special
swapl	4	4		endianness swap	special
select0wb	1	2		select first half	special
select1wb	1	2		select second half	special
select0lw	2	4		select first half	special
select1lw	2	4		select second half	special
mergewl	4	2	2	merge halves	special
mergebw	2	1	1	merge halves	special
splitlw	2	4		split first/second words	special
splitwb	1	2		split first/second bytes	special
addf	4	4	4	add	a + b
subf	4	4	4	subtract	a - b
mulf	4	4	4	multiply	a * b
maxf	4	4	4	maximum	max(a,b)
minf	4	4	4	minimum	min(a,b)
cmpeqf	4	4	4	compare equal	(a == b) ? (~0) : 0
convfl	4	4		convert floating point to integer	a
convlf	4	4		convert integer to floating point	a
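
To make the pseudo code column a bit more concrete, here is a rough sketch of a few byte opcodes applied to a single element. `clamp` is taken to mean saturating to the destination type's range; the Python helpers reuse the opcode names for readability, but they are assumptions for illustration on plain Python integers, not Orc code:

```python
def clamp(value, lo, hi):
    """Saturate value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def addssb(a, b):
    """add with signed saturate: result stays within the int8 range."""
    return clamp(a + b, -128, 127)


def addusb(a, b):
    """add with unsigned saturate: result stays within the uint8 range."""
    return clamp(a + b, 0, 255)


def avgub(a, b):
    """unsigned average, rounding up: (a + b + 1) >> 1."""
    return (a + b + 1) >> 1


def mulhsb(a, b):
    """high bits of a signed 8-bit multiply: (a * b) >> 8."""
    return (a * b) >> 8


def cmpeqb(a, b):
    """compare equal: all bits set (0xFF viewed as a byte) or 0."""
    return 0xFF if a == b else 0


print(addssb(100, 100))  # 127, saturated instead of wrapping
print(addusb(200, 100))  # 255
print(avgub(1, 2))       # 2
print(mulhsb(64, 64))    # 16
print(cmpeqb(7, 7))      # 255
```

The saturating variants are the ones to reach for in pixel math, where wrapping around on overflow would turn a bright value into a dark one.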

3 comments:

René Dudfield said...: Hi,

very cool that you are using Orc.

What are these columns in your table? "dst src1 src2"

I've been planning to try Orc for a few pygame/SDL routines for a while now, so I look forward to hearing about your experiences with it.

have fun,; 12:43 PM
Unknown said...: These are the element sizes, in bytes, of the destination and source arrays. When an "S" follows, it's a single value instead of an array.

So addb (add byte arrays) takes an array of bytes, adds each value to the value at the same position in a second array of bytes, and writes the result into a third array.

In other words (in pythonese):
a = [1, 2, 3]     # first source array
b = [10, 20, 30]  # second source array, same length as a
c = []            # destination array
for i in range(len(a)):
    c.append(a[i] + b[i])
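
The "S" case can be sketched the same way. Here shlb gets one shift amount that applies to every element instead of a second array; the function reuses the opcode name, but the Python wrapper and the masking to 8 bits are assumptions for illustration:

```python
def shlb(src1, shift):
    """Shift every byte in src1 left by the same scalar amount."""
    # Mask to 8 bits so each result still fits in a byte.
    return [(value << shift) & 0xFF for value in src1]


a = [1, 2, 3, 200]
print(shlb(a, 2))  # [4, 8, 12, 32]
```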

There are some obvious opcodes missing, such as divide float (divf) and square root float (sqrtf). All three SIMD instruction sets support these (directly or indirectly); support for them just needs to be added to Orc.; 2:17 PM
René Dudfield said...: ah, that makes more sense. Thanks.; 6:15 AM

Thursday, December 30, 2010
