Thursday, December 30, 2010

common Orc opcodes

I've been going through the liboil 0.3 source to rewrite the oil_yuv2rgbx_sub2_u8 function, which we use for Theora decoding, in Orc pseudo-assembly code.

Because the Orc opcode documentation splits opcode descriptions and processor support across two tables, I wrote a quick Python script to build a table of the Orc opcodes common to SSE (x86), AltiVec (PPC/Cell), and NEON (ARM Cortex) processors.
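The script itself isn't reproduced here. As a rough idea of the approach, here is a minimal sketch that assumes both documentation tables have already been parsed into Python dictionaries. The dictionary names, the made-up opcode `exampleq`, and the per-target support sets are hypothetical placeholders, not Orc's actual data.

```python
"""Sketch: intersect an opcode description table with per-target support."""

# Hypothetical stand-in for Orc's opcode description table:
# opcode -> (dst size, src1 size, src2 size or None, description, pseudo code)
DESCRIPTIONS = {
    "absb": (1, 1, None, "absolute value", "(a < 0) ? -a : a"),
    "addb": (1, 1, 1, "add", "a + b"),
    "exampleq": (8, 8, None, "made-up opcode", "special"),
}

# Hypothetical stand-in for the processor support table:
# opcode -> set of targets implementing it (illustrative values only).
SUPPORT = {
    "absb": {"sse", "altivec", "neon"},
    "addb": {"sse", "altivec", "neon"},
    "exampleq": {"sse"},
}

TARGETS = {"sse", "altivec", "neon"}


def common_opcodes(descriptions, support, targets):
    """Return opcodes supported by every target, in description-table order."""
    return [op for op in descriptions if targets <= support.get(op, set())]


def format_row(op, info):
    """Render one table row as tab-separated text."""
    dst, src1, src2, desc, pseudo = info
    src2_text = "" if src2 is None else str(src2)
    return "\t".join([op, str(dst), str(src1), src2_text, desc, pseudo])


if __name__ == "__main__":
    print("\t".join(["opcode", "dst", "src1", "src2", "description", "pseudo code"]))
    for op in common_opcodes(DESCRIPTIONS, SUPPORT, TARGETS):
        print(format_row(op, DESCRIPTIONS[op]))
```

Using a set subset test (`targets <= ...`) keeps the "common to all three" rule in one place, so adding a fourth target only means extending `TARGETS`.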

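Here's that table for reference, at least until I find the time to format it for a wiki. The dst, src1 and src2 columns give operand sizes in bytes; an S after a size marks a single scalar value rather than an array.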

opcode     dst  src1  src2  description                                   pseudo code
absb       1    1           absolute value                                (a < 0) ? -a : a
addb       1    1     1     add                                           a + b
addssb     1    1     1     add with signed saturate                      clamp(a + b)
addusb     1    1     1     add with unsigned saturate                    clamp(a + b)
andb       1    1     1     bitwise AND                                   a & b
andnb      1    1     1     bitwise AND NOT                               a & (~b)
avgsb      1    1     1     signed average                                (a + b + 1) >> 1
avgub      1    1     1     unsigned average                              (a + b + 1) >> 1
cmpeqb     1    1     1     compare equal                                 (a == b) ? (~0) : 0
cmpgtsb    1    1     1     compare greater than                          (a > b) ? (~0) : 0
copyb      1    1           copy                                          a
loadb      1    1           load from memory                              array[i]
loadpb     1    1           load parameter or constant                    scalar
maxsb      1    1     1     signed maximum                                (a > b) ? a : b
maxub      1    1     1     unsigned maximum                              (a > b) ? a : b
minsb      1    1     1     signed minimum                                (a < b) ? a : b
minub      1    1     1     unsigned minimum                              (a < b) ? a : b
mullb      1    1     1     low bits of multiply                          a * b
mulhsb     1    1     1     high bits of signed multiply                  (a * b) >> 8
mulhub     1    1     1     high bits of unsigned multiply                (a * b) >> 8
orb        1    1     1     bitwise OR                                    a | b
shlb       1    1     1S    shift left                                    a << b
shrsb      1    1     1S    signed shift right                            a >> b
shrub      1    1     1S    unsigned shift right                          a >> b
signb      1    1           sign                                          sign(a)
storeb     1    1           store to memory                               special
subb       1    1     1     subtract                                      a - b
subssb     1    1     1     subtract with signed saturate                 clamp(a - b)
subusb     1    1     1     subtract with unsigned saturate               clamp(a - b)
xorb       1    1     1     bitwise XOR                                   a ^ b
absw       2    2           absolute value                                (a < 0) ? -a : a
addw       2    2     2     add                                           a + b
addssw     2    2     2     add with signed saturate                      clamp(a + b)
addusw     2    2     2     add with unsigned saturate                    clamp(a + b)
andw       2    2     2     bitwise AND                                   a & b
andnw      2    2     2     bitwise AND NOT                               a & (~b)
avgsw      2    2     2     signed average                                (a + b + 1) >> 1
avguw      2    2     2     unsigned average                              (a + b + 1) >> 1
cmpeqw     2    2     2     compare equal                                 (a == b) ? (~0) : 0
cmpgtsw    2    2     2     compare greater than                          (a > b) ? (~0) : 0
copyw      2    2           copy                                          a
div255w    2    2           divide by 255                                 a / 255
loadw      2    2           load from memory                              array[i]
loadpw     2    2           load parameter or constant                    scalar
maxsw      2    2     2     signed maximum                                (a > b) ? a : b
maxuw      2    2     2     unsigned maximum                              (a > b) ? a : b
minsw      2    2     2     signed minimum                                (a < b) ? a : b
minuw      2    2     2     unsigned minimum                              (a < b) ? a : b
mullw      2    2     2     low bits of multiply                          a * b
mulhsw     2    2     2     high bits of signed multiply                  (a * b) >> 16
mulhuw     2    2     2     high bits of unsigned multiply                (a * b) >> 16
orw        2    2     2     bitwise OR                                    a | b
shlw       2    2     2S    shift left                                    a << b
shrsw      2    2     2S    signed shift right                            a >> b
shruw      2    2     2S    unsigned shift right                          a >> b
signw      2    2           sign                                          sign(a)
storew     2    2           store to memory                               special
subw       2    2     2     subtract                                      a - b
subssw     2    2     2     subtract with signed saturate                 clamp(a - b)
subusw     2    2     2     subtract with unsigned saturate               clamp(a - b)
xorw       2    2     2     bitwise XOR                                   a ^ b
absl       4    4           absolute value                                (a < 0) ? -a : a
addl       4    4     4     add                                           a + b
addssl     4    4     4     add with signed saturate                      clamp(a + b)
addusl     4    4     4     add with unsigned saturate                    clamp(a + b)
andl       4    4     4     bitwise AND                                   a & b
andnl      4    4     4     bitwise AND NOT                               a & (~b)
avgsl      4    4     4     signed average                                (a + b + 1) >> 1
avgul      4    4     4     unsigned average                              (a + b + 1) >> 1
cmpeql     4    4     4     compare equal                                 (a == b) ? (~0) : 0
cmpgtsl    4    4     4     compare greater than                          (a > b) ? (~0) : 0
copyl      4    4           copy                                          a
loadl      4    4           load from memory                              array[i]
loadpl     4    4           load parameter or constant                    scalar
maxsl      4    4     4     signed maximum                                (a > b) ? a : b
maxul      4    4     4     unsigned maximum                              (a > b) ? a : b
minsl      4    4     4     signed minimum                                (a < b) ? a : b
minul      4    4     4     unsigned minimum                              (a < b) ? a : b
orl        4    4     4     bitwise OR                                    a | b
shll       4    4     4S    shift left                                    a << b
shrsl      4    4     4S    signed shift right                            a >> b
shrul      4    4     4S    unsigned shift right                          a >> b
signl      4    4           sign                                          sign(a)
storel     4    4           store to memory                               special
subl       4    4     4     subtract                                      a - b
subssl     4    4     4     subtract with signed saturate                 clamp(a - b)
subusl     4    4     4     subtract with unsigned saturate               clamp(a - b)
xorl       4    4     4     bitwise XOR                                   a ^ b
loadq      8    8           load from memory                              array[i]
storeq     8    8           store to memory                               special
splatw3q   8    8           duplicates high 16-bits to lower 48 bits      special
convsbw    2    1           convert signed                                a
convubw    2    1           convert unsigned                              a
splatbw    2    1           duplicates 8 bits to both halves of 16 bits   special
splatbl    4    1           duplicates 8 bits to all parts of 32 bits     special
convswl    4    2           convert signed                                a
convuwl    4    2           convert unsigned                              a
convslq    8    4           signed convert                                a
convulq    8    4           unsigned convert                              a
convwb     1    2           convert                                       a
convhwb    1    2           shift and convert                             a >> 8
convssswb  1    2           convert signed to signed with saturation      clamp(a)
convsuswb  1    2           convert signed to unsigned with saturation    clamp(a)
convuuswb  1    2           convert unsigned to unsigned with saturation  clamp(a)
convlw     2    4           convert                                       a
convhlw    2    4           shift and convert                             a >> 16
convssslw  2    4           convert signed to signed with saturation      clamp(a)
convql     4    8           convert                                       a
mulsbw     2    1     1     multiply signed                               a * b
mulubw     2    1     1     multiply unsigned                             a * b
mulswl     4    2     2     multiply signed                               a * b
muluwl     4    2     2     multiply unsigned                             a * b
accl       4    4           accumulate                                    += a
swapw      2    2           endianness swap                               special
swapl      4    4           endianness swap                               special
select0wb  1    2           select first half                             special
select1wb  1    2           select second half                            special
select0lw  2    4           select first half                             special
select1lw  2    4           select second half                            special
mergewl    4    2     2     merge halves                                  special
mergebw    2    1     1     merge halves                                  special
splitlw    2    4           split first/second words                      special
splitwb    1    2           split first/second bytes                      special
addf       4    4     4     add                                           a + b
subf       4    4     4     subtract                                      a - b
mulf       4    4     4     multiply                                      a * b
maxf       4    4     4     maximum                                       max(a, b)
minf       4    4     4     minimum                                       min(a, b)
cmpeqf     4    4     4     compare equal                                 (a == b) ? (~0) : 0
convfl     4    4           convert floating point to integer             a
convlf     4    4           convert integer to floating point             a
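To make the pseudo code column a little more concrete, here is a small per-element Python model of a few of the opcodes above. It is an illustrative sketch, not Orc's own emulation code: values are plain Python ints, signed opcodes assume inputs already in the signed range, and ~0 is represented as -1. Note that the high-multiply word variants (mulhsw, mulhuw) keep the top half of a 32-bit product, so they shift by 16 rather than 8.

```python
"""Per-element Python models of a few Orc opcodes from the table.

Illustrative sketches only, not Orc's own emulation code.
"""


def clamp(x, lo, hi):
    """Saturate x into the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def addb(a, b):
    """addb: plain add wraps modulo 256 (unsigned view of the byte)."""
    return (a + b) & 0xFF


def addssb(a, b):
    """addssb: signed saturating add, result kept in [-128, 127]."""
    return clamp(a + b, -128, 127)


def addusb(a, b):
    """addusb: unsigned saturating add, result kept in [0, 255]."""
    return clamp(a + b, 0, 255)


def avgub(a, b):
    """avgub: unsigned average that rounds halves up."""
    return (a + b + 1) >> 1


def cmpgtsb(a, b):
    """cmpgtsb: all bits set (-1 as a signed byte) when a > b, else 0."""
    return -1 if a > b else 0


def mulhsb(a, b):
    """mulhsb: high byte of the signed 16-bit product."""
    return (a * b) >> 8  # Python's >> on negative ints is an arithmetic shift


def mulhuw(a, b):
    """mulhuw: high 16 bits of the unsigned 32-bit product."""
    return (a * b) >> 16


def convssswb(a):
    """convssswb: signed word to signed byte with saturation."""
    return clamp(a, -128, 127)


def convsuswb(a):
    """convsuswb: signed word to unsigned byte with saturation."""
    return clamp(a, 0, 255)


def swapw(a):
    """swapw: swap the two bytes of an unsigned 16-bit value."""
    return ((a & 0xFF) << 8) | (a >> 8)


if __name__ == "__main__":
    print(addb(200, 100), addssb(100, 100), addusb(200, 100))  # 44 127 255
    print(avgub(1, 2), mulhsb(100, -100), hex(mulhuw(0xFFFF, 0xFFFF)))  # 2 -40 0xfffe
    print(convssswb(-300), convsuswb(300), hex(swapw(0x1234)))  # -128 255 0x3412
```

The contrast between addb (wraps) and addusb/addssb (clamps) is the main thing to keep in mind when porting liboil code, since picking the wrong one silently changes results only on overflow.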

3 comments:

René Dudfield said...

Hi,

very cool that you are using Orc.

What are these columns in your table? "dst src1 src2"

I've been planning to try Orc for a few pygame/SDL routines for a while now, so I look forward to hearing about your experiences with it.

have fun,

Unknown said...

These are the sizes, in bytes, of the source and destination array elements. When an "S" follows, it's a single value instead of an array.

So addb (add byte arrays) takes an array of bytes, adds each value to the value at the same position in a second array of bytes, and writes the result into a third array.

In other words (in pythonese):
a = [10, 20, 250]
b = [1, 2, 10]
c = []
for i in range(len(a)):
    # addb wraps around on overflow, so only the low 8 bits are kept
    c.append((a[i] + b[i]) & 0xFF)
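The "S" case can be sketched the same way. Below, shlw (listed in the table with a src2 size of 2S) shifts every element of one word array by a single scalar count, while addw pairs up two arrays element by element. This is an illustrative model assuming unsigned 16-bit elements, not how Orc actually executes the opcodes; both functions truncate results to 16 bits.

```python
def shlw(src1, shift):
    """shlw: shift every 16-bit element of src1 left by one scalar count."""
    # The result is truncated to 16 bits, matching a word-sized destination.
    return [(x << shift) & 0xFFFF for x in src1]


def addw(src1, src2):
    """addw: element-wise add of two word arrays, wrapping at 16 bits."""
    return [(a + b) & 0xFFFF for a, b in zip(src1, src2)]


print(shlw([1, 2, 0x8000], 1))    # [2, 4, 0]
print(addw([1, 0xFFFF], [2, 1]))  # [3, 0]
```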

There are some obvious opcodes missing, such as divide float (divf) and square root float (sqrtf). All three SIMD instruction sets support these (directly or indirectly); support for them just needs to be added to Orc.

René Dudfield said...

ah, that makes more sense. Thanks.