Because the Orc opcode documentation splits opcode descriptions and processor support between two tables, I wrote a quick Python script to build a single table of the Orc opcodes common to SSE (x86), Altivec (PPC/Cell), and NEON (Arm Cortex) processors.
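
The script itself isn't reproduced here, but the idea can be sketched roughly as follows. This is a minimal sketch, not the actual script: the `descriptions` and `support` dictionaries, the `common_opcodes` helper, and the `exampleop` entry are all hypothetical stand-ins for data that would really be scraped from the two Orc documentation tables.

```python
# Targets an opcode must support to make it into the merged table.
REQUIRED = {"sse", "altivec", "neon"}


def common_opcodes(descriptions, support, required=REQUIRED):
    """Return pipe-separated rows for opcodes implemented on every required target."""
    rows = []
    for name, (dst, src1, src2, desc, pseudo) in descriptions.items():
        # Keep the opcode only if its target set contains all required targets.
        if required <= support.get(name, set()):
            rows.append(" | ".join([name, dst, src1, src2, desc, pseudo]) + " |")
    return rows


# Hypothetical sample input; "exampleop" is a made-up opcode for illustration.
descriptions = {
    "addb": ("1", "1", "1", "add", "a + b"),
    "absb": ("1", "1", "", "absolute value", "(a < 0) ? -a : a"),
    "exampleop": ("1", "1", "1", "made-up opcode", "a"),
}
support = {
    "addb": {"sse", "altivec", "neon"},
    "absb": {"sse", "altivec", "neon"},
    "exampleop": {"sse"},
}

for row in common_opcodes(descriptions, support):
    print(row)
```

Using a set comparison (`required <= supported`) keeps the filter to a single line and makes it easy to add or drop a target later.
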
Here's that table for reference, at least until I put in the time to format it for a wiki:
opcode | dst | src1 | src2 | description | pseudo code |
---|---|---|---|---|---|
absb | 1 | 1 | | absolute value | (a < 0) ? -a : a |
addb | 1 | 1 | 1 | add | a + b |
addssb | 1 | 1 | 1 | add with signed saturate | clamp(a + b) |
addusb | 1 | 1 | 1 | add with unsigned saturate | clamp(a + b) |
andb | 1 | 1 | 1 | bitwise AND | a & b |
andnb | 1 | 1 | 1 | bitwise AND NOT | a & (~b) |
avgsb | 1 | 1 | 1 | signed average | (a + b + 1)>>1 |
avgub | 1 | 1 | 1 | unsigned average | (a + b + 1)>>1 |
cmpeqb | 1 | 1 | 1 | compare equal | (a == b) ? (~0) : 0 |
cmpgtsb | 1 | 1 | 1 | compare greater than | (a > b) ? (~0) : 0 |
copyb | 1 | 1 | | copy | a |
loadb | 1 | 1 | | load from memory | array[i] |
loadpb | 1 | 1 | | load parameter or constant | scalar |
maxsb | 1 | 1 | 1 | signed maximum | (a > b) ? a : b |
maxub | 1 | 1 | 1 | unsigned maximum | (a > b) ? a : b |
minsb | 1 | 1 | 1 | signed minimum | (a < b) ? a : b |
minub | 1 | 1 | 1 | unsigned minimum | (a < b) ? a : b |
mullb | 1 | 1 | 1 | low bits of multiply | a * b |
mulhsb | 1 | 1 | 1 | high bits of signed multiply | (a * b) >> 8 |
mulhub | 1 | 1 | 1 | high bits of unsigned multiply | (a * b) >> 8 |
orb | 1 | 1 | 1 | bitwise OR | a \| b |
shlb | 1 | 1 | 1S | shift left | a << b |
shrsb | 1 | 1 | 1S | signed shift right | a >> b |
shrub | 1 | 1 | 1S | unsigned shift right | a >> b |
signb | 1 | 1 | | sign | sign(a) |
storeb | 1 | 1 | | store to memory | special |
subb | 1 | 1 | 1 | subtract | a - b |
subssb | 1 | 1 | 1 | subtract with signed saturate | clamp(a - b) |
subusb | 1 | 1 | 1 | subtract with unsigned saturate | clamp(a - b) |
xorb | 1 | 1 | 1 | bitwise XOR | a ^ b |
absw | 2 | 2 | | absolute value | (a < 0) ? -a : a |
addw | 2 | 2 | 2 | add | a + b |
addssw | 2 | 2 | 2 | add with signed saturate | clamp(a + b) |
addusw | 2 | 2 | 2 | add with unsigned saturate | clamp(a + b) |
andw | 2 | 2 | 2 | bitwise AND | a & b |
andnw | 2 | 2 | 2 | bitwise AND NOT | a & (~b) |
avgsw | 2 | 2 | 2 | signed average | (a + b + 1)>>1 |
avguw | 2 | 2 | 2 | unsigned average | (a + b + 1)>>1 |
cmpeqw | 2 | 2 | 2 | compare equal | (a == b) ? (~0) : 0 |
cmpgtsw | 2 | 2 | 2 | compare greater than | (a > b) ? (~0) : 0 |
copyw | 2 | 2 | | copy | a |
div255w | 2 | 2 | | divide by 255 | a/255 |
loadw | 2 | 2 | | load from memory | array[i] |
loadpw | 2 | 2 | | load parameter or constant | scalar |
maxsw | 2 | 2 | 2 | signed maximum | (a > b) ? a : b |
maxuw | 2 | 2 | 2 | unsigned maximum | (a > b) ? a : b |
minsw | 2 | 2 | 2 | signed minimum | (a < b) ? a : b |
minuw | 2 | 2 | 2 | unsigned minimum | (a < b) ? a : b |
mullw | 2 | 2 | 2 | low bits of multiply | a * b |
mulhsw | 2 | 2 | 2 | high bits of signed multiply | (a * b) >> 16 |
mulhuw | 2 | 2 | 2 | high bits of unsigned multiply | (a * b) >> 16 |
orw | 2 | 2 | 2 | bitwise OR | a \| b |
shlw | 2 | 2 | 2S | shift left | a << b |
shrsw | 2 | 2 | 2S | signed shift right | a >> b |
shruw | 2 | 2 | 2S | unsigned shift right | a >> b |
signw | 2 | 2 | | sign | sign(a) |
storew | 2 | 2 | | store to memory | special |
subw | 2 | 2 | 2 | subtract | a - b |
subssw | 2 | 2 | 2 | subtract with signed saturate | clamp(a - b) |
subusw | 2 | 2 | 2 | subtract with unsigned saturate | clamp(a - b) |
xorw | 2 | 2 | 2 | bitwise XOR | a ^ b |
absl | 4 | 4 | | absolute value | (a < 0) ? -a : a |
addl | 4 | 4 | 4 | add | a + b |
addssl | 4 | 4 | 4 | add with signed saturate | clamp(a + b) |
addusl | 4 | 4 | 4 | add with unsigned saturate | clamp(a + b) |
andl | 4 | 4 | 4 | bitwise AND | a & b |
andnl | 4 | 4 | 4 | bitwise AND NOT | a & (~b) |
avgsl | 4 | 4 | 4 | signed average | (a + b + 1)>>1 |
avgul | 4 | 4 | 4 | unsigned average | (a + b + 1)>>1 |
cmpeql | 4 | 4 | 4 | compare equal | (a == b) ? (~0) : 0 |
cmpgtsl | 4 | 4 | 4 | compare greater than | (a > b) ? (~0) : 0 |
copyl | 4 | 4 | | copy | a |
loadl | 4 | 4 | | load from memory | array[i] |
loadpl | 4 | 4 | | load parameter or constant | scalar |
maxsl | 4 | 4 | 4 | signed maximum | (a > b) ? a : b |
maxul | 4 | 4 | 4 | unsigned maximum | (a > b) ? a : b |
minsl | 4 | 4 | 4 | signed minimum | (a < b) ? a : b |
minul | 4 | 4 | 4 | unsigned minimum | (a < b) ? a : b |
orl | 4 | 4 | 4 | bitwise OR | a \| b |
shll | 4 | 4 | 4S | shift left | a << b |
shrsl | 4 | 4 | 4S | signed shift right | a >> b |
shrul | 4 | 4 | 4S | unsigned shift right | a >> b |
signl | 4 | 4 | | sign | sign(a) |
storel | 4 | 4 | | store to memory | special |
subl | 4 | 4 | 4 | subtract | a - b |
subssl | 4 | 4 | 4 | subtract with signed saturate | clamp(a - b) |
subusl | 4 | 4 | 4 | subtract with unsigned saturate | clamp(a - b) |
xorl | 4 | 4 | 4 | bitwise XOR | a ^ b |
loadq | 8 | 8 | | load from memory | array[i] |
storeq | 8 | 8 | | store to memory | special |
splatw3q | 8 | 8 | | duplicates high 16 bits to lower 48 bits | special |
convsbw | 2 | 1 | | convert signed | a |
convubw | 2 | 1 | | convert unsigned | a |
splatbw | 2 | 1 | | duplicates 8 bits to both halves of 16 bits | special |
splatbl | 4 | 1 | | duplicates 8 bits to all parts of 32 bits | special |
convswl | 4 | 2 | | convert signed | a |
convuwl | 4 | 2 | | convert unsigned | a |
convslq | 8 | 4 | | signed convert | a |
convulq | 8 | 4 | | unsigned convert | a |
convwb | 1 | 2 | | convert | a |
convhwb | 1 | 2 | | shift and convert | a>>8 |
convssswb | 1 | 2 | | convert signed to signed with saturation | clamp(a) |
convsuswb | 1 | 2 | | convert signed to unsigned with saturation | clamp(a) |
convuuswb | 1 | 2 | | convert unsigned to unsigned with saturation | clamp(a) |
convlw | 2 | 4 | | convert | a |
convhlw | 2 | 4 | | shift and convert | a>>16 |
convssslw | 2 | 4 | | convert signed to signed with saturation | clamp(a) |
convql | 4 | 8 | | convert | a |
mulsbw | 2 | 1 | 1 | multiply signed | a * b |
mulubw | 2 | 1 | 1 | multiply unsigned | a * b |
mulswl | 4 | 2 | 2 | multiply signed | a * b |
muluwl | 4 | 2 | 2 | multiply unsigned | a * b |
accl | 4 | 4 | | accumulate | += a |
swapw | 2 | 2 | | endianness swap | special |
swapl | 4 | 4 | | endianness swap | special |
select0wb | 1 | 2 | | select first half | special |
select1wb | 1 | 2 | | select second half | special |
select0lw | 2 | 4 | | select first half | special |
select1lw | 2 | 4 | | select second half | special |
mergewl | 4 | 2 | 2 | merge halves | special |
mergebw | 2 | 1 | 1 | merge halves | special |
splitlw | 2 | 4 | | split first/second words | special |
splitwb | 1 | 2 | | split first/second bytes | special |
addf | 4 | 4 | 4 | add | a + b |
subf | 4 | 4 | 4 | subtract | a - b |
mulf | 4 | 4 | 4 | multiply | a * b |
maxf | 4 | 4 | 4 | maximum | max(a,b) |
minf | 4 | 4 | 4 | minimum | min(a,b) |
cmpeqf | 4 | 4 | 4 | compare equal | (a == b) ? (~0) : 0 |
convfl | 4 | 4 | | convert floating point to integer | a |
convlf | 4 | 4 | | convert integer to floating point | a |
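
The pseudo code column is terse, so here is a rough Python sketch of what a few of the byte opcodes compute. It is only an illustration of the semantics, not how Orc implements them: the function names simply mirror the opcode names, plain Python lists stand in for the arrays, and the saturation ranges assume the usual 8-bit limits (-128..127 signed, 0..255 unsigned).

```python
def clamp(x, lo, hi):
    """Saturate x into the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def addssb(a, b):
    # add with signed saturate: results are held to the signed byte range
    return [clamp(x + y, -128, 127) for x, y in zip(a, b)]


def addusb(a, b):
    # add with unsigned saturate: results are held to the unsigned byte range
    return [clamp(x + y, 0, 255) for x, y in zip(a, b)]


def avgub(a, b):
    # unsigned average, rounding up: (a + b + 1) >> 1
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]


def mulhub(a, b):
    # high 8 bits of the 16-bit unsigned product
    return [(x * y) >> 8 for x, y in zip(a, b)]


def shrub(a, shift):
    # src2 is "1S": one scalar shift count applied to every element
    return [x >> shift for x in a]


print(addssb([100, -100], [100, -100]))  # [127, -128]
print(addusb([200, 10], [100, 20]))      # [255, 30]
print(avgub([1, 255], [2, 255]))         # [2, 255]
print(mulhub([200], [200]))              # [156]
print(shrub([200, 16], 3))               # [25, 2]
```

The signed and unsigned saturating adds share the same `clamp(a + b)` pseudo code in the table; the only difference is the range being clamped to.
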
3 comments:
Hi,
very cool that you are using Orc.
What are these columns in your table? "dst src1 src2"
I've been planning to try orc for a few pygame/SDL routines for a while now. So I look forward to hearing about your experiences with it.
have fun,
These are the sizes, in bytes, of the destination and source array element types. When an "S" follows, it's a single value instead of an array.
So addb (add byte arrays) takes an array of bytes, adds each value to the value at the same position in a second array of bytes, and outputs the result into a third array.
In other words (in pythonese):
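a = [1, 2, 3]     # first source array
b = [10, 20, 30]  # second source array
c = []            # destination array
for i in range(len(a)):
    c.append(a[i] + b[i])  # c ends up as [11, 22, 33]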
There are some obvious opcodes missing, such as divide float (divf) and square root float (sqrtf). All three SIMD instruction sets support these (directly or indirectly); support for them just needs to be added to Orc.
ah, that makes more sense. Thanks.