To optimize a heavily used inner loop (a 3x3xN tensor convolution in the Winograd domain), I got some good results by using the maximum number of NEON regist