To optimize a heavily used inner loop (a 3x3xN tensor convolution in the Winograd domain), I got good results by using the maximum number of NEON regist