Review Mojo's priorities
What is your request?
I am working on Millrace, a local inference engine written in pure Mojo. Depending on the operation, Millrace is about 3x to 10x slower than MLX and Ollama on my M4 Mac mini. If the compiler had intrinsic support for simdgroup_matrix on M4, Claude could close the gap. A similar intrinsic already exists for M5.
Many thanks for your help,
Marius
What is your motivation for this change?
I would like to run local models on my M4 Mac mini. Some use cases only become possible once throughput reaches hundreds of tokens per second.
Any other details?
No response