[Feature Request] Add an intrinsic for M4 8×8 compact simdgroup_matrix #6644

Description

@winding-lines

Review Mojo's priorities

What is your request?

I am working on Millrace, a local inference engine written in pure Mojo. Depending on the operation, Millrace is about 3x to 10x slower than MLX and Ollama on my M4 Mac mini. If the compiler had intrinsic support for simdgroup_matrix on M4, Claude could close the gap. There is already a similar intrinsic for M5.

Many thanks for your help,

Marius

What is your motivation for this change?

I would like to run local models on my M4 Mac mini. Some use cases become possible once throughput reaches the hundreds of tokens per second.

Any other details?

No response

Metadata

Labels

Needs Triage, enhancement, mojo
