[Feature Request] Add an intrinsic for M4 8×8 compact simdgroup_matrix #6644

### Review Mojo's priorities

- [x] I have read the [roadmap and priorities](https://docs.modular.com/mojo/roadmap.html#overall-priorities) and I believe this request falls within the priorities.

### What is your request?

I am working on [Millrace](https://github.com/millrace/mojo-backend#how-it-compares), a local inference engine written in pure Mojo. Depending on the operation, Millrace is about 3x to 10x slower than MLX and Ollama on my M4 Mac mini. If the compiler had intrinsic support for simdgroup_matrix on M4, Claude could close the gap. A similar intrinsic already exists for M5.
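For context, here is a rough reference for the operation I would expect such an intrinsic to expose: a tile-level multiply-accumulate, `D = A × B + C`, on 8×8 tiles, which is what Metal's `simdgroup_multiply_accumulate` computes. This is only a sketch of the semantics in plain Python, not a proposed Mojo API. The names `tile_mma`, `get_tile`, and `matmul_tiled` are hypothetical, and the sketch says nothing about the "compact" register layout or how lanes of the simdgroup share the tile.

```python
TILE = 8  # edge length of one tile; Metal's simdgroup_matrix types are 8x8


def tile_mma(a, b, c):
    """Return a @ b + c for three 8x8 tiles given as lists of rows."""
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(TILE)) for j in range(TILE)]
        for i in range(TILE)
    ]


def get_tile(m, r, c):
    """Copy the 8x8 tile of matrix m whose top-left corner is (r, c)."""
    return [row[c:c + TILE] for row in m[r:r + TILE]]


def matmul_tiled(a, b, n):
    """Multiply two n x n matrices (n a multiple of 8) one tile at a time."""
    out = [[0.0] * n for _ in range(n)]
    for r in range(0, n, TILE):
        for c in range(0, n, TILE):
            # The accumulator tile stays "in registers" across the k loop.
            acc = [[0.0] * TILE for _ in range(TILE)]
            for k in range(0, n, TILE):
                acc = tile_mma(get_tile(a, r, k), get_tile(b, k, c), acc)
            # Store the finished tile back into the output matrix.
            for i in range(TILE):
                out[r + i][c:c + TILE] = acc[i]
    return out
```

The inner `tile_mma` call is the part a hardware intrinsic would replace; everything around it is ordinary loop structure that a matmul kernel in Millrace would have to write either way.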

Many thanks for your help,

Marius


### What is your motivation for this change?

I would like to run local models on my M4 Mac mini. Some use cases only become possible once throughput reaches the hundreds of tokens per second.

### Any other details?

_No response_
