Qwen3.6-35B lokal auf dem Mac mit MLX deployen – 77 tok/s

Dieser Beitrag erklärt Schritt für Schritt, wie du das Qwen3.6-35B-A3B-Modell (4-Bit-quantisiert) auf einem Mac mit Apple Silicon und mindestens 48 GB Unified Memory mit Apples MLX-Framework lokal ausführst. Der gesamte Vorgang dauert etwa 20 bis 40 Minuten, wobei der Großteil für den Modell-Download anfällt. Nach Abschluss läuft auf http://127.0.0.1:7979 ein OpenAI-API-kompatibler lokaler Inference-Server, der sich nahtlos in gängige AI-Clients integrieren lässt – und dabei deine Daten privat hält.

Hintergrund

This guide shows how to run the Qwen3.6-35B-A3B model (4-bit quantized) locally on an Apple Silicon Mac (M1/M2/M3/M4) with at least 48 GB of unified RAM using Apple's MLX framework. Setup takes 20-40 minutes, mostly spent on model download. By the end, you'll have an OpenAI API-compatible local inference server running at http://127.0.0.1:7979, ready to serve any OpenAI-compatible client.

Tiefenanalyse

How to Run Qwen3.6-35B on Your Mac at 77 tok/s

Branchenwirkung

AI industry dynamics in 2026 Q1 continue to evolve rapidly, with this development representing a significant milestone in the transition from technology breakthroughs to mass commercialization.

Ausblick

The convergence of infrastructure investment growth, security standardization, open-source competition, and agentic AI deployment will reshape the technology landscape over the next 12-18 months.