MLX로 Mac에서 Qwen3.6-35B 로컬 배포하기 – 77 tok/s 속도

이 글에서는 Apple Silicon(M1~M4) 및 최소 48GB 통합 RAM을 갖춘 Mac에서 Apple의 MLX 프레임워크를 사용해 Qwen3.6-35B-A3B 모델(4비트 양자화)을 로컬에 배포하는 전 과정을 안내합니다. 전체 설정에는 약 20~40분이 소요되며, 대부분은 모델 다운로드에 할애됩니다. 완료되면 http://127.0.0.1:7979에서 OpenAI API 호환 로컬 추론 서버가 가동되어, 모든 주요 AI 클라이언트에 즉시 연결할 수 있으며 데이터 프라이버시와 추론 효율을 동시에 확보할 수 있습니다.

배경

This guide shows how to run the Qwen3.6-35B-A3B model (4-bit quantized) locally on an Apple Silicon Mac (M1/M2/M3/M4) with at least 48 GB of unified RAM using Apple's MLX framework. Setup takes 20-40 minutes, mostly spent on model download. By the end, you'll have an OpenAI API-compatible local inference server running at http://127.0.0.1:7979, ready to serve any OpenAI-compatible client.

심층 분석

How to Run Qwen3.6-35B on Your Mac at 77 tok/s

산업 영향

AI industry dynamics in 2026 Q1 continue to evolve rapidly, with this development representing a significant milestone in the transition from technology breakthroughs to mass commercialization.

전망

The convergence of infrastructure investment growth, security standardization, open-source competition, and agentic AI deployment will reshape the technology landscape over the next 12-18 months.