Models

Sources

There are a few sources of preconverted models which can be used with OpenArc

If you need help converting a particular model join Discord and we can help you!

LLMs

Models
Qwen3-1.7B-int8_asym-ov
Qwen3-4B-Instruct-2507-int4_asym-awq-ov
Satyr-V0.1-4B-HF-int4_awq-ov
Dolphin-X1-8B-int4_asym-awq-ov
Qwen3-8B-ShiningValiant3-int4-asym-ov
Qwen3-14B-int4_sym-ov
Cydonia-24B-v4.2.0-int4_asym-awq-ov
Qwen2.5-Microsoft-NextCoder-Soar-Instruct-FUSED-CODER-Fast-11B-int4_asym-awq-ov
Magistral-Small-2509-Text-Only-int4_asym-awq-ov
Hermes-4-70B-int4_asym-awq-ov
Qwen2.5-Coder-32B-Instruct-int4_sym-awq-ov
Qwen3-32B-Instruct-int4_sym-awq-ov
Big-Tiger-Gemma-27B-v3-int4-asym-ov
Nanbeige4.1-3B-openvino
Cydonia-24B-v4.3-OpenVINO-INT4
Nemotron-Cascade-14B-Thinking-int4_asym-se-ov
NousCoder-14B-int4_sym-ov
Anubis-Mini-8B-v1-int4_asym-ov
Qwen3.6-27B-int4-asym-ov
Qwen3.5-9B-int4-asym-ov

VLMs

Models
gemma-3-4b-it-int8_asym-ov
Gemma-3-12b-it-qat-int4_asym-ov
Qwen2.5-VL-7B-Instruct-int4_sym-ov
Nanonets-OCR2-3B-LM-INT4_ASYM-VE-FP16-ov
Qwen3-VL-4B-Instruct-int4_asym-ov

ASR

Whisper

Models
distil-whisper-large-v3-int8-ov
distil-whisper-large-v3-fp16-ov
whisper-large-v3-int8-ov
openai-whisper-large-v3-fp16-ov

Qwen3-ASR

Models
Qwen3-ASR-0.6B-INT8_ASYM-OpenVINO

TTS

Kokoro

Models
Kokoro-82M-FP16-OpenVINO

Qwen3-TTS

Models
Qwen3-TTS-12Hz-CustomVoice-1.7B-INT8-OpenVINO
Qwen3-TTS-12Hz-VoiceDesign-1.7B-INT8-OpenVINO
Qwen3-TTS-12Hz-Base-1.7B-INT8-OpenVINO

Embedding

Models
Qwen3-Embedding-0.6B-int8_asym-ov

Rerank

Models
Qwen3-Reranker-0.6B-fp16-ov

Model-Specific Instructions

Qwen3.5/3.6

Is Qwen3.5/3.6 supported?

Qwen3.5 models has unofficial support. However, they do require you to build openvino and openvino.genai from source. You will also need to install the latest version of optimum-intel.

To add a model, run the command openarc add --model-name MODEL_NAME --model-path /path/to/model --model-type vlm --device GPU|CPU --runtime-config '{"ATTENTION_BACKEND": "SDPA"}'. OpenArc resolves the VLM vision token from the model's config.json. Intel is currently working on adding support for Qwen3.5 to utilize the PA attention backend but it has not been merged yet. This currently appears to be much more performant. If you have built openvino.genai with the support included, you may change the runtime config parameter to use PA instead.

How do I control thinking?

Qwen3.5 utilizes chat instructions for thinking control. You can enable thinking by using the parameter chat_template_kwargs with a value of {"enable_thinking": true} and disable it by setting the value to {"enable_thinking": false}.

Previous reasoning is also retained within a conversation. You can enable or disable it similarily by adding preserve_previous_think to chat_template_kwargs.

For example, to enable thinking and disable previous reasoning, you would pass chat_template_kwargs with a value of {"enable_thinking": true, "preserve_previous_think": false}.