(validated_models)= # Validated Models ## Text-only Language Models ### Text Generation Task **QEff Auto Class:** `QEFFAutoModelForCausalLM` | Architecture | Model Family | Representative Models | [vLLM Support](https://quic.github.io/cloud-ai-sdk-pages/latest/Getting-Started/Installation/vLLM/vLLM/index.html) | |-------------------------|--------------------|--------------------------------------------------------------------------------------|--------------| | **MolmoForCausalLM** | Molmo① | [allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924) | ✕ | | **Olmo2ForCausalLM** | OLMo-2 | [allenai/OLMo-2-0425-1B](https://huggingface.co/allenai/OLMo-2-0425-1B) | ✔️ | | **FalconForCausalLM** | Falcon② | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) | ✔️ | | **Qwen3MoeForCausalLM** | Qwen3Moe | [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | ✔️ | | **GemmaForCausalLM** | CodeGemma | [google/codegemma-2b](https://huggingface.co/google/codegemma-2b)
[google/codegemma-7b](https://huggingface.co/google/codegemma-7b) | ✔️ | | | Gemma③ | [google/gemma-2b](https://huggingface.co/google/gemma-2b)
[google/gemma-7b](https://huggingface.co/google/gemma-7b)
[google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)
[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)
[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) | ✔️ | | **GptOssForCausalLM** | GPT-OSS | [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) | ✔️ | | **GPTBigCodeForCausalLM** | Starcoder1.5 | [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) | ✔️ | | | Starcoder2 | [bigcode/starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b) | ✔️ | | **GPTJForCausalLM** | GPT-J | [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | ✔️ | | **GPT2LMHeadModel** | GPT-2 | [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) | ✔️ | | **GraniteForCausalLM** | Granite 3.1 | [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct)
[ibm-granite/granite-guardian-3.1-8b](https://huggingface.co/ibm-granite/granite-guardian-3.1-8b) | ✔️ | | | Granite 20B | [ibm-granite/granite-20b-code-base-8k](https://huggingface.co/ibm-granite/granite-20b-code-base-8k)
[ibm-granite/granite-20b-code-instruct-8k](https://huggingface.co/ibm-granite/granite-20b-code-instruct-8k) | ✔️ | | **InternVLChatModel** | Intern-VL① | [OpenGVLab/InternVL2_5-1B](https://huggingface.co/OpenGVLab/InternVL2_5-1B)
[OpenGVLab/InternVL3_5-1B](https://huggingface.co/OpenGVLab/InternVL3_5-1B) | ✔️ | | | | **LlamaForCausalLM** | CodeLlama | [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)
[codellama/CodeLlama-13b-hf](https://huggingface.co/codellama/CodeLlama-13b-hf)
[codellama/CodeLlama-34b-hf](https://huggingface.co/codellama/CodeLlama-34b-hf) | ✔️ | | | DeepSeek-R1-Distill-Llama | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | ✔️ | | | InceptionAI-Adapted | [inceptionai/jais-adapted-7b](https://huggingface.co/inceptionai/jais-adapted-7b)
[inceptionai/jais-adapted-13b-chat](https://huggingface.co/inceptionai/jais-adapted-13b-chat)
[inceptionai/jais-adapted-70b](https://huggingface.co/inceptionai/jais-adapted-70b) | ✔️ | | | Llama 3.3 | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | ✔️ | | | Llama 3.2 | [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
[meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) | ✔️ | | | Llama 3.1 | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
[meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) | ✔️ | | | Llama 3 | [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
[meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | ✔️ | | | Llama 2 | [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
[meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)
[meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | ✔️ | | | Vicuna | [lmsys/vicuna-13b-delta-v0](https://huggingface.co/lmsys/vicuna-13b-delta-v0)
[lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3)
[lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) | ✔️ | | **MistralForCausalLM** | Mistral | [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) | ✔️ | | **MixtralForCausalLM** | Codestral
Mixtral | [mistralai/Codestral-22B-v0.1](https://huggingface.co/mistralai/Codestral-22B-v0.1)
[mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | ✔️ | | **Phi3ForCausalLM** | Phi-3②, Phi-3.5② | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | ✔️ | | **QwenForCausalLM** | DeepSeek-R1-Distill-Qwen | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ✔️ | | | Qwen2, Qwen2.5 | [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) | ✔️ | | **LlamaSwiftKVForCausalLM** | swiftkv | [Snowflake/Llama-3.1-SwiftKV-8B-Instruct](https://huggingface.co/Snowflake/Llama-3.1-SwiftKV-8B-Instruct) | ✔️ | | **Grok1ModelForCausalLM** | grok-1② | [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) | ✕ | --- ## Embedding Models ### Text Embedding Task **QEff Auto Class:** `QEFFAutoModel` | Architecture | Model Family | Representative Models | vLLM Support | |--------------|--------------|---------------------------------|--------------| | **BertModel** | BERT-based | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
[BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5)
[BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
[e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | ✔️ | | **MPNetForMaskedLM** | MPNet | [sentence-transformers/multi-qa-mpnet-base-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-cos-v1) | ✔️ | | **NomicBertModel** | NomicBERT② | [nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) | ✕ | | **RobertaModel** | RoBERTa | [ibm-granite/granite-embedding-30m-english](https://huggingface.co/ibm-granite/granite-embedding-30m-english)
[ibm-granite/granite-embedding-125m-english](https://huggingface.co/ibm-granite/granite-embedding-125m-english) | ✔️ | | **XLMRobertaForSequenceClassification** | XLM-RoBERTa | [bge-reranker-v2-m3bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | ✔️ | | **XLMRobertaModel** | XLM-RoBERTa |[ibm-granite/granite-embedding-107m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-107m-multilingual)
[ibm-granite/granite-embedding-278m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual)
[intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) | ✔️ | --- ## Multimodal Language Models ### Vision-Language Models (Text + Image Generation) **QEff Auto Class:** `QEFFAutoModelForImageTextToText` | Architecture | Model Family | Representative Models | Qeff Single Qpc | Qeff Dual Qpc | vllm Single Qpc | vllm Dual Qpc | |------------------------------------|--------------|----------------------------------------------------------------------------------------|------------|---------------------|-------------------|-----------------| | **LlavaForConditionalGeneration** | LLaVA-1.5 | [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | ✔️ | ✔️ | ✔️ | ✔️ | | **MllamaForConditionalGeneration** | Llama 3.2 | [meta-llama/Llama-3.2-11B-Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
[meta-llama/Llama-3.2-90B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct) | ✔️ | ✔️ | ✔️ | ✔️ | | **LlavaNextForConditionalGeneration** | Granite Vision | [ibm-granite/granite-vision-3.2-2b](https://huggingface.co/ibm-granite/granite-vision-3.2-2b) | ✕ | ✔️ | ✕ | ✔️ | | **Llama4ForConditionalGeneration** | Llama-4-Scout | [Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) | ✔️ | ✔️ | ✔️ | ✔️ | | **Gemma3ForConditionalGeneration** | Gemma3③ | [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) | ✔️ | ✔️ | ✕ | ✕ | | **Qwen2_5_VLForConditionalGeneration** | Qwen2.5-VL | [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | ✔️ | ✔️ | ✕ | ✔️ | | **Mistral3ForConditionalGeneration** | Mistral3| [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503)| ✕ | ✔️ | ✕ | ✕ | **Dual QPC:** In the Dual QPC(Qualcomm Program Container) setup, the model is split across two configurations: - The **Vision Encoder** runs in one QPC. - The **Language Model** (responsible for output generation) runs in a separate QPC. - The outputs from the Vision Encoder are transferred to the Language Model. - The dual QPC approach introduces the flexibility to run the vision and language components independently. **Single QPC:** In the single QPC(Qualcomm Program Container) setup, the entire model—including both image encoding and text generation—runs within a single QPC. There is no model splitting, and all components operate within the same execution environment. ```{NOTE} The choice between Single and Dual QPC is determined during model instantiation using the `kv_offload` setting. If the `kv_offload` is set to `True` it runs in dual QPC and if its set to `False` model runs in single QPC mode. ``` ### Audio Models (Automatic Speech Recognition) - Transcription Task **QEff Auto Class:** `QEFFAutoModelForSpeechSeq2Seq` | Architecture | Model Family | Representative Models | vLLM Support | |--------------|--------------|----------------------------------------------------------------------------------------|--------------| | **Whisper** | Whisper | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
[openai/whisper-base](https://huggingface.co/openai/whisper-base)
[openai/whisper-small](https://huggingface.co/openai/whisper-small)
[openai/whisper-medium](https://huggingface.co/openai/whisper-medium)
[openai/whisper-large](https://huggingface.co/openai/whisper-large)
[openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) | ✔️ | | **Wav2Vec2** | Wav2Vec2 | [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base)
[facebook/wav2vec2-large](https://huggingface.co/facebook/wav2vec2-large) | | --- ## Diffusion Models ### Image Generation Models **QEff Auto Class:** `QEffFluxPipeline` | Architecture | Model Family | Representative Models | vLLM Support | |--------------|--------------|----------------------------------------------------------------------------------------|--------------| | **FluxPipeline** | FLUX.1 | [black-forest-labs/FLUX.1-schnell](https://huggingface.co/stabilityai/stable-diffusion-2-1) | | ### Video Generation Models **QEff Auto Class:** `QEffWanPipeline` | Architecture | Model Family | Representative Models | vLLM Support | |--------------|--------------|----------------------------------------------------------------------------------------|--------------| | **WanPipeline** | Wan2.2 | [Wan-AI/Wan2.2-T2V-A14B-Diffusers](https://huggingface.co/stabilityai/stable-diffusion-2-1) | | --- ```{NOTE} ① Intern-VL and Molmo models are Vision-Language Models but use `QEFFAutoModelForCausalLM` for inference to stay compatible with HuggingFace Transformers. ② Set `trust_remote_code=True` for end-to-end inference with vLLM. ③ Pass `disable_sliding_window` for few family models when using vLLM. ``` --- (models_coming_soon)= # Models Coming Soon | Architecture | Model Family | Representative Models | |-------------------------|--------------|--------------------------------------------| | **NemotronHForCausalLM** | NVIDIA Nemotron v3 | [NVIDIA Nemotron v3](https://huggingface.co/collections/nvidia/nvidia-nemotron-v3) | | **Sam3Model** | facebook/sam3 | [facebook/sam3](https://huggingface.co/facebook/sam3) | | **StableDiffusionModel** | HiDream-ai | [HiDream-ai/HiDream-I1-Full](https://huggingface.co/HiDream-ai/HiDream-I1-Full) | | **MistralLarge3Model** | Mistral Large 3 | [mistralai/mistral-large-3](https://huggingface.co/collections/mistralai/mistral-large-3) |