Multimodal AI Systems Architect (AI Engineering)

Boston, USA · Posted 1mo ago
web3 · RAG · LLM
<p>We are seeking a talented Multimodal AI Systems Architect to design and optimize AI systems that integrate vision and audio models. The role focuses on improving our voice-to-voice interactions and multimodal retrieval capabilities while keeping these systems efficient.</p>
<p><strong>Responsibilities:</strong></p>
<ul>
<li>Integrate vision encoders and audio-native models into core agent reasoning loops.</li>
<li>Optimize streaming latency for voice-to-voice AI interactions.</li>
<li>Architect multimodal RAG systems that can retrieve insights from videos and PDFs.</li>
</ul>
<p><strong>Qualifications:</strong></p>
<ul>
<li>Experience with Whisper, CLIP, and multimodal LLM integration.</li>
<li>Knowledge of streaming architectures and WebRTC.</li>
<li>Expertise in cross-modal alignment.</li>
</ul>