This 1‑day course focuses on building intelligent applications that can see, interpret, and reason over images and documents using multimodal models and agent-based tools. Learners explore how visual and document inputs can be combined with language models to enable structured extraction, analysis, and decision-making workflows. The course emphasizes practical patterns for extracting information, orchestrating tools, and grounding model responses in visual data.
Develop a vision-enabled generative AI application
Generate images with AI
Generate videos with Microsoft Foundry
Analyze images with Content Understanding
Create a multimodal analysis solution with Azure Content Understanding
Create an Azure Content Understanding client application
Extract data with Azure Document Intelligence
Create a knowledge mining solution with Azure AI Search