Implementing Cisco Data Center AI Infrastructure (DCAI)
- Duration: 5 days (40 hours)
- Date: on request
This course will help you:
- Gain comprehensive skills in supporting, securing, and optimizing artificial intelligence workloads in modern data center environments
- Learn the design, implementation, and advanced troubleshooting of AI infrastructure, including network challenges and specialized hardware
- Gain in-depth knowledge of AI/ML concepts, generative artificial intelligence, and their practical application in network management and automation
- Master practical techniques for monitoring, diagnostics, and troubleshooting using tools such as Splunk, as well as applying AI to improve the efficiency of network operations
- Prepare for the 300-640 DCAI exam
Course syllabus:
- Fundamentals of AI
- Generative AI
- AI Use Cases
- AI-ML Clusters and Models
- AI Toolset—Jupyter Notebook
- AI Infrastructure
- AI Workloads Placement and Interoperability
- AI Policies
- AI Sustainability
- AI Infrastructure Design
- Key Network Challenges and Requirements for AI Workloads
- AI Transport
- Connectivity Models
- AI Network
- Architecture Migration to AI/ML Network
- Application-Level Protocols
- High-Throughput Converged Fabrics
- Building Lossless Fabrics
- Congestion Visibility
- Data Preparation for AI
- AI/ML Workload Data Performance
- AI-Enabling Hardware
- Compute Resources
- Compute Resource Solutions
- Virtual Resources
- Storage Resources
- Setting Up AI Cluster
- Deploy and Use Open Source GPT Models for RAG
- AI Infrastructure Operations and Monitoring
- Troubleshooting AI Infrastructure
- Troubleshoot Common Issues in AI/ML Fabric
After completing this course, you will be able to:
- Describe key concepts in artificial intelligence, focusing on traditional AI, machine learning, and deep learning techniques and their applications
- Describe generative AI, its challenges, and future trends, while examining the nuances between traditional and modern AI methodologies
- Explain how AI enhances network management and security through intelligent automation, predictive analytics, and anomaly detection
- Describe the key concepts, architecture, and basic management principles of AI-ML clusters, as well as describe the process of acquiring, fine-tuning, optimizing and using pre-trained ML models
- Use the capabilities of JupyterLab and generative AI to automate network operations, write Python code, and leverage AI models for enhanced productivity
- Describe the essential components and considerations for setting up robust AI infrastructure
- Evaluate and implement effective workload placement strategies and ensure interoperability within AI systems
- Explore compliance standards, policies, and governance frameworks relevant to AI systems
- Describe sustainable AI infrastructure practices, focusing on environmental and economic sustainability
- Guide AI infrastructure decisions to optimize efficiency and cost
- Describe key network challenges from the perspective of AI/ML application requirements
- Describe the role of optical and copper technologies in enabling AI/ML data center workloads
- Describe network connectivity models and network designs
- Describe important Layer 2 and Layer 3 protocols for AI and fog computing for distributed AI processing
- Migrate AI workloads to a dedicated AI network
- Explain the mechanisms and operations of RDMA and RoCE protocols
- Understand the architecture and features of high-performance Ethernet fabrics
- Explain the network mechanisms and QoS tools needed for building high-performance, lossless RoCE networks
- Describe ECN and PFC mechanisms and Cisco Nexus Dashboard Insights for congestion monitoring, and explain how the different stages of AI/ML applications impact data center infrastructure, and vice versa
- Describe the basic steps, challenges, and techniques of the data preparation process
- Use Cisco Nexus Dashboard Insights for monitoring AI/ML traffic flows
- Describe the importance of AI-specific hardware in reducing training times and supporting the advanced processing requirements of AI tasks
- Understand the compute hardware required to run AI/ML solutions
- Understand existing intelligence and AI/ML solutions
- Describe virtual infrastructure options and the considerations for deploying them
- Explain data storage strategies, storage protocols, and software-defined storage
- Use NDFC to configure a fabric optimized for AI/ML workloads
- Use locally hosted GPT models with RAG for network engineering tasks
Prerequisites:
To successfully complete this course, participants should have the following knowledge and skills:
- Cisco UCS compute architecture and operations
- Cisco Nexus switch portfolio and features
- Core data center technologies