{"id":116516,"date":"2025-12-26T07:42:02","date_gmt":"2025-12-26T07:42:02","guid":{"rendered":"https:\/\/www.dumpsbase.com\/freedumps\/?p=116516"},"modified":"2025-12-26T07:42:02","modified_gmt":"2025-12-26T07:42:02","slug":"ncp-aii-exam-dumps-v9-03-are-online-for-your-ncp-ai-infrastructure-exam-preparation-continue-to-check-the-ncp-aii-free-dumps-part-3-q81-q120-today","status":"publish","type":"post","link":"https:\/\/www.dumpsbase.com\/freedumps\/ncp-aii-exam-dumps-v9-03-are-online-for-your-ncp-ai-infrastructure-exam-preparation-continue-to-check-the-ncp-aii-free-dumps-part-3-q81-q120-today.html","title":{"rendered":"NCP-AII Exam Dumps (V9.03) Are Online for Your NCP AI Infrastructure Exam Preparation: Continue to Check the NCP-AII Free Dumps (Part 3, Q81-Q120) Today"},"content":{"rendered":"<p>Studying the NCP-AII dumps (V9.03) is essential when preparing for your NVIDIA Certified Professional AI Infrastructure certification exam. By working through the updated exam questions and answers from DumpsBase, you gain access to current information verified by experts. DumpsBase\u2019s materials not only promote a better understanding of the exam content but also support legitimate preparation. Before downloading the NCP-AII exam dumps (V9.03), you can check the free dumps below:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.dumpsbase.com\/freedumps\/latest-ncp-aii-dumps-v9-03-for-smooth-and-efficient-exam-preparation-read-nvidia-ncp-aii-free-dumps-part-1-q1-q40.html\"><em>NCP-AII free dumps (Part 1, Q1-Q40) of V9.03<\/em><\/a><\/li>\n<li><a href=\"https:\/\/www.dumpsbase.com\/freedumps\/passing-your-ncp-ai-infrastructure-exam-with-the-updated-ncp-aii-dumps-v9-03-continue-to-check-our-ncp-aii-free-dumps-part-2-q41-q80-online.html\"><em>NCP-AII free dumps (Part 2, Q41-Q80) of V9.03<\/em><\/a><\/li>\n<\/ul>\n<p>After reading all these demos, you can be confident that DumpsBase supports your success. 
NCP-AII exam dumps (V9.03) ensure that you are always up to date and well-prepared for the NVIDIA Certified Professional AI Infrastructure Exam.<\/p>\n<h2>Below are the <span style=\"background-color: #ffcc99;\"><em>NCP-AII free dumps (Part 3, Q81-Q120) of V9.03<\/em><\/span> for checking more:<\/h2>\n  \n  \n<div  id=\"watupro_quiz\" class=\"quiz-area single-page-quiz\">\n<p id=\"submittingExam11332\" style=\"display:none;text-align:center;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/plugins\/watupro\/img\/loading.gif\" width=\"16\" height=\"16\"><\/p>\n\n<div class=\"watupro-exam-description\" id=\"description-quiz-11332\"><\/div>\n\n<form action=\"\" method=\"post\" class=\"quiz-form\" id=\"quiz-11332\"  enctype=\"multipart\/form-data\" >\n<div class='watu-question ' id='question-1' style=';'><div id='questionWrap-1'  class='   watupro-question-id-445448'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>1. <\/span>A data scientist reports slow data loading times when training a large language model. The data is stored in a Ceph cluster. You suspect the client-side caching is not properly configured. <br \/>\r<br>Which Ceph configuration parameter(s) should you investigate and potentially adjust to improve data loading performance? 
Select all that apply.<\/div><input type='hidden' name='question_id[]' id='qID_1' value='445448' \/><input type='hidden' id='answerType445448' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445448[]' id='answer-id-1723521' class='answer   answerof-445448 ' value='1723521'   \/><label for='answer-id-1723521' id='answer-label-1723521' class=' answer'><span>client cache size<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445448[]' id='answer-id-1723522' class='answer   answerof-445448 ' value='1723522'   \/><label for='answer-id-1723522' id='answer-label-1723522' class=' answer'><span>client quota<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445448[]' id='answer-id-1723523' class='answer   answerof-445448 ' value='1723523'   \/><label for='answer-id-1723523' id='answer-label-1723523' class=' answer'><span>mds cache size<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445448[]' id='answer-id-1723524' class='answer   answerof-445448 ' value='1723524'   \/><label for='answer-id-1723524' id='answer-label-1723524' class=' answer'><span>fuse_client_max_background<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-2' style=';'><div id='questionWrap-2'  class='   watupro-question-id-445449'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>2. <\/span>You\u2019re deploying a new cluster with multiple NVIDIA A100 GPUs per node. You want to ensure optimal inter-GPU communication performance using NVLink. 
<br \/>\r<br>Which of the following configurations are critical for achieving maximum NVLink bandwidth?<\/div><input type='hidden' name='question_id[]' id='qID_2' value='445449' \/><input type='hidden' id='answerType445449' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445449[]' id='answer-id-1723525' class='answer   answerof-445449 ' value='1723525'   \/><label for='answer-id-1723525' id='answer-label-1723525' class=' answer'><span>All GPUs within a node must be the same model and have identical firmware versions.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445449[]' id='answer-id-1723526' class='answer   answerof-445449 ' value='1723526'   \/><label for='answer-id-1723526' id='answer-label-1723526' class=' answer'><span>The motherboard must support PCIe Gen5 to maximize NVLink bandwidth.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445449[]' id='answer-id-1723527' class='answer   answerof-445449 ' value='1723527'   \/><label for='answer-id-1723527' id='answer-label-1723527' class=' answer'><span>GPUs should be physically installed in slots that maximize direct NVLink connections based on the server\u2019s architecture.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445449[]' id='answer-id-1723528' class='answer   answerof-445449 ' value='1723528'   \/><label for='answer-id-1723528' id='answer-label-1723528' class=' answer'><span>The NVIDIA driver must be configured to enable NVLink; it is disabled by default.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445449[]' id='answer-id-1723529' class='answer   answerof-445449 ' value='1723529'   \/><label for='answer-id-1723529' 
id='answer-label-1723529' class=' answer'><span>The server must use a specific CPU model to leverage NVLink capabilities.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-3' style=';'><div id='questionWrap-3'  class='   watupro-question-id-445450'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>3. <\/span>When installing a GPU driver on a Linux system that already has a previous driver version installed, what is the recommended procedure to ensure a clean and stable installation?<\/div><input type='hidden' name='question_id[]' id='qID_3' value='445450' \/><input type='hidden' id='answerType445450' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445450[]' id='answer-id-1723530' class='answer   answerof-445450 ' value='1723530'   \/><label for='answer-id-1723530' id='answer-label-1723530' class=' answer'><span>Simply install the new driver package using \u2018apt install\u2019 or \u2018yum install\u2019 without removing the old driver.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445450[]' id='answer-id-1723531' class='answer   answerof-445450 ' value='1723531'   \/><label for='answer-id-1723531' id='answer-label-1723531' class=' answer'><span>Blacklist the nouveau driver, download the CUDA toolkit, and run the installation script with default options.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445450[]' id='answer-id-1723532' class='answer   answerof-445450 ' value='1723532'   \/><label for='answer-id-1723532' id='answer-label-1723532' class=' answer'><span>Purge the existing NVIDIA driver packages using \u2018apt purge nvidia-*\u2019 or \u2018yum remove nvidia-*\u2019, reboot the system, and then install the 
new driver package.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445450[]' id='answer-id-1723533' class='answer   answerof-445450 ' value='1723533'   \/><label for='answer-id-1723533' id='answer-label-1723533' class=' answer'><span>Run \u2018nvidia-uninstall\u2019 if it exists, otherwise manually remove the NVIDIA kernel modules and libraries from \u2018\/lib\/modules\u2019 and \u2018\/usr\/lib\u2019.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445450[]' id='answer-id-1723534' class='answer   answerof-445450 ' value='1723534'   \/><label for='answer-id-1723534' id='answer-label-1723534' class=' answer'><span>Install the new driver using the \u2018.run\u2019 file from NVIDIA\u2019s website, accepting all default options.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-4' style=';'><div id='questionWrap-4'  class='   watupro-question-id-445451'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>4. <\/span>You are configuring network fabric ports for NVIDIA GPUs in a server. The GPUs are connected to the network via PCIe. 
<br \/>\r<br>What is the primary factor that determines the maximum achievable bandwidth between the GPUs and the network?<\/div><input type='hidden' name='question_id[]' id='qID_4' value='445451' \/><input type='hidden' id='answerType445451' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445451[]' id='answer-id-1723535' class='answer   answerof-445451 ' value='1723535'   \/><label for='answer-id-1723535' id='answer-label-1723535' class=' answer'><span>The clock speed of the CPU.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445451[]' id='answer-id-1723536' class='answer   answerof-445451 ' value='1723536'   \/><label for='answer-id-1723536' id='answer-label-1723536' class=' answer'><span>The amount of system RAM.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445451[]' id='answer-id-1723537' class='answer   answerof-445451 ' value='1723537'   \/><label for='answer-id-1723537' id='answer-label-1723537' class=' answer'><span>The PCIe generation and number of lanes connecting the GPUs to the network adapter (e.g., PCIe 4.0 x16).<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445451[]' id='answer-id-1723538' class='answer   answerof-445451 ' value='1723538'   \/><label for='answer-id-1723538' id='answer-label-1723538' class=' answer'><span>The speed of the system\u2019s hard drives or SSDs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445451[]' id='answer-id-1723539' class='answer   answerof-445451 ' value='1723539'   \/><label for='answer-id-1723539' id='answer-label-1723539' class=' answer'><span>The color of the Ethernet cables.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- 
end questionWrap--><\/div><\/div><div class='watu-question ' id='question-5' style=';'><div id='questionWrap-5'  class='   watupro-question-id-445452'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>5. <\/span>You are tasked with setting up network fabric ports to connect several servers, each with multiple NVIDIA GPUs, to an InfiniBand switch. Each server has two ConnectX-6 adapters. <br \/>\r<br>What is the best strategy to maximize bandwidth and redundancy between the servers and the InfiniBand fabric?<\/div><input type='hidden' name='question_id[]' id='qID_5' value='445452' \/><input type='hidden' id='answerType445452' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445452[]' id='answer-id-1723540' class='answer   answerof-445452 ' value='1723540'   \/><label for='answer-id-1723540' id='answer-label-1723540' class=' answer'><span>Connect only one adapter from each server to the switch to minimize cable clutter.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445452[]' id='answer-id-1723541' class='answer   answerof-445452 ' value='1723541'   \/><label for='answer-id-1723541' id='answer-label-1723541' class=' answer'><span>Connect both adapters from each server to the same switch, but do not configure link aggregation.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445452[]' id='answer-id-1723542' class='answer   answerof-445452 ' value='1723542'   \/><label for='answer-id-1723542' id='answer-label-1723542' class=' answer'><span>Connect both adapters from each server to the same switch and configure link aggregation (LACP or static LAG) on both the server and the switch.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445452[]' 
id='answer-id-1723543' class='answer   answerof-445452 ' value='1723543'   \/><label for='answer-id-1723543' id='answer-label-1723543' class=' answer'><span>Connect one adapter from each server to one switch, and the second adapter to a different switch, without link aggregation.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445452[]' id='answer-id-1723544' class='answer   answerof-445452 ' value='1723544'   \/><label for='answer-id-1723544' id='answer-label-1723544' class=' answer'><span>Connect one adapter from each server to one switch, and the second adapter to a different switch, and configure multi-pathing on the servers.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-6' style=';'><div id='questionWrap-6'  class='   watupro-question-id-445453'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>6. <\/span>You are deploying a multi-tenant AI infrastructure with strict isolation requirements. 
<br \/>\r<br>Which network technology would be most suitable for creating isolated virtual networks for each tenant?<\/div><input type='hidden' name='question_id[]' id='qID_6' value='445453' \/><input type='hidden' id='answerType445453' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445453[]' id='answer-id-1723545' class='answer   answerof-445453 ' value='1723545'   \/><label for='answer-id-1723545' id='answer-label-1723545' class=' answer'><span>VLANs (Virtual LANs)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445453[]' id='answer-id-1723546' class='answer   answerof-445453 ' value='1723546'   \/><label for='answer-id-1723546' id='answer-label-1723546' class=' answer'><span>VXLAN (Virtual Extensible LAN)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445453[]' id='answer-id-1723547' class='answer   answerof-445453 ' value='1723547'   \/><label for='answer-id-1723547' id='answer-label-1723547' class=' answer'><span>QinQ (802.1ad)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445453[]' id='answer-id-1723548' class='answer   answerof-445453 ' value='1723548'   \/><label for='answer-id-1723548' id='answer-label-1723548' class=' answer'><span>GRE (Generic Routing Encapsulation)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445453[]' id='answer-id-1723549' class='answer   answerof-445453 ' value='1723549'   \/><label for='answer-id-1723549' id='answer-label-1723549' class=' answer'><span>IPsec<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-7' style=';'><div id='questionWrap-7'  class='   watupro-question-id-445454'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>7. <\/span>You are setting up a virtualized environment (using VMware vSphere) to run GPU-accelerated workloads. You have multiple physical GPUs in your server and want to assign specific GPUs to different virtual machines (VMs) for dedicated access. 
<br \/>\r<br>Which vSphere technology would BEST support this?<\/div><input type='hidden' name='question_id[]' id='qID_7' value='445454' \/><input type='hidden' id='answerType445454' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445454[]' id='answer-id-1723550' class='answer   answerof-445454 ' value='1723550'   \/><label for='answer-id-1723550' id='answer-label-1723550' class=' answer'><span>VMware vMotion<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445454[]' id='answer-id-1723551' class='answer   answerof-445454 ' value='1723551'   \/><label for='answer-id-1723551' id='answer-label-1723551' class=' answer'><span>VMware High Availability (HA)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445454[]' id='answer-id-1723552' class='answer   answerof-445454 ' value='1723552'   \/><label for='answer-id-1723552' id='answer-label-1723552' class=' answer'><span>VMware DirectPath I\/O (Passthrough)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445454[]' id='answer-id-1723553' class='answer   answerof-445454 ' value='1723553'   \/><label for='answer-id-1723553' id='answer-label-1723553' class=' answer'><span>VMware vGPU<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445454[]' id='answer-id-1723554' class='answer   answerof-445454 ' value='1723554'   \/><label for='answer-id-1723554' id='answer-label-1723554' class=' answer'><span>VMware DRS (Distributed Resource Scheduler)<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-8' style=';'><div id='questionWrap-8'  class='   watupro-question-id-445455'>\n\t\t\t<div 
class='question-content'><div><span class='watupro_num'>8. <\/span>You are experiencing link flapping (frequent up\/down transitions) on several InfiniBand links in your AI infrastructure. This is causing intermittent connectivity issues and performance degradation. <br \/>\r<br>What are the MOST likely causes of this issue, and what steps should you take to troubleshoot and resolve it? (Select TWO)<\/div><input type='hidden' name='question_id[]' id='qID_8' value='445455' \/><input type='hidden' id='answerType445455' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445455[]' id='answer-id-1723555' class='answer   answerof-445455 ' value='1723555'   \/><label for='answer-id-1723555' id='answer-label-1723555' class=' answer'><span>Incorrect MTU (Maximum Transmission Unit) configuration on the affected interfaces.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445455[]' id='answer-id-1723556' class='answer   answerof-445455 ' value='1723556'   \/><label for='answer-id-1723556' id='answer-label-1723556' class=' answer'><span>Faulty or damaged cables, connectors, or transceivers.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445455[]' id='answer-id-1723557' class='answer   answerof-445455 ' value='1723557'   \/><label for='answer-id-1723557' id='answer-label-1723557' class=' answer'><span>Software bugs in the operating system or InfiniBand drivers.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445455[]' id='answer-id-1723558' class='answer   answerof-445455 ' value='1723558'   \/><label for='answer-id-1723558' id='answer-label-1723558' class=' answer'><span>Mismatched link speeds or duplex settings between connected 
devices.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445455[]' id='answer-id-1723559' class='answer   answerof-445455 ' value='1723559'   \/><label for='answer-id-1723559' id='answer-label-1723559' class=' answer'><span>Excessive broadcast traffic causing congestion.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-9' style=';'><div id='questionWrap-9'  class='   watupro-question-id-445456'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>9. <\/span>Which of the following are key benefits of using NVIDIA NVLink Switch in a multi-GPU server setup for AI and deep learning workloads?<\/div><input type='hidden' name='question_id[]' id='qID_9' value='445456' \/><input type='hidden' id='answerType445456' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445456[]' id='answer-id-1723560' class='answer   answerof-445456 ' value='1723560'   \/><label for='answer-id-1723560' id='answer-label-1723560' class=' answer'><span>Increased GPU-to-GPU communication bandwidth.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445456[]' id='answer-id-1723561' class='answer   answerof-445456 ' value='1723561'   \/><label for='answer-id-1723561' id='answer-label-1723561' class=' answer'><span>Reduced latency in inter-GPU data transfers.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445456[]' id='answer-id-1723562' class='answer   answerof-445456 ' value='1723562'   \/><label for='answer-id-1723562' id='answer-label-1723562' class=' answer'><span>Simplified GPU resource management.<\/span><\/label><\/div><div class='watupro-question-choice  ' 
dir='auto' ><input type='checkbox' name='answer-445456[]' id='answer-id-1723563' class='answer   answerof-445456 ' value='1723563'   \/><label for='answer-id-1723563' id='answer-label-1723563' class=' answer'><span>Support for larger GPU memory pools than a single server can physically accommodate.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445456[]' id='answer-id-1723564' class='answer   answerof-445456 ' value='1723564'   \/><label for='answer-id-1723564' id='answer-label-1723564' class=' answer'><span>Enhanced security features compared to PCIe-based interconnections.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-10' style=';'><div id='questionWrap-10'  class='   watupro-question-id-445457'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>10. <\/span>An AI server with 8 GPUs is experiencing random system crashes under heavy load. The system logs indicate potential memory errors, but standard memory tests (memtest86+) pass without any failures. The GPUs are passively cooled. 
<br \/>\r<br>What are the THREE most likely root causes of these crashes?<\/div><input type='hidden' name='question_id[]' id='qID_10' value='445457' \/><input type='hidden' id='answerType445457' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445457[]' id='answer-id-1723565' class='answer   answerof-445457 ' value='1723565'   \/><label for='answer-id-1723565' id='answer-label-1723565' class=' answer'><span>Incompatible NVIDIA driver version with the installed Linux kernel.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445457[]' id='answer-id-1723566' class='answer   answerof-445457 ' value='1723566'   \/><label for='answer-id-1723566' id='answer-label-1723566' class=' answer'><span>GPU memory errors that are not detectable by standard CPU-based memory tests.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445457[]' id='answer-id-1723567' class='answer   answerof-445457 ' value='1723567'   \/><label for='answer-id-1723567' id='answer-label-1723567' class=' answer'><span>Insufficient airflow within the server, leading to overheating of the GPUs and VRMs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445457[]' id='answer-id-1723568' class='answer   answerof-445457 ' value='1723568'   \/><label for='answer-id-1723568' id='answer-label-1723568' class=' answer'><span>A faulty power supply unit (PSU) that is unable to provide stable power under peak load.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445457[]' id='answer-id-1723569' class='answer   answerof-445457 ' value='1723569'   \/><label for='answer-id-1723569' id='answer-label-1723569' class=' answer'><span>Network 
congestion causing intermittent data corruption during distributed training.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-11' style=';'><div id='questionWrap-11'  class='   watupro-question-id-445458'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>11. <\/span>You have a server equipped with multiple NVIDIA GPUs connected via NVLink. You want to monitor the NVLink bandwidth utilization in real-time. <br \/>\r<br>Which tool or method is the most appropriate and accurate for this?<\/div><input type='hidden' name='question_id[]' id='qID_11' value='445458' \/><input type='hidden' id='answerType445458' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445458[]' id='answer-id-1723570' class='answer   answerof-445458 ' value='1723570'   \/><label for='answer-id-1723570' id='answer-label-1723570' class=' answer'><span>Using \u2018nvidia-smi\u2019 with the \u2018--display=nvlink\u2019 option.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445458[]' id='answer-id-1723571' class='answer   answerof-445458 ' value='1723571'   \/><label for='answer-id-1723571' id='answer-label-1723571' class=' answer'><span>Parsing the output of \u2018nvprof\u2019 during a representative workload.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445458[]' id='answer-id-1723572' class='answer   answerof-445458 ' value='1723572'   \/><label for='answer-id-1723572' id='answer-label-1723572' class=' answer'><span>Utilizing DCGM (Data Center GPU Manager) with its NVLink monitoring capabilities.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445458[]' id='answer-id-1723573' class='answer 
  answerof-445458 ' value='1723573'   \/><label for='answer-id-1723573' id='answer-label-1723573' class=' answer'><span>Monitoring network interface traffic using \u2018iftop\u2019 or \u2018tcpdump\u2019.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445458[]' id='answer-id-1723574' class='answer   answerof-445458 ' value='1723574'   \/><label for='answer-id-1723574' id='answer-label-1723574' class=' answer'><span>Using \u2018gpustat\u2019.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-12' style=';'><div id='questionWrap-12'  class='   watupro-question-id-445459'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>12. <\/span>You suspect a power supply issue is causing intermittent GPU failures in a server with four NVIDIA A100 GPUs. The server is rated for a peak power consumption of 3000W. You have a power meter available. 
<br \/>\r<br>Which of the following methods provides the most accurate assessment of the server\u2019s power consumption under full GPU load?<\/div><input type='hidden' name='question_id[]' id='qID_12' value='445459' \/><input type='hidden' id='answerType445459' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445459[]' id='answer-id-1723575' class='answer   answerof-445459 ' value='1723575'   \/><label for='answer-id-1723575' id='answer-label-1723575' class=' answer'><span>Run \u2018nvidia-smi\u2019 and sum the reported power consumption for each GPU.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445459[]' id='answer-id-1723576' class='answer   answerof-445459 ' value='1723576'   \/><label for='answer-id-1723576' id='answer-label-1723576' class=' answer'><span>Use the power meter to measure the server\u2019s power consumption at idle and multiply by four.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445459[]' id='answer-id-1723577' class='answer   answerof-445459 ' value='1723577'   \/><label for='answer-id-1723577' id='answer-label-1723577' class=' answer'><span>Use the power meter to measure the server\u2019s power consumption while running a synthetic benchmark that fully utilizes all GPUs simultaneously.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445459[]' id='answer-id-1723578' class='answer   answerof-445459 ' value='1723578'   \/><label for='answer-id-1723578' id='answer-label-1723578' class=' answer'><span>Check the server\u2019s BIOS for power consumption readings.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445459[]' id='answer-id-1723579' class='answer   answerof-445459 ' 
value='1723579'   \/><label for='answer-id-1723579' id='answer-label-1723579' class=' answer'><span>Add the maximum power rating of each GPU to the CPU\u2019s TDP (Thermal Design Power).<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-13' style=';'><div id='questionWrap-13'  class='   watupro-question-id-445460'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>13. <\/span>Which command-line tool is typically used to monitor the status and performance of an NVIDIA NVLink Switch?<\/div><input type='hidden' name='question_id[]' id='qID_13' value='445460' \/><input type='hidden' id='answerType445460' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445460[]' id='answer-id-1723580' class='answer   answerof-445460 ' value='1723580'   \/><label for='answer-id-1723580' id='answer-label-1723580' class=' answer'><span>nvidia-smi<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445460[]' id='answer-id-1723581' class='answer   answerof-445460 ' value='1723581'   \/><label for='answer-id-1723581' id='answer-label-1723581' class=' answer'><span>nvswitch-cli<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445460[]' id='answer-id-1723582' class='answer   answerof-445460 ' value='1723582'   \/><label for='answer-id-1723582' id='answer-label-1723582' class=' answer'><span>ibstat<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445460[]' id='answer-id-1723583' class='answer   answerof-445460 ' value='1723583'   \/><label for='answer-id-1723583' id='answer-label-1723583' class=' answer'><span>rocminfo<\/span><\/label><\/div><div class='watupro-question-choice  ' 
dir='auto' ><input type='radio' name='answer-445460[]' id='answer-id-1723584' class='answer   answerof-445460 ' value='1723584'   \/><label for='answer-id-1723584' id='answer-label-1723584' class=' answer'><span>lspci<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-14' style=';'><div id='questionWrap-14'  class='   watupro-question-id-445461'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>14. <\/span>You are tasked with replacing a redundant power supply unit (PSU) in a GPU server. The server has two 2000W PSUs. One PSU has failed, but the server is still running. <br \/>\r<br>Which of the following actions is the safest and most efficient way to replace the faulty PSU?<\/div><input type='hidden' name='question_id[]' id='qID_14' value='445461' \/><input type='hidden' id='answerType445461' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445461[]' id='answer-id-1723585' class='answer   answerof-445461 ' value='1723585'   \/><label for='answer-id-1723585' id='answer-label-1723585' class=' answer'><span>Immediately power down the server and replace the faulty PSU.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445461[]' id='answer-id-1723586' class='answer   answerof-445461 ' value='1723586'   \/><label for='answer-id-1723586' id='answer-label-1723586' class=' answer'><span>Hot-swap the faulty PSU with a new one while the server is running.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445461[]' id='answer-id-1723587' class='answer   answerof-445461 ' value='1723587'   \/><label for='answer-id-1723587' id='answer-label-1723587' class=' answer'><span>Wait for a scheduled maintenance window to power down 
the server and replace the PSU.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445461[]' id='answer-id-1723588' class='answer   answerof-445461 ' value='1723588'   \/><label for='answer-id-1723588' id='answer-label-1723588' class=' answer'><span>Replace the faulty PSU, then reboot the server to ensure the new PSU is working.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445461[]' id='answer-id-1723589' class='answer   answerof-445461 ' value='1723589'   \/><label for='answer-id-1723589' id='answer-label-1723589' class=' answer'><span>Document the failure and wait until the remaining PSU fails before taking action.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-15' style=';'><div id='questionWrap-15'  class='   watupro-question-id-445462'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>15. 
<\/span>Which of the following statements regarding VXLAN (Virtual Extensible LAN) is MOST accurate in the context of data center networking for AI\/ML workloads?<\/div><input type='hidden' name='question_id[]' id='qID_15' value='445462' \/><input type='hidden' id='answerType445462' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445462[]' id='answer-id-1723590' class='answer   answerof-445462 ' value='1723590'   \/><label for='answer-id-1723590' id='answer-label-1723590' class=' answer'><span>VXLAN provides Layer 2 connectivity across Layer 3 networks, enabling virtual machine mobility.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445462[]' id='answer-id-1723591' class='answer   answerof-445462 ' value='1723591'   \/><label for='answer-id-1723591' id='answer-label-1723591' class=' answer'><span>VXLAN primarily improves network security by encrypting all traffic.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445462[]' id='answer-id-1723592' class='answer   answerof-445462 ' value='1723592'   \/><label for='answer-id-1723592' id='answer-label-1723592' class=' answer'><span>VXLAN is only suitable for small-scale networks due to its limited scalability.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445462[]' id='answer-id-1723593' class='answer   answerof-445462 ' value='1723593'   \/><label for='answer-id-1723593' id='answer-label-1723593' class=' answer'><span>VXLAN reduces network overhead compared to traditional VLANs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445462[]' id='answer-id-1723594' class='answer   answerof-445462 ' value='1723594'   \/><label for='answer-id-1723594' 
id='answer-label-1723594' class=' answer'><span>VXLAN requires specialized hardware and cannot be implemented in software.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-16' style=';'><div id='questionWrap-16'  class='   watupro-question-id-445463'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>16. <\/span>You\u2019re debugging performance issues in a distributed training job. \u2018nvidia-smi\u2019 shows consistently high GPU utilization across all nodes, but the training speed isn\u2019t increasing linearly with the number of GPUs. Network bandwidth is sufficient. <br \/>\r<br>What is the most likely bottleneck?<\/div><input type='hidden' name='question_id[]' id='qID_16' value='445463' \/><input type='hidden' id='answerType445463' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445463[]' id='answer-id-1723595' class='answer   answerof-445463 ' value='1723595'   \/><label for='answer-id-1723595' id='answer-label-1723595' class=' answer'><span>Inefficient data loading and preprocessing pipeline, causing GPUs to wait for data.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445463[]' id='answer-id-1723596' class='answer   answerof-445463 ' value='1723596'   \/><label for='answer-id-1723596' id='answer-label-1723596' class=' answer'><span>NCCL is not configured optimally for the network topology, leading to high communication overhead.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445463[]' id='answer-id-1723597' class='answer   answerof-445463 ' value='1723597'   \/><label for='answer-id-1723597' id='answer-label-1723597' class=' answer'><span>The learning rate is not adjusted appropriately for the 
increased batch size across multiple GPUs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445463[]' id='answer-id-1723598' class='answer   answerof-445463 ' value='1723598'   \/><label for='answer-id-1723598' id='answer-label-1723598' class=' answer'><span>The global batch size has exceeded the optimal point for the model, reducing per-sample accuracy and slowing convergence.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445463[]' id='answer-id-1723599' class='answer   answerof-445463 ' value='1723599'   \/><label for='answer-id-1723599' id='answer-label-1723599' class=' answer'><span>CUDA Graphs is not being utilized.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-17' style=';'><div id='questionWrap-17'  class='   watupro-question-id-445464'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>17. <\/span>You are deploying a new NVLink Switch based cluster. The GPUs are installed in different servers, but need to be configured to utilize NVLink interconnect. 
<br \/>\r<br>Which of the following should be performed during the installation phase to confirm correct configuration?<\/div><input type='hidden' name='question_id[]' id='qID_17' value='445464' \/><input type='hidden' id='answerType445464' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445464[]' id='answer-id-1723600' class='answer   answerof-445464 ' value='1723600'   \/><label for='answer-id-1723600' id='answer-label-1723600' class=' answer'><span>Run NCCL tests to verify the GPU-to-GPU bandwidth and latency between servers.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445464[]' id='answer-id-1723601' class='answer   answerof-445464 ' value='1723601'   \/><label for='answer-id-1723601' id='answer-label-1723601' class=' answer'><span>Verify that GPUDirect RDMA is enabled and functioning correctly.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445464[]' id='answer-id-1723602' class='answer   answerof-445464 ' value='1723602'   \/><label for='answer-id-1723602' id='answer-label-1723602' class=' answer'><span>Check that the \u2018nvidia-smi\u2019 command shows the correct NVLink topology.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445464[]' id='answer-id-1723603' class='answer   answerof-445464 ' value='1723603'   \/><label for='answer-id-1723603' id='answer-label-1723603' class=' answer'><span>Run standard TCP\/IP network bandwidth tests to check inter-server communication.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445464[]' id='answer-id-1723604' class='answer   answerof-445464 ' value='1723604'   \/><label for='answer-id-1723604' id='answer-label-1723604' class=' 
answer'><span>All the GPUs are in the same IP subnet.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-18' style=';'><div id='questionWrap-18'  class='   watupro-question-id-445465'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>18. <\/span>You\u2019ve installed a server with multiple NVIDIA A100 GPUs intended for use with Kubernetes and NVIDIA\u2019s GPU Operator. After installing the GPU Operator, you notice that the GPUs are not being properly detected and managed by Kubernetes. <br \/>\r<br>Which of the following are potential causes and troubleshooting steps you should take?<\/div><input type='hidden' name='question_id[]' id='qID_18' value='445465' \/><input type='hidden' id='answerType445465' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445465[]' id='answer-id-1723605' class='answer   answerof-445465 ' value='1723605'   \/><label for='answer-id-1723605' id='answer-label-1723605' class=' answer'><span>The NVIDIA drivers are not properly installed on the host operating system before installing the GPU Operator. Verify the driver installation using \u2018nvidia-smi\u2019.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445465[]' id='answer-id-1723606' class='answer   answerof-445465 ' value='1723606'   \/><label for='answer-id-1723606' id='answer-label-1723606' class=' answer'><span>The Kubernetes nodes are not labeled correctly to indicate the presence of NVIDIA GPUs. 
Use \u2018kubectl label node nvidia.com\/gpu.present=true\u2019.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445465[]' id='answer-id-1723607' class='answer   answerof-445465 ' value='1723607'   \/><label for='answer-id-1723607' id='answer-label-1723607' class=' answer'><span>The NVIDIA Container Toolkit is not installed on the Kubernetes nodes. Install the toolkit according to NVIDIA\u2019s documentation.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445465[]' id='answer-id-1723608' class='answer   answerof-445465 ' value='1723608'   \/><label for='answer-id-1723608' id='answer-label-1723608' class=' answer'><span>The GPU Operator\u2019s configuration is incorrect, preventing it from properly discovering and managing the GPUs. Check the GPU Operator\u2019s logs and configuration files.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445465[]' id='answer-id-1723609' class='answer   answerof-445465 ' value='1723609'   \/><label for='answer-id-1723609' id='answer-label-1723609' class=' answer'><span>The \u2018nvidia-docker2\u2019 runtime is not set as the default runtime in \u2018\/etc\/docker\/daemon.json\u2019. Change the default runtime to \u2018nvidia\u2019 and restart the Docker daemon.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-19' style=';'><div id='questionWrap-19'  class='   watupro-question-id-445466'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>19. <\/span>You\u2019re configuring a RoCEv2 network for your AI infrastructure. 
<br \/>\r<br>Which UDP port number range is commonly used for RoCEv2 traffic, and why is it important to be aware of this?<\/div><input type='hidden' name='question_id[]' id='qID_19' value='445466' \/><input type='hidden' id='answerType445466' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445466[]' id='answer-id-1723610' class='answer   answerof-445466 ' value='1723610'   \/><label for='answer-id-1723610' id='answer-label-1723610' class=' answer'><span>0-1023, because these are well-known ports.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445466[]' id='answer-id-1723611' class='answer   answerof-445466 ' value='1723611'   \/><label for='answer-id-1723611' id='answer-label-1723611' class=' answer'><span>4791, which is reserved for VXLAN.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445466[]' id='answer-id-1723612' class='answer   answerof-445466 ' value='1723612'   \/><label for='answer-id-1723612' id='answer-label-1723612' class=' answer'><span>49152-65535, the dynamic\/private port range, to avoid conflicts with other services.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445466[]' id='answer-id-1723613' class='answer   answerof-445466 ' value='1723613'   \/><label for='answer-id-1723613' id='answer-label-1723613' class=' answer'><span>1024-49151, the registered port range, for general application use.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445466[]' id='answer-id-1723614' class='answer   answerof-445466 ' value='1723614'   \/><label for='answer-id-1723614' id='answer-label-1723614' class=' answer'><span>Any UDP port number can be used without 
issue.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-20' style=';'><div id='questionWrap-20'  class='   watupro-question-id-445467'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>20. <\/span>You are installing a GPU server in a data center with limited cooling capacity. <br \/>\r<br>Which of the following server configuration choices would BEST help minimize the server\u2019s thermal output, without significantly compromising performance? Assume all options are compatible.<\/div><input type='hidden' name='question_id[]' id='qID_20' value='445467' \/><input type='hidden' id='answerType445467' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445467[]' id='answer-id-1723615' class='answer   answerof-445467 ' value='1723615'   \/><label for='answer-id-1723615' id='answer-label-1723615' class=' answer'><span>Choose GPUs with a lower TDP (Thermal Design Power), even if it means using older generation GPUs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445467[]' id='answer-id-1723616' class='answer   answerof-445467 ' value='1723616'   \/><label for='answer-id-1723616' id='answer-label-1723616' class=' answer'><span>Use a passively cooled CPU to reduce fan noise and power consumption.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445467[]' id='answer-id-1723617' class='answer   answerof-445467 ' value='1723617'   \/><label for='answer-id-1723617' id='answer-label-1723617' class=' answer'><span>Configure the BIOS\/UEFI to aggressively throttle CPU and GPU frequencies under heavy load.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445467[]' 
id='answer-id-1723618' class='answer   answerof-445467 ' value='1723618'   \/><label for='answer-id-1723618' id='answer-label-1723618' class=' answer'><span>Implement liquid cooling for the GPUs and CPUs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445467[]' id='answer-id-1723619' class='answer   answerof-445467 ' value='1723619'   \/><label for='answer-id-1723619' id='answer-label-1723619' class=' answer'><span>Increase the ambient temperature of the data center to reduce the temperature differential.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-21' style=';'><div id='questionWrap-21'  class='   watupro-question-id-445468'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>21. <\/span>Consider a scenario where you are using NCCL (NVIDIA Collective Communications Library) for multi-GPU training across multiple servers connected via NVLink switches. 
<br \/>\r<br>Which NCCL environment variable would you use to specify the network interface to be used for communication?<\/div><input type='hidden' name='question_id[]' id='qID_21' value='445468' \/><input type='hidden' id='answerType445468' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445468[]' id='answer-id-1723620' class='answer   answerof-445468 ' value='1723620'   \/><label for='answer-id-1723620' id='answer-label-1723620' class=' answer'><span>NCCL_PORT<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445468[]' id='answer-id-1723621' class='answer   answerof-445468 ' value='1723621'   \/><label for='answer-id-1723621' id='answer-label-1723621' class=' answer'><span>NCCL_SOCKET_IFNAME<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445468[]' id='answer-id-1723622' class='answer   answerof-445468 ' value='1723622'   \/><label for='answer-id-1723622' id='answer-label-1723622' class=' answer'><span>NCCL_NET_INTERFACE<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445468[]' id='answer-id-1723623' class='answer   answerof-445468 ' value='1723623'   \/><label for='answer-id-1723623' id='answer-label-1723623' class=' answer'><span>NCCL_IB_HCA<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445468[]' id='answer-id-1723624' class='answer   answerof-445468 ' value='1723624'   \/><label for='answer-id-1723624' id='answer-label-1723624' class=' answer'><span>NCCL_COMM_ID<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-22' style=';'><div id='questionWrap-22'  class='   watupro-question-id-445469'>\n\t\t\t<div 
class='question-content'><div><span class='watupro_num'>22. <\/span>You are tasked with optimizing storage performance for a deep learning training job on an NVIDIA DGX server. The training data consists of millions of small image files. <br \/>\r<br>Which of the following storage optimization techniques would be MOST effective in reducing I\/O bottlenecks?<\/div><input type='hidden' name='question_id[]' id='qID_22' value='445469' \/><input type='hidden' id='answerType445469' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445469[]' id='answer-id-1723625' class='answer   answerof-445469 ' value='1723625'   \/><label for='answer-id-1723625' id='answer-label-1723625' class=' answer'><span>Implementing RAID 0 across all storage devices.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445469[]' id='answer-id-1723626' class='answer   answerof-445469 ' value='1723626'   \/><label for='answer-id-1723626' id='answer-label-1723626' class=' answer'><span>Using a distributed file system with data striping across multiple storage nodes.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445469[]' id='answer-id-1723627' class='answer   answerof-445469 ' value='1723627'   \/><label for='answer-id-1723627' id='answer-label-1723627' class=' answer'><span>Enabling data compression on the storage volume.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445469[]' id='answer-id-1723628' class='answer   answerof-445469 ' value='1723628'   \/><label for='answer-id-1723628' id='answer-label-1723628' class=' answer'><span>Increasing the block size of the file system to the maximum supported value.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input 
type='radio' name='answer-445469[]' id='answer-id-1723629' class='answer   answerof-445469 ' value='1723629'   \/><label for='answer-id-1723629' id='answer-label-1723629' class=' answer'><span>Implementing a tiered storage system with NVMe drives for frequently accessed data and HDDs for less frequently accessed data.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-23' style=';'><div id='questionWrap-23'  class='   watupro-question-id-445470'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>23. <\/span>You are using NVIDIA Spectrum-X switches in your AI infrastructure. You observe high latency between two GPU servers during a large distributed training job. After analyzing the switch telemetry, you suspect a suboptimal routing path is contributing to the problem. <br \/>\r<br>Which of the following methods offers the MOST granular control for influencing traffic flow within the Spectrum-X fabric to mitigate this?<\/div><input type='hidden' name='question_id[]' id='qID_23' value='445470' \/><input type='hidden' id='answerType445470' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445470[]' id='answer-id-1723630' class='answer   answerof-445470 ' value='1723630'   \/><label for='answer-id-1723630' id='answer-label-1723630' class=' answer'><span>Adjust the Equal-Cost Multi-Path (ECMP) hashing algorithm globally on all switches.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445470[]' id='answer-id-1723631' class='answer   answerof-445470 ' value='1723631'   \/><label for='answer-id-1723631' id='answer-label-1723631' class=' answer'><span>Configure QoS (Quality of Service) policies to prioritize traffic from the high-latency GPU servers.<\/span><\/label><\/div><div 
class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445470[]' id='answer-id-1723632' class='answer   answerof-445470 ' value='1723632'   \/><label for='answer-id-1723632' id='answer-label-1723632' class=' answer'><span>Implement Adaptive Routing (AR) or Dynamic Load Balancing (DLB) features available on the Spectrum-X switches to dynamically adjust paths based on network conditions.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445470[]' id='answer-id-1723633' class='answer   answerof-445470 ' value='1723633'   \/><label for='answer-id-1723633' id='answer-label-1723633' class=' answer'><span>Manually configure static routes on the Spectrum-X switches to force traffic between the GPU servers along a specific path.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445470[]' id='answer-id-1723634' class='answer   answerof-445470 ' value='1723634'   \/><label for='answer-id-1723634' id='answer-label-1723634' class=' answer'><span>Disable IPv6 to simplify routing decisions.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-24' style=';'><div id='questionWrap-24'  class='   watupro-question-id-445471'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>24. <\/span>You are tasked with troubleshooting a performance bottleneck in a multi-node, multi-GPU deep learning training job utilizing Horovod. <br \/>\r<br>The training loss is decreasing, but the overall training time is significantly longer than expected. 
<br \/>\r<br>Which of the following monitoring approaches would provide the most insight into the cause of the bottleneck?<\/div><input type='hidden' name='question_id[]' id='qID_24' value='445471' \/><input type='hidden' id='answerType445471' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445471[]' id='answer-id-1723635' class='answer   answerof-445471 ' value='1723635'   \/><label for='answer-id-1723635' id='answer-label-1723635' class=' answer'><span>Using \u2018nvidia-smi\u2019 on each node to monitor GPU utilization and memory usage.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445471[]' id='answer-id-1723636' class='answer   answerof-445471 ' value='1723636'   \/><label for='answer-id-1723636' id='answer-label-1723636' class=' answer'><span>Enabling Horovod\u2019s timeline and profiling features to visualize the communication patterns and identify synchronization bottlenecks.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445471[]' id='answer-id-1723637' class='answer   answerof-445471 ' value='1723637'   \/><label for='answer-id-1723637' id='answer-label-1723637' class=' answer'><span>Monitoring network bandwidth utilization on each node using \u2018iftop\u2019 or \u2018iperf3\u2019<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445471[]' id='answer-id-1723638' class='answer   answerof-445471 ' value='1723638'   \/><label for='answer-id-1723638' id='answer-label-1723638' class=' answer'><span>Analyzing the training loss curve to identify potential issues with the model architecture or hyperparameters.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445471[]' id='answer-id-1723639' 
class='answer   answerof-445471 ' value='1723639'   \/><label for='answer-id-1723639' id='answer-label-1723639' class=' answer'><span>Using \u2018htop\u2019 to monitor CPU utilization on each node.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-25' style=';'><div id='questionWrap-25'  class='   watupro-question-id-445472'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>25. <\/span>You are running a distributed training job across multiple nodes, using a shared file system for storing training data. You observe that some nodes are consistently slower than others in reading data. <br \/>\r<br>Which of the following could be contributing factors to this performance discrepancy? Select all that apply.<\/div><input type='hidden' name='question_id[]' id='qID_25' value='445472' \/><input type='hidden' id='answerType445472' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445472[]' id='answer-id-1723640' class='answer   answerof-445472 ' value='1723640'   \/><label for='answer-id-1723640' id='answer-label-1723640' class=' answer'><span>Network congestion between the slower nodes and the storage system.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445472[]' id='answer-id-1723641' class='answer   answerof-445472 ' value='1723641'   \/><label for='answer-id-1723641' id='answer-label-1723641' class=' answer'><span>Uneven data distribution across the storage nodes.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445472[]' id='answer-id-1723642' class='answer   answerof-445472 ' value='1723642'   \/><label for='answer-id-1723642' id='answer-label-1723642' class=' answer'><span>Different CPU architectures on 
the nodes.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445472[]' id='answer-id-1723643' class='answer   answerof-445472 ' value='1723643'   \/><label for='answer-id-1723643' id='answer-label-1723643' class=' answer'><span>Insufficient RAM on the slower nodes for caching data.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445472[]' id='answer-id-1723644' class='answer   answerof-445472 ' value='1723644'   \/><label for='answer-id-1723644' id='answer-label-1723644' class=' answer'><span>Variations in the speed of the local temporary storage (e.g., \/tmp) used for intermediate files.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-26' style=';'><div id='questionWrap-26'  class='   watupro-question-id-445473'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>26. <\/span>After replacing a GPU in a multi-GPU server, you notice that the new GPU is consistently running at a lower clock speed than the other GPUs, even under load. \u2018nvidia-smi\u2019 shows the \u2018Pwr\u2019 state as \u2018P8\u2019 for the new GPU, while the others are at \u2018P0\u2019. 
<br \/>\r<br>What is the MOST probable cause?<\/div><input type='hidden' name='question_id[]' id='qID_26' value='445473' \/><input type='hidden' id='answerType445473' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445473[]' id='answer-id-1723645' class='answer   answerof-445473 ' value='1723645'   \/><label for='answer-id-1723645' id='answer-label-1723645' class=' answer'><span>The new GPU is a lower-performance model than the other GPUs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445473[]' id='answer-id-1723646' class='answer   answerof-445473 ' value='1723646'   \/><label for='answer-id-1723646' id='answer-label-1723646' class=' answer'><span>The driver is not properly recognizing the new GPU\u2019s capabilities; reinstall the driver.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445473[]' id='answer-id-1723647' class='answer   answerof-445473 ' value='1723647'   \/><label for='answer-id-1723647' id='answer-label-1723647' class=' answer'><span>The new GPU is not receiving sufficient power; check the power connections and PSU capacity.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445473[]' id='answer-id-1723648' class='answer   answerof-445473 ' value='1723648'   \/><label for='answer-id-1723648' id='answer-label-1723648' class=' answer'><span>The new GPU is overheating and throttling performance.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445473[]' id='answer-id-1723649' class='answer   answerof-445473 ' value='1723649'   \/><label for='answer-id-1723649' id='answer-label-1723649' class=' answer'><span>The new GPU requires a firmware update that hasn\u2019t been 
applied.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-27' style=';'><div id='questionWrap-27'  class='   watupro-question-id-445474'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>27. <\/span>An InfiniBand fabric is experiencing intermittent packet loss between two high-performance compute nodes. You suspect a faulty cable or connector. <br \/>\r<br>Besides physically inspecting the cables, what software-based tools or techniques can you employ to diagnose potential link errors contributing to this packet loss?<\/div><input type='hidden' name='question_id[]' id='qID_27' value='445474' \/><input type='hidden' id='answerType445474' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445474[]' id='answer-id-1723650' class='answer   answerof-445474 ' value='1723650'   \/><label for='answer-id-1723650' id='answer-label-1723650' class=' answer'><span>Use \u2018ibdiagnet\u2019 to perform a comprehensive fabric analysis, including link integrity checks and error detection.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445474[]' id='answer-id-1723651' class='answer   answerof-445474 ' value='1723651'   \/><label for='answer-id-1723651' id='answer-label-1723651' class=' answer'><span>Monitor the port counters on the InfiniBand switches connected to the compute nodes. 
Look for excessive CRC errors, symbol errors, or other link-related error counts.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445474[]' id='answer-id-1723652' class='answer   answerof-445474 ' value='1723652'   \/><label for='answer-id-1723652' id='answer-label-1723652' class=' answer'><span>Run \u2018iperf\u2019 or \u2018ibperf\u2019 between the two compute nodes and analyze the reported packet loss rate. Correlate this with the error counters on the switches.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445474[]' id='answer-id-1723653' class='answer   answerof-445474 ' value='1723653'   \/><label for='answer-id-1723653' id='answer-label-1723653' class=' answer'><span>All of the above<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445474[]' id='answer-id-1723654' class='answer   answerof-445474 ' value='1723654'   \/><label for='answer-id-1723654' id='answer-label-1723654' class=' answer'><span>Disable port mirroring.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-28' style=';'><div id='questionWrap-28'  class='   watupro-question-id-445475'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>28. <\/span>You are deploying a new AI cluster using RoCEv2 over a lossless Ethernet fabric. 
<br \/>\r<br>Which of the following QoS (Quality of Service) mechanisms is critical for ensuring reliable RDMA communication?<\/div><input type='hidden' name='question_id[]' id='qID_28' value='445475' \/><input type='hidden' id='answerType445475' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445475[]' id='answer-id-1723655' class='answer   answerof-445475 ' value='1723655'   \/><label for='answer-id-1723655' id='answer-label-1723655' class=' answer'><span>DSCP (Differentiated Services Code Point) marking<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445475[]' id='answer-id-1723656' class='answer   answerof-445475 ' value='1723656'   \/><label for='answer-id-1723656' id='answer-label-1723656' class=' answer'><span>ECN (Explicit Congestion Notification)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445475[]' id='answer-id-1723657' class='answer   answerof-445475 ' value='1723657'   \/><label for='answer-id-1723657' id='answer-label-1723657' class=' answer'><span>PFC (Priority Flow Control)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445475[]' id='answer-id-1723658' class='answer   answerof-445475 ' value='1723658'   \/><label for='answer-id-1723658' id='answer-label-1723658' class=' answer'><span>ACL (Access Control List)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445475[]' id='answer-id-1723659' class='answer   answerof-445475 ' value='1723659'   \/><label for='answer-id-1723659' id='answer-label-1723659' class=' answer'><span>Rate Limiting<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-29' 
style=';'><div id='questionWrap-29'  class='   watupro-question-id-445476'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>29. <\/span>After upgrading the network card drivers on your AI inference server, you experience intermittent network connectivity issues, including packet loss and high latency. You\u2019ve verified that the physical connections are secure. <br \/>\r<br>Which of the following steps would be most effective in troubleshooting this issue?<\/div><input type='hidden' name='question_id[]' id='qID_29' value='445476' \/><input type='hidden' id='answerType445476' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445476[]' id='answer-id-1723660' class='answer   answerof-445476 ' value='1723660'   \/><label for='answer-id-1723660' id='answer-label-1723660' class=' answer'><span>Roll back the network card drivers to the previous version.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445476[]' id='answer-id-1723661' class='answer   answerof-445476 ' value='1723661'   \/><label for='answer-id-1723661' id='answer-label-1723661' class=' answer'><span>Check the system logs for error messages related to the network card or driver.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445476[]' id='answer-id-1723662' class='answer   answerof-445476 ' value='1723662'   \/><label for='answer-id-1723662' id='answer-label-1723662' class=' answer'><span>Run network diagnostic tools like \u2018ping\u2019, \u2018traceroute\u2019, and \u2018iperf3\u2019 to assess the network performance.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445476[]' id='answer-id-1723663' class='answer   answerof-445476 ' value='1723663'   \/><label 
for='answer-id-1723663' id='answer-label-1723663' class=' answer'><span>Reinstall the operating system.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445476[]' id='answer-id-1723664' class='answer   answerof-445476 ' value='1723664'   \/><label for='answer-id-1723664' id='answer-label-1723664' class=' answer'><span>Update the server\u2019s BIOS.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-30' style=';'><div id='questionWrap-30'  class='   watupro-question-id-445477'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>30. <\/span>A DGX A100 server with dual power supplies reports a critical power event in the BMC logs. One PSU shows a \u2018degraded\u2019 status, while the other appears normal. <br \/>\r<br>What immediate actions should you take to ensure continued operation and prevent data loss?<\/div><input type='hidden' name='question_id[]' id='qID_30' value='445477' \/><input type='hidden' id='answerType445477' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445477[]' id='answer-id-1723665' class='answer   answerof-445477 ' value='1723665'   \/><label for='answer-id-1723665' id='answer-label-1723665' class=' answer'><span>Immediately shut down the server gracefully to prevent further damage to the faulty PSU.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445477[]' id='answer-id-1723666' class='answer   answerof-445477 ' value='1723666'   \/><label for='answer-id-1723666' id='answer-label-1723666' class=' answer'><span>Hot-swap the degraded PSU with a replacement unit.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445477[]' 
id='answer-id-1723667' class='answer   answerof-445477 ' value='1723667'   \/><label for='answer-id-1723667' id='answer-label-1723667' class=' answer'><span>Monitor the remaining PSU\u2019s load and temperature closely; if stable, continue operation until a scheduled maintenance window.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445477[]' id='answer-id-1723668' class='answer   answerof-445477 ' value='1723668'   \/><label for='answer-id-1723668' id='answer-label-1723668' class=' answer'><span>Reduce the GPU power limit using \u2018nvidia-smi\u2019 to decrease the overall power consumption of the server.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445477[]' id='answer-id-1723669' class='answer   answerof-445477 ' value='1723669'   \/><label for='answer-id-1723669' id='answer-label-1723669' class=' answer'><span>Migrate all workloads to other servers in the cluster to minimize the impact of a potential complete PSU failure.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-31' style=';'><div id='questionWrap-31'  class='   watupro-question-id-445478'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>31. <\/span>You are managing an AI infrastructure based on NVIDIA Spectrum-X switches. A new application requires strict Quality of Service (QoS) guarantees for its traffic. Specifically, you need to ensure that this application\u2019s traffic receives preferential treatment and minimal latency. 
<br \/>\r<br>What combination of Spectrum-X features and configurations would be MOST effective in achieving this?<\/div><input type='hidden' name='question_id[]' id='qID_31' value='445478' \/><input type='hidden' id='answerType445478' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445478[]' id='answer-id-1723670' class='answer   answerof-445478 ' value='1723670'   \/><label for='answer-id-1723670' id='answer-label-1723670' class=' answer'><span>Configure DiffServ Code Point (DSCP) marking on the application\u2019s traffic, map these DSCP values to specific traffic classes within the Spectrum-X switch, and configure Weighted Fair Queueing (WFQ) or Strict Priority Queueing on the egress ports.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445478[]' id='answer-id-1723671' class='answer   answerof-445478 ' value='1723671'   \/><label for='answer-id-1723671' id='answer-label-1723671' class=' answer'><span>Increase the MTU size on all interfaces to reduce packet fragmentation and overall latency.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445478[]' id='answer-id-1723672' class='answer   answerof-445478 ' value='1723672'   \/><label for='answer-id-1723672' id='answer-label-1723672' class=' answer'><span>Disable Adaptive Routing (AR) to ensure that traffic always takes the shortest path.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445478[]' id='answer-id-1723673' class='answer   answerof-445478 ' value='1723673'   \/><label for='answer-id-1723673' id='answer-label-1723673' class=' answer'><span>Use VLAN tagging to isolate the application\u2019s traffic into a separate virtual network.<\/span><\/label><\/div><div class='watupro-question-choice  ' 
dir='auto' ><input type='radio' name='answer-445478[]' id='answer-id-1723674' class='answer   answerof-445478 ' value='1723674'   \/><label for='answer-id-1723674' id='answer-label-1723674' class=' answer'><span>Enable broadcast storm protection.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-32' style=';'><div id='questionWrap-32'  class='   watupro-question-id-445479'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>32. <\/span>A data scientist reports that training performance on a DGX A100 server has significantly degraded over the past week. \u2018nvidia-smi\u2019 shows all GPUs functioning, but \u2018nvprof\u2019 reveals substantially increased \u2018cudaMemcpy\u2019 times. <br \/>\r<br>What is the MOST likely bottleneck?<\/div><input type='hidden' name='question_id[]' id='qID_32' value='445479' \/><input type='hidden' id='answerType445479' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445479[]' id='answer-id-1723675' class='answer   answerof-445479 ' value='1723675'   \/><label for='answer-id-1723675' id='answer-label-1723675' class=' answer'><span>The CPU is heavily loaded, causing contention for system memory bandwidth.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445479[]' id='answer-id-1723676' class='answer   answerof-445479 ' value='1723676'   \/><label for='answer-id-1723676' id='answer-label-1723676' class=' answer'><span>The PCIe bus is saturated, limiting data transfer speeds between the CPU and GPUs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445479[]' id='answer-id-1723677' class='answer   answerof-445479 ' value='1723677'   \/><label for='answer-id-1723677' 
id='answer-label-1723677' class=' answer'><span>The NVLink connections between GPUs are failing, forcing data transfers through PCIe.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445479[]' id='answer-id-1723678' class='answer   answerof-445479 ' value='1723678'   \/><label for='answer-id-1723678' id='answer-label-1723678' class=' answer'><span>The GPUs are overheating, causing thermal throttling and slower memory transfers.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445479[]' id='answer-id-1723679' class='answer   answerof-445479 ' value='1723679'   \/><label for='answer-id-1723679' id='answer-label-1723679' class=' answer'><span>The storage system is slow, delaying data loading and preprocessing.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-33' style=';'><div id='questionWrap-33'  class='   watupro-question-id-445480'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>33. <\/span>A server with eight NVIDIA A100 GPUs experiences frequent CUDA errors during large model training. \u2018nvidia-smi\u2019 reports seemingly normal temperatures for all GPUs. However, upon closer inspection using IPMI, the inlet temperature for GPUs 3 and 4 is significantly higher than others. 
<br \/>\r<br>What is the MOST likely cause and the immediate action to take?<\/div><input type='hidden' name='question_id[]' id='qID_33' value='445480' \/><input type='hidden' id='answerType445480' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445480[]' id='answer-id-1723680' class='answer   answerof-445480 ' value='1723680'   \/><label for='answer-id-1723680' id='answer-label-1723680' class=' answer'><span>A driver issue is causing incorrect temperature reporting; reinstall the NVIDIA driver.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445480[]' id='answer-id-1723681' class='answer   answerof-445480 ' value='1723681'   \/><label for='answer-id-1723681' id='answer-label-1723681' class=' answer'><span>The temperature sensors on GPUs 3 and 4 are faulty; replace the GPUs immediately.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445480[]' id='answer-id-1723682' class='answer   answerof-445480 ' value='1723682'   \/><label for='answer-id-1723682' id='answer-label-1723682' class=' answer'><span>There is a localized airflow problem affecting GPUs 3 and 4; check fan speeds and airflow obstructions.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445480[]' id='answer-id-1723683' class='answer   answerof-445480 ' value='1723683'   \/><label for='answer-id-1723683' id='answer-label-1723683' class=' answer'><span>The power supply is failing to provide sufficient power to GPUs 3 and 4; replace the power supply.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445480[]' id='answer-id-1723684' class='answer   answerof-445480 ' value='1723684'   \/><label for='answer-id-1723684' id='answer-label-1723684' 
class=' answer'><span>A software bug in the CUDA toolkit is causing the errors; downgrade to an earlier version.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-34' style=';'><div id='questionWrap-34'  class='   watupro-question-id-445481'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>34. <\/span>Your deep learning training job that utilizes NCCL (NVIDIA Collective Communications Library) for multi-GPU communication is failing with &quot;NCCL internal error, unhandled system error&quot; after a recent CUDA update. The error occurs during the \u2018all reduce\u2019 operation. <br \/>\r<br>What is the most likely root cause and how would you address it?<\/div><input type='hidden' name='question_id[]' id='qID_34' value='445481' \/><input type='hidden' id='answerType445481' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445481[]' id='answer-id-1723685' class='answer   answerof-445481 ' value='1723685'   \/><label for='answer-id-1723685' id='answer-label-1723685' class=' answer'><span>Incompatible NCCL version with the new CUDA version. Update NCCL to a version compatible with the installed CUDA version.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445481[]' id='answer-id-1723686' class='answer   answerof-445481 ' value='1723686'   \/><label for='answer-id-1723686' id='answer-label-1723686' class=' answer'><span>Insufficient shared memory allocated to the CUDA context. 
Increase the shared memory limit using \u2018cudaDeviceSetLimit(cudaLimitSharedMemory, new_limit)\u2019.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445481[]' id='answer-id-1723687' class='answer   answerof-445481 ' value='1723687'   \/><label for='answer-id-1723687' id='answer-label-1723687' class=' answer'><span>Firewall rules blocking inter-GPU communication. Configure the firewall to allow communication on the NCCL-defined ports (typically 8000-8010).<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445481[]' id='answer-id-1723688' class='answer   answerof-445481 ' value='1723688'   \/><label for='answer-id-1723688' id='answer-label-1723688' class=' answer'><span>Faulty network cables used for inter-node communication (if the training job spans multiple servers). Replace the network cables with certified high-speed cables.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445481[]' id='answer-id-1723689' class='answer   answerof-445481 ' value='1723689'   \/><label for='answer-id-1723689' id='answer-label-1723689' class=' answer'><span>GPU Direct RDMA is not properly configured. Check \u2018dmesg\u2019 for errors and ensure RDMA is enabled.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-35' style=';'><div id='questionWrap-35'  class='   watupro-question-id-445482'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>35. <\/span>You are tasked with diagnosing performance issues on a GPU server running a large-scale HPC simulation. The simulation utilizes multiple GPUs and InfiniBand for inter-GPU communication. You suspect that RDMA (Remote Direct Memory Access) is not functioning correctly. 
<br \/>\r<br>How would you comprehensively test and verify the proper operation of RDMA between the GPUs?<\/div><input type='hidden' name='question_id[]' id='qID_35' value='445482' \/><input type='hidden' id='answerType445482' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445482[]' id='answer-id-1723690' class='answer   answerof-445482 ' value='1723690'   \/><label for='answer-id-1723690' id='answer-label-1723690' class=' answer'><span>Use \u2018ping\u2019 to verify basic network connectivity between the server\u2019s InfiniBand interfaces.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445482[]' id='answer-id-1723691' class='answer   answerof-445482 ' value='1723691'   \/><label for='answer-id-1723691' id='answer-label-1723691' class=' answer'><span>Employ and from the \u2018perftest\u2019 suite to measure RDMA bandwidth and latency between GPUs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445482[]' id='answer-id-1723692' class='answer   answerof-445482 ' value='1723692'   \/><label for='answer-id-1723692' id='answer-label-1723692' class=' answer'><span>Run \u2018nvidia-smi topo -m\u2019 to check the GPU interconnect topology and verify that NVLink or PCIe is being used for communication.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445482[]' id='answer-id-1723693' class='answer   answerof-445482 ' value='1723693'   \/><label for='answer-id-1723693' id='answer-label-1723693' class=' answer'><span>Utilize NCCL\u2019s internal diagnostic tools to verify proper inter-GPU communication within the simulation.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445482[]' 
id='answer-id-1723694' class='answer   answerof-445482 ' value='1723694'   \/><label for='answer-id-1723694' id='answer-label-1723694' class=' answer'><span>Monitor CPU utilization during the simulation; high CPU usage suggests that RDMA is not offloading communication effectively.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-36' style=';'><div id='questionWrap-36'  class='   watupro-question-id-445483'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>36. <\/span>Which protocol is commonly used in Spine-Leaf architectures for dynamic routing and load balancing across multiple paths?<\/div><input type='hidden' name='question_id[]' id='qID_36' value='445483' \/><input type='hidden' id='answerType445483' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445483[]' id='answer-id-1723695' class='answer   answerof-445483 ' value='1723695'   \/><label for='answer-id-1723695' id='answer-label-1723695' class=' answer'><span>STP (Spanning Tree Protocol)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445483[]' id='answer-id-1723696' class='answer   answerof-445483 ' value='1723696'   \/><label for='answer-id-1723696' id='answer-label-1723696' class=' answer'><span>OSPF (Open Shortest Path First)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445483[]' id='answer-id-1723697' class='answer   answerof-445483 ' value='1723697'   \/><label for='answer-id-1723697' id='answer-label-1723697' class=' answer'><span>VRRP (Virtual Router Redundancy Protocol)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445483[]' id='answer-id-1723698' class='answer   
answerof-445483 ' value='1723698'   \/><label for='answer-id-1723698' id='answer-label-1723698' class=' answer'><span>ECMP (Equal-Cost Multi-Path)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445483[]' id='answer-id-1723699' class='answer   answerof-445483 ' value='1723699'   \/><label for='answer-id-1723699' id='answer-label-1723699' class=' answer'><span>BGP (Border Gateway Protocol)<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-37' style=';'><div id='questionWrap-37'  class='   watupro-question-id-445484'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>37. <\/span>You are managing a server farm of GPU servers used for AI model training. You observe frequent GPU failures across different servers. <br \/>\r<br>Analysis reveals that the failures often occur during periods of peak ambient temperature in the data center. You can\u2019t immediately improve the data center cooling. 
<br \/>\r<br>What are TWO proactive measures you can implement to mitigate these failures without significantly impacting training performance?<\/div><input type='hidden' name='question_id[]' id='qID_37' value='445484' \/><input type='hidden' id='answerType445484' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445484[]' id='answer-id-1723700' class='answer   answerof-445484 ' value='1723700'   \/><label for='answer-id-1723700' id='answer-label-1723700' class=' answer'><span>Reduce the GPU power limit using \u2018nvidia-smi\u2019 to decrease heat generation.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445484[]' id='answer-id-1723701' class='answer   answerof-445484 ' value='1723701'   \/><label for='answer-id-1723701' id='answer-label-1723701' class=' answer'><span>Increase the fan speeds of the GPU coolers to improve heat dissipation.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445484[]' id='answer-id-1723702' class='answer   answerof-445484 ' value='1723702'   \/><label for='answer-id-1723702' id='answer-label-1723702' class=' answer'><span>Implement a more aggressive GPU frequency scaling profile to throttle performance during peak temperatures.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445484[]' id='answer-id-1723703' class='answer   answerof-445484 ' value='1723703'   \/><label for='answer-id-1723703' id='answer-label-1723703' class=' answer'><span>Schedule training jobs to run during off-peak hours when ambient temperatures are lower.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445484[]' id='answer-id-1723704' class='answer   answerof-445484 ' 
value='1723704'   \/><label for='answer-id-1723704' id='answer-label-1723704' class=' answer'><span>Replace all existing GPUs with water-cooled models.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-38' style=';'><div id='questionWrap-38'  class='   watupro-question-id-445485'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>38. <\/span>You are configuring a switch port connected to a host in an NCP-AII environment. The host is running RoCEv2. <br \/>\r<br>To optimize performance and prevent packet loss, which flow control mechanism should you enable on the switch port?<\/div><input type='hidden' name='question_id[]' id='qID_38' value='445485' \/><input type='hidden' id='answerType445485' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445485[]' id='answer-id-1723705' class='answer   answerof-445485 ' value='1723705'   \/><label for='answer-id-1723705' id='answer-label-1723705' class=' answer'><span>None; flow control is not needed with RoCEv2.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445485[]' id='answer-id-1723706' class='answer   answerof-445485 ' value='1723706'   \/><label for='answer-id-1723706' id='answer-label-1723706' class=' answer'><span>TCP flow control.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445485[]' id='answer-id-1723707' class='answer   answerof-445485 ' value='1723707'   \/><label for='answer-id-1723707' id='answer-label-1723707' class=' answer'><span>Priority Flow Control (PFC, IEEE 802.1Qbb), specifically for the traffic class associated with RoCEv2.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445485[]' 
id='answer-id-1723708' class='answer   answerof-445485 ' value='1723708'   \/><label for='answer-id-1723708' id='answer-label-1723708' class=' answer'><span>Simple Network Management Protocol (SNMP).<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445485[]' id='answer-id-1723709' class='answer   answerof-445485 ' value='1723709'   \/><label for='answer-id-1723709' id='answer-label-1723709' class=' answer'><span>Spanning Tree Protocol (STP).<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-39' style=';'><div id='questionWrap-39'  class='   watupro-question-id-445486'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>39. <\/span>A critical AI model training job consistently fails on a specific GPU server in your cluster after running for approximately 24 hours. <br \/>\r<br>Monitoring data shows a sudden drop in GPU power consumption followed by a system reboot. All other GPUs on the server appear normal. The server has redundant PSUs. 
<br \/>\r<br>What is the MOST likely cause?<\/div><input type='hidden' name='question_id[]' id='qID_39' value='445486' \/><input type='hidden' id='answerType445486' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445486[]' id='answer-id-1723710' class='answer   answerof-445486 ' value='1723710'   \/><label for='answer-id-1723710' id='answer-label-1723710' class=' answer'><span>A software bug in the AI model causing a kernel panic specifically triggered after 24 hours of execution.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445486[]' id='answer-id-1723711' class='answer   answerof-445486 ' value='1723711'   \/><label for='answer-id-1723711' id='answer-label-1723711' class=' answer'><span>Thermal runaway on the GPU due to a failing thermal interface material (TIM) between the GPU die and the heatsink.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445486[]' id='answer-id-1723712' class='answer   answerof-445486 ' value='1723712'   \/><label for='answer-id-1723712' id='answer-label-1723712' class=' answer'><span>A transient power supply issue affecting only one of the redundant PSUs, triggering a system-wide protection mechanism.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445486[]' id='answer-id-1723713' class='answer   answerof-445486 ' value='1723713'   \/><label for='answer-id-1723713' id='answer-label-1723713' class=' answer'><span>ECC memory errors accumulating over time, eventually leading to a non-recoverable system fault.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-445486[]' id='answer-id-1723714' class='answer   answerof-445486 ' value='1723714'   \/><label 
for='answer-id-1723714' id='answer-label-1723714' class=' answer'><span>A driver crash, causing the GPU to reset and the system to reboot.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-40' style=';'><div id='questionWrap-40'  class='   watupro-question-id-445487'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>40. <\/span>You suspect a faulty NVIDIA ConnectX-6 network adapter in a server used for RDMA-based distributed training. <br \/>\r<br>Which commands or tools can you use to diagnose potential issues with the adapter\u2019s hardware and connectivity?<\/div><input type='hidden' name='question_id[]' id='qID_40' value='445487' \/><input type='hidden' id='answerType445487' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445487[]' id='answer-id-1723715' class='answer   answerof-445487 ' value='1723715'   \/><label for='answer-id-1723715' id='answer-label-1723715' class=' answer'><span>lspci -v to verify the adapter is detected and its resources are allocated correctly.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445487[]' id='answer-id-1723716' class='answer   answerof-445487 ' value='1723716'   \/><label for='answer-id-1723716' id='answer-label-1723716' class=' answer'><span>ibstat to check the adapter\u2019s status, link speed, and active ports.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445487[]' id='answer-id-1723717' class='answer   answerof-445487 ' value='1723717'   \/><label for='answer-id-1723717' id='answer-label-1723717' class=' answer'><span>ethtool to examine the adapter\u2019s Ethernet settings and statistics.<\/span><\/label><\/div><div class='watupro-question-choice 
 ' dir='auto' ><input type='checkbox' name='answer-445487[]' id='answer-id-1723718' class='answer   answerof-445487 ' value='1723718'   \/><label for='answer-id-1723718' id='answer-label-1723718' class=' answer'><span>ping to test basic network connectivity.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-445487[]' id='answer-id-1723719' class='answer   answerof-445487 ' value='1723719'   \/><label for='answer-id-1723719' id='answer-label-1723719' class=' answer'><span>nvsmimonitord to monitor GPU metrics and detect anomalies.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div style='display:none' id='question-41'>\n\t<div class='question-content'>\n\t\t<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/plugins\/watupro\/img\/loading.gif\" width=\"16\" height=\"16\" alt=\"Loading...\" title=\"Loading...\" \/>&nbsp;Loading...\t<\/div>\n<\/div>\n\n<br \/>\n\t\n\t\t\t<div class=\"watupro_buttons flex \" id=\"watuPROButtons11332\" >\n\t\t  <div id=\"prev-question\" style=\"display:none;\"><input type=\"button\" value=\"&lt; Previous\" onclick=\"WatuPRO.nextQuestion(event, 'previous');\"\/><\/div>\t\t  \t\t  \t\t   \n\t\t   \t  \t\t<div><input type=\"button\" name=\"action\" class=\"watupro-submit-button\" onclick=\"WatuPRO.submitResult(event)\" id=\"action-button\" value=\"View Results\"  \/>\n\t\t<\/div>\n\t\t<\/div>\n\t\t\n\t<input type=\"hidden\" name=\"quiz_id\" value=\"11332\" id=\"watuPROExamID\"\/>\n\t<input type=\"hidden\" name=\"start_time\" id=\"startTime\" value=\"2026-07-05 09:02:48\" \/>\n\t<input type=\"hidden\" name=\"start_timestamp\" id=\"startTimeStamp\" value=\"1783242168\" \/>\n\t<input type=\"hidden\" name=\"question_ids\" value=\"\" \/>\n\t<input type=\"hidden\" name=\"watupro_questions\" value=\"445448:1723521,1723522,1723523,1723524 | 445449:1723525,1723526,1723527,1723528,1723529 | 
445450:1723530,1723531,1723532,1723533,1723534 | 445451:1723535,1723536,1723537,1723538,1723539 | 445452:1723540,1723541,1723542,1723543,1723544 | 445453:1723545,1723546,1723547,1723548,1723549 | 445454:1723550,1723551,1723552,1723553,1723554 | 445455:1723555,1723556,1723557,1723558,1723559 | 445456:1723560,1723561,1723562,1723563,1723564 | 445457:1723565,1723566,1723567,1723568,1723569 | 445458:1723570,1723571,1723572,1723573,1723574 | 445459:1723575,1723576,1723577,1723578,1723579 | 445460:1723580,1723581,1723582,1723583,1723584 | 445461:1723585,1723586,1723587,1723588,1723589 | 445462:1723590,1723591,1723592,1723593,1723594 | 445463:1723595,1723596,1723597,1723598,1723599 | 445464:1723600,1723601,1723602,1723603,1723604 | 445465:1723605,1723606,1723607,1723608,1723609 | 445466:1723610,1723611,1723612,1723613,1723614 | 445467:1723615,1723616,1723617,1723618,1723619 | 445468:1723620,1723621,1723622,1723623,1723624 | 445469:1723625,1723626,1723627,1723628,1723629 | 445470:1723630,1723631,1723632,1723633,1723634 | 445471:1723635,1723636,1723637,1723638,1723639 | 445472:1723640,1723641,1723642,1723643,1723644 | 445473:1723645,1723646,1723647,1723648,1723649 | 445474:1723650,1723651,1723652,1723653,1723654 | 445475:1723655,1723656,1723657,1723658,1723659 | 445476:1723660,1723661,1723662,1723663,1723664 | 445477:1723665,1723666,1723667,1723668,1723669 | 445478:1723670,1723671,1723672,1723673,1723674 | 445479:1723675,1723676,1723677,1723678,1723679 | 445480:1723680,1723681,1723682,1723683,1723684 | 445481:1723685,1723686,1723687,1723688,1723689 | 445482:1723690,1723691,1723692,1723693,1723694 | 445483:1723695,1723696,1723697,1723698,1723699 | 445484:1723700,1723701,1723702,1723703,1723704 | 445485:1723705,1723706,1723707,1723708,1723709 | 445486:1723710,1723711,1723712,1723713,1723714 | 445487:1723715,1723716,1723717,1723718,1723719\" \/>\n\t<input type=\"hidden\" name=\"no_ajax\" value=\"0\">\t\t\t<\/form>\n\t<p>&nbsp;<\/p>\n<\/div>\n\n<script 
type=\"text\/javascript\">\n\/\/jQuery(document).ready(function(){\ndocument.addEventListener(\"DOMContentLoaded\", function(event) { \t\nvar question_ids = \"445448,445449,445450,445451,445452,445453,445454,445455,445456,445457,445458,445459,445460,445461,445462,445463,445464,445465,445466,445467,445468,445469,445470,445471,445472,445473,445474,445475,445476,445477,445478,445479,445480,445481,445482,445483,445484,445485,445486,445487\";\nWatuPROSettings[11332] = {};\nWatuPRO.qArr = question_ids.split(',');\nWatuPRO.exam_id = 11332;\t    \nWatuPRO.post_id = 116516;\nWatuPRO.store_progress = 0;\nWatuPRO.curCatPage = 1;\nWatuPRO.requiredIDs=\"0\".split(\",\");\nWatuPRO.hAppID = \"0.56373700 1783242168\";\nvar url = \"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/plugins\/watupro\/show_exam.php\";\nWatuPRO.examMode = 1;\nWatuPRO.siteURL=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-admin\/admin-ajax.php\";\nWatuPRO.emailIsNotRequired = 0;\nWatuPROIntel.init(11332);\nWatuPRO.inCategoryPages=1;});    \t \n<\/script>\n<p><!-- notionvc: 4831abdf-8a25-4e78-9192-ea927f994f32 --><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learning the NCP-AII dumps (V9.03) is essential when preparing for your NVIDIA Certified Professional AI Infrastructure certification exam. By learning the updated exam questions and answers from DumpsBase, you can gain access to current information attested by experts. 
DumpsBase\u2019s materials are great, which not only promote a better understanding of the exam content but also [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9954,18913],"tags":[20717,20647],"class_list":["post-116516","post","type-post","status-publish","format-standard","hentry","category-nutanix","category-nvidia-certified-professional","tag-ncp-aii-exam-dumps","tag-nvidia-certified-professional-ai-infrastructure"],"_links":{"self":[{"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/posts\/116516","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/comments?post=116516"}],"version-history":[{"count":1,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/posts\/116516\/revisions"}],"predecessor-version":[{"id":116517,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/posts\/116516\/revisions\/116517"}],"wp:attachment":[{"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/media?parent=116516"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/categories?post=116516"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/tags?post=116516"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}